SQL Fundamentals with Python - Joins

Joining Database Table

Key Constraints

Previously, you have learned about the concept of sql constraints. The Primary key works as an index to find the data we want. The Foreign key works as an index to reference data in the index table based on the primary key of the referenced table.

When we want to create a table that has a foreign key from another table, we must create the referenced table first. This is because we cannot create a table that has a foreign key from a table that does not yet exist.

Primary key

First, we need to create a primary key as a unique identifier for the table. Suppose there is a department table that has the following data:

id name
1 IT
2 HRD
3 Finance
4 Marketing

Then there is an employee table that has the following data:

id name age address salary department_id
1 Rizki 25 Jl. Kebon Jeruk 2000000 2
2 Andi 27 Jl. Kebon Sirih 3000000 3
3 Budi 30 Jl. Kebon Melati 4000000 1
4 Caca 32 Jl. Kebon Anggrek 5000000 1
5 Deni 35 Jl. Kebon Mawar 6000000 1

We can conclude that the employee table has a foreign key from the department table. This is because the department_id column in the employee table references the id column in the department table.

join-table-example

We can create a primary key on the id column because the data is unique by adding PRIMARY KEY as constraints on the id column when creating the department table:

CREATE TABLE department (
  id INT PRIMARY KEY, -- the `id` column will become the primary key
  name VARCHAR(255)
);

When the command is executed, the department table will be created with the id column as a primary key of type INT. Automatically, this data must be unique.

Foreign key

Then we can create a foreign key on the employee table by adding FOREIGN KEY as constraints on the department_id column when creating the employee table:

CREATE TABLE employee (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  age INT,
  address VARCHAR(255),
  salary INT,
  department_id INT FOREIGN KEY REFERENCES department(id)
  -- the `department_id` column will become the foreign key

With python, let’s create the department and employee tables, and insert the data into these tables.

import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('db/sqlite.db')

# Create a cursor object
c = conn.cursor()

# Create table 'department'
c.execute('''
CREATE TABLE department (
    id INT PRIMARY KEY,
    name TEXT);
''')

# Insert data into the 'department' table
data = [
    (1, 'IT'),
    (2, 'HRD'),
    (3, 'Finance'),
    (4, 'Marketing')
]
c.executemany('INSERT INTO department VALUES (?,?)', data)

# Create table 'employee'
c.execute('''
CREATE TABLE employee (
    id INT PRIMARY KEY,
    name TEXT,
    age INT,
    address TEXT,
    salary REAL,
    department_id INT,
    FOREIGN KEY(department_id) REFERENCES department(id));
''')

# Insert data into 'employee' table
data = [
    (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000, 2),
    (2, 'Andi', 27, 'Jl. Kebon Sirih', 3000000, 3),
    (3, 'Budi', 30, 'Jl. Kebon Melati', 4000000, 1),
    (4, 'Caca', 32, 'Jl. Kebon Anggrek', 5000000, 1),
    (5, 'Deni', 35, 'Jl. Kebon Mawar', 6000000, 1)
]
c.executemany('INSERT INTO employee VALUES (?,?,?,?,?,?)', data)

# Commit the transaction
conn.commit()

Now, let’s perform a JOIN operation to combine these tables:

import pandas as pd

pd.read_sql_query('''
SELECT 
    employee.id AS employee_id, 
    employee.name AS employee_name, 
    department.name AS department_name,
    employee.age, 
    employee.address, 
    employee.salary
FROM 
    employee
INNER JOIN department ON employee.department_id = department.id
''', conn)
employee_id employee_name department_name age address salary
0 1 Rizki HRD 25 Jl. Kebon Jeruk 2000000.0
1 2 Andi Finance 27 Jl. Kebon Sirih 3000000.0
2 3 Budi IT 30 Jl. Kebon Melati 4000000.0
3 4 Caca IT 32 Jl. Kebon Anggrek 5000000.0
4 5 Deni IT 35 Jl. Kebon Mawar 6000000.0

This will display employee details along with their respective department names.

The way to create a foreign key above is the same as creating a primary key. The only difference is we need to add REFERENCES to determine which table will be used as an index. Thus, the employee table has a column that references the data in the department table based on the id column.

SQL Joins

Previously, we have learned how to retrieve data from a table. But, what if we want to retrieve data from two tables? Suppose we want to retrieve data from table A and table B, and then display both data in one table. To do this, we can use the concept of join.


Please note, we can only join tables that have a foreign key to the referenced table


Suppose there are 2 tables, customers and orders. The customers table has the following data:

id name address
1 Rizki Jl. Kebon Jeruk
2 Andi Jl. Kebon Sirih
3 Budi Jl. Kebon Melati
4 Caca Jl. Kebon Anggrek

The orders table has the following data:

id customer_id order_date
1001 1 2020-01-01
1002 2 2020-01-02
1003 3 2020-01-03
1004 4 2020-01-04

Notice that the customer_id column in the orders table refers to id in the customers table. The relationship between the two tables above is the customer_id column.

We can join the two tables with a table display like the following:

order_id customer_name customer_address order_date
1001 Rizki Jl. Kebon Jeruk 2020-01-01
1002 Andi Jl. Kebon Sirih 2020-01-02
1003 Budi Jl. Kebon Melati 2020-01-03
1004 Caca Jl. Kebon Anggrek 2020-01-04

The first column is the id data in the orders table, the second and third columns are the name and address data in the customers table. Finally, the order_date data in the orders table. The data was successfully combined because there is a reference in the orders table.

To perform a join, we can use the SELECT command by adding JOIN and ON:

SELECT * FROM <table1> -- query command
INNER JOIN <table2>
ON <table1>.<key> = <table2>.<key>

Where <table1> and <table2> are the names of the tables to be merged using the INNER JOIN command, <key> is the column being referenced (primary key or foreign key) using the ON command and compared with =.

For example, we can join the customers and orders tables using inner join with the following query:

SELECT * FROM orders
INNER JOIN customers
ON orders.customer_id = customers.id;

The result of the query is as follows:

id customer_id order_date id name address
1001 1 2020-01-01 1 Rizki Jl. Kebon Jeruk
1002 2 2020-01-02 2 Andi Jl. Kebon Sirih
1003 3 2020-01-03 3 Budi Jl. Kebon Melati
1004 4 2020-01-04 4 Caca Jl. Kebon Anggrek

When joining tables, the columns from the joined table will be on the right. All columns from the orders table will be displayed from the left (id, customer_id, order_date) and then display the columns from the customers table (id, name, address). We can tidy up the merged columns by selecting to SELECT only a few columns using the syntax table_name.column_name. For example:

SELECT
    orders.id AS order_id,
    customers.name AS customer_name,
    customers.address AS customer_address,
    orders.order_date
FROM orders
INNER JOIN customers
ON orders.customer_id = customers.id;

In the SELECT command, we take 2 columns in the orders table, namely id and order_date, and 2 columns in the customers table, namely name and address. We rename the id column in the orders table to order_id using AS, as well as for name and address in the customers table.

The result of the query will be according to what we want:

order_id customer_name customer_address order_date
1001 Rizki Jl. Kebon Jeruk 2020-01-01
1002 Andi Jl. Kebon Sirih 2020-01-02
1003 Budi Jl. Kebon Melati 2020-01-03
1004 Caca Jl. Kebon Anggrek 2020-01-04

Finally, remember to close the connection to the database after all operations.

# Close the connection
conn.close()

Database Transaction

What is a transaction?

A Transaction is a set of SQL Commands designed to execute a set of interrelated actions. We can control each action that occurs by determining when the transaction begins and ends. A transaction can be canceled if an error occurs in one of the actions taken.

Why use a transaction?

Let’s take an example with a sales case study. Each sales process will carry out the process of selecting goods, payment, increasing the total goods sold, and recording the transaction with the selling price. And each of these processes is stored in a database with different tables. The data in the database is as follows:

Products table with example data:

id name price stock
1 Product A 10000 10
2 Product B 20000 20

Sales table with example data:

id product_id quantity total_price
1 1 2 20000
2 2 1 20000

Transactions table with example data:

id date product_id amount
1 2020-01-01 1 15000

Suppose there is a customer using our application, by purchasing one Product A for Rp. 15,000. And the customer made the purchase on 2020-01-01.

When a sale occurs, we will reduce the total stock of goods in the products table column stock, then add the total sales in the sales table, and record the sales transaction in the transactions table. So, the command process is as follows:

-- step 1: get the product data to be searched
SELECT * FROM products WHERE id = 1;

-- step 2: reduce product stock
UPDATE products SET stock = stock - 1 WHERE id = 1;

-- step 3: increase total sales
INSERT INTO sales (product_id, quantity, total_price) VALUES (1, 1, 10000);

-- step 4: record the transaction
INSERT INTO transactions (date, product_id, amount) VALUES ('2020-01-01', 1, 15000);

-- done

sql-transaction

There are 4 SQL commands run to perform the sales transaction process: get the purchased product data in the products table, then perform an update by reducing stock by 1 in the products table, add total sales in the sales table, and record the sales transaction in the transactions table. These commands will always be run if there is a sales process in our application.

However, what if when getting the product it turns out the stock is empty, or when recording the transaction an error occurs. This will result in data inconsistency between each table. The stock may have been reduced by 1, and in the sales table there is already an addition of 1 sale, but in the transactions table no transaction has been recorded. This will confuse us to check these data.

sql-transaction-err

To overcome this data inconsistency, we must return or cancel all SQL commands that have previously been performed. Therefore, we need to use a transaction.

With a transaction, we can ensure that all commands within a transaction will always be run simultaneously. If there is an error, then all commands within a transaction will be canceled.

So there are only 2 conditions, that is all commands in the transaction are successfully executed.

sql-transaction-success

Or cancel all commands if there is an error.

sql-transaction-rollback

SQL Transaction

To use a transaction in SQL, we need to understand the beginning of the process and the end of the process. The beginning of the process is when we start a transaction, and the end of the process is when we end a transaction.

We use the BEGIN TRANSACTION command to start a transaction, and the COMMIT command is used to complete a transaction.

BEGIN TRANSACTION;
-- SQL command

COMMIT;

Based on the example above, we just need to insert 4 SQL commands into the transaction.

BEGIN TRANSACTION;

-- step 1: get the product data to be searched
SELECT * FROM products WHERE id = 1;

-- step 2: reduce product stock
UPDATE products SET stock = stock - 1 WHERE id = 1;

-- step 3: increase total sales
INSERT INTO sales (product_id, quantity, total_price) VALUES (1, 1, 10000);

-- step 4: record the transaction
INSERT INTO transactions (date, product_id, amount) VALUES ('2020-01-01', 1, 15000);

COMMIT;

We can issue a ROLLBACK command to cancel a transaction if there is an error or unmet condition.

BEGIN TRANSACTION;
-- SQL command

ROLLBACK;

You can apply a transaction in an online SQL compiler or SQLite/DBMS tool that you use.

Back to top