SQL Fundamentals with Python - Joins

Joining Database Table

Key Constraints

Previously, you have learned about the concept of sql constraints. The Primary key works as an index to find the data we want. The Foreign key works as an index to reference data in the index table based on the primary key of the referenced table.

When we want to create a table that has a foreign key from another table, we must create the referenced table first. This is because we cannot create a table that has a foreign key from a table that does not yet exist.

Primary key

First, we need to create a primary key as a unique identifier for the table. Suppose there is a department table that has the following data:

id	name
1	IT
2	HRD
3	Finance
4	Marketing

Then there is an employee table that has the following data:

id	name	age	address	salary	department_id
1	Rizki	25	Jl. Kebon Jeruk	2000000	2
2	Andi	27	Jl. Kebon Sirih	3000000	3
3	Budi	30	Jl. Kebon Melati	4000000	1
4	Caca	32	Jl. Kebon Anggrek	5000000	1
5	Deni	35	Jl. Kebon Mawar	6000000	1

We can conclude that the employee table has a foreign key from the department table. This is because the department_id column in the employee table references the id column in the department table.

We can create a primary key on the id column because the data is unique by adding PRIMARY KEY as constraints on the id column when creating the department table:

CREATE TABLE department (
  id INT PRIMARY KEY, -- the `id` column will become the primary key
  name VARCHAR(255)
);

When the command is executed, the department table will be created with the id column as a primary key of type INT. Automatically, this data must be unique.

Foreign key

Then we can create a foreign key on the employee table by adding FOREIGN KEY as constraints on the department_id column when creating the employee table:

CREATE TABLE employee (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  age INT,
  address VARCHAR(255),
  salary INT,
  department_id INT FOREIGN KEY REFERENCES department(id)
  -- the `department_id` column will become the foreign key

With python, let’s create the department and employee tables, and insert the data into these tables.

import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('db/sqlite.db')

# Create a cursor object
c = conn.cursor()

# Create table 'department'
c.execute('''
CREATE TABLE department (
    id INT PRIMARY KEY,
    name TEXT);
''')

# Insert data into the 'department' table
data = [
    (1, 'IT'),
    (2, 'HRD'),
    (3, 'Finance'),
    (4, 'Marketing')
]
c.executemany('INSERT INTO department VALUES (?,?)', data)

# Create table 'employee'
c.execute('''
CREATE TABLE employee (
    id INT PRIMARY KEY,
    name TEXT,
    age INT,
    address TEXT,
    salary REAL,
    department_id INT,
    FOREIGN KEY(department_id) REFERENCES department(id));
''')

# Insert data into 'employee' table
data = [
    (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000, 2),
    (2, 'Andi', 27, 'Jl. Kebon Sirih', 3000000, 3),
    (3, 'Budi', 30, 'Jl. Kebon Melati', 4000000, 1),
    (4, 'Caca', 32, 'Jl. Kebon Anggrek', 5000000, 1),
    (5, 'Deni', 35, 'Jl. Kebon Mawar', 6000000, 1)
]
c.executemany('INSERT INTO employee VALUES (?,?,?,?,?,?)', data)

# Commit the transaction
conn.commit()

Now, let’s perform a JOIN operation to combine these tables:

import pandas as pd

pd.read_sql_query('''
SELECT 
    employee.id AS employee_id, 
    employee.name AS employee_name, 
    department.name AS department_name,
    employee.age, 
    employee.address, 
    employee.salary
FROM 
    employee
INNER JOIN department ON employee.department_id = department.id
''', conn)

	employee_id	employee_name	department_name	age	address	salary
0	1	Rizki	HRD	25	Jl. Kebon Jeruk	2000000.0
1	2	Andi	Finance	27	Jl. Kebon Sirih	3000000.0
2	3	Budi	IT	30	Jl. Kebon Melati	4000000.0
3	4	Caca	IT	32	Jl. Kebon Anggrek	5000000.0
4	5	Deni	IT	35	Jl. Kebon Mawar	6000000.0

This will display employee details along with their respective department names.

The way to create a foreign key above is the same as creating a primary key. The only difference is we need to add REFERENCES to determine which table will be used as an index. Thus, the employee table has a column that references the data in the department table based on the id column.

SQL Joins

Previously, we have learned how to retrieve data from a table. But, what if we want to retrieve data from two tables? Suppose we want to retrieve data from table A and table B, and then display both data in one table. To do this, we can use the concept of join.

Please note, we can only join tables that have a foreign key to the referenced table

Suppose there are 2 tables, customers and orders. The customers table has the following data:

id	name	address
1	Rizki	Jl. Kebon Jeruk
2	Andi	Jl. Kebon Sirih
3	Budi	Jl. Kebon Melati
4	Caca	Jl. Kebon Anggrek

The orders table has the following data:

id	customer_id	order_date
1001	1	2020-01-01
1002	2	2020-01-02
1003	3	2020-01-03
1004	4	2020-01-04

Notice that the customer_id column in the orders table refers to id in the customers table. The relationship between the two tables above is the customer_id column.

We can join the two tables with a table display like the following:

order_id	customer_name	customer_address	order_date
1001	Rizki	Jl. Kebon Jeruk	2020-01-01
1002	Andi	Jl. Kebon Sirih	2020-01-02
1003	Budi	Jl. Kebon Melati	2020-01-03
1004	Caca	Jl. Kebon Anggrek	2020-01-04

The first column is the id data in the orders table, the second and third columns are the name and address data in the customers table. Finally, the order_date data in the orders table. The data was successfully combined because there is a reference in the orders table.

To perform a join, we can use the SELECT command by adding JOIN and ON:

SELECT * FROM <table1> -- query command
INNER JOIN <table2>
ON <table1>.<key> = <table2>.<key>

Where <table1> and <table2> are the names of the tables to be merged using the INNER JOIN command, <key> is the column being referenced (primary key or foreign key) using the ON command and compared with =.

For example, we can join the customers and orders tables using inner join with the following query:

SELECT * FROM orders
INNER JOIN customers
ON orders.customer_id = customers.id;

The result of the query is as follows:

id	customer_id	order_date	id	name	address
1001	1	2020-01-01	1	Rizki	Jl. Kebon Jeruk
1002	2	2020-01-02	2	Andi	Jl. Kebon Sirih
1003	3	2020-01-03	3	Budi	Jl. Kebon Melati
1004	4	2020-01-04	4	Caca	Jl. Kebon Anggrek

When joining tables, the columns from the joined table will be on the right. All columns from the orders table will be displayed from the left (id, customer_id, order_date) and then display the columns from the customers table (id, name, address). We can tidy up the merged columns by selecting to SELECT only a few columns using the syntax table_name.column_name. For example:

SELECT
    orders.id AS order_id,
    customers.name AS customer_name,
    customers.address AS customer_address,
    orders.order_date
FROM orders
INNER JOIN customers
ON orders.customer_id = customers.id;

In the SELECT command, we take 2 columns in the orders table, namely id and order_date, and 2 columns in the customers table, namely name and address. We rename the id column in the orders table to order_id using AS, as well as for name and address in the customers table.

The result of the query will be according to what we want:

order_id	customer_name	customer_address	order_date
1001	Rizki	Jl. Kebon Jeruk	2020-01-01
1002	Andi	Jl. Kebon Sirih	2020-01-02
1003	Budi	Jl. Kebon Melati	2020-01-03
1004	Caca	Jl. Kebon Anggrek	2020-01-04

Finally, remember to close the connection to the database after all operations.

# Close the connection
conn.close()

Database Transaction

What is a transaction?

A Transaction is a set of SQL Commands designed to execute a set of interrelated actions. We can control each action that occurs by determining when the transaction begins and ends. A transaction can be canceled if an error occurs in one of the actions taken.

Why use a transaction?

Let’s take an example with a sales case study. Each sales process will carry out the process of selecting goods, payment, increasing the total goods sold, and recording the transaction with the selling price. And each of these processes is stored in a database with different tables. The data in the database is as follows:

Products table with example data:

id	name	price	stock
1	Product A	10000	10
2	Product B	20000	20

Sales table with example data:

id	product_id	quantity	total_price
1	1	2	20000
2	2	1	20000

Transactions table with example data:

id	date	product_id	amount
1	2020-01-01	1	15000

Suppose there is a customer using our application, by purchasing one Product A for Rp. 15,000. And the customer made the purchase on 2020-01-01.

When a sale occurs, we will reduce the total stock of goods in the products table column stock, then add the total sales in the sales table, and record the sales transaction in the transactions table. So, the command process is as follows:

-- step 1: get the product data to be searched
SELECT * FROM products WHERE id = 1;

-- step 2: reduce product stock
UPDATE products SET stock = stock - 1 WHERE id = 1;

-- step 3: increase total sales
INSERT INTO sales (product_id, quantity, total_price) VALUES (1, 1, 10000);

-- step 4: record the transaction
INSERT INTO transactions (date, product_id, amount) VALUES ('2020-01-01', 1, 15000);

-- done

There are 4 SQL commands run to perform the sales transaction process: get the purchased product data in the products table, then perform an update by reducing stock by 1 in the products table, add total sales in the sales table, and record the sales transaction in the transactions table. These commands will always be run if there is a sales process in our application.

However, what if when getting the product it turns out the stock is empty, or when recording the transaction an error occurs. This will result in data inconsistency between each table. The stock may have been reduced by 1, and in the sales table there is already an addition of 1 sale, but in the transactions table no transaction has been recorded. This will confuse us to check these data.

To overcome this data inconsistency, we must return or cancel all SQL commands that have previously been performed. Therefore, we need to use a transaction.

With a transaction, we can ensure that all commands within a transaction will always be run simultaneously. If there is an error, then all commands within a transaction will be canceled.

So there are only 2 conditions, that is all commands in the transaction are successfully executed.

Or cancel all commands if there is an error.

SQL Transaction

To use a transaction in SQL, we need to understand the beginning of the process and the end of the process. The beginning of the process is when we start a transaction, and the end of the process is when we end a transaction.

We use the BEGIN TRANSACTION command to start a transaction, and the COMMIT command is used to complete a transaction.

BEGIN TRANSACTION;
-- SQL command

COMMIT;

Based on the example above, we just need to insert 4 SQL commands into the transaction.

BEGIN TRANSACTION;

-- step 1: get the product data to be searched
SELECT * FROM products WHERE id = 1;

-- step 2: reduce product stock
UPDATE products SET stock = stock - 1 WHERE id = 1;

-- step 3: increase total sales
INSERT INTO sales (product_id, quantity, total_price) VALUES (1, 1, 10000);

-- step 4: record the transaction
INSERT INTO transactions (date, product_id, amount) VALUES ('2020-01-01', 1, 15000);

COMMIT;

We can issue a ROLLBACK command to cancel a transaction if there is an error or unmet condition.

BEGIN TRANSACTION;
-- SQL command

ROLLBACK;

You can apply a transaction in an online SQL compiler or SQLite/DBMS tool that you use.