SQL Fundamentals with Python - Tables

DML (Data Manipulation Language)

What is DML?

Data Manipulation Language (DML) is used to modify data in a database. DML is responsible for all forms of data changes within the database.

The most frequently used DML commands are INSERT, UPDATE, and DELETE. Let’s discuss these commands by continuing with the example table that was created earlier.

The table has the following structure and data:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000
4 Caca 32 Jl. Kebon Anggrek 5000000
5 Deni 35 Jl. Kebon Mawar 6000000

In the DDL material, we have already created a database and this table, let’s add the data above into the table that is already created:

-- create database
CREATE DATABASE db_employee;

-- use database
USE db_employee;

-- create table
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  age INT NOT NULL,
  address VARCHAR(100),
  salary INT
);

Try in python using SQLite

import sqlite3

# Connect to the SQLite database
# If the database does not exist, it will be created
conn = sqlite3.connect('db/sqlite.db')

# Create a cursor object
cur = conn.cursor()
# Execute a SQL command: this creates a new table
cur.execute('''
    CREATE TABLE employees (
        id INTEGER,
        name TEXT NOT NULL,
        age INTEGER NOT NULL,
        address TEXT,
        salary INTEGER
    )
''')

# Commit the transaction
conn.commit()

We will continue with examples of DML commands.

INSERT

We can use the INSERT command to add new data into a table. The INSERT command has the following structure:

INSERT INTO <table-name> VALUES (column1, column2, column3, ...)

Looking at the example above, we can see that the INSERT command has two parts, namely the INSERT INTO part and the VALUES part. The INSERT INTO part is used to designate which table we will add data to, where <table-name> is the name of the table that we will add data to. The VALUES part is used to specify what data we will add into the table.

Here is an example of an INSERT command that will add new data into the employees table:

INSERT INTO employees VALUES (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000);

Try in python using SQLite

cur.execute("INSERT INTO employees VALUES (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000)")
conn.commit()

See the table:

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000

When adding string data in SQL we must use a single quote (') to mark that the data is a string, like 'Rizky'. If we do not use a single quote, then the data will be considered invalid.

The result will be as follows:

id name age address salary
1 Rizki 265 Jl. Kebon Jeruk 2000000

The command above will add new data into the employees table, the data will be filled in according to the order of the columns in the table. 1 will be filled into the id column, 'Rizki' will be filled into the name column, and 25 will be filled into the age column and so forth.

If we want to add new data into specific columns, we can add the column name in the INSERT INTO section as follows:

INSERT INTO employee (id, name, age) VALUES (1, 'Rizki', 25);

Try in python using SQLite

cur.execute("INSERT INTO employees (id, name, age) VALUES (1, 'Rizki', 25)")
conn.commit()

See the table:

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 1 Rizki 25 None NaN

The result will be:

id name age address salary
1 Rizki 265 NULL NULL

The example above is adding new data into the id, name, and age columns. Because we are only adding data into the id, name, and age columns, the address and salary columns will be filled with NULL values.

Please note, NULL data is data that has no value, if we want to set it so that the column cannot be filled with NULL data, we can use the NOT NULL constraints when creating the column.

We can also add NULL data into a specific column, just like adding other data:

INSERT INTO employee (id, name, age, address, salary) VALUES (1, 'Rizki', 25, NULL, NULL);

Try in python using SQLite

cur.execute("INSERT INTO employees (id, name, age, address, salary) VALUES (1, 'Rizki', 25, NULL, NULL)")
conn.commit()

See the table:

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 1 Rizki 25 None NaN
2 1 Rizki 25 None NaN

Finally, we can add more than one data into a table using the INSERT command separated by ",", as follows:

INSERT INTO employees
VALUES (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000),
(2, 'Andi', 27, 'Jl. Kebon Sirih', 3000000),
(3, 'Budi', 30, 'Jl. Kebon Melati', 4000000),
(4, 'Caca', 32, 'Jl. Kebon Anggrek', 5000000),
(5, 'Deni', 35, 'Jl. Kebon Mawar', 6000000);

It can also be done using specific columns:

INSERT INTO employees (id, name, age)
VALUES (1, 'Rizki', 25),
(2, 'Andi', 27),
(3, 'Budi', 30),
(4, 'Caca', 32),
(5, 'Deni', 35);

As a result, it will add 5 new data into the employees table.

You can try adding new data into the employees table with the above INSERT command in this online SQL compiler or the SQLite tool you are using.

Try in python using SQLite

# Insert multiple data rows into the table
data = [
    (2, 'Andi', 27, 'Jl. Kebon Sirih', 3000000),
    (3, 'Budi', 30, 'Jl. Kebon Melati', 4000000),
    (4, 'Caca', 32, 'Jl. Kebon Anggrek', 5000000),
    (5, 'Deni', 35, 'Jl. Kebon Mawar', 6000000)
]

cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", data)
conn.commit()

See the table:

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 1 Rizki 25 None NaN
2 1 Rizki 25 None NaN
3 2 Andi 27 Jl. Kebon Sirih 3000000.0
4 3 Budi 30 Jl. Kebon Melati 4000000.0
5 4 Caca 32 Jl. Kebon Anggrek 5000000.0
6 5 Deni 35 Jl. Kebon Mawar 6000000.0

UPDATE

The UPDATE command is used to modify existing data in a table. The UPDATE command has the following structure:

UPDATE <table-name>
SET <column-name> = <new-value>
WHERE <condition>;

Where <table-name> is the name of the table whose data we will change, <column-name> is the name of the column whose data we will change, <new-value> is the new value that we will insert into that column, and <condition> is the condition that will determine which data we will modify.

Here is an example of the UPDATE command that will modify the data in the employees table:

UPDATE employees
SET name = 'Rizal', age = 26
WHERE id = 1;

Try in python using SQLite

# Update data in the table
cur.execute("UPDATE employees SET name = 'Rizal', age = 26 WHERE id = 1")
conn.commit()

See the table:

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 1 Rizal 26 Jl. Kebon Jeruk 2000000.0
1 1 Rizal 26 None NaN
2 1 Rizal 26 None NaN
3 2 Andi 27 Jl. Kebon Sirih 3000000.0
4 3 Budi 30 Jl. Kebon Melati 4000000.0
5 4 Caca 32 Jl. Kebon Anggrek 5000000.0
6 5 Deni 35 Jl. Kebon Mawar 6000000.0

In the example above, we are changing the name data to 'Rizal' and the age data to 26, separating the columns we will change with ",". And the data being modified is the one with id = 1.

Data before modification:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000

Data after modification:

id name age address salary
1 Rizal 26 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000

ATTENTION !!

Make sure to add WHERE so that we do not change all the data in the table. If we do not add WHERE, then all the data in the table will be changed. The example above is where we only change data that has id = 1, so the name and age data only changes the data at id 1.

You can try modifying the data in the employees table with the above UPDATE command using this online SQL compiler or the DBMS tool you are using.

DELETE

The DELETE command is used to remove existing data from a table. The DELETE command has the following structure:

DELETE FROM <table-name>
WHERE <condition>;

Where <table-name> is the name of the table whose data we will delete, and <condition> is the condition that will determine which data we will delete.

Here is an example of the DELETE command that will delete data in the employees table:

DELETE FROM employees
WHERE id = 1;

Try in python using SQLite

# Delete data from the table
cur.execute("DELETE FROM employees WHERE id = 1")
conn.commit()

See the table:

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 2 Andi 27 Jl. Kebon Sirih 3000000
1 3 Budi 30 Jl. Kebon Melati 4000000
2 4 Caca 32 Jl. Kebon Anggrek 5000000
3 5 Deni 35 Jl. Kebon Mawar 6000000

In the example above, we are deleting data with id = 1.

Data before deletion:

id name age address salary
1 Rizal 26 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000

Data after deletion:

id name age address salary
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000

ATTENTION !!

Make sure to add WHERE just like UPDATE so we do not delete all the data in the table. If we do not add WHERE, then all the data in the table will be deleted. The example above is where we only delete data that has id = 1, so the name and age data only changes the data at id 1.

You can try deleting data in the employees table with the above DELETE command using this online SQL compiler or the DBMS tool you are using.

# Close the connection
conn.close()

DQL (Data Query Language)

What is DQL?

Data Query Language (DQL) is used to retrieve data from a table in a database. The purpose of DQL is to obtain data using SQL commands. The SQL command used is SELECT.

Let’s continue with the data we have created in the previous section.

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000
4 Caca 32 Jl. Kebon Anggrek 5000000
5 Deni 35 Jl. Kebon Mawar 6000000

Then we try to save this data into the employees table with the following SQL command:

-- create data in table employee
INSERT INTO employees (id, name, age, address, salary)
VALUES (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000),
(2, 'Andi', 27, 'Jl. Kebon Sirih', 3000000),
(3, 'Budi', 30, 'Jl. Kebon Melati', 4000000),
(4, 'Caca', 32, 'Jl. Kebon Anggrek', 5000000),
(5, 'Deni', 35, 'Jl. Kebon Mawar', 6000000);
import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('db/sqlite.db')

# Create a cursor object
c = conn.cursor()
# Create table
c.execute('''
CREATE TABLE employees (
    id INT,
    name TEXT,
    age INT,
    address TEXT,
    salary REAL);
''')
# Commit the transaction
conn.commit()
# Insert data into the table
data = [
    (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000),
    (2, 'Andi', 27, 'Jl. Kebon Sirih', 3000000),
    (3, 'Budi', 30, 'Jl. Kebon Melati', 4000000),
    (4, 'Caca', 32, 'Jl. Kebon Anggrek', 5000000),
    (5, 'Deni', 35, 'Jl. Kebon Mawar', 6000000)
]
c.executemany('INSERT INTO employees VALUES (?,?,?,?,?)', data)

# Commit the transaction
conn.commit()

SELECT

The SELECT command is used to retrieve data from a table. The SELECT command has a fairly complex syntax. Here is a simple command example to retrieve data from the employee table:

SELECT * FROM employees;

Retrieve all data from the employees table.

import pandas as pd

pd.read_sql_query("SELECT * FROM employees", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 2 Andi 27 Jl. Kebon Sirih 3000000.0
2 3 Budi 30 Jl. Kebon Melati 4000000.0
3 4 Caca 32 Jl. Kebon Anggrek 5000000.0
4 5 Deni 35 Jl. Kebon Mawar 6000000.0

The command above will retrieve data in the employees table. We use * to retrieve all columns in the employee table. If we want to retrieve data from a specific column, we can write the column name in the SELECT command. For example:

The results will be as follows:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000
3 Budi 30 Jl. Kebon Melati 4000000
4 Caca 32 Jl. Kebon Anggrek 5000000
5 Deni 35 Jl. Kebon Mawar 6000000
SELECT id, name, age FROM employees;

Retrieve specific columns (id, name, age) from the employees table.

import pandas as pd

pd.read_sql_query("SELECT id, name, age FROM employees", conn)
id name age
0 1 Rizki 25
1 2 Andi 27
2 3 Budi 30
3 4 Caca 32
4 5 Deni 35

The command above will retrieve all data for several columns, we can write the column name separated by ,. Because in the command it writes the id, name and age columns on the employee table, the result will be as follows:

id name age
1 Rizki 25
2 Andi 27
3 Budi 30
4 Caca 32
5 Deni 35

WHERE

The WHERE command is used to filter the data to be retrieved. The WHERE command must be written after the SELECT command. For example:

SELECT * FROM employees
WHERE id = 1;

Retrieve data from the employees table with id equals to 1.

import pandas as pd

pd.read_sql_query("SELECT * FROM employees WHERE id = 1", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0

The command above will retrieve data from the employees table with id 1, so it only gets 1 data that has id 1. The result will be as follows:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000

We can also use the WHERE command for logical operations. For example:

SELECT * FROM employees
WHERE age < 30;

Retrieve data from the employees table with age less than 30.

import pandas as pd

pd.read_sql_query("SELECT * FROM employees WHERE age < 30", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 2 Andi 27 Jl. Kebon Sirih 3000000.0

The command above will retrieve data from the employees table with ages less than 30 years, so it only gets 2 data that has an age less than 30 years. The result will be as follows:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000

There are logical operations that can be used to combine several conditions namely AND and OR.

  • AND is used to combine several conditions with and logic.
  • OR is used to combine several conditions with or logic.
SELECT * FROM employees
WHERE age < 30 AND salary > 2000000;

Retrieve data from the employees table with age less than 30 and more than 2000000.

import pandas as pd

pd.read_sql_query("SELECT * FROM employees WHERE age < 30 AND salary > 2000000", conn)
id name age address salary
0 2 Andi 27 Jl. Kebon Sirih 3000000.0

The command above will retrieve data from the employee table with an age less than 30 years and a salary more than 2 million, so it only gets 1 data that has an age less than 30 years and a salary more than 2 million. The result will be as follows:

id name age address salary
2 Andi 27 Jl. Kebon Sirih 3000000

IN

The IN command is used to filter the data to be retrieved based on a dataset that contains several values. This command is combined with the WHERE command and must be written after it. For example:

SELECT * FROM employees
WHERE id IN (1, 3, 5);

Retrieve data from the employees table with id in (1, 3, 5).

import pandas as pd

pd.read_sql_query("SELECT * FROM employees WHERE id IN (1, 3, 5)", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 3 Budi 30 Jl. Kebon Melati 4000000.0
2 5 Deni 35 Jl. Kebon Mawar 6000000.0

The command above will retrieve data from the employee table with id 1, 3 and 5, so it will get 3 data that has id 1, 3 and 5. The result will be as follows:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
3 Budi 30 Jl. Kebon Melati 4000000
5 Deni 35 Jl. Kebon Mawar 6000000

You can apply the basic SELECT command using this online SQL compiler or the DBMS tool you are using. Don’t forget to add data first before doing the SELECT command.

ORDER BY

The ORDER BY command is used to sort the data to be retrieved. This command is combined with the SELECT command and must be written after it. This command only has two types of sorting namely ASC and DESC.

  • ASC is used to sort data in ascending order (from small to large).
  • DESC is used to sort data in descending order (from large to small).

For example:

SELECT * FROM employees
ORDER BY age DESC;

Retrieve data from the employees table and order by age in descending order.

import pandas as pd

pd.read_sql_query("SELECT * FROM employees ORDER BY age DESC", conn)
id name age address salary
0 5 Deni 35 Jl. Kebon Mawar 6000000.0
1 4 Caca 32 Jl. Kebon Anggrek 5000000.0
2 3 Budi 30 Jl. Kebon Melati 4000000.0
3 2 Andi 27 Jl. Kebon Sirih 3000000.0
4 1 Rizki 25 Jl. Kebon Jeruk 2000000.0

The command above will retrieve data from the employee table and sort it based on the age column in descending order (large to small), so it will get 5 data that is sorted based on age in descending order. The result will be as follows:

id name age address salary
5 Deni 35 Jl. Kebon Mawar 6000000
4 Caca 32 Jl. Kebon Kamboja 5000000
3 Budi 30 Jl. Kebon Melati 4000000
2 Andi 27 Jl. Kebon Sirih 3000000
1 Rizki 25 Jl. Kebon Jeruk 2000000

LIMIT

The LIMIT command is used to limit the number of data to be retrieved. The number of data to be retrieved will be determined by the parameter given by taking in order from the top data. This command is combined with the SELECT command and must be written after it. For example:

SELECT * FROM employees
LIMIT 2;

Retrieve 2 records from the employees table.

import pandas as pd

pd.read_sql_query("SELECT * FROM employee LIMIT 2", conn)
id name age address salary
0 1 Rizki 25 Jl. Kebon Jeruk 2000000.0
1 2 Andi 27 Jl. Kebon Sirih 3000000.0

The command above will retrieve data from the employee table and limit only 2 data to be retrieved (taken from the top), so it will get 3 data taken from the top data. The result will be as follows:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 Jl. Kebon Sirih 3000000

Aggregate function

An aggregate function is a function used to produce a value based on the data in a table. This function will produce one value only.

Previously, we have retrieved data from the employee table using the SELECT command. We can use aggregate functions to get certain values from the data in the table. For example, getting the maximum or average value from the data in the salary column of the employee table. Or get the smallest value from all data in the age column.

There are many aggregate functions that can be used, but here we will discuss some functions that are most often used.

COUNT

The COUNT function is a function used to count the number of rows in a table. The result of this function will produce one value only. For example:

SELECT COUNT(*) FROM employees;

Count the number of records in the employees table.

import pandas as pd

pd.read_sql_query("SELECT COUNT(*) FROM employees", conn)
COUNT(*)
0 5

The command above will count the number of rows in the employee table in all columns and produce the value 5, because there are 5 data in the table. The result will be as follows:

COUNT(*)
5

This function will not total the value of a specific column that has NULL data. For example, there is the following table data:

id name age address salary
1 Rizki 25 Jl. Kebon Jeruk 2000000
2 Andi 27 NULL 3000000
3 Budi 30 NULL 4000000
c.execute('DELETE FROM employees')

data = [
    (1, 'Rizki', 25, 'Jl. Kebon Jeruk', 2000000),
    (2, 'Andi', 27, None, 3000000),
    (3, 'Budi', 30, None, 4000000)
]
c.executemany('INSERT INTO employees VALUES (?,?,?,?,?)', data)

conn.commit()

If we use the COUNT function on the address column, the result will be 1 because there is only one data that has a value in the address column. If we use the COUNT function on the salary column, the result will be 3 because all data has a value in the salary column.

  • using the COUNT function for the address column

    SELECT COUNT(address) FROM employees;

    The result is as follows:

    COUNT(address)
    1
  • using the COUNT function for the salary column

    SELECT COUNT(salary) FROM employees;

    The result is as follows:

    COUNT(salary)
    3

If we look at the results, the column name will become COUNT(address) and COUNT(salary). If we want to change the column name, we can use the AS command to change the column name. For example:

SELECT COUNT(address) AS total_address
FROM employees;

Try in python using SQLite

import pandas as pd

pd.read_sql_query("SELECT COUNT(address) AS total_address FROM employees", conn)
total_address
0 1

By using AS we can change the column name to total_address. The result will be as follows:

total_address
1

CONCAT

The CONCAT function is a function used to combine two or more strings into one. This function will produce one string only.

CONCAT(string1, string2, string3, ...)

For example, if there is a persons table with columns and data as follows:

id salutation name age
1 Mr Rizki Sanjaya 25
2 Mr Andi William 27
3 Ms Susi Andini 30

We can combine the salutation and name columns into one column using the CONCAT function. For example:

c.execute('''
CREATE TABLE persons (
    id INT,
    salutation TEXT,
    name TEXT,
    age INT);
''')
conn.commit()
data = [
    (1, 'Mr', 'Rizki Sanjaya', 25),
    (2, 'Mr', 'Andi William', 27),
    (3, 'Mr', 'Susi Andini', 30)
]
c.executemany('INSERT INTO persons VALUES (?,?,?,?)', data)

conn.commit()
import pandas as pd

pd.read_sql_query("SELECT * FROM persons", conn)
id salutation name age
0 1 Mr Rizki Sanjaya 25
1 2 Mr Andi William 27
2 3 Mr Susi Andini 30
SELECT CONCAT(salutation, ' ', name) AS fullname
FROM persons;

Try in python using SQLite (CONCAT is not supported by SQLite. In SQLite, you can use the || operator to concatenate strings.)

import pandas as pd

pd.read_sql_query("SELECT (salutation || ' ' || name) AS fullname FROM persons", conn)
fullname
0 Mr Rizki Sanjaya
1 Mr Andi William
2 Mr Susi Andini

The command above will combine the salutation and name columns using a space as a separator. The result will be as follows:

fullname
Mr Rizki Sanjaya
Mr Andi William
Ms Susi Andini

You can try using the aggregate functions explained in this online SQL compiler or the DBMS tool you are using.

In addition to COUNT, there are other aggregate functions such as MIN, MAX, AVG, SUM, and others that you can read in W3 School - SQL Aggregate Function.

# Close the connection
conn.close()

Next, we continue into accessing and manipulating data in multiple tables using JOIN and TRANSACTION

Back to top