The terms Data and Information are often used interchangeably, but they actually have distinct meanings.
Data refers to raw facts or statistics in the form of text, numbers, symbols, etc., that do not carry specific meaning or purpose. On the other hand, Information is knowledge derived from study, communication, research, or instructions. Essentially, information is obtained from analyzing and interpreting one or more pieces of data.
For example:
The number of likes on a social media post is a piece of data. When combined with other social media engagement statistics, like follows, comments, and shares, it becomes information that can tell us which social media platform is performing the best.
Introducing to database
A database is a systematic collection of data, managed in accordance with interrelated rules, making it easier to transform data into information.
When managing a database, some essential characteristics to consider are:
Speed of data retrieval
Storage space efficiency
Reduction or elimination of data redundancy
Accuracy in establishing relationships between data based on data type rules, data domains, and data uniqueness.
Database system
A database, or “basis data” in Indonesian, is comprised of a basis (warehouse) and data (real-world facts representing objects) like people, goods, and so forth, manifested in the form of letters, numbers, symbols, or combinations thereof.
A database is usually managed by a DBMS (Database Management System) that acts as an interface between the database and its users. Users here can be either humans or applications. DBMS can be used by users to view, change, and manage information in the database.
MySQL: A SQL-based relational DBMS with a client-server architecture.
Elasticsearch: A distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. It centrally stores your data for lightning fast search, fine-tuned relevancy, and powerful analytics that scale with ease.
SQLite: A relational database management system embedded into applications. SQLite stores all data in a single disk file and is very popular for mobile and desktop applications that require lightweight, local data storage.
And others
Data is typically modeled in rows and columns within a table. This structure makes it easier to perform addition, modification, deletion, and other operations. This model is common in relational databases (RDBMSs) that use SQL(Structured Query Language).
SQL is a programming language used to manipulate, search, access, and perform other operations within a database.
But how does this differ from a Spreadsheet (like Excel, or Google Sheets)? Both are easy to use but have clear differences:
Differences
Spreadsheet
Database
Usage
For accounting tasks
Used by large companies to store large amounts of data
Who can access
Designed for single-user, but can be used by multiple users in turns
Can be accessed by multiple users or applications at once
Amount of data that can be stored
Data storage is limited
Can store large amounts of data
To illustrate the usage of a database, we will use SQL with SQLite as the relational database management system (RDBMS).
Up until now, when creating applications in Python, we’ve been storing data in memory in the form of a list of objects:
This approach has some challenges, such as the data not being permanently stored when the application is terminated, and the data is hard to process because it requires several lines of Python code to manipulate the data.
For example, if we want to get the average Grade from the above student list:
sum_of_grades =sum(student.grade for student in students)average_grade = sum_of_grades /len(students)print("Grade Average is ", average_grade)
Grade Average is 74.66666666666667
This requires about 3 lines of code to get the average Grade. Now let’s compare this with using SQL.
Assuming we already have a students table with id, username, and grade columns using SQLite. We can input data with the following SQL command:
import sqlite3# Connect to SQLite database in memoryconn = sqlite3.connect(':memory:')# Create a cursor objectc = conn.cursor()# Create tablec.execute(""" CREATE TABLE students ( id INTEGER PRIMARY KEY, username TEXT, grade INTEGER )""")
# Insert valuesc.execute(""" INSERT INTO students (username, grade) VALUES ('aditira',83), ('gina',77), ('ria',64)""")
<sqlite3.Cursor at 0x27c0db505c0>
And we can retrieve the data with this simple SQL command:
SELECT*FROM students;
import pandas as pd# Commit the transactionconn.commit()# Retrieve data and convert into DataFramepd.read_sql_query("SELECT * FROM students", conn)
id
username
grade
0
1
aditira
83
1
2
gina
77
2
3
ria
64
Quite easy, isn’t it! 😎 And to get the average value of student grades is even easier, with this simple SQL command:
SELECTAVG(grade) FROM students;
import pandas as pd# Commit the transactionconn.commit()# Retrieve data and convert into DataFramepd.read_sql_query("SELECT AVG(grade) FROM students", conn)
AVG(grade)
0
74.666667
# Close the connectionconn.close()
So you can see that using a database, our application data is much easier to process and is stored more securely for long-term use.
DDL (Data Definition Language)
What is DDL?
Before we start using a database, we need to set things up by creating tables, schemas, views, and indexes. This process is referred to as Data Definition Language (DDL).
There are several SQL commands used to carry out these tasks, such as CREATE, ALTER, DROP, TRUNCATE.
Let’s consider the following table as an example. This table contains simple employee data:
ID
name
age
address
salary
1
Rizki
25
Jl. Kebon Jeruk
2000000
2
Andi
27
Jl. Kebon Sirih
3000000
3
Budi
30
Jl. Kebon Melati
4000000
4
Caca
32
Jl. Kebon Anggrek
5000000
5
Deni
35
Jl. Kebon Mawar
6000000
Next, we will apply DDL to this table.
SQL Data Types
To create a table, we need to determine the data types to be used. These data types will limit the kind of data that can be inserted into each column. The data types in SQL are not much different from those in Python. Here are some commonly used data types:
Data type
Description
VARCHAR(size)
A character data type with a flexible length, like string in Python
TEXT
A character data type with a maximum length of 65535 characters
INT / INTEGER
An integer data type
FLOAT
A decimal number data type
BOOLEAN
A boolean data type (true / false)
DATE
A date data type
DATETIME
A date and time data type
For the VARCHAR data type, we need to specify the size (a number from 1 - 255). This size will limit the length of the characters that can be entered into that column. For instance, VARCHAR(50) will limit the length of the characters that can be entered into that column to 50 characters. If we enter characters longer than 50, an error will occur.
The VARCHAR data type can only hold a maximum of 255 characters. If we want to store more than 255 characters, we need to use the TEXT data type, which has a maximum length of 65535 characters.
The CREATE command is used to create a new object such as a database, table, schema, view, or index.
To create a new database, we can use the CREATE DATABASE command as follows:
CREATEDATABASE<db-name>;
Where <db-name> is the name of the database to be created. Don’t forget to end each SQL command with a ;.
Example:
CREATE DATABASE db_employee;
With python and SQLite let’s import the necessary module and establish a connection to the database.
import sqlite3# Connect to the SQLite database# If the database does not exist, it will be createdconn = sqlite3.connect('db/sqlite.db')# Create a cursor objectcur = conn.cursor()
The command above creates a new database named db_employee.
Next, we can also use CREATE to create a new table. We can use CREATE TABLE as follows:
Where <table-name> is the name of the table to be created, <column-name> is the name of the column, <data-type> is the data type to be used, and <constraints> are the rules to be applied to that column. To create a column with more than one rule, we can separate them with a comma ,.
Example command to create an employee table:
CREATE TABLE employee (id INT, name VARCHAR(255) NOT NULL, age INT NOT NULL, address VARCHAR(255), salary INT);
Let’s create an employee table.
# Execute a SQL command: this creates a new tablecur.execute(''' CREATE TABLE employee ( id INTEGER, name TEXT NOT NULL, age INTEGER NOT NULL, address TEXT, salary INTEGER )''')# Commit the transactionconn.commit()
See the table:
import pandas as pdpd.read_sql_query("SELECT * FROM employee", conn)
id
name
age
address
salary
From the example above, we are creating a new table named employee that has 5 columns, namely id, name, age, address, and salary. For the id and age columns, we use the INT data type, which means integer, while for the name and address columns, we use the VARCHAR data type, which means character. The salary column also uses the INT data type, which means integer.
For the name and age columns, we apply the NOT NULL constraint, which means that these columns cannot be empty. If we enter empty data or empty characters "" into these columns, an error will occur.
You can try creating a new table named employee as above in an online SQL compiler or in your own pgAdmin / DBMS tool.
ALTER
The ALTER command is used to modify the structure of the database. For example, if we want to add a new column to an existing table, we can use the ALTER TABLE command as follows:
Where <table-name> is the name of the table to be modified, <column-name> is the name of the column to be added, <data-type> is the data type to be used, and <constraints> are the rules to be applied to that column.
If we want to add a new column named email with data type VARCHAR(255) to the employee table, we can use the following command: