%pip install elasticsearch
Elasticsearch In Practice
Elasticsearch (Source: technocratsid.com)
Index
As mentioned earlier, Elasticsearch doesn’t have a concept like a database; thus, we can create an index (or table) directly. But how do we use Elasticsearch for more than one application? A common practice is to prefix the index name with the application’s name, for example: applicationname_indexname. This way, index names do not conflict between applications.
Creating an Index
To create an Index, we can make an HTTP call like so:
PUT /{index_name}
The rules for index names are as follows:
- They must be lowercase.
- They cannot contain \, /, *, ?, ", <, >, |, spaces, commas, or #, and they cannot start with -, _, or +.
- They cannot be longer than 255 bytes.
By creating unique index names, you ensure that data from different applications is stored separately and can be queried independently.
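The naming rules and the application prefix convention above can be checked on the client side before any request is sent. The following is a minimal sketch, not the server's own validation: `is_valid_index_name` and `prefixed_index` are hypothetical helper names, and the forbidden character set is an assumption based on the restrictions listed in the Elasticsearch documentation.

```python
# Characters assumed to be rejected in index names (colon included for 7.0+).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` passes the index naming rules described above."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():            # must be lowercase
        return False
    if name[0] in "-_+":                # cannot start with -, _ or +
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    # The limit is measured in bytes, not characters.
    return len(name.encode("utf-8")) <= 255


def prefixed_index(application: str, index: str) -> str:
    """Build an applicationname_indexname style name and validate it."""
    name = f"{application}_{index}".lower()
    if not is_valid_index_name(name):
        raise ValueError(f"invalid index name: {name!r}")
    return name


print(prefixed_index("shop", "customers"))
```

Validating locally only saves a round trip; Elasticsearch still has the final say when the index is created.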
Before we start practicing, install the elasticsearch package with %pip install elasticsearch, then import the packages we need:
from elasticsearch import Elasticsearch
import time
import json
Create a connection to Elasticsearch. Make sure Elasticsearch is running on your computer’s localhost or in Google Colab.
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])
Creating an Index
# Create index customers
# PUT http://localhost:9200/customers
response = es.options(ignore_status=[400]).indices.create(index='customers')
print(json.dumps(response.body, indent=4))
# Create index products
# PUT http://localhost:9200/products
response = es.options(ignore_status=[400]).indices.create(index='products')
print(json.dumps(response.body, indent=4))
# Create index orders
# PUT http://localhost:9200/orders
response = es.options(ignore_status=[400]).indices.create(index='orders')
print(json.dumps(response.body, indent=4))
# Get All Indexes
# GET http://localhost:9200/_cat/indices?v
response = es.cat.indices(v=True)
print(response)
Deleting an Index
To delete an index that you’ve created, you can simply use the DELETE HTTP method. Deleting an index will automatically remove all data associated with that index. Here’s how you can do it:
DELETE /{index_name}
Just replace {index_name} with the name of the index you wish to delete. But be careful, this operation is irreversible. Once an index is deleted, all the data within it is permanently lost unless you have a backup or recovery mechanism in place.
Delete Index
# Delete index customers
# DELETE http://localhost:9200/customers
response = es.options(ignore_status=[400, 404]).indices.delete(index='customers')
print(json.dumps(response.body, indent=4))
# Delete index products
# DELETE http://localhost:9200/products
response = es.options(ignore_status=[400, 404]).indices.delete(index='products')
print(json.dumps(response.body, indent=4))
# Delete index orders
# DELETE http://localhost:9200/orders
response = es.options(ignore_status=[400, 404]).indices.delete(index='orders')
print(json.dumps(response.body, indent=4))
# Get All Indexes
# GET http://localhost:9200/_cat/indices?v
response = es.cat.indices(v=True)
print(response)
Dynamic Mapping in Elasticsearch
In Elasticsearch, defining the schema of an index is known as mapping. By default, a feature called Dynamic Mapping is enabled, where Elasticsearch auto-detects the data type of each JSON attribute and creates a mapping accordingly.
While convenient, it’s generally recommended to manually create mappings for better control over data indexing and querying. This ensures proper interpretation and storage of your data.
Dynamic Field Mapping
This feature auto-detects the data type of a field in a JSON document and assigns a corresponding Elasticsearch data type:
JSON Data Type | Elasticsearch Data Type |
---|---|
null | No field added |
true / false | boolean |
floating-point number | float |
integer | long |
array | Depends on the first non-null value in the array |
string | date, float, long, or text, depending on date and numeric detection |
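The table can be read as a small decision procedure. The sketch below is an approximation written for illustration, not Elasticsearch's implementation: `guess_es_type` is a hypothetical helper, it works on values already decoded with Python's `json` module, it leaves out date and number detection for strings, and the `object` case for nested JSON objects is an addition that the table does not list.

```python
def guess_es_type(value):
    """Approximate the type dynamic mapping would pick for a decoded JSON value."""
    if value is None:
        return None                     # null: no field is added
    if isinstance(value, bool):         # check bool first, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return guess_es_type(item)
        return None
    if isinstance(value, dict):
        return "object"                 # assumption: nested JSON objects
    return "text"                       # strings, with date/number detection left out


document = {"name": "Product 1", "price": 10000, "tags": [None, "new"], "discount": None}
inferred = {field: guess_es_type(value) for field, value in document.items()}
print(inferred)
```

Fields whose inferred type is `None` would simply not appear in the mapping until a non-null value is indexed.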
Date Detection
By default, Elasticsearch checks whether a string value looks like a date and, if it does, maps the field as a date. The formats it tries are controlled by the dynamic_date_formats attribute, which defaults to strict_date_optional_time and yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z. Date detection is active by default but can be deactivated by setting the date_detection attribute in the mapping to false, and the accepted formats can be changed by adjusting the dynamic_date_formats attribute in the mapping.
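As a rough illustration of where these two attributes live, the sketch below builds a request body that could be sent to PUT /<index_name>/_mapping. The helper name `date_detection_mapping` and the example formats are assumptions made for this sketch; the cells that follow take a different route and map the date fields explicitly.

```python
import json


def date_detection_mapping(enabled: bool = True, formats=None) -> dict:
    """Build a mapping body that switches date detection on or off."""
    body = {"date_detection": enabled}
    if enabled and formats:
        # Custom formats only matter while detection is enabled.
        body["dynamic_date_formats"] = list(formats)
    return body


# Example: accept two custom formats (illustrative values).
body = date_detection_mapping(formats=["yyyy-MM-dd HH:mm:ss", "yyyy/MM/dd"])
print(json.dumps(body, indent=4))
```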
Update Dynamic Mapping for Date
# If you ran the delete cells above, re-create the indices before updating their mappings.
# Update customers mapping
# PUT http://localhost:9200/customers/_mapping
response = es.indices.put_mapping(index='customers', body={
    'properties': {
        'date': {
            'type': 'date',
            'format': 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd'
        },
        'register_at': {
            'type': 'date',
            'format': 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd'
        }
    }
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/customers/_mapping
response = es.indices.get_mapping(index='customers')
print(json.dumps(response.body, indent=4))
# Update products mapping
# PUT http://localhost:9200/products/_mapping
response = es.indices.put_mapping(index='products', body={
    'properties': {
        'date': {
            'type': 'date',
            'format': 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd'
        }
    }
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/products/_mapping
response = es.indices.get_mapping(index='products')
print(json.dumps(response.body, indent=4))
# Update orders mapping
# PUT http://localhost:9200/orders/_mapping
response = es.indices.put_mapping(index='orders', body={
    'properties': {
        'date': {
            'type': 'date',
            'format': 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd'
        }
    }
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/orders/_mapping
response = es.indices.get_mapping(index='orders')
print(json.dumps(response.body, indent=4))
Number Detection
Although JSON has a number data type, sometimes users send numbers in a string format, such as “100” or “12.12”. In these cases, Elasticsearch may need to detect and convert these string-formatted numbers to actual number data types (long or float).
By default, automatic number detection is not active in Elasticsearch. To activate it, set the numeric_detection attribute in the mapping to true.
With numeric detection enabled, if a mapping for a certain attribute is not yet available, Elasticsearch will try to convert a string value to a number type (either long or float). If the conversion is successful, Elasticsearch will use the corresponding number data type for the field.
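The effect of numeric detection on string values can be sketched in a few lines. This is a simplified model for illustration only: `detect_string_type` is a hypothetical name, and the two regular expressions are an assumption that covers plain integers and decimals, not every number format Elasticsearch may accept.

```python
import re

_LONG = re.compile(r"-?\d+")            # e.g. "100"
_FLOAT = re.compile(r"-?\d+\.\d+")      # e.g. "12.12"


def detect_string_type(text: str, numeric_detection: bool = False) -> str:
    """Return the type a new string field would get under this simplified model."""
    if numeric_detection:
        if _LONG.fullmatch(text):
            return "long"
        if _FLOAT.fullmatch(text):
            return "float"
    return "text"                       # detection off, or the string is not numeric


for sample in ("100", "12.12", "Product 1"):
    print(sample, "->", detect_string_type(sample, numeric_detection=True))
```

With `numeric_detection` left at its default of false, all three samples stay `text` in this model.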
# Update customers mapping
# PUT http://localhost:9200/customers/_mapping
response = es.indices.put_mapping(index='customers', body={
    'numeric_detection': True
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/customers/_mapping
response = es.indices.get_mapping(index='customers')
print(json.dumps(response.body, indent=4))
# Update products mapping
# PUT http://localhost:9200/products/_mapping
response = es.indices.put_mapping(index='products', body={
    'numeric_detection': True
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/products/_mapping
response = es.indices.get_mapping(index='products')
print(json.dumps(response.body, indent=4))
# Update orders mapping
# PUT http://localhost:9200/orders/_mapping
response = es.indices.put_mapping(index='orders', body={
    'numeric_detection': True
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/orders/_mapping
response = es.indices.get_mapping(index='orders')
print(json.dumps(response.body, indent=4))
Create API
The Create API is used to add new data to Elasticsearch.
The Create API is insert-only: it creates a new document only if no document with the provided _id already exists. Attempting to create a document with an _id that already exists results in a conflict error (HTTP 409).
To use the Create API, you use either the POST or PUT HTTP method with the following endpoint:
POST/PUT /<index_name>/_create/<id>
Here, <index_name> is the name of the index where you want to create the document, and <id> is the unique identifier you want to assign to the new document. If the document is successfully created, Elasticsearch will return a confirmation response.
Create Customer
# Insert customers aditira
# POST http://localhost:9200/customers/_create/aditira
# es.create maps to the _create endpoint and fails with a 409 conflict if the _id already exists
response = es.create(index='customers', id='aditira', body={
    'name': 'Aditira Jamhuri',
    'register_at': '2023-11-30 00:00:00',
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/customers/_mapping
response = es.indices.get_mapping(index='customers')
print(json.dumps(response.body, indent=4))
Create Product
# Insert products 1
# POST http://localhost:9200/products/_create/1
response = es.create(index='products', id='1', body={
    'name': 'Product 1',
    'price': 10000,
})
print(json.dumps(response.body, indent=4))
# Insert products 2
# POST http://localhost:9200/products/_create/2
response = es.create(index='products', id='2', body={
    'name': 'Product 2',
    'price': 20000,
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/products/_mapping
response = es.indices.get_mapping(index='products')
print(json.dumps(response.body, indent=4))
Create Order
# Insert orders 1
# POST http://localhost:9200/orders/_create/1
response = es.create(index='orders', id='1', body={
    "order_date": "2023-12-01 00:00:00",
    "customer_id": "aditira",
    "total": 40000,
    "items": [
        {
            "product_id": "1",
            "price": 10000,
            "quantity": 2
        },
        {
            "product_id": "2",
            "price": 20000,
            "quantity": 1
        }
    ]
})
print(json.dumps(response.body, indent=4))
# Get Mapping
# GET http://localhost:9200/orders/_mapping
response = es.indices.get_mapping(index='orders')
print(json.dumps(response.body, indent=4))
Get API
Once you’ve stored data in Elasticsearch using the Create API, you can retrieve this data using the Get API.
The Get API returns the requested data along with its associated metadata, such as the _id, index name, document version, and so on.
If the data you’re trying to retrieve is not available (i.e., there is no document with the requested _id in the specified index), the HTTP response code will be 404 Not Found.
To use the Get API, you make an HTTP GET request to the following endpoint:
GET /<index_name>/_doc/<id>
In this endpoint, <index_name> is the name of the index from which you want to retrieve data, and <id> is the unique identifier of the document you want to retrieve. If the document is found, Elasticsearch will return the document and its metadata in the response.
Get Document
# Get Customers aditira
# GET http://localhost:9200/customers/_doc/aditira
response = es.get(index='customers', id='aditira')
print(json.dumps(response.body, indent=4))
Get Source API
If you’re interested in retrieving the document data but do not wish to receive the metadata associated with the document, you can use the Get Source API.
To use the Get Source API, you make an HTTP GET request to the following endpoint:
GET /<index_name>/_source/<id>
In this endpoint, <index_name> is the name of the index from which you want to retrieve data, and <id> is the unique identifier of the document you want to retrieve.
This call returns only the actual data that you inserted, without any of the metadata, including the _id. This is because the _id is already included in the URL of the HTTP call. As such, if the document is found, Elasticsearch returns the document data in the response, without any metadata.
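To make the difference between the two responses concrete, the sketch below uses a hand-written dictionary shaped like a Get API response. The metadata values are illustrative, not captured from a real cluster, and `source_only` is a hypothetical helper that mimics what the Get Source API would return for the same document.

```python
# Shape of a Get API response (metadata values are illustrative).
get_response = {
    "_index": "customers",
    "_id": "aditira",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": True,
    "_source": {
        "name": "Aditira Jamhuri",
        "register_at": "2023-11-30 00:00:00",
    },
}


def source_only(response: dict) -> dict:
    """Return just the stored document, as the Get Source API would."""
    return response["_source"]


print(source_only(get_response))
```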
Get Source Document
# Get Source Customers aditira
# GET http://localhost:9200/customers/_source/aditira
response = es.get_source(index='customers', id='aditira')
print(json.dumps(response.body, indent=4))
Check Exists
There may be cases where you only want to check if a document exists in an index, without needing to retrieve the document’s data. In such cases, you can use the Get API, but with the HTTP method HEAD instead of GET.
To check if a document exists, you make an HTTP HEAD request to the following endpoint:
HEAD /<index_name>/_doc/<id>
In this endpoint, <index_name> is the name of the index where you’re checking for the document, and <id> is the unique identifier of the document you’re checking for.
Elasticsearch will return a 200 OK response without any body if the document exists. If the document does not exist, it will return a 404 Not Found response. This is a quick and efficient way to check for the existence of a document without retrieving or transferring any data.
If data exists
# Check Customers aditira
# HEAD http://localhost:9200/customers/_doc/aditira
# The Python client turns the 200 / 404 status into True / False
response = es.exists(index='customers', id='aditira')
print(json.dumps(response.body, indent=4))
If the data does not exist
# Check Customers wrong
# HEAD http://localhost:9200/customers/_doc/wrong
response = es.exists(index='customers', id='wrong')
print(json.dumps(response.body, indent=4))
Multi Get API
Elasticsearch provides a Multi Get API that allows you to retrieve multiple documents at once. This is useful when you need to fetch documents from different indices in a single API call.
You can use the Multi Get API with the following RESTful API endpoints:
POST /_mget
POST /<index_name>/_mget
In these endpoints, <index_name> is the name of the index from which you want to retrieve documents. If you omit <index_name>, each requested document must name its index in the request body.
The _mget endpoint accepts a request body that specifies the documents to retrieve. In its simplest form, the request body is a JSON object that contains an ids array, like this:
{
  "ids": ["1", "2", "3", "4"]
}
If you’re using <index_name>/_mget, all IDs in the ids array are retrieved from the specified index. If you’re using /_mget, use a docs array instead and specify the index for each document, like this:
{
  "docs": [
    {
      "_index": "index1",
      "_id": "1"
    },
    {
      "_index": "index2",
      "_id": "2"
    }
  ]
}
In this case, Elasticsearch will retrieve each document from the specified index.
The Multi Get API is a powerful tool that can significantly reduce the number of API calls you need to make when working with multiple documents across multiple indices.
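The two request body shapes can be produced by small helpers. This is only a sketch: `mget_by_ids` and `mget_across_indices` are hypothetical names, and the index names and IDs used in the example are taken from the indices created earlier.

```python
import json


def mget_by_ids(ids) -> dict:
    """Body for POST /<index_name>/_mget: every ID comes from the same index."""
    return {"ids": [str(doc_id) for doc_id in ids]}


def mget_across_indices(pairs) -> dict:
    """Body for POST /_mget: each (index, id) pair names its own index."""
    return {"docs": [{"_index": index, "_id": str(doc_id)} for index, doc_id in pairs]}


print(json.dumps(mget_by_ids([1, 2]), indent=4))
print(json.dumps(mget_across_indices([("customers", "aditira"), ("products", 1)]), indent=4))
```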
Multiget Document
# Multiget products
# POST http://localhost:9200/products/_mget
response = es.mget(index='products', body={
    'ids': ['1', '2']
})
print(json.dumps(response.body, indent=4))
Search API
While the Get API is used for retrieving a single document using its _id, the Search API in Elasticsearch is used when you want to search for documents without knowing their _id. The Search API is quite complex, offering a wide range of querying and filtering options that allow you to perform full-text search, term-based search, and much more.
To use the Search API, you can use the following RESTful API endpoints:
POST /_search
POST /<index_name>/_search
In these endpoints, <index_name> is the name of the index that you want to search in. If you don’t specify an index name, Elasticsearch will search all indices.
The _search endpoint accepts a request body that defines the search query. This search query is written in Elasticsearch’s Query DSL (domain-specific language), which is a flexible and powerful language for defining queries.
Here’s a simple example of a search query:
{
  "query": {
    "match": {
      "field_name": "search term"
    }
  }
}
In this query, Elasticsearch will return documents where field_name matches the “search term”.
It’s important to note that this is just the tip of the iceberg when it comes to the Search API. It supports a wide range of querying and filtering options, allowing you to perform complex searches on your data. More advanced features of the Search API will be covered in later discussions.
Search Document
# Search products
# POST http://localhost:9200/products/_search
response = es.search(index='products', body={
    "query": {
        "match": {
            "price": 10000
        }
    }
})
print(json.dumps(response.body, indent=4))
Pagination
When you are working with large amounts of data, it can be useful to break up the results of a search query into manageable chunks, or “pages”. Elasticsearch’s Search API supports pagination through the use of query parameters.
There are two important parameters for pagination:
- from: This parameter determines the starting document from which results are returned. The count starts from 0, so if you set from to 10, Elasticsearch skips the first 10 results.
- size: This parameter determines the number of search hits to return. By default, Elasticsearch returns 10 results per page. If you want more or fewer results, change the size parameter.
Here’s an example of how to use these parameters in a search query:
{
  "from": 0, "size": 20,
  "query": {
    "match": {
      "field_name": "search term"
    }
  }
}
In this example, Elasticsearch will return the first 20 documents that match the search term. If you want to get the next 20 documents, you can change from to 20:
{
  "from": 20, "size": 20,
  "query": {
    "match": {
      "field_name": "search term"
    }
  }
}
This way, you can navigate through the search results page by page. Note that from + size cannot exceed 10000 by default (the index.max_result_window setting). To page deeper than that, use the search_after parameter or the scroll API, or increase this limit.
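Turning a page number into from and size is simple arithmetic, and the default window limit can be checked at the same time. The helper below is a minimal sketch: `page_params` is a hypothetical name, pages are assumed to be numbered from 1, and `MAX_RESULT_WINDOW` hard-codes the default of 10000 instead of reading the index setting.

```python
MAX_RESULT_WINDOW = 10_000  # assumed default limit on from + size


def page_params(page: int, size: int = 10) -> dict:
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size           # number of hits to skip
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return {"from": start, "size": size}


print(page_params(1, 20))   # first page of 20 hits
print(page_params(2, 20))   # second page of 20 hits
```

The returned dictionary can be merged into a search body next to the query.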
Search Document with Pagination
# Search products page 1
# POST http://localhost:9200/products/_search?size=1&from=0
response = es.search(index='products', body={
    'size': 1,
    'from': 0
})
print(json.dumps(response.body, indent=4))
# Search products page 2
# POST http://localhost:9200/products/_search?size=1&from=1
response = es.search(index='products', body={
    'size': 1,
    'from': 1
})
print(json.dumps(response.body, indent=4))
Sorting
The Search API in Elasticsearch also supports sorting of the search results. This is done using the sort query parameter.
The sort parameter’s value is defined as:
<field>:<direction>
In this, <field> is the name of the field you want to sort by and <direction> can be either asc (for ascending order) or desc (for descending order).
The same sort can also be written in the request body. For example, if you have a timestamp field in your documents and you want to sort the results in descending order of timestamp, you can do this:
{
  "query": {
    "match": {
      "field_name": "search term"
    }
  },
  "sort": [
    { "timestamp": { "order": "desc" } }
  ]
}
If you need to sort by more than one field, you can specify multiple fields separated by commas. The sorting will be applied in the order you specify. For example:
{
  "query": {
    "match": {
      "field_name": "search term"
    }
  },
  "sort": [
    { "field1": { "order": "asc" } },
    { "field2": { "order": "desc" } },
    { "field3": { "order": "asc" } }
  ]
}
In this example, Elasticsearch first sorts the documents in ascending order by field1. For documents where field1 is the same, it then sorts in descending order by field2, and so on. This way, you can fine-tune the order of your search results to meet your needs.
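The <field>:<direction> notation and the request body form carry the same information, so one can be derived from the other. The sketch below shows one way to do that; `sort_clause` is a hypothetical helper, and requiring an explicit direction is a choice made here for simplicity.

```python
def sort_clause(*specs: str) -> list:
    """Convert 'field:direction' strings into a request-body sort list."""
    clause = []
    for spec in specs:
        # Split on the last colon so dotted names such as name.keyword stay intact.
        field, _, direction = spec.rpartition(":")
        if not field or direction not in ("asc", "desc"):
            raise ValueError(f"expected <field>:<asc|desc>, got {spec!r}")
        clause.append({field: {"order": direction}})
    return clause


body = {"sort": sort_clause("field1:asc", "field2:desc", "field3:asc")}
print(body)
```

The order of the arguments is preserved, which matters because earlier entries take priority when sorting.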
Search Document with Sorting
# Search products sort by price
# POST http://localhost:9200/products/_search?sort=price:asc
response = es.search(index='products', body={
    'sort': {
        'price': 'asc'
    }
})
print(json.dumps(response.body, indent=4))
# Search products sort by name (text fields are sorted through their keyword sub-field)
# POST http://localhost:9200/products/_search?sort=name.keyword:asc
response = es.search(index='products', body={
    'sort': [
        {'name.keyword': 'asc'}
    ]
})
print(json.dumps(response.body, indent=4))
Index API
In the past, the Index API was commonly used to create documents in Elasticsearch, but in newer versions, the Create API is often used instead. However, the Index API still has its uses.
The Index API has a “create or replace” nature. This means that if a document with the specified ID does not exist, it will be created. If a document with the specified ID already exists, the existing document will be replaced (deleted and created anew) with the new document.
This is a key difference between the Index API and the Create API. With the Create API, if a document with the specified ID already exists, an error conflict will occur. With the Index API, no such error will occur, and the existing document will simply be replaced.
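The contrast between the two APIs can be illustrated without a running cluster. The following is a toy in-memory model, not Elasticsearch code: `InMemoryIndex` and its `ConflictError` are hypothetical stand-ins for an index and for the 409 conflict response, and the version counter is a simplification of document versioning.

```python
class ConflictError(Exception):
    """Stand-in for the conflict error returned when an _id already exists."""


class InMemoryIndex:
    """Toy model of create vs. index semantics."""

    def __init__(self):
        self.docs = {}                  # _id -> (version, source)

    def create(self, doc_id, source):
        # Create API: refuse to touch an existing document.
        if doc_id in self.docs:
            raise ConflictError(f"document {doc_id!r} already exists")
        self.docs[doc_id] = (1, dict(source))
        return "created"

    def index(self, doc_id, source):
        # Index API: create the document or replace it entirely.
        exists = doc_id in self.docs
        version = self.docs[doc_id][0] + 1 if exists else 1
        self.docs[doc_id] = (version, dict(source))
        return "updated" if exists else "created"


products = InMemoryIndex()
print(products.index("3", {"name": "Product 3", "price": 30000}))
print(products.index("3", {"price": 35000}))   # replaces the whole document
print(products.docs["3"])
```

After the second call the name attribute is gone, because the replacement document did not include it.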
To use the Index API, you can use the following RESTful API command:
POST /<index_name>/_doc/<id>
PUT /<index_name>/_doc/<id>
Here, <index_name> is the name of the index where you want to create or replace the document, and <id> is the ID of the document.
In the body of the request, you would include the full content of the new document you want to create or replace. For example:
PUT /my_index/_doc/1
{
  "field1": "value1",
  "field2": "value2"
}
In this example, a document with ID 1 and the specified fields and values will be created in my_index. If a document with ID 1 already exists, it will be replaced with the new document.
Index Document
# Save product 3
# POST http://localhost:9200/products/_doc/3
response = es.index(index='products', id='3', body={
    'name': 'Product 3',
    'price': 30000,
})
print(json.dumps(response.body, indent=4))
# Save product 4
# POST http://localhost:9200/products/_doc/4
response = es.index(index='products', id='4', body={
    'name': 'Product 4',
    'price': 40000,
})
print(json.dumps(response.body, indent=4))
# Save product 5
# POST http://localhost:9200/products/_doc/5
response = es.index(index='products', id='5', body={
    'name': 'Product 5',
    'price': 50000,
})
print(json.dumps(response.body, indent=4))
Choose Create API or Index API?
Your choice between Create API and Index API depends on your needs:
Index API: This is the most common choice among programmers because it’s safe from conflicts if a document with the same ID already exists in Elasticsearch. The Index API will replace the existing document, rather than producing an error.
Create API: Use this if you don’t want to replace an existing document. If a document with the same ID already exists, the Create API will produce an error. To prevent this, it’s recommended to use the Get API first to check if the document already exists.
Update API
The Index API behaves much like a strict librarian. Upon an update, it replaces the old document entirely, demanding all attributes anew. If you only submit the updated attribute, it overwrites the rest, potentially causing data loss.
Need to tweak a few attributes without a full rewrite? Use the Update API, your friendly editor:
POST /<index_name>/_update/<id>
But beware, if the document doesn’t exist, Elasticsearch returns a 404 Not Found error. Always check your document’s existence before updating.
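What a partial update does to a stored document can be modeled as a merge. The sketch below is an illustration under assumptions, not the server's code: `apply_partial_update` is a hypothetical helper, and it assumes nested objects are merged key by key while any other value, including an array, is replaced outright.

```python
def apply_partial_update(existing: dict, partial: dict) -> dict:
    """Merge a partial document into an existing one without mutating either."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged recursively.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays simply replace the old value.
            merged[key] = value
    return merged


stored = {"name": "Product 5", "price": 50000}
print(apply_partial_update(stored, {"price": 50000000}))
```

The name attribute survives the update, which is the behavior the Index API does not give you.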
Update Document
# Update product 5
# POST http://localhost:9200/products/_update/5
response = es.update(index='products', id='5', body={
    'doc': {
        'price': 50000000
    }
})
print(json.dumps(response.body, indent=4))
# Get product 5
response = es.get(index='products', id='5')
print(json.dumps(response.body, indent=4))
Delete API
To remove a document, the Delete API is your digital shredder. Utilize it through this RESTful API command:
DELETE /<index_name>/_doc/<id>
But remember, if you attempt to delete a document that doesn’t exist, Elasticsearch will return a 404 Not Found error. Always make sure the document is in your collection before trying to remove it.
Insert Spammer
# Insert customer spammer
# POST http://localhost:9200/customers/_create/spammer
response = es.create(index='customers', id='spammer', body={
    'name': 'Spammer',
    'register_at': '2023-12-06 00:00:00',
})
print(json.dumps(response.body, indent=4))
# Get customer spammer
response = es.get_source(index='customers', id='spammer')
print(json.dumps(response.body, indent=4))
Delete Spammer
# Delete customer spammer
# DELETE http://localhost:9200/customers/_doc/spammer
response = es.options(ignore_status=[404]).delete(index='customers', id='spammer')
print(json.dumps(response.body, indent=4))
# Get customer spammer
response = es.options(ignore_status=[404]).get_source(index='customers', id='spammer')
print(json.dumps(response.body, indent=4))
Bulk API
When dealing with many operations in Elasticsearch, the Bulk API shines by speeding up the process instead of manually handling each operation. The Bulk API bundles various operations together, including create, index, update, or delete. To use the Bulk API, apply the following RESTful API commands:
POST /_bulk
POST /<index_name>/_bulk
This allows more efficient handling of large volumes of operations.
Request Body Format for the Bulk API
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
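The newline-delimited format above can be generated from a list of operations. The helper below is a minimal sketch: `bulk_body` is a hypothetical name, and each operation is assumed to be an (action, metadata, source) tuple in which delete carries no source line.

```python
import json

ACTIONS = ("create", "index", "update", "delete")


def bulk_body(operations) -> str:
    """Serialize (action, metadata, source) tuples into a bulk request body."""
    lines = []
    for action, meta, source in operations:
        if action not in ACTIONS:
            raise ValueError(f"unknown bulk action: {action!r}")
        lines.append(json.dumps({action: meta}))       # action_and_meta_data
        if action != "delete":
            if source is None:
                raise ValueError(f"{action} needs a source line")
            lines.append(json.dumps(source))           # optional_source
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_body([
    ("index", {"_index": "customers", "_id": "yuni"}, {"name": "Yuni"}),
    ("update", {"_index": "products", "_id": "1"}, {"doc": {"price": 250000}}),
    ("delete", {"_index": "customers", "_id": "spammer"}, None),
])
print(payload, end="")
```

The Python client accepts the operations as a list of dictionaries and handles this serialization itself, so a helper like this is mainly useful when calling the REST endpoint directly.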
Bulk Document
# Bulk documents
# POST http://localhost:9200/_bulk
response = es.bulk(body=[
    {'create': {'_index': 'customers', '_id': 'dina'}},
    {'name': 'Dina', 'register_at': '2023-11-30 00:00:00'},
    {'index': {'_index': 'customers', '_id': 'yuni'}},
    {'name': 'Yuni', 'register_at': '2023-11-30 00:00:00'},
    {'update': {'_index': 'products', '_id': '1'}},
    {'doc': {'price': 250000}},
    {'create': {'_index': 'customers', '_id': 'spammer'}},
    {'name': 'Spammer', 'register_at': '2023-12-06 00:00:00'},
    {'delete': {'_index': 'customers', '_id': 'spammer'}},
])
print(json.dumps(response.body, indent=4))
# Get document dina
# GET http://localhost:9200/customers/_doc/dina
response = es.get(index='customers', id='dina')
print(json.dumps(response.body, indent=4))
# Get document yuni
# GET http://localhost:9200/customers/_doc/yuni
response = es.get(index='customers', id='yuni')
print(json.dumps(response.body, indent=4))
# Get document spammer (deleted by the bulk request, so a 404 is expected)
# GET http://localhost:9200/customers/_doc/spammer
response = es.options(ignore_status=[404]).get(index='customers', id='spammer')
print(json.dumps(response.body, indent=4))
# Get document product 1
# GET http://localhost:9200/products/_doc/1
response = es.get(index='products', id='1')
print(json.dumps(response.body, indent=4))