This tutorial will show you how to use the _cat API to view information about shards in an Elasticsearch cluster: which node hosts each replica, how much disk space each shard uses, and more.
To view all the shards in an Elasticsearch cluster, you can send a GET request to the _cat/shards API endpoint, as follows:
If you are a cURL user, use the following command:
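A minimal sketch of both forms, assuming Elasticsearch is listening on the default localhost:9200 (the v parameter simply adds column headers to the output):

GET _cat/shards?v

curl -XGET "http://localhost:9200/_cat/shards?v"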
Executing the above command will give you information about all the shards in the cluster, as shown below (output truncated):
You can also filter the output and specify the format of the result. For example, to obtain the output in YAML format, add the format=yaml parameter to the request, as follows:
The cURL command for this is:
The output should be in YAML format, as shown below:
You can even choose to obtain specific headers. For example, to obtain the index name, shard name, shard state, shard disk space, node id, and node IP, filter by passing them to the header argument as:
The cURL command is as follows:
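A possible form of this request is sketched below; index, shard, state, store, id, and ip are the standard _cat column names for the fields listed above, and localhost:9200 is assumed:

GET _cat/shards?h=index,shard,state,store,id,ip&format=json

curl -XGET "http://localhost:9200/_cat/shards?h=index,shard,state,store,id,ip&format=json"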
Executing the above command gives you selected information about the shards in the JSON format. Skip the format parameters to use the default tabular format.
To obtain information about the shards of a specific index, pass the name of the index as follows:
Input the cURL command as follows:
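For example, for a hypothetical index named my_index, the request and its cURL equivalent would look like this:

GET _cat/shards/my_index?v

curl -XGET "http://localhost:9200/_cat/shards/my_index?v"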
This command gives you information about the shards of that specific index:
NOTE: You can also use parameters to filter the data above.
In this guide, we showed you how to use the cat API to obtain information about shards running in the Elasticsearch cluster.
Automatic shard rebalancing conforms to restrictions and rules like allocation filtering and forced awareness, leading to the most efficient and well-balanced cluster possible.
NOTE: Do not confuse shard allocation, which is the process of assigning unassigned shards to nodes, with rebalancing. Rebalancing takes assigned shards and moves them evenly across nodes, the purpose being the equal distribution of shards per node.
To enable automatic cluster rebalancing in Elasticsearch, we can send a PUT request to the _cluster API endpoint and add the settings we need.
The settings available for dynamic shard rebalancing include:
Consider the request below to allow automatic shard rebalancing for the cluster.
The following is the cURL command:
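A sketch of such a request, using the cluster.routing.rebalance.enable setting (the value all enables rebalancing for all shard types; localhost:9200 is assumed):

curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.rebalance.enable": "all"
  }
}'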
This command should return a JSON response acknowledging the updated settings.
You can also rebalance a shard manually for a specific index. I would not recommend this option because the Elasticsearch default rebalancing options are very efficient.
However, should the need to perform manual rebalancing arise, you can use the following request:
The cURL command is:
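A sketch of a manual move using the _cluster/reroute API; the index and node names here (my_index, node-1, node-2) are placeholders you would replace with your own:

curl -XPOST "http://localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "move": {
        "index": "my_index",
        "shard": 0,
        "from_node": "node-1",
        "to_node": "node-2"
      }
    }
  ]
}'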
NOTE: Keep in mind that if you perform a manual rebalance, Elasticsearch may move the shards automatically to ensure the best rebalance possible.
This guide walked you through updating and modifying the settings for an Elasticsearch cluster to enable automatic shard rebalancing. The article also covered manual rebalancing, if you require it.
However, as you know, once data gets mapped into an index, it becomes unmodifiable. To modify it, you will need to reindex the data with the changes you require. This process may lead to downtime, which is not good practice, especially for a service that is already in production.
To circumvent this, we can use index aliases, which allow us to switch between indices seamlessly.
The first step is to ensure you have an existing index whose data you wish to update.
For this tutorial, we will work with an old index and a new index, named accordingly.
For cURL users, use the following command:
Next, create a new index that we are going to use. Copy all the settings and mappings from the old index as:
Here’s the cURL command:
With the settings and mappings in place in the new index, use the _reindex API to copy the data from the old index to the new one:
Here’s the cURL command:
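A minimal sketch of the reindex call, assuming the indices are named old_index and new_index:

curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}'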
Now, copy the alias of the old index to the new one using the _alias api as:
Here’s the cURL command:
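A sketch of the alias switch using the _aliases actions API; my_alias, old_index, and new_index are placeholder names. Removing and adding the alias in a single request makes the switch atomic, which is what avoids downtime:

curl -XPOST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "my_alias" } },
    { "add": { "index": "new_index", "alias": "my_alias" } }
  ]
}'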
Once completed, you can now remove the old index, and the applications will use the new index (due to the alias) with no downtime.
Once you master the concepts discussed in this tutorial, you will be in a position to reindex data from an old index to a new one in place.
Working with databases is fun but can sometimes be challenging, especially when dealing with already-existing data.
For example, if you want to change the type of a specific field, it might require you to take the service down, which can have grave repercussions, especially in services that process large amounts of data.
Fortunately, we can use Elasticsearch’s powerful features such as Reindexing, ingest nodes, pipelines, and processors to make such tasks very easy.
This tutorial will show you how to change a field type in a specific index to another, using Elasticsearch Ingest nodes. Using this approach will eliminate downtime that affects services while still managing to perform the field type change tasks.
Elasticsearch’s ingest node allows you to pre-process documents before their indexing.
An Elasticsearch node is a specific instance of Elasticsearch; connected nodes (more than one) make a single cluster.
You can view the nodes available in the running cluster with the request:
The cURL command for this is:
Executing this command should give you extensive information about the nodes, as shown below (truncated output):
By default, all Elasticsearch nodes enable ingest and are capable of handling ingest operations. However, for heavy ingest operations, you can create a single node dedicated to ingesting only.
To pre-process documents before indexing them, we need to define a pipeline that specifies a series of processors.
Processors are sets of instructions wrapped in a pipeline and executed one at a time.
The following is the general syntax of how to define a pipeline:
The description property says what the pipeline should achieve. The next parameter is processors, passed as a list in the order of their execution.
To create a pipeline that we will use to convert a type, use the PUT request with the _ingest API endpoint as:
For cURL, use the command:
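A sketch of such a pipeline using the convert processor; the pipeline name convert_price and the field name price are placeholders, and this example converts a string field to an integer:

curl -XPUT "http://localhost:9200/_ingest/pipeline/convert_price" -H 'Content-Type: application/json' -d'
{
  "description": "Convert the price field from string to integer",
  "processors": [
    {
      "convert": {
        "field": "price",
        "type": "integer"
      }
    }
  ]
}'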
Once we have the pipeline in the ingest node, all we need to do is call the reindexing API and pass the pipeline as an argument in the dest section of the request body as:
For cURL:
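A sketch of the reindex call that applies the pipeline, assuming the source and destination indices are named source_index and dest_index and the pipeline created above is convert_price:

curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "source_index" },
  "dest": {
    "index": "dest_index",
    "pipeline": "convert_price"
  }
}'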
To verify that the pipeline was applied correctly, use the GET request to fetch that specific field as:
This should return the data as:
In this guide, we have looked at how to work with Elasticsearch Ingest nodes to pre-process documents before indexing, thus converting a field from one type to another.
Consider the documentation to learn more.
https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html
In this quick guide, we will examine how to enable the Elasticsearch X-Pack security features and how to use the security API to create users and roles.
Let us get started!
NOTE: We are assuming you already have Elasticsearch installed and running on your system. If not, consider the following tutorials to install Elasticsearch.
https://linuxhint.com/visualize_apache_logs_with_elk_stack/
https://linuxhint.com/install-elasticsearch-ubuntu/
By default, the Elasticsearch X-Pack security features are disabled, and you will need to enable them. First, stop Elasticsearch and Kibana so that you can edit the configuration.
In the Elasticsearch configuration file, edit the xpack.security.enabled entry and set it to true.
By default, you’ll find the elasticsearch.yml located in /etc/elasticsearch.
Save the file and restart Elasticsearch and Kibana.
NOTE: Depending on the license you have, once you’ve activated xpack, you will need to run the command below to set up passwords and authentication:
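On a package-based installation, the command is typically the elasticsearch-setup-passwords tool (the path below assumes the default deb/rpm install location):

sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive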
If you have Elasticsearch and Kibana coupled, you can easily create users in the Kibana stack management.
Start by launching Kibana, then log in. Use the passwords you used when setting up.
Once logged in, select the Kibana Dock and navigate to Stack Management and the security section.
Now, navigate to Users and click on "Create user." When creating a user, Kibana will ask you to assign a role. You can view all available roles under Stack Management > Security > Roles.
Provide the username, password, and full name.
Besides this simple way to create Elasticsearch users, you can use the more powerful method discussed below:
Another way to create native users in Elasticsearch is to use the API. Using _security as the endpoint, we can add, update, and remove users in Elasticsearch.
Let us look at how to carry out these operations.
To interact with the security API, we use POST and PUT HTTP requests, making sure we have the user information in the request’s body.
When creating a new user, you must pass the user's username and password; both are required parameters. Elasticsearch usernames must not exceed 1,024 characters, may contain alphanumeric characters, and must not contain whitespace.
The information you can provide in the request body include:
Once the request body is ready, send a POST request to _security/user/<username>.
Consider the request below that shows how to create a user using API.
If you’re using cURL, enter the command below:
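A sketch of such a request; the username linuxhint, the password, the kibana_admin role, and the contact details are all placeholder values, and the -u elastic flag assumes you authenticate as the built-in elastic superuser:

curl -u elastic -XPOST "http://localhost:9200/_security/user/linuxhint" -H 'Content-Type: application/json' -d'
{
  "password": "a-strong-password",
  "roles": ["kibana_admin"],
  "full_name": "Linux Hint",
  "email": "user@example.com",
  "enabled": true
}'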
This should return created: true as a JSON object.
If you create a user in Elasticsearch and set the enabled parameter as false, you will need to enable the account before using it. To do this, we can use the _enable API.
Ensure you pass the username you wish to enable in the PUT request. The general syntax is:
For example, the request below enables the user linuxhint:
The cURL command is:
The reverse is also true; to disable a user, use the _disable endpoint:
The cURL command is:
To view user information, use the GET request followed by the username you wish to view. For example:
The cURL command is:
That should display information about the specified username, as shown below:
To view information about all the users in the Elasticsearch cluster, omit the username and send the GET request as:
If you can create users, you can delete them too. To use the API to remove a user, simply send the DELETE request to _security/user/<username>.
Example:
The cURL command is:
That should return a JSON object with found:true as:
This tutorial taught you how to enable Elasticsearch Security features. We also discussed how to use Kibana Stack Management to manage users. Finally, we discussed how to create users, view user information, and delete users.
This information should get you started but remember that mastery comes from practice.
Thank you for reading.
Modifying data in an Elasticsearch index can lead to downtime while the change completes and the data gets reindexed.
This tutorial will give you a much better way of updating indices without experiencing any downtime with the existing data source. Using the Elasticsearch re-indexing API, we will copy data from a specific source to another.
Let us get started.
NOTE: Before we get started, note that reindexing operations are resource-heavy, especially on large indices. To minimize the time required for reindexing, set number_of_replicas to 0 and re-enable replicas once the process is complete.
The reindexing operation requires the _source field to be enabled on all the documents in the source index. Note that the _source field is not indexed and cannot be searched, but it is useful for various requests.
Enable the _source field by adding an entry as shown below:
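A minimal sketch, assuming the source index is called source_index (note that _source is enabled by default, so this mapping simply makes the setting explicit):

curl -XPUT "http://localhost:9200/source_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_source": { "enabled": true }
  }
}'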
To reindex documents, we need to specify the source and destination. Source and destination can be an existing index, index alias, and data streams. You can use indices from the local or a remote cluster.
NOTE: For reindexing to occur successfully, the source and destination cannot be the same index. You must also configure the destination as required before reindexing because it does not apply settings from the source or any associated template.
The general syntax for Reindexing is as:
Let us start by creating two indices. The first one will be the source, and the other one will be the destination.
The cURL command is:
Now for the destination index (you can use the above command and change a few things or use the one given below):
As always, cURL users can use the command:
Now that we have the indices we want to use, we can move on to reindexing the documents.
Consider the request below that copies the data from source_index to destination_index:
The cURL command for this is:
Executing this command should give you detailed information about the operation carried out.
NOTE: The source_index should have data.
You can view the status of the Reindexing operations by simply using the _tasks. For example, consider the request below:
The cURL command is:
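A sketch of such a request, filtering the task list down to reindex actions:

GET _tasks?detailed=true&actions=*reindex

curl -XGET "http://localhost:9200/_tasks?detailed=true&actions=*reindex"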
That should give you detailed information about the Reindexing process as shown below:
We’ve covered everything you need to know about using Elasticsearch Reindexing API to copy documents from one index (source) to another (destination). Although there is more to the Reindexing API, this guide should help you get started.
This tutorial discusses how to use the Elasticsearch _cat API to view detailed information about indices in the cluster. This information should help you track how the cluster is performing and decide what actions to take.
You may already know that Elasticsearch loves JSON and uses it for all its APIs. However, displayed information or data is only useful to you when it’s in a simple, well-organized form; JSON might not accomplish this very well. Thus, Elasticsearch does not recommend using CAT API with applications but for human reading only.
With that out of the way, let’s dive in!
To get high-level information about an Elasticsearch index, we use the _cat API. For example, to view information about a specific index, use the command:
You can also use the cURL command:
Once you execute the request above, you will get information about the specified index. This information may include:
The _cat API can also fetch high-level information about all indices in a cluster, for example:
For cURL users, enter the command:
This should display information about all indices in the cluster, as shown below:
In most cases, you will only need specific information about indices. To accomplish this, you can use _cat API parameters.
For example, to get only the UUID of the index, size, and health status, you can use the h parameter to accomplish this. For example, consider the request below:
The cURL command for this example is:
That should display filtered information for all indices in the cluster. Here’s an example output:
Suppose you want detailed statistics for a specific index. In such cases, you can use the _stats endpoint to query the data. For example, to get detailed information about an index called temp_2, use the request:
You can also use cURL as:
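Assuming the default localhost:9200 endpoint, the two forms look like this:

GET temp_2/_stats

curl -XGET "http://localhost:9200/temp_2/_stats"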
An example statistic information should be as shown below:
In this quick tutorial, we have learned how to use Elasticsearch API to get information about single or multiple indices within a cluster. We also learned how to filter data to get only the required values. You can learn more by checking the _cat and _stats API.
For more Elasticsearch tutorials, search the site.
Thank you for reading.
Luckily, with Elasticsearch, when data becomes redundant, all you need is a tool that can perform requests and transfer data over the network.
This quick guide will show you how to use the mighty Elasticsearch API to delete documents and indices.
NOTE: We assume you have Elasticsearch running on your system and that you have a tool for making requests such as cURL. We also provide raw Kibana requests if you are using the Kibana Console (recommended).
If you want to delete an index in Elasticsearch, you first need to verify it exists before sending the DELETE request.
If you try to delete a non-existing index, you will get an error, similar to the one shown below:
For cURL command:
Deleting the non-existent index gives an error like this:
There are various ways to check whether an index exists; the simplest is to list indices by name. For example, you can use wildcards to match a partial name.
The example request below lists indices with names matching te*:
The cURL command is:
This command should return all the indices matching that specific pattern, allowing you to remember only the partial name of the index you wish to remove.
Another way is to add the ignore_unavailable parameter to the request. For example:
Once you have the index you wish to remove from Elasticsearch, use the DELETE request followed by the index name.
The general syntax is:
The index name can be a specific index or a wildcard that selects a group of indices. Ensure to use wildcards correctly; otherwise, you might remove the wrong indices.
NOTE: Deleting Elasticsearch indices using aliases is disallowed.
Consider the example request below that removes the temp_1 index:
For cURL command:
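Assuming the default localhost:9200 endpoint:

DELETE /temp_1

curl -XDELETE "http://localhost:9200/temp_1"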
Executing this command should respond with a JSON object, indicating the successful removal of the index.
Elasticsearch is smart enough to know that you can remove indices accidentally. Therefore, you can set what types of wildcard expressions are allowed.
These types of wildcard expressions include:
For this quick and simple guide, we discussed the process of using Elasticsearch to delete indices from a cluster. We also discussed simple ways you can implement to avoid errors for indices that do not exist.
Thank you for reading.
This tutorial will walk you through the ins and outs of Elasticsearch index templates, which allow you to define templates or blueprints for common indices. For example, if you are constantly logging data from external sources, you can define a blueprint for all logging indices.
NOTE: Before we begin, it is good to note that the tutorial focuses on the latest version of Elasticsearch—7.8 at the time of writing—and it may vary from other versions. We also assume that you have Elasticsearch running on a system somewhere.
Let us get started working with Elasticsearch index templates.
An Elasticsearch index template is a method used to instruct Elasticsearch to configure indices upon creation. For example, an index template used on a data stream configures the stream’s backing indices upon creation. An index template is created manually before index creation. When creating an index, the template applies the configuration settings for the index.
The latest version of Elasticsearch has two types of usable templates. One is the index template, and the other is component templates. As we have already established, index templates help create Elasticsearch indices.
Component templates are reusable modules or blocks used to configure settings, mapping, and aliases. Component templates do not get applied directly to the created indices but can help create index templates.
Some default index templates used by Elasticsearch include metrics-*-* and logs-*-*.
To create new index templates or update existing ones, we use the PUT template API. Using the _index_template endpoint, we can send an HTTP request to add a template.
The general syntax for creating a template is:
It is good to note that the template name is a required parameter. Consider the request below, which creates an index template named template_1:
For cURL users, the command is:
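A minimal sketch of such a template (the te* index pattern and the single-shard setting are illustrative values only):

curl -XPUT "http://localhost:9200/_index_template/template_1" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["te*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    }
  }
}'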
Elasticsearch uses a wildcard pattern to match the index names to which templates are applied. Changing or updating an index template does not affect already created indices, only the ones created after the template is updated.
As shown above, you can comment your templates using the C-language commenting style. You can add as many comments as you want anywhere in the body except before the opening curly brace.
In the body of an index template, you can include various definitions, such as:
There are other properties you can include in the index template body. Consider the documentation to learn more.
https://www.elastic.co/guide/en/elasticsearch/reference/7.10/index-templates.html
Below is an example request to create a new template with version 1.0
You cannot have more than one index template with a matching pattern and the same priority. Hence, ensure you assign different priorities to templates with matching patterns.
To view information about an index template, send a GET request to the _index_template API. For example, to view information about template_2, use the request:
The cURL command is:
This command should display information about template_2
You can also use wildcards to get matching templates. For example, consider the request below to view all templates in Elasticsearch.
The cURL command is:
This command should give you information about all templates in Elasticsearch
Deleting a template is just as simple as getting one, but using a DELETE request as:
You can use the cURL command:
This command automatically deletes the specified template.
This tutorial covered what Elasticsearch index templates are, how they work, and how to create, view, and delete index templates. This basic information should help you get started on using Elasticsearch index templates.
To help safeguard against data loss, Elasticsearch has various features that allow you to ensure data availability, even in data failure instances.
Some of the ways that Elasticsearch uses to provide you with data availability include:
This tutorial shows you how to create cluster snapshots, which will help you be ready should an irreversible data failure event occur.
Let’s get started.
As mentioned, an Elasticsearch snapshot is a backup copy of a running Elasticsearch cluster. This snapshot can be of an entire cluster or specific indices and data streams within a particular cluster.
As you will soon learn, a repository plugin manages Elasticsearch snapshots. These snapshots are storable in various storage locations defined by the plugin. These include local systems and remote systems such as GCP Storage, Amazon EC2, Microsoft Azure, and many more.
Before we dive into creating Elasticsearch snapshots, we need to create a snapshot repository because many of Elasticsearch’s services use the Snapshot API to perform these tasks.
Some of the tasks handled by the Snapshot API are:
To create a snapshot repository, we use the _snapshot API endpoint followed by the name we want to assign to the snapshot repository. Consider the request below that creates a repository called backup_repo
Here’s a cURL command for the above request:
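A sketch of such a request using a shared filesystem repository; the location value is a placeholder and must point to a directory registered in path.repo, as described below:

curl -XPUT "http://localhost:9200/_snapshot/backup_repo" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/backup_repo"
  }
}'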
To pass the snapshot repository path, you must first add the system’s path or the parent directory to the path.repo entry in elasticsearch.yml
The path.repo entry should look similar to:
You can find the Elasticsearch configuration file located in /etc/elasticsearch/elasticsearch.yml
NOTE: After adding the path.repo, you may need to restart the Elasticsearch cluster. Additionally, the values supported for path.repo may vary widely depending on the platform running Elasticsearch.
To confirm the successful creation of the snapshot repository, use the GET request with the _snapshot endpoint as:
You can also use the following cURL command:
This should display information about the backup repository, for example:
If you have more than one snapshot repository and do not remember the names, you can omit the repository name and call the _snapshot endpoint to list all the existing repositories:

GET /_snapshot

curl -XGET "http://localhost:9200/_snapshot"
Creating an Elasticsearch snapshot for a specific snapshot repository is handled by the create snapshot API. The API requires the snapshot repository name and the name of the snapshot.
NOTE: A single snapshot repository can have more than one snapshot of the same clusters as long as they have unique identities/names.
Consider the following request to add a snapshot called snapshot_2021 to the backup_repo repository.
To use cURL, use the command:
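Assuming the repository and snapshot names used above:

PUT /_snapshot/backup_repo/snapshot_2021

curl -XPUT "http://localhost:9200/_snapshot/backup_repo/snapshot_2021"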
The command should return a response from Elasticsearch with 200 OK and accepted: true
Since the request above does not specify which data streams and indices you want to back up, calling it backs up all the data and the cluster state. To specify which data streams and indices to back up, add them to the request body.
Consider the following request, which backs up the .kibana index (a system index) and specifies which user authorized the snapshot and the reason.
The cURL command for that is:
The ignore_unavailable parameter sets a Boolean state; when it is false, the request returns an error if any data streams or indices specified in the snapshot are missing or closed.
The include_global_state parameter saves the cluster’s current state if true. Some of the cluster information saved include:
NOTE: You can specify more than one index, separated by commas.
A common argument used with the _snapshot endpoint is wait_for_completion, a Boolean value defining whether the request should return immediately after snapshot initialization (false, the default) or wait until the snapshot completes (true).
For example:
The cURL command is:
When you have the wait_for_completion parameter set to true, you'll get an output similar to the one shown below:
The GET snapshot API handles the view snapshots functionality.
All you need to pass in the request is the snapshot repository and the name of the snapshot whose details you wish to view.
The request should respond with details about the specified snapshot. These details include:
For example, to view the details about the snapshot_3 created above, use the request shown below:
The request should return a response with the details of the snapshot as:
You can also customize the request body to get specific details about a snapshot. However, we will not look into that for now.
Let us say you want to view information about all snapshots in a specific snapshot repository; in that case, you can pass an asterisk wildcard in the request as:
The cURL command for that is:
The response is a detailed dump of all the snapshots in that repository as:
Wildcards are very useful for filtering specific information about the snapshots.
Deleting a snapshot is very simple: all you have to do is use the DELETE request as:
The cURL command is:
The response should be acknowledged:true
If the snapshot does not exist, you will get a 404 status code and snapshot missing error as:
In this guide, we have discussed how to create Elasticsearch snapshots using the Snapshot API. What you've learned should be enough to allow you to create a snapshot repository, view the snapshot repositories, and create, view, and delete snapshots. Although there are customizations you can make with the API, the knowledge in this guide should be enough to get you started.
Thank you for reading.
In this quick tutorial, we will look at Elasticsearch, specifically how to create indices in the Elasticsearch engine. Although you do not need comprehensive knowledge of the ELK stack to follow this tutorial, having a basic understanding of the following topics might be advantageous:
NOTE: This tutorial also assumes that you have Elasticsearch installed and running on your system.
Without oversimplifying or overcomplicating things, an Elasticsearch index is a collection of related JSON documents.
As mentioned in a previous post, Elasticsearch indices are JSON objects—considered the base unit of storage in Elasticsearch. These related JSON documents are stored in a single unit that makes up an index. Think of Elasticsearch documents as rows in a relational database table.
Think of an Elasticsearch index as a database in the SQL world.
Elasticsearch uses a powerful and intuitive REST API to expose its services. This functionality allows you to use HTTP requests to perform operations on the Elasticsearch cluster. Therefore, we will use the create index API to create a new index.
For this guide, we will use cURL to send the requests and preserve integrity and usability for all users. However, if you encounter errors with cURL, consider using Kibana Console.
The syntax for creating a new index in Elasticsearch cluster is:
To create an index, all you have to do is pass the index name without other parameters, which creates an index using default settings.
You can also specify various features of the index in the index body, such as:
The index name is a required parameter; otherwise, you will get an error for the URI (/).
To create a new index with the name single_index, we pass the request:
For cURL, use the command:
This command should result in HTTP Status 200 OK and a message with acknowledged: true as:
The request above creates an index single_index with default settings as we did not specify any configurations.
When creating names for Elasticsearch indices, you must adhere to the following naming standards:
When using the PUT request to create an index, you can pass various arguments that define the settings for the index you want to have created. Values you can specify in the body include:
For an example of creating an index with body configurations, consider the request below:
For a cURL equivalent request:
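A sketch of the request body being described; the exact mapping you use will depend on your data:

curl -XPUT "http://localhost:9200/single_index_with_body" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "field1": { "type": "object" }
    }
  }
}'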
The above request creates a new index named single_index_with_body with two shards and two replicas. It also creates a mapping with a field named field1 whose type is defined as a JSON object.
Once you send the request, you will get a response with the status of the request as:
“Acknowledged” shows whether the index was successfully created in the cluster, while “shards_acknowledged” shows whether the required number of shard copies were started for every shard in the specified index before time out.
To view the information about the index you created, use a request similar to the one used to create the index, but with the GET HTTP method instead of PUT:
For cURL,
This command will give you detailed information about the requested index as:
This guide discussed how to use the Elasticsearch create index API to create new indices. We also discussed how to create suitable names for the indices and configuration settings.
By using this guide, you can now create and view indices using the Elasticsearch API.
Elasticsearch is a free and open-source search and analytics engine used to collect, manage, and analyze data.
Elasticsearch is a comprehensive tool that uses Apache Lucene to process text, numerical, structured, and unstructured geospatial data. Elasticsearch uses a simple and very powerful REST API that allows users to configure and manage it. When coupled with other tools such as Kibana and Logstash, it is one of the most popular real-time and Data Analysis Engines.
Once data is collected from sources like system logs, metrics, application data, etc., it gets added to Elasticsearch and indexed, allowing you to perform complex data queries and create summaries and informative dashboards using visualization tools like Kibana.
Having ironed out what Elasticsearch is, let’s talk about one of the most important things about Elastic: an index.
In Elasticsearch, an index refers to a collection of closely related documents in the form of JSON data. The JSON data maps keys to their corresponding values.
Here’s an example of a JSON document:
Elasticsearch indices take the form of an inverted index, which Elasticsearch searches using full text. An inverted index works by listing all the unique words that appear in any document and identifying the documents in which each word occurs.
The Inverted indexing feature provided by Elasticsearch also allows for real-time search and can be updated using the Elasticsearch indexing API.
Elasticsearch exposes its services and functionality using a very powerful REST API. Using this API, we can create an alias for an Elasticsearch index.
An Elasticsearch index alias is a secondary name or identifier we can use to reference one or more indices.
Once you create an index alias, you can reference the index or indices in Elasticsearch APIs.
An appropriate example would be indices that store system logs for Apache. If you regularly query Apache logs, you can create an alias called apache_logs, then query and update that specific index through it.
To create an alias for a particular index, we use the PUT request followed by the index’s path and the alias to create.
In REST, we use the PUT method to request that the passed entity or value be stored at the request URL. Simply put, an HTTP PUT method allows you to update information about a resource or create a new entry if none exists.
For this tutorial, I am assuming you have Elasticsearch installed, and you have an API client or a tool to send HTTP requests such as cURL.
Let us start by creating a simple index with no alias or parameters.
For simplicity, we will use cURL as we assume you have only installed Elasticsearch without Kibana. However, if you have Kibana installed or encounter errors when using curl, consider using the Kibana Console because it’s better suited for Elasticsearch API requests.
This command creates a simple index using default settings and returns the following.
Now that we have an index in Elasticsearch, we can create an alias using the same PUT request as:
We start by specifying the method, in this case, a PUT followed by the URL of the index to which we want to add an alias. The next is the API we want to use, in this case, the Index Alias API (_alias) followed by the name of the alias we want to assign to the index.
Here’s the cURL command for that:
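A minimal sketch, assuming the index created above is called single_index and the alias we want is single_alias:

PUT /single_index/_alias/single_alias

curl -XPUT "http://localhost:9200/single_index/_alias/single_alias"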
This command should respond with 200 OK status and “acknowledged”:
You may also come across a method to add an alias to an index as:
Using Elasticsearch index alias API, you can add, update and remove index aliases as you see fit.
When you create sophisticated aliases, such as those filtered to a specific user, you might want to get information about the alias. You can view the information using the GET method as:
Here is the cURL command:
This command will display the information regarding the alias. Since we have not added any information, it will typically resemble the following:
Ensure that the alias exists to avoid getting a 404 error, as shown below:
The result will be an error stating that the alias does not exist or is missing:
To remove an existing alias from an index, we use the method we’ve used to add an alias but with a DELETE request instead. For example:
The equivalent cURL command is:
Elasticsearch should respond with 200 OK and acknowledged: true
There are other ways to update and remove aliases from an index in Elasticsearch. However, for simplicity, we have stuck with a single request.
In this simple tutorial, we have looked at creating an Elasticsearch index and then an alias. We have also covered how to delete an alias.
It's worth noting that this guide is not the most definitive in the world; its purpose was to serve as a starter guide for creating Elasticsearch indices and aliases, not a comprehensive reference.
If you wish to learn more about the Elastic Index API, consider the resources below.
We also recommend having a basic knowledge of working with Elasticsearch and API; it will be of great help when working with the ELK stack.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-add-alias.html
This tutorial will show you how to create a Docker image that integrates Elasticsearch, Kibana, and Logstash. You can then use the image to deploy the ELK stack on any Docker container.
For this guide, we will start by installing and setting up Docker on a system. Once we set up Docker, we will deploy a container running Elasticsearch, Kibana, and Logstash in the same system. In that Container, we can then tweak and customize Elastic Stack to our needs.
Once we have the appropriate ELK stack, we will export the Docker container to an image you can use to build other containers.
The very first thing we need to do is install Docker on a system. For this tutorial, we are using Debian 10 as the base system.
The very first step is to update the apt packages using the following command:
Next, we need to install some packages that will allow us to use apt over HTTPS, which we can do using the following command:
The next step is to add the Docker repository GPG key using the command:
From there, we need to add the Docker repository to apt using the command:
Now we can update the package index and install Docker:
Now that we have Docker up and running on the system, we need to pull a Docker container containing the ELK stack.
For this illustration, we will use the elk-docker image available in the Docker registry.
Use the command below to pull the Docker image.
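Assuming the commonly used sebp/elk image from Docker Hub (substitute the image you prefer):

sudo docker pull sebp/elk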
Once the image has been pulled successfully from the docker registry, we can create a docker container using the command:
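A sketch of the run command, assuming the sebp/elk image and its usual ports: 5601 for Kibana, 9200 for Elasticsearch, and 5044 for the Logstash beats input. On some hosts you may also need to raise vm.max_map_count for Elasticsearch to start:

sudo docker run -d --name elk -p 5601:5601 -p 9200:9200 -p 5044:5044 sebp/elk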
Once you create the Container, all the services (Elasticsearch, Kibana, and Logstash) will be started automatically and exposed to the above ports.
You can access the services with the addresses
Once we have ELK up and running on the Container, we can add data, modify the settings, and customize it to meet our needs.
For the sake of simplicity, we will add sample data from Kibana Web to test it.
On the main Kibana home page, select Try sample data to import sample data.
Choose the data to import and click on Add data.
Now that we have imported and modified the Container, we can export it to create a custom Elk image that we can use for any Docker image.
With all the changes in the Elastic stack container, we can export the Container to an image using a single command as:
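A sketch of the export, assuming the container created above is named elk:

sudo docker commit elk myrepo/elkstack:version2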
Using the above command, we created the image elkstack with the tag version2 in the Docker repository myrepo. This saves all the changes we made to the Container, and you can use the image to create other containers.
This quick and simple guide showed you how to create a custom ELK image for Docker with changes. For those experienced with Docker, you can use Dockerfiles to accomplish the same tasks but with more complexity.
Monitoring and analyzing logs for various infrastructures in real time can be a very tedious job. When dealing with services like web servers that constantly log data, the process can be very complex and nearly impossible.
As such, knowing how to use tools to monitor, visualize, and analyze logs in real-time can help you trace and troubleshoot problems and monitor suspicious system activities.
This tutorial will discuss how you can use one of the best real-time log collection and analysis tools: ELK. Using ELK, commonly known as Elasticsearch, Logstash, and Kibana, you can collect, log, and analyze data from an Apache web server in real time.
ELK is an acronym used to refer to three main open-source tools: Elasticsearch, Logstash, and Kibana.
Elasticsearch is an open-source tool developed to find matches within a large collection of datasets using a selection of query languages and types. It is a lightweight and fast tool capable of handling terabytes of data with ease.
Logstash engine is a link between the server-side and Elasticsearch, allowing you to collect data from a selection of sources to Elasticsearch. It offers powerful APIs that are integrable with applications developed in various programming languages with ease.
Kibana is the final piece of the ELK stack. It is a data visualization tool that allows you to analyze the data visually and generate insightful reports. It also offers graphs and animations that can help you interact with your data.
ELK stack is very powerful and can do incredible data-analytics things.
Although the various concepts we’ll discuss in this tutorial will give you a good understanding of the ELK stack, consider the documentation for more information.
Elasticsearch: https://linkfy.to/Elasticsearch-Reference
Logstash: https://linkfy.to/LogstashReference
Kibana: https://linkfy.to/KibanaGuide
Before we begin installing Apache and all dependencies, it’s good to note a few things.
We tested this tutorial on Debian 10.6, but it will also work with other Linux distributions.
Depending on your system configuration, you need sudo or root permissions.
ELK stack compatibility and usability may vary depending on versions.
The first step is to ensure you have your system fully updated:
Next, install the apache2 web server. If you want a minimal Apache installation, remove the documentation and utilities from the command below.
By now, you should have an Apache server running on your system.
We now need to install the ELK stack. We will be installing each tool individually.
Let us start by installing Elasticsearch. We are going to use apt to install it, but you can get a stable release from the official download page here:
https://www.elastic.co/downloads/elasticsearch
Elasticsearch requires Java to run. Luckily, the latest version comes bundled with an OpenJDK package, removing the hassle of installing it manually. If you need to do a manual installation, refer to the following resource:
https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html#jvm-version
In the next step, we need to download and install the official Elastic APT signing key using the command:
Before proceeding with the installation, you may need the apt-transport-https package (required for packages served over HTTPS).
Now, add the apt repo information to the sources.list.d file.
echo “deb https://artifacts.elastic.co/packages/7.x/apt stable main” | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
Then update the packages list on your system.
Install Elasticsearch using the command below:
Having installed Elasticsearch, start it and enable it to start on boot with the systemctl commands:
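On a systemd-based system, that amounts to:

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch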
The service may take a while to start. Wait a few minutes and confirm that the service is up and running with the command:
Using cURL, test if the Elasticsearch API is available, as shown in the JSON output below:
Install the logstash package using the command:
Enter the command below to install kibana:
Here’s how to configure the ELK stack:
In Elasticsearch, data gets organized into indices. Each of these indices has one or more shards. A shard is a self-contained search engine that handles indexing and queries for a subset of the data in an Elasticsearch cluster. A shard works as an instance of a Lucene index.
A default Elasticsearch installation creates five shards and one replica for every index. This is a good mechanism for production. However, in this tutorial, we will work with one shard and no replicas.
Start by creating an index template in JSON format. In the file, we will set the number of shards to one and zero replicas for matching index names (development purposes).
In Elasticsearch, an index template refers to how you instruct Elasticsearch in setting up the index during the creation process.
Inside the json template file (index_template.json), enter the following instructions:
Using cURL, apply the json configuration to the template, which will be applied to all indices created.
Once applied, Elasticsearch will respond with an acknowledged: true statement.
For Logstash to gather logs from Apache, we must configure it to watch for any changes in the logs by collecting, processing, and then saving them to Elasticsearch. To do that, you need to set up the log collection path in Logstash.
Start by creating Logstash configuration in the file /etc/logstash/conf.d/apache.conf
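A sketch of what such a configuration might look like; the Apache log path assumes a default Debian apache2 install, and the grok pattern COMBINEDAPACHELOG parses the standard combined log format. Leaving the index name unset means Logstash writes to its default logstash-* indices, which the Kibana index pattern below relies on:

input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}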
Now enable and start the Logstash service.
To enable Kibana, edit the main .yml config file located in /etc/kibana/kibana.yml. Locate the following entries and uncomment them. Once done, use systemctl to start the Kibana service.
Kibana creates index patterns based on the data processed. Hence, you need to collect logs using Logstash and store them in Elasticsearch, which Kibana can then use. Use curl to make requests to Apache and generate logs.
Once you have logs from Apache, launch Kibana in your browser using the address http://localhost:5601, which will launch the Kibana index page.
On the main page, you need to configure the index pattern used by Kibana to search for logs and generate reports. By default, Kibana uses the logstash* index pattern, which matches all the default indices generated by Logstash.
If you do not have any configuration, click create to start viewing the logs.
As you continue to perform Apache requests, Logstash will collect the logs and add them to Elasticsearch. You can view these logs in Kibana by clicking on the Discover option on the left menu.
The discover tab allows you to view the logs as the server generates them. To view the details of a log, simply click the dropdown menu.
Read and understand the data from the Apache logs.
In the Kibana interface, you will find a search bar that allows you to search for data using query strings.
Example: status:active
Learn more about ELK query strings here:
Since we are dealing with Apache logs, one possible match is a status code. Hence, search:
This query will search for logs with the status code of 200 (OK) and display them in Kibana.
You can create visual dashboards in Kibana by selecting the Visualize tab. Select the type of dashboard to create and select your search index. You can use the default for testing purposes.
In this guide, we discussed an overview of how to use the ELK stack to manage logs. However, there is more to these technologies than this article can cover. We recommend exploring on your own.
We will start by covering best practices to follow with Elasticsearch and the problems that arise when we ignore them. Let's get started.
One thing ES can surely do is work without mappings. So, when you start feeding JSON data to your ES index, it will iterate over the fields of the data and create a suitable mapping. This seems direct and easy, as ES selects the data type itself. However, based on your data, you might need a field to be of a specific data type.
For example, suppose you index the following document:
This way, Elasticsearch will mark the “date” field as “date” type. But when you index the following document:
This time, the type of the date field has changed, so ES will throw an error and won't allow your document to be indexed. To make things easy, you can index a few documents, see what fields are indexed by ES, and grab the mapping from this URL:
This way, you won’t have to construct the complete mapping as well.
The default cluster name that ES starts with is elasticsearch. When you have a lot of nodes in your cluster, it is a good idea to keep the naming flags as consistent as possible, like:
Apart from this, recovery settings for nodes matter a lot as well. Suppose some of the nodes in a cluster restart due to a failure and some nodes restart a little after the others. To keep the data consistent between all these nodes, we would have to run a consistency program that keeps the whole cluster in a consistent state.
It is also helpful to tell the cluster in advance how many nodes will be present and how much recovery time they will need:
With the correct config, a recovery that would have taken hours can take as little as a minute and can save a company a lot of money.
It is important to know how much space your data will take and the rate at which it flows into Elasticsearch, because that will determine the amount of RAM you will need on each data node of the cluster and on the master node as well.
Of course, there are no specific guidelines for arriving at the numbers needed, but we can take some steps that give us a good idea. One step is to simulate the use case: make an ES cluster and feed it data at almost the same rate you would expect in your production setup. The concept of starting big and scaling down can also help you be sure about how much space is needed.
When you define large index templates, you will always face issues related to syncing the template across the various nodes of the cluster. Always note that the template has to be re-defined whenever a data model change occurs. It is a much better idea to keep the templates dynamic. Dynamic templates automatically update field mappings based on the mappings we defined earlier and the new fields. Note that there is no substitute for keeping the templates as small as possible.
Linux makes use of swapping when it needs memory for new pages. Swapping makes things slow, as disks are slower than memory. The mlockall property in the ES configuration tells ES not to swap its pages out of memory even if they aren't required for now. This property can be set in the YAML file:
In the ES v5.x+ versions, this property has changed to:
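That is, where older releases used bootstrap.mlockall, ES 5.x and later expect the following entry in elasticsearch.yml:

bootstrap.memory_lock: true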
If you're using this property, just make sure that you provide ES with a big enough heap using the -Xmx option or ES_HEAP_SIZE.
The performance of a cluster is slightly affected whenever you make mapping update requests on your ES cluster. If you can’t control this and still want to make updates to mappings, you can use a property in ES YAML config file:
When a mapping update request is pending in the master node's queue and the master sends data with the old mapping to the nodes, it also has to send an update request to all the nodes later. This can make things slow. When we set the above property to false, the master still registers that the mapping has been updated, but it won't send the update request to the nodes. Note that this is only helpful if you make a lot of changes to your mappings regularly.
ES nodes have many thread pools in order to improve how threads are managed within a node. But there are limits on how much data each thread can take care of. To keep track of this value, we can use an ES property:
This tells ES the number of requests per shard that can be queued for execution on the node when there is no thread available to process the request. If the number of tasks goes higher than this value, you will get a RemoteTransportException. The higher this value, the more heap space will be needed on your node machine, and the more of the JVM heap will be consumed. Also, you should keep your code ready in case this exception is thrown.
In this lesson, we looked at how we can improve Elasticsearch performance by avoiding common and not-so-common mistakes people make. Read more Elasticsearch articles on LinuxHint.
In Elasticsearch, each document belongs to an index and a type. An index can be considered as a database, whereas a type can be seen as a table when compared to a relational database. A mapping type was a logical partition separating objects of one kind from objects belonging to other mapping types in the same index.
Each mapping type has its own fields. For example, a user type can have the following fields:
Another mapping type in the same index, website, can have the following fields, which are completely different from the user type:
While searching for a document in an index, the search could be limited to a single type by specifying the _type field as:
The _type field of a document was combined with its _id to generate a _uid field, so documents with the same _id could exist in a single index.
Read Elasticsearch Tutorial for Beginners for a deeper understanding of Elasticsearch Architecture and get started with it with Install ElasticSearch on Ubuntu.
Just as we said above when explaining how indices and types are similar to databases and tables in a relational database, the Elasticsearch team thought the same, but this wasn't the case, as the Lucene engine doesn't follow the same analogy. This is because of the following reasons:
Although the decision has been made, we still need to separate different types of data. Now, the first alternative is to separate each document type into its own index, which has two advantages:
Another alternative to separating the data is maintaining a custom _type field in each document we insert, like:
This is an excellent approach if you're looking for a completely custom solution.
As removing mapping types is a big change, the ES team is rolling out the process slowly. Here is a schedule for the rollout, extracted from elastic.co:
In this lesson, we looked at why Elasticsearch mapping types were removed and why they will be completely unsupported in coming versions.
Let us also visualise how things will work:
For this lesson and all installations it needs, you should have root access to the machine. We will be using a machine with this configuration:
A few application servers from which to gather data would also be good to have.
To install Elasticsearch on Ubuntu, we must install Java first. Java might not be installed by default. We can verify it by using this command:
Checking Java version
Here is what we get back with this command:
Installing Java
Once these commands are done running, we can again verify that Java is now installed by using the same version command.
The next step in the ELK stack setup is installing Elasticsearch on the Ubuntu machine, which will store the logs generated by systems and applications. Before we can install Elasticsearch, we need to import its public GPG key for the package manager:
GPG Keys
Now, insert the following lines into the repository configuration file 'elasticsearch.repo':
Repository Config
Now, read the lesson Install ElasticSearch on Ubuntu for the installation process. Once ES is up and running, make sure it responds normally to this curl command:
ES Status
The normal output will be:
Logstash is easy to install using the apt package manager and is available from the same repository and public key as Elasticsearch, so we don't have to add those again. Let's create the source list to start:
Create Source list
Update the apt package list:
Updating Packages
Install Logstash with a single command:
Install Logstash
Logstash is installed but it is not configured yet. We will configure Logstash in coming sections.
Kibana is very easy to install. We can start by creating the Kibana source list:
Create Kibana source list
Now, we will update the apt package list:
Updating Packages
We are ready to install Kibana now:
Install Kibana
Once Kibana is installed, we can run it:
Start Kibana Service
Before we show you the Kibana dashboard, we need to set up the Filebeat log shipping agent as well.
We are ready to install Filebeat now:
Install Filebeat
Before we can start the Filebeat service, we need to configure it for the input type and document type. Because we’re using system logs only as of now, let’s mention this in the configuration file in ‘/etc/filebeat/filebeat.yml’:
Configure Filebeat
We can also start filebeat now:
Start Filebeat Service
Once filebeat is up and running, we can check that it is OK by issuing the following curl command:
Testing Filebeat
We should receive a similar result as we got in the ES installation.
We are now ready to connect to Kibana. As we already started the Kibana service, its dashboard should be visible at:
Kibana Dashoboard URL
Once you're in Kibana, create an index pattern named 'filebeat-*'. Now, based on the logs available, you can see the metrics and logs in your Kibana dashboard:
In this lesson, we looked at how we can install and start using the ELK stack for log visualisation, with an excellent dashboard for business teams.
Elasticsearch is one of the most popular NoSQL databases, used to store and search text-based data. It is based on the Lucene indexing technology and allows for search retrieval in milliseconds based on data that is indexed.
Based on Elasticsearch website, here is the definition:
Elasticsearch is an open source distributed, RESTful search and analytics engine capable of solving a growing number of use cases.
Those were some high-level words about Elasticsearch. Let us understand the concepts in detail here.
To start using Elasticsearch, it must be installed on the machine. To do this, read Install ElasticSearch on Ubuntu.
Make sure you have an active ElasticSearch installation if you want to try examples we present later in the lesson.
In this section, we will see what components and concepts lie at the heart of Elasticsearch. Understanding these concepts is important to understand how ES works:
Due to the concept of horizontal scaling, we can add a virtually infinite number of nodes to an ES cluster to give it a lot more strength and indexing capability.
Note that Types are deprecated from ES v6.0.0 onwards. Read here why this was done.
Elasticsearch is known for its near real-time searching capabilities and the flexibilities it provides with the type of data being indexed and searched. Let’s start studying how to use search with various types of data.
A match query will find all three documents when searching for "Ball hit". A proximity search can tell us how far apart these two words appear in the same line or paragraph, which is what determined the match.
SQL Queries: Partial Matching
On some occasions, we only need to run partial match queries even when they can be considered like brute-force techniques.
When it comes to an analytics engine, we usually need to run analysis queries in a business intelligence (BI) domain. When it comes to business analysts or data analysts, it wouldn't be fair to assume that they know a programming language when they want to visualise data present in an ES cluster. This problem is solved by Kibana. Kibana offers so many benefits to BI that people can actually visualise data with an excellent, customisable dashboard and explore it interactively. Let's look at some of its benefits here.
At the core of Kibana is Interactive Charts like these:
Kibana comes with support for various types of charts, like pie charts, sunbursts, histograms, and much more, which use the complete aggregation capabilities of ES.
Kibana also supports complete Geo-Aggregation which allows us to geo-map our data. Isn’t this cool?!
With pre-built aggregations and filters, it is possible to literally drag, drop, and run highly optimized queries within the Kibana dashboard. With just a few clicks, it is possible to run aggregated queries and present the results in the form of interactive charts.
With Kibana, it is also very easy to share dashboards with a much wider audience without making any changes to the dashboard, with the help of Dashboard Only mode. We can easily embed dashboards into our internal wiki or webpages.
Feature images taken from the Kibana product page.
To see the instance details and the cluster information, run the following command:
Now, we can try inserting some data into ES using the following command:
Inserting Data
Here is what we get back with this command:
Let’s try getting the data now:
Getting Data
When we run this command, we get the following output:
In this lesson, we looked at how we can start using ElasticSearch which is an excellent Analytics Engine and provides excellent support for near real-time free-text search as well.
Elasticsearch is one of the most popular NoSQL databases, used to store and search text-based data.
Elasticsearch is based on the Lucene indexing technology and allows for search retrieval in milliseconds based on data that is indexed. It supports database queries through REST APIs. This means that we can use simple HTTP calls with methods like GET, POST, PUT, and DELETE to access data.
To install Elasticsearch on Ubuntu, we must install Java first. Java might not be installed by default. We can verify it by using this command:
When we run this command, we get the following output:
We will now install Java on our system. Use this command to do so:
Once these commands are done running, we can again verify that Java is now installed by using the same command.
Now, installing Elasticsearch is just a matter of few commands. To start, download the Elasticsearch package file from ES page:
When we run the above command, we will see the following output:
Next, we can install the downloaded file using the dpkg command:
When we run the above command, we will see the following output:
Make sure that you download the deb package only from the ES website.
The config files for Elasticsearch will be stored at /etc/elasticsearch. To make sure that Elasticsearch is started and stopped with the machine, run the following command:
We have an active installation of Elasticsearch now. To use Elasticsearch effectively, we can make some important changes to the configuration. Run the following command to open the ES config file:
We first modify the node.name and cluster.name in elasticsearch.yml file. Remember to remove the # before each line you want to edit to unmark it as a comment.
Modify these properties:
Once you're done with all the config changes, start the ES server for the first time:
When we run this command and check the service status, we get the following output:
Now that Elasticsearch has started, we can start using it for our commands.
To see the instance details and the cluster information, run the following command:
You may have to install curl, do it so using this command:
When we run this command, we get the following output:
Now, we can try inserting some data into ES using the following command:
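A sketch of such a request, assuming a recent Elasticsearch version where documents are indexed under _doc; the index name linuxhint and the document fields are placeholders:

curl -XPOST "http://localhost:9200/linuxhint/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "LinuxHint",
  "topic": "Elasticsearch"
}'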
When we run this command, we get the following output:
Let’s try getting the data now:
When we run this command, we get the following output:
In this quick post, we learned how we can install Elasticsearch and run basic queries on it.
See the release notes for Elasticsearch v5.2.0 and Kibana v5.2.0 for full details of the release.
How to install Java 8 latest update on CentOS
sudo rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch
sudo vi /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
sudo yum install elasticsearch
sudo vi /etc/elasticsearch/elasticsearch.yml
/etc/init.d/elasticsearch status
sudo chkconfig --levels 235 elasticsearch on
How to install Java 8 latest update on CentOS
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo vi /etc/yum.repos.d/kibana.repo
[kibana-5.x]
name=Kibana repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
sudo yum install kibana
sudo -i service kibana stop
sudo -i service kibana start
ps -p 1
—– For SysV init —–
sudo chkconfig --add kibana
sudo -i service kibana stop
sudo -i service kibana start
—– For systemd —–
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable kibana.service
sudo systemctl stop kibana.service
sudo systemctl start kibana.service
http://localhost:5601
You can install nginx and configure it to act as a proxy server. This would enable you to access Kibana via port 80.