Each Elasticsearch index consists of shards. A shard is both a logical and a physical partition of an index. In this article we will talk about sizing Elasticsearch shards, an important topic that seriously affects cluster performance. In high-load systems, choosing the right storage architecture saves significantly on hardware. Most of the article is based on the relevant section of the Elastic documentation. Details under the cut.
Elasticsearch shard sizing
How Elasticsearch works with shards
Search queries usually hit several shards (in loaded production environments, we recommend using a node with the coordinating role). Each shard executes a search query in a single processor thread. If there are many queries, the pool of search threads (specifically search, since there are other pools) is exhausted, which leads to queues, degraded performance and, as a result, slow searches.
Each shard uses memory and processor resources. A small number of larger shards use fewer resources than many smaller ones.
Let’s now take a quick look at segments (see the picture below). Each Elasticsearch shard is a Lucene index. The maximum number of documents a Lucene index can hold is 2,147,483,519. A Lucene index is divided into smaller blocks of data called segments; a segment is essentially a small Lucene index, and Lucene searches all segments sequentially. Most shards contain several segments that store the index data. Elasticsearch keeps segment metadata in the JVM heap so that it can be accessed quickly during searches. As a shard grows, its segments are merged into fewer, larger segments. This reduces the number of segments, which means less metadata is held in the heap (see also forcemerge, to which we will return a little later in the article).
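You can peek at the segments of a shard with the `_cat/segments` API (here `my-index` is a placeholder index name):

```
GET _cat/segments/my-index?v
```

The output lists each segment per shard with its document count and size on disk, which is handy for judging how fragmented an index is.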
A few words about cluster rebalancing. If a new node is added or one of the nodes fails, the cluster is rebalanced. Rebalancing itself is an expensive operation from a performance point of view. A cluster is balanced when each node holds an equal number of shards and no node has a concentration of shards from any single index. Elasticsearch runs an automatic process called rebalancing that moves shards between the nodes of the cluster. Rebalancing respects the pre-configured shard allocation rules (we will tell you more about allocation awareness and other rules in one of the following articles). If you are using data tiers, Elasticsearch will automatically place each shard at the appropriate tier, and the balancer works independently within each tier.
How to make Elasticsearch work even better with shards
Delete data correctly. If you delete a document, it is not removed from the file system immediately. Instead, Elasticsearch marks the document as deleted on each shard, and the marked document continues to consume resources until it is purged during a periodic segment merge. If you need to physically reclaim space, it is best to drop entire indexes: that actually frees space on the file system.
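For example (index name and query are placeholders): deleting a whole index reclaims disk space at once, while deleting documents only marks them:

```
# drops the whole index and frees disk space immediately
DELETE /my-index

# only marks matching documents as deleted; space is
# reclaimed later, during segment merging
POST /my-index/_delete_by_query
{
  "query": { "match": { "status": "obsolete" } }
}
```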
Create shards between 10 and 50 GB in size. Elastic says that shards larger than 50 GB can reduce the cluster’s chances of recovering from failure, because of the very rebalancing we talked about at the beginning of the article; large shards are also more expensive to move over the network. The 50 GB limit looks like a spherical cow in a vacuum, of course, so we lean more towards 10 GB ourselves. Here, one practitioner likewise advises 10 GB and suggests picking the shard count from the document count as follows:
- 0 to 4 million documents per index: 1 shard.
- 4 to 5 million documents per index: 2 shards.
- More than 5 million documents per index: use the formula (number of documents / 5 million) + 1 shard.
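The rule of thumb above can be sketched as a small helper (the thresholds come from the list above; the function name is ours):

```python
def recommended_shards(doc_count: int) -> int:
    """Rough primary-shard count for an index, per the rule of thumb above."""
    FIVE_MILLION = 5_000_000
    if doc_count <= 4_000_000:
        return 1
    if doc_count <= FIVE_MILLION:
        return 2
    # more than 5 million: (number of documents / 5 million) + 1
    return doc_count // FIVE_MILLION + 1

print(recommended_shards(3_000_000))   # 1
print(recommended_shards(4_500_000))   # 2
print(recommended_shards(12_000_000))  # 3
```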
20 or fewer shards per 1 GB of JVM heap. The number of shards a node can juggle is proportional to the size of the node’s JVM heap. For example, a node with a 30 GB JVM heap should hold no more than 600 shards, and fewer is likely better. If this proportion is not met, you can add a node. Let’s check how much JVM heap is in use on each node.
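One way to check heap use per node is the `_cat/nodes` API:

```
GET _cat/nodes?v&h=name,heap.current,heap.percent,heap.max
```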
Now let’s count the shards on each node and make sure everything is in order on our test benches. They’ll live.
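The number of shards on each node, along with disk usage, can be seen with the `_cat/allocation` API:

```
GET _cat/allocation?v
```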
The number of shards of an index per node can be limited with the index.routing.allocation.total_shards_per_node setting, but if there are already a lot of shards, take a closer look at the Shrink API.
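A minimal Shrink API sketch (index names are placeholders): the index must first be made read-only, and a copy of every shard must end up on a single node, before it can be shrunk into a new index with fewer primaries:

```
# block writes so the index can be shrunk
PUT /my-index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# shrink into a new index with fewer primary shards
# (the new count must be a factor of the old one)
POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 2
  }
}
```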
It is not at all necessary to create one index per day. We have often seen customers with an approach in which a new index was created every new day. Sometimes this is justified; sometimes you can wait a month, since a rollover can be triggered not only by max_age but also by max_size or max_docs. On Habr there was an article in which Adel Sachkov, at that time at Yandex Money, shared a useful life hack: he created indexes not at the moment a new day began but in advance, so that the process would not affect cluster performance. But he had microservices:
… every day new indices are created, one per microservice, so previously every night Elasticsearch would seize up for about 8 minutes while hundreds of new indices and several hundred new shards were created: the disk-load graph hit the ceiling, queues for shipping logs to Elasticsearch built up on the hosts, and Zabbix lit up with alerts like a Christmas tree. To avoid this, common sense suggested writing a Python script to pre-create the indexes.
The Christmas tree made for a good pun there.
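A rollover condition sketch (the alias name and thresholds are made up for illustration): the rollover fires as soon as any one of the conditions is met:

```
POST /logs-alias/_rollover
{
  "conditions": {
    "max_age": "30d",
    "max_size": "50gb",
    "max_docs": 200000000
  }
}
```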
Don’t neglect ILM and forcemerge. Indexes should flow smoothly between the appropriate nodes according to your ILM policy; OpenDistro has a similar mechanism.
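A minimal ILM policy sketch (the policy name, sizes and timings are made up for illustration): roll hot indexes over by size or age, then move them to the warm phase and forcemerge them there:

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 }
        }
      }
    }
  }
}
```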
Indexes that are no longer written to can be forcemerged, merging smaller segments into larger ones. This ultimately reduces the overhead of maintaining shards and increases search speed. Forcemerge requires significant resources, so it is best run during off-peak hours. Note that a forcemerge effectively builds a new segment out of the old ones, so free disk space will definitely not be superfluous.
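For example, to merge each shard of a read-only index down to a single segment (the index name is a placeholder):

```
POST /my-old-index/_forcemerge?max_num_segments=1
```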
Drop by the comments and tell us about your experience spreading shards across nodes. It would be interesting to hear what works in your case.
Webinar announcement. Elastic invites you to a webinar on March 17 at 12:00 Moscow time: Elastic Telco Day: Applications and operational highlights from telco environments. Experts will talk about applying Elastic solutions in telecom. Sign up.
Meetup suggestions. We are planning an online Elastic meetup in April. Write in the comments or in a private message which topics you would like covered and which speakers you would like to hear. If you would like to speak yourself and have a story to tell, write as well. Join the Elastic Moscow User Group so as not to miss the meetup announcement.
Telegram channel. Subscribe to our channel Elastic Stack Recipes: there are interesting materials and event announcements.
Read our other articles:
- Sizing an Elasticsearch Cluster and Testing Performance in Rally
- Sizing Elasticsearch
- How Elastic Stack (Elasticsearch) licensing works and how the licenses differ
- Understanding Machine Learning in the Elastic Stack (aka Elasticsearch, aka ELK)
- Elastic under lock and key: enabling security options for the Elasticsearch cluster for access from inside and outside
If you are interested in Elastic Stack implementation, administration and support services, you can leave a request via the feedback form on our dedicated page.