Elasticsearch Anti-Patterns

Dinesh Naik
3 min read · Nov 21, 2023


An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.


When it comes to achieving high performance in Elasticsearch clusters with strict SLAs and millisecond response times, it’s important to avoid certain anti-patterns that can hinder performance.

Elasticsearch is built on top of Lucene, a full-text search engine library written in Java. Solr and Elasticsearch are similar in many ways. You can read about Solr anti-patterns here.

Here are some common Elasticsearch anti-patterns to watch out for:

  1. Shards Misconfiguration: Having too many or too few shards can negatively impact performance. Too many shards lead to increased overhead and resource consumption, while too few shards can limit parallelism and scalability. It's crucial to strike the right balance and optimize the number of shards based on cluster size, hardware, and workload.
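As a rough illustration of striking that balance, shard counts are often derived from expected index size and a target shard size. The commonly cited guidance is roughly 10–50 GB per shard; the target value below is an assumption, not an official limit:

```python
# Hypothetical shard-count estimate: size primary shards toward a
# target shard size (commonly cited guidance is ~10-50 GB per shard).
def estimate_primary_shards(total_index_gb, target_shard_gb=40):
    """Return a primary-shard count that keeps shards near the target size."""
    if total_index_gb <= 0:
        raise ValueError("index size must be positive")
    # Ceiling division, so no shard exceeds the target size.
    return max(1, -(-total_index_gb // target_shard_gb))

print(estimate_primary_shards(300))  # 8 shards of ~37.5 GB each
```

Because the number of primary shards cannot be changed without reindexing, it is worth running this kind of estimate against projected data volume, not just current volume.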
  2. Over-reliance on Sorting: Sorting large result sets can be resource-intensive. Sorting at query time can significantly impact response times. Consider pre-sorting or using other techniques like pagination and caching to avoid the need for extensive sorting.
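One pagination technique that avoids deep sorting cost is `search_after`, which resumes from the sort values of the previous page's last hit instead of using a large `from` offset. A minimal sketch of the request body (the `timestamp` field name is illustrative):

```python
# Sketch of search_after pagination: each page resumes from the sort
# values of the previous page's last hit instead of a deep "from" offset.
def next_page_query(last_sort_values=None, page_size=50):
    body = {
        "size": page_size,
        # Sort on an indexed field plus a tiebreaker for a stable order.
        "sort": [{"timestamp": "desc"}, {"_id": "asc"}],
    }
    if last_sort_values is not None:
        body["search_after"] = last_sort_values
    return body

first_page = next_page_query()
second_page = next_page_query(["2023-11-21T10:00:00Z", "doc-123"])
```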
  3. Mapping Explosion: Creating too many field mappings can cause index size inflation, increase memory consumption, and slow down indexing and search operations. Design your mappings carefully, avoid excessive use of multi-fields, and consider using dynamic mapping templates to control mapping explosion.
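Mapping growth can be capped at index-creation time with dynamic-mapping settings. A sketch of such an index body (the field names and the limit of 500 are illustrative):

```python
# Sketch of index settings that guard against mapping explosion.
index_body = {
    "settings": {
        # Hard cap on the total number of fields in the mapping.
        "index.mapping.total_fields.limit": 500,
    },
    "mappings": {
        # "strict" rejects documents containing unmapped fields;
        # dynamic "false" would index them without adding new mappings.
        "dynamic": "strict",
        "properties": {
            "message": {"type": "text"},
            "level": {"type": "keyword"},
        },
    },
}
```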
  4. Frequent Cluster State Changes: Rapid changes to cluster state, such as frequent shard rebalancing, adding/removing nodes, or index creation/deletion, can impact cluster stability and introduce unnecessary overhead. Minimize cluster state changes, automate them cautiously, and plan maintenance activities during off-peak hours.
  5. Over-fetching: Retrieving excessive data from Elasticsearch can lead to increased network latency and reduced query performance. Optimize your queries by selecting only the necessary fields, limiting the number of documents returned, and leveraging features like source filtering and partial updates.
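Source filtering and a small `size` keep responses lean. A sketch of such a query body (field names are illustrative):

```python
# Sketch of a query that avoids over-fetching: return only the fields
# the caller needs and cap the number of hits.
query_body = {
    "size": 10,                         # don't pull back huge result sets
    "_source": ["title", "timestamp"],  # source filtering: only these fields
    "query": {"match": {"title": "elasticsearch"}},
}
```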
  6. Storing billions of documents in a shard:
    Keep in mind that Lucene (and therefore Solr/Elasticsearch) has a hard limit of about 2.14 billion documents per Lucene index. In Solr/Elasticsearch, deleted documents also count toward that limit: an update does not produce 1 document → it produces 1 deleted document and 1 new document, and the deleted one still counts until segments are merged. Since the number of deleted documents is hard to estimate, it is recommended to limit the number of regular documents to 1 billion. This leaves room for deleted documents and some buffer to expand.
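The ~2.14 billion figure is Lucene's per-index document limit, which is Java's `Integer.MAX_VALUE`. A quick arithmetic check of the headroom left by the 1-billion recommendation:

```python
# Lucene's per-index document limit is Java's Integer.MAX_VALUE.
LUCENE_MAX_DOCS = 2_147_483_647  # 2**31 - 1

regular_docs = 1_000_000_000  # recommended ceiling of regular documents
headroom = LUCENE_MAX_DOCS - regular_docs
print(f"headroom for deleted docs and growth: {headroom:,}")  # ~1.15 billion
```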
  7. Lack of Query Optimization: Writing inefficient queries can have a significant impact on response times. Avoid unnecessary filtering, nested queries, or inefficient search techniques. Understand the query DSL, leverage caching mechanisms, and use query profiling tools to identify and optimize slow queries.
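One common optimization is moving exact-match and range clauses into filter context, where Elasticsearch can cache them and skip relevance scoring. A sketch (field names are illustrative):

```python
# Sketch: exact-match constraints belong in filter context, where
# results can be cached and no relevance score is computed.
query_body = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "timeout error"}}],  # scored
            "filter": [                                          # cached, unscored
                {"term": {"level": "ERROR"}},
                {"range": {"timestamp": {"gte": "now-1d/d"}}},
            ],
        }
    }
}
```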
  8. Inadequate Hardware/Resources: Insufficient hardware resources, including CPU, memory, and disk I/O, can limit Elasticsearch’s performance. Ensure your cluster is adequately provisioned based on your workload requirements. Monitor system resource utilization regularly and scale up resources when necessary. Refer to the Elasticsearch documentation on tuning for disk usage.
  9. Ignoring Query and Indexing Performance Tuning: Elasticsearch provides various performance tuning options such as caching, throttling, and bulk indexing. Ignoring these optimizations can negatively impact cluster performance. Understand the available options, monitor and optimize query and indexing performance, and fine-tune your cluster accordingly.
  10. Lack of Monitoring and Alerting: Without proper monitoring and alerting, it’s difficult to identify performance bottlenecks or deviations from SLAs. Implement comprehensive monitoring, track relevant metrics (e.g., query latency, indexing rate, JVM heap usage), set up alerts for critical thresholds, and regularly analyze the collected data to optimize cluster performance.
  11. Heap size: Too large or too small
    One size does not fit all when it comes to heap usage in Elasticsearch. You want a heap large enough that you avoid OOM exceptions and constant garbage collection, but small enough that you are not wasting memory or running into long garbage-collection pauses.
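Two commonly cited heap-sizing rules: keep the heap at or below 50% of available RAM (leaving the rest for the OS filesystem cache that Lucene relies on), and below roughly 31 GB so the JVM can use compressed object pointers. A sketch combining the two (the exact cutoff varies by JVM, so treat 31 GB as an assumption):

```python
# Hypothetical heap-size suggestion based on two common rules:
# 1) heap <= 50% of RAM (the rest feeds the OS filesystem cache);
# 2) heap below ~31 GB so the JVM keeps compressed object pointers.
COMPRESSED_OOPS_LIMIT_GB = 31  # approximate; varies by JVM

def suggest_heap_gb(ram_gb):
    return min(ram_gb // 2, COMPRESSED_OOPS_LIMIT_GB)

print(suggest_heap_gb(16))   # 8
print(suggest_heap_gb(128))  # 31
```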
  12. Inadequate Cluster Sizing and Capacity Planning: Underestimating the required resources or improper cluster sizing can lead to poor performance and SLA violations. Conduct thorough capacity planning exercises, simulate workload scenarios, and ensure you have enough nodes, shards, and replicas to handle your expected traffic.

Conclusion:

By avoiding these anti-patterns and implementing best practices specific to your use case, you can optimize the performance of your Elasticsearch cluster, meet strict SLAs, and achieve millisecond response times.

