Error Medic

Troubleshooting Elasticsearch API Timeouts: Fixing ReadTimeoutError and es_rejected_execution_exception

Diagnose and resolve Elasticsearch API timeout errors. Learn how to optimize queries, tune thread pools, fix GC pauses, and stabilize your cluster.

Key Takeaways
  • Unoptimized queries, such as deep pagination or massive aggregations on high-cardinality fields, are the most common cause of API timeouts.
  • High JVM Heap pressure resulting in 'Stop-The-World' Garbage Collection (GC) pauses can cause nodes to become unresponsive, triggering timeout exceptions in clients.
  • Thread pool rejections (specifically the 'search' and 'write' thread pools) indicate your cluster is overwhelmed and actively dropping requests.
  • Mismatch between Elasticsearch client timeouts, Load Balancer (e.g., AWS ALB, Nginx) timeouts, and actual query execution time often masks the true bottleneck.
Approaches to Resolving Elasticsearch Timeouts
Method | When to Use | Time to Implement | Risk Level
Increase Client Timeout | Immediate mitigation for intermittent spikes while investigating root causes. | Minutes | High (can mask underlying cluster instability and exhaust application threads)
Kill Long-Running Tasks | Emergency intervention when a rogue query (e.g., a heavy wildcard) is locking up the cluster. | Minutes | Low (only affects the canceled query; saves the cluster)
Optimize Queries & Pagination | Long-term fix for slow performance; transitioning from 'from/size' to 'search_after'. | Days/Weeks | Low (improves overall cluster health and application responsiveness)
Scale Out Data Nodes / Heap | When CPU/memory utilization is consistently near 100% despite query optimization. | Hours/Days | Medium (requires budget, infrastructure changes, and node rebalancing)

Understanding Elasticsearch API Timeouts

As a DevOps engineer or SRE, encountering an Elasticsearch API timeout is a stressful but common rite of passage. These timeouts rarely point to a single isolated failure; instead, they are usually a symptom of a broader systemic issue—resource exhaustion, unoptimized application logic, or architectural bottlenecks.

When an application attempts to interact with an Elasticsearch cluster, it relies on an HTTP client (e.g., Python's elasticsearch-py, Node.js @elastic/elasticsearch, or Java's RestHighLevelClient). If the cluster fails to return a response within the configured timeout, the client severs the connection and throws an exception.

You will typically see error messages in your application logs resembling the following:

  • Python: elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='es-cluster.internal', port=9200): Read timed out. (read timeout=10))
  • Java: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0
  • Node.js: TimeoutError: Request timed out
  • Elasticsearch Logs: es_rejected_execution_exception: rejected execution of org.elasticsearch.transport.TransportService

To permanently resolve these issues, we must shift our focus from the application's client settings to the underlying cluster health and query performance. Let's break down the diagnostic process and the structural fixes required to stabilize your Elasticsearch infrastructure.

Step 1: Diagnose the Root Cause

Before changing configurations or scaling infrastructure, you must identify why the timeouts are occurring. Elasticsearch provides extensive APIs for introspection.

1. Check for Long-Running Tasks

If timeouts suddenly spike, a rogue query might be consuming all cluster resources. A common culprit is a massive aggregation or a deeply nested query executed by a data scientist or a poorly optimized microservice.

You can inspect currently executing tasks using the _cat/tasks API:

curl -X GET "localhost:9200/_cat/tasks?v&detailed=true" | grep "search"

For a more programmatic approach, use the Task Management API to find queries running longer than a specific threshold (e.g., 10 seconds):

curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*" | jq '.nodes[].tasks[] | select(.running_time_in_nanos > 10000000000)'
2. Analyze Thread Pool Rejections

Elasticsearch uses distinct thread pools to manage different types of operations (e.g., search, write, get). When a node receives more requests than it can process, requests are placed in a queue. If the queue fills up, Elasticsearch rejects the request, throwing an es_rejected_execution_exception. This almost always translates to an API timeout or 503 error on the client side.

Check your thread pool stats:

curl -X GET "localhost:9200/_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected,completed"

A continuously incrementing rejected count on the search thread pool indicates that your queries are too slow, or your request volume exceeds the cluster's concurrency limits.
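Because rejected is a cumulative counter since node restart, the absolute number matters less than whether it is still growing. A minimal sketch that takes two samples and reports any delta (ES_HOST and the 10-second window are assumptions):

```shell
#!/bin/bash
# Compare two thread-pool snapshots and report counters that moved.
# ES_HOST and the sample interval are assumptions; adjust as needed.
ES_HOST="${ES_HOST:-http://localhost:9200}"

snapshot() {
  curl -s "${ES_HOST}/_cat/thread_pool/search,write?h=node_name,name,rejected" | sort
}

# Print lines from the second snapshot that differ from the first,
# i.e. node/pool pairs whose rejected counter has grown.
new_rejections() {
  diff <(echo "$1") <(echo "$2") | grep '^>' | sed 's/^> //'
}

before="$(snapshot)"
sleep 10
after="$(snapshot)"
delta="$(new_rejections "$before" "$after")"
if [ -n "$delta" ]; then
  echo "Rejections grew during the sample window:"
  echo "$delta"
else
  echo "No new rejections in the last 10s."
fi
```

If the delta is non-empty under normal traffic, the cluster is actively shedding load and client timeouts will follow.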

3. Monitor JVM Heap and GC Pauses

Elasticsearch runs on the Java Virtual Machine (JVM). As it processes queries, it allocates objects in the Heap. If the heap fills up, the JVM triggers a Garbage Collection (GC). Minor GCs are fast, but a major "Stop-The-World" GC pauses all application threads. If a node is paused for 15 seconds doing garbage collection, any client waiting for a response from that node will time out.

Examine the node stats for GC metrics:

curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"

Look for high heap_used_percent (consistently over 85%) and long GC durations. If you see frequent long GC pauses, you have a memory pressure problem, likely caused by Fielddata usage, large aggregations, or deeply nested documents.
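The pretty-printed node stats are verbose; a small helper can reduce them to one line per node. A sketch, assuming jq is installed and a cluster is reachable at ES_HOST (field names follow the standard nodes-stats schema):

```shell
#!/bin/bash
# Summarize heap pressure and old-generation GC totals per node.
# ES_HOST is an assumption; requires jq.
ES_HOST="${ES_HOST:-http://localhost:9200}"

summarize_jvm() {
  # Columns: node name, heap %, old-gen GC count, cumulative old-gen GC time (ms)
  jq -r '.nodes[] | [
      .name,
      (.jvm.mem.heap_used_percent | tostring),
      (.jvm.gc.collectors.old.collection_count | tostring),
      (.jvm.gc.collectors.old.collection_time_in_millis | tostring)
    ] | join(" ")'
}

curl -s "${ES_HOST}/_nodes/stats/jvm" | summarize_jvm
```

A node showing 90+ heap percent alongside a rapidly climbing old-gen collection time is the classic GC-pause profile behind client-side ReadTimeoutError.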

4. Enable and Review Slow Logs

If resource utilization looks normal but clients still time out, individual queries are likely the problem. Enable the Search Slow Log to identify queries that take longer than your application's timeout threshold.

Dynamically update your index settings to log slow queries:

PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}

Review the logs in /var/log/elasticsearch/my-cluster_index_search_slowlog.log to find the exact JSON bodies of the offending queries.

Step 2: Implement Fixes

Once you have identified the bottleneck, you can apply targeted remediation strategies.

Immediate Mitigation: Canceling Rogue Tasks

If a specific query is bringing down the cluster, you can forcefully cancel it using the Task API.

Find the Task ID from the diagnostic steps above, then issue a cancel request:

curl -X POST "localhost:9200/_tasks/node_id:task_id/_cancel"
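The find and cancel steps can be combined: the task keys in the _tasks response are already in the node_id:task_number form the cancel endpoint expects. A sketch, assuming jq and ES_HOST, with a 10-second threshold as an illustrative cutoff — review before running, since this cancels live queries:

```shell
#!/bin/bash
# Find search tasks running longer than THRESHOLD_NS and cancel them.
# ES_HOST and the threshold are assumptions.
ES_HOST="${ES_HOST:-http://localhost:9200}"
THRESHOLD_NS=10000000000   # 10 seconds in nanoseconds

# Task keys in the _tasks response are already node_id:task_number.
long_task_ids() {
  jq -r --argjson t "$THRESHOLD_NS" \
    '.nodes[]?.tasks | to_entries[]
     | select(.value.running_time_in_nanos > $t) | .key'
}

curl -s "${ES_HOST}/_tasks?detailed=true&actions=*search*" | long_task_ids |
while read -r task_id; do
  echo "Cancelling ${task_id}"
  curl -s -X POST "${ES_HOST}/_tasks/${task_id}/_cancel" > /dev/null
done
```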
Query Optimization: Fixing Deep Pagination

A classic cause of API timeouts is "deep pagination" using the from and size parameters. If you request from: 10000 and size: 10, the coordinating node must fetch 10,010 documents from every shard, sort them all in memory, and then discard 10,000 of them to return the final 10. This requires massive CPU and memory overhead.

The Fix: Transition to search_after or the Point in Time (PIT) API for deep scrolling.

GET /my-index/_search
{
  "size": 10,
  "query": { "match": { "status": "active" } },
  "sort": [
    {"timestamp": "desc"},
    {"_id": "asc"}
  ],
  "search_after": [
    1629837493000,
    "doc-123"
  ]
}
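In practice, each page's last sort values seed the next request. A shell sketch of that loop (my-index, the match_all query, and ES_HOST are illustrative assumptions; requires jq):

```shell
#!/bin/bash
# Page through an index with search_after instead of from/size.
# my-index, the query, and ES_HOST are illustrative assumptions.
ES_HOST="${ES_HOST:-http://localhost:9200}"

search_after='null'
for page in 1 2 3; do   # fixed page count for the sketch
  if [ "$search_after" = "null" ]; then
    body='{"size":10,"query":{"match_all":{}},"sort":[{"timestamp":"desc"},{"_id":"asc"}]}'
  else
    body='{"size":10,"query":{"match_all":{}},"sort":[{"timestamp":"desc"},{"_id":"asc"}],"search_after":'"$search_after"'}'
  fi
  resp="$(curl -s -H 'Content-Type: application/json' \
    -X GET "${ES_HOST}/my-index/_search" -d "$body" || true)"
  # Sort values of the last hit seed the next page; an empty page ends the loop.
  search_after="$(echo "$resp" | jq -c '.hits.hits[-1].sort // empty')"
  [ -z "$search_after" ] && break
  echo "page ${page}: search_after=${search_after}"
done
```

Unlike from/size, each page costs the same no matter how deep you go, because shards only ever return documents sorting after the supplied values.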
Bounding Query Execution Time

By default, an Elasticsearch query will run until it completes, even if the client has already timed out and closed the connection. This leads to "zombie queries" consuming resources for no reason.

You should enforce a server-side timeout on all expensive queries by appending the timeout parameter to your request body. This tells Elasticsearch to return whatever partial results it has gathered, or abort the query, freeing up the thread pool.

GET /my-index/_search
{
  "timeout": "5s",
  "query": { ... }
}
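Note that hitting this server-side timeout does not produce an error: Elasticsearch returns HTTP 200 with "timed_out": true and whatever partial hits it gathered, so the caller must check that flag. A sketch of that check (ES_HOST and my-index are assumptions; requires jq):

```shell
#!/bin/bash
# Run a bounded query and detect partial results via the timed_out flag.
# ES_HOST and my-index are assumptions.
ES_HOST="${ES_HOST:-http://localhost:9200}"

resp="$(curl -s -H 'Content-Type: application/json' \
  -X GET "${ES_HOST}/my-index/_search" \
  -d '{"timeout":"5s","query":{"match_all":{}}}' || true)"

# jq -e derives its exit status from the boolean, so this works in an if.
if echo "$resp" | jq -e '.timed_out == true' > /dev/null; then
  echo "partial results: server-side timeout hit"
else
  echo "complete results"
fi
```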
Resolving High Heap Pressure and Thread Pool Rejections

If timeouts are caused by resource exhaustion (GC pauses or thread rejections), you must reduce the load on your nodes or scale the cluster.

  1. Reduce Shard Count: Too many small shards (the "over-sharding" problem) consume heap memory for cluster state and Lucene segments. Aim for shard sizes between 30GB and 50GB. Use the _shrink API or Index Lifecycle Management (ILM) to consolidate small indices.
  2. Optimize Mappings: Avoid using the text field type for exact match filtering or aggregations; use keyword instead. Disable dynamic mapping to prevent accidental explosion of mapped fields.
  3. Scale Out: If optimizations are exhausted and CPU/Heap remains pegged at 90%+, you must add more data nodes to the cluster. Elasticsearch scales horizontally very well. Adding nodes distributes the primary and replica shards, reducing the memory and CPU burden on any single JVM.
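The shrink step in point 1 takes a few preparatory calls: relocate a copy of every shard to one node, block writes, then shrink. A sketch of that sequence — logs-old (source), logs-old-shrunk (target), and shrink-node-1 are illustrative names, not drop-in values:

```shell
#!/bin/bash
# Consolidate an over-sharded index with the _shrink API.
# Index and node names below are illustrative assumptions.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# 1. Move one copy of every shard to a single node and block writes.
curl -s -X PUT "${ES_HOST}/logs-old/_settings" \
  -H 'Content-Type: application/json' -d '{
    "index.routing.allocation.require._name": "shrink-node-1",
    "index.blocks.write": true
  }'

# 2. Wait until relocation finishes.
curl -s "${ES_HOST}/_cluster/health/logs-old?wait_for_no_relocating_shards=true&timeout=60s" > /dev/null

# 3. Shrink to a single primary shard.
curl -s -X POST "${ES_HOST}/logs-old/_shrink/logs-old-shrunk" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.number_of_shards": 1}}'
```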
Aligning Client and Proxy Timeouts

Finally, ensure your timeout stack is logically configured. If your application expects a response in 10 seconds, but your Nginx reverse proxy times out in 5 seconds, you will see 504 Gateway Timeouts before Elasticsearch even finishes processing.

A standard best practice is:

  • Elasticsearch Query Timeout (timeout in JSON): 8 seconds
  • Application Client Read Timeout: 10 seconds
  • Load Balancer / Proxy Timeout: 15 seconds

This cascading setup ensures that if a query runs too long, Elasticsearch terminates it gracefully, rather than the proxy ruthlessly severing the connection while the database continues to churn in the background.
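From the shell, curl's --max-time plays the role of the application client's read timeout, one layer above the server-side body timeout. A sketch of the cascade (ES_HOST and my-index are assumptions), plus a trivial sanity check on the ordering:

```shell
#!/bin/bash
# Client timeout (10s) deliberately sits above the server-side
# query timeout (8s), so Elasticsearch can terminate gracefully.
ES_HOST="${ES_HOST:-http://localhost:9200}"

curl -s --max-time 10 \
  -H 'Content-Type: application/json' \
  -X GET "${ES_HOST}/my-index/_search" \
  -d '{"timeout":"8s","query":{"match_all":{}}}' || true
# curl exits 28 if its 10s client timeout fires first; that means the
# layers below never got the chance to time out cleanly.

# Sanity-check the ordering: query timeout < client timeout < proxy timeout.
check_cascade() {
  [ "$1" -lt "$2" ] && [ "$2" -lt "$3" ]
}
check_cascade 8 10 15 && echo "timeout cascade OK"
```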

Putting It Together: A Combined Diagnostic Script

The diagnostic steps above can be run in sequence with a single script:

#!/bin/bash
# Elasticsearch Diagnostic Script: Identify Timeout Root Causes

ES_HOST="http://localhost:9200"

echo "=== 1. Checking Cluster Health ==="
curl -s -X GET "${ES_HOST}/_cluster/health?pretty"

echo -e "\n=== 2. Checking for Thread Pool Rejections (Search & Write) ==="
curl -s -X GET "${ES_HOST}/_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected,completed"

echo -e "\n=== 3. Finding Tasks Running Longer Than 10 Seconds ==="
curl -s -X GET "${ES_HOST}/_tasks?detailed=true&actions=*search*" | \
  jq '.nodes[]?.tasks[]? | select(.running_time_in_nanos > 10000000000) | {node, action, running_time_seconds: (.running_time_in_nanos / 1000000000), description}'

echo -e "\n=== 4. Checking JVM Heap Pressure (Look for heap_used_percent > 85%) ==="
curl -s -X GET "${ES_HOST}/_nodes/stats/jvm?pretty" | grep "heap_used_percent"

# To cancel a rogue task, uncomment and replace TASK_ID:
# curl -X POST "${ES_HOST}/_tasks/node_id:task_id/_cancel"

Error Medic Editorial

The Error Medic Editorial team consists of senior SREs and DevOps engineers dedicated to providing actionable, code-first troubleshooting guides for distributed systems, databases, and cloud infrastructure.
