Elasticsearch API Timeout: How to Diagnose and Fix Connection, Request, and Search Timeouts
Fix Elasticsearch API timeouts fast: tune request_timeout, adjust index.search.slowlog thresholds, scale shards, and configure circuit breakers to stop 504s.
- The most common root causes are undersized thread pools, heap pressure triggering GC pauses, overly broad wildcard queries, and misconfigured client-side or load-balancer timeout values.
- A 'ReadTimeoutError' or HTTP 504 from the REST API almost always means the coordinating node accepted the request but could not assemble shard responses within the timeout window — the fix lives on the cluster side, not the client side.
- Quick wins: raise request_timeout on the client, add ?timeout=30s to the REST call, profile the slow query with the Profile API, and check hot_threads to find the CPU bottleneck before touching any shard or replica counts.
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Increase client request_timeout | Client throws ReadTimeoutError but cluster is healthy | < 5 min | Low — buys time, does not fix root cause |
| Add ?timeout=30s query parameter | One-off slow query or bulk indexing job | < 1 min | Low — scoped to single request |
| Reduce query scope (filter before query) | Wildcard/fuzzy queries on large indices | 30–60 min | Medium — requires query refactoring |
| Scale horizontally (add data nodes) | Shard queue depth consistently > 0 | Hours to days | Medium — requires cluster rebalance |
| Tune thread pool queue size | Bulk or search rejections visible in _cat/thread_pool | 5–15 min | Medium — wrong value causes OOM |
| Increase JVM heap (up to 31 GB) | GC pauses visible in logs, heap > 85% | 15–30 min with restart | High — requires rolling restart |
| Enable request caching | Repeated aggregation queries on non-volatile indices | 15 min | Low — may serve stale data |
| Circuit breaker adjustment | requests.breaker.total.tripped counter rising | 10 min | High — masking memory pressure |
Understanding the Elasticsearch API Timeout Error
When your application or curl command hits an Elasticsearch API timeout, you will see one of several error signatures, depending on which layer gave up first.
Client-side (Python elasticsearch-py):
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))
REST / HTTP layer:
{"error":{"root_cause":[{"type":"search_phase_execution_exception","reason":"all shards failed"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"logs-2024","node":"abc123","reason":{"type":"query_shard_exception","reason":"Request timed out"}}]}}
Coordinating node log (elasticsearch.log):
[WARN][o.e.a.s.SearchService] [node-1] timeout while executing [indices:data/read/search[phase/query]]
Bulk indexing timeout:
{"error":{"type":"timeout_exception","reason":"Timeout waiting for task [bulk[shard_id=3]]"}}
Each error surface points to a different layer of the stack. Understanding the taxonomy saves you from chasing the wrong fix.
Architecture: Why Timeouts Happen
Every Elasticsearch search flows through three phases: the coordinating node fans out to primary or replica shards, each shard returns matching doc IDs and scores (query phase), and the coordinator merges them and retrieves the full documents from the winning shards (fetch phase). A timeout can fire at any phase boundary. The default search.default_search_timeout is -1 (unlimited) at the cluster level, but client libraries and API gateways almost always impose their own limits — usually 10–30 seconds.
Timeout triggers, ranked by frequency in production:
- JVM GC pause — A full GC pause of 10–20 s on a data node causes all in-flight shard requests to exceed the timeout. Observable via /_nodes/stats (jvm.gc.collectors.old.collection_time_in_millis climbing).
- Thread pool saturation — The search thread pool defaults to int((allocatedProcessors * 3) / 2) + 1 threads with a queue of 1000. When the queue fills, new requests are rejected with a 429, but requests already queued may time out before a thread picks them up.
- Hot shards / skewed data distribution — A single shard holding 80 % of documents means one thread does most of the work while others sit idle.
- Expensive queries — Unbounded wildcard, script, or nested terms queries with millions of candidate documents.
- Network partition or slow disk I/O — Especially common on cloud instances with burst-capable EBS volumes that exhaust their I/O credits.
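The thread pool default above is easy to misjudge when sizing nodes; a minimal sketch of the documented formula (the function name and the `vcpus` parameter are illustrative, standing in for the node's allocated processor count):

```python
def search_pool_defaults(vcpus: int) -> dict:
    """Default search thread pool sizing per the Elasticsearch docs:
    size = int((allocated_processors * 3) / 2) + 1, queue_size = 1000."""
    return {"size": (vcpus * 3) // 2 + 1, "queue_size": 1000}

# A 16-vCPU data node gets 25 search threads and a queue of 1000 by default,
# so at most 1025 searches can be in flight before 429 rejections begin.
print(search_pool_defaults(16))  # {'size': 25, 'queue_size': 1000}
```

The headroom between "all threads busy" and "queue full" is exactly the window in which queued requests can time out before ever executing.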
Step 1: Identify Which Layer Is Timing Out
Before changing any configuration, confirm whether the timeout originates at the client, the coordinating node, or a data node.
# 1a. Check cluster health — RED means shards are unassigned and queries will hang
curl -s 'http://localhost:9200/_cluster/health?pretty'
# 1b. Check hot threads — reveals what CPU is actually doing
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=500ms'
# 1c. Check thread pool queues and rejections
curl -s 'http://localhost:9200/_cat/thread_pool/search,bulk,write?v&h=node_name,name,active,queue,rejected,completed'
# 1d. Check JVM heap and GC pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | \
python3 -c "import sys,json; n=json.load(sys.stdin)['nodes']; \
[print(v['name'], v['jvm']['mem']['heap_used_percent'],'% heap') for v in n.values()]"
# 1e. Check cumulative search/fetch timings per index (per-query slow log requires thresholds — see Step 2)
curl -s 'http://localhost:9200/_cat/indices?v&h=index,search.fetch_time,search.query_time,search.scroll_time'
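The `_cat/thread_pool` output from step 1c is easiest to act on once reduced to the pools that are actually rejecting; a small parsing sketch (the sample output and the helper name are made up):

```python
def find_rejections(cat_output: str) -> list:
    """Parse tabular _cat/thread_pool output (with the ?v header row)
    and return (node, pool, rejected) for every pool with rejections."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    hits = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        if int(row["rejected"]) > 0:
            hits.append((row["node_name"], row["name"], int(row["rejected"])))
    return hits

sample = """\
node_name name   active queue rejected completed
node-1    search 12     980   417      1923411
node-2    search 3      0     0        2011209"""
print(find_rejections(sample))  # [('node-1', 'search', 417)]
```

Any non-zero rejected counter means the queue filled at some point since node start; a deep queue on only one node usually points at a hot shard.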
Step 2: Enable the Slow Log to Profile the Offending Query
The Elasticsearch slow log is the most actionable diagnostic tool for search timeouts. Enable it dynamically without restarting.
# Enable slowlog on a specific index (adjust index name and thresholds)
curl -X PUT 'http://localhost:9200/logs-2024/_settings' \
-H 'Content-Type: application/json' \
-d '{
"index.search.slowlog.threshold.query.warn": "5s",
"index.search.slowlog.threshold.query.info": "2s",
"index.search.slowlog.threshold.fetch.warn": "1s",
"index.search.slowlog.level": "info"
}'
Once set, slow queries appear in logs/&lt;cluster_name&gt;_index_search_slowlog.json (elasticsearch_index_search_slowlog.json under the default cluster name). Look for the took field and the source field showing the full query JSON. A query taking > 5 s in the slow log but appearing fast in Profile API output usually indicates I/O wait, not CPU.
Step 3: Use the Profile API to Find the Expensive Clause
Add "profile": true to any search request to get per-shard timing broken down by query clause.
curl -X GET 'http://localhost:9200/logs-2024/_search?pretty' \
-H 'Content-Type: application/json' \
-d '{
"profile": true,
"query": {
"bool": {
"must": [{"wildcard": {"message": "*error*"}}],
"filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]
}
}
}'
In the response, profile.shards[].searches[].query[].time_in_nanos reveals which clause dominates. A wildcard on an un-analyzed keyword field at 2–3 billion nanoseconds (2–3 s) is a clear culprit — replace it with a full-text match query or a prefix-aware edge_ngram analyzer.
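Profile output nests clauses, so the interesting number is usually the costliest leaf. A sketch that walks the `profile.shards` array of a response (the function name and the trimmed sample response are made up):

```python
def slowest_leaf(profile_shards: list) -> tuple:
    """Return (type, description, time_in_nanos) of the costliest
    leaf clause in a Profile API response's profile.shards array."""
    best = ("", "", 0)
    def walk(node):
        nonlocal best
        children = node.get("children", [])
        if children:
            for child in children:
                walk(child)
        elif node["time_in_nanos"] > best[2]:
            best = (node["type"], node.get("description", ""), node["time_in_nanos"])
    for shard in profile_shards:
        for search in shard["searches"]:
            for q in search["query"]:
                walk(q)
    return best

sample = [{"searches": [{"query": [{
    "type": "BooleanQuery", "description": "+message:*error* #range",
    "time_in_nanos": 2500000000,
    "children": [
        {"type": "WildcardQuery", "description": "message:*error*",
         "time_in_nanos": 2400000000},
        {"type": "IndexOrDocValuesQuery", "description": "@timestamp range",
         "time_in_nanos": 90000000},
    ]}]}]}]
print(slowest_leaf(sample))  # ('WildcardQuery', 'message:*error*', 2400000000)
```

Looking only at leaves avoids blaming a parent bool clause whose time is just the sum of its children.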
Step 4: Apply the Appropriate Fix
Fix A — Immediate relief (client timeout increase):
For Python:
from elasticsearch import Elasticsearch
es = Elasticsearch(
['http://localhost:9200'],
request_timeout=60, # seconds
retry_on_timeout=True,
max_retries=3
)
For curl / REST clients, append ?timeout=30s to the URL:
curl 'http://localhost:9200/logs-2024/_search?timeout=30s'
Note: this tells Elasticsearch to return partial results after 30 s rather than hanging. It does not make the query faster.
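Because ?timeout=30s returns partial results with an HTTP 200, callers must check the timed_out flag themselves; a minimal sketch over the standard response shape (the helper name and sample response are made up):

```python
def warn_if_partial(resp: dict) -> bool:
    """Return True (and print a warning) when a search response under
    ?timeout=... timed out and may be missing shard results."""
    if resp.get("timed_out"):
        s = resp["_shards"]
        print(f"partial results: {s['successful']}/{s['total']} shards answered in time")
        return True
    return False

resp = {"took": 30012, "timed_out": True,
        "_shards": {"total": 10, "successful": 7, "skipped": 0, "failed": 0}}
print(warn_if_partial(resp))  # prints the warning, then True
```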
Fix B — Rewrite expensive queries (permanent fix):
Replace wildcard with match or use the multi_match query with best_fields type. For log analytics, move aggregation-heavy dashboards to async search (_async_search) so they do not block the user thread.
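For concreteness, a before/after pair for the wildcard example from Step 3, sketched as Python request bodies (this assumes `message` is indexed as an analyzed `text` field, so "error" is already a token in the inverted index):

```python
# Slow: a leading wildcard must scan the field's entire term dictionary.
slow_query = {"query": {"wildcard": {"message": "*error*"}}}

# Fast: match on the analyzed text field looks "error" up directly in the
# inverted index; the range filter is cacheable and computes no scores.
fast_query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "error"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    }
}
```

If substring (not whole-token) matching is genuinely required, an edge_ngram or ngram analyzer at index time is the usual trade of disk space for query latency.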
Fix C — Thread pool tuning:
# Add to elasticsearch.yml on each node, then rolling-restart
thread_pool.search.size: 16 # default: ((vCPU*3)/2)+1
thread_pool.search.queue_size: 2000 # default: 1000
Do not set size above vCPU * 2. Setting queue_size too high causes requests to succeed eventually but greatly increases P99 latency.
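The P99 cost of a deep queue follows from Little's law: a request at the back of a full queue waits queue_depth divided by the pool's drain rate before a thread even starts it. A back-of-the-envelope sketch (the function name and numbers are illustrative):

```python
def queue_wait_s(queue_depth: int, threads: int, avg_query_s: float) -> float:
    """Worst-case wait for the last queued request: the pool completes
    threads / avg_query_s requests per second."""
    return queue_depth / (threads / avg_query_s)

# 2000 queued searches, 16 threads, 200 ms average query:
print(queue_wait_s(2000, 16, 0.2))  # 25.0 — seconds of queueing before execution
```

A request queued for 25 s has already blown through a 10–30 s client timeout, which is why fast-failing with a smaller queue is often kinder than queueing deeply.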
Fix D — Heap and GC tuning:
Set JVM heap to 50 % of available RAM, capped at 30 GB (above 32 GB, the JVM disables compressed OOPs, wasting memory). Edit jvm.options:
-Xms16g
-Xmx16g
On Elasticsearch 7.7+ and 8.x, put heap overrides in a custom file under config/jvm.options.d/ rather than editing the main jvm.options file; the ES_JAVA_OPTS environment variable remains available as an override for containerized deployments.
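The sizing rule above reduces to a one-liner; a sketch (the function name is made up, and the cap is taken as 30 GB per the rule in the text):

```python
def recommend_heap_gb(ram_gb: int) -> int:
    """50% of RAM, capped at 30 GB to stay safely under the ~32 GB
    threshold where the JVM disables compressed OOPs."""
    return min(ram_gb // 2, 30)

print(recommend_heap_gb(32))   # 16 -> -Xms16g / -Xmx16g as above
print(recommend_heap_gb(128))  # 30 -> cap applies; the rest feeds the OS page cache
```

Leaving the other half of RAM to the OS is deliberate: Lucene relies heavily on the filesystem cache for segment reads.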
Step 5: Verify the Fix
After applying changes, confirm the cluster is stable and timeout rate has dropped:
# Watch rejection rate in real time (Ctrl-C to stop)
watch -n 5 "curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected'"
# Check circuit breaker trips
curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty' | grep -E '(tripped|limit_size_in_bytes|estimated_size_in_bytes)'
# Run a benchmark query and measure wall-clock time
time curl -s 'http://localhost:9200/logs-2024/_search?size=0' \
-H 'Content-Type: application/json' \
-d '{"query":{"match_all":{}},"aggs":{"per_host":{"terms":{"field":"host.keyword","size":10}}}}' | \
python3 -c "import sys,json; r=json.load(sys.stdin); print('took:', r['took'], 'ms')"
A healthy cluster should return most aggregations under 500 ms. If P95 is still above 5 s after thread pool and heap tuning, consider adding data nodes and triggering a shard rebalance via _cluster/reroute.
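To evaluate that P95 criterion, run the benchmark repeatedly, collect the took values, and compute a nearest-rank percentile; a minimal sketch with made-up samples:

```python
def percentile(samples: list, p: float) -> int:
    """Nearest-rank percentile over a list of took-millis samples."""
    s = sorted(samples)
    k = max(0, round(p / 100 * len(s)) - 1)
    return s[k]

tooks = [120, 140, 90, 4800, 150, 130, 110, 160, 95, 6200]
print("p50:", percentile(tooks, 50), "ms")  # p50: 130 ms
print("p95:", percentile(tooks, 95), "ms")  # p95: 6200 ms — still needs work
```

A healthy median with a pathological tail, as in this sample, typically points at a hot shard or periodic GC rather than a uniformly slow query.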
Complete Diagnostic Script
#!/usr/bin/env bash
# elasticsearch-timeout-diagnostics.sh
# Run against any Elasticsearch node to collect timeout-related metrics.
# Usage: ES_HOST=http://localhost:9200 bash elasticsearch-timeout-diagnostics.sh
ES="${ES_HOST:-http://localhost:9200}"
echo "=== Cluster Health ==="
curl -s "$ES/_cluster/health?pretty"
echo ""
echo "=== Thread Pool: search + write + bulk ==="
curl -s "$ES/_cat/thread_pool/search,write,bulk?v&h=node_name,name,type,active,queue,rejected,completed,largest"
echo ""
echo "=== JVM Heap Usage per Node ==="
curl -s "$ES/_nodes/stats/jvm" | \
python3 -c '
import sys, json
data = json.load(sys.stdin)
for nid, node in data["nodes"].items():
    name = node["name"]
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    gc_old = node["jvm"]["gc"]["collectors"]["old"]["collection_time_in_millis"]
    print(f"{name:30s} heap={heap_pct}% old_gc_ms={gc_old}")
'
echo ""
echo "=== Circuit Breaker Status ==="
curl -s "$ES/_nodes/stats/breaker" | \
python3 -c '
import sys, json
data = json.load(sys.stdin)
for nid, node in data["nodes"].items():
    print("Node:", node["name"])
    for bname, bdata in node["breakers"].items():
        tripped = bdata["tripped"]
        used = bdata["estimated_size"]
        limit = bdata["limit_size"]
        print(f"  {bname}: tripped={tripped} used={used} limit={limit}")
'
echo ""
echo "=== Hot Threads (top 3, 500ms sample) ==="
curl -s "$ES/_nodes/hot_threads?threads=3&interval=500ms&type=cpu"
echo ""
echo "=== Pending Tasks ==="
curl -s "$ES/_cluster/pending_tasks?pretty"
echo ""
echo "=== Slowlog thresholds on all indices ==="
curl -s "$ES/_all/_settings?pretty&filter_path=**.slowlog"
echo ""
echo "=== Search Latency (took ms for match_all with size=0) ==="
curl -s -w "\nHTTP %{http_code} wall=%{time_total}s" \
"$ES/_search?size=0&pretty" \
-H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}}}' | tail -5

Error Medic Editorial
Error Medic Editorial is a team of senior DevOps and SRE engineers with collective experience running Elasticsearch clusters from single-node development setups to 200-node multi-region deployments. The team specializes in distributed systems observability, JVM performance tuning, and cloud-native search infrastructure.
Sources
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#search-timeout
- https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html
- https://github.com/elastic/elasticsearch/issues/21301
- https://stackoverflow.com/questions/22924717/elasticsearch-timeout-setting
- https://www.elastic.co/guide/en/elasticsearch/reference/current/circuit-breaker.html