Error Medic

Troubleshooting Elasticsearch Timeout, OOM, and Connection Errors

Comprehensive SRE guide to diagnosing and fixing Elasticsearch timeout, connection refused, out of memory (OOM), disk full, and permission denied errors.

Key Takeaways
  • Timeouts are often caused by unoptimized queries causing long Garbage Collection (GC) pauses or thread pool exhaustion.
  • Out of Memory (OOM) crashes happen when the JVM heap is undersized, fielddata explodes, or circuit breakers fail to trip in time.
  • Disk full errors trigger flood-stage watermarks, placing indices into a strict read-only mode that must be manually reversed after clearing space.
  • Connection refused errors stem from network interface misconfigurations (network.host), firewall rules, or a completely crashed JVM process.
  • Permission denied errors typically occur after upgrades or restoring snapshots due to incorrect file ownership or missing Keystore RBAC privileges.
Fix Approaches Compared
Error Type                       | Primary Diagnostic            | Immediate Mitigation                           | Long-term Fix
elasticsearch timeout            | GET _nodes/stats/thread_pool  | Cancel long-running tasks via the _tasks API   | Optimize query structure, implement routing, scale data nodes
elasticsearch out of memory      | Check JVM heap in _cat/nodes  | Clear the fielddata cache (POST /_cache/clear) | Set heap to at most 50% of RAM (max ~31GB), use doc_values
elasticsearch disk full          | GET _cat/allocation?v         | Delete old indices or clear system logs        | Adjust high/flood watermarks, configure ILM policies
elasticsearch connection refused | netstat -tulpn | grep 9200    | Restart process, check elasticsearch.yml       | Bind network.host correctly, configure TLS properly

Understanding the Error

Elasticsearch is a highly distributed, resource-intensive search and analytics engine. When it operates normally, it is lightning fast. However, when resource limits are reached, network topologies change, or unoptimized queries are executed, the cluster can rapidly destabilize. The most common symptoms of cluster distress manifest as a variety of client-side and server-side errors, most notably the elasticsearch timeout.

Timeouts are rarely an isolated network issue; they are usually the canary in the coal mine indicating severe underlying resource contention. This guide, written from the perspective of a Site Reliability Engineer (SRE), will walk you through diagnosing and permanently resolving timeouts, as well as the closely related issues of elasticsearch connection refused, elasticsearch disk full, elasticsearch out of memory, and elasticsearch permission denied.

Scenario 1: Diagnosing and Fixing elasticsearch timeout

The most common error developers see is an ElasticsearchTimeoutException or a java.net.SocketTimeoutException on the client side. This happens when the Elasticsearch node takes too long to respond to a REST API request.

Step 1: Diagnose the Timeout Root Cause

Timeouts usually occur because the node's thread pools are exhausted, or the JVM is stuck in a "Stop-the-World" Garbage Collection (GC) pause. To determine which is happening, check the thread pools:

curl -X GET "localhost:9200/_cat/thread_pool/search?v&h=id,name,active,rejected,completed"

If you see the rejected count steadily increasing, your cluster is overwhelmed by too many concurrent requests. If the thread pools look fine, check the garbage collection metrics in the node stats.
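
If the thread pools look healthy, the node stats expose the GC picture directly. A quick check (standard _nodes/stats endpoint; the log path assumes the default package layout):

```shell
# Per-node JVM stats: look under jvm.gc.collectors for old-generation
# collection counts and cumulative pause time
curl -s -X GET "localhost:9200/_nodes/stats/jvm?pretty"

# Long pauses also surface in the server log via the [gc] logger
grep -h "\[gc\]" /var/log/elasticsearch/*.log | tail -n 20
```

A node stuck in "Stop-the-World" pauses shows steadily climbing old-generation collection time alongside stalled thread-pool throughput.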

Step 2: Mitigate the Timeout

If a massive, unoptimized query is hogging resources, you can find and cancel it using the Task Management API:

# Find long-running tasks
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*"

# Cancel a specific rogue task
curl -X POST "localhost:9200/_tasks/<node_id>:12345/_cancel"

For a permanent fix, ensure your queries are optimized. Avoid heavy aggregations on large text fields, utilize filter context instead of query context for exact matches (to leverage caching), and ensure your cluster has enough data nodes to distribute the shard load.
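
As a sketch of the filter-context advice above (the index and field names are hypothetical), moving exact-match and range clauses under bool.filter lets Elasticsearch cache them and skip relevance scoring:

```shell
# Clauses in filter context are cacheable and carry no scoring cost
curl -s -X GET "localhost:9200/my-index/_search" \
  -H 'Content-Type: application/json' \
  -d '{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "active" } },
        { "range": { "timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}'
```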

Scenario 2: Handling elasticsearch out of memory

An elasticsearch out of memory (OOM) error is catastrophic. The JVM crashes, the node drops out of the cluster, and the cluster state turns yellow or red. The Elasticsearch logs (/var/log/elasticsearch/<cluster-name>.log) will definitively show java.lang.OutOfMemoryError: Java heap space.

Step 1: Check JVM Heap Pressure

Before a node crashes, you will typically see high heap usage. You can monitor this actively:

curl -X GET "localhost:9200/_cat/nodes?v=true&h=name,heap.percent,ram.percent,cpu"

If heap.percent is consistently above 85-90%, you are at severe risk of an OOM crash.

Step 2: Fix and Prevent OOM
  1. Configure the Heap Size Correctly: Set the initial and maximum heap (-Xms/-Xmx) to the same value, no more than 50% of the available physical RAM, and never more than roughly 31GB (so Java can use compressed Object Pointers, or compressed oops). Edit /etc/elasticsearch/jvm.options:
    -Xms16g
    -Xmx16g
    
  2. Clear the Cache: If the node is struggling but hasn't crashed, clear the fielddata cache to buy time:
    curl -X POST "localhost:9200/_cache/clear?fielddata=true"
    
  3. Use Doc Values: Ensure that your mappings use doc_values for aggregations and sorting, which rely on the OS filesystem cache rather than the JVM heap.
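
To illustrate point 3 (the index and field names are hypothetical): explicit keyword and date mappings keep aggregations and sorting on doc_values, which live in the OS filesystem cache rather than on the JVM heap:

```shell
# keyword and date fields enable doc_values by default; aggregating on
# them avoids heap-resident fielddata entirely
curl -s -X PUT "localhost:9200/my-index" \
  -H 'Content-Type: application/json' \
  -d '{
  "mappings": {
    "properties": {
      "status":    { "type": "keyword" },
      "timestamp": { "type": "date" }
    }
  }
}'
```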

Scenario 3: Recovering from elasticsearch disk full

Elasticsearch has built-in safety mechanisms to prevent disks from filling up completely, which would corrupt indices. It uses "watermarks". When the flood-stage watermark (default 95%) is hit, you will experience an elasticsearch disk full scenario. The exact error usually looks like:

ClusterBlockException[blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]]

Step 1: Free Up Disk Space

The immediate action is to free up disk space on the affected nodes. Check allocation and disk usage:

curl -X GET "localhost:9200/_cat/allocation?v"

Delete older, unnecessary indices, or clear out standard system logs (/var/log/messages, old rotated logs) to drop the disk usage below the high watermark (default 90%).
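
A minimal sketch, assuming dated log indices (the index name is hypothetical): list indices oldest-first to pick deletion candidates, then delete:

```shell
# List indices sorted by creation date, oldest first
curl -s "localhost:9200/_cat/indices?v&h=index,store.size,creation.date.string&s=creation.date"

# Delete a hypothetical old index to drop usage below the high watermark
curl -s -X DELETE "localhost:9200/logs-2023.01.01"
```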

Step 2: Remove the Read-Only Block

Once space is cleared, remove the read-only block. On Elasticsearch 7.4 and later the block is released automatically when disk usage drops back below the high watermark; on older versions (and whenever the block lingers) you must remove it manually via the settings API:

curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{
  "index.blocks.read_only_allow_delete": null
}'

To prevent this long-term, implement Index Lifecycle Management (ILM) policies to automatically roll over and delete older indices.
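
A hedged example of such a policy (the policy name logs-cleanup and the thresholds are illustrative; max_primary_shard_size requires Elasticsearch 7.13+):

```shell
# Roll over hot indices at 50GB or 30 days; delete them after 90 days
curl -s -X PUT "localhost:9200/_ilm/policy/logs-cleanup" \
  -H 'Content-Type: application/json' \
  -d '{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "30d" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```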

Scenario 4: Fixing elasticsearch connection refused

If you receive an elasticsearch connection refused error, it means the client cannot establish a TCP connection to the Elasticsearch port (default 9200 for HTTP, 9300 for transport).

Step 1: Verify the Process is Running

First, check if the JVM process is even running or if it crashed (perhaps due to an OOM error):

systemctl status elasticsearch

If it is running, check if it is listening on the correct interfaces:

sudo netstat -tulpn | grep 9200

Step 2: Fix Network Binding

By default, Elasticsearch binds only to localhost (127.0.0.1) for security reasons. If you are trying to connect from an external application server, the connection will be refused. Edit /etc/elasticsearch/elasticsearch.yml:

# Set to a specific IP, or 0.0.0.0 for all interfaces
network.host: 0.0.0.0
# If binding to a non-loopback address, you must configure discovery
discovery.seed_hosts: ["host1", "host2"]
cluster.initial_master_nodes: ["node-1", "node-2"]

Also, verify that firewalld, iptables, or AWS Security Groups are allowing inbound traffic on port 9200.
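
From the application server's side, a bare TCP probe helps distinguish "refused" (nothing listening, or the connection actively rejected) from "filtered" (a firewall silently dropping packets). A sketch using bash's built-in /dev/tcp (replace es-node.internal with your actual host):

```shell
# Succeeds only if something accepts the TCP handshake on port 9200
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/es-node.internal/9200' \
  && echo "port 9200 open" \
  || echo "port 9200 refused or filtered"
```

An immediate failure usually means connection refused (wrong bind address or a dead process); a full three-second hang before failing points at a firewall dropping packets.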

Scenario 5: Resolving elasticsearch permission denied

An elasticsearch permission denied error generally happens during startup, plugin installation, or snapshot/restore operations. The Elasticsearch service runs as the elasticsearch user, and it must have strict ownership over its data, log, and configuration directories.

Step 1: Fix File Ownership

If Elasticsearch fails to start and logs show AccessDeniedException, fix the ownership of the critical directories:

sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch
sudo chown -R elasticsearch:elasticsearch /var/log/elasticsearch
sudo chown -R elasticsearch:elasticsearch /etc/elasticsearch

Step 2: Keystore and Snapshot Permissions

If the permission denied error occurs when accessing the keystore (e.g., for AWS S3 repository plugins), ensure the keystore has the correct permissions:

sudo chmod 660 /etc/elasticsearch/elasticsearch.keystore

For snapshot repositories using a shared network file system (NFS), ensure the NFS mount allows the elasticsearch user (UID/GID) to write to the mounted directory across all nodes in the cluster.
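
A quick way to verify this (the mount point /mnt/es-snapshots is hypothetical; run on every node): compare the numeric UID/GID of the elasticsearch user against the mount's owner, since NFS matches permissions by number, not by name:

```shell
# UID and GID of the service account on this node
id elasticsearch

# Numeric owner of the snapshot mount; must match the UID above on all nodes
ls -ldn /mnt/es-snapshots
```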

Conclusion

Maintaining a healthy Elasticsearch cluster requires vigilant monitoring of resources. Timeouts and OOMs are symptoms of memory and CPU exhaustion; connection drops and disk full errors point to infrastructure limits; and permission errors are configuration oversights. By using the _cat APIs, carefully managing JVM heap, and establishing robust ILM policies, you can ensure your cluster remains resilient under heavy analytical and search workloads.

Quick Diagnostic Script

#!/bin/bash
# Elasticsearch SRE Diagnostic Script
# Run this on the Elasticsearch node to quickly triage timeouts and cluster health

ES_URL="http://localhost:9200"

echo "=== 1. Checking Process Status ==="
systemctl status elasticsearch --no-pager | grep Active

echo -e "\n=== 2. Cluster Health ==="
curl -s -X GET "$ES_URL/_cluster/health?pretty"

echo -e "\n=== 3. Node Resource Usage (Heap, CPU, Disk) ==="
curl -s -X GET "$ES_URL/_cat/nodes?v=true&h=name,cpu,ram.percent,heap.percent,disk.used_percent"

echo -e "\n=== 4. Thread Pool Rejections (Leading to Timeouts) ==="
curl -s -X GET "$ES_URL/_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected"

echo -e "\n=== 5. Pending Tasks ==="
curl -s -X GET "$ES_URL/_cat/pending_tasks?v"

Error Medic Editorial

Error Medic Editorial is composed of Senior Site Reliability Engineers and DevOps architects dedicated to breaking down complex database and infrastructure failures into actionable, verified solutions.
