Error Medic

Resolving Cassandra Read Timeout During Query: LOCAL_QUORUM, ONE, and SERIAL

Comprehensive guide to fixing Cassandra read timeouts (ReadTimeoutException) at LOCAL_QUORUM, ONE, and CAS SERIAL consistencies. Diagnose tombstones, GC pauses, I/O bottlenecks, and bad query patterns, then apply targeted fixes.

Key Takeaways
  • Tombstone Overload: Scanning too many tombstones is the #1 cause of Cassandra read timeouts. Use nodetool tablestats to check.
  • Garbage Collection (GC) Pauses: Stop-the-world JVM pauses exceeding read_request_timeout_in_ms will drop requests.
  • Network or Disk I/O Bottlenecks: Slow disks or saturated NICs cause replica responses to miss the coordinator's timeout window.
  • Unoptimized Queries: Large partition reads or ALLOW FILTERING without partition keys overwhelm the coordinator.
  • Quick Fix: Check system.log for 'Scanned over X tombstones', run nodetool tpstats for dropped messages, and isolate the offending table.
Troubleshooting Approaches Compared
| Method | When to Use | Time to Execute | Risk Level |
| --- | --- | --- | --- |
| Force Major Compaction | High tombstone count causing timeouts on specific tables | Hours to Days | High (Disk I/O intensive) |
| Tune GC / Heap | Long JVM pauses seen in gc.log or system.log | Minutes (Requires Restart) | Medium |
| Downgrade Consistency (e.g., QUORUM to ONE) | Emergency mitigation to restore availability | Immediate (App side) | High (Data staleness/Inconsistency) |
| Increase read_request_timeout_in_ms | Queries legitimately take longer due to payload size | Minutes (Rolling Restart) | Medium (Masks underlying issue) |

Understanding the Error

When working with Apache Cassandra, one of the most dreaded errors a developer or operator can encounter is the ReadTimeoutException. This error indicates that the coordinator node did not receive enough responses from replicas within the configured timeout window (default is typically 5000ms for reads).

Depending on your application's consistency requirements, the exact error message will vary. You might see:

  • Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)
  • Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
  • Cassandra timeout during read query at consistency LOCAL_ONE
  • Cassandra timeout during cas write query at consistency SERIAL

The last one is particularly interesting because it occurs during Lightweight Transactions (LWTs). A Compare-And-Set (CAS) write utilizes the Paxos consensus protocol, and the read phase of this protocol must achieve SERIAL or LOCAL_SERIAL consistency. If the Paxos ballot fails to reach a quorum in time, this timeout is thrown.
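As an illustration, the snippet below writes out the kind of conditional statements that exercise the Paxos read phase. The keyspace, table, and column names here are hypothetical; on a real cluster you would feed the file to cqlsh:

```shell
# Hypothetical LWT statements that can raise
# "Cassandra timeout during cas write query at consistency SERIAL"
# under contention or cross-DC latency. Written to a file so they
# could be piped into cqlsh, e.g.: cqlsh -f lwt_example.cql
cat <<'CQL' > lwt_example.cql
-- Insert only if the row does not already exist (full Paxos round)
INSERT INTO app.users (user_id, email) VALUES (42, 'a@example.com') IF NOT EXISTS;
-- Conditional update: the read phase runs at SERIAL / LOCAL_SERIAL
UPDATE app.users SET email = 'b@example.com' WHERE user_id = 42 IF email = 'a@example.com';
CQL
cat lwt_example.cql
```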

Root Causes

Why do these timeouts happen? The distributed nature of Cassandra means the bottleneck could be anywhere from the disk of a single replica to the network fabric.

  1. Tombstone Overload: When you delete data in Cassandra, it isn't removed immediately. A marker called a 'tombstone' is written. During a read, Cassandra must scan and filter out these tombstones. If a query scans thousands of tombstones to find a few live rows, the CPU overhead and memory pressure will cause a timeout.
  2. Garbage Collection (GC) Pauses: Cassandra runs on the JVM. If the heap is poorly tuned, or if massive queries are creating high object churn, the JVM will trigger a 'Stop-the-World' garbage collection. If this pause exceeds read_request_timeout_in_ms (default 5 seconds), the coordinator assumes the node is dead for that request.
  3. Hardware Bottlenecks: High CPU utilization, saturated disk I/O (i.e., high iowait), or network packet loss will delay replica responses.
  4. Bad Query Patterns: Unbounded partition reads, cross-partition queries using ALLOW FILTERING, or fetching massive payloads in a single request.

Step 1: Diagnose the Bottleneck

Before changing any configurations, you must identify why the reads are slow. Log into a Cassandra node and use the built-in diagnostic tools.

Check for Dropped Messages

Run nodetool tpstats. Look at the Dropped column for READ and MUTATION stages. If you see high numbers of dropped reads, the node is shedding load because it cannot process requests fast enough.
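To automate that check, an awk filter can flag any message type with a non-zero drop count. A sample of tpstats output is embedded below so the parsing logic runs anywhere; on a live node you would pipe `nodetool tpstats` into the same awk program:

```shell
# Flag message types with non-zero drop counts. Sample (abridged)
# nodetool tpstats output is embedded for illustration; on a real
# node, replace the file with live tpstats output.
cat <<'EOF' > tpstats_sample.txt
Message type           Dropped
READ                       112
RANGE_SLICE                  0
MUTATION                    37
COUNTER_MUTATION             0
EOF

awk 'in_dropped && $2 + 0 > 0 { print $1 " dropped " $2 " messages" }
     /^Message type/ { in_dropped = 1 }' tpstats_sample.txt > dropped_report.txt
cat dropped_report.txt
```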

Analyze Latency Histograms

Run nodetool proxyhistograms. This provides a distribution of coordinator-level latencies. If the 99th percentile for reads is close to or above your timeout threshold (e.g., 5,000,000 microseconds for the 5-second default), the cluster is generally degraded.
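Because proxyhistograms reports in microseconds, it is easy to misread against a millisecond timeout. The sketch below converts the 99th-percentile read latency and warns when it approaches the default 5000 ms window; the sample histogram values are illustrative:

```shell
# Convert the p99 read latency (column 2 of the "99%" row, in
# microseconds) to ms and compare it to the 5000 ms default timeout.
# Sample output is embedded; live nodes would pipe in proxyhistograms.
cat <<'EOF' > proxyhist_sample.txt
Percentile       Read Latency      Write Latency      Range Latency
                     (micros)           (micros)           (micros)
50%                    654.95             545.79            1955.67
99%                4055269.00           2346.80           20924.30
EOF

awk '$1 == "99%" {
       ms = $2 / 1000
       printf "p99 read latency: %.0f ms\n", ms
       if (ms > 5000 * 0.8) print "WARNING: p99 within 80% of the 5000 ms read timeout"
     }' proxyhist_sample.txt > p99_report.txt
cat p99_report.txt
```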

Identify the Offending Table

Run nodetool toppartitions or nodetool tablestats. Look for tables with a high Maximum tombstones per slice or poor read latency.

Check the system.log (usually in /var/log/cassandra/system.log) for tombstone warnings:

WARN [ReadStage-2] 2023-10-27 10:00:00,000 ReadCommand.java:400 - Read 1000 live rows and 50000 tombstone cells for query SELECT * FROM keyspace.table WHERE...
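The useful signal in that warning is the tombstone-to-live-row ratio. The sketch below extracts both numbers from a sample warning line (embedded here so it runs without a cluster) and prints the ratio; a ratio in the tens or hundreds points to a data-model problem:

```shell
# Extract live rows and tombstone cells from a ReadCommand warning
# and compute the scan ratio. The log line is a sample; on a node,
# grep the real system.log instead.
cat <<'EOF' > system_log_sample.txt
WARN  [ReadStage-2] 2023-10-27 10:00:00,000 ReadCommand.java:400 - Read 1000 live rows and 50000 tombstone cells for query SELECT * FROM keyspace.table WHERE ...
EOF

awk '/tombstone cells/ {
       for (i = 1; i <= NF; i++) {
         if ($i == "Read") live = $(i + 1)
         if ($i == "and")  dead = $(i + 1)
       }
       printf "scanned %d tombstones for %d live rows (ratio %.0f:1)\n", dead, live, dead / live
     }' system_log_sample.txt > tombstone_report.txt
cat tombstone_report.txt
```

For reference, Cassandra logs this warning once a read passes tombstone_warn_threshold (default 1000) and aborts it at tombstone_failure_threshold (default 100000).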

Step 2: Implement the Fix

Fix A: Resolving Tombstone Issues

If your logs are full of tombstone warnings, you have a data model or maintenance problem.

  • Run Compaction: If data has passed its gc_grace_seconds, running compaction will permanently remove the tombstones. Use nodetool compact <keyspace> <table>.
  • Adjust Query Limits: Ensure your application isn't doing massive slice queries over highly deleted partitions.
  • Change Compaction Strategy: If the table has heavy updates/deletes, switch from SizeTieredCompactionStrategy (STCS) to LeveledCompactionStrategy (LCS), which handles overwrites and deletes much more efficiently.
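The remediation steps above can be sketched as a dry-run script. The keyspace (app) and table (events) are placeholders, and a DRY_RUN guard keeps the script safe to execute without a cluster:

```shell
# Dry-run sketch of tombstone remediation. Set DRY_RUN=0 on a real
# node (and only after confirming gc_grace_seconds has elapsed).
DRY_RUN=1
rm -f plan.txt
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*" | tee -a plan.txt
  else
    "$@"
  fi
}

# 1. Force a major compaction to purge expired tombstones
run nodetool compact app events

# 2. Switch the table to LCS, which copes better with heavy deletes
run cqlsh -e "ALTER TABLE app.events WITH compaction = {'class': 'LeveledCompactionStrategy'};"
```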
Fix B: Mitigating GC Pauses

Check the GC logs (gc.log). If you see pauses lasting 3-10 seconds, you need JVM tuning.

  • Enable G1GC: If you are still using CMS on an older Cassandra version, switch to G1GC in jvm.options.
  • Increase Heap: Ensure MAX_HEAP_SIZE is adequately set (typically 8GB to 31GB, never more than 32GB due to compressed oops).
  • Reduce Churn: Stop querying large batches of data at once. Implement pagination using paging state.
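As a reference point, a jvm.options fragment for G1GC might look like the following. The heap sizes are illustrative only and must be sized to your hardware; the flag names match the G1 section shipped in Cassandra's jvm.options:

```
# G1GC settings (jvm.options; jvm11-server.options on newer versions)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=300
-XX:G1RSetUpdatingPauseTimePercent=5

# Fixed heap: equal min and max avoids resize pauses; stay below 32GB
# to keep compressed oops. 16G here is illustrative.
-Xms16G
-Xmx16G
```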
Fix C: Addressing CAS / SERIAL Timeouts

Cassandra timeout during cas write query at consistency SERIAL means your LWTs are failing.

  • LWTs require 4 round trips. They are extremely sensitive to latency.
  • Ensure network latency between nodes (especially across data centers) is minimal.
  • Check for high contention. If multiple threads are constantly trying to CAS the exact same partition key simultaneously, Paxos ballots will continuously fail and retry until the timeout is hit. Redesign the application logic to reduce contention on single partitions.
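When contention cannot be fully designed away, clients typically retry failed CAS operations with jittered exponential backoff. The sketch below simulates this: cas_write is a stand-in that fails twice before succeeding, mimicking lost Paxos ballots; in a real application it would be the driver call executing the conditional statement:

```shell
# Client-side retry with jittered exponential backoff for contended
# LWTs. cas_write simulates two failed ballots, then success.
attempt=0
cas_write() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # fail the first two "ballots"
}

max_retries=5
backoff_ms=50
for try in $(seq 1 "$max_retries"); do
  if cas_write; then
    echo "CAS applied on attempt $try" | tee result.txt
    break
  fi
  # jitter: sleep a random fraction of the current backoff window
  jitter_ms=$((RANDOM % backoff_ms + 1))
  sleep "$(awk -v ms="$jitter_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
  backoff_ms=$((backoff_ms * 2))   # cap this in production code
done
```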
Fix D: The Band-Aid (cassandra.yaml)

If you have legitimately heavy queries (e.g., analytical workloads) and you cannot optimize them further, you can increase the timeout limits in cassandra.yaml. Warning: This masks the symptom; it does not cure the disease.

Edit /etc/cassandra/cassandra.yaml:

read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 20000
cas_contention_timeout_in_ms: 3000

Requires a rolling restart of the cluster to take effect.
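The rolling restart can be sketched as a dry-run loop. The node list and service name are placeholders; a real run needs ssh access and a health check between nodes:

```shell
# Dry-run plan for a rolling restart after editing cassandra.yaml.
# NODES and the service name are placeholders for your environment.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
rm -f rolling_plan.txt
for node in $NODES; do
  {
    echo "$node: nodetool drain"                 # flush memtables first
    echo "$node: systemctl restart cassandra"    # service name may differ
    echo "$node: wait for UN in nodetool status before the next node"
  } >> rolling_plan.txt
done
cat rolling_plan.txt
```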

Diagnostic Script

The checks above can be combined into a single script to run on a suspect node:

```bash
#!/bin/bash
# Cassandra Diagnostic Script for Read Timeouts

# 1. Check for dropped read messages (counts are since node startup)
echo "=== Dropped Messages ==="
nodetool tpstats | grep -E 'Message type|READ|MUTATION|RangeSlice'

# 2. Check coordinator-level latencies (look at the 99% line)
echo -e "\n=== Proxy Histograms ==="
nodetool proxyhistograms

# 3. Search system.log for recent tombstone warnings
echo -e "\n=== Tombstone Warnings ==="
grep "tombstone" /var/log/cassandra/system.log | tail -n 10

# 4. Check for long GC pauses (JDK 8-style log lines; format varies by JDK)
echo -e "\n=== Long GC Pauses (> 1000ms) ==="
grep "application threads were stopped" /var/log/cassandra/gc.log \
  | awk '{ for (i = 1; i <= NF; i++) if ($i == "stopped:" && $(i+1) > 1.0) print "Pause of " $(i+1) " seconds found" }'

# 5. Identify tables with the most tombstones: pair each "Table:" line
# with its "Maximum tombstones per slice" line, then sort by the count
echo -e "\n=== Top Tables by Tombstones ==="
nodetool tablestats | grep -E 'Table:|Maximum tombstones per slice' | awk 'NR%2{printf "%s ", $0; next;}1' | sort -k7 -n -r | head -n 5
```

Error Medic Editorial

The Error Medic Editorial team consists of senior SREs and Database Administrators specializing in distributed systems, NoSQL databases, and high-availability infrastructure.
