Error Medic

Resolving Cassandra Read Timeout During Query: LOCAL_QUORUM, ONE, and SERIAL

Comprehensive guide to fixing Cassandra read timeouts (ReadTimeoutException) at LOCAL_QUORUM, ONE, and CAS SERIAL consistencies. Diagnose tombstones, GC pauses, I/O bottlenecks, and bad query patterns, then apply targeted fixes.

Key Takeaways
  • Tombstone Overload: Scanning too many tombstones is the #1 cause of Cassandra read timeouts. Use nodetool tablestats to check.
  • Garbage Collection (GC) Pauses: Stop-the-world JVM pauses exceeding read_request_timeout_in_ms will drop requests.
  • Network or Disk I/O Bottlenecks: Slow disks or saturated NICs cause replica responses to miss the coordinator's timeout window.
  • Unoptimized Queries: Large partition reads or ALLOW FILTERING without partition keys overwhelm the coordinator.
  • Quick Fix: Check system.log for 'Scanned over X tombstones', run nodetool tpstats for dropped messages, and isolate the offending table.
Troubleshooting Approaches Compared
| Method | When to Use | Time to Execute | Risk Level |
| --- | --- | --- | --- |
| Force Major Compaction | High tombstone count causing timeouts on specific tables | Hours to Days | High (Disk I/O intensive) |
| Tune GC / Heap | Long JVM pauses seen in gc.log or system.log | Minutes (Requires Restart) | Medium |
| Downgrade Consistency (e.g., QUORUM to ONE) | Emergency mitigation to restore availability | Immediate (App side) | High (Data staleness/Inconsistency) |
| Increase read_request_timeout_in_ms | Queries legitimately take longer due to payload size | Minutes (Rolling Restart) | Medium (Masks underlying issue) |

Understanding the Error

When working with Apache Cassandra, one of the most dreaded errors a developer or operator can encounter is the ReadTimeoutException. This error indicates that the coordinator node did not receive enough responses from replicas within the configured timeout window (default is typically 5000ms for reads).

Depending on your application's consistency requirements, the exact error message will vary. You might see:

  • Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)
  • Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
  • Cassandra timeout during read query at consistency LOCAL_ONE
  • Cassandra timeout during cas write query at consistency SERIAL

The last one is particularly interesting because it occurs during Lightweight Transactions (LWTs). A Compare-And-Set (CAS) write utilizes the Paxos consensus protocol, and the read phase of this protocol must achieve SERIAL or LOCAL_SERIAL consistency. If the Paxos ballot fails to reach a quorum in time, this timeout is thrown.
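As an illustration, the snippet below writes out the kind of conditional statements that exercise the Paxos read phase. The keyspace, table, and column names here are hypothetical; on a real cluster you would feed the file to cqlsh:

```shell
# Hypothetical LWT statements that can raise
# "Cassandra timeout during cas write query at consistency SERIAL"
# under contention or cross-DC latency. Written to a file so they
# could be piped into cqlsh, e.g.: cqlsh -f lwt_example.cql
cat <<'CQL' > lwt_example.cql
-- Insert only if the row does not already exist (full Paxos round)
INSERT INTO app.users (user_id, email) VALUES (42, 'a@example.com') IF NOT EXISTS;
-- Conditional update: the read phase runs at SERIAL / LOCAL_SERIAL
UPDATE app.users SET email = 'b@example.com' WHERE user_id = 42 IF email = 'a@example.com';
CQL
cat lwt_example.cql
```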

Root Causes

Why do these timeouts happen? The distributed nature of Cassandra means the bottleneck could be anywhere from the disk of a single replica to the network fabric.

  1. Tombstone Overload: When you delete data in Cassandra, it isn't removed immediately. A marker called a 'tombstone' is written. During a read, Cassandra must scan and filter out these tombstones. If a query scans thousands of tombstones to find a few live rows, the CPU overhead and memory pressure will cause a timeout.
  2. Garbage Collection (GC) Pauses: Cassandra runs on the JVM. If the heap is poorly tuned, or if massive queries are creating high object churn, the JVM will trigger a 'Stop-the-World' garbage collection. If this pause exceeds read_request_timeout_in_ms (default 5 seconds), the coordinator assumes the node is dead for that request.
  3. Hardware Bottlenecks: High CPU utilization, saturated disk I/O (i.e., high iowait), or network packet loss will delay replica responses.
  4. Bad Query Patterns: Unbounded partition reads, cross-partition queries using ALLOW FILTERING, or fetching massive payloads in a single request.

Step 1: Diagnose the Bottleneck

Before changing any configurations, you must identify why the reads are slow. Log into a Cassandra node and use the built-in diagnostic tools.

Check for Dropped Messages

Run nodetool tpstats. Look at the Dropped column for READ and MUTATION stages. If you see high numbers of dropped reads, the node is shedding load because it cannot process requests fast enough.
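To automate that check, an awk filter can flag any message type with a non-zero drop count. A sample of tpstats output is embedded below so the parsing logic runs anywhere; on a live node you would pipe `nodetool tpstats` into the same awk program:

```shell
# Flag message types with non-zero drop counts. Sample (abridged)
# nodetool tpstats output is embedded for illustration; on a real
# node, replace the file with live tpstats output.
cat <<'EOF' > tpstats_sample.txt
Message type           Dropped
READ                       112
RANGE_SLICE                  0
MUTATION                    37
COUNTER_MUTATION             0
EOF

awk 'in_dropped && $2 + 0 > 0 { print $1 " dropped " $2 " messages" }
     /^Message type/ { in_dropped = 1 }' tpstats_sample.txt > dropped_report.txt
cat dropped_report.txt
```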

Analyze Latency Histograms

Run nodetool proxyhistograms. This provides a distribution of coordinator-level latencies. If the 99th percentile for reads is close to or above your timeout threshold (e.g., 5,000,000 microseconds for the 5-second default), the cluster is generally degraded.
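Because proxyhistograms reports in microseconds, it is easy to misread against a millisecond timeout. The sketch below converts the 99th-percentile read latency and warns when it approaches the default 5000 ms window; the sample histogram values are illustrative:

```shell
# Convert the p99 read latency (column 2 of the "99%" row, in
# microseconds) to ms and compare it to the 5000 ms default timeout.
# Sample output is embedded; live nodes would pipe in proxyhistograms.
cat <<'EOF' > proxyhist_sample.txt
Percentile       Read Latency      Write Latency      Range Latency
                     (micros)           (micros)           (micros)
50%                    654.95             545.79            1955.67
99%                4055269.00           2346.80           20924.30
EOF

awk '$1 == "99%" {
       ms = $2 / 1000
       printf "p99 read latency: %.0f ms\n", ms
       if (ms > 5000 * 0.8) print "WARNING: p99 within 80% of the 5000 ms read timeout"
     }' proxyhist_sample.txt > p99_report.txt
cat p99_report.txt
```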

Identify the Offending Table

Run nodetool toppartitions or nodetool tablestats. Look for tables with a high Maximum tombstones per slice or poor read latency.

Check the system.log (usually in /var/log/cassandra/system.log) for tombstone warnings:

WARN [ReadStage-2] 2023-10-27 10:00:00,000 ReadCommand.java:400 - Read 1000 live rows and 50000 tombstone cells for query SELECT * FROM keyspace.table WHERE...
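The useful signal in that warning is the tombstone-to-live-row ratio. The sketch below extracts both numbers from a sample warning line (embedded here so it runs without a cluster) and prints the ratio; a ratio in the tens or hundreds points to a data-model problem:

```shell
# Extract live rows and tombstone cells from a ReadCommand warning
# and compute the scan ratio. The log line is a sample; on a node,
# grep the real system.log instead.
cat <<'EOF' > system_log_sample.txt
WARN  [ReadStage-2] 2023-10-27 10:00:00,000 ReadCommand.java:400 - Read 1000 live rows and 50000 tombstone cells for query SELECT * FROM keyspace.table WHERE ...
EOF

awk '/tombstone cells/ {
       for (i = 1; i <= NF; i++) {
         if ($i == "Read") live = $(i + 1)
         if ($i == "and")  dead = $(i + 1)
       }
       printf "scanned %d tombstones for %d live rows (ratio %.0f:1)\n", dead, live, dead / live
     }' system_log_sample.txt > tombstone_report.txt
cat tombstone_report.txt
```

For reference, Cassandra logs this warning once a read passes tombstone_warn_threshold (default 1000) and aborts it at tombstone_failure_threshold (default 100000).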

Step 2: Implement the Fix

Fix A: Resolving Tombstone Issues

If your logs are full of tombstone warnings, you have a data model or maintenance problem.

  • Run Compaction: If data has passed its gc_grace_seconds, running compaction will permanently remove the tombstones. Use nodetool compact <keyspace> <table>.
  • Adjust Query Limits: Ensure your application isn't doing massive slice queries over highly deleted partitions.
  • Change Compaction Strategy: If the table has heavy updates/deletes, switch from SizeTieredCompactionStrategy (STCS) to LeveledCompactionStrategy (LCS), which handles overwrites and deletes much more efficiently.
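The remediation steps above can be sketched as a dry-run script. The keyspace (app) and table (events) are placeholders, and a DRY_RUN guard keeps the script safe to execute without a cluster:

```shell
# Dry-run sketch of tombstone remediation. Set DRY_RUN=0 on a real
# node (and only after confirming gc_grace_seconds has elapsed).
DRY_RUN=1
rm -f plan.txt
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*" | tee -a plan.txt
  else
    "$@"
  fi
}

# 1. Force a major compaction to purge expired tombstones
run nodetool compact app events

# 2. Switch the table to LCS, which copes better with heavy deletes
run cqlsh -e "ALTER TABLE app.events WITH compaction = {'class': 'LeveledCompactionStrategy'};"
```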
Fix B: Mitigating GC Pauses

Check the GC logs (gc.log). If you see pauses lasting 3-10 seconds, you need JVM tuning.

  • Enable G1GC: If you are still using CMS on an older Cassandra version, switch to G1GC in jvm.options.
  • Increase Heap: Ensure MAX_HEAP_SIZE is adequately set (typically 8GB to 31GB, never more than 32GB due to compressed oops).
  • Reduce Churn: Stop querying large batches of data at once. Implement pagination using paging state.
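As a reference point, a jvm.options fragment for G1GC might look like the following. The heap sizes are illustrative only and must be sized to your hardware; the flag names match the G1 section shipped in Cassandra's jvm.options:

```
# G1GC settings (jvm.options; jvm11-server.options on newer versions)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=300
-XX:G1RSetUpdatingPauseTimePercent=5

# Fixed heap: equal min and max avoids resize pauses; stay below 32GB
# to keep compressed oops. 16G here is illustrative.
-Xms16G
-Xmx16G
```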
Fix C: Addressing CAS / SERIAL Timeouts

Cassandra timeout during cas write query at consistency SERIAL means your LWTs are failing.

  • LWTs require 4 round trips. They are extremely sensitive to latency.
  • Ensure network latency between nodes (especially across data centers) is minimal.
  • Check for high contention. If multiple threads are constantly trying to CAS the exact same partition key simultaneously, Paxos ballots will continuously fail and retry until the timeout is hit. Redesign the application logic to reduce contention on single partitions.
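When contention cannot be fully designed away, clients typically retry failed CAS operations with jittered exponential backoff. The sketch below simulates this: cas_write is a stand-in that fails twice before succeeding, mimicking lost Paxos ballots; in a real application it would be the driver call executing the conditional statement:

```shell
# Client-side retry with jittered exponential backoff for contended
# LWTs. cas_write simulates two failed ballots, then success.
attempt=0
cas_write() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # fail the first two "ballots"
}

max_retries=5
backoff_ms=50
for try in $(seq 1 "$max_retries"); do
  if cas_write; then
    echo "CAS applied on attempt $try" | tee result.txt
    break
  fi
  # jitter: sleep a random fraction of the current backoff window
  jitter_ms=$((RANDOM % backoff_ms + 1))
  sleep "$(awk -v ms="$jitter_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
  backoff_ms=$((backoff_ms * 2))   # cap this in production code
done
```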
Fix D: The Band-Aid (cassandra.yaml)

If you have legitimately heavy queries (e.g., analytical workloads) and you cannot optimize them further, you can increase the timeout limits in cassandra.yaml. Warning: This masks the symptom; it does not cure the disease.

Edit /etc/cassandra/cassandra.yaml:

read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 20000
cas_contention_timeout_in_ms: 3000

Requires a rolling restart of the cluster to take effect.
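The rolling restart can be sketched as a dry-run loop. The node list and service name are placeholders; a real run needs ssh access and a health check between nodes:

```shell
# Dry-run plan for a rolling restart after editing cassandra.yaml.
# NODES and the service name are placeholders for your environment.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
rm -f rolling_plan.txt
for node in $NODES; do
  {
    echo "$node: nodetool drain"                 # flush memtables first
    echo "$node: systemctl restart cassandra"    # service name may differ
    echo "$node: wait for UN in nodetool status before the next node"
  } >> rolling_plan.txt
done
cat rolling_plan.txt
```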

Diagnostic Script

The checks above can be combined into a single script to run on a suspect node:

```bash
#!/bin/bash
# Cassandra Diagnostic Script for Read Timeouts

# 1. Check for dropped read messages (counts are since node startup)
echo "=== Dropped Messages ==="
nodetool tpstats | grep -E 'Message type|READ|MUTATION|RangeSlice'

# 2. Check coordinator-level latencies (look at the 99% line)
echo -e "\n=== Proxy Histograms ==="
nodetool proxyhistograms

# 3. Search system.log for recent tombstone warnings
echo -e "\n=== Tombstone Warnings ==="
grep "tombstone" /var/log/cassandra/system.log | tail -n 10

# 4. Check for long GC pauses (JDK 8-style log lines; format varies by JDK)
echo -e "\n=== Long GC Pauses (> 1000ms) ==="
grep "application threads were stopped" /var/log/cassandra/gc.log \
  | awk '{ for (i = 1; i <= NF; i++) if ($i == "stopped:" && $(i+1) > 1.0) print "Pause of " $(i+1) " seconds found" }'

# 5. Identify tables with the most tombstones: pair each "Table:" line
# with its "Maximum tombstones per slice" line, then sort by the count
echo -e "\n=== Top Tables by Tombstones ==="
nodetool tablestats | grep -E 'Table:|Maximum tombstones per slice' | awk 'NR%2{printf "%s ", $0; next;}1' | sort -k7 -n -r | head -n 5
```

Error Medic Editorial

The Error Medic Editorial team consists of senior SREs and Database Administrators specializing in distributed systems, NoSQL databases, and high-availability infrastructure.
