Error Medic

Cassandra 'Connection Refused' on Port 9042: Complete Troubleshooting Guide

Fix Cassandra connection refused errors on port 9042. Diagnose OOM kills, misconfigured listen_address, firewall blocks, slow queries, and data corruption with

Key Takeaways
  • Root cause #1: Memory exhaustion triggers a silent OS OOM kill, and Cassandra writes no log entry. Check `dmesg | grep -i oom` and keep -Xmx in jvm.options at min(RAM/4, 8GB) so that heap plus off-heap memory fits in physical RAM.
  • Root cause #2: Address or native transport misconfiguration. listen_address set to 0.0.0.0 is invalid and aborts startup, while start_native_transport: false keeps port 9042 closed even when the process is running.
  • Root cause #3: Firewall rules blocking port 9042, or a full disk causing Cassandra to halt new connections to prevent data loss.
  • Root cause #4: Commitlog or SSTable corruption prevents node startup — identified by FSReadError or CorruptSSTableException in /var/log/cassandra/system.log.
  • Quick fix sequence: run `systemctl status cassandra`, `ss -tlnp | grep 9042`, and `dmesg | grep -i oom`, then inspect system.log for ERROR or FATAL lines before attempting any restart.
Fix Approaches Compared
Method | When to Use | Time | Risk
Restart Cassandra service | Service stopped, no corruption errors in log | 2-5 min | Low
Increase JVM heap (-Xmx) | OOM kills in dmesg or GC pauses > 5s in system.log | 10 min + restart | Low
Fix listen_address / rpc_address | Process running but port 9042 not listening | 5 min + restart | Low
Open firewall for port 9042 | nc -zv from client fails, no ACCEPT rule in iptables | 2 min | Low
Remove corrupted commitlog | FSReadError on startup, node refuses to start | 15 min | Medium (recent writes may be lost)
nodetool scrub --skip-corrupted | CorruptSSTableException during reads, node online | 30-120 min | Medium (corrupt rows dropped)
nodetool repair -pr | Data inconsistency after node recovery | Hours | High (heavy I/O, use a low-traffic window)
Restore from snapshot | Severe unrecoverable data corruption | Hours | High (requires recent valid backup)

Understanding Cassandra "Connection Refused" Errors

When a client receives Connection refused on port 9042, the TCP handshake itself is failing: Cassandra is not running, is not bound to the expected address, or a firewall is actively rejecting the connection. A firewall that silently drops the SYN packet produces a connect timeout instead of a refusal. Both differ from authentication errors (the TCP connection completes before the server rejects the client) and read timeouts (the connection succeeds but the query is slow).

Common error messages:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed
  (tried: /10.0.0.1:9042 (TransportException: [/10.0.0.1:9042] Cannot connect))

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
  {'10.0.0.1': ConnectionRefusedError(111, 'Connection refused')})
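
To separate a refused connection from a timeout on the client side, a small probe can help. This is a minimal sketch, assuming bash with /dev/tcp support and the coreutils `timeout` command; the function names `classify_probe_status` and `probe_port` are hypothetical, and the catch-all label covers both refusals and unreachable hosts.

```bash
#!/usr/bin/env bash
# Map the exit status of a timed connection attempt to a label.
# 0 = connected, 124 = `timeout` gave up, anything else = refused or unreachable.
classify_probe_status() {
  case "$1" in
    0)   echo "open" ;;
    124) echo "timeout" ;;
    *)   echo "refused_or_unreachable" ;;
  esac
}

# Attempt a TCP connect through bash's /dev/tcp, giving up after 3 seconds.
probe_port() {
  local host="$1" port="$2" rc=0
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null || rc=$?
  classify_probe_status "$rc"
}

# Usage: probe_port 10.0.0.1 9042
```

A "timeout" result usually points at a firewall dropping packets, while "refused_or_unreachable" points at a stopped service, a wrong bind address, or a rejecting firewall rule.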

Step 1: Verify Cassandra Service State

# Check service state
systemctl status cassandra --no-pager -l

# Confirm JVM process exists
ps aux | grep CassandraDaemon | grep -v grep

# Verify port 9042 is bound
ss -tlnp | grep 9042

If ss -tlnp | grep 9042 returns nothing, Cassandra is not listening. Read the last startup attempt before restarting:

tail -100 /var/log/cassandra/system.log | grep -iE "error|exception|fatal|killed"
journalctl -u cassandra -n 50 --no-pager
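
A plain `grep 9042` can also match unrelated text, such as a PID or port 19042. The sketch below checks the local address column of `ss -tln` output exactly; the function name `listening_on_port` is hypothetical, and it assumes the usual column layout where the fourth field is Local Address:Port.

```bash
#!/usr/bin/env bash
# Read `ss -tln` output on stdin and succeed only if some socket is in
# LISTEN state with a local address ending in ":<port>".
listening_on_port() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage: ss -tln | listening_on_port 9042 && echo "9042 is bound"
```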

Step 2: Diagnose Out-of-Memory (OOM) Kills

OOM kills are the most frequent silent cause of Cassandra failures. The Linux kernel terminates the JVM without writing to Cassandra's logs.

# Kernel OOM evidence
dmesg | grep -iE "oom|killed process" | grep -i java

# GC pressure warnings
grep -E "GCInspector.*[0-9]{4,}ms" /var/log/cassandra/system.log | tail -20

A long GC pause warning looks like this:

WARN  [GCInspector] GCInspector.java:286 - G1 Young Generation GC in 11432ms.
G1 Eden Space: 8388608 -> 0; G1 Old Gen: 7516192768 -> 7814037504;
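
To pull only the pause durations out of GCInspector lines, a filter along these lines may be useful. It is a sketch that assumes the "GC in <n>ms" wording shown in the sample above; the function name `gc_pauses_over` is hypothetical.

```bash
#!/usr/bin/env bash
# Read log lines on stdin and print every GC pause (in ms) that is at
# least the given threshold.
gc_pauses_over() {
  local threshold_ms="$1"
  grep -oE 'GC in [0-9]+ms' |
    awk -v t="$threshold_ms" '{ ms = $3; sub(/ms$/, "", ms); if (ms + 0 >= t) print ms }'
}

# Usage: gc_pauses_over 5000 < /var/log/cassandra/system.log
```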

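Pauses this long leave the node unresponsive: peers mark it down and client requests time out, and a heap that stays exhausted ends in an OutOfMemoryError that stops the JVM. Set the heap size in /etc/cassandra/jvm.options (Cassandra 3.x) or /etc/cassandra/jvm-server.options (Cassandra 4.0+):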

# Rule: min(total_RAM / 4, 8GB). Never exceed 31GB.
# For a 32GB RAM server:
-Xms8G
-Xmx8G
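
The sizing rule in the comment above can be computed rather than estimated. This is a minimal sketch of min(total_RAM / 4, 8GB) in megabytes; the function name `recommended_heap_mb` is hypothetical.

```bash
#!/usr/bin/env bash
# Print the suggested heap size in MB for a host with the given total RAM in MB:
# one quarter of RAM, capped at 8192 MB.
recommended_heap_mb() {
  local total_mb="$1"
  local quarter=$(( total_mb / 4 ))
  local cap=8192
  if [ "$quarter" -lt "$cap" ]; then
    echo "$quarter"
  else
    echo "$cap"
  fi
}

# Usage on Linux:
#   recommended_heap_mb "$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)"
```

For the 32GB server in the example, one quarter of RAM is exactly 8192 MB, which matches the -Xms8G / -Xmx8G values shown.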

Set -Xms equal to -Xmx to prevent resize pauses. Verify OS file descriptor limits:

ulimit -n  # Limit for the current shell; production Cassandra needs at least 100000
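
`ulimit -n` reports the limit of the shell it runs in, which can differ from the limit systemd or limits.conf applies to the Cassandra process. The sketch below reads the soft limit from a /proc/<pid>/limits file instead; the function name `soft_nofile_limit` is hypothetical.

```bash
#!/usr/bin/env bash
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# In that file the soft limit is the fourth whitespace-separated field.
soft_nofile_limit() {
  awk '/^Max open files/ { print $4 }' "$1"
}

# Usage:
#   pid=$(pgrep -f CassandraDaemon | head -n 1)
#   soft_nofile_limit "/proc/${pid}/limits"
```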

Step 3: Check Native Transport Configuration

If the process is running but port 9042 is not listening:

grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
  /etc/cassandra/cassandra.yaml

Correct production settings:

listen_address: 10.0.0.1         # Specific IP — never 0.0.0.0
rpc_address: 0.0.0.0             # Accept client connections on all interfaces
broadcast_rpc_address: 10.0.0.1  # IP advertised to clients
native_transport_port: 9042
start_native_transport: true

Critical mistake: listen_address: 0.0.0.0 is invalid and causes startup failure. Always use a specific IP for listen_address and pair rpc_address: 0.0.0.0 with broadcast_rpc_address.
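
The rules above can be checked mechanically before a restart. This is a rough sketch that only inspects uncommented top-level keys with grep, not a real YAML parser; the function name `check_transport_config` is hypothetical.

```bash
#!/usr/bin/env bash
# Print one line per problem found in the given cassandra.yaml and
# return 1 if any problem was found, 0 otherwise.
check_transport_config() {
  local yaml="$1" problems=0
  if grep -Eq '^listen_address:[[:space:]]*0\.0\.0\.0' "$yaml"; then
    echo "listen_address must not be 0.0.0.0"
    problems=1
  fi
  if grep -Eq '^start_native_transport:[[:space:]]*false' "$yaml"; then
    echo "start_native_transport is false"
    problems=1
  fi
  # rpc_address: 0.0.0.0 needs a concrete broadcast_rpc_address alongside it.
  if grep -Eq '^rpc_address:[[:space:]]*0\.0\.0\.0' "$yaml" &&
     ! grep -Eq '^broadcast_rpc_address:[[:space:]]*[^[:space:]#]' "$yaml"; then
    echo "rpc_address is 0.0.0.0 but broadcast_rpc_address is not set"
    problems=1
  fi
  return "$problems"
}

# Usage: check_transport_config /etc/cassandra/cassandra.yaml
```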

Step 4: Firewall and Network Verification

# Test TCP reachability from the application server
nc -zv <cassandra_node_ip> 9042

# iptables (RHEL, Amazon Linux, older Ubuntu)
iptables -L INPUT -n -v --line-numbers | grep -E "9042|DROP|REJECT"

# nftables (Debian 12+, Ubuntu 22.04+)
nft list ruleset | grep -B2 -A2 9042

Required open ports:

Port Purpose
7000 Inter-node gossip (plain)
7001 Inter-node gossip (TLS)
7199 JMX monitoring
9042 Native CQL (clients)

Step 5: Slow Queries and Timeout Troubleshooting

If connections succeed but you see OperationTimedOutException or WriteTimeoutException, the problem is throughput, not connectivity.

# Check thread pool saturation — non-zero Dropped is critical
nodetool tpstats

# Examine latency percentiles
nodetool proxyhistograms

# Check compaction backlog
nodetool compactionstats

# Identify hot tables
nodetool tablestats <keyspace>.<table> | grep -E "Maximum|Mean|Local read|Local write"

Non-zero Dropped counts for message types such as READ or MUTATION in the tpstats output mean Cassandra is shedding work. Enable slow query logging in cassandra.yaml:

slow_query_log_timeout_in_ms: 500

Then review the slow-operation reports, which Cassandra writes at debug level to debug.log:

grep -i "operations were slow" /var/log/cassandra/debug.log | tail -20
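
Dropped messages in `nodetool tpstats` output can be flagged automatically. The sketch below assumes the output contains a section headed "Message type" whose second column is the dropped count, which may vary between Cassandra versions; the function name `dropped_messages` is hypothetical.

```bash
#!/usr/bin/env bash
# Read `nodetool tpstats` output on stdin and print "<type> <count>" for
# every message type with a non-zero dropped count.
dropped_messages() {
  awk '
    /^Message type/ { in_dropped = 1; next }
    in_dropped && NF >= 2 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }
  '
}

# Usage: nodetool tpstats | dropped_messages
```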

Step 6: Data Corruption Diagnosis and Recovery

Commitlog corruption prevents startup with:

ERROR [main] CassandraDaemon.java:689 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.io.IOException:
  Corrupt commitlog /var/lib/cassandra/commitlog/CommitLog-7-1639000000000.log

Commitlog recovery (moving the whole directory discards every write on this node that was not yet flushed to an SSTable, so run a repair afterwards):

systemctl stop cassandra
mv /var/lib/cassandra/commitlog /var/lib/cassandra/commitlog.bak.$(date +%s)
mkdir -p /var/lib/cassandra/commitlog
chown cassandra:cassandra /var/lib/cassandra/commitlog
systemctl start cassandra
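
A more conservative variant is to move aside only the segment named in the error instead of the whole directory. The sketch below extracts those paths from log text, assuming the "Corrupt commitlog <path>" wording shown in the sample error above; the function name `corrupt_commitlog_segments` is hypothetical.

```bash
#!/usr/bin/env bash
# Read log lines on stdin and print each distinct commitlog segment path
# that appears after the words "Corrupt commitlog".
corrupt_commitlog_segments() {
  grep -oE 'Corrupt commitlog [^[:space:]]+' | awk '{ print $3 }' | sort -u
}

# Usage: corrupt_commitlog_segments < /var/log/cassandra/system.log
```

Whether the remaining segments replay cleanly depends on the failure, so keep the moved files until the node has started and been repaired.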

For SSTable corruption discovered at runtime:

# Online scrub — drops corrupt rows, preserves valid data
nodetool scrub --skip-corrupted <keyspace> <table>

# Offline scrub for severe corruption
systemctl stop cassandra
sstablescrub --skip-corrupted <keyspace> <table>
systemctl start cassandra

# Restore replica consistency
nodetool repair -pr <keyspace>

Step 7: Validate Cluster Health

# All nodes should show UN (Up/Normal)
nodetool status

# Verify gossip propagation
nodetool gossipinfo | grep -E "^/|STATUS|RPC_READY" | head -20

Healthy nodetool status output:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID             Rack
UN  10.0.0.1  45.6 GiB  256     33.3%  abc123-...-def456   rack1
UN  10.0.0.2  46.1 GiB  256     33.3%  ghi789-...-jkl012   rack1
UN  10.0.0.3  44.9 GiB  256     33.4%  mno345-...-pqr678   rack1

Any node showing DN (Down/Normal) is unreachable. Investigate with journalctl -u cassandra -n 200 --no-pager on that host immediately.
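
On larger clusters it helps to list only the nodes that are not UN. This is a small sketch that filters `nodetool status` output by its two-letter status/state code; the function name `nodes_not_up_normal` is hypothetical.

```bash
#!/usr/bin/env bash
# Read `nodetool status` output on stdin and print "<address> <code>" for
# every node whose status/state code is anything other than UN.
nodes_not_up_normal() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1 }'
}

# Usage: nodetool status | nodes_not_up_normal
```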


#!/usr/bin/env bash
# Cassandra Connection Refused -- Rapid Diagnostic Script
# Run as root or the cassandra OS user on the affected node
# Usage: bash cassandra-diag.sh 2>&1 | tee /tmp/cass-diag-$(date +%Y%m%d-%H%M%S).log

set -uo pipefail

CASS_LOG="/var/log/cassandra/system.log"
CASS_YAML="/etc/cassandra/cassandra.yaml"

echo "=== [1] Cassandra Service Status ==="
systemctl status cassandra --no-pager -l 2>&1 | head -20 || true

echo ""
echo "=== [2] Port 9042 Listening? ==="
ss -tlnp | grep 9042 || echo "ALERT: Nothing is listening on port 9042"

echo ""
echo "=== [3] Recent Errors in system.log ==="
if [ -f "$CASS_LOG" ]; then
  grep -iE "error|exception|fatal|oom|killed|corrupt" "$CASS_LOG" | tail -30
else
  echo "system.log not found at $CASS_LOG"
fi

echo ""
echo "=== [4] OOM Killer Activity ==="
dmesg | grep -iE "oom|killed process" | grep -i java | tail -10 \
  || echo "No OOM kills found in dmesg"

echo ""
echo "=== [5] GC Pause Warnings (>= 1000ms) ==="
grep -E "GCInspector.*[0-9]{4,}ms" "$CASS_LOG" 2>/dev/null | tail -10 \
  || echo "No long GC pauses found"

echo ""
echo "=== [6] Network Address Configuration ==="
if [ -f "$CASS_YAML" ]; then
  grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
    "$CASS_YAML"
else
  echo "cassandra.yaml not found at $CASS_YAML"
fi

echo ""
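echo "=== [7] JVM Heap Settings ==="
# Track whether any options file exists; the loop's own exit status cannot signal that.
found_jvm_opts=0
for f in /etc/cassandra/jvm.options \
         /etc/cassandra/jvm-server.options \
         /etc/cassandra/jvm11-server.options \
         /etc/cassandra/jvm17-server.options; do
  if [ -f "$f" ]; then
    found_jvm_opts=1
    echo "File: $f"
    grep -E "^-Xm[sx]" "$f" || echo "  (no explicit -Xms/-Xmx set in this file)"
  fi
done
[ "$found_jvm_opts" -eq 1 ] || echo "No JVM options file found at expected paths"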

echo ""
echo "=== [8] Firewall Rules for Port 9042 ==="
iptables -L INPUT -n --line-numbers 2>/dev/null | grep -E "9042|DROP|REJECT" | head -10 \
  || nft list ruleset 2>/dev/null | grep -B2 -A2 9042 \
  || echo "No matching rules found (or insufficient privileges to read the firewall)"

echo ""
echo "=== [9] Disk Usage ==="
df -h /var/lib/cassandra /var/log/cassandra 2>/dev/null || df -h /

echo ""
echo "=== [10] Cluster Status ==="
nodetool status 2>/dev/null || echo "nodetool unavailable -- Cassandra may not be running"

echo ""
echo "=== [11] Thread Pool Dropped Messages ==="
nodetool tpstats 2>/dev/null | grep -E "Stage|Dropped" | head -30 \
  || echo "nodetool unavailable"

echo ""
echo "=== [12] Compaction Backlog ==="
nodetool compactionstats 2>/dev/null | head -20 || echo "nodetool unavailable"

echo ""
echo "=== Diagnostic Complete ==="
echo "Reference: https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html"

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps engineers and SREs with hands-on experience operating Apache Cassandra clusters at scale in production. Our troubleshooting guides are derived from real incident postmortems and reviewed for technical accuracy against current Cassandra documentation.
