PostgreSQL 'Connection Refused': Fix Service Failures, OOM, High CPU, Segfaults, and Crashes
Fix PostgreSQL connection refused errors fast. Step-by-step commands for service not starting, disk full, OOM kills, too many connections, segfaults, and high CPU.
- 'Connection refused' almost always means PostgreSQL is not running or is listening on the wrong interface—check with systemctl status postgresql and tail the server log in /var/log/postgresql/
- Disk full (ENOSPC in WAL directory), OOM kills by the Linux kernel, and wrong file permissions on PGDATA or the Unix socket directory are the three most common startup failure causes
- FATAL: too many connections is solved immediately by terminating idle sessions and permanently by deploying PgBouncer in transaction pooling mode—raising max_connections alone is a temporary stopgap
- PostgreSQL recovers cleanly from segfaults and core dumps: the server replays WAL on restart and returns to a consistent state, but you must identify the root cause (bad RAM, a known bug, or a corrupt extension) to prevent recurrence
- High CPU from autovacuum storms or missing indexes is diagnosed with pg_stat_activity and EXPLAIN (ANALYZE, BUFFERS); kill runaway queries and tune autovacuum cost parameters for lasting relief
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Restart PostgreSQL service | Service crashed, inactive, or stuck | < 1 min | Low |
| Free disk space / expand volume | ENOSPC in logs, WAL directory >80% full | 5–30 min | Low |
| Adjust max_connections and restart | FATAL: too many connections at startup | 2–5 min | Medium — requires restart |
| Deploy PgBouncer (transaction mode) | Persistent connection exhaustion in production | 30–60 min | Low — zero downtime |
| Tune shared_buffers / work_mem + restart | OOM kills, postmaster memory pressure | 10 min + restart | Medium |
| Terminate idle/long-running queries | High CPU, lock waits, autovacuum blocked | < 1 min | Low |
| Fix PGDATA and socket permissions | Permission denied on startup or connect | 2–5 min | Low |
| Upgrade PostgreSQL minor version | Segfault from a known upstream bug | 1–4 hours | High — test on replica first |
Understanding 'PostgreSQL Connection Refused'
When you run psql and see:
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed:
Connection refused
or the TCP variant:
psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused
it means either PostgreSQL never started, it crashed after starting, or it is listening on a different address or port than you expect. Each scenario has a distinct signature in the logs.
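Before reading logs, a two-line probe separates "nothing listening" from "listening but rejecting". This is a sketch assuming `ss` and the stock `pg_isready` client are on the PATH and the server uses the default host and port:

```shell
# Is anything bound to 5432 at all?
if ss -tlnp 2>/dev/null | grep -q ':5432'; then
  echo "something is listening on 5432"
else
  echo "nothing is listening on 5432 -- server down or bound to another port"
fi
# pg_isready distinguishes 'refused' from 'up but rejecting' (exit 0 = accepting)
if command -v pg_isready >/dev/null; then
  pg_isready -h 127.0.0.1 -p 5432 -t 3 || true
else
  echo "pg_isready not installed"
fi
```

If `pg_isready` reports "accepting connections" while your application still fails, the problem is authentication or pg_hba.conf rather than the service itself.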
Step 1: Confirm Whether PostgreSQL Is Running
systemctl status postgresql # Debian/Ubuntu
systemctl status postgresql@14-main # Debian multi-version
systemctl status postgresql-14 # RHEL/CentOS
If the unit is failed or activating (auto-restart), pull the crash reason:
journalctl -u postgresql --since "30 minutes ago" --no-pager
Also check the PostgreSQL server log:
tail -n 100 /var/log/postgresql/postgresql-14-main.log # Debian/Ubuntu default name
# RHEL path (default log_filename rotates by weekday):
tail -n 100 /var/lib/pgsql/data/log/postgresql-$(date +%a).log
Startup failure signatures and their causes:
| Log line | Root cause |
|---|---|
| No space left on device | Disk full — WAL or data directory |
| Out of memory | Linux OOM killer terminated the postmaster |
| Address already in use | Port 5432 held by another process |
| data directory ... has wrong ownership | PGDATA not owned by the postgres user |
| permission denied on socket path | /var/run/postgresql permissions wrong |
| database system identifier differs | PGDATA points at the wrong cluster |
| server process (PID N) was terminated by signal 11 | Segfault / core dump |
Step 2: Fix Disk Full (ENOSPC)
PostgreSQL will refuse all connections and crash the moment it cannot write a WAL segment. Check disk usage first:
df -h /var/lib/postgresql
du -sh /var/lib/postgresql/*/main/pg_wal/
du -sh /var/log/postgresql/
Immediate remediation:
# Delete archived WAL backup labels older than 7 days
find /var/lib/postgresql/*/main/pg_wal/ -name '*.backup' -mtime +7 -delete
# Delete server logs older than 14 days (rotate and compress instead if you need to keep them)
find /var/log/postgresql/ -name '*.log' -mtime +14 -delete
# Clean stray scratch files in /tmp (PostgreSQL's own temp sort files live under PGDATA/base/pgsql_tmp)
du -sh /tmp && find /tmp -name 'pgsql_tmp*' -delete
After freeing space, start PostgreSQL and run VACUUM FULL on the most bloated tables to reclaim physical space. Note that VACUUM FULL takes an ACCESS EXCLUSIVE lock and rewrites the entire table, so schedule it in a maintenance window.
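To aim VACUUM FULL at the worst offenders first, list the largest relations by on-disk size. A sketch assuming local peer authentication as the postgres OS user:

```shell
# Ten largest ordinary tables, including indexes and TOAST, by total on-disk size
psql -U postgres -c \
  "SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
     FROM pg_class
    WHERE relkind = 'r'
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 10;" \
  || echo "cannot connect -- start the server first"
```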
Step 3: Diagnose and Fix Out-of-Memory (OOM) Kills
The Linux kernel OOM killer terminates the PostgreSQL postmaster when system memory is exhausted. You will see in dmesg:
kernel: Out of memory: Killed process 18342 (postgres) total-vm:8388608kB, anon-rss:7340032kB
Check for OOM events:
dmesg | grep -i 'out of memory\|killed process\|oom_kill'
grep -i 'oom\|killed' /var/log/syslog | tail -30
Memory configuration in postgresql.conf:
shared_buffers = 2GB # 25% of total RAM
work_mem = 64MB # per sort/hash; multiply by max_connections for worst case
maintenance_work_mem = 512MB # VACUUM, CREATE INDEX, ALTER TABLE
effective_cache_size = 6GB # planner hint only — 50–75% of total RAM
max_connections = 100 # lower this to reduce peak memory footprint
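The worst-case arithmetic behind that advice, using the example values above (illustrative numbers, not recommendations):

```shell
# Floor estimate of peak memory: shared_buffers plus one work_mem per connection.
# Real usage can be higher -- a single complex query may allocate work_mem
# several times, once per sort or hash node in its plan.
shared_buffers_mb=2048   # shared_buffers = 2GB
work_mem_mb=64           # work_mem = 64MB
max_connections=100
peak_mb=$(( shared_buffers_mb + work_mem_mb * max_connections ))
echo "worst-case floor: ${peak_mb} MB"   # 8448 MB -- already past 8GB of RAM
```

On an 8 GB host these settings can exceed physical memory under full load, which is exactly why lowering max_connections (or pooling) reduces OOM risk.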
Protect the postmaster from the OOM killer permanently via systemd:
# /etc/systemd/system/postgresql.service.d/oom.conf
[Service]
OOMScoreAdjust=-1000
systemctl daemon-reload && systemctl restart postgresql
Step 4: Resolve 'Too Many Connections'
The full error is:
FATAL: remaining connection slots are reserved for non-replication superuser connections
or simply FATAL: too many connections. A few slots (superuser_reserved_connections, default 3) are always held back for superusers, so you can still connect as postgres:
psql -U postgres -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state ORDER BY 1 DESC;"
Terminate stale idle connections without a restart:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < now() - interval '5 minutes'
AND pid <> pg_backend_pid();
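To keep idle sessions from piling up again, server-side timeouts help. These are real GUCs; the values below are illustrative, and a global statement_timeout can kill legitimate long-running reports, so consider setting it per role instead:

```shell
# Auto-kill sessions stuck idle inside a transaction, and cap runaway statements
psql -U postgres <<'SQL' || echo "cannot connect -- set these in postgresql.conf instead"
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
ALTER SYSTEM SET statement_timeout = '60s';
SELECT pg_reload_conf();
SQL
```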
Raise the hard limit in postgresql.conf and restart:
max_connections = 500
For a production fix without repeated restarts, install PgBouncer:
apt-get install pgbouncer # Debian/Ubuntu
yum install pgbouncer # RHEL
Minimal /etc/pgbouncer/pgbouncer.ini:
[databases]
mydb = host=127.0.0.1 dbname=mydb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 20
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
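A quick smoke test of the pool, assuming the ini above and that your user is listed in PgBouncer's admin_users. SHOW POOLS is a PgBouncer admin-console command, run against the virtual pgbouncer database:

```shell
systemctl restart pgbouncer || true
# Point the application at port 6432, then verify pooling via the admin console
psql -h 127.0.0.1 -p 6432 -U postgres pgbouncer -c 'SHOW POOLS;' \
  || echo "cannot reach PgBouncer on 6432 -- check its log"
```

With pool_mode = transaction, 2000 client connections share only 20 server backends per database, which is why this fixes connection exhaustion without touching max_connections.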
Step 5: Fix Permission Denied Errors
OS-level permission errors prevent startup:
pg_ctl: could not open file ".../postmaster.pid": Permission denied
could not connect to server: Permission denied
Is the server running locally and accepting connections on socket "/var/run/postgresql/.s.PGSQL.5432"?
Fix ownership:
chown -R postgres:postgres /var/lib/postgresql
chmod 700 /var/lib/postgresql/14/main
chown postgres:postgres /var/run/postgresql
chmod 775 /var/run/postgresql
For database-level authentication errors (FATAL: password authentication failed or FATAL: no pg_hba.conf entry for host), edit /etc/postgresql/14/main/pg_hba.conf:
# TYPE DATABASE USER ADDRESS METHOD
host all myapp 10.0.0.0/8 scram-sha-256
local all all peer
Reload without a full restart:
SELECT pg_reload_conf();
-- or from the shell:
systemctl reload postgresql
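On PostgreSQL 10 and later you can confirm the file parsed cleanly before trusting the reload: the pg_hba_file_rules view reports each rule and any syntax error. Remember that the first matching rule wins, so check ordering too:

```shell
# Show parsed pg_hba rules; a non-NULL error column means that line was ignored
psql -U postgres -c \
  "SELECT line_number, type, database, user_name, address, auth_method, error
     FROM pg_hba_file_rules;" \
  || echo "cannot connect"
```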
Step 6: Handle Segfaults and Core Dumps
PostgreSQL logs a segfault as:
LOG: server process (PID 22791) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: SELECT * FROM large_table ORDER BY id;
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
PostgreSQL is crash-safe: it replays WAL on the next startup and returns to a consistent state. Start the service and verify:
systemctl start postgresql
tail -f /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log
# Wait for: LOG: database system is ready to accept connections
To diagnose the root cause, enable core dumps:
# Allow core dumps (for a systemd-managed server, also set LimitCORE=infinity
# in a unit drop-in and run systemctl daemon-reload)
ulimit -c unlimited
echo '/tmp/core.%e.%p.%t' | tee /proc/sys/kernel/core_pattern
# Reproduce and analyze
gdb $(which postgres) /tmp/core.postgres.22791.*
(gdb) bt full
(gdb) info registers
Common segfault causes:
- Known PostgreSQL bug — upgrade to the latest minor release (e.g., 14.10 → 14.12).
- Faulty RAM — run memtest86+ for a full memory test cycle.
- Corrupt or incompatible extension — comment out entries in shared_preload_libraries one at a time and restart.
- Kernel version incompatibility — check the PostgreSQL release notes for your version.
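Before bisecting extensions, inventory what is actually loaded. These are standard catalog queries, assuming a superuser connection:

```shell
# What is preloaded at server start, and which extension versions are installed?
psql -U postgres -c "SHOW shared_preload_libraries;" || echo "cannot connect"
psql -U postgres -c "SELECT extname, extversion FROM pg_extension ORDER BY extname;" || true
```

Cross-reference the versions against each extension's changelog for known crash bugs before blaming PostgreSQL itself.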
Step 7: Diagnose High CPU and Slow Queries
High CPU without an outright crash degrades response times until the service becomes effectively unavailable.
-- Active queries running longer than 30 seconds
SELECT pid,
now() - query_start AS duration,
left(query, 100) AS query_snippet,
state,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE state <> 'idle'
AND query_start < now() - interval '30 seconds'
ORDER BY duration DESC;
-- Tables with the most dead tuples (autovacuum bloat)
SELECT relname,
n_dead_tup,
n_live_tup,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 15;
Kill a runaway query:
SELECT pg_cancel_backend(pid); -- sends SIGINT (graceful)
SELECT pg_terminate_backend(pid); -- sends SIGTERM (forceful)
Tune autovacuum to be less disruptive:
autovacuum_vacuum_scale_factor = 0.01 # trigger earlier on large tables
autovacuum_vacuum_cost_delay = 2ms # reduce I/O throttle
autovacuum_max_workers = 4
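For one pathologically hot table, a per-table storage-parameter override is less disruptive than changing the global settings. This uses standard ALTER TABLE syntax; the table name events is hypothetical:

```shell
# Vacuum the hypothetical 'events' table far more aggressively than the default
psql -U postgres -c \
  "ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.005,
                           autovacuum_vacuum_cost_delay  = 1);" \
  || echo "cannot connect"
```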
Step 8: Port Already in Use
If PostgreSQL cannot bind port 5432:
ss -tlnp | grep 5432
lsof -i TCP:5432
Kill the conflicting process or change PostgreSQL's port:
# postgresql.conf
port = 5433
Update all application connection strings and reload.
Monitoring and Prevention
- Disk alerting: alert at 80% full on the PostgreSQL volume.
- Connection monitoring: alert when the pg_stat_activity count exceeds 80% of max_connections.
- pg_stat_statements: enable this extension to track aggregate query performance over time.
- pg_basebackup + WAL archiving: maintain a point-in-time recovery target of at least 24 hours.
- Prometheus + postgres_exporter: provides dashboards for connections, cache hit ratio, dead tuples, and replication lag.
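The connection-monitoring item above can be implemented as a one-line cron check. A sketch; the 80% threshold is the example figure from the list:

```shell
# Percent of max_connections currently in use; alert when it crosses 80
pct=$(psql -U postgres -Aqt -c \
  "SELECT round(100.0 * count(*) / current_setting('max_connections')::int, 1)
     FROM pg_stat_activity;" 2>/dev/null) || pct=""
if [ -n "$pct" ]; then
  echo "connection usage: ${pct}%"
else
  echo "cannot connect"
fi
```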
Rapid Triage Script
#!/usr/bin/env bash
# postgresql-diagnose.sh — Rapid triage for PostgreSQL failures
# Run as root or the postgres OS user
set -euo pipefail
PG_LOG_DIR="${PG_LOG_DIR:-/var/log/postgresql}"
PGDATA="${PGDATA:-/var/lib/postgresql}"
PG_PORT="${PGPORT:-5432}"
sep() { echo; echo "=== $* ==="; }
# ── 1. Service status ──────────────────────────────────────────────────────
sep "1. PostgreSQL Service Status"
# systemctl status exits non-zero for inactive/failed units, so don't let
# set -e abort here; the glob covers both 'postgresql' and versioned units
# like 'postgresql@14-main' and 'postgresql-14'
systemctl status 'postgresql*' --no-pager -l 2>/dev/null \
  || pg_ctl status 2>/dev/null \
  || echo "Cannot determine service status (unit inactive, failed, or absent)"
# ── 2. Listening sockets ───────────────────────────────────────────────────
sep "2. Port ${PG_PORT} Listeners"
ss -tlnp | grep "${PG_PORT}" || echo "Nothing listening on port ${PG_PORT}"
# ── 3. Disk space ──────────────────────────────────────────────────────────
sep "3. Disk Usage"
df -h "${PGDATA}"
echo "--- WAL directory size ---"
du -sh "${PGDATA}"/*/*/pg_wal/ "${PGDATA}"/pg_wal/ 2>/dev/null || true
echo "--- Server log directory size ---"
du -sh "${PG_LOG_DIR}" 2>/dev/null || true
# ── 4. Recent log errors ───────────────────────────────────────────────────
sep "4. Recent FATAL/ERROR/PANIC in Server Log"
LATEST_LOG=$(ls -t "${PG_LOG_DIR}"/*.log 2>/dev/null | head -1 || true)
if [[ -n "${LATEST_LOG}" ]]; then
  grep -E 'FATAL|ERROR|PANIC|segfault|oom' "${LATEST_LOG}" | tail -50 || true
else
echo "No log files found in ${PG_LOG_DIR}"
fi
# ── 5. OOM events ──────────────────────────────────────────────────────────
sep "5. OOM Kill Events (dmesg)"
dmesg | grep -i 'out of memory\|oom_kill\|killed process' | tail -20 || echo "None found"
# ── 6. Connection counts ───────────────────────────────────────────────────
sep "6. Current Connection Counts"
psql -U postgres -Aqt -c \
"SELECT lpad(count(*)::text,5) || ' ' || coalesce(state,'(null)') AS state_count
FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;" \
2>/dev/null || echo "Cannot connect to PostgreSQL — skipping live checks"
# ── 7. Long-running queries ────────────────────────────────────────────────
sep "7. Queries Running > 30 Seconds"
psql -U postgres -c \
"SELECT pid,
now() - query_start AS duration,
wait_event_type || '/' || wait_event AS wait,
left(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE state <> 'idle'
AND query_start < now() - interval '30 seconds'
ORDER BY duration DESC
LIMIT 20;" \
2>/dev/null || true
# ── 8. Dead-tuple bloat ────────────────────────────────────────────────────
sep "8. Top Tables by Dead Tuples (run VACUUM ANALYZE on these)"
psql -U postgres -c \
"SELECT relname,
n_dead_tup,
n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup,0) * 100, 1) AS dead_pct,
last_autovacuum::date
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;" \
2>/dev/null || true
# ── 9. File permissions ────────────────────────────────────────────────────
sep "9. Critical Directory Permissions"
ls -ld /var/run/postgresql "${PGDATA}" 2>/dev/null || true
# ── 10. Memory pressure ────────────────────────────────────────────────────
sep "10. System Memory"
free -h
echo
echo "Diagnostic complete. Share this output with your DBA or open a support ticket."
Error Medic Editorial
The Error Medic Editorial team is composed of senior DevOps engineers and SREs with combined decades of experience operating PostgreSQL clusters at scale — from single-server deployments to multi-terabyte high-availability installations. They specialize in Linux performance tuning, database reliability engineering, incident response, and post-mortem analysis.
Sources
- https://www.postgresql.org/docs/current/runtime.html
- https://www.postgresql.org/docs/current/kernel-resources.html
- https://www.postgresql.org/docs/current/runtime-config-connection.html
- https://wiki.postgresql.org/wiki/Out_of_Memory_(OOM)
- https://dba.stackexchange.com/questions/69/what-does-postgresql-log-when-it-crashes
- https://github.com/pgbouncer/pgbouncer/blob/master/doc/config.md