Error Medic

Troubleshooting 'PostgreSQL Connection Refused': Resolving Crashes, OOM, and Service Failures on Linux

Fix 'psql: error: connection to server failed: Connection refused'. Learn to diagnose PostgreSQL crashes, OOM errors, max connections, and disk full issues.

Key Takeaways
  • Verify the PostgreSQL service is actively running and listening on the correct network interfaces (listen_addresses).
  • Check system logs for the Linux OOM (Out of Memory) killer terminating the postgres process due to excessive memory utilization.
  • Investigate the $PGDATA partition for 'No space left on device' errors which cause immediate database panics and crashes.
  • Resolve 'FATAL: remaining connection slots are reserved' by tuning max_connections or implementing a connection pooler like PgBouncer.
  • Audit pg_hba.conf for correct client authentication rules to prevent 'permission denied' rejections.
Diagnostic and Fix Approaches Compared
Method | When to Use | Time | Risk
------ | ----------- | ---- | ----
Service Restart (systemctl) | PostgreSQL failed to start or stopped unexpectedly | < 5 mins | Low
Tuning max_connections | Seeing 'too many connections' FATAL errors | 10 mins | Medium (increases RAM usage)
Configuring pg_hba.conf | Remote clients get 'permission denied' or IP rejections | 5 mins | Low (reload only, no downtime)
Clearing Disk Space / Archiving WAL | Database panicked due to 'disk full' in pg_wal | 30+ mins | High (never manually delete WAL files)
Adjusting shared_buffers / work_mem | Kernel OOM killer is terminating PostgreSQL instances | 15 mins | Medium (requires restart and performance testing)

Understanding the 'Connection Refused' Error in PostgreSQL

The dreaded psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused is a symptom, not a root cause. It means the operating system rejected the TCP/IP or Unix domain socket connection attempt. This happens because the PostgreSQL daemon (postgres) is not running, because it is not listening on the expected interface or port, or because a network firewall is actively blocking the request.

When PostgreSQL is not working, the failure often cascades from underlying systemic issues: a PostgreSQL crash due to an Out of Memory (OOM) event, a disk full scenario corrupting the startup sequence, or resource exhaustion like 'too many connections'. In this comprehensive guide, we will dissect each scenario, from basic service interruptions to complex core dumps and high CPU bottlenecks.

Step 1: Validating the Service and Socket State

Before diving into complex logs, you must determine if the database process is alive.

1. Check the Service Status: Use systemd to verify the state of the PostgreSQL service on your Linux host.

systemctl status postgresql

If the service is marked as failed or inactive, PostgreSQL is not starting. You must check journalctl -xeu postgresql to find the exact startup failure reason.

2. Verify Listening Ports: If the service is active, ensure it is actually binding to the correct port.

ss -nltp | grep 5432

If you do not see postgres listening on 0.0.0.0:5432 (all IPv4) or 127.0.0.1:5432 (localhost), the server might only be listening on a Unix socket, or the listen_addresses parameter in postgresql.conf is misconfigured.
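The port check above can be wrapped in a tiny helper. This is a minimal sketch: `listening_on_5432` is an illustrative name, not a standard tool, and GNU grep is assumed.

```shell
#!/bin/bash
# Sketch: report whether anything is listening on TCP port 5432.
# listening_on_5432 is an illustrative helper, not PostgreSQL tooling.

listening_on_5432() {
    # Reads `ss -nlt`-style lines on stdin and looks for a :5432 listener
    if grep -Eq '[:.]5432\b'; then
        echo "listener found on 5432"
    else
        echo "no listener on 5432"
    fi
}

ss -nlt 2>/dev/null | listening_on_5432
```

If it reports no listener while systemctl says the service is active, the server is most likely bound only to a Unix socket, which points straight at listen_addresses.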

Step 2: Diagnosing PostgreSQL Crashes and Core Dumps

If PostgreSQL failed to start or stopped unexpectedly during operation, the system logs are your absolute source of truth.

The OOM Killer (PostgreSQL Out of Memory)

PostgreSQL is highly memory-dependent. If shared_buffers or work_mem is configured too aggressively relative to the server's physical RAM, the Linux kernel's Out of Memory (OOM) killer will forcibly terminate the postgres process to keep the operating system itself alive. The result is a sudden crash and "connection refused" for every subsequent client request. Check the kernel ring buffer for OOM events:

dmesg -T | grep -i -E 'killed process|oom'

Fix: Reduce shared_buffers (typically 25% of total RAM is the recommended starting point) and lower work_mem. Ensure your Linux system has adequate swap space configured to prevent sudden OOM termination. You can also adjust the oom_score_adj for the PostgreSQL process, though fixing the memory configuration is the permanent solution.
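The 25% rule of thumb above can be sketched as a small helper. `recommend_shared_buffers` is a hypothetical name, not a PostgreSQL tool, and /proc/meminfo is assumed (Linux only).

```shell
#!/bin/bash
# Sketch: derive a starting-point shared_buffers value (~25% of RAM).
# recommend_shared_buffers is a hypothetical helper, not a PostgreSQL tool.

recommend_shared_buffers() {
    local total_mb="$1"           # total physical RAM in MB
    echo "$(( total_mb / 4 ))MB"  # 25% rule of thumb
}

# MemTotal in /proc/meminfo is reported in kB; convert to MB first
mem_mb=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 ))
echo "shared_buffers = $(recommend_shared_buffers "$mem_mb")"
```

Pair the new value with a lower work_mem and re-test: work_mem is allocated per sort or hash operation, per connection, so it multiplies far faster than shared_buffers does.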

Segmentation Faults and Core Dumps

A PostgreSQL segfault usually indicates a bug in an extension (such as PostGIS or TimescaleDB), hardware memory corruption, or severe data corruption. Check the PostgreSQL logs (usually located in /var/log/postgresql/ or /var/lib/pgsql/data/log/, depending on your distribution):

tail -n 100 /var/log/postgresql/postgresql-14-main.log

Look for errors like: server process (PID ...) was terminated by signal 11: Segmentation fault. Fix: Disable suspected third-party extensions, run a memory test (memtester) on the host hardware, and ensure you are running the latest minor release of your PostgreSQL version to patch known C-level bugs. If a core dump is generated, you may need to use gdb to inspect the backtrace.

Step 3: Resolving 'Disk Full' and Storage Failures

PostgreSQL will immediately stop accepting writes and will eventually PANIC and crash if the partition containing its data directory ($PGDATA) or Write-Ahead Logs (pg_wal, called pg_xlog before version 10) runs out of space. The logs will show: PANIC: could not write to file "pg_wal/xlog...": No space left on device.

1. Identify the Full Mount:

df -h /var/lib/postgresql

2. Remediation: CRITICAL WARNING: Do NOT manually delete files in pg_wal/ or pg_xlog/. Doing so will corrupt your database permanently and result in unrecoverable data loss. Instead, clear temporary OS files, compress old application logs residing on the same partition, or add a new virtual disk and symlink non-critical directories. If your archive_command is failing and causing WAL logs to pile up, fix the destination storage so PostgreSQL can successfully archive and rotate the WAL files automatically.
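To catch this before the PANIC, a monitoring sketch like the one below can alert when the data-directory mount crosses a usage threshold. `check_disk` and the 90% default are illustrative assumptions, and GNU df is assumed.

```shell
#!/bin/bash
# Sketch: warn when a mount point crosses a usage threshold.
# check_disk and the 90% default are illustrative, not PostgreSQL tooling.

check_disk() {
    local mount="$1" threshold="${2:-90}"
    local pct
    # df --output=pcent prints e.g. " 42%"; keep only the digits
    pct=$(df --output=pcent "$mount" | tail -1 | tr -dc '0-9')
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: $mount is at ${pct}% (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $mount is at ${pct}%"
}

check_disk /  # point this at your $PGDATA mount, e.g. /var/lib/postgresql
```

Run from cron or a monitoring agent, the non-zero exit status on breach makes it easy to wire into alerting.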

Step 4: 'Too Many Connections' and Connection Exhaustion

If you receive the error message FATAL: remaining connection slots are reserved for non-replication superuser connections, PostgreSQL is actively running, but it has exhausted its max_connections limit.

This usually occurs due to connection leaks in your application logic or the complete lack of connection pooling.

Quick Fix: Increase max_connections in postgresql.conf:

max_connections = 300

(Note: Changing this parameter requires a full service restart: systemctl restart postgresql).

Long-term Fix: Deploy a connection pooler like PgBouncer or Pgpool-II. PostgreSQL forks a new OS process for every incoming connection, so high connection counts lead to severe memory bloat, heavy CPU context switching, and eventually an OOM crash or extreme database slowness. A pooler multiplexes thousands of client connections onto a small number of actual PostgreSQL backend processes.
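As an illustration of the multiplexing idea, a minimal PgBouncer configuration might look like the fragment below; the database name, pool sizes, and file paths are placeholders, not recommendations:

```ini
; Sketch of a minimal pgbouncer.ini -- names, sizes, and paths are placeholders
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction        ; release the backend after each transaction
default_pool_size = 20         ; real PostgreSQL connections per db/user pair
max_client_conn = 1000         ; clients PgBouncer itself will accept
```

With transaction pooling, 1,000 application clients share only 20 backend processes, though session-level features (prepared statements, advisory locks, SET) need care in this mode.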

Step 5: Fixing Permission Denied and Authentication Errors

Sometimes the network connection succeeds, but PostgreSQL actively rejects the login with FATAL: permission denied for database "mydb" or FATAL: Ident authentication failed for user "myuser".

1. Network Bindings (postgresql.conf)

By default, PostgreSQL only listens on the local loopback interface. To allow external network connections, edit postgresql.conf:

listen_addresses = '*'

2. Client Authentication (pg_hba.conf)

You must explicitly allow the incoming IP subnet and specify the authentication method in pg_hba.conf. Rules are matched top to bottom and the first match wins, so add a rule for your application servers above any broader or conflicting entries:

host    all             all             10.0.0.0/16             scram-sha-256

After editing pg_hba.conf, a reload is sufficient (no restart required). Note that changes to listen_addresses in postgresql.conf only take effect after a full restart (systemctl restart postgresql):

systemctl reload postgresql
# Or via SQL:
# SELECT pg_reload_conf();

Step 6: Addressing High CPU and Slow Performance

When PostgreSQL is excessively slow, pegging the CPU, or blocking heavily on locks, application clients may time out, which surfaces as a "connection refused" or "connection dropped" at the application layer.

Identify rogue, long-running queries using the pg_stat_activity system view:

SELECT pid, usename, state, query, extract(epoch FROM now() - query_start) AS duration_seconds
FROM pg_stat_activity
WHERE state != 'idle' 
ORDER BY duration_seconds DESC;

Kill a stuck or blocked query holding locks by passing its pid (taken from the pg_stat_activity output above) to pg_terminate_backend:

SELECT pg_terminate_backend(12345);  -- replace 12345 with the offending pid

Long-running transactions hold locks and prevent the autovacuum daemon from clearing dead tuples. This leads to massive table bloat and catastrophic performance degradation. Ensure autovacuum is enabled, running frequently, and aggressively tuned for your high-transaction tables.
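For reference, autovacuum aggressiveness is controlled in postgresql.conf; the values below are illustrative starting points for a write-heavy workload, not universal recommendations:

```ini
# Illustrative postgresql.conf settings -- tune against your own workload
autovacuum = on
autovacuum_naptime = 15s                  # check tables more often (default 1min)
autovacuum_vacuum_scale_factor = 0.05     # vacuum at 5% dead tuples (default 0.2)
autovacuum_vacuum_cost_limit = 1000       # let each vacuum cycle do more work
```

The scale factor and cost limit can also be set per table with ALTER TABLE ... SET, which is usually safer than lowering them globally.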

All-in-One Diagnostic Script

The checks from the steps above can be combined into a single triage script:

#!/bin/bash
# Diagnostic script for PostgreSQL connection and crash issues

echo "=== Checking PostgreSQL Service Status ==="
systemctl status postgresql --no-pager | grep -E 'Active:|Main PID:'

echo -e "\n=== Checking Listening Ports ==="
ss -nltp | grep 5432

echo -e "\n=== Checking for OOM Killer Events in the Kernel Log ==="
dmesg -T | grep -i -E 'killed process|oom.*postgres'

echo -e "\n=== Checking Disk Space on the PostgreSQL Data Mount ==="
df -h "$(sudo -u postgres psql -t -P format=unaligned -c 'show data_directory;')"

echo -e "\n=== Checking Max Connections Configuration ==="
sudo -u postgres psql -c "SHOW max_connections;"
sudo -u postgres psql -c "SELECT count(*) AS current_connections FROM pg_stat_activity;"

Error Medic Editorial

Error Medic Editorial is a collective of senior DevOps and Site Reliability Engineers dedicated to providing battle-tested troubleshooting guides and infrastructure solutions.
