Troubleshooting PostgreSQL Connection Refused & Service Failures
A complete DevOps guide to fixing PostgreSQL connection refused errors, service crashes, OOM, disk full, high CPU, and too many connections.
- Connection refused usually means the PostgreSQL service is stopped, restarting, or listening on the wrong interface.
- Service crashes are frequently caused by Out of Memory (OOM) killer, Disk Full errors on the WAL directory, or permission issues.
- Connection limits (`too many connections`) will block new clients; use a connection pooler like PgBouncer instead of blindly increasing `max_connections`.
- Quick Fix: Check service status (`systemctl status postgresql`) and inspect the latest logs (`journalctl -u postgresql` or `/var/log/postgresql/`) to identify the exact root cause.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Restart Service (`systemctl restart`) | Service is stopped or hung without data corruption | Low | Low |
| Modify `listen_addresses` / `pg_hba.conf` | Network connections are rejected or routing fails | Medium | Low |
| Increase `max_connections` | Seeing `FATAL: sorry, too many clients already` in logs | Low | Medium (can cause OOM) |
| Install & Configure PgBouncer | High connection churn or persistent 'too many connections' | High | Low |
| Tune `shared_buffers` & `work_mem` | Experiencing OOM kills or high memory utilization | Medium | Medium (Requires restart) |
| Clear Disk Space / Expand Volume | Logs show 'No space left on device' (Disk Full) | Medium | High (If deleting wrong files) |
Understanding the Error: PostgreSQL Connection Refused
When you encounter a `psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused` error, it signifies that the client application successfully reached the host, but no PostgreSQL server process was listening on the specified port. This is arguably the most common PostgreSQL error, but it's often a symptom of a deeper issue, ranging from simple configuration oversights to severe resource exhaustion leading to a crash.
In this comprehensive guide, we will troubleshoot not only the standard "connection refused" scenario but also the cascading failures that cause it: service not starting, out of memory (OOM) kills, disk full errors, high CPU utilization, and connection limit exhaustion.
Step 1: Diagnose the Initial State
The first step is always to check the status of the PostgreSQL service and examine the most recent logs. A stopped service is the primary reason for a connection refusal.
1.1 Check Service Status
Use systemctl to verify if the service is active, inactive, or failed. If the service is inactive, it explains the connection refusal. The critical question is why it stopped.
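A minimal check sketch, assuming the default unit name `postgresql` (some distros use names like `postgresql@15-main`) and the default port 5432:

```shell
# Ask systemd for the unit state; "is-active" exits 0 only when the unit is running
if systemctl is-active --quiet postgresql 2>/dev/null; then
  state="running"
else
  state="stopped-or-unavailable"
fi
echo "postgresql service: $state"

# pg_isready (ships with the client tools) probes the port without needing credentials
command -v pg_isready >/dev/null && pg_isready -h 127.0.0.1 -p 5432 || true
```

If the unit is running but `pg_isready` fails, the server is likely listening on a different interface or port, which points to Scenario A below.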
1.2 Inspect PostgreSQL Logs
The logs are the ultimate source of truth. Depending on your OS and installation, they are typically found in /var/log/postgresql/, /var/lib/pgsql/data/log/, or accessible via journalctl -u postgresql. Look for FATAL, PANIC, or ERROR messages immediately preceding the time the connections started failing.
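The locations above can be scanned in one pass; a sketch that tolerates whichever layout your distro uses:

```shell
# Scan the usual file-based log locations for recent fatal events
for dir in /var/log/postgresql /var/lib/pgsql/data/log; do
  [ -d "$dir" ] && grep -rhE 'PANIC|FATAL' "$dir" 2>/dev/null | tail -n 20 || true
done

# journald-based installs keep server output in the journal instead
command -v journalctl >/dev/null && \
  journalctl -u postgresql -n 200 --no-pager 2>/dev/null | grep -E 'PANIC|FATAL|ERROR' | tail -n 20 || true

scan_done=yes
```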
Step 2: Root Cause Analysis and Fixes
Based on your log analysis, you will typically fall into one of the following root cause categories.
Scenario A: The Service is Not Starting (Configuration or Permissions)
If systemctl status postgresql shows the service as failed immediately after trying to start, the issue is likely configuration or filesystem permissions.
- Listen Addresses and Port: Ensure `postgresql.conf` has `listen_addresses = '*'` (or a specific IP) and `port = 5432`. If it is set to `localhost` but you attempt to connect externally, the connection will be refused by the OS before it even reaches Postgres.
- pg_hba.conf: While a `pg_hba.conf` rejection usually results in a `FATAL: no pg_hba.conf entry` or "permission denied" error rather than a pure "connection refused", severe misconfigurations here can prevent startup or block specific subnets.
- Permission Denied: If the `postgres` system user cannot read the data directory, configuration files, or SSL certificates, the service will crash on startup. Fix ownership using `chown -R postgres:postgres /var/lib/postgresql/data` (adjust the path as necessary) and ensure permissions on the data directory are strictly `0700`.
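Putting the two configuration pieces together, a minimal sketch (the subnet and the auth method are illustrative; adjust for your network and policy):

```
# postgresql.conf
listen_addresses = '*'        # or a comma-separated list of specific IPs
port = 5432

# pg_hba.conf — allow password auth from an application subnet
# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   10.0.0.0/24   scram-sha-256
```

Note that `pg_hba.conf` changes only need a reload (`SELECT pg_reload_conf();` or `systemctl reload postgresql`), while `listen_addresses` and `port` take effect only after a full restart.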
Scenario B: "Too Many Connections"
If the service is running but refusing new connections, you might see `FATAL: sorry, too many clients already` in the logs. At the application layer, this often surfaces as a "connection refused" error or a timeout.
- The Band-Aid Fix: You can increase `max_connections` in `postgresql.conf`. However, this is dangerous because every PostgreSQL connection forks a new OS process, consuming a baseline amount of memory. Setting this too high is a primary trigger for OOM kills.
- The Engineering Fix: Implement a connection pooler like PgBouncer or Odyssey. A pooler sits between your application and PostgreSQL, multiplexing thousands of application connections onto a small, fixed number of actual database connections, drastically reducing memory overhead and context switching.
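A minimal `pgbouncer.ini` sketch illustrating transaction-level pooling (the database name, file paths, and pool sizes here are illustrative, not recommendations):

```
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction        ; multiplex clients per transaction, not per session
max_client_conn = 1000         ; app-side connections PgBouncer will accept
default_pool_size = 20         ; actual server connections per database/user pair
```

Applications then connect to port 6432 instead of 5432. Be aware that transaction pooling breaks session-level features (session-scoped prepared statements, advisory locks, `SET` variables), so verify your driver configuration before cutting over.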
Scenario C: Resource Exhaustion (OOM, High CPU, Disk Full)
When a database server runs out of critical resources, the OS or the PostgreSQL engine itself will take drastic measures, leading to a crash or a forced restart. During this recovery phase, connections will be refused.
- Out of Memory (OOM) and Segfaults: If you see `Out of memory: Killed process` in your system logs (`dmesg` or `/var/log/messages`), the Linux OOM killer terminated the PostgreSQL postmaster process to save the OS. This happens when memory settings like `shared_buffers`, `work_mem`, or `maintenance_work_mem` are too high for the available system RAM. A segmentation fault (segfault) or core dump can also occur due to memory corruption, extensions with memory leaks, or hardware issues.
  - Fix: Tune `postgresql.conf`. A general rule of thumb is to set `shared_buffers` to 25% of total RAM. Strictly limit `work_mem` because this allocation can be multiplied by the number of active connections and the number of sort/hash operations in a single complex query. Lower `max_connections` if no pooler is used.
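As a worked example of that rule of thumb, a sketch for a hypothetical dedicated 16 GB server (all values are illustrative starting points, not universal recommendations):

```
# postgresql.conf — memory settings for a dedicated 16 GB host
shared_buffers = 4GB            # ~25% of total RAM
work_mem = 16MB                 # per sort/hash node, per backend; worst case is
                                # roughly max_connections x work_mem x nodes
maintenance_work_mem = 512MB    # used by VACUUM, CREATE INDEX
max_connections = 100           # keep low; pool in front of the database
```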
- Disk Full: A `PANIC: could not write to file "pg_wal/xlog...": No space left on device` error will immediately crash the database to prevent silent data corruption.
  - Fix: Free up disk space immediately. You may need to clear old log files (not WAL files!), expand the logical volume, or set up WAL archiving to move Write-Ahead Logs to a cheaper storage tier like S3 to prevent the `pg_wal` directory from filling up.
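A quick triage sketch for the disk-full case (`/var/lib/postgresql` is the Debian/Ubuntu default data location; adjust for your layout):

```shell
# Report percent-used on the filesystem holding the data directory
datadir=/var/lib/postgresql
[ -d "$datadir" ] || datadir=/
usage=$(df -P "$datadir" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
echo "disk usage for $datadir: ${usage}%"

# Old server *log* files are safe deletion candidates;
# files under pg_wal must never be removed by hand.
find /var/log/postgresql -name '*.log*' -mtime +7 -print 2>/dev/null || true
```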
- High CPU and Slow Queries: A system pinned at 100% CPU might not actively refuse network connections at the socket level, but connection timeouts from the application will masquerade as connection failures. High CPU is often caused by missing indexes forcing sequential scans on massive tables, unoptimized complex joins, or severe bloat.
  - Fix: Enable and use the `pg_stat_statements` extension to identify the most time-consuming queries. Use `EXPLAIN ANALYZE` on these queries to understand the query plan, add appropriate indexes, and ensure `autovacuum` is keeping statistics up to date.
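A sketch of the identification step, guarded so it degrades gracefully on a host without `psql` or a reachable server (`total_exec_time` is the column name on PostgreSQL 13+; older versions call it `total_time`):

```shell
# Top 5 queries by cumulative execution time from pg_stat_statements.
# Requires shared_preload_libraries = 'pg_stat_statements' plus a one-time
# CREATE EXTENSION pg_stat_statements; in the target database.
if command -v psql >/dev/null; then
  psql -U postgres -Atc \
    "SELECT round(total_exec_time::numeric, 1) AS total_ms, calls, left(query, 60)
       FROM pg_stat_statements
      ORDER BY total_exec_time DESC
      LIMIT 5;" 2>/dev/null || echo "server unreachable"
else
  echo "psql not installed on this host"
fi
status=done
```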
Scenario D: Crashes and Core Dumps
If the PostgreSQL logs show `server process (PID ...) was terminated by signal 11: Segmentation fault` or mention generating a core dump without an obvious OOM message, PostgreSQL encountered a critical internal failure. (Signal 9, by contrast, almost always points back to the OOM killer covered above; SIGKILL does not produce a core dump.)
- Hardware Issues: Faulty RAM or failing local NVMe/SSD storage can cause unexpected panics that look like software bugs.
- Software Bugs: While rare in core PostgreSQL, bugs do exist. Ensure you are running the latest minor release for your major PostgreSQL version (e.g., if on 15.2, upgrade to 15.x latest). Also, heavily scrutinize any third-party extensions (like PostGIS or TimescaleDB) as they run in the same memory space and a bug there will crash the entire database.
Step 3: Preventative Maintenance and SRE Best Practices
To avoid waking up to a critical "connection refused" alert, implement proactive monitoring and SRE practices:
- Monitor Connection Utilization: Set up alerts when active connections reach 80% of `max_connections`.
- Monitor Disk Space Strictly: Alert at 75%, 85%, and 90% utilization of the data and WAL directories. A full disk is an avoidable outage.
- Mandatory Connection Pooling: Make PgBouncer a standard component of your database architecture, even for small workloads, to protect the database engine.
- Autovacuum Tuning: Ensure `autovacuum` is aggressively configured. Default settings are often too passive for high-write databases, leading to table bloat, slow queries (high CPU), and eventually transaction ID wraparound issues.
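The connection-utilization alert above can be computed with a single query; a guarded sketch (the 80% threshold is illustrative):

```shell
# Page when active connections exceed 80% of max_connections
threshold=80
if command -v psql >/dev/null; then
  pct=$(psql -U postgres -Atc \
    "SELECT count(*) * 100 / current_setting('max_connections')::int
       FROM pg_stat_activity;" 2>/dev/null)
  { [ -n "$pct" ] && [ "$pct" -ge "$threshold" ] && \
      echo "ALERT: ${pct}% of max_connections in use"; } 2>/dev/null || true
fi
check_done=yes
```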
Quick Diagnostic Command Reference
```shell
# 1. Check if the PostgreSQL service is running and view its status
sudo systemctl status postgresql

# 2. View the last 50 lines of PostgreSQL logs to find PANIC, FATAL, or ERROR messages
sudo journalctl -u postgresql -n 50 --no-pager

# 3. Check for Out of Memory (OOM) kills in system logs
sudo dmesg -T | grep -i 'out of memory\|postgres\|killed'

# 4. Check disk space, specifically the partition holding /var/lib/postgresql
df -h

# 5. Fix ownership if permission denied errors are found on startup (adjust path for your OS)
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
sudo chmod 0700 /var/lib/postgresql/15/main

# 6. Check active connections and limits (run via psql if accessible)
# psql -U postgres -c "SHOW max_connections;"
# psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
```

Error Medic Editorial
Our SRE and DevOps editorial team specializes in deep-dive troubleshooting for mission-critical Linux infrastructure and open-source databases. We turn late-night pagers into robust, scalable architectures.