Error Medic

Troubleshooting InfluxDB Connection Refused and Out of Memory Errors

Resolve 'InfluxDB connection refused', OOM crashes, and slow query timeouts. Learn to diagnose bind-address issues, optimize TSI indexes, and fix retention policies.

Key Takeaways
  • Connection refused is often a symptom of the InfluxDB process crashing due to Out of Memory (OOM) kills by the Linux kernel.
  • Network misconfigurations, such as binding to 127.0.0.1 instead of 0.0.0.0 or firewall blocks, are common causes for external connection failures.
  • High series cardinality and unbounded queries lead to InfluxDB slow queries, timeouts, and eventual memory exhaustion.
  • Quick Fix: Check dmesg for OOM kills, verify bind-address in influxdb.conf, and enable TSI (Time Series Index) to reduce RAM usage.
Fix Approaches Compared
Method | When to Use | Time | Risk
Change bind-address | Service runs but external clients get Connection Refused | 5 mins | Low
Increase ulimit (file descriptors) | Logs show 'too many open files' before crashing | 10 mins | Low
Enable TSI (Time Series Index) | High series cardinality causing OOM kills | 1-2 hours | Medium (requires downtime and index rebuild)
Implement downsampling & RPs | Database size growing unbounded, causing slow queries | Days | High (data deletion involved)

Understanding the Error: InfluxDB Connection Refused

When you encounter curl: (7) Failed to connect to localhost port 8086: Connection refused or dial tcp 127.0.0.1:8086: connect: connection refused, it usually means the InfluxDB service is either not running, crashing in a loop, or bound to the wrong network interface.

However, in production environments, a "Connection Refused" error is rarely just a simple networking mistake. Frequently, it is the downstream symptom of a more severe underlying issue, such as an InfluxDB Out of Memory (OOM) kill by the Linux kernel, or the database becoming completely unresponsive due to an InfluxDB Slow Query or InfluxDB Timeout.

When InfluxDB runs out of memory, the Linux OOM killer terminates the influxd process. Subsequently, any client attempting to write or query data receives a "Connection Refused" error because the daemon is dead. Similarly, if the database is locked up processing a massive, unoptimized query, the HTTP API may fail to respond within the reverse proxy's timeout window, or the TCP backlog might fill up, leading to rejected connections.

Common Root Causes

  1. Service Stopped or Crash Looping (OOM Kills): The most common reason for unexpected connection refusals. The influxd process consumes memory proportional to series cardinality and query complexity.
  2. Bind Address Configuration: The bind-address or http-bind-address in influxdb.conf is set to 127.0.0.1 but clients are trying to connect via a public or Docker network IP.
  3. Firewall and Security Groups: Port 8086 (HTTP API) or 8088 (RPC) is blocked by iptables, ufw, or cloud provider security groups.
  4. File Descriptor Exhaustion: InfluxDB holds many file handles open for TSM data files and TSI index files. If ulimit -n is too low, it stops accepting new connections.
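For root cause #3, a quick way to rule firewalls in or out is to check host-level rules and then probe the port from a client machine. A sketch; `<influxdb-ip>` is a placeholder for your server's address, and cloud security groups must be checked in your provider's console:

```shell
# Host firewall: is anything filtering port 8086? (run on the InfluxDB server)
sudo ufw status | grep 8086
sudo iptables -L -n | grep 8086

# Client side: "Connection refused" means the port answered but nothing is
# listening (dead or misbound service); a hang/timeout suggests a dropped packet.
nc -zv <influxdb-ip> 8086
```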

Step 1: Diagnose the Connection Refused Error

First, determine if the process is actually running or if it was recently killed.

Check the Service Status: Look for Active: failed or Active: inactive. If it recently restarted, check the uptime. If the service is not running, clients cannot connect.

Check for OOM Kills in the Kernel Log: If you see Out of memory: Killed process <PID> (influxd), your connection refused error is definitively an OOM issue. InfluxDB requires substantial RAM when handling high cardinality or complex GROUP BY time intervals on large datasets.

Verify Network Binding: If the service is running perfectly fine but external clients cannot connect, check the listening ports. You should see it listening on :::8086 or 0.0.0.0:8086. If it only shows 127.0.0.1:8086, external traffic will be refused.
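The three checks above can be run in sequence. A sketch assuming a systemd-based Linux install; `ss` is used in place of the older `netstat`:

```shell
# 1. Is the service up? Look for "Active: failed" or a very recent start time.
sudo systemctl status influxdb

# 2. Was influxd OOM-killed? This matches both kernel message formats.
dmesg -T | grep -iE "out of memory|oom-killer"

# 3. Which address is InfluxDB bound to? 127.0.0.1:8086 refuses external clients.
sudo ss -tlnp | grep influxd
```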


Step 2: Fixing "Connection Refused" due to Network/Config

If the issue is purely networking or configuration:

1. Update the Bind Address: Edit your /etc/influxdb/influxdb.conf (or the respective Docker environment variables) and set bind-address = ":8086" under the [http] section. Restart the service.
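A minimal fragment of what that section of influxdb.conf (InfluxDB 1.x) should look like; the empty host portion in ":8086" binds all interfaces:

```toml
# /etc/influxdb/influxdb.conf
[http]
  # ":8086" listens on all interfaces; "127.0.0.1:8086" refuses external clients
  bind-address = ":8086"
```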

2. Adjust File Descriptors: If you see too many open files in the logs leading to dropped connections, edit /lib/systemd/system/influxdb.service and add LimitNOFILE=65536. Then run sudo systemctl daemon-reload && sudo systemctl restart influxdb.
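Rather than editing the packaged unit file in place (a package upgrade can overwrite it), a systemd drop-in achieves the same result. A sketch, assuming a systemd-based install:

```shell
# Create a drop-in override so upgrades don't clobber the limit
sudo mkdir -p /etc/systemd/system/influxdb.service.d
sudo tee /etc/systemd/system/influxdb.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65536
EOF

sudo systemctl daemon-reload
sudo systemctl restart influxdb

# Confirm the running process picked up the new limit
grep "open files" /proc/"$(pgrep -x influxd)"/limits
```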


Step 3: Resolving OOM Kills and Memory Issues

If the "Connection Refused" is caused by OOM kills, you must address the memory consumption. InfluxDB memory usage is driven by series cardinality and query payload.

1. Switch to TSI (Time Series Index): By default, older versions of InfluxDB (1.x) keep the entire series index in memory (in-memory index). If you have millions of unique series (e.g., generating unique tags per request like UUIDs), you will run out of RAM. Enable TSI to move the index to disk. In influxdb.conf, under [data], set index-version = "tsi1". Note: If you have existing data, you must rebuild the index using the influx_inspect buildtsi tool before restarting.
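The migration sequence looks roughly like this. A sketch for a 1.x package install with default data paths; back up /var/lib/influxdb first, and expect the rebuild to take a while on large datasets:

```shell
# The TSI rebuild requires the daemon to be stopped (downtime)
sudo systemctl stop influxdb

# Switch the index engine to TSI in influxdb.conf ([data] section)
sudo sed -i 's/^\(\s*#\s*\)\?index-version = .*/  index-version = "tsi1"/' /etc/influxdb/influxdb.conf

# Rebuild the on-disk index for existing shards as the influxdb user
sudo -u influxdb influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal

sudo systemctl start influxdb
```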

2. Implement Proper Retention Policies (RPs): Keeping raw, high-resolution data forever guarantees eventual OOMs and slow queries. Create a Retention Policy to drop high-res data after 30 days.
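In InfluxQL that might look like the following; "telegraf" and "raw_30d" are hypothetical database and policy names, so adjust to your schema:

```sql
-- Keep raw data for 30 days, then let InfluxDB drop it automatically
CREATE RETENTION POLICY "raw_30d" ON "telegraf" DURATION 30d REPLICATION 1 DEFAULT

-- Verify the policy exists
SHOW RETENTION POLICIES ON "telegraf"
```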

3. Use Continuous Queries (CQs) or Tasks: Downsample your data. Instead of querying 6 months of 1-second resolution data, query 1-hour resolution data using CQs.
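A sketch of such a Continuous Query; the database "telegraf", retention policies "raw_30d" and "forever" (which must already exist), and the measurements "cpu" and "cpu_hourly" are all hypothetical names:

```sql
-- Roll 1-second samples up to hourly means into a long-lived retention policy
CREATE CONTINUOUS QUERY "cq_cpu_hourly" ON "telegraf"
BEGIN
  SELECT mean("usage_percent") AS "usage_percent"
  INTO "telegraf"."forever"."cpu_hourly"
  FROM "telegraf"."raw_30d"."cpu"
  GROUP BY time(1h), *
END
```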


Step 4: Mitigating InfluxDB Slow Queries and Timeouts

Sometimes, connection refused or 502 Bad Gateway errors happen because the HTTP request times out while InfluxDB is churning through a massive query.

1. Query Timeout Configuration: Prevent rogue queries from locking up the database. In influxdb.conf under [coordinator], configure query-timeout = "60s" and log-queries-after = "10s". Setting log-queries-after allows you to identify which queries are causing the bottlenecks in your logs.
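The corresponding fragment of influxdb.conf; the values shown are illustrative starting points, not universal recommendations:

```toml
# /etc/influxdb/influxdb.conf
[coordinator]
  # Abort any query that runs longer than 60 seconds
  query-timeout = "60s"
  # Log the text of queries exceeding 10 seconds to identify offenders
  log-queries-after = "10s"
```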

2. Analyze Slow Queries: Run SHOW QUERIES in the InfluxDB CLI. If you see queries executing for hundreds of seconds, you may need to kill them using KILL QUERY <qid>.
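In the CLI that looks like this; the qid 37 is a placeholder for whatever SHOW QUERIES actually reports:

```sql
-- List in-flight queries with their elapsed durations
SHOW QUERIES

-- Kill a runaway query by the qid from the first column
KILL QUERY 37
```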

3. Optimize Your Queries:

  • Avoid leading wildcards in regex tag matching (=~ /.*value/).
  • Limit the time range: Always include a WHERE time > now() - 1h clause. Querying without a time bound scans the entire database.
  • Reduce GROUP BY cardinality: Grouping by a tag that has 100,000 unique values will create 100,000 buckets in memory, instantly causing an OOM or massive timeout.
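Putting those three rules together, with a hypothetical "requests" measurement and "request_id"/"host" tags:

```sql
-- Anti-pattern: no time bound, grouped by a unique-per-request tag
SELECT mean("duration_ms") FROM "requests" GROUP BY "request_id"

-- Better: bounded window, time-bucketed, low-cardinality tag
SELECT mean("duration_ms") FROM "requests"
WHERE time > now() - 1h
GROUP BY time(1m), "host"
```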

Final Verification

After applying fixes (especially memory limits and TSI):

  1. Monitor memory usage with top or an external monitoring tool like Telegraf + Grafana.
  2. Watch the InfluxDB logs: tail -f /var/log/influxdb/influxd.log for any level=error or level=warn.
  3. Test external connections from a different host: curl -I http://<influxdb-ip>:8086/ping. If it returns a 204 No Content HTTP status, the connection is successfully established and InfluxDB is healthy.

Quick Command Reference

```bash
# Check service status
sudo systemctl status influxdb

# Check for OOM kills in the kernel log
dmesg -T | grep -iE "out of memory|oom-killer"

# Check active listening ports for InfluxDB
sudo netstat -tulpn | grep influxd

# Edit influxdb.conf to fix bind-address or enable TSI
sudo nano /etc/influxdb/influxdb.conf

# Restart the service after making config changes
sudo systemctl restart influxdb

# Verify the HTTP API responds (expect HTTP/1.1 204 No Content)
curl -I http://localhost:8086/ping
```

Error Medic Editorial

A collective of senior SREs, DevOps engineers, and database administrators dedicated to untangling complex infrastructure issues and providing clear, actionable troubleshooting guides.
