Error Medic

Troubleshooting 'Datadog Not Working': A Complete Diagnostic Guide

Fix a Datadog Agent that is not working or not reporting metrics. A step-by-step guide to diagnosing connection issues, API key errors, and integration failures.

Key Takeaways
  • Verify the Datadog Agent is running and the service is not in a failed state.
  • Check API and App keys for validity, ensuring they are correctly configured in datadog.yaml.
  • Ensure network connectivity to Datadog endpoints on port 443 (outbound).
  • Inspect agent logs for authentication errors or integration-specific failures.
Common Datadog Fix Approaches Compared
Method          | When to Use                                    | Time    | Risk
Restart Agent   | Agent is hung or unresponsive                  | 1 min   | Low
Network Test    | Agent cannot reach Datadog intake              | 5 mins  | Low
Update Keys     | Authentication errors in agent logs            | 2 mins  | Low
Reinstall Agent | Corrupted installation or persistent crashes   | 10 mins | Medium

Understanding the Error

When developers or operations teams report that "Datadog is not working," the symptoms can vary widely. It might mean that host metrics are missing, APM traces are not appearing, or a specific integration is failing to collect data. The Datadog Agent is the core component responsible for collecting and forwarding this data. Therefore, troubleshooting almost always begins at the agent level.

Common error messages you might encounter in the Datadog agent logs include:

  • Error connecting to Datadog intake: dial tcp: lookup intake.logs.datadoghq.com: no such host (Network issue)
  • API Key invalid or HTTP 403 Forbidden (Authentication issue)
  • Agent failed to start: datadog.yaml is missing or invalid (Configuration issue)
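As a rough triage aid, the three sample messages above can be mapped to their categories with a small shell function. This is a minimal sketch: the function name `classify_agent_error` is hypothetical, and the match patterns are assumptions taken only from the sample messages, so real log lines may need additional patterns.

```bash
#!/usr/bin/env bash
# Classify one agent log line into a rough problem category.
# Patterns are assumptions derived from the sample messages in this guide.
classify_agent_error() {
  local line="$1"
  case "$line" in
    *"no such host"*|*"dial tcp"*)
      echo "network" ;;
    *"API Key invalid"*|*"403"*)
      echo "authentication" ;;
    *"datadog.yaml"*)
      echo "configuration" ;;
    *)
      echo "unknown" ;;
  esac
}
```

Piping recent log lines through the function one at a time and then through `sort | uniq -c` gives a quick count per category.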

Step 1: Diagnose the Agent Status

The first step is always to check the status of the Datadog Agent on the affected host. The agent provides a built-in status command that runs a comprehensive health check.

Run sudo datadog-agent status and look for the following sections in the output:

  • Agent (v7.x.x): Ensure the version is up-to-date and the process is running.
  • Collector: This section lists all configured integrations. Look for integrations marked with [ERROR].
  • Forwarder: Check for dropped payloads or connection errors. If the forwarder is dropping data, it usually indicates a network or API key problem.
  • Endpoints: Verify that the agent is trying to connect to the correct Datadog site (e.g., datadoghq.com vs datadoghq.eu).
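The status output is long, so it can help to filter it down to the lines that flag a problem. The sketch below reads status text on standard input; the function name `status_problems` is hypothetical, and it assumes that warnings are tagged `[WARNING]` in the same way that failing integrations are tagged `[ERROR]`.

```bash
#!/usr/bin/env bash
# Print only the status lines that flag a problem.
# Assumes "[ERROR]" marks failing checks (as described above) and,
# as a further assumption, that warnings are tagged "[WARNING]".
status_problems() {
  # grep exits non-zero when nothing matches; treat that as success.
  grep -E '\[(ERROR|WARNING)\]' || true
}
```

A typical invocation would be `sudo datadog-agent status | status_problems`. Empty output means no check was flagged, not that data is necessarily reaching Datadog, so the Forwarder section still needs a manual look.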

Step 2: Verify Network Connectivity

The Datadog Agent requires outbound internet access to send data to Datadog's servers. By default, it communicates over HTTPS (port 443).

If the agent status shows forwarder errors, test connectivity from the host to the Datadog intake endpoints. You can use tools like curl or telnet.

Ensure that your firewalls, security groups, or proxies are not blocking outbound traffic to Datadog's IP ranges.
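A connectivity test along these lines can be wrapped in two small functions. This is a sketch under assumptions: the names `dd_api_url` and `dd_can_reach` are hypothetical, and it assumes the API host for a site follows the `api.<site>` pattern, as it does for api.datadoghq.com.

```bash
#!/usr/bin/env bash
# Build the API base URL for a Datadog site (defaults to the US site).
# Assumption: the API host is "api." followed by the site domain.
dd_api_url() {
  local site="${1:-datadoghq.com}"
  printf 'https://api.%s\n' "$site"
}

# Return 0 if an HTTPS request to the site completes within 10 seconds.
# Any HTTP response counts as reachable; DNS, TCP, or TLS failures do not.
dd_can_reach() {
  curl --silent --show-error --output /dev/null --max-time 10 \
    "$(dd_api_url "${1:-datadoghq.com}")"
}
```

For example, `dd_can_reach datadoghq.eu && echo reachable` exercises DNS resolution, the outbound port 443 path, and the TLS handshake in one step. It does not validate the API key.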

Step 3: Check Configuration and Authentication

If the network is fine but data is still not appearing, the issue is likely with authentication or configuration.

  1. Check datadog.yaml: The main configuration file is usually located at /etc/datadog-agent/datadog.yaml (Linux) or C:\ProgramData\Datadog\datadog.yaml (Windows). Ensure it exists and is readable by the dd-agent user.
  2. Verify API Key: The most critical setting is the api_key. Ensure it is correct and active in your Datadog account. Do not confuse the API key with the Application Key.
  3. Verify the Site: Ensure the site parameter matches your Datadog region (e.g., datadoghq.com, datadoghq.eu, us3.datadoghq.com, us5.datadoghq.com, ap1.datadoghq.com). Using the wrong site results in authentication failures, because an API key is only valid on the site that hosts its organization.
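The checks above can be partly scripted. The sketch below pulls `api_key` and `site` out of the configuration file and applies a rough shape test to the key. The function names are hypothetical, the parser only handles simple top-level `key: value` lines, and the 32-hex-character shape is an assumption about typical API keys (application keys are longer), so a failing shape test is a hint rather than proof.

```bash
#!/usr/bin/env bash
# Print the value of a simple top-level key (e.g. api_key, site) from datadog.yaml.
# Not a general YAML parser: commented-out lines and nested keys are ignored.
dd_yaml_value() {
  local key="$1" file="${2:-/etc/datadog-agent/datadog.yaml}"
  sed -n -E "s/^${key}:[[:space:]]*([^[:space:]#]+).*$/\1/p" "$file" \
    | head -n 1 | tr -d "\"'"
}

# Return 0 if the string is 32 lowercase hex characters.
# Assumption: this is the usual API key shape; application keys are longer.
looks_like_api_key() {
  [[ "$1" =~ ^[0-9a-f]{32}$ ]]
}
```

To confirm a key is actually active, Datadog's API exposes a validation endpoint: send a GET request to /api/v1/validate on the API host for your site with the key in a DD-API-KEY header. Reading datadog.yaml usually requires sudo.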

Step 4: Inspect Agent Logs

If the status command doesn't reveal the root cause, dive into the agent logs. The main log file is typically /var/log/datadog/agent.log. Look for ERROR or WARN level messages.

Common log locations:

  • Agent: /var/log/datadog/agent.log
  • Trace Agent: /var/log/datadog/trace-agent.log (for APM issues)
  • Process Agent: /var/log/datadog/process-agent.log (for Live Process issues)
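To see at a glance which of these logs deserves attention, a short function can count ERROR and WARN lines per file. This is a sketch: the name `dd_log_summary` is hypothetical, it matches the level names as plain substrings, and it silently skips files that are missing or unreadable by the current user.

```bash
#!/usr/bin/env bash
# Count ERROR and WARN lines in each agent log file that is readable.
# With no arguments, uses the default Linux log paths listed above.
dd_log_summary() {
  local f errors warns
  if [ "$#" -eq 0 ]; then
    set -- /var/log/datadog/agent.log \
           /var/log/datadog/trace-agent.log \
           /var/log/datadog/process-agent.log
  fi
  for f in "$@"; do
    [ -r "$f" ] || continue
    # grep -c prints 0 but exits non-zero when nothing matches.
    errors="$(grep -c 'ERROR' "$f" || true)"
    warns="$(grep -c 'WARN' "$f" || true)"
    printf '%s errors=%s warnings=%s\n' "$f" "$errors" "$warns"
  done
}
```

Because the function skips unreadable files, empty output usually means it needs to run from a shell with permission to read /var/log/datadog.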

Step 5: Address Integration-Specific Issues

If core metrics are reporting but a specific integration (e.g., PostgreSQL, Nginx, Kubernetes) is not working:

  1. Check the integration configuration file in /etc/datadog-agent/conf.d/<integration_name>.d/conf.yaml.
  2. Ensure the Datadog agent has the necessary permissions to access the service it is monitoring (e.g., database user permissions, file read permissions for logs).
  3. Run the integration on its own to see detailed errors, using the agent's check subcommand: sudo -u dd-agent datadog-agent check <integration_name>.
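The first and third steps can be combined into one helper that confirms the configuration file exists before running the check. This is a sketch: the function names are hypothetical, and the path assumes the default Linux conf.d layout given above.

```bash
#!/usr/bin/env bash
# Path of an integration's config file under the default Linux conf.d directory.
dd_integration_conf() {
  printf '/etc/datadog-agent/conf.d/%s.d/conf.yaml\n' "$1"
}

# Run one integration check once, but only if its config file exists.
dd_run_check() {
  local name="$1" conf
  conf="$(dd_integration_conf "$name")"
  if [ ! -f "$conf" ]; then
    echo "missing config: $conf" >&2
    return 1
  fi
  # Run as the agent's own user so permission problems show up as they would in production.
  sudo -u dd-agent datadog-agent check "$name"
}
```

For example, `dd_run_check postgres` either reports the missing file or prints the check's collected metrics and any errors.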


```bash
# 1. Check Datadog Agent Service Status (Systemd)
sudo systemctl status datadog-agent

# 2. Run the comprehensive Datadog Agent status command
sudo datadog-agent status

# 3. Test network connectivity to Datadog US site
curl -v https://api.datadoghq.com

# 4. View recent agent logs for errors
sudo tail -n 50 /var/log/datadog/agent.log | grep -i error

# 5. Restart the Datadog Agent after configuration changes
sudo systemctl restart datadog-agent
```

DevOps Troubleshooting Team

A collective of senior SREs and DevOps engineers dedicated to solving complex infrastructure and monitoring challenges.
