Troubleshooting 'ansible failed': Connection Refused, Permission Denied, and Timeouts
Learn how to quickly diagnose and fix common Ansible failures, including connection refused, permission denied, and SSH timeout errors, with our comprehensive guide.
- Connection refused usually indicates that the SSH service is down, a firewall is actively rejecting the connection, or SSH is listening on a non-standard port.
- Permission denied errors typically stem from incorrect SSH keys, wrong remote user, or missing sudo privileges.
- Timeout errors most often mean a firewall is silently dropping packets, but network latency, overloaded hosts, large file transfers, or misconfigured SSH keepalive settings can also cause them.
- Always use the -vvv or -vvvv flags with ansible-playbook to reveal the exact SSH command and failure point.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Update inventory with specific `ansible_port` | Target server uses a non-standard SSH port | 2 mins | Low |
| Configure SSH Key-Based Authentication | Encountering 'Permission denied (publickey)' | 5 mins | Low |
| Adjust `timeout` in ansible.cfg | SSH connections time out on slow networks or heavily loaded hosts | 2 mins | Low |
| Use `become: yes` and `become_user` | Task requires root/sudo privileges | 2 mins | Medium |
Understanding the Error
When running an Ansible playbook or ad-hoc command, seeing `fatal: [hostname]: UNREACHABLE!` (or `FAILED!`) is an inevitable rite of passage for any DevOps engineer. By default, Ansible relies on SSH to connect to remote Linux nodes, execute modules, and return the results. When this process breaks down, it usually manifests in one of three ways: Connection refused, Permission denied, or a Timeout.
Because Ansible abstracts away the underlying SSH connections, diagnosing the exact cause requires peering under the hood. The failure can occur at various stages: during the initial TCP handshake, during SSH authentication, during privilege escalation (sudo), or while waiting for a long-running module to complete.
Here are the most common error manifestations:
1. Connection Refused
fatal: [webserver01]: UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.50 port 22: Connection refused",
"unreachable": true
}
2. Permission Denied
fatal: [dbserver01]: UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: user@192.168.1.60: Permission denied (publickey,password).",
"unreachable": true
}
3. Timeout
fatal: [appserver01]: UNREACHABLE! => {
"changed": false,
"msg": "Data could not be sent to remote host \"192.168.1.70\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.1.70 port 22: Operation timed out",
"unreachable": true
}
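When several hosts fail at once, it can help to sort each message into one of these three buckets before digging in. The helper below is a minimal sketch rather than anything shipped with Ansible: `classify_ansible_error` is a hypothetical function name, and its patterns assume the OpenSSH client wording shown in the three examples above, so other platforms or connection plugins may need extra patterns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the "msg" text of an UNREACHABLE result to a category.
# Patterns assume OpenSSH client wording; extend them for other platforms.
classify_ansible_error() {
  local msg=$1
  case $msg in
    *"Connection refused"*) echo "refused" ;;     # see Step 2
    *"Permission denied"*)  echo "permission" ;;  # see Step 3
    *"timed out"*)          echo "timeout" ;;     # "Operation/Connection timed out", Step 4
    *)                      echo "unknown" ;;     # rerun with -vvv (Step 1)
  esac
}

# Example:
#   classify_ansible_error 'ssh: connect to host 192.168.1.50 port 22: Connection refused'
#   prints: refused
```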
Let's break down how to diagnose and fix each of these scenarios systematically.
Step 1: Diagnose with Verbosity
The most powerful tool in your Ansible troubleshooting arsenal is the verbosity flag. Appending `-vvv` or `-vvvv` to your command makes Ansible print the exact SSH command it is trying to execute; at `-vvvv` it also includes the SSH client's own detailed debug output.
ansible-playbook site.yml -i inventory.ini -vvv
Look closely at the output. You will see a line starting with `<hostname> ESTABLISH SSH CONNECTION FOR USER: <username>`. It is followed by the actual ssh command, on a line marked `SSH: EXEC`, including all the arguments Ansible is passing (such as ControlMaster settings, specific identity files, and strict host key checking parameters). Copying this exact SSH command and running it manually from your terminal is the quickest way to tell whether the problem lies in Ansible's configuration or in the underlying network or SSH setup.
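Verbose runs of a large play produce a lot of output. Below is a minimal sketch for pulling out just the SSH command lines, assuming your Ansible version prints them with the `<host> SSH: EXEC` prefix used by the `ssh` connection plugin; `extract_ssh_exec` is a hypothetical helper name, and the pattern may need adjusting for other versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read verbose Ansible output on stdin and print only the
# ssh command lines, with the "<host> SSH: EXEC " prefix stripped.
extract_ssh_exec() {
  sed -n 's/^<[^>]*> SSH: EXEC //p'
}

# Usage:
#   ansible-playbook site.yml -i inventory.ini -vvv 2>&1 | extract_ssh_exec
```

The extracted command typically ends with the remote `/bin/sh -c '...'` payload Ansible wanted to run. When reproducing a connection problem, you can drop that trailing part and check only whether the login itself succeeds.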
Step 2: Fixing 'Connection Refused'
A 'Connection refused' error means that the control node's connection attempt reached the target (or a device in front of it) and was actively rejected on the specified port, typically because nothing is listening there. This is different from a timeout, where packets are dropped silently and no reply ever comes back.
Root Causes & Solutions:
- SSH Service is Not Running: The most obvious cause. The target machine might have rebooted and the `sshd` service failed to start.
  - Fix: You will need out-of-band access (such as your cloud provider's web or serial console, or the VMware vSphere console) to log in and start the service with `sudo systemctl start sshd`, then make sure it starts on boot with `sudo systemctl enable sshd`. On Debian and Ubuntu the unit is named `ssh` instead of `sshd`.
- Wrong Port: Is the SSH daemon listening on a non-standard port (e.g., 2222) for security reasons while Ansible is trying the default port 22?
  - Fix: Update your inventory file to specify the correct port using the `ansible_port` variable:

        [webservers]
        web01 ansible_host=192.168.1.50 ansible_port=2222

- Firewall Blocking (with REJECT): While firewalls usually DROP packets (causing timeouts), some are configured to actively REJECT connections, replying with a TCP RST or an ICMP port-unreachable message. Check `iptables`, `ufw`, or `firewalld` on the target machine, as well as any network firewalls in between. Cloud security groups such as AWS Security Groups silently drop disallowed traffic, so they typically show up as a timeout instead.
  - Fix: Ensure port 22 (or your custom port) is open to the IP address of your Ansible control node.
Step 3: Fixing 'Permission Denied'
This error occurs when the TCP connection is successful, but the SSH authentication phase fails. The server doesn't recognize the credentials you are providing.
Root Causes & Solutions:
- Wrong Remote User: By default, Ansible attempts to log in using the same username as the user running the `ansible` command on the control node. If you are logged in locally as `alice` but need to log in as `ubuntu` on the remote server, authentication will fail.
  - Fix: Specify the correct remote user in your inventory, playbook, or command line.
    - Inventory: `web01 ansible_user=ubuntu`
    - Playbook: `remote_user: ubuntu` at the play level
    - Command line: `ansible-playbook site.yml -u ubuntu`
- Missing or Incorrect SSH Key: The SSH server is configured for public key authentication, but your control node isn't offering the correct private key, or the matching public key isn't in the remote user's `~/.ssh/authorized_keys` file.
  - Fix:
    - Ensure your SSH agent is running and has the key loaded: `ssh-add ~/.ssh/id_rsa`.
    - Tell Ansible explicitly which key to use in the inventory: `web01 ansible_ssh_private_key_file=~/.ssh/custom_key.pem`.
    - Verify that the remote `authorized_keys` file has mode 600 and the `~/.ssh` directory has mode 700.
- Password Authentication Failed: If you are relying on passwords (not recommended for automation, but sometimes necessary), you might have mistyped it, or the server might have disabled password authentication via `PasswordAuthentication no` in `sshd_config`.
  - Fix: Run your playbook with the `--ask-pass` (`-k`) flag to be prompted for the SSH password, and make sure `sshpass` is installed on your control node.
- Privilege Escalation (sudo) Issues: Sometimes the SSH connection succeeds, but a specific task fails with permission denied because it requires root access.
  - Fix: Set `become: yes` at the task or playbook level. If sudo requires a password, pass the `--ask-become-pass` (`-K`) flag.
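The permission requirements in the SSH key fix above can be applied in one step. A minimal sketch, assuming it runs on the target host as the remote user (for example from the out-of-band console) and that sshd's `StrictModes` check is in effect, which also rejects keys when the home directory itself is group- or world-writable; `fix_ssh_perms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run ON THE TARGET host as the remote user.
# fix_ssh_perms [HOME_DIR]   (defaults to the current user's $HOME)
fix_ssh_perms() {
  local home_dir=${1:-$HOME}
  chmod go-w "$home_dir" &&                   # home must not be group/world-writable
  chmod 700 "$home_dir/.ssh" &&               # ~/.ssh: owner-only access
  chmod 600 "$home_dir/.ssh/authorized_keys"  # authorized_keys: owner read/write
}

# If the public key is missing, install it from the control node
# (user, host and key path below are placeholders):
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub ubuntu@192.168.1.60
```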
Step 4: Fixing 'Timeout'
Timeouts are frustrating because they require waiting. They indicate that the control node sent a packet and never received a response within the configured window.
Root Causes & Solutions:
- Network Reachability/Firewall Drops: The most common cause. A router is misconfigured, a VPN has dropped, or a firewall (including a cloud security group) is silently dropping packets to port 22.
  - Fix: Use standard network troubleshooting tools. Try `ping <hostname>`, keeping in mind that some networks block ICMP. Use `nc -zv <hostname> 22` or `telnet <hostname> 22` to see whether the TCP port is reachable, and check routing tables and security groups.
- DNS Resolution Issues: Ansible might be trying to connect to a hostname that resolves to the wrong IP, or DNS resolution itself might be timing out.
  - Fix: Verify DNS resolution locally with `dig <hostname>` or `nslookup <hostname>`. Alternatively, put IP addresses directly in your inventory via `ansible_host` to bypass DNS entirely.
- Slow Connections or Long-Running Tasks: Sometimes the network is just slow, or the server is under heavy load and takes a long time to establish the SSH connection.
  - Fix: Increase the connection timeout. In your `ansible.cfg` file, raise the global `timeout` (the default is 10 seconds), or pass `--timeout 30` (`-T 30`) for a single run:

        [defaults]
        timeout = 30

    Ansible passes this value to ssh as `ConnectTimeout`, so raising `timeout` is usually enough. If you also override `ssh_args` (for example to set `ConnectTimeout` directly), remember that it replaces Ansible's defaults, so keep `-C -o ControlMaster=auto -o ControlPersist=60s`:

        [ssh_connection]
        ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ConnectTimeout=30

    If a specific task (like downloading a massive file or running a complex script) runs long enough for an idle firewall or NAT device to drop the connection, consider running it with Ansible's `async` and `poll` keywords, or adding SSH keepalives, rather than only raising timeouts.
- Stale SSH Multiplexing Sockets: Ansible uses SSH multiplexing (ControlMaster) by default to speed up connections. If a connection drops uncleanly, the local socket file can become stale, causing subsequent connection attempts to hang.
  - Fix: Clear the stale sockets. They are usually located in `~/.ansible/cp/`, and you can simply delete them: `rm -rf ~/.ansible/cp/*`.
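The timeout and keepalive settings above can be kept in a project-level config file. A minimal sketch, assuming you run `ansible-playbook` from the directory that holds the file; `write_timeout_cfg` is a hypothetical helper, and the 30-second timeout and `ServerAliveInterval=30` / `ServerAliveCountMax=3` keepalive values are illustrative choices, not recommended defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write DIR/ansible.cfg with a longer connection timeout
# and SSH keepalives. Usage: write_timeout_cfg [DIR] [SECONDS]
write_timeout_cfg() {
  local dir=${1:-.} secs=${2:-30}
  cat > "$dir/ansible.cfg" <<EOF
[defaults]
# Seconds to wait while establishing the SSH connection (Ansible default: 10)
timeout = ${secs}

[ssh_connection]
# Overriding ssh_args replaces Ansible's defaults, so -C and ControlMaster/ControlPersist are kept.
# ServerAliveInterval: send a keepalive after 30s of silence; ServerAliveCountMax: give up after 3 misses.
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=3
EOF
}

# Usage (from the directory you run ansible-playbook in):
#   write_timeout_cfg . 30
#   rm -rf ~/.ansible/cp/*    # also clear any stale ControlMaster sockets
```

Ansible only reads an `ansible.cfg` from the current directory if that directory is not world-writable; otherwise, point the `ANSIBLE_CONFIG` environment variable at the file.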
Advanced Debugging: ANSIBLE_DEBUG
If -vvvv still doesn't give you enough information, you can turn on full debugging output by setting the ANSIBLE_DEBUG environment variable before running your command:
ANSIBLE_DEBUG=1 ansible-playbook site.yml -i inventory.ini
This will output an immense amount of internal Ansible processing data, which can be overwhelming, but is invaluable for tracking down obscure bugs related to variable templating, plugin loading, or internal task execution flow that might be masquerading as connection failures.
Quick Reference: Diagnostic Commands
# 1. Run with maximum verbosity to see the exact SSH command failing
ansible all -i inventory.ini -m ping -vvvv
# 2. Test the raw SSH connection manually (replace with details from -vvvv output)
ssh -vvv -o ConnectTimeout=10 -i ~/.ssh/mykey.pem user@10.0.0.5
# 3. Test basic TCP port reachability (bypassing SSH protocol)
nc -zv 10.0.0.5 22
# 4. Clear potentially stale Ansible SSH multiplexing sockets
rm -rf ~/.ansible/cp/*
# 5. Temporarily disable host key checking for testing (export as env var)
export ANSIBLE_HOST_KEY_CHECKING=False
Error Medic Editorial
The Error Medic Editorial team consists of senior Site Reliability Engineers and DevOps practitioners dedicated to demystifying complex infrastructure issues. With decades of combined experience managing large-scale distributed systems, we provide actionable, production-ready solutions for the modern engineering workflow.