Error Medic

Troubleshooting GitHub Actions Timeout: "The job was canceled because it exceeded the configured timeout"

Fix GitHub Actions timeout, out of memory (OOM), and runner offline errors. Learn to configure timeout-minutes, fix permission denied issues, and debug hangs.

Key Takeaways
  • GitHub Actions default job timeout is 360 minutes (6 hours); unconfigured jobs can hang and consume valuable action minutes before failing.
  • An Out of Memory (OOM) error (Exit Code 137) often manifests as a generic 'github actions failed' or sudden timeout without clear logs.
  • Interactive prompts (like npm login or apt-get without -y) are a leading cause of jobs hanging infinitely until they time out.
  • Always set explicit `timeout-minutes` at both the job and step levels to fail fast and save CI/CD budget (there is no workflow-level timeout key).
  • 'Permission denied' errors on repository tokens can cause silent retries that eventually look like network timeouts.
Common Fix Approaches for Hanging GitHub Actions
| Method | When to Use | Time to Fix | Risk Level |
| --- | --- | --- | --- |
| Set `timeout-minutes` | Preventing infinite loops or stuck dependency downloads from burning minutes. | 5 mins | Low |
| Increase runner size / adjust heap | Fixing 'github actions out of memory' or silent Exit Code 137 crashes. | 15 mins | Medium (increased cost) |
| Fix `permissions` block | Resolving 'github actions permission denied' during package pushes or cloning. | 10 mins | Low |
| Restart self-hosted runner service | When 'github actions runner offline' prevents jobs from picking up. | 20 mins | Medium |

Understanding the Error: Why is GitHub Actions Not Working?

When your CI/CD pipeline grinds to a halt, seeing an error like The job was canceled because it exceeded the configured timeout can be deeply frustrating. A timeout is rarely just a slow build—it is usually a symptom of underlying infrastructure constraints, unhandled exceptions, resource exhaustion, or a process waiting for interactive user input in a non-interactive environment.

By default, a GitHub-hosted runner will let a job run for 360 minutes (6 hours) before forcibly terminating it. If your workflow normally takes 5 minutes and suddenly runs for 6 hours, the problem is almost never a slow network connection; a process has deadlocked or is waiting on input that will never arrive.

Symptom 1: The Infinite Hang (Interactive Prompts and Deadlocks)

The most common reason a GitHub Action times out is that a background process is waiting for standard input (stdin). Because CI environments are headless (non-interactive), the prompt waits forever.

Common culprits include:

  • Running apt-get install without the -y flag.
  • SSH/SCP commands prompting for host key verification (no `-o StrictHostKeyChecking=accept-new` set).
  • Package managers (npm, yarn, pip) prompting for authentication because of a github actions permission denied scenario when reading a private registry.
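One way to harden a shell step against these prompts is to force non-interactive behavior up front. A minimal sketch — the package name and SSH target below are illustrative, not from any specific workflow:

```shell
# Force non-interactive behavior in a CI shell step (illustrative commands)
export DEBIAN_FRONTEND=noninteractive  # apt never opens a config dialog
export GIT_TERMINAL_PROMPT=0           # git fails fast instead of prompting

# -y answers "yes" automatically; --no-install-recommends keeps installs lean
sudo apt-get update && sudo apt-get install -y --no-install-recommends jq

# Auto-accept unseen host keys; BatchMode makes ssh fail instead of prompting
ssh -o StrictHostKeyChecking=accept-new -o BatchMode=yes deploy@example.com 'true'
```

The exports at the top are the important part: set them once at the start of a step and every subsequent command in that step inherits them.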

Symptom 2: Out of Memory (Exit Code 137)

Sometimes, what appears as a timeout or a generic github actions failed message is actually a fatal memory crash. GitHub's standard Linux runners come with 7GB of RAM. Memory-intensive tasks like Webpack builds, compiling large Rust/C++ projects, or running comprehensive end-to-end test suites can easily exceed this limit.

When the Linux kernel OOM (Out of Memory) killer terminates your process, it usually exits with Code 137. However, if this happens inside a container or a suppressed sub-shell, the runner might hang, attempting to recover or failing to flush the logs, eventually leading to a timeout.

Symptom 3: Network & Auth Blocks (Permission Denied)

If you see github actions permission denied in your logs right before a job stalls, you are likely dealing with restrictive GITHUB_TOKEN scopes. By default, new repositories may grant only read access to the standard token. If your script attempts to push to GitHub Packages (GHCR) or tag a release without an explicit `permissions:` block granting `packages: write` or `contents: write`, the CLI tool might enter an exponential-backoff retry loop. This loop continues silently until the workflow times out.

Symptom 4: The 'Runner Offline' Trap

If your workflow is queued but never starts, you might be facing a github actions runner offline issue. For GitHub-hosted runners, this usually points to a GitHub service incident (check githubstatus.com). For self-hosted runners, it means the runner application service (actions.runner.service) on your EC2 instance or Kubernetes pod has crashed, lost network connectivity, or its registration token has expired.
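You can confirm what GitHub thinks of your self-hosted runners from the API before SSHing anywhere. A sketch using the gh CLI — OWNER/REPO are placeholders, and the call requires admin access to the repository:

```shell
# Lists each registered self-hosted runner and whether GitHub sees it as online
gh api repos/OWNER/REPO/actions/runners --jq '.runners[] | "\(.name): \(.status)"'
```

A runner reported as `offline` here but running locally almost always means a connectivity or token problem rather than a crashed service.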


Step 1: Diagnose the Root Cause

Before blindly throwing larger runners at the problem, you need to identify exactly where the workflow is getting stuck.

1. Enable Step Debug Logging

To get granular insights into what GitHub Actions is doing behind the scenes, enable runner diagnostic logging and step debug logging. You can do this by setting repository secrets:

  • ACTIONS_RUNNER_DEBUG to true
  • ACTIONS_STEP_DEBUG to true

Once re-run, your logs will include trace-level outputs, showing you exactly which network call or file operation caused the freeze.
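If you prefer the CLI to the web UI, the same secrets can be set with gh (this assumes an authenticated gh session scoped to the repository):

```shell
# Enable verbose runner and step logging for subsequent workflow runs
gh secret set ACTIONS_RUNNER_DEBUG --body true
gh secret set ACTIONS_STEP_DEBUG --body true
```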

2. Inspect Memory Usage Dynamically

If you suspect a github actions out of memory condition, add a background memory monitor to your workflow step to log resource usage right up until the crash.
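Wired into a workflow, such a monitor might look like the following sketch (the step names and 30-second interval are illustrative; the backgrounded subshell is killed when the job ends):

```yaml
      - name: Start background memory monitor
        run: |
          (while true; do date; free -m; sleep 30; done) &
      - name: Build
        run: npm run build  # the step you suspect of exhausting memory
```

When the job dies, the last few `free -m` lines in the log tell you whether available memory was collapsing toward zero right before the crash.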

3. SSH into the Runner

A powerful troubleshooting technique is using an action like mxschmitt/action-tmate. This allows you to SSH directly into the GitHub runner while the job is executing. You can run htop, check network configurations, and execute the failing command manually to see if it prompts for input.
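A sketch of a tmate debug step, gated so it only opens a session after something has already failed — pair it with a job-level `timeout-minutes` so an idle session cannot burn six hours:

```yaml
      - name: Debug over SSH
        if: failure()
        uses: mxschmitt/action-tmate@v3
        with:
          limit-access-to-actor: true  # only the user who triggered the run may connect
```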


Step 2: Fix and Optimize

Fix 1: Implement Fail-Fast Timeouts

Never rely on the default 360-minute limit. Protect your organization's action minutes by explicitly setting timeout-minutes on every job, and on individual long-running steps where it helps. (There is no workflow-level timeout-minutes key; the setting exists only at the job and step levels.) If your job normally takes 10 minutes, set the timeout to 15.

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15 # Fails fast if the job is stuck
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm ci && npm run build
        timeout-minutes: 10 # Optional: tighter limit on the slowest step

Fix 2: Resolve Out of Memory Errors

If your Node.js build is failing with OOM, increase the V8 heap space limit using an environment variable: env: NODE_OPTIONS: "--max_old_space_size=6144" (Allocates ~6GB of the 7GB available on standard runners).
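Placed in workflow YAML, that looks like the sketch below. The 6144 figure assumes a standard 7GB runner; leave headroom for the OS and any sibling processes, and scale it up if you move to a larger runner:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      NODE_OPTIONS: "--max_old_space_size=6144"  # ~6GB V8 heap
```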

If you are running Docker compose clusters within the runner, consider upgrading to GitHub's Larger Runners (e.g., 16-core, 64GB RAM) or optimizing your Dockerfile to use multi-stage builds, which drastically reduces memory footprint during image generation.

Fix 3: Fix Permission Denied & Auth Loops

Always explicitly define the permissions your GITHUB_TOKEN needs. If a step hangs while interacting with the GitHub API or a package registry, ensure your YAML includes:

permissions:
  contents: read
  packages: write
  pull-requests: write

Additionally, ensure all CLI commands use non-interactive flags. Replace `apt-get install tree` with `DEBIAN_FRONTEND=noninteractive apt-get install -y tree`.

Fix 4: Recovering Offline Self-Hosted Runners

If a self-hosted runner shows as offline, SSH into your host machine and check the service status. Usually, restarting the runner service or re-authenticating the runner token restores connectivity: sudo ./svc.sh status followed by sudo ./svc.sh restart. If the machine is behind a corporate firewall, ensure outbound traffic on port 443 to github.com and *.actions.githubusercontent.com is allowed, as dropped WebSocket connections will cause the runner to appear offline.
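A quick outbound check to run on the host — the domain list below is abbreviated and may drift, so verify it against GitHub's current self-hosted runner networking requirements:

```shell
# Probe the key GitHub endpoints a runner needs over port 443
for host in github.com api.github.com pipelines.actions.githubusercontent.com; do
  curl -sS -o /dev/null -w "$host: %{http_code}\n" "https://$host" \
    || echo "$host: unreachable"
done
```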

Appendix: Diagnostic and Recovery Snippets

bash
# --- Diagnostic snippet: Run this in a step before your build --- 
# Monitor memory usage in the background to detect OOM leaks leading to timeouts
(while true; do 
  echo "=== Memory Usage ===" 
  free -m 
  echo "=== Top Processes ==="
  top -b -n 1 | head -n 15
  sleep 30 
done) &

# --- Example of forcing non-interactive execution to prevent hangs ---
export DEBIAN_FRONTEND=noninteractive
sudo apt-get update && sudo apt-get install -y --no-install-recommends postgresql-client

# --- Self-hosted runner recovery commands ---
# Run these on your self-hosted instance if it shows as 'Offline'
cd ~/actions-runner
sudo ./svc.sh status
sudo ./svc.sh restart

Error Medic Editorial

Error Medic Editorial is composed of Senior DevOps and SRE professionals dedicated to untangling complex CI/CD pipelines, cloud infrastructure errors, and deployment bottlenecks.
