Error Medic

Fixing GitHub Actions Timeout and Runner Offline Errors

Comprehensive guide to troubleshooting GitHub Actions timeouts, out of memory errors, permission denied issues, and offline runners with practical fixes.

Key Takeaways
  • Timeouts are often caused by infinite loops in scripts, hung network requests, or waiting for interactive prompts.
  • Out of memory (OOM) errors frequently occur during heavy build processes (like Webpack or Maven) on standard GitHub-hosted runners.
  • Permission denied errors usually stem from incorrect GITHUB_TOKEN scopes or missing secrets.
  • Offline runners are typically self-hosted runner issues related to network connectivity, service crashes, or token expiration.
  • Use the 'timeout-minutes' keyword at the job or step level to prevent runaway workflows from consuming all your billing minutes.
Common Fixes for GitHub Actions Failures
| Issue | Primary Fix Approach | Time to Fix | Complexity |
|---|---|---|---|
| Job Timeout | Add timeout-minutes, fix hung scripts | 10-30 mins | Low |
| Out of Memory (137) | Optimize build, use larger runners, increase swap | 30-60 mins | Medium |
| Permission Denied (403) | Update permissions block in workflow YAML | 5-15 mins | Low |
| Runner Offline | Restart self-hosted runner service, check network | 15-45 mins | Medium |

Understanding GitHub Actions Failures

When a GitHub Actions workflow fails, it can halt your CI/CD pipeline, preventing deployments and blocking pull requests. The error messages can range from explicit exit codes to frustratingly vague timeouts. Understanding the root cause of these failures is critical for maintaining a reliable development velocity.

Diagnosing 'The job running on runner ... has exceeded the maximum execution time of 360 minutes.'

This is the classic GitHub Actions timeout error. By default, a job on a GitHub-hosted runner can run for up to 6 hours (360 minutes). If your job hits this limit, it's almost certainly stuck. Common culprits include:

  1. Hanging Network Requests: A script is waiting for a response from an external API or database that is down or unreachable, and there is no timeout configured on the request itself.
  2. Interactive Prompts: A CLI tool is asking for user input (e.g., 'Are you sure you want to proceed? [y/N]') and waiting indefinitely because there is no TTY.
  3. Infinite Loops: A bug in your test suite or build script has caused an infinite loop.

The Fix: First, always set a reasonable timeout-minutes on your jobs and even individual steps. If a typical build takes 10 minutes, set the timeout to 20. This fails fast and saves money. Second, review the logs right before the timeout. Look for commands that might prompt for input and add flags like --non-interactive, -y, or --quiet.
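As a sketch, both fixes can be combined in one workflow (the 20-minute budget, the URL, and the tool names are illustrative, not prescriptive):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20      # job-level cap: fail fast instead of hanging for 6 hours
    steps:
      - name: Fetch dependencies
        timeout-minutes: 5   # step-level cap on the piece most likely to hang
        # --max-time bounds the request itself so curl cannot wait forever
        run: curl --fail --max-time 60 -O https://example.com/deps.tar.gz
      - name: Install
        # -y / --non-interactive style flags stop a TTY-less prompt from blocking
        run: npm ci --no-audit
```

Step-level timeouts are worth the extra lines: when a step fails by timeout, the log points directly at the culprit instead of at the whole job.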

Handling 'Process completed with exit code 137' (Out of Memory)

Exit code 137 typically means the process was killed by the OOM (Out Of Memory) killer. Standard GitHub-hosted Linux runners provide 7 GB of RAM (16 GB on the 4-vCPU runners used for public repositories). Memory-intensive tasks such as compiling large C++ projects, running heavy Java builds, or complex Node.js bundling can easily exceed this.
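The number 137 itself is informative: it is 128 plus signal 9 (SIGKILL), the signal the kernel's OOM killer delivers. A quick local demonstration:

```shell
#!/bin/sh
# Exit code 137 = 128 + 9: the process was killed with SIGKILL,
# which is exactly what the kernel OOM killer sends.
sh -c 'kill -9 $$'            # simulate an OOM kill of a child process
echo "child exit code: $?"    # prints "child exit code: 137"
```

Any 137 in your logs therefore means "killed from outside", and on a memory-constrained runner the OOM killer is the usual suspect.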

The Fix:

  1. Optimize the Build: Can you run tests in parallel on different jobs rather than sequentially in one? Can you reduce the number of workers your test runner uses?
  2. Increase Swap Space: You can artificially increase available memory by creating a swap file before your build step.
  3. Larger Runners: If optimization isn't enough, consider upgrading to GitHub's Larger Runners (which offer up to 64-core and 256GB RAM options) or using a beefy self-hosted runner.
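A low-effort sketch of options 1 and 3, assuming a Node.js build (the heap size and worker count are illustrative; larger-runner labels are defined per organization):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest   # or an org-defined larger-runner label
    env:
      # Cap the Node heap below the runner's RAM so failures are explicit
      NODE_OPTIONS: --max-old-space-size=4096
    steps:
      - uses: actions/checkout@v4
      - name: Test with fewer parallel workers
        run: npx jest --maxWorkers=2   # lower peak memory than the default
```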

Resolving 'Permission denied' or '403 Forbidden'

These errors usually occur when your workflow tries to perform an action it isn't authorized for, such as pushing a tag, creating a release, or authenticating with a cloud provider.

The Fix: Ensure the GITHUB_TOKEN has the correct scopes. By default, the token permissions might be restricted to read-only depending on your repository or organization settings. Explicitly define the required permissions in your workflow YAML:

```yaml
permissions:
  contents: write
  pull-requests: read
```

If you are accessing external services (like AWS, GCP, or Azure), ensure your secrets are correctly populated and that you are using OIDC (OpenID Connect) for authentication where possible, rather than long-lived credentials.
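For AWS, for example, OIDC looks roughly like this (the role ARN is a placeholder for a role in your account that trusts GitHub's OIDC provider; `aws-actions/configure-aws-credentials` is AWS's official action):

```yaml
permissions:
  id-token: write   # required for the runner to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Placeholder ARN: replace with your own deploy role
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy
          aws-region: us-east-1
```

With this in place, no long-lived AWS access keys need to live in repository secrets at all.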

Troubleshooting 'Offline' Self-Hosted Runners

If your workflow is queued indefinitely and your self-hosted runner shows as 'Offline' in the GitHub UI, the runner process on your host machine has likely stopped communicating with GitHub.

The Fix:

  1. Check the Service: Ensure the runner service is running on the host machine (systemctl status actions.runner.*).
  2. Review Runner Logs: Inspect the diagnostic logs located in the _diag folder within the runner application directory. Look for network timeouts or SSL errors.
  3. Network Configuration: Verify that the host machine can reach github.com and api.github.com on HTTPS (port 443).
  4. Re-configuration: In rare cases, the runner token may have expired or become corrupted. You may need to remove the runner from the GitHub UI and reconfigure it on the host.
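The first and third checks above can be scripted; a minimal triage sketch, assuming a systemd-managed runner and `curl` on the host:

```shell
#!/bin/sh
# Quick triage for an offline self-hosted runner host.

# 1. Is the runner service alive? (the name pattern is what ./svc.sh installs)
systemctl status 'actions.runner.*' --no-pager 2>/dev/null \
  || echo "runner service not found or stopped"

# 2. Can we reach GitHub over HTTPS? Hard timeouts so the check itself can't hang.
check_endpoint() {
  curl -sSf --connect-timeout 5 --max-time 10 -o /dev/null "https://$1" \
    && echo "$1 reachable" || echo "$1 UNREACHABLE"
}
check_endpoint github.com
check_endpoint api.github.com
```

If both endpoints are reachable but the runner is still offline, move on to the `_diag` logs and, as a last resort, re-registration.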

Putting It All Together: Example Workflow

```yaml
# Example of fixing timeout and permission issues in workflow.yml

name: CI Build

on: [push]

# Fix 1: Explicitly grant needed permissions to avoid 403s
permissions:
  contents: read
  packages: write

jobs:
  build:
    runs-on: ubuntu-latest
    # Fix 2: Prevent 6-hour hangs by setting a realistic timeout
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      # Fix 3: Handle OOM issues by adding swap space (if needed)
      - name: Create Swap Space
        run: |
          sudo fallocate -l 4G /swapfile
          sudo chmod 600 /swapfile
          sudo mkswap /swapfile
          sudo swapon /swapfile
          free -h

      - name: Run Build
        # Fix 4: Ensure non-interactive mode for tools that might prompt
        run: npm ci --prefer-offline --no-audit && npm run build
```

Error Medic Editorial

Error Medic Editorial is a team of veteran Site Reliability Engineers and DevOps practitioners dedicated to demystifying complex CI/CD failures and providing actionable solutions.
