Error Medic

GitHub Actions Timeout, Permission Denied & Runner Offline: Complete Troubleshooting Guide

Fix GitHub Actions timeout, out-of-memory, permission denied, and runner offline errors with step-by-step commands and YAML fixes.

Key Takeaways
  • Job timeouts are caused by hung processes, missing -y flags, or no explicit timeout-minutes set—the default is 6 hours for GitHub-hosted runners
  • Out-of-memory kills produce exit code 137 or a silent SIGKILL; fix by raising NODE_OPTIONS/JVM heap, splitting jobs, or upgrading to a larger runner
  • Permission denied errors stem from insufficient GITHUB_TOKEN scopes—add an explicit permissions block to the workflow or job
  • Runner offline means the self-hosted runner service crashed, the registration token expired, or the host lost network connectivity to GitHub
  • Enable ACTIONS_RUNNER_DEBUG=true and ACTIONS_STEP_DEBUG=true secrets for verbose logs when the root cause is unclear
Fix Approaches Compared
| Method | When to Use | Time to Apply | Risk |
| --- | --- | --- | --- |
| Set explicit timeout-minutes | Job hangs indefinitely or exceeds 6 h default | < 2 min | Low — only cancels jobs sooner |
| Add npm ci / apt-get -y flags | Hanging on package install prompts | < 5 min | Low — idempotent change |
| Add actions/cache | Slow builds due to repeated dependency downloads | 15–30 min | Low — cache miss falls back to fresh install |
| Set NODE_OPTIONS / MAVEN_OPTS | Node or JVM OOM kill (exit 137) | < 5 min | Low — tuning only |
| Upgrade to larger GitHub-hosted runner | Legitimate memory ceiling on standard runners | < 5 min | Medium — cost increase |
| Add permissions block to workflow | Resource not accessible by integration | < 5 min | Low — additive change |
| Store PAT as secret, use in checkout | Cross-repo push or advanced scopes needed | 10–20 min | Medium — PAT must be rotated |
| Restart self-hosted runner service | Runner shows Offline in repo settings | 5 min | Low — service restart |
| Re-register self-hosted runner | Runner token expired or runner corrupted | 10 min | Low — old runner entry is removed |
| Enable tmate debug session | Cannot reproduce failure locally | 15 min | Low — SSH session is ephemeral |

Understanding GitHub Actions Failures: Timeout, OOM, Permission Denied & More

GitHub Actions workflows fail for a surprisingly small set of root causes. Learning to read the error signature quickly saves hours of trial-and-error. This guide maps each error message to a concrete fix.


Recognise the Error Before You Fix It

Each failure class has a distinct fingerprint in the job logs:

Timeout:

##[error]The job running on runner GitHub Actions X has exceeded the maximum execution time of 360 minutes.
Error: The operation was canceled.

Out of Memory (OOM):

Killed
Process completed with exit code 137.

or a sudden silent job cancellation with no error text—the Linux OOM killer sent SIGKILL.

Permission Denied (GITHUB_TOKEN):

Error: Resource not accessible by integration
remote: Permission to org/repo.git denied to github-actions[bot].
Error: HttpError: GitHub Actions is not permitted to create or approve pull requests.

Runner Offline:

No runners are available to run the requested job.
##[error]This request has been automatically failed because there are no available runners online to process the request.

General failure (catch-all):

Process completed with exit code 1.
##[error]The process '/usr/bin/git' failed with exit code 128.

Step 1: Diagnose the Root Cause

Read the Raw Logs

The GitHub Actions UI collapses log lines. Always click gear icon → View raw logs on a failed run to see the untruncated output. For programmatic access:

gh run view <RUN_ID> --log-failed
Identify Which Timeout Layer Fired

GitHub Actions has two independent timeout controls:

  • Job-level timeout-minutes: default 360 minutes for GitHub-hosted runners, unlimited for self-hosted.
  • Step-level timeout-minutes: no default—steps run until the job timeout kills them.

If you see the 360-minute message, the job-level default fired. If a specific step message appears, a step-level timeout was set. Either way, trace back to the last log line before cancellation—that is the hanging operation.
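A quick way to surface that last operation from a saved raw log is to grep backwards from the cancellation marker. The log contents below are illustrative; `head -n -1` assumes GNU coreutils, as on ubuntu runners:

```shell
# Print the lines immediately before the cancellation marker in a raw
# log; the last of them is usually the operation that hung.
# (run.log contents are illustrative)
cat > run.log <<'EOF'
npm ci
added 1200 packages in 45s
npm test
> jest --watchAll
##[error]The operation was canceled.
EOF

# -B 5 keeps five lines of context before the match; head -n -1
# (GNU coreutils) drops the error line itself.
grep -B 5 '##\[error\]' run.log | head -n -1
```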

Confirm OOM vs Timeout

OOM exit code is 137 (128 + 9 = SIGKILL). Timeout cancellation typically shows the ##[error] timeout message. If the job vanishes with exit 137 and no timeout message, it is OOM.
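The 128 + signal-number convention is easy to reproduce locally: a process killed with SIGKILL (signal 9) reports exit status 137, the same code the OOM killer produces:

```shell
# A child shell kills itself with SIGKILL; the parent observes
# exit status 128 + 9 = 137, identical to an OOM kill.
sh -c 'kill -KILL $$'
echo "exit code: $?"   # prints: exit code: 137
```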

Add a pre-flight memory check to any suspect job:

- name: Runner memory profile
  run: free -h && df -h && nproc

GitHub-hosted runner memory limits:

  • ubuntu-latest / windows-latest: 16 GB RAM, 4 vCPU, 14 GB SSD on public repositories; 7 GB RAM, 2 vCPU for private repositories on standard runners
  • macos-latest (arm64): 7 GB RAM, 3 vCPU
  • These figures change over time; check GitHub's "About GitHub-hosted runners" page for current limits
Confirm Permission Scope

The GITHUB_TOKEN is minted fresh for each workflow run; its scopes come from the repository default unless you override them with a permissions block. The repository-level default is Read and write or Read-only depending on Settings → Actions → General → Workflow permissions. Check your current effective scope:

gh api /repos/{owner}/{repo}/actions/permissions
Check Runner Status

For self-hosted runners navigate to Settings → Actions → Runners (repo level) or Organization Settings → Actions → Runners. A runner showing Offline needs attention; Idle means it is connected and waiting for jobs.


Step 2: Fix Timeout Errors

Set an Explicit Job Timeout

Never rely on the 6-hour default for short jobs. Setting a tight timeout catches regressions early:

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        timeout-minutes: 5
        run: npm ci
      - name: Run tests
        timeout-minutes: 12
        run: npm test
Eliminate Hanging Processes

Common causes of indefinite hangs:

  1. Interactive prompts — apt-get, pip, or brew waiting for confirmation. Fix: always pass -y / --yes / --non-interactive.
  2. Servers that never bind — test suite starting a dev server that fails to listen. Fix: use wait-on or health-check loops with a timeout.
  3. npm install vs npm ci — npm install can stall on registry issues. Fix: use npm ci, which is deterministic and faster.
  4. Deadlocked processes — build tool waiting on a lock file held by a previous cancelled run. Fix: add a cache-busting step or clean workspace.
- name: Install system deps
  run: sudo apt-get update && sudo apt-get install -y build-essential curl

- name: Install node deps
  run: npm ci --prefer-offline
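As a belt-and-braces measure, any step that might stall can be wrapped in coreutils `timeout` (preinstalled on ubuntu runners), which kills the command and fails fast instead of letting it run into the job timeout:

```shell
# `timeout` kills the wrapped command when the limit expires and
# exits with status 124, which fails the step immediately.
timeout 5s sleep 1 && echo "finished in time"
timeout 1s sleep 10
echo "exit code: $?"   # 124 signals a timeout
```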
Cache Dependencies to Prevent Slow Installs
- uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      ~/.cache/pip
    key: ${{ runner.os }}-deps-${{ hashFiles('**/package-lock.json', '**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-deps-
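The key above changes whenever a lockfile changes, because hashFiles() hashes the matched files with SHA-256. A local sketch of the same idea, with illustrative file contents and key prefix:

```shell
# Sketch of content-addressed cache keys: identical content yields an
# identical key; any lockfile edit yields a new key and a cache miss.
# (file contents and the "Linux-deps-" prefix are illustrative)
printf '{"lockfileVersion": 3}\n' > package-lock.json
key1="Linux-deps-$(sha256sum package-lock.json | cut -d' ' -f1)"

printf '{"lockfileVersion": 3, "extra": true}\n' > package-lock.json
key2="Linux-deps-$(sha256sum package-lock.json | cut -d' ' -f1)"

echo "$key1"
echo "$key2"
[ "$key1" != "$key2" ] && echo "keys differ -> cache miss, fresh install"
```

On a cache miss the restore-keys prefix still matches the most recent previous cache, so the install starts warm rather than from scratch.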

Step 3: Fix Out of Memory (OOM) Errors

Raise the Process Memory Limit

For Node.js builds that exhaust the default V8 heap limit (roughly 2 GB of old space on 64-bit builds, varying by Node version):

- name: Build frontend
  run: npm run build
  env:
    NODE_OPTIONS: "--max-old-space-size=4096"

For JVM-based builds (Maven, Gradle):

- name: Maven package
  run: mvn -B package --no-transfer-progress
  env:
    MAVEN_OPTS: "-Xmx4g -Xms512m -XX:+UseG1GC"
    JAVA_TOOL_OPTIONS: "-Xmx4g"
Split Tests Across Matrix Shards

A single large Jest or Pytest run can exhaust memory. Shard across parallel jobs:

jobs:
  test:
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx jest --shard=${{ matrix.shard }}/4 --forceExit
Use a Larger Runner

For GitHub Team or Enterprise, larger hosted runners are available:

jobs:
  heavy-build:
    runs-on: ubuntu-latest-8-cores   # 8 vCPU, 32 GB RAM; label must match the name set when the larger runner was created

For self-hosted runners, provision a machine with sufficient RAM and register it with an appropriate label.


Step 4: Fix Permission Denied Errors

Add a Permissions Block

The most common fix. Add it at the workflow level or per-job:

# Workflow-level — applies to all jobs
permissions:
  contents: write
  pull-requests: write
  packages: write
  id-token: write       # required for OIDC/cloud auth
  issues: write
  statuses: write

jobs:
  deploy:
    # Job-level override — more restrictive is safer
    permissions:
      contents: read
      id-token: write
Fix Repository-Level Default Permissions

Navigate to Settings → Actions → General → Workflow permissions and select Read and write permissions if your workflows need to push commits or create releases. Also enable Allow GitHub Actions to create and approve pull requests if needed.

Use a PAT for Cross-Repository or Elevated Operations

For operations that exceed GITHUB_TOKEN capabilities (pushing to another repo, triggering workflows in a different organisation):

jobs:
  cross-repo-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.AUTOMATION_PAT }}
          repository: org/other-repo
      - name: Commit and push
        run: |
          git config user.email "ci-bot@example.com"
          git config user.name "CI Bot"
          git add .
          git commit -m "chore: automated update"
          git push

Store the PAT as a repository or organisation secret. Rotate it on a schedule using gh secret set.


Step 5: Fix Runner Offline Errors

Restart Self-Hosted Runner Service
# Navigate to the runner installation directory
cd /opt/actions-runner

# Check the managed service status
./svc.sh status

# Restart
./svc.sh stop && ./svc.sh start

# For systemd-managed installation
sudo systemctl status "actions.runner.*"
sudo systemctl restart "actions.runner.OWNER-REPO.RUNNER-NAME.service"

# Inspect runner diagnostic logs
ls -lt _diag/ | head -5
tail -100 _diag/Runner_$(date +%Y%m%d)*.log
Re-Register an Expired Runner

Runner registration tokens are valid for 1 hour. If the runner was registered long ago and the service was reinstalled, re-register:

# Remove the old registration (get removal token from GitHub Settings UI)
./config.sh remove --token <REMOVE_TOKEN>

# Re-register with a fresh token from Settings → Actions → Runners → New self-hosted runner
./config.sh \
  --url https://github.com/OWNER/REPO \
  --token <NEW_REGISTRATION_TOKEN> \
  --name my-runner \
  --labels linux,x64,production \
  --unattended

./svc.sh install
./svc.sh start
Ensure Network Connectivity

Self-hosted runners must reach these endpoints (allow outbound HTTPS/443):

  • github.com
  • api.github.com
  • *.actions.githubusercontent.com
  • objects.githubusercontent.com
  • *.blob.core.windows.net (artifact storage)

Test connectivity from the runner host:

curl -v https://api.github.com/zen
curl -v https://pipelines.actions.githubusercontent.com/_apis/health

Step 6: Enable Debug Logging

For failures that are not obvious from the logs, enable runner and step debug output by adding these as repository secrets or variables:

| Secret Name | Value |
| --- | --- |
| ACTIONS_RUNNER_DEBUG | true |
| ACTIONS_STEP_DEBUG | true |

This adds verbose output including environment variables, runner internals, and full step traces. Disable after debugging to avoid log noise and potential secret exposure.

For interactive debugging, drop a tmate session into a failed job:

- name: Interactive debug session on failure
  uses: mxschmitt/action-tmate@v3
  if: ${{ failure() }}
  timeout-minutes: 10
  with:
    limit-access-to-actor: true

This opens an SSH tunnel directly into the runner, letting you inspect the filesystem and environment interactively.

Self-Hosted Runner Diagnostic Script

The script below consolidates the Step 5 checks (resources, service status, connectivity, diagnostic logs, and registration) into a single pass to run on the runner host:
#!/usr/bin/env bash
# GitHub Actions self-hosted runner diagnostic script
# Run this on the runner host to diagnose common issues

set -euo pipefail

RUNNER_DIR="${1:-/opt/actions-runner}"

echo "=== Runner host diagnostics ==="
echo "Date: $(date -u)"
echo "Hostname: $(hostname)"
echo ""

echo "=== System resources ==="
free -h
echo ""
df -h /
echo ""
nproc
echo ""

echo "=== Runner service status ==="
if command -v systemctl &>/dev/null; then
  systemctl list-units 'actions.runner*' --no-pager 2>/dev/null || echo "No systemd runner units found"
fi

if [[ -f "${RUNNER_DIR}/svc.sh" ]]; then
  "${RUNNER_DIR}/svc.sh" status 2>&1 || true
fi
echo ""

echo "=== Network connectivity to GitHub ==="
curl -s --max-time 10 https://api.github.com/zen && echo " [OK] api.github.com" || echo " [FAIL] api.github.com unreachable"
curl -s --max-time 10 -o /dev/null -w "%{http_code}" https://github.com | grep -q "200\|301\|302" && echo " [OK] github.com" || echo " [FAIL] github.com unreachable"
echo ""

echo "=== Recent runner diagnostic logs ==="
if [[ -d "${RUNNER_DIR}/_diag" ]]; then
  LATEST_LOG=$(ls -t "${RUNNER_DIR}/_diag"/Runner_*.log 2>/dev/null | head -1)
  if [[ -n "${LATEST_LOG}" ]]; then
    echo "Log: ${LATEST_LOG}"
    tail -50 "${LATEST_LOG}"
  else
    echo "No runner logs found in ${RUNNER_DIR}/_diag/"
  fi
else
  echo "Runner directory ${RUNNER_DIR} not found"
fi
echo ""

echo "=== Runner registration check ==="
if [[ -f "${RUNNER_DIR}/.runner" ]]; then
  echo "Runner is registered:"
  cat "${RUNNER_DIR}/.runner"
else
  echo "WARNING: .runner file not found — runner may not be registered"
fi
echo ""

echo "=== Trigger a workflow re-run via GitHub CLI ==="
echo "# Re-run all failed jobs in the most recent run of a workflow:"
echo "# gh run list --workflow=build.yml --limit=1 --json databaseId -q '.[0].databaseId' | xargs gh run rerun --failed"
echo ""
echo "# Enable debug logging for the next run (set as repository secrets):"
echo "# gh secret set ACTIONS_RUNNER_DEBUG --body true"
echo "# gh secret set ACTIONS_STEP_DEBUG --body true"
echo ""
echo "Diagnostic complete."

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps and SRE engineers with combined experience across GitHub Actions, GitLab CI, Jenkins, and Kubernetes-based CI/CD pipelines. We write actionable troubleshooting guides grounded in production war stories, not just documentation summaries.
