Error Medic

Troubleshooting GitHub Actions Timeout: "The job was canceled because it exceeded the configured timeout"

Fix GitHub Actions timeout, out of memory (OOM), and runner offline errors. Learn to configure timeout-minutes, fix permission denied issues, and debug hangs.

Key Takeaways
  • GitHub Actions default job timeout is 360 minutes (6 hours); unconfigured jobs can hang and consume valuable action minutes before failing.
  • An Out of Memory (OOM) error (Exit Code 137) often manifests as a generic 'github actions failed' or sudden timeout without clear logs.
  • Interactive prompts (like npm login or apt-get without -y) are a leading cause of jobs hanging infinitely until they time out.
  • Always set explicit `timeout-minutes` at both the job and step levels to fail fast and save CI/CD budget (there is no workflow-level timeout key).
  • 'Permission denied' errors on repository tokens can cause silent retries that eventually look like network timeouts.
Common Fix Approaches for Hanging GitHub Actions
| Method | When to Use | Time to Fix | Risk Level |
| --- | --- | --- | --- |
| Set `timeout-minutes` | Preventing infinite loops or stuck dependency downloads from burning minutes. | 5 mins | Low |
| Increase runner size / adjust heap | Fixing 'github actions out of memory' or silent Exit Code 137 crashes. | 15 mins | Medium (increased cost) |
| Fix `permissions` block | Resolving 'github actions permission denied' during package pushes or cloning. | 10 mins | Low |
| Restart self-hosted runner service | When 'github actions runner offline' prevents jobs from picking up. | 20 mins | Medium |

Understanding the Error: Why is GitHub Actions Not Working?

When your CI/CD pipeline grinds to a halt, seeing an error like The job was canceled because it exceeded the configured timeout can be deeply frustrating. A timeout is rarely just a slow build—it is usually a symptom of underlying infrastructure constraints, unhandled exceptions, resource exhaustion, or a process waiting for interactive user input in a non-interactive environment.

By default, a GitHub-hosted runner will let a job run for 360 minutes (6 hours) before forcibly terminating it. If your workflow normally takes 5 minutes and suddenly runs for 6 hours, the problem is almost never a slow network connection; a process has deadlocked or is waiting on input that will never arrive.

Symptom 1: The Infinite Hang (Interactive Prompts and Deadlocks)

The most common reason a GitHub Action times out is that a background process is waiting for standard input (stdin). Because CI environments are headless (non-interactive), the prompt waits forever.

Common culprits include:

  • Running apt-get install without the -y flag.
  • SSH/SCP commands prompting for host key verification (no `-o StrictHostKeyChecking=accept-new` set).
  • Package managers (npm, yarn, pip) prompting for authentication because of a github actions permission denied scenario when reading a private registry.
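One way to harden a shell step against these prompts is to force non-interactive behavior up front. A minimal sketch — the package name and SSH target below are illustrative, not from any specific workflow:

```shell
# Force non-interactive behavior in a CI shell step (illustrative commands)
export DEBIAN_FRONTEND=noninteractive  # apt never opens a config dialog
export GIT_TERMINAL_PROMPT=0           # git fails fast instead of prompting

# -y answers "yes" automatically; --no-install-recommends keeps installs lean
sudo apt-get update && sudo apt-get install -y --no-install-recommends jq

# Auto-accept unseen host keys; BatchMode makes ssh fail instead of prompting
ssh -o StrictHostKeyChecking=accept-new -o BatchMode=yes deploy@example.com 'true'
```

The exports at the top are the important part: set them once at the start of a step and every subsequent command in that step inherits them.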

Symptom 2: Out of Memory (Exit Code 137)

Sometimes, what appears as a timeout or a generic github actions failed message is actually a fatal memory crash. GitHub's standard Linux runners come with 7GB of RAM. Memory-intensive tasks like Webpack builds, compiling large Rust/C++ projects, or running comprehensive end-to-end test suites can easily exceed this limit.

When the Linux kernel OOM (Out of Memory) killer terminates your process, it usually exits with Code 137. However, if this happens inside a container or a suppressed sub-shell, the runner might hang, attempting to recover or failing to flush the logs, eventually leading to a timeout.

Symptom 3: Network & Auth Blocks (Permission Denied)

If you see github actions permission denied in your logs right before a job stalls, you are likely dealing with restrictive GITHUB_TOKEN scopes. By default, new repositories may grant only read access to the standard token. If your script attempts to push to GitHub Packages (GHCR) or tag a release without an explicit `permissions:` block granting `packages: write` or `contents: write`, the CLI tool might enter an exponential-backoff retry loop. This loop continues silently until the workflow times out.

Symptom 4: The 'Runner Offline' Trap

If your workflow is queued but never starts, you might be facing a github actions runner offline issue. For GitHub-hosted runners, this usually points to a GitHub service incident (check githubstatus.com). For self-hosted runners, it means the runner application service (actions.runner.service) on your EC2 instance or Kubernetes pod has crashed, lost network connectivity, or its registration token has expired.
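You can confirm what GitHub thinks of your self-hosted runners from the API before SSHing anywhere. A sketch using the gh CLI — OWNER/REPO are placeholders, and the call requires admin access to the repository:

```shell
# Lists each registered self-hosted runner and whether GitHub sees it as online
gh api repos/OWNER/REPO/actions/runners --jq '.runners[] | "\(.name): \(.status)"'
```

A runner reported as `offline` here but running locally almost always means a connectivity or token problem rather than a crashed service.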


Step 1: Diagnose the Root Cause

Before blindly throwing larger runners at the problem, you need to identify exactly where the workflow is getting stuck.

1. Enable Step Debug Logging

To get granular insights into what GitHub Actions is doing behind the scenes, enable runner diagnostic logging and step debug logging. You can do this by setting repository secrets:

  • ACTIONS_RUNNER_DEBUG to true
  • ACTIONS_STEP_DEBUG to true

Once re-run, your logs will include trace-level outputs, showing you exactly which network call or file operation caused the freeze.
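If you prefer the CLI to the web UI, the same secrets can be set with gh (this assumes an authenticated gh session scoped to the repository):

```shell
# Enable verbose runner and step logging for subsequent workflow runs
gh secret set ACTIONS_RUNNER_DEBUG --body true
gh secret set ACTIONS_STEP_DEBUG --body true
```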

2. Inspect Memory Usage Dynamically

If you suspect a github actions out of memory condition, add a background memory monitor to your workflow step to log resource usage right up until the crash.
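Wired into a workflow, such a monitor might look like the following sketch (the step names and 30-second interval are illustrative; the backgrounded subshell is killed when the job ends):

```yaml
      - name: Start background memory monitor
        run: |
          (while true; do date; free -m; sleep 30; done) &
      - name: Build
        run: npm run build  # the step you suspect of exhausting memory
```

When the job dies, the last few `free -m` lines in the log tell you whether available memory was collapsing toward zero right before the crash.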

3. SSH into the Runner

A powerful troubleshooting technique is using an action like mxschmitt/action-tmate. This allows you to SSH directly into the GitHub runner while the job is executing. You can run htop, check network configurations, and execute the failing command manually to see if it prompts for input.
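A sketch of a tmate debug step, gated so it only opens a session after something has already failed — pair it with a job-level `timeout-minutes` so an idle session cannot burn six hours:

```yaml
      - name: Debug over SSH
        if: failure()
        uses: mxschmitt/action-tmate@v3
        with:
          limit-access-to-actor: true  # only the user who triggered the run may connect
```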


Step 2: Fix and Optimize

Fix 1: Implement Fail-Fast Timeouts

Never rely on the default 360-minute limit. Protect your organization's action minutes by explicitly setting timeout-minutes on every job, and on individual long-running steps where it helps. (There is no workflow-level timeout-minutes key; the setting exists only at the job and step levels.) If your job normally takes 10 minutes, set the timeout to 15.

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15 # Fails fast if the job is stuck
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm ci && npm run build
        timeout-minutes: 10 # Optional: tighter limit on the slowest step

Fix 2: Resolve Out of Memory Errors

If your Node.js build is failing with OOM, increase the V8 heap space limit using an environment variable: env: NODE_OPTIONS: "--max_old_space_size=6144" (Allocates ~6GB of the 7GB available on standard runners).
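Placed in workflow YAML, that looks like the sketch below. The 6144 figure assumes a standard 7GB runner; leave headroom for the OS and any sibling processes, and scale it up if you move to a larger runner:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      NODE_OPTIONS: "--max_old_space_size=6144"  # ~6GB V8 heap
```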

If you are running Docker compose clusters within the runner, consider upgrading to GitHub's Larger Runners (e.g., 16-core, 64GB RAM) or optimizing your Dockerfile to use multi-stage builds, which drastically reduces memory footprint during image generation.

Fix 3: Fix Permission Denied & Auth Loops

Always explicitly define the permissions your GITHUB_TOKEN needs. If a step hangs while interacting with the GitHub API or a package registry, ensure your YAML includes:

permissions:
  contents: read
  packages: write
  pull-requests: write

Additionally, ensure all CLI commands use non-interactive flags. Replace `apt-get install tree` with `DEBIAN_FRONTEND=noninteractive apt-get install -y tree`.

Fix 4: Recovering Offline Self-Hosted Runners

If a self-hosted runner shows as offline, SSH into your host machine and check the service status. Usually, restarting the runner service or re-authenticating the runner token restores connectivity: sudo ./svc.sh status followed by sudo ./svc.sh restart. If the machine is behind a corporate firewall, ensure outbound traffic on port 443 to github.com and *.actions.githubusercontent.com is allowed, as dropped WebSocket connections will cause the runner to appear offline.
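A quick outbound check to run on the host — the domain list below is abbreviated and may drift, so verify it against GitHub's current self-hosted runner networking requirements:

```shell
# Probe the key GitHub endpoints a runner needs over port 443
for host in github.com api.github.com pipelines.actions.githubusercontent.com; do
  curl -sS -o /dev/null -w "$host: %{http_code}\n" "https://$host" \
    || echo "$host: unreachable"
done
```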

Appendix: Diagnostic and Recovery Snippets

bash
# --- Diagnostic snippet: Run this in a step before your build --- 
# Monitor memory usage in the background to detect OOM leaks leading to timeouts
(while true; do 
  echo "=== Memory Usage ===" 
  free -m 
  echo "=== Top Processes ==="
  top -b -n 1 | head -n 15
  sleep 30 
done) &

# --- Example of forcing non-interactive execution to prevent hangs ---
export DEBIAN_FRONTEND=noninteractive
sudo apt-get update && sudo apt-get install -y --no-install-recommends postgresql-client

# --- Self-hosted runner recovery commands ---
# Run these on your self-hosted instance if it shows as 'Offline'
cd ~/actions-runner
sudo ./svc.sh status
sudo ./svc.sh restart

Error Medic Editorial

Error Medic Editorial is composed of Senior DevOps and SRE professionals dedicated to untangling complex CI/CD pipelines, cloud infrastructure errors, and deployment bottlenecks.
