Error Medic

Troubleshooting GitLab CI: Fixing Timeout, Permission Denied, and Stuck Pipeline Errors

Comprehensive guide to resolving GitLab CI job timeouts, 'permission denied' deployment errors, and stuck pipelines. Learn advanced runner debugging techniques.

Key Takeaways
  • GitLab CI job timeouts are usually caused by the default 60-minute project limit or runner-specific hard limits overriding project settings.
  • 'Permission denied' errors frequently stem from missing execution bits (chmod +x) on scripts, malformed SSH private keys in CI/CD variables, or insufficient Docker socket privileges in DinD setups.
  • Pipelines 'not working' or stuck in pending status almost always indicate a tag mismatch between the `.gitlab-ci.yml` job and registered runners, or runners dropping offline.
  • Enabling CI_DEBUG_TRACE is the fastest way to expose underlying permission issues before the job fully fails.
Fix Approaches Compared
| Method | When to Use | Time | Risk |
| --- | --- | --- | --- |
| Increase Project Timeout | Jobs legitimately take longer than 60m (e.g., heavy e2e tests, ML model builds) | 2 mins | Low |
| Deploy Specific Runners | Shared runners consistently time out or lack necessary compute/memory resources | 30 mins | Medium |
| Inject SSH Keys via ssh-agent | Git clone of private submodules or rsync deployments fail with 'publickey' errors | 10 mins | Low |
| Run GitLab Runner in Privileged Mode | Docker-in-Docker (DinD) builds fail with docker.sock permission denied | 15 mins | High |

Understanding the Errors

GitLab CI is a robust continuous integration tool, but complex deployment pipelines often run into execution limits, runner configuration mismatches, and access control roadblocks. When your pipeline halts, you are typically dealing with one of three primary symptoms: a GitLab CI timeout, a Permission denied error during execution or cloning, or a completely stalled pipeline where GitLab CI is not working at all.

This guide breaks down each error state, providing architectural context and exact technical steps to restore green builds.


Symptom 1: GitLab CI Timeout

The Error: ERROR: Job failed: execution took longer than 1h0m0s or ERROR: Job failed: execution took longer than 10m0s

The Context: Timeouts in GitLab CI occur at three distinct levels, and a misconfiguration in any of them will forcefully terminate your job. GitLab enforces timeouts to prevent runaway processes from consuming infinite compute hours, especially on shared SaaS runners.

  1. Project-Level Timeout: The default is 60 minutes. If your e2e test suite or docker image build takes 65 minutes, it will be killed.
  2. Runner-Level Timeout: The administrator of a specific runner can set a maximum job timeout. If the runner limit is 10 minutes, but your project limit is 60 minutes, the job will still fail at 10 minutes. The runner limit strictly overrides the project limit if the runner limit is lower.
  3. Job-Level Timeout: Defined directly in the .gitlab-ci.yml file using the timeout keyword.

Step 1: Diagnose the Timeout Layer

First, verify exactly how long the job ran before failing. If it failed at exactly 60 minutes, it's almost certainly the project default. If it failed at a seemingly random round number like 10 or 30 minutes, suspect the runner configuration.

Step 2: Fix the Project Timeout

If you have maintainer access to the repository:

  1. Navigate to your project in GitLab.
  2. Go to Settings > CI/CD > General pipelines.
  3. Scroll down to Timeout.
  4. Change the value from 60 (or whatever the current limit is) to a value that accommodates your longest job, plus a 20% buffer (e.g., 90 or 120).
  5. Save changes.

Step 3: Fix Job-Level Overrides

Sometimes, you only want one massive job to have a long timeout so you don't risk blocking runners for hours on simple linting jobs. Edit your .gitlab-ci.yml:

heavy_integration_test:
  stage: test
  script:
    - make test-all
  timeout: 3 hours 30 minutes
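If most jobs in the pipeline need a longer limit, the timeout can also be set once under the default: keyword rather than repeated per job (supported in recent GitLab versions; the job name below is illustrative):

```yaml
# Applies to every job that doesn't set its own timeout
default:
  timeout: 2 hours

quick_lint:
  stage: test
  script:
    - make lint
  timeout: 10 minutes   # a per-job value still overrides the default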

Step 4: Check Runner Constraints

If you increased the project timeout but the job still fails early, you are hitting the runner's hard limit. Note that the runner's /etc/gitlab-runner/config.toml does not contain a job timeout key; settings like output_limit control other limits entirely:

[[runners]]
  name = "heavy-lifter"
  url = "https://gitlab.com/"
  token = "YOUR_TOKEN"
  executor = "docker"
  # output_limit caps the job log size in KB -- it does NOT affect the timeout
  output_limit = 4096

The runner's maximum job timeout is instead set in the GitLab UI: Admin Area -> CI/CD -> Runners -> [Edit Runner] -> Maximum job timeout. If that value is lower than your project timeout, the runner limit wins.


Symptom 2: GitLab CI Permission Denied

The Error Variations:

  • bash: line 14: ./deploy.sh: Permission denied
  • Permission denied (publickey,keyboard-interactive)
  • Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock

The Context: Permissions issues manifest differently depending on whether you are executing a local script, pulling code via SSH, or interacting with a privileged daemon like Docker.

Scenario A: Script Execution Denial

If you see ./script.sh: Permission denied, the script was committed without its executable bit set. Git does track the executable bit, but Windows filesystems don't expose POSIX permissions, so Windows users frequently commit shell scripts as non-executable.

The Fix: Do not just run chmod +x script.sh in the CI pipeline (though that works as a band-aid). Fix it at the Git level so it persists:

git update-index --chmod=+x script.sh
git commit -m "Make script.sh executable"
git push
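To catch this class of error before it reaches CI, you can audit the mode bits Git has recorded for your scripts (a quick check assuming a POSIX shell; 100644 means no executable bit, 100755 means executable):

```shell
# List tracked shell scripts with the mode Git recorded for them
git ls-files -s -- '*.sh'

# Print only the scripts that are missing the executable bit
git ls-files -s -- '*.sh' | awk '$1 == "100644" {print $4 " is missing +x"}'
```

Run this from the repository root; any file it flags will fail with Permission denied when executed directly in a pipeline.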

Scenario B: SSH Key Access Errors

If your pipeline fails while running npm install for a private git package, cloning a submodule, or rsync-ing to a remote server, it lacks the proper SSH keys.

The Fix: You must inject the SSH key securely using GitLab CI/CD variables and ssh-agent.

  1. Go to Settings > CI/CD > Variables.
  2. Add a variable named SSH_PRIVATE_KEY. Paste the exact contents of your id_rsa or id_ed25519 key. Crucial: Ensure there is a trailing newline at the end of the key block in the variable text box, or ssh-add will silently fail.
  3. Update your .gitlab-ci.yml before_script:
before_script:
  - 'command -v ssh-agent >/dev/null || ( apt-get update -y && apt-get install openssh-client -y )'
  - eval $(ssh-agent -s)
  # Properly inject the key, handling line endings
  - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
  - mkdir -p ~/.ssh
  - chmod 700 ~/.ssh
  # Disable StrictHostKeyChecking for the pipeline environment
  - echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
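Disabling StrictHostKeyChecking is convenient but accepts any host key, which opens the door to man-in-the-middle attacks. A slightly safer variant pins the server's key with ssh-keyscan instead; swap the last line of the before_script above for these (replace gitlab.com with whatever host you actually connect to):

```yaml
before_script:
  # ...ssh-agent setup as above, then pin the host key instead:
  - mkdir -p ~/.ssh
  - chmod 700 ~/.ssh
  - ssh-keyscan gitlab.com >> ~/.ssh/known_hosts
  - chmod 644 ~/.ssh/known_hosts
```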

Scenario C: Docker Socket Permissions

When running Docker-in-Docker (DinD) to build container images within a CI pipeline, you might be denied access to /var/run/docker.sock.

The Fix: If using a self-hosted runner, the runner executor must be configured in privileged mode to spawn inner Docker containers. Edit /etc/gitlab-runner/config.toml:

[runners.docker]
  tls_verify = false
  image = "docker:20.10.16"
  privileged = true     # THIS IS THE CRITICAL FIX
  disable_entrypoint_overwrite = false
  oom_kill_disable = false
  disable_cache = false
  volumes = ["/certs/client", "/cache"]

Restart the runner: sudo gitlab-runner restart.
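On the pipeline side, a minimal DinD build job typically pairs the docker image with a dind service. The image tags and variable below are common conventions rather than requirements; adjust versions to match your runner:

```yaml
build_image:
  stage: build
  image: docker:24.0
  services:
    - docker:24.0-dind
  variables:
    # Matches the /certs/client volume mounted in config.toml above
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
```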


Symptom 3: GitLab CI Not Working (Pending or Stuck)

The Error: This job is stuck because you don't have any active runners online with any of these tags assigned to them...

The Context: When a pipeline shows as "Pending" indefinitely, or seems to "not work" without producing logs, the job has not actually reached a runner. The GitLab coordinator is waiting for an available runner that matches the job's requirements to poll for work.

Step 1: Verify Tags

GitLab routes jobs to runners based on tags. If your .gitlab-ci.yml specifies a tag that no active runner possesses, the job will hang forever.

build_app:
  stage: build
  tags:
    - aws-linux-heavy   # Does a runner with this exact tag exist?
  script:
    - make all

Check Settings > CI/CD > Runners and ensure a runner with a green circle (online) has the tag aws-linux-heavy. If you intend to use shared runners, remove the tags: block from your YAML entirely so it can execute on any generic available runner.

Step 2: Check Runner Registration Status

If you host your own runners, log into the server hosting the runner and verify its connection to the GitLab instance:

sudo gitlab-runner verify

Output should look like:

Runtime platform arch=amd64 os=linux pid=1409 revision=... Verifying runner... is alive runner=xyz123

If it says is removed, the runner token was revoked or deleted from the GitLab UI. You must re-register the runner using sudo gitlab-runner register.

Step 3: Concurrency Limits

If your pipelines only "stop working" during busy hours, you are likely hitting concurrency limits. A runner will only execute a fixed number of jobs simultaneously. In /etc/gitlab-runner/config.toml, look at the very top line:

concurrent = 1

If it is set to 1 and job A is running, job B will be stuck in "Pending" until job A finishes. Increase this number (e.g., concurrent = 10) based on the CPU and RAM available on your runner host machine.
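A minimal sketch of the two relevant config.toml knobs (the values are examples; tune them to your host's resources):

```toml
# Global cap: total jobs this runner process will execute at once
concurrent = 4

[[runners]]
  name = "heavy-lifter"
  # Per-runner cap: this runner entry takes at most 2 of those 4 slots
  limit = 2
```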

Advanced Debugging: Unmasking Hidden Failures

When standard logs fail to explain why a job is timing out or failing with cryptic permissions errors, enable highly verbose logging.

Add this to your CI/CD Variables (or directly in the .gitlab-ci.yml under variables:):

CI_DEBUG_TRACE: "true"

This will expose the raw shell execution, variable expansion, and exact exit codes of every background command the GitLab Runner executes before and after your defined script block. Warning: This can expose masked secrets in logs, so use it only temporarily for debugging private repositories, and clear the logs afterward.
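To limit how long the verbose output (and any exposed secrets) lingers, it is safer to scope the flag to a single job in .gitlab-ci.yml rather than setting it project-wide (the job below is illustrative):

```yaml
flaky_deploy:
  stage: deploy
  variables:
    CI_DEBUG_TRACE: "true"   # verbose shell tracing for this job only
  script:
    - ./deploy.sh
```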

Quick Reference Commands

# --- Diagnostic & Fix Commands for GitLab Runner ---

# 1. Verify runner connectivity and status
sudo gitlab-runner verify

# 2. Check runner logs for hidden daemon errors
sudo journalctl -u gitlab-runner -f

# 3. Fix script execution permissions permanently in Git
git update-index --chmod=+x build.sh
git commit -m "chore: add execution permissions to build script"

# 4. Standard boilerplate to securely inject SSH keys in before_script
eval $(ssh-agent -s)
echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
mkdir -p ~/.ssh
chmod 700 ~/.ssh
ssh-keyscan gitlab.com >> ~/.ssh/known_hosts
chmod 644 ~/.ssh/known_hosts

# 5. Fix Docker-in-Docker permissions (run on runner host)
# Ensure gitlab-runner user is part of the docker group
sudo usermod -aG docker gitlab-runner
sudo systemctl restart gitlab-runner

Error Medic Editorial

Error Medic Editorial is a collective of senior DevOps and SRE professionals dedicated to demystifying CI/CD pipelines, cloud infrastructure, and modern deployment architectures.
