Why does my systemd service show 'Main process exited, code=killed, status=9/KILL'?

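Status 9 corresponds to SIGKILL. Most often the sender is the kernel's OOM (Out of Memory) killer, which fires when the system runs out of RAM or the service's cgroup exceeds its configured `MemoryMax` limit, or systemd-oomd, which kills cgroups under sustained memory pressure. SIGKILL can also come from systemd itself, which escalates to it when a service ignores SIGTERM for longer than `TimeoutStopSec`, or from someone running `kill -9`.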
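To see the arithmetic behind the status line, here is a minimal bash sketch: `kill -l` translates signal number 9 into its name, and a child shell that sends itself SIGKILL shows how a shell reports the same death as exit status 137 (128 + 9). It only demonstrates the encoding; it cannot tell you which process sent the signal.

```shell
#!/usr/bin/env bash
# Translate the signal number from "status=9/KILL" into its name.
kill -l 9                     # prints: KILL

# Reproduce a SIGKILL death: the child shell sends signal 9 to itself.
# Shells report a death by signal as 128 + signal number.
rc=0
sh -c 'kill -KILL $$' || rc=$?
echo "exit status: $rc"       # prints: exit status: 137
echo "signal: $((rc - 128))"  # prints: signal: 9
```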

How do I prevent the OOM killer from terminating my critical database service?

You can protect a critical service by lowering its OOM score. Run `systemctl edit` on the service and add `OOMScoreAdjust=-900` under the `[Service]` section. This tells the kernel OOM killer to pick other, less critical processes first.

Where are systemd core dumps stored, and how do I read them?

By default, systemd-coredump stores core dumps compressed in `/var/lib/systemd/coredump/`. Rather than reading those files directly, use the `coredumpctl` command: `coredumpctl list` shows recent crashes, and `coredumpctl info` or `coredumpctl gdb` followed by the crashed process's PID shows its details or opens the dump in a debugger.

What causes a systemd unit to fail with 'Permission denied' (status 203/EXEC)?

A 203/EXEC error means systemd could not execute the binary specified in `ExecStart`. Common causes include: missing execute permissions (`chmod +x`), an incorrect path, missing interpreter (e.g., wrong shebang in a Python/Bash script), or SELinux/AppArmor blocking execution.

Why is systemd-journald consuming high CPU?

High CPU in `systemd-journald` usually means an application is spamming logs uncontrollably, causing massive I/O overhead. You can use `journalctl -f` to find the noisy service, and configure `RateLimitIntervalSec` and `RateLimitBurst` in `/etc/systemd/journald.conf` to throttle the logging.

Fixing systemd OOM (Out of Memory) Kills and Service Failures: A Complete Guide

systemd OOM Mitigation Approaches Compared
Method	When to Use	Time	Risk
Increase MemoryMax in Unit	Service legitimately needs more memory for workloads	5 mins	Low
Adjust OOMScoreAdjust	Critical service (e.g., DB) shouldn't be killed first	5 mins	Medium (May kill other services)
Configure OOMPolicy=continue	Service survives losing a worker process (e.g., its master respawns workers)	10 mins	Low
Analyze Core Dump via coredumpctl	Service crashes unexpectedly before hitting OOM limits	30+ mins	Low

Understanding systemd OOM Kills and Service Failures

When managing Linux servers, encountering a systemd failed state is a rite of passage for any DevOps or SRE engineer. One of the most disruptive and confusing scenarios is the systemd OOM (Out of Memory) kill. On modern Linux distributions (such as Fedora 34+ and Ubuntu 22.04+ desktop), out-of-memory handling is done not only by the kernel OOM killer but also by systemd-oomd, which acts earlier, on sustained memory pressure. When a service consumes too much memory, it may terminate without an obvious error, leading to cascading application failures.

This guide walks through diagnosing and fixing systemd OOM events, analyzing systemd core dumps, troubleshooting high CPU usage in systemd, resolving permission-denied (203/EXEC) errors, and fixing services that will not start.

Step 1: Diagnosing the systemd Failure

Before changing configurations, you must confirm why systemd is not working as expected. Is it a kernel-level kill, a systemd-oomd intervention, or a crash resulting in a core dump?
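systemd records why a unit last failed in its Result property, which `systemctl show -p Result --value my-app.service` prints as a single word. The sketch below maps common values to the scenarios above; the function name explain_result and the hint wording are hypothetical, not part of systemd. Depending on the systemd version and OOMPolicy=, an OOM kill may surface as oom-kill or simply as signal, so treat the mapping as a first hint.

```shell
explain_result() {
  # $1 = value of the unit's Result property (hypothetical helper).
  case "$1" in
    success)         echo "no failure recorded" ;;
    oom-kill)        echo "OOM kill: check the kernel log and systemd-oomd" ;;
    signal)          echo "killed by a signal: check OOM logs, stop timeouts, manual kill" ;;
    core-dump)       echo "crashed with a core dump: inspect it with coredumpctl" ;;
    exit-code)       echo "non-zero exit: read the status= value in systemctl status" ;;
    timeout)         echo "a start or stop operation timed out" ;;
    start-limit-hit) echo "restarted too often: see StartLimitBurst=" ;;
    *)               echo "other result: $1" ;;
  esac
}

# Usage (on a host running systemd):
#   explain_result "$(systemctl show -p Result --value my-app.service)"
```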

Checking for OOM Kills

If a service suddenly stops, check its status:

```
systemctl status my-app.service
```

If you see the message Main process exited, code=killed, status=9/KILL, the process was terminated with SIGKILL. To confirm whether it was an OOM kill, check the kernel ring buffer and the systemd-oomd journal:

```
# Check kernel OOM logs (dmesg may require root)
dmesg -T | grep -i -E 'killed process|oom'

# Check the last 50 lines of systemd-oomd logs
journalctl -u systemd-oomd -n 50 --no-pager
```

You might see output like: systemd-oomd[678]: Killed /system.slice/my-app.service due to memory pressure.
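To pull only the victims out of a noisy kernel log, a small filter helps. This sketch assumes the kernel's usual "Killed process <PID> (<name>)" wording, which recent kernels use for both system-wide and memory-cgroup OOM kills; older kernels phrase the surrounding lines differently, so treat the pattern as a starting point. The function name oom_victims is hypothetical.

```shell
oom_victims() {
  # Reads kernel log text on stdin and prints "<name> <pid>" for every
  # "Killed process <pid> (<name>)" line; other lines are ignored.
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\2 \1/p'
}

# Usage:
#   journalctl -k --since today | oom_victims
#   dmesg -T | oom_victims        # may need root if kernel.dmesg_restrict=1
```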

Checking for Core Dumps

If the service crashed with a fatal signal such as a segmentation fault (typically shown as code=dumped, status=11/SEGV), the kernel produces a core dump, and systemd-coredump stores it if it is installed as the kernel's core handler. You can view these using:

```
coredumpctl list
coredumpctl info <PID>
```
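coredumpctl only sees crashes that the kernel hands to systemd-coredump, which happens when /proc/sys/kernel/core_pattern pipes dumps to it. A quick check, as a sketch: the function name core_handler is hypothetical, and its optional file argument exists only so it can be exercised without touching /proc.

```shell
core_handler() {
  # Classifies the kernel's core_pattern. $1 optionally names a file to
  # read instead of /proc/sys/kernel/core_pattern.
  local pattern
  pattern=$(cat "${1:-/proc/sys/kernel/core_pattern}")
  case "$pattern" in
    '|'*systemd-coredump*) echo "systemd-coredump" ;;
    '|'*)                  echo "pipe: ${pattern#?}" ;;  # another handler, e.g. apport
    *)                     echo "file: $pattern" ;;      # dumps written as plain files
  esac
}

# Usage:
#   core_handler    # "systemd-coredump" means coredumpctl will see new crashes
```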

Step 2: Fixing systemd OOM (Out of Memory) Issues

When the kernel OOM killer kills a service, the system or the service's cgroup ran out of memory; when systemd-oomd kills it, the system was under sustained memory pressure. You have several ways to address this.

1. Adjusting Memory Limits (MemoryHigh and MemoryMax)

Often, a service is artificially constrained by systemd unit file limits. You can override these limits without modifying the package-provided unit file by using drop-in files.

```
systemctl edit my-app.service
```

Add the following lines to increase the memory limit:

```
[Service]
# Soft limit: above this, the kernel throttles the service and reclaims its memory aggressively
MemoryHigh=2G
# Hard limit: if memory cannot be reclaimed below this, the OOM killer acts inside the cgroup
MemoryMax=3G
```
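Once the new limits are in effect, you can compare live usage against the ceiling. `systemctl show -p MemoryCurrent -p MemoryMax` prints both in bytes (MemoryMax reads infinity when unset, and MemoryCurrent shows a placeholder when accounting is unavailable). The awk helper below, memory_headroom, is a hypothetical convenience rather than a systemd tool; it turns those values into a percentage.

```shell
memory_headroom() {
  # Reads "Key=Value" lines from `systemctl show` and reports how much of
  # MemoryMax= the unit is currently using.
  awk -F= '
    $1 == "MemoryCurrent" { cur = $2 }
    $1 == "MemoryMax"     { max = $2 }
    END {
      if (cur !~ /^[0-9]+$/ || max !~ /^[0-9]+$/ || max == 0) {
        print "no numeric MemoryMax or MemoryCurrent"
        exit
      }
      printf "using %d MiB of %d MiB (%.0f%%)\n", cur / 1048576, max / 1048576, 100 * cur / max
    }'
}

# Usage:
#   systemctl show -p MemoryCurrent -p MemoryMax my-app.service | memory_headroom
```

With MemoryMax=3G (3221225472 bytes) and 1 GiB in use, it prints using 1024 MiB of 3072 MiB (33%).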

2. Protecting Critical Services with OOMScoreAdjust

If you have a critical service (like PostgreSQL or MySQL) that must not be killed under memory pressure, lower its OOM score. OOMScoreAdjust= accepts values from -1000 to 1000 and sets the process's oom_score_adj; the kernel OOM killer kills the process with the highest resulting score, and a value of -1000 exempts the process from OOM killing entirely.

```
systemctl edit postgresql.service
```

```
[Service]
OOMScoreAdjust=-900
```

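Note: Use this cautiously. If your database consumes all RAM and cannot be killed, the kernel has to kill other processes instead, and the entire server may become unresponsive. OOMScoreAdjust= only influences the kernel OOM killer; systemd-oomd chooses which cgroup to kill by its own memory-pressure and swap-usage criteria.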
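To confirm the setting took effect, read the values the kernel actually uses from /proc: systemd applies OOMScoreAdjust= as the process's oom_score_adj, and oom_score is the current score the OOM killer compares. The value is applied when the process starts, so restart the service after editing. The helper name oom_info is hypothetical.

```shell
oom_info() {
  # Prints the kernel's OOM score and the adjustment applied to a PID.
  local pid=$1
  printf 'pid=%s oom_score=%s oom_score_adj=%s\n' "$pid" \
    "$(cat "/proc/$pid/oom_score")" "$(cat "/proc/$pid/oom_score_adj")"
}

# Usage (after restarting the service):
#   oom_info "$(systemctl show -p MainPID --value postgresql.service)"
#   oom_score_adj should read -900
```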

3. Controlling the Reaction with OOMPolicy

Since systemd 243, OOMPolicy= controls how systemd reacts when the OOM killer kills one of the unit's processes.

```
[Service]
# Options: continue, stop, kill
OOMPolicy=continue
```

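With the default, stop (unless changed via DefaultOOMPolicy=), systemd stops the whole service once any of its processes is OOM-killed. With continue, systemd only logs the event and leaves the unit running, which suits worker-based applications like Gunicorn or PHP-FPM whose master process respawns killed workers. With kill, the kernel is told to kill all of the unit's remaining processes as well.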
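Because systemctl edit opens an interactive editor, provisioning scripts often write the drop-in file directly under /etc/systemd/system/<unit>.d/ and then reload systemd. A sketch of that approach: the function name write_dropin and the file name oom-policy.conf are hypothetical choices (systemctl edit itself uses override.conf), and the optional third argument exists only to redirect the write for testing.

```shell
write_dropin() {
  # $1 = unit name, $2 = drop-in name (without .conf),
  # $3 = optional root directory (defaults to /etc/systemd/system).
  # The drop-in content is read from stdin; the written path is printed.
  local unit=$1 name=$2 root=${3:-/etc/systemd/system}
  local dir="$root/$unit.d"
  mkdir -p "$dir"
  cat > "$dir/$name.conf"
  echo "$dir/$name.conf"
}

# Usage (as root):
#   write_dropin my-app.service oom-policy <<'EOF'
#   [Service]
#   OOMPolicy=continue
#   EOF
#   systemctl daemon-reload
```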

Step 3: Resolving systemd High CPU Usage

Sometimes, the issue isn't memory, but systemd high cpu usage. If systemd-journald or the main systemd process (PID 1) is pinned at 100% CPU, it usually indicates an I/O bottleneck or an application spamming logs.

Identify the culprit: Run journalctl -f to see if an application is writing thousands of lines per second.
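Rate Limit Journald: Edit /etc/systemd/journald.conf to throttle aggressive logging. The limits apply per service; the defaults are RateLimitIntervalSec=30s and RateLimitBurst=10000, so the values below are ten times stricter: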
```
[Journal]
RateLimitIntervalSec=30s
RateLimitBurst=1000
```
Restart journald: systemctl restart systemd-journald
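journalctl -f shows the flood but not which service is responsible. Counting messages per syslog identifier over a short window narrows it down. A sketch, assuming journalctl's default short output format, where the fifth field is identifier[pid]:; the function name top_log_sources is hypothetical.

```shell
top_log_sources() {
  # Counts log lines per identifier in `journalctl -o short` output.
  # $1 = how many sources to show (default 10).
  awk '
    /^-- / { next }               # skip journalctl banner lines
    NF >= 5 {
      id = $5
      sub(/\[.*$/, "", id)        # drop "[pid]:"
      sub(/:$/, "", id)           # or a bare trailing colon
      print id
    }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage: the busiest sources over the last five minutes.
#   journalctl -q -o short --since "5min ago" | top_log_sources 5
```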

Additionally, you can use systemd-cgtop to view resource usage per cgroup, which is often more useful than the standard top command for containerized or systemd-managed environments.

Step 4: Fixing systemd Service Not Starting and Permission Denied

If you are dealing with a systemd service not starting, the status will usually show code=exited, status=....

Status 203/EXEC: systemd permission denied

The error Main process exited, code=exited, status=203/EXEC is very common. It means systemd could not execute the program named in ExecStart=.

Root Causes & Fixes:

Missing Executable Flag: The file isn't executable. Fix: chmod +x /path/to/binary
Wrong Architecture: You are trying to run, for example, an ARM binary on x86_64. Check with: file /path/to/binary
SELinux/AppArmor: The security module blocked execution. Check audit logs: ausearch -m avc -ts recent. If SELinux is the culprit, restore the context: restorecon -Rv /path/to/binary.
Missing Shebang: If it's a script, ensure #!/bin/bash or #!/usr/bin/env python3 is the very first line of the file, and that the file uses Unix (LF) line endings.
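Several of these causes can be checked from a shell before restarting the unit. The sketch below (check_exec is a hypothetical helper) takes the path from ExecStart= and reports whether the file is missing, lacks the execute bit, or names a shebang interpreter that does not exist. It cannot detect SELinux/AppArmor denials or architecture mismatches; use the audit log and file for those.

```shell
check_exec() {
  # Pre-flight check for the program named in a unit's ExecStart=.
  local bin=$1 first="" interp
  if [ ! -e "$bin" ]; then echo "missing: $bin"; return 1; fi
  if [ ! -x "$bin" ]; then echo "not executable: $bin"; return 1; fi
  # Read the first line; for scripts it should be a shebang.
  IFS= read -r first < "$bin" || true
  case "$first" in
    '#!'*)
      interp=${first#??}          # strip "#!"
      interp=${interp# }          # tolerate "#! /bin/sh"
      interp=${interp%% *}        # keep only the interpreter path
      if [ ! -x "$interp" ]; then
        echo "bad interpreter: $interp"
        return 1
      fi ;;
  esac
  echo "ok: $bin"
}

# Usage:
#   check_exec /path/to/binary
```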

Dependency Failures

If a service simply won't start, check for dependency failures. If Service A declares both Requires= and After= on Service B and Service B fails to start, Service A is not started either.

```
journalctl -xeu my-app.service
```

Look for errors like Dependency failed for My Application.
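The journal message names the failing unit by its description, so it can help to list the hard dependencies directly. A sketch that reads the unit's Requires=, Requisite=, and BindsTo= lists via systemctl show and keeps only those currently in the failed state; failed_requirements is a hypothetical helper, and it does not recurse into dependencies of dependencies.

```shell
failed_requirements() {
  # Prints each hard dependency of unit $1 that is in the "failed" state.
  local unit=$1 dep
  for dep in $(systemctl show -p Requires -p Requisite -p BindsTo --value "$unit"); do
    if systemctl is-failed --quiet "$dep"; then
      echo "$dep"
    fi
  done
}

# Usage: then run journalctl -xeu on each unit it prints.
#   failed_requirements my-app.service
```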

Conclusion

Troubleshooting OOM kills, core dumps, and other systemd failure states requires a systematic approach. By using journalctl, coredumpctl, and drop-in configurations (systemctl edit), you can stabilize your Linux infrastructure, tame the OOM killer, and keep your critical services highly available.