MariaDB Crash Recovery: Fixing "InnoDB: Operating system error", OOM Kills, and Connection Limits
Comprehensive guide to MariaDB crash recovery. Learn how to fix InnoDB corruption, OOM killer terminations, connection limits, and replication lag.
- OOM (Out of Memory) kills by the Linux kernel are a primary cause of sudden MariaDB crashes. Monitor dmesg for mysqld terminations.
- Corrupted InnoDB tablespaces or transaction logs (often caused by hard crashes or disk full events) prevent MariaDB from starting successfully.
- Exhausted resources, such as disk space or connection limits ('Too many connections'), can manifest as complete database lockups or pseudo-crashes.
- Use the 'innodb_force_recovery' directive in my.cnf to salvage data from a corrupted database, stepping from level 1 up to 6 only as a last resort.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| innodb_force_recovery=1 to 3 | Minor InnoDB corruption preventing startup; dirty pages. | 15-30 mins | Low |
| innodb_force_recovery=4 to 6 | Severe corruption, redo log failures. Used to dump data before rebuild. | 1-4 hours | High (Data loss possible) |
| Point-in-Time Recovery (Backup) | Unrecoverable corruption, dropped tables, or catastrophic hardware failure. | Depends on DB size | Low |
| Tune max_connections / open_files_limit | Error 1040: Too many connections, or 'Too many open files' crashes. | 10 mins | Low |
Understanding MariaDB Crashes
When a MariaDB instance crashes, the immediate symptom is usually a failed connection attempt returning ERROR 2002 (HY000): Can't connect to local server through socket '/run/mysqld/mysqld.sock' (2) or a sudden termination of active queries.
Before taking drastic measures, it's critical to identify the root cause. A crash is usually the result of one of the following:
- Hardware/OS Resource Exhaustion: The Linux Out-Of-Memory (OOM) killer, a full disk, or CPU starvation.
- InnoDB Corruption: Data page or redo log corruption preventing the storage engine from initializing.
- Configuration Limits: Hitting
max_connections,open_files_limit, or excessive replication lag that stalls the server.
Step 1: Diagnose the Crash
The first step in any database crash recovery is examining the logs. Do not immediately attempt to restart the database without checking why it stopped.
1. Check the MariaDB Error Log
The default location is usually /var/log/mysql/error.log or /var/log/mariadb/mariadb.log. Look for the last entries before the crash.
tail -n 100 /var/log/mysql/error.log
Look for errors like:
[ERROR] InnoDB: Attempted to open a previously opened tablespace[ERROR] InnoDB: Database page corruption on disk or a failed file read[ERROR] mysqld got signal 11 ;(Segmentation fault)
2. Check for OOM Kills If MariaDB disappears without any error in its own log, the Linux kernel likely terminated it to save the system from running out of memory.
dmesg -T | grep -i oom
# Or check syslog/messages:
grep -i "killed process" /var/log/messages
If you see Killed process 12345 (mysqld) total-vm:..., you have an OOM issue.
3. Check Disk Space and Permissions A 100% full disk will cause MariaDB to crash when it attempts to write to the binary log, redo log, or temporary tables.
df -h /var/lib/mysql
ls -la /var/lib/mysql
Ensure the mysql user owns all files in the data directory. A mariadb permission denied error often occurs after moving the data directory or restoring backups with the wrong user (e.g., root).
Step 2: Fix OOM and Resource Exhaustion
If the server ran out of memory, you must tune your MariaDB buffers. The biggest culprit is usually the innodb_buffer_pool_size.
- Open
/etc/mysql/mariadb.conf.d/50-server.cnf(or/etc/my.cnf). - Reduce
innodb_buffer_pool_size. A rule of thumb is 50-70% of total RAM, but if you have a 2GB VPS, setting it to 1.5GB will cause OOM kills when the OS or connection threads need memory. - Check connection memory limits. High
max_connectionscombined with large per-thread buffers (join_buffer_size,sort_buffer_size) can easily exceed available RAM.
Fixing "Too Many Connections"
If the error is ERROR 1040 (HY000): Too many connections, the server isn't strictly crashed, but it is inaccessible.
- Connect using the dedicated admin port if configured, or restart the service if absolutely necessary.
- Increase
max_connectionsinmy.cnf:[mysqld] max_connections = 500 - Ensure your application is properly closing database connections to avoid connection leaks.
Step 3: InnoDB Crash Recovery (Corruption)
If MariaDB refuses to start due to InnoDB corruption, you must use the innodb_force_recovery parameter. Warning: Levels 4-6 can permanently corrupt your data. Always back up the data directory (cp -a /var/lib/mysql /var/lib/mysql_backup) before proceeding.
- Edit your MariaDB configuration file:
[mysqld] innodb_force_recovery = 1 - Attempt to start MariaDB (
systemctl start mariadb). - If it fails, increment the value (2, then 3, up to 6) and try again.
- Crucial Step: Once MariaDB starts in recovery mode, it is strictly read-only. You must immediately dump your data using
mysqldump.mysqldump -u root -p --all-databases --routines --triggers > all_databases_rescue.sql - Stop MariaDB, completely remove the corrupted data directory, initialize a new data directory (
mariadb-install-db), remove theinnodb_force_recoveryline, start the fresh database, and restore the SQL dump.
Step 4: Resolving Replication Lag and Deadlocks
While not hard crashes, these cause severe application disruption.
- MariaDB Deadlock: Occurs when two transactions hold locks the other needs. MariaDB automatically rolls back one transaction. Find the culprit by running
SHOW ENGINE INNODB STATUS;and looking at theLATEST DETECTED DEADLOCKsection. Fix this at the application level by ensuring transactions acquire locks in a consistent order. - MariaDB Replication Lag: Check
SHOW SLAVE STATUS\G(orSHOW REPLICA STATUS\G). IfSeconds_Behind_Masteris high, check for slow queries (mariadb slow querylog) blocking the replication thread, or consider enabling parallel replication (slave_parallel_threads). - MariaDB Timeout: If connections are dropping with
MySQL server has gone awayorLock wait timeout exceeded, increaseinnodb_lock_wait_timeoutornet_read_timeout/net_write_timeout, and optimize the slow queries causing the locks.
Frequently Asked Questions
# 1. Check for Out-Of-Memory (OOM) kills by the Linux kernel
dmesg -T | grep -i -E 'oom|killed process'
# 2. Check MariaDB error logs for crash reasons
tail -n 100 /var/log/mysql/error.log
# 3. Check disk space and inode usage
df -h /var/lib/mysql
df -i /var/lib/mysql
# 4. Emergency Backup of the data directory before recovery attempts
sudo systemctl stop mariadb
sudo cp -a /var/lib/mysql /var/lib/mysql_backup_$(date +%F)
# 5. Enable InnoDB Force Recovery (Add to [mysqld] section in my.cnf)
# innodb_force_recovery = 1
# systemctl start mariadb
# 6. Dump all data when started in recovery mode
mysqldump -u root -p --all-databases --routines --triggers --events > full_recovery_dump.sqlError Medic Editorial
The Error Medic Editorial team consists of senior Site Reliability Engineers and Database Administrators dedicated to providing actionable, production-ready solutions for complex infrastructure incidents.