Troubleshooting SAP Data Migration: Connection Refused, Timeouts, and Crashes
Fix SAP data migration failures including WSAECONNREFUSED and RFC timeouts. Learn expert SAP troubleshooting steps for slow performance and configuration.
- Network or Gateway exhaustion often causes 'SAP connection refused' during high-volume data loads.
- Memory parameter misconfiguration (e.g., TSV_TNEW_PAGE_ALLOC_FAILED) leads to 'SAP crash' and short dumps in ST22.
- Resolve 'SAP timeout' and 'SAP slow performance' by reducing migration batch sizes, scaling background work processes (rdisp/wp_no_btc), and tuning gateway parameters.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Reduce Batch Size | Frequent timeouts, SQL deadlocks, or slow performance during load | 5 mins | Low |
| Increase Gateway Connections (gw/max_conn) | Seeing WSAECONNREFUSED or RFC communication errors | 15 mins (requires restart) | Medium |
| Tune Memory Parameters (ztta/roll_extension) | ST22 shows memory allocation dumps (TSV_TNEW_PAGE_ALLOC_FAILED) | 30 mins | High |
| Scale Background Work Processes (rdisp/wp_no_btc) | Jobs stuck in 'Ready' state, SAP not working due to queue block | 10 mins (dynamic via RZ11) | Low |
Understanding the Error
When undertaking an SAP data migration, whether moving to SAP S/4HANA with the Migration Cockpit or loading data into ECC via SAP Data Services (BODS) or the Legacy System Migration Workbench (LSMW), system instability is common. Engineers typically see connection refused errors, severely degraded performance, or outright crashes. This guide walks through diagnosing and fixing the root causes of these migration blockers.
During large-scale data ingestion, the SAP NetWeaver ABAP stack and the underlying database (HANA, Oracle, SQL Server, MaxDB) run close to their configured limits. Millions of records being validated, transformed, and inserted concurrently drive up memory consumption, network traffic, and disk I/O. If the system is not tuned for bulk operations, it protects itself by dropping connections or terminating memory-hungry processes, which surfaces as an RFC timeout (RFC_ERROR_COMMUNICATION) or a memory dump. Tuning for a migration therefore has to cover three layers: the application server, the database, and the network gateway.
Common Error Messages
If your monitoring reports that SAP is unresponsive, or your migration jobs are failing, you will likely see one of the following error messages in your application logs or system traces:
- WSAECONNREFUSED: Connection refused (found in dev_rd trace files or external migration tool logs)
- RFC_ERROR_COMMUNICATION: SAP timeout (gateway timeout during remote function calls)
- TSV_TNEW_PAGE_ALLOC_FAILED or SYSTEM_NO_ROLL (ST22 ABAP dump indicating extended memory or roll area exhaustion)
- SQL error 1205: Transaction (Process ID) was deadlocked on lock resources (visible in SM21 or SM50)
- CALL_FUNCTION_SIGNON_REJECTED (maximum concurrent logins or resource limits have been hit)
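When a migration log mixes several of these messages, a quick tally by problem area shows where to start. The following is a minimal sketch, not an SAP-provided tool: the function names and the category labels are hypothetical, and the patterns are simply the message strings listed above.

```shell
# Hypothetical helper: map one log line to the problem area it most likely points to.
classify_migration_error() {
  local line="$1"
  case "$line" in
    *WSAECONNREFUSED*)
      echo "gateway: connection refused" ;;
    *RFC_ERROR_COMMUNICATION*)
      echo "gateway: RFC timeout or communication failure" ;;
    *TSV_TNEW_PAGE_ALLOC_FAILED*|*SYSTEM_NO_ROLL*)
      echo "memory: extended memory or roll area exhausted" ;;
    *"SQL error 1205"*)
      echo "database: deadlock" ;;
    *CALL_FUNCTION_SIGNON_REJECTED*)
      echo "logon: session or resource limit reached" ;;
    *)
      echo "unclassified" ;;
  esac
}

# Read a log on stdin and print a count per category, most frequent first.
summarize_migration_errors() {
  local line
  while IFS= read -r line; do
    classify_migration_error "$line"
  done | sort | uniq -c | sort -rn
}
```

For example, `summarize_migration_errors < migration.log` (with `migration.log` standing in for your tool's log file) prints one line per category with its count.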
Step 1: Diagnose
Before wildly changing profile parameters or restarting instances, you must pinpoint the exact bottleneck. Is it the network gateway dropping packets, the SAP application server running out of dialog processes, or the database struggling with table locks? Effective SAP troubleshooting begins with a structured investigation using standard transaction codes.
1. Check the System Log (SM21)
Transaction SM21 is your first stop for system-wide issues. Look for database disconnects, exhausted work processes, or gateway errors. If you see repeated entries stating 'Operating system call recv failed (error no. 10054)', the connection is being forcibly dropped by an intermediary firewall, a load balancer, or the SAP Gateway itself due to an overload condition.
2. Analyze Short Dumps (ST22)
If the data migration crashes or a background job terminates unexpectedly, check ST22 immediately. A TSV_TNEW_PAGE_ALLOC_FAILED dump means the migration program tried to allocate more memory than the instance profile or the physical server allows. This typically happens when the migration tool sends one massive payload instead of chunking the data into manageable packets.
3. Monitor Work Processes (SM50 / SM66)
Observe transaction SM50 (local instance) or SM66 (global system). Are your Background (BTC) or Dialog (DIA) processes in a Running state executing sequential database reads/writes, or are they stuck in On Hold (e.g., RFC wait, ENQ wait)? If the migration consumes every available work process, new inbound connections from users or other interfaces will be refused or will hang.
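The same overview can be approximated from the OS level. The sketch below summarizes a work process table by type and status. It assumes the comma-separated layout printed by `sapcontrol -nr <instance_number> -function ABAPGetWPTable` (columns starting with No, Typ, Pid, Status); verify the column order on your release before relying on it. The function name is hypothetical.

```shell
# Summarize work processes by type and status.
# Input: comma-separated rows on stdin, ASSUMED layout "No, Typ, Pid, Status, ...".
summarize_wp_table() {
  awk -F',' '
    # Skip header and banner lines: data rows start with a numeric work process number.
    $1 !~ /^[ ]*[0-9]+[ ]*$/ { next }
    {
      typ = $2; status = $4
      gsub(/^[ \t]+|[ \t]+$/, "", typ)      # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      count[typ " " status]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort
}
```

A possible invocation is `sapcontrol -nr <instance_number> -function ABAPGetWPTable | summarize_wp_table`. If every BTC row shows as running and none are waiting, the migration has saturated the background processes.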
4. Gateway Monitor (SMGW)
Go to transaction SMGW -> Goto -> Logged on Clients. If the number of active RFC connections is hovering near or at your gw/max_conn limit, the gateway dispatcher will actively refuse any new migration threads. This is a primary cause of integration failures between SAP Data Services and the target ECC/S4 system.
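As a rough cross-check from the OS, you can count established TCP connections on the gateway port and compare that number with the configured limit. This is a sketch under assumptions: it parses Linux `netstat -an` output (local address in column 4, state in the last column), the function names are hypothetical, and the TCP count only approximates what SMGW reports.

```shell
# Count ESTABLISHED connections on a local port, reading `netstat -an` output from stdin.
count_established() {
  local port="$1"
  awk -v port="$port" '
    $NF == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
    END { print n + 0 }
  '
}

# Print utilization as an integer percentage of the configured limit (e.g. gw/max_conn).
gw_utilization_pct() {
  local current="$1" limit="$2"
  echo $(( current * 100 / limit ))
}
```

For instance, `gw_utilization_pct "$(netstat -an | count_established 3300)" 2000` prints the share of a 2000-connection limit currently in use on port 3300; replace both numbers with your gateway port and your actual gw/max_conn value.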
Step 2: Fix
Based on your diagnostic findings, apply the following remediations.
Resolution 1: Resolving SAP Connection Refused & Timeouts
If the migration tool (such as SAP BODS, MuleSoft, or a custom API script) is configured for high concurrency and opens hundreds of parallel RFC connections, the SAP Gateway will reject new ones once its connection limit is reached, and the client sees a connection refused error.
To permanently fix this, you need to adjust the SAP Gateway profile parameters. This is a crucial step in preparing the system.
- Log in to the SAP GUI and go to transaction RZ10.
- Select your active Instance Profile and choose 'Extended maintenance'.
- Increase the following parameters (consult your hardware limits and SAP sizing guidelines first):
  - gw/max_conn (maximum number of connections; you may need to raise this from its current value, for example from 2000 to 5000 or higher)
  - gw/max_sys (maximum number of clients; scale this alongside gw/max_conn)
- If you are seeing an SAP timeout, check the parameter rdisp/max_wprun_time (maximum work process run time). If your migration jobs are mistakenly running as Dialog (DIA) processes rather than Background (BTC) processes, they will be hard-killed when they exceed rdisp/max_wprun_time (default is usually 600 seconds). Always switch the migration to Background processing. If you must use Dialog, temporarily increase this value, but do not leave it high permanently.
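Before changing anything in RZ10, it helps to record what the profile file currently sets for these parameters. A minimal sketch, assuming a plain-text profile with `name = value` lines and `#` comments; the function name is hypothetical, and values changed only at runtime will not appear in the file.

```shell
# Print "name = value" for each requested parameter, as set in a profile file.
# Reports the last occurrence in the file; parameters that are absent are flagged.
show_profile_params() {
  local profile="$1"; shift
  local name
  for name in "$@"; do
    awk -F'=' -v want="$name" '
      /^[ \t]*#/ { next }                      # ignore comment lines
      {
        key = $1
        gsub(/[ \t]/, "", key)
        if (key == want) {
          val = $2
          gsub(/^[ \t]+|[ \t]+$/, "", val)
          found = val
        }
      }
      END {
        if (found != "") print want " = " found
        else             print want " = (not set in profile)"
      }
    ' "$profile"
  done
}
```

Example: `show_profile_params /usr/sap/<SID>/SYS/profile/<profile_name> gw/max_conn gw/max_sys rdisp/max_wprun_time`. A parameter reported as not set is running on its kernel default, which you can look up in RZ11.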
Resolution 2: Addressing SAP Slow Performance and Memory Dumps
When transferring millions of business partners or financial documents, memory limits are usually the first constraint you hit. A TSV_TNEW_PAGE_ALLOC_FAILED dump means the user context has exhausted the roll buffer, extended memory, and heap memory.
- Reduce Batch Sizes (The Safest Fix): This is the most effective and lowest-risk fix. In the SAP S/4HANA Migration Cockpit (LTMC/Fiori app), SAP Data Services, or your middleware layer, reduce the commit size or batch size. Instead of sending 100,000 records per RFC call, send 5,000 or 10,000. This lowers the memory footprint per work process and keeps each call within its memory quota.
- Tune Memory Parameters (High Risk): In RZ10, review your memory configuration. You may need to temporarily increase ztta/roll_extension (Extended Memory limit per user) and abap/heap_area_dia or abap/heap_area_nondia (Heap memory limits). Warning: Misconfiguring memory limits can cause the OS to start paging/swapping, which will severely impact the performance of the entire server. Ensure your server has the physical RAM necessary to back these limit increases.
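If your tool loads flat files and offers no batch-size setting, one way to enforce smaller batches is to split the source file before loading. The sketch below is an illustration, not a feature of any SAP tool: the function name and the chunk naming scheme are hypothetical, and it assumes a CSV with a single header line and no line breaks inside quoted fields.

```shell
# Split a CSV into chunks of at most batch_size data rows, repeating the header in each chunk.
# Output files are named <out_prefix>0001.csv, <out_prefix>0002.csv, ...
split_with_header() {
  local input="$1" batch_size="$2" out_prefix="$3"
  awk -v size="$batch_size" -v prefix="$out_prefix" '
    NR == 1 { header = $0; next }              # remember the header line
    (NR - 2) % size == 0 {                     # first data row of a new chunk
      if (out != "") close(out)
      out = sprintf("%s%04d.csv", prefix, ++chunk)
      print header > out
    }
    { print > out }
  ' "$input"
}
```

For example, `split_with_header business_partners.csv 5000 bp_chunk_` turns a 100,000-row file into 20 files of 5,000 rows each, which can then be loaded one after another.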
Resolution 3: Scaling Work Processes for Throughput
If CPU and RAM show plenty of idle capacity but the migration is still crawling, the bottleneck may simply be too few configured work processes.
- Go to RZ11 (for temporary, dynamic changes that revert on restart, where the parameter is flagged as dynamically switchable in your release) or RZ10 (for permanent changes).
- Increase the parameter rdisp/wp_no_btc to add more background work processes. This allows the Migration Cockpit or background dispatchers to spawn more parallel jobs, increasing your throughput.
- Ensure your Operation Mode (RZ04) is configured to allocate more background processes during the dedicated migration window (e.g., nights or weekends).
Resolution 4: Database Level Optimization
An optimal SAP configuration for migrations must tightly integrate with the database layer. Bulk inserts generate massive amounts of transaction logs (redo logs) and invalidate statistics.
- Log File Sizing: Ensure your database log volumes have sufficient free space and the auto-extend settings are aggressive enough. A completely full log volume will instantly halt the SAP system and crash the migration.
- Update Database Statistics: As you load millions of rows into empty or near-empty tables, the database query optimizer's statistics go stale. The database then chooses poor execution plans, and load performance degrades sharply. Schedule a job (via DB13, DB02, or native DB tools like HANA Studio) to update statistics on the target tables after every 10-20% of the data volume is loaded.
- HANA Savepoints: If using SAP HANA, actively monitor savepoint durations. If savepoints take too long (e.g., > 5 seconds) due to heavy disk I/O from the data influx, you may see application-level freezing and timeouts. Ensure your underlying disk I/O subsystem meets SAP HANA KPI requirements for write latency.
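Log volume headroom is easy to watch from the OS while the load runs. A minimal sketch using `df -P`; the function names and the threshold are hypothetical, and the path you pass in (for example a HANA log mount) depends on your installation.

```shell
# Print the used-space percentage of the filesystem that holds the given path.
fs_used_pct() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Warn and return 1 when usage is at or above the threshold; return 2 if usage cannot be read.
check_log_volume() {
  local path="$1" max_used_pct="$2" used
  used=$(fs_used_pct "$path")
  [ -n "$used" ] || return 2
  if [ "$used" -ge "$max_used_pct" ]; then
    echo "WARNING: $path is ${used}% full (threshold ${max_used_pct}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}
```

Run it periodically during the load, for example `check_log_volume /hana/log 80` if your log volume is mounted there, and pause the migration when it starts warning.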
Post-Migration Cleanup
Once the SAP data migration is complete and verified, revert any risky temporary parameters (such as inflated heap memory limits, increased dialog timeouts, or skewed operation modes) to their baseline values. A system left configured for a one-time bulk load is a stability risk during normal day-to-day operations and can cause unexpected outages.
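If you saved a copy of the instance profile before the migration, comparing it with the current file is a quick way to find settings that were never reverted. This is a sketch under assumptions: both files use `name = value` lines with `#` comments, the function names are hypothetical, and changes made only at runtime (for example through RZ11) will not show up in a file comparison.

```shell
# Normalize a profile: drop comments and blank lines, strip spaces around "=", sort.
normalize_profile() {
  grep -v -E '^[[:space:]]*(#|$)' "$1" \
    | sed -E 's/[[:space:]]*=[[:space:]]*/=/; s/^[[:space:]]+//; s/[[:space:]]+$//' \
    | sort
}

# List entries in the current profile that are new or differ from the baseline.
profile_drift() {
  local baseline="$1" current="$2" a b
  a=$(mktemp) && b=$(mktemp) || return 2
  normalize_profile "$baseline" > "$a"
  normalize_profile "$current"  > "$b"
  comm -13 "$a" "$b"        # lines that appear only in the current profile
  rm -f "$a" "$b"
}
```

An empty result from `profile_drift baseline_profile current_profile` means the file-based settings match the baseline again.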
Frequently Asked Questions
# Example diagnostic commands to run at the OS level (Linux) as the <sid>adm user
# 1. Check that the SAP Gateway port is listening, then count established connections
# Replace 3300 with your actual gateway port (33<instance_number>)
netstat -an | grep ':3300 ' | grep LISTEN
netstat -an | grep ':3300 ' | grep -c ESTABLISHED
# 2. Tail the SAP Gateway developer trace for 'connection refused' or 'resource' errors
tail -f /usr/sap/<SID>/<INSTANCE>/work/dev_rd | grep -i -E 'refused|resource'
# 3. Check for severe memory bottlenecks or swapping at the OS level during the migration
vmstat 2 10
# 4. (SAP HANA specific) Check disk I/O performance on the log volume, which can heavily bottleneck data loads
iostat -x -d 2 5
# 5. Search the dispatcher and work process trace files for deadlocks
grep -i 'deadlock' /usr/sap/<SID>/<INSTANCE>/work/dev_disp /usr/sap/<SID>/<INSTANCE>/work/dev_w*
Error Medic Editorial
Error Medic Editorial comprises senior SREs, SAP Basis Administrators, and DevOps engineers dedicated to solving complex enterprise system failures, optimizing large-scale data migrations, and ensuring high availability for mission-critical applications.