Error Medic

ServiceNow Troubleshooting Guide: Fixing "Transaction cancelled: maximum execution time exceeded" and Slow Performance

Fix ServiceNow slow performance, 'Transaction cancelled' timeouts, and configuration crashes with expert diagnostic scripts and data migration optimizations.

Key Takeaways
  • Identify performance bottlenecks immediately using the Response Time Indicator to differentiate between Browser, Network, and Server latency.
  • Resolve 'Transaction cancelled' errors by optimizing inefficient GlideRecord queries, adding database indexes, and utilizing asynchronous processing via event queues.
  • Prevent complete node crashes during bulk data migration by disabling workflows and business rules (setWorkflow(false)) and leveraging Concurrent Import Sets.
Fix Approaches Compared for ServiceNow Slowness
Method | When to Use | Time | Risk
Index Optimization & Query Tuning | Slow database queries, high server processing time, full table scans | Medium | Low
Increasing Transaction Quotas | Immediate temporary relief for critical REST/UI timeouts | Fast | High (masks root cause, risks node stability)
Asynchronous Processing (Events) | Heavy server-side scripts, complex integrations, large aggregations | High | Low
Disabling Business Logic via Script | Bulk data imports, historical data loads, mass updates | Fast | Medium (may skip necessary auditing or sync logic)

Understanding ServiceNow Performance and Configuration Errors

When managing an enterprise ServiceNow instance, administrators frequently encounter a range of issues from general slowness to complete transaction timeouts. A common error message users might see is Transaction cancelled: maximum execution time exceeded or Default quota rule 'REST API Request Timeout' applied. These issues often stem from suboptimal configuration, inefficient server-side scripts, or massive data migration tasks that overwhelm the platform's database nodes.

ServiceNow is a massive, multi-tenant cloud platform, but your specific instance operates within defined resource constraints, specifically regarding worker threads, database connections, and memory allocation. When these resources are exhausted, the platform protects itself by terminating long-running transactions, which manifests to the end-user as a crash, a timeout, or a seemingly broken application. Understanding the root causes of these resource exhaustion events is critical for any ServiceNow administrator or Site Reliability Engineer (SRE).

The Anatomy of a ServiceNow Crash or Timeout

ServiceNow operates on a multi-node architecture, typically load-balanced across multiple physical or virtual application servers. When users report 'ServiceNow not working' or 'ServiceNow crash', it rarely means the entire cloud infrastructure has gone down. More likely, specific application nodes are experiencing thread starvation, memory exhaustion (OutOfMemoryError), or database connection pool depletion. This is usually triggered by several common anti-patterns:

  1. Inefficient GlideRecord Queries: The most common culprit. Queries lacking proper index usage or utilizing CONTAINS operators on extremely large tables (like task, cmdb_ci, or sys_audit) force the underlying MariaDB database to perform full table scans. This locks rows, spikes CPU usage on the database tier, and cascades back to the application tier as slow response times.
  2. Runaway Business Rules: Before and After business rules that enter infinite loops (e.g., Incident A updates Incident B, which triggers a rule updating Incident A) or perform synchronous outbound REST calls. Synchronous calls block the worker thread until the third-party system responds. If the third-party system is slow, your ServiceNow threads get tied up, leading to a queue buildup and eventual timeouts.
  3. Heavy Data Migrations: Using standard import sets without batching, or triggering complex workflows on millions of imported records simultaneously, can completely overwhelm the background scheduler and database.
  4. Client-Side Memory Leaks: Massive dashboards or forms with poorly written Client Scripts that generate thousands of DOM elements or continually poll the server via GlideAjax without clearing intervals.
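The polling leak in item 4 is easy to sketch. This is a minimal, platform-agnostic illustration: `pollFn` stands in for whatever GlideAjax call the page makes per tick, and the fix is simply holding on to the interval handle so it can be cleared when the form or widget is torn down.

```javascript
// Sketch of the client-side polling leak and its fix: keep the interval
// handle and clear it on teardown. `pollFn` is a stand-in for a GlideAjax call.
function startPolling(pollFn, ms) {
    var handle = setInterval(pollFn, ms); // one server round-trip per tick
    return function stopPolling() {
        clearInterval(handle); // without this, abandoned forms keep polling forever
    };
}
```

Returning the stop function forces the caller to decide when polling ends; a page that never calls it is exactly the leak described above.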

Step 1: Diagnose the Bottleneck (Client vs. Network vs. Server)

The first step in the ServiceNow troubleshooting guide is pinpointing the exact layer causing the delay. Blindly changing server configurations without knowing if the issue is actually in the user's browser is a waste of time.

Enable the Response Time Indicator: In the bottom right corner of the classic UI, or via the stats.do page, you can analyze the response time breakdown for every transaction. This is your primary diagnostic tool.

  • Browser: Time spent rendering the DOM and executing UI Scripts/Client Scripts. High browser time points to heavy Client Scripts, synchronous GlideAjax calls, or massive DOM sizes (e.g., loading 1000+ rows in a list or a very complex UI page).
  • Network: Latency between the user's physical location and the ServiceNow data centers. This is usually outside your control but can be mitigated by optimizing payload sizes or using a CDN for custom assets.
  • Server: Time spent processing business rules, ACLs, evaluating script includes, and executing database queries. High server time is where we focus our backend configuration fixes and code refactoring.

Check the System Logs (syslog): Navigate to System Logs > Errors. Look for recurring patterns matching the time of reported slowness. Specifically, use the list filters to search for keywords like Slow Query, Long-running script, or Transaction cancelled. The 'Transaction Logs' (syslog_transaction) table is also invaluable here; you can filter for transactions where Response time > 5000 (5 seconds) to find the worst offenders.

Step 2: Fix 'ServiceNow Slow Performance' and Timeouts

Optimizing Database Queries (GlideRecord)

When scripts execute poorly constructed queries, the database struggles. Always prefer addQuery() with indexed fields. Avoid addEncodedQuery() with LIKE or MATCH_RGX if possible, as these bypass database indexes.

Anti-pattern (Do not do this):

var gr = new GlideRecord('incident');
// A 'CONTAINS' query on a large text field without a text index forces a full table scan
gr.addQuery('short_description', 'CONTAINS', 'server outage');
gr.query();

Optimized (Do this instead): Ensure fields like 'short_description' have a Zing text index generated, or rely on global search capabilities. If querying reference fields, always query by the sys_id, not the display value.

var gr = new GlideRecord('incident');
// Query by the indexed reference field using the 32-character sys_id, not the display value
var callerSysId = '0a1b2c3d0a1b2c3d0a1b2c3d0a1b2c3d'; // placeholder sys_id
gr.addQuery('caller_id', callerSysId);
// Limit the result set if you only need a few records
gr.setLimit(10);
gr.query();
Fixing Quota Rule Timeouts

If you encounter Default quota rule 'REST API Request Timeout' applied or a similar quota cancellation message, it means a transaction has exceeded the maximum allowable time defined by the platform's governors. The default for a UI transaction is typically around 298 seconds, while REST API requests might be limited to 60 seconds.

Solution:

  1. Navigate to System Definition > Transaction Quotas in the application navigator.
  2. Locate the rule affecting your transaction type (e.g., REST API, UI Action, Background Script).
  3. While you can temporarily raise the timeout limit to force a critical process through, this is a dangerous band-aid that risks platform stability. The real fix is to optimize the underlying transaction. If it is a massive data aggregation or migration, switch to asynchronous processing: call gs.eventQueue() to fire an event, and have a Script Action pick up the heavy lifting in the background, freeing the interactive UI thread.
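The fire-an-event / handle-in-background pattern can be sketched end to end. This is a minimal illustration, not production code: the event name `x_demo.aggregate.requested` is hypothetical, and the stub `gs` object and in-memory queue stand in for the real GlideSystem API and the sysevent table.

```javascript
// Minimal sketch of asynchronous processing via an event queue.
// The `gs` stub stands in for GlideSystem; the handler map stands in
// for a Script Action registered on the event name.
var eventQueue = [];
var gs = {
    eventQueue: function (name, record, parm1, parm2) {
        // In ServiceNow this inserts a sysevent row; here we just queue it
        eventQueue.push({ name: name, record: record, parm1: parm1, parm2: parm2 });
    }
};

// UI-side code: instead of aggregating inline (slow), fire the event and return
function requestAggregation(incidentSysId) {
    gs.eventQueue('x_demo.aggregate.requested', { sys_id: incidentSysId }, 'monthly', '');
    return 'queued'; // the interactive transaction finishes immediately
}

// Script Action equivalent: runs later on a background worker thread
function processQueuedEvents(handlers) {
    var processed = 0;
    while (eventQueue.length > 0) {
        var evt = eventQueue.shift();
        if (handlers[evt.name]) {
            handlers[evt.name](evt); // heavy lifting happens here, off the UI thread
            processed++;
        }
    }
    return processed;
}
```

The interactive transaction only pays the cost of an insert into the event queue; the expensive work runs on a background thread where the stricter UI quota no longer applies.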

Step 3: ServiceNow Data Migration Troubleshooting

During large-scale 'ServiceNow data migration' projects, administrators often experience severe platform degradation. Importing millions of CI (Configuration Item) records or historical incident tickets can easily cause a 'ServiceNow crash' if not handled delicately.

To migrate millions of records safely without hitting performance degradation:

  1. Use Concurrent Import Sets: Standard import sets process rows sequentially. Concurrent Import Sets utilize multiple scheduled workers to process the transform map in parallel chunks. This drastically reduces the total import time.
  2. Disable Business Rules and Workflows (Conditionally): When importing historical data, you almost never want to trigger the standard business logic. You don't want SLA engines to recalculate, and you certainly don't want thousands of email notifications firing off to users for tickets closed 3 years ago.

In your Transform Map script (specifically the onBefore script), you can programmatically bypass these engines:

// Inside an onBefore Transform Script
if (source.u_historical_flag == 'true') {
    target.setWorkflow(false); // Disables Business Rules, Workflows, and Flow Designer flows
    target.autoSysFields(false); // Prevents updating sys_updated_on and sys_updated_by
    target.setUseEngines(false); // Disables additional engines (e.g., approval and assignment rules)
}

This simple change can reduce the processing time per row from 500ms down to 10ms, which is the difference between a successful migration and an entire weekend of downtime.
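The same disable-the-engines technique is usually paired with chunking when running mass updates from a background script. Here is a platform-agnostic sketch of the batching idea; the chunk size of 200 and the shape of `updateFn` are assumptions to tune against your own per-row cost.

```javascript
// Batching sketch: instead of touching a million rows in one transaction,
// work in fixed-size chunks so each unit of work stays far below the quota.
function processInChunks(records, chunkSize, updateFn) {
    var chunksProcessed = 0;
    for (var i = 0; i < records.length; i += chunkSize) {
        var chunk = records.slice(i, i + chunkSize);
        // In ServiceNow each chunk would be its own query window, with
        // gr.setWorkflow(false) applied before gr.update() on every row.
        chunk.forEach(updateFn);
        chunksProcessed++;
    }
    return chunksProcessed;
}
```

Keeping each chunk small means a failure or cancellation loses at most one chunk of work, rather than rolling back hours of processing.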

Step 4: Resolving Configuration Conflicts and ACL Issues

'ServiceNow configuration' issues often manifest as missing UI elements, empty reference fields, or explicit security errors. If a form is not loading correctly or users see the dreaded 'Security constraints prevent access to requested page' message, the issue lies in the Access Control Lists (ACLs).

  1. Security Incident Response / ACL Debugging: Do not guess which ACL is failing. Enable System Security > Debugging > Debug Security. Then, impersonate the affected user (using the Impersonate User feature) and attempt to reload the problematic record or list. The debug output at the bottom of the screen will show a detailed matrix of every table-level and field-level ACL evaluated, explicitly highlighting which specific rule rejected the user due to a missing role or a failed script condition.
  2. Update Set Preview Errors: Configuration changes are moved between instances (Dev -> Test -> Prod) via Update Sets. Never skip or ignore Update Set preview errors. If a preview shows 'Found a newer version of this record', it means someone made a direct configuration change in the target environment that conflicts with your update. You must manually compare the XML payloads and choose whether to accept your version or keep the existing one. Ignoring this leads to broken scripts and severe configuration drift in your production environment.
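To make the "failed script condition" case concrete, here is the shape of a scripted ACL condition. The field and group values are hypothetical; in a real ACL the script runs with `current` bound to the record being evaluated, and setting the `answer` variable is what grants or denies access.

```javascript
// Hypothetical scripted ACL condition: allow access only when the user
// belongs to the record's assignment group. Debug Security reports this
// kind of rule as a failed script condition when it evaluates to false.
function evaluateAclCondition(current, userGroupSysIds) {
    var answer = userGroupSysIds.indexOf(current.assignment_group) !== -1;
    return answer; // in a real ACL, assigning to `answer` is what matters
}
```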

Step 5: Advanced Server-Side Troubleshooting Techniques

When standard logs don't reveal the root cause, SREs must dig deeper into the ServiceNow platform's underlying JVM and database performance metrics.

ServiceNow Performance Home and Diagnostics: Navigate to System Diagnostics > Diagnostics Page. This dashboard provides real-time visibility into the health of your application nodes. Key metrics to monitor include:

  • Available Memory: If memory consistently hovers near 0% and Garbage Collection is running constantly, you have a memory leak, likely caused by an infinite loop accumulating objects in an array, or a massive GlideRecord query attempting to load millions of rows into RAM simultaneously instead of iterating properly with .next().
  • Active UI/Background Threads: ServiceNow nodes have a finite number of worker threads (semaphores) available for processing web requests and background jobs. If the 'Active UI Threads' count hits the maximum allowed (often 16 or 32 depending on node size), all subsequent user requests will queue up, resulting in users staring at a spinning loading wheel until transactions eventually hit the timeout threshold and cancel.
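The memory point above can be sketched with a toy cursor. `makeCursor` is a stand-in for a GlideRecord result set: the safe pattern advances one row at a time with `.next()` and keeps only an aggregate, never the full row set, in memory.

```javascript
// Memory-safe iteration: advance one row at a time (like gr.next()) and
// accumulate a result, instead of pushing every row into an array.
function countMatching(cursor, predicate) {
    var count = 0;
    while (cursor.next()) {
        if (predicate(cursor.row)) {
            count++; // keep an aggregate, not the rows themselves
        }
    }
    return count;
}

// Toy cursor over an array, standing in for a query result set.
function makeCursor(rows) {
    var i = -1;
    return {
        row: null,
        next: function () {
            i++;
            if (i < rows.length) { this.row = rows[i]; return true; }
            return false;
        }
    };
}
```

The leak-prone variant would do `var all = []; while (cursor.next()) all.push(cursor.row);` over millions of rows, which is exactly the pattern that exhausts heap and triggers constant garbage collection.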

Analyzing Database Connection Pools: ServiceNow communicates with its MariaDB backend via a connection pool. If a poorly optimized script holds a database connection open for an extended period without releasing it (e.g., executing complex nested while loops while holding a GlideRecord lock), the pool can become exhausted. This leads to the entire node stalling, as no other transactions can execute database queries. You can monitor database connection wait times in the syslog_transaction table.

Thread Dumps and Platform Support: In extreme cases where a node is completely unresponsive, you may need to request a Thread Dump from ServiceNow Technical Support. A thread dump captures the exact stack trace of every executing thread on the JVM at a specific moment in time. By analyzing the thread dump, support engineers can tell you exactly which script include or business rule was executing on a specific thread that caused it to lock up. While you cannot generate a full JVM thread dump directly from the UI, you can use the Active Transactions (All Nodes) module to forcefully kill long-running threads before they crash the node entirely. Always exercise caution when killing transactions, as it can leave data in a partially updated, inconsistent state.

Step 6: Managing Integration and API Performance

Enterprise ServiceNow environments rarely exist in a vacuum. They are connected to dozens of other systems via REST, SOAP, and JDBC integrations. These external dependencies are frequent sources of 'ServiceNow slow performance'.

Outbound Integration Bottlenecks: If ServiceNow triggers an outbound REST message synchronously during a user transaction (e.g., querying an external inventory system when a user opens an incident), the user's browser will hang until the external system responds. If the external inventory system goes down or experiences extreme latency, it pulls ServiceNow down with it. Best Practice: Always execute outbound integrations asynchronously. Trigger an event that runs a Script Action, or use Flow Designer with asynchronous actions. If a synchronous call is absolutely unavoidable, ensure you have extremely strict and short timeout values configured on the HTTP request to fail fast rather than hanging the thread.
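The fail-fast advice can be sketched with a generic timeout wrapper. This is not the ServiceNow API (there the timeout is a property of the outbound HTTP request itself); it only illustrates why a hard deadline keeps one slow dependency from tying up a thread indefinitely.

```javascript
// Generic fail-fast wrapper: race the real request against a hard deadline,
// so a hung third-party system produces a prompt error instead of a stuck thread.
function withTimeout(promise, ms) {
    var timer;
    var timeout = new Promise(function (resolve, reject) {
        timer = setTimeout(function () {
            reject(new Error('Request timed out after ' + ms + 'ms'));
        }, ms);
    });
    return Promise.race([promise, timeout]).finally(function () {
        clearTimeout(timer); // always release the timer, win or lose
    });
}
```

The caller then handles the timeout as an expected failure mode (retry, queue for later, or degrade gracefully), which is far cheaper than a blocked worker thread.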

Inbound API Overload: Conversely, external systems can overwhelm ServiceNow by bombarding its REST APIs with thousands of requests per second. This can consume all available API semaphores, locking out other integrations. To mitigate this, utilize Rate Limiting (available in modern ServiceNow releases). You can configure rate limits based on the integration user account or IP address, ensuring that a single aggressive external application cannot DDoS your instance and consume all background processing power.
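Rate limiting itself is configured declaratively in the instance rather than in code, but the mechanism is easy to picture as a token bucket. This sketch is illustrative only; the capacity and refill rate are assumptions you would map to a rule's requests-per-hour setting.

```javascript
// Illustrative token bucket: each request consumes a token, tokens refill
// at a steady rate, and a consumer that bursts past its allowance is
// rejected instead of consuming an API semaphore.
function TokenBucket(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.last = Date.now();
}
TokenBucket.prototype.allow = function (now) {
    now = now || Date.now();
    var elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.last = now;
    if (this.tokens >= 1) {
        this.tokens -= 1; // consume one token per request
        return true;      // request proceeds
    }
    return false;         // request rejected (HTTP 429 Too Many Requests)
};
```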

Conclusion

Mastering ServiceNow troubleshooting requires moving beyond simple restarts and entering a mindset of methodical diagnosis. By properly identifying the failing tier (client, network, or server), isolating the specific transaction via system logs, and applying targeted code optimizations, you can stabilize even the most complex enterprise instances. Whether you are refactoring inefficient GlideRecord queries, safely handling massive data migrations via asynchronous events, or debugging complex ACL configurations, a systematic approach is your best defense against system crashes and performance degradation.

Appendix: Diagnostic and Fix Scripts

// Diagnostic Script: Find long-running transactions in the last hour
var logGr = new GlideRecord('syslog_transaction');
logGr.addEncodedQuery('sys_created_onONLast hour@javascript:gs.beginningOfLastHour()@javascript:gs.endOfLastHour()^type=background^ORtype=ui^response_time>10000');
logGr.orderByDesc('response_time');
logGr.setLimit(10);
logGr.query();
while (logGr.next()) {
    gs.print('Slow Transaction: ' + logGr.url + ' - Time: ' + logGr.response_time + 'ms - User: ' + logGr.sys_created_by);
}

// Fix Script: Optimizing Data Migration by Disabling Workflows
// Add this to an onBefore Transform Script
(function runTransformScript(source, map, log, target /*undefined onStart*/ ) {
    // Disable business rules, notifications, and auditing for this import to prevent timeouts
    target.setWorkflow(false);
    target.autoSysFields(false);
})(source, map, log, target);

Error Medic Editorial

Error Medic Editorial comprises senior DevOps and SRE professionals dedicated to solving complex enterprise platform issues. With decades of combined experience in ServiceNow architecture, our team provides battle-tested troubleshooting guides and optimization strategies.
