Error Medic

Troubleshooting ServiceNow Performance Analytics: Resolving 'Job exceeded maximum execution time' and High Memory Consumption

Diagnose and fix slow ServiceNow Performance Analytics jobs, optimize instance performance, and apply best practices to resolve timeout and memory errors.

Key Takeaways
  • Unoptimized indicator source conditions running against unindexed fields are the primary cause of PA job timeouts and instance performance degradation.
  • Excessive breakdown matrices and inefficient PA scripts (running row-by-row) exponentially increase data collection time and memory usage.
  • Resolve immediate timeouts by limiting the date range of historical collections, optimizing queries to hit database indexes, and disabling unnecessary breakdown matrices.
Fix Approaches Compared
  • Optimize Indicator Sources — When to use: job logs show slow queries or timeouts on specific indicators. Time: 1-2 hours. Risk: Low.
  • Disable Breakdown Matrices — When to use: the Data Collector job runs out of memory or takes hours to complete. Time: 15 mins. Risk: Medium (reduces analytical depth).
  • Refactor PA Scripts to Database Views — When to use: heavy row-by-row processing is identified in job logs. Time: 4-8 hours. Risk: Medium.
  • Adjust Data Collector System Properties — When to use: global row limits are hit despite optimized queries. Time: 10 mins. Risk: High (can cause wider instance degradation).

Understanding the Error

ServiceNow Performance Analytics (PA) is a powerful tool for unlocking enterprise insights and driving continuous improvement. However, as datasets grow, poorly configured PA components can severely impact overall ServiceNow instance performance. When Data Collector jobs become bloated, they lock database tables, consume excessive application node memory, and lead to significant user-facing latency.

System administrators and developers typically encounter several distinct error messages and symptoms when Performance Analytics jobs fail or degrade instance health:

  • Performance Analytics job failed: Maximum execution time exceeded
  • Data Collector job ran out of memory
  • Error: The number of rows fetched (50001) exceeds the limit of 50000
  • General instance sluggishness reported during the execution window of daily or historical PA jobs.

These errors almost always point to structural inefficiencies in how data is being queried, grouped, or calculated during the collection process.

Step 1: Diagnose the Bottleneck

Before making changes, you must identify exactly which component of the PA job is causing the performance degradation. A single job might contain dozens of indicators, and usually, only one or two are responsible for the bottleneck.

1. Review Data Collector Job Logs Navigate to Performance Analytics > Data Collector > Job Logs. Filter for jobs with a state of 'Collected with errors', 'Collected with warnings', or simply sort by 'Duration' descending to find the longest-running jobs.

2. Analyze Job Execution Logs Open a specific problematic job log and navigate to the Job Execution Logs related list. This list breaks down the exact time taken for each indicator and breakdown. Look for anomalies—for instance, an indicator that takes 45 minutes to collect while others take 10 seconds. Note the specific Indicator Source and Breakdowns associated with the slow execution.

3. Identify Slow Database Queries If the PA job is causing global instance performance issues, the database is likely struggling with unoptimized queries. Navigate to System Diagnostics > Stats > Slow Queries. Filter the list for queries executed by the pa_data_collector user or containing conditions that match your PA indicator sources. If you see queries taking thousands of milliseconds and performing full table scans, you have identified the root cause.

Step 2: Fix Indicator Sources and Conditions

The most common reason ServiceNow Performance Analytics jobs fail is querying large transactional tables (such as task, incident, or sys_audit) without leveraging database indexes.

Optimize Query Conditions: Review the conditions on your slow Indicator Sources. Ensure that the first condition in the filter utilizes a highly selective, indexed field. For example, filtering by sys_created_on ON Today or state = Closed is generally fast because these fields are indexed. Conversely, using operators like CONTAINS, DOES NOT CONTAIN, or MATCHES REGEX on large text fields (like description or work_notes) forces the database to perform a full table scan, bypassing indexes completely.
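One practical way to review a condition is to inspect the encoded query the filter produces (right-click the filter breadcrumb and choose 'Copy query'). The sketch below is plain JavaScript, runnable outside ServiceNow; the two encoded strings and the helper function are illustrative examples, not a ServiceNow API:

```javascript
// Two illustrative encoded query strings (the kind a ServiceNow filter
// generates); field names are standard task fields, values are examples.

// Index-friendly: leads with a selective, indexed field (sys_created_on).
var indexedQuery = 'sys_created_onONToday@javascript:gs.beginningOfToday()@javascript:gs.endOfToday()^state=7';

// Scan-forcing: CONTAINS translates to SQL LIKE '%...%' on a large text
// field, which bypasses indexes and forces a full table scan.
var scanQuery = 'descriptionLIKEnetwork outage';

// Simple review heuristic: flag encoded queries that rely on scan operators.
function usesScanOperator(encodedQuery) {
    return /LIKE|MATCHES/.test(encodedQuery);
}

console.log(usesScanOperator(indexedQuery)); // false
console.log(usesScanOperator(scanQuery));    // true
```

A check like this can be run over exported indicator source conditions to shortlist the filters most likely to bypass indexes.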

Filter Out Junk Data: Ensure your Indicator Sources are not pulling in irrelevant data. Add conditions to exclude canceled, duplicate, or test records. The fewer rows the Data Collector has to process, the faster the job will run.

Step 3: Address Breakdown Matrices and Scripts

If your Indicator Sources are optimized but the job is still running out of memory, the issue likely lies in breakdown matrices or PA scripts.

Disable Unnecessary Breakdown Matrices: Breakdown matrices calculate the intersection of multiple breakdowns (e.g., Priority AND Assignment Group). If an indicator has 4 breakdowns of 50 elements each, collecting the full matrix can require calculating up to 50^4 (6,250,000) combinations. This combinatorial explosion of data can easily cause a memory exhaustion error. Navigate to your PA Job, open the 'Indicators' related list, open the indicator record, and uncheck 'Collect matrix' unless the business explicitly requires cross-breakdown analysis.
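The worst-case arithmetic is easy to verify outside the platform. This plain-JavaScript sketch simply reproduces the count described above; the function name is illustrative, not a PA API:

```javascript
// Worst-case number of matrix cells when every breakdown element is
// intersected with every other: N breakdowns of E elements each yield
// up to E^N combinations.
function matrixCombinations(breakdowns, elementsPerBreakdown) {
    return Math.pow(elementsPerBreakdown, breakdowns);
}

console.log(matrixCombinations(2, 50)); // 2500 cells: manageable
console.log(matrixCombinations(4, 50)); // 6250000 cells: memory exhaustion territory
```

Adding a fifth 50-element breakdown multiplies the count by 50 again, which is why each extra matrix breakdown matters far more than a single extra indicator.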

Refactor PA Scripts: Performance Analytics scripts evaluate data row-by-row during the collection process. If a script performs complex logic, string manipulation, or (worst of all) GlideRecord queries against other tables, it will drastically slow down the job.

Best Practice: Avoid GlideRecord queries in PA scripts entirely. If you need data from related tables, create a Database View that joins them and build your Indicator Source on the view instead. This pushes the processing down to the database layer, which is far faster than row-by-row processing in the application layer.

Step 4: System Properties and ServiceNow Benchmarks

Sometimes, enterprise datasets legitimately exceed default system limits. ServiceNow provides several system properties to govern Data Collector behavior, found under Performance Analytics > System Properties.

  • com.snc.pa.dc.max_row_count_indicator_source: (Default 50,000). The maximum number of rows an indicator source can process. If you hit this limit, first try to segment your data or improve filters. Only increase this property as a last resort, as it directly increases memory consumption.
  • com.snc.pa.dc.max_records: The maximum number of records processed by a single job.

ServiceNow Benchmarks Best Practices: ServiceNow Benchmarks allow you to compare your instance's KPIs against industry peers. However, participating in benchmarks requires running specific PA jobs. To ensure these jobs don't impact your instance:

  1. Schedule for Off-Peak: Ensure all historical and heavy daily data collection jobs (including benchmark jobs) are scheduled during your organization's lowest traffic hours (typically weekend nights or 2 AM - 4 AM).
  2. Incremental Historical Collection: Never run a historical job for 3 years of data at once. Break it up. Run the job for Q1 2023, let it finish, then run Q2 2023. This prevents massive database locks and timeouts.
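To make "break it up" concrete, here is a small date-chunking sketch in plain JavaScript. The function name and quarter-based granularity are illustrative (there is no such helper in the platform); you would paste each resulting range into the historical job's start and end dates, one run at a time:

```javascript
// Split a historical collection window into quarter-sized chunks so each
// run can be executed (and verified) independently.
function quarterChunks(startYear, endYear) {
    var chunks = [];
    for (var year = startYear; year <= endYear; year++) {
        for (var q = 0; q < 4; q++) {
            var startMonth = q * 3; // 0, 3, 6, 9
            var start = new Date(Date.UTC(year, startMonth, 1));
            var end = new Date(Date.UTC(year, startMonth + 3, 0)); // day 0 = last day of previous month
            chunks.push({
                label: 'Q' + (q + 1) + ' ' + year,
                start: start.toISOString().slice(0, 10),
                end: end.toISOString().slice(0, 10)
            });
        }
    }
    return chunks;
}

var chunks = quarterChunks(2023, 2023);
console.log(chunks[0].label + ': ' + chunks[0].start + ' to ' + chunks[0].end);
// Q1 2023: 2023-01-01 to 2023-03-31
```

Run the historical job for the first chunk, confirm it completes cleanly in the job logs, then move to the next chunk.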

Quick Check: Active Data Collector Jobs

If you suspect a job is hung, the following background script lists active or queued PA jobs:
// Background script to identify currently running or hung PA Data Collector jobs
// Run this in Scripts - Background to quickly check job status

var paJobLogGr = new GlideRecord('pa_job_log');
paJobLogGr.addQuery('state', 'IN', 'collecting,queued');
paJobLogGr.orderByDesc('sys_created_on');
paJobLogGr.query();

if (paJobLogGr.hasNext()) {
    gs.info('--- Currently Active or Queued Performance Analytics Jobs ---');
    while (paJobLogGr.next()) {
        var jobName = paJobLogGr.job.getDisplayValue();
        var state = paJobLogGr.state.getDisplayValue();
        var started = paJobLogGr.sys_created_on.getDisplayValue();
        var recordsProcessed = paJobLogGr.inserts;
        
        gs.info('Job: ' + jobName + ' | State: ' + state + ' | Started: ' + started + ' | Records Inserted: ' + recordsProcessed);
    }
} else {
    gs.info('No Performance Analytics jobs are currently active or queued.');
}

// To forcefully cancel a hung job, you can navigate to the pa_job_log record and change the state to 'Canceled',
// or use a targeted GlideRecord update (use with caution in production).

Error Medic Editorial

The Error Medic Editorial team consists of senior DevOps engineers, Site Reliability Engineers, and ServiceNow Certified Master Architects dedicated to solving complex enterprise infrastructure and platform performance challenges.
