SQL Server Agent Job Failure History Report

Report SQL Server Agent Job Status and Failure History

This script queries the SQL Server Agent job history stored in msdb to produce a detailed report of each job's most recent run status, failure and success counts over the last 60 days, total lifetime run counts, and job step information including subsystem type and proxy name.

Purpose and Overview

SQL Server Agent jobs automate the critical work of database backups, index maintenance, integrity checks, ETL loads, and more. When a job fails, the impact can range from a missed backup to a halted data pipeline. Knowing which jobs failed, how recently they failed, and whether failures are isolated events or a recurring pattern is essential for any DBA. This script, written by Jason Brimhall and published in the ktaranov/sqlserver-kit community repository, uses a series of common table expressions to build a rich job history report. It shows the most recent outcome for each job status type, counts failures and successes within the last 60 days relative to the latest run, and includes job step details. To report only failures, uncomment one line. Run this script in the context of the msdb database.

Code Breakdown

The complete script is shown below, followed by an explanation of each section. Run this in msdb or prefix the unqualified table references with msdb.dbo..

 1USE msdb;
 2GO
 3WITH jobhist AS (
 4    SELECT
 5        jh.job_id,
 6        jh.run_date,
 7        jh.run_status,
 8        jh.step_id,
 9        ROW_NUMBER() OVER (
10            PARTITION BY jh.job_id, jh.run_status
11            ORDER BY jh.run_date DESC
12        ) AS rownum
13    FROM dbo.sysjobhistory AS jh
14    WHERE jh.step_id = 0
15),
16agglastsixty AS (
17    SELECT
18        sjh.job_id,
19        sjh.run_status,
20        COUNT(*) AS RunCount
21    FROM dbo.sysjobhistory AS sjh
22    INNER JOIN jobhist AS jh
23        ON jh.job_id = sjh.job_id
24       AND jh.run_status = sjh.run_status
25       AND jh.step_id = sjh.step_id
26    WHERE CONVERT(DATE, CONVERT(VARCHAR(30), sjh.run_date))
27              > DATEADD(d, -60, CONVERT(DATE, CONVERT(VARCHAR(30), jh.run_date)))
28        AND jh.rownum = 1
29    GROUP BY sjh.job_id, sjh.run_status
30),
31aggtotal AS (
32    SELECT
33        sjh.job_id,
34        sjh.run_status,
35        COUNT(*) AS RunCount
36    FROM dbo.sysjobhistory AS sjh
37    INNER JOIN jobhist AS jh
38        ON jh.job_id = sjh.job_id
39       AND jh.run_status = sjh.run_status
40       AND jh.step_id = sjh.step_id
41    WHERE jh.rownum = 1
42    GROUP BY sjh.job_id, sjh.run_status
43)
44SELECT
45    j.name                                           AS JobName,
46    sc.name                                          AS CategoryName,
47    j.date_created,
48    j.enabled,
49    CONVERT(DATE, CONVERT(VARCHAR(30), oa.run_date)) AS RunDate,
50    CASE oa.run_status
51        WHEN 0 THEN 'Failed'
52        WHEN 1 THEN 'Succeeded'
53        WHEN 2 THEN 'Retry'
54        WHEN 3 THEN 'Canceled'
55        WHEN 4 THEN 'In Progress'
56    END                                              AS run_status,
57    als.RunCount                                     AS Last60StatusCount,
58    agt.RunCount                                     AS TotalStatusCount,
59    js.subsystem                                     AS JobStepSubsystem,
60    js.command,
61    spr.name                                         AS ProxyName,
62    spr.credential_id
63FROM dbo.sysjobs AS j
64LEFT JOIN jobhist AS oa
65    ON oa.job_id = j.job_id
66LEFT OUTER JOIN agglastsixty AS als
67    ON als.job_id = oa.job_id
68   AND als.run_status = oa.run_status
69LEFT OUTER JOIN aggtotal AS agt
70    ON agt.job_id = oa.job_id
71   AND agt.run_status = oa.run_status
72INNER JOIN sys.server_principals AS sp
73    ON j.owner_sid = sp.sid
74INNER JOIN dbo.syscategories AS sc
75    ON j.category_id = sc.category_id
76INNER JOIN dbo.sysjobsteps AS js
77    ON js.job_id = j.job_id
78LEFT JOIN dbo.sysproxies AS spr
79    ON js.proxy_id = spr.proxy_id
80WHERE oa.rownum = 1
81-- AND oa.run_status = 0  -- uncomment to show only failed jobs
82ORDER BY RunDate DESC;

CTE 1: jobhist — Most Recent Run per Status per Job

The jobhist CTE reads from dbo.sysjobhistory and assigns a row number using ROW_NUMBER() partitioned by job_id and run_status, ordered by run_date DESC. This means row number 1 is the most recent time a job ended with each status code. The filter WHERE jh.step_id = 0 restricts to job-level outcome rows. In sysjobhistory, step_id 0 is always the summary row for the entire job — individual job steps have step_id values of 1, 2, 3, and so on. Filtering to step_id = 0 avoids double-counting.

CTE 2: agglastsixty — Count per Status in Last 60 Days

agglastsixty counts how many times each job ended with each run_status in the 60 days leading up to the most recent occurrence of that status. It joins back to jobhist on job_id, run_status, and step_id, then filters on rownum = 1 to anchor the date window to the most recent run. The DATEADD comparison finds all history rows within 60 days before that anchor date. This gives a rolling-60-day count without requiring a fixed calendar date in the query.

CTE 3: aggtotal — Lifetime Count per Status

aggtotal is the same structure as agglastsixty but without the 60-day date restriction. It counts every historical occurrence of each status for each job, giving a total lifetime run count. Comparing Last60StatusCount to TotalStatusCount shows whether failures are recent or historical.

Main SELECT: Job Details and Step Information

The main query joins dbo.sysjobs (one row per agent job) to the three CTEs, dbo.syscategories (job category like Database Maintenance or Data Collector), dbo.sysjobsteps (one row per step per job), and dbo.sysproxies (proxy accounts used by job steps for running under alternate credentials).

The CASE expression on oa.run_status translates the integer status codes into readable text: 0 = Failed, 1 = Succeeded, 2 = Retry, 3 = Canceled, 4 = In Progress.

The WHERE clause oa.rownum = 1 keeps only the most recent run row for each job and status. To restrict the output to failed jobs only, uncomment AND oa.run_status = 0.

Key Benefits and Use Cases

  • Shows the most recent run outcome for every SQL Server Agent job in one result set
  • Counts failures and successes over the last 60 days to identify recurring problem jobs
  • Total lifetime count shows whether a failure is new or has been happening for a long time
  • Job step subsystem column identifies T-SQL, SSIS, PowerShell, CmdExec, and other step types
  • Proxy name column reveals which credential a step runs under, useful for permission troubleshooting
  • Simple one-line change to filter the output to failed jobs only

Performance Considerations

  • msdb context required: The table names in this script (dbo.sysjobs, dbo.sysjobhistory, etc.) are unqualified — they assume the current database is msdb. Run the script after USE msdb; or prefix each table with msdb.dbo. if running from a different database context.
  • sysjobhistory growth: On servers with many jobs running frequently, msdb.dbo.sysjobhistory can grow large. SQL Server Agent has a built-in history retention setting under SQL Server Agent Properties > History. Keeping a manageable row limit (such as 10,000 rows per job and 100,000 rows total) keeps msdb healthy.
  • run_date format: The run_date column in sysjobhistory stores dates as integers in YYYYMMDD format, not as proper date values. The CONVERT(DATE, CONVERT(VARCHAR(30), run_date)) pattern converts this integer to a proper date for comparison and display.
  • sysjobsteps join multiplies rows: Because the script joins to dbo.sysjobsteps, a job with multiple steps produces multiple output rows — one per step. If you want one row per job rather than one row per step, add a filter such as AND js.step_id = 1 or aggregate the step information.

Practical Tips

To see only jobs that failed in the last 24 hours, add a date filter to the WHERE clause:

1WHERE oa.rownum = 1
2    AND oa.run_status = 0
3    AND CONVERT(DATE, CONVERT(VARCHAR(30), oa.run_date)) >= CAST(DATEADD(DAY, -1, GETDATE()) AS DATE)

To quickly list all currently running jobs without the CTE overhead, use the simpler msdb.dbo.sp_help_job stored procedure with @execution_status = 1 (executing) or query msdb.dbo.sysjobactivity directly.

For a summary report that shows one row per job (not per step), remove the join to dbo.sysjobsteps and dbo.sysproxies and adjust the SELECT list accordingly. This is useful for a quick pass/fail dashboard across all jobs.

Schedule this script as a SQL Server Agent job that runs each morning and writes results to a reporting table. Send an alert email when any row in that table has run_status = 'Failed' and Last60StatusCount > 1 — meaning the job has failed more than once recently and is not a one-off.

Conclusion

This script from Jason Brimhall provides a thorough view of SQL Server Agent job history by combining the most recent run status with 60-day and lifetime failure counts, plus job step details. The CTE-based structure makes it easy to adapt — uncomment the failure filter for a quick failure report, or adjust the 60-day window to match your monitoring period. Running it in msdb on a regular schedule gives DBAs a reliable way to catch recurring job failures before they escalate into data loss or missed SLA events.

References

Posts in this series