Analyzing SQL Agent job activity using PowerShell

From time to time, it is up to DBAs and sysadmin’s to prove themselves innocent of charges that ‘they’ are killing the entire [insert core infrastructure system here] with “their” system activity.

Recently, we have had some serious problems on our relatively new Enterprise class SAN system (vendor names will be withheld to protect the innocent). On a daily occurrence, the SAN was experiencing 100% Write Pending cache for an hour or 2, then dropping off. During these spikes, LUNs attached to mail stores went offline, ERP systems slowed to a crawl, and all manner of other “catastrophic” events took place. Eyes were almost lost with the speed it took for the fingers to fly up and start pointing at each other. But those of us in the Enterprise IT group quickly began rational analysis of the various processes that were running during the outages.

As the only Enterprise DBA, it was up to me to check for any SQL based activity / jobs that may be hitting the SAN during these time frames, or at the very least, provide some evidence that these systems were performing at usual levels. Normally I have a much narrower target when starting these investigations “Our SharePoint server is very slow, can you have a look”, so digging into metrics and looking at activity, is pretty straight forward. This time however, the target was very wide.

Luckily, my best friend PowerShell is in my life now, and I could quickly begin to analyze data across all my servers in some simple scripts. As this was a very specific timeframe, spike starts at 3.00am ends at 4.00am, my first thought was “is there a job firing during this spike?”.

So I set out to query all of my servers, and output and jobs into xl for analysis,. Below is the end result:


cls
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null
$MonitorInstance = New-Object ("Microsoft.SqlServer.Management.Smo.Server") "MyMonitoringServer"
$MonitorDB = $MonitorInstance.Databases.Item("SQLMonitor")

$ds = $MonitorDB.ExecuteWithResults("[usp_SEL_sql_name_with_instance_id]")

#create our main job array
$objarray = @()

#set a date object
$current_date = Get-Date
$start_time = Get-Date -Date "2013-09-04 19:00:00"
$end_time = Get-Date -Date "2013-09-04 22:30:00"

#loop through each instance
foreach ($set in $ds.Tables[0].Rows)
{

$instance = $set.instance_name
# Create an SMO connection to the instance
$srv = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $instance

#get all the jobs on the server
$jobs = $srv.JobServer.Jobs

#go through each job and find data for the job name we want
foreach ($job in $jobs)
{
if($job.LastRunDate -gt $current_date.AddDays(-1)) #get the jobs run in the last 24 hours
{

$jobName = $job.Name;
$jobEnabled = $job.IsEnabled.ToString();
$jobLastRunDate = $job.LastRunDate;
$jobLastRunOutcome = $job.LastRunOutcome.ToString();
$jobNextRunDate = $job.NextRunDate.ToString();
$jobStatus = $job.CurrentRunStatus.ToString()
$duration = 0;

foreach($step in $job.JobSteps)
{
#Set Duration
$duration += $step.LastRunDuration;
}
$jobEndDate = $jobLastRunDate.AddSeconds($duration).ToString()

#set our job start and job end variables
$jobstart = $job.LastRunDate
$jobend = $jobLastRunDate.AddSeconds($duration)

if(($jobstart -gt $start_time -and $jobstart -lt $end_time) `
-or ($jobend -lt $end_time -and $jobend -gt $start_time) `
-or ($jobstart -lt $start_time -and $jobend -gt $end_time))
{
$properties = @{ `
'Instance'=$instance; `
'JobName'=$jobName; `
'Enabled'=$jobEnabled; `
'LastRunDate'=$jobLastRunDate; `
'LastRunOutcome'=$jobLastRunOutcome; `
'NextRunDate'=$jobNextRunDate; `
'Status'=$jobStatus; `
'Duration'=$duration; `
'EndDate'=$jobEndDate;}

$object = New-Object –TypeName PSObject –Prop $properties
$objarray += $object
}
}
}
}

#output the jobs
$objarray | select * | Export-Csv C:\DBCC.csv -NoTypeInformation

I should point out, that I keep a monitoring server that has a home grown database with a list of all my currently managed instances in it, and you can see I query that to pull the data I require to loop through each instance. The above script can be easily changed to pass in a text file with a comma separated list of servers.

The flow of the script is pretty easy to follow:

• Go to my monitoring server and get a list of all active instances.
• Set up an array to store my job objects.
• Specify a time frame to search.
• Create an SMO instance to the server currently in the loop.
• Search for jobs that fall into our time frame.
• Create a new object containing the properties of the job we would like to investigate.
• Add that job object into our array.
• Next Please.

Once it has run, it then finally outputs the data into a csv file, so I can go ahead and filter, sort and manage the output in xl. Below is a screenshot of the data I got out for my late evening time frame, as you can see, I make free use of the jobs provided by the Mighty Ola Hallengren:

JObSchedulecapture

I ran this script a number of times, after a particularly bad spike was experienced, and each time it showed that no SQL jobs matched the spikes. This went a long was to convincing the IT managers that SQL jobs were not the cause of the problems we were experiencing.

The investigation is still ongoing into exactly what was causing the problem, even the SAN Vendor engineers are scratching their head at the lack of a smoking gun, but migrating data to a smaller SAN and reducing workloads (backups, SQL jobs, VM activity) during the time frame, has at least brought it under control, to the point that user activity is unaffected, and the spikes are smaller more controlled.

Thanks for Reading.

Advertisements
This entry was posted in Powershell, SQL Server and tagged , , , , . Bookmark the permalink.

2 Responses to Analyzing SQL Agent job activity using PowerShell

  1. matt says:

    Hi,

    Great post.. I *think* there might be a minor issue with part of the script though….only because I came across the same problem.

    It seems that the lastrunduration can’t simply be summed because its not actually the number of seconds the step has run for. It’s actually in ‘hhmmss’ format, so it would need to be split up and converted. See http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.agent.jobstep.lastrunduration.aspx to confirm.

    Cheers

    Matt

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s