Automating SQL Backup verification testing with PowerShell

Introduction
Talk to any person responsible for a database, no matter what platform, and ask them what is their most important responsibility, I would hope that all of them would list backups at the top of their list. Managing these backups is a well documented and tested process. There are plenty of tools, native and 3rd party, for managing backups: Keeping track of missed backups, last backups, backup size, backup duration, backup type… the list goes on and on. However, a backup strategy is pretty much useless if it isn’t tested. Simply relying on a ‘backup completed’ check is setting yourself up for disaster.

This process is less popular for vendors and DBAs alike. The requirements for testing, the availability of platforms for testing, the company requirements for testing all differ from platform to platform and company to company.

In this article I will present a series of PowerShell functions that work with SMO objects to automatically manage this process in my environment. With a few small changes, the scripts can be tweaked to fit pretty much any scenario that manages more than 1 or 2 instances.

The Scenario
My requirement was driven by quarterly SOX auditing that required proof that we completed regular verification our of SQL backups and then document that proof. In my current position, I am managing thousands databases spread across multiple instances. The priority of each system is entirely dependent on it’s function in the company, ERP systems take position #1, and the rest scale on down from there.

The source of the proof for the audit was previously coming from a number of different sources:
• ERP Refreshes – As part of regular business operations, the ERP systems were refreshed from dev to Quality , Production to Quality and D, Q or P to sandbox, depending on the requirement.
• Regular system refreshes – Moving databases between both vendor and proprietary in house environments, from prod to dev and back.
• Recovery – Emergency tickets to restore production databases to test environments to recover accidentally deleted data.
• DBA Backup testing – A Manual process to validate a random database or 2 on a regular interval (in my case monthly).

When any of these events came to pass, it was taken as an opportunity to collect the proof of restore and save it into the restore log. The first 3 were usually not predictable, as they relied solely on business requirements. However, as a lone DBA doing a monthly manual restore of random databases, I felt that this was not something that I should be spending any more than a couple of minutes \ month, and my recent love affair with PowerShell prompted me to take the plunge and write the code necessary to automate and (mostly) forget about this job.

The actual process is a fairly simple one:
1. Select an random Instance.
2. Select a random Database.
3. Select a Backup.
4. Attempt a restore.
5. Collect data from restored database as proof.
6. Delete database.
7. Email results.

There were, however, a couple of caveats as to what I could realistically accomplish within our environment.

Although most of our instances sit in our data center, we have numerous instances hosted in sites around the world, but these generally back up to local disk, so on-disk retention was usually only a couple of days, so I was limited to testing backups on instances in our data center. The databases in the DC are backed up using native SQL tools to an EMC Data Domain via a network share. The on disk retention for these was 30 days, so I was going to be unable to test any backups older than 30 days, without requesting a restore from our backup team. This met our SOX requirements though, as all important systems are in the central DC, and as long as the backup was being tested, auditors would be happy.

I have a dedicated vm that I use for my monitoring, so this was where I was going to be restoring to. I only had 50GB of useable space to do my restores to though, so I was only going to be able to concentrate on databases > 50GB.

The Code

Logging
Troubleshooting PowerShell, or any scripting language for that matter, isn’t always straight forward, so for any complex piece of code, I usually throw in a logging function, to output steps processed with and relevant variables:

Function Write-DebugLog
{
Param([String]$Message, [String]$Path)
try
{
$date = Get-Date -Format “yyyy-MM-dd HH:mm:ss”
$outFile = $Path
$msg = $date + ": " + $Message
$msg | Out-File -FilePath $outFile -Append
}
catch
{
write-output "Exception Type: $($_.Exception.GetType().FullName)"
write-output "Exception Message: $($_.Exception.Message)"
}
}

Instance Selection
I manage all of my instances by storing relevant data in a home grown database that contains a list of all servers, instances, databases, data files etc. In my script I am using this database as the selection tool for finding a random instance to test against. Your code can easily be amended to use a txt file that contains all of your servers, or your own inventory database:

Function Get-RestoreSourceServer
{
param([object]$instance, [String]$LogFile);
try
{
$sourceInstance = $null
#select random server
while(!($sourceInstance))
{
$sourceInstanceName = $instance.ConnectionContext.ExecuteScalar("select top 1 instance_name from MONITOR.dbo.instance where `
instance_ip like '%1.1.%' order by newid()"); #the IP identifies all of data center servers that I am concerned about
#if this is a named instance, strip all chars post \
if ($sourceInstanceName.Contains("\"))
{
$computername = $sourceInstanceName.Replace($sourceInstanceName.SubString($sourceInstanceName.IndexOf("\")),"");
}
else
{
$computername = $sourceInstanceName
}
Write-DebugLog -Message "Attempting to connect to Instance on server ($computername)" -Path $LogFile
#check its availability, if available, lets try and get a database, with a valid backup
if(Test-Connection -ComputerName $computername -Count 1 -Quiet)
{
#connect to our instance
$sourceInstance = New-Object ("Microsoft.SqlServer.Management.Smo.Server") $sourceInstanceName
Write-DebugLog -Message "Connected successfully to Instance ($sourceInstanceName)" -Path $LogFile
}
else
{
$sourceInstance = $null
Write-DebugLog -Message "Unable to Connect to Instance on ($computername)" -Path $LogFile
}
}

#our 2 return values
$true
$sourceInstance
}
catch #output an exception, and return a false flag, and an empty object
{
Write-DebugLog -Message "Exception Message: $($_.Exception.Message) Exception Type: $($_.Exception.GetType().FullName)" -Path $LogFile
$false
$null
}
};

Database Selection
Once I have a valid Instance to work from, I will then select a database from the server, that is less than 50GB in size, by querying the sys.databases system table:

Function Get-RestoreSourceDatabase
{
param([object]$instance, [String]$LogFile);
try
{
Write-DebugLog -Message "Checking for Database on $instance" -Path $LogFile
#we have a valid instance, lets find a valid database that is 50GB in size, or less and not a system db
$database = $instance.Databases | Where-Object{($_.ID -gt 4) -and ($_.size -lt 50000)} | get-random -Count 1
Write-DebugLog -Message "$database found on Instance $instance" -Path $LogFile
#ouput our boolean value and the database
$true
$database
}
catch #output an exception, and return a false flag, and an empty object
{
Write-DebugLog -Message "Exception Message: $($_.Exception.Message) Exception Type: $($_.Exception.GetType().FullName)" -Path $LogFile
$false
$null
}
};

Backup Selection
After a database has successfully been chosen then we can check for a valid backup to restore from. The process here is to find a backup that is less than 30 days old, and is still on disk, and available to restore from. SMO let me down here, as I was hoping to be able to enumerate the backup sets for the database, and then create an SMO object for that backup to pull file data from, but was unable to find an instantiate a valid object, so I went to the msdb backup tables to find data (if anyone knows a way to get an SMO object that represents a backup, with file location, please let me know, and I will update the scripts.):

Function Get-RestoreBackupLocation
{
param([object]$Database, [String]$LogFile);
try
{
Write-DebugLog -Message "Checking for Valid Backup for $Database" -Path $LogFile
$backupDate = Get-Date;
$backupDate = $backupDate.AddDays(-30); #current retention period
#Find all full backups for the database less than 30 days old (current retention period for backups on disk)
$backupset = $Database | %{$_.EnumBackupSets()} | Where-Object{($_.BackupSetType -eq 1) -and ($_.BackupStartDate -gt $backupDate)} | get-random -Count 1
$backupsetID = $backupset.ID
$MediaSetId = $backupset.MediaSetId
Write-DebugLog -Message "Validating Backup on BackupSetID($backupsetID) and MediaSetId($MediaSetId)" -Path $LogFile
$backup_dataset = $Database.ExecuteWithResults("SELECT backup_set_id, bs.media_set_id, bmf.physical_device_name `
from msdb.dbo.backupset bs LEFT JOIN msdb.dbo.backupmediafamily bmf `
ON bs.media_set_id = bmf.media_set_id `
where backup_set_id = $backupsetID and bs.media_set_id = $MediaSetId");

foreach ($t in $backup_dataset.Tables[0].Rows)#only ever one row, 2 values returned
{
$backup_loc = $t.physical_device_name
}
Write-DebugLog -Message "Testing Backup Physical Path ($backup_loc)" -Path $LogFile
#if this path is valid, then we can retrun this as a valid object
if(Test-Path -Path $backup_loc)
{
Write-DebugLog -Message "Backup Physical Path ($backup_loc) is valid" -Path $LogFile
#create a new object from the variables
$properties = @{ `
'backupsetID'=$backupsetID; `
'backup_loc'=$backup_loc; `
'valid'=$true}

$object = New-Object -TypeName PSObject -Property $properties;
return $object
}
else #return a false flag, and an empty object
{
Write-DebugLog -Message "Backup Physical Path ($backup_loc) is NOT valid" -Path $LogFile
$properties = @{ `
'backupsetID'=0; `
'backup_loc'=""; `
'valid'=$false}

$object = New-Object -TypeName PSObject -Property $properties;
return $object
}
}
catch #output an exception, and return a false flag, and an empty object
{
Write-DebugLog -Message "Exception Message: $($_.Exception.Message) Exception Type: $($_.Exception.GetType().FullName)" -Path $LogFile
$properties = @{ `
'backupsetID'=0; `
'backup_loc'=""; `
'valid'=$false}

$object = New-Object -TypeName PSObject -Property $properties;
return $object
}
};

Restore the Database to our target instance
As stated previously, once we have found a valid backup still on disk, we will restore the database to the monitoring instance. I add a _restored onto the name as it is restored, to make sure that there are no accidents, and we end up overwriting an existing database.

Function Run-RestoreBackupVerification
{
Param([object]$Database,[object]$target,[object]$Backup, [String]$LogFile);
try
{
#get the relevant backup data
$backup_location = $Backup.backup_loc;
$backupsetID = $Backup.backupsetID;
Write-DebugLog -Message "Attempting to restore backup($backup_location) of Database($Database)" -Path $LogFile
$smo = "Microsoft.SqlServer.Management.Smo";
$restore_name = "_restored"; #my choice to idenitfy the database on the target instance
$restore_path = "S:\SQLDATA"; # S id dedicated to testing these restores on my monitoring instance.

#set a new name, just in case!
$restore_name = $Database.Name + $restore_name;

#build our restore object
$BackupDevice = New-Object ("$smo.BackupDeviceItem") ($backup_location, 'File');
$Restore = New-Object ("$smo.Restore");
$Restore.Checksum = $True;
$Restore.Devices.Add($BackupDevice);
$Restore.Database = $restore_name;
$Restore.ReplaceDatabase = $False; # make sure we aren't breaking anything!

Write-DebugLog -Message "Attempting to Get Data Files for backup ('$backup_location')." -Path $LogFile
# this is how we find all the files for the database in the selected backup set
$backupfiles = $Database.EnumBackupSetFiles($backupsetID);
foreach ($file in $backupfiles)
{
$FileName = $restore_path + "\" + [System.IO.Path]::GetFileName($file.PhysicalName)
$NewFileLoc = New-Object ("$smo.RelocateFile") ($file.LogicalName, $FileName)
$Restore.RelocateFiles.Add($NewFileLoc)
}
#run our restore
$Restore.SqlRestore($target);
Write-DebugLog -Message "Completed Restore of backup($Backup.backup_loc) to Database($restore_name)" -Path $LogFile
$true
$restore_name
}
catch #output an exception, and return a false flag, and an empty object
{
Write-DebugLog -Message "Exception Message: $($_.Exception.Message) Exception Type: $($_.Exception.GetType().FullName)" -Path $LogFile
$false
$null
}
};

Verify the backup and email results
Once the database is restored, I take a row count of all user tables and output this to a file.

Function Get-DatabaseRowCounts
{
Param([string]$DatabaseName,[string]$sourceInstance,[object]$targetInstance, [String]$LogFile);

try
{
Write-DebugLog -Message "Attempting to collect row data of database($DatabaseName)" -Path $LogFile
#set the database object
$Database = $targetInstance.Databases.Item("$DatabaseName")

#set our file name and path
$date = Get-Date
$date = $date.ToShortDateString().Replace("/","")
$file = $date + "_" + $sourceInstance.Replace("\","$") + "_" + $Database.Name + "_TableCount.txt"
$path = "C:\temp\"
$fullPath = $path + $file
[String]$header = "TableName, RowCount"

Write-DebugLog -Message "Creating output file $fullPath" -Path $LogFile
#create the file
New-Item -Path $path -ItemType "file" -Name $file -Value $tableList
#add the header row
Add-Content -Path $fullPath -Value $header

Write-DebugLog -Message "Adding data to output file $fullPath" -Path $LogFile

#get the data to build our file contents
$ds = $Database.ExecuteWithResults("SELECT OBJECT_NAME(OBJECT_ID) as TableName, `
st.row_count FROM sys.dm_db_partition_stats st `
WHERE index_id < 2 ORDER BY st.row_count DESC")

foreach ($row in $ds.Tables[0].Rows)
{
$newRow = "`n" + $row.TableName + "," + $row.row_count;

Add-Content -Path $fullPath -Value $newRow;

}
Write-DebugLog -Message "Output file complete. Dropping database ($Database)" -Path $LogFile
#we are through, set our true flag and output the database.
$Database.Drop() | out-null;
$true
$fullPath
}
catch #output an exception, and return a false flag, and an empty object
{
Write-DebugLog -Message "Exception Message: $($_.Exception.Message) Exception Type: $($_.Exception.GetType().FullName)" -Path $LogFile
$false
$null
}
};

Put it all Together
There could by any stages above where the code could fail out, it could try and execute against a server that is having maintenance on it, the backup file could be one that isn’t on disk any more, the restore could fail… All of these items will be recorded in the log, so we can investigate why the selected values did not work, but we really want to validate that a proper backup was selected and restored ok, so we assign a Boolean value to all of the above functions, and only a true value from every function will close out the process and send the email to the DBA group:


$targetInstance = New-Object ("Microsoft.SqlServer.Management.Smo.Server") SERVER1 #This is where we will restore our database, and the source of the database lists
$processComplete = $false;
$instance = $false;
$database = $false;
$backup = $false;
$restored = $null;
$output = $false;
$attachment = $null;

#output items - this is the log file that will store steps taken, and any errors.
$Log = "C:\temp\BackupVerification_" + [DateTime]::Now.ToString("yyyyMMdd_HHmmss") + ".txt";

#mail items
$smtp = "smtp.domain.com";
$from = "dba@domain.com";
$to = "dba@domain.com";
$subject = "DBA: SQL Server backup verification results";
$body = "Find attached the latest reults from SQL Agent job: DBA - Backup verification Process.";
$attachment = @();# 2 attachments \ message, 1 log and 1 table row output
#mail items
while($processComplete -eq $false)
{
#reset everything
$instance = $false;
$database = $false;
$backup = $false;
$restored = $null;
$output = $false;
Write-DebugLog -Message "--------------------------------------" -Path $Log
#step 1 - get a source instance for our testing
$instance = Get-RestoreSourceServer -instance $targetInstance -LogFile $Log
if($instance[0] -eq $true)
{
#step 2 - get a source database for our testing
$database = Get-RestoreSourceDatabase -instance $instance[1] -LogFile $Log
if($database[0] -eq $true)
{
#step 3 - get the backup file
$backup = Get-RestoreBackupLocation -Database $database[1] -LogFile $Log
if($backup.valid -eq $true)
{
#step 4 - restore the database
$restored = Run-RestoreBackupVerification -Database $database[1] -target $targetInstance -Backup $backup -LogFile $Log
if($restored[2] -eq $true)
{
#step 5 - output our results
$output = Get-DatabaseRowCounts -DatabaseName $restored[3] -sourceInstance $instance[1].Name -targetInstance $targetInstance -LogFile $Log
if($output[1] -eq $true)
{
#we are through, send the email, set the flag, and we are done.
$attachment += $Log
$attachment += $output[2]
Send-MailMessage -SmtpServer $smtp -From $from -To $to -Subject $subject -Body $body -Attachments $attachment | out-null
$processComplete = $true;
}
else
{
Write-DebugLog -Message "ERROR: Could not output File" -Path $Log
continue
}
}
else
{
Write-DebugLog -Message "ERROR: Could not restore Database" -Path $Log
continue
}
}
else
{
Write-DebugLog -Message "ERROR: Could not get valid Backup" -Path $Log
continue
}

}
else
{
Write-DebugLog -Message "ERROR: Could not get Valid Database" -Path $Log
continue
}
}
else
{
Write-DebugLog -Message "ERROR: Could not get a valid Instance" -Path $Log
continue
}
};

Conclusion
I have one script that holds all of the above code, and I run it as a job in SQL agent on a weekly basis. It can be called as an Operating System Command in the Agent:

Agent Details

It will then send me an email as below:

EmailDetails

This does not get round all of the steps required for the audit process. I still have to upload the results to our hosted SharePoint site, and record the date of the run, but I can now run this on a daily, weekly, monthly or ad-hoc basis and keep our SOX team happy that we are running random tests on a selection of our databases, and along with the other refreshes, described above, we are covering most scenarios.

A full copy of the code can be downloaded here.

Thanks for reading

Advertisements
This entry was posted in Powershell, SQL Server and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s