Disaster Recovery, Recovery, SQL Server

Sending A Custom Email on DBCC CHECKDB Failure

With SQL Server Maintenance Plans it’s quite possible to run scheduled DBCC CHECKDB consistency checks via an agent job and be emailed when the job fails. Unfortunately the message that gets sent is the generic agent notification which doesn’t necessarily lend itself to understanding quickly what’s going on:

JOB RUN:             ‘Meaningless Job Name’ was run on 25/04/2014 at 14:26:11
DURATION:        0 hours, 0 minutes, 10 seconds
STATUS:               Failed
MESSAGES:        The job failed.  The Job was invoked by User TSG\aUser.  The last step to run was step 1 (Meaningless Step Name).

I wanted to be able to send a much more helpful email message whenever a DBCC CHECKDB failed. In particular I wanted a message that was:

  • Entirely custom  so that I could have it jump out at the receiver as being unusually urgent. The risk in our environment is that with so many emails flying about – something like this could be ignored with disastrous consequences
  • Able to offer guidance on what the to do if you receive the email. This is because the receiver may well not be a SQL Server specialist and may not know how to deal with an urgent corruption issue
  • Able to contain rich formatting to highlight important information

The resulting email is shown below, but you could get much fancier with your formatting if you wish:

Email on DBCC Failure

I haven’t tested the code extensively yet but we are trialling it in production without issue at the moment. I thought I would share the code in case anyone would find it useful. There are a variety of very clever DBCC CHECKDB scripts out there already, but none of the ones I found were written to allow the sending of a custom email body.

Instructions are in line. I’d recommend you run this as an SQL Agent job and have the job also email you on failure to protect against any bugs in the code.

-- Overview
-- Performs a full DBCC CHECKDB on all databases specified below and will email any failures to the email address specified
-- How to use
-- 1. Ensure the mail profile is configured in SQL Server
-- 2. Provide a valid "to" email address
-- 3. Add the databases to check to the @databasesToCheck table variable
-- 4. Schedule this script to run during an appropriate maintenance window

-- Future Improvement
-- Log the output of DBCC to a semi-permenant location to avoid the need to re-run it after the error is detected. This would be important in
-- very large db setups or those with very high availability requirement

-- Ensure the following Database Mail profile is configured and an appropriate email address specified

DECLARE @databasesToCheck TABLE (
	ID INT IDENTITY (1, 1),				-- This allows use to select and process databases in a specified order
	databaseName VARCHAR(250)

-- Add a row for each database you wish to check
INSERT INTO @databasesToCheck

	DECLARE @currentDatabaseName VARCHAR(250)

	-- Define a cursor to allow us to process each database in turn
	SELECT databaseName
	FROM @databasesToCheck

	OPEN curDatabases

	FETCH NEXT FROM curDatabases
	INTO @currentDatabaseName

	-- Loop through each database we want to check

			DECLARE @currentProcessingResult INT

			-- The following statements are really for the benefit of showing something useful in the job history window:
			PRINT ('Processing database ' + @currentDatabaseName)

			-- Not ideal but the only way I know of to dynamically build adhoc DBCC commands
			-- NOTE: We surpress info messages when things are going well. We only care when things go wrong
			SET @dynamicSQL = 'DBCC CHECKDB ([' + @currentDatabaseName + ']) WITH NO_INFOMSGS'

			PRINT ('  Executing DBCC CHECKDB on target database ')

			-- NOTE: Running a straight DBCC CHECKDB will not raise an exception even if corruption is found. The only way to determine if failure occured
			-- is to use the return value and determine if we need to raise our own exception. See here for further details:
			-- http://support.microsoft.com/kb/953503
			EXEC @currentProcessingResult = sp_executesql @dynamicSQL

			-- If the last run detected an error, raise an error and stop
			IF @currentProcessingResult <> 0
					-- Weird limitation in RAISERROR - it can't take a string concatentation so we need to do it like this
					DECLARE @errorMessage VARCHAR (1000) = 'DBCC CHECK DB failed for database ' + @currentDatabaseName
					RAISERROR (@errorMessage, 11, 1)

			PRINT ('  DBCC CHECKDB completed successfully with no errors on database ' + @currentDatabaseName)

			FETCH NEXT FROM curDatabases
			INTO @currentDatabaseName


	CLOSE curDatabases
	DEALLOCATE curDatabases


	DECLARE @emailBody NVARCHAR(4000)
	DECLARE @catchErrorMessage VARCHAR (1000) = ERROR_MESSAGE()

	PRINT('Critical Error Detected: ' + @catchErrorMessage)

	-- Generate a formatted email using HTML. Trying to format in plain text is a bit of a pain. HTML allows you to create a template in Visual Studio
	-- and pretty much copy it in here verbatim
	SET @emailBody = N'
	        <style type="text/css">
                #divContent {
                    font-family: Tahoma, Arial, Verdana, sans-serif;
                    font-size: 0.8em;

        <div id="divContent">

		    <p><strong>Warning: A potentially serious issue has been detected during overnight corruption checks on SQL Server. Please do not ignore this message as our ability to recover data may be effected</strong>

            <p>The error message provided by the agent job was <em>"' + @catchErrorMessage + '"</em> </p>' +

            '<p><strong>Why has this message been sent?</strong>
                <br />
                This message has been sent because corruption has been detected in the following database: <br />

            <ul><li><strong>' + @currentDatabaseName + '</strong> </li></ul>

            <p>Please note that the overnight DBCC CHECKDB process will have stopped once the first database corruption was detected. Other databases may also be corrupt and should be checked manually</p>
                <strong>Why is it important that we respond quickly?</strong> <br />
                If one of the live databases has been corrupted, the best chance of recovering data is via a previous backup. Because backups are only retained for a short period of time it is essential that we isolate the backup most likely to help us recover as quickly as possible <br />

                <strong>What should happen next?</strong>

                <li>Run DBCC CHECKDB on the problem database again to retrieve the full details of the corruption. This may take a significant amount of time to complete so consider saving the results.</li>
                <li>Isolate all backups from the affected database as they may be needed to effect a repair</li>
                <li>Contact Microsoft Technical Support to get their assistance with the repair operation</li>
                <li><strong>DO NOT</strong> run DBCC CHECKDB with "repair_allow_data_loss" before speaking to Microsoft first! This may cause data loss unnecessarily and will quite possibly make the situation worse</li>
                <li>If possible find a location to restore at least one backup to</li>
                <li>Investigate the underlying cause of the corruption. It is important to understand why the corruption occured before it occurs again</li>


	EXEC msdb.dbo.sp_send_dbmail
		@profile_name = @mailProfileName,
		@recipients = @toEmailAddresses,
		@body_format = 'HTML',
		@subject = 'WARNING: A potentially serious issue has been detected during overnight database consistency checks',
		@body = @emailBody

	-- This is minging but depending on how the code above fails, we could still have an open cursor at this point.
	-- Close it and ignore the possibility that its already closed
        --clean it up
		CLOSE curDatabases
		DEALLOCATE curDatabases
        -- Would happen if the cursor is already closed. Wishing TSQL had a TRY CATCH FINALLY construct here

	-- Raise an additional exception to cause any calling job to also fail. This will increase the chance of detection if there is a problem
	-- with database mail
	RAISERROR (@catchErrorMessage, 11, 1)



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s