Archive | March, 2015

NOCHECK – the uses and impact

31 Mar

Having been bitten by the delights of a Foreign Key(FK) with NOCHECK I thought it time to do a little digging.

I’ve always been aware of the perils of this option but have encountered a table with this feature and I was caught out by it.
I’m also aware of the impact it can have on the Optimiser but I’ve never looked into it, so I’m taking this opportunity to look at the impact there too.

What is NOCHECK?

Basically, it allows you to add a constraint onto a table without validating the existing contents of the appropriate column. So, for example, I can create a FK constraint for a column against a reference table even though not all of the values currently held will match against that reference table.

I’m going to create a couple of tables:

CREATE TABLE dbo.Locations( 
	ID		VARCHAR(10), 
	[Description]	VARCHAR(20), 
CONSTRAINT PK_Locations PRIMARY KEY CLUSTERED( 
	[ID] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
) ON [PRIMARY] 

GO 


CREATE TABLE dbo.StockItems( 
	ID		INT IDENTITY, 
	[Description]	VARCHAR(50) NOT NULL, 
	Location	VARCHAR(10) NOT NULL 
CONSTRAINT PK_StockItems PRIMARY KEY CLUSTERED( 
	[ID] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
) ON [PRIMARY] 

GO 

And then load them with some simple data:

INSERT INTO [dbo].Locations (ID, [Description]) 
VALUES('W1', 'Warehouse 1'), 
      ('W2', 'Warehouse 2'), 
	  ('W3', 'Warehouse 3'), 
	  ('W4', 'Warehouse 4'), 
	  ('OS1', 'Off-Site Storage 1'); 
GO 

INSERT INTO dbo.StockItems( [Description], Location ) 
VALUES  ( 'Blue Widgets', 'W1'), 
		( 'Green Widgets', 'W3'), 
		( 'Red Widgets', 'OS1'), 
		( 'Round Widgets', 'X2'), 
		( 'Square Widgets', 'W2'); 
GO 

If I wanted to create a FK between dbo.StockItems Location and dbo.Locations ID you would expect a problem, because the value ‘X2’ within the StockItems table doesn’t exist in the Locations table:

NoCheck_01a

And you’d be right, as long as I didn’t specify ‘NOCHECK’.
So, with ‘NOCHECK’ instead:

NoCheck_02

And the Foreign Key has been created.
Although the table ‘StockItems’ still contains an ‘invalid’ entry it is no longer possible to add other invalid values:

NoCheck_03

Because the constraint now checks that column against the reference table.

Reloading Data

Occasionally it may be necessary to remove the data from a table and re-insert it (altering a table but keeping fragmentation to a minimum is one example) and this is where the NOCHECK can act against you:
Let’s move all of the data out of ‘StockItems’ into a temporary table and truncate the DB table:

SELECT *
INTO ##temp
FROM dbo.StockItems;

SELECT * FROM ##temp;

TRUNCATE TABLE dbo.StockItems;
GO

Now try putting the data back with:

SET IDENTITY_INSERT dbo.StockItems ON;

INSERT INTO [dbo].[StockItems](ID, [Description],[Location])
	SELECT ID, [Description], Location
	FROM ##temp
	ORDER BY ID;

SET IDENTITY_INSERT dbo.StockItems OFF;
GO

NoCheck_04

Data that was in that table now cannot be re-inserted, unless I remove the FK constraint, run the insert and then add the FK back.

Validating the Data

From version 2008 onwards there is a DBCC command which will provide information about the integrity of a constraint:

NoCheck_11

This shows what values are currently breaking the constraint and will need dealing with before you can change the constraint to ‘CHECK’.

NOCHECK and the Optimiser

Another, ‘quieter’ issue with using NOCHECK is that the Optimiser ignores Foreign Keys that have that option specified.

Looking at sys.foreign_keys, there is a property ‘is_not_trusted’, and for a FK constraint with NOCHECK this is set to 1. It implies that the data in that column cannot be trusted to be valid, when comparing it against the column that it references.

NoCheck_08

From https://msdn.microsoft.com/en-us/library/ms190273.aspx
“The query optimizer does not consider constraints that are defined WITH NOCHECK. Such constraints are ignored until they are re-enabled by using ALTER TABLE table WITH CHECK CHECK CONSTRAINT ALL.”

Removing the invalid data won’t fix this, but removing the invalid data and changing the FK constraint will.

DELETE dbo.stockitems
WHERE [location] = 'X2';
GO

ALTER TABLE [dbo].[StockItems]  WITH CHECK CHECK CONSTRAINT ALL
--Or
ALTER TABLE [dbo].[StockItems] WITH CHECK CHECK CONSTRAINT [FK_Location]

NoCheck_05

Comparing the Query Plan and Statistics with the ‘before and after’ for ‘is_not_trusted’ :
When ‘is_not_trusted’ = 1:

–(4 row(s) affected)
–Table ‘Locations’. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
–Table ‘StockItems’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

NoCheck_09

When ‘is_not_trusted’ = 0:

–(4 row(s) affected)
–Table ‘StockItems’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

NoCheck_10

So, where the Optimiser can trust a FK constraint it can make improvements to the efficiency of the plan.

NOCHECK does have its place, but the impact to performance and behaviour needs to be understood before using it.

Advertisements

Automatic Data Purging With Partition Switching

16 Mar

In a previous article I showed a basic example of Partition Switching, using animal names (http://wp.me/p3Vxvi-2u). While not a realistic example for everyday use it did get the principles across. However, certainly in my experience, partitioning is normally dependent upon a number – usually a date.

In this article I’ll create a partition process that is used to keep data for three days (based upon a date within the row) and then remove the data automatically, via a job.

Firstly I require a database for this demonstration.

CREATE DATABASE PartitionTest;
GO

Now I need a Partition Function. Because there can only ever be 3 possible values with this calculation, I can be a little lazy and just specify two of the values. If I just write the function for values of 1 and 2, then 3 will always go into the third partition, normally used for ‘invalid’ value ranges. I can’t have any invalid values (provided it has been coded and tested properly), so use the third partition for the ‘normal’ work instead of out-of-range values.

CREATE PARTITION FUNCTION [pf_Retention](tinyint) AS RANGE LEFT FOR VALUES (1, 2)
GO

The associated Partition Scheme will have three partitions. Values of 1 will go to the first in the list, values of 2 will go to the second in the list and all others will go to the third. There are only ever going to be three values for this example, so I won’t worry about a fourth for invalid values.

CREATE PARTITION SCHEME [ps_Retention] AS PARTITION [pf_Retention] TO ([PRIMARY], [PRIMARY], [PRIMARY])
GO

Having created these I can now create the table that will use this partition Scheme:

CREATE TABLE UserAccess(
    UserAccessID		INT IDENTITY(1,1),
    UserID				INT,
    LoginDateTime       DATETIME,
    LogoutDateTime		DATETIME,
    PartitionNumber		AS (CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),LogoutDateTime)%(3)+(1)),(0))) PERSISTED NOT NULL,
CONSTRAINT [PK_Petdetails] PRIMARY KEY CLUSTERED
(
    UserAccessID		ASC,
    PartitionNumber     ASC
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_Retention(PartitionNumber)
) ON ps_Retention(PartitionNumber)  ; 

GO 

CREATE TABLE UserAccess_PartitionSwitch(
    UserAccessID		INT IDENTITY(1,1),
    UserID				INT,
    LoginDateTime       DATETIME,
    LogoutDateTime		DATETIME,
    PartitionNumber		AS (CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),LogoutDateTime)%(3)+(1)),(0))) PERSISTED NOT NULL,
CONSTRAINT [PK_Petdetails_PartitionSwitch] PRIMARY KEY CLUSTERED
(
    UserAccessID		ASC,
    PartitionNumber     ASC
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_Retention(PartitionNumber)
) ON ps_Retention(PartitionNumber)  ; 

GO

Notice that the PartitionNumber column is a calculated, persisted column. This value is based upon the LogoutDatetime, so will be kept for three days from that Datetime value.
At first glance the calculation for the Partition Number can look confusing:

(CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),LogoutDateTime)%(3)+(1)),(0)))

So let’s start with the basics and build up from there. Substitute the ‘LogoutDateTime’ with an ISO standard date, to help show what is happening.

Using a Base Date (in this case the 1st of January 2000) calculate the number of days between then and 20150310. Of course you need to set a Base date that you will suit your purposes. In this case should I ever have a date in 1999 or earlier then it will not work as expected.

The 112 enforces ISO standard date formatting:

SELECT datediff(day,CONVERT([date],'20000101',(112)),'20150310') --112 is ISO standard, to avoid any confusion

As I’m using a 3 day retention scheme, use a modulo of 3, which will result in 0, 1 or 2. The abs function is just there to ensure no negative values, an unlikely event for this example:

SELECT abs(datediff(day,CONVERT([date],'20000101',(112)),'20150310')%(3))

Because a Partition number of 0 is invalid add 1, so making the only possible values 1, 2 or 3:

SELECT abs(datediff(day,CONVERT([date],'20000101',(112)),'20150310')%(3)+(1))

As it is such a small number, convert it to a TinyInt:

SELECT (CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),'20150310')%(3)+(1)),(0)))

To make sure the partition function is working, add a few rows of test data:

USE [PartitionTest]
GO

INSERT INTO [dbo].[UserAccess]
           ([UserID]
           ,[LoginDateTime]
           ,[LogoutDateTime])
VALUES     (1
           ,'20150308 01:00:00'
           ,'20150308 02:00:00'),
		    (2
           ,'20150309 01:00:00'
           ,'20150309 02:00:00'),
		    (3
           ,'20150310 01:00:00'
           ,'20150310 02:00:00'),
		    (4
           ,'20150311 01:00:00'
           ,'20150311 02:00:00'),
		    (5
           ,'20150312 01:00:00'
           ,'20150312 02:00:00'),
		    (6
           ,'20150313 01:00:00'
           ,'20150313 02:00:00'),
		    (7
           ,'20150314 01:00:00'
           ,'20150314 02:00:00')

And then check the result – look at the table contents and at how the data is spread across the actual partitions:
Partition_01
From these result it can be shown that the PartitionNumber column is never outside of the range 1, 2, 3 and the partitions themselves match the data spread indicated by the value of PartitionID.
Now to test that the Partition Switching will actually work – ensuring that the two tables have been set up correctly:
Partition_02
The three rows in Partition 2 have been moved to the Partition_Switch table.
So, I know that the structure of the tables is correct and that we can switch partitions. The next step is to create a routine that will perform this action automatically.
Our table is designed to keep data for three days, and to do this successfully we need to make sure that the next partition we are going to need (tomorrow’s partition) is empty before that day starts. So just before midnight we need to remove the data that is about to become four days old.

A Stored Procedure is now required – ‘dbo.Purge_Table’, which ascertains the partition number that is now three days old and purges it (after some checks to make sure it should), because that will be the partition required tomorrow.

CREATE PROCEDURE [dbo].[Purge_Table] AS
    SET NOCOUNT ON 

    DECLARE @DaysToKeep				TINYINT,
        @PurgePartition			TINYINT,
        @TotalPartitions		TINYINT,
        @Today					DATETIME,
        @MaxDateInPartition		DATETIME,
        @DateToPurge			DATETIME,
        @SQLCode				VARCHAR(500) 

    SET @TotalPartitions = 3;
	SET @DaysToKeep = 2; 

    SET @Today = CURRENT_TIMESTAMP
    SET @DateToPurge = DATEADD (dd, (@DaysToKeep * -1), @Today)  --Includes today. We keep today, we keep yesterday but the day before yesterday needs to go 

    SET @PurgePartition = 0     

	SET @PurgePartition = (CONVERT(tinyint,ABS(datediff(day,CONVERT([date],'20000101',(112)),@DateToPurge)%(@TotalPartitions)+(1)),(0))) --Same formula as used within the table, but
																																		 --using the @DateToPurge, to find what partition
																																		 --that date will be in.

    SET @PurgePartition = $Partition.pf_Retention (@PurgePartition)   

	-----------------------------------------------------------------------------------------------
    ---- PURGE TABLE
    -----------------------------------------------------------------------------------------------

	--Get the latest LogOutDateTime from the partition that is going to be purged.
    SELECT @MaxDateInPartition = MAX (LogOutDateTime)
     FROM dbo.UserAccess
     WHERE $Partition.pf_Retention (PartitionNumber) = @PurgePartition 

    IF EXISTS (SELECT 1 FROM dbo.UserAccess
                WHERE PartitionNumber = @PurgePartition) --Check there is data for this partition
		BEGIN
			IF DATEDIFF (dd, @MaxDateInPartition, @Today) >= @DaysToKeep OR @MaxDateInPartition IS NULL --Check the date within this partition is old enough to be removed
			BEGIN
				SELECT @SQLCode = 'ALTER TABLE dbo.UserAccess SWITCH PARTITION ' +
					CAST (@PurgePartition AS varchar) +
					' TO dbo.UserAccess_PartitionSwitch PARTITION ' +
					CAST (@PurgePartition AS varchar) 

				EXEC (@SQLCode) 

				TRUNCATE TABLE dbo.UserAccess_PartitionSwitch	--Having switched the data over, remove it 

				RAISERROR ('The dbo.UserAccess Table has been successfully Purged.', 10, 1)
			END
			ELSE
				RAISERROR ('OPERATION CANCELLED. Incorrect Data would be purged from dbo.UserAccess',16, 1)
		END
	ELSE
		RAISERROR ('dbo.UserAccess - partition has no data', 10, 1)
GO

Having cleared the table (and PartitionSwitch table) I input three rows, with today, yesterday and the day before as the logOutDateTime:
Partition_03
Now execute the Stored Procedure and the row in partition 2 should be switched to the PartitionSwitch table and removed (based on the execution date of 20150316).
Partition_04
Now all that is required is a scheduled job that executes this Stored Procedure at the end of each day, ensuring there is always an empty partition available for the next day and any data older than is required is no longer stored.