Getting More Data Than You Bargained For With NOLOCK

29 May

Generally, when somebody posts a query using the NOLOCK hint, there are a variety of warnings about dirty reads, phantom reads and double-reading (due to page splits).
The dirty read example is common and tends to be the main example used to demonstrate what to watch out for with NOLOCK, but I’ve never seen an example of double-reading. So, I thought I’d try and see how much of an issue such a thing can be. It certainly sounds plausible, but is it a ‘one in a million’ chance, or is it a Terry Pratchett ‘one in a million’ chance – where it is pretty much guaranteed to happen?

So, create a database for this testing and then create a table within:

CREATE TABLE [dbo].[Test1](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[Test_Description] [nvarchar](max) NULL,
PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]

GO

A very simple table – an Identity column and a text column. The important part is that I’ve specified a Fill Factor of 100 for the Clustered Index, meaning there is no space for expansion of the data in the current location. Therefore, if I increase the contents of the ‘Test_Description’ column it should force a page split, because the data won’t fit back where it was originally stored.

I then generate 500,000 rows with a simple integer id and the entry ‘TEST’ in the ‘Test_Description’ column.
In one window of SSMS I now set up a query to update this column on every row. For text-based test data I tend to use a text file of ‘War and Peace’, from Project Gutenberg. For this test I just use the first paragraph.
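Neither the data-generation script nor the update is shown above, so here is a minimal sketch of my own (the exact text used for the update doesn’t matter, as long as it makes each row considerably wider than it was):

--Populate 500,000 rows of 'TEST', using a cross join of system views as a cheap numbers source
INSERT INTO dbo.Test1 (Test_Description)
SELECT TOP (500000) N'TEST'
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
GO

--Window 1: widen every row so it no longer fits back in place, forcing page splits as the update progresses
UPDATE dbo.Test1
SET Test_Description = REPLICATE(N'A long block of text to widen the row. ', 50);
GO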

In another window I set up a query to return every row from the same table, specifying ‘READ UNCOMMITTED’ isolation level.
Once prepared I run the two queries together:
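The reading query, in the second window, is simply the following (the NOLOCK table hint behaves the same way as the isolation level):

--Window 2: read everything while the update is running
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT ID, Test_Description
FROM dbo.Test1;
GO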

Nolock_02

Nolock_01

The update has returned a count of 500,000, which matches the number of rows I placed into that table.
However, the select has returned a count of 586,078. Considerably higher than the number of rows within that table. In addition, it can be seen within the results that I’m retrieving a mixture of updated and non-updated rows.

Exporting the results to a csv file and then into Excel, I can sort the results and see a large number of duplicates (based on the ID column):

Nolock_03

The Select found the rows before they were updated, and then again when the page split had moved the data, retrieving the row twice.
If I leave the select query to run once the update has completed, I get 500,000 rows – all updated.

This was a relatively easy test to set up, and it has consistently duplicated a large number of rows every time it has run. Of course, specifying a fill factor of 100 pretty much guarantees a page split somewhere, but in the day-to-day operations of most databases pages will become full and splits will happen eventually. So the scale of the issue might not be as large as in the example I’ve created, but the underlying issue may still be present.

Obviously, if this was something more important than a short piece of text (financial data, for example) then the results derived from this query would be wildly inaccurate.

Compress All DB Files on a Server Instance

30 Apr

Or rather, on a Test or Development server instance. If you need to do this to a Live server, then you have larger issues.

Most sites have servers for Developers and DBAs to work their dark magic, as well as various test servers for the next level of testing. These servers do, of course, have databases that nobody admits to creating and other more recognisable databases that have had huge amounts of test data inserted, updated, removed and then generally left, like a bloated whale carcass – taking far more space than they now need.

There are several options in these cases:

1. Remove the databases that are no longer required.
This isn’t a quick fix when a server has run out of space and all development has come to an abrupt halt. The database might have a use that only a few people are aware of, and they’re off on holiday, in meetings or just ignoring my stroppy emails. The best solution in such cases is to take the database offline and wait a few days, so it’s not an immediate fix for a space shortage.

2. Move database files.
Also not necessarily an option. Some setups are in specific locations for a reason and there might be no viable storage anywhere else for that server.

3. Shrink the databases.
The most common option for these servers, but some can have a large number of files, so using SSMS for this can be rather tedious.

4. Ask for more Storage.

Hydra2
Yeah, right.

These servers aren’t treated well, but that is their purpose. It saves the QA and Live servers from a lot of grief later in the development cycle.

So, I had a look around for something that I could run against a server instance and would compress all of the databases for me. There are a few bits and pieces around, but nothing that quite did what I was after.
What I wanted was something that would give me some choice over what I compressed, because sometimes it is simply a waste of time to compress a very large file for a little gain.

So, I have written something, stored it as a plain text file and it can be downloaded from here.

It doesn’t backup databases beforehand, nor does it check the health of the database (before or after). I was after something that performed the basics and saved me a lot of time.
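The downloadable script is the real thing; purely to illustrate the basic mechanics, a stripped-down sketch (no filtering, no choices, no safety checks) might look something like this:

--Bare-bones sketch only: build and run a DBCC SHRINKFILE for every file of every online user database.
--TRUNCATEONLY just releases free space at the end of each file; the full script offers more options.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
    + N'USE ' + QUOTENAME(d.name) + N'; '
    + N'DBCC SHRINKFILE (' + QUOTENAME(f.name) + N', TRUNCATEONLY) WITH NO_INFOMSGS; '
FROM sys.master_files f
JOIN sys.databases d ON d.database_id = f.database_id
WHERE d.database_id > 4            --skip the system databases
  AND d.state_desc = N'ONLINE';

EXEC sys.sp_executesql @sql;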

And if you think this routine has solved all of your problems on your mission-critical systems, you really need to read this from Paul Randal – Why You Should Not Shrink Your Data Files. The inside of your database will look like the car under those tank-tracks.

NOCHECK – the uses and impact

31 Mar

Having been bitten by the delights of a Foreign Key (FK) created with NOCHECK, I thought it time to do a little digging.

I’ve always been aware of the perils of this option, but I recently encountered a table using it and was caught out.
I’m also aware of the impact it can have on the Optimiser, but I’ve never looked into it, so I’m taking this opportunity to look at that too.

What is NOCHECK?

Basically, it allows you to add a constraint onto a table without validating the existing contents of the appropriate column. So, for example, I can create a FK constraint for a column against a reference table even though not all of the values currently held will match against that reference table.

I’m going to create a couple of tables:

CREATE TABLE dbo.Locations( 
	ID		VARCHAR(10), 
	[Description]	VARCHAR(20), 
CONSTRAINT PK_Locations PRIMARY KEY CLUSTERED( 
	[ID] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
) ON [PRIMARY] 

GO 


CREATE TABLE dbo.StockItems( 
	ID		INT IDENTITY, 
	[Description]	VARCHAR(50) NOT NULL, 
	Location	VARCHAR(10) NOT NULL, 
CONSTRAINT PK_StockItems PRIMARY KEY CLUSTERED( 
	[ID] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
) ON [PRIMARY] 

GO 

And then load them with some simple data:

INSERT INTO [dbo].Locations (ID, [Description]) 
VALUES('W1', 'Warehouse 1'), 
      ('W2', 'Warehouse 2'), 
	  ('W3', 'Warehouse 3'), 
	  ('W4', 'Warehouse 4'), 
	  ('OS1', 'Off-Site Storage 1'); 
GO 

INSERT INTO dbo.StockItems( [Description], Location ) 
VALUES  ( 'Blue Widgets', 'W1'), 
		( 'Green Widgets', 'W3'), 
		( 'Red Widgets', 'OS1'), 
		( 'Round Widgets', 'X2'), 
		( 'Square Widgets', 'W2'); 
GO 

If I want to create a FK between dbo.StockItems.Location and dbo.Locations.ID you would expect a problem, because the value ‘X2’ within the StockItems table doesn’t exist in the Locations table:

NoCheck_01a

And you’d be right, as long as I didn’t specify ‘NOCHECK’.
So, with ‘NOCHECK’ instead:

NoCheck_02

And the Foreign Key has been created.
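For reference, the commands behind those two screenshots are along these lines (I’ve assumed the constraint is named FK_Location, as that is the name used further down):

--With CHECK (the default) the existing 'X2' row causes this to fail:
ALTER TABLE dbo.StockItems WITH CHECK
	ADD CONSTRAINT FK_Location FOREIGN KEY (Location) REFERENCES dbo.Locations (ID);
GO

--With NOCHECK the existing data is not validated, so the constraint is created:
ALTER TABLE dbo.StockItems WITH NOCHECK
	ADD CONSTRAINT FK_Location FOREIGN KEY (Location) REFERENCES dbo.Locations (ID);
GO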
Although the table ‘StockItems’ still contains an ‘invalid’ entry it is no longer possible to add other invalid values:

NoCheck_03

Because the constraint now checks that column against the reference table.

Reloading Data

Occasionally it may be necessary to remove the data from a table and re-insert it (altering a table while keeping fragmentation to a minimum is one example), and this is where NOCHECK can act against you.
Let’s move all of the data out of ‘StockItems’ into a temporary table and truncate the DB table:

SELECT *
INTO ##temp
FROM dbo.StockItems;

SELECT * FROM ##temp;

TRUNCATE TABLE dbo.StockItems;
GO

Now try putting the data back with:

SET IDENTITY_INSERT dbo.StockItems ON;

INSERT INTO [dbo].[StockItems](ID, [Description],[Location])
	SELECT ID, [Description], Location
	FROM ##temp
	ORDER BY ID;

SET IDENTITY_INSERT dbo.StockItems OFF;
GO

NoCheck_04

Data that was in that table now cannot be re-inserted, unless I remove the FK constraint, run the insert and then add the FK back.

Validating the Data

From version 2008 onwards there is a DBCC command which will provide information about the integrity of a constraint:

NoCheck_11
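That command is presumably DBCC CHECKCONSTRAINTS, which can be pointed at the whole table or at a single constraint:

--List any rows that currently violate constraints on the table
DBCC CHECKCONSTRAINTS ('dbo.StockItems');

--Or check just the one constraint
DBCC CHECKCONSTRAINTS ('FK_Location');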

This shows what values are currently breaking the constraint and will need dealing with before you can change the constraint to ‘CHECK’.

NOCHECK and the Optimiser

Another, ‘quieter’ issue with using NOCHECK is that the Optimiser ignores Foreign Keys that have that option specified.

Looking at sys.foreign_keys, there is a property ‘is_not_trusted’, and for a FK constraint with NOCHECK this is set to 1. It implies that the data in that column cannot be trusted to be valid, when comparing it against the column that it references.
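A quick way to check, for example:

--1 = the Optimiser will not trust this constraint when simplifying plans
SELECT name, is_not_trusted, is_disabled
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID(N'dbo.StockItems');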

NoCheck_08

From https://msdn.microsoft.com/en-us/library/ms190273.aspx
“The query optimizer does not consider constraints that are defined WITH NOCHECK. Such constraints are ignored until they are re-enabled by using ALTER TABLE table WITH CHECK CHECK CONSTRAINT ALL.”

Removing the invalid data won’t fix this, but removing the invalid data and changing the FK constraint will.

DELETE dbo.stockitems
WHERE [location] = 'X2';
GO

ALTER TABLE [dbo].[StockItems]  WITH CHECK CHECK CONSTRAINT ALL
--Or
ALTER TABLE [dbo].[StockItems] WITH CHECK CHECK CONSTRAINT [FK_Location]

NoCheck_05

Comparing the Query Plan and Statistics with the ‘before and after’ for ‘is_not_trusted’:
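The sort of query that shows this is an inner join that selects nothing from the referenced table, against a NOT NULL FK column; the example below is my own, not necessarily the exact query behind the statistics:

--With is_not_trusted = 1 the Optimiser must still touch dbo.Locations to satisfy the join;
--once the constraint is trusted the join can be eliminated and only dbo.StockItems is read.
SELECT s.ID, s.[Description], s.Location
FROM dbo.StockItems s
INNER JOIN dbo.Locations l ON l.ID = s.Location;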
When ‘is_not_trusted’ = 1:

–(4 row(s) affected)
–Table ‘Locations’. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
–Table ‘StockItems’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

NoCheck_09

When ‘is_not_trusted’ = 0:

–(4 row(s) affected)
–Table ‘StockItems’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

NoCheck_10

So, where the Optimiser can trust a FK constraint it can make improvements to the efficiency of the plan.

NOCHECK does have its place, but the impact to performance and behaviour needs to be understood before using it.

Automatic Data Purging With Partition Switching

16 Mar

In a previous article I showed a basic example of Partition Switching, using animal names (http://wp.me/p3Vxvi-2u). While not a realistic example for everyday use it did get the principles across. However, certainly in my experience, partitioning is normally dependent upon a number – usually a date.

In this article I’ll create a partition process that is used to keep data for three days (based upon a date within the row) and then remove the data automatically, via a job.

Firstly I require a database for this demonstration.

CREATE DATABASE PartitionTest;
GO

Now I need a Partition Function. Because there can only ever be 3 possible values with this calculation, I can be a little lazy and just specify two of the values. If I just write the function for values of 1 and 2, then 3 will always go into the third partition, normally used for ‘invalid’ value ranges. I can’t have any invalid values (provided it has been coded and tested properly), so use the third partition for the ‘normal’ work instead of out-of-range values.

CREATE PARTITION FUNCTION [pf_Retention](tinyint) AS RANGE LEFT FOR VALUES (1, 2)
GO

The associated Partition Scheme will have three partitions. Values of 1 will go to the first in the list, values of 2 will go to the second in the list and all others will go to the third. There are only ever going to be three values for this example, so I won’t worry about a fourth for invalid values.

CREATE PARTITION SCHEME [ps_Retention] AS PARTITION [pf_Retention] TO ([PRIMARY], [PRIMARY], [PRIMARY])
GO

Having created these I can now create the table that will use this partition Scheme:

CREATE TABLE UserAccess(
    UserAccessID		INT IDENTITY(1,1),
    UserID				INT,
    LoginDateTime       DATETIME,
    LogoutDateTime		DATETIME,
    PartitionNumber		AS (CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),LogoutDateTime)%(3)+(1)),(0))) PERSISTED NOT NULL,
CONSTRAINT [PK_Petdetails] PRIMARY KEY CLUSTERED
(
    UserAccessID		ASC,
    PartitionNumber     ASC
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_Retention(PartitionNumber)
) ON ps_Retention(PartitionNumber)  ; 

GO 

CREATE TABLE UserAccess_PartitionSwitch(
    UserAccessID		INT IDENTITY(1,1),
    UserID				INT,
    LoginDateTime       DATETIME,
    LogoutDateTime		DATETIME,
    PartitionNumber		AS (CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),LogoutDateTime)%(3)+(1)),(0))) PERSISTED NOT NULL,
CONSTRAINT [PK_Petdetails_PartitionSwitch] PRIMARY KEY CLUSTERED
(
    UserAccessID		ASC,
    PartitionNumber     ASC
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_Retention(PartitionNumber)
) ON ps_Retention(PartitionNumber)  ; 

GO

Notice that the PartitionNumber column is a calculated, persisted column. Its value is based upon the LogoutDateTime, so the row will be kept for three days from that value.
At first glance the calculation for the Partition Number can look confusing:

(CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),LogoutDateTime)%(3)+(1)),(0)))

So let’s start with the basics and build up from there. Substitute the ‘LogoutDateTime’ with an ISO standard date, to help show what is happening.

Using a Base Date (in this case the 1st of January 2000) calculate the number of days between then and 20150310. Of course, you need to set a Base Date that will suit your purposes; in this case, should I ever have a date in 1999 or earlier, it will not work as expected.

The 112 enforces ISO standard date formatting:

SELECT datediff(day,CONVERT([date],'20000101',(112)),'20150310') --112 is ISO standard, to avoid any confusion

As I’m using a 3 day retention scheme, use a modulo of 3, which will result in 0, 1 or 2. The abs function is just there to ensure no negative values, an unlikely event for this example:

SELECT abs(datediff(day,CONVERT([date],'20000101',(112)),'20150310')%(3))

Because a Partition number of 0 is invalid add 1, so making the only possible values 1, 2 or 3:

SELECT abs(datediff(day,CONVERT([date],'20000101',(112)),'20150310')%(3)+(1))

As it is such a small number, convert it to a TinyInt:

SELECT (CONVERT([tinyint],abs(datediff(day,CONVERT([date],'20000101',(112)),'20150310')%(3)+(1)),(0)))

To make sure the partition function is working, add a few rows of test data:

USE [PartitionTest]
GO

INSERT INTO [dbo].[UserAccess]
           ([UserID]
           ,[LoginDateTime]
           ,[LogoutDateTime])
VALUES     (1
           ,'20150308 01:00:00'
           ,'20150308 02:00:00'),
		    (2
           ,'20150309 01:00:00'
           ,'20150309 02:00:00'),
		    (3
           ,'20150310 01:00:00'
           ,'20150310 02:00:00'),
		    (4
           ,'20150311 01:00:00'
           ,'20150311 02:00:00'),
		    (5
           ,'20150312 01:00:00'
           ,'20150312 02:00:00'),
		    (6
           ,'20150313 01:00:00'
           ,'20150313 02:00:00'),
		    (7
           ,'20150314 01:00:00'
           ,'20150314 02:00:00')

And then check the result – look at the table contents and at how the data is spread across the actual partitions:
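One way to check the spread across the physical partitions, for example, is to group on the partition function:

--How many rows currently sit in each partition of dbo.UserAccess
SELECT $PARTITION.pf_Retention(PartitionNumber) AS PartitionNo,
       COUNT(*) AS RowsInPartition
FROM dbo.UserAccess
GROUP BY $PARTITION.pf_Retention(PartitionNumber)
ORDER BY PartitionNo;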
Partition_01
From these results it can be seen that the PartitionNumber column never falls outside the range 1, 2, 3 and that the partitions themselves match the data spread indicated by the PartitionNumber values.
Now to test that the Partition Switching will actually work – ensuring that the two tables have been set up correctly:
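The manual test is a single SWITCH statement, the same command that the purge procedure further down builds dynamically; for partition 2 it is:

--Move the contents of partition 2 into the matching (empty) partition of the switch table
ALTER TABLE dbo.UserAccess
	SWITCH PARTITION 2 TO dbo.UserAccess_PartitionSwitch PARTITION 2;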
Partition_02
The three rows in Partition 2 have been moved to the Partition_Switch table.
So, I know that the structure of the tables is correct and that we can switch partitions. The next step is to create a routine that will perform this action automatically.
Our table is designed to keep data for three days, and to do this successfully we need to make sure that the next partition we are going to need (tomorrow’s partition) is empty before that day starts. So just before midnight we need to remove the data that is about to become four days old.

A Stored Procedure is now required – ‘dbo.Purge_Table’, which ascertains the partition number that is now three days old and purges it (after some checks to make sure it should), because that will be the partition required tomorrow.

CREATE PROCEDURE [dbo].[Purge_Table] AS
    SET NOCOUNT ON 

    DECLARE @DaysToKeep				TINYINT,
        @PurgePartition			TINYINT,
        @TotalPartitions		TINYINT,
        @Today					DATETIME,
        @MaxDateInPartition		DATETIME,
        @DateToPurge			DATETIME,
        @SQLCode				VARCHAR(500) 

    SET @TotalPartitions = 3;
	SET @DaysToKeep = 2; 

    SET @Today = CURRENT_TIMESTAMP
    SET @DateToPurge = DATEADD (dd, (@DaysToKeep * -1), @Today)  --Includes today. We keep today, we keep yesterday but the day before yesterday needs to go 

    SET @PurgePartition = 0     

	SET @PurgePartition = (CONVERT(tinyint,ABS(datediff(day,CONVERT([date],'20000101',(112)),@DateToPurge)%(@TotalPartitions)+(1)),(0))) --Same formula as used within the table, but
																																		 --using the @DateToPurge, to find what partition
																																		 --that date will be in.

    SET @PurgePartition = $Partition.pf_Retention (@PurgePartition)   

	-----------------------------------------------------------------------------------------------
    ---- PURGE TABLE
    -----------------------------------------------------------------------------------------------

	--Get the latest LogOutDateTime from the partition that is going to be purged.
    SELECT @MaxDateInPartition = MAX (LogOutDateTime)
     FROM dbo.UserAccess
     WHERE $Partition.pf_Retention (PartitionNumber) = @PurgePartition 

    IF EXISTS (SELECT 1 FROM dbo.UserAccess
                WHERE PartitionNumber = @PurgePartition) --Check there is data for this partition
		BEGIN
			IF DATEDIFF (dd, @MaxDateInPartition, @Today) >= @DaysToKeep OR @MaxDateInPartition IS NULL --Check the date within this partition is old enough to be removed
			BEGIN
				SELECT @SQLCode = 'ALTER TABLE dbo.UserAccess SWITCH PARTITION ' +
					CAST (@PurgePartition AS varchar) +
					' TO dbo.UserAccess_PartitionSwitch PARTITION ' +
					CAST (@PurgePartition AS varchar) 

				EXEC (@SQLCode) 

				TRUNCATE TABLE dbo.UserAccess_PartitionSwitch	--Having switched the data over, remove it 

				RAISERROR ('The dbo.UserAccess Table has been successfully Purged.', 10, 1)
			END
			ELSE
				RAISERROR ('OPERATION CANCELLED. Incorrect Data would be purged from dbo.UserAccess',16, 1)
		END
	ELSE
		RAISERROR ('dbo.UserAccess - partition has no data', 10, 1)
GO

Having cleared the table (and the PartitionSwitch table) I insert three rows, with today, yesterday and the day before as the LogoutDateTime:
Partition_03
Now execute the Stored Procedure and the row in partition 2 should be switched to the PartitionSwitch table and removed (based on the execution date of 20150316).
Partition_04
Now all that is required is a scheduled job that executes this Stored Procedure at the end of each day, ensuring there is always an empty partition available for the next day and that data older than required is no longer stored.
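A sketch of that job, assuming default SQL Agent settings and a 23:55 daily run (the job and schedule names are purely illustrative):

USE msdb;
GO

EXEC dbo.sp_add_job @job_name = N'Purge UserAccess';

EXEC dbo.sp_add_jobstep @job_name = N'Purge UserAccess',
	@step_name = N'Run dbo.Purge_Table',
	@subsystem = N'TSQL',
	@database_name = N'PartitionTest',
	@command = N'EXEC dbo.Purge_Table;';

EXEC dbo.sp_add_schedule @schedule_name = N'Daily 2355',
	@freq_type = 4,					--daily
	@freq_interval = 1,
	@active_start_time = 235500;	--23:55:00

EXEC dbo.sp_attach_schedule @job_name = N'Purge UserAccess',
	@schedule_name = N'Daily 2355';

EXEC dbo.sp_add_jobserver @job_name = N'Purge UserAccess';
GO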

Mentoring – risk and reward

8 Feb

So, Paul Randal is offering to mentor six people for a couple of months, in response to the feedback from the Tribal Awards last year.

Now that’s a double-edged sword, isn’t it?

First of all, the opportunity to seek guidance and assistance from one of the most knowledgeable people in the SQL Server community, for free, is not to be sniffed at.

But just imagine the fun to be had in job interviews forever afterwards:

“So, you trained with Paul Randal?”, I know it’s mentoring, but people will see what they want to, “Well, we have a 200-server system with 20 TB databases in 30 countries. Clustering is a bit flaky and the Merge Replication between our 7 Russian sites and Australia keeps failing – how would you fix that?”.  People can set higher expectations when they see certain names in connection with you.

And of course, there’s always the thought of failing to meet Paul’s expectations too. After all, this is his own time he’s sacrificing – squeezing it in between his day job and the inordinate amount of reading he appears to do.  Years from now whenever he’s asked – “Oh yeah, Steve – nice enough guy but I wouldn’t want him anywhere near my systems.” – the career version of the Black Spot. The rest of my career on NoSQL databases.

And what do I actually need anyway? With the all-seeing eye that is Google and a company-paid subscription to SafariBooksOnline, it certainly isn’t reading material. And although it pains me to say so, I do work with several people who really know their SQL Server, so the day-to-day stuff should be catered for. And no, I’m not giving names – they’d be insufferable.

And that’s the problem – day-to-day stuff.  Training beyond what I need for my current job. Methodology. The ‘what’ and the ‘how’.

I have so many draft articles in my blog – slowly composting because I’m not sure how to approach them, or even whether they’re of enough interest (even to me, really) to actually complete.

I paid my own way to SQL Pass a couple of years ago and I’ve paid for this year too, as well as attending a SQLBits conference a few years ago. This is currently how I find out what others do, outside of my little bubble of experience.

I’ve changed direction several times within my long career in IT and it’s usually because somebody who is very experienced in their field and can teach it has shown me into their little world. Not necessarily taught me the syntax and the tool usage, but shown me how to think in a way that takes advantage of whatever language/tool they use. A couple of months of being shown that can make changes that last years.

So I’m willing to put my name forward for this, in my own style. Who knows, we may both learn something. Even if I’m not selected, it has made me think about what I really want out of this new direction my career has taken. And if I get nothing more from this, I can at least thank the man who introduced me to the Wool trilogy via his twitter messages.

Generating Replication Scripts – Things To Watch Out For

5 Feb

Scripting a Publication is useful for both Disaster Recovery and copying a Publication to a test environment. If you use the Wizard each time then you have to make sure that you’re consistent with the options and choices made in Live and once again on the Test (or DR) server.  By creating the script you should get something that reproduces the Publication and Subscription in full.

However, generating the script doesn’t always give you an exact copy of what you are scripting from. Some defaults might be included that weren’t used originally and some settings may be specified that were simply not used with the original creation.

A couple of examples I can demonstrate using the Publication scripted in earlier articles (http://wp.me/p3Vxvi-3l); a more entertaining example involving partition switching will have to be described instead.

Example 1 – There wasn’t a Publication Snapshot in the Original

But if you ask SSMS to generate a Create Script for the Publication it will include one:

To generate the scripts from an existing Publication you need to right-click on the Publication name within Replication/Local Publications branch (on the Publisher):

Art01

Specify where you want the script to go and it will be generated.

In this option I’ve chosen to have the script sent straight to a New Query Window.

Now bear in mind that this Publication does not use a Snapshot. I never specified it when I originally scripted this publication.

So what is this here for?

Art02

If I’m going to use this script as a true copy of the original then I need to remove this entry.
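For reference, the entry in question is the generated call to sp_addpublication_snapshot; the exact parameters in the script will vary, but it looks something like the following, and it is this whole statement that I remove:

--Added by the script generation even though the original Publication has no Snapshot Agent
EXEC sp_addpublication_snapshot @publication = N'Publication Name',
	@frequency_type = 1;	--run once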

Example 2 – The ‘sync_type’ has been set incorrectly

From the Subscriber, generate the Create Script just as with the Publisher, to a New Query Window.

This time the comments within the script do have the good grace to warn you that a default has been used that you might not want:

Art03

 

In this case I need that to be set to ‘none’.
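For reference, ‘sync_type’ is a parameter of sp_addsubscription; a trimmed, illustrative version of that call (the server and database names here are placeholders) with the value I actually want would be:

--'none' means no initialisation is performed for the subscription
EXEC sp_addsubscription @publication = N'Publication Name',
	@subscriber = N'SubscriberServer',		--placeholder
	@destination_db = N'SubscriberDB',		--placeholder
	@subscription_type = N'pull',
	@sync_type = N'none';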

As an aside, in SQL Server 2000 this setting was also case-sensitive – ‘none’ wouldn’t work as expected, but ‘None’ would.

Example 3 – Partition Switching fails with Replication

I have no test example to hand with which to demonstrate this (at least, not one that I can show outside of my employer’s environment) but it is simple enough to describe, now that I know the cause and resolution.

Several databases in use at my current workplace specify different partition schemes between the Publisher and the Subscriber. This is a common practice, particularly where the Subscriber might be a ‘staging’ database used to ultimately transfer data elsewhere.  So the Publisher might keep data for a couple of weeks but the Subscriber only needs to store it for a day or two, because another system is being used to store/consume that data for other purposes (reporting, analysis or whatever).

So, in my innocence I script the Publication and Subscription via SSMS, make the alterations shown in the previous two examples and create the Publication on a test environment. All is good and Replication works fine. Data that I insert into the Publisher appears in the Subscriber and I have that warm, smug feeling of having created a Publication without incident. Something to be savoured.

However, part of creating this test environment also includes setting up the jobs that purge the data in both Publisher and Subscriber, with the use of Partition Switching (for a basic example of Partition Switching, have a look at http://wp.me/p3Vxvi-3l ).

When the job runs against the Publisher that executes Partition Switching I get the following error:

“The table ‘<schema>.<tablename>’ belongs to a publication which does not allow switching of partitions [SQLSTATE 42000] (Error 21867)”.

Just for a laugh, if you want to see just how many errors Replication can produce, go to https://technet.microsoft.com/en-us/library/cc645609(v=sql.105).aspx (and to add to the fun, they aren’t logged).

After much digging around and asking others who have more Replication scars than myself, it transpires that some settings aren’t scripted out and also aren’t shown by any of the GUI screens associated with Replication.

In the past I have right-clicked on the Publication name, selected ‘Properties’ and looked at ‘Subscription Options’, believing that comparing these between the original and the copy would be enough.  Ah, well.

There is a Replication command, ‘sp_helppublication’, which shows several settings that are not visible elsewhere. At its most basic, running this command with just the Publication name will produce a row with all of the settings associated with that Publication:
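For example (run in the Publisher database, using the Publication from the earlier articles):

EXEC sp_helppublication @publication = N'Publication Name';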

Art04

 

With the particular Publication in mind I scrolled along to the far right, and the setting for ‘allow_partition_switch’ was set to 0. As BOL specifies for this parameter – “Specifies whether ALTER TABLE…SWITCH statements can be executed against the published database”.

Executing ‘sp_helppublication’ against the Live Publication shows this value as ‘1’ – so it is permitted in Live but wasn’t scripted anywhere by the automatic process through SSMS.

To change this to the required value needs the command ‘sp_changepublication’, executed against the Publisher DB in question:

EXEC sp_changepublication @publication=N'Publication Name', @property=N'allow_partition_switch', @value = 'true';

However, that isn’t the end of it. In executing this command it also sets ‘replicate_partition_switch’ to ‘1’, which I don’t want. The publisher and Subscriber in our environments generally have different Partition Switching schemes, so just because the Publisher decides to purge any data doesn’t mean that the Subscriber does too. So I now need to unset that parameter:

--it also sets 'replicate_partition_switch' to 'true' when the previous command executes and we want that as --'false'
EXEC sp_changepublication @publication=N'Publication Name', @property=N'replicate_partition_switch', @value = 'false';

Having jumped through these hoops I now find that Partition Switching works fine and my Publication in Test really is a copy of the Publication in Live.

Replication – Monitoring via the Jobs

5 Nov

In the previous article I covered how to use the Replication Monitor to see what is happening with a Publication and to find out what the errors are.
However, in some circumstances Replication Monitor may not show the entire picture with regard to the health of a Publication.
Information can also be gleaned from the jobs that actually perform the tasks that make the Publication work.

In this example I have removed the login of the Subscriber from the Distributor with a simple DROP LOGIN command.

USE [master]
GO

DROP LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C]

GO

Now, on the Publisher I’ll add one row to the table that is part of the Publication:

USE [PublisherDB]
GO

INSERT INTO [ReplTest].[PublicationTable]
           ([EmailAddress]
           ,[DOB]
)
     VALUES
           ('test2@test.com'
           ,'23 Jul 1970'
)
GO

With this Publication the data should be sent at about 30 seconds past each minute. Open Replication Monitor and look at the history of the Distribution Agent:

AgentError01

However, this has been sitting here for several minutes now and the history has not moved – stuck at the last transaction that completed at 13:12:29. There is also no error message displayed, so this screen indicates that there must be a problem (because I know it should run every minute) but gives no details.

The tab for ‘Undistributed Commands’ shows that there is one transaction waiting to get to the Subscriber:

AgentError02

And a Tracer Token that I fire never gets to the Subscriber:

AgentError03

From the Tracer Token we can see that the Publisher to Distributor connection must be working and there is no response between the Distributor and Subscriber.

This Publication is a Pull Subscription, so on the Subscriber there is a job that pulls the data from the Distributor. So, look at the job history on the Subscriber. In this case there is only one Job so it is easy to select, but the name should have something within it that shows which Publication it is part of. Within the history for that job there is a long list of failures – one every minute oddly enough:

AgentError04

Expand the ‘tree’ from the red cross and the latest step just tells you there’s a problem and to look either in the previous job step or in Replication Monitor. We know that Replication Monitor has told us all it is going to, so look at the previous step of this job:

AgentError05

So now we know that the problem is that the job cannot connect to the Distributor. In this case recreating the LOGIN on the Distributor for this job will correct the issue:

USE [master]
GO

CREATE LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]
GO

ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C]
GO

And we can see that Replication Monitor shows two transactions successfully processed (because the Tracer Token is a transaction too):

AgentError06

The Tracer token finally completed its journey:

AgentError07

And the Job History looks fine too:

AgentError08
