Tag Archives: Replication Errors

Generating Replication Scripts – Things To Watch Out For

5 Feb

Scripting a Publication is useful for both Disaster Recovery and copying a Publication to a test environment. If you use the Wizard each time then you have to make sure that you’re consistent with the options and choices made in Live and once again on the Test (or DR) server.  By creating the script you should get something that reproduces the Publication and Subscription in full.

However, generating the script doesn’t always give you an exact copy of what you are scripting from. Some defaults might be included that weren’t used originally and some settings may be specified that were simply not used with the original creation.

A couple of examples I can demonstrate using the Publication scripted in earlier articles (http://wp.me/p3Vxvi-3l), a more entertaining example using partition switching will have to be described.

Example 1 – There wasn’t a Publication Snapshot in the Original

But if you ask SSMS to generate a Create Script for the Publication it will include one:

To generate the scripts from an existing Publication you need to right-click on the Publication name within Replication/Local Publications branch (on the Publisher):

Art01

Specify where you want the script to go and it will be generated.

In this option I’ve chosen to have the script sent straight to a New Query Window.

Now bear in mind that this Publication does not use a Snapshot. I never specified it when I originally scripted this publication.

So what is this here for?

Art02

If I’m going to use this script as a true copy of the original then I need to remove this entry.

Example 2 – The ‘sync_type’ has been set incorrectly

From the Subscriber, generate the Create Script just as with the Publisher, to a New Query Window.

This time the comments within the script do have the good grace to warn you that a default has been use that you might not want:

Art03

 

In this case I need that to be set to ‘none’.

As an aside, in SQL Server 2000 this setting was also case-sensitive – ‘none’ wouldn’t work as expected, but ‘None’ would.

Example 3 – Partition Switching fails with Replication

I have no test example to hand with which to demonstrate this (at least, not one that I can show outside of my employer’s environment) but it is simple enough to describe, now that I know the cause and resolution.

Several databases in use at my current workplace specify different partition schemes between the Publisher and the Subscriber. This is a common practice, particularly where the Subscriber might be a ‘staging’ database used to ultimately transfer data elsewhere.  So the Publisher might keep data for a couple of weeks but the Subscriber only needs to store it for a day or two, because another system is being used to store/consume that data for other purposes (reporting, analysis or whatever).

So, in my innocence I script the Publication and Subscription via SSMS, make the alterations shown in the previous two examples and create the Publication on a test environment. All is good and Replication works fine. Data that I insert into the Publisher appears in the Subscriber and I have that warm, smug feeling of having created a Publication without incident. Something to be savoured.

However, part of creating this test environment also includes setting up the jobs that purge the data in both Publisher and Subscriber, with the use of Partition Switching (for a basic example of Partition Switching, have a look at http://wp.me/p3Vxvi-3l ).

When the job runs against the Publisher that executes Partition Switching I get the following error:

“The table ‘<schema>.<tablename>’ belongs to a publication which does not allow switching of partitions [SQLSTATE 42000] (Error 21867)”.

Just for a laugh, if you want to see just how many errors Replication can produce, go to https://technet.microsoft.com/en-us/library/cc645609(v=sql.105).aspx (and to add to the fun, they aren’t logged).

After much digging around and asking others who have more Replication scars than myself it transpires that some settings aren’t scripted out and also aren’t found by any of the guid screens associated with Replication.

In the past I have right-clicked on the Publication name, selected ‘Properties’ and looked at ‘Subscription Options’, believing that comparing these between the original and the copy would be enough.  Ah, well.

There is a Replication command ‘sp_helppublication’ which shows several setting that are not visible elsewhere. At its most basic, running this command with just the Publication name will produce a row with all of the setting associated with that Publication:

Art04

 

With the particular Publication in mind I scrolled along to the far right, and the setting for ‘allow_partition_switch’ was set to 0. As BOL specifies for this parameter – “Specifies whether ALTER TABLE…SWITCH statements can be executed against the published database”.

Executing ‘sp_helppublication’ against the Live Publication shows this value as ‘1’ – so it is permitted in Live but wasn’t scripted anywhere by the automatic process through SSMS.

To change this to the value required requires the command ‘sp_changePublication’, executed against the Publisher DB in question:

EXEC sp_changepublication @publication=N'Publication Name;', @property=N'allow_partition_switch', @value = 'true';

However, that isn’t the end of it. In executing this command it also sets ‘replicate_partition_switch’ to ‘1’, which I don’t want. The publisher and Subscriber in our environments generally have different Partition Switching schemes, so just because the Publisher decides to purge any data doesn’t mean that the Subscriber does too. So I now need to unset that parameter:

--it also sets 'replicate_partition_switch' to 'true' when the previous command executes and we want that as --'false'
EXEC sp_changepublication @publication=N'Publication Name', @property=N'replicate_partition_switch', @value = 'false';

Having jumped through these hoops I now find that Partition Switching works fine and my Publication in Test really is a copy of the Publication in Live.

Replication – Monitoring via the Jobs

5 Nov

In the previous article I covered how to use the Replication Monitor to see what is happening with a Publication and to find out what the errors are.
However, in some circumstances Replication Monitor may not show the entire picture with regard to the health of a Publication.
Information can also be gleaned from the jobs that actually perform the tasks that make the Publication work.

In this example I have removed the login of the Subscriber from the Distributor with a simple DROP LOGIN command.

USE [master]
GO

DROP LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C]

GO

Now, on the Publisher I’ll add one row to the table that is part of the Publication:

USE [PublisherDB]
GO

INSERT INTO [ReplTest].[PublicationTable]
           ([EmailAddress]
           ,[DOB]
)
     VALUES
           ('test2@test.com'
           ,'23 Jul 1970'
)
GO

With this Publication the data should be sent at about 30 seconds past each minute. Open the replication Monitor and look at the history of the Distribution:

AgentError01

However, this has been sitting here for several minutes now and the history has not moved – stuck at the last transaction that completed at 13:12:29. There is also no error message displayed, so this screen indicates that there must be a problem (because I know it should run every minute) but gives no details.

The tab for ‘Undistributed Commands’ shows that there is one transaction waiting to get to the Subscriber:

AgentError02

And firing a tracer token never gets to the Subscriber:

AgentError03

From the Tracer Token we can see that the Publisher to Distributor connection must be working and there is no response between the Distributor and Subscriber.

This Publication is a Pull Subscription, so on the Subscriber there is a job that pulls the data from the Distributor. So, look at the job history on the Subscriber. In this case there is only one Job so it is easy to select, but the name should have something within it that shows which Publication it is part of. Within the history for that job there is a long list of failures – one every minute oddly enough:

AgentError04

Expand the ‘tree’ from the red cross and the latest step just tells you there’s a problem and to look either in the previous job step or replication Monitor. We know that Replication Monitor has told us all it is going to, so look at the previous step of this job:

AgentError05

So now we know that the problem is that the job cannot connect to the Distributor. In this case recreating the LOGIN on the Distributor for this job will correct the issue:

USE [master]
GO

CREATE LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]
GO

ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C]
GO

And we can see that the Replication monitor shows two transactions successfully processed (because the Tracer Token is a transaction too):

AgentError06

The Tracer token finally completed its journey:

AgentError07

And the Job History looks fine too:

AgentError08

Replication Monitor Basic Overview

31 Oct

Most issues with Replication are detailed in either the Replication Monitor or in the history of the jobs used to run the Replication tasks. The following examples are based upon the Pull Subscription detailed in earlier articles.

Replication Monitor

This tool is available to anyone who is a member of the sysadmin role or replmonitor role. It is loaded from any SQL Server instance that has Replication enabled, although life is easier if you launch it from the Publisher or Distributor (anywhere else and you may have to configure it to look for the correct servers).
Within SSMS Object Explorer, right-click on ‘Replication’ and select ‘Launch Replication Monitor’, which will result in:

Replication_Monitor_03

In the screenshot above the tree structure in the left window has been expanded. The highest level is the Distibutor, expand that and beneath it is the Publisher and beneath that is the actual Publication.
Select the Distributor and three tabs are available in the right-hand pane – ‘Publications’, ‘Subscription Watch List’ and ‘Agents’.

Publications shows the Publications that this Distributor is responsible for, providing the Publisher name, Publications name, the number of Subscriptions and basic performance data.

Agents shows the various Agents that can be involved in Replication. In this Publication select ‘Merge Agent’ will show nothing, because it isn’t a Merge Publication and selecting ‘Snapshot Agent’ will show a status of ‘Never Started’ because snapshots are not used with Publication. Other selections will show the status of various Agents and jobs connected to this Distributor.

Subscription Watch List is the tab probably used most often. Double-click on the entry in the right-hand window and another window pops up, providing a detailed history of that Publication:

Replication_Monitor_04

There are three tabs in this control and the most important of these tends to be the ‘Distributor To Subscriber History’ tab. As a default it shows the last 100 synchronisations (in this Publication there is one synchronisation every minute) but can be changed by using the drop-down at the top of that tab.

To show what should normally happen with this tab clear the Subscriber table, generate 1000 rows of data within the Publisher and watch this screen as the data is Published across.
For ease, Redgate SQL Data Generator has been used to insert the 1000 rows. From the current display on that tab you can see that it normally refreshes at roughly 30 seconds past each minute, so once it is due press F5 to refresh the screen and the message can be seen giving basic details of what was Published from the Distributor to the Subscriber:

Replication_Monitor_05

Errors within the Publication

Now repeat the insertion on the Publisher, having truncated the table on the Publisher first it will generate Primary Keys that already exist on the Subscriber. This of course, will not go down well when the data gets to the Subscriber.

Initially the History will show that there is a problem and it is going to retry the individual commands, instead of the entire batch in one go:

Replication_Monitor_06

A short while later it gives up and provide a detailed error message:

Replication_Monitor_07

In this case it is informing of a violation of the Primary Key constraint, as that PK already exists on the Subscriber. This series of retrying and error reporting will repeat until something is done about it. In this case remove the data from the Publisher that already exists with the Primary Keys being Published and all will return to normal:

Replication_Monitor_08

Another way to use this tool for locating the problem is in the Error Details part of the screen, with the Transaction Number displayed.
To show what can happen if the customised Stored Procedure is wrong, within the Subscriber change the SP ‘dbo.sp_MSins_Repl_SubscriptionTable’ to remove reference to th ‘PK’ column, which is NOT NULL.

USE [SubscriberDB]
GO

ALTER procedure [dbo].[sp_MSins_Repl_SubscriptionTable]
    @c1 bigint,
    @c2 varchar(50),
    @c3 date,
    @c4 datetime
as
begin  
	insert into [Repl].[SubscriptionTable](
		--[PK],
		[EmailAddress],
		[DOB],
		[DateCreated]
	) values (
    --@c1,
    @c2,
    @c3,
    @c4	) 
end  

Of course, any attempt to write a row to this table via this SP now will result in an error. The trick is to work out what command is failing within replication.

On the Publisher, execute the following:

USE [PublisherDB]
GO

INSERT INTO [ReplTest].[PublicationTable]
           ([EmailAddress]
           ,[DOB])
     VALUES
           ('test@test.com'
           ,'01 Jun 1970')
GO

Initially it will retry the command, as before. Eventually it will show an error message, along with the Transaction Sequence Number:

Replication_Monitor_09

That binary number is used to run a query against the Distributor database to get the data it is attempting to send. The query is ‘sp_browsereplcmds’, with two parameters – both are the binary value taken from the Replication Monitor:

Replication_Monitor_10

If multiple rows are returned from this query then the required row is the one where ‘command_id’ matches ‘Command ID’ shown in the error message from Replication Monitor (although you should check the column ‘partial_command’, as the query might be split across several rows).

From the column ‘command’ we can see the call to ‘sp_MSins_Repl_SubscriptionTable’ along with the parameters used.
Copying the contents of this column to SSMS and running it against the Subscriber (with a minor amount of editing to keep SSMS happy) shows the error:

Replication_Monitor_11

Now we have the command that is causing the issue and the parameters it is using. In this case it’s obviously a fault within the SP, so simply amend it back to save the PK column and all returns to normal.

Replication_Monitor_12

Tokens

Another basic way to check Replication is all connected up is to run a Tracer Token through it. A Tracer Token is just a small amount of data written to the log of the Publisher and then tracked through the Publication.
To run a Tracer Token through Replication Monitor go to the bottom of the tree structure in the left-hand window, selecting the Publication. Three tabs appear and one of these is ‘Tracer Tokens’. Select that tab and press the button ‘Insert tracer’.

Replication_Monitor_13

The time for ‘Publisher to Distributor’ will generally be within a few second, as that part of the Publication is configured to run constantly. ‘Distributor to Subscriber’ can take a while, depending upon when that job is scheduled. In this Publication it is once every minute, so could take anywhere up to one minute to complete. Once completed it shows the total latency and is an indication that the connections are configured correctly for the three main elements (Publisher, Distributor and Subscriber) to communicate correctly.