Replication – Monitoring via the Jobs

5 Nov

In the previous article I covered how to use the Replication Monitor to see what is happening with a Publication and to find out what the errors are.
However, in some circumstances Replication Monitor may not show the entire picture with regard to the health of a Publication.
Information can also be gleaned from the jobs that actually perform the tasks that make the Publication work.

In this example I have removed the login of the Subscriber from the Distributor with a simple DROP LOGIN command.

USE [master]
GO

DROP LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C]

GO

Now, on the Publisher I’ll add one row to the table that is part of the Publication:

USE [PublisherDB]
GO

INSERT INTO [ReplTest].[PublicationTable]
           ([EmailAddress]
           ,[DOB]
)
     VALUES
           ('test2@test.com'
           ,'23 Jul 1970'
)
GO

With this Publication the data should be sent at about 30 seconds past each minute. Open Replication Monitor and look at the Distributor To Subscriber history:

AgentError01

However, this has been sitting here for several minutes now and the history has not moved – stuck at the last transaction that completed at 13:12:29. There is also no error message displayed, so this screen indicates that there must be a problem (because I know it should run every minute) but gives no details.

The tab for ‘Undistributed Commands’ shows that there is one transaction waiting to get to the Subscriber:

AgentError02

And a Tracer Token that is fired in never reaches the Subscriber:

AgentError03

From the Tracer Token we can see that the Publisher to Distributor connection must be working, but there is no response between the Distributor and Subscriber.

This Publication is a Pull Subscription, so on the Subscriber there is a job that pulls the data from the Distributor. So, look at the job history on the Subscriber. In this case there is only one Job so it is easy to select, but the name should have something within it that shows which Publication it is part of. Within the history for that job there is a long list of failures – one every minute oddly enough:
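
If the Subscriber hosts more than a handful of jobs, the relevant one can also be found by querying msdb directly. A minimal sketch, assuming the default replication job categories are in use:

USE msdb;
GO
-- Recent history for Distribution Agent jobs (run_status 0 = failed, 1 = succeeded)
SELECT j.name AS job_name,
       h.step_id,
       h.run_status,
       h.run_date,
       h.run_time,
       h.message
FROM dbo.sysjobs AS j
JOIN dbo.syscategories AS c ON c.category_id = j.category_id
JOIN dbo.sysjobhistory AS h ON h.job_id = j.job_id
WHERE c.name = N'REPL-Distribution'
ORDER BY h.instance_id DESC;
GO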

AgentError04

Expand the ‘tree’ from the red cross and the latest step just tells you there’s a problem and to look either in the previous job step or Replication Monitor. We know that Replication Monitor has told us all it is going to, so look at the previous step of this job:

AgentError05

So now we know that the problem is that the job cannot connect to the Distributor. In this case recreating the LOGIN on the Distributor for this job will correct the issue:

USE [master]
GO

CREATE LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]
GO

ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C]
GO

And we can see that Replication Monitor shows two transactions successfully processed (because the Tracer Token is a transaction too):

AgentError06

The Tracer Token finally completed its journey:

AgentError07

And the Job History looks fine too:

AgentError08

Replication Monitor Basic Overview

31 Oct

Most issues with Replication are detailed in either the Replication Monitor or in the history of the jobs used to run the Replication tasks. The following examples are based upon the Pull Subscription detailed in earlier articles.

Replication Monitor

This tool is available to anyone who is a member of the sysadmin role or the replmonitor role. It can be launched from any SQL Server instance that has Replication enabled, although life is easier if you launch it from the Publisher or Distributor (anywhere else and you may have to configure it to look for the correct servers).
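
Granting that access to a non-sysadmin is simply a matter of membership of the ‘replmonitor’ role within the distribution database. A minimal sketch, using a hypothetical monitoring login:

USE [distribution]
GO
-- Hypothetical login name; replmonitor is a fixed database role in the distribution database
CREATE USER [9999-PC\ReplMonitor] FOR LOGIN [9999-PC\ReplMonitor]
GO
ALTER ROLE [replmonitor] ADD MEMBER [9999-PC\ReplMonitor]
GO
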
Within SSMS Object Explorer, right-click on ‘Replication’ and select ‘Launch Replication Monitor’, which will result in:

Replication_Monitor_03

In the screenshot above the tree structure in the left window has been expanded. The highest level is the Distributor; expand that and beneath it is the Publisher, and beneath that the actual Publication.
Select the Distributor and three tabs are available in the right-hand pane – ‘Publications’, ‘Subscription Watch List’ and ‘Agents’.

Publications shows the Publications that this Distributor is responsible for, providing the Publisher name, Publication name, the number of Subscriptions and basic performance data.

Agents shows the various Agents that can be involved in Replication. For this Publication, selecting ‘Merge Agent’ will show nothing, because it isn’t a Merge Publication, and selecting ‘Snapshot Agent’ will show a status of ‘Never Started’, because snapshots are not used with this Publication. Other selections will show the status of various Agents and jobs connected to this Distributor.

Subscription Watch List is the tab probably used most often. Double-click on the entry in the right-hand window and another window pops up, providing a detailed history of that Publication:

Replication_Monitor_04

There are three tabs in this control and the most important of these tends to be the ‘Distributor To Subscriber History’ tab. By default it shows the last 100 synchronisations (in this Publication there is one synchronisation every minute) but this can be changed using the drop-down at the top of that tab.

To show what should normally happen with this tab, clear the Subscriber table, generate 1000 rows of data within the Publisher and watch this screen as the data is Published across.
For ease, Redgate SQL Data Generator has been used to insert the 1000 rows. The synchronisation normally runs at roughly 30 seconds past each minute, so once it is due press F5 to refresh the screen and a message can be seen giving basic details of what was Published from the Distributor to the Subscriber:

Replication_Monitor_05
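
If SQL Data Generator isn’t to hand, plain T-SQL will do the same job. A minimal sketch that inserts 1000 made-up rows:

USE [PublisherDB]
GO
-- Generate 1000 arbitrary rows; sys.all_objects is just a convenient row source
INSERT INTO [ReplTest].[PublicationTable] ([EmailAddress], [DOB])
SELECT TOP (1000)
       'test' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)) + '@test.com',
       DATEADD(DAY, -(ABS(CHECKSUM(NEWID())) % 10000), '01 Jan 2000')
FROM sys.all_objects
GO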

Errors within the Publication

Now truncate the table on the Publisher and repeat the insertion. Truncating the table resets the IDENTITY seed, so the new rows will generate Primary Keys that already exist on the Subscriber. This, of course, will not go down well when the data gets to the Subscriber.

Initially the History will show that there is a problem and it is going to retry the individual commands, instead of the entire batch in one go:

Replication_Monitor_06

A short while later it gives up and provides a detailed error message:

Replication_Monitor_07

In this case it is reporting a violation of the Primary Key constraint, because the PK value already exists on the Subscriber. This cycle of retrying and error reporting will repeat until something is done about it. Here, removing the rows on the Publisher whose Primary Keys clash with those being Published returns everything to normal:

Replication_Monitor_08

Another way to use this tool to locate a problem is via the Error Details part of the screen, where the Transaction Sequence Number is displayed.
To show what can happen if the customised Stored Procedure is wrong, change the SP ‘dbo.sp_MSins_Repl_SubscriptionTable’ on the Subscriber to remove the reference to the ‘PK’ column, which is NOT NULL:

USE [SubscriberDB]
GO

ALTER PROCEDURE [dbo].[sp_MSins_Repl_SubscriptionTable]
    @c1 bigint,
    @c2 varchar(50),
    @c3 date,
    @c4 datetime
AS
BEGIN
	-- Deliberately broken: the NOT NULL [PK] column (and its @c1 parameter)
	-- are commented out, so every insert through this procedure will now fail
	INSERT INTO [Repl].[SubscriptionTable](
		--[PK],
		[EmailAddress],
		[DOB],
		[DateCreated]
	) VALUES (
		--@c1,
		@c2,
		@c3,
		@c4)
END

Of course, any attempt to write a row to this table via this SP now will result in an error. The trick is to work out what command is failing within replication.

On the Publisher, execute the following:

USE [PublisherDB]
GO

INSERT INTO [ReplTest].[PublicationTable]
           ([EmailAddress]
           ,[DOB])
     VALUES
           ('test@test.com'
           ,'01 Jun 1970')
GO

Initially it will retry the command, as before. Eventually it will show an error message, along with the Transaction Sequence Number:

Replication_Monitor_09

That binary number is used to run a query against the distribution database to get the command it is attempting to send. The query uses ‘sp_browsereplcmds’, with two parameters – both set to the binary value taken from Replication Monitor:

Replication_Monitor_10
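
In T-SQL form it looks something like this – a sketch, with a placeholder sequence number standing in for the value shown in the error:

USE [distribution]
GO
-- Both parameters take the Transaction Sequence Number from Replication Monitor
EXEC sp_browsereplcmds
    @xact_seqno_start = '0x0000002A000000B70004',
    @xact_seqno_end   = '0x0000002A000000B70004'
GO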

If multiple rows are returned from this query then the required row is the one where ‘command_id’ matches the ‘Command ID’ shown in the error message from Replication Monitor (although you should check the column ‘partial_command’, as the command might be split across several rows).

From the column ‘command’ we can see the call to ‘sp_MSins_Repl_SubscriptionTable’ along with the parameters used.
Copying the contents of this column to SSMS and running it against the Subscriber (with a minor amount of editing to keep SSMS happy) shows the error:

Replication_Monitor_11

Now we have the command that is causing the issue and the parameters it is using. In this case it’s obviously a fault within the SP, so simply amend it back to include the PK column and all returns to normal.
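
Restoring the procedure is just the reverse of the earlier change:

USE [SubscriberDB]
GO

ALTER PROCEDURE [dbo].[sp_MSins_Repl_SubscriptionTable]
    @c1 bigint,
    @c2 varchar(50),
    @c3 date,
    @c4 datetime
AS
BEGIN
	-- [PK] and @c1 reinstated
	INSERT INTO [Repl].[SubscriptionTable](
		[PK],
		[EmailAddress],
		[DOB],
		[DateCreated]
	) VALUES (
		@c1,
		@c2,
		@c3,
		@c4)
END
GO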

Replication_Monitor_12

Tokens

Another basic way to check Replication is all connected up is to run a Tracer Token through it. A Tracer Token is just a small amount of data written to the log of the Publisher and then tracked through the Publication.
To run a Tracer Token through Replication Monitor go to the bottom of the tree structure in the left-hand window, selecting the Publication. Three tabs appear and one of these is ‘Tracer Tokens’. Select that tab and press the button ‘Insert tracer’.

Replication_Monitor_13

The time for ‘Publisher to Distributor’ will generally be within a few seconds, as that part of the Publication is configured to run constantly. ‘Distributor to Subscriber’ can take a while, depending upon when that job is scheduled. In this Publication it runs once every minute, so it could take anywhere up to one minute to complete. Once completed it shows the total latency, which is an indication that the connections are configured correctly for the three main elements (Publisher, Distributor and Subscriber) to communicate.
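
Tracer Tokens can also be driven from T-SQL without Replication Monitor – a sketch, run on the Publisher in the published database:

USE [PublisherDB]
GO
-- Post a token and capture its ID, then query its progress
-- (the history fills in as the token travels, so re-run the second call until latency appears)
DECLARE @tokenID int
EXEC sys.sp_posttracertoken
    @publication = N'Test_Publication',
    @tracer_token_id = @tokenID OUTPUT

EXEC sys.sp_helptracertokenhistory
    @publication = N'Test_Publication',
    @tracer_id = @tokenID
GO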

Replication – Scripting the Pull Subscription – Basic Example

6 Oct

Having prepared the Distributor in the previous article it is now possible to create a Subscription.

First of all create a test database and table on the Publisher (Publisher-A):

CREATE DATABASE PublisherDB; 
GO 

USE PublisherDB; 
GO 

CREATE SCHEMA ReplTest; 
GO 

CREATE TABLE ReplTest.PublicationTable( 
	PK				BIGINT IDENTITY, 
	EmailAddress	VARCHAR(50), 
	DOB				DATE, 
	DateCreated		DATETIME DEFAULT GETDATE(), 
 CONSTRAINT PK_PublicationTable PRIMARY KEY CLUSTERED 
( 
	PK ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)  
)  

GO 
  

A database is also required on the Subscriber (Subscriber-C):

CREATE DATABASE SubscriberDB; 
GO 

USE SubscriberDB; 
GO 

CREATE SCHEMA Repl; 
GO 

CREATE TABLE Repl.SubscriptionTable( 
	PK				BIGINT, 
	EmailAddress	VarChar(50), 
	DOB				Date, 
	DateCreated		DateTime, 
	ReplDatecreated	DateTime DEFAULT GETDATE(), 
 CONSTRAINT PK_SubscriptionTable PRIMARY KEY CLUSTERED 
( 
	PK ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
) 

GO 

Now back to the Publisher. This server needs to be configured as a Publisher. The steps used to create the Distribution server have already associated this server with the Distributor, but now this server needs to be configured as a Publisher and also associated with the Distributor (Server B needs to know about Server A and then Server A needs to know about Server B, as a separate step):

use master  
exec sp_adddistributor @distributor = N'9999-PC\Distributor-B', @password = N'Spr0uts' 
GO 

The response should be “You have updated the Publisher property ‘active’ successfully.”

Having configured the server to allow Publications, the database itself needs to be configured for publishing:

exec sp_replicationdboption @dbname = N'PublisherDB', @optname = N'publish', @value = N'true' 
GO 
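
You can confirm the option has taken effect by checking sys.databases:

-- is_published should now be 1 for PublisherDB 
SELECT name, is_published 
FROM sys.databases 
WHERE name = N'PublisherDB' 
GO 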

And the Log Reader and Queue Reader Agents need to be created:

exec PublisherDB.sys.sp_addlogreader_agent @job_login = null, @job_password = null, @publisher_security_mode = 1 
GO 
exec PublisherDB.sys.sp_addqreader_agent @job_login = null, @job_password = null, @frompublisher = 1 
GO 

The response from this will notify you of two Jobs, although the exact names may vary:
“Job ‘9999-PC\Publisher-A-PublisherDB-1’ started successfully.
Job ‘[9999-PC\Publisher-A].5’ started successfully.”

Now create the Publication:

use PublisherDB; 
exec sp_addpublication @publication = N'Test_Publication', @description = N'Transactional publication of database ''PublisherDB''.', @sync_method = N'native', @retention = 0, 
@allow_push = N'false', @allow_pull = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false', 
@snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous', 
@allow_subscription_copy = N'false', @add_to_active_directory = N'false', @repl_freq = N'continuous', @status = N'active', 
@independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'false', @autogen_sync_procs = N'false', 
@allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1, @allow_initialize_from_backup = N'false', 
@enabled_for_p2p = N'false', @enabled_for_het_sub = N'false' 
GO 

If you’ve used the Wizard to generate the scripts, the next command adds the publication snapshot. Where I work snapshots aren’t used because the databases are rather large and the lock such a command takes would make database access somewhat problematic for the hundreds of users that connect every minute.
So we’ll skip the ‘sp_addpublication_snapshot’ command and move on to the next step – granting access to specified accounts.

We now need to create the various accounts and permissions that this Replication will require:

First, the Distributor needs access to this Publisher:

CREATE LOGIN [NT SERVICE\SQLAgent$DISTRIBUTOR-B] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english] 
GO 

And to the DB:

USE PublisherDB; 
GO 

CREATE USER [NT SERVICE\SQLAgent$DISTRIBUTOR-B] FOR LOGIN [NT SERVICE\SQLAgent$DISTRIBUTOR-B] 
GO 

ALTER ROLE [db_owner] ADD MEMBER [NT SERVICE\SQLAgent$DISTRIBUTOR-B] 
GO

Various accounts involved with the Replication process also require access:

exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'sa' 
GO 
exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'NT AUTHORITY\SYSTEM' 
GO 
exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'NT Service\SQLAgent$PUBLISHER-A' 
GO 
exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'NT Service\SQLAgent$DISTRIBUTOR-B' 
GO 

Now we tell the Replication process which table we want to Replicate. Within Replication the tables are referred to as ‘Articles’, borrowing the term from the age-old process of newspaper publishing. For this example we’re only interested in copying new rows across. For now we don’t care about updating or deleting data, so there’ll only be one Stored Procedure defined – against the parameter ‘@ins_cmd’. You specify the name and the process of creating the Article takes care of generating the code:

exec sp_addarticle @publication = N'Test_Publication', @article = N'Test_Publication_Article', @source_owner = N'ReplTest', @source_object = N'PublicationTable', @type = N'logbased', 
@description = N'', @creation_script = N'', @pre_creation_cmd = N'none', @schema_option = 0x0000000000030002, @identityrangemanagementoption = N'none', 
@destination_table = N'SubscriptionTable', @destination_owner = N'Repl', @status = 24, @vertical_partition = N'false', 
@ins_cmd = N'CALL [dbo].[sp_MSins_Repl_SubscriptionTable]', @del_cmd = N'NONE', @upd_cmd = N'NONE' 
GO 

At this point you can right-click on the Publication under ‘Replication/Local Publications’ and see the Properties, confirming the table is set up for Replication:
Replication_Properties01
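
The same details can be pulled back with T-SQL on the Publisher, should you prefer it to the Properties dialog:

USE PublisherDB; 
GO 
-- Publication and Article details as recorded on the Publisher
EXEC sp_helppublication @publication = N'Test_Publication' 
EXEC sp_helparticle @publication = N'Test_Publication', @article = N'Test_Publication_Article' 
GO 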

Going back to the Distributor, the Distributor needs to have the login details for the Subscriber, as this is a Pull Subscription and it will be the responsibility of the Subscriber to transfer the data from the Distributor:

USE [master] 
GO 
CREATE LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] FROM WINDOWS WITH DEFAULT_DATABASE=[master] 
GO 
ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C] 
GO 
USE [distribution] 
GO 
CREATE USER [NT SERVICE\SQLAgent$SUBSCRIBER-C] FOR LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] 
GO 
USE [distribution] 
GO 
ALTER ROLE [db_owner] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C] 
GO 

On the Subscriber the Pull Subscription now needs to be created:

use SubscriberDB 
GO 
exec sp_addpullsubscription @publisher = N'9999-PC\Publisher-A', @publication = N'Test_Publication', @publisher_db = N'PublisherDB', @independent_agent = N'True', 
@subscription_type = N'pull', @description = N'', @update_mode = N'read only', @immediate_sync = 0 

As well as the Agent that will be responsible for it:

exec sp_addpullsubscription_agent @publisher = N'9999-PC\Publisher-A', @publisher_db = N'PublisherDB', @publication = N'Test_Publication', 
@distributor = N'9999-PC\Distributor-B', @distributor_security_mode = 1, @distributor_login = N'distributor_admin', @distributor_password = N'Spr0uts', 
@enabled_for_syncmgr = N'False', @frequency_type = 4, @frequency_interval = 1, @frequency_relative_interval = 0, 
@frequency_recurrence_factor = 0, @frequency_subday = 4, @frequency_subday_interval = 1, @active_start_time_of_day = 30, 
@active_end_time_of_day = 235959, @active_start_date = 0, @active_end_date = 0, @alt_snapshot_folder = N'', 
@working_directory = N'', @use_ftp = N'False', @job_login = null, @job_password = null, @publication_type = 0 
GO 

Now the custom Stored Procedure needs to be created within the Subscriber, to process the inserts. To do this, the Stored Procedure is generated from the Publisher with the command:

Use PublisherDB; 
GO 

EXEC sp_scriptpublicationcustomprocs 'Test_Publication' 

The response from this:
Script_Stored_Procs
can then be copied and pasted into the Subscriber and executed against the Subscriber DB:
Script_Stored_Procs_Output

Finally, from the PublisherDB we need to activate the Subscription:

exec sp_addsubscription @publication = N'Test_Publication', @subscriber = N'9999-PC\Subscriber-C', @destination_db = N'SubscriberDB', @subscription_type = N'Pull', @sync_type = N'None', 
@article = N'all', @update_mode = N'read only', @subscriber_type = 0 
GO 

A word of warning regarding the parameter ‘@sync_type’. Where I work the use of ‘none’ is commonplace (because we don’t replicate schema changes automatically – we control it ourselves), but if you look at the documentation that value is deprecated, so a better alternative is ‘replication support only’. However, when you generate the scripts from an existing Publication using the Wizard, you may find that the parameter has ‘automatic’ set against it – regardless of the value you initially created it with.
Replication has lots of lovely surprises like that.

At this point, from SSMS on the Subscriber you should be able to expand ‘Replication/Local Subscriptions’ and see the name of the Publication and the Subscriber details:
Replication_Details
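
The Subscription can also be confirmed with T-SQL on the Subscriber:

USE SubscriberDB; 
GO 
-- Details of the Pull Subscription as recorded on the Subscriber
EXEC sp_helppullsubscription @publisher = N'9999-PC\Publisher-A', @publisher_db = N'PublisherDB', @publication = N'Test_Publication' 
GO 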

A basic Pull Subscription has now been set up. In following articles I’ll cover how to test and monitor this, with basic steps on what to look for when it goes wrong.

Replication – Scripting the Distributor – Basic Example

2 Oct

By far the easiest way to create the Distributor is via the Wizard and examples abound of that – one such example is at SQL Server Central http://www.sqlservercentral.com/articles/Stairway+Series/72282/ .

So why use scripts?

Mainly for Disaster Recovery and consistency. Should you have to rebuild the server then running the scripts is easier and more reliable than re-running the Wizard – there’s less chance somebody can enter different values and therefore cause issues with the Replication.

So, let’s have a look at the scripts behind this wizard.

Firstly – I have three servers, installed as separate instances on the same machine:
Publisher-A, as the Publisher
Distributor-B as the Distributor and
Subscriber-C as the Subscriber.
Currently none of these servers are configured for Replication. All instances were installed to cater for Replication but that is all.

Firstly, make sure the Agent Services are running – it makes life easier and you will need them to actually perform the replication tasks. SQL Server Configuration Manager can be used for this.

There are four scripts in the standard Distributor creation, gleaned from scripting the Wizard. All of these scripts are executed on ‘Distributor-B’ and of course I’ve replaced the server name – I may well be working in a secure environment but there’s no need to tempt anybody.

1. Designate the Distributor server:

use master
exec sp_adddistributor 
     @distributor = N'9999-PC\Distributor-B', 
     @password = N'Spr0uts' 
GO 

Having executed this code you will see an additional entry in sys.sysservers, giving a ‘srvname’ of ‘repl_distributor’ and showing ‘dist’ as 1. An associated entry under ‘Linked Servers’ within SSMS will also be visible.
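
A quick query shows the entry:

-- 'dist' = 1 marks this entry as the Distributor 
SELECT srvname, dist 
FROM sys.sysservers 
WHERE srvname = N'repl_distributor' 
GO 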

Next, create the distribution database:

exec sp_adddistributiondb @database = N'distribution', 
			@data_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL11.Distributor-B\MSSQL\DATA', 
			@log_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL11.Distributor-B\MSSQL\DATA', 
			@log_file_size = 2, 
			@min_distretention = 0, 
			@max_distretention = 72, 
			@history_retention = 48, 
			@security_mode = 0, 
			@login = N'sa',--login used when connecting to the distributor to create the distribution database 
			@password='Spr0uts' 
GO 

I’ve left the default name for the DB, ‘distribution’, and in this case have left the other defaults too (retention periods and login).
As this code executes it displays various messages relating to the creation of the DB, as well as the addition of the ‘guest’ user and the ‘replmonitor’ role. Once completed the database is visible under ‘System Databases’ as a distribution database:

Sys_Databases
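
Again, sys.databases can confirm it:

-- is_distributor = 1 marks the distribution database 
SELECT name, is_distributor 
FROM sys.databases 
WHERE name = N'distribution' 
GO 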

The next command runs against this new distribution database and is used to record the location of the snapshot folder. As I don’t intend to use snapshots I’m only including this because all scripts appear to include it and I can’t find a decent explanation for it. When I have the time I’m going to try excluding this command and see what fun it creates. For now though I’ll play safe:

use [distribution] 
if (not exists (select * from sysobjects where name = 'UIProperties' and type = 'U ')) 
	create table UIProperties(id int) 
if (exists (select * from ::fn_listextendedproperty('SnapshotFolder', 'user', 'dbo', 'table', 'UIProperties', null, null))) 
	EXEC sp_updateextendedproperty N'SnapshotFolder', N'\\9999-PC\C$\Replication\Snapshot', 'user', dbo, 'table', 'UIProperties' 
else 
	EXEC sp_addextendedproperty N'SnapshotFolder', N'\\9999-PC\C$\Replication\Snapshot', 'user', dbo, 'table', 'UIProperties' 
GO 

Notice that the location needs a fully-qualified (UNC) name.

The final command registers a Publisher that will use this Distributor:

exec sp_adddistpublisher @publisher = N'9999-PC\Publisher-A', 
		@distribution_db = N'distribution', 
		@security_mode = 1, 
		@working_directory = N'\\9999-PC\C$\Replication\Snapshot', 
		@trusted = N'false', 
		@thirdparty_flag = 0, 
		@publisher_type = N'MSSQLSERVER'
GO

This will create an entry under ‘Linked Servers’ for the Publisher. The only response from this code is the usual ‘Command completed successfully’ message. Trying it a second time will tell you that the server is already listed as a publisher.

Note: If you’ve used the ‘Configure Distribution Wizard’ to create the scripts, you’ll notice that the Distributor is also created as a Publisher. This step isn’t necessary for the configuration I’m using here.

Now we have a Distributor created and can use that to build upon in following articles.

SSIS Error Code “0x80070057” in Slowly Changing Dimension

28 Aug

SSIS can produce an inordinate number of error messages and numbers. Some of these can even be helpful and informative.
Generally accompanied by an SSIS Error Code, one of the most popular appears to be “0x80070057”, and it can have a number of causes.

Recently I came across this particularly helpful code when creating a Slowly Changing Dimension:

Error: 0xC0202009 at ‘Task Name’, Slowly Changing Dimension [14907]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80070057.
An OLE DB record is available. Source: “Microsoft SQL Server Native Client 10.0” Hresult: 0x80070057 Description: “The parameter is incorrect.”.
Error: 0xC0047022 at ‘Task Name’, SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Slowly Changing Dimension” (14907) failed with error code 0xC0202009 while processing input “Slowly Changing Dimension Input” (14918). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
Error: 0xC02020C4 at ‘Task Name’, ‘View Name’ [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at , SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “‘View Name’” (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The behaviour was thus: run it for the first time and the process extracted data from a View and inserted all of the rows as new rows into the output. Run it a second time, when it would check for changes (because the output table now had data), and it would instantly fail with the above error message.

The first thing I noticed was the “The parameter is incorrect”, which is strange because the Wizard created everything based on the details I fed into it – so it shouldn’t be anything I’ve done.
So, as I had created 5 other Slowly Changing Dimensions for other Views and Tables I decided to recreate this one from the start with a very ‘narrow’ table. The table concerned with this particular process was rather wide, certainly when compared to the others that had worked successfully.
A couple of fairly tedious hours later I had a failure when I added the last few columns back in (as ever, it’s always something at the end). These columns were varchar(8000) and there were six of them.
This is where the line in the error message “The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020” started to look more interesting.
Checking these columns showed that the 8000 size was a tad generous, based upon the data likely to be stored within them. Reducing these to 300 each removed the error and everything worked as expected.
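
Purely as an illustration (the table and column names here are placeholders, not the real ones), the eventual fix amounted to nothing more exotic than:

-- Shrink the over-generous varchar(8000) columns to a realistic width 
ALTER TABLE dbo.WideSourceTable ALTER COLUMN Notes1 VARCHAR(300) 
ALTER TABLE dbo.WideSourceTable ALTER COLUMN Notes2 VARCHAR(300) 
-- ...repeated for the remaining four columns 
GO 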

I have no idea what size buffer SSIS can create for this processing but the message implied that it wasn’t large enough for this particular table. Reducing the width of the table corrected this.
It may not be the solution for every instance of that message but in this case it worked and can be added to the list of possible solutions to one less-than-helpful SSIS Error Code.

Replication – the sprouts of my DBA world.

14 Aug

Actually, that isn’t entirely fair. I don’t have to eat sprouts, which is just as well. However, I do have to deal with Replication – certainly in my current job. Whereas my refusal to eat the Little Green Balls of Death won’t result in a divorce (because my wife, as wonderful as she is, does for some strange reason really like sprouts) I can’t imagine a job that specified it uses Replication will suffer my presence for long if I flatly refuse to touch it.

SQL Server has a broad range of tools and features, so understanding all of them beyond a basic level is probably beyond most of us. However, within your role you will be expected to have a detailed understanding of some aspects of SQL Server and here, Replication is amongst those required skills.

Like many people, I have created Replication tasks in the past by using the Wizards, so I can at least have a basic understanding of the subject. However, the Wizard does everything for you, makes assumptions and on a simple test system it does work quite well.

On an industrial scale though, and processed via SQL Scripts it is another story. Some of the error messages it can give are less than useful and let’s face it – Replication Monitor isn’t the most helpful tool.

The following series of articles on Replication is to act as an aide-memoire for myself, which is the real reason I have this blog. Like most people I find that if I don’t use an aspect of SQL Server for an extended period of time I forget the finer details. Having got to grips with a portion of Replication I need to document it. If it helps anybody else then that’s a bonus.

Assumptions

First – the interesting bits. No DDL changes are replicated (generally), because in some cases we don’t require replication of all of the columns, and the subscriber table may have additional columns (its own partition scheme – so another partition id – and a datetime showing when the row was written to the subscriber).

No snapshots, because of the size of some of the publisher’s tables. A lock is taken while a snapshot is generated, which isn’t desirable on a database that is required 24/7.

And of course – no Wizards. If you have to rebuild a server then it is so much easier to have the scripts to hand, rather than answer a plethora of questions from a Wizard and hope you gave the answers that the DBA who first set up Replication gave. Easier to see what is going to happen and also removes some of the more interesting defaults that the Wizards can specify.

I have three SQL Server 2012 instances on my Development machine and I’ll be using these as Publisher, Distributor and Subscriber.

The articles that follow will include the initial creation of the Distributor, creating a Pull Subscription and changing an Article. As my experience with sprouts (sorry, Replication) continues I’ll add more articles, because there’s no way this mess can stay fresh in my mind for long.

Partition Switching – Basic Example

17 Feb

Want to delete a couple of records quickly from a large table? Well, DELETE is more than capable (indexing issues aside). Want to delete a couple of million rows from a large table quickly? Now it can get a little more interesting.

Partitioning is a very useful tool for moving large amounts of data very quickly. However, it does require an understanding of several elements of this interesting aspect of SQL Server.

    What is Partitioning?

You can have a table within a database that for general use appears to be one large storage area for your information. However, it can be split across several filegroups, and these filegroups can be spread across several folders/drives. So your data can in reality be spread across a wide range of drives for improved access at the hardware level, whilst appearing to the user as one table, without the user having to make any allowance for its partitioned structure.

    How do you Partition?

To create a partitioned table there are three things that need to be present:

1. A Partition Function, which describes how the data will be split across partitions. It shows the values that will be used for dictating how the data is spread across those partitions. At this point it doesn’t specify the column itself, just the values.

2. A Partition Scheme, to show which filegroups are used for the partitions. If you design a table to have two boundary values (as described in the Partition Function) then the Partition Scheme will have a filegroup for each of these, plus one extra for those values that fall outside the values specified in the Function.

3. Allocating the table to a Partition Scheme. When the table is created it needs to be allocated at this time to a Partition Scheme. It is at this point that you specify which column of the table is used by the Partition Scheme, to map values according to the Partition Function.

    Example

Quite often examples of partitioning are given using date ranges, because that is a popular way of dividing the data. This time I’m going to use something that I found easier to visualise and demonstrate – animals.

For no particular reason I’m going to create a table that stores details of cats and dogs. One partition will contain Cats and the other Dogs. Using these as the boundaries of the partitions will show what happens when other animals are introduced, depending upon how they are listed alphabetically.

First of all, create a database. To properly utilise partitioning I should create the database across several filegroups but that isn’t essential for this demonstration. In my environment the following code will create a database in one filegroup – ‘PRIMARY’.

CREATE DATABASE PartitionTest; 
GO 

Now we create a Partition Function, which requires a name, the definition of the column it will ultimately be mapped against (or to look at it the other way – the mapped column should have the same definition as the Partition Function), the description of how the partition boundaries are set and the values that these boundaries will have.

CREATE PARTITION FUNCTION [pf_PetTypes](varchar(40)) AS RANGE LEFT FOR VALUES (N'Cat', N'Dog') 
GO 

This function is called ‘pf_PetTypes’. It has two boundaries, ‘Cat’ and ‘Dog’. The ‘RANGE LEFT’ means that anything to the left of ‘Cat’ will go into the first partition. So alphabetically anything with a value of ‘Cat’ or less will go into this partition.
Anything to the left of ‘Dog’ (and including ‘Dog’) alphabetically right down to (but not including) ‘Cat’ will go into the second partition.
Anything beyond ‘Dog’ will go into the third partition that will be specified in the Partition Scheme.

‘RANGE RIGHT’ is another option for the Partition Function and this would of course alter the behaviour of the partitions.
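
For comparison, here is a RANGE RIGHT version of the same function (with a hypothetical name). With RANGE RIGHT each boundary value belongs to the partition on its right, so ‘Cat’ itself would land in the second partition rather than the first:

-- RANGE RIGHT comparison; boundary values go into the partition on their right 
CREATE PARTITION FUNCTION [pf_PetTypes_Right](varchar(40)) AS RANGE RIGHT FOR VALUES (N'Cat', N'Dog') 
GO 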

Having created the Partition Function we now associate it with a Partition Scheme, which will map the ranges specified in the Function to filegroups.

CREATE PARTITION SCHEME [ps_PetTypes] AS PARTITION [pf_PetTypes] TO ([PRIMARY], [PRIMARY], [PRIMARY]) 
GO 

This Scheme shows three partitions, which in this example all point to the same filegroup. For partition switching this is not a problem. To spread your partitions across multiple drives or folders you’d have to specify different filegroups, which would have started with the database creation.
This Scheme maps the first range within the Function to the first filegroup ‘PRIMARY’, the second range to the second filegroup ‘PRIMARY’, and any values beyond the ranges to the third filegroup ‘PRIMARY’.

Now the table can be created and mapped to the Partition Scheme.

CREATE TABLE PetDetails( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed			VARCHAR(20)	NOT NULL DEFAULT 'Unknown', 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
CONSTRAINT [PK_Petdetails] PRIMARY KEY CLUSTERED 
( 
	PetDetailID ASC, 
	Pettype		ASC  
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType)  
) ON ps_PetTypes(PetType); 
 
GO 

In this example I’ve specified the Partition Scheme against the Clustered Index and the table itself (‘ON ps_PetTypes(PetType)’). This isn’t actually necessary – one or the other is sufficient, it’s just a habit I’ve developed.
It is here that you specify which column is used to decide which partition is required. The column is the parameter used for the ps_PetTypes Partition Scheme. In this case the column is PetType.

A Clustered Index has been created against this table and where this has been done the partition column has to be part of the Key, otherwise the error ‘Partition columns for a unique index must be a subset of the index key’ is thrown and the table creation fails.

For partition switching we need somewhere to actually switch the data to. For this purpose you need another table that is identical in structure to the main table. If there are defaults and suchlike within the main table then these aren’t required in the additional table but columns, definitions and Clustered Index must match.

CREATE TABLE PetDetails_PartitionSwitch( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed			VARCHAR(20) NOT NULL, 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
CONSTRAINT [PK_Petdetails_PartitionSwitch] PRIMARY KEY CLUSTERED 
( 
	PetDetailID ASC, 
	Pettype		ASC 
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType)  
) ON ps_PetTypes(PetType); 
 
GO 

In this case I’ve created another table with the suffix ‘_PartitionSwitch’, to signify the purpose of this table.
Now that we’ve created the tables, the partition details can be found:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

PS_01

Both tables have three partitions, with no rows of data.
So we’ll add some data and look again:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Dog' , 'Alsatian' , 'Pookie' ,  '' , '' ), 
		( 'Dog' , 'Boxer', 'Rover','','Not the brightest bulb in the shop'), 
		( 'Cat', 'Persian', 'Snowball', '',''); 
GO 


SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

PS_02

PetDetails has one row in Partition 1 and two rows in Partition 2, because we added one ‘Cat’ row and two ‘Dog’ rows.
This table was designed with Cats and Dogs in mind, so what happens when other animals are used?

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Bird' , 'Canary' , '' ,  '' , '' ); 
GO 
 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

PS_03

Because ‘Bird’ is left alphabetically of ‘Cat’ it also goes into Partition 1.

Add another one:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Frog' , 'Pig Frog' , '' ,  '' , '' ); 
GO 

--Frog goes into partition 3, because it is beyond the range of 'Dog' 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

PS_04

The value ‘Frog’ is beyond the boundary set by ‘Dog’ alphabetically, so it goes into the extra partition that is there to cater for out of range values.
Remove these additional values, so we can get back to ‘clean’ data:

--Get rid of the distracting data 
DELETE dbo.PetDetails 
WHERE PetType NOT IN ('cat','dog'); 
GO 

It is possible to use the partition function to ascertain which partition a value would be stored in, without actually writing any data:

--What partition will 'cat' be in 
SELECT $PARTITION.pf_PetTypes('cat'); 
GO 

--What partition would 'elephant' be placed into? 
SELECT $PARTITION.pf_PetTypes('elephant'); 
GO 

The result is the partition number the data will be written to.
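
The same function can be combined with a GROUP BY to count the rows currently sitting in each partition – a handy sanity check alongside sys.partitions:

-- Count rows per partition straight from the table 
SELECT $PARTITION.pf_PetTypes(PetType) AS partition_number, 
       COUNT(*) AS row_count 
FROM dbo.PetDetails 
GROUP BY $PARTITION.pf_PetTypes(PetType); 
GO 
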
Check the partitions before we switch:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

PS_05

I know that the ‘Cat’ entry is in Partition 1. To move this data by switching the partition the command is:

ALTER TABLE dbo.PetDetails 
SWITCH PARTITION 1 TO dbo.PetDetails_PartitionSwitch PARTITION 1; 
GO 

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

PS_06

Partition 1 of PetDetails now has no rows (check the partition_number column as the order may have changed) and Partition 1 of PetDetails_PartitionSwitch has one row.

SELECT * 
FROM dbo.PetDetails; 
GO 

PS_07

There is no trace of the ‘Cat’ entry in this table; it is now in PetDetails_PartitionSwitch:

SELECT * 
FROM dbo.PetDetails_PartitionSwitch; 
GO 

PS_08

Now we can remove the data quickly by using TRUNCATE TABLE against PetDetails_PartitionSwitch:
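
-- Remove the switched-out rows; TRUNCATE deallocates pages rather than logging each row 
TRUNCATE TABLE dbo.PetDetails_PartitionSwitch; 
GO 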

Both the SWITCH PARTITION command and TRUNCATE TABLE are extremely fast, because they are both metadata-only DDL commands. Therefore switching and truncating one row or a million rows will take pretty much the same amount of time; certainly not enough of a difference for you to worry about.
For the finer detail on Partition Switching – http://technet.microsoft.com/en-us/library/ms191160(v=sql.105).aspx
