Replication – Scripting the Pull Subscription – Basic Example

6 Oct

Having prepared the Distributor in the previous article it is now possible to create a Subscription.

First of all create a test database and table on the Publisher (Publisher-A):

CREATE DATABASE PublisherDB; 
GO 

USE PublisherDB; 
GO 

CREATE SCHEMA ReplTest; 
GO 

CREATE TABLE ReplTest.PublicationTable( 
	PK				BIGINT IDENTITY, 
	EmailAddress	VARCHAR(50), 
	DOB				DATE, 
	DateCreated		DATETIME DEFAULT GETDATE(), 
 CONSTRAINT PK_PublicationTable PRIMARY KEY CLUSTERED 
( 
	PK ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)  
)  

GO 
  

A database is also required on the Subscriber (Subscriber-C):

CREATE DATABASE SubscriberDB; 
GO 

USE SubscriberDB; 
GO 

CREATE SCHEMA Repl; 
GO 

CREATE TABLE Repl.SubscriptionTable( 
	PK				BIGINT, 
	EmailAddress	VARCHAR(50), 
	DOB				DATE, 
	DateCreated		DATETIME, 
	ReplDatecreated	DATETIME DEFAULT GETDATE(), 
 CONSTRAINT PK_SubscriptionTable PRIMARY KEY CLUSTERED 
( 
	PK ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
) 

GO 

Now back to the Publisher. This server needs to be configured as a Publisher. The steps used to create the Distribution server have already associated that server with the Distributor role, but this server now needs to be configured as a Publisher and associated with the Distributor (Server B needs to know about Server A, and then Server A needs to know about Server B as a separate step):

use master  
exec sp_adddistributor @distributor = N'9999-PC\Distributor-B', @password = N'Spr0uts' 
GO 

The response should be ‘You have updated the Publisher property ‘active’ successfully.’

Having configured the server to allow Publications, the database itself needs to be configured for publishing:

exec sp_replicationdboption @dbname = N'PublisherDB', @optname = N'publish', @value = N'true' 
GO 

And the Log Reader and Queue Reader Agents need to be created:

exec PublisherDB.sys.sp_addlogreader_agent @job_login = null, @job_password = null, @publisher_security_mode = 1 
GO 
exec PublisherDB.sys.sp_addqreader_agent @job_login = null, @job_password = null, @frompublisher = 1 
GO 

The response from this will notify you of two Jobs, although the exact names may vary:
“Job ‘9999-PC\Publisher-A-PublisherDB-1’ started successfully.
Job ‘[9999-PC\Publisher-A].5’ started successfully.”

Now create the Publication:

use PublisherDB; 
exec sp_addpublication @publication = N'Test_Publication', @description = N'Transactional publication of database ''PublisherDB''.', @sync_method = N'native', @retention = 0, 
@allow_push = N'false', @allow_pull = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false', 
@snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous', 
@allow_subscription_copy = N'false', @add_to_active_directory = N'false', @repl_freq = N'continuous', @status = N'active', 
@independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'false', @autogen_sync_procs = N'false', 
@allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1, @allow_initialize_from_backup = N'false', 
@enabled_for_p2p = N'false', @enabled_for_het_sub = N'false' 
GO 

If you’ve used the Wizard to generate the scripts, the next command adds the publication snapshot. Where I work snapshots aren’t used, because the databases are rather large and the lock such a command takes would make database access somewhat problematic for the hundreds of users that connect every minute.
So we’ll skip the ‘sp_addpublication_snapshot’ command and move on to the next step – granting access to specified accounts.
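For reference, the wizard-generated snapshot command looks something like this (the values shown are the wizard’s usual defaults – treat it as a sketch, as yours may differ):

exec sp_addpublication_snapshot @publication = N'Test_Publication', 
	@frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0, 
	@frequency_recurrence_factor = 0, @frequency_subday = 8, @frequency_subday_interval = 1, 
	@active_start_time_of_day = 0, @active_end_time_of_day = 235959, 
	@active_start_date = 0, @active_end_date = 0, 
	@job_login = null, @job_password = null, @publisher_security_mode = 1 
GO 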

We now need to create the various accounts and permissions that this Replication will require:

First, the Distributor needs access to this Publisher:

CREATE LOGIN [NT SERVICE\SQLAgent$DISTRIBUTOR-B] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english] 
GO 

And to the DB:

USE PublisherDB; 
GO 

CREATE USER [NT SERVICE\SQLAgent$DISTRIBUTOR-B] FOR LOGIN [NT SERVICE\SQLAgent$DISTRIBUTOR-B] 
GO 

ALTER ROLE [db_owner] ADD MEMBER [NT SERVICE\SQLAgent$DISTRIBUTOR-B] 
GO

Various accounts involved with the Replication process also require access:

exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'sa' 
GO 
exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'NT AUTHORITY\SYSTEM' 
GO 
exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'NT Service\SQLAgent$PUBLISHER-A' 
GO 
exec sp_grant_publication_access @publication = N'Test_Publication', @login = N'NT Service\SQLAgent$DISTRIBUTOR-B' 
GO 
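You can confirm the grants with sp_help_publication_access, which lists the logins that have access to the publication:

exec sp_help_publication_access @publication = N'Test_Publication' 
GO 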

Now we tell the Replication process which table we want to Replicate. Within Replication the tables are referred to as ‘Articles’, borrowing the term from the age-old process of newspaper publishing. For this Article we’re only interested in copying new rows across. For now we don’t care about updating or deleting data, so there’ll only be one Stored Procedure defined – against the parameter ‘@ins_cmd’. You specify the name, and the process of creating the Article takes care of generating the code:

exec sp_addarticle @publication = N'Test_Publication', @article = N'Test_Publication_Article', @source_owner = N'ReplTest', @source_object = N'PublicationTable', @type = N'logbased', 
@description = N'', @creation_script = N'', @pre_creation_cmd = N'none', @schema_option = 0x0000000000030002, @identityrangemanagementoption = N'none', 
@destination_table = N'SubscriptionTable', @destination_owner = N'Repl', @status = 24, @vertical_partition = N'false', 
@ins_cmd = N'CALL [dbo].[sp_MSins_Repl_SubscriptionTable]', @del_cmd = N'NONE', @upd_cmd = N'NONE' 
GO 

At this point you can right-click on ‘Replication/Local Publications’ and see the Properties, confirming the table is set up for Replication:
[Screenshot: Replication_Properties01 – publication properties]
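If you prefer to check from a script rather than the GUI, sp_helparticle returns the same details:

exec sp_helparticle @publication = N'Test_Publication', @article = N'Test_Publication_Article' 
GO 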

Going back to the Distributor: it needs the login details for the Subscriber, because this is a Pull Subscription and it is the responsibility of the Subscriber to transfer the data from the Distributor:

USE [master] 
GO 
CREATE LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] FROM WINDOWS WITH DEFAULT_DATABASE=[master] 
GO 
ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C] 
GO 
USE [distribution] 
GO 
CREATE USER [NT SERVICE\SQLAgent$SUBSCRIBER-C] FOR LOGIN [NT SERVICE\SQLAgent$SUBSCRIBER-C] 
GO 
ALTER ROLE [db_owner] ADD MEMBER [NT SERVICE\SQLAgent$SUBSCRIBER-C] 
GO 

On the Subscriber the Pull Subscription now needs to be created:

use SubscriberDB 
GO 
exec sp_addpullsubscription @publisher = N'9999-PC\Publisher-A', @publication = N'Test_Publication', @publisher_db = N'PublisherDB', @independent_agent = N'True', 
@subscription_type = N'pull', @description = N'', @update_mode = N'read only', @immediate_sync = 0 

As well as the Agent that will be responsible for it:

exec sp_addpullsubscription_agent @publisher = N'9999-PC\Publisher-A', @publisher_db = N'PublisherDB', @publication = N'Test_Publication', 
@distributor = N'9999-PC\Distributor-B', @distributor_security_mode = 1, @distributor_login = N'distributor_admin', @distributor_password = N'Spr0uts', 
@enabled_for_syncmgr = N'False', @frequency_type = 4, @frequency_interval = 1, @frequency_relative_interval = 0, 
@frequency_recurrence_factor = 0, @frequency_subday = 4, @frequency_subday_interval = 1, @active_start_time_of_day = 30, 
@active_end_time_of_day = 235959, @active_start_date = 0, @active_end_date = 0, @alt_snapshot_folder = N'', 
@working_directory = N'', @use_ftp = N'False', @job_login = null, @job_password = null, @publication_type = 0 
GO 

Now the custom Stored Procedure needs to be created within the Subscriber, to process the inserts. To do this, the Stored Procedure is generated from the Publisher with the command:

Use PublisherDB; 
GO 

EXEC sp_scriptpublicationcustomprocs 'Test_Publication' 

The response from this:
[Screenshot: Script_Stored_Procs – output of sp_scriptpublicationcustomprocs]
can then be copied and pasted into the Subscriber and executed against the Subscriber DB:
[Screenshot: Script_Stored_Procs_Output – the generated procedure executed against the Subscriber DB]
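For a simple insert-only Article like this one, the generated procedure is broadly of the following shape. This is a hand-written sketch to show the idea, not the exact output – always use what sp_scriptpublicationcustomprocs actually produces:

-- Sketch only: the real procedure comes from sp_scriptpublicationcustomprocs 
CREATE PROCEDURE [dbo].[sp_MSins_Repl_SubscriptionTable] 
	@c1 BIGINT,			-- PK 
	@c2 VARCHAR(50),	-- EmailAddress 
	@c3 DATE,			-- DOB 
	@c4 DATETIME		-- DateCreated 
AS 
BEGIN 
	-- ReplDatecreated is not published, so its DEFAULT fills in on the Subscriber 
	INSERT INTO [Repl].[SubscriptionTable] ([PK], [EmailAddress], [DOB], [DateCreated]) 
	VALUES (@c1, @c2, @c3, @c4) 
END 
GO 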

Finally, from PublisherDB we need to activate the Subscription:

exec sp_addsubscription @publication = N'Test_Publication', @subscriber = N'9999-PC\Subscriber-C', @destination_db = N'SubscriberDB', @subscription_type = N'Pull', @sync_type = N'None', 
@article = N'all', @update_mode = N'read only', @subscriber_type = 0 
GO 

A word of warning regarding the parameter ‘@sync_type’. Where I work, the use of ‘none’ is commonplace (because we don’t replicate schema changes automatically – we control that ourselves), but if you look at the documentation ‘none’ is deprecated, so a better alternative is ‘replication support only’. However, when you generate the scripts from an existing Publication using the Wizard, you may find that the parameter has ‘automatic’ set against it – regardless of the value you initially created.
Replication has lots of lovely surprises like that.
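You can check which values actually ended up on the Subscription by running sp_helpsubscription from the publication database:

exec sp_helpsubscription @publication = N'Test_Publication' 
GO 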

At this point, from SSMS on the Subscriber you should be able to expand ‘Replication/Local Subscriptions’ and see the name of the Publication and the Subscriber details:
[Screenshot: Replication_Details – the publication and subscription shown in SSMS]
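As a quick sanity check (allowing the Agents a moment to move the data), a row inserted on the Publisher should shortly appear on the Subscriber:

-- On the Publisher 
USE PublisherDB; 
INSERT INTO ReplTest.PublicationTable (EmailAddress, DOB) 
VALUES ('test@example.com', '1980-01-01'); 
GO 

-- On the Subscriber, once the Distribution Agent has run 
USE SubscriberDB; 
SELECT * FROM Repl.SubscriptionTable; 
GO 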

A basic Pull Subscription has now been set up. In following articles I’ll cover how to test and monitor this, with basic steps on what to look for when it goes wrong.

Replication – Scripting the Distributor – Basic Example

2 Oct

By far the easiest way to create the Distributor is via the Wizard and examples abound of that – one such example is at SQL Server Central http://www.sqlservercentral.com/articles/Stairway+Series/72282/ .

So why use scripts?

Mainly for Disaster Recovery and consistency. Should you have to rebuild the server then running the scripts is easier and more reliable than re-running the Wizard – there’s less chance somebody can enter different values and therefore cause issues with the Replication.

So, let’s have a look at the scripts behind this wizard.

Firstly – I have three servers, installed as separate instances on the same machine:
Publisher-A, as the Publisher
Distributor-B as the Distributor and
Subscriber-C as the Subscriber.
Currently none of these servers are configured for Replication. All instances were installed to cater for Replication but that is all.

Firstly, make sure the Agent Services are running – it makes life easier and you will need them to actually perform the replication tasks. SQL Server Configuration Manager can be used for this.

There are four scripts in the standard Distributor creation, gleaned from scripting the Wizard. All of these scripts are executed on ‘Distributor-B’ and of course I’ve replaced the server name – I may well be working in a secure environment but there’s no need to tempt anybody.

1. Designate the Distributor server:

use master
exec sp_adddistributor 
     @distributor = N'9999-PC\Distributor-B', 
     @password = N'Spr0uts' 
GO 

Having executed this code you will see an additional entry in sys.sysservers, with an ‘srvname’ of ‘repl_distributor’ and ‘dist’ set to 1. An associated entry under ‘Linked Servers’ within SSMS will also be visible.
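A quick way to check:

SELECT srvname, dist 
FROM sys.sysservers 
WHERE srvname = N'repl_distributor'; 
GO 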

Next, create the distribution database:

exec sp_adddistributiondb @database = N'distribution', 
			@data_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL11.Distributor-B\MSSQL\DATA', 
			@log_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL11.Distributor-B\MSSQL\DATA', 
			@log_file_size = 2, 
			@min_distretention = 0, 
			@max_distretention = 72, 
			@history_retention = 48, 
			@security_mode = 0, 
			@login = N'sa',--login used when connecting to the distributor to create the distribution database 
			@password='Spr0uts' 
GO 

I’ve left the default name for the DB (‘distribution’) and in this case have left the other defaults too (retention periods and login).
As this code executes it displays various messages relating to the creation of the DB, as well as the addition of ‘guest’ and ‘replmonitor’. Once completed the database is visible under ‘System Databases’ as a distribution database:

[Screenshot: Sys_Databases – the distribution database under System Databases]
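The same can be confirmed with a query – sys.databases flags it as a distribution database:

SELECT name, is_distributor 
FROM sys.databases 
WHERE name = N'distribution'; 
GO 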

The next command runs against this new distribution database and is used to record the location of the snapshot folder. Now, as I don’t intend to use snapshots, I’m only including this because all generated scripts appear to include it and I can’t find a decent explanation for it. When I have the time I’m going to try excluding this command and see what fun it creates. For now though, I’ll just play safe:

use [distribution] 
if (not exists (select * from sysobjects where name = 'UIProperties' and type = 'U ')) 
	create table UIProperties(id int) 
if (exists (select * from ::fn_listextendedproperty('SnapshotFolder', 'user', 'dbo', 'table', 'UIProperties', null, null))) 
	EXEC sp_updateextendedproperty N'SnapshotFolder', N'\\9999-PC\C$\Replication\Snapshot', 'user', dbo, 'table', 'UIProperties' 
else 
	EXEC sp_addextendedproperty N'SnapshotFolder', N'\\9999-PC\C$\Replication\Snapshot', 'user', dbo, 'table', 'UIProperties' 
GO 

Notice that the location needs a fully-qualified name.

The final command registers a Publisher that will use this Distributor:

exec sp_adddistpublisher @publisher = N'9999-PC\Publisher-A', 
		@distribution_db = N'distribution', 
		@security_mode = 1, 
		@working_directory = N'\\9999-PC\C$\Replication\Snapshot', 
		@trusted = N'false', 
		@thirdparty_flag = 0, 
		@publisher_type = N'MSSQLSERVER'
GO

This will create an entry under ‘Linked Servers’ for the Publisher. The only response from this code is the usual ‘Command completed successfully’ message. Trying it a second time will tell you that the server is already listed as a publisher.
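sp_helpdistpublisher lists the Publishers registered against this Distributor, which makes for a handy check:

exec sp_helpdistpublisher 
GO 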

Note: If you’ve used the ‘Configure Distribution Wizard’ to create the scripts, you’ll notice that the Distributor is also created as a Publisher. This step isn’t necessary for the configuration I’m using here.

Now we have a Distributor created and can use that to build upon in following articles.

SSIS Error Code “0x80070057” in Slowly Changing Dimension

28 Aug

SSIS can produce an inordinate number of error messages and numbers. Some of these can even be helpful and informative.
Generally accompanied by an Error Code, one of the most popular appears to be “0x80070057”, which can have a number of causes.

Recently I came across this particularly helpful code when creating a Slowly Changing Dimension:

Error: 0xC0202009 at ‘Task Name’, Slowly Changing Dimension [14907]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80070057.
An OLE DB record is available. Source: “Microsoft SQL Server Native Client 10.0” Hresult: 0x80070057 Description: “The parameter is incorrect.”.
Error: 0xC0047022 at ‘Task Name’, SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Slowly Changing Dimension” (14907) failed with error code 0xC0202009 while processing input “Slowly Changing Dimension Input” (14918). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
Error: 0xC02020C4 at ‘Task Name’, ‘View Name’ [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at , SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “‘View Name’” (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The behaviour was thus: run it the first time and the process extracted data from a View and inserted all of the rows as new rows into the output. Run it a second time, when it would check for changes (because the output table now had data), and it would instantly fail with the above error message.

The first thing I noticed was “The parameter is incorrect”, which is strange because the Wizard created everything based on the details I fed into it – so it shouldn’t be anything I’ve done.
So, as I had created 5 other Slowly Changing Dimensions for other Views and Tables I decided to recreate this one from the start with a very ‘narrow’ table. The table concerned with this particular process was rather wide, certainly when compared to the others that had worked successfully.
A couple of fairly tedious hours later I had a failure when I added the last few columns back in (as ever, it’s always something at the end). These columns were varchar(8000) and there were six of them.
This is where the line in the error message “The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020” started to look more interesting.
Checking these columns showed that the 8000 size was a tad generous, based upon the data likely to be stored within them. Reducing these to 300 each removed the error and everything worked as expected.
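A quick way to sanity-check how generous a declared width really is, is to measure the longest value actually stored (the table and column names below are placeholders – substitute your own):

-- Placeholder names, for illustration only 
SELECT MAX(LEN(YourWideColumn)) AS MaxActualLength 
FROM dbo.YourWideTable; 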

I have no idea what size buffer SSIS can create for this processing but the message implied that it wasn’t large enough for this particular table. Reducing the width of the table corrected this.
It may not be the solution for every instance of that message but in this case it worked and can be added to the list of possible solutions to one less-than-helpful SSIS Error Code.

Replication – the sprouts of my DBA world.

14 Aug

Actually, that isn’t entirely fair. I don’t have to eat sprouts, which is just as well. However, I do have to deal with Replication – certainly in my current job. Whereas my refusal to eat the Little Green Balls of Death won’t result in a divorce (because my wife, as wonderful as she is, does for some strange reason really like sprouts) I can’t imagine a job that specified it uses Replication will suffer my presence for long if I flatly refuse to touch it.

SQL Server has a broad range of tools and features, so understanding all of them beyond a basic level is probably beyond most of us. However, within your role you will be expected to have a detailed understanding of some aspects of SQL Server and here, Replication is amongst those required skills.

Like many people, I have created Replication tasks in the past by using the Wizards, so I can at least have a basic understanding of the subject. However, the Wizard does everything for you, makes assumptions and on a simple test system it does work quite well.

On an industrial scale though, and processed via SQL Scripts it is another story. Some of the error messages it can give are less than useful and let’s face it – Replication Monitor isn’t the most helpful tool.

The following series of articles on Replication are to act as an aide-memoire for myself, which is the real reason I have this blog. Like most people I find that if I don’t use an aspect of SQL Server for an extended period of time I forget the finer details. Having got to grips with a portion of Replication I need to document it. If it helps anybody else then that’s a bonus.

Assumptions

First – the interesting bits. No DDL changes are replicated (generally), because in some cases we don’t require replication of all of the columns, and the subscriber table may have additional columns (its own partition scheme – so another partition ID – and a datetime showing when the row was written to the subscriber).

No snapshots, because of the size of some of the publisher’s tables. A lock is taken while a snapshot is generated, which isn’t desirable on a database that is required 24/7.

And of course – no Wizards. If you have to rebuild a server it is so much easier to have the scripts to hand, rather than answer a plethora of questions from a Wizard and hope you give the same answers that the DBA who first set up Replication gave. It is easier to see what is going to happen, and it also removes some of the more interesting defaults that the Wizards can specify.

I have three SQL Server 2012 instances on my Development machine and I’ll be using these as Publisher, Distributor and Subscriber.

The articles that follow will include the initial creation of the Distributor, creating a Pull Subscription and changing an Article. As my experience with sprouts – sorry, Replication – continues I’ll add more articles, because there’s no way this mess can stay fresh in my mind for long.

Partition Switching – Basic Example

17 Feb

Want to delete a couple of records quickly from a large table? DELETE is more than capable (indexing issues aside). Want to delete a couple of million rows from a large table quickly? Now it can get a little more interesting.

Partitioning is a very useful tool for moving large amounts of data very quickly. However, it does require an understanding of several elements of this interesting aspect of SQL Server.

    What is Partitioning?

You can have a table within a database that, for general use, appears to be one large storage area for your information. However, it can be stored split into several filegroups, and these filegroups can be spread across several folders/drives. So your data can in reality be spread across a wide range of drives for improved access at the hardware level, whilst appearing to the user as one table that needs no special consideration for its partitioned structure.

    How do you Partition?

To create a partitioned table there are three things that need to be present:

1. A Partition Function, which describes how the data will be split across partitions. It shows the values that will be used for dictating how the data is spread across those partitions. At this point it doesn’t specify the column itself, just the values.

2. A Partition Scheme, to show which filegroups are used for the partitions. If you design a table to have two partitioned values (as described in the Partition Function) then the Partition Scheme will have a filegroup for each of these plus one extra for those values that won’t fit within the values specified in the Function.

3. Allocating the table to a Partition Scheme. When the table is created it needs to be allocated at this time to a Partition Scheme. It is at this point that you specify which column of the table is used by the Partition Scheme, to map values according to the Partition Function.

    Example

Quite often examples of partitioning are given using date ranges, because that is a popular way of dividing the data. This time I’m going to use something that I found easier to visualise and demonstrate – animals.

For no particular reason I’m going to create a table that stores details of cats and dogs. One partition will contain Cats and the other Dogs. Using these as the boundaries of the partitions will show what happens when other animals are introduced, depending upon how they are listed alphabetically.

First of all, create a database. To properly utilise partitioning I should create the database across several filegroups but that isn’t essential for this demonstration. In my environment the following code will create a database in one filegroup – ‘PRIMARY’.

CREATE DATABASE PartitionTest; 
GO 

Now we create a Partition Function, which requires a name, the definition of the column it will ultimately be mapped against (or to look at it the other way – the mapped column should have the same definition as the Partition Function), the description of how the partition boundaries are set and the values that these boundaries will have.

CREATE PARTITION FUNCTION [pf_PetTypes](varchar(40)) AS RANGE LEFT FOR VALUES (N'Cat', N'Dog') 
GO 

This function is called ‘pf_PetTypes’. It has two boundaries, ‘Cat’ and ‘Dog’. The ‘RANGE LEFT’ means that anything to the left of ‘Cat’ will go into the first partition. So alphabetically anything with a value of ‘Cat’ or less will go into this partition.
Anything to the left of ‘Dog’ (and including ‘Dog’) alphabetically right down to (but not including) ‘Cat’ will go into the second partition.
Anything beyond ‘Dog’ will go into the third partition that will be specified in the Partition Scheme.

‘RANGE RIGHT’ is another option for the Partition Function and this would of course alter the behaviour of the partitions.
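For comparison, a RANGE RIGHT version of the same function (hypothetical – it isn’t used in this example) would put each boundary value into the partition to its right:

--Hypothetical alternative, not used below. 
--Partition 1: values before 'Cat'; Partition 2: 'Cat' up to (but not including) 'Dog'; 
--Partition 3: 'Dog' onwards. 
CREATE PARTITION FUNCTION [pf_PetTypesRight](varchar(40)) AS RANGE RIGHT FOR VALUES (N'Cat', N'Dog') 
GO 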

Having created the Partition Function we now associate it with a Partition Scheme, which will map the ranges specified in the Function to filegroups.

CREATE PARTITION SCHEME [ps_PetTypes] AS PARTITION [pf_PetTypes] TO ([PRIMARY], [PRIMARY], [PRIMARY]) 
GO 

This Scheme shows three partitions, which in this example all point to the same filegroup. For partition switching this is not a problem. To spread your partitions across multiple drives or folders you’d have to have different filegroups specified, which would have started with the database creation.
This Scheme maps the values of the first value within the Function to the first filegroup ‘PRIMARY’, the second value to the second filegroup ‘PRIMARY’ and any values that are beyond the range will go to the third filegroup ‘PRIMARY’.

Now the table can be created and mapped to the Partition Scheme.

CREATE TABLE PetDetails( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed			VARCHAR(20)	NOT NULL DEFAULT 'Unknown', 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
CONSTRAINT [PK_Petdetails] PRIMARY KEY CLUSTERED 
( 
	PetDetailID ASC, 
	Pettype		ASC  
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType)  
) ON ps_PetTypes(PetType); 
 
GO 

In this example I’ve specified the Partition Scheme against both the Clustered Index and the table itself (‘ON ps_PetTypes(PetType)’). This isn’t actually necessary – one or the other is sufficient; it’s just a habit I’ve developed.
It is here that you specify which column is used to decide which partition is required. The column is the parameter used for the ps_PetTypes Partition Scheme. In this case the column is PetType.

A Clustered Index has been created against this table, and where this is done the partition column has to be part of the key, otherwise the error ‘Partition columns for a unique index must be a subset of the index key’ is thrown and the table creation fails.

For partition switching we need somewhere to actually switch the data to. For this purpose you need another table that is identical in structure to the main table. If there are defaults and suchlike within the main table then these aren’t required in the additional table, but the columns, definitions and Clustered Index must match.

CREATE TABLE PetDetails_PartitionSwitch( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed			VARCHAR(20) NOT NULL, 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
CONSTRAINT [PK_Petdetails_PartitionSwitch] PRIMARY KEY CLUSTERED 
( 
	PetDetailID ASC, 
	Pettype		ASC 
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType)  
) ON ps_PetTypes(PetType); 
 
GO 

In this case I’ve created another table with the suffix ‘_PartitionSwitch’, to signify the purpose of this table.
Now that we’ve created the tables, the partition details can be found:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot: PS_01 – sys.partitions output for both tables]

Both tables have three partitions, with no rows of data.
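Incidentally, if you’d rather see only the relevant columns, a tidier variant of the same check is:

SELECT OBJECT_NAME(object_id) AS TableName, partition_number, rows 
FROM sys.partitions 
WHERE object_id IN (OBJECT_ID('dbo.PetDetails'), OBJECT_ID('dbo.PetDetails_PartitionSwitch')) 
AND index_id IN (0, 1)	--heap or clustered index only 
ORDER BY TableName, partition_number; 
GO 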
So we’ll add some data and look again:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Dog' , 'Alsatian' , 'Pookie' ,  '' , '' ), 
		( 'Dog' , 'Boxer', 'Rover','','Not the brightest bulb in the shop'), 
		( 'Cat', 'Persian', 'Snowball', '',''); 
GO 


SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot: PS_02 – sys.partitions after the first inserts]

PetDetails has one row in Partition 1 and two rows in Partition 2, because we added one ‘Cat’ row and two ‘Dog’ rows.
This table was designed with Cats and Dogs in mind, so what happens when other animals are used?

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Bird' , 'Canary' , '' ,  '' , '' ); 
GO 
 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot: PS_03 – sys.partitions after adding ‘Bird’]

Because ‘Bird’ sorts alphabetically before ‘Cat’, it also goes into Partition 1.

Add another one:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Frog' , 'Pig Frog' , '' ,  '' , '' ); 
GO 

--Frog goes into partition 3, because it is beyond the range of 'Dog' 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot: PS_04 – sys.partitions after adding ‘Frog’]

The value ‘Frog’ is beyond the boundary set by ‘Dog’ alphabetically, so it goes into the extra partition that is there to cater for out of range values.
Remove these additional values, so we can get back to ‘clean’ data:

--Get rid of the distracting data 
DELETE dbo.PetDetails 
WHERE PetType NOT IN ('cat','dog'); 
GO 

It is possible to use the partition function to ascertain which partition a value would be stored in, without actually writing any data:

--What partition will 'cat' be in 
SELECT $PARTITION.pf_PetTypes('cat'); 
GO 

--What partition would 'elephant' be placed into? 
SELECT $PARTITION.pf_PetTypes('elephant'); 
GO 

The result is the partition number the data will be written to.
Check the partitions before we switch:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot: PS_05 – sys.partitions before the switch]

I know that the ‘Cat’ entry is in Partition 1. To move this data by switching the partition the command is:

ALTER TABLE dbo.PetDetails 
SWITCH PARTITION 1 TO dbo.PetDetails_PartitionSwitch PARTITION 1; 
GO 

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot: PS_06 – sys.partitions after the switch]

Partition 1 of PetDetails now has no rows (check the partition_number column as the order may have changed) and Partition 1 of PetDetails_PartitionSwitch has one row.

SELECT * 
FROM dbo.PetDetails; 
GO 

[Screenshot: PS_07 – PetDetails contents]

There is no trace of the ‘Cat’ entry in this table; it is now in PetDetails_PartitionSwitch:

SELECT * 
FROM dbo.PetDetails_PartitionSwitch; 
GO 

[Screenshot: PS_08 – PetDetails_PartitionSwitch contents]

Now we can remove the data quickly by using TRUNCATE TABLE against PetDetails_PartitionSwitch:
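TRUNCATE TABLE dbo.PetDetails_PartitionSwitch; 
GO 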

Both the SWITCH PARTITION command and the TRUNCATE TABLE are extremely fast, because they are both DDL commands working at the metadata level. Therefore switching and truncating one row or a million rows will take pretty much the same amount of time; certainly not enough difference for you to worry about.
For the finer detail on Partition Switching – http://technet.microsoft.com/en-us/library/ms191160(v=sql.105).aspx

XML and STUFF for a simple CSV column

27 Dec

I’m not a big fan of XML via T-SQL – far too long-winded a monster for my liking. However, there is no denying its usefulness, and on occasion it simply doesn’t matter whether you like an aspect of SQL Server – you may have to learn to use it anyway. If I could pick and choose which bits I get to play with, I suspect I wouldn’t be a DBA for long.

One nice little ‘trick’ with the XML commands that I’m grateful for is using them to produce a comma-separated list of data. It’s an interesting solution that is often shown to people who need such a thing, and it is far faster than other solutions I’ve seen.
As an example, I would like to produce a CSV list of the surnames from the following data:

[Screenshot: Data01 – the source data]

To obtain the following results:

[Screenshot: Data02 – the required CSV result]

First of all – the test data:

CREATE TABLE #PersonList( 
		PersonID	INT IDENTITY, 
		Forename	VARCHAR (50), 
		Surname		VARCHAR(50) 
		); 
GO 
 
INSERT INTO #PersonList ( Forename, Surname ) 
VALUES  ( 'John', 'Smith'), 
		( 'Tom', 'Brown'), 
		( 'Tom', 'Sawyer'), 
		( 'Harry', 'Flashman'), 
		( 'Anna', 'Howe'), 
		( 'Julian', 'Sorel'), 
		('Sherlock', 'Holmes'); 
GO 
 
SELECT * 
FROM #PersonList; 

The command to produce the csv data is:

SELECT STUFF((SELECT ', ' + Surname  
                            FROM #PersonList 
		          ORDER BY Surname 
                            FOR XML PATH('')), 1, 2, ''); 

It comprises two commands. The inner command is FOR XML in its simplest form, with a comma added to the selected data:

SELECT ', ' + Surname 
           FROM #PersonList 
           ORDER BY Surname 
           FOR XML PATH(''); 

This is a very simple example of the XML command. Normally the PATH would have something within it and would therefore produce very basic XML.
For example:

SELECT Surname  
         FROM  #PersonList 
         FOR XML PATH('test'); 

will produce XML output thus:

[Screenshot: XML01 – the XML output]

Not valid XML in this form (hence the red underline) but you get the idea.

Change the command to add a comma between each surname and remove the element name (the ‘test’) and you get a strange XML value like this:

[Screenshot: XML02 – the comma-separated XML value]

The next requirement is to remove the leading comma and convert this back to a character field. This is where STUFF comes in. In this case the first parameter for STUFF is the XML query.
It then looks at the first character position of the query result and replaces the two characters from that point with a zero-length string ('').
As STUFF returns character or binary data depending upon the data type it is processing, in this case it returns the result as a character field, with the leading comma removed.
Two problems solved with one command.

I’ve added the ORDER BY just to make the data look more organised, and of course the ‘inner’ XML query can be more involved than this one, but it should give an idea of how to use this approach for this particular purpose:

SELECT STUFF((SELECT ', ' + Surname 
                            FROM #PersonList 
		          ORDER BY Surname 
                            FOR XML PATH('')), 1, 2, ''); 
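One caveat worth knowing: because the inner query produces XML, characters such as ‘&’ and ‘<’ in the data will be entitised (appearing as ‘&amp;’ and ‘&lt;’). A common variant that avoids this is to produce typed XML and unwrap it with .value():

SELECT STUFF((SELECT ', ' + Surname 
              FROM #PersonList 
              ORDER BY Surname 
              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, ''); 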

UnPivot – Basic Example

23 Dec

In a previous posting I described a simple example of PIVOT – a much-used feature which I personally have problems recalling the syntax of (hence the article in the first place). Its lesser-known relative is UNPIVOT, which is designed to do the opposite – it transforms columns of data into rows.

In the following example I have a few details about properties – the number of Bedrooms, Garages, Gardens and Attic Spaces. I’ve used this as an example because the column names will make it more apparent how the query and the results are connected.

I want to transform the data from showing the property features as column names to actually have those column names listed as data, along with the values that these columns contain:

[Screenshot: Unpivot01 – the data before and after the UNPIVOT]

First of all – the test data:

CREATE TABLE #Properties 
    ( 
      PropertyID INT , 
      Attic_Spaces INT , 
      Bedrooms INT , 
      Garages INT , 
      Gardens INT 
    ) ; 
GO 

INSERT  INTO #Properties 
        ( PropertyID, Attic_Spaces, Bedrooms, Garages, Gardens ) 
VALUES  ( 101, 1, 2, 0, 1 ), 
        ( 102, 2, 1, 1, 2 ), 
        ( 103, 1, 2, 1, 2 ), 
        ( 104, 1, 3, 2, 2 ) ; 
GO 

SELECT  * 
FROM    #Properties ;   

The basic construct of an UnPivot is quite simple:

SELECT PropertyID, 
	   PropertyFeature, 
	   NumberOf 
	   FROM #Properties p 
	   UNPIVOT ( NumberOf FOR PropertyFeature IN ( Attic_Spaces, Bedrooms, Garages, Gardens ) ) AS unpvt ; 

This will select PropertyID from the #Properties table. Then it will select the columns ‘PropertyFeature’ and ‘NumberOf’ from the UNPIVOT part of the query. In this section I have asked it to create a column called ‘PropertyFeature’, which is populated from the names of the columns ‘Attic_Spaces’, ‘Bedrooms’, ‘Garages’ and ‘Gardens’. The value held within those ‘original’ columns (within the #Properties table) is stored in the column called ‘NumberOf’ within the UNPIVOT section.
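For reference, with the test data above the query returns one row per property/feature combination, along these lines:

-- PropertyID   PropertyFeature   NumberOf 
-- 101          Attic_Spaces      1 
-- 101          Bedrooms          2 
-- 101          Garages           0 
-- 101          Gardens           1 
-- 102          Attic_Spaces      2 
-- ...and so on for properties 102 to 104 

One thing to bear in mind: UNPIVOT omits rows where the source value is NULL, although that doesn’t arise with this test data.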

The alias for the #Properties table isn’t required – it’s just a habit of mine. However, the alias for the UNPIVOT (the ‘AS unpvt’) is mandatory, even though you see no reference to it.

This may not be the most accurate description of what it is really doing, but it is how my brain gets around the syntax and that’s good enough for me, for now.
