
Forwarding Pointers and Fragmentation on a Heap Table

3 Dec

A heap is a table that has no clustered index – it may have non-clustered indexes, but without a clustered index it remains a heap.
When a row in a heap is updated and the new data no longer fits in the row's original location, SQL Server moves the row elsewhere and leaves a pointer (called a forwarding record) at the original location to show where the row now resides.
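One way to see how many forwarding records a heap has accumulated is sys.dm_db_index_physical_stats in DETAILED mode. This is a sketch only – ‘dbo.MyHeapTable’ is a placeholder name, and index_id 0 identifies the heap itself:

```sql
-- Sketch: report forwarding records for a (hypothetical) heap table.
-- forwarded_record_count is only populated in DETAILED or SAMPLED mode.
SELECT  OBJECT_NAME(ps.object_id)       AS TableName,
        ps.forwarded_record_count,
        ps.avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                        -- current database
            OBJECT_ID(N'dbo.MyHeapTable'),  -- placeholder table name
            0,                              -- index_id 0 = the heap
            NULL,                           -- all partitions
            'DETAILED') AS ps;
GO
```

A large forwarded_record_count is a sign the heap could benefit from a rebuild.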


Replication – the sprouts of my DBA world.

14 Aug

Actually, that isn’t entirely fair. I don’t have to eat sprouts, which is just as well. However, I do have to deal with Replication – certainly in my current job. Whereas my refusal to eat the Little Green Balls of Death won’t result in a divorce (because my wife, as wonderful as she is, does for some strange reason really like sprouts) I can’t imagine a job that specified it uses Replication will suffer my presence for long if I flatly refuse to touch it.

SQL Server has a broad range of tools and features, so understanding all of them beyond a basic level is probably beyond most of us. However, within your role you will be expected to have a detailed understanding of some aspects of SQL Server and here, Replication is amongst those required skills.

Like many people, I have created Replication tasks in the past using the Wizards, so I have at least a basic understanding of the subject. The Wizard does everything for you and makes assumptions, and on a simple test system it works quite well.

On an industrial scale though, scripted via T-SQL, it is another story. Some of the error messages it can give are less than useful and let’s face it – Replication Monitor isn’t the most helpful tool.

The following series of articles on Replication is to act as an aide-memoire for myself, which is the real reason I have this blog. Like most people, I find that if I don’t use an aspect of SQL Server for an extended period of time I forget the finer details. Having got to grips with a portion of Replication I need to document it. If it helps anybody else then that’s a bonus.

Assumptions

First – the interesting bits. DDL changes are (generally) not replicated, because in some cases we don’t replicate all of the columns, and the subscriber table may have additional columns of its own (its own partition scheme – so another partition id – and a datetime recording when the row was written to the subscriber).

No snapshots, because of the size of some of the publisher’s tables. A lock is taken while a snapshot is generated, which isn’t desirable on a database that is required 24/7.

And of course – no Wizards. If you have to rebuild a server then it is so much easier to have the scripts to hand, rather than answer a plethora of questions from a Wizard and hope you gave the answers that the DBA who first set up Replication gave. Easier to see what is going to happen and also removes some of the more interesting defaults that the Wizards can specify.
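As a taste of the scripted approach, this is a sketch of creating a subscription without a snapshot – the publication, server and database names are placeholders, and the key piece is @sync_type, which tells SQL Server the subscriber already has the schema and data:

```sql
-- Sketch only: names are placeholders, run at the publisher.
-- @sync_type = N'replication support only' avoids snapshot generation,
-- on the assumption the subscriber tables have been prepared manually.
EXEC sp_addsubscription
    @publication       = N'MyPublication',
    @subscriber        = N'SubscriberServer',
    @destination_db    = N'SubscriberDB',
    @subscription_type = N'pull',
    @sync_type         = N'replication support only';
GO
```

The articles in the series will cover the full set of scripts properly.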

I have three SQL Server 2012 instances on my Development machine and I’ll be using these as Publisher, Distributor and Subscriber.

The articles that follow will include the initial creation of the Distributor, creating a Pull Subscription and changing an Article. As my experience with Replication continues I’ll add more articles, because there’s no way this mess can stay fresh in my mind for long.

Partition Switching – Basic Example

17 Feb

Want to delete a couple of records quickly from a large table? DELETE is more than capable (indexing issues aside). Want to delete a couple of million rows from a large table quickly? Now it gets a little more interesting.

Partitioning is a very useful tool for moving large amounts of data very quickly. However, it does require an understanding of several elements of this interesting aspect of SQL Server.

    What is Partitioning?

For general use, a partitioned table appears to be one large storage area for your information. Under the covers, however, it can be split across several filegroups, and those filegroups can be spread across several folders or drives. Your data can therefore be spread across a range of drives for improved access at the hardware level, whilst appearing to the user as a single table that requires no special handling for its partitioned structure.

    How do you Partition?

To create a partitioned table there are three things that need to be present:

1. A Partition Function, which describes how the data will be split across partitions. It shows the values that will be used for dictating how the data is spread across those partitions. At this point it doesn’t specify the column itself, just the values.

2. A Partition Scheme, to show which filegroups are used for the partitions. If the Partition Function defines two boundary values then the Partition Scheme will specify a filegroup for each of these, plus one extra for values that fall beyond the boundaries specified in the Function.

3. Allocating the table to a Partition Scheme. When the table is created it needs to be allocated at this time to a Partition Scheme. It is at this point that you specify which column of the table is used by the Partition Scheme, to map values according to the Partition Function.

    Example

Quite often examples of partitioning are given using date ranges, because that is a popular way of dividing the data. This time I’m going to use something that I found easier to visualise and demonstrate – animals.

For no particular reason I’m going to create a table that stores details of cats and dogs. One partition will contain Cats and the other Dogs. Using these as the boundaries of the partitions will show what happens when other animals are introduced, depending upon how they are listed alphabetically.

First of all, create a database. To properly utilise partitioning I should create the database across several filegroups but that isn’t essential for this demonstration. In my environment the following code will create a database in one filegroup – ‘PRIMARY’.

CREATE DATABASE PartitionTest; 
GO 

Now we create a Partition Function, which requires a name, the definition of the column it will ultimately be mapped against (or to look at it the other way – the mapped column should have the same definition as the Partition Function), the description of how the partition boundaries are set and the values that these boundaries will have.

CREATE PARTITION FUNCTION [pf_PetTypes](varchar(40)) AS RANGE LEFT FOR VALUES (N'Cat', N'Dog') 
GO 

This function is called ‘pf_PetTypes’. It has two boundaries, ‘Cat’ and ‘Dog’. The ‘RANGE LEFT’ means that anything to the left of ‘Cat’ will go into the first partition. So alphabetically anything with a value of ‘Cat’ or less will go into this partition.
Anything to the left of ‘Dog’ (and including ‘Dog’) alphabetically right down to (but not including) ‘Cat’ will go into the second partition.
Anything beyond ‘Dog’ will go into the third partition that will be specified in the Partition Scheme.

‘RANGE RIGHT’ is another option for the Partition Function and this would of course alter the behaviour of the partitions.
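For comparison, this is a sketch of the same function declared with RANGE RIGHT (under a different, illustrative name so it doesn’t clash with the one above). With RANGE RIGHT a boundary value belongs to the partition on its right, so ‘Cat’ itself would land in the second partition rather than the first:

```sql
-- Sketch for comparison only – not part of this example's build.
-- RANGE RIGHT partitions:
--   partition 1: values below 'Cat'
--   partition 2: 'Cat' up to (but not including) 'Dog'
--   partition 3: 'Dog' and above
CREATE PARTITION FUNCTION [pf_PetTypesRight](varchar(40))
AS RANGE RIGHT FOR VALUES (N'Cat', N'Dog');
GO
```

RANGE RIGHT tends to suit date ranges, where you usually want the boundary date to open a new partition.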

Having created the Partition Function we now associate it with a Partition Scheme, which will map the ranges specified in the Function to filegroups.

CREATE PARTITION SCHEME [ps_PetTypes] AS PARTITION [pf_PetTypes] TO ([PRIMARY], [PRIMARY], [PRIMARY]) 
GO 

This Scheme specifies three partitions, which in this example all point to the same filegroup. For partition switching this is not a problem. To spread your partitions across multiple drives or folders you would specify different filegroups, which would need to have been created as part of the database.
This Scheme maps the first value within the Function to the first filegroup ‘PRIMARY’, the second value to the second filegroup ‘PRIMARY’, and any values beyond the range to the third filegroup ‘PRIMARY’.

Now the table can be created and mapped to the Partition Scheme.

CREATE TABLE PetDetails( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed			VARCHAR(20)	NOT NULL DEFAULT 'Unknown', 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
CONSTRAINT [PK_Petdetails] PRIMARY KEY CLUSTERED 
( 
	PetDetailID ASC, 
	PetType		ASC  
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType)  
) ON ps_PetTypes(PetType); 
 
GO 

In this example I’ve specified the Partition Scheme against the Clustered Index and the table itself (‘ON ps_PetTypes(PetType)’). This isn’t actually necessary – one or the other is sufficient, it’s just a habit I’ve developed.
It is here that you specify which column is used to decide which partition is required. The column is the parameter used for the ps_PetTypes Partition Scheme. In this case the column is PetType.

A Clustered Index has been created against this table and where this has been done the partition column has to be part of the Key, otherwise the error ‘Partition columns for a unique index must be a subset of the index key’ is thrown and the table creation fails.

For partition switching we need somewhere to actually switch the data to. For this purpose you need another table that is identical in structure to the main table. Defaults and suchlike from the main table aren’t required in the additional table, but the columns, their definitions and the Clustered Index must match.

CREATE TABLE PetDetails_PartitionSwitch( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed			VARCHAR(20) NOT NULL, 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
CONSTRAINT [PK_Petdetails_PartitionSwitch] PRIMARY KEY CLUSTERED 
( 
	PetDetailID ASC, 
	PetType		ASC 
) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType)  
) ON ps_PetTypes(PetType); 
 
GO 

In this case I’ve created another table with the suffix ‘_PartitionSwitch’, to signify the purpose of this table.
Now that we’ve created the tables the partitions details can be found:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot PS_01: sys.partitions output – both tables show three empty partitions]

Both tables have three partitions, with no rows of data.
So we’ll add some data and look again:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Dog' , 'Alsatian' , 'Pookie' ,  '' , '' ), 
		( 'Dog' , 'Boxer', 'Rover','','Not the brightest bulb in the shop'), 
		( 'Cat', 'Persian', 'Snowball', '',''); 
GO 


SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot PS_02: sys.partitions output after the first inserts]

PetDetails has one row in Partition 1 and two rows in Partition 2, because we added one ‘Cat’ row and two ‘Dog’ rows.
This table was designed with Cats and Dogs in mind, so what happens when other animals are used?

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Bird' , 'Canary' , '' ,  '' , '' ); 
GO 
 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot PS_03: sys.partitions output – the ‘Bird’ row in Partition 1]

Because ‘Bird’ is left alphabetically of ‘Cat’ it also goes into Partition 1.

Add another one:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes 
        ) 
VALUES  ( 'Frog' , 'Pig Frog' , '' ,  '' , '' ); 
GO 

--Frog goes into partition 3, because it is beyond the range of 'Dog' 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot PS_04: sys.partitions output – the ‘Frog’ row in Partition 3]

The value ‘Frog’ is beyond the boundary set by ‘Dog’ alphabetically, so it goes into the extra partition that is there to cater for out of range values.
Remove these additional values, so we can get back to ‘clean’ data:

--Get rid of the distracting data 
DELETE dbo.PetDetails 
WHERE PetType NOT IN ('cat','dog'); 
GO 

It is possible to use the partition function to ascertain what values will be stored into which partition without actually writing any data:

--What partition will 'cat' be in 
SELECT $PARTITION.pf_PetTypes('cat'); 
GO 

--What partition would 'elephant' be placed into? 
SELECT $PARTITION.pf_PetTypes('elephant'); 
GO 

The result is the partition number the data will be written to.
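The same function can also be applied to the table’s own rows – a quick sketch to count existing rows per partition without querying sys.partitions:

```sql
-- Count rows per partition using the partition function directly.
SELECT  $PARTITION.pf_PetTypes(PetType) AS PartitionNumber,
        COUNT(*)                        AS RowsInPartition
FROM    dbo.PetDetails
GROUP BY $PARTITION.pf_PetTypes(PetType)
ORDER BY PartitionNumber;
GO
```

This is handy for confirming where the data actually sits before attempting a switch.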
Check the partitions before we switch:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot PS_05: sys.partitions output before the switch]

I know that the ‘Cat’ entry is in Partition 1. To move this data by switching the partition the command is:

ALTER TABLE dbo.PetDetails 
SWITCH PARTITION 1 TO dbo.PetDetails_PartitionSwitch PARTITION 1; 
GO 

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 
GO 

[Screenshot PS_06: sys.partitions output after the switch]

Partition 1 of PetDetails now has no rows (check the partition_number column as the order may have changed) and Partition 1 of PetDetails_PartitionSwitch has one row.

SELECT * 
FROM dbo.PetDetails; 
GO 

[Screenshot PS_07: PetDetails contents after the switch]

There is no trace of the ‘Cat’ entry in this table; it is now in PetDetails_PartitionSwitch:

SELECT * 
FROM dbo.PetDetails_PartitionSwitch; 
GO 

[Screenshot PS_08: PetDetails_PartitionSwitch contents – the ‘Cat’ row]

Now we can remove the data quickly by using TRUNCATE TABLE against Petdetails_PartitionSwitch.
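That final clean-up is a single statement:

```sql
-- Remove the switched-out data. TRUNCATE TABLE deallocates pages
-- rather than logging individual row deletes, so it is very fast.
TRUNCATE TABLE dbo.PetDetails_PartitionSwitch;
GO
```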

Both SWITCH PARTITION and TRUNCATE TABLE are extremely fast, because they are metadata-only operations – no rows are physically moved or individually deleted. Switching and truncating one row or a million rows therefore takes pretty much the same amount of time; certainly not enough difference for you to worry about.
For the finer detail on Partition Switching – http://technet.microsoft.com/en-us/library/ms191160(v=sql.105).aspx

XML and STUFF for a simple CSV column

27 Dec

I’m not a big fan of XML via T-SQL – far too long-winded a monster for my liking. However, there is no denying the usefulness of it and the fact that on occasions it simply doesn’t matter if you like an aspect of SQL Server, you may simply have to learn to use it. If I could pick and choose which bits I get to play with I suspect I wouldn’t be a DBA for long.

One nice little ‘trick’ with the XML commands that I’m grateful for is used to provide a comma-separated list of data. It’s an interesting solution that is often shown to people who need such a thing and is far faster than other solutions I’ve seen.
As an example, I would like to produce a csv list of the surnames from the following data:

[Screenshot Data01: the source table of forenames and surnames]

To obtain the following results:

[Screenshot Data02: the comma-separated list of surnames]

First of all – the test data:

CREATE TABLE #PersonList( 
		PersonID	INT IDENTITY, 
		Forename	VARCHAR (50), 
		Surname		VARCHAR(50) 
		); 
GO 
 
INSERT INTO #PersonList ( Forename, Surname ) 
VALUES  ( 'John', 'Smith'), 
		( 'Tom', 'Brown'), 
		( 'Tom', 'Sawyer'), 
		( 'Harry', 'Flashman'), 
		( 'Anna', 'Howe'), 
		( 'Julian', 'Sorel'), 
		('Sherlock', 'Holmes'); 
GO 
 
SELECT * 
FROM #PersonList; 

The command to produce the csv data is:

SELECT STUFF((SELECT ', ' + Surname  
                            FROM #PersonList 
		          ORDER BY Surname 
                            FOR XML PATH('')), 1, 2, ''); 

It comprises two parts. The inner part is FOR XML in its simplest form, with a comma added to the selected data:

SELECT ', ' + Surname 
           FROM #PersonList 
           ORDER BY Surname 
           FOR XML PATH(''); 

This is a very simple example of the XML command. Normally the PATH would have something within it and would therefore produce a very basic XML.
For example:

SELECT Surname  
         FROM  #PersonList 
         FOR XML PATH('test'); 

will produce XML output thus:

[Screenshot XML01: the XML output with <test> elements]

Not valid XML in this form (hence the red underline) but you get the idea.

Change the command to add a comma between each surname and remove the element name (the ‘test’) and you get a strange XML value like this:

[Screenshot XML02: the comma-separated surnames as a single XML value]

The next requirement is to remove the leading comma and convert the result back to a character field. This is where STUFF comes in. In this case the first parameter for STUFF is the XML query.
STUFF then starts at the first character position of the query result and replaces the two characters from that point (the leading comma and space) with a zero-length string ('').
As STUFF returns character or binary data depending upon the data type it is processing, in this case it returns the result as a character field, with the leading comma removed.
Two problems solved with one command.

I’ve added the ORDER BY just to make the data look more organised and of course the ‘inner’ XML query can be more involved than this one but it should give an idea of how to use this for this particular purpose:

SELECT STUFF((SELECT ', ' + Surname 
                            FROM #PersonList 
		          ORDER BY Surname 
                            FOR XML PATH('')), 1, 2, ''); 
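One caveat, not covered by the example above: FOR XML PATH('') entitises special characters, so a surname containing ‘&’ would come back as ‘&amp;amp;’. A common variant, sketched here, adds TYPE and .value() to extract the result as plain text:

```sql
-- Sketch: avoid XML entitisation by keeping the inner result as XML (TYPE)
-- and then extracting it as plain text with .value().
SELECT STUFF((SELECT ', ' + Surname
              FROM #PersonList
              ORDER BY Surname
              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '');
```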

The King is dead, Long Live the King

8 Nov

So, SQLPass 2013 has finished. Of course, all I read about it from those that attended are good things and having attended the 2012 conference I can believe them.

So, when are you going to start planning for SQLPass November 2014?

Yes, it might be 11 months away, but look at how the fees increase between now and then:
[Screenshot SQLPass01: SQLPass Summit 2014 registration fee schedule]
After December 6th it goes up by $300 – almost a third. Then it climbs steadily in several steps from that point onward. That’s a lot of money to save just by making the decision early.

If you’re paying for it yourself (which is how I do it) then that’s more money to be spent on the rest of the organising that goes with this – flights, hotels, food and suchlike.
If your company is paying for it I’m sure they’d appreciate an early request to spend a little less – then you both win.

Of course, paying this early means that you have no idea what the content will be, but I wouldn’t worry about that. I’ve held off registering for some of the smaller conferences before, because there may be only a handful of speakers and even fewer presentations of interest to me.
But this is the largest event of this type – there is no shortage of high-quality speakers. If you think you have a SQL Server related interest that nobody here can help you with then I suspect quite a few of the speakers would like the challenge of talking to you.

Pay early and then wait to see what you’ve bought – in this case it’s the best way to do it.

http://www.sqlpass.org/summit/2013/About/WhoShouldAttend/AttendeesShareTheirExperience.aspx