Want to delete a couple of records quickly from a large table; well DELETE is more than capable (indexing issues aside). Want to delete a couple of million from a large table quickly? Now it can get a little more interesting.
Partitioning is a very useful tool for moving large amounts of data very quickly. However, it does require an understanding of the several elements of this interesting aspect of SQL Server.
- What is Partitioning?
You can have a table within a database that for general use appears to be one large storage area for your information. However, it can be stored split into several filegroups and these filegroups can be spread across several folders/drives. So your data in reality can be spread across a wide range of drives for improved access on a hardware level whilst appearing to the user as one table, without having to take any considerations for its partitioned structure.
- How do you Partition?
To create a partitioned table there are three things that need to be present:
1. A Partition Function, which describes how the data will be split across partitions. It shows the values that will be used for dictating how the data is spread across those partitions. At this point it doesn’t specify the column itself, just the values.
2. A Partition Scheme, to show which filegroups are used for the partitions. If you design a table to have two partitioned values (as described in the Partition Function) then the Partition Scheme will have a filegroup for each of these plus one extra for those values that won’t fit within the values specified in the Function.
3. Allocating the table to a Partition Scheme. When the table is created it needs to be allocated at this time to a Partition Scheme. It is at this point that you specify which column of the table is used by the Partition Scheme, to map values according to the Partition Function.
-
Example
Quite often examples of partitioning are given using date ranges, because that is a popular way of dividing the data. This time I’m going to use something that I found easier to visualise and demonstrate – animals.
For no particular reason I’m going to create a table that stores details of cats and dogs. One partition will contain Cats and the other Dogs. Using these as the boundaries of the partitions will show what happens when other animals are introduced, depending upon how they are listed alphabetically.
First of all, create a database. To properly utilise partitioning I should create the database across several filegroups but that isn’t essential for this demonstration. In my environment the following code will create a database in one filegroup – ‘PRIMARY’.
CREATE DATABASE PartitionTest; GO
Now we create a Partition Function, which requires a name, the definition of the column it will ultimately be mapped against (or to look at it the other way – the mapped column should have the same definition as the Partition Function), the description of how the partition boundaries are set and the values that these boundaries will have.
CREATE PARTITION FUNCTION [pf_PetTypes](varchar(40)) AS RANGE LEFT FOR VALUES (N'Cat', N'Dog') GO
This function is called ‘pf_PetTypes’. It has two boundaries, ‘Cat’ and ‘Dog’. The ‘RANGE LEFT’ means that anything to the left of ‘Cat’ will go into the first partition. So alphabetically anything with a value of ‘Cat’ or less will go into this partition.
Anything to the left of ‘Dog’ (and including ‘Dog’) alphabetically right down to (but not including) ‘Cat’ will go into the second partition.
Anything beyond ‘Dog’ will go into the third partition that will be specified in the Partition Scheme.
‘RANGE RIGHT’ is another option for the Partition Function and this would of course alter the behaviour of the partitions.
Having created the Partition Function we now associate it with a Partition Scheme, which will map the ranges specified in the Function to filegroups.
CREATE PARTITION SCHEME [ps_PetTypes] AS PARTITION [pf_PetTypes] TO ([PRIMARY], [PRIMARY], [PRIMARY]) GO
This Scheme shows three partitions, which in this example all point to the same filegroup. For partition switching this is not a problem. To spread your partitions across multiple drives or folders you’d have to have different filegroups specified, which would have started with the database creation.
This Scheme maps the values of the first value within the Function to the first filegroup ‘PRIMARY’, the second value to the second filegroup ‘PRIMARY’ and any values that are beyond the range will go to the third filegroup ‘PRIMARY’.
Now the table can be created and mapped to the Partition Scheme.
CREATE TABLE PetDetails( PetDetailID INT IDENTITY(1,1), PetType VARCHAR(40) NOT NULL, PetBreed VARCHAR(20) NOT NULL DEFAULT 'Unknown', PetName VARCHAR(20), SpecialRequirements VARCHAR(500), AdditionalNotes VARCHAR(500), CONSTRAINT [PK_Petdetails] PRIMARY KEY CLUSTERED ( PetDetailID ASC, Pettype ASC ) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType) ) ON ps_PetTypes(PetType); GO
In this example I’ve specified the Partition Scheme against the Clustered Index and the table itself (‘ON ps_PetTypes(PetType)’). This isn’t actually necessary – one or the other is sufficient, it’s just a habit I’ve developed.
It is here that you specify which column is used to decide which partition is required. The column is the parameter used for the ps_PetTypes Partition Scheme. In this case the column is PetType.
A Clustered Index has been created against this table and where this has been done the partition column has to be part of the Key, otherwise the error ‘Partition columns for a unique index must be a subset of the index key’ is thrown and the table creation fails.
For partition switching we need somewhere to actually switch the data to. For this purpose you need another table that is identical in structure to the main table. If there are defaults and suchlike within the main table then these aren’t required in the additional table but columns, definitions and Clustered Index must match.
CREATE TABLE PetDetails_PartitionSwitch( PetDetailID INT IDENTITY(1,1), PetType VARCHAR(40) NOT NULL, PetBreed VARCHAR(20) NOT NULL, PetName VARCHAR(20), SpecialRequirements VARCHAR(500), AdditionalNotes VARCHAR(500), CONSTRAINT [PK_Petdetails_PartitionSwitch] PRIMARY KEY CLUSTERED ( PetDetailID ASC, Pettype ASC ) WITH (IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON ps_PetTypes(PetType) ) ON ps_PetTypes(PetType); GO
In this case I’ve created another table with the suffix ‘_PartitionSwitch’, to signify the purpose of this table.
Now that we’ve created the tables the partitions details can be found:
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; GO
Both tables have three partitions, with no rows of data.
So we’ll add some data and look again:
INSERT INTO dbo.PetDetails ( PetType , PetBreed , PetName , SpecialRequirements , AdditionalNotes ) VALUES ( 'Dog' , 'Alsatian' , 'Pookie' , '' , '' ), ( 'Dog' , 'Boxer', 'Rover','','Not the brightest bulb in the shop'), ( 'Cat', 'Persian', 'Snowball', '',''); GO SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; GO
PetDetails has one row in Partition 1 and two rows in Partition 2, because we added one ‘Cat’ row and two ‘Dog’ rows.
This table was designed with Cats and Dogs in mind, so what happens when other animals are used?
INSERT INTO dbo.PetDetails ( PetType , PetBreed , PetName , SpecialRequirements , AdditionalNotes ) VALUES ( 'Bird' , 'Canary' , '' , '' , '' ); GO SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; GO
Because ‘Bird’ is left alphabetically of ‘Cat’ it also goes into Partition 1.
Add another one:
INSERT INTO dbo.PetDetails ( PetType , PetBreed , PetName , SpecialRequirements , AdditionalNotes ) VALUES ( 'Frog' , 'Pig Frog' , '' , '' , '' ); GO --Frog goes into partition 3, because it is beyond the range of 'Dog' SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; GO
The value ‘Frog’ is beyond the boundary set by ‘Dog’ alphabetically, so it goes into the extra partition that is there to cater for out of range values.
Remove these additional values, so we can get back to ‘clean’ data:
--Get rid of the distracting data DELETE dbo.PetDetails WHERE PetType NOT IN ('cat','dog'); GO
It is possible to use the partition function to ascertain what values will be stored into which partition without actually writing any data:
--What partition will 'cat' be in SELECT $PARTITION.pf_PetTypes('cat'); GO --What partition would 'elephant' be placed into? SELECT $PARTITION.pf_PetTypes('elephant'); GO
The result is the partition number the data will be written to.
Check the partitions before we switch:
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; GO
I know that the ‘Cat’ entry is in Partition 1. To move this data by switching the partition the command is:
ALTER TABLE dbo.PetDetails SWITCH PARTITION 1 TO dbo.PetDetails_PartitionSwitch PARTITION 1; GO SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; GO
Partition 1 of PetDetails now has no rows (check the partition_number column as the order may have changed) and Partition 1 of PetDetails_PartitionSwitch has one row.
SELECT * FROM dbo.PetDetails; GO
There is no trace of the ‘Cat’ entry in this table, it is now in PetDetails_PartitionSwitch:
SELECT * FROM dbo.PetDetails_PartitionSwitch; GO
Now we can remove the data quickly by using TRUNCATE TABLE against Petdetails_PartitionSwitch.
The speed of the SWITCH PARTITION command and the TRUNCATE TABLE are extremely fast, because they are both DDL commands. Therefore switching and truncating one row or a million rows will take pretty much the same amount of time; certainly not enough difference for you to worry about.
For the finer detail on Partition Switching – http://technet.microsoft.com/en-us/library/ms191160(v=sql.105).aspx
*Nice post. I learn something more challenging on different blogs everyday. It will always be stimulating to read content from other writers and practice a little something from their store. Id prefer to use some with the content on my blog whether you dont mind. Natually Ill give you a link on your web blog. Thanks for sharing.
No problem. I hope its as useful to you as writing the article has been to me.
Very good information. Lucky me I found your site by accident (stumbleupon).
I’ve book-marked it foor later!
Good reading your ppost