SSIS Error Code “0x80070057” in Slowly Changing Dimension

28 Aug

SSIS can produce an inordinate number of error messages and numbers, some of which are even helpful and informative.
These are generally accompanied by an Error Code; one of the most common appears to be “0x80070057”, and it can have a number of causes.

Recently I came across this particularly helpful code when creating a Slowly Changing Dimension:

Error: 0xC0202009 at ‘Task Name’ , Slowly Changing Dimension [14907]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80070057.
An OLE DB record is available. Source: “Microsoft SQL Server Native Client 10.0” Hresult: 0x80070057 Description: “The parameter is incorrect.”.
Error: 0xC0047022 at ‘Task Name’ , SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Slowly Changing Dimension” (14907) failed with error code 0xC0202009 while processing input “Slowly Changing Dimension Input” (14918). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
Error: 0xC02020C4 at ‘Task Name’ , ‘View Name’ [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at , SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “’View Name’” (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The behaviour was this: run it for the first time and the process extracted data from a View and inserted all of the rows as new rows into the output. Run it a second time, when it would check for changes (because the output table now had data), and it would instantly fail with the above error message.

The first thing I noticed was “The parameter is incorrect”, which is strange because the Wizard created everything based on the details I fed into it – so it shouldn’t be anything I’d done.
So, as I had created 5 other Slowly Changing Dimensions for other Views and Tables I decided to recreate this one from the start with a very ‘narrow’ table. The table concerned with this particular process was rather wide, certainly when compared to the others that had worked successfully.
A couple of fairly tedious hours later I had a failure when I added the last few columns back in (as ever, it’s always something at the end). These columns were varchar(8000) and there were six of them.
This is where the line in the error message “The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020” started to look more interesting.
Checking these columns showed that the 8000 size was a tad generous, based upon the data likely to be stored within them. Reducing these to 300 each removed the error and everything worked as expected.
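As a sketch of the kind of change involved (the table and column names here are hypothetical – the real objects were a View and its underlying table), the fix amounted to narrowing the oversized columns:

```sql
-- Hypothetical names: the real table had six varchar(8000) columns.
-- Oversized varchar columns inflate the row size SSIS must buffer,
-- so narrowing them to a realistic length resolved the error.
ALTER TABLE dbo.WideSourceTable ALTER COLUMN Notes1 VARCHAR(300) NULL;
ALTER TABLE dbo.WideSourceTable ALTER COLUMN Notes2 VARCHAR(300) NULL;
-- ...and so on for the remaining varchar(8000) columns.
```

Bear in mind that the Slowly Changing Dimension component caches column metadata, so after a change like this the data flow will need its metadata refreshing (or recreating via the Wizard) to pick up the new sizes.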

I have no idea what size buffer SSIS can create for this processing but the message implied that it wasn’t large enough for this particular table. Reducing the width of the table corrected this.
It may not be the solution for every instance of that message, but in this case it worked and can be added to the list of possible fixes for one less-than-helpful SSIS Error Code.

Replication – the sprouts of my DBA world.

14 Aug

Actually, that isn’t entirely fair. I don’t have to eat sprouts, which is just as well. However, I do have to deal with Replication – certainly in my current job. Whereas my refusal to eat the Little Green Balls of Death won’t result in a divorce (because my wife, as wonderful as she is, does for some strange reason really like sprouts) I can’t imagine a job that specified it uses Replication will suffer my presence for long if I flatly refuse to touch it.

SQL Server has a broad range of tools and features, so understanding all of them beyond a basic level is probably beyond most of us. However, within your role you will be expected to have a detailed understanding of some aspects of SQL Server and here, Replication is amongst those required skills.

Like many people, I have created Replication tasks in the past by using the Wizards, so I can at least have a basic understanding of the subject. However, the Wizard does everything for you, makes assumptions and on a simple test system it does work quite well.

On an industrial scale though, and processed via SQL Scripts it is another story. Some of the error messages it can give are less than useful and let’s face it – Replication Monitor isn’t the most helpful tool.

The following series of articles on Replication are to act as an aide-memoire for myself, which is the real reason I have this blog. Like most people I find that if I don’t use an aspect of SQL Server for an extended period of time I forget the finer details. Having got to grips with a portion of Replication I need to document it. If it helps anybody else then that’s a bonus.


First – the interesting bits. No DDL changes replicated (generally), because in some cases we don’t require replication of all of the columns and the subscriber table may have additional columns (its own partition scheme – so another partition id and a date time showing when it was written to the subscriber).

No snapshots, because of the size of some of the publisher’s tables. A lock is taken while a snapshot is generated, which isn’t desirable on a database that is required 24/7.

And of course – no Wizards. If you have to rebuild a server then it is so much easier to have the scripts to hand, rather than answer a plethora of questions from a Wizard and hope you gave the answers that the DBA who first set up Replication gave. Easier to see what is going to happen and also removes some of the more interesting defaults that the Wizards can specify.

I have three SQL Server 2012 instances on my Development machine and I’ll be using these as Publisher, Distributor and Subscriber.

The articles that follow will include the initial creation of the Distributor, creating a Pull Subscription and changing an Article. As my experience with sprouts Replication continues I’ll add more articles, because there’s no way this mess can stay fresh in my mind for long.

Partition Switching – Basic Example

17 Feb

Want to delete a couple of records quickly from a large table? DELETE is more than capable (indexing issues aside). Want to delete a couple of million rows from a large table quickly? Now it can get a little more interesting.

Partitioning is a very useful tool for moving large amounts of data very quickly. However, it does require an understanding of the several elements of this interesting aspect of SQL Server.

    What is Partitioning?

You can have a table within a database that, for general use, appears to be one large storage area for your information. However, it can be stored split across several filegroups, and these filegroups can be spread across several folders/drives. So in reality your data can be spread across a wide range of drives for improved access at the hardware level, whilst appearing to the user as one table, without any special consideration needed for its partitioned structure.

    How do you Partition?

To create a partitioned table there are three things that need to be present:

1. A Partition Function, which describes how the data will be split across partitions. It shows the values that will be used for dictating how the data is spread across those partitions. At this point it doesn’t specify the column itself, just the values.

2. A Partition Scheme, to show which filegroups are used for the partitions. If you design a table to have two partitioned values (as described in the Partition Function) then the Partition Scheme will have a filegroup for each of these plus one extra for those values that won’t fit within the values specified in the Function.

3. Allocating the table to a Partition Scheme. When the table is created it needs to be allocated at this time to a Partition Scheme. It is at this point that you specify which column of the table is used by the Partition Scheme, to map values according to the Partition Function.


Quite often examples of partitioning are given using date ranges, because that is a popular way of dividing the data. This time I’m going to use something that I found easier to visualise and demonstrate – animals.

For no particular reason I’m going to create a table that stores details of cats and dogs. One partition will contain Cats and the other Dogs. Using these as the boundaries of the partitions will show what happens when other animals are introduced, depending upon how they are listed alphabetically.

First of all, create a database. To properly utilise partitioning I should create the database across several filegroups but that isn’t essential for this demonstration. In my environment the following code will create a database in one filegroup – ‘PRIMARY’.
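A minimal version of that statement, assuming a database name of PartitionTest and illustrative local file paths, is:

```sql
-- Database name and file paths are illustrative assumptions.
-- Single filegroup (PRIMARY); spread across more filegroups/drives
-- in production if you want the hardware-level benefits.
CREATE DATABASE PartitionTest
ON PRIMARY
( NAME = 'PartitionTest_Data', FILENAME = 'C:\SQLData\PartitionTest_Data.mdf' )
LOG ON
( NAME = 'PartitionTest_Log', FILENAME = 'C:\SQLData\PartitionTest_Log.ldf' );
GO

USE PartitionTest;
GO
```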


Now we create a Partition Function, which requires a name, the definition of the column it will ultimately be mapped against (or to look at it the other way – the mapped column should have the same definition as the Partition Function), the description of how the partition boundaries are set and the values that these boundaries will have.
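Given the description that follows – two boundary values on a VARCHAR(40) column, split RANGE LEFT – the function looks like this:

```sql
-- Anything <= 'Cat' goes to partition 1; > 'Cat' and <= 'Dog' goes
-- to partition 2; anything beyond 'Dog' goes to partition 3.
CREATE PARTITION FUNCTION pf_PetTypes (VARCHAR(40))
AS RANGE LEFT FOR VALUES ('Cat', 'Dog');
```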


This function is called ‘pf_PetTypes’. It has two boundaries, ‘Cat’ and ‘Dog’. The ‘RANGE LEFT’ means that anything to the left of ‘Cat’ will go into the first partition. So alphabetically anything with a value of ‘Cat’ or less will go into this partition.
Anything to the left of ‘Dog’ (and including ‘Dog’) alphabetically right down to (but not including) ‘Cat’ will go into the second partition.
Anything beyond ‘Dog’ will go into the third partition that will be specified in the Partition Scheme.

‘RANGE RIGHT’ is another option for the Partition Function and this would of course alter the behaviour of the partitions.

Having created the Partition Function we now associate it with a Partition Scheme, which will map the ranges specified in the Function to filegroups.
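With everything in this demonstration living in one filegroup, the Scheme simply lists ‘PRIMARY’ once per partition:

```sql
-- Three partitions: one per boundary value plus one for
-- out-of-range values, all on the PRIMARY filegroup here.
CREATE PARTITION SCHEME ps_PetTypes
AS PARTITION pf_PetTypes
TO ([PRIMARY], [PRIMARY], [PRIMARY]);
```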


This Scheme shows three partitions, which in this example all point to the same filegroup. For partition switching this is not a problem. To spread your partitions across multiple drives or folders you’d have to have different filegroups specified, which would have started with the database creation.
This Scheme maps the values of the first value within the Function to the first filegroup ‘PRIMARY’, the second value to the second filegroup ‘PRIMARY’ and any values that are beyond the range will go to the third filegroup ‘PRIMARY’.

Now the table can be created and mapped to the Partition Scheme.

CREATE TABLE dbo.PetDetails( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed		VARCHAR(20) NOT NULL DEFAULT 'Unknown', 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
	PRIMARY KEY CLUSTERED ( 
		PetDetailID	ASC, 
		PetType		ASC 
	) ON ps_PetTypes(PetType) 
) ON ps_PetTypes(PetType); 

In this example I’ve specified the Partition Scheme against the Clustered Index and the table itself (‘ON ps_PetTypes(PetType)’). This isn’t actually necessary – one or the other is sufficient, it’s just a habit I’ve developed.
It is here that you specify which column is used to decide which partition is required. The column is the parameter used for the ps_PetTypes Partition Scheme. In this case the column is PetType.

A Clustered Index has been created against this table and where this has been done the partition column has to be part of the Key, otherwise the error ‘Partition columns for a unique index must be a subset of the index key’ is thrown and the table creation fails.

For partition switching we need somewhere to actually switch the data to. For this purpose you need another table that is identical in structure to the main table. If there are defaults and suchlike within the main table then these aren’t required in the additional table but columns, definitions and Clustered Index must match.

CREATE TABLE dbo.PetDetails_PartitionSwitch( 
	PetDetailID		INT IDENTITY(1,1), 
	PetType			VARCHAR(40) NOT NULL, 
	PetBreed		VARCHAR(20) NOT NULL, 
	PetName			VARCHAR(20), 
	SpecialRequirements	VARCHAR(500), 
	AdditionalNotes		VARCHAR(500), 
	PRIMARY KEY CLUSTERED ( 
		PetDetailID	ASC, 
		PetType		ASC 
	) ON ps_PetTypes(PetType) 
) ON ps_PetTypes(PetType); 

In this case I’ve created another table with the suffix ‘_PartitionSwitch’, to signify the purpose of this table.
Now that we’ve created the tables, the partition details can be found:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 


Both tables have three partitions, with no rows of data.
So we’ll add some data and look again:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes ) 
VALUES  ( 'Dog' , 'Alsatian' , 'Pookie' , '' , '' ), 
		( 'Dog' , 'Boxer' , 'Rover' , '' , 'Not the brightest bulb in the shop' ), 
		( 'Cat' , 'Persian' , 'Snowball' , '' , '' ); 

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 


PetDetails has one row in Partition 1 and two rows in Partition 2, because we added one ‘Cat’ row and two ‘Dog’ rows.
This table was designed with Cats and Dogs in mind, so what happens when other animals are used?

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes ) 
VALUES  ( 'Bird' , 'Canary' , '' , '' , '' ); 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 


Because ‘Bird’ is left alphabetically of ‘Cat’ it also goes into Partition 1.

Add another one:

INSERT INTO dbo.PetDetails 
        ( PetType , 
          PetBreed , 
          PetName , 
          SpecialRequirements , 
          AdditionalNotes ) 
VALUES  ( 'Frog' , 'Pig Frog' , '' , '' , '' ); 

--Frog goes into partition 3, because it is beyond the range of 'Dog' 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 


The value ‘Frog’ is beyond the boundary set by ‘Dog’ alphabetically, so it goes into the extra partition that is there to cater for out of range values.
Remove these additional values, so we can get back to ‘clean’ data:

--Get rid of the distracting data 
DELETE dbo.PetDetails 
WHERE PetType NOT IN ('cat','dog'); 

It is possible to use the partition function to ascertain what values will be stored into which partition without actually writing any data:

--What partition will 'cat' be in 
SELECT $PARTITION.pf_PetTypes('cat'); 

--What partition would 'elephant' be placed into? 
SELECT $PARTITION.pf_PetTypes('elephant'); 

The result is the partition number the data will be written to.
Check the partitions before we switch:

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 


I know that the ‘Cat’ entry is in Partition 1. To move this data by switching the partition the command is:

ALTER TABLE dbo.PetDetails 
SWITCH PARTITION 1 TO dbo.PetDetails_PartitionSwitch PARTITION 1; 

SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails'; 
SELECT * FROM sys.partitions WHERE object_name(object_id)='PetDetails_PartitionSwitch'; 


Partition 1 of PetDetails now has no rows (check the partition_number column as the order may have changed) and Partition 1 of PetDetails_PartitionSwitch has one row.

SELECT * FROM dbo.PetDetails; 


There is no trace of the ‘Cat’ entry in this table; it is now in PetDetails_PartitionSwitch:

SELECT * FROM dbo.PetDetails_PartitionSwitch; 


Now we can remove the data quickly by using TRUNCATE TABLE against Petdetails_PartitionSwitch.
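That clean-up is a single statement:

```sql
-- A metadata-only operation: deallocates the switched-out rows
-- almost instantly, regardless of how many there are.
TRUNCATE TABLE dbo.PetDetails_PartitionSwitch;
```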

Both the SWITCH PARTITION command and TRUNCATE TABLE are extremely fast, because they are metadata-only operations. Switching and truncating one row or a million rows will therefore take pretty much the same amount of time; certainly not enough difference for you to worry about.
For the finer detail on Partition Switching –

XML and STUFF for a simple CSV column

27 Dec

I’m not a big fan of XML via T-SQL – far too long-winded a monster for my liking. However, there is no denying the usefulness of it and the fact that on occasions it simply doesn’t matter if you like an aspect of SQL Server, you may simply have to learn to use it. If I could pick and choose which bits I get to play with I suspect I wouldn’t be a DBA for long.

One nice little ‘trick’ with the XML commands that I’m grateful for is used to provide a comma-separated list of data. It’s an interesting solution that is often shown to people who need such a thing and is far faster than other solutions I’ve seen.
As an example, I would like to produce a csv list of the surnames from the following data:


To obtain the following results:


First of all – the test data:

CREATE TABLE #PersonList( 
		Forename	VARCHAR(50), 
		Surname		VARCHAR(50) 
); 

INSERT INTO #PersonList ( Forename, Surname ) 
VALUES  ( 'John', 'Smith'), 
		( 'Tom', 'Brown'), 
		( 'Tom', 'Sawyer'), 
		( 'Harry', 'Flashman'), 
		( 'Anna', 'Howe'), 
		( 'Julian', 'Sorel'), 
		( 'Sherlock', 'Holmes'); 

SELECT * FROM #PersonList; 

The command to produce the csv data is:

SELECT STUFF((SELECT ', ' + Surname  
                            FROM #PersonList 
		          ORDER BY Surname 
                            FOR XML PATH('')), 1, 2, ''); 

It comprises two parts. The inner part is FOR XML in its simplest form, with a comma added to the selected data:

SELECT ', ' + Surname 
           FROM  #PersonList 
           ORDER BY Surname 
           FOR XML PATH(''); 

This is a very simple example of the XML command. Normally the PATH would have something within it and would therefore produce a very basic XML.
For example:

SELECT Surname  
         FROM  #PersonList 
         FOR XML PATH('test'); 

will produce XML output thus:


Not valid XML in this form (hence the red underline) but you get the idea.

Change the command to add a comma between each surname and remove the element name (the ‘test’) and you get a strange XML value like this:


The next requirement is to remove the leading comma and convert the result back to a character field. This is where STUFF comes in. STUFF takes four parameters: the string to modify, a start position, the number of characters to replace and the replacement string. Here the first parameter is the XML query itself; starting at character position 1, the two characters there (the leading comma and space) are replaced with a zero-length string ('').
As STUFF returns character or binary data depending upon the data type it is processing, in this case it returns the result as a character field, with the leading comma removed.
Two problems solved with one command.

I’ve added the ORDER BY just to make the data look more organised and of course the ‘inner’ XML query can be more involved than this one but it should give an idea of how to use this for this particular purpose:

SELECT STUFF((SELECT ', ' + Surname 
                            FROM #PersonList 
		          ORDER BY Surname 
                            FOR XML PATH('')), 1, 2, ''); 

UnPivot – Basic Example

23 Dec

In a previous posting I described a simple example of Pivot – a much-used feature which I personally have problems recalling the syntax of (hence the article in the first place). Its lesser-known relative is UnPivot, which is designed to do the opposite – it transforms columns of data into rows.

In the following example I have a few details about properties – the number of Bedrooms, Garages, Gardens and Attic Spaces. I’ve used this as an example because the column names will make it more apparent how the query and the results are connected.

I want to transform the data from showing the property features as column names to actually have those column names listed as data, along with the values that these columns contain:


First of all – the test data:

CREATE TABLE #Properties 
    ( 
      PropertyID INT , 
      Attic_Spaces INT , 
      Bedrooms INT , 
      Garages INT , 
      Gardens INT 
    ) ; 

INSERT  INTO #Properties 
        ( PropertyID, Attic_Spaces, Bedrooms, Garages, Gardens ) 
VALUES  ( 101, 1, 2, 0, 1 ), 
        ( 102, 2, 1, 1, 2 ), 
        ( 103, 1, 2, 1, 2 ), 
        ( 104, 1, 3, 2, 2 ) ; 

SELECT  * 
FROM    #Properties ;   

The basic construct of an UnPivot is quite simple:

SELECT PropertyID, 
	   PropertyFeature, 
	   NumberOf 
	   FROM #Properties p 
	   UNPIVOT ( NumberOf FOR PropertyFeature IN ( Attic_Spaces, Bedrooms, Garages, Gardens ) ) AS unpvt ; 

This will select PropertyID from the #Properties table. Then it will select the columns ‘PropertyFeature’ and ‘NumberOf’ from the UNPIVOT part of the query. In this section I have asked it to create a column called ‘PropertyFeature’, which is taken from the columns called ‘Attic_Spaces’, ‘Bedrooms’, ‘Garages’ and ‘Gardens’. The value that is held within those ‘original’ columns (within the #Properties table) will be stored in the column called ‘NumberOf’ within the UNPIVOT section.

The alias for the #Properties table isn’t required; it’s just a habit of mine. However, the alias for the UNPIVOT (the ‘AS unpvt’) is mandatory, even though you see no reference to it.

This may not be the most accurate description of what it is really doing, but it is how my brain gets around the syntax and that’s good enough for me, for now.

Snapshot Backup – Basic Example

12 Dec

A couple of questions recently in SQL Server Central have related to Snapshot Backups and reminded me of an article I wrote a few years ago about such things for the Developers in my office.
So, using that as a basis for this article I’m going to explain the basics of Snapshot Backups.

In the past I’ve had to run tests against some databases that would take upwards of 30 minutes to restore whenever I wanted to repeat those tests. This was a waste of my time and resources, as it also brought my PC to a halt whilst running. Of course, you can take copies of individual tables if you’re only interfering with a small number, and move the data back to repeat a test. However, there are occasions where numerous tables are impacted and it just isn’t a realistic option.
This is where you can take a Snapshot. A Snapshot is effectively a read-only copy of the database, maintained by SQL Server as the data is changed. You can run SELECT statements against a Snapshot Database as if it were the original: any data altered within the originating database is held within the Snapshot in its original state, and if the data you want has not been altered then SQL Server will look in the originating database. Your query just needs to look at the Snapshot Database, without caring where the data is actually held.
To create a Database Snapshot, you need to know the actual filenames that the database is made from. The easiest way is to use SQL Server Management Studio, right-click on the database and look at the properties. We’re after the ‘file’ information:


This particular database has two files for storing the data – ‘SnapshotDemo’ and ‘SecondaryData’.
To create a snapshot, you need to have a folder where the Snapshot data will be stored and then tell SQL Server to store the data from the two files detailed above into files within this folder (it will create the files but the folder needs to exist beforehand):

CREATE DATABASE SnapshotDemo_Snapshot01 ON
(NAME = SnapshotDemo, FILENAME = 'C:\Demo Snapshots\SnapshotDemo01.SS'),
(NAME = SecondaryData, FILENAME = 'C:\Demo Snapshots\SecondaryData01.SS')
AS SNAPSHOT OF SnapshotDemo; 

I’ve created the Snapshot files based on the names of the original files. The ‘SS’ extension is the usual extension used for snapshot data, but neither of these matters – you can call the files whatever you like.
Once the command runs I now have a Snapshot Database:


Before I change any data, both are essentially identical databases. The difference is that the Snapshot is read-only. Running a SELECT against both databases gives the same results:


As an aside, the data was generated by Redgate’s SQL Data Generator – a wonderful tool if you ever get the opportunity to grab it.

Now I run a DELETE against the originating database – removing the rows that have ‘ACTIVE’ set to 0 and repeat the previous query:


Now the data in ‘SnapshotDemo’ shows the data as it is after the DELETE but ‘SnapshotDemo_Snapshot01’ shows the data as it was before the change.

I can now make whatever changes I want to the data within the originating database and SQL Server will track the alterations, keeping the original data within the Snapshot.
When I need to restore the data I don’t use the ‘usual’ restore. Instead I restore the Snapshot:

RESTORE DATABASE SnapshotDemo
FROM DATABASE_SNAPSHOT = 'SnapshotDemo_Snapshot01';

For a large database this can take a fraction of the time a ‘normal’ restore requires. The restore time varies depending upon how much data you have been playing with but the important thing is that it is just the altered data – it doesn’t restore the entire database (unless you’ve changed all data within the entire database of course).
Now repeat the SELECT statement from earlier:


So the database has been restored back to the state it was in when the snapshot was taken and I can play all over again.

In a future article I’ll look more closely at what a Snapshot actually does behind the scenes but this basic example and explanation should give an idea of the usefulness of this somewhat underused facility.

The King is dead, Long Live the King

8 Nov

So, SQLPass 2013 has finished. Of course, all I read about it from those that attended are good things and having attended the 2012 conference I can believe them.

So, when are you going to start planning for SQLPass November 2014?

Yes, it might be 11 months away, but look at how the fees increase over time:
After December 6th the price goes up by $300 – almost a third. Then it climbs steadily in several steps from that point onward. That’s a lot of money to save just by making the decision early.

If you’re paying for it yourself (which is how I do it) then that’s more money to be spent on the rest of the organising that goes with this – flights, hotels, food and suchlike.
If your company is paying for it I’m sure they’d appreciate an early request to spend a little less – then you both win.

Of course, paying this early means that you have no idea what the content will be, but I wouldn’t worry about that. I’ve held off registering for some of the smaller conferences before, because there may be only a handful of speakers and even fewer presentations of interest to me.
But this is the largest event of this type – there is no shortage of high-quality speakers. If you think you have a SQL Server related interest that nobody here can help you with then I suspect quite a few of the speakers would like the challenge of talking to you.

Pay early and then wait to see what you’ve bought – in this case it’s the best way to do it.

