
SSISDB Error “The version number of the package is not valid”

23 Sep

The full message will also contain “The version number cannot be greater than the current version number”.

Behaviour

I have an SSIS package installed in an SSISDB on a server that runs SQL Server 2012 (Server A), and it needed to be copied to another 2012 instance (Server B). My workstation has SQL Server 2014, and I connect to the 2012 instances on those servers using the tools installed on my workstation.

I connected from my workstation to Server A and exported the SSIS package into the required ‘ispac’ file. I then changed my connection to point to Server B and imported the ispac file. So for both the export and the import I was using my SQL Server 2014 tools to connect to SQL Server 2012 instances.

The export and import ran without any issues, but when the associated job that executed the SSIS package on Server B ran, it failed. In this case the important part of the error report was:
“Package migration from version 8 to version 6 failed with error 0xC001700a. The version number in the package is not valid. The version number cannot be greater than the current version number.”

Cause

Exporting the SSIS package by using SQL Server 2014 (even though it is connecting to a 2012 instance) appears to have changed the ‘PackageFormatVersion’ property within the dtsx file, from ‘6’ to ‘8’. You can see this by opening the dtsx file in an editor such as Notepad++. Editing that value back to ‘6’ merely corrupts the package, so there are probably further changes made within the file.
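
As a side note, the SSISDB catalogue records this format version against each package: the catalog.packages view exposes a package_format_version column (6 for SQL Server 2012 packages, 8 for SQL Server 2014), so a query along these lines can be used to check what a target server is actually holding:

-- Check the package format version stored in the SSISDB catalogue
-- (package_format_version: 6 = SQL Server 2012, 8 = SQL Server 2014)
SELECT	pr.[name] AS project_name,
	p.[name] AS package_name,
	p.package_format_version
FROM	SSISDB.catalog.packages AS p
JOIN	SSISDB.catalog.projects AS pr ON pr.project_id = p.project_id
ORDER BY pr.[name], p.[name];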

Solution

Connect to the required instance using a version of SQL Server Management Studio (SSMS) that matches the instance version in order to complete this export/import. In my case I could remote onto the required servers and use the SSMS for SQL Server 2012 that was installed there. The export and import worked as expected, and this time the SSIS package ran without issue.


Read Committed – not getting what you expect

23 Jul

In a previous article I demonstrated how use of READ UNCOMMITTED (or the NOLOCK hint) can result in erroneous data, due to page splits. Examples abound of the more common issue with this isolation level, which occurs when reading a row that is being updated in another transaction and then rolled back. These so-called ‘dirty reads’ cannot happen with the default isolation level of READ COMMITTED.

However, it is still possible to alter data within a table that is being read with READ COMMITTED, so that the results of that transaction do not reflect the actual state of the data when the transaction has completed.
For this demonstration I’ll create a table that contains an ID column and an integer column, with a clustered index on the ‘ID’ column.

CREATE TABLE [dbo].[Test2](
	[ID]		INT  NOT NULL,
	[Test_Value]	INT NOT NULL,
PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] 

GO

For reasons that will become apparent I’ll start the ID from 10, incremented by 1 for each row and store 1 in ‘Test_Value’ against each row.
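
Something along these lines would set that up (a sketch; the cross joins of sys.all_objects are just a quick way to generate enough rows):

-- Populate with 5,000,000 rows: IDs start at 10 and increment by 1, Test_Value is always 1
INSERT INTO dbo.Test2 ([ID], [Test_Value])
SELECT TOP (5000000)
	9 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
	1
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b
CROSS JOIN sys.all_objects AS c;
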
With nothing else executing against that table there are 5,000,000 rows and the sum of ‘Test_Value’ is 5,000,000:

[Screenshot: Read_Committed_01]

The two queries within this transaction are looking at the same data.
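
The transaction itself only appears in the screenshot, but it is essentially two reads of the same table, along these lines:

BEGIN TRANSACTION;

-- First query: return every row
SELECT [ID], [Test_Value]
FROM dbo.Test2;

-- Second query: total the Test_Value column
SELECT SUM([Test_Value]) AS [Sum_Test_Value]
FROM dbo.Test2;

COMMIT TRANSACTION;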

Now I will repeat this query, and in a separate tab I will update the row with ID of 10 to move it to the end of the index, as soon as the first query has started:

UPDATE dbo.Test2
SET id = id + 6000000
WHERE id = 10;

The results for this show an extra row in the first SELECT:

[Screenshot: Read_Committed_03]

And if I scroll to the end of that result, the row that was shown with an ID of 10 is also shown with an ID of 6000010.

[Screenshot: Read_Committed_04]

So, the first SELECT query has looked at the same row, before and after it was moved. However, the second SELECT query, because it read the data after all of the updating had completed, has returned the correct information.

After resetting the data, it is also possible to ‘lose’ a row, when a row at the end of the index is moved to the start of the index during the first SELECT.

This time, the second query is:

UPDATE dbo.Test2
SET id = 9
WHERE id = 5000009;

[Screenshot: Read_Committed_05]

This time, the row with an ID of 9 is not shown in the results, because the update moved it from the end of the index to the beginning of the index after the SELECT had read the beginning of the index but before it had reached the end of the index.

[Screenshot: Read_Committed_06]

The sum shows the correct results because the update had completed before the second select started.

A Smaller Demonstration

In the examples shown there were five million rows, but what about smaller tables?

Consider a far smaller table, which is accessed by three processes. One is simply reading the whole table (with READ COMMITTED Isolation Level). At the same time a second process has locked a row, because it is updating it. In addition, a third process is updating data and in the process is moving the row to another position within the index.

Firstly – the data before it is interfered with shows one thousand rows that SUM to 500500:

[Screenshot: Read_Committed_08]
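
The reset script isn’t shown, but for one thousand rows to SUM to 500,500 the ‘Test_Value’ column presumably now mirrors the ID (1 to 1,000); one way to arrive at that state would be something like:

-- Reset to 1,000 rows where Test_Value = ID, so SUM(Test_Value) = 500,500
TRUNCATE TABLE dbo.Test2;

INSERT INTO dbo.Test2 ([ID], [Test_Value])
SELECT TOP (1000)
	ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
	ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;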

In a separate tab run the following command:

BEGIN TRANSACTION;

UPDATE dbo.Test2
SET Test_Value = 1000
WHERE id = 50;

Notice that there is no ‘COMMIT’ or ‘ROLLBACK’. This will lock row 50.

Now run the initial command, with the two SELECT statements. This will be blocked by the update, because Row 50 is locked. READ UNCOMMITTED would use this (as yet uncommitted) Test_Value and continue. READ COMMITTED will not, and is therefore blocked.

In a third window:

BEGIN TRANSACTION;

UPDATE dbo.Test2
SET ID = 1001
WHERE ID = 10;

COMMIT TRANSACTION

This moved Row 10, which had already been read, and placed it at the end of the index.

Now ROLLBACK the update to row 50; this will allow the blocked process to continue.
The SELECT command has now completed, but the data for row 10 now also appears as row 1001 – the same data shown twice:

[Screenshot: Read_Committed_09]

[Screenshot: Read_Committed_10]

Of course, if the SELECT was also running a SUM or suchlike, the results would be incorrect in each of these examples.

So what is happening?

READ COMMITTED obtains shared locks on the rows or pages as it requires them, but releases these locks as soon as it has read the locked data.

Therefore, it is possible to update, delete or insert rows where the query has already processed that area of the table, and to do the same in the part of the table that the query has yet to process. What is guaranteed with READ COMMITTED is that you can’t have dirty reads, but that’s pretty much it.

The fact that I placed the two queries within a Transaction also made no difference. The second query always saw the ‘true’ picture of the data because the manipulations had completed by that point.

I’ve frequently seen statements where people have claimed that, to get a ‘true’ picture of the data, the default READ COMMITTED isolation level should be used. This is clearly a misunderstanding of how the process works, and one that I had to do a little bit of work to comprehend too.

Getting More Data Than You Bargained For With NOLOCK

29 May

Generally, when somebody posts a query using the NOLOCK hint, there are a variety of warnings about dirty reads, phantom reads and double-reading (due to page splits).
The dirty read example is common and tends to be the main example used to demonstrate what to watch out for with NOLOCK, but I’ve never seen an example of double-reading. So, I thought I’d try and see how much of an issue such a thing can be. It certainly sounds plausible, but is it a ‘one in a million’ chance, or is it a Terry Pratchett ‘one in a million’ chance – where it is pretty much guaranteed to happen?

So, create a database for this testing and then create a table within:

CREATE TABLE [dbo].[Test1](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[Test_Description] [nvarchar](max) NULL,
PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]

GO

A very simple table – an Identity column and a text column. The important part is that I’ve specified a Fill Factor of 100 for the Clustered Index, meaning there is no space for expansion of the data in its current location. Therefore, if I increase the contents of the ‘Test_Description’ column it should force a page split, because the data won’t fit back where it was originally stored.

I then generate 500,000 rows with a simple integer id and the entry ‘TEST’ in the ‘Test_Description’ column.
In one window of SSMS I now set up a query to update this column on every row. For text-based test data I tend to use a text file of ‘War and Peace’, from Project Gutenberg. For this test I just use the first paragraph.
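
The two statements look roughly like this (a sketch; the cross join of sys.all_objects is just a quick row generator, and the placeholder below stands in for the actual paragraph of text):

-- Populate with 500,000 rows of 'TEST'
INSERT INTO dbo.Test1 ([Test_Description])
SELECT TOP (500000) N'TEST'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- The updating query: widen every row so it no longer fits on its original page
UPDATE dbo.Test1
SET [Test_Description] = N'<first paragraph of War and Peace>';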

In another window I set up a query to return every row from the same table, specifying the ‘READ UNCOMMITTED’ isolation level – essentially a plain read of the whole table, along these lines:
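
-- A sketch of the reading query: return every row, reading uncommitted data
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT [ID], [Test_Description]
FROM dbo.Test1;
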
Once prepared I run the two queries together:

[Screenshot: Nolock_02]

[Screenshot: Nolock_01]

The update has returned a count of 500,000, which matches the number of rows I placed into that table.
However, the select has returned a count of 586,078. Considerably higher than the number of rows within that table. In addition, it can be seen within the results that I’m retrieving a mixture of updated and non-updated rows.

Exporting the results to a csv file and then into Excel, I can sort the results and see a large number of duplicates (based on the ID column):

[Screenshot: Nolock_03]

The SELECT found the rows before they were updated, and then again after the page split had moved the data, retrieving those rows twice.
If I wait until the update has completed and then run the SELECT, I get 500,000 rows – all updated.

This was a relatively easy test to set up, and has been consistent in the fact that it has duplicated a large number of rows every time. Of course, specifying a fill factor of 100 would pretty much guarantee a page split somewhere, but in the day-to-day operations of most databases, pages will become full and splits will happen eventually. So the scale of the issue might not be as large as the example I’ve created, but the underlying issue may still be present.

Obviously, if this was something more important than a short piece of text (financial data, for example) then the results derived from this query would be wildly inaccurate.

SSIS Error Code “0x80070057” in Slowly Changing Dimension

28 Aug

SSIS can produce an inordinate number of error messages and numbers. Some of these can even be helpful and informative.
Generally accompanied by an Error Code, one of the most popular appears to be “0x80070057”, which can have a number of causes.

Recently I came across this particularly helpful code when creating a Slowly Changing Dimension:

Error: 0xC0202009 at ‘Task Name’ , Slowly Changing Dimension [14907]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80070057.
An OLE DB record is available. Source: “Microsoft SQL Server Native Client 10.0” Hresult: 0x80070057 Description: “The parameter is incorrect.”.
Error: 0xC0047022 at ‘Task Name’ , SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Slowly Changing Dimension” (14907) failed with error code 0xC0202009 while processing input “Slowly Changing Dimension Input” (14918). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
Error: 0xC02020C4 at ‘Task Name’ , ‘View Name’ [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at , SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “’View Name’” (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The behaviour was thus: run it for the first time and the process extracted data from a View and inserted all of the rows as new rows into the output. Run it a second time, when it would check for changes (because the output table now had data), and it would instantly fail with the above error message.

The first thing I noticed was the “The parameter is incorrect” message, which is strange because the Wizard created everything based on the details I fed into it – so it shouldn’t be anything I’ve done.
So, as I had created 5 other Slowly Changing Dimensions for other Views and Tables I decided to recreate this one from the start with a very ‘narrow’ table. The table concerned with this particular process was rather wide, certainly when compared to the others that had worked successfully.
A couple of fairly tedious hours later I had a failure when I added the last few columns back in (as ever, it’s always something at the end). These columns were varchar(8000) and there were six of them.
This is where the line in the error message “The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020” started to look more interesting.
Checking these columns showed that the 8000 size was a tad generous, based upon the data likely to be stored within them. Reducing these to 300 each removed the error and everything worked as expected.
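
For reference, right-sizing a column is just an ALTER TABLE per column, along these lines (the table and column names here are placeholders, and the same change may also need reflecting in the SCD data flow’s metadata):

-- Example only: shrink one of the over-sized columns (names are placeholders)
ALTER TABLE dbo.WideSourceTable
ALTER COLUMN [Wide_Column_1] VARCHAR(300) NULL;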

I have no idea what size buffer SSIS can create for this processing but the message implied that it wasn’t large enough for this particular table. Reducing the width of the table corrected this.
It may not be the solution for every instance of that message but in this case it worked and can be added to the list of possible solutions to one less-than-helpful SSIS Error Code.