Notes on SQL

Random articles from a puzzled DBA

Whilst reading Itzik Ben Gan’s excellent book on T-SQL I came across the COMPRESS and DECOMPRESS functions. These could be quite useful for some situations that I have, so decided to have a closer look.

COMPRESS and DECOMPRESS were introduced in SQL Server 2016. The COMPRESS function uses the GZIP algorithm to compress the data that is supplied to it (character or binary). However, the DECOMPRESS function will only work against binary data, so the target from COMPRESS should be a binary column.

You can COMPRESS into an NVARCHAR column but the attempt to DECOMPRESS will produce the following response:
Msg 8116, Level 16, State 1, Line 2
Argument data type nvarchar(max) is invalid for argument 1 of Decompress function.

Test Environment
In the following example I create two tables – one for the uncompressed data, using an NVARCHAR column and the other table for the compressed data, with a VARBINARY column. I’ve deliberately kept the tables as simple as possible for the purposes of this demo.

CREATE TABLE dbo.WarAndPeace_Text
	TextData	NVARCHAR(max) null


CREATE TABLE dbo.WarAndPeace_Compressed
	TextData	VARBINARY(max) null


For test data I import the Project Gutenberg copy of “War and Peace” and import it into the ‘dbo.WarAndPeace_Text’ table. This provides me with just over 66,000 rows of data.

War and Peace imported as source data

Now I use the COMPRESS function to load the data into ‘dbo.WarAndPeace_Compressed’. Because I know both tables were empty at the start I’m letting SQL take care of the ID column, happy that they will match across both tables.

INSERT INTO dbo.WarAndPeace_Compressed ( TextData )
FROM dbo.WarAndPeace_Text

So first of all, look at the size of the two tables:

The size of the uncompressed and compressed data

The compressed data table is 1,552KB smaller.

How does look when comparing the individual rows?

Original row datalength and compressed datalength

This is where it gets a little more interesting – the row with the id of 8 is actually a longer length in the compressed format. This is possible with compression algorithms, so don’t always assume your data will shrink in length. However, the overall impact is a reduction in data length.

Now we can check the resultant data by using the DECOMPRESS functio, matching the ID column against the uncompressed table. As an aside, I’ll show what the data looks like in the compressed form:

  SELECT t.TextData AS [Original Text], 
		 CAST(DECOMPRESS(c.TextData) AS NVARCHAR(max)) AS [Decompressed Text], 
		 c.TextData AS [Compressed Text]
  FROM dbo.WarAndPeace_Text AS t
  INNER JOIN dbo.WarAndPeace_Compressed AS c ON =
  WHERE = 8;

Using the ID of the row data that actually looked longer in compressed format, the data has been returned as expected:

The text, uncompressed and compressed data

Pick another ID, to show that wasn’t a fluke:

Showing another row results

These are two useful functions, particularly if you are archiving large amounts of text data. However, test that you do get sufficient storage savings, as compressing data doesn’t always make it smaller.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: