TABLESAMPLE – Basic Examples

24 Jan

Whilst reading an article on Quora I saw mention of TABLESAMPLE. I had never heard of it and upon investigation it piqued my interest.

Background
Introduced in SQL Server 2015 TABLESAMPLE is a clause for a query which can be used to select a pseudo-random number of rows from a table, based upon a percentage or a number of rows and an optional seed number – if a repeatable result is required.
It can only be used against local tables. Any attempt to use this clause against derived tables, views, table-value functions and suchlike, will result in the error message Msg 494, Level 16, State 1, Line xx The TABLESAMPLE clause can only be used with local tables.

Examples
For the purposes of these examples I’m using SQL2014 and the AdventureWorks2014 database.
A full SELECT against the Person.Person table produces 19,972 rows. The first example will SELECT 10 percent of the rows from this table:

Listing 1: 10 percent selection

USE AdventureWorks2014;
GO

SELECT * FROM Person.Person TABLESAMPLE (10 PERCENT);

In the first execution, this produces 1,997 rows.
Figure 1: First execution for 10 percent

First result with 10%

Now repeat the same statement and it returns 1,796 rows, some of which are different from the rows extracted in the previous execution.
Figure 2: Listing 1 repeated

A different number of rows

TABLESAMPLE will return an approximate number of rows, based upon your criteria. Indeed, if the table is very small and occupies just one page you will find that you may get every row or no rows; it bases the calculations upon the number of pages that the table occupies.

To show this more clearly, specify ROWS instead of PERCENT. You would probably expect it to return the number of rows that you specified:
Listing 2: 20 Row selection

USE AdventureWorks2014;
GO

SELECT * FROM Person.Person TABLESAMPLE (20 ROWS);

Figure 3: 20 Rows requested – 40 returned

Requested 20 – selected 40

Now repeat the execution in Listing 2:
Figure 4: 20 Rows requested – 15 returned

15 returned this time

Repeatable results
Although it might not return the number of rows expected, it can be made to return the same results each time, with a seed parameter. The ‘seed’ is an arbitrary number supplied by yourself. As long as the same number is supplied the same results will be received (as long as the source data doesn’t change).

Listing 3: Seed specified via REPEATABLE

USE AdventureWorks2014;
GO

SELECT * FROM Person.Person TABLESAMPLE (10 PERCENT) REPEATABLE (123);

Figure 5: Results from REPEATABLE (123)

Repeatable results

Repeating the execution of the code in Listing 3 produces the same results each time.

Multi-table Queries with TABLESAMPLE
You can specify TABLESAMPLE against one or several tables in joins but the results from the joins can be unpredictable, unless REPEATABLE is used with the same seed.
Listing 4: Two tables with TABLESAMPLE

SELECT p.BusinessEntityID ,
	   p.FirstName ,
	   p.MiddleName ,
	   p.LastName ,
	   pp.PhoneNumber
FROM   Person.Person AS p TABLESAMPLE (10 PERCENT) REPEATABLE (123)
	   LEFT OUTER JOIN Person.PersonPhone AS pp TABLESAMPLE (10 PERCENT)
		   ON pp.BusinessEntityID = p.BusinessEntityID 
ORDER BY p.BusinessEntityID DESC;

Because the PersonPhone table does not specify REPEATABLE it does not return consistent results with each execution.

The SYSTEM option
In the ANSI standard the sampling method can be specified. In SQL Server only one option is catered for and is the default. So ‘TABLESAMPLE SYSTEM (100 rows);’ and ‘TABLESAMPLE (100 rows);’ are identical.

References
Limiting Result Sets by Using TABLESAMPLE

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: