Whilst reading an article on Quora I saw mention of TABLESAMPLE. I had never heard of it and upon investigation it piqued my interest.
Introduced in SQL Server 2015 TABLESAMPLE is a clause for a query which can be used to select a pseudo-random number of rows from a table, based upon a percentage or a number of rows and an optional seed number – if a repeatable result is required.
It can only be used against local tables. Any attempt to use this clause against derived tables, views, table-value functions and suchlike, will result in the error message Msg 494, Level 16, State 1, Line xx The TABLESAMPLE clause can only be used with local tables.
For the purposes of these examples I’m using SQL2014 and the AdventureWorks2014 database.
A full SELECT against the Person.Person table produces 19,972 rows. The first example will SELECT 10 percent of the rows from this table:
Listing 1: 10 percent selection
USE AdventureWorks2014; GO SELECT * FROM Person.Person TABLESAMPLE (10 PERCENT);
In the first execution, this produces 1,997 rows.
Figure 1: First execution for 10 percent
Now repeat the same statement and it returns 1,796 rows, some of which are different from the rows extracted in the previous execution.
Figure 2: Listing 1 repeated
TABLESAMPLE will return an approximate number of rows, based upon your criteria. Indeed, if the table is very small and occupies just one page you will find that you may get every row or no rows; it bases the calculations upon the number of pages that the table occupies.
To show this more clearly, specify ROWS instead of PERCENT. You would probably expect it to return the number of rows that you specified:
Listing 2: 20 Row selection
USE AdventureWorks2014; GO SELECT * FROM Person.Person TABLESAMPLE (20 ROWS);
Figure 3: 20 Rows requested – 40 returned
Now repeat the execution in Listing 2:
Figure 4: 20 Rows requested – 15 returned
Although it might not return the number of rows expected, it can be made to return the same results each time, with a seed parameter. The ‘seed’ is an arbitrary number supplied by yourself. As long as the same number is supplied the same results will be received (as long as the source data doesn’t change).
Listing 3: Seed specified via REPEATABLE
USE AdventureWorks2014; GO SELECT * FROM Person.Person TABLESAMPLE (10 PERCENT) REPEATABLE (123);
Figure 5: Results from REPEATABLE (123)
Repeating the execution of the code in Listing 3 produces the same results each time.
Multi-table Queries with TABLESAMPLE
You can specify TABLESAMPLE against one or several tables in joins but the results from the joins can be unpredictable, unless REPEATABLE is used with the same seed.
Listing 4: Two tables with TABLESAMPLE
SELECT p.BusinessEntityID , p.FirstName , p.MiddleName , p.LastName , pp.PhoneNumber FROM Person.Person AS p TABLESAMPLE (10 PERCENT) REPEATABLE (123) LEFT OUTER JOIN Person.PersonPhone AS pp TABLESAMPLE (10 PERCENT) ON pp.BusinessEntityID = p.BusinessEntityID ORDER BY p.BusinessEntityID DESC;
Because the PersonPhone table does not specify REPEATABLE it does not return consistent results with each execution.
The SYSTEM option
In the ANSI standard the sampling method can be specified. In SQL Server only one option is catered for and is the default. So ‘TABLESAMPLE SYSTEM (100 rows);’ and ‘TABLESAMPLE (100 rows);’ are identical.
Limiting Result Sets by Using TABLESAMPLE