SQL TABLESAMPLE

What is SQL TABLESAMPLE?

TABLESAMPLE returns a pseudorandom subset of rows from a base table or view.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Compatible dialects for SQL TABLESAMPLE: Supported: PostgreSQL, SQL Server, Amazon Redshift, Azure Synapse, Google BigQuery (SYSTEM only). Not available natively in MySQL, MariaDB, Oracle, or SQLite.

SQL TABLESAMPLE Full Explanation

TABLESAMPLE lets you scan only a percentage or fixed number of rows instead of the entire table, reducing I/O and giving a quick approximation of large data sets. The sampling happens at the storage level, so rows are chosen without evaluating predicates first. Because of this, the subset is not ordered and may not be perfectly uniform; distribution depends on the underlying storage engine and the chosen sampling method (BERNOULLI or SYSTEM). The optional REPEATABLE clause ensures the same sample when the same seed is supplied. TABLESAMPLE is applied immediately after the FROM clause and before WHERE, GROUP BY, or JOIN processing. It is non-deterministic unless REPEATABLE is used and does not guarantee inclusion of specific rows, so use it only for exploratory analysis, statistics, or approximate aggregates—never for transactional queries that require completeness.

SQL TABLESAMPLE Syntax

SELECT column_list
FROM table_name
TABLESAMPLE [BERNOULLI | SYSTEM] (sample_size)
[REPEATABLE (seed)];

SQL TABLESAMPLE Parameters

BERNOULLI (keyword) - Row-level sampling; each row is checked independently.
SYSTEM (keyword) - Block-level sampling; entire data pages are selected, faster but less uniform.
sample_size (numeric) - Percentage (0.000001-100) or number of rows* depending on dialect.
REPEATABLE (keyword) - Optional clause that makes sampling deterministic.
seed (integer) - Seed value for the pseudo-random generator.

Example Queries Using SQL TABLESAMPLE

-- Take a 10 percent sample using block sampling
SELECT *
FROM sales TABLESAMPLE SYSTEM (10);

-- Row-level 5 percent sample, repeatable
SELECT customer_id, amount
FROM sales TABLESAMPLE BERNOULLI (5) REPEATABLE (123);

-- SQL Server: sample first 1000 rows approximately
SELECT *
FROM sales TABLESAMPLE (1000 ROWS);

-- Combine with aggregation for quick estimate
SELECT COUNT(*) AS approx_orders
FROM orders TABLESAMPLE SYSTEM (1);

Expected Output Using SQL TABLESAMPLE

Each query returns only the sampled subset instead of the full result set
Row count and content vary on every execution unless REPEATABLE is provided

Use Cases with SQL TABLESAMPLE

Quickly explore very large tables without full scans
Generate rough statistics when exact accuracy is unnecessary
Test query logic on smaller data during development
Feed approximate analytics or dashboards that tolerate sampling error
Reduce cost on pay-per-byte cloud warehouses

Common Mistakes with SQL TABLESAMPLE

Assuming sample is perfectly uniform across all columns
Using TABLESAMPLE in production reports that require complete data
Forgetting REPEATABLE, leading to non-reproducible results
Expecting ORDER BY inside TABLESAMPLE; ordering happens after sampling
Confusing BERNOULLI and SYSTEM performance characteristics

First Introduced In

SQL:2003 standard (feature T491)

Frequently Asked Questions

What is the difference between BERNOULLI and SYSTEM?

BERNOULLI samples rows individually, providing a more uniform distribution but with higher overhead. SYSTEM samples entire data pages, offering speed at the cost of potential bias.

Can I sample a fixed number of rows?

In SQL Server you can specify TABLESAMPLE (n ROWS). In PostgreSQL and BigQuery, only percentage sampling is supported; combine TABLESAMPLE with LIMIT for an approximate row count.

How do I get the same sample every time?

Add REPEATABLE (seed) after the TABLESAMPLE clause. Using the same seed recreates the identical subset on each execution.

Does TABLESAMPLE guarantee performance improvement?

Usually yes for very large tables, but on small tables the overhead of initiating the sampling engine may negate benefits.