SQL Keywords

SQL TABLESAMPLE

What is SQL TABLESAMPLE?

TABLESAMPLE returns a pseudorandom subset of rows from a base table or view.
Sign up to get up to date news on SQL keywords
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

Compatible dialects for SQL TABLESAMPLE: Supported: PostgreSQL, SQL Server, Amazon Redshift, Azure Synapse, Google BigQuery (SYSTEM only). Not available natively in MySQL, MariaDB, Oracle, or SQLite.

SQL TABLESAMPLE Full Explanation

TABLESAMPLE lets you scan only a percentage or fixed number of rows instead of the entire table, reducing I/O and giving a quick approximation of large data sets. The sampling happens at the storage level, so rows are chosen without evaluating predicates first. Because of this, the subset is not ordered and may not be perfectly uniform; distribution depends on the underlying storage engine and the chosen sampling method (BERNOULLI or SYSTEM). The optional REPEATABLE clause ensures the same sample when the same seed is supplied. TABLESAMPLE is applied immediately after the FROM clause and before WHERE, GROUP BY, or JOIN processing. It is non-deterministic unless REPEATABLE is used and does not guarantee inclusion of specific rows, so use it only for exploratory analysis, statistics, or approximate aggregates—never for transactional queries that require completeness.

SQL TABLESAMPLE Syntax

SELECT column_list
FROM table_name
TABLESAMPLE [BERNOULLI | SYSTEM] (sample_size)
[REPEATABLE (seed)];

SQL TABLESAMPLE Parameters

  • BERNOULLI (keyword) - Row-level sampling; each row is checked independently.
  • SYSTEM (keyword) - Block-level sampling; entire data pages are selected, faster but less uniform.
  • sample_size (numeric) - Percentage (0.000001-100) or number of rows* depending on dialect.
  • REPEATABLE (keyword) - Optional clause that makes sampling deterministic.
  • seed (integer) - Seed value for the pseudo-random generator.

Example Queries Using SQL TABLESAMPLE

-- Take a 10 percent sample using block sampling
SELECT *
FROM sales TABLESAMPLE SYSTEM (10);

-- Row-level 5 percent sample, repeatable
SELECT customer_id, amount
FROM sales TABLESAMPLE BERNOULLI (5) REPEATABLE (123);

-- SQL Server: sample first 1000 rows approximately
SELECT *
FROM sales TABLESAMPLE (1000 ROWS);

-- Combine with aggregation for quick estimate
SELECT COUNT(*) AS approx_orders
FROM orders TABLESAMPLE SYSTEM (1);

Expected Output Using SQL TABLESAMPLE

  • Each query returns only the sampled subset instead of the full result set
  • Row count and content vary on every execution unless REPEATABLE is provided

Use Cases with SQL TABLESAMPLE

  • Quickly explore very large tables without full scans
  • Generate rough statistics when exact accuracy is unnecessary
  • Test query logic on smaller data during development
  • Feed approximate analytics or dashboards that tolerate sampling error
  • Reduce cost on pay-per-byte cloud warehouses

Common Mistakes with SQL TABLESAMPLE

  • Assuming sample is perfectly uniform across all columns
  • Using TABLESAMPLE in production reports that require complete data
  • Forgetting REPEATABLE, leading to non-reproducible results
  • Expecting ORDER BY inside TABLESAMPLE; ordering happens after sampling
  • Confusing BERNOULLI and SYSTEM performance characteristics

Related Topics

SAMPLE (Oracle), LIMIT, FETCH FIRST, TOP, ORDER BY RANDOM, APPROXIMATE aggregation

First Introduced In

SQL:2003 standard (feature T491)

Frequently Asked Questions

What is the difference between BERNOULLI and SYSTEM?

BERNOULLI samples rows individually, providing a more uniform distribution but with higher overhead. SYSTEM samples entire data pages, offering speed at the cost of potential bias.

Can I sample a fixed number of rows?

In SQL Server you can specify TABLESAMPLE (n ROWS). In PostgreSQL and BigQuery, only percentage sampling is supported; combine TABLESAMPLE with LIMIT for an approximate row count.

How do I get the same sample every time?

Add REPEATABLE (seed) after the TABLESAMPLE clause. Using the same seed recreates the identical subset on each execution.

Does TABLESAMPLE guarantee performance improvement?

Usually yes for very large tables, but on small tables the overhead of initiating the sampling engine may negate benefits.

Sign up to get up to date news on SQL keywords
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.
Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo

Check out other commonly used SQL Keywords!