How to VALIDATE Data in Amazon Redshift

How do I validate data quality in Amazon Redshift tables?

The VALIDATE command scans a Redshift table for rows that contain data-type or encoding errors so you can fix bad data before loading, exporting, or migrating.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

What problem does VALIDATE solve?

Data migrations and reporting jobs fail when even a single row contains malformed UTF-8, oversized VARCHARs, or numeric overflows. VALIDATE lets you surface those bad rows early—without moving data—so fixes are quick and contained.

How does VALIDATE work internally?

The command reads every block (or a user-defined sample) and checks column values against the table’s data types and encodings. Invalid rows are written to STL_VALIDATION_ERRS and returned to the client as a result set.

Which options matter most?

PERCENT lets you sample large tables. BATCHSIZE controls the number of rows each slice scans before returning. ACCEPTANYDATE treats out-of-range dates as NULL rather than errors—handy for legacy data.

When should I run VALIDATE?

Run it after a COPY, before an UNLOAD, or prior to moving a table into production. Use a daily job on critical fact tables to catch creeping corruption.

What are best practices?

Start with PERCENT 1 to gauge data health quickly. Increase gradually if errors appear. Always capture the result set into a staging table so data engineers can review and patch problematic rows.

How do I fix rows flagged by VALIDATE?

Export the primary keys returned, correct the source data (e.g., trim strings or cast numerics), then DELETE the bad rows and INSERT the clean versions. Re-run VALIDATE to confirm all issues are gone.

Why How to VALIDATE Data in Amazon Redshift is important

How to VALIDATE Data in Amazon Redshift Example Usage


-- Spot-check 2% of recent orders for corrupt monetary values
VALIDATE Orders
    COLUMN total_amount
    PERCENT 2;

-- Full scan of Customers to catch bad email strings
VALIDATE Customers;

How to VALIDATE Data in Amazon Redshift Syntax


VALIDATE [TABLE] table_name
    [COLUMN column_name]
    [PERCENT sample_percent]
    [BATCHSIZE batch_size]
    [ACCEPTANYDATE];

-- Example with ecommerce tables
VALIDATE Orders PERCENT 5;
VALIDATE Orders COLUMN total_amount;
VALIDATE Customers BATCHSIZE 50000 ACCEPTANYDATE;

Common Mistakes

Running VALIDATE expecting it to repair data. VALIDATE only reports bad rows; you must correct or delete them manually, then reload clean data.
Scanning huge tables without sampling. A full scan can lock disks and run for hours—use PERCENT first, then scale up if issues are found.