How to VALIDATE Data in Amazon Redshift

Galaxy Glossary

How do I validate data quality in Amazon Redshift tables?

The VALIDATE command scans a Redshift table for rows that contain data-type or encoding errors so you can fix bad data before loading, exporting, or migrating.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

Description

Table of Contents

What problem does VALIDATE solve?

Data migrations and reporting jobs fail when even a single row contains malformed UTF-8, oversized VARCHARs, or numeric overflows. VALIDATE lets you surface those bad rows early—without moving data—so fixes are quick and contained.

How does VALIDATE work internally?

The command reads every block (or a user-defined sample) and checks column values against the table’s data types and encodings. Invalid rows are written to STL_VALIDATION_ERRS and returned to the client as a result set.

Which options matter most?

PERCENT lets you sample large tables. BATCHSIZE controls the number of rows each slice scans before returning. ACCEPTANYDATE treats out-of-range dates as NULL rather than errors—handy for legacy data.

When should I run VALIDATE?

Run it after a COPY, before an UNLOAD, or prior to moving a table into production. Use a daily job on critical fact tables to catch creeping corruption.

What are best practices?

Start with PERCENT 1 to gauge data health quickly. Increase gradually if errors appear. Always capture the result set into a staging table so data engineers can review and patch problematic rows.

How do I fix rows flagged by VALIDATE?

Export the primary keys returned, correct the source data (e.g., trim strings or cast numerics), then DELETE the bad rows and INSERT the clean versions. Re-run VALIDATE to confirm all issues are gone.

Why How to VALIDATE Data in Amazon Redshift is important

How to VALIDATE Data in Amazon Redshift Example Usage


-- Spot-check 2% of recent orders for corrupt monetary values
VALIDATE Orders
    COLUMN total_amount
    PERCENT 2;

-- Full scan of Customers to catch bad email strings
VALIDATE Customers;

How to VALIDATE Data in Amazon Redshift Syntax


VALIDATE [TABLE] table_name
    [COLUMN column_name]
    [PERCENT sample_percent]
    [BATCHSIZE batch_size]
    [ACCEPTANYDATE];

-- Example with ecommerce tables
VALIDATE Orders PERCENT 5;
VALIDATE Orders COLUMN total_amount;
VALIDATE Customers BATCHSIZE 50000 ACCEPTANYDATE;

Common Mistakes

Frequently Asked Questions (FAQs)

Does VALIDATE lock my table?

No. VALIDATE is a read-only operation and does not block writes, but heavy scans can impact cluster I/O.

Can I validate only one column?

Yes. Add COLUMN column_name to focus on the field most likely to contain bad data.

Where are the errors stored?

Redshift writes detailed error info to STL_VALIDATION_ERRS, including the table name, column, row ID, and error text.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.