The VALIDATE command scans a Redshift table for rows that contain data-type or encoding errors so you can fix bad data before loading, exporting, or migrating.
Data migrations and reporting jobs fail when even a single row contains malformed UTF-8, oversized VARCHARs, or numeric overflows. VALIDATE lets you surface those bad rows early, without moving data, so fixes are quick and contained.
The command reads every block (or a user-defined sample) and checks column values against the table’s data types and encodings. Invalid rows are written to STL_VALIDATION_ERRS and returned to the client as a result set.
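Based on that behavior, the simplest invocation takes just a table name (sales_fact is an illustrative name):

    -- Scan every block of sales_fact and return any invalid rows
    VALIDATE sales_fact;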
PERCENT lets you sample large tables. BATCHSIZE controls the number of rows each slice scans before returning. ACCEPTANYDATE treats out-of-range dates as NULL rather than errors—handy for legacy data.
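A sketch combining the three options; the exact clause order and values here are assumptions for illustration:

    -- Sample 10% of blocks, scan 50,000 rows per slice per batch,
    -- and treat out-of-range dates as NULL rather than errors
    VALIDATE sales_fact
      PERCENT 10
      BATCHSIZE 50000
      ACCEPTANYDATE;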
Run it after a COPY, before an UNLOAD, or prior to moving a table into production. Use a daily job on critical fact tables to catch creeping corruption.
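For example, a load pipeline might pair the two commands like this (the S3 path and IAM role are placeholders):

    -- Load staged files, then validate before downstream jobs read the table
    COPY sales_fact
    FROM 's3://my-bucket/sales/2024/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV;

    VALIDATE sales_fact;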
Start with PERCENT 1 to gauge data health quickly, and increase the sample gradually if errors appear. Always capture the result set into a staging table so data engineers can review and patch problematic rows.
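Because invalid rows are also written to STL_VALIDATION_ERRS, one way to capture a run is to snapshot that table; the column name tablename is an assumption modeled on other STL tables:

    -- Preserve this run's errors in a staging table for review
    CREATE TABLE validate_errs_staging AS
    SELECT *
    FROM stl_validation_errs
    WHERE tablename = 'sales_fact';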
Export the primary keys returned, correct the source data (e.g., trim strings or cast numerics), then DELETE the bad rows and INSERT the clean versions. Re-run VALIDATE to confirm all issues are gone.
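One shape that cycle could take, assuming the flagged primary keys were exported to bad_sale_ids and corrected rows staged in sales_fact_fixed (all table and column names hypothetical):

    -- Remove the rows that failed validation
    DELETE FROM sales_fact
    WHERE sale_id IN (SELECT sale_id FROM bad_sale_ids);

    -- Re-insert the corrected versions
    INSERT INTO sales_fact
    SELECT * FROM sales_fact_fixed;

    -- Confirm the table is now clean
    VALIDATE sales_fact;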
No. VALIDATE is a read-only operation and does not block writes, but heavy scans can impact cluster I/O.
Yes. Add COLUMN column_name to focus on the field most likely to contain bad data.
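For example, assuming COLUMN composes with sampling the way the options above suggest:

    -- Check only the field most prone to bad data, on a 5% sample
    VALIDATE sales_fact
      COLUMN customer_email
      PERCENT 5;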
Redshift writes detailed error info to STL_VALIDATION_ERRS, including the table name, column, row ID, and error text.
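A review query might look like the following; the column names are assumptions patterned after STL_LOAD_ERRORS and may differ:

    -- Inspect recent validation errors for one table
    SELECT tablename, colname, row_id, err_reason
    FROM stl_validation_errs
    WHERE tablename = 'sales_fact'
    ORDER BY row_id
    LIMIT 100;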