How to Validate Data in Snowflake

Galaxy Glossary

How do I use VALIDATION_MODE to validate data files in Snowflake before loading?

VALIDATE lets you test staged files with COPY INTO before loading them, returning rows that would fail.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Why validate data before loading?

Running COPY INTO with VALIDATION_MODE shows row-level errors without touching destination tables, helping you fix bad files faster.

How does VALIDATION_MODE work in COPY INTO?

Set VALIDATION_MODE to 'RETURN_ERRORS', 'RETURN_ALL_ERRORS', or 'RETURN_LIMIT n'. Snowflake scans the staged files and returns error details such as line number, column position, and reason.

Which error types can I catch?

Typical issues include missing columns, datatype mismatches, quotation problems, and un-escaped delimiters—exactly the problems that break ecommerce order feeds.

Can I preview successful rows?

No.VALIDATION_MODE only returns failing rows. Use a temporary table plus COPY INTO for sampling good data if needed.

How to fix errors revealed by VALIDATE?

Edit source CSVs, adjust COPY options (FIELD_OPTIONALLY_ENCLOSED_BY, SKIP_HEADER, NULL_IF), or cast columns explicitly in a SELECT stage transform.

Best practices for production pipelines?

1. Automate VALIDATE in CI jobs.
2. Fail fast on error counts > 0.
3. Version raw files in S3/GCS.
4. Log VALIDATION_MODE='RETURN_ERRORS' output for auditing.

.

Why How to Validate Data in Snowflake is important

How to Validate Data in Snowflake Example Usage


-- Validate today’s order feed without loading
COPY INTO Orders
FROM @orders_stage/2023/10/10/
FILE_FORMAT = (
  TYPE = 'CSV', FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
VALIDATION_MODE = 'RETURN_ERRORS';
-- Result sample
-- | line | character | file              | error                                       |
-- | 15   | 23        | orders_1010.csv.gz | Missing required column TOTAL_AMOUNT        |

How to Validate Data in Snowflake Syntax


COPY INTO <table_name>
FROM <@stage_or_external_location>
FILE_FORMAT = ( TYPE = 'CSV' [ , FIELD_OPTIONALLY_ENCLOSED_BY = '"' ] )
VALIDATION_MODE = { 'RETURN_ERRORS' | 'RETURN_ALL_ERRORS' | 'RETURN_LIMIT <integer>' }
ON_ERROR = 'CONTINUE' -- ignored during validation

-- Ecommerce example
COPY INTO Orders
FROM @orders_stage/current/
FILE_FORMAT = (
  TYPE = 'CSV',
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
VALIDATION_MODE = 'RETURN_ERRORS';

Common Mistakes

Frequently Asked Questions (FAQs)

Does VALIDATION_MODE count toward warehouse credits?

Yes, COPY INTO with validation still scans files, so it consumes compute but avoids table writes.

Can I validate gzip or parquet files?

Absolutely. Snowflake automatically decompresses gzip files and understands parquet. Just point to the stage and set FILE_FORMAT accordingly.

Is VALIDATION_MODE available in Tasks & Pipes?

Yes. Wrap the COPY INTO statement in a Task or use it during pipe debugging to ensure stream safety before continuous loads.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.