dbt seed

What is dbt seed?

dbt seed is a dbt command that loads static CSV files from your project’s seeds/ directory (data/ before dbt 1.0) into your data warehouse as tables. Small reference datasets stay version-controlled with the rest of your project and are easy to query.


Description

What Is dbt seed?

dbt seed loads CSV files stored in a project’s seeds/ directory (configurable with seed-paths; data/ in dbt versions before 1.0) into your warehouse as tables. One command gives you version-controlled reference or lookup data.

Why Use dbt seed Instead of SQL COPY?

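dbt seed creates the table for you, infers column types, and reloads the data on every run. Because the CSV lives in your project, the data shares the project’s Git history. Manual COPY commands live outside version control and need repetitive boilerplate for each table. When a CSV’s columns change, rerun with dbt seed --full-refresh so the table is dropped and recreated.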

How Does dbt seed Work Under the Hood?

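During dbt seed, dbt reads each CSV and infers column types, unless you override them with column_types. It quotes column names when quote_columns is enabled and generates CREATE TABLE and batched INSERT statements. On later runs it truncates and reloads the existing table. dbt also records a checksum of each seed file in manifest.json. State comparison (for example --select state:modified together with --state) uses that checksum to detect edited CSVs.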
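The steps above can be sketched in a few lines of Python. This is a simplified illustration, not dbt’s code path: dbt uses the agate library for type inference and adapter macros for the SQL it emits. The function names and type labels (infer_type, seed_statements, "text", and so on) are hypothetical. The sketch assumes a rectangular CSV with a header row and no blank lines.

```python
import csv
import hashlib
import io
from datetime import date


def infer_type(values):
    """Return the narrowest hypothetical SQL type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"

    def all_parse(parse):
        try:
            for v in non_empty:
                parse(v)
        except ValueError:
            return False
        return True

    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "numeric"
    if all_parse(date.fromisoformat):
        return "date"
    return "text"


def sql_literal(value, col_type):
    """Render one CSV cell as a SQL literal; empty cells become NULL."""
    if value == "":
        return "NULL"
    if col_type == "boolean":
        return value.upper()
    if col_type in ("integer", "numeric"):
        return value
    if col_type == "date":
        return f"DATE '{value}'"
    return "'" + value.replace("'", "''") + "'"  # escape embedded quotes


def seed_statements(table, csv_text):
    """Build CREATE/INSERT statements plus a content checksum for one CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    types = [infer_type([row[i] for row in body]) for i in range(len(header))]
    columns = ", ".join(f"{name} {t}" for name, t in zip(header, types))
    create = f"CREATE TABLE {table} ({columns});"
    insert = ""
    if body:
        tuples = ",\n  ".join(
            "(" + ", ".join(sql_literal(v, t) for v, t in zip(row, types)) + ")"
            for row in body
        )
        insert = f"INSERT INTO {table} ({', '.join(header)}) VALUES\n  {tuples};"
    # Hash of the raw file contents: any edit to the CSV changes it.
    checksum = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    return create, insert, checksum


csv_text = "id,plan,signup_date\n1,pro,2024-01-05\n2,free,2024-02-10\n"
create, insert, checksum = seed_statements("users", csv_text)
print(create)  # CREATE TABLE users (id integer, plan text, signup_date date);
print(insert)
```

The checksum only changes when the file’s bytes change, which is why a stored hash is enough to tell whether a seed needs reloading.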

How to Configure dbt seed?

Add settings in dbt_project.yml under the seeds: key to set database, schema, delimiter, quote_columns, and column_types. Per-seed overrides go under the seed’s name, which is the CSV filename without the .csv extension.

YAML Example

```yaml
seeds:
  my_project:
    +schema: staging
    +quote_columns: false
    users:
      +column_types:
        id: integer
        plan: varchar(10)
```
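A config like the one above is resolved from general to specific. Project-wide settings apply to every seed, and a seed-level block wins for that one seed. The small Python sketch below models this with a shallow merge. The helper names are hypothetical, and dbt has its own merge rules for some dict-valued configs that this does not reproduce. The second function follows dbt’s default generate_schema_name macro. Under that macro, +schema: staging yields a schema named <target_schema>_staging rather than plain staging, unless the project overrides the macro. The target schema "analytics" is an assumed example value.

```python
def resolve_seed_config(*levels):
    """Merge config dicts from most general (project) to most specific (one seed).

    Hypothetical helper: a shallow merge where later levels win.
    """
    merged = {}
    for level in levels:
        merged.update(level)
    return merged


def default_schema_name(target_schema, custom_schema=None):
    """Follow the logic of dbt's built-in generate_schema_name macro."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


project_level = {"schema": "staging", "quote_columns": False}
users_level = {"column_types": {"id": "integer", "plan": "varchar(10)"}}
config = resolve_seed_config(project_level, users_level)

# "analytics" stands in for the schema set in your profile's target.
print(default_schema_name("analytics", config["schema"]))  # analytics_staging
```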

How Do You Run dbt seed Selectively?

Use dbt seed --select my_seed or dbt seed --exclude large_seed to control which CSVs load, saving build time in CI pipelines.
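In a CI pipeline, the selective runs above are often launched from a script. The Python sketch below builds the dbt seed argument list from the documented --select, --exclude and --full-refresh flags, then runs it with subprocess. It assumes the dbt CLI is installed and on PATH. The function names and the seed names my_seed and large_seed are placeholders.

```python
import shutil
import subprocess


def dbt_seed_args(select=(), exclude=(), full_refresh=False):
    """Build the argument list for a `dbt seed` invocation."""
    args = ["dbt", "seed"]
    if select:
        args += ["--select", *select]
    if exclude:
        args += ["--exclude", *exclude]
    if full_refresh:
        # Drops and recreates seed tables, e.g. after a CSV's columns change.
        args.append("--full-refresh")
    return args


def run_dbt_seed(**kwargs):
    """Run dbt seed from a CI script; requires the dbt CLI on PATH."""
    if shutil.which("dbt") is None:
        raise RuntimeError("dbt executable not found on PATH")
    return subprocess.run(dbt_seed_args(**kwargs), check=True)


print(" ".join(dbt_seed_args(select=["my_seed"])))      # dbt seed --select my_seed
print(" ".join(dbt_seed_args(exclude=["large_seed"])))  # dbt seed --exclude large_seed
```

Passing the arguments as a list instead of one shell string avoids quoting problems when seed names come from CI variables.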

What Are Best Practices for dbt seed?

Keep files under ~100k rows, store only static or slowly changing data, and set explicit column types. Add unique and not_null tests on key columns. A row-count check needs a custom test or a package such as dbt_expectations.
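dbt’s unique and not_null tests run against the loaded table. A lightweight pre-commit check can catch the same problems in the CSV before it is ever seeded. The Python sketch below is hypothetical, and the function name and messages are illustrative. It flags empty or duplicate keys and files over the ~100k-row guideline, and it complements rather than replaces dbt tests.

```python
import csv
import io

MAX_ROWS = 100_000  # the ~100k-row guideline from the best practices above


def check_seed(csv_text, key):
    """Return a list of problems found in a seed CSV; empty means it passed.

    Assumes no blank lines and no quoted field spanning multiple lines,
    so reported line numbers match the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or key not in reader.fieldnames:
        return [f"missing key column {key!r}"]

    problems, seen, count = [], set(), 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        count += 1
        value = (row[key] or "").strip()
        if not value:
            problems.append(f"line {line_no}: empty {key}")
        elif value in seen:
            problems.append(f"line {line_no}: duplicate {key} {value!r}")
        else:
            seen.add(value)
    if count > MAX_ROWS:
        problems.append(f"{count} rows exceeds the {MAX_ROWS}-row limit")
    return problems


sample = "id,plan\n1,pro\n1,free\n,basic\n"
for problem in check_seed(sample, "id"):
    print(problem)
```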

When Should You Avoid dbt seed?

Avoid dbt seed for large fact tables or frequently updated data; use proper ELT pipelines or warehouse-native staging instead.

How Does dbt seed Integrate with Galaxy?

In Galaxy’s SQL editor, seeded tables appear instantly in the sidebar metadata, letting you autocomplete against them and share validated seed queries inside Collections.

Why dbt seed is important

Version-controlled reference data eliminates hidden CSV uploads, keeps dev, CI, and prod in sync, and speeds testing by guaranteeing deterministic lookup tables.

dbt seed Example Usage


dbt seed --select marketing_campaigns

dbt seed Syntax

Common Mistakes

Ignoring column types lets dbt guess each column’s type from the data, and a wrong guess can leave dates or IDs stored as plain strings. Fix by defining +column_types so integers stay numeric and dates stay date-typed.
Seeding huge tables slows CI drastically. Fix by moving large datasets to an ELT job and keeping seeds under 100k rows.
Forgetting unique or not_null tests lets accidental edits slip through silently. Add tests to catch changed keys or duplicated rows in seeded files.
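To see why a declared type is safer than a guess, consider a code column with leading zeros. The toy Python sketch below is not dbt code. naive_cast stands in for any inference rule that turns digit-only strings into integers, and whether dbt reads a particular column that way depends on the data. A declared type, the role +column_types plays in dbt_project.yml, removes the guess. The column name zip_code and both function names are hypothetical.

```python
def naive_cast(value):
    """Hypothetical inference rule: anything that parses as an int becomes one."""
    try:
        return int(value)
    except ValueError:
        return value


def cast_with_override(column, value, column_types):
    """Use a declared type when present, otherwise fall back to inference."""
    declared = column_types.get(column)
    if declared is None:
        return naive_cast(value)
    if declared == "integer":
        return int(value)
    if declared == "text" or declared.startswith("varchar"):
        return value
    raise ValueError(f"unsupported declared type {declared!r}")


print(naive_cast("02134"))  # 2134: the leading zero is gone
print(cast_with_override("zip_code", "02134", {"zip_code": "varchar(5)"}))  # 02134
```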