A dbt seed is a static dataset—typically CSV files—managed by dbt and loaded into your data warehouse as tables during dbt runs.
A dbt seed is a collection of static data files—most commonly CSVs—placed in a project’s data/
directory and ingested into your target data warehouse as tables when you run dbt seed
(or include the +seed
selector in a dbt run
). These tables can then be referenced in downstream models just like any other table in the warehouse.
Seeding solves problems that arise when you need look-up tables, small reference datasets, or fixture data that rarely changes but must live inside the warehouse so your transforms remain SQL-only and source-controlled. Without seeds you would have to manage external scripts or ETL jobs for these tiny datasets—adding fragility and cognitive load. By version-controlling seeds alongside models, your entire transformation graph stays transparent, reproducible, and CI/CD-friendly.
Place CSV files in models/data/
or data/
. dbt automatically discovers them. Each file name becomes the table name (minus the extension) unless overridden.
Create or extend a dbt_project.yml
or nested .yml
with seeds:
blocks to set:
schema
– where the seed table lives (analytics
, seed
, etc.)database
– override database if necessaryquote_columns
– quote column names (handy for mixed-case)column_types
– force warehouse-specific typesfull_refresh
– overwrite instead of incremental insertdbt seed
creates or replaces tables using the warehouse adapter’s bulk-load path (COPY on Snowflake, INSERT on BigQuery, etc.). Seeds respect dbt run-operation
hooks and appear in dbt docs
like models, aiding lineage tracking.
column_types
to avoid adapter defaults that may differ across environments.varchar(256)
and silently truncates. Fix by setting column_types
or quote_columns: true
.external tables
or stage + model
.Assume we need a country code look-up table.
# project layout
my_dbt_project/
├── data/
│ └── iso_countries.csv
└── models/
└── my_model.sql
# .yml config inside models/iso_countries.yml
version: 2
seeds:
my_dbt_project:
iso_countries:
schema: reference
column_types:
alpha2: string
name: string
-- models/my_model.sql
SELECT
o.order_id,
c.name AS country_name
FROM {{ ref('orders') }} o
JOIN {{ ref('iso_countries') }} c
ON o.ship_country_code = c.alpha2;
Running dbt seed && dbt run
will:
iso_countries.csv
into reference.iso_countries
.my_model
, replacing {{ ref('iso_countries') }}
with the fully-qualified seed table.Some teams keep look-ups in application code or spreadsheets. Those approaches break lineage, introduce dual sources of truth, and are not SQL-native. Others run bespoke ETL jobs that add maintenance. dbt seeds centralize the data exactly where transformations occur, and maintain version history.
Galaxy’s modern SQL editor recognizes seeded tables once they exist in your warehouse’s information schema. Auto-complete surfaces reference.iso_countries
, and Galaxy’s AI copilot can explain or refactor queries that reference seeds. You still manage the seed files in Git and run dbt seed
via CI, but day-to-day exploration and collaboration happen inside Galaxy without copy-pasting SQL across tools.
dbt seeds offer a lightweight yet powerful mechanism for storing small, rarely changing datasets in the warehouse under version control. By embracing seeds—and the best practices outlined here—you can keep transformations purely in SQL, ensure reproducibility, and simplify the mental model for everyone who touches your analytics stack.
Static reference data is a critical dependency for many analytics pipelines. Without a standardized approach, teams resort to ad-hoc scripts or external spreadsheets—causing data drift and broken lineage. dbt seeds let you manage these datasets as code: version-controlled, testable, and loaded with one command. This tightens CI/CD loops, keeps transformation logic in SQL, and ensures that anyone running or reviewing a model can reproduce the full state of the database from the repository alone.
Run dbt seed
to create or replace all CSV files in the project’s data/
directory. Use selectors like --select
to limit which seeds run.
Yes. Treat a seed just like any model: {{ ref('my_seed_name') }}
. dbt resolves the correct schema and table name during compilation.
Once seeds are loaded, Galaxy’s SQL editor auto-completes and documents those seed tables. While Galaxy doesn’t run dbt seed
itself, it enhances exploration and collaboration on the resulting tables.
No. Seeds are best for small, infrequently changing reference data. Large or frequently updated datasets should be loaded via your primary ELT pipeline.