Automatically extracting table, column, constraint, and comment metadata from PostgreSQL’s system catalogs to create a living reference of your schema.
Automatic Data Dictionary Generation in PostgreSQL
A practical guide to producing a self-updating catalog of your PostgreSQL schema with pure SQL and lightweight automation.
A data dictionary is a centralized catalog that documents every table, column, data type, constraint, default value, and business description in a database. Instead of living in a wiki or spreadsheet, the information is sourced directly from your database metadata so it never drifts from reality.
PostgreSQL keeps rich schema metadata in two primary locations:
information_schema
— ANSI-standard views (portable across databases).pg_catalog
— PostgreSQL-specific system tables that expose extras like column comments, indexes, and statistics.Combining these sources lets you build a comprehensive dictionary.
SELECT
ns.nspname AS schema_name,
cls.relname AS table_name,
col.ordinal_position,
col.column_name,
col.data_type,
col.is_nullable,
col.column_default,
pgd.description AS column_comment
FROM information_schema.columns col
JOIN pg_catalog.pg_class cls ON cls.relname = col.table_name
JOIN pg_catalog.pg_namespace ns ON ns.oid = cls.relnamespace
LEFT JOIN pg_catalog.pg_description pgd ON (
pgd.objoid = cls.oid AND pgd.objsubid = col.ordinal_position)
WHERE ns.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schema_name, table_name, ordinal_position;
This statement returns one row per column and pulls in human-written comments stored via COMMENT ON COLUMN
. By piping its results into Markdown, HTML, or a spreadsheet you already have a basic dictionary.
SELECT
tc.table_schema,
tc.table_name,
tc.constraint_type,
kcu.column_name,
ccu.table_schema AS referenced_schema,
ccu.table_name AS referenced_table,
ccu.column_name AS referenced_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type IN ('PRIMARY KEY','FOREIGN KEY');
SELECT
schemaname,
tablename,
indexname,
indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog','information_schema');
PostgreSQL exposes the COMMENT
command that attaches free-text descriptions to any object. Encourage your engineers to annotate tables and columns at creation time:
COMMENT ON TABLE users IS 'Application end-users. Source: Auth0 webhook';
COMMENT ON COLUMN users.created_at IS 'UTC timestamp when the account was provisioned';
Because our dictionary query pulls pg_description
, these remarks appear automatically.
.sql
file, dbt model, or bash psql -c
).psql --csv
or \\copy.COMMENT
statements like schema code so reviews ensure quality.information_schema
Why it’s wrong: You lose PostgreSQL-specific gems like comments and index definitions.
Fix: Join with pg_catalog
as shown above.
Why it’s wrong: The dictionary drifts, misleading users.
Fix: Hook regeneration into CI or migration runners.
Why it’s wrong: Although metadata queries are cheap, they still touch system catalogs and can create locks under heavy DDL churn.
Fix: Run on a read-replica or off-hours schedule.
With fewer than 100 lines of SQL you can programmatically extract every structural and descriptive detail of your PostgreSQL database, export it in human-friendly form, and refresh it automatically. The result is a living data dictionary that scales with your organization and guards against knowledge loss.
Galaxy’s desktop SQL editor displays table and column comments directly in its sidebar and autocomplete pop-ups, so engineers see the data dictionary as they write queries. The AI Copilot can even suggest missing COMMENT
statements or generate full documentation queries similar to those above. By saving the dictionary extraction SQL in a shared Collection, teams can endorse it and rerun with one click—eliminating outdated wikis forever.
COMMENT
coverage to all critical schemas.As databases evolve, undocumented schemas become a major source of technical debt. Developers waste hours deciphering column meanings, analysts misinterpret metrics, and auditors flag missing lineage. Automating a data dictionary ensures consistent, up-to-date documentation that accelerates onboarding, enhances data quality, and supports compliance without manual upkeep.
Run the core query in this article against your database, export the results, and encourage engineers to fill in missing COMMENT
statements.
Metadata queries are lightweight, but schedule them off-hours or on a read replica in very large clusters to avoid potential catalog locks.
Yes. Galaxy shows column comments in its sidebar and autocomplete tooltips, and its AI Copilot can generate or update COMMENT
statements and dictionary extraction queries. Saving the SQL in a shared Collection lets teams rerun and endorse it easily.
Commit the extraction script to version control and run it automatically after each migration via CI/CD or an orchestrator like Airflow or dbt Cloud.