A snowflake schema normalizes dimension tables into sub-dimensions to save space, improve data integrity, and simplify maintenance.
Pick a snowflake schema when storage cost, strict data integrity, or rapidly changing dimensions outweigh the need for fast, simple joins. Normalizing dimensions removes redundancy and centralizes updates: renaming a product category, for example, touches one row instead of every product that uses it.
List business entities first: Customers, Orders, Products. Break repeating attributes into their own tables: customer addresses, product categories, date parts. Each sub-dimension gets its own surrogate key.
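As a sketch of that split (table and column names are illustrative, and SQLite is used only so the snippet runs anywhere): the repeating category attributes of a products dimension move into a product_categories sub-dimension, each table carrying its own surrogate key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Sub-dimension: repeating category attributes live once, here.
CREATE TABLE product_categories (
    category_id   INTEGER PRIMARY KEY,  -- surrogate key
    category_name TEXT NOT NULL,
    department    TEXT NOT NULL
);

-- The dimension references the sub-dimension instead of repeating
-- category_name/department on every product row.
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES product_categories(category_id)
);
""")
```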
Build the schema in four steps (a DDL sketch follows the list):
1. Create the central fact table (e.g., OrderItems).
2. Define first-level dimensions (Orders, Products, Customers).
3. Normalize each dimension into secondary tables with one-to-many or one-to-one relationships.
4. Add primary keys, foreign keys, and indexes.
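A minimal end-to-end sketch of those four steps, again with illustrative names; note that SQLite needs the PRAGMA shown before it enforces foreign keys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite skips FK enforcement by default
con.executescript("""
-- Step 3: sub-dimensions normalized out of the first-level dimensions
CREATE TABLE product_categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE customer_addresses (
    address_id INTEGER PRIMARY KEY,
    city       TEXT NOT NULL,
    country    TEXT NOT NULL
);

-- Step 2: first-level dimensions
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_code TEXT NOT NULL UNIQUE,  -- natural business ID kept unique
    address_id    INTEGER REFERENCES customer_addresses(address_id)
);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER REFERENCES product_categories(category_id)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL,
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Step 1: the central fact table
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES orders(order_id),
    product_id    INTEGER NOT NULL REFERENCES products(product_id),
    quantity      INTEGER NOT NULL,
    unit_price    REAL NOT NULL
);

-- Step 4: index every FK column on the fact table
CREATE INDEX idx_order_items_order   ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
""")
```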
Name things consistently: one table-name style throughout (the DDL sketches above use plural snake_case), surrogate integer PKs (product_id), and FK constraint names like fk_orders_customer. Consistent naming speeds query writing and troubleshooting.
Index every FK column referenced in the fact table. Add composite indexes for frequent filter columns (order_date, customer_id). Avoid over-indexing small lookup tables.
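A sketch of those indexing rules against the illustrative orders table from above. The composite index assumes a frequent date-range-per-customer filter; that workload is an assumption, not a given.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL
);

-- One index per FK column that joins are driven through.
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Composite index for the frequent (order_date, customer_id) filter;
-- lead with the column you filter on most often.
CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);
""")
```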
Use explicit JOINs and select only the columns you need. Materialize common join chains into views so analysts never rewrite them. Apply WHERE filters at the dimension levels that eliminate the most rows, so the planner prunes early, before reaching the fact table.
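A runnable sketch of that pattern; the view name v_orders_enriched and the sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT,
    customer_id INTEGER REFERENCES customers(customer_id)
);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (10, '2024-03-01', 1);

-- A reusable view so analysts never rewrite the join chain.
CREATE VIEW v_orders_enriched AS
SELECT o.order_id, o.order_date, c.name AS customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;
""")

# Explicit JOIN hidden behind the view; only the needed columns are pulled.
for row in con.execute(
    "SELECT order_id, customer_name FROM v_orders_enriched "
    "WHERE order_date >= ?", ("2024-01-01",)
):
    print(row)  # (10, 'Acme')
```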
If query latency becomes critical or joins strain the planner, selectively denormalize hot paths (often the calendar or product-category dimensions) into a flattened table or materialized view.
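One way the flattening can look. SQLite has no materialized views, so this sketch builds a plain table with CREATE TABLE ... AS; in PostgreSQL the same step would be CREATE MATERIALIZED VIEW plus a scheduled REFRESH. All names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES product_categories(category_id)
);
INSERT INTO product_categories VALUES (1, 'Widgets');
INSERT INTO products VALUES (100, 'Widget XL', 1);

-- Flatten the hot product -> category path into one table
-- so reads skip the join entirely.
CREATE TABLE products_flat AS
SELECT p.product_id, p.product_name, c.category_name
FROM products p
JOIN product_categories c ON c.category_id = p.category_id;
""")

print(con.execute("SELECT * FROM products_flat").fetchall())
# [(100, 'Widget XL', 'Widgets')]
```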
Keep surrogate keys as smallints where counts allow, document every relationship, keep DDL under version control, and test query plans after each schema change.
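Checking the plan can be as small as this; SQLite's EXPLAIN QUERY PLAN is shown, while other engines spell it EXPLAIN or EXPLAIN ANALYZE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

# After each schema change, confirm the planner still uses the index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
print(plan)  # expect a 'SEARCH ... USING INDEX idx_orders_customer' row
```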
The extra joins add overhead, but proper indexing and selective denormalization keep latency low. In most OLAP workloads the trade-off is acceptable.
Hybrid designs are common: many teams keep high-traffic dimensions denormalized, star-style, while less-used dimensions stay snowflaked for maintainability.
Always use surrogate integer keys for joins; store natural business IDs as unique columns to prevent duplicates.
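A minimal sketch of that pairing, with an invented customer_code as the natural ID; the UNIQUE constraint is what rejects the duplicate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,  -- surrogate key used in joins
    customer_code TEXT NOT NULL UNIQUE  -- natural business ID, deduplicated
);
""")
con.execute("INSERT INTO customers (customer_code) VALUES ('CUST-001')")
try:
    con.execute("INSERT INTO customers (customer_code) VALUES ('CUST-001')")
except sqlite3.IntegrityError as e:
    print("duplicate natural key rejected:", e)
```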