How to Choose Between Star and Snowflake Schema in MariaDB

Galaxy Glossary

Which is better: star schema or snowflake schema in MariaDB?

Star and snowflake schemas are design patterns for organizing data-warehouse tables to balance query speed, storage, and maintainability.

Sign up for the latest in SQL knowledge from the Galaxy Team!

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Description

What is a star schema in MariaDB?

A star schema stores one central fact table (e.g., OrderItems) surrounded by denormalized dimension tables such as Customers, Products, and Dates. Each dimension is a single table, giving joins a simple spoke-like layout.

What is a snowflake schema in MariaDB?

A snowflake schema further normalizes those dimensions. For example, a Customers dimension may split into Customers, Countries, and Regions tables. The extra layers reduce duplication but add more joins.

When should I choose a star schema?

Use a star schema when BI queries must run fast, the data volumes fit on modern disks, and dimensions are relatively small. Fewer joins mean simpler SQL and quicker aggregation.

When is a snowflake schema better?

Pick a snowflake schema when dimension tables are large or change frequently. Normalization keeps updates consistent and saves storage, at the cost of slightly slower queries.

How do I convert an ecommerce model to a star schema?

Create a wide fact table OrderItems_Fact with foreign keys to denormalized Customers_Dim, Products_Dim, and Dates_Dim. Copy descriptive columns (name, price) into the dimension tables so analysts avoid additional joins.

How do I convert the same model to a snowflake schema?

Normalize the dimension tables: split Customers_Dim into Customers_Dim, Cities_Dim, and Countries_Dim. Store only surrogate keys in child tables. Update ETL pipelines to load these tables in the correct order.

Which schema performs faster in MariaDB?

Star schemas usually win for read-heavy analytics because MariaDB’s optimizer handles fewer joins. Snowflake schemas may match performance if you add covering indexes on foreign keys and use partitioning on the fact table.

Can I mix both designs?

Yes. Many teams denormalize critical dimensions (star) and normalize infrequently queried ones (snowflake). Evaluate each dimension on its update rate, size, and query frequency.

Why How to Choose Between Star and Snowflake Schema in MariaDB is important

How to Choose Between Star and Snowflake Schema in MariaDB Example Usage


-- Star schema: one hop joins
SELECT c.name,
       p.name   AS product,
       SUM(oi.quantity) AS total_units
FROM   OrderItems_Fact oi
JOIN   Customers_Dim  c ON c.customer_id = oi.customer_id
JOIN   Products_Dim   p ON p.product_id  = oi.product_id
GROUP  BY c.name, p.name;

-- Snowflake schema: extra joins to normalized tables
SELECT c.name,
       country.country,
       SUM(oi.total_amount) AS revenue
FROM   OrderItems_Fact oi
JOIN   Customers_Dim  c       ON c.customer_id = oi.customer_id
JOIN   Cities_Dim     city    ON city.city_id  = c.city_id
JOIN   Countries_Dim  country ON country.country_id = city.country_id
GROUP  BY c.name, country.country;

How to Choose Between Star and Snowflake Schema in MariaDB Syntax


-- Star schema example (denormalized dimensions)
CREATE TABLE Customers_Dim (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    email       VARCHAR(100),
    created_at  DATE,
    city        VARCHAR(50),
    country     VARCHAR(50)
);

CREATE TABLE Products_Dim (
    product_id  INT PRIMARY KEY,
    name        VARCHAR(100),
    price       DECIMAL(10,2),
    stock       INT
);

CREATE TABLE OrderItems_Fact (
    order_item_id INT PRIMARY KEY,
    order_date    DATE,
    customer_id   INT,
    product_id    INT,
    quantity      INT,
    total_amount  DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES Customers_Dim(customer_id),
    FOREIGN KEY (product_id)  REFERENCES Products_Dim(product_id)
);

-- Snowflake schema example (normalized Customers)
CREATE TABLE Countries_Dim (
    country_id INT PRIMARY KEY,
    country    VARCHAR(50)
);

CREATE TABLE Cities_Dim (
    city_id    INT PRIMARY KEY,
    city       VARCHAR(50),
    country_id INT,
    FOREIGN KEY (country_id) REFERENCES Countries_Dim(country_id)
);

CREATE TABLE Customers_Dim (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    email       VARCHAR(100),
    created_at  DATE,
    city_id     INT,
    FOREIGN KEY (city_id) REFERENCES Cities_Dim(city_id)
);

-- Fact table stays the same
CREATE TABLE OrderItems_Fact (...);

Common Mistakes

Mistake: Copying every descriptive column into the fact table. This bloats storage and defeats dimension tables. Fix: keep only surrogate keys and measures in the fact table.
Mistake: Over-normalizing small dimensions into a snowflake. Extra joins slow queries without saving meaningful space. Fix: Denormalize low-cardinality dimensions.

Frequently Asked Questions (FAQs)

Does MariaDB optimize star joins automatically?

Yes. The optimizer can reorder joins and push down predicates, especially when indexes exist on foreign keys.

Do I need surrogate keys?

Surrogate integer keys speed joins, decouple ETL from source system IDs, and simplify slowly changing dimensions.

Want to learn about other SQL terms?

dbt tests

dbt tests are declarative assertions in dbt that automatically validate data quality during model builds.

dbt utils

dbt utils is a community-maintained dbt package that adds time-saving utility macros for analytics engineers, such as generating surrogate keys, safely unioning models, and automating date spines.

dbt Tutorial Glossary: Build, Test & Deploy SQL Models

dbt is an open-source framework that lets data teams transform raw warehouse tables into tested, documented, production-ready datasets using version-controlled SQL.