How to Apply Data Modeling Best Practices in Snowflake

Galaxy Glossary

How do I model data in Snowflake for performance and scalability?

Design scalable, performant Snowflake schemas by following star-schema principles, proper data types, clustering, and naming conventions.


Description

Why optimize data models in Snowflake?

Efficient models lower storage costs, speed up queries, and simplify maintenance. Snowflake’s micro-partitioning and automatic clustering reward well-designed schemas with lower compute spend.

Which schema suits ecommerce analytics?

Use a star schema: a central Orders fact table with foreign keys to Customers, Products, and Calendar dimension tables. Snowflake’s columnar engine scans the narrow fact columns quickly, while the dimensions add descriptive context.
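
A date (calendar) dimension is not shown in the Syntax section below; a minimal sketch, assuming an arbitrary 2020 start date and roughly ten years of rows, could look like this:

-- Calendar dimension (illustrative): one row per date
CREATE OR REPLACE TABLE DIM_CALENDAR AS
SELECT DATEADD(day, ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1, DATE '2020-01-01') AS DATE_ID,
       YEAR(DATE_ID)    AS YEAR_NUM,
       MONTH(DATE_ID)   AS MONTH_NUM,
       DAYNAME(DATE_ID) AS DAY_NAME
FROM TABLE(GENERATOR(ROWCOUNT => 3653));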

Star schema vs. wide tables?

Star schemas win for agility. Wide, denormalized tables duplicate dimension attributes on every row, inflate storage, and make updates expensive. Star schemas isolate change, enabling incremental loads and effective clustering.

How to define tables with correct data types?

Choose the smallest type that fits. Use NUMBER(38,0) for surrogate keys, TIMESTAMP_NTZ for event times, and BOOLEAN for flags. Compact types shrink micro-partitions and improve cache hits.
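
For illustration, a hypothetical FACT_EVENTS table could apply those choices as follows:

-- Illustrative type choices (FACT_EVENTS is a hypothetical table)
CREATE OR REPLACE TABLE FACT_EVENTS (
  EVENT_ID    NUMBER(38,0),   -- surrogate key
  EVENT_TS    TIMESTAMP_NTZ,  -- event time without time-zone baggage
  IS_BILLABLE BOOLEAN,        -- flag instead of a 'Y'/'N' string
  AMOUNT      NUMBER(12,2)    -- exact decimal for money, not FLOAT
);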

When should I cluster or use automatic clustering?

Define a clustering key on large fact tables (roughly >1 B rows) using frequently filtered columns such as order_date or customer_id. Snowflake’s Automatic Clustering service then maintains the ordering in the background; manual ALTER TABLE ... RECLUSTER is deprecated. Because reclustering consumes credits, add keys only when queries actually filter or join on those columns, and monitor the cost on high-churn tables.
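
As a sketch, a clustering key can be added to and inspected on the FACT_ORDERS table defined in the Syntax section:

-- Add (or change) the clustering key on an existing fact table
ALTER TABLE FACT_ORDERS CLUSTER BY (ORDER_DATE);

-- Check how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('FACT_ORDERS', '(ORDER_DATE)');

-- Pause or resume the Automatic Clustering service if credit usage spikes
ALTER TABLE FACT_ORDERS SUSPEND RECLUSTER;
ALTER TABLE FACT_ORDERS RESUME RECLUSTER;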

What naming conventions help?

Use singular nouns, upper snake case, and suffix key columns with _ID. Prefix surrogate keys with the entity name (e.g., CUSTOMER_ID). Consistency accelerates onboarding and autocomplete.

How to handle slowly changing dimensions (SCD)?

Implement Type 2 with ACTIVE_FLAG, EFFECTIVE_FROM, and EFFECTIVE_TO columns: close out the old row and insert a new version whenever a tracked attribute changes. Snowflake’s zero-copy cloning lets you test SCD logic without extra storage.
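
A minimal Type 2 sketch, assuming a staging table STG_CUSTOMERS of changed rows and an SCD variant of the customer dimension named DIM_CUSTOMERS_SCD:

-- 1. Close out the current version of any customer whose email changed
UPDATE DIM_CUSTOMERS_SCD
SET    EFFECTIVE_TO = CURRENT_TIMESTAMP(),
       ACTIVE_FLAG  = FALSE
FROM   STG_CUSTOMERS s
WHERE  DIM_CUSTOMERS_SCD.CUSTOMER_ID = s.CUSTOMER_ID
  AND  DIM_CUSTOMERS_SCD.ACTIVE_FLAG = TRUE
  AND  DIM_CUSTOMERS_SCD.EMAIL <> s.EMAIL;

-- 2. Insert the new version as the active row
--    (changed customers were just closed out above, so they no longer have an active version)
INSERT INTO DIM_CUSTOMERS_SCD (CUSTOMER_ID, NAME, EMAIL, EFFECTIVE_FROM, EFFECTIVE_TO, ACTIVE_FLAG)
SELECT s.CUSTOMER_ID, s.NAME, s.EMAIL, CURRENT_TIMESTAMP(), NULL, TRUE
FROM   STG_CUSTOMERS s
LEFT JOIN DIM_CUSTOMERS_SCD d
       ON d.CUSTOMER_ID = s.CUSTOMER_ID
      AND d.ACTIVE_FLAG = TRUE
WHERE  d.CUSTOMER_ID IS NULL;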

How to secure sensitive columns?

Apply column-level masking policies to columns such as EMAIL and TOTAL_AMOUNT. Tag PII columns and grant SELECT only to authorized roles. Data modeling and governance go hand in hand.
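
A minimal sketch for the EMAIL column of DIM_CUSTOMERS (the ANALYST_FULL role and PII tag names are assumptions; masking policies and tags require Enterprise Edition or higher):

-- Mask email for everyone except an authorized role
CREATE OR REPLACE MASKING POLICY EMAIL_MASK AS (VAL STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('ANALYST_FULL') THEN VAL ELSE '*** MASKED ***' END;

ALTER TABLE DIM_CUSTOMERS MODIFY COLUMN EMAIL SET MASKING POLICY EMAIL_MASK;

-- Tag the column as PII for discoverability and governance
CREATE TAG IF NOT EXISTS PII;
ALTER TABLE DIM_CUSTOMERS MODIFY COLUMN EMAIL SET TAG PII = 'email';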

Can I evolve the model safely?

Use version-controlled DDL scripts and views to decouple consumers from physical tables. Add new columns, back-fill them with UPDATE, then point BI tools at the updated view. Snowflake streams and tasks can automate ongoing back-fills.
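
A minimal stream-plus-task sketch, assuming a staging table STG_CUSTOMERS and a warehouse named TRANSFORM_WH (deduplication of multiple changes per customer is omitted for brevity):

-- Capture changes landing in staging
CREATE OR REPLACE STREAM STG_CUSTOMERS_STREAM ON TABLE STG_CUSTOMERS;

-- Merge those changes into the dimension on a schedule
CREATE OR REPLACE TASK LOAD_DIM_CUSTOMERS
  WAREHOUSE = TRANSFORM_WH
  SCHEDULE  = '60 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('STG_CUSTOMERS_STREAM')
AS
  MERGE INTO DIM_CUSTOMERS USING STG_CUSTOMERS_STREAM s
    ON DIM_CUSTOMERS.CUSTOMER_ID = s.CUSTOMER_ID
  WHEN MATCHED THEN UPDATE SET NAME = s.NAME, EMAIL = s.EMAIL
  WHEN NOT MATCHED THEN INSERT (CUSTOMER_ID, NAME, EMAIL, CREATED_AT, ACTIVE_FLAG)
    VALUES (s.CUSTOMER_ID, s.NAME, s.EMAIL, CURRENT_TIMESTAMP(), TRUE);

ALTER TASK LOAD_DIM_CUSTOMERS RESUME;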

Why How to Apply Data Modeling Best Practices in Snowflake is important

How to Apply Data Modeling Best Practices in Snowflake Example Usage


-- Total revenue by day and product over the last 30 days
SELECT  o.order_date,
        p.name      AS product_name,
        SUM(oi.quantity * p.price) AS revenue
FROM    fact_orders            o
JOIN    bridge_order_items     oi ON o.order_id = oi.order_id
JOIN    dim_products           p  ON p.product_id = oi.product_id
WHERE   o.order_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY 1,2
ORDER BY 1 DESC, 3 DESC;

How to Apply Data Modeling Best Practices in Snowflake Syntax


-- Customers dimension
CREATE OR REPLACE TABLE DIM_CUSTOMERS (
  CUSTOMER_ID NUMBER(38,0) PRIMARY KEY,
  NAME        STRING,
  EMAIL       STRING,
  CREATED_AT  TIMESTAMP_NTZ,
  ACTIVE_FLAG BOOLEAN DEFAULT TRUE
);

-- Products dimension
CREATE OR REPLACE TABLE DIM_PRODUCTS (
  PRODUCT_ID   NUMBER(38,0) PRIMARY KEY,
  NAME         STRING,
  PRICE        NUMBER(10,2),
  STOCK        NUMBER(38,0),
  ACTIVE_FLAG  BOOLEAN DEFAULT TRUE
);

-- Orders fact (clustered on order_date)
CREATE OR REPLACE TABLE FACT_ORDERS (
  ORDER_ID     NUMBER(38,0) PRIMARY KEY,
  CUSTOMER_ID  NUMBER(38,0) REFERENCES DIM_CUSTOMERS,
  ORDER_DATE   DATE,
  TOTAL_AMOUNT NUMBER(12,2)
)
CLUSTER BY (ORDER_DATE);

-- Order items bridge
CREATE OR REPLACE TABLE BRIDGE_ORDER_ITEMS (
  ORDER_ITEM_ID NUMBER(38,0) PRIMARY KEY,
  ORDER_ID      NUMBER(38,0) REFERENCES FACT_ORDERS,
  PRODUCT_ID    NUMBER(38,0) REFERENCES DIM_PRODUCTS,
  QUANTITY      NUMBER(38,0)
);

Common Mistakes

Frequently Asked Questions (FAQs)

Does Snowflake require partitioning?

No. Micro-partitioning is automatic; you only decide on clustering keys for very large tables.

Can I mix OLTP and analytics models?

Yes, but use separate virtual warehouses and schemas. Keep OLTP-style workloads in narrow tables and analytics in star schemas.
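
For example, analytics queries can run on their own warehouse (the name and settings below are illustrative):

CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
  WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;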

How do zero-copy clones help modeling?

They let you test schema changes on TB-scale data instantly without extra storage costs.
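
For example (DIM_CUSTOMERS_DEV is an assumed clone name):

-- Clone the dimension, experiment, then drop the clone
CREATE TABLE DIM_CUSTOMERS_DEV CLONE DIM_CUSTOMERS;
-- ...test schema or SCD changes against DIM_CUSTOMERS_DEV...
DROP TABLE DIM_CUSTOMERS_DEV;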

