Design scalable, performant Snowflake schemas by following star-schema principles, proper data types, clustering, and naming conventions.
Efficient models lower storage, speed queries, and simplify maintenance. Snowflake’s micro-partitioning and automatic clustering reward well-designed schemas with less compute spend.
Use a star schema: a fact table such as Orders with foreign keys to dimension tables such as Customers, Products, and Calendar. Snowflake's columnar engine quickly scans narrow fact columns while dimensions add meaning.
Star schema wins for agility. Wide tables duplicate data, inflate storage, and force full-table scans. Stars isolate change, enabling incremental loads and clustering.
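The layout above can be sketched in DDL. All table and column names here are illustrative, not prescribed by the text; note that Snowflake records PRIMARY KEY and REFERENCES constraints as metadata but does not enforce them.

```sql
-- Illustrative star schema: three dimensions plus one narrow fact table.
CREATE TABLE DIM_CUSTOMER (
    CUSTOMER_ID   NUMBER(38,0) PRIMARY KEY,
    CUSTOMER_NAME VARCHAR,
    REGION        VARCHAR
);

CREATE TABLE DIM_PRODUCT (
    PRODUCT_ID   NUMBER(38,0) PRIMARY KEY,
    PRODUCT_NAME VARCHAR,
    CATEGORY     VARCHAR
);

CREATE TABLE DIM_CALENDAR (
    DATE_ID        NUMBER(38,0) PRIMARY KEY,
    CALENDAR_DATE  DATE,
    FISCAL_QUARTER VARCHAR
);

-- The fact table stays narrow: keys plus measures only.
CREATE TABLE FACT_ORDER (
    ORDER_ID     NUMBER(38,0) PRIMARY KEY,
    CUSTOMER_ID  NUMBER(38,0) REFERENCES DIM_CUSTOMER (CUSTOMER_ID),
    PRODUCT_ID   NUMBER(38,0) REFERENCES DIM_PRODUCT (PRODUCT_ID),
    DATE_ID      NUMBER(38,0) REFERENCES DIM_CALENDAR (DATE_ID),
    QUANTITY     NUMBER(9,0),
    TOTAL_AMOUNT NUMBER(12,2)
);
```

Because descriptive attributes live only in the dimensions, a change to a customer's region touches one dimension row rather than millions of fact rows.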
Choose the smallest type that fits. Use NUMBER(38,0) for surrogate keys, TIMESTAMP_NTZ for event time, and BOOLEAN for flags. Smaller types reduce micro-partition size and improve cache hits.
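A minimal sketch of those type choices on a hypothetical event table (the table and column names are assumptions for illustration):

```sql
-- Smallest type that fits, per column.
CREATE TABLE ORDER_EVENT (
    ORDER_EVENT_ID NUMBER(38,0),    -- surrogate key
    EVENT_TIME     TIMESTAMP_NTZ,   -- event time, no time-zone offset stored
    IS_CANCELLED   BOOLEAN,         -- flag, instead of a 'Y'/'N' VARCHAR
    STATUS_CODE    NUMBER(3,0)      -- small precision where 38 digits is overkill
);
```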
Cluster large fact tables (>1 B rows) on frequently filtered columns such as order_date or customer_id. Enable automatic clustering only when natural data skew causes constant re-ordering; otherwise run manual ALTER TABLE ... RECLUSTER jobs off-peak.
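A sketch of defining and monitoring a clustering key, assuming an illustrative FACT_ORDER table with ORDER_DATE and CUSTOMER_ID columns:

```sql
-- Define a clustering key on the frequently filtered columns.
ALTER TABLE FACT_ORDER CLUSTER BY (ORDER_DATE, CUSTOMER_ID);

-- Inspect clustering health before paying for re-ordering.
SELECT SYSTEM$CLUSTERING_INFORMATION('FACT_ORDER', '(ORDER_DATE, CUSTOMER_ID)');

-- Automatic clustering can be suspended and resumed to control credit spend.
ALTER TABLE FACT_ORDER SUSPEND RECLUSTER;
ALTER TABLE FACT_ORDER RESUME RECLUSTER;
```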
Use singular nouns, upper snake case, and suffix keys with _ID. Prefix surrogate keys with the table alias (e.g., CUSTOMER_ID). Consistency accelerates onboarding and auto-complete.
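The conventions applied to a small example table (column names are illustrative):

```sql
-- Singular noun, UPPER_SNAKE_CASE, key suffixed with _ID.
CREATE TABLE CUSTOMER (
    CUSTOMER_ID   NUMBER(38,0),    -- surrogate key: table name + _ID
    CUSTOMER_NAME VARCHAR,
    CREATED_AT    TIMESTAMP_NTZ
);
```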
Implement Type 2 slowly changing dimensions using ACTIVE_FLAG, EFFECTIVE_FROM, and EFFECTIVE_TO. Snowflake's zero-copy cloning lets you test SCD logic without extra storage.
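One way to sketch the Type 2 pattern with those columns. The surrogate key, business key, and sample values are assumptions for illustration:

```sql
-- Type 2 dimension: one row per version of a customer.
CREATE TABLE DIM_CUSTOMER (
    CUSTOMER_SK    NUMBER(38,0),    -- surrogate key, one per version
    CUSTOMER_ID    NUMBER(38,0),    -- natural/business key
    CUSTOMER_NAME  VARCHAR,
    ACTIVE_FLAG    BOOLEAN,
    EFFECTIVE_FROM TIMESTAMP_NTZ,
    EFFECTIVE_TO   TIMESTAMP_NTZ    -- NULL while the row is current
);

-- Close out the current version, then insert the new one.
UPDATE DIM_CUSTOMER
SET ACTIVE_FLAG = FALSE, EFFECTIVE_TO = CURRENT_TIMESTAMP()
WHERE CUSTOMER_ID = 42 AND ACTIVE_FLAG;

INSERT INTO DIM_CUSTOMER VALUES
  (1001, 42, 'New Name', TRUE, CURRENT_TIMESTAMP(), NULL);

-- Rehearse the SCD logic on a zero-copy clone: no extra storage until rows diverge.
CREATE TABLE DIM_CUSTOMER_TEST CLONE DIM_CUSTOMER;
```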
Apply column-level masking policies on email and total_amount. Tag PII columns and grant SELECT only to authorized roles. Data modeling and governance go hand in hand.
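A sketch of a masking policy plus a PII tag. The policy name, role name, and tag name are assumptions; adapt them to your role hierarchy:

```sql
-- Unmask email only for a privileged role; everyone else sees a placeholder.
CREATE MASKING POLICY EMAIL_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_PII') THEN VAL
    ELSE '***MASKED***'
  END;

ALTER TABLE FACT_ORDER MODIFY COLUMN EMAIL
  SET MASKING POLICY EMAIL_MASK;

-- Tag the column so PII can be inventoried and audited.
CREATE TAG PII_TAG;
ALTER TABLE FACT_ORDER MODIFY COLUMN EMAIL SET TAG PII_TAG = 'email';
```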
Use version-controlled DDL scripts and zero-downtime views. Create new columns, back-fill them with UPDATE, then point BI tools at the new view. Snowflake streams and tasks can automate the back-fills.
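The add-column, back-fill, re-point-view sequence might look like this; the column, view, warehouse, and task names are illustrative, and the default back-fill value is an assumption:

```sql
-- 1. Add the new column (metadata-only, instant).
ALTER TABLE FACT_ORDER ADD COLUMN ORDER_CHANNEL VARCHAR;

-- 2. Back-fill existing rows.
UPDATE FACT_ORDER SET ORDER_CHANNEL = 'UNKNOWN' WHERE ORDER_CHANNEL IS NULL;

-- 3. BI tools read through a view, so the switch is zero-downtime.
CREATE OR REPLACE VIEW RPT_ORDER AS
SELECT ORDER_ID, CUSTOMER_ID, ORDER_CHANNEL, TOTAL_AMOUNT
FROM FACT_ORDER;

-- A stream + task pair can keep the back-fill running incrementally.
CREATE STREAM FACT_ORDER_STREAM ON TABLE FACT_ORDER;

CREATE TASK BACKFILL_ORDER_CHANNEL
  WAREHOUSE = ETL_WH
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('FACT_ORDER_STREAM')
AS
  UPDATE FACT_ORDER SET ORDER_CHANNEL = 'UNKNOWN' WHERE ORDER_CHANNEL IS NULL;
```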
No manual partitions are needed. Micro-partitioning is automatic; your only decision is whether to define clustering keys on very large tables.
Snowflake can host both transactional-style and analytical workloads, but use separate compute warehouses and schemas: keep OLTP-style work in narrow tables and analytics in star schemas.
Zero-copy clones let you test schema changes on TB-scale data instantly without extra storage costs.