Designing a Star Schema for a SaaS Subscription Model

How do I design a star schema for a SaaS subscription model?

A star schema for SaaS subscriptions is a dimensional data model that organizes subscription events and related business entities into a central fact table surrounded by descriptive dimension tables, enabling fast, intuitive analytics of recurring-revenue metrics.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

The Big Picture

Software-as-a-Service (SaaS) businesses live and die by metrics like Monthly Recurring Revenue (MRR), churn, expansion, cohort retention, and lifetime value. To analyze these KPIs at scale, you need a data model that is both business-friendly and performance-optimized. A properly designed star schema delivers exactly that by combining a central Fact table—where numeric measures live—with a set of conformed Dimension tables that provide rich context.

Core Components of the SaaS Star Schema

1. Fact Subscription

The fact_subscription table records every billable subscription state on a specific date (often daily snapshots) or every subscription lifecycle event (start, upgrade, cancel, renew). Common measures include:

mrr – Monthly recurring revenue in the customer’s currency.
arr – Annual recurring revenue (12 × MRR).
seat_count – Number of licensed seats.
is_active – Boolean flag for active subscriptions.
delta_mrr – Change in MRR relative to previous snapshot (for churn/expansion analysis).

2. Dimension Customer

Holds slowly changing attributes about the client organization/person such as industry, segment, geography, acquisition channel, and account owner.

3. Dimension Plan

Describes pricing plans (e.g., Basic, Pro, Enterprise), billing frequency, included features, and list price. Useful for mix and upsell analyses.

4. Dimension Time

The most reused dimension, providing day/week/month/quarter hierarchies, fiscal calendars, and flags like is_month_end.

5. Optional Dimensions

Dimension Product – If your SaaS offers multiple SKUs/modules.
Dimension Sales Rep – For quota and pipeline alignment.
Dimension Promotion – To track discounts, coupon codes, or campaigns.

Why a Star Schema Beats an OLTP Schema for Analytics

Operational subscription systems—think Stripe Billing, Chargebee, or your own microservice—are designed for transactional consistency, not analytics. They often have highly normalized relational schemas with dozens of join tables, making even a simple churn query slow and unreadable. A star schema solves this by:

Reducing Join Depth: All dimensions connect directly to the fact table via surrogate keys.
Enforcing Business Logic: Metrics (e.g., churned MRR) are pre-calculated or derivable with simple SUMs.
Enabling BI Tools: Columnar warehouses (BigQuery, Snowflake, Redshift) and BI layers (Looker, Metabase) optimize star joins automatically.

Modeling Granularity: Snapshot vs. Event

Snapshot Grain: One row per subscription per day (or month). Simplifies period-end reporting and retention curves but can bloat quickly.

Event Grain: One row per lifecycle event (create, upgrade, cancel). Storage-friendly and explicit about change reasons but requires window functions to derive period states. Many teams model both: an fact_subscription_event at event grain and a derived fact_subscription_daily snapshot table.

Practical Example: Calculating Net New MRR

Suppose you want to calculate Net New MRR for January 2024: expansion + new − churn − contraction. With the star schema, the SQL is straightforward:

SELECT d_month.month_start_date AS month, SUM(CASE WHEN f.delta_mrr > 0 AND f.is_new_customer THEN f.delta_mrr ELSE 0 END) AS new_mrr, SUM(CASE WHEN f.delta_mrr > 0 AND NOT f.is_new_customer THEN f.delta_mrr ELSE 0 END) AS expansion_mrr, SUM(CASE WHEN f.delta_mrr < 0 AND f.is_cancellation THEN -f.delta_mrr ELSE 0 END) AS churn_mrr, SUM(CASE WHEN f.delta_mrr < 0 AND NOT f.is_cancellation THEN -f.delta_mrr ELSE 0 END) AS contraction_mrr, SUM(f.delta_mrr) AS net_new_mrr FROM fact_subscription_daily f JOIN dim_time d_day ON d_day.date_key = f.date_key JOIN dim_time d_month ON d_month.month_key = d_day.month_key WHERE d_month.month_start_date = '2024-01-01' GROUP BY d_month.month_start_date;

Best Practices

Conformed Dimensions

Share customer, time, and plan dimensions across other fact tables (e.g., usage, payments) for drill-across capabilities.

Surrogate Keys Over Natural Keys

Use integer surrogate keys generated in the warehouse to decouple analytics from source-system IDs, allowing for late arriving dimension rows.

SCD Type 2 for Slowly Changing Data

Capture changes in customer industry or plan attributes over time so historical reports remain accurate. Append new rows in the dimension with effective_from and effective_to dates.

Audit Columns

Add batch_id, load_timestamp, and source_system to support data lineage and recovery.

Common Pitfalls & How to Fix Them

Over-Normalizing Dimensions

Mistake: Splitting customer contact, address, and segment into separate tables.
Fix: Denormalize into one wide dim_customer. The star schema favors readability over disk savings.

Mixing Currencies in the Fact

Mistake: Storing MRR in different currencies without a conversion rate.
Fix: Add mrr_usd or maintain a currency dimension with daily FX rates.

Double-Counting Upgrades

Mistake: When you snapshot daily, an upgrade can appear as both contraction and expansion across different days.
Fix: Use delta_mrr sign to categorize events and aggregate at period level.

Galaxy & the Star Schema

While Galaxy is not a modeling tool, its AI-powered SQL editor accelerates query writing against your star schema. Autocomplete surfaces the right dimension columns, while the copilot can suggest optimized aggregation strategies—great for analysts exploring churn drivers without wading through dozens of tables.

Putting It All Together

A SaaS star schema turns subscription data into a single source of truth. With clearly defined facts and dimensions, finance and product teams can self-serve answers, dashboards run faster, and machine-learning features (e.g., churn prediction) get a clean training set. Invest the time to model correctly now, and you’ll pay down months of future “spreadsheet tax.”

Why Designing a Star Schema for a SaaS Subscription Model is important

SaaS businesses rely on granular, trustworthy metrics to inform product, finance, and go-to-market decisions. A star schema structures subscription data so teams can calculate MRR, churn, and retention quickly without wrestling with transactional tables or spreadsheets. The result is faster insight, lower query costs, and a unified language for the whole company.

Designing a Star Schema for a SaaS Subscription Model Example Usage


-- Monthly MRR trend by plan for 2023
SELECT d.month_start_date,
       p.plan_name,
       SUM(f.mrr) AS mrr
FROM fact_subscription_daily f
JOIN dim_time d    ON d.date_key = f.date_key
JOIN dim_plan p    ON p.plan_key = f.plan_key
WHERE d.year = 2023
GROUP BY d.month_start_date, p.plan_name
ORDER BY d.month_start_date, p.plan_name;

Designing a Star Schema for a SaaS Subscription Model Syntax

Common Mistakes

Over-normalizing dimensions: Modeling contact, address, and segment in separate tables decreases query performance and user comprehension. Keep dimensions wide and descriptive.
Ignoring currency conversion: Mixing multiple currencies in a single MRR field leads to impossible totals. Store both local and reporting-currency amounts or build an FX conversion layer.
Failing to capture plan changes: If you don’t Type-2 track plan attributes, historical reports misclassify legacy subscriptions. Include effective date ranges in <code>dim_plan</code>.

Frequently Asked Questions (FAQs)

What grain should the fact table use?

For financial metrics, daily snapshots are common because they align with period-end reporting. However, event grain is more storage-efficient and may be better for auditing. Many data teams keep both.

Do I need a separate payments fact table?

Yes. While fact_subscription captures contracted revenue, a fact_payment stores actual cash collections. Linking both through customer_key and invoice_key enables DSO and cash-flow analysis.

Can I design and query the star schema using Galaxy’s SQL editor?

Absolutely. Galaxy’s autocomplete and AI copilot make it easy to discover dimension attributes, write complex window functions, and share vetted queries with your team via Collections.

How do I handle free trials?

Include trials in fact_subscription with mrr = 0 but flag them via is_trial = TRUE. When a trial converts, update is_trial and adjust mrr accordingly, preserving historical rows if you use snapshots.