A star schema for SaaS subscriptions is a dimensional data model that organizes subscription events and related business entities into a central fact table surrounded by descriptive dimension tables, enabling fast, intuitive analytics of recurring-revenue metrics.
Software-as-a-Service (SaaS) businesses live and die by metrics like Monthly Recurring Revenue (MRR), churn, expansion, cohort retention, and lifetime value. To analyze these KPIs at scale, you need a data model that is both business-friendly and performance-optimized. A properly designed star schema delivers exactly that by combining a central Fact table—where numeric measures live—with a set of conformed Dimension tables that provide rich context.
The fact_subscription
table records every billable subscription state on a specific date (often daily snapshots) or every subscription lifecycle event (start, upgrade, cancel, renew). Common measures include:
mrr
– Monthly recurring revenue in the customer’s currency.arr
– Annual recurring revenue (12 × MRR).seat_count
– Number of licensed seats.is_active
– Boolean flag for active subscriptions.delta_mrr
– Change in MRR relative to previous snapshot (for churn/expansion analysis).Holds slowly changing attributes about the client organization/person such as industry, segment, geography, acquisition channel, and account owner.
Describes pricing plans (e.g., Basic, Pro, Enterprise), billing frequency, included features, and list price. Useful for mix and upsell analyses.
The most reused dimension, providing day/week/month/quarter hierarchies, fiscal calendars, and flags like is_month_end
.
Operational subscription systems—think Stripe Billing, Chargebee, or your own microservice—are designed for transactional consistency, not analytics. They often have highly normalized relational schemas with dozens of join tables, making even a simple churn query slow and unreadable. A star schema solves this by:
Snapshot Grain: One row per subscription per day (or month). Simplifies period-end reporting and retention curves but can bloat quickly.
Event Grain: One row per lifecycle event (create, upgrade, cancel). Storage-friendly and explicit about change reasons but requires window functions to derive period states. Many teams model both: an fact_subscription_event
at event grain and a derived fact_subscription_daily
snapshot table.
Suppose you want to calculate Net New MRR for January 2024: expansion + new − churn − contraction. With the star schema, the SQL is straightforward:
SELECT
d_month.month_start_date AS month,
SUM(CASE WHEN f.delta_mrr > 0 AND f.is_new_customer THEN f.delta_mrr ELSE 0 END) AS new_mrr,
SUM(CASE WHEN f.delta_mrr > 0 AND NOT f.is_new_customer THEN f.delta_mrr ELSE 0 END) AS expansion_mrr,
SUM(CASE WHEN f.delta_mrr < 0 AND f.is_cancellation THEN -f.delta_mrr ELSE 0 END) AS churn_mrr,
SUM(CASE WHEN f.delta_mrr < 0 AND NOT f.is_cancellation THEN -f.delta_mrr ELSE 0 END) AS contraction_mrr,
SUM(f.delta_mrr) AS net_new_mrr
FROM fact_subscription_daily f
JOIN dim_time d_day ON d_day.date_key = f.date_key
JOIN dim_time d_month ON d_month.month_key = d_day.month_key
WHERE d_month.month_start_date = '2024-01-01'
GROUP BY d_month.month_start_date;
Share customer, time, and plan dimensions across other fact tables (e.g., usage, payments) for drill-across capabilities.
Use integer surrogate keys generated in the warehouse to decouple analytics from source-system IDs, allowing for late arriving dimension rows.
Capture changes in customer industry or plan attributes over time so historical reports remain accurate. Append new rows in the dimension with effective_from
and effective_to
dates.
Add batch_id
, load_timestamp
, and source_system
to support data lineage and recovery.
Mistake: Splitting customer contact, address, and segment into separate tables.
Fix: Denormalize into one wide dim_customer
. The star schema favors readability over disk savings.
Mistake: Storing MRR in different currencies without a conversion rate.
Fix: Add mrr_usd
or maintain a currency dimension with daily FX rates.
Mistake: When you snapshot daily, an upgrade can appear as both contraction and expansion across different days.
Fix: Use delta_mrr
sign to categorize events and aggregate at period level.
While Galaxy is not a modeling tool, its AI-powered SQL editor accelerates query writing against your star schema. Autocomplete surfaces the right dimension columns, while the copilot can suggest optimized aggregation strategies—great for analysts exploring churn drivers without wading through dozens of tables.
A SaaS star schema turns subscription data into a single source of truth. With clearly defined facts and dimensions, finance and product teams can self-serve answers, dashboards run faster, and machine-learning features (e.g., churn prediction) get a clean training set. Invest the time to model correctly now, and you’ll pay down months of future “spreadsheet tax.”
SaaS businesses rely on granular, trustworthy metrics to inform product, finance, and go-to-market decisions. A star schema structures subscription data so teams can calculate MRR, churn, and retention quickly without wrestling with transactional tables or spreadsheets. The result is faster insight, lower query costs, and a unified language for the whole company.
For financial metrics, daily snapshots are common because they align with period-end reporting. However, event grain is more storage-efficient and may be better for auditing. Many data teams keep both.
Yes. While fact_subscription
captures contracted revenue, a fact_payment
stores actual cash collections. Linking both through customer_key
and invoice_key
enables DSO and cash-flow analysis.
Absolutely. Galaxy’s autocomplete and AI copilot make it easy to discover dimension attributes, write complex window functions, and share vetted queries with your team via Collections.
Include trials in fact_subscription
with mrr = 0
but flag them via is_trial = TRUE
. When a trial converts, update is_trial
and adjust mrr
accordingly, preserving historical rows if you use snapshots.