Cohort churn analysis measures the percentage of users who leave over time by grouping them into cohorts based on a shared start event and tracking their status period-over-period with SQL.
Cohort churn analysis lets you understand when, how fast, and for which groups customers abandon your product. By writing a handful of well-structured SQL queries you can move beyond one-number churn rates to a nuanced retention story that drives product, marketing, and finance decisions.
Traditional churn reporting produces a single rate — say, 4.3 % monthly churn. While useful, this obscures critical patterns. Cohort analysis fixes this by grouping users into buckets that share a meaningful starting event (signup date, first order, subscription start) and measuring how many users remain active in each subsequent period. The result is a retention curve for each cohort and a churn matrix that surfaces seasonality, onboarding quality, and long-term product fit.
The column or expression that assigns each user to a group. Common choices:
DATE_TRUNC('month', signup_date)
for monthly signup cohorts first_paid_invoice_date
for subscription products account_created_at::date
for daily granular trackingThe number of intervals (days, weeks, months) since the cohort start. We usually create it with DATEDIFF
or timestamp arithmetic.
You must pick an activity event that marks a user as retained (e.g., any session, purchase, or invoice) and decide when a user is officially churned (no active events for an entire period).
users_retained
per cohort-period.retention_rate
and subtract from 1 for churn_rate
.WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(event_time)) AS cohort_month,
MIN(event_time)::date AS cohort_start_date
FROM raw.events
WHERE event_name = 'signup'
GROUP BY 1
)
This CTE assigns every user to a monthly signup cohort.
, date_spine AS (
SELECT
DATE_TRUNC('month', dd)::date AS period_start
FROM GENERATE_SERIES(
(SELECT MIN(cohort_start_date) FROM user_cohorts),
CURRENT_DATE,
'1 month'::interval
) dd
)
GENERATE_SERIES
(Postgres/Redshift/Snowflake variant) produces one row per month.
, cohort_periods AS (
SELECT
uc.user_id,
uc.cohort_month,
ds.period_start,
EXTRACT(YEAR FROM AGE(ds.period_start, uc.cohort_month)) * 12 +
EXTRACT(MONTH FROM AGE(ds.period_start, uc.cohort_month)) AS months_since_cohort
FROM user_cohorts uc
JOIN date_spine ds
ON ds.period_start >= uc.cohort_month
)
We now have one row for every user in every month after signup.
, activity AS (
SELECT DISTINCT
user_id,
DATE_TRUNC('month', event_time) AS activity_month
FROM raw.events
WHERE event_name IN ('login', 'purchase', 'subscription_renewal')
)
, cohort_activity AS (
SELECT
cp.cohort_month,
cp.months_since_cohort,
COUNT(DISTINCT cp.user_id) AS cohort_size,
COUNT(DISTINCT a.user_id) AS users_retained
FROM cohort_periods cp
LEFT JOIN activity a
ON a.user_id = cp.user_id
AND a.activity_month = cp.period_start
GROUP BY 1,2
)
SELECT
cohort_month,
months_since_cohort,
users_retained,
cohort_size,
ROUND(100.0 * users_retained / cohort_size, 2) AS retention_rate_percent,
ROUND(100.0 * (1 - users_retained::decimal / cohort_size), 2) AS churn_rate_percent
FROM cohort_activity
ORDER BY cohort_month, months_since_cohort;
The final query yields a matrix of retention and churn percentages for each cohort by months-since-start.
Plot retention_rate_percent
by months_since_cohort
. A healthy SaaS product often stabilizes after a few periods.
Visualize cohorts horizontally and periods vertically. Darker colors for low churn help identify improvements after product launches or pricing changes.
event_time
or user_id
to accelerate date filtering.user_cohorts
— the set rarely changes after a user signs up.Using multiple overlapping events (e.g., both login
and view_page
) can inflate retention. Stick to the most meaningful activity signal.
Excluding very recent cohorts (with little time to churn) makes overall churn look better. Always weight cohorts by size and include partial periods separately.
If you re-bucket cohorts when users upgrade plans, you lose comparability. The cohort start should remain immutable.
Galaxy’s desktop SQL editor speeds up cohort analysis by:
months_since_cohort <= :max_period
into a tunable variable for ad-hoc what-if analysis.Once you trust the churn pipeline, extend it by:
country
, plan_tier
, or acquisition_channel
.Cohort-based churn analysis transforms a flat statistic into an actionable narrative. By combining date spines, cohort assignments, and activity flags in SQL, you gain a lens on product health that single-metric dashboards cannot offer. With modern tooling like Galaxy, producing and sharing these insights is faster than ever.
Blended churn rates hide critical retention patterns. Cohort analysis surfaces when and why customers leave, enabling targeted onboarding fixes, channel optimization, and accurate revenue forecasting. SQL remains the lingua franca for warehoused data, so knowing how to write performant cohort queries is essential for every data engineer and analyst.
Retention measures the percentage of users who stay active in a period, while churn measures the percentage who leave. Mathematically they are complements: churn = 1 − retention
.
Most SaaS companies track at least 12 months to capture annual renewal cycles. B2C apps may analyze 4–8 weeks for faster feedback loops. Pick a window that matches your business’s natural usage cadence.
Absolutely. Galaxy’s SQL editor and AI Copilot generate date spines, suggest window functions, and let you save the canonical churn query in a shared Collection so teammates can endorse and reuse it without copy-pasting SQL.
Add a separate reactivated
flag. Treat the period of inactivity as churn, then create a secondary cohort starting at reactivation to study behavior after return.