Cohort-Based Customer Churn Calculation in SQL

Galaxy Glossary

How do I calculate churn using a cohort method in SQL?

Cohort churn analysis measures the percentage of users who leave over time by grouping them into cohorts based on a shared start event and tracking their status period-over-period with SQL.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Description

Cohort churn analysis lets you understand when, how fast, and for which groups customers abandon your product. By writing a handful of well-structured SQL queries you can move beyond one-number churn rates to a nuanced retention story that drives product, marketing, and finance decisions.

What Is Cohort Churn Analysis?

Traditional churn reporting produces a single rate — say, 4.3 % monthly churn. While useful, this obscures critical patterns. Cohort analysis fixes this by grouping users into buckets that share a meaningful starting event (signup date, first order, subscription start) and measuring how many users remain active in each subsequent period. The result is a retention curve for each cohort and a churn matrix that surfaces seasonality, onboarding quality, and long-term product fit.

Why Cohort Churn Matters

  • Diagnose onboarding issues. Steep drop-offs in the first month often signal activation friction.
  • Highlight product/market fit. Flattening curves after month N show stable retention, while continuous decline flags poor value delivery.
  • Compare marketing channels. Different acquisition cohorts reveal which sources bring stickier users.
  • Improve forecasting. Financial models built from cohort retention produce more accurate revenue projections than a single blended churn figure.

Core Concepts & Terminology

Cohort Definition

The column or expression that assigns each user to a group. Common choices:

  • DATE_TRUNC('month', signup_date) for monthly signup cohorts
  • first_paid_invoice_date for subscription products
  • account_created_at::date for daily granular tracking

Period Index

The number of intervals (days, weeks, months) since the cohort start. We usually create it with DATEDIFF or timestamp arithmetic.

Active vs. Churned Definition

You must pick an activity event that marks a user as retained (e.g., any session, purchase, or invoice) and decide when a user is officially churned (no active events for an entire period).

High-Level Algorithm

  1. Identify each user’s cohort start date.
  2. Generate a date spine of all periods from cohort start through today.
  3. Join user-period pairs with activity events to flag whether the user was active in that period.
  4. Aggregate to get users_retained per cohort-period.
  5. Divide by the cohort size to derive retention_rate and subtract from 1 for churn_rate.

Step-By-Step SQL Walkthrough

1. Capture Each User’s First Activity

WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(event_time)) AS cohort_month,
MIN(event_time)::date AS cohort_start_date
FROM raw.events
WHERE event_name = 'signup'
GROUP BY 1
)

This CTE assigns every user to a monthly signup cohort.

2. Build a Period Spine

, date_spine AS (
SELECT
DATE_TRUNC('month', dd)::date AS period_start
FROM GENERATE_SERIES(
(SELECT MIN(cohort_start_date) FROM user_cohorts),
CURRENT_DATE,
'1 month'::interval
) dd
)

GENERATE_SERIES (Postgres/Redshift/Snowflake variant) produces one row per month.

3. Create All User-Period Combinations

, cohort_periods AS (
SELECT
uc.user_id,
uc.cohort_month,
ds.period_start,
EXTRACT(YEAR FROM AGE(ds.period_start, uc.cohort_month)) * 12 +
EXTRACT(MONTH FROM AGE(ds.period_start, uc.cohort_month)) AS months_since_cohort
FROM user_cohorts uc
JOIN date_spine ds
ON ds.period_start >= uc.cohort_month
)

We now have one row for every user in every month after signup.

4. Flag Activity

, activity AS (
SELECT DISTINCT
user_id,
DATE_TRUNC('month', event_time) AS activity_month
FROM raw.events
WHERE event_name IN ('login', 'purchase', 'subscription_renewal')
)

, cohort_activity AS (
SELECT
cp.cohort_month,
cp.months_since_cohort,
COUNT(DISTINCT cp.user_id) AS cohort_size,
COUNT(DISTINCT a.user_id) AS users_retained
FROM cohort_periods cp
LEFT JOIN activity a
ON a.user_id = cp.user_id
AND a.activity_month = cp.period_start
GROUP BY 1,2
)

SELECT
cohort_month,
months_since_cohort,
users_retained,
cohort_size,
ROUND(100.0 * users_retained / cohort_size, 2) AS retention_rate_percent,
ROUND(100.0 * (1 - users_retained::decimal / cohort_size), 2) AS churn_rate_percent
FROM cohort_activity
ORDER BY cohort_month, months_since_cohort;

The final query yields a matrix of retention and churn percentages for each cohort by months-since-start.

Interpreting The Results

Retention Curve

Plot retention_rate_percent by months_since_cohort. A healthy SaaS product often stabilizes after a few periods.

Heatmap

Visualize cohorts horizontally and periods vertically. Darker colors for low churn help identify improvements after product launches or pricing changes.

Performance & Optimization Tips

  • Use clustered/partitioned tables on event_time or user_id to accelerate date filtering.
  • Pre-aggregate activity events into a daily active users table if raw logs are huge.
  • Add a materialized view for user_cohorts — the set rarely changes after a user signs up.
  • Limit periods with a configurable max window (e.g., 24 months) to prevent exploding joins.

Best Practices

  1. Define churn clearly. Stakeholders must agree on what "inactive" means — 30 days with no session? Unpaid for a renewal cycle?
  2. Separate voluntary vs. involuntary churn. Payment failures are a different problem than customers who cancel.
  3. Version your SQL. Save queries in a version-controlled repository or an editor like Galaxy Collections so the logic is auditable.
  4. Automate refreshes. Schedule the churn pipeline daily or weekly to monitor cohort health continuously.

Common Pitfalls

Overlapping Activity Definitions

Using multiple overlapping events (e.g., both login and view_page) can inflate retention. Stick to the most meaningful activity signal.

Ignoring Surviving Bias

Excluding very recent cohorts (with little time to churn) makes overall churn look better. Always weight cohorts by size and include partial periods separately.

Shifting Cohort Anchors

If you re-bucket cohorts when users upgrade plans, you lose comparability. The cohort start should remain immutable.

How Galaxy Helps

Galaxy’s desktop SQL editor speeds up cohort analysis by:

  • AI Copilot Autocomplete. Generate date spines and window functions instantly.
  • Collections & Endorsements. Share the canonical churn query with finance and growth teams without pasting SQL in Slack.
  • Parameterization. Turn months_since_cohort <= :max_period into a tunable variable for ad-hoc what-if analysis.
  • Result-set snapshots. Compare this week’s retention curve to last week’s with built-in diffing.

Next Steps

Once you trust the churn pipeline, extend it by:

  • Adding re-activation logic for users who come back after churning.
  • Segmenting cohorts by country, plan_tier, or acquisition_channel.
  • Pushing the retention matrix to your BI tool for dashboards and alerts.

Conclusion

Cohort-based churn analysis transforms a flat statistic into an actionable narrative. By combining date spines, cohort assignments, and activity flags in SQL, you gain a lens on product health that single-metric dashboards cannot offer. With modern tooling like Galaxy, producing and sharing these insights is faster than ever.

Why Cohort-Based Customer Churn Calculation in SQL is important

Blended churn rates hide critical retention patterns. Cohort analysis surfaces when and why customers leave, enabling targeted onboarding fixes, channel optimization, and accurate revenue forecasting. SQL remains the lingua franca for warehoused data, so knowing how to write performant cohort queries is essential for every data engineer and analyst.

Cohort-Based Customer Churn Calculation in SQL Example Usage


-- Get 3-month churn for January 2023 signup cohort
WITH jan_2023 AS (
  SELECT user_id
  FROM users
  WHERE DATE_TRUNC('month', signup_date) = '2023-01-01'
), months_0_to_3 AS (
  SELECT generate_series(0,3) AS month_n)
SELECT
  month_n,
  COUNT(DISTINCT u.user_id) FILTER (WHERE a.user_id IS NOT NULL) AS retained,
  COUNT(DISTINCT u.user_id)                                   AS cohort_size,
  1 - COUNT(DISTINCT u.user_id) FILTER (WHERE a.user_id IS NOT NULL)::decimal
      / COUNT(DISTINCT u.user_id)                              AS churn_rate
FROM jan_2023 u
CROSS JOIN months_0_to_3 m
LEFT JOIN activity a
  ON a.user_id = u.user_id
 AND DATE_TRUNC('month', a.event_time) = '2023-01-01'::date + (m.month_n || ' month')::interval
GROUP BY 1
ORDER BY 1;

Common Mistakes

Frequently Asked Questions (FAQs)

What is the difference between retention and churn?

Retention measures the percentage of users who stay active in a period, while churn measures the percentage who leave. Mathematically they are complements: churn = 1 − retention.

How many periods should I include in my cohort analysis?

Most SaaS companies track at least 12 months to capture annual renewal cycles. B2C apps may analyze 4–8 weeks for faster feedback loops. Pick a window that matches your business’s natural usage cadence.

Can I calculate cohort churn in Galaxy?

Absolutely. Galaxy’s SQL editor and AI Copilot generate date spines, suggest window functions, and let you save the canonical churn query in a shared Collection so teammates can endorse and reuse it without copy-pasting SQL.

How do I handle users who reactivate after churn?

Add a separate reactivated flag. Treat the period of inactivity as churn, then create a secondary cohort starting at reactivation to study behavior after return.

Want to learn about other SQL terms?