Cohort-Based Customer Churn Calculation in SQL

How do I calculate churn using a cohort method in SQL?

Cohort churn analysis measures the percentage of users who leave over time by grouping them into cohorts based on a shared start event and tracking their status period-over-period with SQL.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

Cohort churn analysis lets you understand when, how fast, and for which groups customers abandon your product. By writing a handful of well-structured SQL queries you can move beyond one-number churn rates to a nuanced retention story that drives product, marketing, and finance decisions.

What Is Cohort Churn Analysis?

Traditional churn reporting produces a single rate — say, 4.3 % monthly churn. While useful, this obscures critical patterns. Cohort analysis fixes this by grouping users into buckets that share a meaningful starting event (signup date, first order, subscription start) and measuring how many users remain active in each subsequent period. The result is a retention curve for each cohort and a churn matrix that surfaces seasonality, onboarding quality, and long-term product fit.

Why Cohort Churn Matters

Diagnose onboarding issues. Steep drop-offs in the first month often signal activation friction.
Highlight product/market fit. Flattening curves after month N show stable retention, while continuous decline flags poor value delivery.
Compare marketing channels. Different acquisition cohorts reveal which sources bring stickier users.
Improve forecasting. Financial models built from cohort retention produce more accurate revenue projections than a single blended churn figure.

Core Concepts & Terminology

Cohort Definition

The column or expression that assigns each user to a group. Common choices:

DATE_TRUNC('month', signup_date) for monthly signup cohorts
first_paid_invoice_date for subscription products
account_created_at::date for daily granular tracking

Period Index

The number of intervals (days, weeks, months) since the cohort start. We usually create it with DATEDIFF or timestamp arithmetic.

Active vs. Churned Definition

You must pick an activity event that marks a user as retained (e.g., any session, purchase, or invoice) and decide when a user is officially churned (no active events for an entire period).

High-Level Algorithm

Identify each user’s cohort start date.
Generate a date spine of all periods from cohort start through today.
Join user-period pairs with activity events to flag whether the user was active in that period.
Aggregate to get users_retained per cohort-period.
Divide by the cohort size to derive retention_rate and subtract from 1 for churn_rate.

Step-By-Step SQL Walkthrough

1. Capture Each User’s First Activity

WITH user_cohorts AS ( SELECT user_id, DATE_TRUNC('month', MIN(event_time)) AS cohort_month, MIN(event_time)::date AS cohort_start_date FROM raw.events WHERE event_name = 'signup' GROUP BY 1 )

This CTE assigns every user to a monthly signup cohort.

2. Build a Period Spine

, date_spine AS ( SELECT DATE_TRUNC('month', dd)::date AS period_start FROM GENERATE_SERIES( (SELECT MIN(cohort_start_date) FROM user_cohorts), CURRENT_DATE, '1 month'::interval ) dd )

GENERATE_SERIES (Postgres/Redshift/Snowflake variant) produces one row per month.

3. Create All User-Period Combinations

, cohort_periods AS ( SELECT uc.user_id, uc.cohort_month, ds.period_start, EXTRACT(YEAR FROM AGE(ds.period_start, uc.cohort_month)) * 12 + EXTRACT(MONTH FROM AGE(ds.period_start, uc.cohort_month)) AS months_since_cohort FROM user_cohorts uc JOIN date_spine ds ON ds.period_start >= uc.cohort_month )

We now have one row for every user in every month after signup.

4. Flag Activity

, activity AS ( SELECT DISTINCT user_id, DATE_TRUNC('month', event_time) AS activity_month FROM raw.events WHERE event_name IN ('login', 'purchase', 'subscription_renewal') ) , cohort_activity AS ( SELECT cp.cohort_month, cp.months_since_cohort, COUNT(DISTINCT cp.user_id) AS cohort_size, COUNT(DISTINCT a.user_id) AS users_retained FROM cohort_periods cp LEFT JOIN activity a ON a.user_id = cp.user_id AND a.activity_month = cp.period_start GROUP BY 1,2 ) SELECT cohort_month, months_since_cohort, users_retained, cohort_size, ROUND(100.0 * users_retained / cohort_size, 2) AS retention_rate_percent, ROUND(100.0 * (1 - users_retained::decimal / cohort_size), 2) AS churn_rate_percent FROM cohort_activity ORDER BY cohort_month, months_since_cohort;

The final query yields a matrix of retention and churn percentages for each cohort by months-since-start.

Interpreting The Results

Retention Curve

Plot retention_rate_percent by months_since_cohort. A healthy SaaS product often stabilizes after a few periods.

Heatmap

Visualize cohorts horizontally and periods vertically. Darker colors for low churn help identify improvements after product launches or pricing changes.

Performance & Optimization Tips

Use clustered/partitioned tables on event_time or user_id to accelerate date filtering.
Pre-aggregate activity events into a daily active users table if raw logs are huge.
Add a materialized view for user_cohorts — the set rarely changes after a user signs up.
Limit periods with a configurable max window (e.g., 24 months) to prevent exploding joins.

Best Practices

Define churn clearly. Stakeholders must agree on what "inactive" means — 30 days with no session? Unpaid for a renewal cycle?
Separate voluntary vs. involuntary churn. Payment failures are a different problem than customers who cancel.
Version your SQL. Save queries in a version-controlled repository or an editor like Galaxy Collections so the logic is auditable.
Automate refreshes. Schedule the churn pipeline daily or weekly to monitor cohort health continuously.

Common Pitfalls

Overlapping Activity Definitions

Using multiple overlapping events (e.g., both login and view_page) can inflate retention. Stick to the most meaningful activity signal.

Ignoring Surviving Bias

Excluding very recent cohorts (with little time to churn) makes overall churn look better. Always weight cohorts by size and include partial periods separately.

Shifting Cohort Anchors

If you re-bucket cohorts when users upgrade plans, you lose comparability. The cohort start should remain immutable.

How Galaxy Helps

Galaxy’s desktop SQL editor speeds up cohort analysis by:

AI Copilot Autocomplete. Generate date spines and window functions instantly.
Collections & Endorsements. Share the canonical churn query with finance and growth teams without pasting SQL in Slack.
Parameterization. Turn months_since_cohort <= :max_period into a tunable variable for ad-hoc what-if analysis.
Result-set snapshots. Compare this week’s retention curve to last week’s with built-in diffing.

Next Steps

Once you trust the churn pipeline, extend it by:

Adding re-activation logic for users who come back after churning.
Segmenting cohorts by country, plan_tier, or acquisition_channel.
Pushing the retention matrix to your BI tool for dashboards and alerts.

Conclusion

Cohort-based churn analysis transforms a flat statistic into an actionable narrative. By combining date spines, cohort assignments, and activity flags in SQL, you gain a lens on product health that single-metric dashboards cannot offer. With modern tooling like Galaxy, producing and sharing these insights is faster than ever.

Why Cohort-Based Customer Churn Calculation in SQL is important

Blended churn rates hide critical retention patterns. Cohort analysis surfaces when and why customers leave, enabling targeted onboarding fixes, channel optimization, and accurate revenue forecasting. SQL remains the lingua franca for warehoused data, so knowing how to write performant cohort queries is essential for every data engineer and analyst.

Cohort-Based Customer Churn Calculation in SQL Example Usage


-- Get 3-month churn for January 2023 signup cohort
WITH jan_2023 AS (
  SELECT user_id
  FROM users
  WHERE DATE_TRUNC('month', signup_date) = '2023-01-01'
), months_0_to_3 AS (
  SELECT generate_series(0,3) AS month_n)
SELECT
  month_n,
  COUNT(DISTINCT u.user_id) FILTER (WHERE a.user_id IS NOT NULL) AS retained,
  COUNT(DISTINCT u.user_id)                                   AS cohort_size,
  1 - COUNT(DISTINCT u.user_id) FILTER (WHERE a.user_id IS NOT NULL)::decimal
      / COUNT(DISTINCT u.user_id)                              AS churn_rate
FROM jan_2023 u
CROSS JOIN months_0_to_3 m
LEFT JOIN activity a
  ON a.user_id = u.user_id
 AND DATE_TRUNC('month', a.event_time) = '2023-01-01'::date + (m.month_n || ' month')::interval
GROUP BY 1
ORDER BY 1;

Cohort-Based Customer Churn Calculation in SQL Syntax

Common Mistakes

Using the current date instead of the cohort start to compute period indexes. This misaligns rows across cohorts. Always calculate <code>months_since_cohort</code> by subtracting <em>period_start – cohort_start</em>.
Counting total events instead of distinct users when aggregating retention. The metric must be user-level; otherwise, heavy users skew the rate. Fix by applying <code>COUNT(DISTINCT user_id)</code>.
Dropping cohorts that have not completed the full analysis window, making churn look artificially low. Include partial cohorts but label them, or weight averages to avoid survivorship bias.

Frequently Asked Questions (FAQs)

What is the difference between retention and churn?

Retention measures the percentage of users who stay active in a period, while churn measures the percentage who leave. Mathematically they are complements: churn = 1 − retention.

How many periods should I include in my cohort analysis?

Most SaaS companies track at least 12 months to capture annual renewal cycles. B2C apps may analyze 4–8 weeks for faster feedback loops. Pick a window that matches your business’s natural usage cadence.

Can I calculate cohort churn in Galaxy?

Absolutely. Galaxy’s SQL editor and AI Copilot generate date spines, suggest window functions, and let you save the canonical churn query in a shared Collection so teammates can endorse and reuse it without copy-pasting SQL.

How do I handle users who reactivate after churn?

Add a separate reactivated flag. Treat the period of inactivity as churn, then create a secondary cohort starting at reactivation to study behavior after return.