Retention cohort analysis groups users by their acquisition period and tracks how many of them return or stay active over subsequent periods, letting teams measure product stickiness.
Retention cohort analysis groups users by the moment they first appear in your data—usually a signup date—and then measures how many of those same users perform a follow-up action (log-in, purchase, session) in later time buckets. By comparing the size of a cohort to its later activity counts you can answer questions like “What percentage of February sign-ups are still active three months later?”
Relational databases already store time-stamped user events. SQL’s grouping, date functions, and window functions make it easy to:
The event that establishes cohort membership, e.g., users.created_at
.
The follow-up behavior you care about—sessions, orders, messages sent.
The bucket size (day, week, month) applied consistently to both acquisition and activity timestamps.
An integer representing how many buckets have passed since acquisition. Period 0 is the signup bucket.
You need two tables (or one denormalized table):
users
– one row per user with a created_at
timestamp.events
– many rows per user capturing all activity (event name, event_time
).Ensure timestamps are in the same time zone and indexed for performance.
Bucket created_at
into your chosen grain with DATE_TRUNC
(Postgres/Snowflake) or TRUNC
(BigQuery).
Similarly bucket each qualifying activity event.
Connect users to their activity rows via JOIN
, keeping only events on or after the acquisition date.
Use DATE_PART
, TIMESTAMP_DIFF
, or plain arithmetic on epoch days to get an integer offset.
Aggregate by cohort_date
and period_index
.
With a window function, divide each period’s active count by period 0’s count for the same cohort.
WITH signups AS (
SELECT user_id,
DATE_TRUNC('month', created_at)::date AS cohort_month
FROM users
),
activity AS (
SELECT user_id,
DATE_TRUNC('month', event_time)::date AS activity_month
FROM events
WHERE event_name = 'session'
),
cohorts AS (
SELECT s.cohort_month,
a.activity_month,
DATE_PART('month', AGE(a.activity_month, s.cohort_month)) AS months_since_signup,
COUNT(DISTINCT a.user_id) AS active_users
FROM signups s
JOIN activity a USING (user_id)
WHERE a.activity_month >= s.cohort_month
GROUP BY 1,2,3
)
SELECT cohort_month,
months_since_signup,
active_users,
FIRST_VALUE(active_users) OVER (PARTITION BY cohort_month ORDER BY months_since_signup) AS cohort_size,
ROUND(active_users * 100.0 /
FIRST_VALUE(active_users) OVER (PARTITION BY cohort_month ORDER BY months_since_signup), 2) AS retention_rate
FROM cohorts
ORDER BY cohort_month, months_since_signup;
The result is a flat table you can pivot in BI or graph as a heatmap.
Retention measures unique users. Summing events inflates rates. Always COUNT(DISTINCT user_id)
.
Failing to filter activity_date >= signup_date
will yield negative period indices and nonsense retention.
Analysts sometimes switch from monthly cohorts to weekly activity when zooming in—this produces non-comparable percentages.
Imagine an e-commerce app with monthly cohorts. February sign-ups (15,000 users) generate:
The query above produces these numbers, and you can visualize them as a row in a retention matrix.
Warehouses output a tall table; your BI tool (or a quick Python notebook) can pivot on cohort_month
vs months_since_signup
and color by retention_rate
.
Galaxy’s modern SQL editor helps you build cohort queries faster:
DATE_TRUNC
, AGE
, and the correct timestamp columns as you type.Retention cohorts are one of the fastest ways to gauge product health. With disciplined SQL patterns and modern tooling like Galaxy, you can calculate, share, and iterate on retention metrics in minutes—no spreadsheets required.
Retention reveals whether your product delivers ongoing value or just drives one-time sign-ups. Cohort analysis pinpoints when and why users churn, guides feature prioritization, and surfaces the ROI of marketing campaigns. Engineers and analysts who can build cohort queries directly in the warehouse avoid spreadsheet errors, automate reporting, and empower teams to make data-backed decisions faster.
A retention cohort is a group of users who all performed an initial action—usually signing up—within the same time bucket. Analysts then track how many users in that group return or stay active in subsequent buckets, producing a retention curve.
Choose the grain that matches your product’s usage cycle. B2C apps often use weekly cohorts; enterprise SaaS commonly uses monthly. Consistency is more important than the specific grain.
Galaxy’s AI copilot can autogenerate the cohort SQL from a plain-English prompt, while Collections let your team endorse the final query so everyone reuses the same logic without pasting SQL in Slack.
Yes. As long as event timestamps are stored in a warehouse table, you can bucket them with DATE_TRUNC
and apply the same SQL patterns described here.