row_number() assigns a unique, sequential number to each row within a partition ordered by chosen columns.
row_number() is a window function that returns a monotonically increasing integer starting at 1 for the first row in each partition. The numbering resets whenever the PARTITION BY clause’s values change.
Place row_number() OVER (PARTITION BY … ORDER BY …) in the SELECT list. PARTITION BY groups rows, ORDER BY defines sequence. Omitting ORDER BY yields non-deterministic numbering.
Use it to remove duplicates, paginate results, pick the first/last record per group, or build surrogate keys without altering source tables.
The query below numbers every order for each customer by order_date. It helps analysts track a shopper’s first, second, third purchase.
SELECT
id,
customer_id,
order_date,
row_number() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq
FROM Orders;
Wrap the window function in a subquery or use a CTE, then apply WHERE. ClickHouse forbids filtering on window aliases in the same SELECT layer.
WITH o AS (
SELECT *,
row_number() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
FROM Orders)
SELECT *
FROM o
WHERE rn = 1;
Always specify ORDER BY for deterministic output. Use sparse partitions to avoid heavy memory usage; pre-filter data when possible. Store intermediate results in a MergeTree table if the dataset is huge.
Window functions stream in ClickHouse, but large ORDER BY expressions can spill to disk. Index the ordering columns and limit selected fields to reduce I/O.
1) Forgetting ORDER BY causes random row numbers. 2) Filtering on row_number() in the same SELECT layer fails; always nest or CTE.
No. ClickHouse may output rows in any order, so row numbers can change between runs. Always add ORDER BY for stable results.
Yes. Separate columns with commas inside PARTITION BY. The row numbering restarts when any of those column values differ.
It streams efficiently, but large partitions with complex ORDER BY expressions can increase memory and CPU. Pre-filter data or materialize results if performance drops.