How to Use row_number() in ClickHouse

Galaxy Glossary

How do I use row_number() in ClickHouse to number rows within groups?

row_number() assigns a unique, sequential number to each row within a partition ordered by chosen columns.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Table of Contents

What does row_number() do in ClickHouse?

row_number() is a window function that returns a monotonically increasing integer starting at 1 for the first row in each partition. The numbering resets whenever the PARTITION BY clause’s values change.

How do I write row_number() syntax?

Place row_number() OVER (PARTITION BY … ORDER BY …) in the SELECT list. PARTITION BY groups rows, ORDER BY defines sequence. Omitting ORDER BY yields non-deterministic numbering.

When should I use row_number()?

Use it to remove duplicates, paginate results, pick the first/last record per group, or build surrogate keys without altering source tables.

How to create a per-customer order sequence?

The query below numbers every order for each customer by order_date. It helps analysts track a shopper’s first, second, third purchase.

SELECT
id,
customer_id,
order_date,
row_number() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq
FROM Orders;

How can I filter on the generated row numbers?

Wrap the window function in a subquery or use a CTE, then apply WHERE. ClickHouse forbids filtering on window aliases in the same SELECT layer.

WITH o AS (
SELECT *,
row_number() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
FROM Orders)
SELECT *
FROM o
WHERE rn = 1;

Best practices for row_number()

Always specify ORDER BY for deterministic output. Use sparse partitions to avoid heavy memory usage; pre-filter data when possible. Store intermediate results in a MergeTree table if the dataset is huge.

Performance tips

Window functions stream in ClickHouse, but large ORDER BY expressions can spill to disk. Index the ordering columns and limit selected fields to reduce I/O.

Common mistakes recap

1) Forgetting ORDER BY causes random row numbers. 2) Filtering on row_number() in the same SELECT layer fails; always nest or CTE.

Why How to Use row_number() in ClickHouse is important

How to Use row_number() in ClickHouse Example Usage


-- Get first product a customer bought
WITH ordered_items AS (
    SELECT
        oi.order_id,
        o.customer_id,
        oi.product_id,
        row_number() OVER (
            PARTITION BY o.customer_id
            ORDER BY o.order_date
        ) AS rn
    FROM OrderItems oi
    JOIN Orders o ON o.id = oi.order_id)
SELECT *
FROM ordered_items
WHERE rn = 1;

How to Use row_number() in ClickHouse Syntax


row_number() OVER (
    [PARTITION BY partition_expression [, ...]]
    ORDER BY sort_expression [ASC|DESC] [, ...]
) AS alias

-- Example: sequence orders per customer
SELECT
    id,
    customer_id,
    row_number() OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS order_seq
FROM Orders;

Common Mistakes

Frequently Asked Questions (FAQs)

Is row_number() deterministic without ORDER BY?

No. ClickHouse may output rows in any order, so row numbers can change between runs. Always add ORDER BY for stable results.

Can I partition by multiple columns?

Yes. Separate columns with commas inside PARTITION BY. The row numbering restarts when any of those column values differ.

Does row_number() add noticeable overhead?

It streams efficiently, but large partitions with complex ORDER BY expressions can increase memory and CPU. Pre-filter data or materialize results if performance drops.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.