How to Add a Primary Key in ClickHouse


How do I add a primary key to an existing ClickHouse table?

Adds or changes the PRIMARY KEY of a ClickHouse MergeTree table (the leading part of its ORDER BY sorting key) to speed up data skipping and lookups.



Add a Primary Key in ClickHouse

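ClickHouse has no ALTER TABLE … MODIFY PRIMARY KEY statement: the primary key of a MergeTree table is fixed when the table is created. To define or change it, you create a new table with the desired key, copy the data across, and swap the table names. A well-chosen key lets ClickHouse skip data that cannot match a filter, which can greatly improve query speed.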

What does “adding a primary key” actually do in ClickHouse?

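MergeTree-family tables store data in parts sorted by the ORDER BY (sorting key) expression. The PRIMARY KEY defaults to the same expression; if you set it separately, it must be a prefix of ORDER BY. From it, ClickHouse builds a sparse index with one entry per granule (8,192 rows by default), so queries that filter on the leading key columns can skip whole granules instead of scanning them.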
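
To see the granules the sparse index is built over, here is a minimal sketch on a hypothetical orders_granule_demo table filled with synthetic rows. It sets index_granularity explicitly only for clarity (8192 is already the default) and reads the marks column of system.parts, where each mark is roughly one index entry per granule; exact mark counts can vary with part format and ClickHouse version.

```sql
-- Hypothetical demo table; 8192 is already the default index_granularity.
DROP TABLE IF EXISTS orders_granule_demo;
CREATE TABLE orders_granule_demo
(
    id           UInt64,
    customer_id  UInt64,
    order_date   DateTime,
    total_amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY (customer_id, order_date)
SETTINGS index_granularity = 8192;

-- 100,000 synthetic rows: 1,000 customers, one order per second.
INSERT INTO orders_granule_demo
SELECT number, number % 1000, toDateTime('2024-01-01 00:00:00') + number, number % 500
FROM numbers(100000);

-- Rows stored versus marks (sparse index entries) across active parts.
SELECT sum(rows) AS total_rows, sum(marks) AS index_marks
FROM system.parts
WHERE database = currentDatabase() AND table = 'orders_granule_demo' AND active;
```

The mark count is tiny compared with the row count, which is why the index fits comfortably in memory.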

Can I add a primary key after the table is created?

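Not in place. ClickHouse cannot change an existing table's primary key with ALTER. The closest statement, ALTER TABLE … MODIFY ORDER BY, only lets you append columns added in the same ALTER query to the sorting key, and it leaves the primary key unchanged. To get a new primary key, create a new table with it, copy the rows with INSERT INTO … SELECT, and swap the table names. Running OPTIMIZE TABLE on the old table does not re-sort it under a different key.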
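
A minimal sketch of what ALTER can and cannot do to the key, assuming a recent ClickHouse release and a hypothetical orders_sortkey_demo table (the region column is made up for the example). MODIFY ORDER BY accepts a column added in the same statement; system.tables then shows a longer sorting key while the primary key stays the same.

```sql
-- Hypothetical demo table: sorting key and primary key are both `id`.
DROP TABLE IF EXISTS orders_sortkey_demo;
CREATE TABLE orders_sortkey_demo
(
    id           UInt64,
    customer_id  UInt64,
    order_date   DateTime,
    total_amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY id;

-- Allowed: extend the sorting key with a column added in the same ALTER
-- (no DEFAULT expression). `region` is a made-up column for this sketch.
ALTER TABLE orders_sortkey_demo
    ADD COLUMN region String,
    MODIFY ORDER BY (id, region);

-- Rejected by the server: existing columns cannot be added to the key, e.g.
-- ALTER TABLE orders_sortkey_demo MODIFY ORDER BY (id, customer_id);

-- The sorting key now includes region; the primary key is still id.
SELECT sorting_key, primary_key
FROM system.tables
WHERE database = currentDatabase() AND name = 'orders_sortkey_demo';
```

Because existing columns cannot be appended, this statement cannot turn an id-keyed table into one keyed by (customer_id, order_date).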

How do I choose an optimal key for ecommerce data?

Lead with the most selective, frequently filtered columns. For an Orders table, (customer_id, order_date) filters by customer then narrows by date, yielding high selectivity.
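
One way to check that a key such as (customer_id, order_date) actually lets queries skip data is EXPLAIN with indexes = 1. The sketch below uses a hypothetical orders_key_demo table and synthetic data (1,000 customers, one order per second); the granule counts it reports will depend on your data and settings.

```sql
-- Hypothetical demo table keyed the way the text recommends.
DROP TABLE IF EXISTS orders_key_demo;
CREATE TABLE orders_key_demo
(
    id           UInt64,
    customer_id  UInt64,
    order_date   DateTime,
    total_amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY (customer_id, order_date);

-- 1,000,000 synthetic orders: 1,000 customers, one order per second.
INSERT INTO orders_key_demo
SELECT number, number % 1000, toDateTime('2024-01-01 00:00:00') + number, number % 500
FROM numbers(1000000);

-- The PrimaryKey section of the output reports how many granules survive the filter.
EXPLAIN indexes = 1
SELECT sum(total_amount)
FROM orders_key_demo
WHERE customer_id = 42 AND order_date >= '2024-01-05 00:00:00';
```

Running the same EXPLAIN against a table ordered by (order_date, customer_id) is a quick way to compare how many granules each key order has to read.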

Step-by-step guide

1. Confirm the engine supports PRIMARY KEY

MergeTree and its derivatives (ReplacingMergeTree, ReplicatedMergeTree, etc.) support the clause.

2. Check current key

Query system.tables to see the current primary_key and sorting_key expressions before making changes.

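3. Create a table with the new key

Create a new table with the same columns and the new ORDER BY (plus, if needed, a narrower PRIMARY KEY that is a prefix of it), then copy the data with INSERT INTO … SELECT.

4. Swap the tables and clean up

Compare row counts between the two tables, then swap them with RENAME TABLE (or EXCHANGE TABLES on an Atomic database) and drop the old table once you are satisfied. Do the copy in a low-traffic window and pause writes, so no rows land in the old table after the copy starts.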
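
The four steps can be sketched end to end as follows. This assumes a hypothetical orders_steps_demo table with synthetic rows and uses RENAME TABLE for the swap; on a production table you would also pause writes between the copy and the rename.

```sql
-- Hypothetical starting table, keyed only by id, with synthetic rows.
DROP TABLE IF EXISTS orders_steps_demo;
DROP TABLE IF EXISTS orders_steps_demo_new;
DROP TABLE IF EXISTS orders_steps_demo_old;
CREATE TABLE orders_steps_demo
(
    id           UInt64,
    customer_id  UInt64,
    order_date   DateTime,
    total_amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO orders_steps_demo
SELECT number, number % 100, toDateTime('2024-01-01 00:00:00') + number, number % 50
FROM numbers(10000);

-- Steps 1-2: confirm the engine and inspect the current keys.
SELECT engine, sorting_key, primary_key
FROM system.tables
WHERE database = currentDatabase() AND name = 'orders_steps_demo';

-- Step 3: same columns, new key; copy every row.
CREATE TABLE orders_steps_demo_new AS orders_steps_demo
ENGINE = MergeTree
ORDER BY (customer_id, order_date);
INSERT INTO orders_steps_demo_new SELECT * FROM orders_steps_demo;

-- Step 4: compare row counts, swap names, and drop the old table once verified.
SELECT
    (SELECT count() FROM orders_steps_demo)     AS old_rows,
    (SELECT count() FROM orders_steps_demo_new) AS new_rows;
RENAME TABLE orders_steps_demo TO orders_steps_demo_old,
             orders_steps_demo_new TO orders_steps_demo;
DROP TABLE orders_steps_demo_old;
```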

Best practices

  • Keep the key narrow (a few compact columns); the primary index is loaded into memory, so a wide key costs RAM.
  • Place highly selective columns first.
  • Avoid low-cardinality columns (e.g., status) as leading elements.
  • Always test on staging, and monitor disk and CPU while the data is copied; you need room for a second full copy of the table until the old one is dropped.
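
To apply the selectivity and cardinality advice, you can compare distinct counts per candidate column before fixing the key order. This is a sketch on a hypothetical orders_cardinality_demo table whose status column and data are invented for illustration; uniqExact is exact but can be expensive on large tables, where uniq gives a cheaper approximate count.

```sql
-- Hypothetical demo table; the status values are invented for illustration.
DROP TABLE IF EXISTS orders_cardinality_demo;
CREATE TABLE orders_cardinality_demo
(
    id          UInt64,
    customer_id UInt64,
    status      LowCardinality(String),
    order_date  DateTime
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO orders_cardinality_demo
SELECT
    number,
    number % 1000,
    arrayElement(['new', 'paid', 'shipped'], number % 3 + 1),
    toDateTime('2024-01-01 00:00:00') + number
FROM numbers(100000);

-- Distinct values per candidate key column; a column with very few
-- distinct values (status) is a weak leading column for selective lookups.
SELECT
    uniqExact(customer_id) AS customers,
    uniqExact(status)      AS statuses,
    uniqExact(order_date)  AS timestamps
FROM orders_cardinality_demo;
```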

Common mistakes

Using low-selectivity columns first

Leading with order_date instead of customer_id in a multi-tenant store means queries that filter by customer scan large ranges. Fix it by rebuilding the table with the selective column first in the key.

Expecting OPTIMIZE to apply a new key

OPTIMIZE TABLE … FINAL merges parts using the table's current sorting key; it never re-sorts data by a different key. Historical rows are reorganized only when you copy them into a table created with the new key, for example with INSERT INTO … SELECT.
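
A small sketch, assuming a hypothetical orders_optimize_demo table, shows that OPTIMIZE TABLE … FINAL merges parts but leaves the key untouched: after two inserts and an optimize, system.tables still reports the original sorting key and system.parts shows a single active part.

```sql
-- Hypothetical demo table keyed by id.
DROP TABLE IF EXISTS orders_optimize_demo;
CREATE TABLE orders_optimize_demo
(
    id          UInt64,
    customer_id UInt64,
    order_date  DateTime
)
ENGINE = MergeTree
ORDER BY id;

-- Two separate inserts typically create two parts.
INSERT INTO orders_optimize_demo SELECT number, number % 100, now() FROM numbers(1000);
INSERT INTO orders_optimize_demo SELECT number + 1000, number % 100, now() FROM numbers(1000);

-- Merges the parts into one, still sorted by id; the key itself does not change.
OPTIMIZE TABLE orders_optimize_demo FINAL;

SELECT
    sorting_key,
    primary_key,
    (SELECT count() FROM system.parts
     WHERE database = currentDatabase() AND table = 'orders_optimize_demo' AND active) AS active_parts
FROM system.tables
WHERE database = currentDatabase() AND name = 'orders_optimize_demo';
```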

Full example

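-- Step 1: existing table, keyed only by id
CREATE TABLE Orders (
id UInt64,
customer_id UInt64,
order_date DateTime,
total_amount Decimal(10,2)
) ENGINE = MergeTree ORDER BY id;

-- Step 2: create a copy with the same columns and a better key
CREATE TABLE Orders_new AS Orders
ENGINE = MergeTree ORDER BY (customer_id, order_date);

-- Step 3: backfill, swap names, and drop Orders_old once verified
INSERT INTO Orders_new SELECT * FROM Orders;
RENAME TABLE Orders TO Orders_old, Orders_new TO Orders;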

When should I avoid changing the key?

If the table is very large and historical queries are rare, the gain may not justify the cost of copying all of its data.
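
To judge the rewrite cost before committing, you can read the row count and on-disk size from system.parts. The sketch creates a hypothetical orders_size_demo table so it runs on its own; point the final query at your own table name instead.

```sql
-- Hypothetical demo table so the size query has something to measure.
DROP TABLE IF EXISTS orders_size_demo;
CREATE TABLE orders_size_demo
(
    id           UInt64,
    customer_id  UInt64,
    order_date   DateTime,
    total_amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO orders_size_demo
SELECT number, number % 1000, toDateTime('2024-01-01 00:00:00') + number, number % 500
FROM numbers(50000);

-- Rows and compressed bytes a rebuild would have to copy.
SELECT
    sum(rows)                              AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
    count()                                AS active_parts
FROM system.parts
WHERE database = currentDatabase() AND table = 'orders_size_demo' AND active;
```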


How to Add a Primary Key in ClickHouse Example Usage


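-- Rebuild Orders with a compound primary key to speed up customer filtering
CREATE TABLE Orders_new AS Orders ENGINE = MergeTree ORDER BY (customer_id, order_date);
INSERT INTO Orders_new SELECT * FROM Orders;
RENAME TABLE Orders TO Orders_old, Orders_new TO Orders;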

How to Add a Primary Key in ClickHouse Syntax


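CREATE TABLE <new_table> AS <table_name>
ENGINE = MergeTree ORDER BY (<expr1>[, <expr2> …]) [PRIMARY KEY (<expr1>[, …])];
INSERT INTO <new_table> SELECT * FROM <table_name>;
RENAME TABLE <table_name> TO <old_table>, <new_table> TO <table_name>;

Parameters:
<table_name>    – target MergeTree-family table (keep its engine family for the new table)
<new_table>     – table created with the new key
<old_table>     – name the original table keeps until you drop it
<exprX>         – column or expression used for ordering; PRIMARY KEY, if given, must be a prefix of ORDER BY

Example (ecommerce):
CREATE TABLE Orders_new AS Orders ENGINE = MergeTree ORDER BY (customer_id, order_date);
INSERT INTO Orders_new SELECT * FROM Orders;
RENAME TABLE Orders TO Orders_old, Orders_new TO Orders;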


Frequently Asked Questions (FAQs)

Does ClickHouse enforce uniqueness with PRIMARY KEY?

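No. The primary key drives sorting and data skipping; rows with identical key values are allowed. If you need deduplication, ReplacingMergeTree removes rows with the same sorting key during background merges, so duplicates can remain until a merge runs or until you query with FINAL. AggregatingMergeTree likewise collapses rows with the same sorting key, combining their aggregate states.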
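
A minimal ReplacingMergeTree sketch, assuming a hypothetical orders_dedup_demo table with an updated_at version column: two rows share the sorting key id = 1, and querying with FINAL returns only the row with the highest updated_at.

```sql
-- Hypothetical demo table; updated_at is the version column.
DROP TABLE IF EXISTS orders_dedup_demo;
CREATE TABLE orders_dedup_demo
(
    id         UInt64,
    status     String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY id;

-- Two rows with the same sorting key, inserted separately.
INSERT INTO orders_dedup_demo VALUES (1, 'new', '2024-01-01 10:00:00');
INSERT INTO orders_dedup_demo VALUES (1, 'paid', '2024-01-01 11:00:00');

-- May still count both rows until a background merge runs.
SELECT count() FROM orders_dedup_demo;

-- FINAL deduplicates at query time, keeping the row with the highest updated_at.
SELECT id, status FROM orders_dedup_demo FINAL;
```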

Is PRIMARY KEY the same as ORDER BY?

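Not exactly, though they are closely linked. If you omit PRIMARY KEY, the ORDER BY expression acts as both the sorting key and the primary key. If you set PRIMARY KEY explicitly, it must be a prefix of ORDER BY, which lets you sort by more columns while keeping the in-memory index narrower.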
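
A sketch of an explicit PRIMARY KEY that is a prefix of a wider ORDER BY, using a hypothetical orders_prefix_demo table. Rows are sorted by (customer_id, order_date, id), while the sparse index only stores (customer_id, order_date).

```sql
-- Hypothetical demo table: PRIMARY KEY must be a prefix of ORDER BY.
DROP TABLE IF EXISTS orders_prefix_demo;
CREATE TABLE orders_prefix_demo
(
    id           UInt64,
    customer_id  UInt64,
    order_date   DateTime,
    total_amount Decimal(10, 2)
)
ENGINE = MergeTree
ORDER BY (customer_id, order_date, id)
PRIMARY KEY (customer_id, order_date);
-- PRIMARY KEY (order_date) would be rejected: it is not a prefix of ORDER BY.

SELECT sorting_key, primary_key
FROM system.tables
WHERE database = currentDatabase() AND name = 'orders_prefix_demo';
```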

Can I drop the primary key later?

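Not with ALTER. To stop relying on a key, rebuild the table with ORDER BY tuple(), which disables sorting and primary-index skipping. To change the key, rebuild the table with a new key whose leading columns match your most frequent filters.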
