Backing up a ClickHouse cluster means creating consistent copies of its data and metadata so you can restore the system after failure, corruption, or accidental deletion.
ClickHouse is famous for its blazing-fast analytics, but speed is useless if you cannot recover from disaster. A robust backup strategy ensures that terabytes—or even petabytes—of production data can be restored with minimal downtime. This article walks through the principles, tools, and real-world workflows for backing up ClickHouse clusters in 2024.
Before you can protect ClickHouse data, you must understand how it is stored:
- Data parts live under /var/lib/clickhouse/data/<db>/<table>/.
- Table metadata (the DDL files) lives under /var/lib/clickhouse/metadata/ and in ZooKeeper paths.
- System tables (system.macros, system.clusters) hold cluster configuration.
Because parts are immutable, point-in-time consistency is easier than in row-mutable databases, but you still need to capture both data and metadata atomically.
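As a quick orientation (the database name db below is a placeholder), you can list the active parts a backup would have to capture, together with their on-disk paths:

clickhouse-client --query "
  SELECT database, table, name AS part, path, formatReadableSize(bytes_on_disk) AS size
  FROM system.parts
  WHERE active AND database = 'db'
  ORDER BY bytes_on_disk DESC
  LIMIT 10"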
Using LVM, ZFS, EBS, or Ceph snapshots provides instant, crash-consistent copies. Combine snapshots with object-storage uploads for durability.
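As an illustration, an LVM-based flow might look like the sketch below; the volume group vg0, logical volume clickhouse, mount point, and snapshot size are assumptions to adapt to your layout:

# Create a crash-consistent snapshot of the ClickHouse data volume (assumed LV layout)
sudo lvcreate --snapshot --size 20G --name ch-snap /dev/vg0/clickhouse
# Mount it read-only and copy it to object storage for durability
sudo mkdir -p /mnt/ch-snap && sudo mount -o ro /dev/vg0/ch-snap /mnt/ch-snap
aws s3 sync /mnt/ch-snap/ s3://clickhouse-prod-backups/snapshots/$(date +%F)/
# Release the snapshot once the upload completes
sudo umount /mnt/ch-snap && sudo lvremove -y /dev/vg0/ch-snap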
clickhouse-backup Utility
The de-facto community tool (Altinity/clickhouse-backup) automates data freezing, compression, and upload to S3, GCS, Azure Blob, or NFS.
ClickHouse 22.6+ offers native BACKUP and RESTORE statements, which are especially convenient for managed ClickHouse services.
Cloud providers such as Altinity.Cloud, Aiven, and ClickHouse Cloud schedule automated snapshots and expose one-click restores.
Your production requirements dictate the mix of techniques: incremental BACKUP runs or frequent snapshots shrink the recovery window.

Installing clickhouse-backup
sudo wget -O /usr/local/bin/clickhouse-backup \
https://github.com/Altinity/clickhouse-backup/releases/download/v2.5.1/clickhouse-backup-linux-amd64
sudo chmod +x /usr/local/bin/clickhouse-backup
Edit /etc/clickhouse-backup/config.yml:
general:
  remote_storage: s3
  compression_format: tar
  upload_concurrency: 4
s3:
  bucket: clickhouse-prod-backups
  endpoint: s3.us-east-1.amazonaws.com
  access_key_id: <AWS_KEY>
  secret_access_key: <AWS_SECRET>
  path: /cluster1/{hostname}
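Before scheduling anything, it is worth a quick smoke test that the tool can reach both ClickHouse and the bucket; the backup name below is illustrative:

# List the tables clickhouse-backup can see (validates the ClickHouse connection)
clickhouse-backup tables
# Run a throwaway full cycle to validate S3 credentials end to end, then clean up
clickhouse-backup create_remote smoke-test
clickhouse-backup delete remote smoke-test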
clickhouse-backup freeze --tables "db.*"
The command issues ALTER TABLE … FREEZE to create hard-linked copies under /var/lib/clickhouse/shadow/.
clickhouse-backup create --backup-name 2024-04-30-full
clickhouse-backup upload 2024-04-30-full
This streams compressed archives to S3 and registers metadata.
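It is worth confirming the archive actually landed before trusting it; for example:

# The backup should now appear in remote storage with its size and creation time
clickhouse-backup list remote
# Optionally free local disk once the remote copy exists
clickhouse-backup delete local 2024-04-30-full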
0 3 * * * /usr/local/bin/clickhouse-backup create_remote \
--tables "db.*" --full --retention 14d >/var/log/ch-backup.log 2>&1
This schedules a daily full backup at 03:00 UTC, retained for 14 days.
# Download chosen backup
time clickhouse-backup download 2024-04-30-full
# Restore data and metadata across the cluster
clickhouse-backup restore --rm 2024-04-30-full
The --rm flag drops existing tables first. When using replicated tables, be sure to run SYSTEM SYNC REPLICA on each table afterward.
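A minimal post-restore check, assuming a replicated table db.events (placeholder name):

# Wait until the replica has fetched and applied all outstanding parts
clickhouse-client --query "SYSTEM SYNC REPLICA db.events"
# Confirm there is no remaining replication lag or queue backlog
clickhouse-client --query "
  SELECT database, table, absolute_delay, queue_size
  FROM system.replicas
  WHERE database = 'db'"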
Native BACKUP and RESTORE Commands

BACKUP DATABASE analytics
    TO S3('https://clickhouse-prod-backups.s3.amazonaws.com/2024-04-30', 'AWS_KEY', 'AWS_SECRET');
ClickHouse stores backup manifests; in incremental backups, unchanged parts are referenced rather than copied, reducing I/O.
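Subsequent runs can point at the previous backup with the base_backup setting, so only new parts are written; a sketch along the lines of the example above (dates and credentials are placeholders):

-- Incremental backup: parts already present in the 2024-04-30 backup are only referenced
BACKUP DATABASE analytics
    TO S3('https://clickhouse-prod-backups.s3.amazonaws.com/2024-05-01', 'AWS_KEY', 'AWS_SECRET')
    SETTINGS base_backup = S3('https://clickhouse-prod-backups.s3.amazonaws.com/2024-04-30', 'AWS_KEY', 'AWS_SECRET');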
RESTORE ALL FROM S3('https://clickhouse-prod-backups.s3.amazonaws.com/2024-04-30', 'AWS_KEY', 'AWS_SECRET')
SETTINGS allow_non_empty_tables = 1;
A few operational tips:
- Snapshot ZooKeeper state as well (e.g. snapshot --force <path>) so replicated tables can rebuild peers.
- Run clickhouse-backup server if you prefer to drive backups and restores through its HTTP API.

Replication guards against node failure, not operator error. If a DROP TABLE is executed, every replica deletes the data.
File-system snapshots give crash consistency but may miss un-flushed parts. Run SYSTEM FLUSH LOGS or ALTER … FREEZE first.
To restore a replica of a replicated table, you must remove its stale ZooKeeper paths or use SYSTEM RESTORE REPLICA so the replica's identity matches.
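A hedged sketch of that flow, assuming a replicated table db.events (placeholder) whose ZooKeeper metadata was lost while its local parts survived:

# The table comes up read-only because its ZooKeeper entries are missing;
# recreate them from the surviving local parts
clickhouse-client --query "SYSTEM RESTORE REPLICA db.events"
# If instead you are rebuilding a replica from scratch, delete its stale
# ZooKeeper path first (clickhouse-keeper-client / zkCli.sh) and re-create
# the table so it refetches parts from healthy replicas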
FinTech Co ingests 50 B rows per day across 6 shards × 2 replicas. They:
- Run native BACKUP to S3.
- Mirror the bucket with s3-sync for geo-redundancy.
- Expire old backups with clickhouse-backup delete remote.
When an engineer accidentally truncated a partition, they restored the affected shard in 7 minutes with zero data loss.
Embed ClickHouse restores in integration tests to validate migrations:
docker run --name ch-test -d clickhouse/clickhouse-server:23.8
clickhouse-backup restore --replication --rm $CI_BACKUP_ID
pytest tests/sql_migrations_test.py
Catch schema drift early and prove your backup can seed ephemeral environments.
Monitor the backup pipeline itself:
- A ClickHouseBackupLastSuccessTimestamp older than 24 h triggers PagerDuty.
- Compare system.disks.free_space against backup size to avoid archive failures.
- Ship /var/log/ch-backup.log to Loki or ELK for auditing.

Backing up a ClickHouse cluster is straightforward when you leverage immutable parts, but the devil is in the details: coordination with ZooKeeper, retention policies, encryption, and continuous testing. Whether you choose file-system snapshots, clickhouse-backup, or native SQL commands, automate everything and rehearse restores regularly.
ClickHouse often serves as the single source of truth for petabyte-scale analytics. A failed node, bad deployment, or accidental DROP can wipe out critical insight and revenue. A solid backup strategy safeguards data integrity, minimizes downtime, meets compliance standards, and gives engineering teams confidence that fast analytics won't come at the cost of resiliency.
On SSD-backed nodes, a 1 TB dataset compresses and uploads to S3 in roughly 15–25 minutes using four upload threads. Incremental backups after the first full run are dramatically faster because unchanged parts are skipped.
Backups can run alongside live traffic: ClickHouse parts are immutable, so read and write workloads continue unhindered. The ALTER ... FREEZE step merely creates hard links, introducing negligible I/O overhead.
You also do not need to take the cluster offline: both clickhouse-backup and native BACKUP capture ZooKeeper metadata automatically or via an additional snapshot, allowing the service to stay online.
A common pattern is: keep 7 daily, 4 weekly, 6 monthly, and 12 yearly backups. Adjust based on compliance, cost, and data-change rate.
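The daily tier of such a policy maps directly onto clickhouse-backup's retention settings in /etc/clickhouse-backup/config.yml; the weekly, monthly, and yearly tiers typically need a small external script or bucket lifecycle rules (the values below are illustrative):

general:
  # Keep the 7 most recent backups in remote storage; older ones are pruned automatically
  backups_to_keep_remote: 7
  # Keep one local copy for fast restores
  backups_to_keep_local: 1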