How to Bulk Load Data in ClickHouse

Galaxy Glossary

How do I bulk load large CSV, TSV, or JSON files into ClickHouse quickly?

Bulk loading moves large external datasets into ClickHouse tables quickly using optimized INSERT commands, file formats, and client-side tools.


Description

Why choose bulk loading over single-row INSERTs?

Bulk loading sends many rows in a single network round trip and lets ClickHouse’s vectorized engine write large blocks at once, cutting write latency and CPU overhead. Single-row INSERTs trigger a round trip and a new data part for every row, flooding background merges and slowing ingestion.

Which file formats load fastest?

Native is the fastest format because it matches ClickHouse’s internal columnar layout; Parquet, CSV, and TSV follow, with JSONEachRow the slowest to parse. Match the format to your source files to avoid extra conversions.
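
For example, a Parquet export can be streamed in with the same INSERT pattern; the table and file names below are placeholders:

# Stream a Parquet file using the matching format
clickhouse-client --query="INSERT INTO Events FORMAT Parquet" < events.parquet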

How do I prepare data files?

Export source tables without headers, use UTF-8, and keep column order identical to the ClickHouse target schema. Compress files with gzip or lz4 to shrink transfer size.
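
As a sketch, assuming a PostgreSQL source (the connection string, table, and file names are placeholders), an export that follows these rules looks like:

# Export without a header row in UTF-8, then compress
psql "$SOURCE_DSN" -c "\copy orders TO STDOUT WITH (FORMAT csv, HEADER false, ENCODING 'UTF8')" \
  | gzip > orders.csv.gz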

How do I run a bulk INSERT from the CLI?

Use clickhouse-client --query plus shell redirection. This streams the local file directly into the server.
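
A minimal invocation looks like this; the host, table, and file names are placeholders:

# The file stays on the client machine and is streamed to the server in one request
clickhouse-client --host ch.example.com \
  --query="INSERT INTO Events FORMAT TSV" < events.tsv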

Can I load compressed files directly?

Yes. Pipe gzip -dc output into clickhouse-client, or use INSERT … FROM INFILE, where clickhouse-client detects gzip, lz4, and other codecs from the file extension.
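
With FROM INFILE the codec is picked from the extension; the path below is a placeholder:

-- Compression is detected from the .gz extension
INSERT INTO Orders FROM INFILE '/data/exports/orders_2024_03.csv.gz' FORMAT CSV;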

What if my file contains extra columns?

Specify the target column list in the INSERT statement so each input value maps to the right column. For self-describing formats such as CSVWithNames or JSONEachRow, set input_format_skip_unknown_fields = 1 so extra input columns are skipped instead of causing schema-mismatch errors.
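
For instance, assuming a JSONEachRow file with extra fields (table, column, and file names are placeholders):

# Map only the needed columns and skip unknown input fields
clickhouse-client \
  --input_format_skip_unknown_fields=1 \
  --query="INSERT INTO Orders(id, customer_id, total_amount) FORMAT JSONEachRow" \
  < orders.json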

How do I monitor progress?

Query system.parts for part and row counts on the target table, check system.processes for running INSERT queries, or start clickhouse-client with --progress to view real-time row counts.
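
One way to check progress from another session (the database and table names are placeholders):

-- Parts, rows, and on-disk size written to the target table so far
SELECT count() AS parts, sum(rows) AS rows,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'default' AND table = 'Orders' AND active;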

Best practices for massive datasets?

Split files into 100–500 MB parts, load them in parallel, and tune max_insert_block_size and max_threads to use all CPU cores. On replicated tables, keep quorum writes (insert_quorum) disabled during the load so each INSERT returns as soon as one replica accepts it.
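
A sketch of a tuned single-file load; the setting values here are illustrative, not universal recommendations:

# Raise block size and thread count for one ingest session
clickhouse-client \
  --max_insert_block_size=1048576 \
  --max_threads=8 \
  --query="INSERT INTO Products FORMAT CSV" \
  < products_part01.csv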

Why How to Bulk Load Data in ClickHouse is important

How to Bulk Load Data in ClickHouse Example Usage


-- Import March 2024 orders into the Orders table
clickhouse-client --query="INSERT INTO Orders FORMAT CSV" < /data/exports/orders_2024_03.csv

How to Bulk Load Data in ClickHouse Syntax


-- Basic CSV load
clickhouse-client \
  --query="INSERT INTO Customers FORMAT CSV" \
  < customers.csv

-- Load specific columns
clickhouse-client \
  --query="INSERT INTO Orders(id,customer_id,order_date,total_amount) FORMAT TSV" \
  < orders.tsv

-- Load compressed JSONEachRow
gzip -dc order_items.json.gz | \
clickhouse-client --query="INSERT INTO OrderItems FORMAT JSONEachRow"

-- Parallel load via GNU parallel
ls products_part*.csv | parallel -j4 \
  "clickhouse-client --query='INSERT INTO Products FORMAT CSV' < {}"

Common Mistakes

Frequently Asked Questions (FAQs)

Can I bulk load from S3 directly?

Yes. Use the s3() table function or an S3 table engine as the source, then run INSERT … SELECT into the destination table.
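
A sketch using the s3() table function; the bucket, path, and credentials are placeholders:

-- Read the file in S3 and write it straight into the destination table
INSERT INTO Orders
SELECT *
FROM s3('https://my-bucket.s3.amazonaws.com/exports/orders_2024_03.csv.gz',
        'AWS_KEY_ID', 'AWS_SECRET_KEY', 'CSV');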

Is transactional rollback possible during bulk load?

No. ClickHouse does not offer multi-statement transactions; each inserted block is committed as its own part and cannot be rolled back afterwards. Test loads in a staging table first and drop it if the data is wrong.
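
A staging-table workflow might look like this; the table names are placeholders:

-- Load into a disposable copy, validate, then promote or drop
CREATE TABLE Orders_staging AS Orders;
-- ...bulk load into Orders_staging and run validation queries...
INSERT INTO Orders SELECT * FROM Orders_staging;
DROP TABLE Orders_staging;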

How do I speed up replicated cluster loads?

If your cluster normally writes with insert_quorum of 2 or more, temporarily lower it (0, the default, disables quorum writes entirely) so each INSERT returns as soon as one replica accepts the block, then restore the usual setting after the ingest.
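
For example, within the ingest session (adjust the values to whatever your cluster normally uses):

-- Settings applied with SET are scoped to the current session
SET insert_quorum = 0;  -- inserts return once one replica accepts the block
-- ...run the bulk INSERTs in this session...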
