How to Bulk Load Data in ClickHouse

Galaxy Glossary

How do I bulk load large CSV, TSV, or JSON files into ClickHouse quickly?

Bulk loading moves large external datasets into ClickHouse tables quickly using optimized INSERT commands, file formats, and client-side tools.


Description

Why choose bulk loading over single-row INSERTs?

Bulk loading sends many rows in a single network round trip and lets ClickHouse’s vectorized engine write large blocks at once, cutting write latency and CPU overhead. Single-row INSERTs trigger a round trip and a new data part for every row, flooding background merges and slowing ingestion.

Which file formats load fastest?

Native is the fastest format because it matches ClickHouse’s internal columnar layout; Parquet, CSV, and TSV follow, with JSONEachRow the slowest to parse. Match the format to your source files to avoid extra conversions.
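
For example, a Parquet export can be streamed in with the same INSERT pattern; the table and file names below are placeholders:

# Stream a Parquet file using the matching format
clickhouse-client --query="INSERT INTO Events FORMAT Parquet" < events.parquet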

How do I prepare data files?

Export source tables without headers, use UTF-8, and keep column order identical to the ClickHouse target schema. Compress files with gzip or lz4 to shrink transfer size.
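
As a sketch, assuming a PostgreSQL source (the connection string, table, and file names are placeholders), an export that follows these rules looks like:

# Export without a header row in UTF-8, then compress
psql "$SOURCE_DSN" -c "\copy orders TO STDOUT WITH (FORMAT csv, HEADER false, ENCODING 'UTF8')" \
  | gzip > orders.csv.gz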

How do I run a bulk INSERT from the CLI?

Use clickhouse-client --query plus shell redirection. This streams the local file directly into the server.
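
A minimal invocation looks like this; the host, table, and file names are placeholders:

# The file stays on the client machine and is streamed to the server in one request
clickhouse-client --host ch.example.com \
  --query="INSERT INTO Events FORMAT TSV" < events.tsv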

Can I load compressed files directly?

Yes. Pipe gzip -dc output into clickhouse-client, or use INSERT … FROM INFILE, where clickhouse-client detects gzip, lz4, and other codecs from the file extension.
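
With FROM INFILE the codec is picked from the extension; the path below is a placeholder:

-- Compression is detected from the .gz extension
INSERT INTO Orders FROM INFILE '/data/exports/orders_2024_03.csv.gz' FORMAT CSV;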

What if my file contains extra columns?

Specify the target column list in the INSERT statement so each input value maps to the right column. For self-describing formats such as CSVWithNames or JSONEachRow, set input_format_skip_unknown_fields = 1 so extra input columns are skipped instead of causing schema-mismatch errors.
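
For instance, assuming a JSONEachRow file with extra fields (table, column, and file names are placeholders):

# Map only the needed columns and skip unknown input fields
clickhouse-client \
  --input_format_skip_unknown_fields=1 \
  --query="INSERT INTO Orders(id, customer_id, total_amount) FORMAT JSONEachRow" \
  < orders.json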

How do I monitor progress?

Query system.parts for part and row counts on the target table, check system.processes for running INSERT queries, or start clickhouse-client with --progress to view real-time row counts.
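
One way to check progress from another session (the database and table names are placeholders):

-- Parts, rows, and on-disk size written to the target table so far
SELECT count() AS parts, sum(rows) AS rows,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'default' AND table = 'Orders' AND active;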

Best practices for massive datasets?

Split files into 100–500 MB parts, load them in parallel, and tune max_insert_block_size and max_threads to use all CPU cores. On replicated tables, keep quorum writes (insert_quorum) disabled during the load so each INSERT returns as soon as one replica accepts it.
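
A sketch of a tuned single-file load; the setting values here are illustrative, not universal recommendations:

# Raise block size and thread count for one ingest session
clickhouse-client \
  --max_insert_block_size=1048576 \
  --max_threads=8 \
  --query="INSERT INTO Products FORMAT CSV" \
  < products_part01.csv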

Why How to Bulk Load Data in ClickHouse is important

How to Bulk Load Data in ClickHouse Example Usage


-- Import March 2024 orders into the Orders table
clickhouse-client --query="INSERT INTO Orders FORMAT CSV" < /data/exports/orders_2024_03.csv

How to Bulk Load Data in ClickHouse Syntax


-- Basic CSV load
clickhouse-client \
  --query="INSERT INTO Customers FORMAT CSV" \
  < customers.csv

-- Load specific columns
clickhouse-client \
  --query="INSERT INTO Orders(id,customer_id,order_date,total_amount) FORMAT TSV" \
  < orders.tsv

-- Load compressed JSONEachRow
gzip -dc order_items.json.gz | \
clickhouse-client --query="INSERT INTO OrderItems FORMAT JSONEachRow"

-- Parallel load via GNU parallel
ls products_part*.csv | parallel -j4 \
  "clickhouse-client --query='INSERT INTO Products FORMAT CSV' < {}"

Common Mistakes

Frequently Asked Questions (FAQs)

Can I bulk load from S3 directly?

Yes. Use the s3() table function or an S3 table engine as the source, then run INSERT … SELECT into the destination table.
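
A sketch using the s3() table function; the bucket, path, and credentials are placeholders:

-- Read the file in S3 and write it straight into the destination table
INSERT INTO Orders
SELECT *
FROM s3('https://my-bucket.s3.amazonaws.com/exports/orders_2024_03.csv.gz',
        'AWS_KEY_ID', 'AWS_SECRET_KEY', 'CSV');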

Is transactional rollback possible during bulk load?

No. ClickHouse does not offer multi-statement transactions; each inserted block is committed as its own part and cannot be rolled back afterwards. Test loads in a staging table first and drop it if the data is wrong.
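
A staging-table workflow might look like this; the table names are placeholders:

-- Load into a disposable copy, validate, then promote or drop
CREATE TABLE Orders_staging AS Orders;
-- ...bulk load into Orders_staging and run validation queries...
INSERT INTO Orders SELECT * FROM Orders_staging;
DROP TABLE Orders_staging;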

How do I speed up replicated cluster loads?

If your cluster normally writes with insert_quorum of 2 or more, temporarily lower it (0, the default, disables quorum writes entirely) so each INSERT returns as soon as one replica accepts the block, then restore the usual setting after the ingest.
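
For example, within the ingest session (adjust the values to whatever your cluster normally uses):

-- Settings applied with SET are scoped to the current session
SET insert_quorum = 0;  -- inserts return once one replica accepts the block
-- ...run the bulk INSERTs in this session...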
