How to Bulk Load Data in PostgreSQL

Galaxy Glossary

How do I bulk load data into PostgreSQL quickly and safely?

Bulk loading quickly ingests large files into PostgreSQL tables using the high-performance COPY command.

Welcome to the Galaxy, Guardian!

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

Bulk Load Data in PostgreSQL

Use the COPY command (or psql’s \copy) to stream files directly into tables, bypassing row-by-row INSERTs and cutting load times by orders of magnitude.

Why choose COPY instead of many INSERTs?

COPY sends a single, optimized stream to the server, eliminating per-row overhead, trigger firing, and network latency. This makes it 10–100× faster than loops of INSERT.

What is the COPY syntax?

See the full reference below. Key options: FORMAT (csv, text, binary), DELIMITER, HEADER, NULL, QUOTE, ENCODING, and FREEZE for bulk loads into static tables.

How do I load a local CSV file?

\copy sales FROM 'sales_2023.csv' WITH (FORMAT csv, HEADER, DELIMITER ',', NULL '', ENCODING 'UTF8');

\copy runs on the client, reading the file and funneling data to the server over the existing connection—perfect for cloud-hosted databases where the server can’t reach your laptop’s filesystem.

How do I load a file already on the server?

COPY sales (id, product_id, qty, sold_at) FROM '/var/lib/postgresql/imports/sales_2023.csv' WITH (FORMAT csv, HEADER, FREEZE);

RUN AS superuser or grant the table owner the pg_read_server_files role. Server-side COPY avoids client upload time.

Can I load compressed data?

Yes. Pipe gunzip output into psql:

gunzip -c bigfile.csv.gz | psql -c "COPY mytable FROM STDIN WITH (FORMAT csv);"

What performance tweaks matter?

Disable indexes and constraints, set synchronous_commit = off, and increase maintenance_work_mem. Re-enable and ANALYZE afterward for accurate planner stats.

How do I monitor progress?

Run SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_stat_progress_copy; in another session (PostgreSQL 14+).

How do I roll back a failed COPY?

Wrap COPY in a transaction: BEGIN; COPY ...; COMMIT;. On error, PostgreSQL aborts the COPY and you can simply ROLLBACK;.

Best practices checklist

Pre-create the table with correct datatypes
Match column order or specify it explicitly
Use text/csv for portability; binary for ultimate speed
ANALYZE after loading to refresh statistics

Why How to Bulk Load Data in PostgreSQL is important

How to Bulk Load Data in PostgreSQL Example Usage


\copy users FROM 'users.csv' WITH (FORMAT csv, HEADER, DELIMITER ',', NULL '');

How to Bulk Load Data in PostgreSQL Syntax


COPY [table_name [ (column_list) ] ]
    FROM { 'file_path' | PROGRAM 'command' | STDIN }
    [ WITH ( 
        FORMAT { text | csv | binary },
        DELIMITER 'char',
        NULL 'string',
        HEADER [ TRUE | FALSE ],
        QUOTE 'char',
        ESCAPE 'char',
        ENCODING 'encoding',
        FREEZE [ TRUE | FALSE ]
    ) ];

\copy is psql’s client-side wrapper with identical options.

Common Mistakes

Relying on thousands of INSERT statements. This overwhelms the planner, bloats WAL, and crawls at a few hundred rows/sec. Replace with a single COPY or \copy to ingest millions of rows in minutes.
Forgetting to ANALYZE after a massive load. Without updated statistics, the planner misestimates row counts and produces slow query plans. Always run ANALYZE (or VACUUM ANALYZE) once the load finishes.

Frequently Asked Questions (FAQs)

Does COPY require superuser privileges?

Server-side COPY FROM needs superuser or membership in pg_read_server_files. Client-side \copy works under regular roles because the file is read locally.

Can I skip bad rows?

Yes. In PostgreSQL 15+, use COPY ... LOG ERRORS. For earlier versions, load into a staging table or use tools like pg_bulkload.

Is binary format always faster?

Binary avoids parsing, so it’s ~20-30 % faster, but ties you to PostgreSQL versions and endianness. Use it for internal pipelines; stick with CSV for exchange.