How to Bulk Load Data in BigQuery

Galaxy Glossary

How do I bulk load large CSV or JSON files into BigQuery quickly?

Bulk loading lets you ingest large CSV, JSON, Parquet, or Avro files from Cloud Storage into a native BigQuery table in one command.

Description

Why use bulk load instead of streaming?

Bulk load is cheaper, faster, and bypasses streaming quota limits. Use it for historical back-fills or nightly warehouse ingests.

What is the fastest way to bulk load data into BigQuery?

Stage your files in a single Cloud Storage bucket, compress them with gzip, and run one bq load command using wildcards to parallelize the ingest.
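
A minimal sketch of that workflow, assuming a bucket named gs://ecom-ingest, a dataset called ecommerce, and files named orders_2023-*.csv (substitute your own names):

# Compress locally, then stage everything in one Cloud Storage bucket
gzip orders_2023-*.csv
gcloud storage cp orders_2023-*.csv.gz gs://ecom-ingest/

# A single load job picks up every matching file via the wildcard
bq load --source_format=CSV --skip_leading_rows=1 --autodetect \
        ecommerce.Orders \
        "gs://ecom-ingest/orders_2023-*.csv.gz"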

Which file formats can I bulk load?

BigQuery supports CSV, JSON (newline-delimited), Avro, Parquet, ORC, and Datastore backups. Choose the format that preserves types and minimizes size.
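
Parquet and Avro are self-describing, so a load can be as short as the sketch below (table and bucket names are placeholders):

# No --schema or --skip_leading_rows needed: Parquet files carry their own schema
bq load --source_format=PARQUET \
        ecommerce.Orders \
        gs://ecom-ingest/orders_*.parquet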

Does BigQuery accept CSV headers?

Yes. Set --skip_leading_rows=1 (CLI) or skip_leading_rows = 1 (LOAD DATA DDL) so the first line is ignored.
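
For example, a header-aware CSV load might look like this (names are illustrative):

# --skip_leading_rows=1 drops the header row of every matched file;
# the schema of the existing ecommerce.Orders table is reused
bq load --source_format=CSV --skip_leading_rows=1 \
        ecommerce.Orders \
        gs://ecom-ingest/orders_*.csv.gz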

Can I append versus overwrite?

Append with --replace=false (default) or WRITE_APPEND. Overwrite with --replace or WRITE_TRUNCATE.
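
A sketch of both dispositions with the CLI (paths and table names are examples only):

# Append to the existing table (the default)
bq load --replace=false --source_format=CSV --skip_leading_rows=1 \
        ecommerce.Orders gs://ecom-ingest/orders_2024-01.csv.gz

# Overwrite the table contents entirely
bq load --replace --source_format=CSV --skip_leading_rows=1 \
        ecommerce.Orders gs://ecom-ingest/orders_full_*.csv.gz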

Step-by-step bulk load example

1. Upload orders_*.csv.gz to gs://ecom-ingest/.
2. Create an empty table up front or let schema autodetect infer it (a bq mk sketch follows this list).
3. Run the CLI or SQL shown below.
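
If you prefer to create the table ahead of time (step 2), a minimal sketch with bq mk might look like this; the schema mirrors the example further down:

# Create an empty destination table with an explicit schema
bq mk --table ecommerce.Orders \
        id:INT64,customer_id:INT64,order_date:DATE,total_amount:NUMERIC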

Best practices for large loads

Chunk files at ≤15 GB, gzip them, colocate the bucket in the same region as the dataset, and set max_bad_records for tolerance.
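
One way to produce right-sized, compressed chunks on a GNU/Linux shell before upload (a sketch that assumes a large local export named orders.csv):

# Strip the header, split into ~10-million-row chunks, and compress each piece
tail -n +2 orders.csv | split -l 10000000 -d - orders_chunk_
gzip orders_chunk_*
gcloud storage cp orders_chunk_*.gz gs://ecom-ingest/

# Load into the pre-created table, tolerating up to 100 malformed rows
bq load --source_format=CSV --max_bad_records=100 \
        ecommerce.Orders "gs://ecom-ingest/orders_chunk_*.gz"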

Troubleshooting failed loads

Check the Job details in the BigQuery console. Review the error array for line numbers, fix schema mismatches, and re-run with --ignore_unknown_values if needed.
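
A sketch of digging into a failed job from the CLI (the job ID below is a placeholder copied from bq ls output):

# List recent jobs to find the failed load's job ID
bq ls -j -n 10

# Dump full job metadata, including the errors array with file and line details
bq show --format=prettyjson -j bquxjob_1234abcd_0123456789ab

# Re-run, ignoring columns in the files that are not in the table schema
bq load --source_format=CSV --skip_leading_rows=1 --ignore_unknown_values \
        ecommerce.Orders gs://ecom-ingest/orders_*.csv.gz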


How to Bulk Load Data in BigQuery Example Usage


bq load --source_format=CSV \
        --schema="id:INT64,customer_id:INT64,order_date:DATE,total_amount:NUMERIC" \
        --skip_leading_rows=1 \
        --replace \
        ecommerce.Orders \
        gs://ecom-ingest/orders_2023-*.csv.gz

How to Bulk Load Data in BigQuery Syntax


bq load [--location=US] \
        --source_format=CSV|NEWLINE_DELIMITED_JSON|AVRO|PARQUET \
        --autodetect | --schema="field:type,..." \
        [--replace] \
        --skip_leading_rows=1 \
        --max_bad_records=5 \
        ecommerce.Orders \
        gs://ecom-ingest/orders_*.csv.gz

The final two arguments are the destination dataset.table and one or more Cloud Storage URIs (wildcards allowed).

BigQuery DDL alternative (OVERWRITE truncates the table first; use LOAD DATA INTO to append):
LOAD DATA OVERWRITE `ecommerce.Orders`
FROM FILES (
  format = 'CSV',
  uris = ['gs://ecom-ingest/orders_*.csv.gz'],
  skip_leading_rows = 1,
  max_bad_records = 5
);

Frequently Asked Questions (FAQs)

How large can a single load job be?

You can load up to 15 TB per job by sharding files; each individual file must be ≤15 GB.

Can I load from multiple buckets?

Yes. Provide up to 10,000 URIs across buckets as long as they are in the same region as the dataset.
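
A sketch of one job reading from two buckets (bucket names are illustrative; both must sit in the dataset's region):

# Multiple Cloud Storage URIs are comma-separated in a single load job
bq load --source_format=CSV --skip_leading_rows=1 \
        ecommerce.Orders \
        "gs://ecom-ingest/orders_2023-*.csv.gz,gs://ecom-archive/orders_2022-*.csv.gz"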

Is schema required?

No. Use --autodetect for CSV and newline-delimited JSON; Avro, Parquet, and ORC are self-describing, so BigQuery reads the schema from the files themselves. For production pipelines, specify the schema explicitly for stability.
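
A minimal autodetect sketch for an ad-hoc load (the staging table name is hypothetical):

# Let BigQuery sample the files and infer column names and types
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
        ecommerce.orders_staging \
        gs://ecom-ingest/orders_*.json.gz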
