Bulk loading lets you ingest large CSV, JSON, Parquet, or Avro files from Cloud Storage into a native BigQuery table in one command.
Bulk load is cheaper, faster, and bypasses streaming quota limits. Use it for historical back-fills or nightly warehouse ingests.
Stage your files in a single Cloud Storage bucket, compress them with gzip, and run one bq load command using wildcards to parallelize the ingest.
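For instance, a single wildcard load over gzipped shards might look like the sketch below; the dataset, table, and bucket names are placeholders, not values from this guide:

    # One load job over every shard matching the wildcard
    # (mydataset.mytable and the bucket path are placeholder names)
    bq load \
      --source_format=CSV \
      --autodetect \
      mydataset.mytable \
      "gs://my-ingest-bucket/export_*.csv.gz"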
BigQuery supports CSV, newline-delimited JSON, Avro, Parquet, ORC, and Datastore backups. Choose the format that preserves types and minimizes size.
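As an illustration, loading Parquet needs no header handling or schema autodetection because the files carry their own schema; the table and bucket names are placeholders:

    # Parquet is self-describing, so no schema flags are required
    bq load \
      --source_format=PARQUET \
      mydataset.mytable \
      "gs://my-ingest-bucket/export_*.parquet"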
To skip a header row, set --skip_leading_rows=1 (CLI) or skip_leading_rows = 1 (LOAD DATA DDL) so the first line of each file is ignored.
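In the DDL form, the option goes in the FROM FILES clause, roughly as sketched here; the table and bucket names are placeholders:

    -- Skip the header row of each CSV shard
    -- (mydataset.mytable and the bucket path are placeholder names)
    LOAD DATA INTO mydataset.mytable
    FROM FILES (
      format = 'CSV',
      uris = ['gs://my-ingest-bucket/export_*.csv.gz'],
      skip_leading_rows = 1
    );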
Append with --replace=false (the default) or WRITE_APPEND; overwrite with --replace or WRITE_TRUNCATE.
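For example, the same CLI load can run in either mode; the table and bucket names below are placeholders:

    # Append to the existing table (default, WRITE_APPEND semantics)
    bq load --source_format=CSV --skip_leading_rows=1 \
      mydataset.mytable "gs://my-ingest-bucket/export_*.csv.gz"

    # Wipe and reload the table (WRITE_TRUNCATE semantics)
    bq load --replace --source_format=CSV --skip_leading_rows=1 \
      mydataset.mytable "gs://my-ingest-bucket/export_*.csv.gz"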
1. Upload orders_*.csv.gz to gs://ecom-ingest/.
2. Create an empty table or let autodetect infer the schema.
3. Run the CLI or SQL shown below.
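A minimal sketch of that command in both forms, assuming the target table is named ecom.orders (the dataset and table names are assumptions; the bucket and file pattern come from step 1):

    # CLI: one job over all gzipped shards, skipping the header row
    bq load \
      --source_format=CSV \
      --skip_leading_rows=1 \
      --autodetect \
      ecom.orders \
      "gs://ecom-ingest/orders_*.csv.gz"

    -- SQL: equivalent LOAD DATA statement
    LOAD DATA INTO ecom.orders
    FROM FILES (
      format = 'CSV',
      uris = ['gs://ecom-ingest/orders_*.csv.gz'],
      skip_leading_rows = 1
    );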
Chunk files at ≤15 GB, gzip them, colocate the bucket in the same region as the dataset, and set max_bad_records to tolerate a bounded number of malformed rows.
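For instance, allowing up to 100 bad rows per job (the threshold and the ecom.orders table name are illustrative):

    # Fail the job only after more than 100 malformed rows
    bq load \
      --source_format=CSV \
      --skip_leading_rows=1 \
      --max_bad_records=100 \
      ecom.orders \
      "gs://ecom-ingest/orders_*.csv.gz"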
Check the Job details in the BigQuery console, review the errors array for line numbers, fix any schema mismatches, and re-run with --ignore_unknown_values if needed.
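From the CLI, one way to inspect a failed job and retry is sketched below; the job ID is a placeholder, and the table and bucket names are the ones assumed in the example above:

    # List recent jobs, then print the errors array of the failed one
    bq ls -j -n 10
    bq show --format=prettyjson -j bqjob_r1234_example

    # Retry, ignoring extra columns that are not in the table schema
    bq load --ignore_unknown_values --source_format=CSV --skip_leading_rows=1 \
      ecom.orders "gs://ecom-ingest/orders_*.csv.gz"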
You can load up to 15 TB per job by sharding files; each individual file must be ≤15 GB.
You can provide up to 10,000 URIs across multiple buckets, as long as they are all in the same region as the dataset.
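The CLI accepts a comma-separated URI list, so a multi-bucket load might look like this (both bucket names and file patterns are illustrative):

    # Two buckets, one load job; both must sit in the dataset's region
    bq load --source_format=CSV --skip_leading_rows=1 ecom.orders \
      "gs://ecom-ingest/orders_2024_*.csv.gz,gs://ecom-archive/orders_2023_*.csv.gz"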
An explicit schema is not required: --autodetect infers one for CSV and JSON, and Avro, Parquet, and ORC files carry their own schema. For production, specify the schema explicitly for stability.
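An explicit schema can be passed inline on the command line (or as a JSON schema file); the column names and types below are illustrative:

    # Pin the schema so re-runs don't depend on autodetection
    bq load --source_format=CSV --skip_leading_rows=1 \
      ecom.orders "gs://ecom-ingest/orders_*.csv.gz" \
      order_id:STRING,customer_id:STRING,order_ts:TIMESTAMP,order_total:NUMERIC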