BigQuery Omni runs the BigQuery engine inside AWS, so you can query Amazon S3 data with standard SQL while your compute stays in AWS and without moving files to Google Cloud.
Create a cross-cloud connection that holds the ARN of an AWS IAM role granting S3 access; at query time, BigQuery exchanges the role for temporary credentials via STS.
Create the connection in the BigQuery console or via the bq CLI, supplying your AWS role ARN and an optional external ID for least-privilege security.
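As a sketch, creating the connection with the bq CLI might look like the following; the connection name, Omni location, and role ARN are placeholders, and the exact flag names can vary by CLI version:

```sh
# Create an AWS connection in a BigQuery Omni region.
# --iam_role_id is the ARN of the IAM role BigQuery will assume.
bq mk --connection \
  --connection_type='AWS' \
  --location='aws-us-east-1' \
  --iam_role_id='arn:aws:iam::123456789012:role/bq-omni-role' \
  s3_conn
```

After creation, `bq show --connection` prints the Google service identity you will need for the role's trust policy.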
Use CREATE EXTERNAL TABLE with the connection, pointing to one or more S3 URIs containing Parquet, CSV, or JSON files that match your schema.
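A minimal definition might look like this; the dataset, connection, and bucket names are placeholders:

```sql
-- External table backed by Parquet files in S3, read through
-- the cross-cloud connection created earlier.
CREATE EXTERNAL TABLE mydataset.sales_ext
WITH CONNECTION `aws-us-east-1.s3_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-bucket/sales/*.parquet']
);
```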
Once the external table exists, a simple SELECT reads directly from S3, and you can join the external data with native BigQuery datasets in the same query.
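For illustration, assuming a hypothetical external table mydataset.sales_ext and a native table mydataset.customers, such a query could look like:

```sql
-- Join S3-backed external data with a native BigQuery table.
SELECT c.region, SUM(s.amount) AS revenue
FROM mydataset.sales_ext AS s
JOIN mydataset.customers AS c
  ON s.customer_id = c.customer_id
WHERE s.sale_date >= '2024-01-01'  -- filter early to cut S3 bytes scanned
GROUP BY c.region;
```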
The IAM role's permission policy must allow s3:GetObject (and s3:ListBucket) on the target bucket, and its trust policy must allow the connection's Google service identity to assume the role. Restrict the resource to a bucket prefix to limit exposure.
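As an illustration, the role's trust policy might look like the sketch below; BigQuery Omni assumes the role via web-identity federation, and GOOGLE_SERVICE_IDENTITY is a placeholder for the identity shown on the connection:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Federated": "accounts.google.com" },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": { "accounts.google.com:sub": "GOOGLE_SERVICE_IDENTITY" }
    }
  }]
}
```

The permission policy attached to the same role then grants s3:GetObject on a narrow prefix such as arn:aws:s3:::my-bucket/data/* rather than the whole bucket.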
Store data in columnar Parquet or ORC, partition by date, and compress with Snappy. Select only the columns you need and filter on partition keys in the WHERE clause to reduce scanned bytes.
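The pruning advice can be illustrated with a hypothetical date-partitioned external table; column and table names are placeholders:

```sql
-- Project only the needed columns and filter on the partition
-- column so non-matching date partitions are never read from S3.
SELECT order_id, amount
FROM mydataset.sales_ext
WHERE dt BETWEEN '2024-01-01' AND '2024-01-31';
```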
Enable table previews to inspect the schema, add LIMIT while testing, and monitor the "S3 bytes scanned" metric in Cloud Monitoring.
Wrong file format: partition columns encoded in Hive-style directory paths are not picked up automatically. Declare a Hive partitioning scheme on the external table, or convert to Parquet with the partition columns stored in the files.
Missing role trust: If the connection fails, add the BigQuery Omni principal to the IAM role’s trust policy and re-test.
Does my data leave AWS? No. Compute runs in AWS and reads directly from S3; only query metadata moves to the Google Cloud control plane.
Can I join S3 data with native BigQuery tables? Yes. Cross-cloud joins are supported, but performance is network-bound, so filter early to minimise data transfer.
Is Omni cheaper than loading the data into BigQuery? It depends. Small, infrequent queries cost less with federation, while large analytic workloads are cheaper after loading data into native storage.