How to Run BigQuery on AWS

Galaxy Glossary

How can I run BigQuery SQL directly on Amazon S3 data inside AWS?

BigQuery Omni lets you query S3 data with standard SQL from the BigQuery engine while your compute stays in AWS.


Description

What does “BigQuery on AWS” mean?

BigQuery Omni runs the BigQuery engine inside AWS, so you can query Amazon S3 data with familiar SQL without moving files to Google Cloud.

How do I connect BigQuery to my AWS account?

Create a cross-cloud connection that lets BigQuery assume an IAM role in your AWS account and obtain temporary STS credentials. You need the ARN of an IAM role granting S3 read access, plus a trust policy that allows BigQuery to assume it.

Step 1 — Create the external connection

Run CREATE CONNECTION in the BigQuery console or via the bq CLI. Supply your AWS role ARN and an optional external ID for least-privilege security.
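As a sketch, the same connection can be created with the bq CLI. The role ARN, project, and connection name below are placeholders; check the flags against your installed CLI version:

```shell
# Create an AWS connection in the aws-us-east-1 BigQuery Omni location
# (all identifiers here are placeholders)
bq mk --connection \
  --connection_type='AWS' \
  --iam_role_id='arn:aws:iam::123456789012:role/bq-omni' \
  --location='aws-us-east-1' \
  aws_conn
```

After creation, BigQuery reports the identity that must be trusted by the IAM role; note it for the trust policy.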

Step 2 — Define external tables over S3 objects

Use CREATE EXTERNAL TABLE with the connection. Point to one or more S3 URIs containing Parquet, CSV, or JSON files that match your schema.

Step 3 — Query S3 data with standard SQL

After the external table exists, a simple SELECT reads directly from S3. Join external data with native BigQuery datasets in the same query.

Which permissions are required?

The IAM role must allow s3:GetObject (and typically s3:ListBucket) on the target bucket, and its trust policy must allow the BigQuery connection's identity to assume the role. Restrict the resource to a bucket prefix to limit exposure.
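A minimal permissions policy for the role might look like the following sketch. The bucket name `ecom-data` follows the examples in this article and is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::ecom-data"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::ecom-data/*"
    }
  ]
}
```

Scoping `s3:GetObject` to a narrower prefix (for example `arn:aws:s3:::ecom-data/orders/*`) further limits exposure.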

How do I optimise performance?

Store data in columnar Parquet or ORC, partition by date, and compress with Snappy. Prune columns and partitions in the WHERE clause to reduce scanned bytes.
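For example, with a table laid out by date (table and column names follow the ecommerce example later in this article), selecting only the needed columns and filtering on the date column keeps scanned bytes down:

```sql
-- Only the two referenced columns are read from the Parquet files,
-- and the date filter prunes partitions so untouched S3 objects
-- are never scanned
SELECT customer_id, total_amount
FROM `ecom.Orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
```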

What are best practices for cost control?

Use the table preview to inspect sample rows without running a query, add LIMIT while testing, and monitor the "S3 bytes scanned" metric in Cloud Monitoring.

Common mistakes and quick fixes

Wrong file layout: BigQuery does not automatically infer Hive-style partition columns from S3 paths. Declare a Hive partitioning schema on the external table, or convert flat CSV files to Parquet.
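A sketch of declaring Hive-style partitions on an external table. The table, paths, and connection name are illustrative placeholders standing in for the connection created earlier:

```sql
-- Files are laid out as s3://ecom-data/events/dt=2024-01-01/...;
-- WITH PARTITION COLUMNS plus hive_partition_uri_prefix tells
-- BigQuery to derive the dt column from the object path
CREATE EXTERNAL TABLE `ecom.Events`
WITH PARTITION COLUMNS (dt DATE)
WITH CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://ecom-data/events/*'],
  hive_partition_uri_prefix = 's3://ecom-data/events'
);
```

Queries filtering on `dt` can then prune whole partitions instead of scanning every object.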

Missing role trust: If the connection fails, add the BigQuery Omni principal to the IAM role’s trust policy and re-test.
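As a sketch, the trust policy typically grants Google's federated identity permission to assume the role via web identity federation. The `sub` value below is a placeholder for the BigQuery Omni identity reported when the connection is created:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:sub": "BIGQUERY_OMNI_IDENTITY"
        }
      }
    }
  ]
}
```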

How to Run BigQuery on AWS Example Usage


-- Total revenue per customer last month
SELECT c.id, c.name, SUM(o.total_amount) AS revenue
FROM `ecom.Customers` AS c
JOIN `ecom.Orders`   AS o ON o.customer_id = c.id
WHERE o.order_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH) AND CURRENT_DATE()
GROUP BY c.id, c.name
ORDER BY revenue DESC
LIMIT 10;

How to Run BigQuery on AWS Syntax


-- 1. Create the external connection
--    (connections to AWS live in an AWS location such as aws-us-east-1;
--    the same can be done with `bq mk --connection --connection_type=AWS`)
CREATE CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  external_location = 'aws://us-east-1',
  aws_role_arn = 'arn:aws:iam::123456789012:role/bq-omni',
  aws_external_id = 'galaxy-omni'
);

-- 2. Create external tables (ecommerce example)
--    The ecom dataset must live in the same AWS location as the connection
CREATE EXTERNAL TABLE `ecom.Customers` (
  id INT64,
  name STRING,
  email STRING,
  created_at TIMESTAMP
)
WITH CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://ecom-data/customers/*.parquet']
);

CREATE EXTERNAL TABLE `ecom.Orders` (
  id INT64,
  customer_id INT64,
  order_date DATE,
  total_amount NUMERIC
)
WITH CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://ecom-data/orders/2024/*/*.parquet']
);


Frequently Asked Questions (FAQs)

Does BigQuery Omni copy my data out of AWS?

No. Compute runs in AWS and reads directly from S3; only query metadata and final results move to Google's control plane.

Can I join S3 data with BigQuery datasets stored in GCP?

Yes. Cross-cloud joins are supported, but performance is network-bound. Filter early to minimise data transfer.
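A sketch of filtering early, assuming `ecom.Orders` is the S3-backed external table from the examples above and `gcp_ds.Customers` is a hypothetical native table stored in Google Cloud:

```sql
-- Aggregate the S3-backed table in AWS first, then join the small
-- result with a native table in Google Cloud, so only the reduced
-- per-customer totals cross the network
SELECT c.name, o.revenue
FROM (
  SELECT customer_id, SUM(total_amount) AS revenue
  FROM `ecom.Orders`              -- external table over S3
  WHERE order_date >= DATE '2024-01-01'
  GROUP BY customer_id
) AS o
JOIN `gcp_ds.Customers` AS c      -- native BigQuery table
  ON c.id = o.customer_id;
```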

Is federation cheaper than loading data into BigQuery?

It depends. Small, infrequent queries cost less with federation. Large analytic workloads are cheaper after loading data into native storage.
