How to Run BigQuery on AWS

Galaxy Glossary

How can I run BigQuery SQL directly on Amazon S3 data inside AWS?

BigQuery Omni lets you query S3 data with standard SQL from the BigQuery engine while your compute stays in AWS.


Description

What does “BigQuery on AWS” mean?

BigQuery Omni runs the BigQuery engine inside AWS, so you can query Amazon S3 data with familiar SQL without moving files to Google Cloud.

How do I connect BigQuery to my AWS account?

Create a cross-cloud connection that lets BigQuery assume an IAM role in your AWS account and obtain temporary STS credentials. You need the ARN of an IAM role granting S3 read access, plus a trust policy that allows BigQuery to assume it.

Step 1 — Create the external connection

Run CREATE CONNECTION in the BigQuery console or via the bq CLI. Supply your AWS role ARN and an optional external ID for least-privilege security.
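As a sketch, the same connection can be created with the bq CLI. The role ARN, project, and connection name below are placeholders; check the flags against your installed CLI version:

```shell
# Create an AWS connection in the aws-us-east-1 BigQuery Omni location
# (all identifiers here are placeholders)
bq mk --connection \
  --connection_type='AWS' \
  --iam_role_id='arn:aws:iam::123456789012:role/bq-omni' \
  --location='aws-us-east-1' \
  aws_conn
```

After creation, BigQuery reports the identity that must be trusted by the IAM role; note it for the trust policy.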

Step 2 — Define external tables over S3 objects

Use CREATE EXTERNAL TABLE with the connection. Point to one or more S3 URIs containing Parquet, CSV, or JSON files that match your schema.

Step 3 — Query S3 data with standard SQL

After the external table exists, a simple SELECT reads directly from S3. Join external data with native BigQuery datasets in the same query.

Which permissions are required?

The IAM role must allow s3:GetObject (and typically s3:ListBucket) on the target bucket, and its trust policy must allow the BigQuery connection's identity to assume the role. Restrict the resource to a bucket prefix to limit exposure.
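A minimal permissions policy for the role might look like the following sketch. The bucket name `ecom-data` follows the examples in this article and is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::ecom-data"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::ecom-data/*"
    }
  ]
}
```

Scoping `s3:GetObject` to a narrower prefix (for example `arn:aws:s3:::ecom-data/orders/*`) further limits exposure.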

How do I optimise performance?

Store data in columnar Parquet or ORC, partition by date, and compress with Snappy. Prune columns and partitions in the WHERE clause to reduce scanned bytes.
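For example, with a table laid out by date (table and column names follow the ecommerce example later in this article), selecting only the needed columns and filtering on the date column keeps scanned bytes down:

```sql
-- Only the two referenced columns are read from the Parquet files,
-- and the date filter prunes partitions so untouched S3 objects
-- are never scanned
SELECT customer_id, total_amount
FROM `ecom.Orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
```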

What are best practices for cost control?

Use the table preview to inspect sample rows without running a query, add LIMIT while testing, and monitor the "S3 bytes scanned" metric in Cloud Monitoring.

Common mistakes and quick fixes

Wrong file layout: BigQuery does not automatically infer Hive-style partition columns from S3 paths. Declare a Hive partitioning schema on the external table, or convert flat CSV files to Parquet.
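A sketch of declaring Hive-style partitions on an external table. The table, paths, and connection name are illustrative placeholders standing in for the connection created earlier:

```sql
-- Files are laid out as s3://ecom-data/events/dt=2024-01-01/...;
-- WITH PARTITION COLUMNS plus hive_partition_uri_prefix tells
-- BigQuery to derive the dt column from the object path
CREATE EXTERNAL TABLE `ecom.Events`
WITH PARTITION COLUMNS (dt DATE)
WITH CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://ecom-data/events/*'],
  hive_partition_uri_prefix = 's3://ecom-data/events'
);
```

Queries filtering on `dt` can then prune whole partitions instead of scanning every object.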

Missing role trust: If the connection fails, add the BigQuery Omni principal to the IAM role’s trust policy and re-test.
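As a sketch, the trust policy typically grants Google's federated identity permission to assume the role via web identity federation. The `sub` value below is a placeholder for the BigQuery Omni identity reported when the connection is created:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:sub": "BIGQUERY_OMNI_IDENTITY"
        }
      }
    }
  ]
}
```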

How to Run BigQuery on AWS Example Usage


-- Total revenue per customer last month
SELECT c.id, c.name, SUM(o.total_amount) AS revenue
FROM `ecom.Customers` AS c
JOIN `ecom.Orders`   AS o ON o.customer_id = c.id
WHERE o.order_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH) AND CURRENT_DATE()
GROUP BY c.id, c.name
ORDER BY revenue DESC
LIMIT 10;

How to Run BigQuery on AWS Syntax


-- 1. Create the external connection
--    (connections to AWS live in an AWS location such as aws-us-east-1;
--    the same can be done with `bq mk --connection --connection_type=AWS`)
CREATE CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  external_location = 'aws://us-east-1',
  aws_role_arn = 'arn:aws:iam::123456789012:role/bq-omni',
  aws_external_id = 'galaxy-omni'
);

-- 2. Create external tables (ecommerce example)
--    The ecom dataset must live in the same AWS location as the connection
CREATE EXTERNAL TABLE `ecom.Customers` (
  id INT64,
  name STRING,
  email STRING,
  created_at TIMESTAMP
)
WITH CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://ecom-data/customers/*.parquet']
);

CREATE EXTERNAL TABLE `ecom.Orders` (
  id INT64,
  customer_id INT64,
  order_date DATE,
  total_amount NUMERIC
)
WITH CONNECTION `project.aws-us-east-1.aws_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://ecom-data/orders/2024/*/*.parquet']
);


Frequently Asked Questions (FAQs)

Does BigQuery Omni copy my data out of AWS?

No. Compute runs in AWS and reads directly from S3; only query metadata and final results move to Google's control plane.

Can I join S3 data with BigQuery datasets stored in GCP?

Yes. Cross-cloud joins are supported, but performance is network-bound. Filter early to minimise data transfer.
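A sketch of filtering early, assuming `ecom.Orders` is the S3-backed external table from the examples above and `gcp_ds.Customers` is a hypothetical native table stored in Google Cloud:

```sql
-- Aggregate the S3-backed table in AWS first, then join the small
-- result with a native table in Google Cloud, so only the reduced
-- per-customer totals cross the network
SELECT c.name, o.revenue
FROM (
  SELECT customer_id, SUM(total_amount) AS revenue
  FROM `ecom.Orders`              -- external table over S3
  WHERE order_date >= DATE '2024-01-01'
  GROUP BY customer_id
) AS o
JOIN `gcp_ds.Customers` AS c      -- native BigQuery table
  ON c.id = o.customer_id;
```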

Is federation cheaper than loading data into BigQuery?

It depends. Small, infrequent queries cost less with federation. Large analytic workloads are cheaper after loading data into native storage.
