Calculating Cost per Query in BigQuery

How can I calculate cost per query in BigQuery?

Cost-per-query in BigQuery is the dollar amount charged for the bytes processed by an individual query, obtainable from job metadata, audit logs, or billing export tables.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

Overview

Google BigQuery is a fully managed, serverless data warehouse that bills you primarily for the amount of data your queries scan. Knowing how much each query costs is essential for budgeting, charge-backs, and query optimization. This article walks through the mechanics of calculating cost per query, offers best practices, and highlights common pitfalls.

Why Cost per Query Matters

Even though BigQuery’s on-demand model eliminates capacity planning, it can result in unexpectedly high bills if engineers are not vigilant. Per-query cost tracking enables:

Show-back/Charge-back to business units or customers.
Optimization feedback—developers immediately learn whether a refactor reduces spend.
Alerting and anomaly detection on runaway queries.
Compliance—auditors often request proof of cost attribution per workload.

BigQuery Pricing Refresher

On-Demand (Default)

You pay per byte processed. As of 2024-06, the rate is US $5 per TiB scanned. Storage and streaming inserts are separate charges.

Capacity Commitments

With slots or editions, cost becomes time-based rather than scan-based. Per-query cost is still helpful for allocating slot usage, but the math differs (slot-ms × edition price ÷ 1M).

Five Ways to Calculate Cost per Query

1. INFORMATION_SCHEMA.JOBS_BY_* Views (Recommended)

BigQuery exposes the total_bytes_processed field for every job. Divide by the on-demand price per byte:

-- project_id.dataset.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT SELECT job_id, user_email, creation_time, total_bytes_processed, total_bytes_billed, total_bytes_billed * 5 / POW(1024, 4) AS usd_cost FROM `my-project`.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND job_type = "QUERY";

total_bytes_billed is rounded up to the nearest MB; multiply by US$5/TiB (or your region’s price).

2. Cloud Logging Audit Logs

All BigQuery jobs emit AuditData entries that include jobChange.reason.totalProcessedBytes. Export these logs to BigQuery or GCS, then aggregate.

3. REST API / Client Libraries

Calling jobs.get(projectId, jobId) returns a JSON object containing statistics.totalBytesProcessed. For real-time dashboards, your backend can fetch cost immediately after a query completes.

4. Cloud Billing Export

To reconcile against your invoice, enable the BigQuery-specific Detailed Billing Export. The table gcp_billing_export_resource_v1_* contains usage.amount_in_pricing_units grouped by project.id, resource.labels.job_id, and sku.id. Join this with your jobs metadata for authoritative cost.

5. Estimating Cost Before Execution

Use a dry-run—set the client library’s dryRun=true or use the Console’s Query Validator. The response returns bytes billed so you can display an estimated cost. Some SQL editors, including Galaxy, surface this number inline so engineers know the cost before hitting “Run”.

End-to-End Example

The following walk-through tracks cost for yesterday’s queries and stores the result in a monitoring table.

-- Create a dataset to hold cost analytics CREATE SCHEMA IF NOT EXISTS bq_monitor; -- Yesterday's query costs INSERT INTO bq_monitor.query_costs SELECT job_id, project_id, user_email, creation_time, total_bytes_billed, total_bytes_billed * 5 / POW(1024, 4) AS usd_cost FROM `my-project`.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE job_type = "QUERY" AND creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP();

You can schedule this script via BigQuery Scheduled Queries or Cloud Composer and expose the results in Looker Studio.

Best Practices

Partition your datasets to reduce scanned bytes.
Use clustered tables—pruning saves money even if queries are ad-hoc.
Favor SELECT * EXCEPT to omit unneeded wide columns.
Cache results when possible; cached reads are free.
Set up quotas (per-user or per-project) to prevent runaway costs.

Galaxy Integration

Galaxy’s desktop SQL editor leverages BigQuery’s dry-run API. When you hit Validate, Galaxy retrieves totalBytesProcessed, converts it to USD based on your project’s location pricing, and displays the estimate next to the Run button. After execution, the job metadata panel shows the actual bytes billed so you can compare against the estimate. Because queries can be endorsed and shared in Galaxy Collections, teams quickly identify the most cost-efficient version of a query.

Common Mistakes and How to Avoid Them

1. Relying on `total_bytes_processed` Instead of `total_bytes_billed`

Why it’s wrong: Billed bytes are rounded up to the nearest MB. Using processed bytes under-reports cost.
Fix: Always reference total_bytes_billed when multiplying by price.

2. Forgetting Regional Price Differences

Why it’s wrong: Prices vary (e.g., US $5/TiB in us, US $6/TiB in asia-southeast1).
Fix: Parametrize your calculation with the region-appropriate price list, ideally in a lookup table.

3. Ignoring Slot-Based Pricing

Why it’s wrong: In editions like Enterprise, cost correlates with slot-ms not bytes.
Fix: When using reservations, multiply slot_ms by the edition’s price per million slot-ms instead.

Conclusion

Tracking BigQuery cost per query is straightforward once you understand the available telemetry. Whether you use INFORMATION_SCHEMA views, audit logs, or billing export tables, incorporating cost into your workflows drives accountability, optimization, and predictable spending. Tools like Galaxy further democratize visibility by surfacing cost estimates directly in the SQL editor.

Why Calculating Cost per Query in BigQuery is important

Unmonitored BigQuery queries can easily scan terabytes, turning a single analytic question into a multi-hundred-dollar charge. Calculating cost per query gives engineers immediate feedback, enables charge-backs, and enforces budget limits before invoices spiral out of control.