How to Use Subqueries in BigQuery

Galaxy Glossary

How do I write a subquery in BigQuery?

A subquery lets you embed one SELECT statement inside another to filter, aggregate, or transform data before the outer query runs.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Table of Contents

What is a subquery in BigQuery?

A subquery is a SELECT statement enclosed in parentheses and used as a source table for an outer query. It runs first, feeding its result set to the surrounding query.

Why choose a subquery over a CTE?

Subqueries shine for quick, inline filtering or aggregations when you don’t need to reference the result more than once. They avoid naming overhead but can reduce readability compared to CTEs.

What is the basic subquery syntax?

Place the inner SELECT inside parentheses in the FROM or WHERE clause. Alias it if used in FROM to expose its columns to the outer query.

How do I filter rows using a subquery?

Use a subquery in the WHERE clause with IN, EXISTS, or comparison operators to limit rows returned by the outer query.

How do I join to a subquery?

Embed the subquery in the FROM clause and join it like a table. Always alias the subquery to reference its columns.

How to nest aggregations with subqueries?

Wrap an aggregate subquery in the FROM clause, then apply additional aggregation or filtering in the outer query for multi-step analysis.

Best practices for subqueries?

Alias clearly, keep inner result sets small, and avoid deeply nested subqueries. Convert complex, reusable logic to CTEs for clarity and performance.

Common mistakes with subqueries?

Returning multiple columns in a scalar context or forgetting to alias subqueries in the FROM clause causes errors. Check that inner SELECT returns the expected shape.

Why How to Use Subqueries in BigQuery is important

How to Use Subqueries in BigQuery Example Usage


-- Top customers who spent more than $500 in the last 90 days
SELECT c.id, c.name, spend.last_90d_total
FROM Customers AS c
JOIN (
  SELECT customer_id, SUM(total_amount) AS last_90d_total
  FROM Orders
  WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
  GROUP BY customer_id
) AS spend
ON c.id = spend.customer_id
WHERE spend.last_90d_total > 500
ORDER BY spend.last_90d_total DESC;

How to Use Subqueries in BigQuery Syntax


-- Subquery in WHERE clause
SELECT id, name
FROM Customers
WHERE id IN (
  SELECT customer_id
  FROM Orders
  WHERE order_date >= '2024-01-01'
);

-- Subquery in FROM clause (must be aliased)
SELECT c.name, recent.total_amount
FROM Customers AS c
JOIN (
  SELECT customer_id, SUM(total_amount) AS total_amount
  FROM Orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
) AS recent
ON c.id = recent.customer_id;

Common Mistakes

Frequently Asked Questions (FAQs)

Can subqueries be correlated in BigQuery?

Yes. Correlated subqueries reference columns from the outer query. BigQuery rewrites many correlated subqueries for efficiency, but ensure they return few rows per evaluation.

Do subqueries hurt performance?

Not inherently. Performance depends on data scanned. Limit columns, filter early, and check query plan to avoid full-table scans inside subqueries.

When should I switch to a CTE?

If the inner result set is reused, complex, or hampers readability, use WITH (CTE) instead. CTEs can also improve optimizer hints and caching.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.