How to Use Subqueries in Redshift

Galaxy Glossary

How do I write a subquery in Amazon Redshift?

A Redshift subquery is a SELECT statement nested inside another query to filter, compute, or supply data.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

What is a subquery in Redshift?

A subquery is a SELECT placed inside another query’s SELECT, FROM, or WHERE clause to provide a value set, single value, or virtual table the outer query can use.

When should I use a subquery instead of a JOIN?

Use a subquery when the nested logic depends on row-by-row evaluation, requires aggregation before filtering, or when simplifying complex JOINs improves readability.

What is the basic syntax?

Wrap the inner SELECT in parentheses. The outer query can treat the result as a table, list, or scalar value depending on context.

Scalar subquery example

Return each customer with total orders computed inline.

IN/EXISTS subquery example

Filter customers who bought out-of-stock products using EXISTS.

FROM-clause subquery example

Create an inline view that aggregates order revenue, then join to Customers for reporting.

How do I avoid performance issues?

Rewrite correlated subqueries as JOINs when possible, ensure inner queries have selective predicates, and avoid returning unnecessary columns.

Best practices for subqueries

1️⃣ Keep subqueries small and indexed by distribution/sort keys. 2️⃣ Use LIMIT for testing. 3️⃣ Materialize heavy subqueries into temporary tables for reuse.

Why How to Use Subqueries in Redshift is important

How to Use Subqueries in Redshift Example Usage


-- List customers with at least one order containing an out-of-stock product
SELECT DISTINCT c.id, c.name, c.email
FROM Customers c
WHERE EXISTS (
  SELECT 1
  FROM Orders o
  JOIN OrderItems oi ON oi.order_id = o.id
  JOIN Products p  ON p.id = oi.product_id
  WHERE o.customer_id = c.id
    AND p.stock = 0);

How to Use Subqueries in Redshift Syntax


-- Scalar subquery in SELECT
SELECT c.id,
       c.name,
       (SELECT COUNT(*)
        FROM Orders o
        WHERE o.customer_id = c.id) AS order_count
FROM Customers c;

-- Subquery in WHERE using IN
SELECT *
FROM Products p
WHERE p.id IN (SELECT product_id
               FROM OrderItems
               WHERE quantity > 3);

-- EXISTS correlated subquery
SELECT c.*
FROM Customers c
WHERE EXISTS (
  SELECT 1
  FROM Orders o
  WHERE o.customer_id = c.id
    AND o.total_amount > 500);

-- Subquery in FROM (inline view)
SELECT c.name, rev.total_amount
FROM Customers c
JOIN (
  SELECT customer_id, SUM(total_amount) AS total_amount
  FROM Orders
  GROUP BY customer_id) rev
ON rev.customer_id = c.id;

Common Mistakes

Frequently Asked Questions (FAQs)

Can I use ORDER BY inside a subquery?

Yes, but Redshift ignores ORDER BY unless you also use LIMIT. Sort the result in the outer query if presentation order matters.

Are correlated subqueries slower?

Often yes, because they re-execute per outer row. Rewrite as a JOIN or pre-aggregate in a CTE for better performance.

How deep can subqueries nest?

Redshift supports up to 32 levels of nesting, but readability and performance decline long before that limit.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.