Improve Snowflake query speed and lower costs with clustering, result-set caching, and well-written SQL.
Large scans of micro-partitions, missing filters, Cartesian joins, and overuse of SELECT * force Snowflake to read more data than needed, inflating both latency and credits.
Run EXPLAIN USING TEXT <query> or open the Query Profile in Snowsight. Look for large "SCAN" nodes, repartitioning steps, and spilled data; these signal optimization opportunities.
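For example, a minimal sketch that produces a text plan without executing the query (it reuses the Orders filter from the caching example later in this article):

-- Inspect the plan; large scans and repartition steps show up here
EXPLAIN USING TEXT
SELECT customer_id, total_amount
FROM Orders
WHERE order_date >= '2024-01-01';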
Use a clustering key when a table exceeds ~100M rows and your queries filter on a small subset of values (e.g., order_date). Clustering co-locates rows with similar key values in the same micro-partitions, so filters can prune the rest and scan less data.
ALTER TABLE Orders
CLUSTER BY (order_date, customer_id);
Select only the columns you need and avoid SELECT *. Because Snowflake stores data by column, narrow projections shrink the amount of data fetched from storage and speed up scans.
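As a sketch (the narrow column list is illustrative):

-- Avoid: reads every column in every scanned micro-partition
SELECT * FROM Orders;

-- Prefer: reads only the column files the query actually needs
SELECT customer_id, order_date, total_amount
FROM Orders;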
Yes. If the underlying data hasn’t changed, Snowflake returns cached results without using warehouse compute. Keep the query text and role identical, and leave session settings unchanged, to maximize cache hits; the result cache does not depend on which warehouse runs the query.
-- First run (computes and stores result)
SELECT customer_id, total_amount
FROM Orders
WHERE order_date >= '2024-01-01';
-- Subsequent run (served from cache)
SELECT customer_id, total_amount
FROM Orders
WHERE order_date >= '2024-01-01';
Join on key columns with matching data types, and keep dimension tables such as Products small; Snowflake’s optimizer then broadcasts the small side to all worker nodes automatically, keeping data local during the join.
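A minimal sketch, assuming Orders and Products share a product_id key and Products carries a product_name column:

-- Small dimension joined to the large fact table; Snowflake typically
-- broadcasts Products to the worker nodes on its own
SELECT o.customer_id, p.product_name, o.total_amount
FROM Orders AS o
JOIN Products AS p
  ON o.product_id = p.product_id;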
Use multi-cluster warehouses with auto-scale for unpredictable concurrency spikes. Scale up (a larger warehouse size) for CPU- and memory-bound analytics, and scale out (more clusters) for high concurrency.
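For instance (the warehouse name and cluster limits below are illustrative; multi-cluster warehouses require Enterprise Edition or higher):

-- Scale out: let Snowflake add clusters under concurrent load
ALTER WAREHOUSE analytics_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';

-- Scale up: a larger size for a single heavy workload
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';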
Check SYSTEM$CLUSTERING_INFORMATION to see how well a table’s data is clustered around its clustering key.
Use CREATE TABLE AS SELECT (CTAS) for heavy transforms, then query the materialized result (a sketch follows below).
No. Snowflake automatically handles micro-partition maintenance. Focus on clustering and SQL tuning instead.
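A sketch of the CTAS pattern mentioned above (the daily_revenue table name is illustrative):

-- Materialize the heavy transform once
CREATE OR REPLACE TABLE daily_revenue AS
SELECT order_date, SUM(total_amount) AS revenue
FROM Orders
GROUP BY order_date;

-- Downstream queries then read the small result table
SELECT order_date, revenue
FROM daily_revenue
WHERE order_date >= '2024-01-01';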
Monitor clustering depth with SYSTEM$CLUSTERING_DEPTH. Recluster when the average depth climbs above ~5 or after large batch loads.
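For example, against the Orders table clustered earlier:

-- Average depth near 1 means well-clustered; higher values mean more
-- overlapping micro-partitions for these columns
SELECT SYSTEM$CLUSTERING_DEPTH('Orders', '(order_date, customer_id)');

-- Fuller report, including a partition-depth histogram
SELECT SYSTEM$CLUSTERING_INFORMATION('Orders', '(order_date, customer_id)');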
You can’t force it, but keeping the SQL text, role, and session settings identical maximizes cache hits.