Improve Snowflake query speed and lower costs with clustering, result-set caching, and well-written SQL.
Large scans of micro-partitions, missing filters, Cartesian joins, and overuse of SELECT * force Snowflake to read more data than needed, inflating both latency and credits.
Run EXPLAIN USING TEXT <query> or open the Query Profile in Snowsight. Look for large "SCAN" nodes, repartitioning steps, and spilled data; these signal optimization opportunities.
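For example, a minimal sketch that produces a text plan without executing the query (it reuses the Orders filter from the caching example later in this article):

-- Inspect the plan; large scans and repartition steps show up here
EXPLAIN USING TEXT
SELECT customer_id, total_amount
FROM Orders
WHERE order_date >= '2024-01-01';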
Use a clustering key when a table exceeds ~100M rows and your queries filter on a small subset of values (e.g., order_date). Clustering co-locates rows with similar key values in the same micro-partitions, so filters can prune the rest and scan less data.
ALTER TABLE Orders
CLUSTER BY (order_date, customer_id);
Select only the columns you need and avoid SELECT *. Because Snowflake stores data by column, narrow projections shrink the amount of data fetched from storage and speed up scans.
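As a sketch (the narrow column list is illustrative):

-- Avoid: reads every column in every scanned micro-partition
SELECT * FROM Orders;

-- Prefer: reads only the column files the query actually needs
SELECT customer_id, order_date, total_amount
FROM Orders;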
Yes. If the underlying data hasn’t changed, Snowflake returns cached results without using warehouse compute. Keep the query text and role identical, and leave session settings unchanged, to maximize cache hits; the result cache does not depend on which warehouse runs the query.
-- First run (computes and stores result)
SELECT customer_id, total_amount
FROM Orders
WHERE order_date >= '2024-01-01';
-- Subsequent run (served from cache)
SELECT customer_id, total_amount
FROM Orders
WHERE order_date >= '2024-01-01';
Join on key columns with matching data types, and keep dimension tables such as Products small; Snowflake’s optimizer then broadcasts the small side to all worker nodes automatically, keeping data local during the join.
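A minimal sketch, assuming Orders and Products share a product_id key and Products carries a product_name column:

-- Small dimension joined to the large fact table; Snowflake typically
-- broadcasts Products to the worker nodes on its own
SELECT o.customer_id, p.product_name, o.total_amount
FROM Orders AS o
JOIN Products AS p
  ON o.product_id = p.product_id;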
Use multi-cluster warehouses with auto-scale for unpredictable concurrency spikes. Scale up (a larger warehouse size) for CPU- and memory-bound analytics, and scale out (more clusters) for high concurrency.
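For instance (the warehouse name and cluster limits below are illustrative; multi-cluster warehouses require Enterprise Edition or higher):

-- Scale out: let Snowflake add clusters under concurrent load
ALTER WAREHOUSE analytics_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';

-- Scale up: a larger size for a single heavy workload
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';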
Check SYSTEM$CLUSTERING_INFORMATION to see how well a table’s data is clustered around its clustering key.
Use CREATE TABLE AS SELECT (CTAS) for heavy transforms, then query the materialized result (a sketch follows below).
No. Snowflake automatically handles micro-partition maintenance. Focus on clustering and SQL tuning instead.
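A sketch of the CTAS pattern mentioned above (the daily_revenue table name is illustrative):

-- Materialize the heavy transform once
CREATE OR REPLACE TABLE daily_revenue AS
SELECT order_date, SUM(total_amount) AS revenue
FROM Orders
GROUP BY order_date;

-- Downstream queries then read the small result table
SELECT order_date, revenue
FROM daily_revenue
WHERE order_date >= '2024-01-01';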
Monitor clustering depth with SYSTEM$CLUSTERING_DEPTH. Recluster when the average depth climbs above ~5 or after large batch loads.
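For example, against the Orders table clustered earlier:

-- Average depth near 1 means well-clustered; higher values mean more
-- overlapping micro-partitions for these columns
SELECT SYSTEM$CLUSTERING_DEPTH('Orders', '(order_date, customer_id)');

-- Fuller report, including a partition-depth histogram
SELECT SYSTEM$CLUSTERING_INFORMATION('Orders', '(order_date, customer_id)');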
You can’t force it, but keeping the SQL text, role, and session settings identical maximizes cache hits.