Query tuning in Snowflake means adjusting SQL, warehouse, and data-model settings to reduce runtime and cost.
Slow queries usually scan too much data, spill to local disk, or wait on a small warehouse. Start by checking QUERY_HISTORY for execution time, bytes scanned, and percentage of time spent on compilation, execution, and queuing.
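As a starting point, a query like the following against the ACCOUNT_USAGE view surfaces the slowest recent queries and where their time went (the view lags live activity by up to ~45 minutes; the one-day window and limit are arbitrary choices):

```sql
-- Slowest queries in the last day, with time breakdown and scan volume.
SELECT
    query_id,
    warehouse_name,
    total_elapsed_time / 1000  AS elapsed_s,
    bytes_scanned / 1e9        AS gb_scanned,
    compilation_time           AS compile_ms,
    execution_time             AS exec_ms,
    queued_overload_time       AS queued_ms
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```

A large `queued_ms` points at warehouse contention; a large `gb_scanned` points at pruning problems.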
Run EXPLAIN USING TEXT <your_query>. Focus on the SCAN nodes: high “partitions scanned” or “bytes scanned” values signal poor pruning, and large “output rows” in JOIN nodes show the join order needs work.
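For example, using the Orders and OrderItems tables mentioned later (their schemas are assumed here):

```sql
-- Compare partitionsAssigned vs. partitionsTotal on the TableScan nodes:
-- a large gap means the date filter is pruning micro-partitions well.
EXPLAIN USING TEXT
SELECT o.order_date, SUM(oi.quantity)
FROM Orders o
JOIN OrderItems oi ON oi.order_id = o.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY o.order_date;
```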
Warehouse size sets CPU and memory. Increase from X-SMALL to MEDIUM for complex joins. Auto-Suspend lowers costs between runs. Auto-Resume wakes the warehouse instantly, so leave it on to avoid queue waits.
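These settings can be applied in one statement (the warehouse name is illustrative):

```sql
-- Right-size the warehouse and cap idle cost.
ALTER WAREHOUSE analytics_wh SET
    WAREHOUSE_SIZE = 'MEDIUM',
    AUTO_SUSPEND   = 60,    -- seconds of idle time before suspending
    AUTO_RESUME    = TRUE;  -- wake on the next query, no manual resume
```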
Filter early with selective WHERE clauses on partition columns like order_date. Select only needed columns. Join on surrogate keys (customer_id) instead of wide text columns.
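A minimal sketch of these three habits together (the Customers table is assumed for illustration):

```sql
-- Prune on the partition column, project only needed columns,
-- and join on the surrogate key rather than a wide text column.
SELECT o.order_id, o.order_date, c.customer_id
FROM Orders o
JOIN Customers c ON c.customer_id = o.customer_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-01-31';
```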
Add a clustering key on Orders(order_date) or a multi-column key on OrderItems(order_id, product_id). Recluster heavy-write tables during low-traffic windows.
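The corresponding DDL:

```sql
-- Cluster the fact tables on the columns queries filter and join on.
ALTER TABLE Orders     CLUSTER BY (order_date);
ALTER TABLE OrderItems CLUSTER BY (order_id, product_id);
```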
If dashboards rerun the same query often, result caching returns data in milliseconds. Use fully qualified table names and identical SQL to hit the cache. Any DML on referenced tables invalidates the cache.
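When tuning, the result cache can mask real runtimes; it can be toggled per session:

```sql
-- Result caching is on by default; disable it to benchmark actual execution.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- ... run the query being tuned ...
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```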
Star-schema designs keep fact tables like OrderItems narrow and dimension tables like Products wide. Declare PRIMARY KEY constraints (even though Snowflake doesn’t enforce them) to improve planner statistics.
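A sketch of that shape, with column lists kept illustrative:

```sql
-- Wide dimension table; PK declared for metadata, not enforcement.
CREATE TABLE Products (
    product_id   NUMBER   PRIMARY KEY,
    product_name VARCHAR,
    category     VARCHAR,
    description  VARCHAR
);

-- Narrow fact table with a composite key referencing the dimension.
CREATE TABLE OrderItems (
    order_id   NUMBER,
    product_id NUMBER REFERENCES Products (product_id),
    quantity   NUMBER,
    PRIMARY KEY (order_id, product_id)
);
```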
Warehouse cost equals execution time × credits per hour for the warehouse size, and bytes scanned is the best proxy for execution time, so reducing scan size usually saves more than downsizing warehouses.
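That estimate can be computed from query history. The credit rates below assume the standard ladder (X-Small = 1, Small = 2, Medium = 4 credits/hour); check your contract, and note this ignores concurrency on a shared warehouse:

```sql
-- Rough per-query compute cost: elapsed hours × credits/hour for the size.
SELECT
    query_id,
    warehouse_size,
    total_elapsed_time / 3600000.0 AS elapsed_hours,
    total_elapsed_time / 3600000.0 *
        CASE warehouse_size
            WHEN 'X-Small' THEN 1
            WHEN 'Small'   THEN 2
            WHEN 'Medium'  THEN 4
        END AS est_credits
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY est_credits DESC NULLS LAST
LIMIT 20;
```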
1) Add filters and LIMIT early.
2) Create MATERIALIZED VIEWs for common aggregates.
3) Batch INSERTs to avoid micro-partitions with tiny files.
4) Scale warehouses temporarily for backfills, then scale down.
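For step 2, a minimal materialized view over OrderItems (the view name is illustrative; Snowflake materialized views must reference a single table):

```sql
-- Precomputed per-order aggregate so dashboards skip the full scan.
CREATE MATERIALIZED VIEW order_item_totals AS
SELECT order_id,
       SUM(quantity) AS total_qty,
       COUNT(*)      AS line_count
FROM OrderItems
GROUP BY order_id;
```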
Snowflake has no traditional B-tree indexes. Micro-partition metadata and clustering keys handle pruning, so focus on good clustering instead of indexes.
For high-volume tables with uneven data distribution, Automatic Clustering keeps pruning efficient and can pay for itself by reducing query runtime.
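Before paying for it, clustering health can be inspected, and reclustering toggled per table:

```sql
-- Reports clustering depth and overlap for the given key.
SELECT SYSTEM$CLUSTERING_INFORMATION('Orders', '(order_date)');

-- Enable or pause Automatic Clustering on a clustered table.
ALTER TABLE Orders RESUME RECLUSTER;
ALTER TABLE Orders SUSPEND RECLUSTER;
```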