Use EXPLAIN, ANALYZE, system tables, and WLM logs to find slow steps and fix Amazon Redshift queries.
High scan volume, skewed data distribution, missing sort keys, or queue waits usually cause latency. Measuring each factor pinpoints the fix.
Combine EXPLAIN
, EXPLAIN ANALYZE
, system views (SVL_*
, STL_*
), and the Workload Management (WLM) queue log. Together they reveal scan size, join type, CPU use, and slot contention.
Look for sequential scans, DS_DIST_*
steps, and high estimated row counts. These signals show unsorted tables, distribution key fights, or missing predicates.
Run EXPLAIN ANALYZE
. Redshift executes the statement, time-stamps every step, then returns actual row and byte counts. Compare with the estimated plan to locate misestimates.
SVL_QUERY_REPORT
(overall cost & bytes), SVL_QUERY_SUMMARY
(per-step time), STL_WLM_QUERY
(queue waits), and SVL_ALERT_EVENT_LOG
(disk spills) surface bottlenecks without rerunning the query.
1) Verify predicates use SORTKEY columns. 2) Ensure joins use matching DISTKEYs. 3) Vacuum & analyze tables. 4) Move heavy ad-hoc work to a lower-priority WLM queue.
Suppose Orders
has a DISTKEY on customer_id
but OrderItems
is EVEN. Joining them forces data redistribution. EXPLAIN will display DS_DIST_BOTH
. Make OrderItems.customer_id
a DISTKEY to eliminate the shuffle.
Capture the query
and pid
from stl_query
, monitor with stv_recents
, and abort long-running tests quickly using CANCEL pid
.
No, Redshift forbids most extensions. Use EXPLAIN ANALYZE and system views instead.
Yes. Query IDs in SVL_QUERY_REPORT
and SVL_QUERY_SUMMARY
store execution stats for 7–14 days by default.
Check STL_WLM_QUERY
for total_queue_time
. Increase slots, move the user to a higher service class, or rewrite the query for lower resource use.