Guidelines for selecting, implementing, and maintaining Amazon Redshift sort keys to ensure optimal query performance and cost-efficiency in 2025.
Amazon Redshift stores data in columnar blocks on disk. A sort key defines the physical order of those blocks on each node. When chosen well, sort keys let the query planner skip entire blocks, reduce disk I/O, and accelerate joins, filters, and aggregations. When chosen poorly, they bloat storage, slow VACUUM operations, and drive up costs. This article distills 2025 best practices for designing, monitoring, and evolving sort keys in production clusters.
Redshift has evolved: RA3 nodes with managed storage, AQUA acceleration, and AUTO
table tuning have changed some conventions. Yet physical data locality remains critical for:
Rows are ordered by the first column, then the second, and so on. Best when queries filter mostly on the leading column.
Redshift tries to balance importance across all columns. Works when queries filter on different combinations, but incurs higher maintenance cost.
Introduced in 2023, AUTO
lets Redshift pick and adjust the key based on query history. Handy for new tables but not a silver bullet.
Analyze SVL_QUERY_SUMMARY
or the system_query_history
view to find the columns most frequently used in WHERE
, JOIN
, and ORDER BY
clauses. Prioritize those columns for sort keys.
A date column with one value per day on a billion-row table is selective; a status
column with five values isn’t. High cardinality maximizes block skipping.
If a table is KEY
distributed on customer_id
, consider making customer_id
the leading sort column. This co-locates and co-orders data, speeding up JOIN
operations.
AUTO
works well for ad-hoc analytics or new datasets with evolving access patterns. For stable, high-traffic tables, explicit keys still outperform.
Time-partitioned fact tables (e.g., events) benefit from compound
on event_date
. Dashboards that slice by region
or product
warrant interleaved
keys on both columns.
Limit a compound key to 1–2 columns; every extra column increases load time and VACUUM cost. In 2025, Redshift still stores all key columns in block metadata.
After heavy UPDATE
/DELETE
workloads, run VACUUM REINDEX
instead of FULL
to re-sort with less write amplification on RA3 managed storage.
Query SVV_TABLE_INFO.unsorted
. Keep the unsorted percentage < 5% for compound and < 20% for interleaved keys. Schedule alerts via CloudWatch.
Access patterns change. Every quarter, replay query logs through the Redshift Advisor or third-party tools to validate that current keys still prune blocks.
Store the reasoning (e.g., “optimized for churn dashboard”) in a data catalog or comment. Future engineers save hours when deciding whether to modify keys.
Suppose we ingest clickstream events into events_raw
and query them mainly by event date and user ID.
CREATE TABLE events_raw
(
event_timestamp TIMESTAMP,
user_id BIGINT,
event_type VARCHAR(50),
page_url VARCHAR(2048),
payload VARCHAR(65535)
)
DISTSTYLE KEY
DISTKEY (user_id)
SORTKEY (event_timestamp, user_id);
Queries filtering on event_timestamp BETWEEN '2025-01-01' AND '2025-01-31'
and a specific user_id
scan only a fraction of blocks.
Why it’s wrong: Low cardinality prevents effective pruning.
Fix: Move the high-cardinality column to the front or switch to interleaved.
Why it’s wrong: Newly appended blocks are unsorted, negating benefits.
Fix: Schedule VACUUM REINDEX
or enable automatic table optimization.
Why it’s wrong: AUTO may oscillate keys, causing jittery performance.
Fix: Lock the sort key once workload stabilizes using ALTER TABLE ... ALTER SORTKEY
.
Galaxy’s modern SQL editor detects table metadata and surfaces sort-key definitions inline. Its AI copilot can:
SVL_STATEMENTTEXT
.This reduces guesswork and keeps engineering teams aligned on performance best practices.
-- 1. Examine current sort key effectiveness
SELECT tbl, unsorted
FROM SVV_TABLE_INFO
WHERE schema = 'public'
AND tbl = 'events_raw';
-- 2. Re-sort if unsorted > 5%
VACUUM REINDEX public.events_raw;
-- 3. Alter to INTERLEAVED if workload shifts
ALTER TABLE public.events_raw
ALTER SORTKEY INTERLEAVED (user_id, event_type);
-- 4. Lock in the new key to prevent AUTO changes
ALTER TABLE public.events_raw ALTER SORTKEY; -- no columns: locks current state
Despite automation advances, thoughtful sort-key design is still one of the highest-ROI tuning levers in Redshift. By combining data-driven design, disciplined maintenance, and tooling like Galaxy, teams can deliver snappy dashboards and analytical workloads well into 2025.
Sort keys remain the primary mechanism for data pruning in Amazon Redshift. In 2025, workloads are larger and more real-time, RA3 managed storage bills can explode, and concurrency demands are higher. Proper sort-key strategy directly reduces I/O, speeds queries, and keeps cloud costs predictable, making it critical knowledge for data engineers and analytics teams.
Query SVV_TABLE_INFO
or STV_BLOCKLIST
. If unsorted
exceeds 5% for compound keys, run VACUUM REINDEX
.
Use interleaved when queries filter on many different columns with similar frequencies, such as multidimensional drill-down dashboards.
Yes. Use ALTER TABLE ... ALTER SORTKEY
to define a new key and then VACUUM FULL
or REINDEX
. Expect write amplification.
Galaxy’s AI copilot inspects query history to suggest better sort keys, auto-generates VACUUM commands, and flags queries that bypass existing keys—all inside its SQL editor.