Guidelines for optimizing Amazon Redshift performance, cost, and reliability.
Use star or snowflake schemas to reduce joins. Keep fact tables narrow; move infrequently used columns to dimension tables. Declare data types precisely—numeric(10,2) instead of varchar—so Redshift stores fewer bytes and scans less data.
Pick a DISTKEY used in large joins (e.g., customer_id) to colocate rows on the same node. Choose a SORTKEY that appears in range filters or order-by clauses (e.g., order_date) so Redshift can skip blocks with zone maps.
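As a concrete sketch, here is a hypothetical orders fact table that applies both guidelines; the table and column names are illustrative, not from any particular schema:

    -- Precise types keep rows narrow; the join key is the DISTKEY and the
    -- range-filter column is the SORTKEY (all names are hypothetical).
    CREATE TABLE orders (
        order_id    BIGINT        NOT NULL,
        customer_id BIGINT        NOT NULL,
        order_date  DATE          NOT NULL,
        amount      NUMERIC(10,2) NOT NULL
    )
    DISTKEY (customer_id)
    SORTKEY (order_date);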
Load from Amazon S3 using COPY with COMPUPDATE OFF and STATUPDATE OFF for large batches, so the load skips automatic compression analysis and statistics collection. Compress files as gzip or zstd, split them into evenly sized parts (roughly 1 MB to 1 GB each after compression, with the file count a multiple of the cluster's slice count), and use the MANIFEST option so COPY loads exactly the files you list, exactly once.
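A minimal COPY sketch along these lines, assuming the hypothetical orders table above plus a placeholder bucket and IAM role:

    -- Batch load from S3; bucket, manifest path, and role ARN are placeholders.
    COPY orders
    FROM 's3://example-bucket/orders/batch.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    MANIFEST
    GZIP
    COMPUPDATE OFF
    STATUPDATE OFF;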
Delete and update operations leave behind deleted blocks. VACUUM reclaims that space and re-sorts rows, while ANALYZE refreshes statistics so the planner chooses optimal join orders. Automate both in nightly schedules.
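For example, a nightly job might run both against the hypothetical orders table:

    VACUUM orders;   -- reclaim deleted blocks and restore sort order
    ANALYZE orders;  -- refresh table statistics for the planner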
Create separate WLM queues for ETL, dashboards, and ad-hoc users, and route queries to them with query_group or user_group rules. Enable concurrency scaling on queues that serve bursty BI loads, and set a timeout (WLM max_execution_time, or statement_timeout per session) to cancel runaway queries.
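At the session level, routing and timeouts look like the following sketch; the matching query_group name 'etl' is an assumption about your WLM configuration:

    -- Route this session's queries to the queue whose WLM rule matches 'etl'.
    SET query_group TO 'etl';
    -- Cancel any statement that runs longer than five minutes (milliseconds).
    SET statement_timeout TO 300000;
    -- ... run ETL statements here ...
    RESET query_group;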
Encrypt clusters at rest with KMS, enforce SSL in JDBC connection strings, rotate IAM keys, and restrict inbound traffic to specific security-group CIDRs. Use role-based access control via GRANT and REVOKE.
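A sketch of group-based privileges, with placeholder group and schema names:

    -- Read-only access for dashboard users (names are illustrative).
    CREATE GROUP dashboard_readers;
    GRANT USAGE ON SCHEMA analytics TO GROUP dashboard_readers;
    GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO GROUP dashboard_readers;
    -- Remove blanket access so privileges must be granted explicitly.
    REVOKE ALL ON SCHEMA analytics FROM PUBLIC;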
Archive cold tables to Amazon S3 and query them through Redshift Spectrum, use RA3 nodes with managed storage, and tune automated snapshot retention. Right-size concurrency scaling limits and pause development clusters overnight.
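Exposing archived data through Spectrum can look like this sketch, where the catalog database name and role ARN are placeholders:

    -- External schema over an S3-backed data catalog database.
    CREATE EXTERNAL SCHEMA spectrum_archive
    FROM DATA CATALOG
    DATABASE 'archive_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;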
Should small dimension tables use DISTSTYLE ALL? Yes: tables under about 10 GB benefit from DISTSTYLE ALL because every node holds a full copy, eliminating broadcast steps in joins.
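A minimal sketch with a hypothetical region dimension:

    -- Replicated to every node, so joins never need to broadcast it.
    CREATE TABLE dim_region (
        region_id   INTEGER NOT NULL,
        region_name VARCHAR(64)
    )
    DISTSTYLE ALL;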
Can you change a table's DISTKEY after creation? Newer Redshift releases support ALTER TABLE ... ALTER DISTKEY in place; where that is unavailable, create a new table with the desired DISTKEY, run INSERT INTO new SELECT * FROM old, then swap names with ALTER TABLE ... RENAME.
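Both paths, sketched against the hypothetical orders table from earlier:

    -- In place, on clusters that support it:
    ALTER TABLE orders ALTER DISTKEY customer_id;

    -- Deep-copy fallback:
    CREATE TABLE orders_new (
        order_id    BIGINT        NOT NULL,
        customer_id BIGINT        NOT NULL,
        order_date  DATE          NOT NULL,
        amount      NUMERIC(10,2) NOT NULL
    )
    DISTKEY (customer_id);
    INSERT INTO orders_new SELECT * FROM orders;
    ALTER TABLE orders RENAME TO orders_old;
    ALTER TABLE orders_new RENAME TO orders;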
How often should VACUUM run? Run VACUUM FULL weekly for write-heavy tables, or whenever deleted rows exceed 20% of a table. The lightweight VACUUM DELETE ONLY can run daily.
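For the hypothetical orders table, the two variants look like this:

    VACUUM FULL orders;        -- re-sorts rows and reclaims space; heavier
    VACUUM DELETE ONLY orders; -- reclaims deleted blocks without re-sorting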