Design schemas, keys, and table properties in Amazon Redshift to maximize query speed, minimize cost, and keep maintenance simple.
Bad models slow queries and inflate storage. Up-front planning on schema layout, distribution key (DISTKEY), and sort key (SORTKEY) produces predictable performance and lower cost.
Use a star when dimension tables change rarely and fact tables grow fast. Opt for a wide, denormalized table when joins dominate workload and updates are minimal.
Choose the column most used in joins—often customer_id or order_id. If the table is small (<5M rows), use DISTSTYLE ALL to replicate it across slices. For unrelated queries, choose EVEN to balance.
1. High cardinality
2. Even distribution
3. Frequently joined
Create a COMPOUND SORTKEY on columns with range predicates (order_date) to enable prune-and-scan. Use INTERLEAVED only when multiple columns are equally common in filters.
Always. Run ANALYZE COMPRESSION on a sample, then apply the suggested ENCODE values during CREATE TABLE to cut storage up to 70% and improve I/O.
Separate raw, staging, and analytics layers. Raw mirrors sources 1:1, staging cleanses, analytics provides curated facts and dims. This protects analysts from breaking ETL and keeps lineage clear.
Implement SCD Type 2 with start_date, end_date, and is_current flags. Use DISTSTYLE ALL because dim tables are small and heavily joined.
Run VACUUM DELETE after large deletes, VACUUM SORT after heavy inserts into sorted tables, and ANALYZE after 5–10% data change to refresh statistics.
Yes. Summarize large fact tables by day or product to speed dashboards. Refresh on schedule or manually after ETL loads.
snake_case for tables and columns, prefix dims with dim_, facts with fct_, and staging with stg_. Consistent names enable fast autocomplete and easier AI copilot suggestions.
No, but heavy joins cost more than extra storage. Denormalize high-traffic dimensions while keeping rarely used reference tables normalized.
Tables regularly exceeding 2 billion rows need partitioning logic, such as monthly roll-off tables or using date ranges in SORTKEY to limit scans.
Yes—create a new table with the desired DISTKEY and INSERT … SELECT data, then swap names. Redshift doesn’t support ALTER DISTKEY directly.