Techniques and best practices for designing, refreshing, and querying Redshift Spectrum materialized views to achieve faster analytics at lower cost.
Optimizing materialized views in Amazon Redshift Spectrum is one of the most effective ways to accelerate analytical workloads that span your Redshift cluster and Amazon S3 data lake.
This guide walks through design principles, partitioning tactics, incremental refresh strategies, and query-writing tips that dramatically reduce scan costs and refresh times for Spectrum materialized views.
A materialized view (MV) in Amazon Redshift Spectrum is a pre-computed, physically stored result set that references external tables in your AWS Glue Data Catalog. When you query the MV, Redshift returns the already-calculated data instead of scanning raw objects in S3, slashing latency and data-scanned charges.
Out of the box, an MV improves read performance, but if it is built on poorly partitioned data, updates too often, or stores unnecessary columns, you may still pay for excessive S3 scans and spend minutes refreshing. Careful optimization ensures you:
Redshift stores MV data inside the cluster; it does not keep results in S3. Querying an MV therefore leverages the cluster’s compressed columnar storage and caching, while refresh operations must re-read S3 files referenced in the MV definition.
REFRESH MATERIALIZED VIEW mv_name;
recomputes the entire result set.SUM
, COUNT
, MIN
, MAX
, or AVG
).ALTER MATERIALIZED VIEW … AUTO REFRESH YES
to let Redshift refresh during its maintenance window.Partitioning is the single largest lever for refresh performance. Because Spectrum MVs scan S3 only for partitions that changed since the last refresh, fine-grained partitions (e.g., dt
=YYYY-MM-DD) mean smaller scans. Aim for daily or even hourly partitions if ingestion volume is high.
Store data in Parquet or ORC with Snappy compression. Columnar formats enable predicate pushdown, and compressed files reduce I/O during refresh.
Every column selected in the MV definition is physically stored inside Redshift. Avoid SELECT *
—enumerate only the fields required by consuming queries.
Write the MV definition so it qualifies for incremental refresh:
DISTINCT
, LIMIT
, or complex subqueries.If your workload demands more complex logic, consider staging data with an ELT job and then building an MV on the cleaned dataset.
Because MV data lives in Redshift, DISTKEY and SORTKEY choices directly influence query latency:
DISTKEY
to the most common join column between the MV and fact tables.SORTKEY
starting with the primary filter column (often dt
).Refreshing during business-critical hours can consume cluster slots and slow user queries. Instead:
AUTO REFRESH
but time-bound the cluster maintenance window.REFRESH MATERIALIZED VIEW
statements in an external scheduler (Airflow, dbt, Step Functions) during off-peak periods.Key system tables:
SVL_MV_REFRESH_STATUS
– per-refresh duration, incremental vs full, rows scanned.SVV_MATERIALIZED_VIEWS
– size and auto-refresh settings.STL_SCAN
+ STL_QUERY
– detect if queries still hit S3 instead of the MV.Look for rising scan_size_mb
or a switch from incremental to full refresh, and re-evaluate partitions or eligibility rules as data grows.
CREATE EXTERNAL TABLE spectrum.events (
event_id BIGINT,
user_id BIGINT,
event_type VARCHAR(50),
event_timestamp TIMESTAMP,
metadata VARCHAR
)
PARTITIONED BY (dt DATE)
STORED AS PARQUET
LOCATION 's3://my-lake/events/';
ALTER TABLE spectrum.events ADD
IF NOT EXISTS
PARTITION (dt='2024-03-01') LOCATION 's3://my-lake/events/dt=2024-03-01/';
CREATE MATERIALIZED VIEW mv_daily_events
DISTKEY(user_id)
SORTKEY(dt, user_id)
AUTO REFRESH NO
AS
SELECT
dt,
user_id,
COUNT(*) AS total_events,
SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchases
FROM spectrum.events
GROUP BY dt, user_id;
REFRESH MATERIALIZED VIEW mv_daily_events;
The first refresh scans all historical partitions. Subsequent refreshes only read new dt
partitions, often finishing in seconds.
SELECT *
FROM mv_daily_events
WHERE dt >= '2024-03-01'
AND user_id = 123;
Because the result sits inside the cluster, dashboard queries return in milliseconds.
Galaxy’s desktop SQL editor accelerates MV optimization:
SELECT *
.SVL_MV_REFRESH_STATUS
.Why it hurts: Spectrum scans every S3 object at each refresh.
Fix: Re-ingest or reorganize data into partitioned folders (e.g., dt=YYYY-MM-DD
).
Why it hurts: Bloats MV size and invalidates incremental refresh when new columns appear.
Fix: Explicitly list only columns consumed downstream.
Why it hurts: Full refreshes read all partitions, spiking runtime and cost.
Fix: Verify MV still meets incremental rules; if business logic changed, separate heavy transforms into a staging table.
Only as often as your SLA requires. Most teams with hourly data load refresh MVs hourly or daily during low-traffic windows.
Yes—if the SQL meets eligibility rules. Alter the view’s SQL using CREATE OR REPLACE MATERIALIZED VIEW
; then run a full refresh once. Future refreshes become incremental automatically.
Spectrum incremental refresh detects new partitions only. If older partitions are updated in place, you must trigger a full refresh or rebuild the MV.
Absolutely. Galaxy’s AI copilot helps you write eligible SQL, and the editor’s result-pane shows whether Redshift used incremental refresh, so you can iterate quickly.
Materialized views account for a large share of Redshift Spectrum costs. Without thoughtful design, every refresh can trigger a full S3 scan and every query may still hit raw data, defeating the purpose of an MV. Optimization techniques—partition pruning, column projection, incremental refresh—ensure you gain the performance benefit of cached, compressed data while paying only for the new slices of your lake that arrive each day. For data engineers, mastering these levers means faster dashboards, lower AWS bills, and happier stakeholders.
Only as often as your SLA requires. Most teams with hourly data load refresh MVs hourly or daily during low-traffic windows.
Yes—if the SQL meets eligibility rules. Alter the view’s SQL using CREATE OR REPLACE MATERIALIZED VIEW
; then run a full refresh once. Future refreshes become incremental automatically.
Spectrum incremental refresh detects new partitions only. If older partitions are updated in place, you must trigger a full refresh or rebuild the MV.
Absolutely. Galaxy’s AI copilot helps you write eligible SQL, and the editor’s result-pane shows whether Redshift used incremental refresh, so you can iterate quickly.