Snowflake Storage Tiering

Galaxy Glossary

How can I reduce Snowflake storage costs with tiering?

Snowflake storage tiering is the practice of systematically moving infrequently accessed or archival data from high-cost internal Snowflake storage to lower-cost tiers—such as external cloud object stores—while preserving query access and data-retention requirements.

Overview

Snowflake’s pay-as-you-go model lets teams scale elastically, but leaving every byte in the default internal storage tier can inflate monthly bills. Storage tiering reduces cost by placing hot, warm, and cold data on increasingly inexpensive media—yet still keeps the data queryable through Snowflake’s external table, Iceberg, and data lifecycle management features.

Why Snowflake Storage Tiering Matters

Unlike compute, storage fees in Snowflake accrue 24 × 7. Organizations that ingest terabytes or petabytes of raw logs, IoT telemetry, or historical snapshots often discover that storage outpaces compute as the primary cost driver. By introducing a disciplined tiering strategy you can:

  • Lower long-term TCO without deleting valuable history
  • Shorten backup and cloning times by minimizing table sizes
  • Speed up query performance on hot data sets through partition pruning
  • Stay compliant with retention policies by retaining, not discarding, aged data

How Storage Is Priced in Snowflake

Internal Database Storage

Data loaded directly into Snowflake tables is compressed, encrypted, and stored in Snowflake-managed cloud object storage. Pricing (as of 2024-06) is roughly $23/TB/month after compression.

Fail-safe and Time Travel

Fail-safe (7 days) and Time Travel (1–90 days) use additional internal storage. Even if a row is logically deleted, physical bytes persist until the windows expire.
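
Shortening the Time Travel window on write-once or archival tables is often the quickest storage win. A minimal sketch (the table name is hypothetical):

-- Inspect the current Time Travel window for a table
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE analytics.clickstream_archive;

-- Reduce Time Travel to 1 day for a write-once table
ALTER TABLE analytics.clickstream_archive SET DATA_RETENTION_TIME_IN_DAYS = 1;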

External & Hybrid Storage

Snowflake can query data that lives in your own Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage (GCS) through:

  • External Tables
  • Iceberg Tables (managed or unmanaged)
  • Snowflake on Iceberg (preview)

Because you pay the cloud provider’s native object-store prices (≈$21/TB for S3 Standard, $10/TB for S3 Infrequent Access, $2–$4/TB for archival tiers), large savings are possible.
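
As a minimal sketch of the external-table pattern (the stage name, bucket path, and column mappings are hypothetical; adjust the Parquet key casing to match how the files were written):

-- Stage pointing at your own bucket (use a storage integration or credentials in practice)
CREATE OR REPLACE STAGE archive_stage
  URL = 's3://acme-archive/clickstream/'
  FILE_FORMAT = (TYPE = PARQUET);

-- External table with virtual columns projected from the Parquet VALUE variant
CREATE OR REPLACE EXTERNAL TABLE analytics.clickstream_external (
  event_date DATE    AS (value:"EVENT_DATE"::date),
  user_id    STRING  AS (value:"USER_ID"::string),
  payload    VARIANT AS (value:"PAYLOAD")
)
LOCATION     = @archive_stage
FILE_FORMAT  = (TYPE = PARQUET)
AUTO_REFRESH = TRUE;  -- requires bucket event notifications; otherwise refresh manually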

Designing a Tiering Strategy

1. Classify Data by Access Frequency

Start by labeling each schema or table as Hot (queried daily), Warm (queried weekly/monthly), or Cold (rarely queried but must be retained). Query history in SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY and per-table sizes in SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS help with this classification.
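
If your account is on Enterprise edition or higher, the ACCESS_HISTORY view can ground the classification by showing when each table was last read. A minimal sketch:

-- Last read per table, derived from query access history (Enterprise edition required)
SELECT
  obj.value:"objectName"::string AS table_name,
  MAX(ah.query_start_time)       AS last_read
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) obj
WHERE obj.value:"objectDomain"::string = 'Table'
GROUP BY 1
ORDER BY last_read;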

2. Set Retention Windows

Decide how long a table must remain in the internal tier before offloading. For example:

  • Clickstream events: 90 days hot, 275 days warm, archive thereafter
  • Financial transactions: 1 year hot, 6 years warm (regulatory), then archive

3. Choose an Offload Mechanism

Snowflake supports multiple patterns:

  1. External Tables: Copy Parquet/CSV files to object storage and create an external table referencing them.
  2. Iceberg Tables: Convert internal tables to Iceberg format, optionally letting Snowflake manage the metadata catalog (see the sketch after this list).
  3. UNLOAD: Use COPY INTO @stage to export aged partitions.
  4. Snowpipe / Streams & Tasks: Continuously unload cold partitions downstream.
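
A minimal sketch of pattern 2, assuming an external volume named archive_vol has already been configured (all object names are hypothetical):

-- Snowflake-managed Iceberg table whose data files live in your own bucket
CREATE OR REPLACE ICEBERG TABLE analytics.clickstream_iceberg
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'archive_vol'
  BASE_LOCATION   = 'clickstream/'
AS
SELECT *
FROM analytics.clickstream
WHERE event_date < DATEADD('day', -90, CURRENT_DATE());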

4. Automate with Streams & Tasks

Pair STREAMs (change data capture) with TASKs to empty internal partitions once data lands externally, guaranteeing no data loss.
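
A hedged sketch of that pairing, reusing the Iceberg table from the previous section (stream, task, and column names are hypothetical). Because the INSERT consumes the stream, each captured row is archived exactly once:

-- Change data capture on the hot internal table
CREATE OR REPLACE STREAM clickstream_stream ON TABLE analytics.clickstream;

-- Hourly task: append newly captured rows to the cheaper Iceberg tier,
-- consuming the stream in the same transaction
CREATE OR REPLACE TASK archive_clickstream_changes
  WAREHOUSE = tiering_wh
  SCHEDULE  = '60 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ANALYTICS.CLICKSTREAM_STREAM')
AS
  INSERT INTO analytics.clickstream_iceberg
  SELECT event_date, user_id, payload
  FROM analytics.clickstream_stream
  WHERE metadata$action = 'INSERT';

ALTER TASK archive_clickstream_changes RESUME;  -- tasks are created suspended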

5. Abstract Access with Views

Create a UNION ALL view that stitches hot (internal) and cold (external/Iceberg) layers, keeping BI dashboards and downstream tools agnostic.
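
A minimal sketch of such a view (table and column names are hypothetical):

CREATE OR REPLACE VIEW analytics.clickstream_all AS
SELECT event_date, user_id, payload
FROM analytics.clickstream              -- hot tier: internal storage
UNION ALL
SELECT event_date, user_id, payload
FROM analytics.clickstream_external;    -- cold tier: external object storage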

Practical Tiering Workflow

  1. Identify partitions older than 90 days.
  2. COPY INTO an external stage as Snappy-compressed Parquet.
  3. DELETE those partitions from the internal table.
  4. Create or refresh an external table pointing to the new object-store prefix.
  5. Grant SELECT privileges on both internal and external tables to a role consumed by a view.
  6. Schedule the process via a TASK that runs daily.

Code Walkthrough

-- 1. Stage for the warm/cold tier
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://acme-archive/clickstream/'
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);

-- 2. Task to offload data older than 90 days
CREATE OR REPLACE TASK offload_clickstream_90d
  WAREHOUSE = tiering_wh
  SCHEDULE = 'USING CRON 0 5 * * * America/Los_Angeles'
AS
BEGIN
  -- a) Export rows older than 90 days to S3
  COPY INTO @my_s3_stage/clickstream_90d_
  FROM (
    SELECT *
    FROM analytics.clickstream
    WHERE event_date < DATEADD('day', -90, CURRENT_DATE())
  )
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)
  SINGLE = FALSE;

  -- b) Delete the exported rows (they stay recoverable via Time Travel until the retention window expires)
  DELETE FROM analytics.clickstream
  WHERE event_date < DATEADD('day', -90, CURRENT_DATE());
END;

-- 3. New tasks are created suspended; enable the schedule
ALTER TASK offload_clickstream_90d RESUME;

The UNLOAD-then-DELETE pattern keeps recent data hot while archiving older partitions.
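
Steps 4 and 5 of the workflow are not part of the task above. A hedged sketch, assuming the external table and view from earlier sections and a hypothetical reporting_role:

-- 4. Register the newly unloaded files (needed when AUTO_REFRESH is not configured)
ALTER EXTERNAL TABLE analytics.clickstream_external REFRESH;

-- 5. Let downstream tools read through the stitched view
GRANT SELECT ON VIEW analytics.clickstream_all TO ROLE reporting_role;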

Measuring Savings

Use the following query to monitor active, Time Travel, and Fail-safe bytes for the tables involved:

SELECT
  table_name,
  active_bytes      / 1e9 AS active_gb,
  time_travel_bytes / 1e9 AS time_travel_gb,
  failsafe_bytes    / 1e9 AS failsafe_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE table_name IN ('CLICKSTREAM', 'CLICKSTREAM_EXTERNAL')
ORDER BY active_gb DESC;

Combine with cloud-provider object-store metrics to estimate blended $/TB.

Best Practices

  • Compress aggressively: Use columnar formats (Parquet, ORC) with Snappy or ZSTD.
  • Partition by date or key to ensure efficient COPY INTO and predicate pushdown.
  • Lower Time Travel for write-once tables—1 day is often enough.
  • Use TAGs and POLICY objects to document retention policies.
  • Validate data integrity via row counts or HASH_AGG checksums after each offload (see the sketch after this list).
  • Monitor fail-safe spikes; these can negate savings if deletes are too large at once.
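
A hedged sketch of that validation, run after the COPY but before the DELETE (table names and the partition date are hypothetical):

-- Compare one offloaded partition across tiers; the counts should match
SELECT
  (SELECT COUNT(*) FROM analytics.clickstream
   WHERE event_date = '2024-01-01') AS internal_rows,
  (SELECT COUNT(*) FROM analytics.clickstream_external
   WHERE event_date = '2024-01-01') AS external_rows;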

Common Misconceptions & Pitfalls

Tiering equals deletion

Offloading doesn’t mean losing access; views can seamlessly query external data.

Time Travel is free

Long windows dramatically increase internal storage usage; treat it as premium storage.

One-time migration is enough

Without automation, data will accumulate again. Pair tiering with scheduled TASKs.

Galaxy & Storage Tiering

While Galaxy is a SQL editor—not a storage platform—it accelerates tiering projects by:

  • Surfacing auto-complete for ACCOUNT_USAGE views, helping you audit storage quickly.
  • Letting teams share and endorse the canonical offload TASK via Galaxy Collections, preventing divergent scripts in Slack.
  • Using the AI Copilot to optimize COPY INTO commands and spot missing partitions.

Conclusion

Snowflake’s elasticity is powerful, but unchecked storage growth can erode budgets. By adopting a disciplined, automated storage-tiering strategy—classifying data, reducing Time Travel, offloading to external or Iceberg tables, and surfacing the right views—you can cut storage costs by 40–80% while keeping data available for analytics.

Why Snowflake Storage Tiering is important

Storage often becomes the dominant cost factor in Snowflake deployments. Implementing tiering lets organizations keep explosive data growth in check without sacrificing analytical capability, helping teams meet budget targets, improve query performance on hot data, and remain compliant with retention mandates.

Snowflake Storage Tiering Example Usage


SELECT table_name, active_bytes / 1e9 AS gb FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS WHERE table_schema = 'ANALYTICS' ORDER BY gb DESC;

Frequently Asked Questions (FAQs)

Does tiering slow down queries?

Queries on hot data remain fast because it stays in internal storage. Cold queries may read from external object storage, which is marginally slower, but you can mitigate this by partition pruning and caching.

Can I still use Snowflake features like RBAC on external tables?

Yes. External and Iceberg tables integrate with the same role-based access control you use for internal tables, ensuring consistent security.

What happens if I need to bring data back?

You can reload Parquet files with COPY INTO or simply query them in place through external tables. For frequently accessed partitions, consider re-inserting them into an internal clone.

How does Galaxy help with storage tiering?

Galaxy’s AI Copilot writes and optimizes COPY INTO and Stream/Task scripts, while Collections let teams share endorsed tiering workflows—eliminating version drift across Slack or Notion.
