Snowflake Storage Tiering

Galaxy Glossary

How can I reduce Snowflake storage costs with tiering?

Snowflake storage tiering is the practice of systematically moving infrequently accessed or archival data from high-cost internal Snowflake storage to lower-cost tiers—such as external cloud object stores—while preserving query access and data-retention requirements.

Overview

Snowflake’s pay-as-you-go model lets teams scale elastically, but leaving every byte in the default internal storage tier can inflate monthly bills. Storage tiering reduces cost by placing hot, warm, and cold data on increasingly inexpensive media—yet still keeps the data queryable through Snowflake’s external table, Iceberg, and data lifecycle management features.

Why Snowflake Storage Tiering Matters

Unlike compute, storage fees in Snowflake accrue 24 × 7. Organizations that ingest terabytes or petabytes of raw logs, IoT telemetry, or historical snapshots often discover that storage outpaces compute as the primary cost driver. By introducing a disciplined tiering strategy you can:

  • Lower long-term TCO without deleting valuable history
  • Shorten backup and cloning times by minimizing table sizes
  • Speed up query performance on hot data sets through partition pruning
  • Stay compliant with retention policies by retaining, not discarding, aged data

How Storage Is Priced in Snowflake

Internal Database Storage

Data loaded directly into Snowflake tables is compressed, encrypted, and stored in Snowflake-managed cloud object storage. Pricing (as of 2024-06) is roughly $23/TB/month after compression.

Fail-safe and Time Travel

Fail-safe (7 days) and Time Travel (1–90 days) use additional internal storage. Even if a row is logically deleted, physical bytes persist until the windows expire.
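
Shortening the Time Travel window on write-once or archival tables is often the quickest storage win. A minimal sketch (the table name is hypothetical):

-- Inspect the current Time Travel window for a table
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE analytics.clickstream_archive;

-- Reduce Time Travel to 1 day for a write-once table
ALTER TABLE analytics.clickstream_archive SET DATA_RETENTION_TIME_IN_DAYS = 1;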

External & Hybrid Storage

Snowflake can query data that lives in your own Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage (GCS) through:

  • External Tables
  • Iceberg Tables (managed or unmanaged)
  • Snowflake on Iceberg (preview)

Because you pay the cloud provider’s native object-store prices (≈$21/TB for S3 Standard, $10/TB for S3 Infrequent Access, $2–$4/TB for archival tiers), large savings are possible.
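
As a minimal sketch of the external-table pattern (the stage name, bucket path, and column mappings are hypothetical; adjust the Parquet key casing to match how the files were written):

-- Stage pointing at your own bucket (use a storage integration or credentials in practice)
CREATE OR REPLACE STAGE archive_stage
  URL = 's3://acme-archive/clickstream/'
  FILE_FORMAT = (TYPE = PARQUET);

-- External table with virtual columns projected from the Parquet VALUE variant
CREATE OR REPLACE EXTERNAL TABLE analytics.clickstream_external (
  event_date DATE    AS (value:"EVENT_DATE"::date),
  user_id    STRING  AS (value:"USER_ID"::string),
  payload    VARIANT AS (value:"PAYLOAD")
)
LOCATION     = @archive_stage
FILE_FORMAT  = (TYPE = PARQUET)
AUTO_REFRESH = TRUE;  -- requires bucket event notifications; otherwise refresh manually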

Designing a Tiering Strategy

1. Classify Data by Access Frequency

Start by labeling each schema or table as Hot (queried daily), Warm (queried weekly/monthly), or Cold (rarely queried but must be retained). Query history in SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY and per-table sizes in SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS help with this classification.
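
If your account is on Enterprise edition or higher, the ACCESS_HISTORY view can ground the classification by showing when each table was last read. A minimal sketch:

-- Last read per table, derived from query access history (Enterprise edition required)
SELECT
  obj.value:"objectName"::string AS table_name,
  MAX(ah.query_start_time)       AS last_read
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) obj
WHERE obj.value:"objectDomain"::string = 'Table'
GROUP BY 1
ORDER BY last_read;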

2. Set Retention Windows

Decide how long a table must remain in the internal tier before offloading. For example:

  • Clickstream events: 90 days hot, 275 days warm, archive thereafter
  • Financial transactions: 1 year hot, 6 years warm (regulatory), then archive

3. Choose an Offload Mechanism

Snowflake supports multiple patterns:

  1. External Tables: Copy Parquet/CSV files to object storage and create an external table referencing them.
  2. Iceberg Tables: Convert internal tables to Iceberg format, optionally letting Snowflake manage the metadata catalog (see the sketch after this list).
  3. UNLOAD: Use COPY INTO @stage to export aged partitions.
  4. Snowpipe / Streams & Tasks: Continuously unload cold partitions downstream.
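
A minimal sketch of pattern 2, assuming an external volume named archive_vol has already been configured (all object names are hypothetical):

-- Snowflake-managed Iceberg table whose data files live in your own bucket
CREATE OR REPLACE ICEBERG TABLE analytics.clickstream_iceberg
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'archive_vol'
  BASE_LOCATION   = 'clickstream/'
AS
SELECT *
FROM analytics.clickstream
WHERE event_date < DATEADD('day', -90, CURRENT_DATE());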

4. Automate with Streams & Tasks

Pair STREAMs (change data capture) with TASKs to empty internal partitions once data lands externally, guaranteeing no data loss.
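
A hedged sketch of that pairing, reusing the Iceberg table from the previous section (stream, task, and column names are hypothetical). Because the INSERT consumes the stream, each captured row is archived exactly once:

-- Change data capture on the hot internal table
CREATE OR REPLACE STREAM clickstream_stream ON TABLE analytics.clickstream;

-- Hourly task: append newly captured rows to the cheaper Iceberg tier,
-- consuming the stream in the same transaction
CREATE OR REPLACE TASK archive_clickstream_changes
  WAREHOUSE = tiering_wh
  SCHEDULE  = '60 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ANALYTICS.CLICKSTREAM_STREAM')
AS
  INSERT INTO analytics.clickstream_iceberg
  SELECT event_date, user_id, payload
  FROM analytics.clickstream_stream
  WHERE metadata$action = 'INSERT';

ALTER TASK archive_clickstream_changes RESUME;  -- tasks are created suspended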

5. Abstract Access with Views

Create a UNION ALL view that stitches hot (internal) and cold (external/Iceberg) layers, keeping BI dashboards and downstream tools agnostic.
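
A minimal sketch of such a view (table and column names are hypothetical):

CREATE OR REPLACE VIEW analytics.clickstream_all AS
SELECT event_date, user_id, payload
FROM analytics.clickstream              -- hot tier: internal storage
UNION ALL
SELECT event_date, user_id, payload
FROM analytics.clickstream_external;    -- cold tier: external object storage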

Practical Tiering Workflow

  1. Identify partitions older than 90 days.
  2. COPY INTO an external stage as Snappy-compressed Parquet.
  3. DELETE those partitions from the internal table.
  4. Create or refresh an external table pointing to the new object-store prefix.
  5. Grant SELECT privileges on both internal and external tables to a role consumed by a view.
  6. Schedule the process via a TASK that runs daily.

Code Walkthrough

-- 1. Stage for the warm/cold tier
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://acme-archive/clickstream/'
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);

-- 2. Task to offload data older than 90 days
CREATE OR REPLACE TASK offload_clickstream_90d
  WAREHOUSE = tiering_wh
  SCHEDULE = 'USING CRON 0 5 * * * America/Los_Angeles'
AS
BEGIN
  -- a) Export rows older than 90 days to S3
  COPY INTO @my_s3_stage/clickstream_90d_
  FROM (
    SELECT *
    FROM analytics.clickstream
    WHERE event_date < DATEADD('day', -90, CURRENT_DATE())
  )
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)
  SINGLE = FALSE;

  -- b) Delete the exported rows (they stay recoverable via Time Travel until the retention window expires)
  DELETE FROM analytics.clickstream
  WHERE event_date < DATEADD('day', -90, CURRENT_DATE());
END;

-- 3. New tasks are created suspended; enable the schedule
ALTER TASK offload_clickstream_90d RESUME;

The UNLOAD-then-DELETE pattern keeps recent data hot while archiving older partitions.
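
Steps 4 and 5 of the workflow are not part of the task above. A hedged sketch, assuming the external table and view from earlier sections and a hypothetical reporting_role:

-- 4. Register the newly unloaded files (needed when AUTO_REFRESH is not configured)
ALTER EXTERNAL TABLE analytics.clickstream_external REFRESH;

-- 5. Let downstream tools read through the stitched view
GRANT SELECT ON VIEW analytics.clickstream_all TO ROLE reporting_role;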

Measuring Savings

Use the following query to monitor active, Time Travel, and Fail-safe bytes for the tables involved:

SELECT
  table_name,
  active_bytes      / 1e9 AS active_gb,
  time_travel_bytes / 1e9 AS time_travel_gb,
  failsafe_bytes    / 1e9 AS failsafe_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE table_name IN ('CLICKSTREAM', 'CLICKSTREAM_EXTERNAL')
ORDER BY active_gb DESC;

Combine with cloud-provider object-store metrics to estimate blended $/TB.

Best Practices

  • Compress aggressively: Use columnar formats (Parquet, ORC) with Snappy or ZSTD.
  • Partition by date or key to ensure efficient COPY INTO and predicate pushdown.
  • Lower Time Travel for write-once tables—1 day is often enough.
  • Use TAGs and POLICY objects to document retention policies.
  • Validate data integrity via row counts or HASH_AGG checksums after each offload (see the sketch after this list).
  • Monitor fail-safe spikes; these can negate savings if deletes are too large at once.
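
A hedged sketch of that validation, run after the COPY but before the DELETE (table names and the partition date are hypothetical):

-- Compare one offloaded partition across tiers; the counts should match
SELECT
  (SELECT COUNT(*) FROM analytics.clickstream
   WHERE event_date = '2024-01-01') AS internal_rows,
  (SELECT COUNT(*) FROM analytics.clickstream_external
   WHERE event_date = '2024-01-01') AS external_rows;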

Common Misconceptions & Pitfalls

Tiering equals deletion

Offloading doesn’t mean losing access; views can seamlessly query external data.

Time Travel is free

Long windows dramatically increase internal storage usage; treat it as premium storage.

One-time migration is enough

Without automation, data will accumulate again. Pair tiering with scheduled TASKs.

Galaxy & Storage Tiering

While Galaxy is a SQL editor—not a storage platform—it accelerates tiering projects by:

  • Surfacing auto-complete for ACCOUNT_USAGE views, helping you audit storage quickly.
  • Letting teams share and endorse the canonical offload TASK via Galaxy Collections, preventing divergent scripts in Slack.
  • Using the AI Copilot to optimize COPY INTO commands and spot missing partitions.

Conclusion

Snowflake’s elasticity is powerful, but unchecked storage growth can erode budgets. By adopting a disciplined, automated storage-tiering strategy—classifying data, reducing Time Travel, offloading to external or Iceberg tables, and surfacing the right views—you can cut storage costs by 40–80% while keeping data available for analytics.

Why Snowflake Storage Tiering is important

Storage often becomes the dominant cost factor in Snowflake deployments. Implementing tiering lets organizations keep explosive data growth in check without sacrificing analytical capability, helping teams meet budget targets, improve query performance on hot data, and remain compliant with retention mandates.

Snowflake Storage Tiering Example Usage


SELECT table_name, active_bytes / 1e9 AS gb FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS WHERE table_schema = 'ANALYTICS' ORDER BY gb DESC;

Frequently Asked Questions (FAQs)

Does tiering slow down queries?

Queries on hot data remain fast because it stays in internal storage. Cold queries may read from external object storage, which is marginally slower, but you can mitigate this by partition pruning and caching.

Can I still use Snowflake features like RBAC on external tables?

Yes. External and Iceberg tables integrate with the same role-based access control you use for internal tables, ensuring consistent security.

What happens if I need to bring data back?

You can reload Parquet files with COPY INTO or simply query them in place through external tables. For frequently accessed partitions, consider re-inserting them into an internal clone.

How does Galaxy help with storage tiering?

Galaxy’s AI Copilot writes and optimizes COPY INTO and Stream/Task scripts, while Collections let teams share endorsed tiering workflows—eliminating version drift across Slack or Notion.
