Time series analysis in SQL is the practice of using SQL queries to store, aggregate, transform, and interrogate data that is indexed by time, enabling trend detection, seasonality analysis, anomaly detection, and forecasting directly inside a relational database.
Time series data—transactions, sensor readings, click-streams, financial quotes—makes up a large portion of the data engineers and analysts work with every day. While specialized time-series databases exist, the reality is that most organizations already have years of historical data sitting inside PostgreSQL, MySQL, SQL Server, Snowflake, BigQuery, or Redshift. Learning to perform rigorous time series analysis within SQL unlocks fast insights without data export, simplifies data pipelines, and keeps analysis version-controlled and reproducible. This article shows you how.
Modern data platforms increasingly rely on SQL as the lingua franca for analytics. Moving time-series analysis closer to where the data lives keeps computation next to the data, leverages database optimizations like indexes, and avoids the latency, cost, and governance issues of exporting to external tools.
The backbone of every time-series query is a column that stores temporal information: a timestamp, a date, or an integer epoch. Ensure it is stored in UTC and indexed (e.g., a BTREE index in PostgreSQL) for fast range scans.
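As a minimal sketch (the sensor_readings table and its columns are hypothetical), storing UTC timestamps with a B-tree index might look like this in PostgreSQL:

CREATE TABLE sensor_readings (
    device_id   INT NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,  -- timestamptz is stored internally in UTC
    reading     NUMERIC
);

-- B-tree is the default index type and supports fast range scans
CREATE INDEX idx_sensor_readings_recorded_at
    ON sensor_readings (recorded_at);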
SQL databases do not enforce a fixed frequency, so gaps are common. Techniques such as date spine generation (a calendar table) allow you to fill missing periods before computing rolling metrics.
Functions like ROW_NUMBER(), LAG(), LEAD(), SUM() OVER(), and AVG() OVER() enable calculations across ordered sets without self-joins.
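For instance, LAG() can measure the interval between consecutive events without a self-join; a sketch against the hypothetical sensor_readings table above:

SELECT device_id,
       recorded_at,
       recorded_at - LAG(recorded_at) OVER (
           PARTITION BY device_id
           ORDER BY recorded_at
       ) AS gap_since_previous
FROM sensor_readings;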
Choosing the right bucket size (hour, day, week, month) balances resolution with performance. Use DATE_TRUNC() (PostgreSQL, Snowflake, Redshift) or TRUNC() (Oracle) to align timestamps to the desired grain.
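For example, hourly counts in PostgreSQL might look like the following (the events table and occurred_at column are illustrative):

SELECT DATE_TRUNC('hour', occurred_at) AS bucket,
       COUNT(*) AS events
FROM events
GROUP BY bucket
ORDER BY bucket;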
For a daily sales series, gaps cause biased rolling averages. Create a date spine and LEFT JOIN:
WITH date_spine AS (
    -- one row per calendar day, including days with no orders
    SELECT generate_series('2023-01-01'::date, '2023-12-31'::date, interval '1 day')::date AS day
)
SELECT s.day,
       COALESCE(SUM(o.amount), 0) AS daily_sales
FROM date_spine s
LEFT JOIN orders o ON o.created_at::date = s.day
GROUP BY s.day
ORDER BY s.day;
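The rolling-average query below assumes a daily_sales view; one way to provide it is to persist the gap-filled query above:

CREATE OR REPLACE VIEW daily_sales AS
WITH date_spine AS (
    SELECT generate_series('2023-01-01'::date, '2023-12-31'::date, interval '1 day')::date AS day
)
SELECT s.day,
       COALESCE(SUM(o.amount), 0) AS daily_sales
FROM date_spine s
LEFT JOIN orders o ON o.created_at::date = s.day
GROUP BY s.day;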
Compute a 7-day rolling average directly:
SELECT day,
       daily_sales,
       AVG(daily_sales) OVER (ORDER BY day
                              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM daily_sales;  -- the daily_sales view created above
Year-over-year (YoY) and week-over-week (WoW) comparisons leverage LAG with an offset:
SELECT month,
       revenue,
       LAG(revenue, 12) OVER (ORDER BY month) AS revenue_prev_year,
       100.0 * (revenue - LAG(revenue, 12) OVER (ORDER BY month))
             / NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0) AS yoy_pct
FROM monthly_revenue;
Identify outliers by comparing each point to a rolling mean and standard deviation:
WITH scored AS (
    SELECT timestamp,
           metric_value,
           (metric_value - AVG(metric_value) OVER w)
               / NULLIF(STDDEV_SAMP(metric_value) OVER w, 0) AS z_score
    FROM metrics
    WINDOW w AS (ORDER BY timestamp ROWS BETWEEN 30 PRECEDING AND CURRENT ROW)
)
SELECT timestamp,
       metric_value,
       z_score
FROM scored
WHERE ABS(z_score) > 3;  -- a window result cannot be filtered in the same SELECT, hence the CTE
Although advanced decomposition (STL, ARIMA) is better suited to Python or R, you can approximate seasonal factors with SQL by grouping:
SELECT EXTRACT(dow FROM day) AS weekday,
       AVG(daily_sales) AS avg_sales
FROM daily_sales
GROUP BY weekday;
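To express these as relative seasonal factors rather than raw averages, you could divide by the overall mean; a sketch over the same daily_sales view:

SELECT EXTRACT(dow FROM day) AS weekday,
       AVG(daily_sales) / (SELECT AVG(daily_sales) FROM daily_sales) AS seasonal_factor
FROM daily_sales
GROUP BY weekday
ORDER BY weekday;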
Composite indexes such as (timestamp, device_id) support multi-tenant queries.

Suppose we have an e-commerce orders table:
CREATE TABLE orders (
    order_id   SERIAL PRIMARY KEY,
    user_id    INT,
    amount     NUMERIC(10,2),
    status     TEXT,
    created_at TIMESTAMP NOT NULL  -- ingested in UTC, per the best practice above
);
CREATE INDEX ON orders (created_at);
We want daily GMV for completed orders, a trailing 7-day GMV, and the week-over-week change in that rolling figure:
WITH date_spine AS (
    SELECT generate_series(min(created_at)::date,
                           max(created_at)::date,
                           interval '1 day')::date AS day
    FROM orders
),
base AS (
    -- gap-filled daily GMV for completed orders
    SELECT s.day,
           COALESCE(SUM(o.amount), 0) AS gmv
    FROM date_spine s
    LEFT JOIN orders o ON o.created_at::date = s.day AND o.status = 'completed'
    GROUP BY s.day
),
rolling AS (
    -- trailing 7-day GMV
    SELECT day,
           gmv,
           SUM(gmv) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS gmv_7d
    FROM base
),
weekly AS (
    -- the same rolling metric one week earlier
    SELECT day,
           gmv_7d,
           LAG(gmv_7d, 7) OVER (ORDER BY day) AS gmv_7d_prev_week
    FROM rolling
)
SELECT day,
       gmv_7d,
       gmv_7d_prev_week,
       100.0 * (gmv_7d - gmv_7d_prev_week) / NULLIF(gmv_7d_prev_week, 0) AS wow_change_pct
FROM weekly
ORDER BY day;
If you are using a modern SQL editor like Galaxy, the workflow is even smoother: the AI copilot can autocomplete window functions and suggest date bucketing, and Collections let your team endorse and share canonical time-series queries.
Mistake: mixing time zones.
Why it’s wrong: Mixing local timestamps with UTC causes duplicate buckets or missing data at DST transitions.
Fix: Ingest in UTC; cast to local zones only for display.
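For example, in PostgreSQL a timestamp stored as UTC (without time zone) can be shifted for display like this; the target zone is illustrative:

SELECT order_id,
       (created_at AT TIME ZONE 'UTC') AT TIME ZONE 'America/New_York' AS created_at_local
FROM orders;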
Mistake: self-joining to compute offsets.
Why it’s wrong: Self-joins on large tables are slow and hard to maintain.
Fix: Use window functions (LAG, ROWS BETWEEN) instead.
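To illustrate the difference, both sketches below look up a value from seven days earlier over a hypothetical daily_gmv table; the window version needs only a single scan:

-- self-join (slower, harder to maintain)
SELECT a.day, a.gmv, b.gmv AS gmv_prev_week
FROM daily_gmv a
LEFT JOIN daily_gmv b ON b.day = a.day - 7;

-- window function (single pass over the table)
SELECT day, gmv, LAG(gmv, 7) OVER (ORDER BY day) AS gmv_prev_week
FROM daily_gmv;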
Mistake: averaging over an incomplete series.
Why it’s wrong: Missing days shrink the denominator, inflating the average.
Fix: Generate a complete date spine and join.
Time series analysis need not require exporting data to specialized tools. With the power of SQL window functions, date spines, and thoughtful indexing, you can perform robust analytics directly in your database—accelerating insight while simplifying your stack. Pair these techniques with a developer-friendly SQL editor like Galaxy, and your team can collaborate, iterate, and productionize time-series logic with confidence.
A majority of operational data is time-indexed, yet many teams export it to external tools for analysis, introducing latency, cost, and governance issues. Mastering time series techniques in SQL keeps computation close to the data, leverages database optimizations, and enables real-time analytics in dashboards and applications without additional infrastructure.
Window functions like LAG(), LEAD(), SUM() OVER(), and AVG() OVER() are indispensable. DATE_TRUNC() and generate_series() (or a calendar table) help with bucketing and gap-filling.
Galaxy’s context-aware AI copilot autocompletes window functions, suggests date bucketing, and can refactor queries when underlying schemas change. Collections make it easy to endorse and share canonical time-series queries.
If you require sub-second ingestion of millions of events per second, long-term retention at high granularity, or advanced forecasting algorithms, consider specialized databases like TimescaleDB or InfluxDB. For most analytical workloads, standard SQL engines suffice.
Basic extrapolations (linear regression via REGR_SLOPE) are possible, but sophisticated models (ARIMA, Prophet) are better handled in Python/R. Use SQL to prepare features and offload modeling to specialized libraries.
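As a sketch, PostgreSQL's regression aggregates can fit a linear trend over the daily_sales view from earlier (numbering days with ROW_NUMBER() is an assumption of this example):

SELECT REGR_SLOPE(daily_sales, day_number)     AS slope,
       REGR_INTERCEPT(daily_sales, day_number) AS intercept
FROM (
    SELECT daily_sales::float8 AS daily_sales,
           ROW_NUMBER() OVER (ORDER BY day)::float8 AS day_number
    FROM daily_sales
) t;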