A rolling average (moving average) is a technique that computes the mean of a fixed-size sliding window across a sequence, smoothing short-term fluctuations while preserving long-term trends.
Rolling Average in Python
Learn why rolling averages matter for time-series analytics, how to compute them efficiently in Python with pandas, NumPy, and pure Python, and how to avoid common performance and accuracy pitfalls.
A rolling average—also called a moving average or running mean—is a statistic that calculates the mean value of a fixed-length window that slides over a data series. At every step, the oldest observation drops out and the newest one drops in, creating a smoothed version of the original signal. Rolling averages are invaluable when you need to reduce noise in time-series data, detect trends, or prepare features for machine-learning models.
Time-series data often exhibits random fluctuations that can obscure underlying patterns. Rolling averages provide a simple yet powerful way to:
n
)The number of consecutive observations used to compute each average. Larger windows smooth more aggressively but lose local detail.
Most libraries compute the window ending at the current index (right-aligned). Setting center=True
in pandas centers the window around the current index.
For the first n-1
points, the window is incomplete. Strategies include returning NaN
, reducing the window size (min_periods
in pandas), or padding with nulls/zeros.
def rolling_average(series, n):
"""Compute rolling mean using a simple loop (O(N*n))."""
if n <= 0:
raise ValueError("Window size must be positive")
result = []
window_sum = 0.0
for i, x in enumerate(series):
window_sum += x
if i >= n:
window_sum -= series[i - n]
if i >= n - 1:
result.append(window_sum / n)
else:
result.append(None) # or float('nan')
return result
While easy to understand, this method is slow for large arrays because it operates in pure Python.
import numpy as np
def rolling_average_numpy(a, n):
"""Vectorized rolling mean using NumPy's stride tricks (O(N))."""
if n > len(a):
raise ValueError("Window longer than array")
cumsum = np.cumsum(np.insert(a, 0, 0))
return (cumsum[n:] - cumsum[:-n]) / n
This prefix-sum trick runs in linear time and leverages C-optimized routines.
import pandas as pd
df = pd.DataFrame({"ts": pd.date_range("2024-01-01", periods=10, freq="D"),
"value": [13, 15, 14, 16, 12, 11, 18, 20, 17, 19]})
df["ma_3"] = df["value"].rolling(window=3, min_periods=1).mean()
print(df)
pandas supports:
df.groupby('user').value.rolling(7)
)rolling('7D')
)win_type='triang'
, center=True
)import polars as pl
lazy_df = pl.DataFrame({"value": [1, 2, 3, 4, 5]}).lazy()
result = lazy_df.select(pl.col("value").rolling_mean(window_size=2)).collect()
print(result)
Polars uses Rust under the hood and can be orders of magnitude faster than pandas for large data sets.
min_periods
to balance accuracy and NaN propagation at data boundaries.A simple moving average weights all observations equally, whereas an EMA assigns exponentially decreasing weights to older points. Choosing the wrong method can distort trend analysis.
Rolling averages smooth high-frequency noise but cannot eliminate structural changes, outliers, or non-stationarity. Further preprocessing may be required.
The window length governs smoothing, not the number of steps you can reliably forecast. These are related but distinct parameters.
import pandas as pd
import matplotlib.pyplot as plt
# 1. Load synthetic daily traffic data
rng = pd.date_range("2024-01-01", periods=120, freq="D")
traffic = (200 + 30 * np.sin(np.linspace(0, 12, 120)) + np.random.randn(120) * 10)
df = pd.DataFrame({"date": rng, "visits": traffic}).set_index("date")
# 2. Compute 7-day and 30-day rolling means
for w in (7, 30):
df[f"ma_{w}"] = df["visits"].rolling(window=w, min_periods=w).mean()
# 3. Plot
ax = df[["visits", "ma_7", "ma_30"]].plot(figsize=(10, 4))
ax.set_title("Website Traffic With Rolling Averages")
plt.show()
The 7-day curve smooths random day-to-day variance, while the 30-day line reveals the underlying seasonal trend.
Assigns different weights to observations, e.g., triangular or Gaussian windows. In pandas, specify win_type
and optional parameters like std
.
df["gma"] = df["visits"].rolling(window=15, win_type="gaussian", std=3).mean()
Uses a decay factor alpha
. In pandas: df["ema"] = df["visits"].ewm(span=10).mean()
.
In SQL, rolling averages are implemented via WINDOW
functions such as AVG(value) OVER (ORDER BY ts ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
. Galaxy’s modern SQL editor makes crafting and sharing such window functions effortless and, with its AI copilot, can even suggest the correct syntax. While this article focuses on Python, the logic directly translates to SQL code you might run inside Galaxy.
Rolling averages are foundational in analytics and data engineering because they transform volatile raw signals into stable trends. Whether monitoring application KPIs, forecasting demand, or preprocessing features for ML models, engineers rely on rolling averages to reduce noise without losing the essence of the data. Understanding both the statistical concept and the efficient implementation is crucial for building scalable analytics pipelines.
A rolling (simple moving) average assigns equal weight to all points in the window, while an exponential moving average (EMA) applies greater weight to recent observations, reacting faster to changes.
Yes. First resample or interpolate your data to a regular frequency (e.g., hourly). Once the index is uniform, apply .rolling()
without surprises.
Start with a window that aligns with meaningful business cycles—such as 7 days for weekly seasonality—then fine-tune through error analysis or cross-validation.
Absolutely. Galaxy supports SQL window functions like AVG(value) OVER (ORDER BY ts ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
. The AI copilot can even generate or optimize these queries for you.