Using ggplot2’s sec.axis argument to add a second y-axis, defined as a transformation of the primary axis, to the same plot.
A secondary y-axis lets you display two different measures, each with its own units, on the same plot. In ggplot2, this is achieved with the sec.axis argument inside a scale function (usually scale_y_continuous()). Unlike spreadsheet software, where secondary axes can be added with a click, ggplot2 requires an explicit mathematical transformation that maps one data series onto the scale of another.
In data science and analytics, you often need to compare variables that live on different numeric scales: revenue (millions) versus conversion rate (percent), or temperature (°C) versus energy consumption (kWh). Placing both series on a single primary axis can hide important variations; using separate plots makes pattern detection harder. A properly implemented secondary y-axis keeps the viewer’s attention on a single visual while preserving each variable’s integrity.
In ggplot2 >= 2.2.0, you add a secondary axis by passing sec.axis = sec_axis(~ transform(.), name = "Axis Label") to a scale function, where transform() is a placeholder for your own conversion:
scale_y_continuous(
  name = "Primary Axis Label",
  sec.axis = sec_axis(~ transform(.), name = "Secondary Axis Label")
)
The tilde (~) denotes an anonymous function. The dot (.) is its input (the primary axis values), which you map to the secondary scale.

ggplot2’s philosophy insists on a single coordinate system. That means you never draw a second geom on a separate coordinate system; instead, you transform one variable so it fits the numeric range of the other. Then you tell ggplot2 how to reverse that transformation so the secondary axis is labelled correctly.
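To make the roles of ~ and . concrete, the sketch below writes the secondary-axis transform as an ordinary R function and applies it to a few positions on the primary axis. The multiplier and the positions are made up for illustration, and primary_to_secondary() is a hypothetical name; only base R is used.

```r
# Hypothetical multiplier: 1 unit of the secondary variable is drawn
# as 0.03 units on the primary axis
mult <- 0.03

# sec_axis(~ . / mult) is shorthand for a function of the primary-axis value
primary_to_secondary <- function(y) y / mult

# A few positions on the primary axis (illustrative)
primary_positions <- c(0, 10, 20, 30)

# Secondary-axis values that line up with those positions
secondary_values <- primary_to_secondary(primary_positions)
print(secondary_values)
```

The plotted data goes the other way (secondary value times mult), which is why the function handed to sec_axis() has to be the inverse of whatever was applied inside aes().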
scale_y_continuous() or scale_x_continuous() – host the sec.axis argument.
sec_axis() – builds the secondary axis; accepts a transform formula and a label.
geom_line(), geom_col(), geom_point() – the geoms you overlay.

Suppose you have daily data for outside temperature and electricity usage. You want a line for temperature and bars for kWh on the same plot. Temperature ranges 0-30°C, consumption 200-1000 kWh.
library(tidyverse)

set.seed(42)

# Simulated daily data: 30 days of temperature and electricity usage
df <- tibble(
  day    = seq.Date(Sys.Date() - 29, Sys.Date(), by = "1 day"),
  temp_c = runif(30, 0, 30),
  kwh    = runif(30, 200, 1000)
)

# Compute a multiplier so kWh fits roughly the same range as temp
mult <- max(df$temp_c) / max(df$kwh)

ggplot(df, aes(x = day)) +
  # Bars: kWh rescaled onto the temperature (primary) axis
  geom_col(aes(y = kwh * mult), fill = "steelblue", alpha = 0.6) +
  # linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions
  geom_line(aes(y = temp_c), color = "red", linewidth = 1) +
  scale_y_continuous(
    name = "Temperature (°C)",
    # Undo the rescaling so the secondary axis reads in kWh
    sec.axis = sec_axis(~ . / mult, name = "Energy (kWh)")
  ) +
  theme_minimal() +
  labs(title = "Temperature vs. Energy Consumption")
Notice how you multiply kWh by mult so it shares the primary scale, and then divide by mult inside sec_axis() to compute the right labels.
Stick to linear relationships (multiply and/or add). Non-linear transforms (log, sqrt) make axis interpretation difficult.
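When a pure multiplier leaves one series bunched away from the other, a linear map with an offset (multiply and add) is still easy to invert. A minimal sketch, assuming made-up daily values and two hypothetical helpers, to_primary() and to_secondary(); both series are drawn as lines here because bars no longer start at a meaningful zero once an offset is applied.

```r
library(ggplot2)

# Illustrative data (made up for this sketch)
df <- data.frame(
  day    = 1:10,
  temp_c = c(2, 5, 9, 14, 18, 22, 27, 30, 24, 16),
  kwh    = c(950, 900, 780, 640, 520, 400, 260, 200, 330, 580)
)

# Linear map y = a * x + b that sends range(kwh) onto range(temp_c)
a <- diff(range(df$temp_c)) / diff(range(df$kwh))
b <- min(df$temp_c) - a * min(df$kwh)

to_primary   <- function(x) a * x + b     # kWh -> position on the primary axis
to_secondary <- function(y) (y - b) / a   # position on the primary axis -> kWh

p <- ggplot(df, aes(x = day)) +
  geom_line(aes(y = to_primary(kwh)), colour = "steelblue") +
  geom_line(aes(y = temp_c), colour = "red") +
  scale_y_continuous(
    name = "Temperature (°C)",
    # Inverse of to_primary(), so the right-hand axis reads in kWh
    sec.axis = sec_axis(~ to_secondary(.), name = "Energy (kWh)")
  )
print(p)
```

Because the map is linear, tick spacing on the secondary axis stays even, which keeps the axis readable.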
Label both axes and differentiate geoms (color, linetype) so the audience instantly knows which axis corresponds to which data series.
Dual axes can accidentally imply correlation where none exists. Only use them when variables share a logical connection (e.g., input versus output) or when patterns need to be temporally compared. Consider a faceted plot if scales or units are totally unrelated.
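The faceted alternative can be sketched as follows: stack the two measures into a long data frame and give each its own panel with a free y-scale. The data values and the metric labels are made up for illustration.

```r
library(ggplot2)

# Illustrative data (made up for this sketch)
df <- data.frame(
  day    = 1:10,
  temp_c = c(2, 5, 9, 14, 18, 22, 27, 30, 24, 16),
  kwh    = c(950, 900, 780, 640, 520, 400, 260, 200, 330, 580)
)

# Stack both measures into long format: one row per day and metric
long <- rbind(
  data.frame(day = df$day, metric = "Temperature (°C)", value = df$temp_c),
  data.frame(day = df$day, metric = "Energy (kWh)",     value = df$kwh)
)

# One panel per metric, each with its own y-axis range
p <- ggplot(long, aes(x = day, y = value)) +
  geom_line() +
  facet_wrap(~ metric, ncol = 1, scales = "free_y")
print(p)
```

Stacking the panels vertically keeps the shared x-axis aligned, so timing can still be compared without implying that the two y-scales are related.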
Expose the scale conversion in comments or the legend. If sharing code inside a collaborative SQL/BI platform like Galaxy, annotate the notebook or query to explain the transform so teammates can reproduce or audit it.
Why it’s wrong: Plotting raw kWh on the primary axis without scaling will compress the temperature line into a nearly flat line.
Fix: Scale the secondary metric (e.g., kwh * mult) before plotting it.
Why it’s wrong: Misaligned x-values lead to faulty visual comparisons.
Fix: Combine data into a single data frame or ensure identical breaks.
Why it’s wrong: ggplot does not derive secondary axes automatically; you must supply the mathematical relationship.
Fix: Manually compute the multiplier or transformation function.
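For the misaligned x-values case, combining the sources into a single data frame might look like the sketch below. The two data frames, their column names, and their values are hypothetical; base R's merge() performs an inner join by default, so only days present in both sources survive.

```r
# Two hypothetical sources that cover different date ranges
temps <- data.frame(
  day    = as.Date("2024-01-01") + 0:4,
  temp_c = c(3, 5, 4, 7, 9)
)
usage <- data.frame(
  day = as.Date("2024-01-01") + 2:6,
  kwh = c(800, 760, 700, 650, 610)
)

# Inner join on day: keeps only the dates both sources share
combined <- merge(temps, usage, by = "day")
print(combined)
```

Plotting from combined means both geoms share exactly the same x-values.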
# Two y-axes: revenue vs. conversion rate
library(ggplot2)

sales <- data.frame(
  month = 1:12,
  revenue = c(120, 130, 140, 160, 155, 170, 190, 200, 210, 225, 230, 250),
  conversion = c(2.0, 2.1, 2.3, 2.4, 2.2, 2.5, 2.7, 2.8, 2.9, 3.0, 3.1, 3.3)
)

# Scaling factor that brings conversion onto the revenue (primary) axis
coef <- max(sales$revenue) / max(sales$conversion)

ggplot(sales, aes(month)) +
  geom_col(aes(y = revenue), fill = "forestgreen", alpha = 0.5) +
  # linewidth needs ggplot2 >= 3.4.0; use size = 1.2 on older versions
  geom_line(aes(y = conversion * coef), colour = "darkred", linewidth = 1.2) +
  scale_y_continuous(
    name = "Revenue (kUSD)",
    # Inverse of the scaling above, so the labels read in percent
    sec.axis = sec_axis(~ . / coef, name = "Conversion Rate (%)")
  ) +
  theme_minimal() +
  labs(title = "Monthly Revenue vs. Conversion Rate")
Use a simple ratio such as max(primary) / max(secondary) or another scaling factor that brings both series into a visually comparable range without distortion.
Yes. You can use scale_x_continuous(sec.axis = ...). The same transformation principles apply.
No. ggplot2 supports only one secondary axis per position scale (one for x, one for y). If you need more, create additional plots or facets.
Check that the transform in sec_axis() is the exact inverse of the scaling applied to the plotted data. A mismatch causes incorrect tick labels.
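One way to run that check before plotting is a round trip: scale some values, undo the scaling, and confirm the originals come back. The helper is_inverse() below is a hypothetical name, not part of ggplot2, and the ratio reuses the maxima from the revenue example (250 and 3.3).

```r
# Hypothetical helper: TRUE when `inverse` undoes `forward` on the values x
is_inverse <- function(forward, inverse, x) {
  isTRUE(all.equal(inverse(forward(x)), x))
}

coef <- 250 / 3.3                # max(revenue) / max(conversion)
conversion <- c(2.0, 2.5, 3.3)   # a few secondary values to test with

forward <- function(x) x * coef         # scaling applied inside aes()
good    <- function(y) y / coef         # matches sec_axis(~ . / coef)
bad     <- function(y) y / (coef * 2)   # deliberate mismatch

ok_good <- is_inverse(forward, good, conversion)  # TRUE
ok_bad  <- is_inverse(forward, bad, conversion)   # FALSE
print(c(good = ok_good, bad = ok_bad))
```

all.equal() compares with a small numeric tolerance, so floating-point rounding in the round trip does not trigger a false alarm.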
Dual-axis plots enable analysts to compare variables with different units without splitting attention across multiple visuals. In data engineering workflows, especially when generating automated reports, mastering secondary axes helps deliver clearer insights in less dashboard real estate. Understanding the required transformations prevents misleading visuals and ensures analytical integrity.