Using dplyr’s mutate() with across() to transform or create many columns in a single, concise pipeline step.
dplyr’s mutate() can update or create dozens of columns in one tidy, readable command—no loop required.
Learn how to use mutate() and across() to apply the same or different transformations to multiple columns, avoid common pitfalls, and write maintainable R data-wrangling code.
Real-world data sets rarely fit neatly into a single column. Numeric values often need the same scaling, dates require common parsing, and character fields share a standard cleaning routine. Repeating mutate() calls or hand-coding column names makes your pipeline brittle and verbose. Version 1.0.0 of dplyr introduced across(), a helper that lets you transform many columns at once while preserving tidy syntax.
library(dplyr)
df %>%
  mutate(across(.cols, .fns, .names = "{col}_new"))
.cols – tidy-select expression that picks which columns to target.
.fns – a single function, an anonymous lambda, or a named list of functions.
.names – glue-style template for naming output columns.
mutate(across(c(height, weight), scale))
mutate(across(where(is.numeric), ~ (.x - mean(.x))))
mutate(across(starts_with("score_"), ~ .x / 100))
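The calls above can be tried on a small tibble. This is a minimal sketch: the data and the score_math column are invented for illustration. Because scale() returns a one-column matrix, the sketch wraps it in as.numeric() so the results stay plain numeric vectors.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- tibble(
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  score_math = c(80, 90, 100)
)

# Center two explicitly named columns
centered <- df %>%
  mutate(across(c(height, weight), ~ .x - mean(.x)))

# Rescale every column whose name starts with "score_"
rescaled <- df %>%
  mutate(across(starts_with("score_"), ~ .x / 100))

# Standardize all numeric columns; as.numeric() drops the matrix
# attributes that scale() attaches to its result
standardized <- df %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))
```

Each call overwrites the selected columns in place, because no .names template is supplied.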
Pass a named list of functions to .fns:
mutate(across(where(is.numeric), list(z = scale, log = log10)))
This adds two new columns for every numeric column (height gains height_z and height_log) without extra typing.
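To see which columns a named list produces, here is a small sketch on made-up data. The tibble is an assumption for illustration, and as.numeric() is wrapped around scale() only to avoid matrix columns.

```r
library(dplyr)

# Hypothetical data: one numeric column and one character column
df <- tibble(height = c(10, 100, 1000), label = c("a", "b", "c"))

out <- df %>%
  mutate(across(
    where(is.numeric),
    list(z = ~ as.numeric(scale(.x)), log = log10)
  ))

# The original column is kept; one new column is added per function,
# named <column>_<list name> by default
names(out)
```

The character column label is untouched because where(is.numeric) does not select it.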
.names provides glue placeholders:
{col} – original column name.
{fn} – function name.
Current dplyr documentation spells these {.col} and {.fn}.
mutate(across(where(is.numeric), scale, .names = "scaled_{col}"))
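A short sketch of a custom naming template, using the {.col} and {.fn} spellings from the current dplyr documentation. The data and the list name half are invented for illustration.

```r
library(dplyr)

# Hypothetical data
df <- tibble(a = c(1, 2, 3), b = c(4, 5, 6))

out <- df %>%
  mutate(across(
    everything(),
    list(half = ~ .x / 2),
    .names = "{.fn}_of_{.col}"  # function name first, then column name
  ))

names(out)
```

The template yields half_of_a and half_of_b alongside the untouched a and b.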
mutate(across(starts_with("pct_"), ~ .x * 100)) # tilde (formula) form
mutate(across(where(is.numeric), function(x) replace_na(x, 0))) # function form; replace_na() is from tidyr
Pair across() with if_else() or case_when() to handle branching logic for each column:
mutate(across(ends_with("_flag"), ~ if_else(.x == "Y", 1L, 0L)))
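Both branching helpers can be sketched on a small example. The column names and the 0 to 100 clamping rule are assumptions made up for illustration.

```r
library(dplyr)

# Hypothetical data: a Y/N flag and a score that should lie in [0, 100]
df <- tibble(
  id = 1:3,
  smoker_flag = c("Y", "N", "Y"),
  score = c(-5, 20, 120)
)

out <- df %>%
  mutate(
    # Two-way branch: recode "Y"/"N" to 1/0
    across(ends_with("_flag"), ~ if_else(.x == "Y", 1L, 0L)),
    # Multi-way branch: clamp values to the range 0..100
    across(score, ~ case_when(
      .x < 0   ~ 0,
      .x > 100 ~ 100,
      TRUE     ~ .x
    ))
  )
```

if_else() suits a single condition, while case_when() reads better once there are three or more outcomes.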
Selecting by class (e.g., where(is.numeric)) keeps working when new columns are added later.
Complex anonymous functions belong in a named helper declared above your pipeline.
When overwriting columns, verify types first or suffix new names with .names to prevent accidental loss.
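These practices can be combined in one sketch: a named helper declared above the pipeline, selection by class, and a .names suffix that keeps the originals. The helper clean_text() and the data are hypothetical, written only for this example.

```r
library(dplyr)

# Hypothetical helper: trim, lower-case, and collapse repeated whitespace
clean_text <- function(x) {
  x <- trimws(x)
  x <- tolower(x)
  gsub("\\s+", " ", x)
}

# Hypothetical data
df <- tibble(
  city = c("  New   York ", "PARIS"),
  country = c(" usa", "France "),
  n = c(1, 2)
)

out <- df %>%
  mutate(across(where(is.character), clean_text, .names = "{.col}_clean"))
```

Because the results go to city_clean and country_clean, the raw columns remain available for comparison.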
library(dplyr)
library(lubridate)
patients <- tibble(
  id = 1:3,
  weight_kg = c(70, 80, 65),
  height_cm = c(175, 168, 180),
  dob = as.Date(c("1990-02-15", "1985-06-20", "1978-11-03"))
)
patients <- patients %>%
  mutate(
    across(c(weight_kg, height_cm), list(z = scale), .names = "{col}_{fn}"),
    bmi = weight_kg / (height_cm / 100)^2,
    age = interval(dob, Sys.Date()) / years(1)
  )
The pipeline calculates z-scores for two metrics, body-mass index, and age, all inside a single mutate().
Mistake: Trying to retain original rows with summarise(across(...)).
Fix: Use mutate(); summarise() collapses rows.
Mistake: New columns overwrite the originals because the single-function default reuses each column's name.
Fix: Set .names, or pass a named list of functions, whenever the original columns must survive.
Mistake: Applying log() on character columns selected by everything().
Fix: Scope with where(is.numeric) or starts_with().
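The first mistake is easy to check by counting rows. A minimal sketch on invented data:

```r
library(dplyr)

# Hypothetical data
df <- tibble(g = c("a", "a", "b"), x = c(1, 3, 10))

# summarise() collapses the ungrouped data to a single row
s <- df %>%
  summarise(across(where(is.numeric), mean))

# mutate() keeps every row and recycles the mean into a new column
m <- df %>%
  mutate(across(where(is.numeric), mean, .names = "{.col}_mean"))

nrow(s)
nrow(m)
```

Here s has one row, while m keeps all three rows with the overall mean repeated in x_mean.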
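across() is a thin layer: the work is done by the functions you pass, and vectorized base R functions already run in compiled code. For very large data, consider a dplyr back end such as dtplyr (for data.table) or arrow, and check that the back end supports the across() calls you need.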
If each column requires distinct, unrelated logic (e.g., different parameters per column), separate mutate() calls may improve clarity.
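As an illustration of that trade-off, the sketch below gives each column its own rule in plain mutate() arguments. The columns and rounding rules are assumptions invented for the example.

```r
library(dplyr)

# Hypothetical data: each column needs a different rule
df <- tibble(price = c(1.234, 5.678), qty = c(10.6, 3.2))

out <- df %>%
  mutate(
    price = round(price, 2),       # keep two decimals
    qty = as.integer(round(qty))   # round to a whole number
  )
```

Forcing these two unrelated rules through one across() call would need a per-column lookup and would be harder to read than the explicit version.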
mutate() + across() is the idiomatic way to vectorize transforms across columns.
Use selection helpers (where(), starts_with()) for robustness.
Use .names to stay organized.
Looping over columns or copying-and-pasting repetitive mutate calls clutters code, introduces errors, and slows development. Mastering mutate + across lets data engineers build robust, declarative pipelines that scale to wide data sets—essential for feature engineering, ETL jobs, and analytics workflows.
Wrap your selection inside across(): mutate(across(where(is.numeric), scale)). This applies scale() to every numeric column.
Pass a named list of functions: mutate(across(everything(), list(mean = mean, sd = sd))). Use .names to control the resulting column names.
Yes. If you omit .names and supply a single function, dplyr writes results back into the same columns. Be cautious and verify data types before overwriting.
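A quick sketch of why the type check matters when overwriting in place. The data are invented, and as.character() stands in for any function that changes a column's type.

```r
library(dplyr)

# Hypothetical data
df <- tibble(a = c(1.5, 2.5), b = c("x", "y"))

# No .names and a single function: the numeric column is replaced in place
out <- df %>%
  mutate(across(where(is.numeric), as.character))

# Compare column types before and after the overwrite
sapply(df, class)
sapply(out, class)
```

The column count is unchanged, but a is now character, so later numeric operations on it would fail.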
Absolutely. Place group_by() before mutate(), and functions run within each group context.
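A minimal grouped sketch, with made-up team and pts columns, showing that the mean is computed per group rather than over the whole table:

```r
library(dplyr)

# Hypothetical data
df <- tibble(
  team = c("a", "a", "b", "b"),
  pts = c(1, 3, 10, 20)
)

out <- df %>%
  group_by(team) %>%
  # mean(.x) is evaluated separately within each team
  mutate(across(pts, ~ .x - mean(.x), .names = "{.col}_centered")) %>%
  ungroup()
```

Team a is centered around 2 and team b around 15. Calling ungroup() afterwards avoids carrying the grouping into later steps.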