dplyr mutate multiple columns at once

Galaxy Glossary

How do I use dplyr mutate to modify multiple columns simultaneously?

Using dplyr’s mutate() with across() to transform or create many columns in a single, concise pipeline step.


dplyr’s mutate() can update or create dozens of columns in one tidy, readable command—no loop required.

Learn how to leverage mutate() and across() to apply the same or differing transformations to multiple columns, avoid common pitfalls, and write maintainable R data-wrangling code.

Why mutate multiple columns at once?

Real-world data sets often need the same transformation applied to many columns: numeric values need the same scaling, dates require common parsing, and character fields share a standard cleaning routine. Repeating mutate() calls or hand-coding column names makes your pipeline brittle and verbose. dplyr 1.0.0 introduced across(), a helper that lets you transform many columns at once while keeping tidy syntax.

Core syntax

library(dplyr)
df %>% mutate(across(.cols, .fns, .names = "{col}_new"))

.cols – tidy-select expression that picks which columns to target.
.fns – a single function, an anonymous function (such as the purrr-style ~ .x * 2), or a named list of functions.
.names – glue-style template for naming output columns.
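To make the three arguments concrete, here is a minimal sketch on a small invented tibble (the column names and values are illustrative only). It uses {.col}, the placeholder spelling used in the current across() documentation, and writes results to new columns so the originals survive.

```r
library(dplyr)

# Toy data invented for illustration
df <- tibble(
  id = 1:3,
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# .cols picks height and weight, .fns halves each value,
# .names sends the results to new columns instead of overwriting
out <- df %>%
  mutate(across(c(height, weight), ~ .x / 2, .names = "{.col}_half"))

out
```

Because .names is set, id, height and weight are untouched and two new columns, height_half and weight_half, are appended.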

Selecting columns

By explicit names

mutate(across(c(height, weight), ~ as.numeric(scale(.x))))

By column type (predicate helpers)

mutate(across(where(is.numeric), ~ (.x - mean(.x))))

By name patterns

mutate(across(starts_with("score_"), ~ .x / 100))
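The selection styles above are interchangeable when they pick the same columns. The sketch below, on a hypothetical exam tibble, checks that explicit names, a type predicate, a name pattern, and a positional range (score_math:score_art, which the text above does not show) all produce the same result. Combining where(is.numeric) with !id relies on tidyselect's boolean operators, which keep the integer id column out of the rescaling.

```r
library(dplyr)

# Hypothetical exam data: an id, a label and two score columns
scores <- tibble(
  id = 1:2,
  name = c("a", "b"),
  score_math = c(80, 90),
  score_art = c(70, 60)
)

# By explicit names
by_name <- scores %>% mutate(across(c(score_math, score_art), ~ .x / 100))

# By type: where(is.numeric) would also catch id, so exclude it with !id
by_type <- scores %>% mutate(across(where(is.numeric) & !id, ~ .x / 100))

# By name pattern
by_prefix <- scores %>% mutate(across(starts_with("score_"), ~ .x / 100))

# By position range: every column from score_math through score_art
by_range <- scores %>% mutate(across(score_math:score_art, ~ .x / 100))
```

Prefix- or type-based selection keeps working if new score_ columns are added later, whereas explicit names and ranges must be edited by hand.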

Applying multiple functions

Pass a named list of functions to .fns:

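mutate(across(where(is.numeric), list(z = ~ as.numeric(scale(.x)), log = log10)))

scale() returns a one-column matrix, so wrapping it in as.numeric() keeps each result a plain numeric vector. Each selected column gains two new companions while the original is kept: height is joined by height_z and height_log, without extra typing.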
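A small sketch, using an invented two-column tibble, shows the default naming with a named list of functions: the list names become the function part of each output name, and outputs are grouped by column first, then by function. The as.numeric() wrapper is there because scale() returns a matrix.

```r
library(dplyr)

# Invented data for illustration
hw <- tibble(height = c(150, 160, 170), weight = c(50, 60, 70))

# Named list of functions; the names z and log appear in the outputs
out <- hw %>%
  mutate(across(
    everything(),
    list(z = ~ as.numeric(scale(.x)), log = log10)
  ))

names(out)
# "height" "weight" "height_z" "height_log" "weight_z" "weight_log"
```

The original columns stay in place, so two selected columns become six in total.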

Custom naming patterns

.names provides glue placeholders:

{.col} (or {col}) – original column name.
{.fn} (or {fn}) – function name, taken from the list names when .fns is a named list.

mutate(across(where(is.numeric), ~ as.numeric(scale(.x)), .names = "scaled_{col}"))
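As a sketch of how the placeholders combine (the tibble m is invented), {.col} alone is enough for a single function, while a list of functions usually needs {.fn} as well; putting {.fn} first groups the new names by statistic.

```r
library(dplyr)

# Invented data for illustration
m <- tibble(a = c(1, 2, 3), b = c(10, 20, 30))

# Single function: only the column name varies
single <- m %>%
  mutate(across(everything(), ~ .x * 2, .names = "doubled_{.col}"))

# List of functions: {.fn} keeps the outputs distinct
multi <- m %>%
  mutate(across(
    everything(),
    list(min = min, max = max),
    .names = "{.fn}_{.col}"
  ))
```

min() and max() return one value per column, which mutate() recycles to every row.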

Anonymous functions & lambda shortcuts

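library(tidyr)  # replace_na() comes from tidyr, not dplyr

mutate(across(starts_with("pct_"), ~ .x * 100))  # tilde (purrr-style) form
mutate(across(where(is.numeric), function(x) replace_na(x, 0)))  # base R function form; numeric columns only, since 0 cannot fill a character column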
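The sketch below, on an invented tibble, checks that the tilde form, a classic function(x), and R's shorthand \(x) (available from R 4.1.0) give identical results. To keep the example dependent on dplyr alone, it swaps dplyr::coalesce() in for tidyr::replace_na() when filling missing values; that substitution is our choice, not part of the text above.

```r
library(dplyr)

# Invented data: two percentage columns and a character label
pct <- tibble(pct_a = c(0.1, NA), pct_b = c(0.5, 0.25), label = c("x", "y"))

# Three equivalent ways to write the same anonymous function
tilde   <- pct %>% mutate(across(starts_with("pct_"), ~ .x * 100))
classic <- pct %>% mutate(across(starts_with("pct_"), function(x) x * 100))
short   <- pct %>% mutate(across(starts_with("pct_"), \(x) x * 100))  # R >= 4.1

# Fill NA only in numeric columns, leaving the character label untouched
filled <- pct %>%
  mutate(across(where(is.numeric), \(x) coalesce(x, 0)))
```

The tilde form is compact for one-liners; function(x) or \(x) reads better once the body grows or needs a descriptive argument name.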

Conditional transforms

Pair across() with if_else() or case_when() to handle branching logic for each column:

mutate(across(ends_with("_flag"), ~ if_else(.x == "Y", 1L, 0L)))
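A minimal sketch on invented flag data contrasts the two: if_else() propagates NA, while case_when() can give missing values an explicit code. The -1L code for missing is a hypothetical convention chosen for illustration.

```r
library(dplyr)

# Invented Y/N flags, including a missing value
flags <- tibble(
  active_flag = c("Y", "N", NA),
  paid_flag = c("N", "Y", "Y")
)

# if_else(): "Y" -> 1, anything else -> 0, NA stays NA
binary <- flags %>%
  mutate(across(ends_with("_flag"), ~ if_else(.x == "Y", 1L, 0L)))

# case_when(): an explicit branch for missing values (hypothetical -1 code)
coded <- flags %>%
  mutate(across(
    ends_with("_flag"),
    ~ case_when(
      .x == "Y" ~ 1L,
      .x == "N" ~ 0L,
      is.na(.x) ~ -1L
    )
  ))
```

case_when() treats an NA condition as not matched, so missing flags fall through to the is.na() branch.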

Best practices

1. Use where() to future-proof

Filtering by class (e.g., where(is.numeric)) survives new columns added later.

2. Keep pipelines readable

Complex anonymous functions belong in a named helper declared above your pipeline.

3. Avoid name clashes

When overwriting columns in place, check their types first, or use .names to write the results to new columns so the originals are preserved.
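The three practices combine naturally. In this sketch, rescale01() is a hypothetical helper declared above the pipeline, selection is by type with the key column excluded, and .names sends results to new columns; the data are invented.

```r
library(dplyr)

# Hypothetical helper: min-max rescaling to [0, 1], ignoring NAs
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Invented sensor data
raw <- tibble(id = 1:3, temp = c(10, 15, 20), load = c(2, NA, 6))

scaled <- raw %>%
  mutate(across(
    where(is.numeric) & !id,  # type-based selection, minus the key
    rescale01,                # named helper keeps the pipeline short
    .names = "{.col}_01"      # new columns; originals are kept
  ))
```

A numeric column added to raw later is picked up automatically, and the helper can be tested on its own.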

Real-world example: feature engineering

library(dplyr)
library(lubridate)

patients <- tibble(
  id = 1:3,
  weight_kg = c(70, 80, 65),
  height_cm = c(175, 168, 180),
  dob = as.Date(c("1990-02-15", "1985-06-20", "1978-11-03"))
)

patients <- patients %>%
  mutate(
    # z-scores as plain vectors (scale() returns a matrix)
    across(c(weight_kg, height_cm), list(z = ~ as.numeric(scale(.x))), .names = "{col}_{fn}"),
    bmi = weight_kg / (height_cm / 100)^2,       # body-mass index
    age = interval(dob, Sys.Date()) / years(1)   # age in years
  )

The pipeline calculates z-scores for two metrics, body-mass index, and age—all inside a single mutate().

Common pitfalls and solutions

1. Misusing summarise()

Mistake: Trying to retain original rows with summarise(across(...)).
Fix: Use mutate(); summarise() collapses rows.

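2. Name collisions when applying multiple functions

Mistake: Passing a list of functions with a .names template that omits {fn} (e.g., "{col}_new"), so every function targets the same output name.
Fix: Name the list of functions and keep {fn} in any custom template; the default template for a list is already "{col}_{fn}".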

3. Mixing numeric & character types

Mistake: Applying log() to character columns selected by everything().
Fix: Scope with where(is.numeric) or starts_with().
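Pitfalls 1 and 3 are easy to reproduce. This sketch uses an invented three-row tibble: it compares row counts from summarise() and mutate(), then shows that everything() sends the character column to log10() and fails, while where(is.numeric) succeeds. Catching the error with tryCatch() is only for demonstration.

```r
library(dplyr)

# Invented data: one character column, two numeric columns
d <- tibble(g = c("a", "b", "c"), x = c(1, 10, 100), y = c(10, 100, 1000))

# Pitfall 1: summarise() collapses to one row, mutate() keeps every row
n_summarised <- nrow(d %>% summarise(across(where(is.numeric), mean)))
n_mutated <- nrow(d %>% mutate(across(where(is.numeric), ~ .x - mean(.x))))

# Pitfall 3: everything() includes g, and log10() rejects character input
bad <- tryCatch(
  d %>% mutate(across(everything(), log10)),
  error = function(e) "error"
)

# Fix: restrict the selection to numeric columns
good <- d %>% mutate(across(where(is.numeric), log10))
```

Here n_summarised is 1 and n_mutated is 3.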

Performance notes

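across() is a convenience for applying functions column by column; it does not by itself make those functions faster. Speed depends mostly on the functions you apply: vectorized functions such as log10() process each whole column in a single call. For very large data, the same dplyr code can often run on other back ends, such as dtplyr (data.table), dbplyr (SQL databases), or arrow's dplyr interface, although each back end translates only a subset of R functions, so check that your .fns are supported.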

When to avoid mutate + across

If each column requires distinct, unrelated logic—e.g., different parameters per column—separate mutate() calls may improve clarity.
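For contrast, here is a sketch of the case the paragraph describes, on invented readings: two unrelated unit conversions with their own constants. Plain named arguments inside one mutate() (or separate calls) state each formula directly, whereas forcing them through across() would need per-column branching inside the function.

```r
library(dplyr)

# Invented readings in Fahrenheit and kilopascals
readings <- tibble(temp_f = c(32, 212), pressure_kpa = c(101.325, 202.65))

# Each conversion has its own formula, so name them individually
converted <- readings %>%
  mutate(
    temp_c = (temp_f - 32) * 5 / 9,
    pressure_atm = pressure_kpa / 101.325
  )
```

across() pays off when the same logic repeats; here each column has exactly one distinct rule.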

Takeaways

mutate() + across() is the idiomatic way to vectorize transforms across columns.
Use tidy-select helpers (where(), starts_with()) for robustness.
Control new column names with .names to stay organized.
Validate types to avoid runtime errors.

Why dplyr mutate multiple columns at once is important

Looping over columns or copying-and-pasting repetitive mutate calls clutters code, introduces errors, and slows development. Mastering mutate + across lets data engineers build robust, declarative pipelines that scale to wide data sets—essential for feature engineering, ETL jobs, and analytics workflows.