dplyr mutate multiple columns at once

How do I use dplyr mutate to modify multiple columns simultaneously?

Using dplyr’s mutate() with across() to transform or create many columns in a single, concise pipeline step.

Description

dplyr’s mutate() can update or create dozens of columns in one tidy, readable command—no loop required.

Learn how to leverage mutate() and across() to apply the same or differing transformations to multiple columns, avoid common pitfalls, and write maintainable R data-wrangling code.

Why mutate multiple columns at once?

Real-world data sets rarely need just one column changed. Numeric values often need the same scaling, dates require common parsing, and character fields share a standard cleaning routine. Repeating mutate() calls or hand-coding column names makes your pipeline brittle and verbose. dplyr 1.0.0 introduced across(), a helper that lets you transform many columns simultaneously while preserving tidy syntax.

Core syntax

library(dplyr)

df %>%
  mutate(across(.cols, .fns, .names = "{col}_new"))

  • .cols – tidy-select expression that picks which columns to target.
  • .fns – a single function, an anonymous lambda, or a named list of functions.
  • .names – glue-style template for naming output columns.
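As a minimal sketch of the template above, the snippet below fills in each argument on a hypothetical three-row tibble (df and its column names are made up for illustration). It divides two columns by 10 and writes the results to new columns with a _new suffix.

```r
library(dplyr)

# Hypothetical toy data; the column names are illustrative only.
df <- tibble(
  id     = c("a", "b", "c"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

result <- df %>%
  mutate(across(
    c(height, weight),        # .cols: which columns to transform
    ~ .x / 10,                # .fns: what to do to each column
    .names = "{.col}_new"     # .names: {.col} is the documented placeholder spelling
  ))

names(result)
```

Because .names is set, the original height and weight columns are left untouched and the transformed values land in height_new and weight_new.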

Selecting columns

By explicit names

mutate(across(c(height, weight), scale))

By helper functions

mutate(across(where(is.numeric), ~ (.x - mean(.x))))

By name pattern

mutate(across(starts_with("score_"), ~ .x / 100))
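A handy way to check a selection before transforming anything is to pass the same tidy-select expression to select(). The sketch below does that on a made-up tibble (all column names are assumptions for illustration), then applies one of the selections inside across().

```r
library(dplyr)

# Hypothetical data with one character column and three numeric ones.
df <- tibble(
  name    = c("x", "y"),
  height  = c(1.7, 1.8),
  score_a = c(80, 90),
  score_b = c(70, 60)
)

# Preview what each selection style picks up.
by_name   <- names(select(df, c(height, score_a)))     # explicit names
by_type   <- names(select(df, where(is.numeric)))      # type predicate
by_prefix <- names(select(df, starts_with("score_")))  # name prefix

# Use the prefix selection to rescale the score columns in place.
rescaled <- df %>%
  mutate(across(starts_with("score_"), ~ .x / 100))
```

Previewing with select() costs one extra line and makes it obvious when a pattern matches more (or fewer) columns than intended.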

Applying multiple functions

Pass a named list of functions to .fns:

mutate(across(where(is.numeric), list(z = scale, log = log10)))

This adds two new columns for every numeric column while keeping the originals: height stays as it is and gains height_z and height_log alongside it, without extra typing.
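To make the resulting column layout concrete, here is a small sketch on a one-column hypothetical tibble. It uses a centering lambda and log10 as the two functions; the list names (centered, log) are arbitrary labels chosen for this example.

```r
library(dplyr)

# Hypothetical single-column data.
df <- tibble(height = c(150, 160, 170))

out <- df %>%
  mutate(across(
    where(is.numeric),
    list(centered = ~ .x - mean(.x), log = log10)  # list names become suffixes
  ))

names(out)
```

The column selection is resolved once against the input data, so the newly created columns are not fed back into across() within the same call.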

Custom naming patterns

.names provides glue placeholders:

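  • {.col} – original column name.
  • {.fn} – function name, or its position in the list when the functions are unnamed.

dplyr's documentation uses the dotted spellings; the older {col} and {fn} forms shown in this article's examples are also accepted.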

mutate(across(where(is.numeric), scale, .names = "scaled_{col}"))
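When a list of functions is involved, both placeholders can be combined in any order. The sketch below, on hypothetical columns a and b, puts the function name first; the data and the naming pattern are illustrative assumptions.

```r
library(dplyr)

# Hypothetical two-column data.
df <- tibble(a = c(1, 2, 3), b = c(10, 20, 30))

out <- df %>%
  mutate(across(
    c(a, b),
    list(min = min, max = max),
    .names = "{.fn}_of_{.col}"   # function name first, then column name
  ))

names(out)
```

min() and max() return a single value per column, which mutate() recycles to every row.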

Anonymous functions & lambda shortcuts

mutate(across(starts_with("pct_"), ~ .x * 100)) # tilde (purrr-style) form
mutate(across(where(is.numeric), function(x) tidyr::replace_na(x, 0))) # replace_na() comes from tidyr; 0 only fits numeric columns
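The different ways of writing the function are interchangeable. As a quick sketch on a hypothetical pct_x column, the three calls below produce the same values; the backslash shorthand assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical proportion column.
df <- tibble(pct_x = c(0.1, 0.25, 0.5))

tilde_form    <- df %>% mutate(across(starts_with("pct_"), ~ .x * 100))
function_form <- df %>% mutate(across(starts_with("pct_"), function(x) x * 100))
lambda_form   <- df %>% mutate(across(starts_with("pct_"), \(x) x * 100))  # needs R >= 4.1
```

Which form to use is mostly a style choice; the base R forms make the argument name explicit.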

Conditional transforms

Pair across() with if_else() or case_when() to handle branching logic for each column:

mutate(across(ends_with("_flag"), ~ if_else(.x == "Y", 1L, 0L)))
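The sketch below runs the if_else() pattern on two hypothetical flag columns, one of which contains a missing value, and adds a case_when() variant for three-way logic. The column names and labels are assumptions made for the example.

```r
library(dplyr)

# Hypothetical Y/N flags, including one missing value.
df <- tibble(
  active_flag = c("Y", "N", NA),
  vip_flag    = c("N", "Y", "Y")
)

# Two-way branch: NA stays NA because the comparison itself is NA.
binary <- df %>%
  mutate(across(ends_with("_flag"), ~ if_else(.x == "Y", 1L, 0L)))

# Multi-way branch: the final TRUE clause catches everything else, including NA.
labelled <- df %>%
  mutate(across(
    ends_with("_flag"),
    ~ case_when(.x == "Y" ~ "yes", .x == "N" ~ "no", TRUE ~ "unknown")
  ))
```

if_else() is strict about types, so both branches must return the same type (here, integers).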

Best practices

1. Use where() to future-proof

Selecting by type (e.g., where(is.numeric)) keeps working when new columns of that type are added later, with no hard-coded names to update.

2. Keep pipelines readable

Complex anonymous functions belong in a named helper declared above your pipeline.
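As an illustration of that advice, the sketch below moves a min-max rescaling step into a named helper. rescale01 is a hypothetical function written for this example, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: rescale a numeric vector to the 0-1 range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical data.
df <- tibble(a = c(0, 5, 10), b = c(2, 4, 6))

# The pipeline step now reads as a single, named operation.
out <- df %>%
  mutate(across(where(is.numeric), rescale01))
```

A named helper can also be unit-tested on its own, which is harder to do with an inline lambda.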

3. Avoid name clashes

When overwriting columns, verify types first or suffix new names with .names to prevent accidental loss.
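The difference between overwriting and suffixing is easy to see side by side. In this sketch, hypothetical price and qty columns arrive as text; the first call converts them in place, the second keeps the originals and writes the numeric versions to new columns.

```r
library(dplyr)

# Hypothetical data where numbers were read in as character strings.
df <- tibble(price = c("1.5", "2.0"), qty = c("3", "4"))

# Overwrites price and qty: the character originals are gone.
overwritten <- df %>%
  mutate(across(everything(), as.numeric))

# Keeps the originals and adds price_num and qty_num.
kept <- df %>%
  mutate(across(everything(), as.numeric, .names = "{.col}_num"))
```

Keeping the originals is useful while validating a conversion; they can be dropped once the new columns look right.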

Real-world example: feature engineering

library(dplyr)
library(lubridate)

patients <- tibble(
  id = 1:3,
  weight_kg = c(70, 80, 65),
  height_cm = c(175, 168, 180),
  dob = as.Date(c("1990-02-15", "1985-06-20", "1978-11-03"))
)

patients <- patients %>%
  mutate(
    # z-scores; as.numeric() drops the one-column matrix that scale() returns
    across(c(weight_kg, height_cm), list(z = ~ as.numeric(scale(.x))), .names = "{col}_{fn}"),
    # body-mass index: kilograms divided by metres squared
    bmi = weight_kg / (height_cm / 100)^2,
    # age in years as of today
    age = interval(dob, Sys.Date()) / years(1)
  )

The pipeline calculates z-scores for two metrics, body-mass index, and age—all inside a single mutate().
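One detail worth knowing when z-scoring with scale(): base R's scale() returns a one-column matrix rather than a plain vector, so passing it directly to across() produces matrix columns. The sketch below, on a hypothetical height_cm column, compares that with a version wrapped in as.numeric().

```r
library(dplyr)

# Hypothetical data.
df <- tibble(height_cm = c(175, 168, 180))

# scale() passed directly: the new column is a one-column matrix.
as_matrix <- df %>%
  mutate(across(height_cm, scale, .names = "{.col}_z"))

# Wrapped in as.numeric(): the new column is an ordinary numeric vector.
as_vector <- df %>%
  mutate(across(height_cm, ~ as.numeric(scale(.x)), .names = "{.col}_z"))

is.matrix(as_matrix$height_cm_z)
is.matrix(as_vector$height_cm_z)
```

Matrix columns work in many situations but can surprise downstream code that expects plain vectors, so the wrapped form is often the safer default.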

Common pitfalls and solutions

1. Misusing summarise()

Mistake: Trying to retain original rows with summarise(across(...)).
Fix: Use mutate(); summarise() collapses rows.
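The row-count difference is quick to confirm. This sketch applies mean() to the same hypothetical data with both verbs.

```r
library(dplyr)

# Hypothetical data.
df <- tibble(x = c(1, 2, 3), y = c(10, 20, 30))

# summarise() reduces the data to one row of column means.
collapsed <- df %>%
  summarise(across(everything(), mean))

# mutate() keeps all rows and repeats each mean alongside the originals.
kept <- df %>%
  mutate(across(everything(), mean, .names = "{.col}_mean"))

nrow(collapsed)
nrow(kept)
```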

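2. Forgetting names when applying multiple functions

Mistake: Passing an unnamed list of functions, so the new columns get uninformative positional names such as height_1 and height_2.
Fix: Name the list elements (the default pattern is then {.col}_{.fn}) or set .names explicitly.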
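To see what names across() generates by default when given a list of functions, the sketch below compares an unnamed list with a named one on a hypothetical column x.

```r
library(dplyr)

# Hypothetical data.
df <- tibble(x = c(1, 4, 9))

# Unnamed list: suffixes are the positions of the functions in the list.
unnamed <- df %>%
  mutate(across(x, list(sqrt, log)))

# Named list: suffixes are the list names.
named <- df %>%
  mutate(across(x, list(root = sqrt, ln = log)))

names(unnamed)
names(named)
```

In both cases the original x column is kept; only the readability of the new names differs.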

3. Mixing numeric & character types

Mistake: Applying log() on character columns selected by everything().
Fix: Scope with where(is.numeric) or starts_with().
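A short sketch of this pitfall on hypothetical data with one character and one numeric column: the unscoped call fails, while the scoped call transforms only the numeric column. try() is used here purely to capture the failure for demonstration.

```r
library(dplyr)

# Hypothetical mixed-type data.
df <- tibble(label = c("a", "b"), value = c(1, 100))

# everything() also hands the character column to log10(), which errors.
attempt <- try(df %>% mutate(across(everything(), log10)), silent = TRUE)
failed  <- inherits(attempt, "try-error")

# where(is.numeric) skips the character column entirely.
safe <- df %>%
  mutate(across(where(is.numeric), log10))
```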

Performance notes

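across() calls each function once per selected column (and once per group on grouped data), so its cost is essentially that of the functions you pass; vectorized functions stay fast. For very large data, consider a dplyr back end such as dtplyr (data.table) or arrow, and check that back end's documentation for across() support.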

When to avoid mutate + across

If each column requires distinct, unrelated logic—e.g., different parameters per column—separate mutate() calls may improve clarity.

Takeaways

  • mutate() + across() is the idiomatic way to vectorize transforms across columns.
  • Use tidy-select helpers (where(), starts_with()) for robustness.
  • Control new column names with .names to stay organized.
  • Validate types to avoid runtime errors.

Further reading

• Wickham, H. et al. “dplyr: A Grammar of Data Manipulation.” R package v1.1.
• RStudio cheatsheet: Transforming data with dplyr.
• Hadley Wickham & Garrett Grolemund, R for Data Science, Ch. 5.

Why mutating multiple columns at once is important

Looping over columns or copying-and-pasting repetitive mutate calls clutters code, introduces errors, and slows development. Mastering mutate + across lets data engineers build robust, declarative pipelines that scale to wide data sets—essential for feature engineering, ETL jobs, and analytics workflows.

Example usage


patients %>% mutate(across(where(is.numeric), scale))

Frequently Asked Questions (FAQs)

What is the simplest way to mutate several columns at once?

Wrap your selection inside across(): mutate(across(where(is.numeric), scale)). This applies scale() to every numeric column.

How do I apply more than one function?

Pass a named list of functions: mutate(across(where(is.numeric), list(mean = mean, sd = sd))). Use .names to control the resulting column names.

Can I overwrite the original columns?

Yes. If you omit .names and supply a single function, dplyr writes results back into the same columns. Be cautious and verify data types before overwriting.

Does mutate + across work with grouped data?

Absolutely. Place group_by() before mutate(), and functions run within each group context.
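As a sketch of the grouped case, the snippet below centers a hypothetical column x on its group mean rather than the overall mean. The data, group labels, and output name are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data with two groups.
df <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 10, 20)
)

out <- df %>%
  group_by(g) %>%
  # mean(.x) is computed separately within each group
  mutate(across(x, ~ .x - mean(.x), .names = "{.col}_centered")) %>%
  ungroup()
```

Calling ungroup() at the end avoids carrying the grouping into later steps by accident.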
