Mutating Multiple Columns at Once with dplyr

Galaxy Glossary

How do I mutate multiple columns at once with dplyr?

Applying a single or multiple transformations to several columns of an R data frame in one step using dplyr’s mutate() + across() pattern.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Description

Mutating several columns at once is a common data-wrangling task. Since dplyr 1.0, the canonical way to do it is to combine mutate() with across(). The pattern dramatically reduces boiler-plate, prevents copy-pasting errors, and keeps your data pipelines readable, testable and fast.

Why mutate several columns together?

Data rarely arrives in the shape you need for analysis. Numeric features may require scaling, date columns need standardisation, categorical variables should be recoded, and text fields often need trimming. Performing these tasks column by column quickly becomes error-prone and hard to maintain. The mutate(across()) idiom solves the problem by letting you:

  • Express intent declaratively (what to change, not how to loop).
  • Ensure identical operations across multiple variables.
  • Build pipelines that are easy to audit and reproduce.
  • Leverage dplyr’s C++-backed speed instead of R-level for loops.

Recap: mutate() and across()

mutate() adds or modifies columns. across() specifies which columns to touch and what function(s) to apply. The general template is:

data %>%
mutate(across(, , .names = "{col}_new"))

Key arguments:

  • <selector>: tidy-select expression that picks columns (e.g., starts_with("score_"), where(is.numeric)).
  • <function>: • single function (e.g., log), • anonymous function (e.g., ~ .x / 100), or • named list of functions.
  • .names: glue pattern for output names. Use "{col}" to keep the original name or "{col}_{fn}" for multi-function cases.

Selecting columns to mutate

dplyr ships with a rich tidy-select mini-language:

  • starts_with("prefix"), ends_with(), contains()
  • matches("regex")
  • where(is.numeric), where(is.character)
  • Column ranges: col1:col5
  • Negative selection: -c(id, date)

Mix them with | and & for complex rules.

Applying the same transformation

library(dplyr)
data_scaled <- data_raw %>%
mutate(across(where(is.numeric), scale))

Every numeric column is centred and scaled. Because scale() returns a matrix, dplyr keeps the result as numeric vector, preserving column count.

Applying different transformations to the same columns

data_aug <- data_raw %>%
mutate(across(starts_with("score_"), list(z = scale, pct = ~ (.x / max(.x))*100),
.names = "{col}_{fn}"))

The list() names (z, pct) automatically suffix the new columns via {fn}.

Applying different functions to different columns

Create multiple across() calls inside one mutate:

data_clean <- data_raw %>%
mutate(across(where(is.character), str_trim),
across(ends_with("_date"), as.Date, format = "%Y-%m-%d"))

Row-wise operations that depend on other columns

Sometimes the new value of a column depends on values from other columns. Two approaches:

  1. Use rowwise() to work in row context:

data_ratios <- data_raw %>%
rowwise() %>%
mutate(across(c(a, b, c), ~ .x / sum(c_across(c(a, b, c)))))

  1. Compute helper variables first, then reuse:

data_ratios <- data_raw %>%
mutate(total = a + b + c) %>%
mutate(across(a:c, ~ .x / total))

Dynamic function lists

You can build function lists programmatically. Suppose you have a named vector of exponents:

powers <- c(sq = 2, cube = 3, quad = 4)
funs <- purrr::imap(powers, ~ function(x) x ^ .x)

data_pow <- data_raw %>%
mutate(across(where(is.numeric), funs, .names = "{col}_{fn}"))

Performance considerations

  • across() is C++-optimised when the applied function is vectorised.
  • Avoid wrapping expensive R loops inside anonymous functions; push heavy work into C-level packages (e.g., stringi for text).
  • Dplyr preserves groups; if you are inside group_by(), operations run per group.

Best practices

  • Prefer across() over the superseded mutate_at()/mutate_if().
  • Use .names proactively to avoid overwriting or to create tidy column names.
  • Write small, testable helper functions for complex transformations.
  • Chain multiple across() calls instead of nesting long anonymous functions.
  • Keep pipelines readable—break them into blocks and add comments for each logical step.

Common pitfalls to avoid

  • Relying on deprecated verbs: mutate_at(), mutate_if(), and mutate_all() may be removed in future releases. Migrate to across().
  • Accidental overwrite: Omitting .names when you intend to keep the originals will silently replace data.
  • Non-vectorised functions: Functions that work row-by-row (lm(), web requests) will murder performance. Pre-vectorise or move to rowwise() judiciously.

Putting it all together

features <- raw %>%
# Step 1: group by customer
group_by(customer_id) %>%
# Step 2: compute behaviour counts
summarise(across(event:audit, sum),
first_seen = min(timestamp),
last_seen = max(timestamp)) %>%
ungroup() %>%
# Step 3: scale numeric counts and keep dates intact
mutate(across(where(is.integer), ~ (.x - min(.x)) / (max(.x) - min(.x)),
.names = "{col}_norm"))

The result is a clean feature table ready for modelling.

Relation to Galaxy

Although dplyr is an R-side data-manipulation library, many teams use it to prototype transformations that later get re-implemented in SQL. Galaxy’s AI copilot can translate a well-structured dplyr pipeline—especially one built with across()—into efficient SQL, letting engineers validate logic in R before committing it to the warehouse.

Conclusion

Mastering mutate(across()) unlocks expressive, concise, and high-performance data wrangling. Adopt tidy-selection helpers, name your outputs clearly, and keep functions vectorised. Your future self—and your collaborators—will thank you.

Why Mutating Multiple Columns at Once with dplyr is important

In real-world datasets you often have dozens or hundreds of variables that require identical cleaning steps. Writing one line per column is not only verbose but also risky—typos and inconsistent logic creep in easily. The mutate-across idiom gives data engineers and analysts a single, reproducible source of truth, thereby improving code quality, reducing maintenance cost, and accelerating experimentation.

Mutating Multiple Columns at Once with dplyr Example Usage


df %>% mutate(across(starts_with("score_"), ~ .x / 100))

Common Mistakes

Frequently Asked Questions (FAQs)

What is the difference between mutate_at and mutate(across)?

mutate_at() was the historical way to apply a function to selected columns. It is now superseded by mutate(across()), which offers a unified syntax, more powerful tidy-selection, better naming control via .names, and a clearer mental model.

How can I keep original columns while adding transformed ones?

Supply a .names pattern that creates new column names, e.g., .names = "{col}_scaled". Without it, dplyr will overwrite the originals.

Why is my code slow when I use across()?

Most likely the function you pass to across() is not vectorised or calls R-level loops. Rewrite it with vectorised functions or apply rowwise() only when necessary.

Can Galaxy convert a dplyr mutate(across()) pipeline to SQL?

Yes. Galaxy’s AI copilot understands tidyverse syntax and can suggest equivalent SQL, allowing you to prototype in R and productionise in the data warehouse.

Want to learn about other SQL terms?