Using regular expressions in spreadsheet functions (e.g., Google Sheets or Excel) to normalize, validate, and format phone numbers consistently.
Regex (regular expressions) provide a concise, declarative way to identify patterns in text—making them ideal for stripping noise from messy phone-number strings and re-formatting them into standardized, analytics-ready values.
Phone numbers power user authentication, SMS campaigns, customer-support routing, and data enrichment. Inconsistent formats—dashes, parentheses, country codes, extensions—break JOINs, deduplication, and downstream APIs. Automated regex cleaning ensures:
+15554443333
).Google Sheets exposes three key functions:
REGEXREPLACE(text, pattern, replacement)
— substitute matching substrings.REGEXEXTRACT(text, pattern)
— pull out the first match.REGEXMATCH(text, pattern)
— boolean test.Excel now supports REGEX.EXTRACT()
and REGEX.REPLACE()
(or FILTERXML
/TEXTSPLIT
for legacy). The regex dialects are PCRE-like and case-sensitive, with a few limitations (no look-behinds in Sheets).
Most workflows start by deleting any character that is not a digit:
=REGEXREPLACE(A2, "[^0-9]", "")
Explanation: [^0-9]
is a negated character class—match everything except digits—and replace it with an empty string.
After stripping, verify length:
=IF(LEN(B2)=10, B2, "Invalid")
Sheets lacks look-behinds, so length checks are separate from regex.
=IF(LEN(B2)=10, "+1" & B2, IF(LEN(B2)=11, "+" & B2, "Invalid"))
=TEXT(MID(B2,1,3)&"-"&MID(B2,4,3)&"-"&MID(B2,7,4))
Remember: analytics systems usually prefer E.164 (+15551234567
) over display formats.
=REGEXREPLACE(A2,
"^([+]?\d{1,3})?[\s-]*\(?\d{3}\)?[\s-]*\d{3}[\s-]*\d{4}(?:\s*(?:x|ext\.?|#)\s*(\d+))?$",
"+$1$2")
This expression:
$1
).$2
). +countrycodeextension
—ideal before further parsing.=REGEXEXTRACT(A2, "^(?=\D*\d{10}\D*$)[\s\S]*$")
The positive look-ahead ensures exactly 10 digits in the entire cell.
phone_clean
column.An entire transformation can live inside Apps Script or dbt if your data later lands in a warehouse. Regex remains the portable core.
E.164
across systems; only pretty-print in the UI layer.Mistake: .*
before an extension captures too much.
Fix: Use non-greedy .*?
or explicit classes.
Mistake: Assuming all numbers are 10 digits.
Fix: Make country code optional but captured; validate by region.
Mistake: Sheets may render +15551230000
as 1.55512E+10
.
Fix: Prepend an apostrophe or wrap in TO_TEXT()
.
If your team stores phone numbers in a warehouse, you can push regex cleaning into SQL. Modern engines (PostgreSQL, Snowflake, BigQuery) expose REGEXP_REPLACE
. Galaxy’s AI-powered SQL editor can autocomplete these functions and even suggest the right pattern:
SELECT REGEXP_REPLACE(raw_phone,
'[^0-9]', '') AS digits_only
FROM users;
Run, share, and endorse the query inside Galaxy Collections so everyone reuses the same canonical transformation.
Dirty phone-number fields cause failed SMS sends, duplicate users, inaccurate joins, and frustrated customers. Cleaning with regex at the spreadsheet or SQL layer removes human error, enforces global standards like E.164, and keeps data pipelines flowing smoothly across marketing, support, and analytics systems.
Yes. Wrap your regex in ARRAYFORMULA
to process entire columns without dragging. Combine REGEXREPLACE
, LEN
, and conditional logic to normalize, validate, and re-format thousands of rows at once.
\(?\d{3}\)?[-.\s]*\d{3}[-.\s]*\d{4}
matches parentheses, dashes, dots, or spaces around a 3-3-4 digit structure. Always strip non-digits after matching to avoid edge-case surprises.
Absolutely. Galaxy’s SQL editor supports databases with REGEXP_REPLACE
. Its AI copilot can generate or refactor your cleaning queries, and Collections let teammates endorse them so everyone reuses the same logic.
Cast the result to text using TO_TEXT()
, or prefix an apostrophe ('
). You can also format the column as "Plain text" before pasting results.