Regex to Clean Phone Numbers in Sheets

How can I use regex to clean and standardize phone numbers in spreadsheets?

Using regular expressions in spreadsheet functions (e.g., Google Sheets or Excel) to normalize, validate, and format phone numbers consistently.

Welcome to the Galaxy, Guardian!

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

What Is Regex for Cleaning Phone Numbers?

Regex (regular expressions) provide a concise, declarative way to identify patterns in text—making them ideal for stripping noise from messy phone-number strings and re-formatting them into standardized, analytics-ready values.

Why Phone-Number Cleaning Matters

Phone numbers power user authentication, SMS campaigns, customer-support routing, and data enrichment. Inconsistent formats—dashes, parentheses, country codes, extensions—break JOINs, deduplication, and downstream APIs. Automated regex cleaning ensures:

Accurate joins between CRM, billing, and product databases.
Higher SMS deliverability by converting to E.164 (+15554443333).
Simpler validation logic in apps and analytics pipelines.

How Regex Works in Sheets

Google Sheets exposes three key functions:

REGEXREPLACE(text, pattern, replacement) — substitute matching substrings.
REGEXEXTRACT(text, pattern) — pull out the first match.
REGEXMATCH(text, pattern) — boolean test.

Excel now supports REGEX.EXTRACT() and REGEX.REPLACE() (or FILTERXML/TEXTSPLIT for legacy). The regex dialects are PCRE-like and case-sensitive, with a few limitations (no look-behinds in Sheets).

Core Cleaning Patterns

1. Strip Non-Digits

Most workflows start by deleting any character that is not a digit:

=REGEXREPLACE(A2, "[^0-9]", "")

Explanation: [^0-9] is a negated character class—match everything except digits—and replace it with an empty string.

2. Enforce 10-Digit US Numbers

After stripping, verify length:

=IF(LEN(B2)=10, B2, "Invalid")

Sheets lacks look-behinds, so length checks are separate from regex.

3. Add Country Code (+1)

=IF(LEN(B2)=10, "+1" & B2, IF(LEN(B2)=11, "+" & B2, "Invalid"))

4. Format Nicely (Optional)

=TEXT(MID(B2,1,3)&"-"&MID(B2,4,3)&"-"&MID(B2,7,4))

Remember: analytics systems usually prefer E.164 (+15551234567) over display formats.

Advanced Regex Techniques

Capture Groups for Extension Handling

=REGEXREPLACE(A2, "^([+]?\d{1,3})?[\s-]*$?\d{3}$?[\s-]*\d{3}[\s-]*\d{4}(?:\s*(?:x|ext\.?|#)\s*(\d+))?$", "+$1$2")

This expression:

Optionally captures the country code ($1).
Matches US local number in multiple formats.
Optionally captures extensions ($2).
Outputs +countrycodeextension—ideal before further parsing.

Look-Aheads for Length Validation (Excel PCRE)

=REGEXEXTRACT(A2, "^(?=\D*\d{10}\D*$)[\s\S]*$")

The positive look-ahead ensures exactly 10 digits in the entire cell.

Putting It All Together: End-to-End Pipeline

Normalize – remove all non-digits.
Validate – enforce expected digit count.
Standardize – prefix with country code.
Store – write to a dedicated phone_clean column.

An entire transformation can live inside Apps Script or dbt if your data later lands in a warehouse. Regex remains the portable core.

Best Practices

Store final numbers as text to avoid scientific notation.
Keep raw input for traceability.
Use E.164 across systems; only pretty-print in the UI layer.
Unit-test patterns on edge cases (vanity numbers, short codes, international).
Document regex intent—future maintainers will thank you.

Common Mistakes & How to Fix Them

Greedy Matching Swallows Extensions

Mistake: .* before an extension captures too much.
Fix: Use non-greedy .*? or explicit classes.

Ignoring Country Codes

Mistake: Assuming all numbers are 10 digits.
Fix: Make country code optional but captured; validate by region.

Converting to Number Type

Mistake: Sheets may render +15551230000 as 1.55512E+10.
Fix: Prepend an apostrophe or wrap in TO_TEXT().

Regex in SQL & Galaxy

If your team stores phone numbers in a warehouse, you can push regex cleaning into SQL. Modern engines (PostgreSQL, Snowflake, BigQuery) expose REGEXP_REPLACE. Galaxy’s AI-powered SQL editor can autocomplete these functions and even suggest the right pattern:

SELECT REGEXP_REPLACE(raw_phone, '[^0-9]', '') AS digits_only FROM users;

Run, share, and endorse the query inside Galaxy Collections so everyone reuses the same canonical transformation.

Why Regex to Clean Phone Numbers in Sheets is important

Dirty phone-number fields cause failed SMS sends, duplicate users, inaccurate joins, and frustrated customers. Cleaning with regex at the spreadsheet or SQL layer removes human error, enforces global standards like E.164, and keeps data pipelines flowing smoothly across marketing, support, and analytics systems.

Regex to Clean Phone Numbers in Sheets Example Usage

Regex to Clean Phone Numbers in Sheets Syntax

Common Mistakes

Using greedy wildcards (.*) that capture too much, accidentally deleting extensions. Fix by switching to non-greedy (.*?) or explicit classes like [0-9].
Assuming every number is 10 digits and ignoring international prefixes, leading to truncated or rejected records. Fix by capturing optional country codes and validating by region.
Letting spreadsheets auto-convert cleaned numbers to scientific notation. Fix by casting to text with TO_TEXT() or prefixing an apostrophe before +.

Frequently Asked Questions (FAQs)

Can I clean phone numbers in bulk with a single Sheets formula?

Yes. Wrap your regex in ARRAYFORMULA to process entire columns without dragging. Combine REGEXREPLACE, LEN, and conditional logic to normalize, validate, and re-format thousands of rows at once.

What regex pattern handles most US phone formats?

$?\d{3}$?[-.\s]*\d{3}[-.\s]*\d{4} matches parentheses, dashes, dots, or spaces around a 3-3-4 digit structure. Always strip non-digits after matching to avoid edge-case surprises.

Can I use Galaxy to run SQL that cleans phone numbers with regex?

Absolutely. Galaxy’s SQL editor supports databases with REGEXP_REPLACE. Its AI copilot can generate or refactor your cleaning queries, and Collections let teammates endorse them so everyone reuses the same logic.

How do I prevent Sheets from turning +15551234567 into scientific notation?

Cast the result to text using TO_TEXT(), or prefix an apostrophe ('). You can also format the column as "Plain text" before pasting results.