Cleaning Phone Numbers with Regex in Google Sheets

Galaxy Glossary

How can I use regex to clean and standardize phone numbers in Google Sheets?

Using Google Sheets’ REGEX functions to systematically strip noise, validate patterns, and standardize phone numbers into a consistent format.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Description

Overview

Google Sheets supports the full PCRE2 regular-expression engine through functions like REGEXREPLACE, REGEXEXTRACT, and REGEXMATCH. With a single formula you can purge punctuation, enforce country codes, or re-format inconsistent inputs into a clean, analytics-ready column.

Why Phone-Number Cleaning Matters

Dirty phone numbers break text messages, hinder deduplication, and cause CRM integrations to fail. Manual cleaning doesn’t scale, and simple SUBSTITUTE chains miss edge cases. Regular expressions (regex) let you codify every quirk—from stray spaces to optional extension markers—in one maintainable rule. Automating this step reduces data entry errors and protects downstream dashboards, email campaigns, and customer-support workflows.

Regex Building Blocks

Digits and Non-Digits

  • \d → any digit 0-9
  • \D → any non-digit
  • [^0-9] → explicit negation of digits

Quantifiers

  • * → 0 or more
  • + → 1 or more
  • {n} → exactly n repetitions
  • {n,} → at least n repetitions

Capture Groups

Parentheses group sub-patterns and capture matches for reuse in the replacement string, e.g. (\d{3})(\d{3})(\d{4}).

Step-by-Step Cleaning Pipeline

1. Strip All Non-Digits

Start by removing parentheses, dashes, spaces, and plus signs:

=REGEXREPLACE(A2, "[^0-9]", "")

Result: (415) 555-12344155551234

2. Enforce Country Code

US numbers missing the leading 1? Use alternation to add it when length=10:

=IF(LEN(B2)=10, "1" & B2, B2)

For multi-country sheets, pair REGEXMATCH with specific prefixes.

3. Re-format as E.164

E.164 expects + plus country code and no punctuation:

="+" & REGEXREPLACE(C2, "[^0-9]", "")

Now +14155551234 is ready for Twilio or your CRM.

4. Handle Extensions

Preserve extensions while cleaning:

=ARRAYFORMULA(IFNA(REGEXREPLACE(A2:A, "[\s-]*(ext|x|extension)[\s-]*([0-9]+)", ";ext=\\2")))

415-555-1234 x99+14155551234;ext=99

Advanced Patterns

Optional Parentheses and Country Codes

The pattern ^(?:\+?1\s*)?(?:\(?\d{3}\)?)[\s.-]*\d{3}[\s.-]*\d{4}$ matches most US numbers regardless of punctuation. Use it inside REGEXMATCH to validate a column before bulk SMS campaigns.

Conditional Replacements

Need to reformat only numbers already containing a country code? Combine REGEXMATCH inside an IF wrapper.

Putting It All Together

=ARRAYFORMULA(
IF(A2:A="","",
"+" & REGEXREPLACE(
IF(LEN(REGEXREPLACE(A2:A, "[^0-9]", ""))=10,
"1" & REGEXREPLACE(A2:A, "[^0-9]", ""),
REGEXREPLACE(A2:A, "[^0-9]", "")
),
"[^0-9]", "")
)
))

This single formula ingests raw user input, prepends the US country code when needed, and outputs E.164-compliant strings for every row in the sheet.

Best Practices

  • Work in helper columns until you’re confident the regex covers every scenario.
  • Document your patterns with inline comments (/* */) in Apps Script or in a Notes field so future teammates understand the logic.
  • Use ARRAYFORMULA to avoid copy-pasting down thousands of rows.
  • Validate first. Apply REGEXMATCH to spot invalid rows before you overwrite them with bad data.
  • Keep locale differences in mind. French numbers, for example, often group digits by two.

Common Misconceptions

Regex is slow: Sheets evaluates regex formulas cell-by-cell but stays performant up to tens of thousands of rows.
Digits only is enough: Not true—extensions, leading zeros, and country codes require nuanced handling.
One pattern fits all countries: Each numbering plan has its own rules; build country-specific regexes or use a library when coverage must be global.

Real-World Example

Imagine exporting Shopify orders: the phone field includes "+1 (415) 555-1234", "415.555.9876", and "0044 20 7930 4832". Feeding these directly into a messaging API leads to failures. A two-step regex—first detecting the country code (^(00|\+)?(\d+)) and then stripping non-digits—standardizes everything into clean E.164 strings.

Relation to Galaxy

While Galaxy focuses on SQL rather than spreadsheets, the mindset is identical: codify messy text transformations as deterministic logic instead of manual edits. If you later move your Sheets data into a database, comparable REGEXP_REPLACE statements can live inside Galaxy’s SQL editor and benefit from its AI-powered suggestions.

Why Cleaning Phone Numbers with Regex in Google Sheets is important

Accurate phone numbers are critical for SMS marketing, customer support, and fraud detection. Regex automates the cleanup process, reducing manual work, preventing failed messages, and ensuring that downstream systems receive uniform, reliable data.

Cleaning Phone Numbers with Regex in Google Sheets Example Usage


Transform “(415) 555-1234 ext.9” into “+14155551234;ext=9”

Common Mistakes

Frequently Asked Questions (FAQs)

How do I remove all characters except digits?

Use REGEXREPLACE(A2, "[^0-9]", ""). The pattern [^0-9] targets every non-digit, and the empty replacement string deletes them.

Can I automatically add the country code if it’s missing?

Yes. After stripping non-digits, test the length; if it equals 10, prepend “1”. Otherwise, assume the user supplied a code.

Is regex fast enough for large Sheets?

Sheets handles thousands of regex evaluations quickly. For six-figure rows consider moving to BigQuery and writing a REGEXP_REPLACE query inside Galaxy’s SQL editor.

How does Galaxy relate to phone-number cleaning?

Galaxy’s SQL editor lets you run similar REGEXP_REPLACE statements directly in your database. Its AI copilot can suggest patterns and validate results, extending the same cleaning logic beyond Google Sheets.

Want to learn about other SQL terms?