Data masking in ClickHouse hides sensitive column values for specific users or roles by applying masking policies at query time.
Masking policies are schema-level rules that transform column values at read time. They let you reveal only the necessary slices of data (e.g., first two email characters) without duplicating or encrypting the source table.
Use masking when analysts need production tables but should not see PII like full emails, phone numbers, or credit-card digits. Masking reduces compliance risk while keeping queries simple.
Create a policy with CREATE MASKING POLICY
, supply a masking expression, and attach it to one or more columns. The policy runs automatically for any role granted the policy.
1. Create the policy.
2. Attach to columns.
3. Grant it to a role.
4. Test with the role’s session.
substr
, concat
, replaceRegexpAll
, and math functions like intDiv
are popular. Any deterministic SQL expression is valid inside USING
.
Store full data; mask only on read. Use consistent masks to keep joins working (e.g., keep customer_id
intact). Document each policy’s purpose in the database description.
Open a new session, SET ROLE
to the granted role, and run representative SELECTs. Confirm that administrators without the role still see full data.
Yes. Use ALTER MASKING POLICY … USING
to change the expression. Dropping a policy immediately restores full visibility, so revoke user access first.
Minimal overhead. The masking expression executes on the fly, but ClickHouse’s vectorized engine keeps it fast.
Yes. Round totals (intDiv(total_amount, 100)*100
) or zero-out last digits. Any SQL expression works.
No. Masking is a presentation-layer transformation. The raw data stays unencrypted in storage, so use disk encryption if needed.