The PII type catalog

Every entry carries a type — the model’s classification of what the value is. The type is more than a label: it decides what a synthetic replacement looks like, and it’s how you filter and search entries.

| Type | Covers | Synthetic example |
| --- | --- | --- |
| NAME | People’s names, in any form (full, first, surname, with title) | Marcos Patel → David Romero Gil |
| EMAIL | Email addresses | [email protected] → [email protected] |
| PHONE | Phone and fax numbers | +34 612 345 678 → +34 698 112 530 |
| ADDRESS | Street addresses, full or partial | C/ Mayor 12, Madrid → Av. del Sol 84, Sevilla |
| ID | Government and record IDs: DNI/NIE, NHC, passport, SSN | 47829105T → 71530642M |
| IBAN | Bank account numbers (IBAN and local formats) | ES91 2100 0418 45… → ES66 0049 1500 05… |
| CARD | Payment card numbers | 4539 1488 0343 6467 → 4716 9012 3456 7890 |
| DOB | Dates of birth | 14/03/1982 → 02/11/1979 |
| DATE | Other identifying dates (employment spans, admission dates) | Jan 2020 – Mar 2023 → 02/2018 – 11/2021 |
| ORG | Organizations, employers, institutions | Acme Ibérica SL → Globex Hispania SA |
| URL | Personal sites and profile links | linkedin.com/in/mpatel → … |
| IP | IP addresses | 192.168.1.42 → 10.0.4.207 |
| OTHER | Anything sensitive that doesn’t fit a category | |

The exact catalog the model uses is broad and may grow; these are the ones you’ll see most. The editor’s type dropdown lists the full set with descriptions.
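
To make the role of the type concrete, here is a minimal Python sketch of typed entries and a filter over them. The `Entry` class, its field names, and `filter_by_type` are hypothetical illustrations, not the product's data model; only the type names come from the catalog above.

```python
from dataclasses import dataclass
from enum import Enum


class PiiType(Enum):
    # Type names from the catalog above; the real set may be larger.
    NAME = "NAME"
    EMAIL = "EMAIL"
    PHONE = "PHONE"
    ADDRESS = "ADDRESS"
    ID = "ID"
    IBAN = "IBAN"
    CARD = "CARD"
    DOB = "DOB"
    DATE = "DATE"
    ORG = "ORG"
    URL = "URL"
    IP = "IP"
    OTHER = "OTHER"


@dataclass(frozen=True)
class Entry:
    # Hypothetical entry shape: the detected value plus its type.
    value: str
    type: PiiType


def filter_by_type(entries, *types):
    """Return the entries whose type is one of the given types."""
    wanted = set(types)
    return [e for e in entries if e.type in wanted]


entries = [
    Entry("Marcos Patel", PiiType.NAME),
    Entry("+34 612 345 678", PiiType.PHONE),
    Entry("47829105T", PiiType.ID),
]

# Keep only contact details (phone numbers and email addresses).
contact = filter_by_type(entries, PiiType.PHONE, PiiType.EMAIL)
```

Modeling the type as an enum rather than a free string means a typo in a filter fails loudly instead of silently matching nothing.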

Synthetic mode doesn’t just swap one string for another — it generates a value of the right kind and shape. A DOB of 14/03/1982 becomes another day/month/year date, not a random number. An ID keeps its rough letter-and-digit pattern. An EMAIL reuses the fake person’s name. This is why getting the type right matters when you edit an entry: change a misclassified PHONE to ID and the regenerated value matches the real shape.
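
As a rough illustration of what "same shape" means, the sketch below swaps every digit for a random digit and every ASCII letter for a random letter of the same case, leaving separators untouched. The `same_shape` and `shape` helpers are hypothetical and are not the product's generators: this naive version can emit an invalid date such as 47/83/1902, which is exactly the kind of problem a type-specific generator avoids.

```python
import random
import string


def same_shape(value: str, rng: random.Random) -> str:
    """Randomize digits and ASCII letters, keeping every other character."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # separators, spaces, accents stay as they are
    return "".join(out)


def shape(value: str) -> str:
    """Summarize a value as a pattern: 9 = digit, A/a = upper/lower letter."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append("9")
        elif ch in string.ascii_uppercase:
            out.append("A")
        elif ch in string.ascii_lowercase:
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(0)  # seeded only to make the sketch repeatable
fake_id = same_shape("47829105T", rng)
fake_dob = same_shape("14/03/1982", rng)
```

Both outputs keep the pattern of their input: `shape(fake_id)` is `99999999A` and `shape(fake_dob)` is `99/99/9999`.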

See synthetic data for the full set of generator families and how shapes are preserved.

Not every date is PII. “Released in March 2023” in a press release is public. “Empleado de enero 2020 a marzo 2023” (“Employed from January 2020 to March 2023”) on a CV pins down a person’s timeline and is identifying. The model is prompted to treat personal-timeline dates as PII (DATE) and leave clearly public ones alone, a distinction a pattern matcher can’t make. See “Why LLMs for PII detection”.

Prices, totals, and quantities aren’t usually PII, but they can be identifying in context (a very specific salary, a unique claim amount). They’re handled separately by a per-profile switch — preserve, randomize, or zero out — rather than as typed entries.
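
The three switch positions could be sketched as follows. This is an assumption-heavy illustration: the function name `apply_amount_policy`, the mode strings, and the choice to randomize by scaling between 0.5x and 1.5x are all hypothetical, since the text above does not say how randomization is actually done.

```python
import random
from decimal import Decimal


def apply_amount_policy(amount: Decimal, mode: str, rng: random.Random) -> Decimal:
    """Apply a hypothetical per-profile amount policy to one value."""
    if mode == "preserve":
        return amount
    if mode == "zero":
        # Keep the number of decimal places: 1234.50 -> 0.00
        return Decimal(0).quantize(amount)
    if mode == "randomize":
        # Assumed strategy: scale by a random factor, keep decimal places.
        factor = Decimal(str(rng.uniform(0.5, 1.5)))
        return (amount * factor).quantize(amount)
    raise ValueError(f"unknown amount mode: {mode!r}")


rng = random.Random(0)
salary = Decimal("1234.50")
kept = apply_amount_policy(salary, "preserve", rng)
zeroed = apply_amount_policy(salary, "zero", rng)
scrambled = apply_amount_policy(salary, "randomize", rng)
```

Using `Decimal` rather than `float` keeps the original number of decimal places, so a zeroed or randomized amount still looks like a currency value in the document.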

Redaction characters (for the Redact method)

When an entry is redacted rather than replaced, you pick the fill character. The shared catalog (used in both the editor and profile settings):

█ full block · ▓ dark shade · ▒ medium shade · ░ light shade · ■ ▬ blocks · * asterisk · or a custom character or string.
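
A minimal sketch of redaction with a chosen fill is shown below. The `redact` function is hypothetical, and so is the rule it applies: a single fill character is repeated to the length of the original value, while a multi-character custom string replaces the value once. The product may handle custom strings differently.

```python
FULL_BLOCK = "\u2588"  # the "full block" character from the catalog


def redact(value: str, fill: str = FULL_BLOCK) -> str:
    """Mask a value with the given fill character or string."""
    if not fill:
        raise ValueError("fill must not be empty")
    if len(fill) == 1:
        # Assumed behavior: one fill character per original character.
        return fill * len(value)
    # Assumed behavior: a custom string stands in for the whole value.
    return fill


masked_id = redact("47829105T")
starred = redact("4539 1488", "*")
labeled = redact("Marcos Patel", "[REDACTED]")
```

Repeating a single character preserves the length of the original value, which keeps layout intact but also leaks how long the value was; a fixed custom string avoids that.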