Anonymization modes

Every anonymization run uses one of five modes. The model’s job is the same in all of them: find every piece of PII in the document and map each surface form to a replacement. The mode decides what those replacements look like.

Redaction

PII is replaced with a visible marker, [REDACTED] by default.

Patient Marcus Patel (DOB 14/03/1982) can be reached at marcus.patel@example.com.
↓
Patient [REDACTED] (DOB [REDACTED]) can be reached at [REDACTED].

Redaction is the bluntest mode and the easiest to verify: anything sensitive is visibly gone. The cost is context. A redacted document can become hard to read when names, dates, and identifiers carried the meaning.

A profile can change the marker text, or swap the marker for fill characters (asterisks or a custom string) whose length matches the original, is random, or is fixed.
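The marker and fill options can be pictured with a short sketch. This is not Piixie's implementation or its profile schema: the function names, the `style` values, and the 4 to 12 range for random lengths are assumptions made for illustration.

```python
import random

def mask(value, style="marker", marker="[REDACTED]", fill="*",
         fixed_length=8, rng=None):
    """Return the redacted form of one PII value (hypothetical helper)."""
    if not fill:
        raise ValueError("fill must be a non-empty string")
    if style == "marker":
        return marker
    if style == "match":        # same length as the original value
        n = len(value)
    elif style == "fixed":      # always the same length
        n = fixed_length
    elif style == "random":     # assumed range, hides the original length
        n = (rng or random).randint(4, 12)
    else:
        raise ValueError(f"unknown style: {style}")
    # Repeat the fill string and trim it to exactly n characters.
    return (fill * (n // len(fill) + 1))[:n]

def redact(text, pii_values, **options):
    """Replace every listed PII value in text, longest value first."""
    for value in sorted(pii_values, key=len, reverse=True):
        text = text.replace(value, mask(value, **options))
    return text

line = "Patient Marcus Patel (DOB 14/03/1982)"
found = ["Marcus Patel", "14/03/1982"]
print(redact(line, found))
print(redact(line, found, style="match"))
```

A length-matching fill keeps the layout of the line but reveals how long each value was. A fixed or random length hides that.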

Replacement

PII becomes numbered tokens that stay consistent across the document.

Marcus Patel emailed Sarah Kim. Patel's address is 12 Elm St.
↓
[NAME_1] emailed [NAME_2]. [NAME_1]'s address is [ADDRESS_1].

Two mentions of the same person get the same token, so the document’s structure survives: you can still tell who did what. The default token format is [TYPE_NUM] and can be changed in a profile.
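A minimal sketch of how numbered tokens can stay consistent, assuming the detection step yields each surface form together with an entity type and an identifier that says which mentions belong to the same entity. The tuple layout, the `entity_id` values, and the `template` parameter are hypothetical, not Piixie's data model.

```python
def assign_tokens(entities, template="[{type}_{num}]"):
    """Map each surface form to a token; mentions of one entity share a token.

    entities: iterable of (surface_form, entity_type, entity_id) tuples.
    """
    counters = {}         # next number per entity type
    token_by_entity = {}  # (type, id) -> token
    mapping = {}          # surface form -> token
    for surface, etype, entity_id in entities:
        key = (etype, entity_id)
        if key not in token_by_entity:
            counters[etype] = counters.get(etype, 0) + 1
            token_by_entity[key] = template.format(type=etype, num=counters[etype])
        mapping[surface] = token_by_entity[key]
    return mapping

mentions = [
    ("Marcus Patel", "NAME", "p1"),
    ("Sarah Kim", "NAME", "p2"),
    ("Patel", "NAME", "p1"),        # same person as "Marcus Patel"
    ("12 Elm St", "ADDRESS", "a1"),
]
print(assign_tokens(mentions))
```

Numbering is kept per type, so names and addresses each start at 1, and a different `template` stands in for a custom token format.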

Synthetic

PII becomes plausible fake values generated locally with a Faker runtime. Names map to fresh names, emails to coherent fake emails, addresses to fake addresses with the same shape.

Marcus Patel (marcus.patel@example.com) lives at 12 Elm St, Springfield.
↓
Ethan Vance (ethan.vance@example.com) lives at 84 Cedar Ave, Riverside.

The document keeps reading naturally, which makes synthetic mode the right choice for demos, test fixtures, and LLM prompts. Replacements stay internally consistent: a person’s email reuses their fake name, multi-line addresses come from one fake address profile, and formats (date styles, digit counts, separators) are preserved. The synthetic data page covers the generator families and consistency rules.

Profiles expose a random seed (for reproducible output) and a locale.
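To illustrate what a seed and format preservation mean in practice, here is a rough sketch that uses only Python's standard `random` module, not the Faker runtime Piixie actually uses. The function names and the sample identifier are made up for the example.

```python
import random

DIGITS = "0123456789"

def same_shape(value, rng):
    """Swap every digit for a random digit; keep separators and length."""
    return "".join(rng.choice(DIGITS) if ch in DIGITS else ch for ch in value)

def synthesize(values, seed):
    """Build an original -> fake mapping that is reproducible for a given seed."""
    rng = random.Random(seed)   # private generator, so the seed fully decides the output
    mapping = {}
    for value in values:
        if value not in mapping:    # repeated values reuse the same fake
            mapping[value] = same_shape(value, rng)
    return mapping

ids = ["4471-220-918", "4471-220-918", "88/1024"]
first = synthesize(ids, seed=42)
second = synthesize(ids, seed=42)
print(first == second)   # same seed, same replacements
```

Digit-for-digit substitution suits opaque identifiers. Values with internal rules, such as dates, need a generator that understands them, which is what the generator families are for.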

LLM Gen

The model itself invents the replacement values instead of delegating to the Faker runtime. This is useful when replacements need judgment that a generator library can’t provide, such as rephrasing a sensitive job title or producing a domain-appropriate fake value. Steer it through the profile’s prompt customization (system message or user prompt prepend/append).

JavaScript

You write the transformation. A profile holds a JavaScript snippet that receives each detected entity and returns the replacement. The snippet runs in Piixie’s embedded pure-Go JavaScript VM with seed and locale settings. This is the escape hatch for organization-specific rules: internal ID formats, custom token schemes, lookups against your own conventions.

Consistency across mentions

Whatever the mode, Piixie instructs the model to emit a separate mapping for every surface form of an entity: “Marcus Patel”, “Marcus”, “Mr. Patel”, and “Patel’s” each get their own entry, and in replacement or synthetic mode they all resolve to the same target identity. Longer forms are applied before shorter ones so substring overlaps don’t corrupt the output.
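The longest-first rule can be sketched as a single-pass substitution. This is one possible way to apply a surface-form mapping, not necessarily how Piixie does it. The `apply_mapping` name and the sample mapping are assumptions.

```python
import re

def apply_mapping(text, mapping):
    """Replace every surface form in one pass, preferring longer forms."""
    if not mapping:
        return text
    # Regex alternation tries alternatives left to right, so sorting by
    # length (longest first) makes "Marcus Patel" win over "Marcus".
    forms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

mapping = {
    "Marcus Patel": "Ethan Vance",
    "Marcus": "Ethan",
    "Mr. Patel": "Mr. Vance",
    "Patel's": "Vance's",
    "Patel": "Vance",
}
text = "Marcus Patel called. Mr. Patel said Marcus would pay. Patel's bill is due."
print(apply_mapping(text, mapping))
# Ethan Vance called. Mr. Vance said Ethan would pay. Vance's bill is due.
```

Doing it in a single pass also means replacement text is never scanned again, so a fake value that happens to contain another surface form is left alone. With numbered tokens the ordering matters more visibly: replacing "Marcus" and then "Patel" separately would turn "Marcus Patel" into "[NAME_1] [NAME_1]".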

Choosing a mode

Reviewing for legal or compliance: redaction, easiest to audit.
Analyzing structure, who-did-what, or feeding a pipeline that needs entity identity: replacement.
Anything a human (or an LLM) will read: synthetic.
Special cases: LLM Gen or JavaScript through a profile.