Recipe: consistent fake identities across a team

Synthetic mode is consistent within a document. To stay consistent across documents — so Marcos Patel is David Romero Gil in every file your team produces — you add a dictionary. This recipe sets up a self-maintaining mapping that grows as you work.

Goal: a stable cast of fake identities shared across an ongoing document set, with no per-file effort after setup.

1. Create a dictionary

Open the dictionary manager and create one named for the context — Cliente Mediterrània, Estudi cardiologia, Test fixtures. It starts empty.

2. Make a profile that uses it

Create a profile for this work, set its method to Synthetic, and in the synthetic settings:

Dictionary → your new dictionary.
Replace with existing synthetic data → on.
Add new entries to dictionary → on.

That’s the learning loop: reuse what’s known, remember what’s new.
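As a rough illustration of how those two toggles interact, here is a minimal Python sketch. It is not the product's API: `resolve_fake`, `invent`, and the placeholder name pool are hypothetical stand-ins, and the dictionary is modeled as a plain dict from original to fake.

```python
from typing import Callable, Dict

def resolve_fake(original: str,
                 dictionary: Dict[str, str],
                 invent: Callable[[str], str],
                 replace_existing: bool = True,
                 add_new: bool = True) -> str:
    """Return a fake for `original`, modeling the two toggles (assumed behavior)."""
    if replace_existing and original in dictionary:
        return dictionary[original]            # reuse what's known
    fake = invent(original)                    # stand-in for the synthetic generator
    if add_new:
        dictionary.setdefault(original, fake)  # remember what's new, never overwrite
    return fake

# Hypothetical generator: hands out names from a fixed pool, in order.
_pool = iter(["David Romero Gil", "Lucía Sáez Marín", "Fake Person 3"])

def invent(original: str) -> str:
    return next(_pool)

shared: Dict[str, str] = {}
first = resolve_fake("Marcos Patel", shared, invent)   # invented, then recorded
second = resolve_fake("Marcos Patel", shared, invent)  # reused from the dictionary
new = resolve_fake("Dra. Ruiz", shared, invent)        # new person, new entry
```

With both toggles on, the second lookup of the same person never reaches the generator, which is what keeps the fake stable.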

3. Process the first document

Run a representative file with this profile. Synthetic mode invents fakes and, because Add new entries to dictionary is on, writes them into the dictionary, each stamped with the source file and date:

Marcos Patel               → David Romero Gil      (expediente-01.pdf, 11 Jun)
[email protected]   → [email protected]
1029384 (NHC)              → 84913366

4. Process the rest

Run the next files with the same profile. Now Replace with existing synthetic data kicks in: anyone already in the dictionary keeps their established fake, and only genuinely new people get fresh values (which are added too). Over a few documents the cast fills out and stabilizes.

Document 2 also mentions Marcos Patel  → still David Romero Gil   ✓ consistent
Document 2 introduces Dra. Ruiz        → Dra. Lucía Sáez Marín    (new, recorded)

5. Curate

In the manager, review the cast. Edit any fake you don’t like; add manual entries for fixed swaps the model can’t know (a codename: Proyecto ORION → Proyecto AZUL). Use Hide original when curating in shared spaces.

6. Share

The dictionary is the shared source of truth. Copy all as CSV to hand a teammate the mapping, or duplicate it to branch a variant. Everyone using the same profile + dictionary produces output that lines up.
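A teammate who receives the CSV might load it like the sketch below. The column names `original` and `fake` are assumptions for illustration; check the header row of your own export before relying on them.

```python
import csv
import io

def load_mapping(csv_text: str) -> dict:
    """Parse pasted CSV text into {original: fake}. Column names are assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["original"]: row["fake"] for row in reader}

# Hypothetical export, pasted from the clipboard.
pasted = (
    "original,fake\n"
    "Marcos Patel,David Romero Gil\n"
    "Proyecto ORION,Proyecto AZUL\n"
)
mapping = load_mapping(pasted)
```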

7. It’s also your reverse key

Because the dictionary records original ↔ fake, it doubles as the key to reverse any of these documents later — even ones edited downstream. Keep it.
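To show why the mapping works as a key even on edited text, here is a minimal sketch that swaps fakes back to originals with plain string matching. `reverse_text` is a hypothetical helper, not the tool's own reversal feature, and it assumes each fake still appears verbatim in the document.

```python
import re

def reverse_text(text: str, mapping: dict) -> str:
    """Replace every fake in `text` with its original, in a single pass."""
    fake_to_original = {fake: original for original, fake in mapping.items()}
    if not fake_to_original:
        return text
    # Longest fakes first, so a short fake never matches inside a longer one.
    fakes = sorted(fake_to_original, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(f) for f in fakes))
    return pattern.sub(lambda m: fake_to_original[m.group(0)], text)

mapping = {"Marcos Patel": "David Romero Gil", "1029384": "84913366"}
edited = "Follow-up: David Romero Gil (NHC 84913366) was seen again."
restored = reverse_text(edited, mapping)
```

The single regex pass avoids a restored original being replaced again by a later rule.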

Why this beats hoping

Without a dictionary, two runs of the same person produce two different fakes — your “anonymized” set is internally inconsistent, and cross-document analysis breaks. The dictionary turns “hope the model picks the same value” into “it always does.”