Recipe: consistent fake identities across a team
Synthetic mode is consistent within a document. To stay consistent across documents — so Marcos Patel is David Romero Gil in every file your team produces — you add a dictionary. This recipe sets up a self-maintaining mapping that grows as you work.
Goal: a stable cast of fake identities shared across an ongoing document set, with no per-file effort after setup.
1. Create a dictionary
Section titled “1. Create a dictionary”Open the dictionary manager and create one named for the context — Cliente Mediterrània, Estudi cardiologia, Test fixtures. It starts empty.
2. Make a profile that uses it
Section titled “2. Make a profile that uses it”Create a profile for this work, set its method to Synthetic, and in the synthetic settings:
- Dictionary → your new dictionary.
- Replace with existing synthetic data → on.
- Add new entries to dictionary → on.
That’s the learning loop: reuse what’s known, remember what’s new.
3. Process the first document
Section titled “3. Process the first document”Run a representative file with this profile. Synthetic mode invents fakes and, because add is on, writes them into the dictionary — each stamped with the source file and date:
Marcos Patel → David Romero Gil (expediente-01.pdf, 11 Jun)1029384 (NHC) → 849133664. Process the rest
Section titled “4. Process the rest”Run the next files with the same profile. Now replace existing kicks in: anyone already in the dictionary keeps their established fake; only genuinely new people get fresh values (which are added too). Over a few documents the cast fills out and stabilizes.
Document 2 also mentions Marcos Patel → still David Romero Gil ✓ consistentDocument 2 introduces Dra. Ruiz → Dra. Lucía Sáez Marín (new, recorded)5. Curate
Section titled “5. Curate”In the manager, review the cast. Edit any fake you don’t like; add manual entries for fixed swaps the model can’t know (a codename: Proyecto ORION → Proyecto AZUL). Use Hide original when curating in shared spaces.
6. Share the mapping with teammates
Section titled “6. Share the mapping with teammates”The dictionary is the shared source of truth. Copy all as CSV to hand a teammate the mapping, or duplicate it to branch a variant. Everyone using the same profile + dictionary produces output that lines up.
7. It’s also your reverse key
Section titled “7. It’s also your reverse key”Because the dictionary records original ↔ fake, it doubles as the key to reverse any of these documents later — even ones edited downstream. Keep it.
Why this beats hoping
Section titled “Why this beats hoping”Without a dictionary, two runs of the same person produce two different fakes — your “anonymized” set is internally inconsistent, and cross-document analysis breaks. The dictionary turns “hope the model picks the same value” into “it always does.”