Skip to content

Recipe: consistent fake identities across a team

Synthetic mode is consistent within a document. To stay consistent across documents — so Marcos Patel is David Romero Gil in every file your team produces — you add a dictionary. This recipe sets up a self-maintaining mapping that grows as you work.

Goal: a stable cast of fake identities shared across an ongoing document set, with no per-file effort after setup.

Open the dictionary manager and create one named for the context — Cliente Mediterrània, Estudi cardiologia, Test fixtures. It starts empty.

Create a profile for this work, set its method to Synthetic, and in the synthetic settings:

  • Dictionary → your new dictionary.
  • Replace with existing synthetic dataon.
  • Add new entries to dictionaryon.

That’s the learning loop: reuse what’s known, remember what’s new.

Run a representative file with this profile. Synthetic mode invents fakes and, because add is on, writes them into the dictionary — each stamped with the source file and date:

Marcos Patel → David Romero Gil (expediente-01.pdf, 11 Jun)
1029384 (NHC) → 84913366

Run the next files with the same profile. Now replace existing kicks in: anyone already in the dictionary keeps their established fake; only genuinely new people get fresh values (which are added too). Over a few documents the cast fills out and stabilizes.

Document 2 also mentions Marcos Patel → still David Romero Gil ✓ consistent
Document 2 introduces Dra. Ruiz → Dra. Lucía Sáez Marín (new, recorded)

In the manager, review the cast. Edit any fake you don’t like; add manual entries for fixed swaps the model can’t know (a codename: Proyecto ORION → Proyecto AZUL). Use Hide original when curating in shared spaces.

The dictionary is the shared source of truth. Copy all as CSV to hand a teammate the mapping, or duplicate it to branch a variant. Everyone using the same profile + dictionary produces output that lines up.

Because the dictionary records original ↔ fake, it doubles as the key to reverse any of these documents later — even ones edited downstream. Keep it.

Without a dictionary, two runs of the same person produce two different fakes — your “anonymized” set is internally inconsistent, and cross-document analysis breaks. The dictionary turns “hope the model picks the same value” into “it always does.”