
Building a dictionary from a run

The best way to fill a dictionary isn’t to type pairs — it’s to let a synthetic run generate them, then keep the ones you like. This turns a one-off anonymization into a permanent, reusable mapping.

When synthetic mode runs, it produces a full set of original → fake pairs for that document. Those pairs are exactly the shape a dictionary holds. Promoting them means: “the fake identities this run invented are now the canonical ones — use them again next time.”

In the profile editor, the Synthetic settings include dictionary options:

  • Add new entries to dictionary — after a run, any newly generated pairs are written into the chosen dictionary, each stamped with the source file and date.
  • Replace with existing synthetic data — before generating anything new, reuse values already in the dictionary. A person you’ve seen before keeps their established fake identity; only genuinely new values get freshly generated.

Pick the target dictionary from the dropdown in the same panel (or open the manager from there to create one first).

Run these two switches together and you get a system that learns:

  1. First document mentions Marcos Patel. The run synthesizes David Romero Gil and adds the pair to the dictionary.
  2. Next document also mentions Marcos Patel. Because “replace with existing” is on, he’s recognized and becomes David Romero Gil again — same identity, no churn.
  3. New people in that second document get fresh fakes, which are added too.

Over a few documents the dictionary fills out, and your anonymized outputs become consistent across the whole set — the same real person is always the same fake person, everywhere.
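The interaction of the two switches can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the `anonymize` function, its parameter names, and the numbered placeholder generator are all hypothetical, and the dictionary is modelled as a plain `dict` from original value to fake value. The sketch also assumes that an established pair is never overwritten by a later run.

```python
from itertools import count
from typing import Callable, Dict, Iterable, Optional


def anonymize(
    entities: Iterable[str],
    dictionary: Dict[str, str],
    *,
    replace_with_existing: bool = True,
    add_new_entries: bool = True,
    make_fake: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Return the original -> fake mapping for one document (hypothetical model)."""
    if make_fake is None:
        # Hypothetical stand-in for the synthetic generator: numbered placeholders,
        # continuing after however many entries the dictionary already holds.
        counter = count(len(dictionary) + 1)
        make_fake = lambda original: f"Person {next(counter):03d}"

    mapping: Dict[str, str] = {}
    for original in entities:
        if replace_with_existing and original in dictionary:
            # Seen before: reuse the established fake identity.
            fake = dictionary[original]
        else:
            # Genuinely new (or reuse is off): generate a fresh fake.
            fake = make_fake(original)
            if add_new_entries:
                # Assumption: existing pairs are kept, only new ones are added.
                dictionary.setdefault(original, fake)
        mapping[original] = fake
    return mapping


shared = {}
first = anonymize(["Marcos Patel"], shared)
second = anonymize(["Marcos Patel", "Ana Ruiz"], shared)
print(first["Marcos Patel"] == second["Marcos Patel"])  # same fake in both runs
print(sorted(shared))  # both people are now in the dictionary
```

With both switches on, the second run reuses the first run's fake for the recurring person and adds only the newcomer. Turning `replace_with_existing` off in this model makes every run generate fresh fakes, which is exactly the churn the reuse setting avoids.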

Each promoted entry records the document it came from, so the manager’s Source file column tells you the provenance of every pair. Manually added entries show as Manual; run-promoted ones show the file name.
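As a rough mental model, each dictionary entry can be thought of as a small record carrying its provenance. The class and field names below are hypothetical (the product's actual storage format is not described here), and the file name in the usage lines is made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """Hypothetical shape of one original -> fake pair with provenance."""

    original: str
    fake: str
    source_file: Optional[str] = None  # None models an entry typed in by hand
    added_on: Optional[date] = None    # date stamp for run-promoted entries

    @property
    def source_label(self) -> str:
        # Mirrors the Source file column: file name if promoted, else "Manual".
        return self.source_file if self.source_file else "Manual"


promoted = DictionaryEntry(
    "Marcos Patel", "David Romero Gil",
    source_file="report-january.docx",  # illustrative file name
    added_on=date(2024, 1, 15),         # illustrative date
)
manual = DictionaryEntry("Acme Corp", "Northwind Ltd")

print(promoted.source_label)
print(manual.source_label)
```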

Building a dictionary this way is most useful for:

  • Recurring documents: a monthly report on the same accounts, the same patients, or the same clients.
  • A document set: anonymizing a whole folder where entities recur across files.
  • Test fixtures: generate a stable cast of fake people once, then reuse them in every later run.

For a full walkthrough, see Consistent fake identities across a team.