
Recipe: share safely with an LLM and restore

You want a cloud LLM — ChatGPT, Claude, whatever — to work on a document that can’t leave your control. The move is a round trip: anonymize on the way out, deanonymize on the way back. The cloud only ever sees fakes.

Goal: get a useful answer about a real document from a tool you don’t control, with zero real PII crossing the boundary.

real doc ──anonymize──▶ safe copy ──paste──▶ cloud LLM
   ▲                                              │
   └──── deanonymize ◀──── answer (in fakes) ◀────┘

Use synthetic, not redaction (redaction can’t be reversed), and attach a dictionary with add to dictionary turned on. Synthetic keeps the document readable, so the LLM actually does a good job; the dictionary records each real-to-fake swap, so you can restore the real values even if the LLM rewrites the text. Keep detection local: the whole point is not sending the raw document anywhere.
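To make the idea concrete, here is a minimal sketch of what "synthetic plus dictionary" amounts to. It is not Piixie's implementation: the function name `anonymize` and the hand-written `swaps` table are hypothetical, and detection is assumed to have already happened locally. The point it illustrates is that every swap is recorded fake-to-real, which is what makes the return trip possible.

```python
import re

def anonymize(text: str, swaps: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace each real value with its fake in one pass.

    Returns the safe text and a fake -> real dictionary for the way back.
    `swaps` (real -> fake) is assumed to come from a local detection step.
    """
    if not swaps:
        return text, {}
    # Longest values first, so a long value wins over any shorter one it contains.
    alternatives = sorted(swaps, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(real) for real in alternatives))
    dictionary: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        real = match.group(0)
        fake = swaps[real]
        dictionary[fake] = real  # record the swap so it can be reversed later
        return fake

    return pattern.sub(swap, text), dictionary


real_doc = "Marcos Patel (NHC 1029384) reports chest pain."
swaps = {"Marcos Patel": "David Romero Gil", "1029384": "84913366"}
safe_copy, dictionary = anonymize(real_doc, swaps)
print(safe_copy)  # David Romero Gil (NHC 84913366) reports chest pain.
```

A single regex pass is used instead of chained `str.replace` calls so that a fake just inserted can never be replaced again by a later rule.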

Run it, review in the editor, Save to output.
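If you want a second pair of eyes on the review, a small pre-flight check can confirm that none of the real values you know about survived into the safe copy. This is a hedged sketch with a hypothetical helper name, `leaked_values`; it only catches values you list yourself, so it complements the editor review rather than replacing it.

```python
def leaked_values(safe_text: str, real_values: list[str]) -> list[str]:
    """Return every known real value that still appears in the safe copy.

    The comparison ignores case; it does not catch misspellings or
    values that were never listed in `real_values`.
    """
    haystack = safe_text.casefold()
    return [value for value in real_values if value.casefold() in haystack]


safe_copy = "David Romero Gil (NHC 84913366) reports chest pain."
leaks = leaked_values(safe_copy, ["Marcos Patel", "1029384"])
if leaks:
    raise SystemExit(f"Do not paste yet, real values found: {leaks}")
```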

Paste the anonymized text into the cloud LLM (or upload the safe file) and ask your question:

“Summarize this patient’s cardiac history and flag any medication interactions.”

The model answers about David Romero Gil, NHC 84913366 — the fakes. It has no idea who the real patient is, because it never saw them.

Save the model’s answer to a .txt (or keep the edited file it produced). It’s written in fake values.

Drop that file into Piixie. Two cases:

  • If it’s the unchanged safe file, Piixie recognizes it from history and offers an exact reverse.
  • If it’s the LLM’s new text (a summary, an edit), Piixie won’t recognize it, so choose the dictionary route. It finds the fakes in the answer and swaps them back, tolerating the case and accent changes a chat tool introduces.

LLM answer: "David Romero Gil (NHC 84913366): stable angina, review statin dose."
↓ deanonymize (dictionary)
Restored: "Marcos Patel (NHC 1029384): stable angina, review statin dose."

Now the summary is about your real patient — and the cloud only ever held the fake one.
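The dictionary route can be sketched as a tolerant find-and-replace. The code below is an illustration under stated assumptions, not Piixie's actual matching logic: `fold` and `restore` are hypothetical names, and "tolerant" here means only that case and accents are ignored when looking for a fake. Each fake is matched as a whole token, so a fake number embedded in a longer number is left alone.

```python
import re
import unicodedata

def fold(text: str) -> str:
    """Strip accents and case so 'ROMÉRO' and 'Romero' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def restore(answer: str, dictionary: dict[str, str]) -> str:
    """Swap fakes back to real values using a fake -> real dictionary."""
    keys = {fold(fake): real for fake, real in dictionary.items() if fold(fake)}
    if not keys:
        return answer
    # Fold the answer character by character, remembering where each
    # folded character came from so matches map back to the original text.
    folded_chars, origin = [], []
    for index, char in enumerate(answer):
        for folded_char in fold(char):
            folded_chars.append(folded_char)
            origin.append(index)
    folded = "".join(folded_chars)

    alternatives = "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    pieces, last = [], 0
    for match in pattern.finditer(folded):
        start, end = origin[match.start()], origin[match.end() - 1] + 1
        # Swallow combining marks that trail the last matched character.
        while end < len(answer) and unicodedata.combining(answer[end]):
            end += 1
        if start < last:
            continue  # overlaps the previous replacement; skip it
        pieces += [answer[last:start], keys[match.group(0)]]
        last = end
    pieces.append(answer[last:])
    return "".join(pieces)


dictionary = {"David Romero Gil": "Marcos Patel", "84913366": "1029384"}
answer = "DAVID ROMÉRO GIL (NHC 84913366): stable angina, review statin dose."
print(restore(answer, dictionary))
# Marcos Patel (NHC 1029384): stable angina, review statin dose.
```

Matching on a folded copy while cutting from the original keeps everything the model wrote around the fakes byte-for-byte intact.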

Only the fake version crossed the boundary, in both directions. The real document and the real answer existed solely on your machine. That’s the guarantee that makes external tools usable on data you couldn’t otherwise share. The full reasoning is in the round-trip workflow and privacy.

  • Redaction breaks the trip — you can’t reverse [REDACTED]. Use synthetic. (Why.)
  • If the LLM paraphrases a fake away (“the patient” instead of the fake name), there’s no fake left to restore for that value. Synthetic’s natural text makes models keep the names more often than redaction would.
  • The restored file is real PII again — keep it on the trusted side.
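The paraphrasing caveat is easy to check for before you restore. As a rough sketch (the helper name `missing_fakes` is hypothetical, and the comparison is a plain case-insensitive substring test, cruder than accent-tolerant matching), you can list which fakes from the dictionary no longer appear in the answer. Those values cannot be restored automatically, so that is where to re-read the answer by hand.

```python
def missing_fakes(answer: str, dictionary: dict[str, str]) -> list[str]:
    """List the fakes from a fake -> real dictionary that the answer no longer contains."""
    haystack = answer.casefold()
    return [fake for fake in dictionary if fake.casefold() not in haystack]


dictionary = {"David Romero Gil": "Marcos Patel", "84913366": "1029384"}
answer = "The patient (NHC 84913366) has stable angina; review statin dose."
print(missing_fakes(answer, dictionary))  # ['David Romero Gil']
```

An empty list means every recorded fake is still present; a non-empty list is not an error, since the model may simply not have needed that value in its answer.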