Recipe: anonymize a medical record
Clinical records are the hardest case and the one Piixie is built for. PII hides in relationships: a patient linked to a doctor, a first-name-only dependent, an admission date that anchors a timeline, an institution that shouldn’t be scrubbed. This is where a reasoning model earns its keep.
Goal: a clinical report (informe clínic) that stays clinically intact but is personally anonymous, so you can hand it to a researcher, a vendor, or an external model.
1. Mode and model
Use Synthetic mode with the Español region so the record still reads like a real clinical note. Set reasoning to Med or High: the context-dependent calls in this document are exactly what thinking-first models get right. If your data policy allows it and you want the best quality, a cloud endpoint with reasoning set to High is the strongest option; otherwise, use the local 12B model.
2. Run it — and watch the reasoning
Process the document. With reasoning on, the processing dialog streams the model’s thinking. You’ll see it make the calls that matter:
The header lists a patient name ‘Marcos Patel’ (NAME) and an NHC number — both PII. ‘Dr.’ before ‘Ruiz’ marks a provider; treat the surname as a NAME. ‘14/03/1982’ next to ‘F. Nacimiento’ is a date of birth. ‘Hospital Clínic’ is a public institution, not personal — leave it. The span ‘Jan 2020 – Mar 2023’ anchors a personal timeline → treat as DATE.
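The calls in this trace follow context rules. In Piixie the model makes them by reasoning over the whole note; the sketch below only approximates a few of the same rules with regular expressions so they are concrete. Everything in it is hypothetical: the sample record and its "Paciente:" label, the label names, the allowlist, and the classify function are illustrations, not Piixie's API or entry types.

```python
import re

# Hypothetical allowlist: public institutions that must survive anonymization.
PUBLIC_INSTITUTIONS = {"Hospital Clínic"}

UPPER = "A-ZÁÉÍÓÚÑ"

# Hypothetical context rules: (label, pattern whose group 1 is the span).
RULES = [
    # The name after a "Paciente:" header field is the patient.
    ("NAME", re.compile(rf"Paciente:\s*([{UPPER}]\w+(?: [{UPPER}]\w+)*)")),
    # A title such as "Dr." or "Dra." marks a provider; the surname is still a NAME.
    ("NAME", re.compile(rf"\bDra?\.\s+([{UPPER}]\w+)")),
    # A record number after the "NHC" label.
    ("RECORD_ID", re.compile(r"NHC:?\s*(\d+)")),
    # A dd/mm/yyyy date right after "F. Nacimiento" is a date of birth.
    ("DATE", re.compile(r"F\. Nacimiento:?\s*(\d{2}/\d{2}/\d{4})")),
    # A "Mon YYYY – Mon YYYY" span anchors a personal timeline.
    ("DATE", re.compile(r"([A-Z][a-z]{2} \d{4} – [A-Z][a-z]{2} \d{4})")),
    # Institutions are detected only so they can be checked against the allowlist.
    ("INSTITUTION", re.compile(rf"(Hospital [{UPPER}]\w+)")),
]


def classify(text: str) -> list[tuple[str, str]]:
    """Return the (label, span) pairs that should be anonymized."""
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            span = match.group(1)
            # Public institutions are context, not personal data: leave them.
            if label == "INSTITUTION" and span in PUBLIC_INSTITUTIONS:
                continue
            found.append((label, span))
    return found


record = (
    "Paciente: Marcos Patel\n"
    "NHC: 123456\n"
    "F. Nacimiento: 14/03/1982\n"
    "Seen by Dr. Ruiz at Hospital Clínic.\n"
    "Follow-up Jan 2020 – Mar 2023."
)

for label, span in classify(record):
    print(label, span)
```

Fixed patterns like these are brittle: they cannot tell that a first-name-only dependent is a person, and they break as soon as a field label changes. That gap is what the reasoning model fills.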
3. Review the judgement calls
Open the result in the editor and check the context-dependent calls specifically:
- Patient vs. provider: both are NAMEs and both get synthesized, which is usually right. If you need to keep providers (for example, a known specialist relevant to the study), turn those entries off.
- Institutions kept: confirm that Hospital Clínic survived. If the model over-scrubbed it, turn its entry off.
- Identifying dates: the DOB and admission/discharge spans should be caught and replaced, while one-off public dates are left alone. Add any the model missed.
- Record numbers: NHC, episode, and case IDs should be synthesized in the right shape (see entry types).
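"The right shape" means a fake identifier that looks like the real one: same length, same separators, digits where the original had digits. A minimal sketch of that idea, assuming shape means exactly this per-character pattern; synthesize_like and the sample episode ID format are hypothetical and are not how Piixie's entry types are defined.

```python
import random
import string


def synthesize_like(value: str, rng: random.Random) -> str:
    """Return a fake with the same shape as `value`: each digit becomes a random
    digit, each letter a random ASCII letter of the same case, and every other
    character (hyphens, slashes, spaces) is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # fixed seed so the run is reproducible
episode_id = "EP-2023-00417"  # hypothetical episode-ID format
fake = synthesize_like(episode_id, rng)
print(episode_id, "->", fake)
```

Random draws can occasionally reproduce the real value or collide with a fake already issued, so you would likely also want to check each new fake against both before using it.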
4. The privacy posture
A medical record’s history row and replacement table store the real values locally. That is what powers search and reverse, and it means the local database is as sensitive as the record itself. Keep full-disk encryption on, and clear the history (with “delete files”) when you’re done. Detection itself stays on your machine unless you deliberately chose a cloud model.
5. Save, and make it consistent
Save to output. If you anonymize the same patients across visits, attach a dictionary with reuse on so that each patient keeps one fake identity across every document (see team consistency). You can also save a template for this report shape.
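Reuse comes down to a lookup table that is filled the first time a value is seen and consulted every time after. A minimal sketch, assuming a fixed pool of fake names; the PseudonymDictionary class and the sample names are hypothetical and say nothing about how Piixie stores its dictionaries.

```python
import random


class PseudonymDictionary:
    """Hypothetical reuse-on dictionary: the first time a real value is seen it
    is assigned a fake from the pool; every later lookup returns the same fake."""

    def __init__(self, fake_names: list[str], seed: int = 0) -> None:
        self._pool = list(fake_names)
        random.Random(seed).shuffle(self._pool)
        self._mapping: dict[str, str] = {}

    def fake_for(self, real: str) -> str:
        if real not in self._mapping:
            if not self._pool:
                raise RuntimeError("fake-name pool exhausted")
            self._mapping[real] = self._pool.pop()
        return self._mapping[real]


# One dictionary shared across every visit note for the same patients.
names = PseudonymDictionary(["Lucía Ferrer", "Jordi Vidal", "Ana Soler"])
visits = [
    "Visit 1: Marcos Patel reports chest pain.",
    "Visit 2: follow-up for Marcos Patel, no incidents.",
]
for note in visits:
    print(note.replace("Marcos Patel", names.fake_for("Marcos Patel")))
```

Because the mapping lives in the dictionary rather than in each document, every note gets the same fake; a fresh dictionary per document would give the same patient a different identity in each file.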
6. Round-tripping to a cloud model
Want a cloud LLM to summarize the record without seeing the real patient? Anonymize first, have the LLM summarize the safe copy, then deanonymize its answer back to the real values. Full steps are in share safely with an LLM.
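The round trip works because the replacement table can be read in both directions. A minimal sketch of that idea, assuming exact-string replacement; the table contents, the stand-in LLM answer, and the anonymize/deanonymize helpers are hypothetical, not Piixie's implementation.

```python
import re


def replace_all(text: str, mapping: dict[str, str]) -> str:
    """Replace every key of `mapping` in a single pass, trying longer keys first
    so a short value never clobbers part of a longer one, and so a replacement
    is never itself replaced again."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def anonymize(text: str, table: dict[str, str]) -> str:
    return replace_all(text, table)  # real -> fake


def deanonymize(text: str, table: dict[str, str]) -> str:
    reverse = {fake: real for real, fake in table.items()}
    return replace_all(text, reverse)  # fake -> real


# Hypothetical replacement table: real value -> synthetic value.
table = {"Marcos Patel": "Jordi Vidal", "Ruiz": "Soler", "14/03/1982": "02/11/1979"}

record = "Patient Marcos Patel, born 14/03/1982, seen by Dr. Ruiz."
safe = anonymize(record, table)  # this is what the cloud LLM sees

# Stand-in for the LLM's summary of the safe copy.
llm_answer = "Jordi Vidal (born 02/11/1979) was seen by Dr. Soler."
restored = deanonymize(llm_answer, table)
print(safe)
print(restored)
```

Exact matching only restores fakes the LLM repeats verbatim; if it paraphrases ("Mr. Vidal"), that mention comes back unreversed, so read the restored answer before relying on it.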