Batch output: copies & naming

Most runs produce one safe copy. In synthetic mode, a single run can instead produce several variants of a document, each with a different fake cast. You can also control how all your output files are named. These are the batch output controls.

When a profile uses synthetic mode, the staging table gains a Copies column with a stepper. Set it above 1 and that document is anonymized that many times in one run, each pass generating a fresh set of fake values.

expediente-clinico-marcos-patel.pdf Copies: 3
↓ one run
expediente…__001-1.pdf (David Romero Gil, NHC 84913366, …)
expediente…__001-2.pdf (Hugo Navarro Ortiz, NHC 55120947, …)
expediente…__001-3.pdf (Iker Pardo Vega, NHC 73048815, …)

Each copy is internally consistent — within a single variant, every mention of a person still maps to one fake identity — but the variants differ from each other.
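The idea of one cast per copy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Piixie's implementation: `make_variants`, the `FAKE_NAMES` pool, and the plain string replacement are all hypothetical stand-ins for the real detection and generation logic.

```python
import random

# Hypothetical pool of fake names; a real tool would generate far more.
FAKE_NAMES = [
    "David Romero Gil",
    "Hugo Navarro Ortiz",
    "Iker Pardo Vega",
    "Laura Campos Ruiz",
    "Nora Vidal Soto",
    "Pablo Crespo Marin",
]


def make_variants(text, real_names, copies, seed=0):
    """Return `copies` anonymized versions of `text`, each with its own cast."""
    if copies * len(real_names) > len(FAKE_NAMES):
        raise ValueError("not enough fake names for the requested copies")
    pool = FAKE_NAMES[:]
    random.Random(seed).shuffle(pool)
    fresh = iter(pool)  # drawn without replacement, so casts never overlap

    variants = []
    for _ in range(copies):
        # One mapping per copy: every mention of a person gets the same fake name.
        cast = {real: next(fresh) for real in real_names}
        out = text
        for real, fake in cast.items():
            out = out.replace(real, fake)
        variants.append(out)
    return variants
```

Building the mapping once per copy is what keeps each variant internally consistent, and drawing names without replacement is one simple way to guarantee the variants differ from each other.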

  • Training and test data. Turn one real document into many realistic, non-identifying examples that share its structure.
  • Demos. Show a workflow on several “different” records without touching production data.
  • Robustness testing. Feed a pipeline many shape-identical inputs with varied values.

How output files are named is set in the profile’s Output section, under Output file naming. Each output gets a token appended as name__token:

Style    Token looks like   Notes
Number   001, 002, …        A per-file counter kept on this computer
Hash     ba72c58e           A short hash, good for uniqueness
Date     20260612           The run date (YYYYMMDD)
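As a rough illustration of the three token styles, the sketch below formats each one and joins it to a base name with the `name__token` pattern. The function names are hypothetical, and the hash recipe in particular is an assumption (SHA-256 of the file bytes, cut to eight hex characters); the documentation does not say what Piixie actually hashes.

```python
import hashlib
from datetime import date


def make_token(style, *, counter=None, content=None, run_date=None):
    """Format a naming token for one of the three styles."""
    if style == "number":
        # Zero-padded to three digits: 001, 002, ...
        return f"{counter:03d}"
    if style == "hash":
        # Assumption: SHA-256 of the file bytes, truncated to 8 hex characters.
        return hashlib.sha256(content).hexdigest()[:8]
    if style == "date":
        # The run date as YYYYMMDD.
        return run_date.strftime("%Y%m%d")
    raise ValueError(f"unknown token style: {style}")


def output_name(stem, token, ext):
    """Append the token to the base name as name__token."""
    return f"{stem}__{token}{ext}"
```

For example, `output_name("informe", make_token("date", run_date=date(2026, 6, 12)), ".pdf")` gives `informe__20260612.pdf`.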

The number style increments a counter per source file that persists between runs, so the third time you anonymize informe.pdf you get informe__003. When you also generate copies, the copy index is appended too: informe__003-1, informe__003-2.
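The counter behaviour described above can be approximated as follows. This is a minimal sketch, assuming the per-file counters live in a small JSON file; where and how Piixie actually stores them is not documented here, and `next_counter` and `numbered_names` are hypothetical helpers.

```python
import json
from pathlib import Path


def next_counter(store, source_name):
    """Increment and return the persistent counter for one source file."""
    store = Path(store)
    # Assumption: counters are kept in a JSON file keyed by source file name.
    counters = json.loads(store.read_text()) if store.exists() else {}
    counters[source_name] = counters.get(source_name, 0) + 1
    store.write_text(json.dumps(counters))
    return counters[source_name]


def numbered_names(source_name, counter, copies=1):
    """Build output names; the copy index is appended only when copies > 1."""
    path = Path(source_name)
    base = f"{path.stem}__{counter:03d}"
    if copies <= 1:
        return [f"{base}{path.suffix}"]
    return [f"{base}-{i}{path.suffix}" for i in range(1, copies + 1)]
```

On the third call for `informe.pdf`, `next_counter` returns 3, and `numbered_names("informe.pdf", 3, copies=2)` yields `informe__003-1.pdf` and `informe__003-2.pdf`.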

The base file name is itself anonymized — if the original was invoice-marcos-patel.pdf, the PII in the name gets the same treatment as the body. See output location.

A run that produced several copies appears as a single history entry, tagged with a count such as “3 files”. Opening it offers to open all the generated files (Piixie asks first if there are many). The replacement table reflects the run; each variant’s files sit together in your output folder.

  • For varied output (the point of copies), leave dictionary reuse off — you want different casts.
  • For consistent output across runs, use a dictionary instead; copies are not the tool for that.
  • A template shapes how each copy is anonymized (which fields, which methods); copies just multiply it.