Choosing a model

Piixie can run detection on a model in one of three places: locally on your machine, on a shared server on your network, or on a cloud endpoint such as Anthropic or OpenAI. The three are interchangeable from the pipeline’s point of view: same modes, same profiles, same editor afterward. The choice comes down to three questions.

  1. Where can the document go? This is the hard constraint. Regulated or confidential data may be allowed to stay on your machine or a controlled internal server, but not cross to a third-party API.
  2. Is your hardware fast enough? A laptop runs the default local model fine for everyday documents; long, image-heavy PDFs are slow.
  3. How hard is the document? Tricky, ambiguous content benefits from a bigger model, and from reasoning.

Local

The model runs on your machine; nothing crosses the network after the one-time download.

  • Best for: the default. Maximum privacy, no setup, works offline.
  • Models: Gemma 4 E4B (fast) and Gemma 4 12B (higher quality). See local models.
  • Watch out for: speed on older hardware, and very long documents (limited by the model’s context window).

Shared server

One machine (ideally with a GPU) hosts the model; lightweight desktops connect to it.

  • Best for: teams, high volume, air-gapped or compliance-bound networks where external APIs are off the table but an internal server is fine.
  • What crosses: document text and images travel to the server over your LAN, held in memory only for the run. Output and history stay on the workstation.
  • Watch out for: untrusted networks. Put the server behind TLS if the LAN isn’t trusted.

Cloud endpoint

A frontier model from Anthropic or OpenAI, via a remote endpoint.

  • Best for: the hardest documents, when your data policy allows it and you want top detection quality (and strong reasoning).
  • What crosses: document text and images are sent to that provider, under their terms. Piixie shows a confirmation the first time you pick a cloud model.
  • Watch out for: regulated data. Clear it with your compliance owner first; this is the only path where raw documents leave your control.

Situation                                   Pick
Sensitive data, default privacy             Local
Sensitive data, slow laptops, a team        Shared server
Air-gapped network                          Local or shared server
Public/low-risk data, hardest detection     Cloud endpoint
Maximum quality, policy allows it           Cloud endpoint + reasoning High
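
The three questions and the table above can be read as a small decision procedure. The sketch below is only an illustration of that reasoning, not part of Piixie: the `Situation` fields, the `pick_location` function, and the returned labels are hypothetical names, and the tie-breaking (fall back to local when nothing else applies) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical answers to the three questions (not a Piixie type)."""
    may_leave_network: bool       # Q1: does policy allow a third-party API?
    local_hardware_ok: bool       # Q2: is this machine fast enough?
    hard_document: bool           # Q3: is the content tricky or ambiguous?
    server_available: bool = False  # is a shared server reachable on the LAN?


def pick_location(s: Situation) -> str:
    """Return where to run detection, following the table's logic."""
    # The data policy is the hard constraint: cloud is considered only
    # when it is allowed AND the document is hard enough to need it.
    if s.may_leave_network and s.hard_document:
        return "cloud endpoint"
    # Slow workstations can offload to an internal server if one exists.
    if not s.local_hardware_ok and s.server_available:
        return "shared server"
    # Assumed default: stay on the local machine.
    return "local"


if __name__ == "__main__":
    team_laptop = Situation(
        may_leave_network=False,
        local_hardware_ok=False,
        hard_document=False,
        server_available=True,
    )
    print(pick_location(team_laptop))
```

Checking the policy question first mirrors the text: where the document may go is the hard constraint, while hardware and difficulty only choose among the options that remain.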

All available models — downloaded local ones, ones you can download, and enabled remote ones — live in a single selector in settings. Remote models are labeled with their endpoint. Switch per document as the situation changes; everything downstream behaves identically.

Whichever you choose, the intended pattern is the same: anonymize where the document lives, send only the safe copy onward. A cloud anonymization model is an explicit opt-in for the detection step; the common, recommended path keeps detection local and only the anonymized output ever reaches external tools. See privacy.
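
The "anonymize where the document lives, send only the safe copy onward" pattern can be sketched as a wrapper that never hands the raw text to the outbound call. Everything here is hypothetical: `toy_anonymize` is a stand-in that masks only email addresses with a regular expression and is not how Piixie detects anything, and `send_safe_copy` is an illustrative name, not a Piixie API.

```python
import re
from typing import Callable

# Toy pattern for illustration only; real detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_anonymize(text: str) -> str:
    """Stand-in for the local detection step: masks email addresses."""
    return EMAIL.sub("[EMAIL]", text)


def send_safe_copy(
    raw: str,
    anonymize: Callable[[str], str],
    send: Callable[[str], None],
) -> str:
    """Anonymize first, then pass only the safe copy to `send`."""
    safe = anonymize(raw)  # runs where the document lives
    send(safe)             # the raw text is never given to `send`
    return safe


if __name__ == "__main__":
    outbox: list[str] = []  # stands in for an external tool
    send_safe_copy("Contact jane.doe@example.com today", toy_anonymize, outbox.append)
    print(outbox)
```

Passing the sender in as a parameter keeps the raw document out of its reach by construction, which is the point of the pattern.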