Choosing a model
Piixie can run its detection model in three places: locally on your machine, on a shared server on your network, or on a cloud endpoint from a provider like Anthropic or OpenAI. The three are interchangeable from the pipeline’s point of view: same modes, same profiles, same editor afterward. The choice comes down to three questions.
The three questions
- Where can the document go? This is the hard constraint. Regulated or confidential data may be allowed to stay on your machine or a controlled internal server, but not cross to a third-party API.
- Is your hardware fast enough? A laptop runs the default local model fine for everyday documents; long, image-heavy PDFs are slow.
- How hard is the document? Tricky, ambiguous content benefits from a bigger model — and from reasoning.
The options
Local (default)
The model runs on your machine; nothing crosses the network after the one-time download.
- Best for: everyday use; this is the default. Maximum privacy, no setup, and it works offline once the model is downloaded.
- Models: Gemma 4 E4B (fast) and Gemma 4 12B (higher quality). See local models.
- Watch out for: speed on older hardware, and very long documents (limited by the model’s context window).
Shared server on your network
One machine (ideally with a GPU) hosts the model; lightweight desktops connect to it.
- Best for: teams, high volume, air-gapped or compliance-bound networks where external APIs are off the table but an internal server is fine.
- What crosses: document text and images travel to the server over your LAN, held in memory only for the run. Output and history stay on the workstation.
- Watch out for: untrusted networks. Put the server behind TLS if the LAN isn’t fully trusted.
Cloud endpoint
A frontier model from Anthropic or OpenAI, via a remote endpoint.
- Best for: the hardest documents, when your data policy allows it and you want top detection quality (and strong reasoning).
- What crosses: document text and images are sent to that provider, under their terms. Piixie shows a confirmation the first time you pick a cloud model.
- Watch out for: clear it with your compliance owner before using it on regulated data. This is the only path where raw documents leave your control.
A decision table
| Situation | Pick |
|---|---|
| Sensitive data, default privacy | Local |
| Sensitive data, slow laptops, a team | Shared server |
| Air-gapped network | Local or shared server |
| Public/low-risk data, hardest detection | Cloud endpoint |
| Maximum quality, policy allows it | Cloud endpoint + reasoning High |
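
The table above can be read as a small decision procedure over the three questions. The sketch below is one possible encoding in Python and is not part of Piixie: `Situation`, `Placement`, and `pick_placement` are hypothetical names invented for illustration, and the reasoning setting from the last row is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    LOCAL = "Local"
    SHARED_SERVER = "Shared server"
    CLOUD = "Cloud endpoint"


@dataclass
class Situation:
    # All field names are hypothetical, chosen for this sketch only.
    third_party_api_allowed: bool    # may the document go to an external provider?
    internal_server_available: bool  # is a shared server reachable on the network?
    local_hardware_is_slow: bool     # older laptop, or long image-heavy PDFs
    document_is_hard: bool           # tricky, ambiguous content


def pick_placement(s: Situation) -> Placement:
    # Question 1 is the hard constraint: cloud is considered only when
    # policy allows the document to reach a third-party API.
    if s.third_party_api_allowed and s.document_is_hard:
        return Placement.CLOUD
    # Otherwise stay inside the network, and offload to the shared server
    # only when the local machine is too slow and a server exists.
    if s.local_hardware_is_slow and s.internal_server_available:
        return Placement.SHARED_SERVER
    return Placement.LOCAL
```

Under these assumptions, a team with sensitive data and slow laptops (`Situation(False, True, True, True)`) lands on the shared server, and the same documents on a fast machine stay local. Because the policy check comes first, a hard document never overrides the data constraint.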
Switching is one click
All available models (downloaded local ones, ones you can download, and enabled remote ones) live in a single selector in settings. Remote models are labeled with their endpoint. Switch per document as the situation changes; everything downstream behaves identically.
The privacy boundary holds regardless
Whichever you choose, the intended pattern is the same: anonymize where the document lives, send only the safe copy onward. A cloud anonymization model is an explicit opt-in for the detection step; the common, recommended path keeps detection local and only the anonymized output ever reaches external tools. See privacy.
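
The "anonymize locally, send only the safe copy" pattern can be illustrated with a toy sketch. This is not Piixie’s implementation: a single email regex stands in for model-based detection, and `anonymize_locally`, `send_onward`, and the `external_tool` callable are hypothetical names used only to show where the boundary sits.

```python
import re
from typing import Callable

# Stand-in detector (assumption): one email pattern instead of a detection model.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_locally(text: str) -> str:
    """Replace each distinct email with a numbered placeholder."""
    seen: dict[str, str] = {}

    def repl(match: re.Match) -> str:
        # Reuse the same placeholder when the same address appears again.
        return seen.setdefault(match.group(0), f"[EMAIL_{len(seen) + 1}]")

    return EMAIL.sub(repl, text)


def send_onward(text: str, external_tool: Callable[[str], str]) -> str:
    safe_copy = anonymize_locally(text)  # detection stays on this machine
    return external_tool(safe_copy)      # only the safe copy crosses the boundary
```

The point of the shape is that `external_tool` never receives the raw text; whatever it is (a cloud API, a chat assistant, a ticketing system), it only sees placeholders.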