What PSF Domain 1 requires
The Production Safety Framework defines Domain 1 — Input Governance — as the set of controls that ensure only appropriate, safe, and well-formed inputs reach your language model. A deployment satisfies D1 when it can demonstrate four properties:
- Prompt injection resistance — adversarial inputs cannot override system instructions or extract confidential context
- Input schema validation — inputs conform to a declared type and structure before model execution
- Content classification — inputs are classified by type and intent; high-risk inputs are flagged or blocked
- Audit trail — every input is logged with sufficient context to investigate incidents
A Strong D1 rating requires all four. A Partial rating typically means input validation is present but prompt injection resistance is absent, or classification is present without audit. A Gap rating means the framework passes inputs to the model with no controls whatsoever — which describes most pure-Python agent frameworks out of the box.
Why every framework leaves D1 open
Agent frameworks are designed to help you build agents quickly. Input governance is friction. Every framework in our assessment series — LangChain, CrewAI, AutoGen, Semantic Kernel, Haystack, DSPy, Pydantic AI — ships with exactly zero prompt injection defences by default. This is not an oversight. It is a deliberate design choice to keep the framework general-purpose and unopinionated.
The problem arises when practitioners assume that using a production-grade framework means they have production-grade input controls. They do not. The framework handles orchestration. Input governance is the practitioner's responsibility, and the companion tooling required is not included.
The only partial exception is Pydantic AI, which enforces input schema validation by design — but this validates structure and types, not adversarial content. A structurally valid prompt injection passes Pydantic validation without issue.
The D1 threat model
Before implementing controls, you need to be precise about what you're defending against. D1 covers three distinct threat categories:
1. Direct prompt injection
An attacker provides input that overrides or augments your system prompt. Classic examples: "Ignore all previous instructions and...", role-play framings that establish a new persona with different constraints, or instruction injection buried in otherwise normal-looking text. This is the most well-documented attack class.
2. Indirect prompt injection
Malicious instructions are embedded in content the agent retrieves and processes — a web page, a document, an email, a calendar entry. When the agent reads the content as part of its context, it also ingests the injected instructions. This is the harder problem: it requires defending against content you didn't write and didn't generate.
Indirect injection is particularly dangerous in RAG systems and agents with web access or email/calendar tool access — which is most production agentic deployments.
3. Data exfiltration via input manipulation
Crafted inputs designed to extract information from the system prompt, conversation history, or retrieved context. In multi-tenant environments, this extends to cross-tenant data leakage via carefully constructed prompts that exploit the model's context handling.
The four implementation layers
Robust D1 implementation requires controls at four distinct layers. Each layer addresses a different part of the threat surface. Implementing only one or two leaves exploitable gaps.
Layer 1: Input schema validation
Before any content check, validate that the input conforms to the expected structure. This means type validation (string, not object injection), length bounds, character set restrictions, and format checks where applicable.