Production AI Institute — vendor-neutral certification for AI practitioners
Verify a credentialFor organisationsContact
PSF Deep DiveDomain 1 · April 2026

PSF Domain 1:
Input Governance

Input governance is the discipline of controlling what enters your AI system — before it reaches the model. It is the first and most fundamental production safety domain, and it is partially or completely absent from every major agent framework. This guide documents what D1 requires and exactly how to implement it.

Read time
16 min
PSF version
v1.1
CC BY 4.0
Citable

What PSF Domain 1 requires

The Production Safety Framework defines Domain 1 — Input Governance — as the set of controls that ensure only appropriate, safe, and well-formed inputs reach your language model. A deployment satisfies D1 when it can demonstrate four properties:

  1. Prompt injection resistance — adversarial inputs cannot override system instructions or extract confidential context
  2. Input schema validation — inputs conform to a declared type and structure before model execution
  3. Content classification — inputs are classified by type and intent; high-risk inputs are flagged or blocked
  4. Audit trail — every input is logged with sufficient context to investigate incidents

A Strong D1 rating requires all four. A Partial rating typically means input validation is present but prompt injection resistance is absent, or classification is present without audit. A Gap rating means the framework passes inputs to the model with no controls whatsoever — which describes most pure-Python agent frameworks out of the box.

Why every framework leaves D1 open

Agent frameworks are designed to help you build agents quickly. Input governance is friction. Every framework in our assessment series — LangChain, CrewAI, AutoGen, Semantic Kernel, Haystack, DSPy, Pydantic AI — ships with exactly zero prompt injection defences by default. This is not an oversight. It is a deliberate design choice to keep the framework general-purpose and unopinionated.

The problem arises when practitioners assume that using a production-grade framework means they have production-grade input controls. They do not. The framework handles orchestration. Input governance is the practitioner's responsibility, and the companion tooling required is not included.

The only partial exception is Pydantic AI, which enforces input schema validation by design — but this validates structure and types, not adversarial content. A structurally valid prompt injection passes Pydantic validation without issue.

The D1 threat model

Before implementing controls, you need to be precise about what you're defending against. D1 covers three distinct threat categories:

1. Direct prompt injection

An attacker provides input that overrides or augments your system prompt. Classic examples: "Ignore all previous instructions and...", role-play framings that establish a new persona with different constraints, or instruction injection buried in otherwise normal-looking text. This is the most well-documented attack class.

2. Indirect prompt injection

Malicious instructions are embedded in content the agent retrieves and processes — a web page, a document, an email, a calendar entry. When the agent reads the content as part of its context, it also ingests the injected instructions. This is the harder problem: it requires defending against content you didn't write and didn't generate.

Indirect injection is particularly dangerous in RAG systems and agents with web access or email/calendar tool access — which is most production agentic deployments.

3. Data exfiltration via input manipulation

Crafted inputs designed to extract information from the system prompt, conversation history, or retrieved context. In multi-tenant environments, this extends to cross-tenant data leakage via carefully constructed prompts that exploit the model's context handling.

The four implementation layers

Robust D1 implementation requires controls at four distinct layers. Each layer addresses a different part of the threat surface. Implementing only one or two leaves exploitable gaps.

Layer 1: Input schema validation

Before any content check, validate that the input conforms to the expected structure. This means type validation (string, not object injection), length bounds, character set restrictions, and format checks where applicable.

// Pydantic validation example
from pydantic import BaseModel, Field, field_validator
class UserInput(BaseModel):
query: str = Field(max_length=2000)
session_id: str = Field(pattern=r'^[a-zA-Z0-9-]+$')
@field_validator('query')
def no_null_bytes(cls, v: str) -> str:
if '' in v: raise ValueError('null bytes not permitted')
return v

Pydantic AI does this natively. For other frameworks, add explicit validation before calling the agent. This step has near-zero latency cost and eliminates entire classes of malformed input.

Layer 2: Intent classification

Classify the input before it reaches your primary model. A fast, cheap classifier (GPT-4o-mini, a fine-tuned BERT, or a rule-based system) can categorise inputs into safe/suspicious/blocked buckets. The primary model only processes inputs that clear classification.

Guardrails AI provides pre-built validators including detect_prompt_injection and toxic_language. NeMo Guardrails approaches this differently — you define conversation rails that the LLM is constrained to follow, which prevents certain input types from triggering certain behaviours even if they reach the model.

Layer 3: Contextual isolation

For agents that retrieve external content (RAG, web browsing, email processing), the retrieved content must be isolated from the instruction context. This means treating retrieved content as data, not as instructions, and designing the system prompt to make that distinction explicit.

Practical patterns include: explicit content delimiters ("RETRIEVED_CONTENT: [...] END_RETRIEVED_CONTENT"), role separation in multi-turn conversations, and — for high-risk deployments — a secondary model that processes external content before it enters the primary agent context.

Layer 4: Audit logging

Every input must be logged before it reaches the model. The log entry needs enough information to reconstruct the incident: timestamp, session ID, user context, the full input, the classification result, and the disposition (passed/blocked/flagged). LangSmith, Langfuse, and Arize Phoenix all provide pre-built input logging. For custom implementations, OpenTelemetry spans are the appropriate primitive.

Recommended companion tooling

Guardrails AI
Input validators

Best choice when you need custom validators and framework-agnostic integration. Extensive pre-built validator library.

Read comparison →
NeMo Guardrails
Conversation rails

Best choice when you need to constrain conversation topology — prevent the agent from discussing topics entirely, not just detect bad inputs.

Read comparison →
Presidio (Microsoft)
PII detection in inputs

Detects and de-identifies PII in inputs before they reach the model. Critical when users may send personal data.

LlamaGuard
Input/output safety classification

Meta's purpose-built safety model. High accuracy on harmful content classification. Adds latency but high fidelity.

Framework-specific implementation notes

LangChain / LangGraph
Full assessment →

Add Guardrails AI as a wrapper around the chain input. Use LangSmith for audit logging — it captures inputs automatically. For RAG, use document metadata to distinguish retrieved content from user inputs in traces.

CrewAI
Full assessment →

Input governance is most critical here because multi-agent amplifies every gap. Validate before task assignment. Add NeMo Guardrails or Guardrails AI around the crew.kickoff() call. Each agent should receive pre-validated inputs only.

AutoGen
Full assessment →

UserProxyAgent's human_input_mode provides a natural insertion point for validation. Add a custom validate_input() method before human_input_mode processes responses. AutoGen's code execution container provides a useful parallel — apply the same isolation logic to text inputs.

Semantic Kernel
Full assessment →

Input filters are the native SK primitive for D1. Register an IFunctionInvocationFilter that runs before any kernel function. Azure Content Safety integrates directly and handles classification at the Azure layer.

Haystack
Full assessment →

Add a validation component as the first node in every pipeline. The component-based architecture makes this clean — create a PromptInjectionDetector component that wraps Guardrails AI and plug it in at position 0.

D1 pre-deployment checklist

Before deploying any AI system to production, verify all of the following:

Input schema validation is in place before any model callRequired
Length bounds are enforced on all text inputsRequired
Prompt injection detection is active (Guardrails AI, NeMo, or equivalent)Required
Retrieved external content is isolated from instruction context in promptsRequired
All inputs are logged before model execution with session ID and timestampRequired
PII detection is active on user inputs if users may send personal data
High-risk input classifications trigger human review, not automatic blocking only
Injection detection is tested with known attack vectors before go-live
Log retention policy is defined and compliant with data residency requirements
Input validation failure responses don't leak system prompt information

Related assessments and guides

Guardrails AI vs NeMo vs Azure Content SafetyAgent Framework Comparison (PSF matrix)PSF D2: Output Validation guidePSF D3: Data Protection guide
From reading to credential

You understand the gaps.
Get the credential that proves it.

The AIDA examination tests applied PSF knowledge across all eight domains — exactly the gaps and strengths covered in this assessment. 15 minutes. No charge. Ever.

Start AIDA — free →CPAP practitioner credential