PSF Domain 1: Input Governance — Complete Implementation Guide

What PSF Domain 1 requires

The Production Safety Framework defines Domain 1 — Input Governance — as the set of controls that ensure only appropriate, safe, and well-formed inputs reach your language model. A deployment satisfies D1 when it can demonstrate four properties:

Prompt injection resistance — adversarial inputs cannot override system instructions or extract confidential context
Input schema validation — inputs conform to a declared type and structure before model execution
Content classification — inputs are classified by type and intent; high-risk inputs are flagged or blocked
Audit trail — every input is logged with sufficient context to investigate incidents

A Strong D1 rating requires all four. A Partial rating typically means input validation is present but prompt injection resistance is absent, or classification is present without audit. A Gap rating means the framework passes inputs to the model with no controls whatsoever — which describes most pure-Python agent frameworks out of the box.

Why every framework leaves D1 open

Agent frameworks are designed to help you build agents quickly. Input governance is friction. Every framework in our assessment series — LangChain, CrewAI, AutoGen, Semantic Kernel, Haystack, DSPy, Pydantic AI — ships with exactly zero prompt injection defences by default. This is not an oversight. It is a deliberate design choice to keep the framework general-purpose and unopinionated.

The problem arises when practitioners assume that using a production-grade framework means they have production-grade input controls. They do not. The framework handles orchestration. Input governance is the practitioner's responsibility, and the companion tooling required is not included.

The only partial exception is Pydantic AI, which enforces input schema validation by design — but this validates structure and types, not adversarial content. A structurally valid prompt injection passes Pydantic validation without issue.

The D1 threat model

Before implementing controls, you need to be precise about what you're defending against. D1 covers three distinct threat categories:

1. Direct prompt injection

An attacker provides input that overrides or augments your system prompt. Classic examples: "Ignore all previous instructions and...", role-play framings that establish a new persona with different constraints, or instruction injection buried in otherwise normal-looking text. This is the most well-documented attack class.

2. Indirect prompt injection

Malicious instructions are embedded in content the agent retrieves and processes — a web page, a document, an email, a calendar entry. When the agent reads the content as part of its context, it also ingests the injected instructions. This is the harder problem: it requires defending against content you didn't write and didn't generate.

Indirect injection is particularly dangerous in RAG systems and agents with web access or email/calendar tool access — which is most production agentic deployments.

3. Data exfiltration via input manipulation

Crafted inputs designed to extract information from the system prompt, conversation history, or retrieved context. In multi-tenant environments, this extends to cross-tenant data leakage via carefully constructed prompts that exploit the model's context handling.

The four implementation layers

Robust D1 implementation requires controls at four distinct layers. Each layer addresses a different part of the threat surface. Implementing only one or two leaves exploitable gaps.

Layer 1: Input schema validation

Before any content check, validate that the input conforms to the expected structure. This means type validation (string, not object injection), length bounds, character set restrictions, and format checks where applicable.

// Pydantic validation example
from pydantic import BaseModel, Field, field_validator
class UserInput(BaseModel):
    query: str = Field(max_length=2000)
    session_id: str = Field(pattern=r'^[a-zA-Z0-9-]+$')
    @field_validator('query')
    def no_null_bytes(cls, v: str) -> str:
        if '' in v: raise ValueError('null bytes not permitted')
        return v

Pydantic AI does this natively. For other frameworks, add explicit validation before calling the agent. This step has near-zero latency cost and eliminates entire classes of malformed input.

Layer 2: Intent classification

Classify the input before it reaches your primary model. A fast, cheap classifier (GPT-4o-mini, a fine-tuned BERT, or a rule-based system) can categorise inputs into safe/suspicious/blocked buckets. The primary model only processes inputs that clear classification.

Guardrails AI provides pre-built validators including detect_prompt_injection and toxic_language. NeMo Guardrails approaches this differently — you define conversation rails that the LLM is constrained to follow, which prevents certain input types from triggering certain behaviours even if they reach the model.

Layer 3: Contextual isolation

For agents that retrieve external content (RAG, web browsing, email processing), the retrieved content must be isolated from the instruction context. This means treating retrieved content as data, not as instructions, and designing the system prompt to make that distinction explicit.

Practical patterns include: explicit content delimiters ("RETRIEVED_CONTENT: [...] END_RETRIEVED_CONTENT"), role separation in multi-turn conversations, and — for high-risk deployments — a secondary model that processes external content before it enters the primary agent context.

Layer 4: Audit logging

Every input must be logged before it reaches the model. The log entry needs enough information to reconstruct the incident: timestamp, session ID, user context, the full input, the classification result, and the disposition (passed/blocked/flagged). LangSmith, Langfuse, and Arize Phoenix all provide pre-built input logging. For custom implementations, OpenTelemetry spans are the appropriate primitive.

Recommended companion tooling

Guardrails AI

Input validators

Best choice when you need custom validators and framework-agnostic integration. Extensive pre-built validator library.

Read comparison →

NeMo Guardrails

Conversation rails

Best choice when you need to constrain conversation topology — prevent the agent from discussing topics entirely, not just detect bad inputs.

Read comparison →

Presidio (Microsoft)

PII detection in inputs

Detects and de-identifies PII in inputs before they reach the model. Critical when users may send personal data.

LlamaGuard

Input/output safety classification

Meta's purpose-built safety model. High accuracy on harmful content classification. Adds latency but high fidelity.

Framework-specific implementation notes

LangChain / LangGraph

Full assessment →

Add Guardrails AI as a wrapper around the chain input. Use LangSmith for audit logging — it captures inputs automatically. For RAG, use document metadata to distinguish retrieved content from user inputs in traces.

CrewAI

Full assessment →

Input governance is most critical here because multi-agent amplifies every gap. Validate before task assignment. Add NeMo Guardrails or Guardrails AI around the crew.kickoff() call. Each agent should receive pre-validated inputs only.

AutoGen

Full assessment →

UserProxyAgent's human_input_mode provides a natural insertion point for validation. Add a custom validate_input() method before human_input_mode processes responses. AutoGen's code execution container provides a useful parallel — apply the same isolation logic to text inputs.

Semantic Kernel

Full assessment →

Input filters are the native SK primitive for D1. Register an IFunctionInvocationFilter that runs before any kernel function. Azure Content Safety integrates directly and handles classification at the Azure layer.

Haystack

Full assessment →

Add a validation component as the first node in every pipeline. The component-based architecture makes this clean — create a PromptInjectionDetector component that wraps Guardrails AI and plug it in at position 0.

D1 pre-deployment checklist

Before deploying any AI system to production, verify all of the following:

Input schema validation is in place before any model callRequired

Length bounds are enforced on all text inputsRequired

Prompt injection detection is active (Guardrails AI, NeMo, or equivalent)Required

Retrieved external content is isolated from instruction context in promptsRequired

All inputs are logged before model execution with session ID and timestampRequired

PII detection is active on user inputs if users may send personal data

High-risk input classifications trigger human review, not automatic blocking only

Injection detection is tested with known attack vectors before go-live

Log retention policy is defined and compliant with data residency requirements

Input validation failure responses don't leak system prompt information

Related assessments and guides

Guardrails AI vs NeMo vs Azure Content Safety →Agent Framework Comparison (PSF matrix) →PSF D2: Output Validation guide →PSF D3: Data Protection guide →

PSF Domain 1:Input Governance

What PSF Domain 1 requires

Why every framework leaves D1 open

The D1 threat model

1. Direct prompt injection

2. Indirect prompt injection

3. Data exfiltration via input manipulation

The four implementation layers

Layer 1: Input schema validation

Layer 2: Intent classification

Layer 3: Contextual isolation

Layer 4: Audit logging

Recommended companion tooling

Framework-specific implementation notes

D1 pre-deployment checklist

Related assessments and guides

You understand the gaps.Get the credential that proves it.

PSF Domain 1:
Input Governance

You understand the gaps.
Get the credential that proves it.