LangChain, Composio, LangSmith, Guardrails AI — these tools are how agents get built. The PSF is how you know whether the result is safe to run in production. This page maps the ecosystem and shows where the standard applies.
Every tool in the production AI stack has a vendor who would prefer you thought of it as comprehensive. LangChain's documentation doesn't emphasise that it has no native PII protection. Composio's homepage doesn't lead with the fact that it provides no human-in-the-loop primitives. This is not deception — these are tool vendors describing what their tools do, not safety assessors evaluating what they miss.
The problem is that practitioners assembling a production AI stack need to know both: what each tool does well, and what gaps remain their responsibility to close. Without that complete picture, teams make confident deployments on incomplete foundations — and discover the gaps at incident time rather than design time.
PAI's role is to provide the complete picture. The Production Safety Framework defines what a safe production deployment requires across eight domains. Ecosystem assessments apply the PSF to specific tools — not to diminish them, but to give practitioners an honest map of what each tool satisfies and what each tool leaves open.
A production agent deployment involves multiple layers. Each layer is necessary; none is sufficient on its own. The PSF applies to the system as a whole — not to any individual layer.
Foundation models. PAI assessments are model-agnostic — the PSF applies regardless of which model underpins a deployment.
Orchestration and execution. These frameworks define how agents reason, plan, and call tools. PSF compliance depends heavily on how these are configured.
Tool integration. Managed access to external services — email, calendar, CRMs, code repositories. This layer determines how agents take actions in the real world.
Observability. Trace-level visibility into agent reasoning and execution. Satisfies PSF Domain 4. Critical for production incident investigation.
Guardrails. Input classification, output validation, PII detection, and prompt injection resistance. Closes the PSF Domain 1, 2, and 3 gaps that most frameworks leave open.
Standards. The frameworks that define what 'safe' means. PAI's PSF is the practitioner-focused standard for production agentic AI deployment.
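The layering can be sketched as a minimal pipeline. Every function and name below is hypothetical — each one stands in for a whole layer, and removing any of them reopens a PSF gap. This is an illustration of the separation of concerns, not any vendor's actual API.

```python
# Hypothetical layered agent stack: guardrails in, orchestration,
# guardrails out, observability throughout.
import json
import re
from datetime import datetime, timezone

def screen_input(prompt: str) -> str:
    """Guardrails layer (D1): reject obvious injection markers."""
    if re.search(r"ignore (all|previous) instructions", prompt, re.I):
        raise ValueError("input rejected by guardrail")
    return prompt

def run_agent(prompt: str) -> str:
    """Orchestration layer stub: a real deployment calls a model here."""
    return json.dumps({"answer": f"echo: {prompt}"})

def validate_output(raw: str) -> dict:
    """Guardrails layer (D2/D3): enforce schema, mask email addresses."""
    data = json.loads(raw)
    if "answer" not in data:
        raise ValueError("output failed schema check")
    data["answer"] = re.sub(r"[\w.+-]+@[\w-]+\.\w+", "[REDACTED]", data["answer"])
    return data

def audit(event: str, detail: str, log: list) -> None:
    """Observability layer (D4): append a timestamped trace record."""
    log.append({"ts": datetime.now(timezone.utc).isoformat(),
                "event": event, "detail": detail})

def handle(prompt: str, log: list) -> dict:
    audit("input", prompt, log)
    result = validate_output(run_agent(screen_input(prompt)))
    audit("output", result["answer"], log)
    return result

trace: list = []
print(handle("email bob@example.com the summary", trace)["answer"])
# -> echo: email [REDACTED] the summary
```

Notice that no single layer could do all four jobs — which is exactly why the PSF is applied to the assembled system rather than to any one tool.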
Each assessment evaluates a tool or framework against all eight PSF domains. Assessments are independent, versioned, and updated as products evolve.
Strong on observability (LangSmith) and vendor resilience. LangGraph adds strong human oversight. Gap on data protection and security without companion tooling.
Intuitive role-based multi-agent orchestration. Most extensive PSF gaps of any framework — multi-agent architecture amplifies every safety gap. Requires the most companion tooling.
Standout human oversight model (UserProxyAgent). Docker code execution for sandboxed security. Weakest production deployment tooling — research origins are evident.
Microsoft's enterprise SDK for .NET and Python. Native Entra ID and Azure Key Vault give D7 a Strong rating. Best-in-class OpenTelemetry integration. The default choice for Azure-committed teams.
Released April 2026. Programmatic access to Cursor's agent runtime with MCP integration. Strong observability, gap on security and data protection — particularly for filesystem and email access.
All three satisfy PSF D4 core requirements. LangSmith wins on LangChain depth; Langfuse wins on data residency and self-hosting; Arize wins on production alerting and MLOps integration.
Strong on security (managed OAuth) and data protection. Gap on human oversight — must be implemented above Composio.
RAG-native framework with the strongest production deployment story of any Python framework. Hayhooks REST serving is built-in. D4/D5/D8 are all Strong; D3 gap matters more for RAG workloads because retrieved documents carry PII.
Optimisation-first framework from Stanford NLP. TypedPredictor delivers the strongest structured output enforcement of any framework assessed (D2). Three gaps: D1, D3, D7. Research-to-production gap is real — deploy only with full companion safety layer.
Pydantic validation applied to LLM agents. Strong D2 from type-enforced outputs. Deliberately a library, not a platform — D5 and D6 are application responsibilities. Best for structured extraction pipelines; infrastructure ownership required.
Visual low-code builders that accelerate prototyping and carry production security debt. Known CVEs in unauthenticated instances. D7 and D3 are gaps. Excellent for PoC; requires hardening before enterprise deployment.
Three tools that close D1/D2/D3 gaps from different architectural positions. Guardrails AI for custom validators; NeMo for conversation policy; Azure Content Safety for enterprise managed compliance.
PSF D3/D4 assessment of the three major vector databases. Weaviate wins on access control and audit logging. Pinecone wins on managed compliance. Chroma requires full application-layer D3 implementation.
Further assessments of additional platforms and emerging frameworks are in preparation.
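Several of the D2 ratings above come down to the same underlying pattern: raw model output is untrusted text until it parses into a declared schema. A minimal sketch of that pattern, using plain pydantic rather than the Pydantic AI or DSPy APIs — the `Invoice` fields are illustrative:

```python
# D2 in miniature: accept LLM output only once it validates
# against a typed schema; reject everything else.
import json
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):  # illustrative schema, not from any assessment
    vendor: str
    total: float
    currency: str

def parse_agent_output(raw: str) -> Invoice:
    """Reject any model output that is not valid JSON matching the schema."""
    try:
        return Invoice(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError) as exc:
        raise ValueError(f"output rejected: {exc}") from exc

ok = parse_agent_output('{"vendor": "Acme", "total": 99.5, "currency": "EUR"}')
print(ok.vendor, ok.total)
# -> Acme 99.5
```

The design point is that validation failures surface as ordinary exceptions at the boundary, where retry or escalation logic can handle them — rather than as malformed data deep inside a downstream system.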
Independence requires clarity about scope.
PAI does not build or sell agent frameworks, tool integration libraries, or AI models. The PSF is designed to be implemented on top of any framework — LangChain, CrewAI, a custom Python stack, or anything else.
PAI does not offer implementation services. The standard and its assessments are published openly for practitioners to apply directly. If you want someone to implement it for you, that is a separate commercial relationship with a certified integrator.
PAI has no equity stakes, advertising relationships, or commercial agreements with any of the tools or frameworks it assesses. Independence is the only basis on which an assessment authority is credible.
The PSF does not compete with LangChain, Composio, or any other tooling. It provides the yardstick against which they are evaluated — and most of them are genuinely useful. PSF compliance is about using the right tools correctly, not avoiding them.
If you are assembling a production AI stack, start by mapping your chosen tools against the PSF domains using the published assessments. Note which domains are addressed by your tooling and which require explicit implementation on your part. The gaps are your implementation checklist before deployment.
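One way to run that mapping exercise concretely: record which PSF domains (D1-D8) each tool in your stack covers, then diff against the full set. The coverage sets below are placeholders, not figures from any published assessment.

```python
# Hypothetical gap-mapping sketch: the remainder after the diff
# is your pre-deployment implementation checklist.
PSF_DOMAINS = {f"D{i}" for i in range(1, 9)}

stack_coverage = {                         # placeholder ratings only
    "orchestration-framework": {"D4", "D5"},
    "tool-integration-layer": {"D7"},
    "observability-platform": {"D4"},
}

covered = set().union(*stack_coverage.values())
gaps = sorted(PSF_DOMAINS - covered)
print("Implement before deployment:", gaps)
# -> Implement before deployment: ['D1', 'D2', 'D3', 'D6', 'D8']
```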
If your organisation requires formal compliance evidence — for internal governance, customer assurance, or regulatory purposes — the Certified Production AI Practitioner (CPAP) certification evaluates whether a real deployment meets PSF requirements. The assessment is conducted by an independent PAI assessor against your actual implementation, not a self-reported questionnaire.
The AIDA examination tests applied PSF knowledge across all eight domains — exactly the gaps and strengths covered in these assessments. 15 minutes. No charge. Ever.