Production AI Institute · PSF v1.1 open standard
PAI research programme

Public research for production AI deployment

Position papers, incident analysis, framework assessments, and monthly intelligence that make production AI safety inspectable. The work is public because a standard only matters if people can inspect it.

Read the latest Briefing → · Browse all Insights
Open methodology · Incident patterns · Framework analysis · Practitioner evidence
Intelligence Briefing

Monthly · Free · Practitioner-written

All issues →
Issue 001 · May 2026

Karpathy's Third Chip Flip · Software 3.0 · The Agent Operator · PAI Patterns Library

Five developments shaping production AI this month — with the PAI angle on what they mean for practitioners building real systems today.

Read Issue 001 →

From research to action

Use the evidence to choose a proof path.

Research creates leverage only when the next step is obvious. Pick the route that matches the pressure you are under.

All research areas

LAB · Q2 2026

PAI Lab

Structured reliability testing of frontier AI models and agent frameworks against PSF criteria. Quarterly scorecards. Open methodology.

Explore →

R&I · 50+ articles

Insights

Analysis, essays, and deep-dives on AI deployment, safety, and the production practitioner experience.

Explore →

PTR · 21 patterns

Patterns Library

Reusable workflow patterns for production AI systems — vetted against the PSF and ready to adapt.

Explore →

INC · Case studies

Incidents

Documented AI failure cases with root-cause analysis mapped to PSF domains. Learn from what went wrong.

Explore →

ECO · 12 frameworks

Ecosystem Reports

Independent PSF assessments of every major AI framework — LangChain, CrewAI, AutoGen, Cursor, and more.

Explore →

PSF · PAI-8 / PSF

Standard

The Production Safety Framework itself — eight domains, openly published and freely referenceable.

Explore →

ARI · Public index

Agent Readiness Index

Generate a PSF-aligned readiness report for an AI agent, with evidence grade, repository signals, and a shareable badge.

Explore →

Recent findings (2026)

May–June 2026 lab and ecosystem work — indexed here for procurement visitors; formal publication titles below are unchanged.

Incident analysis · June 2026

AI Agent Bankrupted Its Operator: What Went Wrong

Forensic breakdown of DN42 agent failure: five critical control gaps and their costs; demonstrates PSF-compliance requirements.

Read →
Assessment · June 2026

Free AI Certification That's Actually Free (No Card Required)

AIDA certification scope and employer verification mechanisms; clarifies upgrade pathway to CAAE standards.

Read →
Incident analysis · June 2026

AI Agent Bankrupted Its Operator: 6 Controls That Failed

Forensic post-mortem of DN42 incident maps six absent governance controls to real costs, showing each failure was preventable.

Read →
Research · June 2026

AI Is Eroding Your Team's Skills. Certification Fixes That.

Nature study confirms AI tool use degrades human expertise; human-in-the-loop competency verification prevents production failures.

Read →
Assessment · June 2026

Is an AI Certification Worth It? An Honest Answer

Evidence-grounded cost-benefit analysis for practitioners seeking structured, credible pathways into production AI competency validation.

Read →
Research · June 2026

Why AI Deployments Fail and What Certified Integrators Do Differently

Enterprise AI budget failures traced to three production-readiness gaps in governance, not technology—with certified integrator benchmarks.

Read →
Standards · June 2026

When AI Agents Go Rogue: The Real Cost of Skipping Guardrails

PSF compliance requirements for production agents mapped to four critical guardrails; quantified cost of each missing control.

Read →
Standards · June 2026

AI Raises the Engineering Bar - Here's What That Means

Production AI failures stem from discipline gaps. Defines formal engineering rigor metrics for verifying team readiness.

Read →
Incident analysis · June 2026

AI Key Theft via IDE Plugins: The Supply Chain Gap MSPs Miss

JetBrains plugin attack reveals PSF control gaps in MSP AI security reviews and toolchain certification requirements.

Read →
Lab report · June 2026

When Your OSS AI Stack Disappears Overnight

TensorZero archival identifies six supply chain risk signals certified AI integrators must evaluate pre-deployment.

Read →
Lab report · June 2026

CompassEval — moral reasoning scores for 10 frontier models

240 graded transcripts: 10 models × 12 hard moral cases × 2 conditions. GPT-5.5 refused the framework. 8 of 10 improved with Compass loaded.

Read →
Research · June 2026

The Machine God — on the moral weight of the values that make valuing possible

Philosophical essay on why any agent that values anything must, on pain of self-contradiction, value the conditions that make valuing possible.

Read →
Incident analysis · June 2026

Anthropic suspends Claude Fable 5 — what the data showed before suspension

Claude Fable 5 scored 8.79/10 with Compass, 8.56 bare — +0.24 uplift across all 12 cases. Suspended by Anthropic 1h after the PAI sweep completed.

Read →
Standards · June 2026

When AI hides its rules — Claude's secret guardrails

Analysis of operator-hidden system prompts and what the PSF says about transparency obligations in production AI deployment.

Read →
Incident analysis · June 2026

A $0.01 bank transfer almost broke a banking AI agent

Edge case analysis of numeric validation failure in a production financial agent — mapped to D2 output validation and D1 input governance.

Read →
Ecosystem assessment · June 2026

Cursor Enterprise Organizations — PSF assessment

Multi-team governance, org-level IdP, and usage analytics for Cursor Enterprise Organizations GA (3 Jun 2026).

Read →
Ecosystem assessment · June 2026

OpenAI Codex Sites & role plugins — PSF assessment

Enterprise plugins, hosted Sites, and human-refinement signals for Codex knowledge work (2 Jun 2026).

Read →
Ecosystem assessment · June 2026

OpenAI on Amazon Bedrock — PSF assessment

AWS-native governance and regional inference for OpenAI models and Codex on Bedrock GA (1 Jun 2026).

Read →
Open source · June 2026

Why we open-sourced WorkflowOS

PSF Workflow Studio released under MIT — the working artifact behind the open Production Safety Framework text.

Read →
Lab report · June 2026

PAI Lab: public GitHub agent readiness (May 2026 cohort)

Empirical scan of 20 public agent repos against PSF evidence signals (PAI-ARI-2026.1).

Read →
Incident analysis · June 2026

OpenAI May 2026 multi-service outage

Vendor-reported ChatGPT, login, and checkout failures on 29 May 2026 — mapped to D8, D4, and D5; indexed in the incident registry.

Read →
Incident analysis · May 2026

Binnall Law — Claude Console phantom citations in federal court

Verified May 2026 filing failure mapped to D2, D5, and D6 — indexed in the incident registry.

Read →
Data use index · May 2026

AI Data Use Index — week 5 (May 2026)

Weekly practitioner index of vendor and product data-use posture changes.

Read →
Ecosystem assessment · May 2026

OpenAI Codex CLI 0.134 — PSF assessment

Independent PSF coverage review of Codex CLI for production agent workflows.

Read →
Ecosystem assessment · May 2026

Cursor Automations 3.5 — PSF assessment

Vendor resilience and deployment-safety signals for Cursor Automations 3.5.

Read →
Ecosystem assessment · May 2026

Google Agent Executor — PSF assessment

Framework assessment of Google Agent Executor against PSF domains.

Read →

PAI Publications

Position papers, analyses, and framework notes from the PAI research programme.

Position paper · 2025

The EU AI Act and the Production Safety Framework: A Practitioner's Guide

Maps PSF domains to EU AI Act obligations for high-risk AI system deployers. Covers conformity assessment requirements, technical documentation standards, and human oversight obligations.

EU AI Act · Compliance · Regulation
Read the paper →
Analysis · 2025

Incident Patterns in Production LLM Deployments

Analysis of common failure modes in production LLM deployments. Identifies root causes across PSF domains and intervention patterns.

Incidents · LLM · Root cause
Read the paper →
Position paper · 2024

Human Oversight in High-Stakes AI: What 'Meaningful' Means in Practice

Examines what constitutes meaningful human oversight in high-stakes AI-assisted decisions. Includes design patterns for effective human checkpoints.

Human oversight · PSF Domain 05 · Design patterns
Read the paper →
Framework note · 2024

PSF v1.0 Rationale and Development History

Documents the reasoning behind each PSF domain, alternatives considered, and how practitioner feedback shaped the framework.

PSF · Framework development
Read the paper →

Redacted assurance findings (cross-client patterns)

Patterns repeatedly observed in anonymised production assurance reviews and incident-led postmortems:

D2 output contracts missing

Model text consumed by downstream systems as if it were trusted structured data.

Review PSF-D2
D4 observability blind spots

Strong model-call logs, weak cross-service traceability at queue and handoff boundaries.

See Lab methodology
D5/D6 autonomy without gates

Operational actions executed without explicit human intervention criteria on high-consequence paths.

Use deployment guidance
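
As a rough illustration of the D2 and D5/D6 patterns above, the following Python sketch treats model output as untrusted text, validates it against a small output contract, and holds high-consequence actions until a human approves them. It is not taken from the PSF text: the `TransferRequest` shape, the field names, the `parse_transfer` and `execute` helpers, and the `APPROVAL_THRESHOLD_CENTS` value are all hypothetical and chosen only for the example.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical intervention criterion: transfers at or above this amount
# (in cents) are held until a named human approves them.
APPROVAL_THRESHOLD_CENTS = 100_000


@dataclass(frozen=True)
class TransferRequest:
    account_id: str
    amount_cents: int


def parse_transfer(model_text: str) -> TransferRequest:
    """Validate untrusted model text against an explicit output contract."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    if set(data) != {"account_id", "amount_cents"}:
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    account_id, amount = data["account_id"], data["amount_cents"]
    if not isinstance(account_id, str) or not account_id:
        raise ValueError("account_id must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("amount_cents must be a positive integer")
    return TransferRequest(account_id, amount)


def needs_human_approval(request: TransferRequest) -> bool:
    """Explicit, testable gate for the high-consequence path."""
    return request.amount_cents >= APPROVAL_THRESHOLD_CENTS


def execute(request: TransferRequest, approved_by: Optional[str] = None) -> str:
    """Return the action status; a real system would call the ledger here."""
    if needs_human_approval(request) and approved_by is None:
        return "held_for_review"
    return "executed"
```

Downstream code only ever sees a `TransferRequest` that passed the contract, and the approval rule lives in one function that can be reviewed and tested on its own. Representing amounts as integer cents is a design choice of this sketch, made so that very small values are not subject to floating-point parsing.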
Read the PSF → · Ecosystem assessments →

Contribute to the research programme

PAI collects anonymised incident reports from practitioners to inform framework development. If you have experienced a production AI failure and are willing to share details, we welcome your contribution.

Submit an incident report · Read the PSF
The Production AI Brief

Get the brief that keeps AI work defensible

PSF updates, deployment checks, failure patterns, and proof paths for practitioners, MSPs, and teams who need AI work to survive scrutiny. No hype.