
Human-in-the-Loop

The architecture for deciding when agents act autonomously and when they pause for human review.

Human-in-the-loop (HITL) is not a limitation imposed on AI systems — it is a safety architecture designed into them. HITL defines the checkpoints at which human judgement is required before an agent can proceed, and the interfaces that make that judgement efficient and meaningful.

HITL design starts with a decision taxonomy: for each decision the agent system makes, what is the consequence of an error, what is its frequency, and what is the human reviewer's capacity? High-consequence, low-frequency decisions (e.g. approving a contract amendment above £100k) warrant mandatory human review. Low-consequence, high-frequency decisions (e.g. categorising a support ticket) can run autonomously with periodic sampling.

HITL interfaces must be designed to enable meaningful review, not just click-through approval. The reviewer needs the right information, presented clearly, to make a genuine decision rather than simply ratifying what the agent has already effectively decided.
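
One way to make the taxonomy concrete is to encode it as a routing function over consequence, frequency, and reviewer capacity. The sketch below is illustrative only: the Policy names and DecisionType fields are invented, and the £100k cutoff is borrowed from the example above, not mandated by PSF or PAI-8.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Policy(Enum):
    AUTONOMOUS_WITH_SAMPLING = auto()  # run autonomously, sample a fraction for QA
    MANDATORY_REVIEW = auto()          # pause for a human every time

@dataclass
class DecisionType:
    name: str
    error_cost_gbp: float            # consequence of a wrong decision
    volume_per_day: int              # frequency
    reviewer_capacity_per_day: int   # how many the review team can genuinely assess

def classify(d: DecisionType, high_cost_gbp: float = 100_000) -> Policy:
    """Map a decision type to a review policy using the consequence/frequency taxonomy."""
    if d.error_cost_gbp >= high_cost_gbp:
        if d.volume_per_day > d.reviewer_capacity_per_day:
            # Mandatory review is infeasible at this volume; redesign the workflow
            # rather than accept click-through approval (see failure modes below).
            raise ValueError(f"{d.name}: review volume exceeds reviewer capacity")
        return Policy.MANDATORY_REVIEW
    return Policy.AUTONOMOUS_WITH_SAMPLING

# classify(DecisionType("contract amendment", 250_000, 12, 40)) -> Policy.MANDATORY_REVIEW
```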

In practice

A mortgage processing firm implements a tiered HITL architecture. Applications below £250k with clean credit histories are processed autonomously with 5% random sampling for quality review. Applications above £250k or with any anomaly flags are routed to an underwriter for review. All declined applications — regardless of value — require a human to confirm the decline before the decision is communicated. The review interface shows the agent's reasoning, the specific data points that triggered the decision, and a comparison against recent similar applications. Average human review time is 4 minutes per flagged application.
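
The firm's actual implementation is not public, but the tiering described above might be sketched as follows; the Route values, Application fields, and function names are assumptions for illustration.

```python
import random
from dataclasses import dataclass, field
from enum import Enum, auto

class Route(Enum):
    AUTO_PROCESS = auto()        # processed autonomously
    QUALITY_SAMPLE = auto()      # processed autonomously, queued for QA review
    UNDERWRITER_REVIEW = auto()  # routed to a human underwriter before proceeding
    CONFIRM_DECLINE = auto()     # a human must confirm before the decline is sent

@dataclass
class Application:
    amount_gbp: float
    clean_credit: bool
    anomaly_flags: list[str] = field(default_factory=list)
    agent_decision: str = "approve"  # "approve" or "decline"

SAMPLING_RATE = 0.05        # 5% random quality sampling on the autonomous tier
REVIEW_THRESHOLD = 250_000  # applications above this always see an underwriter

def route(app: Application) -> Route:
    # All declines require human confirmation, regardless of value.
    if app.agent_decision == "decline":
        return Route.CONFIRM_DECLINE
    if app.amount_gbp > REVIEW_THRESHOLD or app.anomaly_flags or not app.clean_credit:
        return Route.UNDERWRITER_REVIEW
    if random.random() < SAMPLING_RATE:
        return Route.QUALITY_SAMPLE
    return Route.AUTO_PROCESS
```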

Why it matters

Regulatory frameworks increasingly require human oversight for high-stakes AI decisions. But beyond compliance, HITL is sound risk management: it ensures that the consequences of agent errors are bounded, that humans remain meaningfully in control of consequential decisions, and that there is a human accountable for outcomes that matter. The absence of HITL in high-stakes workflows is one of the most common findings in production AI audits.

Framework alignment

PSF Domains
D6 · Human Oversight
D2 · Output Validation

PAI-8 Controls
C4 · Human Oversight
C1 · AI Governance Policy

Production failure modes

How this pattern fails in practice — and what to watch for.

Review bottleneck

The volume of agent outputs routed to human review exceeds reviewer capacity. A backlog builds. Agents queue their outputs waiting for approval. The business process the AI was meant to accelerate becomes slower than the manual process it replaced.
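
A simple backpressure check can surface this failure before the backlog becomes visible to the business. The sketch below assumes its inputs come from the HITL interaction log; the function name and SLA convention are illustrative.

```python
def backlog_alert(queued: int, arrivals_per_hour: float,
                  completions_per_hour: float, sla_hours: float) -> bool:
    """Return True if the review queue cannot clear within the SLA window."""
    if completions_per_hour <= arrivals_per_hour:
        # Reviews arrive at least as fast as they clear: the queue grows without bound.
        return queued > 0 or arrivals_per_hour > 0
    hours_to_drain = queued / (completions_per_hour - arrivals_per_hour)
    return hours_to_drain > sla_hours
```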

Approval fatigue

Reviewers receive hundreds of approval requests per day. After the first few, they stop reading and start approving. The HITL checkpoint becomes a formality. In an audit, the organisation can demonstrate a HITL architecture, but not that the human review was meaningful.
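
A crude fatigue heuristic, assuming the per-interaction log described in the checklist below exists: flag reviewers who approve nearly everything nearly instantly. The cutoffs are illustrative and need calibration per workflow.

```python
from statistics import median

def fatigue_suspects(reviews, min_reviews=50,
                     approval_cutoff=0.98, median_seconds_cutoff=10):
    """reviews: iterable of (reviewer_id, approved: bool, seconds_taken) tuples."""
    by_reviewer = {}
    for reviewer, approved, seconds in reviews:
        by_reviewer.setdefault(reviewer, []).append((approved, seconds))
    suspects = []
    for reviewer, rows in by_reviewer.items():
        if len(rows) < min_reviews:
            continue  # too little data to judge this reviewer
        approval_rate = sum(approved for approved, _ in rows) / len(rows)
        median_time = median(seconds for _, seconds in rows)
        if approval_rate >= approval_cutoff and median_time <= median_seconds_cutoff:
            suspects.append(reviewer)
    return suspects
```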

HITL bypass by redesign

Under pressure to accelerate processing, the HITL thresholds are gradually raised and the review interface is simplified. Over 18 months, what began as a meaningful oversight mechanism becomes a rarely triggered formality whose original purpose no one remembers.
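
One way to make this drift auditable is to version-control the approved thresholds and alert whenever the live configuration loosens relative to that baseline. The config keys below are invented for illustration, and the sketch assumes higher numeric values mean less human review.

```python
def threshold_drift(baseline: dict, live: dict) -> dict:
    """Return thresholds loosened relative to the explicitly approved baseline."""
    return {
        key: (baseline[key], live[key])
        for key in baseline
        if key in live and live[key] > baseline[key]
    }

# threshold_drift({"auto_process_limit_gbp": 250_000},
#                 {"auto_process_limit_gbp": 400_000})
# -> {"auto_process_limit_gbp": (250000, 400000)}
```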

Implementation checklist

Seven things to verify before deploying this pattern in production.

1. Define HITL thresholds based on consequence and frequency, not organisational politics.

2. Set an SLA for human review: what happens to a queued decision if no reviewer responds within the defined window?

3. Monitor approval rates: a very high approval rate may indicate fatigue, not quality.

4. Design review interfaces that show the agent's reasoning, the data, and relevant comparators, not just the output.

5. Log reviewer identity, decision, time taken, and any comments for every HITL interaction (a sketch of such a record follows this list).

6. Test what happens when no reviewer is available during the SLA window.

7. Review HITL threshold settings quarterly; thresholds should not drift upward without explicit decision.
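
For item 5, a minimal shape for the per-interaction audit record might look like the following; the field names are an assumption rather than a mandated schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class HitlRecord:
    """One immutable audit record per human review interaction."""
    decision_id: str       # the agent decision under review
    reviewer_id: str       # who reviewed it
    outcome: str           # e.g. "approved", "rejected", "escalated"
    seconds_taken: float   # wall-clock review time, feeds fatigue monitoring
    comments: str          # free-text reviewer notes (may be empty)
    reviewed_at: datetime  # when the human decision was recorded

def new_record(decision_id, reviewer_id, outcome, seconds_taken, comments=""):
    return HitlRecord(decision_id, reviewer_id, outcome, seconds_taken,
                      comments, datetime.now(timezone.utc))
```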

Certification relevance

Human-in-the-loop is the single most tested concept in the CAIG certification exam: it sits at the intersection of AI governance policy and technical implementation. AIDA tests HITL under D6. CAIAUD candidates are expected to identify HITL architectures that exist on paper but are ineffective in practice, particularly approval fatigue and HITL bypass scenarios.

Related patterns

Part 1 · Core Patterns
Reflection
An agent critiques and revises its own output before it reaches a human.
Part 1 · Core Patterns
Orchestration
A controlling agent that directs sub-agents, manages state, and decides when a task is complete.
Part 2 · Production Patterns
Safety Guardrails
The input and output filters that prevent agents from receiving or producing content they should not.