Production AI Institute — vendor-neutral certification for AI practitioners
Vertical Playbook · Financial Services · April 2026

Production AI in Financial Services
The PSF Playbook

Financial services is the sector where production AI safety matters most and moves fastest. The regulatory obligations are the most developed of any industry. The blast radius of a failure is measured in fines, licence conditions, and reputational damage that takes years to recover from. This playbook maps PSF requirements to the specific obligations financial services firms face.

Who this is for: AI engineers, MLOps practitioners, and governance teams at banks, insurers, asset managers, fintechs, and payment firms deploying LLM-based applications in regulated environments.
Disclaimer: This playbook provides technical implementation guidance, not legal advice. Consult qualified legal and compliance counsel for regulatory obligations specific to your firm and jurisdiction. CC BY 4.0.

Why Financial Services Is Different

Most industries deploying production AI face reputational risk if something goes wrong. Financial services firms face that — plus regulatory censure, civil liability, potential licence revocation, and personal accountability for named individuals under senior manager regimes. The regulatory environment also moves quickly: the EU AI Act, DORA, and various national guidance documents on model risk have entered into force, or are about to, at roughly the same time.

The good news: financial services has the most developed internal governance infrastructure of any sector. Model Risk Management (MRM) frameworks, change management processes, and audit trails already exist. The PSF maps well onto existing MRM practice — it is not a parallel system, but a specification of what "model risk management for LLMs" looks like in practice.

Regulatory Mapping

Key regulations and their primary PSF domain touchpoints:

Regulation | PSF Domains | Requirement summary
SR 11-7 (Fed/OCC) | D1, D6 | Model risk management requires documented validation, governance, and ongoing monitoring of all models, including AI/ML
MiFID II / ESMA | D2, D4 | Automated decisions in trading and advice must be explainable and auditable; firms must maintain records of algorithmic decisions
FCA SYSC / SMCR | D5, D6 | Senior Manager accountability applies to AI systems; firms need clear ownership of AI deployment decisions and outcomes
EU DORA | D4, D5, D8 | Digital operational resilience: ICT risk management, incident reporting, and third-party provider concentration risk
GDPR / UK GDPR | D3 | Personal data in prompts must be processed lawfully; data subjects have rights over automated decisions affecting them
EU AI Act (high risk) | D1–D8 | Credit scoring, insurance, and employment use cases trigger high-risk AI obligations, including conformity assessment
Basel III / BCBS 239 | D4 | Data aggregation and reporting accuracy requirements apply to AI outputs used in risk calculation

Domain-by-Domain: Financial Services Requirements

D1 · Input Governance

Critical

Financial services AI systems routinely receive inputs that include account numbers, transaction data, client PII, and potentially price-sensitive information. Every input path must be classified, sanitised, and audited. Prompt injection is a real threat in customer-facing applications — a compromised advisory bot could be manipulated to provide unsuitable advice or disclose confidential information.

REQUIRED ACTIONS
1. Classify all input channels by sensitivity level — retail customer inputs are different from internal analyst queries
2. Implement prompt injection detection on all customer-facing inputs (Guardrails AI or Azure Content Safety Prompt Shield)
3. Log all input classifications and validation decisions for audit purposes
4. Define and enforce token budgets to prevent prompt stuffing attacks (see the sketch after this list)
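
A minimal sketch of actions 1 and 4, assuming tiktoken for token counting. The channel names, budget values, and logging setup are illustrative assumptions, not prescribed values:

```python
# Input gate: reject inputs from unclassified channels and enforce a
# per-channel token budget. Channel names and budgets are illustrative.
import json
import logging

import tiktoken

logger = logging.getLogger("input_gate")
TOKEN_BUDGETS = {"retail_customer": 1_000, "internal_analyst": 8_000}
_ENC = tiktoken.get_encoding("cl100k_base")

def _audit(channel: str, decision: str, tokens: int) -> None:
    # Structured log line so every validation decision is queryable later.
    logger.info(json.dumps({"channel": channel, "decision": decision, "tokens": tokens}))

def gate_input(channel: str, text: str) -> str:
    if channel not in TOKEN_BUDGETS:
        raise ValueError(f"unclassified input channel: {channel}")
    tokens = len(_ENC.encode(text))
    if tokens > TOKEN_BUDGETS[channel]:
        # Oversized inputs are a prompt-stuffing signal: reject and record.
        _audit(channel, "rejected", tokens)
        raise ValueError(f"input exceeds {channel} budget ({tokens} tokens)")
    _audit(channel, "accepted", tokens)
    return text
```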

D2 · Output Validation

Critical

Under MiFID II, SR 11-7, and FCA SYSC, firms must be able to explain automated decisions. An AI output that is not validated against a defined schema is not explainable — it may be anything. For any output that influences a customer communication, a risk calculation, or a regulatory report, output validation is not optional.

REQUIRED ACTIONS
1. Define output schemas for every AI-generated output that flows to customers or regulators
2. Validate outputs against schemas before they leave the AI system — reject and escalate non-conforming outputs (sketched below)
3. For advice-generating systems, implement a secondary classifier that checks outputs for suitability red flags
4. Maintain a sample of validated outputs for model validation and audit review
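
A minimal sketch of actions 1 and 2, assuming Pydantic v2. The schema fields and the escalation print are illustrative stand-ins for a real output schema and alerting path:

```python
# Validate model output against a declared schema before it leaves the AI
# system; non-conforming outputs are rejected and escalated, never sent on.
from pydantic import BaseModel, Field, ValidationError

class CustomerReply(BaseModel):
    message: str = Field(min_length=1)
    contains_advice: bool                      # triggers suitability review
    confidence: float = Field(ge=0.0, le=1.0)

def validate_output(raw_json: str) -> CustomerReply | None:
    try:
        return CustomerReply.model_validate_json(raw_json)
    except ValidationError as exc:
        # Stand-in for real escalation (alerting / human review queue).
        print(f"ESCALATE: non-conforming output: {exc}")
        return None
```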

D3 · Data Protection

Critical

Financial data is among the most sensitive categories of personal data under GDPR. Prompt inputs that contain account details, credit information, or transaction histories must be handled as personal data throughout the AI pipeline. Many financial services firms have additional obligations under local data protection regimes beyond GDPR.

REQUIRED ACTIONS
1. Never include raw customer account numbers, transaction IDs, or personal identifiers in LLM prompts — pseudonymise or mask at the application layer before the AI sees the data (sketched below)
2. Implement strict data residency controls — confirm that your LLM provider processes data within approved regions
3. Document all personal data flows through AI systems for your ROPA (Record of Processing Activities)
4. Establish a process for data subject access requests that covers AI system outputs
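
A minimal masking sketch for action 1 using Microsoft Presidio, the D3 recommendation in the stack below. The entity list and replacement token are illustrative choices:

```python
# Mask personal identifiers before the prompt reaches any LLM.
# Assumes presidio-analyzer and presidio-anonymizer are installed;
# entity selection and the replacement token are illustrative.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def mask_pii(text: str) -> str:
    findings = analyzer.analyze(
        text=text,
        entities=["PERSON", "IBAN_CODE", "CREDIT_CARD", "PHONE_NUMBER"],
        language="en",
    )
    result = anonymizer.anonymize(
        text=text,
        analyzer_results=findings,
        operators={"DEFAULT": OperatorConfig("replace", {"new_value": "<MASKED>"})},
    )
    return result.text

print(mask_pii("Customer Jane Doe, IBAN DE89370400440532013000, called today."))
```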

D4 · Observability

Critical

DORA requires ICT incident detection and reporting within defined timeframes. BCBS 239 requires data quality in risk calculations. SR 11-7 requires ongoing monitoring of deployed models. All of these obligations require structured, queryable observability — not log files.

REQUIRED ACTIONS
1. Deploy Langfuse self-hosted or equivalent — trace data must not leave your approved infrastructure boundary
2. Implement alerting on output quality metrics — if error rates or low-confidence outputs increase, escalate to model risk management (sketched below)
3. Create a production monitoring dashboard that is reviewed by the model risk team weekly
4. Retain traces for a minimum of 5 years for FCA-regulated activities (longer for some MiFID II obligations)
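
A minimal sketch of the alerting in action 2: a rolling error-rate check over recent calls that escalates to model risk when a threshold is breached. The window size, threshold, and escalation print are illustrative assumptions:

```python
# Rolling error-rate monitor over the last N AI calls. Window size and
# threshold are illustrative sketch values, not recommendations.
from collections import deque

class ErrorRateMonitor:
    def __init__(self, window: int = 500, threshold: float = 0.05):
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return 1.0 - sum(self.outcomes) / len(self.outcomes)

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)
        # Alert only once the window is full, to avoid noisy cold starts.
        if len(self.outcomes) == self.outcomes.maxlen and self.error_rate() > self.threshold:
            # Stand-in for paging/ticketing into the model risk queue.
            print(f"ESCALATE to MRM: error rate {self.error_rate():.1%}")
```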

D5 · Deployment Safety

High

Financial services change management processes already require staged rollouts, rollback plans, and sign-off gates. The PSF D5 requirements map directly onto existing change management practice — the question is whether AI deployments are going through the same process as other system changes.

REQUIRED ACTIONS
1. AI system deployments must go through the same change management approval process as other production system changes
2. Implement canary deployment for any AI system with customer-facing output — never roll out to 100% of users simultaneously (sketched below)
3. Define rollback criteria and test the rollback procedure before production deployment
4. Conduct a pre-deployment model validation review for any model change — not just initial deployment
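
A minimal canary-routing sketch for action 2: a deterministic hash keeps each user in a stable bucket, so widening the rollout or rolling back is a single configuration change. The version labels and the 5% fraction are illustrative:

```python
# Deterministic canary routing: the same user always lands in the same
# bucket, and rollback means setting CANARY_FRACTION back to 0.0.
import hashlib

CANARY_FRACTION = 0.05          # illustrative; widen only as metrics hold
STABLE, CANARY = "model-v1", "model-v2"  # illustrative version labels

def route(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return CANARY if bucket < CANARY_FRACTION else STABLE
```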

D6 · Human Oversight

Critical

The FCA, PRA, and SEC are increasingly clear: senior manager accountability requires that a named individual can be identified as responsible for AI system behaviour. Human oversight is not just good practice — it is the mechanism by which regulatory accountability is maintained. For high-risk use cases (credit decisions, investment advice, fraud flags), human review is likely mandatory.

REQUIRED ACTIONS
1. Define and document the human oversight tier for every AI system: which outputs require human review before action?
2. For credit, insurance, and investment advice AI: mandatory human review of all decisions that adversely affect customers (sketched below)
3. Implement a challenge process — customers must be able to request human review of AI-generated decisions
4. Assign named accountability under SMCR or equivalent for every production AI system in scope
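
A minimal sketch of action 2: decisions flagged as adverse are queued for a named human reviewer instead of auto-executing. The Decision shape and in-memory queue are illustrative; a real system would persist the queue and notify the accountable owner:

```python
# Gate adverse decisions behind mandatory human review.
from dataclasses import dataclass

@dataclass
class Decision:
    customer_id: str
    outcome: str       # e.g. "approve" / "decline"
    adverse: bool      # does this adversely affect the customer?
    rationale: str     # needed for explainability and the challenge process

review_queue: list[Decision] = []  # stand-in for a persistent queue

def dispatch(decision: Decision) -> str:
    if decision.adverse:
        # Adverse outcomes never auto-execute: hold for a named reviewer.
        review_queue.append(decision)
        return "pending_human_review"
    return "auto_executed"
```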

D7 · Security

High

DORA explicitly addresses ICT security risks including third-party AI providers. Financial services firms must assess the security posture of AI providers as part of their third-party risk management framework.

REQUIRED ACTIONS
1. Include LLM provider security in your third-party risk assessment process — not just data processing agreements
2. Implement zero-trust access controls for AI system API endpoints — no anonymous access, all access logged (sketched below)
3. Conduct regular penetration testing of AI system input and output interfaces
4. For model fine-tuning: apply the same access controls to training data as to the production data the model will process
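
A minimal zero-trust sketch of action 2, assuming FastAPI. The static token set is a placeholder for a real identity provider or mTLS layer, not a production pattern:

```python
# Every AI endpoint call must authenticate, and every access is logged.
# VALID_TOKENS stands in for a real identity provider / credential store.
import logging

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
logger = logging.getLogger("ai_api_access")
VALID_TOKENS = {"example-rotated-token"}  # placeholder, not a real pattern

def require_auth(authorization: str = Header(...)) -> str:
    token = authorization.removeprefix("Bearer ")
    if token not in VALID_TOKENS:
        raise HTTPException(status_code=401, detail="unauthenticated")
    logger.info("AI endpoint access, token suffix=%s", token[-4:])  # log all access
    return token

@app.post("/v1/complete")
def complete(payload: dict, token: str = Depends(require_auth)) -> dict:
    # The LLM call would sit here, behind the authenticated boundary.
    return {"status": "ok"}
```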

D8 · Vendor Resilience

High

DORA requires documented ICT third-party risk management and concentration risk assessment. A financial services firm that is dependent on a single LLM provider for critical processes has a concentration risk that must be managed and reported.

REQUIRED ACTIONS
1. Document LLM provider dependency in your ICT third-party risk register
2. Define the maximum tolerable downtime for each AI system and confirm your LLM provider's SLA meets it
3. Implement fallback to a secondary model or degraded manual process for all critical AI workflows (sketched below)
4. Test the fallback procedure at least annually — an untested fallback is not a fallback
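
A minimal fallback sketch for action 3: try the primary provider, then the secondary, then raise a signal that routes the task to the degraded manual process. Both provider functions are hypothetical stand-ins for real client calls:

```python
# Primary -> secondary -> manual fallback for a critical AI workflow.
# call_primary / call_secondary are hypothetical stand-ins for your two
# provider clients; real code would add timeouts, retries, and logging.
class ManualFallbackRequired(Exception):
    """Both providers failed: route the task to the manual process."""

def call_primary(prompt: str) -> str:
    raise TimeoutError("simulated provider outage")   # stand-in client A

def call_secondary(prompt: str) -> str:
    return f"secondary model answer to: {prompt}"     # stand-in client B

def complete_with_fallback(prompt: str) -> str:
    for provider in (call_primary, call_secondary):
        try:
            return provider(prompt)
        except Exception:
            continue  # try the next provider in the chain
    raise ManualFallbackRequired(prompt)

print(complete_with_fallback("summarise today's exposure report"))
```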

Recommended Stack for Financial Services

A PSF-compliant baseline stack for a regulated financial services AI deployment:

Layer | Recommended tooling | Rationale
Framework | LangChain / LangGraph or Semantic Kernel (Azure-native) | Strongest observability and human oversight of the Python frameworks; SK for .NET/Azure environments
D1 Guardrails | Guardrails AI (self-hosted) or Azure Content Safety | Data residency compliance; Guardrails AI for flexibility, Azure CS for Azure-native + Prompt Shield
D3 PII | Microsoft Presidio (self-hosted) | Open source, self-hosted, supports 40+ PII entity types, integrates at the pipeline layer
D4 Observability | Langfuse (self-hosted) | Full trace data stays on your infrastructure; GDPR-compliant; EU hosting option available
D5 Deployment | Azure Container Apps or AWS ECS with deployment slots | Canary deployment, rollback, and staging gates via infrastructure rather than framework
D6 Oversight | LangGraph interrupt/resume or AutoGen UserProxy | Explicit HITL checkpoints; LangGraph for complex workflows, AutoGen for conversational oversight

Related

The Production Safety Framework · Agent Framework Comparison · Observability Tools Comparison · Guardrails Comparison · CPAP Certification

Related guides

→ D3 Data Protection deep dive (PII, GDPR)
→ EU AI Act — production AI practitioner guide
→ D6 Human Oversight for regulated industries
→ PSF-compliant stack recipes