Industry Playbook · Healthcare

Healthcare AI Deployment Playbook

A practitioner guide to deploying AI in clinical and healthcare settings — covering HIPAA, FDA AI/ML guidance, NHS clinical safety requirements, and a full PSF domain mapping. Healthcare has the highest regulatory burden and the highest patient harm potential of any AI deployment context.

This is not legal or clinical advice. Healthcare AI deployment involves complex regulatory obligations that vary by jurisdiction, use case, and risk classification. Engage qualified healthcare regulatory counsel before deploying AI in clinical settings.

Regulatory Landscape

Healthcare AI sits at the intersection of data privacy law, medical device regulation, and general AI governance frameworks. Unlike financial services — where the regulatory question is primarily about fairness and systemic risk — healthcare AI regulation is fundamentally about patient safety and direct harm prevention.

Framework | Jurisdiction | Primary focus | PSF domains
HIPAA | US | PHI privacy, security, breach notification for AI systems handling patient data | D3, D7
FDA AI/ML SaMD | US | Pre-market authorisation, performance monitoring, predetermined change control plans | D2, D5, D6
EU AI Act — High Risk | EU | Medical devices and clinical management systems are listed as high-risk AI systems | D1–D8 (all)
CMS Interoperability | US | API access to health data — AI systems consuming this data inherit the obligations | D3, D8
NHS AI Framework | UK | Clinical safety assessment (DCB 0129), algorithmic transparency, fairness | D2, D6
ISO 14971 | International | Risk analysis and control for medical AI — probability and severity of harm | D5, D6

The PHI Problem: Most Common AI Compliance Failure

The single most common AI compliance failure in healthcare is sending Protected Health Information (PHI) to a third-party LLM API without a signed Business Associate Agreement (BAA). This is a HIPAA violation with penalties up to $1.9M per violation category per year.

PHI includes: names, dates (except year), geographic data below state level, phone/fax numbers, email addresses, social security numbers, medical record numbers, account numbers, certificate numbers, URLs, IP addresses, device identifiers, biometric identifiers, full-face photographs, and any other unique identifier. The model does not need to store the data — transmitting it to a vendor without a BAA is itself an impermissible disclosure.

Mitigation: all major AI vendors (OpenAI, Anthropic, Google, Microsoft/Azure) offer BAAs for enterprise tiers. Sign one. Separately, implement PHI detection and redaction before sending to any external API — even under a BAA, sending only what is necessary is best practice (data minimisation).
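
As a starting point, here is a minimal redaction sketch using Microsoft Presidio, one of the tools named in the PSF-3 mapping below. The entity coverage, default anonymisation operators, and the example text are illustrative assumptions, not a certified de-identification pipeline.

```python
# Minimal PHI redaction sketch with Microsoft Presidio.
# pip install presidio-analyzer presidio-anonymizer
# (the default analyzer also needs a spaCy model, e.g. en_core_web_lg)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_phi(text: str) -> str:
    """Detect PHI entities and mask them before text leaves your boundary."""
    findings = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=findings).text

# Redact first, then call the external API, even when a BAA is in place.
safe_prompt = redact_phi("Patient John Smith, DOB 03/04/1961, reports chest pain.")
```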

PSF Domain Mapping for Healthcare

Every PSF domain applies in healthcare, but four (D1, D2, D3, D6) are elevated to Critical status due to direct patient harm potential. The following analysis maps each PSF domain to the healthcare context with specific regulatory touchpoints.

PSF-1 Input Governance

Critical

Clinical AI systems receive inputs from EHR systems, clinician free-text, and imaging pipelines. Prompt injection via malformed clinical notes is a documented attack vector. Schema validation on all HL7 FHIR inputs is non-negotiable.

  • Validate all EHR data against the FHIR R4/R5 schema before passing it to the model (see the validation sketch after this list)
  • Treat free-text clinical notes as untrusted input — sanitise before inclusion in prompts
  • Implement strict system prompt isolation; clinical context must not be manipulated by patient-submitted text
  • Rate-limit API access; abuse detection tuned for clinical access patterns
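
A minimal sketch of that validation step, assuming a local copy of fhir.schema.json, the JSON Schema that HL7 publishes with the FHIR R4 specification. Profile-level validation and terminology checks would sit on top of this in a real pipeline.

```python
import json
from jsonschema import ValidationError, validate

# Assumes fhir.schema.json (from the FHIR R4 downloads) is available locally.
with open("fhir.schema.json") as f:
    FHIR_R4_SCHEMA = json.load(f)

def require_valid_fhir(resource: dict) -> dict:
    """Fail closed: any EHR payload that is not schema-valid FHIR is
    rejected before it can reach a prompt."""
    try:
        validate(instance=resource, schema=FHIR_R4_SCHEMA)
    except ValidationError as exc:
        raise ValueError(f"Rejected non-conformant FHIR input: {exc.message}") from exc
    return resource
```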

PSF-2 Output Validation

Critical

Clinical decision support outputs must be validated before display. A hallucinated drug dosage or contraindication assessment can cause direct patient harm. Output contracts with medical coding validation (ICD-10, SNOMED CT) are required.

  • Implement output contracts for all clinical recommendations — structured JSON with validation schema
  • Validate clinical codes (ICD-10, SNOMED CT, LOINC) against authoritative terminologies
  • Set confidence thresholds — below threshold: require human review, do not surface recommendation
  • Never allow the model to generate dosing instructions in free-text without validation
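
One way to express such an output contract, sketched with Pydantic. The code list, field names, and 0.85 threshold are illustrative assumptions; production systems should resolve codes against an authoritative terminology service and tune the threshold against clinician override rates.

```python
from pydantic import BaseModel, Field, field_validator

# Illustrative stand-in for an authoritative terminology service.
KNOWN_ICD10 = {"E11.9", "I10", "J45.909"}

class ClinicalRecommendation(BaseModel):
    icd10_code: str
    rationale: str
    confidence: float = Field(ge=0.0, le=1.0)

    @field_validator("icd10_code")
    @classmethod
    def code_must_resolve(cls, v: str) -> str:
        if v not in KNOWN_ICD10:
            raise ValueError(f"unrecognised ICD-10 code: {v}")
        return v

SURFACING_THRESHOLD = 0.85  # illustrative; tune against override rates

def gate_output(raw_model_json: str) -> ClinicalRecommendation | None:
    """Parse, validate, and threshold a model output. None means route to
    human review; the recommendation is never surfaced as-is."""
    rec = ClinicalRecommendation.model_validate_json(raw_model_json)
    return rec if rec.confidence >= SURFACING_THRESHOLD else None
```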

PSF-3 Data Protection

Critical — HIPAA

PHI (Protected Health Information) is the most heavily regulated personal data category. Sending PHI to a third-party LLM API without a BAA (Business Associate Agreement) is a HIPAA violation. This is the most common AI deployment compliance failure in healthcare.

  • Sign BAAs with every AI vendor receiving PHI — OpenAI, Anthropic, Google, Microsoft all offer these
  • Implement PHI detection and redaction before sending to any external API (use Microsoft Presidio or AWS Comprehend Medical)
  • Never log raw clinical inputs — PHI in logs is a breach
  • Audit trace retention policies — LangSmith, Langfuse retain prompts by default; configure data deletion
  • For imaging AI: strip DICOM metadata before external processing (a minimal sketch follows this list)
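
A minimal sketch of the DICOM stripping step using pydicom. The attribute list below is an illustrative subset; a production pipeline should implement a full de-identification profile such as DICOM PS3.15 Annex E.

```python
import pydicom

# Illustrative subset of PHI-bearing attributes, not a complete profile.
PHI_KEYWORDS = [
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "ReferringPhysicianName", "InstitutionName", "AccessionNumber",
]

def strip_dicom_phi(in_path: str, out_path: str) -> None:
    """Blank known PHI attributes and drop private tags before a study
    is sent for external processing."""
    ds = pydicom.dcmread(in_path)
    for keyword in PHI_KEYWORDS:
        if keyword in ds:
            ds.data_element(keyword).value = ""
    ds.remove_private_tags()  # private tags frequently carry identifiers
    ds.save_as(out_path)
```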

PSF-4 Observability

Required

FDA AI/ML guidance requires performance monitoring throughout the product lifecycle. Audit logs for all AI-assisted clinical decisions are required for post-market surveillance and incident investigation. Logs must be HIPAA-compliant — no PHI in observability data.

  • Log all AI recommendations with timestamps, confidence scores, and clinician actions
  • Implement drift detection — clinical AI degrades as patient population shifts
  • Configure HIPAA-safe logging: strip PHI from all trace data before storage
  • Retain audit logs for a minimum of 6 years (HIPAA), or 10 years for medical devices
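
One pattern for HIPAA-safe logging: run every record through the same redaction used at the API boundary (the redact_phi sketch in PSF-3 above), so PHI cannot leak into traces by accident.

```python
import logging

class PHIRedactionFilter(logging.Filter):
    """Scrub each record with the redact_phi function from the PSF-3
    sketch before it is handled or persisted."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact_phi(record.getMessage())
        record.args = ()  # args are already folded in by getMessage()
        return True

logger = logging.getLogger("clinical_ai")
logger.addFilter(PHIRedactionFilter())
logger.info("Recommendation shown for patient John Smith")  # stored redacted
```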

PSF-5 Deployment Safety

Required

Healthcare AI must have well-defined blast radius controls. Clinical decision support should operate at L2–L3 autonomy (recommendation with human approval) for high-stakes decisions. L4 autonomous action is only appropriate for clearly bounded, low-risk clinical tasks.

  • Define autonomy levels per clinical task: triage classification (L3) ≠ treatment recommendation (L2) ≠ order entry (never above L2)
  • Implement rollback procedures — ability to revert to previous model version within 4 hours
  • Staged deployment: shadow mode first, then limited cohort, then full deployment with monitoring
  • Document predetermined change control plan (PCCP) for FDA SaMD compliance
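
A sketch of encoding the per-task autonomy policy in code so it can be enforced and audited rather than living only in a document. The glosses for L0 and L1 and the task names are assumptions; the L2 to L4 assignments mirror the autonomy table later in this playbook.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    L0 = 0  # no AI involvement (gloss assumed)
    L1 = 1  # AI assists, human performs the task (gloss assumed)
    L2 = 2  # alert/suggestion only
    L3 = 3  # recommendation, mandatory human approval before action
    L4 = 4  # autonomous action with audit log

# Assignments mirror the autonomy table later in this playbook.
TASK_AUTONOMY: dict[str, Autonomy] = {
    "imaging_triage": Autonomy.L3,
    "note_summarisation": Autonomy.L3,
    "drug_interaction_check": Autonomy.L2,
    "appointment_reminder": Autonomy.L4,
}

def requires_human_approval(task: str) -> bool:
    """Fail closed: unknown tasks default to L0 and always need a human."""
    return TASK_AUTONOMY.get(task, Autonomy.L0) < Autonomy.L4
```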

PSF-6 Human Oversight

Critical — Patient Safety

Clinical decision support requires meaningful human oversight. 'Alert fatigue' is the primary failure mode — too many AI recommendations cause clinicians to override without reviewing. The oversight design must be calibrated to clinical workflow, not just regulatory compliance.

  • Design oversight for clinical context: busy clinician workflow ≠ IT operator dashboard
  • Implement tiered alerting: critical (immediate interrupt) vs. advisory (end of note review)
  • Blind sampling: regularly send AI recommendations for human review without the AI label
  • Track override rates by recommendation type — a high override rate signals low clinical trust and usually a model or threshold issue (measurement sketch after this list)
  • Escalation paths must go to supervising clinician, not just IT
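
A minimal sketch of the override-rate measurement from the audit log; the 0.9 flag level echoes the alert-fatigue discussion later in this playbook and is an assumption to calibrate locally.

```python
from collections import defaultdict

def override_rates(events: list[tuple[str, bool]]) -> dict[str, float]:
    """events: (recommendation_type, clinician_overrode) pairs drawn from
    the audit log. Returns the override rate per recommendation type."""
    tally: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for rec_type, overrode in events:
        tally[rec_type][0] += int(overrode)
        tally[rec_type][1] += 1
    return {t: over / total for t, (over, total) in tally.items()}

# A sustained rate above ~0.9 points to a miscalibrated surfacing
# threshold, not to non-compliant clinicians.
rates = override_rates([
    ("drug_interaction", True), ("drug_interaction", True),
    ("imaging_triage", False),
])
```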

PSF-7 Security

Required

Healthcare is consistently among the most targeted sectors for ransomware and data theft. AI systems are a new attack surface — model poisoning, adversarial clinical note injection, and API credential theft all have direct patient safety implications.

  • Treat AI API keys as PHI-equivalent credentials — store them in a secrets manager and rotate quarterly (example after this list)
  • Adversarial testing: red-team the clinical AI with adversarially crafted clinical notes
  • Monitor for model extraction attacks — unusual query patterns on clinical AI APIs
  • Zero-trust network architecture for AI API traffic
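
One concrete version of the credentials bullet, using AWS Secrets Manager via boto3 as an example backend. The secret name is hypothetical, and any equivalent secrets manager (Vault, GCP Secret Manager, Azure Key Vault) fits the same pattern.

```python
import boto3

def get_llm_api_key(secret_id: str = "clinical-ai/llm-api-key") -> str:
    """Fetch the credential at call time so nothing is baked into code,
    config files, or container images; rotation needs no redeploy."""
    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]
```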

PSF-8 Vendor Resilience

Required

Clinical workflows cannot tolerate AI vendor outages. NHS and hospital IT have experienced AI vendor failures causing clinical disruption. Fallback procedures must be clinically tested, not just technically documented.

  • Dual-vendor strategy for critical clinical AI paths
  • Graceful degradation: define which clinical decisions revert to manual process on AI failure
  • SLA requirements: 99.9% uptime minimum for clinical decision support; 99.99% for anything in the care pathway
  • Test the fallback: quarterly drill of 'AI is unavailable' clinical workflow
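
A skeleton of the dual-vendor, fail-to-manual pattern described above. The vendor callables are placeholders for real client wrappers, each with its own timeout and logging.

```python
from typing import Callable

class AIUnavailable(Exception):
    """All vendors failed: trigger the documented manual clinical process."""

def clinical_completion(prompt: str, vendors: list[Callable[[str], str]]) -> str:
    """Try vendors in priority order, primary first. Each callable is a
    placeholder for a real client wrapper with its own timeout."""
    for call in vendors:
        try:
            return call(prompt)
        except Exception:
            continue  # log the failure, then try the next vendor
    raise AIUnavailable("Revert to the manual workflow rehearsed in drills.")
```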

Clinical vs Administrative AI: Different Risk Profiles

Not all healthcare AI is equal. A chatbot answering FAQs about appointment booking has a fundamentally different risk profile from a clinical decision support tool suggesting diagnoses. Practitioners must be explicit about which category they are deploying.

Clinical AI (High Risk)

  • Clinical decision support
  • Diagnostic assistance
  • Treatment planning
  • Medication management
  • Risk stratification
  • Patient triage

Requires: BAA, FDA SaMD consideration, clinical safety assessment, full PSF compliance

Administrative AI (Lower Risk)

  • Appointment scheduling
  • Billing code suggestions
  • Prior authorisation drafts
  • Patient communication
  • Staff training content
  • Operational analytics

Still requires: BAA (if PHI involved), D3 compliance, human review for high-stakes outputs

Recommended Autonomy Levels by Clinical Task

PSF-5 defines five autonomy levels (L0–L4). In healthcare, L3 (recommendation with mandatory human approval before action) is the maximum appropriate level for most clinical AI tasks. L4 autonomous action should be limited to low-risk, well-bounded tasks with extensive validation history.
Level | Clinical task | Oversight requirement
L3 | Diagnostic imaging classification (abnormal/normal triage) | Human radiologist reviews all AI-flagged and a sample of AI-cleared cases
L3 | Clinical documentation assistance (note summarisation) | Clinician reviews and approves before saving to EHR
L2 | Drug interaction checking | Alert only — clinician makes all prescribing decisions
L3 | Patient risk stratification | Care team reviews risk scores before care plan changes
L4 | Appointment reminder messages | Low risk, well-bounded — autonomous with audit log
L3 | Prior authorisation content drafting | Clinical staff reviews before submission

Alert Fatigue: The Oversight Failure Mode Unique to Healthcare

Healthcare has a well-documented problem that is now directly relevant to AI deployment: alert fatigue. Studies show clinicians override up to 95% of drug interaction alerts — not because the alerts are wrong, but because there are too many of them. AI systems that generate too many recommendations, flags, or warnings will be systematically ignored.

This is a PSF-6 (Human Oversight) failure mode, but it manifests as a PSF-2 (Output Validation) design problem. The solution is calibrated confidence thresholds: only surface recommendations above a high confidence bar, and continuously tune the threshold based on clinician override rates. A 90%+ override rate is evidence your threshold is wrong, not evidence that clinicians are non-compliant.
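
A deliberately simple illustration of that feedback loop. The target override rate, step size, and bounds are assumptions that must be set clinically, not defaults to copy.

```python
def retune_threshold(threshold: float, override_rate: float,
                     target: float = 0.30, step: float = 0.02) -> float:
    """Nudge the surfacing threshold toward a target override rate.
    Too many overrides: raise the bar so only stronger recommendations
    surface. Too few: lower it cautiously."""
    if override_rate > target:
        return min(threshold + step, 0.99)
    return max(threshold - step, 0.50)
```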

Minimum Viable Healthcare AI Compliance Checklist

  • Legal — BAA signed with every AI vendor that receives PHI
  • Legal — Legal review: is this use case SaMD under FDA rules?
  • PSF-3 — PHI detection and redaction in the data pipeline before model calls
  • PSF-3 — Logging configured to strip PHI — no raw inputs in observability tools
  • PSF-1 — Input validation on all EHR/FHIR data entering the AI system
  • PSF-2 — Output contracts defined with structured schema for all clinical recommendations
  • PSF-2 — Confidence thresholds set — below threshold = human review, not surfaced to clinician
  • PSF-6 — Oversight design reviewed by clinical staff — not just IT
  • PSF-5 — Autonomy level documented per clinical task type
  • PSF-4 — Audit logging for all AI-assisted decisions, retained 6+ years
  • PSF-7 — AI API credentials in secrets manager, not in code or config files
  • PSF-8 — Fallback clinical procedure documented and tested for AI unavailability

Related guides

  • D3 Data Protection deep dive — PII masking and retention policies
  • D6 Human Oversight deep dive — HITL patterns for regulated environments
  • Financial Services AI Deployment Playbook
  • EU AI Act — production AI practitioner guide
  • PSF-compliant stack recipes