
Production AI Institute — PSF Domain Guide v1.0
Published: 2026-04-29 · License: CC BY 4.0
Domain: PSF-3 — Data Protection

Data Protection

Artificial intelligence dramatically amplifies the scale at which organisations process personal data. A model that summarises customer emails processes every word of every email. A system that classifies support tickets extracts meaning from what customers say about their experiences. PSF-3 addresses the data protection obligations that arise specifically from AI inference — not just the data governance obligations that existed before AI.

AI-Specific Data Protection Risks

Training data contamination

Personal data used to fine-tune or train models can be memorised and reproduced verbatim by the model, and data that should have been deleted may persist in model weights long after it is removed from source systems. The principal controls are differential privacy, data minimisation, and audits of fine-tuning data.
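As an illustration, a fine-tuning data audit can start as a scan of every candidate training record for PII patterns before it enters the training set. The sketch below is a minimal example: the two regexes and the `audit_records` helper are illustrative assumptions, not a complete PII detector.

```python
import re

# Illustrative PII patterns only; a production audit would use a dedicated
# PII detection library and locale-aware rules, not two regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b(?:\+?\d[\s-]?){9,14}\d\b"),
}

def audit_records(records):
    """Partition fine-tuning records into (clean, flagged).

    Flagged records carry the PII categories that matched, so they can be
    reviewed, redacted, or excluded before training begins.
    """
    clean, flagged = [], []
    for record in records:
        hits = [name for name, pat in PII_PATTERNS.items()
                if pat.search(record["text"])]
        (flagged if hits else clean).append({**record, "pii": hits})
    return clean, flagged

clean, flagged = audit_records([
    {"id": 1, "text": "Reset instructions sent."},
    {"id": 2, "text": "Contact jane.doe@example.com for a refund."},
])
print(f"{len(flagged)} record(s) flagged for review")  # -> 1 record(s) flagged
```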

Inference data exposure

When a system calls an external model API, every piece of data submitted for inference is sent to the provider's infrastructure. This creates data residency, consent, and confidentiality obligations that many organisations fail to assess before deploying cloud AI APIs.
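One way to operationalise this is a routing guard that refuses to send personal data to a provider region that fails the applicable residency rule. The sketch below is hypothetical: the region allowlist, the `select_endpoint` helper, and the fallback to a self-hosted model are assumptions for illustration.

```python
# Assumed residency requirement: personal data may only be processed in-region.
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}

def select_endpoint(contains_personal_data: bool, provider_region: str) -> str:
    """Route requests so personal data never reaches a non-compliant region."""
    if contains_personal_data and provider_region not in ALLOWED_REGIONS:
        # Fall back to self-hosted inference rather than export the data.
        return "local-model"
    return "external-api"

print(select_endpoint(True, "us-east-1"))   # -> local-model
print(select_endpoint(False, "us-east-1"))  # -> external-api
```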

Context window leakage

In multi-turn conversations, personal data from earlier turns persists in the context window and can be referenced in later responses. Context management must include explicit data minimisation — not passing personal data through context unnecessarily.
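A minimal sketch of that kind of context management follows: older turns are redacted and the window truncated before each new prompt is assembled, so personal data from early turns does not ride along in every later request. The `redact` and `build_prompt` helpers are illustrative assumptions, and the regex is a stand-in for a proper PII detector.

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def redact(text: str) -> str:
    """Stand-in PII scrubber; a real system would use a proper detector."""
    return EMAIL.sub("[EMAIL]", text)

def build_prompt(history: list[dict], new_message: str,
                 keep_last: int = 4) -> list[dict]:
    """Assemble a context window that minimises personal data.

    Only the most recent turns are kept, and each kept turn is redacted,
    so earlier personal data is not re-submitted with every request.
    """
    recent = history[-keep_last:]
    minimised = [{**turn, "content": redact(turn["content"])} for turn in recent]
    return minimised + [{"role": "user", "content": new_message}]

history = [
    {"role": "user", "content": "My email is jane.doe@example.com"},
    {"role": "assistant", "content": "Thanks, noted."},
]
print(build_prompt(history, "What did I ask earlier?"))  # email now [EMAIL]
```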

Logging and telemetry

Inference logs, prompt logs, and monitoring telemetry can contain personal data. These logs are often retained longer than the original data, processed by more systems, and have weaker access controls than primary data stores.
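One mitigation is to design the log record so it never holds raw personal data and carries its own expiry, making the retention schedule enforceable by the log store itself. The sketch below assumes a 30-day retention schedule and hashed identifiers; the field names are illustrative.

```python
import hashlib
import json
import time

LOG_RETENTION_SECONDS = 30 * 24 * 3600  # assumed 30-day retention schedule

def log_inference(user_id: str, prompt: str, response: str) -> dict:
    """Build an inference log record that avoids storing raw personal data.

    The prompt is stored only as a hash (enough to correlate duplicates and
    debug caching), the user id is pseudonymised (hashed, not anonymised),
    and the record carries an explicit expiry timestamp.
    """
    now = int(time.time())
    return {
        "user": hashlib.sha256(user_id.encode()).hexdigest()[:16],
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_chars": len(response),
        "ts": now,
        "expires_at": now + LOG_RETENTION_SECONDS,
    }

print(json.dumps(log_inference("u-123", "Summarise this email ...", "Done."),
                 indent=2))
```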

Embedding and vector store risk

Vector embeddings of personal data persist in vector databases. While embeddings are not directly reversible, recent research has demonstrated inversion attacks that recover portions of the embedded text. Personal data embedded for RAG therefore requires the same protection as the source data.
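In practice that means carrying the source document's access controls into the vector store and filtering on them at query time. A minimal sketch, assuming a toy in-memory store and group-based ACLs copied from the source documents:

```python
from dataclasses import dataclass, field

@dataclass
class EmbeddedChunk:
    vector: list[float]
    source_doc: str
    allowed_groups: set[str] = field(default_factory=set)  # copied from source ACL

def query(store: list[EmbeddedChunk], user_groups: set[str]) -> list[EmbeddedChunk]:
    """Filter by access control *before* similarity ranking, so a caller can
    never retrieve an embedding of a document they cannot read directly."""
    return [c for c in store if c.allowed_groups & user_groups]

store = [
    EmbeddedChunk([0.1, 0.2], "hr/salary-review.txt", {"hr"}),
    EmbeddedChunk([0.3, 0.4], "public/faq.txt", {"hr", "support"}),
]
print([c.source_doc for c in query(store, {"support"})])  # -> ['public/faq.txt']
```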

Cross-context data bleed

In shared inference infrastructure, data submitted by one user can influence outputs for others through caching, context pollution, or model state. Shared caches require tenant isolation controls.
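The simplest tenant isolation control for a response cache is to scope every cache key to the tenant, so identical prompts from different tenants can never collide. A minimal sketch:

```python
import hashlib

def cache_key(tenant_id: str, prompt: str) -> str:
    """Scope cache entries to a tenant so one tenant's cached output can
    never be served to another, even for byte-identical prompts.
    The NUL separator prevents ambiguity between tenant id and prompt."""
    return hashlib.sha256(f"{tenant_id}\x00{prompt}".encode()).hexdigest()

# Identical prompts from different tenants produce different keys.
assert cache_key("tenant-a", "summarise Q3") != cache_key("tenant-b", "summarise Q3")
```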

The Consent Chain Problem

Most organisations have consent for data collection and primary processing. Very few have explicit consent for AI inference on that data. Under GDPR and equivalent frameworks, using personal data for a materially different purpose — including AI inference that was not disclosed at collection time — requires either a new legal basis or a legitimate interests assessment. The consent chain for AI processing must be documented: what data, from what source, for what AI purpose, under what legal basis, with what retention period.
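One way to keep that documentation honest is to hold it in a machine-readable register that engineering and legal both review. The sketch below is an assumption about structure, not a prescribed format; the example values, including the LIA reference, are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsentChainEntry:
    """One documented link in the consent chain, mirroring the five elements
    named above: data, source, AI purpose, legal basis, retention."""
    data_category: str
    source: str
    ai_purpose: str
    legal_basis: str      # e.g. consent, contract, legitimate interests
    retention_days: int

REGISTER = [
    ConsentChainEntry(
        data_category="support ticket text",
        source="helpdesk intake form",
        ai_purpose="ticket classification via hosted LLM",
        legal_basis="legitimate interests (illustrative LIA ref: LIA-042)",
        retention_days=90,
    ),
]
```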

PII Handling at Inference Time

  • PII classification applied to all data before it enters an AI pipeline
  • PII redacted or tokenised before submission to external model APIs (see the tokenisation sketch after this list)
  • Inference logs purged or anonymised on a documented retention schedule
  • Vector embeddings of personal data subject to the same access controls as the source data
  • Context window management: personal data not persisted across sessions without consent
  • Data residency requirements checked against model provider API regions
  • Legal basis documented for each AI processing purpose
  • Third-party model provider data processing agreements reviewed and in place
  • User disclosure: AI processing disclosed in the privacy notice
  • Right to erasure: process defined for removing personal data from inference logs, vector stores, and (where applicable) fine-tuned model weights
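The redaction/tokenisation control from the checklist above might look like the following sketch: email addresses are swapped for opaque tokens before the text leaves your infrastructure, and the mapping is kept locally so tokens in the model's response can be re-expanded. The regex and the in-memory vault are simplifications; a real deployment would use a proper PII detector and a persistent token vault.

```python
import re
import uuid

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def tokenise(text: str, vault: dict) -> str:
    """Replace each email address with an opaque token before the text
    leaves our infrastructure; the vault keeps the token-to-value mapping
    locally so responses can be re-expanded on the way back."""
    def swap(match: re.Match) -> str:
        token = f"<PII:{uuid.uuid4().hex[:8]}>"
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(swap, text)

def detokenise(text: str, vault: dict) -> str:
    """Restore original values for any tokens echoed back by the model."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text

vault = {}
outbound = tokenise("Refund jane.doe@example.com by Friday.", vault)
print(outbound)                     # no raw email leaves the boundary
print(detokenise(outbound, vault))  # original restored after the API call
```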

Data Minimisation in AI Systems

Data minimisation — only processing the personal data necessary for the specified purpose — applies to AI systems with particular force. The temptation to provide rich context to improve model performance must be balanced against the principle that processing more data than necessary creates exposure without proportionate benefit. For each AI feature, document the minimum data required and enforce that minimum in the system architecture — not as a policy aspiration.
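Enforcing that minimum in the architecture can be as simple as an explicit per-feature field allowlist applied before any record reaches the model. The feature name and fields below are hypothetical:

```python
# Assumed per-feature allowlist: the documented minimum for a hypothetical
# "ticket triage" feature. Any field not listed is dropped before inference.
FEATURE_FIELDS = {
    "ticket_triage": {"subject", "body", "product"},
}

def minimise(feature: str, record: dict) -> dict:
    """Enforce the documented minimum in code, not as a policy aspiration."""
    allowed = FEATURE_FIELDS[feature]
    return {k: v for k, v in record.items() if k in allowed}

record = {"subject": "Login fails", "body": "...", "product": "app",
          "customer_email": "jane.doe@example.com", "billing_address": "..."}
print(minimise("ticket_triage", record))  # email and address never reach the model
```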

AIDA Exam Tips for PSF-3

  • PSF-3 questions often involve a GDPR or EU AI Act scenario. Know that using personal data for AI inference may require a new legal basis separate from the original collection consent.
  • PII redaction/tokenisation before external API calls is the canonical PSF-3 answer when data is leaving your infrastructure.
  • Data residency questions: the PSF-3 control is to check the model provider API's data processing region against applicable residency requirements before deployment.
  • Logging questions: inference logs containing personal data must be subject to retention schedules and access controls — not just application logs. This is a common exam gap.
  • Right to erasure scenarios: the exam tests whether you know that 'delete from the database' is insufficient if the data also exists in inference logs, vector stores, or fine-tuned weights.

Certifications that assess PSF-3

AIDA Examination · CAIG — AI Governance · CPAP Portfolio
Full PSF Framework · Study Guide · Practice Exam