Production AI Institute · Independent certification for production AI practice
Verify a credential|Contact

Insights / PSF / Domain Guide

Production AI Institute — PSF Domain Guide v1.0
Published: 2026-04-29 · License: CC BY 4.0
Domain: PSF-7 — Security
PSF-7

Security

AI systems are software systems, and all conventional software security requirements apply. But AI systems also introduce attack surfaces that are unique to machine learning: model extraction, membership inference, adversarial examples, and the ability to manipulate model behaviour through carefully crafted inputs. PSF-7 addresses both layers.

The AI-Specific Threat Surface

Model extraction attacks

An adversary queries your model repeatedly to reconstruct a functional equivalent. This is an intellectual property risk for proprietary models and a security risk if the reconstructed model is used to find adversarial inputs more efficiently.

Membership inference

An adversary determines whether a specific record was in the model's training data. This is a privacy risk when training data included personal data, and a confidentiality risk when it included proprietary business information.

Adversarial examples

Inputs specifically crafted to cause the model to produce a target incorrect output. In image classification, imperceptible pixel-level perturbations. In NLP, character-level substitutions that preserve human readability but change model predictions.

System prompt extraction

Adversarial inputs that cause the model to repeat or reveal its system prompt. System prompts often contain proprietary instructions, confidentiality requirements, or operational details that should not be disclosed to end users.

API credential exposure

AI API keys (OpenAI, Anthropic, Google, etc.) are high-value credentials. Exposure allows an adversary to run queries at your cost, access your fine-tuned models, and potentially exfiltrate logged data. Rotation schedules and secret management practices matter.

Supply chain attacks

AI systems depend on model providers, inference libraries, embedding models, and vector databases. A compromised dependency in any of these can affect model behaviour or expose data. Dependency provenance tracking is part of AI security.

AI Threat Modelling

Every production AI system should have a threat model that explicitly addresses the AI-specific attack surface. A conventional STRIDE threat model applied to an AI system will miss model extraction, membership inference, and adversarial input attacks because these do not map neatly to conventional threat categories. The threat model should be produced as part of the system design, updated when the system architecture changes, and reviewed when new attack techniques are published against similar systems.

API Key and Credential Management

AI API credentials are a concentrated risk. A single leaked key provides access to a model, potentially including fine-tuned versions, usage history, and (for some providers) data logged during inference. Best practices: use separate credentials per environment and per application, never embed credentials in source code or version-controlled configuration, store credentials in a secrets manager (not environment variables in process memory where avoidable), rotate credentials on a defined schedule, and monitor for unusual usage patterns that may indicate credential exposure.

PSF-7 Compliance Checklist

AI-specific threat model completed before production deployment
API credentials stored in a secrets manager (not hardcoded, not in environment variables in code repos)
Separate credentials per environment (development, staging, production)
Credential rotation schedule defined and enforced
Unusual API usage monitoring: alerts for off-hours spikes, geographic anomalies, cost anomalies
System prompt treated as a secret: not revealed in outputs, not stored in client-side code
Model access controls: only authorised systems and users can call inference endpoints
Rate limiting on all public-facing AI endpoints
Dependency provenance tracking for AI libraries and models
Penetration testing of AI-specific attack surface (prompt injection, extraction attempts) at least annually

Red-Teaming Production AI Systems

Red-teaming — structured adversarial testing by a team attempting to find exploitable failures — is the AI security equivalent of penetration testing. For AI systems, red-teaming exercises should include: systematic prompt injection attempts (direct and indirect), system prompt extraction attempts, jailbreak attempts across known attack pattern categories, model extraction rate measurement, and boundary-testing for each defined out-of-scope use case. Red-teaming findings should be documented, triaged, and tracked to remediation. This is specifically tested by the CAIS certification.

AIDA Exam Tips for PSF-7

  • PSF-7 covers AI-specific security, not general security. Questions that test whether you know prompt injection belongs to PSF-1 (input governance) vs. PSF-7 (security) are common — prompt injection defence at the input layer is PSF-1; the threat modelling and red-teaming that identifies it as a risk is PSF-7.
  • API credential questions are pure PSF-7. Know: separate credentials per environment, secrets manager storage, rotation schedules, usage monitoring.
  • System prompt exposure: the PSF-7 answer treats the system prompt as a secret and implements controls to prevent its disclosure in model outputs.
  • Model extraction: the PSF-7 control is rate limiting on public endpoints (making extraction expensive) combined with usage monitoring (detecting extraction attempts).
  • Supply chain: the PSF-7 angle on third-party AI dependencies is provenance and integrity verification — not just availability or performance.

Certifications that assess PSF-7

AIDA ExaminationCAIS — AI Safety SpecialistCPAP Portfolio
Full PSF FrameworkStudy GuidePractice Exam