CI/CD pipeline agents, event-driven ambient agents, and embedded product agents — with the full PSF Domain framework applied to each. Architecture, required controls, and the anti-patterns that will hurt you in production.
The PSF was designed for exactly this class of deployment: AI systems that take consequential actions in production environments, with inputs from external sources and outputs that affect real data, real communications, and real infrastructure. The Cursor SDK, in any of its three deployment contexts, is precisely this.
A Cursor SDK agent runs as a GitHub Actions step. On every pull request, it performs automated code review against your organisation's standards, generates missing test cases, scans for common security patterns, and posts a structured review comment — all before human review begins.
Validate the PR diff before passing to the agent: check file types are in scope, reject diffs that modify files outside the PR's declared scope, cap diff size at a defined token limit to prevent context flooding.
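The three checks above could be sketched as a single pre-flight function. This is a minimal illustration, not part of the Cursor SDK: the `DiffFile` shape, the allowed extensions, the token cap, the four-characters-per-token estimate, and the reading of "declared scope" as a list of paths are all assumptions.

```typescript
// Hypothetical types and limits; none of these names come from the Cursor SDK.
interface DiffFile {
  path: string;
  patch: string;
}

interface DiffCheck {
  ok: boolean;
  reasons: string[];
}

const ALLOWED_EXTENSIONS = [".ts", ".tsx", ".js", ".py", ".go"]; // assumed scope
const MAX_DIFF_TOKENS = 20000; // assumed cap; tune to your model and budget

// Rough heuristic of about 4 characters per token; a real tokenizer is more accurate.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function validateDiff(files: DiffFile[], declaredPaths: string[]): DiffCheck {
  const reasons: string[] = [];
  const declared = new Set(declaredPaths);
  let tokens = 0;

  for (const file of files) {
    if (!ALLOWED_EXTENSIONS.some((ext) => file.path.endsWith(ext))) {
      reasons.push(`file type out of scope: ${file.path}`);
    }
    if (!declared.has(file.path)) {
      reasons.push(`file outside declared PR scope: ${file.path}`);
    }
    tokens += estimateTokens(file.patch);
  }

  if (tokens > MAX_DIFF_TOKENS) {
    reasons.push(`diff too large: ~${tokens} tokens (cap ${MAX_DIFF_TOKENS})`);
  }
  return { ok: reasons.length === 0, reasons };
}
```

Collecting every reason, rather than failing on the first, gives the CI log a complete picture of why a diff was refused.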
Parse agent review output against a ReviewSchema before posting. Required fields: summary string, issues array (each with file/line/severity/description), verdict enum (approved/changes-requested/advisory). Reject malformed output rather than posting it.
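One way to enforce a ReviewSchema without extra dependencies is a hand-written type guard. The sketch below assumes the agent returns JSON; the field names follow the text, while the three severity levels are an assumption, since the text does not enumerate them.

```typescript
type Severity = "low" | "medium" | "high"; // assumed levels
type Verdict = "approved" | "changes-requested" | "advisory";

interface ReviewIssue {
  file: string;
  line: number;
  severity: Severity;
  description: string;
}

interface Review {
  summary: string;
  issues: ReviewIssue[];
  verdict: Verdict;
}

const SEVERITIES: readonly string[] = ["low", "medium", "high"];
const VERDICTS: readonly string[] = ["approved", "changes-requested", "advisory"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isIssue(v: unknown): v is ReviewIssue {
  if (!isRecord(v)) return false;
  const { file, line, severity, description } = v;
  return (
    typeof file === "string" &&
    typeof line === "number" && Number.isInteger(line) && line > 0 &&
    typeof severity === "string" && SEVERITIES.includes(severity) &&
    typeof description === "string"
  );
}

// Returns the parsed review, or null when the output is malformed and must not be posted.
function parseReview(raw: string): Review | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;
  const { summary, issues, verdict } = data;
  if (typeof summary !== "string") return null;
  if (!Array.isArray(issues) || !issues.every(isIssue)) return null;
  if (typeof verdict !== "string" || !VERDICTS.includes(verdict)) return null;
  return { summary, issues, verdict: verdict as Verdict };
}
```

A null result should stop the step from posting anything; the raw output can still be logged for later calibration.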
The agent context will contain source code. Ensure the agent's cloud execution environment does not have access to .env files, secrets, or files outside the PR diff. Pass only the diff to the agent context — not the full repository.
Log every agent run: PR ID, run duration, token consumption, review verdict, issue count. Alert on runs exceeding 3 minutes or 50k tokens. Retain logs for 90 days for audit and calibration.
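The logging fields and alert thresholds above could be captured in a small record type. This is a sketch under assumptions: the `RunLog` shape and function names are hypothetical, and shipping the log line to a real sink is left out.

```typescript
// Hypothetical log record; field names mirror the list in the text.
interface RunLog {
  prId: string;
  durationMs: number;
  tokens: number;
  verdict: string;
  issueCount: number;
}

const MAX_DURATION_MS = 3 * 60 * 1000; // 3 minutes, per the text
const MAX_TOKENS = 50000;              // 50k tokens, per the text

// Returns one alert message per threshold the run exceeded.
function alertsFor(run: RunLog): string[] {
  const alerts: string[] = [];
  if (run.durationMs > MAX_DURATION_MS) {
    alerts.push(`PR ${run.prId}: run took ${run.durationMs} ms`);
  }
  if (run.tokens > MAX_TOKENS) {
    alerts.push(`PR ${run.prId}: run used ${run.tokens} tokens`);
  }
  return alerts;
}

// Serialises the run as one structured log line, with any alerts attached.
function logRun(run: RunLog): string {
  return JSON.stringify({
    ...run,
    alerts: alertsFor(run),
    loggedAt: new Date().toISOString(),
  });
}
```

The 90-day retention is a property of the log store, not of this code, so it would be configured wherever these lines are shipped.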
Per-prompt runs provide natural blast-radius containment. Each PR triggers one bounded agent run. The agent cannot carry state between PRs. Use the SDK lifecycle to ensure runs are archived after completion.
The agent produces advisory output only. A human engineer must still review and approve the PR before merge. Never configure branch protection rules that allow agent approval to substitute for human review.
Run the agent under a service account with read-only repository access. The CI/CD step should not grant the agent write permissions to the main branch. Treat the agent's review output as untrusted until it has passed validation.
Implement a fallback path for Cursor SDK unavailability: the CI/CD step should be non-blocking on Cursor service outages. Configure the step to produce a skip result (not a failure) when the SDK is unreachable, and alert the team to run manual review.
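A non-blocking step might wrap the agent call so that a failure becomes a skip result plus an alert. In this sketch `runAgent` and `notifyTeam` are hypothetical callbacks standing in for the SDK call and your alerting hook. It also treats every thrown error as an outage, which is a simplification: a real step would distinguish service unavailability from bugs in its own code.

```typescript
type StepResult =
  | { status: "reviewed"; output: string }
  | { status: "skipped"; reason: string };

async function runReviewStep(
  runAgent: () => Promise<string>,       // hypothetical wrapper around the SDK call
  notifyTeam: (message: string) => void, // hypothetical alert hook
): Promise<StepResult> {
  try {
    const output = await runAgent();
    return { status: "reviewed", output };
  } catch (err) {
    // Convert the failure into a skip so the pipeline is not blocked.
    const reason = err instanceof Error ? err.message : String(err);
    notifyTeam(`Agent review skipped (${reason}); manual review required.`);
    return { status: "skipped", reason };
  }
}
```

The caller maps `skipped` to a neutral CI outcome rather than a failed check, so merges are gated by human review alone during an outage.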
Anti-pattern: giving the agent write access to commit changes directly. The agent must remain advisory: it can identify issues, but humans apply the changes.
Use GitHub Actions with the Cursor SDK TypeScript client. Set CURSOR_API_KEY from GitHub Secrets. Scope filesystem access to the checked-out PR diff directory only.
A Cursor agent is triggered by incoming support tickets. When a ticket arrives matching a defined category (e.g., billing dispute, integration error), the agent analyses the ticket, retrieves relevant account data via a read-only CRM MCP, drafts a structured response with supporting documentation links, and places it in a review queue — never sending directly.
This is the most critical domain for event-driven agents. Every incoming event is from an external, potentially untrusted source. Implement: event source authentication (verify webhook signatures), input classification (is this event within the agent's defined operational scope?), content sanitisation (strip HTML, normalise encoding, check for injection patterns in user-submitted text before passing to the agent).
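The content sanitisation step could look roughly like the following. The tag stripping is deliberately naive and the injection patterns are illustrative placeholders, not a vetted list; a production system would likely pair this with a maintained classifier. All names here are hypothetical.

```typescript
// Illustrative patterns only; these are assumptions, not a complete or vetted list.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now/i,
  /system prompt/i,
];

interface SanitisedInput {
  text: string;
  suspicious: boolean;
}

function sanitiseTicketText(raw: string): SanitisedInput {
  const text = raw
    .normalize("NFKC")        // normalise Unicode compatibility forms
    .replace(/<[^>]*>/g, " ") // strip HTML tags (naive; not a full HTML parser)
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "") // drop control characters
    .replace(/\s+/g, " ")     // collapse whitespace
    .trim();
  const suspicious = INJECTION_PATTERNS.some((pattern) => pattern.test(text));
  return { text, suspicious };
}
```

A ticket flagged as suspicious would be routed to human handling instead of being passed to the agent.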
Define an explicit output schema for every action the agent can take. For a response draft: ResponseDraft schema with required fields (subject, body, category, confidence, account_id). Reject drafts that don't conform. Validate that referenced documentation links exist before including them. Never pass raw agent output to a communication channel.
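A ResponseDraft check along these lines might look like the sketch below. The required fields come from the text; the 0 to 1 confidence range is an assumption, and link "existence" is approximated by membership in a set of known documentation URLs rather than a live HTTP check. The URL regex is also naive and would capture trailing punctuation.

```typescript
interface ResponseDraft {
  subject: string;
  body: string;
  category: string;
  confidence: number;
  account_id: string;
}

interface DraftResult {
  draft: ResponseDraft | null;
  errors: string[];
}

const REQUIRED_STRINGS = ["subject", "body", "category", "account_id"];

function validateDraft(value: unknown, knownDocUrls: Set<string>): DraftResult {
  if (typeof value !== "object" || value === null) {
    return { draft: null, errors: ["draft is not an object"] };
  }
  const v = value as Record<string, unknown>;
  const errors: string[] = [];

  for (const field of REQUIRED_STRINGS) {
    const fieldValue = v[field];
    if (typeof fieldValue !== "string" || fieldValue.trim() === "") {
      errors.push(`missing or empty field: ${field}`);
    }
  }

  // Assumed range: confidence is a probability-like score between 0 and 1.
  const confidence = v["confidence"];
  if (typeof confidence !== "number" || !Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    errors.push("confidence must be a number between 0 and 1");
  }

  // Every URL in the body must be a known documentation page.
  const body = v["body"];
  const urls = typeof body === "string" ? (body.match(/https?:\/\/[^\s)]+/g) ?? []) : [];
  for (const url of urls) {
    if (!knownDocUrls.has(url)) errors.push(`unknown documentation link: ${url}`);
  }

  if (errors.length > 0) return { draft: null, errors };
  return { draft: v as unknown as ResponseDraft, errors };
}
```

Only a non-null `draft` is placed in the review queue; rejected drafts are logged with their error list.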
The agent will process customer data from the support ticket and may retrieve account data via MCP. Ensure that the agent context contains only the data required for the specific task; that CRM MCP access is read-only with explicit field-level scope (do not grant access to payment data or full account history when only account status is needed); that agent runs are not retained longer than required; and that PII in agent logs is masked.
Log every event processed: ticket ID, classification result, agent run ID, run duration, token usage, output validation result, queue placement. Build a dashboard showing daily volume, classification distribution, draft acceptance rate (after human review), and reject rate. This data calibrates the agent and identifies drift over time.
Implement action budgets: if the agent attempts more than N MCP tool calls in a single run, abort and flag the run for review. Use per-prompt runs for each event. Define explicit category scope: if a ticket is not in the agent's defined categories, route it to human handling rather than passing it to the agent.
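An action budget can be as simple as a counter checked before each tool call. The sketch below is hypothetical: how tool calls are intercepted depends on your MCP client, so the calls are modelled here as plain functions.

```typescript
class ActionBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Call before each MCP tool call. Returns false once the budget is spent,
  // at which point the caller should abort the run and flag it for review.
  tryConsume(): boolean {
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }
}

type RunOutcome =
  | { status: "completed"; calls: number }
  | { status: "aborted"; calls: number };

// Executes tool calls in order until they finish or the budget runs out.
function runWithBudget(toolCalls: Array<() => void>, limit: number): RunOutcome {
  const budget = new ActionBudget(limit);
  let calls = 0;
  for (const call of toolCalls) {
    if (!budget.tryConsume()) return { status: "aborted", calls };
    call();
    calls += 1;
  }
  return { status: "completed", calls };
}
```

Because each event gets its own run, a fresh budget is created per run and nothing carries over between tickets.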
No communication leaves the organisation without human review. The draft review queue is not optional — it is the core control. If queue processing falls behind (SLA breach), escalate to human handling rather than allowing unsupervised agent sends. Track time-in-queue and alert on unusual patterns.
Treat every incoming event as potentially adversarial. A sophisticated attacker can craft a support ticket designed to manipulate the agent's response or exploit the MCP connection. Implement: input content classification before agent access, MCP read-only scope enforcement, output review before any external action. Rotate MCP connection credentials on a regular schedule.
All events must be handled — Cursor service availability cannot be a single point of failure for customer support. Maintain a fallback routing path that sends tickets directly to human queues when the SDK is unavailable. Monitor SDK availability and alert on sustained outages.
Anti-pattern: connecting the agent directly to a send-capable email or messaging MCP without a review queue. The agent must never be the final actor on communications.
Use a webhook receiver (e.g., Next.js API route or serverless function) to validate and enqueue events. The Cursor SDK call happens in a queue worker, not in the webhook handler. This separates event receipt from agent execution and gives you retry and backpressure controls.
A SaaS product embeds a Cursor agent as a 'Generate draft' feature. Users click a button in the UI, describe what they want, and the agent produces a structured output within the product — a report, a configuration file, a data pipeline definition. The output appears in an editable field before any action is taken.
User-submitted intent text is untrusted input. Validate: length limits, content classification (is this within the feature's intended scope?), injection pattern detection. Implement server-side validation — never rely on client-side validation alone. Rate-limit per user per time window. Log all inputs with user ID for audit.
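Per-user rate limiting could be sketched with a fixed-window counter. This version keeps state in memory, which only works for a single server process; a multi-instance deployment would need a shared store. The class name and parameters are hypothetical.

```typescript
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;       // maximum requests per window; assumed to be at least 1
    this.windowMs = windowMs; // window length in milliseconds
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    if (current === undefined || now - current.start >= this.windowMs) {
      // First request from this user, or the previous window has expired.
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more bookkeeping.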
Every agent output must be validated against the product's data schemas before rendering to the user or allowing application. For a config file generator: parse the output, validate schema conformance, check for required fields, validate value ranges. An agent-generated config that bypasses schema validation is a security risk.
Multi-tenant isolation is non-negotiable. The agent context must contain only the requesting user's data. Implement: tenant-scoped data access in your context assembly step, explicit verification that the assembled context contains no cross-tenant data, audit logging of what data was included in each agent context. GDPR and similar regulations apply to data processed by the agent.
Track per-user and per-feature agent usage: run count, token consumption, latency, output validation pass rate, user acceptance rate (did the user apply the draft or discard it?). Usage spikes may indicate abuse or unexpected feature adoption. Cost visibility per tenant is required for billing and capacity planning.
Implement per-user rate limits and monthly usage caps. Use per-prompt runs: embedded product agents should not maintain session state between user requests. Define explicit scope limits: the agent can access the user's data in the current workspace and nothing else. Implement hard limits on agent run duration (15 seconds for synchronous requests, 5 minutes for asynchronous ones).
The user is the human oversight layer. Never apply agent output without an explicit user confirmation step. Render output as an editable draft, not as an immediately applied change. For bulk operations (e.g., applying a config to multiple resources), require explicit confirmation per resource or a batched-confirmation UI with clear impact summary.
Embedded agents are a high-value attack target. Inputs come from users who may attempt to manipulate the agent to access other users' data or bypass product controls. Implement: server-side input validation, strict tenant isolation in context assembly, output schema validation before rendering, content security policy headers to prevent XSS if agent output is rendered as HTML.
The agent feature must degrade gracefully when Cursor is unavailable. Implement a feature flag that disables the agent UI and shows a 'currently unavailable' message rather than failing silently. Monitor SDK latency — if p95 latency exceeds your product's acceptable threshold, circuit-break to the fallback. Communicate planned maintenance to users proactively.
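The p95 circuit-break could be approximated with a rolling window of latency samples. This sketch uses the nearest-rank percentile and omits the half-open recovery state a production breaker would need; the class name, threshold, and window size are hypothetical.

```typescript
// Nearest-rank 95th percentile; returns 0 for an empty sample set.
function p95(samples: number[]): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((95 * sorted.length) / 100) - 1;
  return sorted[index] ?? 0;
}

class LatencyBreaker {
  private readonly samples: number[] = [];
  private readonly thresholdMs: number;
  private readonly windowSize: number;

  constructor(thresholdMs: number, windowSize: number) {
    this.thresholdMs = thresholdMs;
    this.windowSize = windowSize;
  }

  // Record the latency of one completed agent call, keeping only the latest window.
  record(latencyMs: number): void {
    this.samples.push(latencyMs);
    if (this.samples.length > this.windowSize) this.samples.shift();
  }

  // True when the feature should switch to its "currently unavailable" fallback.
  isOpen(): boolean {
    return (
      this.samples.length >= this.windowSize &&
      p95(this.samples) > this.thresholdMs
    );
  }
}
```

Requiring a full window before opening avoids tripping on the first slow request after a deploy.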
Anti-pattern: applying agent output to production data without a user confirmation step. The draft-and-confirm pattern is the minimum acceptable oversight model for embedded agents.
Build context assembly as a separate server-side function that explicitly constructs the agent's context from scoped data queries. This function is auditable, testable, and can be reviewed independently of the agent logic.
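A context assembly function of this kind might look like the following sketch. The `Row` shape and the injected `fetchRows` query are assumptions; the point is that the function re-verifies tenant ownership after the query instead of trusting it, and records which records were included for the audit log.

```typescript
// Hypothetical data shapes; your schema will differ.
interface Row {
  tenantId: string;
  id: string;
  content: string;
}

interface AgentContext {
  tenantId: string;
  documents: Row[];
  includedIds: string[]; // recorded for audit logging
}

function assembleContext(
  tenantId: string,
  fetchRows: (tenantId: string) => Row[], // hypothetical tenant-scoped query
): AgentContext {
  const rows = fetchRows(tenantId);

  // Defence in depth: verify the query result even though it should already be scoped.
  const foreign = rows.filter((row) => row.tenantId !== tenantId);
  if (foreign.length > 0) {
    throw new Error(`cross-tenant data in context: ${foreign.length} row(s)`);
  }

  return {
    tenantId,
    documents: rows,
    includedIds: rows.map((row) => row.id),
  };
}
```

Because the query is injected, the function can be unit-tested with a deliberately leaky query to confirm that the cross-tenant check fires.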
These requirements apply to all three patterns regardless of deployment context.
Never store API keys, database credentials, or OAuth tokens in files accessible to the agent. Use a secrets manager (AWS Secrets Manager, GitHub Secrets, Doppler, Vault) and inject credentials as environment variables at runtime. A Cursor agent operating on a codebase that contains .env files will process those secrets in its context.
Before connecting any MCP server to a Cursor SDK agent, document: what data the MCP server can read, what actions it can take, and whether those actions are reversible. Grant the minimum OAuth scope required. Separate read MCPs from write MCPs architecturally — an agent should not have write access to a service it only needs to read from.
Every Cursor SDK agent run produces a run ID. Log this ID with your request metadata at the point of invocation, and store it in your observability system. When an incident occurs, the run ID lets you retrieve the full action trace from the SDK's durable agent store.
The PSF framework is model-agnostic. The controls above apply regardless of whether your Cursor agent is running GPT-4o, Claude Sonnet, Gemini, or Cursor's own Composer model. The framework evaluates your deployment architecture and operational controls — not your model choice. Model changes are a D8 (Vendor Resilience) concern.
Before promoting a Cursor SDK agent to a production deployment, test: does input governance correctly reject out-of-scope inputs? Does output validation correctly reject malformed outputs? Does the human oversight gate work as designed? Is the fallback path operational? Run adversarial test cases: what happens if you pass a prompt injection attempt through the input layer?