Strategies for fitting the right information into the finite context an agent can process.
Every model has a context window — a maximum amount of text it can process at once. Context window management is the set of architectural strategies for ensuring that agents have the information they need within that constraint, without including information they should not.
Context window management operates at three levels. Selection: deciding which information to include in the context at all — instructions, conversation history, retrieved documents, tool outputs, and intermediate results all compete for the same space. Prioritisation: when the total desired context exceeds the window, defining which components get space and which are summarised or excluded. Sanitisation: ensuring that context does not contain PII or confidential information that was present in earlier turns but should not be visible in the current one. The key insight is that context management is a design decision with quality and safety implications: including the wrong information (too much history, irrelevant documents) degrades output quality; including prohibited information (PII from another user, confidential data in the wrong context) creates compliance and security incidents.
A healthcare documentation system manages context for agents that process clinical notes. The context budget is allocated as: 30% for current consultation notes, 25% for relevant medical history (retrieved via RAG, filtered to most relevant by recency and diagnosis relevance), 20% for clinical guidelines, 15% for instruction and output format specification, 10% for buffer. Before each agent call, a context assembly function selects, prioritises, and sanitises the components. PII not relevant to the current task is redacted from retrieved history. If the total assembled context exceeds 90% of the window, older history is summarised first.
Context window limits are not an engineering inconvenience — they are a design constraint that shapes what your agent can and cannot do. The agent that processes long documents, maintains conversation history, and retrieves external knowledge must make explicit trade-offs about what it can hold in mind at once. Making those trade-offs explicitly and systematically is context window management.
How this pattern fails in practice — and what to watch for.
The context assembly function silently truncates content when it exceeds the window limit. The model receives an incomplete context and has no way of knowing this. It produces a confident response based on incomplete information, and the truncation is not reflected in the output or logs.
In long contexts, the model's attention weights recent content more heavily than earlier content. The system instructions specified at the beginning of the context are effectively overridden by a long sequence of conversation history and retrieved documents. The agent behaves inconsistently with its specified instructions.
A multi-turn conversation progressively accumulates PII in context: a name in turn 1, an address in turn 5, an account number in turn 9. The output of turn 12 combines these into a response that constitutes a PII exposure incident — even though no single turn introduced the problem.
Seven things to verify before deploying this pattern in production.
Context window management is tested in AIDA under D3 (PII in context) and D2 (output quality implications of context truncation). CAIG examines the data governance implications: what information should never appear in context, and who defines this? CAIAUD auditors look for context management policies that are documented and enforced — particularly PII handling and truncation behaviour.
The AIDA certification covers all 21 agentic design patterns with a focus on deployment safety, governance, and the PSF. Free to attempt.