DSPy reframes LLM application development as a machine learning optimisation problem: instead of hand-crafting prompts, you define the input/output signature and let DSPy compile and optimise the prompting strategy automatically. It is the most intellectually distinctive framework in this series — and the one with the widest gap between its research elegance and production safety posture.
In every other framework, the developer writes the prompt. In DSPy, the developer writes a signature — a typed declaration of what goes in and what comes out — and DSPy compiles an optimised prompting strategy using a teleprompter (optimiser). The resulting program can outperform hand-crafted prompts significantly, particularly for complex multi-hop reasoning tasks.
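The declare-then-compile workflow can be sketched in plain Python. This is an illustrative stand-in, not DSPy's actual API: `call_lm` is a canned fake model, and `compile_prompt` is a toy teleprompter that searches over few-shot demo subsets and keeps whichever scores best on a metric, which is the core idea behind optimisers like BootstrapFewShot.

```python
import random

def call_lm(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns a canned answer.
    return "Paris"

# A "signature" is just a typed declaration of inputs and outputs;
# the developer never writes the prompt template by hand.
SIGNATURE = {"inputs": ["question"], "outputs": ["answer"]}

def render_prompt(signature, demos, question):
    # The compiler assembles the prompt from the signature plus demos.
    lines = [f"Question: {d['question']}\nAnswer: {d['answer']}" for d in demos]
    lines.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(lines)

def compile_prompt(signature, trainset, metric, candidates=4):
    # Teleprompter-style search: try random demo subsets and keep the
    # one that scores best on the metric over the training set.
    best_demos, best_score = [], -1.0
    for _ in range(candidates):
        demos = random.sample(trainset, k=min(2, len(trainset)))
        score = sum(
            metric(ex["answer"],
                   call_lm(render_prompt(signature, demos, ex["question"])))
            for ex in trainset
        )
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos

trainset = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Capital of Japan?", "answer": "Tokyo"},
]
exact_match = lambda gold, pred: float(gold == pred)
demos = compile_prompt(SIGNATURE, trainset, exact_match)
print(len(demos))
```

The point of the sketch is the division of labour: the developer supplies the signature, training examples, and a metric; the prompt itself is an optimisation artifact.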
This approach has real implications for PSF compliance. D2 (Output Validation) is genuinely strong because type enforcement is built into the framework's core abstractions. But D1 (Input Governance) and D7 (Security) are weak because DSPy was designed for optimisation research, not adversarial production environments.
DSPy's TypedPredictor and Assertions enforce output schemas at the framework level: if the model produces output that does not match the declared signature type, DSPy automatically retries with corrective feedback. This is the most rigorous D2 implementation of any framework assessed so far.
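The retry-with-corrective-feedback loop can be illustrated with a minimal sketch. The fake `lm` below returns malformed output on the first call and schema-conforming JSON on the second; `typed_predict` and the `Invoice` type are illustrative assumptions, not DSPy's internal machinery.

```python
import json
from dataclasses import dataclass

@dataclass
class Invoice:           # the declared output type (the "signature" type)
    total: float
    currency: str

_responses = iter([
    "total: 42 euros",                       # fails to parse -> retry
    '{"total": 42.0, "currency": "EUR"}',    # conforms to the schema
])

def lm(prompt: str) -> str:
    # Stand-in for a real LLM; yields the scripted responses above.
    return next(_responses)

def typed_predict(prompt: str, max_retries: int = 3) -> Invoice:
    feedback = ""
    for _ in range(max_retries):
        raw = lm(prompt + feedback)
        try:
            data = json.loads(raw)
            return Invoice(total=float(data["total"]),
                           currency=str(data["currency"]))
        except (ValueError, KeyError, TypeError) as err:
            # Retry, feeding the parse error back as corrective feedback.
            feedback = (f"\nYour last output was invalid ({err}). "
                        "Return JSON with keys 'total' and 'currency'.")
    raise RuntimeError("model never produced schema-conforming output")

result = typed_predict("Extract the invoice total as JSON.")
print(result)
```

The first response fails JSON parsing, the error is appended to the prompt, and the second attempt yields a valid `Invoice(total=42.0, currency='EUR')`. Pushing the validation error back into the prompt is what distinguishes this pattern from a plain retry loop.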
For applications where output correctness is critical — financial calculations, clinical summaries, structured data extraction — DSPy's type enforcement provides a level of output reliability that other frameworks only approximate through wrapper libraries.
DSPy's three Gap ratings (D1, D3, D7) share a common root: the framework was designed in a research context where the threat model assumes a cooperative user and a trusted environment. In production, neither assumption holds.
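In practice, closing the D1 gap means wrapping the compiled program in an input-governance layer of your own. The sketch below shows one such wrapper; the regex patterns, `screen_input`, and `guarded_call` are illustrative assumptions about what a deployment might add, not anything DSPy provides.

```python
import re

# Naive deny-list screen; a real deployment would use a proper
# injection classifier, but the wrapper shape is the point.
INJECTION_PATTERNS = [
    r"ignore (all |any )?previous instructions",
    r"reveal (the |your )?system prompt",
]

def screen_input(user_text: str) -> None:
    lowered = user_text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError(f"input rejected: matched {pattern!r}")

def guarded_call(program, user_text: str):
    screen_input(user_text)      # D1: reject obvious injection attempts
    return program(user_text)    # compiled program sees only screened input

echo_program = lambda text: text.upper()   # stand-in for a compiled module
print(guarded_call(echo_program, "summarise this report"))
```

A benign input passes through, while "ignore all previous instructions" raises before the model is ever called. The framework optimises the prompt; the deployment still owns the threat model.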
The AIDA examination tests applied PSF knowledge across all eight domains — exactly the gaps and strengths covered in this assessment. 15 minutes. No charge. Ever.