Threat Model¶

Key References

OWASP Agentic AI Top-10 — OWASP Foundation (2024). genai.owasp.org
LLM Agents Safety Survey — Gu, T. et al. (2024). Agents Under Siege: Breaking and Fixing LLM-Based Multi-Agent Systems. arXiv:2503.10434. arxiv.org/pdf/2504.00218
AgentDojo Benchmark — Debenedetti, E. et al. (2024). arXiv:2406.13352. arxiv.org/abs/2406.13352

Standard LLM safety red-teaming targets single-turn refusal. Agentic systems are fundamentally different:

Multi-step planning: An attack that fails in one turn can succeed across three or more carefully constructed turns.
Tool execution: Models don't just speak — they act. A successful jailbreak can read files, execute code, or exfiltrate data.
Memory and context: Long-horizon context windows allow adversaries to plant instructions that activate later in the conversation.
Agent-to-agent interaction: Multi-agent pipelines create impersonation and cross-agent injection surfaces.

This framework evaluates all of these surfaces systematically.

Threat Surface Map¶

flowchart TD
    USER[User / Attacker] -->|crafted prompt| PDEF[Prompt Defense Layer]
    PDEF -->|blocked| B1[🚫 Prompt Rejected]
    PDEF -->|pass| PLANNER[Attack / Agentic Planner]

    PLANNER --> TARGET[Target LLM]
    TARGET --> RESP[Model Response]
    RESP --> RDEF[Response Defense Layer]
    RDEF -->|blocked| B2[🚫 Response Rejected]
    RDEF -->|pass| TOOL_DEC{Tool Call?}

    TOOL_DEC -->|no| OUTPUT[Final Output]
    TOOL_DEC -->|yes| TDEF[Tool Policy Check]
    TDEF -->|blocked| B3[🚫 Tool Blocked]
    TDEF -->|pass| EXEC[Sandbox Tool Execution]
    EXEC --> OUTPUT

OWASP Agentic AI Top-10 Taxonomy¶

This framework maps all evaluation goals to the OWASP Agentic AI Top-10 vulnerability taxonomy:

ID	Category	Description
AAI-01	Broken Object-Level Access Control	Unauthorized access to files, memory, or external resources
AAI-02	Agent Impersonation	Attacker masquerades as a trusted agent or orchestrator
AAI-03	Prompt / Instruction Injection	Malicious instructions injected via tools, context, or memory
AAI-04	Overly Permissive Tool Use	Model invokes tools beyond appropriate scope
AAI-05	Memory Poisoning	Persistent memory stores manipulated to influence future decisions
AAI-06	Multi-Agent Exploitation	Cross-agent manipulation in orchestrated pipelines
AAI-07	Data Exfiltration	Sensitive data routed to attacker-controlled endpoints
AAI-08	Resource Abuse	Excessive CPU, token, or API resource consumption
AAI-09	Supply Chain Attack	Compromised tool or dependency exploited in execution
AAI-10	Trust Boundary Violation	Policies circumvented across isolation boundaries

→ OWASP AAI Top-10 detailed reference
→ Full attack surface breakdown

Defense Design Principle¶

Defenses in this framework are composable and fail-safe:

Prompt defenses execute first — block before the model ever sees the input
Response defenses execute after generation — catch harmful outputs before tool dispatch
Tool policy checks gate every tool invocation — last line of defense before execution

Defenses are activated in deterministic registry order and each contributes to the Defense Bypass Rate (DBR) metric.

→ Defense implementations →