Skip to content

Threat Model

Key References

  • OWASP Agentic AI Top-10 — OWASP Foundation (2024). genai.owasp.org
  • LLM Agents Safety Survey — Gu, T. et al. (2024). Agents Under Siege: Breaking and Fixing LLM-Based Multi-Agent Systems. arXiv:2503.10434. arxiv.org/pdf/2504.00218
  • AgentDojo Benchmark — Debenedetti, E. et al. (2024). arXiv:2406.13352. arxiv.org/abs/2406.13352

Standard LLM safety red-teaming targets single-turn refusal. Agentic systems are fundamentally different:

  • Multi-step planning: An attack that fails in one turn can succeed across three or more carefully constructed turns.
  • Tool execution: Models don't just speak — they act. A successful jailbreak can read files, execute code, or exfiltrate data.
  • Memory and context: Long-horizon context windows allow adversaries to plant instructions that activate later in the conversation.
  • Agent-to-agent interaction: Multi-agent pipelines create impersonation and cross-agent injection surfaces.

This framework evaluates all of these surfaces systematically.

Threat Surface Map

flowchart TD
    USER[User / Attacker] -->|crafted prompt| PDEF[Prompt Defense Layer]
    PDEF -->|blocked| B1[🚫 Prompt Rejected]
    PDEF -->|pass| PLANNER[Attack / Agentic Planner]

    PLANNER --> TARGET[Target LLM]
    TARGET --> RESP[Model Response]
    RESP --> RDEF[Response Defense Layer]
    RDEF -->|blocked| B2[🚫 Response Rejected]
    RDEF -->|pass| TOOL_DEC{Tool Call?}

    TOOL_DEC -->|no| OUTPUT[Final Output]
    TOOL_DEC -->|yes| TDEF[Tool Policy Check]
    TDEF -->|blocked| B3[🚫 Tool Blocked]
    TDEF -->|pass| EXEC[Sandbox Tool Execution]
    EXEC --> OUTPUT

OWASP Agentic AI Top-10 Taxonomy

This framework maps all evaluation goals to the OWASP Agentic AI Top-10 vulnerability taxonomy:

ID Category Description
AAI-01 Broken Object-Level Access Control Unauthorized access to files, memory, or external resources
AAI-02 Agent Impersonation Attacker masquerades as a trusted agent or orchestrator
AAI-03 Prompt / Instruction Injection Malicious instructions injected via tools, context, or memory
AAI-04 Overly Permissive Tool Use Model invokes tools beyond appropriate scope
AAI-05 Memory Poisoning Persistent memory stores manipulated to influence future decisions
AAI-06 Multi-Agent Exploitation Cross-agent manipulation in orchestrated pipelines
AAI-07 Data Exfiltration Sensitive data routed to attacker-controlled endpoints
AAI-08 Resource Abuse Excessive CPU, token, or API resource consumption
AAI-09 Supply Chain Attack Compromised tool or dependency exploited in execution
AAI-10 Trust Boundary Violation Policies circumvented across isolation boundaries

OWASP AAI Top-10 detailed reference
Full attack surface breakdown

Defense Design Principle

Defenses in this framework are composable and fail-safe:

  1. Prompt defenses execute first — block before the model ever sees the input
  2. Response defenses execute after generation — catch harmful outputs before tool dispatch
  3. Tool policy checks gate every tool invocation — last line of defense before execution

Defenses are activated in deterministic registry order and each contributes to the Defense Bypass Rate (DBR) metric.

Defense implementations →