Agentic Safety Evaluation Framework¶
Research-first documentation
This site foregrounds the threat model, attack taxonomy, defense mechanisms, and benchmark results before operational setup. If you just want to run an experiment, jump to Quickstart.
What This Framework Evaluates¶
Agentic LLMs are qualitatively different from single-turn chat models. They plan across many steps, call external tools, browse the web, execute code, and interact with other agents. A request that a chat model safely refuses in one turn may succeed after three carefully crafted turns with tool-use context.
This framework provides a repeatable evaluation harness that tests jailbreak attacks across the full agentic pipeline — from initial prompt to tool execution.
Navigation¶
| Section | What you'll find |
|---|---|
| 🗺️ Threat Model | OWASP Agentic AI Top-10 taxonomy, full attack surface analysis |
| ⚔️ Attacks | PAIR, Crescendo, Prompt Fusion, and Hybrid method documentation |
| 🛡️ Defenses | JBShield, Gradient Cuff, Progent, StepShield — how each works |
| 📊 Evaluation | Benchmark methodology, metrics (MIR/TIR/DBR/QTJ), leaderboard |
| 🌐 Providers | Cloud, local, and HPC provider setup |
| ⚡ Getting Started | Environment setup, install, and first run |
| 🏗️ Architecture | System wiring, execution flows, threat-defense model |
| 🚀 Deployment | GitHub Pages, Hugging Face Space, experiment scale-out |
Mini-Benchmark Results at a Glance¶
Strict PAIR attack · No defenses · 4-model core set · Consistent Llama-3.3-70B judge

| Model | MIR | QTJ |
|---|---|---|
| Llama-3.3-70B | 83.7% | ~3.0 |
| DeepSeek-R1-70B | 83.2% | ~3.0 |
| DeepSeek-R1-14B | 75.4% | ~2.6 |
| DeepSeek-V3.2 | 66.0% | ~2.2 |
→ Full evaluation methodology and per-category breakdown
Responsible Use¶
This framework is designed for security research and safety evaluation in controlled environments. Access to target models and tools should be isolated to prevent actual harm during testing. We encourage responsible disclosure of any vulnerabilities discovered using these tools.
Core External Links¶
- 🤗 Live Space — interactive frontend and results API
- 🤗 Results Dataset — raw experiment output
- 🐙 GitHub Repository — source code