Hybrid Attack Orchestration
Underlying Methods
The hybrid loop combines:
- PAIR — Chao et al. (2023). arXiv:2310.08419. arxiv.org/abs/2310.08419
- Crescendo — Russinovich et al. (2024). arXiv:2404.01833. arxiv.org/abs/2404.01833
- GCG — Zou et al. (2023). arXiv:2307.15043. arxiv.org/abs/2307.15043
The Hybrid Loop (attacks/hybrid_loop.py) orchestrates multiple attack strategies in sequence, escalating from fast-and-cheap to slow-and-powerful.
Orchestration Order
- PAIR is tried first — fast, low query cost.
- If PAIR fails (judge score < threshold after max iterations), Crescendo is launched — multi-turn, higher cost.
- If both fail, baseline direct prompting is used as a fallback.
The first successful attack short-circuits the chain.
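The escalation and short-circuit behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the code in attacks/hybrid_loop.py: the names AttackOutcome, Strategy, and run_hybrid are hypothetical, and it assumes each strategy reports a numeric judge score that is compared against a single threshold.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class AttackOutcome:
    """Hypothetical result of one strategy; not the project's ExperimentRecord."""
    attack_name: str
    judge_score: float
    iterations: int
    queries: int


# Assumption: a strategy takes a goal string and returns its outcome.
Strategy = Callable[[str], AttackOutcome]


def run_hybrid(
    goal: str, strategies: Sequence[Strategy], threshold: float
) -> Tuple[AttackOutcome, bool]:
    """Try each strategy in order and stop at the first one that meets the threshold."""
    if not strategies:
        raise ValueError("at least one strategy is required")

    iterations = 0
    queries = 0
    for strategy in strategies:
        outcome = strategy(goal)
        # Costs accumulate across every strategy that was attempted.
        iterations += outcome.iterations
        queries += outcome.queries
        if outcome.judge_score >= threshold:
            break  # first success short-circuits the chain

    succeeded = outcome.judge_score >= threshold
    # The final outcome names the last strategy run and carries the summed costs.
    final = AttackOutcome(outcome.attack_name, outcome.judge_score, iterations, queries)
    return final, succeeded
```

Passing the strategies as an ordered sequence keeps the cheap-to-expensive ordering explicit at the call site, for example a PAIR callable, then a Crescendo callable, then a baseline callable.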
When to Use Hybrid
Use hybrid orchestration when:
- You want to measure which attack succeeds first per goal (for attack effectiveness comparison)
- You are running a full benchmark sweep and want maximum coverage
Configuration
Or in YAML config:
Record Fields
Each ExperimentRecord produced by the hybrid loop includes:
- attack_name: the attack that produced the final result
- iterations: total iterations across all attempted strategies
- queries: total queries to the target model
- fusion_strategy: if prompt fusion was involved, the fusion variant used
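For the effectiveness comparison mentioned earlier, one way to use these fields is to group records by attack_name and compare query cost. The sketch below assumes records have been exported as plain dictionaries keyed by the field names above; the summarize helper is hypothetical and not part of the project.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Group records by attack_name and report count and mean queries per group."""
    counts: Dict[str, int] = defaultdict(int)
    query_totals: Dict[str, int] = defaultdict(int)
    for record in records:
        name = str(record["attack_name"])
        counts[name] += 1
        query_totals[name] += int(record["queries"])
    return {
        name: {"count": counts[name], "mean_queries": query_totals[name] / counts[name]}
        for name in counts
    }
```

Because queries is a total across all attempted strategies, a record whose attack_name is a later strategy also includes the cost of the earlier strategies that failed before it.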