Crescendo Attack¶
Original Paper
Russinovich, M., Salem, A., & Eldan, R. (2024). Great, Now Write an Essay About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv:2404.01833. https://arxiv.org/abs/2404.01833
Crescendo is a multi-turn escalation attack that incrementally steers the target model from benign to harmful behavior across multiple conversation turns.
How Crescendo Works¶
Rather than making a single obviously-malicious request, Crescendo builds compliance step-by-step:
- Start with a near-benign opening that establishes a plausible roleplay or context.
- Each subsequent turn escalates slightly, staying within what the model just agreed to.
- By the time the full harmful goal is implied, the model has already committed to consistent compliance.
flowchart LR
T1["Turn 1\n(benign setup)"] --> T2["Turn 2\n(minor escalation)"]
T2 --> T3["Turn 3\n(moderate escalation)"]
T3 --> TN["Turn N\n(target behavior)"]
TN -->|judge score ≥ 8| SUCCESS[✅ Attack Success]
TN -->|judge score < 8| T1
Benchmark Results¶
Crescendo reaches very high MIR but requires significantly more queries than PAIR:
| Model | MIR (Crescendo) | Avg QTJ | vs PAIR QTJ |
|---|---|---|---|
| DeepSeek-R1-14B | ~97–100% | ~14 | 5–6× more queries |
| DeepSeek-V3.2 | ~88% | ~11 | ~5× more queries |
| DeepSeek-R1-70B | ~100% | ~11 | ~4× more queries |
Configuration¶
Implementation Notes¶
- Implemented in
attacks/crescendo.py - Each turn's attacker prompt is generated by the attacker LLM using the full conversation history
- Judge scores each final response; intermediate turns are not individually scored
- QTJ for Crescendo counts the total number of turns to reach jailbreak, not just attacker queries