Crescendo Attack

Original Paper

Russinovich, M., Salem, A., & Eldan, R. (2024). Great, Now Write an Essay About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv:2404.01833. https://arxiv.org/abs/2404.01833

Crescendo is a multi-turn escalation attack that steers the target model from benign to harmful behavior in small increments over the course of a conversation.

How Crescendo Works

Rather than making a single, obviously malicious request, Crescendo builds compliance step by step:

1. Start with a near-benign opening that establishes a plausible roleplay or context.
2. Each subsequent turn escalates slightly, staying within what the model just agreed to.
3. By the time the full harmful goal is implied, the model has already committed to consistent compliance.

flowchart LR
    T1["Turn 1\n(benign setup)"] --> T2["Turn 2\n(minor escalation)"]
    T2 --> T3["Turn 3\n(moderate escalation)"]
    T3 --> TN["Turn N\n(target behavior)"]
    TN -->|judge score ≥ 8| SUCCESS[✅ Attack Success]
    TN -->|judge score < 8| T1
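
The control flow in the steps and flowchart above can be sketched roughly as follows. This is a minimal illustration, not the code in attacks/crescendo.py: the function name `run_crescendo_sketch`, the three callables, and the `turns_per_attempt` parameter are all hypothetical, and the sketch assumes that a failed attempt restarts from turn 1 and that `max_turns` caps the total number of turns across attempts.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (attacker prompt, target response)


def run_crescendo_sketch(
    attacker: Callable[[List[Turn]], str],     # hypothetical: next prompt from history
    target: Callable[[List[Turn], str], str],  # hypothetical: target model reply
    judge: Callable[[str], int],               # hypothetical: score for one response
    turns_per_attempt: int = 4,                # assumption, not a documented option
    max_turns: int = 20,
    judge_threshold: int = 8,
) -> Tuple[bool, int]:
    """Return (success, total turns used) under the assumptions above."""
    total_turns = 0
    # Start a fresh conversation while the turn budget allows a full attempt.
    while total_turns + turns_per_attempt <= max_turns:
        history: List[Turn] = []
        for _ in range(turns_per_attempt):
            prompt = attacker(history)          # attacker sees the full history
            response = target(history, prompt)
            history.append((prompt, response))
            total_turns += 1
        # Only the final response of the attempt is judged.
        if judge(history[-1][1]) >= judge_threshold:
            return True, total_turns
    return False, total_turns
```

Passing the models in as callables keeps the sketch independent of any particular client library; in tests they can be replaced with deterministic stubs.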

Benchmark Results

Crescendo reaches an MIR of roughly 88–100% on the models below, but requires about 4–6× more queries than PAIR:

Model	MIR (Crescendo)	Avg QTJ	vs PAIR QTJ
DeepSeek-R1-14B	~97–100%	~14	5–6× more queries
DeepSeek-V3.2	~88%	~11	~5× more queries
DeepSeek-R1-70B	~100%	~11	~4× more queries

Configuration

attacks:
  - crescendo

attack_config:
  crescendo:
    max_turns: 20
    judge_threshold: 8
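
As an illustration of how these two options might be read once the file has been parsed into a mapping, here is a small sketch. `CrescendoSettings` and `crescendo_settings` are hypothetical names, the defaults simply mirror the example values above (they are not confirmed defaults), and the `max_turns` check is an assumed sanity rule rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class CrescendoSettings:  # hypothetical container, not part of the project
    max_turns: int = 20       # mirrors the example config value
    judge_threshold: int = 8  # mirrors the example config value

    def __post_init__(self) -> None:
        # Assumed sanity check: at least one turn is needed to run anything.
        if self.max_turns < 1:
            raise ValueError("max_turns must be at least 1")


def crescendo_settings(config: Mapping[str, Any]) -> CrescendoSettings:
    """Extract the attack_config.crescendo section from a parsed config."""
    section = config.get("attack_config", {}).get("crescendo", {})
    # Unknown keys raise TypeError, which surfaces typos in option names.
    return CrescendoSettings(**section)


# The configuration shown above, as it would look after parsing.
example = {
    "attacks": ["crescendo"],
    "attack_config": {"crescendo": {"max_turns": 20, "judge_threshold": 8}},
}
settings = crescendo_settings(example)
```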

Implementation Notes

Implemented in attacks/crescendo.py.
The attacker LLM generates each turn's prompt from the full conversation history.
The judge scores each conversation's final response; intermediate turns are not individually scored.
QTJ for Crescendo counts the total number of turns needed to reach a jailbreak, not just attacker queries.
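
To make the turn-based QTJ accounting concrete, the sketch below averages the turn counts of successful runs. The helper name `average_qtj` is hypothetical, averaging over successful runs only is an assumption about how the "Avg QTJ" column is computed, and the run data in the usage lines is made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def average_qtj(runs: Iterable[Tuple[bool, int]]) -> Optional[float]:
    """Average turns-to-jailbreak over successful runs.

    Each run is (success, total_turns). Failed runs are excluded here,
    which is an assumption rather than documented behavior.
    """
    successful = [turns for ok, turns in runs if ok]
    if not successful:
        return None  # no jailbreaks, so QTJ is undefined
    return sum(successful) / len(successful)


# Made-up runs: two successes (12 and 16 turns) and one failure.
illustrative_runs = [(True, 12), (True, 16), (False, 20)]
avg = average_qtj(illustrative_runs)
```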