Skip to content

Execution Flows

Attack mode flow

sequenceDiagram
    participant U as User
    participant R as run.py
    participant A as runner.attacks
    participant T as target model
    participant J as judge model
    participant M as metrics.collector

    U->>R: run.py --config ...
    R->>A: build_attack_runners(...)
    loop each goal
        A->>T: generate target response
        A->>J: score response
        A->>M: record outcome
    end
    M-->>U: CSV/JSON summary

Agentic mode flow

sequenceDiagram
    participant U as User
    participant R as run.py
    participant L as runner.agentic_loop
    participant T as target model
    participant S as AgenticSandbox
    participant X as tools
    participant M as metrics.collector

    U->>R: run.py --mode agentic
    R->>L: run_agentic_mode(...)
    loop until max_steps
        L->>T: chat with tool schema
        T-->>L: tool_call or final answer
        L->>S: execute_tool(name, args)
        S->>X: dispatch
        X-->>S: tool result
        S-->>L: observation
    end
    L->>M: record outcome and tool logs
    M-->>U: CSV/JSON summary

Defense checkpoints

  • Prompt-level filtering before model query.
  • Response-level filtering after target generation.
  • Optional tool-call checks in defense registry implementations.