Quickstart
0) Prerequisites
- OS: macOS or Linux (required for sandbox isolation)
- Python: 3.10+
- Isolation (optional): bubblewrap (bwrap) or Docker, needed only for tool-execution sandboxing.
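The prerequisites above can be checked with a short script before cloning. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes the sandbox tools are invoked as `bwrap` and `docker` on your `PATH`.

```python
import platform
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict of booleans describing which prerequisites are met.

    Hypothetical helper for illustration; not provided by the project.
    """
    system = platform.system()  # "Darwin" on macOS, "Linux" on Linux
    return {
        "os_supported": system in ("Darwin", "Linux"),
        "python_ok": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "bwrap_found": shutil.which("bwrap") is not None,
        "docker_found": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Only one of `bwrap_found` or `docker_found` needs to be true, and only if you plan to use sandboxing.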
1) Clone and set up the environment
# Clone the repository
git clone https://github.com/mohammedalaa40123/agentic_safety.git
cd agentic_safety
# Create and activate the Python environment
uv venv .venv
source .venv/bin/activate
uv pip install -e .
uv sync
Install server support if you plan to run the FastAPI backend:
Install documentation dependencies:
2) Set provider API keys
Export the keys required by your chosen model backend:
export OPENAI_API_KEY="..." # OpenAI models
export ANTHROPIC_API_KEY="..." # Claude models
export GEMINI_API_KEY="..." # Google Gemini (standard API)
export GENAI_STUDIO_API_KEY="..." # Google Vertex AI / GenAI Studio (RCAC)
export OLLAMA_CLOUD_API_KEY="..." # Hosted Ollama endpoint (e.g., https://ollama.com/api)
export WANDB_API_KEY="..." # Optional: only if wandb.enabled: true
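Before launching a run, it can help to confirm that the keys for your backend are actually visible to Python. The sketch below uses only the variable names from the export list above; the helper `missing_keys` is hypothetical and not part of the project, and which keys you need depends on the model backend you chose.

```python
import os

# Names copied from the export list above. WANDB_API_KEY is left out
# because it is optional (only needed when wandb.enabled is true).
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "GENAI_STUDIO_API_KEY",
    "OLLAMA_CLOUD_API_KEY",
]


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the environment.

    Hypothetical helper for illustration; pass `env` to check a plain
    dict instead of os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_keys(PROVIDER_KEYS):
        print(f"not set: {name}")
```

A key reported as not set is only a problem if your chosen backend requires it.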
3) Run a baseline smoke experiment
4) Run a sandboxed attack experiment
python run.py \
--config configs/eval_qwen_pair_attack.yaml \
--mode attack \
--goals data/agentic_scenarios_10_mixed.json \
--use-sandbox \
--use-defenses jbshield gradient_cuff \
--attack-plan pair crescendo baseline \
--output-dir results/demo \
--verbose
5) Run a server-backed evaluation
If you have built the frontend, the backend will serve the frontend/dist bundle.
6) Verify outputs
The configured output_dir contains:
- *.log: run logs
- results_*.csv: experiment records
- results_*.json: summary and detail exports
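One way to confirm that a run produced these files is to glob the output directory for each pattern. This is a rough sketch under the assumption that the files sit directly in the output directory rather than in subfolders; the helper `find_outputs` is hypothetical, and `results/demo` is simply the `--output-dir` value from the attack example.

```python
from pathlib import Path

# File patterns taken from the list above, keyed by what they hold.
EXPECTED_PATTERNS = {
    "run logs": "*.log",
    "experiment records": "results_*.csv",
    "summary and detail exports": "results_*.json",
}


def find_outputs(output_dir):
    """Map each output kind to the sorted file names matching its pattern.

    Hypothetical helper for illustration; only the top level of
    output_dir is searched.
    """
    root = Path(output_dir)
    return {
        label: sorted(path.name for path in root.glob(pattern))
        for label, pattern in EXPECTED_PATTERNS.items()
    }


if __name__ == "__main__":
    for label, files in find_outputs("results/demo").items():
        status = ", ".join(files) if files else "none found"
        print(f"{label}: {status}")
```

An empty list for any kind suggests the run did not finish or wrote to a different directory.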
7) Run tests
8) Preview docs locally
Then open http://127.0.0.1:8000.