Quickstart

0) Prerequisites

OS: macOS or Linux (required for sandbox isolation)
Python: 3.10+
Isolation (optional): bubblewrap (bwrap) or Docker; needed only if you enable tool-execution sandboxing.
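
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` and the report keys are hypothetical, and it assumes the sandbox tools are discoverable on `PATH` under the names `bwrap` and `docker`.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 10)                 # minimum version from the prerequisites list
SANDBOX_TOOLS = ("bwrap", "docker")  # optional isolation backends (assumed binary names)


def check_prerequisites(version=None, system=None, which=shutil.which):
    """Return a dict describing which prerequisites are met on this machine."""
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    return {
        "python_ok": version >= MIN_PYTHON,
        # platform.system() reports "Darwin" on macOS
        "os_ok": system in ("Darwin", "Linux"),
        # names of the sandbox tools found on PATH; an empty list means none
        "sandbox_tools": [tool for tool in SANDBOX_TOOLS if which(tool)],
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

An empty `sandbox_tools` list only matters if you intend to run with sandboxing enabled.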

1) Clone and set up the environment

# Clone the repository
git clone https://github.com/mohammedalaa40123/agentic_safety.git
cd agentic_safety

# Create and activate the Python environment
uv venv .venv
source .venv/bin/activate
uv pip install -e .
uv sync

Install server support if you plan to run the FastAPI backend:

# Quote the extra so shells such as zsh do not treat the brackets as a glob pattern
uv pip install -e ".[server]"

Install documentation dependencies:

uv pip install -r requirements-docs.txt

2) Set provider API keys

Export the keys required by your chosen model backend:

export OPENAI_API_KEY="..."            # OpenAI models
export ANTHROPIC_API_KEY="..."         # Claude models
export GEMINI_API_KEY="..."            # Google Gemini (standard API)
export GENAI_STUDIO_API_KEY="..."      # Google Vertex AI / GenAI Studio (RCAC)
export OLLAMA_CLOUD_API_KEY="..."      # Hosted Ollama endpoint (e.g., https://ollama.com/api)
export WANDB_API_KEY="..."             # Optional: only if wandb.enabled: true
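
To confirm which keys are visible to the Python process before launching a run, a small check along these lines may help. It is a sketch under assumptions: the helper name `configured_keys` is hypothetical, the variable names are copied from the export list above, and a value of `...` is treated as an unfilled placeholder.

```python
import os

# Variable names copied from the export list above; WANDB_API_KEY is optional
# and therefore left out of this check.
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "GENAI_STUDIO_API_KEY",
    "OLLAMA_CLOUD_API_KEY",
]


def configured_keys(env=None):
    """Return the provider key names set to a non-empty, non-placeholder value."""
    env = os.environ if env is None else env
    return [k for k in PROVIDER_KEYS if env.get(k, "").strip() not in ("", "...")]


if __name__ == "__main__":
    found = configured_keys()
    print("Configured provider keys:", ", ".join(found) or "none")
```

Only the key for the backend named in your config needs to appear in the output.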

3) Run a baseline smoke experiment

python run.py --config configs/eval_qwen_baseline.yaml --verbose

4) Run a sandboxed attack experiment

python run.py \
  --config configs/eval_qwen_pair_attack.yaml \
  --mode attack \
  --goals data/agentic_scenarios_10_mixed.json \
  --use-sandbox \
  --use-defenses jbshield gradient_cuff \
  --attack-plan pair crescendo baseline \
  --output-dir results/demo \
  --verbose
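
If you want to launch the same command from a script, for example to vary the defenses or attack plan, the argument list can be assembled in Python. This is an illustrative sketch: `build_attack_command` is a hypothetical helper, it uses only the flags shown in the command above, and it assumes each multi-valued flag takes its values as separate arguments, as in that command.

```python
import shlex


def build_attack_command(config, goals, output_dir,
                         defenses=("jbshield", "gradient_cuff"),
                         attacks=("pair", "crescendo", "baseline")):
    """Assemble the run.py argument list using only flags from the command above."""
    return [
        "python", "run.py",
        "--config", config,
        "--mode", "attack",
        "--goals", goals,
        "--use-sandbox",
        "--use-defenses", *defenses,   # one argv token per defense
        "--attack-plan", *attacks,     # one argv token per attack
        "--output-dir", output_dir,
        "--verbose",
    ]


if __name__ == "__main__":
    cmd = build_attack_command(
        "configs/eval_qwen_pair_attack.yaml",
        "data/agentic_scenarios_10_mixed.json",
        "results/demo",
    )
    # Print the shell-quoted command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```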

5) Run a server-backed evaluation

python -m uvicorn server.main:app --host 0.0.0.0 --port 7860

If you have built the frontend, the backend will serve the frontend/dist bundle.
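
To check that the server is accepting connections, you can probe the port from another terminal. The sketch below only tests TCP reachability on port 7860 (the port in the command above). It makes no assumption about which HTTP routes the backend exposes, and `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("server reachable:", port_is_open())
```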

6) Verify outputs

The configured output_dir contains:

*.log: run logs
results_*.csv: experiment records
results_*.json: summary and detail exports
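
The file patterns listed above can be checked programmatically after a run. This is a minimal sketch: the helper names are hypothetical, and it assumes the files sit directly in the output directory rather than in subdirectories.

```python
from pathlib import Path

# Glob patterns taken from the list above.
EXPECTED_PATTERNS = ("*.log", "results_*.csv", "results_*.json")


def find_outputs(output_dir):
    """Map each expected pattern to the sorted file names that match it."""
    root = Path(output_dir)
    if not root.is_dir():
        return {pattern: [] for pattern in EXPECTED_PATTERNS}
    return {
        pattern: sorted(f.name for f in root.glob(pattern))
        for pattern in EXPECTED_PATTERNS
    }


def missing_outputs(output_dir):
    """Return the patterns that have no matching file in output_dir."""
    return [p for p, names in find_outputs(output_dir).items() if not names]


if __name__ == "__main__":
    # results/demo is the --output-dir used in the attack example above.
    print("missing:", missing_outputs("results/demo"))
```

An empty list means at least one file of each kind was written.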

7) Run tests

pytest -q tests/

8) Preview docs locally

mkdocs serve

Then open http://127.0.0.1:8000.