# Local and Ollama Providers
## Running Ollama Locally
Install Ollama and pull a model:
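For example, on Linux (the install script and the `llama3.3:70b` tag below are illustrative and match the target model in the config that follows; on macOS, install the desktop app or use `brew install ollama`):

```bash
# Install the Ollama server/CLI (Linux install script).
curl -fsSL https://ollama.com/install.sh | sh

# Pull the weights for the model referenced as target_model below.
ollama pull llama3.3:70b
```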
Then reference the models in your config:
```yaml
models:
  target_model: ollama:llama3.3:70b
  attack_model: ollama:qwen3-coder:480b
  judge_model: ollama:nemotron-3-super
```
## Ollama Cloud Endpoint
For remote Ollama endpoints:
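A minimal sketch: the Ollama CLI and most client libraries resolve the server address from `OLLAMA_HOST` (default `http://localhost:11434`). The `OLLAMA_API_KEY` variable below is an assumption; check what your endpoint and framework expect for authentication.

```bash
# Point the Ollama client at a remote endpoint instead of localhost:11434.
export OLLAMA_HOST=https://ollama.example.com:11434

# Hypothetical: export whatever credential your remote endpoint requires.
export OLLAMA_API_KEY=...
```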
## Model Cache
Override the model cache directory:
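Ollama stores pulled weights under the directory named by `OLLAMA_MODELS` (typically `~/.ollama/models` by default). The path below is illustrative, and the server reads the variable at startup, so set it before launching `ollama serve`:

```bash
# Set the cache location before starting the server.
export OLLAMA_MODELS=/data/ollama/models
ollama serve &

# New pulls now land under /data/ollama/models.
ollama pull llama3.3:70b
```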
## Performance Notes
- Local models require sufficient VRAM/RAM. A 70B model such as DeepSeek-R1-70B needs roughly 140 GB at fp16 (70B params × 2 bytes) or about 40 GB with 4-bit quantization.
- CPU-only inference is very slow. Use MPS on Apple Silicon or CUDA on Linux with a GPU; a quick way to verify GPU placement is shown after this list.
- For latency-sensitive benchmarking, prefer cloud API providers.
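To confirm that a loaded model is actually running on the GPU rather than silently falling back to CPU, `ollama ps` reports the processor split for each resident model; the `nvidia-smi` line below assumes an NVIDIA GPU and is only one way to check available memory.

```bash
# Check total GPU memory before pulling a large model (NVIDIA only).
nvidia-smi --query-gpu=memory.total --format=csv,noheader

# After sending a request, verify the model is resident on the GPU:
# the PROCESSOR column should read "100% GPU", not "100% CPU".
ollama ps
```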
→ Back to Providers Overview