Agent Harness Explorer
Agent mode only: these results come from multi-turn agentic coding sessions (Claude CLI, Gemini CLI, opencode, Codex). For zero-shot API and self-repair results, see Benchmarks.
Browse by language, harness, and model. The cross-harness comparison shows what happens when the same underlying model runs through a different CLI.