Agent Harness Explorer

Agent mode only: these results come from multi-turn agentic coding sessions (Claude CLI, Gemini CLI, opencode, Codex). For 0-shot API and self-repair results, see Benchmarks.

Browse by language, harness, and model. The cross-harness comparison shows what happens when the same underlying model runs through a different CLI.