v0.23.0-iter5-par017-n1 OS-model smoke leaderboard

Auto-generated by ailang eval-publish v0.23.0-iter5-par017-n1.

Per-benchmark pass rate

Benchmark                  opencode-gemma4-26b-ailang
adt_option                 100% (n=1)
balanced_parens            100% (n=1)
binary_tree_sum            0% (n=1)
canonical_convergence      0% (n=1)
canonical_normalization    100% (n=1)
dense_operator_program     100% (n=1)
explicit_state_threading   0% (n=1)
fizzbuzz                   100% (n=1)
gcd_lcm                    100% (n=1)
immutable_data_structures  100% (n=1)
inline_tests               0% (n=1)
nested_records             100% (n=1)
numeric_modulo             100% (n=1)
record_update              100% (n=1)
records_book               100% (n=1)
recursion_fibonacci        100% (n=1)
type_safe_record_access    0% (n=1)

Trend deltas since v0.23.0-iter1-baseline

Benchmarks whose pass rate moved by at least 10.0 percentage points (pp) in either direction.

Benchmark                 Model                       v0.23.0-iter1-baseline  v0.23.0-iter5-par017-n1  Δ
adt_option                opencode-gemma4-26b-ailang  33% (n=3)               100% (n=1)               ▲ +66.7pp
balanced_parens           opencode-gemma4-26b-ailang  33% (n=3)               100% (n=1)               ▲ +66.7pp
binary_tree_sum           opencode-gemma4-26b-ailang  100% (n=3)              0% (n=1)                 ▼ -100.0pp
canonical_convergence     opencode-gemma4-26b-ailang  100% (n=3)              0% (n=1)                 ▼ -100.0pp
canonical_normalization   opencode-gemma4-26b-ailang  33% (n=3)               100% (n=1)               ▲ +66.7pp
dense_operator_program    opencode-gemma4-26b-ailang  0% (n=3)                100% (n=1)               ▲ +100.0pp
explicit_state_threading  opencode-gemma4-26b-ailang  67% (n=3)               0% (n=1)                 ▼ -66.7pp
fizzbuzz                  opencode-gemma4-26b-ailang  67% (n=3)               100% (n=1)               ▲ +33.3pp
inline_tests              opencode-gemma4-26b-ailang  100% (n=3)              0% (n=1)                 ▼ -100.0pp
record_update             opencode-gemma4-26b-ailang  67% (n=3)               100% (n=1)               ▲ +33.3pp
records_book              opencode-gemma4-26b-ailang  67% (n=3)               100% (n=1)               ▲ +33.3pp
type_safe_record_access   opencode-gemma4-26b-ailang  100% (n=3)              0% (n=1)                 ▼ -100.0pp
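
The deltas above can be reproduced by hand. The following Python sketch is an illustration only, not the code that ailang eval-publish runs: the function names are hypothetical, and the pass counts (1 of 3 for a 33% baseline, 1 of 1 for a 100% current run) are inferred from the rounded percentages and trial counts in the table.

```python
# Hypothetical helpers; the real eval-publish implementation may differ.
THRESHOLD_PP = 10.0  # reporting threshold stated in this section


def pass_rate(passes: int, trials: int) -> float:
    """Return the pass rate as a percentage of trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 100.0 * passes / trials


def delta_pp(baseline: float, current: float) -> float:
    """Return the change in pass rate, in percentage points."""
    return current - baseline


def moved(baseline: float, current: float,
          threshold: float = THRESHOLD_PP) -> bool:
    """True when the pass rate changed by at least `threshold` pp."""
    return abs(delta_pp(baseline, current)) >= threshold


# adt_option: assumed 1 pass in 3 baseline trials, 1 pass in 1 current trial.
baseline = pass_rate(1, 3)
current = pass_rate(1, 1)
print(f"{delta_pp(baseline, current):+.1f}pp")  # +66.7pp
print(moved(baseline, current))                 # True
```

With n=1 in the current run, each benchmark's pass rate can only be 0% or 100%, so any benchmark whose baseline was not already at the same extreme clears the 10.0pp threshold.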

Generated from N-trial rotation data via the local-ollama eval rig.