v0.23.0-iter2-slimv2-n1 OS-model smoke leaderboard
Auto-generated by ailang eval-publish v0.23.0-iter2-slimv2-n1.
Per-benchmark pass rate
| Benchmark | opencode-gemma4-26b-ailang |
|---|---|
| adt_option | 100% (n=1) |
| balanced_parens | 0% (n=1) |
| binary_tree_sum | 0% (n=1) |
| canonical_convergence | 100% (n=1) |
| canonical_normalization | 0% (n=1) |
| dense_operator_program | 0% (n=1) |
| explicit_state_threading | 100% (n=1) |
| fizzbuzz | 100% (n=1) |
| gcd_lcm | 100% (n=1) |
| immutable_data_structures | 0% (n=1) |
| inline_tests | 0% (n=1) |
| nested_records | 100% (n=1) |
| numeric_modulo | 100% (n=1) |
| record_update | 100% (n=1) |
| records_book | 100% (n=1) |
| recursion_fibonacci | 100% (n=1) |
| type_safe_record_access | 100% (n=1) |
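Each cell reads as passes over trials, shown as a whole-number percentage followed by the trial count. With n=1, every cell is necessarily 0% or 100%. The sketch below shows that rendering. It assumes a plain passes/trials ratio rounded to the nearest percent. The `Cell` type is hypothetical and is not the actual eval-publish code.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical container for one benchmark/model cell."""
    passes: int
    trials: int

    @property
    def rate(self) -> float:
        # Pass rate as a percentage in [0, 100], kept unrounded.
        return 100.0 * self.passes / self.trials

    def label(self) -> str:
        # Round only for display, e.g. 1 pass out of 3 renders as "33% (n=3)".
        return f"{self.rate:.0f}% (n={self.trials})"


print(Cell(1, 3).label())  # 33% (n=3)
print(Cell(0, 1).label())  # 0% (n=1)
```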
Trend deltas since v0.23.0-iter1-baseline
Benchmarks whose pass rate moved by at least 10.0 percentage points in either direction; smaller moves are omitted.
| Benchmark | Model | v0.23.0-iter1-baseline | v0.23.0-iter2-slimv2-n1 | Δ |
|---|---|---|---|---|
| adt_option | opencode-gemma4-26b-ailang | 33% (n=3) | 100% (n=1) | ▲ +66.7pp |
| balanced_parens | opencode-gemma4-26b-ailang | 33% (n=3) | 0% (n=1) | ▼ -33.3pp |
| binary_tree_sum | opencode-gemma4-26b-ailang | 100% (n=3) | 0% (n=1) | ▼ -100.0pp |
| canonical_normalization | opencode-gemma4-26b-ailang | 33% (n=3) | 0% (n=1) | ▼ -33.3pp |
| explicit_state_threading | opencode-gemma4-26b-ailang | 67% (n=3) | 100% (n=1) | ▲ +33.3pp |
| fizzbuzz | opencode-gemma4-26b-ailang | 67% (n=3) | 100% (n=1) | ▲ +33.3pp |
| immutable_data_structures | opencode-gemma4-26b-ailang | 100% (n=3) | 0% (n=1) | ▼ -100.0pp |
| inline_tests | opencode-gemma4-26b-ailang | 100% (n=3) | 0% (n=1) | ▼ -100.0pp |
| record_update | opencode-gemma4-26b-ailang | 67% (n=3) | 100% (n=1) | ▲ +33.3pp |
| records_book | opencode-gemma4-26b-ailang | 67% (n=3) | 100% (n=1) | ▲ +33.3pp |
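The Δ column is the current pass rate minus the baseline pass rate, in percentage points. The deltas appear to come from the unrounded rates rather than the displayed ones. For example, 1/3 → 1/1 is reported as +66.7pp, not 100 − 33 = +67. The sketch below reproduces that computation and the 10.0pp filter. It assumes each row carries raw pass counts. The row layout, `format_delta`, `moved_rows`, and the `hypothetical_flat` row are all illustrative, not the publisher's actual implementation.

```python
def format_delta(d: float) -> str:
    # Arrow marks direction; the signed value keeps one decimal, e.g. "▼ -33.3pp".
    arrow = "▲" if d > 0 else "▼"
    return f"{arrow} {d:+.1f}pp"


def moved_rows(rows, threshold: float = 10.0):
    """rows: (benchmark, baseline_passes, baseline_n, current_passes, current_n)."""
    out = []
    for name, bp, bn, cp, cn in rows:
        # Difference of the unrounded percentages.
        d = 100.0 * cp / cn - 100.0 * bp / bn
        if abs(d) >= threshold:  # keep moves of at least `threshold` pp
            out.append((name, format_delta(d)))
    return out


rows = [
    ("adt_option", 1, 3, 1, 1),         # 33% (n=3) -> 100% (n=1)
    ("binary_tree_sum", 3, 3, 0, 1),    # 100% (n=3) -> 0% (n=1)
    ("hypothetical_flat", 3, 3, 1, 1),  # hypothetical unchanged row, filtered out
]
for name, delta in moved_rows(rows):
    print(name, delta)
```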
Generated from N-trial rotation data via the local-ollama eval rig.