# v0.23.0-iter1-baseline-n3 OS-model smoke leaderboard

Auto-generated by ailang eval-publish v0.23.0-iter1-baseline-n3.

## Per-benchmark pass rate

| Benchmark | opencode-gemma4-26b-ailang |
|---|---|
| adt_option | 33% (n=3) |
| balanced_parens | 33% (n=3) |
| binary_tree_sum | 100% (n=3) |
| canonical_convergence | 100% (n=3) |
| canonical_normalization | 33% (n=3) |
| dense_operator_program | 0% (n=3) |
| explicit_state_threading | 67% (n=3) |
| fizzbuzz | 67% (n=3) |
| gcd_lcm | 100% (n=3) |
| immutable_data_structures | 100% (n=3) |
| inline_tests | 100% (n=3) |
| nested_records | 100% (n=3) |
| numeric_modulo | 100% (n=3) |
| record_update | 67% (n=3) |
| records_book | 67% (n=3) |
| recursion_fibonacci | 100% (n=3) |
| type_safe_record_access | 100% (n=3) |

Generated from 3-trial (n=3) rotation data via the local-ollama eval rig.
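
The percentages in the table are consistent with plain pass counts over three trials: 1 of 3 rounds to 33%, 2 of 3 to 67%. The following is a minimal sketch of that arithmetic, assuming each trial is recorded as a simple pass/fail boolean. The function name `format_cell` is hypothetical and is not part of the ailang tooling; the actual publisher may compute or round its cells differently.

```python
def format_cell(trials):
    """Render a list of pass/fail booleans as a leaderboard cell.

    Hypothetical helper for illustration only; not the ailang implementation.
    """
    n = len(trials)
    if n == 0:
        # No trials recorded: avoid dividing by zero.
        return "n/a (n=0)"
    passes = sum(1 for passed in trials if passed)
    # Round to the nearest whole percent, as the table appears to do.
    pct = round(100 * passes / n)
    return f"{pct}% (n={n})"


if __name__ == "__main__":
    # One pass out of three trials, as in the adt_option row.
    print(format_cell([True, False, False]))
```

With only three trials per benchmark, each cell can take just four values (0%, 33%, 67%, 100%), so a single flipped trial moves a row by a full third.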