AILANG Vision

A symbolic reasoning kernel for neural program synthesis.

AILANG sits between probabilistic program synthesis (LLMs) and deterministic execution environments. It constrains, verifies, and traces programs synthesized by neural systems.

```
   LLM (Neural Synthesis)
            │ Symbolic Program
            ▼
┌──────────────────────────┐
│      AILANG KERNEL       │
│                          │
│  • Effect relations      │
│  • Contract verification │
│  • Budget enforcement    │
│  • Trace generation      │
│  • Authority bounds      │
└──────────────────────────┘
            │ Constrained Execution
            ▼
    OS / Runtime / Cloud
```

The Five Pillars

AILANG's design rests on five pillars that together form a coherent system for trustworthy AI execution:

| Pillar | What It Means | Status |
|---|---|---|
| 1. Symbolic/Concrete Separation | Programs represent symbolic intent; concrete values arise at execution within constraints | Core AST, elaboration, deterministic eval |
| 2. Effects as Relations | Effects are `(Intent × EffContext) ⇒ Outcome \| Constraints`, not opaque functions | Capabilities, budgets (`@limit=N`), policies |
| 3. Contracts as Specs | `requires`/`ensures` are symbolic predicates for AI-facing specifications | Runtime enforcement, panic/warn modes |
| 4. Traces as Artifacts | Execution produces structured, replayable, auditable traces | Agent traces (`ailang chains`) + program traces (`--emit-trace`) |
| 5. Explicit Authority | All authority must be declared, bounded, and enforced; no ambient capabilities | Deny-by-default `--caps`, budget limits, structured failures |

How AILANG Differs from Traditional Languages

| Aspect | Traditional Languages | AILANG |
|---|---|---|
| Primary author | Human | AI |
| Execution model | Direct | Mediated through kernel |
| Effects | Invisible side-effects | Explicit relations with budgets |
| Verification | Type-centric | Contract + trace-centric |
| Traces | Ad-hoc logs | Structured semantic artifacts |
| Authority | Ambient (can do anything) | Declared and bounded |
| Purpose | Program correctness | Trustworthy AI execution |

Effects as Capability Relations

Normal compilers:

  • doAnything() is fine as long as types line up
  • Side-effects are invisible to the type system
  • No semantic boundary between "pure logic" and "this talks to the database"

AILANG:

```ailang
-- The type signature constrains what this function CAN do
let processUser: User -> ! {DB} UserRecord =
  \user. dbWrite(transformUser(user))

-- This CANNOT compile - no DB capability granted
let pureTransform: User -> UserRecord =
  \user. dbWrite(user) -- ERROR: Effect mismatch
```

Why this matters for AI:

  • The model literally cannot "hallucinate" a network call in a pure function
  • Effect boundaries are machine-checkable constraints, not just documentation
  • Budget limits (! {IO @limit=5}) bound resource consumption per function
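
The capability check above can be sketched in plain Python. This is a conceptual model, not AILANG's implementation: `with_effects`, `invoke`, and `EffectMismatch` are illustrative names standing in for the kernel's effect checking.

```python
class EffectMismatch(Exception):
    pass

def with_effects(*effects):
    """Attach a declared effect set to a function (models `! {DB}` in a type)."""
    def wrap(fn):
        fn.declared_effects = frozenset(effects)
        return fn
    return wrap

def invoke(fn, granted, *args):
    """Allow the call only if every declared effect was actually granted."""
    missing = fn.declared_effects - granted
    if missing:
        raise EffectMismatch(f"effect mismatch: {sorted(missing)} not granted")
    return fn(*args)

@with_effects("DB")
def process_user(user):
    return {"record": user}   # stands in for dbWrite(transformUser(user))

@with_effects()               # pure: declares no effects
def pure_transform(user):
    return {"record": user}

print(invoke(process_user, {"DB"}, "alice"))  # allowed: DB was granted
try:
    invoke(process_user, set(), "alice")      # denied: no DB capability
except EffectMismatch as e:
    print(e)
```

The key property is that the denial happens before the body runs: a function whose declared effects exceed its grants never executes.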

Traces as Semantic Artifacts

AILANG captures traces at two complementary levels:

Level 1: Agent Execution Traces

The ailang chains system captures what AI agents do during task execution:

  • Full conversation history with reasoning blocks
  • Tool usage timeline (file edits, commands, searches)
  • Cost, tokens, turns, and duration metrics
  • Session correlation across multi-step workflows
  • Approval workflow tracking

```shell
ailang chains view <chain-id> --spans     # See agent execution flow
ailang chains tree <chain-id> --detailed  # Tree view with tool timeline
ailang chains stats --by-agent            # Cost and token breakdown
```

Level 2: Program Execution Traces

The --emit-trace flag captures what happens when AILANG code evaluates:

  • Function call entry/exit with arguments and results
  • Effect invocations (IO, FS, Net operations)
  • Contract check results (requires/ensures)
  • Budget consumption deltas
  • Call depth tracking

```shell
# Export program-level trace as JSONL
ailang run --emit-trace jsonl --caps IO --entry main program.ail

# Validate with jq
ailang run --emit-trace jsonl --quiet --caps IO --entry main program.ail 2>/dev/null | jq .
```

Example trace output:

```json
{"version":"1.0","event":"function_enter","timestamp_ns":42459,"depth":1,"function":{"name":"processFile","args":["\"input.txt\""]}}
{"version":"1.0","event":"function_enter","timestamp_ns":44125,"depth":2,"function":{"name":"std/io.println","args":["\"Processing...\""]}}
{"version":"1.0","event":"function_exit","timestamp_ns":45750,"depth":2,"function":{"name":"std/io.println","result":"()","duration_ns":1625}}
{"version":"1.0","event":"function_exit","timestamp_ns":46500,"depth":1,"function":{"name":"processFile","result":"()","duration_ns":4041}}
```
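
Because each event is one JSON object per line, traces are trivially machine-consumable. A minimal Python sketch, using only the field names visible in the example above (the aggregation itself is illustrative, not an AILANG tool):

```python
import json

# Two of the JSONL events shown above, embedded for a self-contained example.
trace_lines = """\
{"version":"1.0","event":"function_enter","timestamp_ns":42459,"depth":1,"function":{"name":"processFile","args":["\\"input.txt\\""]}}
{"version":"1.0","event":"function_exit","timestamp_ns":46500,"depth":1,"function":{"name":"processFile","result":"()","duration_ns":4041}}
""".splitlines()

# Collect per-function durations from function_exit events.
durations = {}
for line in trace_lines:
    ev = json.loads(line)
    if ev["event"] == "function_exit":
        fn = ev["function"]
        durations[fn["name"]] = fn["duration_ns"]

print(durations)  # {'processFile': 4041}
```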

Why both levels matter: When something goes wrong, agent traces show what the AI decided to do; program traces show what the code actually did. Together they provide full observability from intent to execution.


Explicit Authority and Bounded Power

AILANG eliminates ambient authority. Every capability must be explicitly declared:

```ailang
-- Declare: this function needs IO and FS, with bounded IO usage
export func process(path: string) -> string ! {IO @limit=3, FS} =
  let data = readFile(path)      -- FS capability required
  let _ = println("Processing")  -- IO capability, budget: 1 of 3
  transform(data)                -- Pure: no capability needed
```

Budget enforcement:

  • Functions declare maximum effect usage: ! {IO @limit=5}
  • Runtime enforces the limit — exceeding it is a structured failure
  • Callers are charged the callee's declared budget (semantic charging)
  • Minimum budgets ensure effects were actually exercised
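
The enforcement model can be sketched as a counter that turns overuse into a structured failure rather than silent excess. This is a conceptual Python sketch; `Budget` and `BudgetExceeded` are illustrative names, not AILANG runtime types.

```python
class BudgetExceeded(Exception):
    """Structured failure: carries the effect and its declared limit."""
    def __init__(self, effect, limit):
        super().__init__(f"budget exceeded: {effect} @limit={limit}")
        self.effect, self.limit = effect, limit

class Budget:
    def __init__(self, limits):              # e.g. {"IO": 3} models ! {IO @limit=3}
        self.limits = dict(limits)
        self.used = {e: 0 for e in limits}

    def charge(self, effect):
        """Charge one unit of an effect; fail structurally past the limit."""
        self.used[effect] += 1
        if self.used[effect] > self.limits[effect]:
            raise BudgetExceeded(effect, self.limits[effect])

budget = Budget({"IO": 3})
for _ in range(3):
    budget.charge("IO")                      # uses 3 of 3: allowed
try:
    budget.charge("IO")                      # 4th use: structured failure
except BudgetExceeded as e:
    print(e)                                 # budget exceeded: IO @limit=3
```

Because the failure is a typed value rather than a log line, both the runtime and an observing agent can react to it programmatically.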

Contract verification:

  • requires blocks check preconditions before execution
  • ensures blocks verify postconditions after execution
  • Contract checks are recorded in traces for auditability
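
In the same spirit, requires/ensures checking can be sketched as a wrapper that evaluates predicates around a call and records each check. A hedged Python model, not AILANG's contract machinery; `contract` and `contract_trace` are illustrative names:

```python
contract_trace = []   # stands in for contract-check events in a trace

def contract(requires=None, ensures=None):
    def wrap(fn):
        def checked(*args):
            if requires is not None:
                if not requires(*args):
                    raise AssertionError(f"requires failed for {fn.__name__}")
                contract_trace.append(("requires", fn.__name__, "ok"))
            result = fn(*args)
            if ensures is not None:
                if not ensures(result, *args):
                    raise AssertionError(f"ensures failed for {fn.__name__}")
                contract_trace.append(("ensures", fn.__name__, "ok"))
            return result
        return checked
    return wrap

@contract(requires=lambda n: n >= 0,          # precondition: non-negative input
          ensures=lambda r, n: r * r <= n)    # postcondition: result is a floor sqrt
def isqrt(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

print(isqrt(10))       # 3
print(contract_trace)  # both checks recorded, in order
```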

Better Feedback Loops

AI code generation is iterative. AILANG makes each iteration more productive:

| Traditional | AILANG |
|---|---|
| Arbitrary logs | Structured effect traces |
| "Something crashed" | "Effect mismatch at line 42" |
| Blind retry | Targeted fix with trace evidence |
| No resource bounds | Budget-limited, auditable execution |

The result: faster convergence to working code.


Research Foundation: LoCoBench

Recent research validates AILANG's design thesis. The LoCoBench benchmark (PDF) (Salesforce AI Research, 2025) evaluated LLMs on 8,000 multi-file software engineering tasks and found:

  • 30-50% performance degradation as context grows from 10K to 1M tokens
  • Systems languages (C, Rust) score worst; high-level languages (Python, PHP) score best
  • Architectural reasoning is a key failure mode — especially for tightly-coupled systems

AILANG's explicit effects, static imports, and deterministic semantics directly address these failure modes.

See detailed analysis →


Benchmark Results — M-EVAL

We continuously track AI code generation success across 46 benchmarks with Claude, GPT, and Gemini.

Key metrics:

  • Zero-Shot: Code works on first try
  • Final Success: Code works after self-repair
  • Agent Success: Multi-turn agent completes task

The jump from zero-shot to agent mode shows how structured error feedback helps models self-correct.

See the Benchmark Dashboard for live results, trends, and per-model breakdowns.


Core Capabilities

AILANG provides:

  • Algebraic Effects — `! {IO, FS, Net, AI, SharedMem}` declares capabilities in types
  • Deterministic Core — pure functions are referentially transparent
  • Structured Traces — agent-level (`ailang chains`) and program-level (`--emit-trace`)
  • Contracts — `requires`/`ensures` as machine-readable specifications
  • Budget Enforcement — `@limit=N` bounds effect usage per function
  • Go Codegen — compile to native performance
  • Semantic Caching — multi-agent coordination through shared memory
  • AI Effect — unified interface for LLM calls across providers

See Implementation Status for detailed feature availability.


What AILANG Deliberately Excludes

AILANG prioritizes machine reasoning over human ergonomics:

  • No LSP/IDE servers — AIs use CLI/API, not text editors
  • No multiple syntaxes — one canonical way to express each concept
  • No implicit behaviors — all effects are explicit
  • No ambient authority — everything requires explicit capability grants

These aren't limitations — they're design choices that make the language a reliable substrate for AI-synthesized programs.

See Design Lineage for the full adopted/rejected analysis.


Get Involved


AILANG — a symbolic reasoning kernel for trustworthy AI execution.


Theoretical Foundations

For the academic grounding behind these design choices: