AILANG Vision

A symbolic reasoning kernel for neural program synthesis.

AILANG sits between probabilistic program synthesis (LLMs) and deterministic execution environments. It constrains, verifies, and traces programs synthesized by neural systems.

```
   LLM (Neural Synthesis)
            │ Symbolic Program
            ▼
┌──────────────────────────┐
│      AILANG KERNEL       │
│                          │
│  • Effect relations      │
│  • Contract verification │
│  • Budget enforcement    │
│  • Trace generation      │
│  • Authority bounds      │
└──────────────────────────┘
            │ Constrained Execution
            ▼
    OS / Runtime / Cloud
```

The Five Pillars

AILANG's design rests on five pillars that together form a coherent system for trustworthy AI execution:

| Pillar | What It Means | Status |
|---|---|---|
| 1. Symbolic/Concrete Separation | Programs represent symbolic intent; concrete values arise at execution within constraints | Core AST, elaboration, deterministic eval |
| 2. Effects as Relations | Effects are `(Intent × EffContext) ⇒ Outcome \| Constraints`, not opaque functions | Capabilities, budgets (`@limit=N`), policies |
| 3. Contracts as Specs | `requires`/`ensures` are symbolic predicates for AI-facing specifications | Runtime enforcement, panic/warn modes |
| 4. Traces as Artifacts | Execution produces structured, replayable, auditable traces | Agent traces (`ailang chains`) + program traces (`--emit-trace`) |
| 5. Explicit Authority | All authority must be declared, bounded, and enforced; no ambient capabilities | Deny-by-default `--caps`, budget limits, structured failures |

How AILANG Differs from Traditional Languages

| Aspect | Traditional Languages | AILANG |
|---|---|---|
| Primary author | Human | AI |
| Execution model | Direct | Mediated through kernel |
| Effects | Invisible side-effects | Explicit relations with budgets |
| Verification | Type-centric | Contract + trace-centric |
| Traces | Ad-hoc logs | Structured semantic artifacts |
| Authority | Ambient (can do anything) | Declared and bounded |
| Purpose | Program correctness | Trustworthy AI execution |

Effects as Capability Relations

Normal compilers:

  • doAnything() is fine as long as types line up
  • Side-effects are invisible to the type system
  • No semantic boundary between "pure logic" and "this talks to the database"

AILANG:

```ailang
-- The type signature constrains what this function CAN do
let processUser: User -> ! {DB} UserRecord =
  \user. dbWrite(transformUser(user))

-- This CANNOT compile - no DB capability granted
let pureTransform: User -> UserRecord =
  \user. dbWrite(user) -- ERROR: Effect mismatch
```

Why this matters for AI:

  • The model literally cannot "hallucinate" a network call in a pure function
  • Effect boundaries are machine-checkable constraints, not just documentation
  • Budget limits (! {IO @limit=5}) bound resource consumption per function
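
The capability check above can be sketched in plain Python. This is a conceptual model, not AILANG's implementation: `with_effects`, `invoke`, and `EffectMismatch` are illustrative names standing in for the kernel's effect checking.

```python
class EffectMismatch(Exception):
    pass

def with_effects(*effects):
    """Attach a declared effect set to a function (models `! {DB}` in a type)."""
    def wrap(fn):
        fn.declared_effects = frozenset(effects)
        return fn
    return wrap

def invoke(fn, granted, *args):
    """Allow the call only if every declared effect was actually granted."""
    missing = fn.declared_effects - granted
    if missing:
        raise EffectMismatch(f"effect mismatch: {sorted(missing)} not granted")
    return fn(*args)

@with_effects("DB")
def process_user(user):
    return {"record": user}   # stands in for dbWrite(transformUser(user))

@with_effects()               # pure: declares no effects
def pure_transform(user):
    return {"record": user}

print(invoke(process_user, {"DB"}, "alice"))  # allowed: DB was granted
try:
    invoke(process_user, set(), "alice")      # denied: no DB capability
except EffectMismatch as e:
    print(e)
```

The key property is that the denial happens before the body runs: a function whose declared effects exceed its grants never executes.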

Traces as Semantic Artifacts

AILANG captures traces at two complementary levels:

Level 1: Agent Execution Traces

The ailang chains system captures what AI agents do during task execution:

  • Full conversation history with reasoning blocks
  • Tool usage timeline (file edits, commands, searches)
  • Cost, tokens, turns, and duration metrics
  • Session correlation across multi-step workflows
  • Approval workflow tracking

```shell
ailang chains view <chain-id> --spans     # See agent execution flow
ailang chains tree <chain-id> --detailed  # Tree view with tool timeline
ailang chains stats --by-agent            # Cost and token breakdown
```

Level 2: Program Execution Traces

The --emit-trace flag captures what happens when AILANG code evaluates:

  • Function call entry/exit with arguments and results
  • Effect invocations (IO, FS, Net operations)
  • Contract check results (requires/ensures)
  • Budget consumption deltas
  • Call depth tracking

```shell
# Export program-level trace as JSONL
ailang run --emit-trace jsonl --caps IO --entry main program.ail

# Validate with jq
ailang run --emit-trace jsonl --quiet --caps IO --entry main program.ail 2>/dev/null | jq .
```

Example trace output:

```json
{"version":"1.0","event":"function_enter","timestamp_ns":42459,"depth":1,"function":{"name":"processFile","args":["\"input.txt\""]}}
{"version":"1.0","event":"function_enter","timestamp_ns":44125,"depth":2,"function":{"name":"std/io.println","args":["\"Processing...\""]}}
{"version":"1.0","event":"function_exit","timestamp_ns":45750,"depth":2,"function":{"name":"std/io.println","result":"()","duration_ns":1625}}
{"version":"1.0","event":"function_exit","timestamp_ns":46500,"depth":1,"function":{"name":"processFile","result":"()","duration_ns":4041}}
```
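
Because each event is one JSON object per line, traces are trivially machine-consumable. A minimal Python sketch, using only the field names visible in the example above (the aggregation itself is illustrative, not an AILANG tool):

```python
import json

# Two of the JSONL events shown above, embedded for a self-contained example.
trace_lines = """\
{"version":"1.0","event":"function_enter","timestamp_ns":42459,"depth":1,"function":{"name":"processFile","args":["\\"input.txt\\""]}}
{"version":"1.0","event":"function_exit","timestamp_ns":46500,"depth":1,"function":{"name":"processFile","result":"()","duration_ns":4041}}
""".splitlines()

# Collect per-function durations from function_exit events.
durations = {}
for line in trace_lines:
    ev = json.loads(line)
    if ev["event"] == "function_exit":
        fn = ev["function"]
        durations[fn["name"]] = fn["duration_ns"]

print(durations)  # {'processFile': 4041}
```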

Why both levels matter: When something goes wrong, agent traces show what the AI decided to do; program traces show what the code actually did. Together they provide full observability from intent to execution.


Explicit Authority and Bounded Power

AILANG eliminates ambient authority. Every capability must be explicitly declared:

```ailang
-- Declare: this function needs IO and FS, with bounded IO usage
export func process(path: string) -> string ! {IO @limit=3, FS} =
  let data = readFile(path)      -- FS capability required
  let _ = println("Processing")  -- IO capability, budget: 1 of 3
  transform(data)                -- Pure: no capability needed
```

Budget enforcement:

  • Functions declare maximum effect usage: ! {IO @limit=5}
  • Runtime enforces the limit — exceeding it is a structured failure
  • Callers are charged the callee's declared budget (semantic charging)
  • Minimum budgets ensure effects were actually exercised
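
The enforcement model can be sketched as a counter that turns overuse into a structured failure rather than silent excess. This is a conceptual Python sketch; `Budget` and `BudgetExceeded` are illustrative names, not AILANG runtime types.

```python
class BudgetExceeded(Exception):
    """Structured failure: carries the effect and its declared limit."""
    def __init__(self, effect, limit):
        super().__init__(f"budget exceeded: {effect} @limit={limit}")
        self.effect, self.limit = effect, limit

class Budget:
    def __init__(self, limits):              # e.g. {"IO": 3} models ! {IO @limit=3}
        self.limits = dict(limits)
        self.used = {e: 0 for e in limits}

    def charge(self, effect):
        """Charge one unit of an effect; fail structurally past the limit."""
        self.used[effect] += 1
        if self.used[effect] > self.limits[effect]:
            raise BudgetExceeded(effect, self.limits[effect])

budget = Budget({"IO": 3})
for _ in range(3):
    budget.charge("IO")                      # uses 3 of 3: allowed
try:
    budget.charge("IO")                      # 4th use: structured failure
except BudgetExceeded as e:
    print(e)                                 # budget exceeded: IO @limit=3
```

Because the failure is a typed value rather than a log line, both the runtime and an observing agent can react to it programmatically.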

Contract verification:

  • requires blocks check preconditions before execution
  • ensures blocks verify postconditions after execution
  • Contract checks are recorded in traces for auditability
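
In the same spirit, requires/ensures checking can be sketched as a wrapper that evaluates predicates around a call and records each check. A hedged Python model, not AILANG's contract machinery; `contract` and `contract_trace` are illustrative names:

```python
contract_trace = []   # stands in for contract-check events in a trace

def contract(requires=None, ensures=None):
    def wrap(fn):
        def checked(*args):
            if requires is not None:
                if not requires(*args):
                    raise AssertionError(f"requires failed for {fn.__name__}")
                contract_trace.append(("requires", fn.__name__, "ok"))
            result = fn(*args)
            if ensures is not None:
                if not ensures(result, *args):
                    raise AssertionError(f"ensures failed for {fn.__name__}")
                contract_trace.append(("ensures", fn.__name__, "ok"))
            return result
        return checked
    return wrap

@contract(requires=lambda n: n >= 0,          # precondition: non-negative input
          ensures=lambda r, n: r * r <= n)    # postcondition: result is a floor sqrt
def isqrt(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

print(isqrt(10))       # 3
print(contract_trace)  # both checks recorded, in order
```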

Better Feedback Loops

AI code generation is iterative. AILANG makes each iteration more productive:

| Traditional | AILANG |
|---|---|
| Arbitrary logs | Structured effect traces |
| "Something crashed" | "Effect mismatch at line 42" |
| Blind retry | Targeted fix with trace evidence |
| No resource bounds | Budget-limited, auditable execution |

The result: faster convergence to working code.


Research Foundation: LoCoBench

Recent research validates AILANG's design thesis. The LoCoBench benchmark (PDF) (Salesforce AI Research, 2025) evaluated LLMs on 8,000 multi-file software engineering tasks and found:

  • 30-50% performance degradation as context grows from 10K to 1M tokens
  • Systems languages (C, Rust) score worst; high-level languages (Python, PHP) score best
  • Architectural reasoning is a key failure mode — especially for tightly-coupled systems

AILANG's explicit effects, static imports, and deterministic semantics directly address these failure modes.

See detailed analysis →


Benchmark Results — M-EVAL

We continuously track AI code generation success across 46 benchmarks with Claude, GPT, and Gemini.

Key metrics:

  • Zero-Shot: Code works on first try
  • Final Success: Code works after self-repair
  • Agent Success: Multi-turn agent completes task

The jump from zero-shot to agent mode shows how structured error feedback helps models self-correct.

See the Benchmark Dashboard for live results, trends, and per-model breakdowns.


Core Capabilities

AILANG provides:

  • Algebraic Effects — `! {IO, FS, Net, AI, SharedMem}` declares capabilities in types
  • Deterministic Core — pure functions are referentially transparent
  • Structured Traces — agent-level (`ailang chains`) and program-level (`--emit-trace`)
  • Contracts — `requires`/`ensures` as machine-readable specifications
  • Budget Enforcement — `@limit=N` bounds effect usage per function
  • Go Codegen — compile to native performance
  • Semantic Caching — multi-agent coordination through shared memory
  • AI Effect — unified interface for LLM calls across providers

See Implementation Status for detailed feature availability.


What AILANG Deliberately Excludes

AILANG prioritizes machine reasoning over human ergonomics:

  • No LSP/IDE servers — AIs use CLI/API, not text editors
  • No multiple syntaxes — one canonical way to express each concept
  • No implicit behaviors — all effects are explicit
  • No ambient authority — everything requires explicit capability grants

These aren't limitations — they're design choices that make the language a reliable substrate for AI-synthesized programs.

See Design Lineage for the full adopted/rejected analysis.


Get Involved


AILANG — a symbolic reasoning kernel for trustworthy AI execution.


Theoretical Foundations

For the academic grounding behind these design choices: