
Telemetry & Tracing

AILANG includes comprehensive OpenTelemetry (OTEL) instrumentation for distributed tracing and observability. This enables integration with standard observability backends like Google Cloud Trace, Grafana, Honeycomb, Jaeger, and more.

Quick Start

Google Cloud Trace

# Set your GCP project (uses Application Default Credentials)
export GOOGLE_CLOUD_PROJECT=your-project-id

# Start services
ailang serve
ailang coordinator start

View traces at: https://console.cloud.google.com/traces/explorer?project=your-project-id

Generic OTLP (Jaeger, Grafana, Honeycomb, etc.)

# Set OTLP collector endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Start services
ailang serve
ailang coordinator start

Dual Export (Both GCP + OTLP)

Send traces to both Google Cloud Trace and another backend simultaneously:

# Configure both
export GOOGLE_CLOUD_PROJECT=your-project-id
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Traces go to both destinations
ailang serve
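
Conceptually, dual export registers two batched exporters on a single TracerProvider. A hedged Go sketch using the standard exporter packages — this mirrors the documented behavior, it is not AILANG's actual initialization code:

import (
    "context"

    texporter "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// One provider, two batched exporters: GCP Cloud Trace plus OTLP.
// Sketch only; AILANG's own wiring may differ.
func newDualProvider(ctx context.Context, project string) (*sdktrace.TracerProvider, error) {
    gcp, err := texporter.New(texporter.WithProjectID(project))
    if err != nil {
        return nil, err
    }
    otlp, err := otlptracehttp.New(ctx) // honors OTEL_EXPORTER_OTLP_ENDPOINT
    if err != nil {
        return nil, err
    }
    return sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(gcp),
        sdktrace.WithBatcher(otlp),
    ), nil
}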

CLI Trace Commands

AILANG includes a built-in CLI for querying traces from Google Cloud Trace:

Check Status

# See current telemetry configuration
ailang trace status

Output:

Telemetry Configuration Status
────────────────────────────────────────
Google Cloud Project: multivac-internal-dev
OTLP Endpoint: (not set)

Mode: Google Cloud Trace
View traces: https://console.cloud.google.com/traces/explorer?project=multivac-internal-dev

List Recent Traces

# List last 10 traces (default)
ailang trace list

# Customize time range and limit
ailang trace list --hours 2 --limit 20

# Filter by span name
ailang trace list --filter "ailang run"

# JSON output for scripting
ailang trace list --json

View Trace Details

# View full trace hierarchy with timing
ailang trace view <trace-id>

Example output:

Trace: 5d359e6d157ba7e726aca8a7600a3bfe
Spans: 5
────────────────────────────────────────────────────────────
ailang run: examples/runnable/factorial.ail (2.065ms)
└─ compile: examples/runnable/factorial.ail (1.458ms)
   ├─ compile.load (358µs)
   ├─ compile.topo_sort (84µs)
   └─ compile.modules (859µs)

Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| GOOGLE_CLOUD_PROJECT | GCP project for Cloud Trace | my-project |
| OTLP_GOOGLE_CLOUD_PROJECT | Telemetry-specific GCP project (takes precedence) | telemetry-project |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint | http://localhost:4318 |
| OTEL_ENVIRONMENT | Deployment environment | production, staging, development |

Priority for GCP project:

  1. OTLP_GOOGLE_CLOUD_PROJECT (if set)
  2. GOOGLE_CLOUD_PROJECT (fallback)

This matches the Gemini CLI telemetry convention.
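
For illustration, this precedence is equivalent to the following Go sketch (resolveProject is a hypothetical helper for this page, not AILANG's internal API):

import "os"

// resolveProject mirrors the documented precedence: the telemetry-specific
// variable wins; the standard GCP variable is the fallback.
func resolveProject() string {
    if p := os.Getenv("OTLP_GOOGLE_CLOUD_PROJECT"); p != "" {
        return p
    }
    return os.Getenv("GOOGLE_CLOUD_PROJECT")
}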

Instrumented Components

All components emit traces automatically when telemetry is configured:

Compiler Pipeline

The compilation pipeline emits detailed spans for each phase:

Single-file/REPL compilation:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| compile.pipeline | Parent span for entire compilation | file.path, file.size_bytes, is_repl |
| compile.parse | Parsing phase | ast.nodes (count) |
| compile.elaborate | Surface→Core elaboration | - |
| compile.typecheck | Type checking | - |
| compile.validate | CoreTypeInfo validation | - |
| compile.lower | Operator lowering | - |

Module compilation:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| compile.module_pipeline | Parent span for module compilation | file.path, file.size_bytes |
| compile.load | Module loading | modules.loaded (count) |
| compile.topo_sort | Topological sort | modules.sorted (count) |
| compile.modules | Compile all modules | modules.count |

Eval Harness (ailang eval-suite)

The benchmark evaluation system emits spans for suite execution and individual benchmarks:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| eval.suite | Parent span for entire benchmark run | eval.models, eval.benchmarks, eval.languages, eval.total_runs, eval.agent_mode, eval.success_count, eval.fail_count, eval.success_rate |
| eval.benchmark | Individual benchmark execution | benchmark.id, benchmark.model, benchmark.language, benchmark.seed, benchmark.success, benchmark.duration_ms, benchmark.input_tokens, benchmark.output_tokens, benchmark.cost_usd |

v0.6.3+ Enhanced Attributes:

For successful benchmarks:

  • code.preview - First 100 chars of generated code
  • code.hash - 8-char hash for deduplication

For failed benchmarks:

  • error.summary - Truncated error message
  • error.category - Error classification

For standard mode (with repair):

  • benchmark.repair_successful - Whether self-repair succeeded

Messaging System (ailang messages)

Message operations emit spans for observability:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| messages.send | Create/insert message | message.to_inbox, message.from_agent, message.type, message.category, message.id |
| messages.list | List messages with filters | list.inbox, list.unread_only, list.collapsed, list.limit, list.result_count |
| messages.read | Read single message by ID | message.id, message.from_agent, message.to_inbox, message.type |
| messages.search | Semantic search | search.query, search.use_neural, search.threshold, search.limit, search.inbox, search.result_count |
| messages.ack | Mark message as read | message.id, message.new_status |
| messages.unack | Mark message as unread | message.id, message.new_status |
| messages.cleanup | Delete old/expired messages | cleanup.older_than, cleanup.expired_only, cleanup.dry_run, cleanup.deleted_count |
| messages.github_sync | Import issues from GitHub | github.repo, sync.dry_run, github.issues_found, sync.imported, sync.skipped |

REPL (ailang repl)

Interactive REPL sessions emit session-level and input-level spans:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| repl.session | Parent span for entire REPL session | session.id, version, session.input_count, session.duration_ms |
| repl.input | Individual user input evaluation | input.type (command/expression), input.text (truncated to 200 chars), input.number |

Span Hierarchy:

repl.session (duration of interactive session)
└─ repl.input #1 (first user input)
└─ repl.input #2 (second user input)
└─ ... (subsequent inputs)

Session metrics are finalized when the REPL exits, capturing total input count and session duration.
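
For illustration, the nesting corresponds to this sketch using the standard OTEL Go API; the tracer name, scanner loop, and evaluate call are assumptions for this page, not AILANG's actual REPL code:

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

tracer := otel.Tracer("ailang")
ctx, session := tracer.Start(ctx, "repl.session")
defer session.End() // finalizes session.input_count and session.duration_ms

for n := 1; scanner.Scan(); n++ {
    _, span := tracer.Start(ctx, "repl.input") // child of repl.session
    span.SetAttributes(attribute.Int("input.number", n))
    evaluate(scanner.Text())
    span.End()
}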

Check Command (ailang check)

The type checking command emits spans for file/directory verification:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| ailang.check | Root span for check operation | file.path, timeout_ms, is_directory |
| check.result | Check outcome with pass/fail | passed (bool), errors.count, timed_out (if timeout occurred) |

Span Hierarchy:

ailang.check (root span)
└─ check.result (with pass/fail and error counts)
└─ compile.* (compilation phases from compiler pipeline)

When using --timeout, the timed_out attribute is set if compilation exceeds the limit.

Server (ailang serve)

  • HTTP request/response spans via otelhttp middleware
  • Automatic status codes, latency, and path attributes
  • Filters out /health and /ws endpoints (see the sketch below)
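
For reference, a hedged sketch of how such otelhttp wiring typically looks; the mux, service name, and port here are assumptions, and AILANG's actual handler setup may differ:

import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Wrap the router so every request gets a span, except noisy endpoints.
handler := otelhttp.NewHandler(mux, "ailang-server",
    otelhttp.WithFilter(func(r *http.Request) bool {
        p := r.URL.Path
        return p != "/health" && p != "/ws" // false = no span recorded
    }),
)
http.ListenAndServe(":1957", handler)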

Coordinator (ailang coordinator start)

The coordinator daemon emits task lifecycle spans:

  • Task lifecycle spans: coordinator.execute_task
  • Attributes: task.id, task.type, task.stage
  • Token and cost tracking per task

Executors

Claude Executor:

  • Span: claude.execute
  • Attributes: executor.model, task.workspace, session.id
  • Token counts: task.tokens_in, task.tokens_out
  • Cost: task.cost_usd

Gemini Executor:

  • Span: gemini.execute
  • Same attributes as Claude executor

AI Providers

All AI providers emit spans for API calls:

| Provider | Span Name | Key Attributes |
|----------|-----------|----------------|
| Anthropic | anthropic.generate | ai.model, ai.tokens_in, ai.tokens_out, http.status_code, ai.prompt_preview, ai.response_preview, ai.finish_reason |
| OpenAI | openai.generate | ai.model, ai.api_type (chat/responses), ai.tokens_*, ai.prompt_preview, ai.response_preview |
| Gemini | gemini.generate | ai.model, ai.auth_type (api_key/adc), ai.tokens_*, ai.prompt_preview, ai.response_preview |
| Ollama | ollama.generate | ai.model, ai.endpoint, ai.prompt_preview, ai.response_preview |

Telemetry Helpers (v0.6.3+)

The internal/telemetry package provides helper functions for safe, consistent span attributes:

Truncate

Safely truncate strings for span attributes, preserving UTF-8 boundaries:

import "github.com/sunholo/ailang/internal/telemetry"

// Truncate to 100 chars, adding "..." if truncated
preview := telemetry.Truncate(longString, 100)
// "Hello, 世界..." (never breaks in middle of multi-byte chars)

CategorizeError

Categorize errors for filtering and aggregation:

category := telemetry.CategorizeError(err)
// Returns: "network", "timeout", "auth", "rate_limit", "parse", "type", "runtime", or "unknown"

ShortHash

Generate deterministic short hashes for deduplication:

hash := telemetry.ShortHash(codeString)
// Returns 8-char hex string like "a1b2c3d4"

LineSnippet

Extract source code context around a line number:

snippet := telemetry.LineSnippet(sourceCode, lineNumber, 60)
// Returns up to 60 chars of the specified line
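
These helpers are typically combined when annotating a span on failure. A sketch — span, err, src, and line are assumed to be in scope, and the attribute names follow the reference tables later on this page:

import (
    "go.opentelemetry.io/otel/attribute"

    "github.com/sunholo/ailang/internal/telemetry"
)

// Attach UTF-8-safe, categorized error context to the current span.
if err != nil {
    span.SetAttributes(
        attribute.String("error.message", telemetry.Truncate(err.Error(), 200)),
        attribute.String("error.category", telemetry.CategorizeError(err)),
        attribute.String("error.snippet", telemetry.LineSnippet(src, line, 60)),
    )
}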

Example: Local Jaeger Setup

# Start Jaeger with OTLP collector
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Configure AILANG
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Start services
ailang serve &
ailang coordinator start

# View traces at http://localhost:16686

Example: Google Cloud Trace + Local Jaeger

# Dual export - both GCP and local Jaeger
export GOOGLE_CLOUD_PROJECT=my-gcp-project
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Traces go to BOTH destinations
ailang serve

Integration Tests

Run the integration tests to verify your Google Cloud Trace setup:

# Set project
export GOOGLE_CLOUD_PROJECT=your-project-id

# Run tests
go test -tags=integration -v -run TestGoogleCloudTrace ./internal/telemetry/...

Tests create sample traces and verify they export correctly.

Native CLI Telemetry

Both Claude Code and Gemini CLI have native OTEL support:

Claude Code

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Gemini CLI

Configure in ~/.gemini/settings.json:

{
  "telemetry": {
    "enabled": true,
    "endpoint": "http://localhost:4318"
  }
}

Cross-Process Trace Linking (v0.6.3+)

AILANG supports end-to-end distributed tracing across process boundaries. When the coordinator spawns CLI executors (Claude Code, Gemini CLI), and those CLIs spawn ailang run commands, all spans are linked into a single trace.

How It Works

The trace context flows between processes via the W3C TRACEPARENT environment variable, as detailed below.

Environment Variables

The following environment variables are used for trace propagation:

| Variable | Format | Description |
|----------|--------|-------------|
| TRACEPARENT | 00-{trace_id}-{span_id}-{flags} | W3C trace context (standard) |
| TRACESTATE | Vendor-specific | Additional trace state (optional) |
| AILANG_TASK_ID | UUID | Coordinator task ID (fallback correlation) |
| AILANG_SESSION_ID | UUID | Executor session ID (fallback correlation) |
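
For reference, a TRACEPARENT value splits into four dash-separated fields; a minimal Go sketch of the decomposition:

import (
    "os"
    "strings"
)

// Decompose a version-00 W3C traceparent value.
parts := strings.SplitN(os.Getenv("TRACEPARENT"), "-", 4)
// parts[0] = version ("00"), parts[1] = 32-hex-char trace ID,
// parts[2] = 16-hex-char span ID, parts[3] = flags ("01" = sampled)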

Supported Commands

These commands extract trace context from the environment:

| Command | Behavior |
|---------|----------|
| ailang run | Creates child span under parent trace |
| ailang check | Creates child span under parent trace |
| ailang eval-suite | Creates child span under parent trace |
| ailang repl | Session uses extracted trace context |

CLI Tool Scenarios

The resulting trace shape depends on whether the CLI tool (Claude Code, Gemini CLI) supports OTEL:

Scenario A: Gemini CLI (full traces)

Executor span ──► Gemini CLI span ──► ailang run span
      │                  │                   │
      └──────────────────┴───────────────────┘
                  (single trace)

Gemini CLI supports full trace export. Spans appear in the same trace hierarchy.

Scenario B: Claude Code (metrics + linked traces)

Executor span ─────────────────────► ailang run span
      │                                     │
      └─────────────────────────────────────┘
              (linked trace, CLI gap)

+ Claude Code metrics/events with ailang.task_id correlation

Claude Code exports metrics and events (not traces), but we inject ailang.task_id via OTEL_RESOURCE_ATTRIBUTES so they can be joined with AILANG traces in the dashboard.

Scenario C: CLI passes env vars through (fallback)

Executor span ──────────────────► ailang run span
      │                                  │
      └──────────────────────────────────┘
            (linked trace, CLI gap)

Scenario D: CLI sanitizes environment (rare)

Executor span             ailang run span
      │                          │
      │    (correlated via       │
      └──── AILANG_TASK_ID) ─────┘

CLI Telemetry Configuration

The AILANG executors automatically configure CLI telemetry. Here's what gets injected:

Claude Code (via environment variables):

CLAUDE_CODE_ENABLE_TELEMETRY=1              # Enable telemetry
OTEL_METRICS_EXPORTER=otlp                  # Export metrics via OTLP
OTEL_LOGS_EXPORTER=otlp                     # Export events via OTLP
OTEL_RESOURCE_ATTRIBUTES=ailang.task_id=...,ailang.session_id=...   # Correlation IDs
OTEL_EXPORTER_OTLP_ENDPOINT=...             # Inherited from parent
OTEL_EXPORTER_OTLP_PROTOCOL=...             # Inherited from parent
OTLP_GOOGLE_CLOUD_PROJECT=...               # Primary project
GOOGLE_CLOUD_PROJECT=...                    # Fallback project
TRACEPARENT=00-{trace_id}-{span_id}-01      # W3C trace context (for ailang run)

Note: Claude Code exports metrics and events only (not traces). The OTEL_RESOURCE_ATTRIBUTES allows joining Claude Code metrics with AILANG traces by ailang.task_id.

Gemini CLI (via environment variables):

GEMINI_TELEMETRY_ENABLED=true               # Enable telemetry (includes traces)
GEMINI_TELEMETRY_TARGET=gcp                 # Export to GCP (if project set)
OTEL_RESOURCE_ATTRIBUTES=ailang.task_id=...,ailang.session_id=...   # Correlation IDs
OTLP_GOOGLE_CLOUD_PROJECT=...               # Primary project var (checked first)
GOOGLE_CLOUD_PROJECT=...                    # Fallback project
OTEL_EXPORTER_OTLP_ENDPOINT=...             # For local collector
TRACEPARENT=00-{trace_id}-{span_id}-01      # W3C trace context

Note: Gemini CLI exports full traces that appear in the trace hierarchy.

Project detection order:

  1. OTLP_GOOGLE_CLOUD_PROJECT - Primary (Gemini CLI standard)
  2. GOOGLE_CLOUD_PROJECT - Fallback (GCP standard)

Manual CLI configuration (if running CLIs directly):

Claude Code settings (~/.claude/settings.json):

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317"
  }
}

Gemini CLI settings (~/.gemini/settings.json):

{
  "telemetry": {
    "enabled": true,
    "target": "gcp"
  }
}

Using in CI/CD

Export trace context to link CI runs with AILANG executions:

GitHub Actions:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run with trace linking
        env:
          TRACEPARENT: "00-${{ github.run_id }}-${{ github.job }}-01"
          AILANG_TASK_ID: ${{ github.run_id }}
        run: |
          ailang run --caps IO --entry main app.ail

Cloud Build:

steps:
  - name: 'ailang-builder'
    env:
      - 'TRACEPARENT=00-${BUILD_ID}-cloudtrace-01'
      - 'AILANG_TASK_ID=${BUILD_ID}'
    args: ['run', '--caps', 'IO', '--entry', 'main', 'app.ail']

Verification

To verify trace linking is working:

# 1. Set up telemetry
export GOOGLE_CLOUD_PROJECT=your-project

# 2. Create a trace context manually (W3C requires lowercase hex IDs)
export TRACEPARENT="00-$(uuidgen | tr -d '-' | tr '[:upper:]' '[:lower:]')-$(uuidgen | tr -d '-' | tr '[:upper:]' '[:lower:]' | cut -c1-16)-01"

# 3. Run a command
ailang run --caps IO --entry main examples/runnable/hello.ail

# 4. Check traces
ailang trace list --hours 1

The trace should show a child span linked to your TRACEPARENT.

Programmatic Usage

The telemetry package provides helpers for trace propagation:

import "github.com/sunholo/ailang/internal/telemetry"

// Inject trace context into subprocess environment
env := os.Environ()
env = telemetry.InjectTraceContext(ctx, env)
env = telemetry.InjectCorrelationIDs(env, taskID, sessionID)
cmd.Env = env

// Extract trace context from environment (in subprocess)
ctx = telemetry.ExtractTraceContext(ctx)
taskID, sessionID := telemetry.ExtractCorrelationIDs()
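
Putting both halves together, a sketch of a parent process spawning ailang run with linked context; ctx, taskID, and sessionID are assumed to be in scope:

import (
    "os"
    "os/exec"

    "github.com/sunholo/ailang/internal/telemetry"
)

// Parent side: inject trace context, then spawn the subprocess.
cmd := exec.CommandContext(ctx, "ailang", "run", "--caps", "IO", "--entry", "main", "app.ail")
env := os.Environ()
env = telemetry.InjectTraceContext(ctx, env)
env = telemetry.InjectCorrelationIDs(env, taskID, sessionID)
cmd.Env = env
err := cmd.Run() // the child's spans join the parent trace via TRACEPARENT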

Trace Attributes Reference

Common Attributes

All spans include:

  • service.name - Service identifier (e.g., ailang-server, ailang-coordinator)
  • service.version - AILANG version
  • deployment.environment - From OTEL_ENVIRONMENT (default: development)
  • process.runtime.name - go
  • process.runtime.version - Go version

AI-Specific Attributes

| Attribute | Description |
|-----------|-------------|
| ai.provider | Provider name: anthropic, openai, gemini, ollama |
| ai.model | Model ID (e.g., claude-sonnet-4-5, gpt-5) |
| ai.tokens_in | Input tokens |
| ai.tokens_out | Output tokens |
| ai.tokens_total | Total tokens |
| ai.cost_usd | Estimated cost in USD |
| ai.prompt_preview | First 100 chars of prompt (v0.6.3+) |
| ai.response_preview | First 100 chars of response (v0.6.3+) |
| ai.finish_reason | Why generation stopped: end_turn, max_tokens, etc. (v0.6.3+) |

Task Attributes

| Attribute | Description |
|-----------|-------------|
| task.id | Unique task identifier |
| task.type | Task type: bug, feature, docs, etc. |
| task.stage | Pipeline stage: design, sprint, implement |
| task.success | Boolean success status |
| task.duration_ms | Duration in milliseconds |

Error Context Attributes (v0.6.3+)

When errors occur, spans include rich debugging context:

| Attribute | Description |
|-----------|-------------|
| error.message | Truncated error message (max 200 chars, UTF-8 safe) |
| error.category | Error type: network, timeout, auth, rate_limit, parse, type, runtime, unknown |
| error.location | Position in source: line:column format |
| error.snippet | Source code around error (max 60 chars) |
| error.summary | Short error description for failed benchmarks |

Code Context Attributes (v0.6.3+)

Eval benchmarks include code analysis attributes:

| Attribute | Description |
|-----------|-------------|
| code.preview | First 100 chars of generated code |
| code.hash | Short hash (8 chars) for deduplication |
| benchmark.repair_successful | Whether self-repair succeeded (standard mode) |

CLI Run Attributes (v0.6.3+)

The ailang run command includes:

| Attribute | Description |
|-----------|-------------|
| file.path | Path to the executed file |
| entry.function | Entry point function name |
| caps.granted | List of granted capabilities (e.g., ["IO", "FS"]) |

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
├─────────────────────────────────────────────────────────────┤
│   Server   │  Coordinator  │  Executors  │   AI Providers   │
│   (HTTP)   │   (Tasks)     │  (Claude/   │   (Anthropic/    │
│            │               │   Gemini)   │   OpenAI/etc)    │
└────────────┴───────────────┴─────────────┴──────────────────┘
                              │
                     otel.Tracer("...")
                              │
                    ┌─────────▼─────────┐
                    │  TracerProvider   │
                    │     (Global)      │
                    └─────────┬─────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
      ┌───────▼──────┐  ┌─────▼────────┐  ┌───▼──────────┐
      │ GCP Exporter │  │     OTLP     │  │  (disabled)  │
      │  Cloud Trace │  │   Exporter   │  │    No-op     │
      └──────────────┘  └──────────────┘  └──────────────┘
              │               │
              ▼               ▼
       Google Cloud    Jaeger/Grafana/
         Console       Honeycomb/etc

Performance Overhead

When Disabled (Default)

When no telemetry environment variables are set:

  • No exporters are initialized
  • Tracers return no-op spans
  • ~2-5 nanoseconds per span call (just a nil check)
  • Zero memory allocations
  • No external connections made

This is negligible - you can leave the instrumentation in place without any measurable impact.

When Enabled (Production Use)

When GOOGLE_CLOUD_PROJECT or OTEL_EXPORTER_OTLP_ENDPOINT is set:

  • ~50-200 microseconds per compilation (all phases combined)
  • ~100-500 microseconds per AI API call
  • Spans are batched and exported asynchronously
  • Export happens in background goroutines (doesn't block your code)

Production recommendations:

  • Use sampling for high-throughput services (e.g., 10% of requests; see the sketch below)
  • Batch exporters are enabled by default
  • Set OTEL_ENVIRONMENT=production for environment tagging
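
If you build your own TracerProvider with the OTEL Go SDK, ratio-based head sampling looks like the sketch below; whether AILANG itself honors the standard OTEL_TRACES_SAMPLER variables is not covered here, so treat this as an SDK illustration rather than an AILANG feature:

import sdktrace "go.opentelemetry.io/otel/sdk/trace"

// Sample ~10% of new root traces; child spans follow the parent's decision.
tp := sdktrace.NewTracerProvider(
    sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.10))),
)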

Overhead Breakdown by Component

| Component | Spans per Operation | Typical Overhead |
|-----------|---------------------|------------------|
| Compiler Pipeline | 5-7 spans | ~100μs |
| Eval Harness | 2 spans per benchmark | ~50μs |
| Messaging | 1 span per operation | ~20μs |
| REPL Session | 1 session + N input spans | ~30μs per input |
| Check Command | 2 spans (root + result) | ~40μs |
| AI Providers | 1 span per API call | ~30μs |

Note: Actual overhead depends on your OTEL collector. Local Jaeger adds ~10μs, while cloud exports (GCP, Honeycomb) add ~50-100μs due to batching and network I/O.

Observatory Dashboard Integration (v0.6.3+)

The AILANG Observatory provides a local dashboard for viewing traces, spans, and metrics from Claude Code, Gemini CLI, and AILANG operations in real-time.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                   Your Development Environment                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Claude Code ──────────────┐                                     │
│  (metrics + events)        │                                     │
│                            │ OTLP/HTTP                           │
│  Gemini CLI ───────────────┼──────────────►  AILANG Server       │
│  (full traces)             │                 (localhost:1957)    │
│                            │                       │             │
│  ailang run ───────────────┘                       │             │
│  (full traces)                                     ▼             │
│                                         ┌──────────────────┐     │
│                                         │  Observatory UI  │     │
│                                         │   /observatory   │     │
│                                         └──────────────────┘     │
└──────────────────────────────────────────────────────────────────┘

Quick Setup

1. Start the AILANG server:

ailang serve
# Or with make
make services-start

2. Configure Claude Code:

Add to ~/.claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:1957",
    "OTEL_RESOURCE_ATTRIBUTES": "ailang.source=user"
  }
}

3. Configure Gemini CLI:

Add to ~/.gemini/settings.json:

{
  "telemetry": {
    "enabled": true
  }
}

And add to your shell profile (~/.zshenv, ~/.bashrc, etc.):

# Gemini CLI telemetry to Observatory
export GEMINI_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:1957
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_RESOURCE_ATTRIBUTES="ailang.source=user"

4. View the dashboard:

Open http://localhost:1957 and navigate to the Observatory tab.

What Gets Captured

| Source | Data Type | What You See |
|--------|-----------|--------------|
| Claude Code | Events (OTLP logs) | Token counts, costs, model, session info |
| Gemini CLI | Full traces | Complete span hierarchy with timing |
| AILANG commands | Full traces | Compilation phases, eval benchmarks, etc. |

Environment Variables Reference

| Variable | Purpose | Example |
|----------|---------|---------|
| CLAUDE_CODE_ENABLE_TELEMETRY | Enable Claude Code telemetry export | 1 |
| OTEL_LOGS_EXPORTER | Log export protocol | otlp |
| OTEL_METRICS_EXPORTER | Metrics export protocol | otlp |
| OTEL_EXPORTER_OTLP_PROTOCOL | OTLP transport protocol | http/json or grpc |
| OTEL_EXPORTER_OTLP_ENDPOINT | Observatory server URL | http://localhost:1957 |
| OTEL_RESOURCE_ATTRIBUTES | Span metadata | ailang.source=user |
| GEMINI_TELEMETRY_ENABLED | Enable Gemini CLI telemetry | true |

Resource Attributes

The OTEL_RESOURCE_ATTRIBUTES field supports these attributes:

| Attribute | Description |
|-----------|-------------|
| ailang.source | user for manual sessions, coordinator for automated |
| ailang.task_id | Coordinator task ID (set automatically) |
| ailang.session_id | Session ID (set automatically) |

OTLP Receiver Endpoints

The Observatory server exposes standard OTLP endpoints:

| Endpoint | Method | Content-Type | Purpose |
|----------|--------|--------------|---------|
| /v1/traces | POST | application/x-protobuf, application/json | Trace spans |
| /v1/logs | POST | application/x-protobuf, application/json | Log records (Claude Code events) |
| /v1/metrics | POST | application/x-protobuf, application/json | Metrics |

Both protobuf and JSON formats are supported. Claude Code uses JSON (http/json protocol).
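
To confirm the receiver is answering, here is a minimal Go sketch that posts an empty OTLP-JSON payload; real exporters send full ExportTraceServiceRequest bodies, and the endpoint and port are taken from the table above:

import (
    "fmt"
    "log"
    "net/http"
    "strings"
)

// Liveness check against the Observatory's OTLP/HTTP trace receiver.
resp, err := http.Post("http://localhost:1957/v1/traces",
    "application/json", strings.NewReader(`{"resourceSpans":[]}`))
if err != nil {
    log.Fatal("Observatory not reachable: ", err)
}
defer resp.Body.Close()
fmt.Println("status:", resp.Status)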

Silent Failure Mode

Important: If the Observatory server is not running, OTLP exports fail silently. This is by design - your CLI tools continue working normally without errors or delays.

To verify telemetry is working:

  1. Ensure the AILANG server is running (ailang serve)
  2. Run a Claude Code or Gemini CLI command
  3. Check the Observatory dashboard for new traces

Filtering Noisy Traces

The Observatory automatically filters out polling endpoints to reduce noise:

  • /api/approvals, /api/hierarchy, /api/statistics
  • /api/observatory/*, /api/metrics/*
  • /assets/*, /v1/* (OTLP endpoints themselves)
  • /health, /ws, /ws/observatory

Only meaningful operations appear in the trace list.

Coordinator vs User Sessions

The ailang.source attribute distinguishes trace sources:

| Source | Origin | Typical Use |
|--------|--------|-------------|
| user | Manual Claude Code / Gemini CLI sessions | Interactive development |
| coordinator | Automated tasks via ailang coordinator | Background automation |

This allows filtering in the dashboard to see only relevant traces.

Dual Export (Observatory + GCP)

Send traces to both the local Observatory and Google Cloud Trace:

# Local Observatory
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:1957

# Also export to GCP
export GOOGLE_CLOUD_PROJECT=your-project-id

The AILANG server will export to both destinations.

Troubleshooting

Traces not appearing in GCP?

  1. Verify ADC credentials: gcloud auth application-default login
  2. Check project: gcloud config get-value project
  3. Verify permissions: Cloud Trace Agent role required
  4. Run integration test: go test -tags=integration ./internal/telemetry/...

OTLP connection refused?

  1. Check collector is running: curl http://localhost:4318/v1/traces
  2. Verify endpoint URL includes protocol: http:// not just localhost:4318
  3. Check firewall/port access

No spans from AI providers?

  1. Verify telemetry is initialized before AI calls
  2. Check spans are ending: defer span.End() in all paths
  3. Enable debug logging: set OTEL_LOG_LEVEL=debug

See Also