
Telemetry & Tracing

AILANG includes comprehensive OpenTelemetry (OTEL) instrumentation for distributed tracing and observability. This enables integration with standard observability backends like Google Cloud Trace, Grafana, Honeycomb, Jaeger, and more.

Quick Start

Google Cloud Trace

# Set your GCP project (uses Application Default Credentials)
export GOOGLE_CLOUD_PROJECT=your-project-id

# Start services
ailang serve
ailang coordinator start

View traces at: https://console.cloud.google.com/traces/explorer?project=your-project-id

Generic OTLP (Jaeger, Grafana, Honeycomb, etc.)

# Set OTLP collector endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Start services
ailang serve
ailang coordinator start

Dual Export (Both GCP + OTLP)

Send traces to both Google Cloud Trace and another backend simultaneously:

# Configure both
export GOOGLE_CLOUD_PROJECT=your-project-id
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Traces go to both destinations
ailang serve
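
Conceptually, dual export registers two batched exporters on a single TracerProvider. A hedged Go sketch using the standard exporter packages — this mirrors the documented behavior, it is not AILANG's actual initialization code:

import (
    "context"

    texporter "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// One provider, two batched exporters: GCP Cloud Trace plus OTLP.
// Sketch only; AILANG's own wiring may differ.
func newDualProvider(ctx context.Context, project string) (*sdktrace.TracerProvider, error) {
    gcp, err := texporter.New(texporter.WithProjectID(project))
    if err != nil {
        return nil, err
    }
    otlp, err := otlptracehttp.New(ctx) // honors OTEL_EXPORTER_OTLP_ENDPOINT
    if err != nil {
        return nil, err
    }
    return sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(gcp),
        sdktrace.WithBatcher(otlp),
    ), nil
}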

CLI Trace Commands

AILANG includes a built-in CLI for querying traces from Google Cloud Trace:

Check Status

# See current telemetry configuration
ailang trace status

Output:

Telemetry Configuration Status
────────────────────────────────────────
Google Cloud Project: multivac-internal-dev
OTLP Endpoint: (not set)

Mode: Google Cloud Trace
View traces: https://console.cloud.google.com/traces/explorer?project=multivac-internal-dev

List Recent Traces

# List last 10 traces (default)
ailang trace list

# Customize time range and limit
ailang trace list --hours 2 --limit 20

# Filter by span name
ailang trace list --filter "ailang run"

# JSON output for scripting
ailang trace list --json

View Trace Details

# View full trace hierarchy with timing
ailang trace view <trace-id>

Example output:

Trace: 5d359e6d157ba7e726aca8a7600a3bfe
Spans: 5
────────────────────────────────────────────────────────────
ailang run: examples/runnable/factorial.ail (2.065ms)
└─ compile: examples/runnable/factorial.ail (1.458ms)
   ├─ compile.load (358µs)
   ├─ compile.topo_sort (84µs)
   └─ compile.modules (859µs)

Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| GOOGLE_CLOUD_PROJECT | GCP project for Cloud Trace | my-project |
| OTLP_GOOGLE_CLOUD_PROJECT | Telemetry-specific GCP project (takes precedence) | telemetry-project |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint | http://localhost:4318 |
| OTEL_ENVIRONMENT | Deployment environment | production, staging, development |

Priority for GCP project:

  1. OTLP_GOOGLE_CLOUD_PROJECT (if set)
  2. GOOGLE_CLOUD_PROJECT (fallback)

This matches the Gemini CLI telemetry convention.
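
For illustration, this precedence is equivalent to the following Go sketch (resolveProject is a hypothetical helper for this page, not AILANG's internal API):

import "os"

// resolveProject mirrors the documented precedence: the telemetry-specific
// variable wins; the standard GCP variable is the fallback.
func resolveProject() string {
    if p := os.Getenv("OTLP_GOOGLE_CLOUD_PROJECT"); p != "" {
        return p
    }
    return os.Getenv("GOOGLE_CLOUD_PROJECT")
}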

Instrumented Components

All components emit traces automatically when telemetry is configured:

Compiler Pipeline

The compilation pipeline emits detailed spans for each phase:

Single-file/REPL compilation:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| compile.pipeline | Parent span for entire compilation | file.path, file.size_bytes, is_repl |
| compile.parse | Parsing phase | ast.nodes (count) |
| compile.elaborate | Surface→Core elaboration | - |
| compile.typecheck | Type checking | - |
| compile.validate | CoreTypeInfo validation | - |
| compile.lower | Operator lowering | - |

Module compilation:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| compile.module_pipeline | Parent span for module compilation | file.path, file.size_bytes |
| compile.load | Module loading | modules.loaded (count) |
| compile.topo_sort | Topological sort | modules.sorted (count) |
| compile.modules | Compile all modules | modules.count |

Eval Harness (ailang eval-suite)

The benchmark evaluation system emits spans for suite execution and individual benchmarks:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| eval.suite | Parent span for entire benchmark run | eval.models, eval.benchmarks, eval.languages, eval.total_runs, eval.agent_mode, eval.success_count, eval.fail_count, eval.success_rate |
| eval.benchmark | Individual benchmark execution | benchmark.id, benchmark.model, benchmark.language, benchmark.seed, benchmark.success, benchmark.duration_ms, benchmark.input_tokens, benchmark.output_tokens, benchmark.cost_usd |

v0.6.3+ Enhanced Attributes:

For successful benchmarks:

  • code.preview - First 100 chars of generated code
  • code.hash - 8-char hash for deduplication

For failed benchmarks:

  • error.summary - Truncated error message
  • error.category - Error classification

For standard mode (with repair):

  • benchmark.repair_successful - Whether self-repair succeeded

Messaging System (ailang messages)

Message operations emit spans for observability:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| messages.send | Create/insert message | message.to_inbox, message.from_agent, message.type, message.category, message.id |
| messages.list | List messages with filters | list.inbox, list.unread_only, list.collapsed, list.limit, list.result_count |
| messages.read | Read single message by ID | message.id, message.from_agent, message.to_inbox, message.type |
| messages.search | Semantic search | search.query, search.use_neural, search.threshold, search.limit, search.inbox, search.result_count |
| messages.ack | Mark message as read | message.id, message.new_status |
| messages.unack | Mark message as unread | message.id, message.new_status |
| messages.cleanup | Delete old/expired messages | cleanup.older_than, cleanup.expired_only, cleanup.dry_run, cleanup.deleted_count |
| messages.github_sync | Import issues from GitHub | github.repo, sync.dry_run, github.issues_found, sync.imported, sync.skipped |

REPL (ailang repl)

Interactive REPL sessions emit session-level and input-level spans:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| repl.session | Parent span for entire REPL session | session.id, version, session.input_count, session.duration_ms |
| repl.input | Individual user input evaluation | input.type (command/expression), input.text (truncated to 200 chars), input.number |

Span Hierarchy:

repl.session (duration of interactive session)
└─ repl.input #1 (first user input)
└─ repl.input #2 (second user input)
└─ ... (subsequent inputs)

Session metrics are finalized when the REPL exits, capturing total input count and session duration.
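
For illustration, the nesting corresponds to this sketch using the standard OTEL Go API; the tracer name, scanner loop, and evaluate call are assumptions for this page, not AILANG's actual REPL code:

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

tracer := otel.Tracer("ailang")
ctx, session := tracer.Start(ctx, "repl.session")
defer session.End() // finalizes session.input_count and session.duration_ms

for n := 1; scanner.Scan(); n++ {
    _, span := tracer.Start(ctx, "repl.input") // child of repl.session
    span.SetAttributes(attribute.Int("input.number", n))
    evaluate(scanner.Text())
    span.End()
}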

Check Command (ailang check)

The type checking command emits spans for file/directory verification:

| Span | Description | Key Attributes |
|------|-------------|----------------|
| ailang.check | Root span for check operation | file.path, timeout_ms, is_directory |
| check.result | Check outcome with pass/fail | passed (bool), errors.count, timed_out (if timeout occurred) |

Span Hierarchy:

ailang.check (root span)
└─ check.result (with pass/fail and error counts)
└─ compile.* (compilation phases from compiler pipeline)

When using --timeout, the timed_out attribute is set if compilation exceeds the limit.

Server (ailang serve)

  • HTTP request/response spans via otelhttp middleware
  • Automatic status codes, latency, and path attributes
  • Filters out /health and /ws endpoints (see the sketch below)
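
For reference, a hedged sketch of how such otelhttp wiring typically looks; the mux, service name, and port here are assumptions, and AILANG's actual handler setup may differ:

import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Wrap the router so every request gets a span, except noisy endpoints.
handler := otelhttp.NewHandler(mux, "ailang-server",
    otelhttp.WithFilter(func(r *http.Request) bool {
        p := r.URL.Path
        return p != "/health" && p != "/ws" // false = no span recorded
    }),
)
http.ListenAndServe(":1957", handler)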

Coordinator (ailang coordinator start)

The coordinator daemon emits task lifecycle spans:

  • Task lifecycle spans: coordinator.execute_task
  • Attributes: task.id, task.type, task.stage
  • Token and cost tracking per task

Executors

Claude Executor:

  • Span: claude.execute
  • Attributes: executor.model, task.workspace, session.id
  • Token counts: task.tokens_in, task.tokens_out
  • Cost: task.cost_usd

Gemini Executor:

  • Span: gemini.execute
  • Same attributes as Claude executor

AI Providers

All AI providers emit spans for API calls:

| Provider | Span Name | Key Attributes |
|----------|-----------|----------------|
| Anthropic | anthropic.generate | ai.model, ai.tokens_in, ai.tokens_out, http.status_code, ai.prompt_preview, ai.response_preview, ai.finish_reason |
| OpenAI | openai.generate | ai.model, ai.api_type (chat/responses), ai.tokens_*, ai.prompt_preview, ai.response_preview |
| Gemini | gemini.generate | ai.model, ai.auth_type (api_key/adc), ai.tokens_*, ai.prompt_preview, ai.response_preview |
| Ollama | ollama.generate | ai.model, ai.endpoint, ai.prompt_preview, ai.response_preview |

Telemetry Helpers (v0.6.3+)

The internal/telemetry package provides helper functions for safe, consistent span attributes:

Truncate

Safely truncate strings for span attributes, preserving UTF-8 boundaries:

import "github.com/sunholo/ailang/internal/telemetry"

// Truncate to 100 chars, adding "..." if truncated
preview := telemetry.Truncate(longString, 100)
// "Hello, 世界..." (never breaks in middle of multi-byte chars)

CategorizeError

Categorize errors for filtering and aggregation:

category := telemetry.CategorizeError(err)
// Returns: "network", "timeout", "auth", "rate_limit", "parse", "type", "runtime", or "unknown"

ShortHash

Generate deterministic short hashes for deduplication:

hash := telemetry.ShortHash(codeString)
// Returns 8-char hex string like "a1b2c3d4"

LineSnippet

Extract source code context around a line number:

snippet := telemetry.LineSnippet(sourceCode, lineNumber, 60)
// Returns up to 60 chars of the specified line
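
These helpers are typically combined when annotating a span on failure. A sketch — span, err, src, and line are assumed to be in scope, and the attribute names follow the reference tables later on this page:

import (
    "go.opentelemetry.io/otel/attribute"

    "github.com/sunholo/ailang/internal/telemetry"
)

// Attach UTF-8-safe, categorized error context to the current span.
if err != nil {
    span.SetAttributes(
        attribute.String("error.message", telemetry.Truncate(err.Error(), 200)),
        attribute.String("error.category", telemetry.CategorizeError(err)),
        attribute.String("error.snippet", telemetry.LineSnippet(src, line, 60)),
    )
}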

Example: Local Jaeger Setup

# Start Jaeger with OTLP collector
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Configure AILANG
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Start services
ailang serve &
ailang coordinator start

# View traces at http://localhost:16686

Example: Google Cloud Trace + Local Jaeger

# Dual export - both GCP and local Jaeger
export GOOGLE_CLOUD_PROJECT=my-gcp-project
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Traces go to BOTH destinations
ailang serve

Integration Tests

Run the integration tests to verify your Google Cloud Trace setup:

# Set project
export GOOGLE_CLOUD_PROJECT=your-project-id

# Run tests
go test -tags=integration -v -run TestGoogleCloudTrace ./internal/telemetry/...

Tests create sample traces and verify they export correctly.

Native CLI Telemetry

Both Claude Code and Gemini CLI have native OTEL support:

Claude Code

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Gemini CLI

Configure in ~/.gemini/settings.json:

{
  "telemetry": {
    "enabled": true,
    "endpoint": "http://localhost:4318"
  }
}

Cross-Process Trace Linking (v0.6.3+)

AILANG supports end-to-end distributed tracing across process boundaries. When the coordinator spawns CLI executors (Claude Code, Gemini CLI), and those CLIs spawn ailang run commands, all spans are linked into a single trace.

How It Works

The trace context flows between processes via the W3C TRACEPARENT environment variable, as detailed below.

Environment Variables

The following environment variables are used for trace propagation:

| Variable | Format | Description |
|----------|--------|-------------|
| TRACEPARENT | 00-{trace_id}-{span_id}-{flags} | W3C trace context (standard) |
| TRACESTATE | Vendor-specific | Additional trace state (optional) |
| AILANG_TASK_ID | UUID | Coordinator task ID (fallback correlation) |
| AILANG_SESSION_ID | UUID | Executor session ID (fallback correlation) |
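
For reference, a TRACEPARENT value splits into four dash-separated fields; a minimal Go sketch of the decomposition:

import (
    "os"
    "strings"
)

// Decompose a version-00 W3C traceparent value.
parts := strings.SplitN(os.Getenv("TRACEPARENT"), "-", 4)
// parts[0] = version ("00"), parts[1] = 32-hex-char trace ID,
// parts[2] = 16-hex-char span ID, parts[3] = flags ("01" = sampled)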

Supported Commands

These commands extract trace context from the environment:

| Command | Behavior |
|---------|----------|
| ailang run | Creates child span under parent trace |
| ailang check | Creates child span under parent trace |
| ailang eval-suite | Creates child span under parent trace |
| ailang repl | Session uses extracted trace context |

CLI Tool Scenarios

The resulting trace shape depends on whether the CLI tool (Claude Code, Gemini CLI) supports OTEL:

Scenario A: Gemini CLI (full traces)

Executor span ──► Gemini CLI span ──► ailang run span
      │                  │                   │
      └──────────────────┴───────────────────┘
                  (single trace)

Gemini CLI supports full trace export. Spans appear in the same trace hierarchy.

Scenario B: Claude Code (metrics + linked traces)

Executor span ─────────────────────► ailang run span
      │                                     │
      └─────────────────────────────────────┘
              (linked trace, CLI gap)

+ Claude Code metrics/events with ailang.task_id correlation

Claude Code exports metrics and events (not traces), but we inject ailang.task_id via OTEL_RESOURCE_ATTRIBUTES so they can be joined with AILANG traces in the dashboard.

Scenario C: CLI passes env vars through (fallback)

Executor span ──────────────────► ailang run span
      │                                  │
      └──────────────────────────────────┘
            (linked trace, CLI gap)

Scenario D: CLI sanitizes environment (rare)

Executor span             ailang run span
      │                          │
      │    (correlated via       │
      └──── AILANG_TASK_ID) ─────┘

CLI Telemetry Configuration

The AILANG executors automatically configure CLI telemetry. Here's what gets injected:

Claude Code (via environment variables):

CLAUDE_CODE_ENABLE_TELEMETRY=1              # Enable telemetry
OTEL_METRICS_EXPORTER=otlp                  # Export metrics via OTLP
OTEL_LOGS_EXPORTER=otlp                     # Export events via OTLP
OTEL_RESOURCE_ATTRIBUTES=ailang.task_id=...,ailang.session_id=...   # Correlation IDs
OTEL_EXPORTER_OTLP_ENDPOINT=...             # Inherited from parent
OTEL_EXPORTER_OTLP_PROTOCOL=...             # Inherited from parent
OTLP_GOOGLE_CLOUD_PROJECT=...               # Primary project
GOOGLE_CLOUD_PROJECT=...                    # Fallback project
TRACEPARENT=00-{trace_id}-{span_id}-01      # W3C trace context (for ailang run)

Note: Claude Code exports metrics and events only (not traces). The OTEL_RESOURCE_ATTRIBUTES allows joining Claude Code metrics with AILANG traces by ailang.task_id.

Gemini CLI (via environment variables):

GEMINI_TELEMETRY_ENABLED=true               # Enable telemetry (includes traces)
GEMINI_TELEMETRY_TARGET=gcp                 # Export to GCP (if project set)
OTEL_RESOURCE_ATTRIBUTES=ailang.task_id=...,ailang.session_id=...   # Correlation IDs
OTLP_GOOGLE_CLOUD_PROJECT=...               # Primary project var (checked first)
GOOGLE_CLOUD_PROJECT=...                    # Fallback project
OTEL_EXPORTER_OTLP_ENDPOINT=...             # For local collector
TRACEPARENT=00-{trace_id}-{span_id}-01      # W3C trace context

Note: Gemini CLI exports full traces that appear in the trace hierarchy.

Project detection order:

  1. OTLP_GOOGLE_CLOUD_PROJECT - Primary (Gemini CLI standard)
  2. GOOGLE_CLOUD_PROJECT - Fallback (GCP standard)

Manual CLI configuration (if running CLIs directly):

Claude Code settings (~/.claude/settings.json):

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317"
  }
}

Gemini CLI settings (~/.gemini/settings.json):

{
  "telemetry": {
    "enabled": true,
    "target": "gcp"
  }
}

Using in CI/CD

Export trace context to link CI runs with AILANG executions:

GitHub Actions:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run with trace linking
        env:
          TRACEPARENT: "00-${{ github.run_id }}-${{ github.job }}-01"
          AILANG_TASK_ID: ${{ github.run_id }}
        run: |
          ailang run --caps IO --entry main app.ail

Cloud Build:

steps:
  - name: 'ailang-builder'
    env:
      - 'TRACEPARENT=00-${BUILD_ID}-cloudtrace-01'
      - 'AILANG_TASK_ID=${BUILD_ID}'
    args: ['run', '--caps', 'IO', '--entry', 'main', 'app.ail']

Verification

To verify trace linking is working:

# 1. Set up telemetry
export GOOGLE_CLOUD_PROJECT=your-project

# 2. Create a trace context manually (W3C requires lowercase hex IDs)
export TRACEPARENT="00-$(uuidgen | tr -d '-' | tr '[:upper:]' '[:lower:]')-$(uuidgen | tr -d '-' | tr '[:upper:]' '[:lower:]' | cut -c1-16)-01"

# 3. Run a command
ailang run --caps IO --entry main examples/runnable/hello.ail

# 4. Check traces
ailang trace list --hours 1

The trace should show a child span linked to your TRACEPARENT.

Programmatic Usage

The telemetry package provides helpers for trace propagation:

import "github.com/sunholo/ailang/internal/telemetry"

// Inject trace context into subprocess environment
env := os.Environ()
env = telemetry.InjectTraceContext(ctx, env)
env = telemetry.InjectCorrelationIDs(env, taskID, sessionID)
cmd.Env = env

// Extract trace context from environment (in subprocess)
ctx = telemetry.ExtractTraceContext(ctx)
taskID, sessionID := telemetry.ExtractCorrelationIDs()
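
Putting both halves together, a sketch of a parent process spawning ailang run with linked context; ctx, taskID, and sessionID are assumed to be in scope:

import (
    "os"
    "os/exec"

    "github.com/sunholo/ailang/internal/telemetry"
)

// Parent side: inject trace context, then spawn the subprocess.
cmd := exec.CommandContext(ctx, "ailang", "run", "--caps", "IO", "--entry", "main", "app.ail")
env := os.Environ()
env = telemetry.InjectTraceContext(ctx, env)
env = telemetry.InjectCorrelationIDs(env, taskID, sessionID)
cmd.Env = env
err := cmd.Run() // the child's spans join the parent trace via TRACEPARENT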

Trace Attributes Reference

Common Attributes

All spans include:

  • service.name - Service identifier (e.g., ailang-server, ailang-coordinator)
  • service.version - AILANG version
  • deployment.environment - From OTEL_ENVIRONMENT (default: development)
  • process.runtime.name - go
  • process.runtime.version - Go version

AI-Specific Attributes

| Attribute | Description |
|-----------|-------------|
| ai.provider | Provider name: anthropic, openai, gemini, ollama |
| ai.model | Model ID (e.g., claude-sonnet-4-5, gpt-5) |
| ai.tokens_in | Input tokens |
| ai.tokens_out | Output tokens |
| ai.tokens_total | Total tokens |
| ai.cost_usd | Estimated cost in USD |
| ai.prompt_preview | First 100 chars of prompt (v0.6.3+) |
| ai.response_preview | First 100 chars of response (v0.6.3+) |
| ai.finish_reason | Why generation stopped: end_turn, max_tokens, etc. (v0.6.3+) |

Task Attributes

| Attribute | Description |
|-----------|-------------|
| task.id | Unique task identifier |
| task.type | Task type: bug, feature, docs, etc. |
| task.stage | Pipeline stage: design, sprint, implement |
| task.success | Boolean success status |
| task.duration_ms | Duration in milliseconds |

Error Context Attributes (v0.6.3+)

When errors occur, spans include rich debugging context:

| Attribute | Description |
|-----------|-------------|
| error.message | Truncated error message (max 200 chars, UTF-8 safe) |
| error.category | Error type: network, timeout, auth, rate_limit, parse, type, runtime, unknown |
| error.location | Position in source: line:column format |
| error.snippet | Source code around error (max 60 chars) |
| error.summary | Short error description for failed benchmarks |

Code Context Attributes (v0.6.3+)

Eval benchmarks include code analysis attributes:

| Attribute | Description |
|-----------|-------------|
| code.preview | First 100 chars of generated code |
| code.hash | Short hash (8 chars) for deduplication |
| benchmark.repair_successful | Whether self-repair succeeded (standard mode) |

CLI Run Attributes (v0.6.3+)

The ailang run command includes:

| Attribute | Description |
|-----------|-------------|
| file.path | Path to the executed file |
| entry.function | Entry point function name |
| caps.granted | List of granted capabilities (e.g., ["IO", "FS"]) |

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
├─────────────────────────────────────────────────────────────┤
│   Server   │  Coordinator  │  Executors  │   AI Providers   │
│   (HTTP)   │   (Tasks)     │  (Claude/   │   (Anthropic/    │
│            │               │   Gemini)   │   OpenAI/etc)    │
└────────────┴───────────────┴─────────────┴──────────────────┘
                              │
                     otel.Tracer("...")
                              │
                    ┌─────────▼─────────┐
                    │  TracerProvider   │
                    │     (Global)      │
                    └─────────┬─────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
      ┌───────▼──────┐  ┌─────▼────────┐  ┌───▼──────────┐
      │ GCP Exporter │  │     OTLP     │  │  (disabled)  │
      │  Cloud Trace │  │   Exporter   │  │    No-op     │
      └──────────────┘  └──────────────┘  └──────────────┘
              │               │
              ▼               ▼
       Google Cloud    Jaeger/Grafana/
         Console       Honeycomb/etc

Performance Overhead

When Disabled (Default)

When no telemetry environment variables are set:

  • No exporters are initialized
  • Tracers return no-op spans
  • ~2-5 nanoseconds per span call (just a nil check)
  • Zero memory allocations
  • No external connections made

This is negligible - you can leave the instrumentation in place without any measurable impact.

When Enabled (Production Use)

When GOOGLE_CLOUD_PROJECT or OTEL_EXPORTER_OTLP_ENDPOINT is set:

  • ~50-200 microseconds per compilation (all phases combined)
  • ~100-500 microseconds per AI API call
  • Spans are batched and exported asynchronously
  • Export happens in background goroutines (doesn't block your code)

Production recommendations:

  • Use sampling for high-throughput services (e.g., 10% of requests; see the sketch below)
  • Batch exporters are enabled by default
  • Set OTEL_ENVIRONMENT=production for environment tagging
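
If you build your own TracerProvider with the OTEL Go SDK, ratio-based head sampling looks like the sketch below; whether AILANG itself honors the standard OTEL_TRACES_SAMPLER variables is not covered here, so treat this as an SDK illustration rather than an AILANG feature:

import sdktrace "go.opentelemetry.io/otel/sdk/trace"

// Sample ~10% of new root traces; child spans follow the parent's decision.
tp := sdktrace.NewTracerProvider(
    sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.10))),
)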

Overhead Breakdown by Component

| Component | Spans per Operation | Typical Overhead |
|-----------|---------------------|------------------|
| Compiler Pipeline | 5-7 spans | ~100μs |
| Eval Harness | 2 spans per benchmark | ~50μs |
| Messaging | 1 span per operation | ~20μs |
| REPL Session | 1 session + N input spans | ~30μs per input |
| Check Command | 2 spans (root + result) | ~40μs |
| AI Providers | 1 span per API call | ~30μs |

Note: Actual overhead depends on your OTEL collector. Local Jaeger adds ~10μs, while cloud exports (GCP, Honeycomb) add ~50-100μs due to batching and network I/O.

Observatory Dashboard Integration (v0.6.3+)

The AILANG Observatory provides a local dashboard for viewing traces, spans, and metrics from Claude Code, Gemini CLI, and AILANG operations in real-time.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                   Your Development Environment                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Claude Code ──────────────┐                                     │
│  (metrics + events)        │                                     │
│                            │ OTLP/HTTP                           │
│  Gemini CLI ───────────────┼──────────────►  AILANG Server       │
│  (full traces)             │                 (localhost:1957)    │
│                            │                       │             │
│  ailang run ───────────────┘                       │             │
│  (full traces)                                     ▼             │
│                                         ┌──────────────────┐     │
│                                         │  Observatory UI  │     │
│                                         │   /observatory   │     │
│                                         └──────────────────┘     │
└──────────────────────────────────────────────────────────────────┘

Quick Setup

1. Start the AILANG server:

ailang serve
# Or with make
make services-start

2. Configure Claude Code:

Add to ~/.claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:1957",
    "OTEL_RESOURCE_ATTRIBUTES": "ailang.source=user"
  }
}

3. Configure Gemini CLI:

Add to ~/.gemini/settings.json:

{
  "telemetry": {
    "enabled": true
  }
}

And add to your shell profile (~/.zshenv, ~/.bashrc, etc.):

# Gemini CLI telemetry to Observatory
export GEMINI_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:1957
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_RESOURCE_ATTRIBUTES="ailang.source=user"

4. View the dashboard:

Open http://localhost:1957 and navigate to the Observatory tab.

What Gets Captured

| Source | Data Type | What You See |
|--------|-----------|--------------|
| Claude Code | Events (OTLP logs) | Token counts, costs, model, session info |
| Gemini CLI | Full traces | Complete span hierarchy with timing |
| AILANG commands | Full traces | Compilation phases, eval benchmarks, etc. |

Environment Variables Reference

| Variable | Purpose | Example |
|----------|---------|---------|
| CLAUDE_CODE_ENABLE_TELEMETRY | Enable Claude Code telemetry export | 1 |
| OTEL_LOGS_EXPORTER | Log export protocol | otlp |
| OTEL_METRICS_EXPORTER | Metrics export protocol | otlp |
| OTEL_EXPORTER_OTLP_PROTOCOL | OTLP transport protocol | http/json or grpc |
| OTEL_EXPORTER_OTLP_ENDPOINT | Observatory server URL | http://localhost:1957 |
| OTEL_RESOURCE_ATTRIBUTES | Span metadata | ailang.source=user |
| GEMINI_TELEMETRY_ENABLED | Enable Gemini CLI telemetry | true |

Resource Attributes

The OTEL_RESOURCE_ATTRIBUTES field supports these attributes:

| Attribute | Description |
|-----------|-------------|
| ailang.source | user for manual sessions, coordinator for automated |
| ailang.task_id | Coordinator task ID (set automatically) |
| ailang.session_id | Session ID (set automatically) |

OTLP Receiver Endpoints

The Observatory server exposes standard OTLP endpoints:

| Endpoint | Method | Content-Type | Purpose |
|----------|--------|--------------|---------|
| /v1/traces | POST | application/x-protobuf, application/json | Trace spans |
| /v1/logs | POST | application/x-protobuf, application/json | Log records (Claude Code events) |
| /v1/metrics | POST | application/x-protobuf, application/json | Metrics |

Both protobuf and JSON formats are supported. Claude Code uses JSON (http/json protocol).
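
To confirm the receiver is answering, here is a minimal Go sketch that posts an empty OTLP-JSON payload; real exporters send full ExportTraceServiceRequest bodies, and the endpoint and port are taken from the table above:

import (
    "fmt"
    "log"
    "net/http"
    "strings"
)

// Liveness check against the Observatory's OTLP/HTTP trace receiver.
resp, err := http.Post("http://localhost:1957/v1/traces",
    "application/json", strings.NewReader(`{"resourceSpans":[]}`))
if err != nil {
    log.Fatal("Observatory not reachable: ", err)
}
defer resp.Body.Close()
fmt.Println("status:", resp.Status)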

Silent Failure Mode

Important: If the Observatory server is not running, OTLP exports fail silently. This is by design - your CLI tools continue working normally without errors or delays.

To verify telemetry is working:

  1. Ensure the AILANG server is running (ailang serve)
  2. Run a Claude Code or Gemini CLI command
  3. Check the Observatory dashboard for new traces

Filtering Noisy Traces

The Observatory automatically filters out polling endpoints to reduce noise:

  • /api/approvals, /api/hierarchy, /api/statistics
  • /api/observatory/*, /api/metrics/*
  • /assets/*, /v1/* (OTLP endpoints themselves)
  • /health, /ws, /ws/observatory

Only meaningful operations appear in the trace list.

Coordinator vs User Sessions

The ailang.source attribute distinguishes trace sources:

| Source | Origin | Typical Use |
|--------|--------|-------------|
| user | Manual Claude Code / Gemini CLI sessions | Interactive development |
| coordinator | Automated tasks via ailang coordinator | Background automation |

This allows filtering in the dashboard to see only relevant traces.

Dual Export (Observatory + GCP)

Send traces to both the local Observatory and Google Cloud Trace:

# Local Observatory
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:1957

# Also export to GCP
export GOOGLE_CLOUD_PROJECT=your-project-id

The AILANG server will export to both destinations.

Troubleshooting

Traces not appearing in GCP?

  1. Verify ADC credentials: gcloud auth application-default login
  2. Check project: gcloud config get-value project
  3. Verify permissions: Cloud Trace Agent role required
  4. Run integration test: go test -tags=integration ./internal/telemetry/...

OTLP connection refused?

  1. Check collector is running: curl http://localhost:4318/v1/traces
  2. Verify endpoint URL includes protocol: http:// not just localhost:4318
  3. Check firewall/port access

No spans from AI providers?

  1. Verify telemetry is initialized before AI calls
  2. Check spans are ending: defer span.End() in all paths
  3. Enable debug logging: set OTEL_LOG_LEVEL=debug

See Also