Telemetry & Tracing
AILANG includes comprehensive OpenTelemetry (OTEL) instrumentation for distributed tracing and observability. This enables integration with standard observability backends like Google Cloud Trace, Grafana, Honeycomb, Jaeger, and more.
Quick Start
Google Cloud Trace (Recommended for GCP)
# Set your GCP project (uses Application Default Credentials)
export GOOGLE_CLOUD_PROJECT=your-project-id
# Start services
ailang serve
ailang coordinator start
View traces at: https://console.cloud.google.com/traces/explorer?project=your-project-id
Generic OTLP (Jaeger, Grafana, Honeycomb, etc.)
# Set OTLP collector endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# Start services
ailang serve
ailang coordinator start
Dual Export (Both GCP + OTLP)
Send traces to both Google Cloud Trace and another backend simultaneously:
# Configure both
export GOOGLE_CLOUD_PROJECT=your-project-id
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# Traces go to both destinations
ailang serve
CLI Trace Commands
AILANG includes a built-in CLI for querying traces from Google Cloud Trace:
Check Status
# See current telemetry configuration
ailang trace status
Output:
Telemetry Configuration Status
────────────────────────────────────────
Google Cloud Project: multivac-internal-dev
OTLP Endpoint: (not set)
Mode: Google Cloud Trace
View traces: https://console.cloud.google.com/traces/explorer?project=multivac-internal-dev
List Recent Traces
# List last 10 traces (default)
ailang trace list
# Customize time range and limit
ailang trace list --hours 2 --limit 20
# Filter by span name
ailang trace list --filter "ailang run"
# JSON output for scripting
ailang trace list --json
View Trace Details
# View full trace hierarchy with timing
ailang trace view <trace-id>
Example output:
Trace: 5d359e6d157ba7e726aca8a7600a3bfe
Spans: 5
────────────────────────────────────────────────────────────
ailang run: examples/runnable/factorial.ail (2.065ms)
└─ compile: examples/runnable/factorial.ail (1.458ms)
└─ compile.load (358µs)
└─ compile.topo_sort (84µs)
└─ compile.modules (859µs)
Environment Variables
| Variable | Description | Example |
|---|---|---|
GOOGLE_CLOUD_PROJECT | GCP project for Cloud Trace | my-project |
OTLP_GOOGLE_CLOUD_PROJECT | Telemetry-specific GCP project (takes precedence) | telemetry-project |
OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint | http://localhost:4318 |
OTEL_ENVIRONMENT | Deployment environment | production, staging, development |
Priority for GCP project:
1. OTLP_GOOGLE_CLOUD_PROJECT (if set)
2. GOOGLE_CLOUD_PROJECT (fallback)
This matches the Gemini CLI telemetry convention.
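A minimal sketch of that precedence, assuming a hypothetical resolveProject helper (the real resolution happens inside AILANG's telemetry initialization):
import "os"

// resolveProject mirrors the documented precedence:
// OTLP_GOOGLE_CLOUD_PROJECT first, then GOOGLE_CLOUD_PROJECT.
func resolveProject() string {
	if p := os.Getenv("OTLP_GOOGLE_CLOUD_PROJECT"); p != "" {
		return p
	}
	return os.Getenv("GOOGLE_CLOUD_PROJECT")
}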
Instrumented Components
All components emit traces automatically when telemetry is configured:
Compiler Pipeline
The compilation pipeline emits detailed spans for each phase:
Single-file/REPL compilation:
| Span | Description | Key Attributes |
|---|---|---|
compile.pipeline | Parent span for entire compilation | file.path, file.size_bytes, is_repl |
compile.parse | Parsing phase | ast.nodes (count) |
compile.elaborate | Surface→Core elaboration | - |
compile.typecheck | Type checking | - |
compile.validate | CoreTypeInfo validation | - |
compile.lower | Operator lowering | - |
Module compilation:
| Span | Description | Key Attributes |
|---|---|---|
compile.module_pipeline | Parent span for module compilation | file.path, file.size_bytes |
compile.load | Module loading | modules.loaded (count) |
compile.topo_sort | Topological sort | modules.sorted (count) |
compile.modules | Compile all modules | modules.count |
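These phase spans follow the standard OTEL parent/child pattern: a child span started from the parent's context nests under it automatically. A minimal sketch (the span names match the tables above; the surrounding function is hypothetical):
import (
	"context"

	"go.opentelemetry.io/otel"
)

func compileFile(ctx context.Context, path string) {
	tracer := otel.Tracer("ailang/compiler")

	// Parent span for the whole pipeline
	ctx, pipeline := tracer.Start(ctx, "compile.pipeline")
	defer pipeline.End()

	// Child span for one phase; it nests under compile.pipeline
	// because it is started from the parent span's context.
	_, parse := tracer.Start(ctx, "compile.parse")
	// ... parsing work ...
	parse.End()
}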
Eval Harness (ailang eval-suite)
The benchmark evaluation system emits spans for suite execution and individual benchmarks:
| Span | Description | Key Attributes |
|---|---|---|
eval.suite | Parent span for entire benchmark run | eval.models, eval.benchmarks, eval.languages, eval.total_runs, eval.agent_mode, eval.success_count, eval.fail_count, eval.success_rate |
eval.benchmark | Individual benchmark execution | benchmark.id, benchmark.model, benchmark.language, benchmark.seed, benchmark.success, benchmark.duration_ms, benchmark.input_tokens, benchmark.output_tokens, benchmark.cost_usd |
v0.6.3+ Enhanced Attributes:
For successful benchmarks:
- code.preview - First 100 chars of generated code
- code.hash - 8-char hash for deduplication
For failed benchmarks:
- error.summary - Truncated error message
- error.category - Error classification
For standard mode (with repair):
- benchmark.repair_successful - Whether self-repair succeeded
Messaging System (ailang messages)
Message operations emit spans for observability:
| Span | Description | Key Attributes |
|---|---|---|
messages.send | Create/insert message | message.to_inbox, message.from_agent, message.type, message.category, message.id |
messages.list | List messages with filters | list.inbox, list.unread_only, list.collapsed, list.limit, list.result_count |
messages.read | Read single message by ID | message.id, message.from_agent, message.to_inbox, message.type |
messages.search | Semantic search | search.query, search.use_neural, search.threshold, search.limit, search.inbox, search.result_count |
messages.ack | Mark message as read | message.id, message.new_status |
messages.unack | Mark message as unread | message.id, message.new_status |
messages.cleanup | Delete old/expired messages | cleanup.older_than, cleanup.expired_only, cleanup.dry_run, cleanup.deleted_count |
messages.github_sync | Import issues from GitHub | github.repo, sync.dry_run, github.issues_found, sync.imported, sync.skipped |
REPL (ailang repl)
Interactive REPL sessions emit session-level and input-level spans:
| Span | Description | Key Attributes |
|---|---|---|
repl.session | Parent span for entire REPL session | session.id, version, session.input_count, session.duration_ms |
repl.input | Individual user input evaluation | input.type (command/expression), input.text (truncated 200 chars), input.number |
Span Hierarchy:
repl.session (duration of interactive session)
└─ repl.input #1 (first user input)
└─ repl.input #2 (second user input)
└─ ... (subsequent inputs)
Session metrics are finalized when the REPL exits, capturing total input count and session duration.
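A sketch of that finalize-on-exit pattern (hypothetical; ctx and inputCount come from the surrounding session loop, and attribute names match the table above):
import (
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

start := time.Now()
ctx, session := otel.Tracer("ailang/repl").Start(ctx, "repl.session")
defer func() {
	// Finalized when the REPL exits
	session.SetAttributes(
		attribute.Int("session.input_count", inputCount),
		attribute.Int64("session.duration_ms", time.Since(start).Milliseconds()),
	)
	session.End()
}()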
Check Command (ailang check)
The type checking command emits spans for file/directory verification:
| Span | Description | Key Attributes |
|---|---|---|
ailang.check | Root span for check operation | file.path, timeout_ms, is_directory |
check.result | Check outcome with pass/fail | passed (bool), errors.count, timed_out (if timeout occurred) |
Span Hierarchy:
ailang.check (root span)
└─ check.result (with pass/fail and error counts)
└─ compile.* (compilation phases from compiler pipeline)
When using --timeout, the timed_out attribute is set if compilation exceeds the limit.
Server (ailang serve)
- HTTP request/response spans via otelhttp middleware
- Automatic status codes, latency, and path attributes
- Filters out /health and /ws endpoints
Coordinator (ailang coordinator start)
The coordinator daemon emits task lifecycle spans:
- Task lifecycle span: coordinator.execute_task
- Attributes: task.id, task.type, task.stage
- Token and cost tracking per task
Executors
Claude Executor:
- Span: claude.execute
- Attributes: executor.model, task.workspace, session.id
- Token counts: task.tokens_in, task.tokens_out
- Cost: task.cost_usd
Gemini Executor:
- Span: gemini.execute
- Same attributes as the Claude executor
AI Providers
All AI providers emit spans for API calls:
| Provider | Span Name | Key Attributes |
|---|---|---|
| Anthropic | anthropic.generate | ai.model, ai.tokens_in, ai.tokens_out, http.status_code, ai.prompt_preview, ai.response_preview, ai.finish_reason |
| OpenAI | openai.generate | ai.model, ai.api_type (chat/responses), ai.tokens_*, ai.prompt_preview, ai.response_preview |
| Gemini | gemini.generate | ai.model, ai.auth_type (api_key/adc), ai.tokens_*, ai.prompt_preview, ai.response_preview |
| Ollama | ollama.generate | ai.model, ai.endpoint, ai.prompt_preview, ai.response_preview |
Telemetry Helpers (v0.6.3+)
The internal/telemetry package provides helper functions for safe, consistent span attributes:
Truncate
Safely truncate strings for span attributes, preserving UTF-8 boundaries:
import "github.com/sunholo/ailang/internal/telemetry"
// Truncate to 100 chars, adding "..." if truncated
preview := telemetry.Truncate(longString, 100)
// "Hello, 世界..." (never breaks in middle of multi-byte chars)
CategorizeError
Categorize errors for filtering and aggregation:
category := telemetry.CategorizeError(err)
// Returns: "network", "timeout", "auth", "rate_limit", "parse", "type", "runtime", or "unknown"
ShortHash
Generate deterministic short hashes for deduplication:
hash := telemetry.ShortHash(codeString)
// Returns 8-char hex string like "a1b2c3d4"
LineSnippet
Extract source code context around a line number:
snippet := telemetry.LineSnippet(sourceCode, lineNumber, 60)
// Returns up to 60 chars of the specified line
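These helpers are typically combined when attaching error context to a span. A sketch of that pattern (the attribute keys match the Error Context table later on this page; span and err come from surrounding code):
import (
	"go.opentelemetry.io/otel/attribute"

	"github.com/sunholo/ailang/internal/telemetry"
)

if err != nil {
	// Truncate to the documented 200-char limit and classify the error
	span.SetAttributes(
		attribute.String("error.message", telemetry.Truncate(err.Error(), 200)),
		attribute.String("error.category", telemetry.CategorizeError(err)),
	)
}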
Example: Local Jaeger Setup
# Start Jaeger with OTLP collector
docker run -d --name jaeger \
-p 16686:16686 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
# Configure AILANG
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# Start services
ailang serve &
ailang coordinator start
# View traces at http://localhost:16686
Example: Google Cloud Trace + Local Jaeger
# Dual export - both GCP and local Jaeger
export GOOGLE_CLOUD_PROJECT=my-gcp-project
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# Traces go to BOTH destinations
ailang serve
Integration Tests
Run the integration tests to verify your Google Cloud Trace setup:
# Set project
export GOOGLE_CLOUD_PROJECT=your-project-id
# Run tests
go test -tags=integration -v -run TestGoogleCloudTrace ./internal/telemetry/...
Tests create sample traces and verify they export correctly.
Native CLI Telemetry
Both Claude Code and Gemini CLI have native OTEL support:
Claude Code
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
Gemini CLI
Configure in ~/.gemini/settings.json:
{
"telemetry": {
"enabled": true,
"endpoint": "http://localhost:4318"
}
}
Cross-Process Trace Linking (v0.6.3+)
AILANG supports end-to-end distributed tracing across process boundaries. When the coordinator spawns CLI executors (Claude Code, Gemini CLI), and those CLIs spawn ailang run commands, all spans are linked into a single trace.
How It Works
The trace context flows between processes via the W3C TRACEPARENT environment variable.
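A TRACEPARENT value is a single dash-separated string. A minimal sketch that splits one into its W3C fields (illustrative only; AILANG's own extraction uses the helpers shown under Programmatic Usage below):
import (
	"fmt"
	"os"
	"strings"
)

// Example value: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
parts := strings.Split(os.Getenv("TRACEPARENT"), "-")
if len(parts) == 4 {
	fmt.Println("version: ", parts[0]) // always "00" today
	fmt.Println("trace_id:", parts[1]) // 32 lowercase hex chars
	fmt.Println("span_id: ", parts[2]) // 16 lowercase hex chars
	fmt.Println("flags:   ", parts[3]) // "01" = sampled
}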
Environment Variables
The following environment variables are used for trace propagation:
| Variable | Format | Description |
|---|---|---|
TRACEPARENT | 00-{trace_id}-{span_id}-{flags} | W3C trace context (standard) |
TRACESTATE | Vendor-specific | Additional trace state (optional) |
AILANG_TASK_ID | UUID | Coordinator task ID (fallback correlation) |
AILANG_SESSION_ID | UUID | Executor session ID (fallback correlation) |
Supported Commands
These commands extract trace context from the environment:
| Command | Behavior |
|---|---|
ailang run | Creates child span under parent trace |
ailang check | Creates child span under parent trace |
ailang eval-suite | Creates child span under parent trace |
ailang repl | Session uses extracted trace context |
CLI Tool Scenarios
Depending on whether the CLI tool (Claude Code, Gemini CLI) supports OTEL:
Scenario A: Gemini CLI (full traces)
Executor span ──► Gemini CLI span ──► ailang run span
│ │ │
└──────────────────┴───────────────────┘
(single trace)
Gemini CLI supports full trace export. Spans appear in the same trace hierarchy.
Scenario B: Claude Code (metrics + linked traces)
Executor span ─────────────────────► ailang run span
│ │
└─────────────────────────────────────┘
(linked trace, CLI gap)
+ Claude Code metrics/events with ailang.task_id correlation
Claude Code exports metrics and events (not traces), but we inject ailang.task_id via OTEL_RESOURCE_ATTRIBUTES so they can be joined with AILANG traces in the dashboard.
Scenario C: CLI passes env vars through (fallback)
Executor span ──────────────────► ailang run span
│ │
└──────────────────────────────────┘
(linked trace, CLI gap)
Scenario D: CLI sanitizes environment (rare)
Executor span ailang run span
│ │
│ (correlated via │
└──── AILANG_TASK_ID) ─────┘
CLI Telemetry Configuration
The AILANG executors automatically configure CLI telemetry. Here's what gets injected:
Claude Code (via environment variables):
CLAUDE_CODE_ENABLE_TELEMETRY=1 # Enable telemetry
OTEL_METRICS_EXPORTER=otlp # Export metrics via OTLP
OTEL_LOGS_EXPORTER=otlp # Export events via OTLP
OTEL_RESOURCE_ATTRIBUTES=ailang.task_id=...,ailang.session_id=... # Correlation IDs
OTEL_EXPORTER_OTLP_ENDPOINT=... # Inherited from parent
OTEL_EXPORTER_OTLP_PROTOCOL=... # Inherited from parent
OTLP_GOOGLE_CLOUD_PROJECT=... # Primary project
GOOGLE_CLOUD_PROJECT=... # Fallback project
TRACEPARENT=00-{trace_id}-{span_id}-01 # W3C trace context (for ailang run)
Note: Claude Code exports metrics and events only (not traces). The OTEL_RESOURCE_ATTRIBUTES allows joining Claude Code metrics with AILANG traces by ailang.task_id.
Gemini CLI (via environment variables):
GEMINI_TELEMETRY_ENABLED=true # Enable telemetry (includes traces)
GEMINI_TELEMETRY_TARGET=gcp # Export to GCP (if project set)
OTEL_RESOURCE_ATTRIBUTES=ailang.task_id=...,ailang.session_id=... # Correlation IDs
OTLP_GOOGLE_CLOUD_PROJECT=... # Primary project var (checked first)
GOOGLE_CLOUD_PROJECT=... # Fallback project
OTEL_EXPORTER_OTLP_ENDPOINT=... # For local collector
TRACEPARENT=00-{trace_id}-{span_id}-01 # W3C trace context
Note: Gemini CLI exports full traces that appear in the trace hierarchy.
Project detection order:
1. OTLP_GOOGLE_CLOUD_PROJECT - Primary (Gemini CLI standard)
2. GOOGLE_CLOUD_PROJECT - Fallback (GCP standard)
Manual CLI configuration (if running CLIs directly):
Claude Code settings (~/.claude/settings.json):
{
"env": {
"CLAUDE_CODE_ENABLE_TELEMETRY": "1",
"OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317"
}
}
Gemini CLI settings (~/.gemini/settings.json):
{
"telemetry": {
"enabled": true,
"target": "gcp"
}
}
Using in CI/CD
Export trace context to link CI runs with AILANG executions:
GitHub Actions:
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Run with trace linking
env:
TRACEPARENT: "00-${{ github.run_id }}-${{ github.job }}-01"
AILANG_TASK_ID: ${{ github.run_id }}
run: |
ailang run --caps IO --entry main app.ail
Cloud Build:
steps:
- name: 'ailang-builder'
env:
- 'TRACEPARENT=00-${BUILD_ID}-cloudtrace-01'
- 'AILANG_TASK_ID=${BUILD_ID}'
args: ['run', '--caps', 'IO', '--entry', 'main', 'app.ail']
Verification
To verify trace linking is working:
# 1. Set up telemetry
export GOOGLE_CLOUD_PROJECT=your-project
# 2. Create a trace context manually (W3C requires lowercase hex IDs)
export TRACEPARENT="00-$(uuidgen | tr -d '-' | tr 'A-F' 'a-f')-$(uuidgen | tr -d '-' | cut -c1-16 | tr 'A-F' 'a-f')-01"
# 3. Run a command
ailang run --caps IO --entry main examples/runnable/hello.ail
# 4. Check traces
ailang trace list --hours 1
The trace should show a child span linked to your TRACEPARENT.
Programmatic Usage
The telemetry package provides helpers for trace propagation:
import "github.com/sunholo/ailang/internal/telemetry"
// Inject trace context into subprocess environment
env := os.Environ()
env = telemetry.InjectTraceContext(ctx, env)
env = telemetry.InjectCorrelationIDs(env, taskID, sessionID)
cmd.Env = env
// Extract trace context from environment (in subprocess)
ctx = telemetry.ExtractTraceContext(ctx)
taskID, sessionID := telemetry.ExtractCorrelationIDs()
Trace Attributes Reference
Common Attributes
All spans include:
- service.name - Service identifier (e.g., ailang-server, ailang-coordinator)
- service.version - AILANG version
- deployment.environment - From OTEL_ENVIRONMENT (default: development)
- process.runtime.name - go
- process.runtime.version - Go version
AI-Specific Attributes
| Attribute | Description |
|---|---|
ai.provider | Provider name: anthropic, openai, gemini, ollama |
ai.model | Model ID (e.g., claude-sonnet-4-5, gpt-5) |
ai.tokens_in | Input tokens |
ai.tokens_out | Output tokens |
ai.tokens_total | Total tokens |
ai.cost_usd | Estimated cost in USD |
ai.prompt_preview | First 100 chars of prompt (v0.6.3+) |
ai.response_preview | First 100 chars of response (v0.6.3+) |
ai.finish_reason | Why generation stopped: end_turn, max_tokens, etc. (v0.6.3+) |
Task Attributes
| Attribute | Description |
|---|---|
task.id | Unique task identifier |
task.type | Task type: bug, feature, docs, etc. |
task.stage | Pipeline stage: design, sprint, implement |
task.success | Boolean success status |
task.duration_ms | Duration in milliseconds |
Error Context Attributes (v0.6.3+)
When errors occur, spans include rich debugging context:
| Attribute | Description |
|---|---|
error.message | Truncated error message (max 200 chars, UTF-8 safe) |
error.category | Error type: network, timeout, auth, rate_limit, parse, type, runtime, unknown |
error.location | Position in source: line:column format |
error.snippet | Source code around error (max 60 chars) |
error.summary | Short error description for failed benchmarks |
Code Context Attributes (v0.6.3+)
Eval benchmarks include code analysis attributes:
| Attribute | Description |
|---|---|
code.preview | First 100 chars of generated code |
code.hash | Short hash (8 chars) for deduplication |
benchmark.repair_successful | Whether self-repair succeeded (standard mode) |
CLI Run Attributes (v0.6.3+)
The ailang run command includes:
| Attribute | Description |
|---|---|
file.path | Path to the executed file |
entry.function | Entry point function name |
caps.granted | List of granted capabilities (e.g., ["IO", "FS"]) |
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Your Application │
├─────────────────────────────────────────────────────────────┤
│ Server │ Coordinator │ Executors │ AI Providers │
│ (HTTP) │ (Tasks) │ (Claude/ │ (Anthropic/ │
│ │ │ Gemini) │ OpenAI/etc) │
└────────────┴───────────────┴─────────────┴──────────────────┘
│
otel.Tracer("...")
│
┌─────────▼─────────┐
│ TracerProvider │
│ (Global) │
└─────────┬─────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌─────────▼────┐ ┌───────▼────┐ ┌───────▼──────┐
│ GCP Exporter │ │ OTLP │ │ (disabled) │
│ Cloud Trace │ │ Exporter │ │ No-op │
└──────────────┘ └────────────┘ └──────────────┘
│ │
▼ ▼
Google Cloud Jaeger/Grafana/
Console Honeycomb/etc
Performance Overhead
When Disabled (Default)
When no telemetry environment variables are set:
- No exporters are initialized
- Tracers return no-op spans
- ~2-5 nanoseconds per span call (just a nil check)
- Zero memory allocations
- No external connections made
This is negligible - you can leave the instrumentation in place without any measurable impact.
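This is standard OpenTelemetry Go behavior: until a TracerProvider is registered, the global tracer hands back no-op spans. A quick illustration (assumes only the otel API package):
import (
	"context"

	"go.opentelemetry.io/otel"
)

// With no TracerProvider configured, Tracer returns a no-op tracer;
// Start and End record nothing and cost almost nothing.
tracer := otel.Tracer("ailang/demo")
_, span := tracer.Start(context.Background(), "noop-example")
span.End()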
When Enabled (Production Use)
When GOOGLE_CLOUD_PROJECT or OTEL_EXPORTER_OTLP_ENDPOINT is set:
- ~50-200 microseconds per compilation (all phases combined)
- ~100-500 microseconds per AI API call
- Spans are batched and exported asynchronously
- Export happens in background goroutines (doesn't block your code)
Production recommendations:
- Use sampling for high-throughput services (e.g., 10% of requests; see the sketch below)
- Batch exporters are enabled by default
- Set OTEL_ENVIRONMENT=production for environment tagging
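For the sampling recommendation, the standard OTEL Go SDK approach is a parent-based ratio sampler. A hedged sketch of the generic SDK API (AILANG's internal initialization may wire this differently):
import sdktrace "go.opentelemetry.io/otel/sdk/trace"

// Sample ~10% of new traces; always follow the parent's decision
// for spans arriving with an existing trace context.
tp := sdktrace.NewTracerProvider(
	sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
)
_ = tp // pass to otel.SetTracerProvider in real setup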
Overhead Breakdown by Component
| Component | Spans per Operation | Typical Overhead |
|---|---|---|
| Compiler Pipeline | 5-7 spans | ~100μs |
| Eval Harness | 2 spans per benchmark | ~50μs |
| Messaging | 1 span per operation | ~20μs |
| REPL Session | 1 session + N input spans | ~30μs per input |
| Check Command | 2 spans (root + result) | ~40μs |
| AI Providers | 1 span per API call | ~30μs |
Note: Actual overhead depends on your OTEL collector. Local Jaeger adds ~10μs, while cloud exports (GCP, Honeycomb) add ~50-100μs due to batching and network I/O.
Observatory Dashboard Integration (v0.6.3+)
The AILANG Observatory provides a local dashboard for viewing traces, spans, and metrics from Claude Code, Gemini CLI, and AILANG operations in real-time.
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ Your Development Environment │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Claude Code ──────────── ──┐ │
│ (metrics + events) │ │
│ │ OTLP/HTTP │
│ Gemini CLI ───────────────┼────────────────► AILANG Server │
│ (full traces) │ (localhost:1957) │
│ │ │ │
│ ailang run ───────────────┘ │ │
│ (full traces) │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Observatory UI │ │
│ │ /observatory │ │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Quick Setup
1. Start the AILANG server:
ailang serve
# Or with make
make services-start
2. Configure Claude Code:
Add to ~/.claude/settings.json:
{
"env": {
"CLAUDE_CODE_ENABLE_TELEMETRY": "1",
"OTEL_LOGS_EXPORTER": "otlp",
"OTEL_METRICS_EXPORTER": "otlp",
"OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:1957",
"OTEL_RESOURCE_ATTRIBUTES": "ailang.source=user"
}
}
3. Configure Gemini CLI:
Add to ~/.gemini/settings.json:
{
"telemetry": {
"enabled": true
}
}
And add to your shell profile (~/.zshenv, ~/.bashrc, etc.):
# Gemini CLI telemetry to Observatory
export GEMINI_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:1957
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_RESOURCE_ATTRIBUTES="ailang.source=user"
4. View the dashboard:
Open http://localhost:1957 and navigate to the Observatory tab.
What Gets Captured
| Source | Data Type | What You See |
|---|---|---|
| Claude Code | Events (OTLP logs) | Token counts, costs, model, session info |
| Gemini CLI | Full traces | Complete span hierarchy with timing |
| AILANG commands | Full traces | Compilation phases, eval benchmarks, etc. |
Environment Variables Reference
| Variable | Purpose | Example |
|---|---|---|
CLAUDE_CODE_ENABLE_TELEMETRY | Enable Claude Code telemetry export | 1 |
OTEL_LOGS_EXPORTER | Log export protocol | otlp |
OTEL_METRICS_EXPORTER | Metrics export protocol | otlp |
OTEL_EXPORTER_OTLP_PROTOCOL | OTLP transport protocol | http/json or grpc |
OTEL_EXPORTER_OTLP_ENDPOINT | Observatory server URL | http://localhost:1957 |
OTEL_RESOURCE_ATTRIBUTES | Span metadata | ailang.source=user |
GEMINI_TELEMETRY_ENABLED | Enable Gemini CLI telemetry | true |
Resource Attributes
The OTEL_RESOURCE_ATTRIBUTES field supports these attributes:
| Attribute | Description |
|---|---|
ailang.source | user for manual sessions, coordinator for automated |
ailang.task_id | Coordinator task ID (set automatically) |
ailang.session_id | Session ID (set automatically) |
OTLP Receiver Endpoints
The Observatory server exposes standard OTLP endpoints:
| Endpoint | Method | Content-Type | Purpose |
|---|---|---|---|
/v1/traces | POST | application/x-protobuf, application/json | Trace spans |
/v1/logs | POST | application/x-protobuf, application/json | Log records (Claude Code events) |
/v1/metrics | POST | application/x-protobuf, application/json | Metrics |
Both protobuf and JSON formats are supported. Claude Code uses JSON (http/json protocol).
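You can smoke-test the receiver with a hand-built OTLP/JSON payload. A minimal, hedged example (the span content is arbitrary; field names follow the OTLP JSON mapping, with trace/span IDs hex-encoded, and the exact payload a receiver accepts can vary):
import (
	"bytes"
	"fmt"
	"net/http"
)

payload := []byte(`{"resourceSpans":[{"scopeSpans":[{"spans":[{
  "traceId":"4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId":"00f067aa0ba902b7",
  "name":"smoke-test",
  "startTimeUnixNano":"1700000000000000000",
  "endTimeUnixNano":"1700000001000000000"}]}]}]}`)

resp, err := http.Post("http://localhost:1957/v1/traces", "application/json", bytes.NewReader(payload))
if err == nil {
	fmt.Println("status:", resp.Status) // expect 200 from the Observatory receiver
	resp.Body.Close()
}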
Silent Failure Mode
Important: If the Observatory server is not running, OTLP exports fail silently. This is by design - your CLI tools continue working normally without errors or delays.
To verify telemetry is working:
- Ensure ailang serve is running
- Run a Claude Code or Gemini CLI command
- Check the Observatory dashboard for new traces
Filtering Noisy Traces
The Observatory automatically filters out polling endpoints to reduce noise:
- /api/approvals, /api/hierarchy, /api/statistics
- /api/observatory/*, /api/metrics/*
- /assets/*, /v1/* (OTLP endpoints themselves)
- /health, /ws, /ws/observatory
Only meaningful operations appear in the trace list.
Coordinator vs User Sessions
The ailang.source attribute distinguishes trace sources:
| Source | Origin | Typical Use |
|---|---|---|
user | Manual Claude Code / Gemini CLI sessions | Interactive development |
coordinator | Automated tasks via ailang coordinator | Background automation |
This allows filtering in the dashboard to see only relevant traces.
Dual Export (Observatory + GCP)
Send traces to both the local Observatory and Google Cloud Trace:
# Local Observatory
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:1957
# Also export to GCP
export GOOGLE_CLOUD_PROJECT=your-project-id
The AILANG server will export to both destinations.
Troubleshooting
Traces not appearing in GCP?
- Verify ADC credentials: gcloud auth application-default login
- Check project: gcloud config get-value project
- Verify permissions: the Cloud Trace Agent role is required
- Run the integration test: go test -tags=integration ./internal/telemetry/...
OTLP connection refused?
- Check the collector is running: curl http://localhost:4318/v1/traces
- Verify the endpoint URL includes the protocol: http://localhost:4318, not just localhost:4318
- Check firewall/port access
No spans from AI providers?
- Verify telemetry is initialized before AI calls
- Check that spans are ending: defer span.End() in all code paths
- Enable debug logging: set OTEL_LOG_LEVEL=debug
See Also
- Debugging Guide - Environment variables and CLI debug flags for troubleshooting
- Collaboration Hub - Real-time dashboard with Observatory integration
- Coordinator Guide - Task execution lifecycle with traced spans
- Evaluation Framework - AI benchmarks with eval.suite and eval.benchmark traces