
tracking: OTel observability — GenAI conventions, richer traces, session tracing #303

@christso


Overview

Parent issue tracking eight OTel observability and eval infrastructure improvements identified from a deep analysis of braintrustdata/braintrust-claude-plugin. AgentV already has a working OTel exporter (#277); these issues make it production-grade and interoperable.

Architecture Alignment Review

Each issue was reviewed against AgentV's 5 design principles (CLAUDE.md). Key finding: #300 should be implemented as a plugin, not core code, per Principle 1 (Lightweight Core, Plugin Extensibility).

| Issue | Title | Verdict | Adjustment |
| --- | --- | --- | --- |
| #298 | Adopt OTel GenAI semantic conventions | ALIGNED | None: standards alignment (P3) |
| #299 | Per-span token usage in OTel export | ALIGNED | None: universal primitive (P2) |
| #300 | Claude Code session tracing plugin | NEEDS ADJUSTMENT | Implement as plugin, not core (P1) |
| #301 | Trace composition / parent span linking | ALIGNED | None: W3C standard (P3) |
| #302 | Turn-level span grouping | ALIGNED | Opt-in via --otel-group-turns flag |
| #304 | Replace proprietary trace JSONL with OTLP JSON file export | ALIGNED | Removes dead code, consolidates to one schema (P3, P1) |
| #305 | Real-time span export during eval execution | ALIGNED | Core streaming primitive needed by all providers (P2) |
| #306 | Lazy file-backed output for code judge payloads | ALIGNED | Performance optimization for large Message[] payloads (P2) |
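
To make #298 and #299 concrete, here is a minimal sketch of mapping one provider call to GenAI semantic-convention attributes, including per-span token usage. The `LlmCallRecord` type and `toGenAiAttributes` function are hypothetical and not part of AgentV. The attribute keys follow the OTel GenAI conventions, which are still evolving; names have shifted between versions (for example, `gen_ai.system` versus the newer `gen_ai.provider.name`), so they should be checked against whichever version gets pinned.

```typescript
// Hypothetical record of a single provider call; not AgentV's actual type.
interface LlmCallRecord {
  provider: string;
  model: string;
  inputTokens?: number;
  outputTokens?: number;
}

type SpanAttributes = Record<string, string | number>;

// Maps a call record to OTel GenAI semantic-convention attribute keys.
function toGenAiAttributes(call: LlmCallRecord): SpanAttributes {
  const attrs: SpanAttributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.system": call.provider,
    "gen_ai.request.model": call.model,
  };
  // Token counts are attached per span, and only when the provider reported them.
  if (call.inputTokens !== undefined) {
    attrs["gen_ai.usage.input_tokens"] = call.inputTokens;
  }
  if (call.outputTokens !== undefined) {
    attrs["gen_ai.usage.output_tokens"] = call.outputTokens;
  }
  return attrs;
}
```

Omitting a missing token count, rather than writing 0, keeps "not reported" distinguishable from "zero tokens" in downstream queries.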

Principle Analysis

P1 (Lightweight Core): #300 is Claude Code-specific → plugin. #304 reduces core surface by deleting TraceWriter. Everything else is core OTel infrastructure.

P2 (Primitives Only): #298, #299, #305 expose existing data through standard interfaces — universal, stateless, needed by majority. #306 optimizes how existing data is passed to judges.

P3 (Industry Standards): #298 = OTel GenAI conventions. #301 = W3C Trace Context. #304 = OTLP JSON spec.
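
For the W3C Trace Context piece (#301), parent span linking comes down to reading a `traceparent` value supplied by the caller. The sketch below parses the version 00 format (`version-traceid-parentid-flags`); the `ParentContext` type and `parseTraceparent` function are illustrative names, and how AgentV would receive the value (environment variable, flag, or header) is not specified here.

```typescript
// Illustrative result type; not an AgentV or OpenTelemetry API type.
interface ParentContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string;  // 16 lowercase hex characters
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parses a W3C traceparent value. This sketch accepts only version 00.
function parseTraceparent(header: string): ParentContext | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, flags] = match;
  if (version !== "00") return undefined;
  // All-zero trace or parent ids are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

A root eval span created with the returned `traceId` and with `spanId` as its parent would then appear as a child of the caller's span in any backend that receives both.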

Phased Delivery

Phase 1: Core attribute improvements (~3-4 days)

Phase 2: Trace structure + file format (~1 week)

Phase 3: Streaming + plugin (~2-3 weeks)

Phase 4: Eval infrastructure optimization

Dependency Graph

#298 (GenAI conventions) ← gates every issue in this subtree
 ├── #299 (per-span tokens — uses GenAI attribute names)
 ├── #302 (turn grouping — spans use GenAI conventions)
 ├── #304 (OTLP JSON file — writes GenAI-convention spans to disk)
 ├── #305 (streaming export — creates GenAI-convention spans in real-time)
 └── #300 (session plugin — exports GenAI-convention spans via hooks)

#301 (trace composition — independent, can run in parallel with Phase 1)

#305 (streaming) ← #300 depends on this for real-time session tracing

#306 (lazy output) — independent, can be done anytime
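
The graph above can also be encoded as data and checked mechanically. The sketch below copies the edges from the graph and derives one valid delivery order with a depth-first topological sort; `dependsOn` and `deliveryOrder` are illustrative names, not part of AgentV.

```typescript
// Edges copied from the dependency graph: issue -> issues it depends on.
const dependsOn: Record<string, string[]> = {
  "#298": [],
  "#299": ["#298"],
  "#302": ["#298"],
  "#304": ["#298"],
  "#305": ["#298"],
  "#300": ["#298", "#305"],
  "#301": [],
  "#306": [],
};

// Depth-first topological sort: every issue is emitted after its dependencies.
function deliveryOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();
  const visit = (issue: string): void => {
    const current = state.get(issue);
    if (current === "done") return;
    if (current === "visiting") throw new Error(`dependency cycle at ${issue}`);
    state.set(issue, "visiting");
    for (const dep of graph[issue] ?? []) visit(dep);
    state.set(issue, "done");
    order.push(issue);
  };
  for (const issue of Object.keys(graph)) visit(issue);
  return order;
}
```

Any order it returns places #298 before its five dependents and #305 before #300, while #301 and #306 can be scheduled anywhere.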

Key Architectural Boundary

Core (packages/core/src/observability/): OTel exporter, attribute mapping, trace composition, span hierarchy, OTLP JSON file writer, streaming observer.
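
As a rough illustration of what the OTLP JSON file writer would emit, the sketch below builds the `resourceSpans` → `scopeSpans` → `spans` envelope from a simplified span. `SimpleSpan`, `toOtlpJson`, and the scope name `"agentv"` are assumptions for this example; the real writer would presumably start from the exporter's own span objects. In OTLP JSON, trace and span ids are hex strings and 64-bit integers (timestamps, `intValue`) are encoded as strings.

```typescript
type AttrValue = string | number | boolean;

// Hypothetical in-memory span shape; not AgentV's actual type.
interface SimpleSpan {
  traceId: string;           // 32 hex characters
  spanId: string;            // 16 hex characters
  parentSpanId?: string;
  name: string;
  startTimeUnixNano: string; // 64-bit integers are strings in OTLP JSON
  endTimeUnixNano: string;
  attributes: Record<string, AttrValue>;
}

// Wraps a primitive in the OTLP AnyValue envelope.
function toAnyValue(value: AttrValue): Record<string, string | number | boolean> {
  if (typeof value === "string") return { stringValue: value };
  if (typeof value === "boolean") return { boolValue: value };
  return Number.isInteger(value) ? { intValue: String(value) } : { doubleValue: value };
}

function toKeyValues(attrs: Record<string, AttrValue>) {
  return Object.entries(attrs).map(([key, value]) => ({ key, value: toAnyValue(value) }));
}

// Builds one OTLP JSON trace document for a batch of spans.
function toOtlpJson(serviceName: string, spans: SimpleSpan[]) {
  return {
    resourceSpans: [
      {
        resource: { attributes: toKeyValues({ "service.name": serviceName }) },
        scopeSpans: [
          {
            scope: { name: "agentv" }, // scope name is an assumption
            spans: spans.map((s) => ({
              traceId: s.traceId,
              spanId: s.spanId,
              ...(s.parentSpanId ? { parentSpanId: s.parentSpanId } : {}),
              name: s.name,
              kind: 1, // SPAN_KIND_INTERNAL
              startTimeUnixNano: s.startTimeUnixNano,
              endTimeUnixNano: s.endTimeUnixNano,
              attributes: toKeyValues(s.attributes),
            })),
          },
        ],
      },
    ],
  };
}
```

Serializing one such document per line with `JSON.stringify` is one possible file layout; the issue does not fix that detail.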

Delete (apps/cli/src/commands/eval/trace-writer.ts): Proprietary TraceWriter, buildTraceRecord, extractTraceSpans — replaced by #304.

Plugin (plugins/agentv-trace/): Claude Code hook wiring, session state management, transcript parsing. Opens the door for similar plugins for Copilot CLI, Codex, etc.

SDK (packages/eval/): Lazy file-backed loading for large payloads (#306) — transparent to judge authors.
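
One way the lazy file-backed loading in #306 could look: the payload is written to disk once, and the judge-facing object only reads and parses it on first access. This is a sketch under assumptions; `LazyFileBacked`, `spillToFile`, the `Message` shape, and the temp-directory prefix are all hypothetical and not the SDK's actual API.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical message shape for illustration only.
interface Message {
  role: string;
  content: string;
}

// Holds a file path and parses the file only on first read of `value`.
class LazyFileBacked<T> {
  private readonly path: string;
  private cached: T | undefined;
  private loaded = false;

  constructor(path: string) {
    this.path = path;
  }

  get value(): T {
    if (!this.loaded) {
      this.cached = JSON.parse(readFileSync(this.path, "utf8")) as T;
      this.loaded = true;
    }
    return this.cached as T;
  }

  get isLoaded(): boolean {
    return this.loaded;
  }
}

// Writes a payload to a fresh temp file and returns a lazy handle to it.
function spillToFile<T>(payload: T): LazyFileBacked<T> {
  const dir = mkdtempSync(join(tmpdir(), "agentv-payload-"));
  const file = join(dir, "output.json");
  writeFileSync(file, JSON.stringify(payload));
  return new LazyFileBacked<T>(file);
}
```

A judge that never touches `value` pays no parsing cost, which is the case the optimization targets for large `Message[]` payloads.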
