Consolidate trace-to-invocation conversion #76
Pull request overview
This PR moves trace-to-invocation conversion and trace metadata extraction fully into the Python backend, introducing a new conversion API endpoint and updating the frontend to consume backend-produced invocations consistently across uploads, inspector, and eval set building.
Changes:
- Added `POST /api/convert` to convert uploaded trace files into invocations + extracted metadata in a unified response.
- Updated the UI upload/evalset/inspector flows to call the backend conversion endpoint and to stop re-converting traces client-side.
- Removed the TypeScript trace conversion module and introduced a smaller `trace-helpers` utility module for editor-related span traversal.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ui/src/lib/types.ts | Adds TypeScript types for the new trace conversion API response payload. |
| ui/src/lib/trace-patcher.ts | Repoints editor utilities to trace-helpers after removing the converter module. |
| ui/src/lib/trace-metadata.ts | Removes client-side metadata extraction logic; keeps the TraceMetadata type only. |
| ui/src/lib/trace-helpers.ts | Introduces shared trace helper utilities (format detection, span traversal) used by the editor. |
| ui/src/lib/trace-converter.ts | Deletes the client-side trace-to-invocation converter implementation. |
| ui/src/lib/evalset-builder.ts | Switches eval set generation to accept backend-produced invocations rather than raw traces. |
| ui/src/context/TraceProvider.tsx | Updates trace file ingestion to call convertTraces() and store backend metadata + invocations in state. |
| ui/src/components/upload/TraceEditorDrawer.tsx | Loads invocations via backend conversion while still parsing traces locally for edit mapping. |
| ui/src/components/inspector/InspectorView.tsx | Uses invocations already in state rather than re-parsing/re-converting trace files on mount. |
| ui/src/components/inspector/InspectorHeader.tsx | Creates eval sets directly from stored invocations instead of reloading/parsing traces. |
| ui/src/components/dashboard/TraceCard.tsx | Same eval set creation change as inspector header (use stored invocations). |
| ui/src/components/builder/TraceUploadZone.tsx | Generates eval sets from backend conversion results; expands accepted file extensions to .jsonl. |
| ui/src/api/client.ts | Adds convertTraces() client for POST /api/convert (multipart upload). |
| src/agentevals/api/routes.py | Implements the new /convert endpoint (file validation, loading, conversion, metadata extraction). |
| src/agentevals/api/models.py | Adds response models for trace conversion metadata and entries. |
```typescript
export function detectTraceFormat(trace: Trace): 'adk' | 'genai' {
  const check = (spans: Span[]): 'adk' | 'genai' | null => {
    let hasGenai = false;
    for (const span of spans) {
      if (span.tags['otel.scope.name'] === ADK_SCOPE) {
        return 'adk';
      }
      if (!hasGenai && (span.tags['gen_ai.request.model'] || span.tags['gen_ai.input.messages'])) {
        hasGenai = true;
      }
    }
    return hasGenai ? 'genai' : null;
  };
```
`detectTraceFormat` marks a trace as GenAI only if it finds `gen_ai.request.model` or `gen_ai.input.messages`. Other parts of the UI (and the removed converter) treat `gen_ai.system` as a GenAI indicator (see `trace-patcher.ts`), so traces that only emit `gen_ai.system` will be misclassified as ADK and the editor mappings will be wrong. Please include `gen_ai.system` (and ideally the other input-message variants the patcher supports, e.g. `gen_ai.prompt` / `gen_ai.request.messages`) in the detection logic to avoid this regression.
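A minimal sketch of a broadened check, assuming the attribute list from the review comment; the `Span` shape and the `ADK_SCOPE` value here are stand-ins, not the real definitions in `trace-helpers.ts`:

```typescript
// Sketch only: attribute names beyond the first two are taken from the
// review comment (what trace-patcher.ts supports); verify against the patcher.
const GENAI_INDICATOR_ATTRS = [
  'gen_ai.request.model',
  'gen_ai.input.messages',
  'gen_ai.system',
  'gen_ai.prompt',
  'gen_ai.request.messages',
];

// Minimal stand-ins for the real types/constants in trace-helpers.ts.
interface Span { tags: Record<string, string>; children: Span[] }
const ADK_SCOPE = 'adk-scope-placeholder'; // hypothetical value, illustration only

function detectFormat(spans: Span[]): 'adk' | 'genai' | null {
  let hasGenai = false;
  for (const span of spans) {
    if (span.tags['otel.scope.name'] === ADK_SCOPE) {
      return 'adk';
    }
    // Any of the GenAI indicator attributes marks the trace as GenAI.
    if (!hasGenai && GENAI_INDICATOR_ATTRS.some((a) => span.tags[a] !== undefined)) {
      hasGenai = true;
    }
  }
  return hasGenai ? 'genai' : null;
}
```

Keeping the indicator list in one exported constant would also let the patcher import it, so the two sides cannot drift apart again.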
```typescript
export function findDescendantLLMSpans(root: Span): Span[] {
  const results: Span[] = [];
  const queue = [root];

  while (queue.length > 0) {
    const span = queue.shift()!;
    if (span.tags['gen_ai.request.model'] || span.tags['gen_ai.input.messages']) {
      results.push(span);
    }
    queue.push(...span.children);
  }
```
`findDescendantLLMSpans` only considers spans with `gen_ai.request.model` or `gen_ai.input.messages` as LLM spans. `trace-patcher` relies on this to locate the first/last LLM span for editing, but it also keys off `gen_ai.system` and supports `gen_ai.prompt` / `gen_ai.request.messages` for input; a GenAI trace that lacks `gen_ai.input.messages` will produce zero LLM spans and no edit mappings. Align this predicate with the set of attributes used elsewhere (at least `gen_ai.system`, and optionally `gen_ai.prompt` / `gen_ai.request.messages`).
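One way to align the predicate is to factor it into a single helper that both the detector and the traversal use; a sketch, with the attribute set taken from the review comment and a stand-in `Span` shape:

```typescript
interface Span { tags: Record<string, string>; children: Span[] }

// Single source of truth for "is this an LLM span?". The attributes beyond
// the first two are the ones the review says trace-patcher already supports.
const LLM_SPAN_ATTRS = [
  'gen_ai.request.model',
  'gen_ai.input.messages',
  'gen_ai.system',
  'gen_ai.prompt',
  'gen_ai.request.messages',
];

function isLLMSpan(span: Span): boolean {
  return LLM_SPAN_ATTRS.some((attr) => span.tags[attr] !== undefined);
}

function findDescendantLLMSpans(root: Span): Span[] {
  const results: Span[] = [];
  const queue: Span[] = [root];
  while (queue.length > 0) {
    const span = queue.shift()!; // non-null: loop guard ensures queue is non-empty
    if (isLLMSpan(span)) {
      results.push(span);
    }
    queue.push(...span.children); // breadth-first over the whole subtree
  }
  return results;
}
```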
```python
for trace_file in trace_files:
    if not trace_file.filename:
        continue

    if not (trace_file.filename.endswith(".json") or trace_file.filename.endswith(".jsonl")):
        raise HTTPException(
            status_code=400,
            detail=f"Invalid file extension for {trace_file.filename}. Only .json and .jsonl files are allowed.",
        )

    trace_path = os.path.join(temp_dir, trace_file.filename)
    with open(trace_path, "wb") as f:  # noqa: ASYNC230
        content = await trace_file.read()

        if len(content) > 10 * 1024 * 1024:
            raise HTTPException(
                status_code=400,
                detail=f"File {trace_file.filename} exceeds 10MB limit",
            )

        f.write(content)
    trace_paths.append(trace_path)
```
`trace_file.filename` is used directly in `os.path.join(temp_dir, trace_file.filename)`. A crafted filename containing path separators (e.g. `../`) can escape the temp dir, and duplicate filenames will overwrite each other. Please sanitize with `os.path.basename(...)` (or generate a UUID-based temp filename) before joining, and ensure names are unique within the batch.
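A minimal sketch of the suggested sanitization, combining both fixes (the helper name is hypothetical, not part of the PR):

```python
import os
import uuid


def safe_temp_path(temp_dir: str, filename: str) -> str:
    """Sanitize an uploaded filename before joining with the temp dir.

    os.path.basename strips any directory components, so crafted names
    like '../escape.json' cannot leave temp_dir; the UUID prefix makes
    each path unique, so duplicate filenames in one batch cannot
    overwrite each other.
    """
    base = os.path.basename(filename)
    return os.path.join(temp_dir, f"{uuid.uuid4().hex}_{base}")
```

The endpoint would then write to `safe_temp_path(temp_dir, trace_file.filename)` instead of joining the raw filename, while still reporting the original name back to the client.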
```python
fmt = trace_format
if not fmt:
    if trace_paths[0].endswith(".jsonl"):
        fmt = "otlp-json"
    else:
        fmt = "jaeger-json"

loader = get_loader(fmt)
all_traces = []
trace_to_filename: dict[str, str] = {}
for path in trace_paths:
    try:
        traces = loader.load(path)
        filename = os.path.basename(path)
        for t in traces:
            trace_to_filename[t.trace_id] = filename
        all_traces.extend(traces)
    except Exception as exc:
        logger.warning(f"Failed to load trace file '{path}': {exc}")
```
The endpoint infers a single `fmt` from only the first uploaded file's extension, then applies that loader to all files. If a user uploads a mix of `.json` and `.jsonl`, the non-matching files will fail `loader.load(...)` and be silently skipped (only a server log warning), which is confusing for clients. Consider either (a) validating that all files share the same extension/format, (b) selecting the loader per file, and/or (c) surfacing per-file load failures in the response warnings/errors so the UI can notify the user.
```python
if not (trace_file.filename.endswith(".json") or trace_file.filename.endswith(".jsonl")):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid file extension for {trace_file.filename}. Only .json and .jsonl files are allowed.",
    )
```
The extension check is case-sensitive (`endswith('.json')` / `endswith('.jsonl')`), so valid files like `TRACE.JSON` will be rejected even though other code (e.g. the `_session_name_from_filename` regex) treats extensions case-insensitively. Normalizing with `filename.lower()` (or using a case-insensitive regex) would make the API more robust.
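A one-line fix, sketched as a helper (the function name is hypothetical); note that `str.endswith` accepts a tuple, which also collapses the two checks:

```python
def has_allowed_extension(filename: str) -> bool:
    """Case-insensitive extension check, so 'TRACE.JSON' passes like 'trace.json'."""
    return filename.lower().endswith((".json", ".jsonl"))
```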
This PR eliminates the duplicated trace-to-invocation conversion logic between the Python backend and TypeScript frontend, making the backend the single source of truth.
The new `POST /api/convert` endpoint accepts trace files, runs the backend converter + metadata extraction, and returns invocations + metadata in a unified format. The frontend calls this for all file uploads and eval set building, and the inspector reads invocations from state instead of re-converting. Everything now behaves consistently with streaming sessions, which were already using backend invocations.