Skip to content

fix(tb-lf): align eval response structs with devportal API#34

Merged
dafilipaj merged 1 commit into
mainfrom
fix/tb-lf-eval-deserialize
May 21, 2026
Merged

fix(tb-lf): align eval response structs with devportal API#34
dafilipaj merged 1 commit into
mainfrom
fix/tb-lf-eval-deserialize

Conversation

@dafilipaj
Copy link
Copy Markdown
Contributor

Summary

tb-lf eval run <id> (v0.6.0) failed with invalid type: sequence, expected a string at line 1 column 998, and tb-lf eval cases [--suite <key>] failed with invalid type: map, expected a sequence at line 1 column 0. Both bugs were client-side struct/shape mismatches against DevPortal's actual response.

EvalItem (crates/tb-lf/src/types.rs) was out of sync with SpaApi::Ai::Eval::RunsController#item_json:

  • conversation_log: Option<String> — server returns a JSON array of {role, content} (a JSON column, surfaced via as_json). Switched to Option<serde_json::Value>. This is the field that triggered the deserialize failure.
  • Renamed suitesuite_key + added suite_name, casecase_key + added case_name, duration_secondsduration_ms, trace_langfuse_idtrace_id — matching the keys actually emitted by the controller. Previously these silently deserialized to None, leaving the items list rendered as blank rows.
  • Display formatting in main.rs updated accordingly; duration is now duration_ms / 1000.0 for the Xs display.

EvalAction::Cases (crates/tb-lf/src/main.rs) was decoding into Vec<EvalCase>, but CoverageController#cases renders { data: [...], meta: {...} }. Switched to PaginatedResponse<EvalCase> and use resp.data. Also fixed the suite filter query param: server expects suite_key, CLI was sending suite, so --suite silently didn't filter.

Added deserialize tests covering both endpoints with representative payloads so the contracts can't drift again silently.

Test plan

Local verification against live DevPortal (Development project):

  • cargo run -p tb-lf -- eval run 60 --project Development — renders 352 items with populated suite / case / status / score / duration columns (previously failed at first item with conversation_log)
  • cargo run -p tb-lf -- eval run 60 --failed --project Development — filters to failed only
  • cargo run -p tb-lf -- eval run 60 --full --project Development — prints truncated JSON conversation log per item
  • cargo run -p tb-lf -- eval cases --project Development — lists 50 cases (default per_page) from the paginated wrapper
  • cargo run -p tb-lf -- eval cases --suite crm-agent --project Development — filters to 2 CRM Agent cases (previously: did not filter, deserialize-failed before display)
  • cargo fmt --check, cargo clippy --workspace -- -D warnings, cargo test --workspace — all clean (5 tb-lf unit tests now, +2 regression tests)

Task

n/a — incidental tooling fix surfaced while running evals.

EvalItem field names and types were out of sync with the server
response from SpaApi::Ai::Eval::RunsController#item_json:

- conversation_log was typed Option<String>; server returns a JSON
  array of {role, content} (JSON column, surfaced via as_json). This
  caused `tb-lf eval run <id>` to fail with "invalid type: sequence,
  expected a string". Switched to Option<serde_json::Value>.
- Renamed suite/case to suite_key/suite_name + case_key/case_name,
  duration_seconds to duration_ms, trace_langfuse_id to trace_id —
  matching the keys actually emitted by the controller. Previously
  these fields silently deserialized to None, leaving the items list
  blank in the CLI output.
- Display formatting in main.rs updated to use the new fields and
  format duration_ms / 1000.0 for the seconds display.

EvalAction::Cases was deserializing into Vec<EvalCase>, but
CoverageController#cases renders { data: [...], meta: {...} }. Switched
to PaginatedResponse<EvalCase> and use resp.data. Also fixed the suite
filter query param: server expects suite_key (line 14 of the
controller), CLI was sending suite — so --suite silently didn't filter.

Added deserialize tests against representative payloads for both
endpoints so the contracts can't drift again silently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dafilipaj dafilipaj marked this pull request as ready for review May 19, 2026 10:35
@dafilipaj dafilipaj requested a review from trogulja May 19, 2026 10:35
@dafilipaj dafilipaj merged commit 0201dec into main May 21, 2026
1 check passed
@dafilipaj dafilipaj deleted the fix/tb-lf-eval-deserialize branch May 21, 2026 09:16
@dafilipaj dafilipaj mentioned this pull request May 21, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants