fix(tb-lf): align eval response structs with devportal API by dafilipaj · Pull Request #34 · productiveio/cli-toolbox

dafilipaj · 2026-05-18T14:17:18Z

Summary

tb-lf eval run <id> (v0.6.0) failed with "invalid type: sequence, expected a string at line 1 column 998", and tb-lf eval cases [--suite <key>] failed with "invalid type: map, expected a sequence at line 1 column 0". Both bugs were client-side struct/shape mismatches against DevPortal's actual response.

EvalItem (crates/tb-lf/src/types.rs) was out of sync with SpaApi::Ai::Eval::RunsController#item_json:

conversation_log: Option<String> — server returns a JSON array of {role, content} (a JSON column, surfaced via as_json). Switched to Option<serde_json::Value>. This is the field that triggered the deserialize failure.
Renamed suite→suite_key + added suite_name, case→case_key + added case_name, duration_seconds→duration_ms, trace_langfuse_id→trace_id — matching the keys actually emitted by the controller. Previously these silently deserialized to None, leaving the items list rendered as blank rows.
Display formatting in main.rs updated accordingly; duration is now duration_ms / 1000.0 for the Xs display.

EvalAction::Cases (crates/tb-lf/src/main.rs) was decoding into Vec<EvalCase>, but CoverageController#cases renders { data: [...], meta: {...} }. Switched to PaginatedResponse<EvalCase> and use resp.data. Also fixed the suite filter query param: server expects suite_key, CLI was sending suite, so --suite silently didn't filter.

Added deserialize tests covering both endpoints with representative payloads so the contracts can't drift again silently.

Test plan

Local verification against live DevPortal (Development project):

cargo run -p tb-lf -- eval run 60 --project Development — renders 352 items with populated suite / case / status / score / duration columns (previously failed at first item with conversation_log)
cargo run -p tb-lf -- eval run 60 --failed --project Development — filters to failed only
cargo run -p tb-lf -- eval run 60 --full --project Development — prints truncated JSON conversation log per item
cargo run -p tb-lf -- eval cases --project Development — lists 50 cases (default per_page) from the paginated wrapper
cargo run -p tb-lf -- eval cases --suite crm-agent --project Development — filters to 2 CRM Agent cases (previously: did not filter, deserialize-failed before display)
cargo fmt --check, cargo clippy --workspace -- -D warnings, cargo test --workspace — all clean (5 tb-lf unit tests now, +2 regression tests)

Task

n/a — incidental tooling fix surfaced while running evals.

EvalItem field names and types were out of sync with the server response from SpaApi::Ai::Eval::RunsController#item_json:

- conversation_log was typed Option<String>; server returns a JSON array of {role, content} (JSON column, surfaced via as_json). This caused `tb-lf eval run <id>` to fail with "invalid type: sequence, expected a string". Switched to Option<serde_json::Value>.
- Renamed suite/case to suite_key/suite_name + case_key/case_name, duration_seconds to duration_ms, trace_langfuse_id to trace_id, matching the keys actually emitted by the controller. Previously these fields silently deserialized to None, leaving the items list blank in the CLI output.
- Display formatting in main.rs updated to use the new fields and format duration_ms / 1000.0 for the seconds display.

EvalAction::Cases was deserializing into Vec<EvalCase>, but CoverageController#cases renders { data: [...], meta: {...} }. Switched to PaginatedResponse<EvalCase> and use resp.data. Also fixed the suite filter query param: server expects suite_key (line 14 of the controller), CLI was sending suite, so --suite silently didn't filter.

Added deserialize tests against representative payloads for both endpoints so the contracts can't drift again silently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dafilipaj marked this pull request as ready for review May 19, 2026 10:35

dafilipaj requested a review from trogulja May 19, 2026 10:35

trogulja approved these changes May 21, 2026


dafilipaj merged commit 0201dec into main May 21, 2026
1 check passed

dafilipaj deleted the fix/tb-lf-eval-deserialize branch May 21, 2026 09:16

dafilipaj mentioned this pull request May 21, 2026

tb-lf: bump version to 0.7.1 #36

Merged

4 tasks
