test(parsers): unified cases × all parsers, coverage matrix (step 1)#8728
test(parsers): unified cases × all parsers, coverage matrix (step 1)#8728keivenchang wants to merge 6 commits into
Conversation
WalkthroughThis pull request consolidates tool-call parser tests into a shared test-cases framework. It removes duplicate inline regression tests from individual parser modules, introduces a centralized fixture system for cross-parser testing, and adds documentation comments clarifying test coverage organization. No parsing logic or public API behavior changes. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs (1)
10-24: Non-string scalar rendering inrender_argsemits JSON form instead of Python literal syntax.
v.to_string()onserde_json::Valueproduces JSON syntax (true/false/null), but Python literal syntax requiresTrue/False/None. Currently,case_1_single_calluses only string-valued arguments ({"location": "NYC"}), so this latent issue does not manifest. However, if future test cases pass bool or null arguments through this fixture, the generated Python code will be syntactically invalid and the pythonic parser will fail to parse it.♻️ Optional hardening
fn render_args(arguments: &Value) -> String { let Some(map) = arguments.as_object() else { return String::new(); }; map.iter() .map(|(k, v)| { let v_str = match v { Value::String(s) => format!("\"{s}\""), + Value::Bool(true) => "True".to_string(), + Value::Bool(false) => "False".to_string(), + Value::Null => "None".to_string(), _ => v.to_string(), }; format!("{k}={v_str}") }) .collect::<Vec<_>>() .join(", ") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs` around lines 10 - 24, The render_args function currently uses v.to_string() which emits JSON literals; update render_args (the function and the v_str match) to match on Value::Bool -> "True"/"False", Value::Null -> "None", Value::Number -> the number string (preserve numeric formatting), Value::String -> quoted string as already done, and for complex Value::Array/Value::Object either serialize appropriately or fall back to v.to_string(); ensure the match arms produce Python-valid literal strings so calls like in render_args return True/False/None instead of true/false/null.lib/parsers/src/tool_calling/test_cases/normalize.rs (1)
44-46: Optional: distinguish "args was literal null" from "args failed to parse".
unwrap_or(Value::Null)collapses two different parser bugs into the same canonical form. In practice CASE.1 should never trigger this (every parser must emit valid JSON for the happy path), but if a parser ever regresses to emitting a non-JSON arguments string, the failure message will readgot Nulland a reviewer can't tell whether the parser literally emitted JSONnullor produced unparseable text.A cheap improvement that preserves the current API: stash the raw string under a sentinel key when parsing fails, so the failure surfaces the bad payload directly.
♻️ One way to surface the parse failure
- let arguments: Value = - serde_json::from_str(&call.function.arguments).unwrap_or(Value::Null); + let arguments: Value = serde_json::from_str(&call.function.arguments) + .unwrap_or_else(|_| { + serde_json::json!({ "__unparseable_arguments__": call.function.arguments }) + }); Self { name, arguments }Defer if you'd rather wait for a real failing case to motivate the change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/parsers/src/tool_calling/test_cases/normalize.rs` around lines 44 - 46, The current serde_json::from_str(&call.function.arguments).unwrap_or(Value::Null) hides whether the parser produced literal JSON null or an unparseable string; update the parsing so that on Err you construct a sentinel JSON object (e.g. an Object with a unique key like "__unparseable_arguments" or "__raw_args_parse_error") containing the original call.function.arguments (and optionally the parse error message) and assign that to the arguments variable, keeping the successful parse path unchanged so Self { name, arguments } still gets a Value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/parsers/src/tool_calling/xml/parser.rs`:
- Around line 589-592: Rustdoc is failing with invalid_html_tags because doc
comments embed raw XML-like markers such as <tool_call>, <function>, and
<parameter>; update the affected /// doc comments (around lines 30–90) —
particularly on the functions find_tool_call_end_position_xml,
try_tool_call_parse_xml, and parse_tool_call_block — to escape those angle
brackets by wrapping the literal markers in backticks or replacing < and > with
< and > so rustdoc treats them as code/text rather than HTML.
---
Nitpick comments:
In `@lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs`:
- Around line 10-24: The render_args function currently uses v.to_string() which
emits JSON literals; update render_args (the function and the v_str match) to
match on Value::Bool -> "True"/"False", Value::Null -> "None", Value::Number ->
the number string (preserve numeric formatting), Value::String -> quoted string
as already done, and for complex Value::Array/Value::Object either serialize
appropriately or fall back to v.to_string(); ensure the match arms produce
Python-valid literal strings so calls like in render_args return True/False/None
instead of true/false/null.
In `@lib/parsers/src/tool_calling/test_cases/normalize.rs`:
- Around line 44-46: The current
serde_json::from_str(&call.function.arguments).unwrap_or(Value::Null) hides
whether the parser produced literal JSON null or an unparseable string; update
the parsing so that on Err you construct a sentinel JSON object (e.g. an Object
with a unique key like "__unparseable_arguments" or "__raw_args_parse_error")
containing the original call.function.arguments (and optionally the parse error
message) and assign that to the arguments variable, keeping the successful parse
path unchanged so Self { name, arguments } still gets a Value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7c670424-c81c-4b96-a9d2-46503d1fc5cd
📒 Files selected for processing (24)
lib/parsers/src/tool_calling/dsml/parser.rslib/parsers/src/tool_calling/harmony/harmony_parser.rslib/parsers/src/tool_calling/json/base_json_parser.rslib/parsers/src/tool_calling/json/deepseek_v3_1_parser.rslib/parsers/src/tool_calling/json/deepseek_v3_parser.rslib/parsers/src/tool_calling/mod.rslib/parsers/src/tool_calling/pythonic/pythonic_parser.rslib/parsers/src/tool_calling/test_cases/fixtures/deepseek_v3.rslib/parsers/src/tool_calling/test_cases/fixtures/deepseek_v3_1.rslib/parsers/src/tool_calling/test_cases/fixtures/dsml.rslib/parsers/src/tool_calling/test_cases/fixtures/generic_xml.rslib/parsers/src/tool_calling/test_cases/fixtures/glm47.rslib/parsers/src/tool_calling/test_cases/fixtures/harmony.rslib/parsers/src/tool_calling/test_cases/fixtures/json_wrapped.rslib/parsers/src/tool_calling/test_cases/fixtures/kimi_k2.rslib/parsers/src/tool_calling/test_cases/fixtures/mod.rslib/parsers/src/tool_calling/test_cases/fixtures/nemotron_deci.rslib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rslib/parsers/src/tool_calling/test_cases/mod.rslib/parsers/src/tool_calling/test_cases/normalize.rslib/parsers/src/tool_calling/test_cases/tests.rslib/parsers/src/tool_calling/xml/glm47_parser.rslib/parsers/src/tool_calling/xml/kimi_k2_parser.rslib/parsers/src/tool_calling/xml/parser.rs
Pre-existing doc comments in xml/parser.rs and config.rs embed bare <tool_call>, <function>, <parameter> markers that rustdoc misreads as HTML. minimax_append_think has bare URLs that rustdoc wants angle- bracketed. Wrap the markers in backticks and angle-bracket the URLs. 12 cargo-doc warnings cleared. No behavior change. CodeRabbit comment on PR #8728 flagged this; while these warnings predate the PR, the contract-test scaffold is the active touch on the file so its a good moment to fix them. 385 tests still pass. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
| }) | ||
| .collect::<Vec<_>>() | ||
| .join(", ") | ||
| } |
There was a problem hiding this comment.
Deferring — case_1 only uses string args ({"location":"NYC"}), so the JSON-vs-Python literal mismatch doesnt manifest. Will fold into the same PR that adds case_2 / case_4 (parallel calls / non-string scalars), since thats when it actualy starts to bite.
| .to_string(); | ||
| let arguments: Value = | ||
| serde_json::from_str(&call.function.arguments).unwrap_or(Value::Null); | ||
| Self { name, arguments } |
There was a problem hiding this comment.
Same disposition as the pythonic one — case_1 cant trigger an unparseable-args scenario (every parser must emit valid JSON for the happy path), so the Null-vs-error collapse doesnt manifest yet. Will revisit when case_4 (malformed args) lands and we have a real failing case to motivate the sentinel shape.
dec0a39 to
4a204f0
Compare
4a204f0 to
2f90680
Compare
|
|
||
| /// Every parser fixture, in stable alphabetical order by registry name. | ||
| /// The matrix report iterates this vec; new parsers go here. | ||
| pub fn all_fixtures() -> Vec<Box<dyn ToolCallFixture>> { |
There was a problem hiding this comment.
In lib/parsers/src/tool_calling/parsers.rs there's a get_tool_parser_map. Is it worth adding something to make sure every id in there is also in here?
|
|
||
| fn case_1_single_call(&self, function_name: &str, arguments: &Value) -> FixtureCase<String> { | ||
| FixtureCase::Sample(format!( | ||
| "{}{{\"name\":\"{function_name}\",\"arguments\":{arguments}}}{}", |
There was a problem hiding this comment.
Do you need to escape the arguments? I think this is in a few other places if yes.
| NotApplicable(&'static str), | ||
|
|
||
| /// Fixture method not yet written. CI hard-fails. Used only as the | ||
| /// trait default — every concrete fixture must override every method |
There was a problem hiding this comment.
Used only as the trait default
Which trait?
|
|
||
| /// Canonicalize a vector of parser-emitted calls. Order is preserved | ||
| /// (callers compare ordered for single-call cases; future parallel-call | ||
| /// cases will sort by name before comparison). |
There was a problem hiding this comment.
I'm not sure this comment says anything that the function name omits.
| /// leaked alongside the tool call." Parsers vary: some return `None`, some | ||
| /// `Some("")`, and some `Some(" ")` after a stripped trailing newline. | ||
| /// All three mean the same thing — the tool call was the entire output. | ||
| pub fn normal_text_is_empty(normal: &Option<String>) -> bool { |
There was a problem hiding this comment.
is_normal_test_empty would be easier to read. I would inline it and drop the function.
| //! 3. Parse `function.arguments` (a JSON-encoded string) into a | ||
| //! `serde_json::Value` so equality is structural rather than textual. | ||
| //! A parser may emit `{"a":1}` while another emits `{ "a":1 }`; both | ||
| //! are correct. |
There was a problem hiding this comment.
That's a big comment. Where does all that happen? There's not much code in here.
#### Overview: Rename the parser/frontend test-case taxonomy so each prefix matches its directory: `lib/parsers/` uses `PARSER.*`, frontend tests use `FRONTEND.*`. Comment-only rename across ~28 files; cargo check passes. #### Details: - `lib/parsers/TEST_CASES.md` → `PARSER_CASES.md` with structural rewrite: format-variant section promoted to first-class, white-box CASE.20 block dropped, applicability summary expanded. - `components/src/dynamo/frontend/tests/TEST_CASES.md` → `FRONTEND_CASES.md`. - `CASE.1`–`CASE.15` → `PARSER.1`–`PARSER.15` (universal behavior contract). - `CASE.21`–`CASE.24` → `PARSER.fmt1`–`PARSER.fmt4` (format variants — function-name conventions, whitespace tolerance, token spelling variants, empty wrappers). - `CASE.xml1-2` / `CASE.harmony1` → `PARSER.xml1-2` / `PARSER.harmony1`. - `CASE.16` (gemma4 — was a CASE.5 refinement) merged into `PARSER.5`. - `CASE.19` (harmony envelope grammar across 9 tests in harmony_parser + gpt_oss reasoning) promoted to new `PARSER.harmony2`. - `CASE.17` / `CASE.18` / `CASE.20` (~80 white-box helper unit tests with no cross-impl analogue) replaced with `// helper` marker; descriptive forms preserved (`// detection helper`, `// end-position helper`, `// internal helper`, `// registry helper`, `// registry lookup`). - `FE.1`–`FE.9` → `FRONTEND.1`–`FRONTEND.9` (54 annotations across 4 frontend Python test files). - All Rust/Python diffs are comment-only — `cargo check -p dynamo-parsers --tests` passes. #### Where should the reviewer start? `lib/parsers/PARSER_CASES.md` (new front matter, format-variant section, applicability table); then a sample annotation diff in `lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs`. #### Related Issues: Relates to [DIS-1906](https://linear.app/nvidia/issue/DIS-1906) — taxonomy alignment unblocks the cross-impl parser parity harness's case-mapping work. /coderabbit profile chill Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…8960) Signed-off-by: Krishnan Prashanth <140860868+KrishnanPrash@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
…ERVER_DISABLE_GC is set (#9096)
Step 1 in moving individual per-parser test cases into a unified cross-parser matrix run against every parser. CASE.1 (single tool call) is migrated; follow-ups will migrate CASE.2 (parallel calls), CASE.7 (complex argument types), etc. Each parser declares per case via FixtureCase: Sample(input) / NotApplicable(reason) / Unimplemented. Adding a parser fills a column; adding a case forces every parser to declare. Coverage gaps surface as Unimplemented rather than silent omissions. Changes: - New module lib/parsers/src/tool_calling/test_cases/ — ToolCallFixture trait, 3-state FixtureCase enum, CanonicalCall normalizer, parametrized runner - 3 family fixtures (JsonWrappedFixture, GenericXmlFixture, DsmlFixture) + 7 standalones cover all 18 registered parsers - 5 inline per-parser happy-path tests deleted (dsml, harmony, glm47, kimi_k2, xml) — each was a strict subset of CASE.1; surviving siblings retain main CASE.* annotations - Cleared pre-existing rustdoc invalid_html_tags warnings on xml/parser.rs, config.rs, minimax_append_think_parser.rs - 385 tests pass See lib/parsers/TEST_CASES.md for the full case taxonomy. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
2f90680 to
ecb0232
Compare
|
Closing unmerged — scope was split into siblings that already landed: #8725 (taxonomy annotations, merged 2026-04-27), #8846 (top-7 × PARSER.1-5 matrix, merged 2026-04-30), #8888 (part 2, merged 2026-05-01). This scaffold-step-1 branch went back to draft on 2026-05-05 and is 244 commits behind main. Cleaning up. |
Overview:
Step 1 in moving individual per-parser test cases into a unified cross-parser matrix run against every parser, starting with CASE.1 (single tool call, single string argument).
Details:
lib/parsers/src/tool_calling/test_cases/—ToolCallFixturetrait, 3-stateFixtureCaseenum (Sample/NotApplicable(reason)/Unimplemented),CanonicalCallnormalizer, parametrized runnerJsonWrappedFixture,GenericXmlFixture,DsmlFixture) + 7 standalones cover all 18 registered parsersSample/NotApplicable/Unimplementedper case; coverage gaps surface asUnimplementedinstead of silent omissionsdsml,harmony,glm47,kimi_k2,xml); each was a strict subset of CASE.1, surviving siblings retain main'sCASE.*annotationsCoverage matrix:
Legend:
NotApplicable, or a real gap to surface asUnimplementeduntil filled)Column legend:
CASE.1— single tool callCASE.2— parallel callsCASE.7— complex argument types (nested objects, arrays, bool, number, Unicode)TODO migrations (next PRs):
deepseek_v3,deepseek_v3_1,deepseek_v3_2,deepseek_v4,glm47,pythonic. Plus a shared<tool_call>...</tool_call>find-end-position test injson/mod.rscovering the JSON-wrapped family at the scaffolding level.harmony,glm47,deepseek_v3,deepseek_v3_1,deepseek_v3_2,deepseek_v4,kimi_k2,nemotron_deci,pythonic.Where should the reviewer start?
lib/parsers/src/tool_calling/test_cases/mod.rs, thentests.rs, thenfixtures/json_wrapped.rs, thenlib/parsers/TEST_CASES.mdRelated Issues:
Internal tracking: DIS-1842. Replaces #8724 (closed when its head branch was renamed to add the DIS-1842 prefix). PR #8208 fixed CASE.5 for kimi_k2 specifically — generalizing into a
case_5_*contract is a follow-up to this PR./coderabbit profile chill
Summary by CodeRabbit
Tests
Documentation