Skip to content

test(parsers): add cross-parser contract test scaffold (part 1)#8724

Closed
keivenchang wants to merge 4 commits into
mainfrom
keivenchang/tool-call-parser-test-cases_part1
Closed

test(parsers): add cross-parser contract test scaffold (part 1)#8724
keivenchang wants to merge 4 commits into
mainfrom
keivenchang/tool-call-parser-test-cases_part1

Conversation

@keivenchang
Copy link
Copy Markdown
Contributor

@keivenchang keivenchang commented Apr 25, 2026

Re-opened at #8728

Add universal test cases that apply to every tool-call parser, filling
in the gaps in the per-parser coverage by surfacing the matrix as live
cargo-test output instead of a static doc.

This first revision covers two cases for all 18 registered parsers:
- case_1_single_call           --  single tool-call happy path  (18/18 pass)
- case_5_missing_end_token_recovery (PR #8208 generalized)
                               --  2 pass (kimi_k2, mistral),
                                   5 N/A,
                                   11 KnownBroken (each tagged for
                                   follow-up work to generalize the
                                   PR #8208 fix)

The framework is a four-state FixtureCase enum: Sample (parser handles
correctly), KnownBroken (parser drops the call today; the test asserts
the broken state and fails when a future fix actually adds recovery,
forcing the fixture to be upgraded), NotApplicable (format genuinely
lacks the concept; reason printed), Unimplemented (CI hard-fail). The
four states distinguish honest gaps from silent ones, which is the
matrix-sparseness problem this scaffold is meant to solve.

Layout:
- lib/parsers/src/tool_calling/test_cases/{mod,normalize,tests}.rs
- lib/parsers/src/tool_calling/test_cases/fixtures/<parser>.rs (x18)

Existing 134 inline parser tests are untouched; each test module gets
a TODO breadcrumb pointing at the contract suite for future trimming
once the suite stabilizes.

Future revisions: more universal cases (parallel calls, malformed
args, finish_reason, etc.), adversarial streaming chunkings, and a
CI matrix-report block.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Reduce duplication: 18 fixture files -> 11. Three on-the-wire format
families share parametric fixtures, six standalones keep their own files.

- JsonWrappedFixture (new) absorbs hermes, jamba, mistral, llama3_json,
  phi4, default. MissingEndBehavior carries the per-parser difference
  (recovers / drops / not applicable).
- GenericXmlFixture (new) absorbs minimax_m2, qwen3_coder, nemotron_nano.
  XmlFunctionForm carries the EqualsName vs NameAttr distinction.
- DsmlFixture (new) absorbs deepseek_v3_2 and deepseek_v4.

Standalones kept: kimi_k2, glm47, harmony, pythonic, deepseek_v3,
deepseek_v3_1, nemotron_deci. Each has a unique grammar that doesn't
collapse cleanly into a family parameter.

No behavior change. Matrix output unchanged. 391 existing tests still pass.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Strengthen the contract test to also assert normal_text is empty (no
text leaked alongside the tool call), making it a strict superset of
the simplest happy-path inline tests. Then delete those 5 duplicates.

Deleted:
- xml/kimi_k2_parser.rs::test_parse_simple_tool_call
- xml/glm47_parser.rs::test_parse_simple_tool_call
- xml/parser.rs::test_parse_simple_tool_call
- dsml/parser.rs::test_parse_single_tool_call_string_param
- harmony/harmony_parser.rs::test_parse_tool_calls_harmony_complete_basic

Kept (each has multiple tool calls in the input — the contract suite
only covers the single-call case so far, so these still earn their
keep until parallel-call scenarios land):
- pythonic, deepseek_v3, deepseek_v3_1 "basic" tests

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

Walkthrough

A new shared test fixtures framework is introduced for tool-calling parsers, consolidating cross-parser test coverage. The system defines a ToolCallFixture trait and FixtureCase enum, then implements parameterized fixtures for each parser variant. Individual parser test modules are simplified by removing redundant tests and adding documentation references to the shared test suite.

Changes

Cohort / File(s) Summary
Test Cases Module Structure
lib/parsers/src/tool_calling/mod.rs, lib/parsers/src/tool_calling/test_cases/mod.rs
Introduces new test_cases submodule with contract trait ToolCallFixture, outcome enum FixtureCase<T>, and submodule declarations for fixtures, normalization, and tests.
Fixture Implementations
lib/parsers/src/tool_calling/test_cases/fixtures/mod.rs, lib/parsers/src/tool_calling/test_cases/fixtures/{deepseek_v3, deepseek_v3_1, dsml, generic_xml, glm47, harmony, json_wrapped, kimi_k2, nemotron_deci, pythonic}.rs
Adds ToolCallFixture implementations for each parser, including single-call test cases and missing-end-token recovery scenarios. Aggregates all fixtures in centralized registry.
Test Harness & Normalization
lib/parsers/src/tool_calling/test_cases/{normalize.rs, tests.rs}
Implements canonicalization logic for tool-call outputs and two shared async contract tests iterating over all registered fixtures with validation and reporting.
Parser Test Consolidation
lib/parsers/src/tool_calling/{dsml/parser.rs, harmony/harmony_parser.rs, json/{base_json_parser,deepseek_v3,deepseek_v3_1}.rs, pythonic/pythonic_parser.rs, xml/{glm47_parser,kimi_k2_parser,parser}.rs}
Removes redundant inline regression tests and adds clarifying comments documenting shared coverage in test_cases/ directory and parser-specific gaps (e.g., missing end-token recovery applicability).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.27% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a cross-parser contract test scaffold for tool-call parsers.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed PR description is well-structured, comprehensive, and follows the template with all required sections: Overview, Details, Where to start, and Related Issues.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (6)
lib/parsers/src/tool_calling/test_cases/fixtures/glm47.rs (1)

20-59: Recommend: extract the arg-key/arg-value body builder to a private helper.

The Some(map) = arguments.as_object() loop is duplicated verbatim between case_1_single_call and case_5_missing_end_token_recovery. The DSML fixture already uses a private render_invoke helper — follow the same pattern here so a future arg-encoding tweak (e.g., XML-escaping </>/& once a case exercises special chars) only has to change one spot.

♻️ Proposed refactor
 impl Glm47Fixture {
+    fn render_body(&self, arguments: &Value) -> String {
+        let mut body = String::new();
+        if let Some(map) = arguments.as_object() {
+            for (k, v) in map {
+                let v_str = match v {
+                    Value::String(s) => s.clone(),
+                    _ => v.to_string(),
+                };
+                body.push_str(&format!(
+                    "<arg_key>{k}</arg_key><arg_value>{v_str}</arg_value>"
+                ));
+            }
+        }
+        body
+    }
+}
+
+impl ToolCallFixture for Glm47Fixture {
     fn parser_name(&self) -> &'static str {
         "glm47"
     }
 
     fn case_1_single_call(&self, function_name: &str, arguments: &Value) -> FixtureCase<String> {
-        let mut body = String::new();
-        if let Some(map) = arguments.as_object() { /* … */ }
-        FixtureCase::Sample(format!("<tool_call>{function_name}{body}</tool_call>"))
+        let body = self.render_body(arguments);
+        FixtureCase::Sample(format!("<tool_call>{function_name}{body}</tool_call>"))
     }
 
     fn case_5_missing_end_token_recovery(
         &self,
         function_name: &str,
         arguments: &Value,
     ) -> FixtureCase<String> {
-        let mut body = String::new();
-        if let Some(map) = arguments.as_object() { /* … */ }
+        let body = self.render_body(arguments);
         FixtureCase::KnownBroken {
             input: format!("<tool_call>{function_name}{body}"),
             reason: "glm47 has no missing-end recovery yet; follow-up to generalize PR `#8208`.",
         }
     }
 }

(Note: the above diff sketch shows the structural change — the impl Glm47Fixture { ... } block needs to be split out properly.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/test_cases/fixtures/glm47.rs` around lines 20 -
59, Both case_1_single_call and case_5_missing_end_token_recovery duplicate the
same arguments-as-object loop; extract that logic into a private helper (e.g.,
fn render_arg_body(arguments: &Value) -> String) and call it from both
functions. Move the map iteration and v_str conversion into that helper (use the
same Value::String handling) and return the concatenated body string, then
replace the duplicated body construction in case_1_single_call and
case_5_missing_end_token_recovery with a call to render_arg_body to produce
{body}; keep the existing formatting and behavior (including the missing-end
intentional omission in case_5).
lib/parsers/src/tool_calling/test_cases/tests.rs (1)

97-103: KnownBroken rejects parse errors — intentional, but couples the fixture to the exact failure mode.

A parser that returns Err for a missing-end-token input fails this branch even though "no recoverable call" is morally the same outcome as Ok((vec![], _)). If any of the 11 currently-KnownBroken parsers ever changes to surface an error instead of silently dropping (a perfectly reasonable hardening), the fixture has to be touched again. Consider either accepting both Ok(empty) and Err as valid KnownBroken outcomes, or extending KnownBroken with an explicit expected: BrokenKind { Drops, Errors } discriminant so the intent is recorded. Non-blocking — fine to defer to part 2 when more cases land.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/test_cases/tests.rs` around lines 97 - 103, The
KnownBroken handling in the test (matching Err(e) and producing a known-broken
parse-error message) is too strict; update the test logic around the parser
result match (where parser_name and case_label are used) to treat both Err(_)
and Ok((vec![], _)) as valid KnownBroken outcomes, or alternatively add an
explicit discriminant to KnownBroken (e.g., BrokenKind {Drops, Errors}) and
check that expected kind against the actual result (Err => Errors, Ok with empty
vec => Drops); ensure the message/optional detail uses input, parser_name and
case_label consistently when either outcome is accepted.
lib/parsers/src/tool_calling/test_cases/normalize.rs (1)

44-45: Minor: silent JSON-parse fallback to Value::Null could obscure failure diagnostics.

If a parser regresses and emits a non-JSON arguments string, you'll see Null in the canonicalized diff rather than the underlying parse error. The test still fails (Null ≠ expected object), but the failure message is less informative. Consider surfacing the parse error in the canonical form (e.g., wrap in a Result-like value or panic with context) so a future regression points directly at the offending parser. Non-blocking for this scaffold PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/test_cases/normalize.rs` around lines 44 - 45,
Replace the silent fallback that sets `arguments: Value` via
serde_json::from_str(...).unwrap_or(Value::Null) so parse failures are surfaced;
instead capture the Result from serde_json::from_str for
`call.function.arguments` and on Err either panic with a clear message including
the parse error and the original `call.function.arguments` payload or convert
the error into a Result-like canonical value so test diffs show the parsing
error (reference symbols: `arguments`, `call.function.arguments`,
serde_json::from_str).
lib/parsers/src/tool_calling/test_cases/fixtures/generic_xml.rs (1)

31-58: Body rendering doesn't XML-escape values or keys.

render_body interpolates k and v_str directly into XML without escaping <, >, &, or " (for the name="..." attribute). For the current trivial CASE.1 inputs this is invisible, but as soon as a fixture supplies arguments containing those characters (likely needed for CASE.4 malformed-args coverage), the harness will emit malformed XML by accident rather than the intended scenario, masking real parser behavior.

Recommend either documenting the "ASCII-identifier-only" assumption on the canonical inputs, or routing values through a small escape helper. Same note applies to function_name in the name="..." attribute path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/test_cases/fixtures/generic_xml.rs` around lines
31 - 58, render_body is inserting raw keys, values and function_name into XML
without escaping, which will produce invalid XML when inputs contain <, >, &, or
" characters; update render_body (and any helper used by
XmlFunctionForm::NameAttr and EqualsName branches) to run k, v_str and
function_name through an XML-escape function that replaces &, <, > and " (and '
if used) with entity references before formatting, or call an existing escape
utility if available; ensure attribute values in the NameAttr branch are
properly quoted and escaped; add a small unit test fixture demonstrating an
argument containing &<>\" to verify the output is well-formed.
lib/parsers/src/tool_calling/test_cases/mod.rs (1)

63-68: Doc comments contradict the actual trait definition.

The Unimplemented variant doc says "Used only as the trait default — every concrete fixture must override every method," but the trait below has no default implementations — every method is required. So Unimplemented is currently unreachable from the trait surface (hence #[allow(dead_code)]). Similarly, the trait-level "No default implementations: when a new case is added to this trait, every fixture must explicitly take a position on it" comment is consistent with the code but contradicts the Unimplemented rationale.

Either:

  • Add a Unimplemented default to future-added methods (so adding a case is non-breaking but produces a CI fail), or
  • Rewrite the Unimplemented doc to reflect that it's reserved for future use / matrix reporting and that the current design enforces coverage via missing-impl compile errors instead.

Also applies to: 77-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/test_cases/mod.rs` around lines 63 - 68, The doc
for the Unimplemented enum variant contradicts the fixture trait (which
currently provides no default implementations), so update the docs to reflect
reality: change the Unimplemented variant comment to say it is reserved for
future use / matrix reporting and that the current fixture trait requires all
methods to be implemented (no defaults), and also adjust the trait-level comment
to remove the "used as trait default" phrasing; reference the Unimplemented enum
variant and the fixture trait in this module so readers understand that adding a
new case remains a breaking change unless defaults are later introduced.
lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs (1)

14-24: Consider escaping string values in render_args (low priority).

format!("\"{s}\"") will produce syntactically broken Pythonic if any future canonical argument string contains a " or \. Today's CASE.1 inputs are presumably trivial, but as more cases are added (CASE.4 malformed args explicitly requires non-trivial strings), this will silently emit invalid input rather than a controlled malformation. Either document the assumption that CASE.1 inputs are quote-free, or apply minimal Python-string escaping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs` around lines 14
- 24, The current string formatting in render_args uses format!("\"{s}\"") which
will produce invalid Pythonic if s contains " or \; update the Value::String arm
to escape backslashes and double-quotes before wrapping with quotes (e.g.,
replace '\' -> '\\' and '"' -> '\"') or add a small helper function
(escape_python_string) and call it from the closure that builds the
"{k}={v_str}" entries; ensure the escaping is applied only for Value::String and
leave other Value variants using to_string().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/parsers/src/tool_calling/test_cases/fixtures/generic_xml.rs`:
- Around line 31-58: render_body is inserting raw keys, values and function_name
into XML without escaping, which will produce invalid XML when inputs contain <,
>, &, or " characters; update render_body (and any helper used by
XmlFunctionForm::NameAttr and EqualsName branches) to run k, v_str and
function_name through an XML-escape function that replaces &, <, > and " (and '
if used) with entity references before formatting, or call an existing escape
utility if available; ensure attribute values in the NameAttr branch are
properly quoted and escaped; add a small unit test fixture demonstrating an
argument containing &<>\" to verify the output is well-formed.

In `@lib/parsers/src/tool_calling/test_cases/fixtures/glm47.rs`:
- Around line 20-59: Both case_1_single_call and
case_5_missing_end_token_recovery duplicate the same arguments-as-object loop;
extract that logic into a private helper (e.g., fn render_arg_body(arguments:
&Value) -> String) and call it from both functions. Move the map iteration and
v_str conversion into that helper (use the same Value::String handling) and
return the concatenated body string, then replace the duplicated body
construction in case_1_single_call and case_5_missing_end_token_recovery with a
call to render_arg_body to produce {body}; keep the existing formatting and
behavior (including the missing-end intentional omission in case_5).

In `@lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs`:
- Around line 14-24: The current string formatting in render_args uses
format!("\"{s}\"") which will produce invalid Pythonic if s contains " or \;
update the Value::String arm to escape backslashes and double-quotes before
wrapping with quotes (e.g., replace '\' -> '\\' and '"' -> '\"') or add a small
helper function (escape_python_string) and call it from the closure that builds
the "{k}={v_str}" entries; ensure the escaping is applied only for Value::String
and leave other Value variants using to_string().

In `@lib/parsers/src/tool_calling/test_cases/mod.rs`:
- Around line 63-68: The doc for the Unimplemented enum variant contradicts the
fixture trait (which currently provides no default implementations), so update
the docs to reflect reality: change the Unimplemented variant comment to say it
is reserved for future use / matrix reporting and that the current fixture trait
requires all methods to be implemented (no defaults), and also adjust the
trait-level comment to remove the "used as trait default" phrasing; reference
the Unimplemented enum variant and the fixture trait in this module so readers
understand that adding a new case remains a breaking change unless defaults are
later introduced.

In `@lib/parsers/src/tool_calling/test_cases/normalize.rs`:
- Around line 44-45: Replace the silent fallback that sets `arguments: Value`
via serde_json::from_str(...).unwrap_or(Value::Null) so parse failures are
surfaced; instead capture the Result from serde_json::from_str for
`call.function.arguments` and on Err either panic with a clear message including
the parse error and the original `call.function.arguments` payload or convert
the error into a Result-like canonical value so test diffs show the parsing
error (reference symbols: `arguments`, `call.function.arguments`,
serde_json::from_str).

In `@lib/parsers/src/tool_calling/test_cases/tests.rs`:
- Around line 97-103: The KnownBroken handling in the test (matching Err(e) and
producing a known-broken parse-error message) is too strict; update the test
logic around the parser result match (where parser_name and case_label are used)
to treat both Err(_) and Ok((vec![], _)) as valid KnownBroken outcomes, or
alternatively add an explicit discriminant to KnownBroken (e.g., BrokenKind
{Drops, Errors}) and check that expected kind against the actual result (Err =>
Errors, Ok with empty vec => Drops); ensure the message/optional detail uses
input, parser_name and case_label consistently when either outcome is accepted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0a9b5e2-832f-4154-9273-396d73c4e07c

📥 Commits

Reviewing files that changed from the base of the PR and between 46db930 and 8fbbd4b.

📒 Files selected for processing (24)
  • lib/parsers/src/tool_calling/dsml/parser.rs
  • lib/parsers/src/tool_calling/harmony/harmony_parser.rs
  • lib/parsers/src/tool_calling/json/base_json_parser.rs
  • lib/parsers/src/tool_calling/json/deepseek_v3_1_parser.rs
  • lib/parsers/src/tool_calling/json/deepseek_v3_parser.rs
  • lib/parsers/src/tool_calling/mod.rs
  • lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/deepseek_v3.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/deepseek_v3_1.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/dsml.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/generic_xml.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/glm47.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/harmony.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/json_wrapped.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/kimi_k2.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/mod.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/nemotron_deci.rs
  • lib/parsers/src/tool_calling/test_cases/fixtures/pythonic.rs
  • lib/parsers/src/tool_calling/test_cases/mod.rs
  • lib/parsers/src/tool_calling/test_cases/normalize.rs
  • lib/parsers/src/tool_calling/test_cases/tests.rs
  • lib/parsers/src/tool_calling/xml/glm47_parser.rs
  • lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs
  • lib/parsers/src/tool_calling/xml/parser.rs

Initial prototype reduced to a single contract case. CASE.5 (missing
end-token recovery) and the KnownBroken FixtureCase variant added
~200 lines for one extra scenario; defer them to a follow-up so this
PR stays focused on "can a shared contract test even fit cleanly?"

Removed:
- FixtureCase::KnownBroken variant
- ToolCallFixture::case_5_missing_end_token_recovery method
- case_5_missing_end_token_recovery_all_parsers test
- KnownBroken handling in run_case
- MissingEndBehavior enum, on_missing_end / recovery_reason fields
  on family fixtures (no longer used)

Kept: CASE.1 (single-call happy path) running across all 18 parsers,
the four-state to three-state simplification (Sample / NotApplicable /
Unimplemented), the family-fixture refactor.

385 tests pass (was 389, lost the 4 case_5 sub-results; parser-internal
test_parse_malformed_no_section_end etc. still cover CASE.5 inline).

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
@keivenchang
Copy link
Copy Markdown
Contributor Author

Superseded by #8728 — head branch renamed to add the DIS-1842 prefix per branch-naming convention; GitHub auto-closed this PR despite the rename API saying it should migrate. All work continues there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant