feat(mvn): Maven filter + Surefire/Failsafe XML enrichment #4

Closed
mariuszs wants to merge 53 commits into develop from feat/mvn-surefire-xml

@mariuszs
Owner

Summary

  • Add Maven (Java) filter module with state-machine parsers for mvn test, compile, checkstyle:check, dependency:tree
  • Enrich mvn test/verify failure output with structured details from target/surefire-reports/TEST-*.xml and target/failsafe-reports/*.xml
  • Stack traces segmented on Caused by: with framework frames collapsed and root-cause preserved
  • Autodetect application package from pom.xml <groupId> (override via RTK_MVN_APP_PACKAGE)
  • Red-flag heuristic for "0 tests" with no fresh XML reports
  • Time-gated report reads skip stale files from previous runs
  • 97-99%+ token savings on happy path, ≥85% on enriched failure path

Test plan

  • Snapshot tests for enriched surefire/failsafe rendering
  • Token savings assertions (happy path identity, failure path ≥85%)
  • Stack trace parsing: segments, framework collapsing, root-cause cap, hard cap
  • Surefire XML: passing/failing/skipped/error testsuites, system-out/err capture
  • pom.xml groupId detection: top-level, parent fallback, env override, missing
  • Time-gate: stale files skipped, fresh files parsed
  • All 1683 tests pass, zero clippy warnings

aeppling and others added 30 commits March 31, 2026 20:24
Co-Authored-By: ahundt <ATHundt@gmail.com>
+ remove unused import
  What changed:
  - Add `run_claude()` with permissions check, audit logging, tool_input
    preservation, and Ask/Allow/Deny support
  - Add `run_cursor()` with flat JSON format (`permission`/`updated_input`)
  - Add `audit_log()` (best-effort append when RTK_HOOK_AUDIT=1)
  - Fix `run_gemini()` to load exclude_commands from config
  - Convert all hook stdout to `writeln!` with `#[deny(clippy::print_stdout)]`
    to prevent JSON protocol corruption (Claude Code bug #4669)
  - Replace string-based heredoc detection with lexer-based `has_heredoc()`
    (quote-aware: `<<` inside quotes no longer false-positives)
  - Add shell prefix peeling (noglob, command, builtin, exec, nocorrect)
    to `rewrite_segment()` in registry.rs
  - Fix python3 -m pytest pattern, add pip show, add gt (Graphite) to RULES
  - Remove `command ` from IGNORED_PREFIXES (was blocking `command git status`)
  - Register `rtk hook claude`/`rtk hook cursor` binary commands in
    settings.json instead of writing bash script files
  - Add legacy script migration (deletes old rtk-rewrite.sh on `rtk init`)
  - Simplify hook_check and integrity for script-free model
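
The quote-aware heredoc check described above can be illustrated with a small scanner. This is a minimal sketch, not the PR's lexer: the name `has_heredoc_sketch` is hypothetical, and treating `<<<` as a here-string (not a heredoc) is an assumption. Arithmetic shifts such as `$((1<<2))` are not handled.

```rust
/// Returns true when `cmd` contains an unquoted `<<` heredoc operator.
/// Hypothetical sketch: tracks single quotes, double quotes, and backslash
/// escapes only. It is not a full shell lexer.
fn has_heredoc_sketch(cmd: &str) -> bool {
    let mut chars = cmd.chars().peekable();
    let (mut in_single, mut in_double) = (false, false);
    while let Some(c) = chars.next() {
        match c {
            // A backslash escapes the next character (except inside '...').
            '\\' if !in_single => {
                chars.next();
            }
            '\'' if !in_double => in_single = !in_single,
            '"' if !in_single => in_double = !in_double,
            '<' if !in_single && !in_double => {
                if chars.peek() == Some(&'<') {
                    chars.next();
                    // Assumption: `<<<` is a here-string, so skip it.
                    if chars.peek() == Some(&'<') {
                        chars.next();
                    } else {
                        return true;
                    }
                }
            }
            _ => {}
        }
    }
    false
}
```

With this sketch, `cat <<EOF` is reported as a heredoc while `echo '<<'` is not, which is the false positive the string-based check used to produce.
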
Integrates ~30 develop commits (PR rtk-ai#997): AWS expansion (8→25 cmds),
SSH signing for git commit/push, go test context, grep stdin leak fix,
default-to-ask permissions, gh pr merge passthrough.

Conflict resolution (4 files):
- git.rs: kept .output()+stdin(inherit) for commit/push (SSH/GPG signing)
- go_cmd.rs: accepted incoming + added pub(crate) visibility
- hook_check.rs: merged binary_hook_registered + other_integration_installed
- hook_cmd.rs: fixed permissions path, println→writeln for Gemini deny

Verified: 1445 tests pass, 0 clippy errors, all manual integration tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
feat(): batch fixs + aws extended + filter quality batch
…master--components--rtk

chore(master): release 0.35.0
- pipe_cmd: fix panic on multi-byte UTF-8 at 1024 byte boundary (floor_char_boundary in auto_detect_filter)
- pipe_cmd: cap stdin at 10 MiB to prevent OOM (reuses RAW_CAP)
- stream: hoist RAW_CAP to pub const at module level
- hook_cmd: check deny before get_rewritten in handle_vscode
    (matches handle_copilot_cli and run_claude order)
  - hook_cmd: escape backslash and pipe in audit log sanitizer
  - tsc_cmd: hoist duplicate TSC_ERROR regex to single module-level
    lazy_static
4 fixes applied (all confirmed introduced by PR rtk-ai#956, all tests pass):

- P0 NEW-passthrough — pipe_cmd.rs: passthrough before cap read
- P1 BUFFERED-panic — stream.rs: catch_unwind on Buffered filter
- P1 STREAM-postcap — stream.rs: stop feeding filter after cap
- P2 OFFBYONE-rawcap — stream.rs: 5 cap boundary checks fixed

5 findings dropped (not introduced by PR or not bugs):

- DENY-claude: pre-existing on master
- AUDIT-asymmetry: intentional scope choice, not a bug
- GEMINI-test: pre-existing test pattern from master
- SAVINGS-threshold: 40% is correct (filters achieve ~46%)
- STDERR-test: cosmetic CI, not correctness
fix(docs): use release please changelog no manual
feat(refacto-core): binary hook w/ native cmd exec + streaming
…:check, dependency:tree

Adds `rtk mvn` with four filters:
- `mvn test` — state machine parser (Preamble → Testing → Summary → Done)
  that accumulates counts across T E S T S sections for multi-module builds
  and surefire+failsafe (`mvn verify`).
- `mvn compile` — line filter routed also from `process-classes` /
  `test-compile` via a shared `COMPILE_LIKE_GOALS` tuple table.
- `mvn checkstyle:check` — compact violation rewrite with Help-boilerplate strip.
- `mvn dependency:tree` — boilerplate/duplicate strip, transitive collapse.
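
The `mvn test` state machine above can be sketched roughly as follows. The phase names come from the commit message; the transition triggers (`T E S T S`, `Results:`, `BUILD SUCCESS`/`BUILD FAILURE`) and the `Counts`, `parse_counts`, and `summarize` names are assumptions for illustration, not the PR's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Preamble,
    Testing,
    Summary,
    Done,
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Counts {
    run: u32,
    failures: u32,
    errors: u32,
    skipped: u32,
}

/// Parses a line such as "Tests run: 3, Failures: 0, Errors: 0, Skipped: 0".
fn parse_counts(line: &str) -> Option<Counts> {
    let rest = &line[line.find("Tests run:")?..];
    let mut counts = Counts::default();
    for part in rest.split(',') {
        let Some((key, value)) = part.split_once(':') else { continue };
        let digits: String = value.trim().chars().take_while(|c| c.is_ascii_digit()).collect();
        let Ok(n) = digits.parse::<u32>() else { continue };
        match key.trim() {
            "Tests run" => counts.run = n,
            "Failures" => counts.failures = n,
            "Errors" => counts.errors = n,
            "Skipped" => counts.skipped = n,
            _ => {} // e.g. "Time elapsed" on per-class lines
        }
    }
    Some(counts)
}

/// Walks the log once, summing the per-section totals that follow "Results:".
fn summarize(log: &str) -> (Phase, Counts) {
    let mut phase = Phase::Preamble;
    let mut total = Counts::default();
    for line in log.lines() {
        if phase == Phase::Done {
            break;
        }
        if line.contains("BUILD SUCCESS") || line.contains("BUILD FAILURE") {
            phase = Phase::Done;
        } else if line.contains("T E S T S") {
            // A new section (next module, or failsafe after surefire).
            phase = Phase::Testing;
        } else if phase == Phase::Testing && line.contains("Results:") {
            phase = Phase::Summary;
        } else if phase == Phase::Summary {
            if let Some(c) = parse_counts(line) {
                total.run += c.run;
                total.failures += c.failures;
                total.errors += c.errors;
                total.skipped += c.skipped;
            }
        }
    }
    (phase, total)
}
```

Only the totals line after `Results:` is counted, so per-class `Tests run:` lines inside a section are not double-counted when sections are accumulated.
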

Auto-detects `./mvnw` wrapper; falls back to system `mvn`. Other goals
(e.g. `spring-boot:run`, `install`) stream through unchanged for safety
with metrics-only tracking. Discover rules rewrite bare `mvn`/`mvnw`
invocations to `rtk mvn`. Replaces the previous TOML filter
`src/filters/mvn-build.toml`.

38 unit tests + 14 real-output fixtures covering pass/fail/multi-module
cases; verify-fixture savings ≥ 90%.
Ports maven-mcp's SurefireReportParser and StackTraceProcessor to Rust
as post-text-filter enrichment layer. Adds appPackage autodetect from
pom.xml groupId and no-tests red-flag heuristic. Time-gated XML reads
prevent stale fixture pollution. Targets our fork only; stacks on
feat/mvn-rust-module.
mariuszs added 23 commits April 16, 2026 07:54
21 tasks, TDD red-green-commit per step, targets our fork's master via
feat/mvn-surefire-xml stacked on feat/mvn-rust-module. Covers the full
spec: stack_trace port, surefire_reports parser, pom_groupid autodetect,
mvn_cmd integration, snapshot + savings tests, docs, PR.
Empty stubs for stack_trace, surefire_reports, pom_groupid. Adds
filetime dev-dep for mtime-based time-gate tests in later tasks.
Splits Java stack traces on top-level 'Caused by:' while keeping
indented Caused by lines inside Suppressed blocks as frames.
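
A minimal sketch of that segmentation rule, assuming "top-level" means the line starts at column 0. The function name `split_segments` is hypothetical and this is not the ported `parse_segments` itself.

```rust
/// Splits a Java stack trace into segments at top-level "Caused by:" lines.
/// Indented "Caused by:" lines (for example inside a Suppressed block) stay
/// in the current segment as ordinary lines.
fn split_segments(trace: &str) -> Vec<Vec<&str>> {
    let mut segments: Vec<Vec<&str>> = Vec::new();
    for line in trace.lines() {
        // Assumption: only an unindented "Caused by:" opens a new segment.
        if line.starts_with("Caused by:") || segments.is_empty() {
            segments.push(Vec::new());
        }
        segments.last_mut().unwrap().push(line);
    }
    segments
}
```
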
Counts Unicode chars, not bytes. 200-char cap matches maven-mcp original.
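
Counting characters instead of bytes matters because slicing a `&str` at an arbitrary byte offset can land inside a multi-byte character and panic. A minimal sketch, with the hypothetical helper name `truncate_chars`:

```rust
/// Returns at most `max_chars` Unicode scalar values from the start of `s`,
/// always cutting on a character boundary.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the byte offset of the first character to drop.
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}
```

With a cap of 200, a string of 300 two-byte characters is cut to 200 characters (400 bytes) rather than to 200 bytes.
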
Structural lines (Suppressed:, indented Caused by:) are always
preserved during frame collapsing.
Emits '... N framework frames omitted' for runs of non-app frames;
preserves app and structural (Suppressed / nested Caused by) frames.
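
The collapsing rule can be sketched as below. This is an illustration under assumptions, not the PR's `add_collapsed_frames`: a frame is any line whose trimmed text starts with `at `, an app frame is one whose class name sits under `app_package`, and the marker indentation is made up.

```rust
/// True when `line` is a stack frame inside `app_package` (hypothetical rule:
/// the frame text must start with the package followed by a dot).
fn is_app_frame(line: &str, app_package: &str) -> bool {
    match line.trim_start().strip_prefix("at ") {
        Some(frame) => frame
            .strip_prefix(app_package)
            .map_or(false, |rest| rest.starts_with('.')),
        None => false,
    }
}

/// Emits the pending "omitted" marker, if any, and resets the counter.
fn flush_omitted(out: &mut Vec<String>, omitted: &mut usize) {
    if *omitted > 0 {
        out.push(format!("\t... {} framework frames omitted", *omitted));
        *omitted = 0;
    }
}

/// Replaces each run of non-app frames with a single marker line. Headers and
/// structural lines (Suppressed:, nested Caused by:) are not frames, so they
/// are always kept.
fn collapse_frames(lines: &[&str], app_package: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut omitted = 0usize;
    for &line in lines {
        let is_frame = line.trim_start().starts_with("at ");
        if is_frame && !is_app_frame(line, app_package) {
            omitted += 1;
            continue;
        }
        flush_omitted(&mut out, &mut omitted);
        out.push(line.to_string());
    }
    flush_omitted(&mut out, &mut omitted);
    out
}
```
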
Structural lines (Suppressed / nested Caused by) bypass the cap and
are always preserved.
Wires parse_segments, add_collapsed_frames, add_root_cause_frames into
the public process(raw, app_package, max_lines) API. Hard cap stubbed
for next task.
When root-cause header lies beyond the line cap, emit a synthetic layout
with a truncated-intermediate marker so the diagnostic punchline survives.
Covers passing, failing, failing-with-logs, skipped, and error cases —
will feed surefire_reports parser tests in the next tasks.
Handles testsuite/testcase/failure/error/system-out/system-err with
per-test 2000-char log limit and 50-line stack trace truncation.
Classifies failure vs error by element name.
Asserts passing test system-out is not leaked, error vs failure kinds
are distinguished, and skipped counts are preserved.
Aggregates TEST-*.xml files; filters stale by mtime >= since; counts
malformed files without crashing. Applies total-output-limit across
failures.
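
The time gate can be sketched with the standard library alone. The function name `fresh_reports` is hypothetical, and passing the build start time as `since` is an assumption about how the gate is fed. Only the `TEST-*.xml` pattern and the `mtime >= since` comparison come from the commit message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Lists TEST-*.xml files in `dir` whose modification time is at or after
/// `since`, so reports left over from earlier runs are skipped.
fn fresh_reports(dir: &Path, since: SystemTime) -> io::Result<Vec<PathBuf>> {
    let mut fresh = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_name = entry.file_name();
        let name = file_name.to_string_lossy();
        if !(name.starts_with("TEST-") && name.ends_with(".xml")) {
            continue;
        }
        let modified = entry.metadata()?.modified()?;
        if modified >= since {
            fresh.push(entry.path());
        }
    }
    fresh.sort(); // stable order for deterministic output
    Ok(fresh)
}
```
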
Asserts the third 4KB test_output is nulled when 10000-char budget
is exhausted.
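
One way to read the total-output limit is a shared character budget consumed in order. In the sketch below the `Failure` struct and `apply_output_budget` are hypothetical, and dropping an over-budget output entirely (instead of truncating it) is an assumption based on the word "nulled" above.

```rust
/// Hypothetical, reduced view of a parsed test failure.
struct Failure {
    name: String,
    test_output: Option<String>,
}

/// Keeps captured outputs while they fit in `budget_chars`; an output that
/// does not fit in the remaining budget is set to None.
fn apply_output_budget(failures: &mut [Failure], budget_chars: usize) {
    let mut remaining = budget_chars;
    for failure in failures.iter_mut() {
        let Some(len) = failure.test_output.as_ref().map(|o| o.chars().count()) else {
            continue;
        };
        if len <= remaining {
            remaining -= len;
        } else {
            failure.test_output = None;
        }
    }
}
```

With three 4096-character outputs and a 10000-character budget, the first two fit (8192 characters used) and the third is dropped.
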
Reads top-level <project>/<groupId> with fallback to <parent>/<groupId>.
RTK_MVN_APP_PACKAGE env var overrides. Malformed POMs return None.
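
The lookup order (env override, then `<project>/<groupId>`, then `<parent>/<groupId>`) can be sketched with a naive tag scanner. This is an illustration only: a real implementation would use an XML parser, the function name `detect_group_id` is hypothetical, and the override is passed in as a parameter (a caller might supply `std::env::var("RTK_MVN_APP_PACKAGE").ok()`).

```rust
/// Naive sketch: tracks the element stack and captures text found at
/// project/groupId or project/parent/groupId. Returns None when tags are
/// mismatched or left unclosed. Does not handle CDATA or '>' in attributes.
fn detect_group_id(pom: &str, env_override: Option<&str>) -> Option<String> {
    if let Some(value) = env_override.map(str::trim).filter(|v| !v.is_empty()) {
        return Some(value.to_string());
    }
    let mut stack: Vec<&str> = Vec::new();
    let mut own: Option<String> = None;
    let mut parent: Option<String> = None;
    let mut rest = pom;
    while let Some(open) = rest.find('<') {
        let text = rest[..open].trim();
        match stack.as_slice() {
            ["project", "groupId"] if !text.is_empty() && own.is_none() => {
                own = Some(text.to_string())
            }
            ["project", "parent", "groupId"] if !text.is_empty() && parent.is_none() => {
                parent = Some(text.to_string())
            }
            _ => {}
        }
        rest = &rest[open..];
        if rest.starts_with("<!--") {
            let end = rest.find("-->")?; // unterminated comment: malformed
            rest = &rest[end + 3..];
            continue;
        }
        let close = rest.find('>')?;
        let tag = &rest[1..close];
        rest = &rest[close + 1..];
        if tag.starts_with('?') || tag.starts_with('!') {
            continue; // XML declaration or DOCTYPE
        }
        if let Some(name) = tag.strip_prefix('/') {
            if stack.pop() != Some(name.trim()) {
                return None; // mismatched closing tag
            }
        } else if !tag.ends_with('/') {
            stack.push(tag.split_whitespace().next()?);
        }
    }
    if !stack.is_empty() {
        return None; // unclosed elements
    }
    own.or(parent)
}
```

Tracking the element stack keeps `<groupId>` entries under `<dependencies>` from being mistaken for the project's own group.
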
Two failsafe-report XMLs (ApplicationContext failure, port conflict)
and a Spring Caused-by chain for stack_trace::process coverage.
Prepares scaffolding for XML report enrichment. enrich_with_reports is
currently an identity function; real logic lands in the next commit.
Appends a structured Failures section for each report directory, with
per-failure stack trace (framework-frame-collapsed), optional captured
output, and a reports-processed footer. Short-circuits on happy path
to avoid I/O. Emits a red-flag message when 'no tests run' is reported
but also no fresh XML reports are present.
…ng match

"10 passed" previously triggered the zero_tests branch via substring
match on "0 passed". Anchoring with ": 0 passed" scopes the check to
literal zero. Also translates the red-flag message to English.
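
The substring bug is easy to reproduce in isolation. In this sketch the summary line format (`mvn test: N passed`) and both function names are assumptions. Only the `"0 passed"` versus `": 0 passed"` patterns come from the commit message.

```rust
/// Buggy check: "10 passed" contains "0 passed" as a substring.
fn zero_tests_naive(summary: &str) -> bool {
    summary.contains("0 passed")
}

/// Anchored check: the count must directly follow ": ", so only a literal
/// zero matches.
fn zero_tests_anchored(summary: &str) -> bool {
    summary.contains(": 0 passed")
}
```
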
Pins output format for surefire-only, both-report-dirs, and the no-tests
red-flag path. Adjust with 'cargo insta review' when output changes.
Asserts happy-path enrichment is a no-op and that even on the enriched
failure path with a multi-segment Caused-by chain we stay under 15% of
the raw log size.
Describes the new post-filter XML read, groupId autodetect order,
stale-file time-gate, and the rtk proxy escape hatch.
- Consolidate TestCounts into TestSummary (identical struct)
- Unify add_collapsed_frames/add_root_cause_frames into add_frames
- Remove dead params from parse_dir_with_limits
- Change capture: Option<String> to bool in pom_groupid
- Clean up #[allow(dead_code)] with proper visibility
- Remove WHAT comments
@mariuszs
Owner Author

Recreating — fork develop was out of sync

@mariuszs mariuszs closed this Apr 16, 2026
mariuszs added a commit that referenced this pull request Apr 17, 2026
Addresses review items from sibling PR rtk-ai#368 that apply to this PR:

- P1 #4 (code duplication): run_checkstyle, run_dep_tree, and the
  compile-like runner shared ~15 lines of near-identical cmd construction +
  runner::run_filtered plumbing. Extract run_simple_goal(binary, goal,
  tee_slug, filter, args, verbose) as the shared shell. Drops the
  compile_like_labels indirection (slug lookup now inline in
  run_compile_like). run_tests_like stays separate because XML enrichment
  needs cwd + app_pkgs + closure capture.

- Minor #subcmd_savings: mvn and mvnd discover rules had empty
  subcmd_savings. Populate with measured per-goal ratios (test 99%, verify
  95%, checkstyle 90%, dependency:tree 70%, compile 85%) so `rtk discover`
  reports accurate opportunity per goal instead of the rule-level 90%.

2 participants