
Plan phase 4 testing deployment polish#80

Merged
leynos merged 10 commits into main from
terragon/implement-phase4-testing-deployment-85extp
Jan 2, 2026

Conversation

@leynos
Owner

@leynos leynos commented Dec 26, 2025

Summary

  • Adds a Criterion-based benchmarking suite for the VRP solver across problem sizes with deterministic inputs and travel-time matrices.
  • Introduces bench utilities, a broader test scaffolding (bench helpers, integration and behavioural tests), and wiring to run performance tests locally and in CI.
  • Adds a dedicated bench target and Makefile integration for phase 4 testing deployment polish.

Changes

Benchmark framework

  • Add Criterion as a dev-dependency (with html_reports) and include rand/rand_chacha/rand_distr for deterministic benchmarks.
  • Define a workspace bench: solver_benchmarks in wildside-solver-vrp.
  • Extend Makefile with a bench target to run performance benchmarks.
  • Update docs/roadmap.md to mark the Criterion-based benchmark plan as completed (P95/P99 timing).

Bench utilities

  • New bench_support.rs providing deterministic benchmark helpers:
    • generate_clustered_pois(count, seed): create POIs distributed in clusters with deterministic RNG.
    • generate_travel_time_matrix(pois, seed): build a distance-based travel-time matrix with small noise.
    • Helpers for matrix population, Box-Muller Gaussian sampling, and safe matrix updates.
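The Box-Muller step mentioned above can be sketched as follows. This is an illustrative sketch only: the PR uses a seeded ChaCha RNG from rand_chacha, whereas here a tiny linear congruential generator stands in so the example compiles with the standard library alone. The `Lcg` type and its constants are hypothetical, not taken from bench_support.rs.

```rust
// Minimal stand-in for a seeded RNG; both it and ChaCha8Rng are deterministic,
// which is the property the bench helpers rely on.
struct Lcg(u64);

impl Lcg {
    // Returns a uniform sample in (0, 1]; the +1 avoids ln(0) below.
    fn next_unit(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// Box-Muller transform: turns two uniform samples into two independent
/// Gaussian samples with the given standard deviation.
fn box_muller(rng: &mut Lcg, std_dev: f64) -> (f64, f64) {
    let u1 = rng.next_unit();
    let u2 = rng.next_unit();
    let r = (-2.0 * u1.ln()).sqrt() * std_dev;
    let theta = 2.0 * std::f64::consts::PI * u2;
    (r * theta.cos(), r * theta.sin())
}

fn main() {
    // Same seed, same offsets: this is what makes the clustered POI
    // coordinates reproducible across benchmark runs.
    let (a, b) = box_muller(&mut Lcg(42), 0.005);
    let (c, d) = box_muller(&mut Lcg(42), 0.005);
    assert_eq!((a, b), (c, d));
    println!("offsets: ({a:.6}, {b:.6})");
}
```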

VRP solver benches

  • New benches/solver_benchmarks.rs implementing Criterion benchmarks:
    • Bench solve times for problem sizes [50, 100, 200].
    • Sets up deterministic inputs, a travel-time matrix provider, and a memory store for the solver.
    • Uses Criterion Group/BenchmarkId with a fixed sample setup for reproducible results.

Cargo/workspace integration

  • Root Cargo.toml updated with dev-dependencies: criterion, rand, rand_chacha, rand_distr.
  • wildside-solver-vrp/Cargo.toml updated to include criterion/rand/rand_chacha/rand_distr as dev-dependencies and a bench target.
  • Bench module wired into solver_benchmarks to enable bench execution.
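A typical shape for the manifest wiring described above, as a hedged sketch (the exact versions, features, and surrounding sections are not shown in this PR description and may differ):

```toml
# Hypothetical sketch of the wildside-solver-vrp manifest changes.
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
rand = "0.8"
rand_chacha = "0.3"
rand_distr = "0.4"

[[bench]]
name = "solver_benchmarks"
harness = false   # lets Criterion's own main run instead of the libtest harness
```

Setting `harness = false` is what allows the `criterion_main!` entry point in benches/solver_benchmarks.rs to take over benchmark execution.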

Tests and test scaffolding

  • Added a comprehensive test suite to validate bench helpers:
    • tests/bench_helpers.rs: unit tests for clustered POI generation and travel-time matrix generation.
    • tests/bench_helpers_behaviour.rs: behaviour-style tests for bench helpers using rstest-bdd.
    • tests/bench_integration.rs: integration tests validating end-to-end bench data pipeline.
    • tests/bench_setup.rs: unit tests for bench setup helpers.
  • Added example feature file tests under tests/features/bench_helpers.feature to exercise bench flows.

Documentation

  • Roadmap doc updated to reflect completion of the benchmark plan for phase 4.

Testing plan

  • Build and run benches:
    • make bench
    • or: cargo bench --package wildside-solver-vrp
  • Verify that Criterion HTML reports are generated (html_reports feature is enabled in Cargo.toml).
  • Confirm benches compile against current wildside_core/wildside_solver_vrp interfaces and the bench helpers produce deterministic results.
  • Ensure no production code changes; benchmarks and tests are isolated to dedicated benches and test modules.
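The make bench entry point in the testing plan suggests a Makefile target along these lines; this is a sketch, since the actual Makefile diff is not included here:

```make
.PHONY: bench
bench:
	cargo bench --package wildside-solver-vrp
```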

Notes

  • If CI requires a specific toolchain, ensure the environment has a compatible Rust toolchain installed.
  • This PR adds substantial bench infrastructure; reviewers should focus on the correctness of the bench setup, the determinism guarantees, and any code paths in bench_support that would be unsafe for production use.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/16a3d126-1bad-4011-b4cf-eae7599d09f2

Implement performance benchmarks to measure P95/P99 solve times for problem
sizes of 50, 100, and 200 candidate POIs, as specified in Phase 4 of the
roadmap.

The benchmark infrastructure includes:
- bench_support.rs: Deterministic POI generation with clustered distributions
  and distance-based travel time matrices using seeded RNG (rand_chacha)
- solver_benchmarks.rs: Criterion benchmarks with 100 samples per problem size
  for reliable percentile estimation
- Makefile target: `make bench` for running benchmarks

Design decisions:
- Clustered POI distribution (5 clusters) mimics real-world attraction patterns
- Distance-based travel times with ±20% noise for realistic routing scenarios
- Deterministic seeding (seed=42) ensures reproducible results across runs
- Uses FixedMatrixTravelTimeProvider for full benchmark reproducibility
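The distance-plus-noise design above can be illustrated with a minimal standalone sketch. The function name, the walking-speed constant, and the one-second floor are assumptions for illustration; the PR's real implementation lives in bench_support.rs and feeds the noise factor from the seeded RNG rather than taking it as a parameter.

```rust
use std::time::Duration;

// Assumed walking speed of 5 km/h, expressed in metres per second.
const WALKING_SPEED_M_PER_S: f64 = 5000.0 / 3600.0;

/// Base travel time from a straight-line distance, perturbed by a noise
/// factor clamped to [-0.2, 0.2] (the described ±20%), with a floor of
/// one second so no matrix cell is ever zero.
fn travel_time_with_noise(distance_m: f64, noise_factor: f64) -> Duration {
    let base_secs = distance_m / WALKING_SPEED_M_PER_S;
    let noisy = base_secs * (1.0 + noise_factor.clamp(-0.2, 0.2));
    Duration::from_secs_f64(noisy.max(1.0))
}

fn main() {
    // 500 m at 5 km/h is 360 s; +20% noise gives roughly 432 s.
    let t = travel_time_with_noise(500.0, 0.2);
    assert!((t.as_secs_f64() - 432.0).abs() < 1e-6);
    println!("{t:?}");
}
```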

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Dec 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features

    • Added a Criterion-based performance benchmarking suite and a Makefile target to run benchmarks.
  • Tests

    • Added unit, behavioural, integration and Gherkin tests validating deterministic benchmark data generation, travel-time matrices and end‑to‑end benchmark workflows.
  • Chores

    • Declared benchmarking and randomness-related dependencies and configured the benchmark harness.
  • Documentation

    • Marked the benchmarking step as completed in the project roadmap.


Walkthrough

Add Criterion-based benchmarks and deterministic benchmark utilities to the VRP solver workspace, update workspace dev-dependencies and [[bench]] config, add a Makefile bench target, and add unit, behavioural and integration tests validating POI and travel-time generation.

Changes

Cohort / File(s) Summary
Workspace & Makefile
Cargo.toml, wildside-solver-vrp/Cargo.toml, Makefile, docs/roadmap.md
Add workspace dev-dependencies (criterion with html_reports, rand, rand_chacha, rand_distr, serde, serde_json); add [[bench]] entry for solver_benchmarks; add bench phony Makefile target; mark benchmark item as done in roadmap.
Benchmark support
wildside-solver-vrp/benches/bench_support.rs
Add deterministic benchmark utilities and constants: BENCHMARK_SEED, DURATION_MINUTES, BENCHMARK_START, generate_clustered_pois(), generate_travel_time_matrix(), build_benchmark_request(), create_depot() and internal helpers for distances and noise.
Criterion benchmarks
wildside-solver-vrp/benches/solver_benchmarks.rs
Add Criterion benchmark group solve_time over sizes; prepare deterministic inputs, build MemoryStore and FixedMatrixTravelTimeProvider, instantiate VrpSolver, and measure solver.solve(&request) with CI-friendly sampling and throughput.
Unit & integration tests
wildside-solver-vrp/tests/bench_setup.rs, wildside-solver-vrp/tests/bench_helpers.rs, wildside-solver-vrp/tests/bench_integration.rs
Add unit and integration tests verifying benchmark request and depot fields, POI generation determinism, ID/theme assignment, travel-time matrix properties, and end-to-end solver compatibility across sizes and seeds.
Behavioural tests & feature file
wildside-solver-vrp/tests/bench_helpers_behaviour.rs, wildside-solver-vrp/tests/features/bench_helpers.feature
Add rstest-bdd behavioural tests and Gherkin scenarios validating POI and travel-time matrix generation determinism and expected properties.

Sequence Diagram

sequenceDiagram
    participant Criterion as Criterion
    participant Bench as Bench Suite
    participant Support as Bench Support
    participant Store as MemoryStore
    participant Provider as FixedMatrixProvider
    participant Solver as VrpSolver

    Note over Criterion: Start benchmark harness
    Criterion->>Bench: trigger benchmark group
    Bench->>Support: generate_clustered_pois(count, seed)
    Support-->>Bench: Vec<PointOfInterest>
    Bench->>Support: generate_travel_time_matrix(pois, seed)
    Support-->>Bench: travel_time_matrix
    Bench->>Store: create MemoryStore with candidate POIs
    Bench->>Provider: create FixedMatrixProvider(matrix)
    Bench->>Solver: instantiate VrpSolver(store, provider, scorer)
    rect rgb(220,248,198)
        note right of Solver: Timed solve phase
        Bench->>Solver: solve(request)
        Solver->>Provider: query travel times
        Provider-->>Solver: Duration values
        Solver->>Store: read/write solver state
        Solver-->>Bench: solution + diagnostics
    end
    Bench->>Criterion: record and report timing

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🌱 Seed the clusters, let the RNG hum,
Matrices murmur where the distances come,
Criterion counts each steady stride,
Solver spins with deterministic pride,
Bench blooms green where repeats abide.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title 'Plan phase 4 testing deployment polish' is vague and uses a generic descriptor ('polish') that doesn't clearly convey the primary change: adding a Criterion benchmarking suite for the VRP solver. Resolution: revise the title to be more specific about the main change, for example 'Add Criterion benchmarking suite for VRP solver' or 'Implement phase 4 testing with Criterion benchmarks'.
✅ Passed checks (2 passed)
  • Description check: ✅ Passed. The description is well-structured, relates directly to the changeset, and clearly outlines the benchmarking suite additions, test scaffolding, and tooling integration.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 641e19c and 57cb035.

📒 Files selected for processing (2)
  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/tests/bench_integration.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing in Rust projects.
make check-fmt executes cargo fmt --workspace -- --check to validate formatting across the entire workspace without modifying files.
make lint executes cargo clippy --workspace --all-targets --all-features -- -D warnings to lint every target with all features enabled and deny all Clippy warnings.
make test executes cargo test --workspace to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)...

Files:

  • wildside-solver-vrp/tests/bench_integration.rs
  • wildside-solver-vrp/benches/bench_support.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • wildside-solver-vrp/tests/bench_integration.rs
  • wildside-solver-vrp/benches/bench_support.rs
🧬 Code graph analysis (2)
wildside-solver-vrp/tests/bench_integration.rs (1)
wildside-solver-vrp/benches/bench_support.rs (4)
  • build_benchmark_request (53-66)
  • create_depot (82-84)
  • generate_clustered_pois (106-171)
  • generate_travel_time_matrix (217-249)
wildside-solver-vrp/benches/bench_support.rs (4)
wildside-solver-vrp/tests/bench_helpers_behaviour.rs (1)
  • new (39-48)
wildside-core/src/route.rs (1)
  • start (130-130)
wildside-core/src/poi.rs (1)
  • with_empty_tags (148-150)
wildside-solver-vrp/src/vrp.rs (2)
  • distance (166-174)
  • duration (176-184)
🔍 Remote MCP

The repository is not currently indexed in Deepwiki. Additional context was instead gathered by searching for Criterion benchmarking best practices and related tooling. A concise summary of the findings follows.

Key Technical Context for PR Review

Criterion Configuration for CI/Long-Running Benchmarks

The PR addresses a common challenge: Criterion's default sample size is 100, which is commonly reduced to 10 for functions that take significant time, since 100 samples would take excessive time. The PR's approach of using sample_size(10) and measurement_time(Duration::from_secs(3)) aligns with established best practices for CI-friendly benchmark runtimes. Criterion requires a sample size of at least 10 and panics if n < 10.

For long-running benchmarks such as the VRP solver, Criterion also provides a SamplingMode option, configurable via BenchmarkGroup::sampling_mode(SamplingMode).

Deterministic RNG Implementation Validation

The use of ChaCha8Rng is appropriate for benchmarking reproducibility. The ChaCha generators (8-, 12- and 20-round variants) are all deterministic and portable, and are tested against reference vectors. Reproducibility additionally requires a named RNG with a fixed algorithm (e.g. ChaCha12Rng rather than StdRng).

The PR correctly uses ChaCha8Rng::seed_from_u64(), which is designed for applications such as simulations that require a sequence of distinct, fixed random number seeds.

Throughput Measurement and Normal Distribution Usage

Criterion's throughput function is used for reporting input sizes and calculating benchmark throughput. The PR's approach of measuring POI counts and travel times aligns with this pattern. The use of rand_distr::Normal for noise simulation is sound, as it's part of the standard distribution framework used for benchmarking realistic inputs.

Test Coverage Strategy

The multi-layered testing approach (unit tests, BDD tests, integration tests) is solid:

  • Unit tests validate individual helper functions
  • BDD/behavioural tests document expected outcomes
  • Integration tests verify end-to-end VrpSolver compatibility with generated benchmark data


⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: build
🔇 Additional comments (11)
wildside-solver-vrp/tests/bench_integration.rs (5)

1-19: Module structure and imports look excellent.

The module-level doc comment clearly explains the integration test's purpose. The #[path] inclusion of bench_support from the benches directory is the correct approach for sharing benchmark utilities with tests, maintaining separation of concerns whilst enabling code reuse.


21-49: Excellent helper extraction that eliminates duplication.

The build_solver_and_request helper encapsulates the common setup pattern used across all integration tests, following DRY principles. The clear documentation and step-by-step explanation make the setup sequence easy to understand. Well done avoiding .expect() in this shared fixture, adhering to the strict guidelines for helpers outside test functions.


51-74: Solid end-to-end integration test with flakiness avoidance.

The test correctly exercises the full benchmark data pipeline and validates solver compatibility. Using candidates_evaluated rather than solve_time for assertions is a good choice to avoid flakiness on fast systems. The descriptive .expect() message aids debugging if the test fails.


76-89: Properly parameterized test covering multiple problem sizes.

The rstest parameterization follows coding guidelines perfectly, replacing what could have been a loop with individual test cases that report separately. Testing sizes 10, 50, and 100 provides good coverage of the benchmark helper's scalability.


91-104: Effective seed variation test with proper parameterization.

The parameterized approach allows each seed (42, 100, 999) to report as an individual test case, improving debugging granularity. This validates that the benchmark helpers produce valid solver inputs across different seeds whilst maintaining determinism.

wildside-solver-vrp/benches/bench_support.rs (6)

1-12: Clear module documentation and appropriate imports.

The module-level doc comment succinctly explains the purpose of these benchmark utilities, emphasising the deterministic and reproducible nature of the data generation.


14-38: Well-documented constants with helpful real-world context.

The constants are clearly named and include comments that translate abstract degree values into tangible distances (500m, 6km) and speeds (5 km/h). This makes the benchmark configuration easy to understand and adjust. Visibility modifiers are correctly applied.


40-66: Clean benchmark request builder with deterministic configuration.

The function creates a standardised solve request for benchmarks with a consistent interest profile. The #[must_use] attribute is appropriate, and the ignored doc example correctly avoids module path issues. The weighted interest profile (Art: 0.8, History: 0.5, Nature: 0.3, Culture: 0.2) provides realistic test data.


68-84: Straightforward depot creation helper.

The function provides a clear abstraction for creating a depot POI at the start location. Using ID 0 and empty tags is a sensible convention for the depot in travel-time matrices.


86-171: Robust deterministic POI generation with well-documented lint exceptions.

The function generates realistic clustered POI distributions using a seeded ChaCha RNG for reproducibility. The unwrap on line 132 is correctly guarded with a narrowly scoped #[expect] and cannot fail in practice (CLUSTER_SPREAD is a positive constant). All other lint expectations are appropriately narrow and include clear reasons. The round-robin cluster and theme assignment ensures even distribution across categories.


173-249: Excellent refactoring with consolidated helper functions.

The extraction of euclidean_distance and travel_time_with_noise into separate helpers successfully reduces duplication and consolidates lint expectations, addressing past review feedback. The function-level indexing expectation in generate_travel_time_matrix with its detailed reason is much clearer than multiple inline annotations. The use of Duration::from_secs_f64 (line 197) correctly avoids lossy casts. The nested loop structure correctly skips the diagonal and generates a complete travel-time matrix suitable for the fixed matrix provider.


Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai

sourcery-ai Bot commented Dec 26, 2025

Reviewer's Guide

Adds a Criterion-based benchmarking suite and supporting utilities for the VRP solver, wires it into the Rust workspace and Makefile, and updates the roadmap to mark benchmarking as complete, without touching production solver logic.

Sequence diagram for running VRP solver benchmarks via make bench

sequenceDiagram
    actor Developer
    participant Makefile
    participant Cargo
    participant CriterionHarness as Criterion_harness
    participant BenchFn as bench_solve_times
    participant Solver as VrpSolver

    Developer->>Makefile: make bench
    Makefile->>Cargo: cargo bench --package wildside-solver-vrp
    Cargo->>CriterionHarness: build_and_run_bench_target solver_benchmarks
    CriterionHarness->>BenchFn: invoke_bench_functions

    loop for_each_problem_size
        BenchFn->>BenchFn: generate_clustered_pois
        BenchFn->>BenchFn: generate_travel_time_matrix
        BenchFn->>Solver: solve(request)
        Solver-->>BenchFn: solution
    end

    CriterionHarness-->>Cargo: benchmark_results_with_statistics
    Cargo-->>Developer: console_output_and_html_report_links

Class diagram for new VRP benchmark support utilities and benchmarks

classDiagram
    class bench_support {
        <<module>>
        +u64 BENCHMARK_SEED
        +Theme THEMES[4]
        +usize CLUSTER_COUNT
        +f64 CLUSTER_SPREAD
        +f64 AREA_SIZE
        +f64 WALKING_SPEED_DEG_PER_SEC
        +generate_clustered_pois(count usize, seed u64) Vec~PointOfInterest~
        +generate_travel_time_matrix(pois Vec~PointOfInterest~, seed u64) Vec~Vec~Duration~~
        -set_matrix_cell(matrix Vec~Vec~Duration~~, i usize, j usize, duration Duration) void
        -box_muller(rng Rng, std_dev f64) (f64, f64)
    }

    class solver_benchmarks {
        <<bench_module>>
        +usize PROBLEM_SIZES[*]
        +u16 DURATION_MINUTES
        +build_benchmark_request(seed u64) SolveRequest
        +create_depot(start Coord~f64~) PointOfInterest
        +bench_solve_times(c Criterion) void
    }

    class VrpSolver {
        +new(store MemoryStore, provider FixedMatrixTravelTimeProvider, scorer TagScorer) VrpSolver
        +solve(request SolveRequest) Solution
    }

    class MemoryStore {
        +with_pois(pois Vec~PointOfInterest~) MemoryStore
    }

    class FixedMatrixTravelTimeProvider {
        +new(matrix Vec~Vec~Duration~~) FixedMatrixTravelTimeProvider
    }

    class SolveRequest {
        +Coord~f64~ start
        +Option~Coord~f64~~ end
        +u16 duration_minutes
        +InterestProfile interests
        +u64 seed
        +Option~usize~ max_nodes
    }

    class InterestProfile {
        +new() InterestProfile
        +with_weight(theme Theme, weight f64) InterestProfile
    }

    class PointOfInterest {
        +new(id u64, location Coord~f64~, tags Tags) PointOfInterest
        +with_empty_tags(id u64, location Coord~f64~) PointOfInterest
        +Coord~f64~ location
    }

    class TagScorer
    class Criterion
    class Coord~f64~
    class Tags
    class Theme
    class Duration
    class Solution

    solver_benchmarks --> bench_support : uses
    solver_benchmarks --> VrpSolver : constructs_and_calls
    solver_benchmarks --> MemoryStore : uses
    solver_benchmarks --> FixedMatrixTravelTimeProvider : uses
    solver_benchmarks --> Criterion : benchmark_group_api
    solver_benchmarks --> SolveRequest : builds
    solver_benchmarks --> InterestProfile : configures

    bench_support --> PointOfInterest : creates
    bench_support --> Theme : assigns
    bench_support --> Tags : creates
    bench_support --> Duration : fills_matrix

    VrpSolver --> MemoryStore : has_store
    VrpSolver --> FixedMatrixTravelTimeProvider : has_travel_time_provider
    VrpSolver --> TagScorer : uses

    SolveRequest --> InterestProfile : has
    SolveRequest --> Coord~f64~ : uses

    PointOfInterest --> Coord~f64~ : has
    PointOfInterest --> Tags : has

    InterestProfile --> Theme : keys

File-Level Changes

Change Details Files
Introduce deterministic benchmark support utilities for POI generation and travel-time matrix construction.
  • Add BENCHMARK_SEED constant, theme set, clustering parameters, and walking-speed constants for reproducible setups.
  • Implement generate_clustered_pois to create clustered POIs using a seeded ChaCha8 RNG, Box-Muller Gaussian sampling, and cyclic theme assignment.
  • Implement generate_travel_time_matrix to build a Duration matrix from POI coordinates with distance-based times and bounded random noise, plus a safe matrix cell setter.
  • Add internal box_muller helper for generating Gaussian-distributed coordinate offsets.
wildside-solver-vrp/benches/bench_support.rs
Add Criterion benchmark suite for the VRP solver using deterministic inputs and fixed problem sizes.
  • Create solver_benchmarks Criterion bench that measures solve time for 50, 100, and 200 candidate POIs.
  • Build a standard SolveRequest with fixed interests, time budget, and seed to keep runs comparable.
  • Construct a depot POI, merge with candidate POIs, and use FixedMatrixTravelTimeProvider over a generated travel-time matrix.
  • Use MemoryStore and VrpSolver wiring from wildside_core and wildside_solver_vrp test-support, discarding results and configuring throughput/sample size/measurement time for P95/P99 tracking.
  • Register the bench group and main entry with criterion_group/criterion_main and allow missing_docs from generated macros.
wildside-solver-vrp/benches/solver_benchmarks.rs
Wire Criterion and RNG crates into the workspace and VRP solver crate as dev-dependencies and register a non-harness bench target.
  • Add criterion (with html_reports), rand, and rand_chacha as workspace-level dev-dependencies.
  • Add criterion, rand, and rand_chacha as dev-dependencies for wildside-solver-vrp and enable a [[bench]] named solver_benchmarks with harness = false.
Cargo.toml
wildside-solver-vrp/Cargo.toml
Cargo.lock
Extend project tooling to run benchmarks conveniently and independently from tests.
  • Add a bench phony target in the Makefile that runs cargo bench for the wildside-solver-vrp package.
  • Ensure bench is listed in the .PHONY target list so it does not collide with a real file.
Makefile
Update roadmap documentation to reflect completion of the Criterion benchmarking milestone.
  • Mark the roadmap item for creating a Criterion benchmark suite measuring P95/P99 solve times as completed.
docs/roadmap.md


@leynos leynos marked this pull request as ready for review December 26, 2025 20:02
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 5 issues, and left some high level feedback:

  • The doctest module paths in benches/bench_support.rs (e.g. use wildside_solver_vrp::benches::bench_support::...) won’t resolve because benches isn’t part of the library API; consider marking these examples as ignore or moving them to a library module where the paths are valid.
  • The Criterion configuration (sample_size(100) with measurement_time(10s) across three problem sizes) may lead to long-running benchmarks in CI; consider reducing these defaults or gating heavier runs behind a non-default flag/profile.
  • In generate_travel_time_matrix, instead of casting time_secs to u64 and using Duration::from_secs, consider Duration::from_secs_f64(time_secs.max(1.0)) to avoid lossy casts and simplify the clippy suppression.
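Sourcery's third suggestion is easy to verify in isolation. The values below are illustrative, not taken from the PR; the real time_secs comes from the distance calculation in bench_support.rs:

```rust
use std::time::Duration;

fn main() {
    let time_secs = 359.7_f64;

    // Lossy: `as u64` truncates the fractional part (and a clippy
    // suppression is needed for the cast).
    let truncated = Duration::from_secs(time_secs as u64);

    // Suggested alternative: keeps sub-second precision with no cast;
    // `.max(1.0)` preserves an "at least one second" floor.
    let precise = Duration::from_secs_f64(time_secs.max(1.0));

    assert_eq!(truncated, Duration::from_secs(359));
    assert!(precise > truncated);
    println!("{truncated:?} vs {precise:?}");
}
```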
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The doctest module paths in `benches/bench_support.rs` (e.g. `use wildside_solver_vrp::benches::bench_support::...`) won’t resolve because `benches` isn’t part of the library API; consider marking these examples as `ignore` or moving them to a library module where the paths are valid.
- The Criterion configuration (`sample_size(100)` with `measurement_time(10s)` across three problem sizes) may lead to long-running benchmarks in CI; consider reducing these defaults or gating heavier runs behind a non-default flag/profile.
- In `generate_travel_time_matrix`, instead of casting `time_secs` to `u64` and using `Duration::from_secs`, consider `Duration::from_secs_f64(time_secs.max(1.0))` to avoid lossy casts and simplify the `clippy` suppression.

## Individual Comments

### Comment 1
<location> `wildside-solver-vrp/benches/bench_support.rs:66` </location>
<code_context>
+                clippy::integer_division_remainder_used,
+                reason = "Modulo for cyclic assignment is intentional"
+            )]
+            let cluster_idx = i % CLUSTER_COUNT;
+            let centre = cluster_centres
+                .get(cluster_idx)
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying indexing, helper functions, and random sampling to reduce control-flow noise and make the benchmark utilities more straightforward to read and maintain.

You can simplify a few spots without losing any safety, which should make this benchmark helper easier to read and maintain.

### 1. Direct indexing where bounds are guaranteed

`cluster_idx` and `theme_idx` are always in-bounds, so you can drop the `get`/`unwrap_or` noise and some `#[expect]`s:

```rust
// before
let cluster_idx = i % CLUSTER_COUNT;
let centre = cluster_centres
    .get(cluster_idx)
    .copied()
    .unwrap_or(Coord { x: 0.0, y: 0.0 });

// after
let cluster_idx = i % CLUSTER_COUNT;
let centre = cluster_centres[cluster_idx];
```

```rust
// before
let theme_idx = i % THEMES.len();
let theme = THEMES.get(theme_idx).cloned().unwrap_or(Theme::History);

// after
let theme_idx = i % THEMES.len();
let theme = THEMES[theme_idx];
```

In `generate_travel_time_matrix`, `i` and `j` are also guaranteed in-bounds:

```rust
// before
let poi_i = pois.get(i);
let poi_j = pois.get(j);

let (loc_i, loc_j) = match (poi_i, poi_j) {
    (Some(a), Some(b)) => (a.location, b.location),
    _ => continue,
};

// after
let loc_i = pois[i].location;
let loc_j = pois[j].location;
```

This removes impossible branches and makes the “core math” easier to see.

### 2. Inline `set_matrix_cell`

Given the matrix is fully allocated as `n x n`, the helper adds indirection without extra safety:

```rust
// before
let duration = Duration::from_secs(time_secs.max(1.0) as u64);
set_matrix_cell(&mut matrix, i, j, duration);

// after
let duration = Duration::from_secs(time_secs.max(1.0) as u64);
matrix[i][j] = duration;
```

You can then remove `set_matrix_cell` entirely.

### 3. Use a normal distribution instead of manual Box–Muller

For benchmarks, using `rand_distr::Normal` reads more directly and lets you drop several float-`expect`s and the custom helper:

```rust
use rand_distr::{Normal, Distribution};

// inside generate_clustered_pois
let normal = Normal::new(0.0, CLUSTER_SPREAD).expect("valid std dev");
let dx = normal.sample(&mut rng);
let dy = normal.sample(&mut rng);

let location = Coord {
    x: centre.x + dx,
    y: centre.y + dy,
};
```

You can then remove `box_muller` entirely.

### 4. Trim `#[expect]` annotations after simplification

Once you simplify indexing and remove the helper functions, some `#[expect]`s become unnecessary or can be consolidated. For example, after inlining the distance calculation:

```rust
let dx = loc_j.x - loc_i.x;
let dy = loc_j.y - loc_i.y;
let distance = (dx * dx + dy * dy).sqrt();
let base_time_secs = distance / WALKING_SPEED_DEG_PER_SEC;
let noise_factor: f64 = rng.gen_range(0.8..1.2);
let time_secs = base_time_secs * noise_factor;
let duration = Duration::from_secs(time_secs.max(1.0) as u64);
```

You can selectively keep one or two `#[expect(clippy::float_arithmetic, ..)]` at the most relevant locations instead of many small ones, improving signal-to-noise.
</issue_to_address>

### Comment 2
<location> `wildside-solver-vrp/benches/bench_support.rs:47` </location>
<code_context>
+/// let pois = generate_clustered_pois(50, 42);
+/// assert_eq!(pois.len(), 50);
+/// ```
+#[must_use]
+pub fn generate_clustered_pois(count: usize, seed: u64) -> Vec<PointOfInterest> {
+    let mut rng = ChaCha8Rng::seed_from_u64(seed);
</code_context>

<issue_to_address>
**issue (review_instructions):** Add unit tests that exercise `generate_clustered_pois` and `generate_travel_time_matrix`, and a higher-level behavioural test that validates their end-to-end behaviour.

Add focused unit tests for `generate_clustered_pois` (e.g., deterministic output given a seed, correct length, use of all clusters/themes) and `generate_travel_time_matrix` (e.g., matrix shape, symmetry or expected bounds, non-zero off-diagonal entries).

In addition, add at least one behavioural test that uses these helpers together with a minimal solver setup (or a lightweight stub) to verify they produce coherent benchmark inputs end to end (e.g., depot + POIs -> matrix -> can be consumed by the solver/travel-time provider without panics or shape mismatches). Place these in the regular test suite so they run in CI rather than only via `cargo bench`.
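The determinism checks requested above follow a simple pattern, sketched here in a self-contained form (an inline linear congruential generator stands in for the crate's ChaCha8Rng-seeded helpers, which are not reproduced here):

```rust
// Stand-in generator: same seed must yield the same sequence.
fn generate(seed: u64, count: usize) -> Vec<u64> {
    let mut state = seed;
    (0..count)
        .map(|_| {
            // Constants from Knuth's MMIX LCG, for illustration only.
            state = state
                .wrapping_mul(6_364_136_223_846_793_005)
                .wrapping_add(1_442_695_040_888_963_407);
            state
        })
        .collect()
}

fn main() {
    // Deterministic output given a seed.
    assert_eq!(generate(42, 50), generate(42, 50));
    // Correct length.
    assert_eq!(generate(42, 50).len(), 50);
    // Different seeds diverge.
    assert_ne!(generate(42, 50), generate(43, 50));
    println!("determinism checks passed");
}
```

The real tests would apply the same three assertions to `generate_clustered_pois` and `generate_travel_time_matrix`.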

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 3
<location> `wildside-solver-vrp/benches/solver_benchmarks.rs:66` </location>
<code_context>
+///
+/// The benchmark uses 100 samples and 10-second measurement windows for
+/// reliable P95/P99 estimation.
+fn bench_solve_times(c: &mut Criterion) {
+    let mut group = c.benchmark_group("solve_time");
+
</code_context>

<issue_to_address>
**issue (review_instructions):** Add tests that cover the benchmark setup logic (request construction, depot handling, and solver wiring) in addition to the benchmarks themselves.

Treat the benchmark harness as a new feature and cover its behaviour with tests. Add at least:

* Unit-style tests for `build_benchmark_request` and `create_depot` (e.g., verify start coordinate, interest weights, duration, and seed are set as expected; verify depot ID and location).
* A behavioural/integration-style test that builds a small POI set, constructs a `FixedMatrixTravelTimeProvider`, creates a `VrpSolver`, and calls `solve` once, asserting that it returns successfully and that the inputs are wired as expected (e.g., depot included, candidate POIs in store).

Ensure these tests live in the normal test suite so that changes to the benchmarking setup are exercised by CI, not only by running Criterion manually.
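A minimal sketch of the field-assertion pattern suggested above (the `BenchmarkRequest` struct, field names, and values here are hypothetical stand-ins; the real types live in the benchmark crate):

```rust
// Hypothetical stand-ins for the benchmark crate's request type.
#[derive(Debug, PartialEq)]
struct BenchmarkRequest {
    start: (f64, f64),
    duration_secs: u64,
    seed: u64,
}

fn build_benchmark_request(seed: u64) -> BenchmarkRequest {
    BenchmarkRequest {
        start: (51.5074, -0.1278), // fixed depot coordinate (illustrative)
        duration_secs: 3_600,
        seed,
    }
}

fn main() {
    // Unit-style assertions on the constructed request.
    let request = build_benchmark_request(42);
    assert_eq!(request.seed, 42);
    assert_eq!(request.duration_secs, 3_600);
    assert_eq!(request.start, (51.5074, -0.1278));
    println!("request wiring verified");
}
```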

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 4
<location> `wildside-solver-vrp/benches/solver_benchmarks.rs:13` </location>
<code_context>
+//! ```
+
+// Criterion macros generate code that triggers missing_docs warnings.
+#![allow(missing_docs, reason = "Criterion macros generate undocumented code")]
+
+use std::time::Duration;
</code_context>

<issue_to_address>
**issue (review_instructions):** Using a crate-level `#![allow(missing_docs, ..)]` violates the lint-suppression rules (`#[allow]` forbidden and no blanket/file-wide suppression).

This line introduces a crate-level `#[allow]` for `missing_docs`, which conflicts with the instructions: `#[allow]` is forbidden, and only narrowly scoped `#[expect(lint, reason = "...")]` is allowed (no blanket/file-wide suppression).

If Criterion’s generated code is triggering `missing_docs`, please instead:
- Remove this `#![allow(..)]`, and
- Narrowly suppress on the specific item(s) or module(s) that need it using `#[expect(missing_docs, reason = "FIXME: <link> – document benchmark entrypoints")]`.

If a global suppression is truly unavoidable, it should still not use `#[allow]`, and would need a clear `FIXME:` with link and be scoped as narrowly as possible.
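A standalone sketch of the narrowly scoped form (using `dead_code` rather than `missing_docs` so the example compiles on its own; the helper name is illustrative):

```rust
// The expectation is scoped to one item; the compiler warns if the
// lint ever stops firing here, so stale suppressions are caught.
#[expect(dead_code, reason = "illustrative helper, intentionally unused")]
fn undocumented_helper() -> u32 {
    41
}

fn main() {
    // Unrelated code keeps full lint coverage.
    let answer = 41 + 1;
    assert_eq!(answer, 42);
    println!("answer = {answer}");
}
```

Unlike `#[allow]`, an item-level `#[expect]` becomes a warning of its own once the underlying lint no longer triggers.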

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Lints must not be silenced except as a **last resort**.

  * `#[allow]` is **forbidden**.
  * Only **narrowly scoped** `#[expect(lint, reason = "...")]` is allowed.
  * No lint groups, no blanket or file-wide suppression.
  * Include `FIXME:` with link if a fix is expected.

</details>
</issue_to_address>

### Comment 5
<location> `docs/roadmap.md:175` </location>
<code_context>
         invariants like "total route duration must not exceed Tmax" and "route
         must start and end at the same point".
-  - [ ] Use `criterion` to create a benchmark suite that measures the P95 and
+  - [x] Use `criterion` to create a benchmark suite that measures the P95 and
     P99 solve times for various problem sizes (e.g., 50, 100, 200 candidate
     POIs).
</code_context>

<issue_to_address>
**issue (review_instructions):** P95 and P99 appear as uncommon acronyms and are not defined on first use in this sentence.

To follow the guideline on defining uncommon acronyms, consider expanding P95 and P99 on first use (for example, by explaining that they refer to 95th and 99th percentile solve times).

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Define uncommon acronyms on first use.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

- Replace manual Box-Muller with rand_distr::Normal
- Use Duration::from_secs_f64 instead of lossy cast
- Simplify indexing to direct array access with #[expect]
- Mark doctests as ignore (not accessible from benchmark crate)
- Fix crate-level #![allow] to use narrowly scoped #[expect]
- Add rand_distr workspace dependency

Add comprehensive tests for benchmark helpers:
- Unit tests for generate_clustered_pois and generate_travel_time_matrix
- Unit tests for build_benchmark_request and create_depot
- BDD behavioural tests using rstest-bdd

Define P95/P99 acronyms in roadmap.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos leynos changed the title Introduce Criterion benchmarks and bench tooling for VRP solver Add Criterion benchmarks and bench tooling for VRP solver (phase 4) Dec 27, 2025
@leynos

This comment was marked as resolved.


coderabbitai Bot commented Dec 28, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Read the benchmark support file
echo "=== bench_support.rs ==="
cat wildside-solver-vrp/benches/bench_support.rs

echo -e "\n\n=== solver_benchmarks.rs ==="
cat wildside-solver-vrp/benches/solver_benchmarks.rs

echo -e "\n\n=== docs/roadmap.md (relevant section) ==="
# Get lines around the P95/P99 mention
grep -A 2 -B 2 "P95" docs/roadmap.md

echo -e "\n\n=== Search for test files related to bench utilities ==="
fd -e rs . wildside-solver-vrp/tests/

echo -e "\n\n=== Check for tests mentioning bench_support or benchmark setup ==="
rg -n "bench_support|build_benchmark_request|create_depot" wildside-solver-vrp/tests/ --type rust || echo "No matches found"

Length of output: 15567


🏁 Script executed:

#!/bin/bash

echo "=== bench_helpers.rs (unit tests for bench_support) ==="
cat wildside-solver-vrp/tests/bench_helpers.rs

echo -e "\n\n=== bench_helpers_behaviour.rs (behavioral test) ==="
cat wildside-solver-vrp/tests/bench_helpers_behaviour.rs

echo -e "\n\n=== bench_setup.rs (tests for build_benchmark_request/create_depot) ==="
cat wildside-solver-vrp/tests/bench_setup.rs

Length of output: 19888


@leynos: I've reviewed the current state of the code. Here's the status of each comment:

Fully Resolved (6/8)

Overall Comment 1 - Doctest module paths

RESOLVED: The doctests in bench_support.rs now use ```ignore markers (lines 47 and 123), preventing the invalid path issues.

Overall Comment 3 - Duration::from_secs_f64

RESOLVED: Line 176 in bench_support.rs now uses:

let duration = Duration::from_secs_f64(time_secs.max(1.0));

Comment 1 - Simplification

RESOLVED: All suggestions implemented:

  • Direct indexing for clusters and themes (lines 81, 99)
  • Direct POI indexing in matrix generation (lines 153-158)
  • set_matrix_cell inlined (line 178)
  • rand_distr::Normal used instead of Box-Muller (lines 73-76)

Comment 2 - Unit tests for helpers

RESOLVED: bench_helpers.rs contains comprehensive unit tests covering:

  • Determinism, count, IDs, themes for generate_clustered_pois
  • Matrix shape, diagonal, off-diagonal, determinism for generate_travel_time_matrix
  • Plus BDD behavioural tests in bench_helpers_behaviour.rs

Comment 3 - Tests for benchmark setup

PARTIALLY RESOLVED: bench_setup.rs has thorough unit tests for build_benchmark_request and create_depot, but there's no full integration test that wires POIs + matrix + FixedMatrixTravelTimeProvider + VrpSolver and calls solve() to verify end-to-end compatibility.

Comment 5 - P95/P99 acronyms

RESOLVED: roadmap.md now expands them: "P95 (95th percentile) and P99 (99th percentile)".


Not Resolved (2/8)

Overall Comment 2 - Criterion configuration

NOT RESOLVED: Lines 72-73 in solver_benchmarks.rs still use:

group.sample_size(100);
group.measurement_time(Duration::from_secs(10));

This may still lead to long-running benchmarks in CI (potentially 30+ minutes across all problem sizes).

Comment 4 - Lint suppression violation

NOT RESOLVED: Line 13 in solver_benchmarks.rs still uses:

#![allow(missing_docs, reason = "Criterion macros generate undocumented code")]

This violates the lint-suppression rules requiring narrowly scoped #[expect(...)] instead of crate-level #[allow(...)].


Summary: 6 out of 8 issues are fully resolved. The two remaining issues are the Criterion configuration (performance concern) and the forbidden #![allow(...)] usage (policy violation).

- Reduce criterion sample_size (100→10) and measurement_time (10s→3s)
  for CI-friendly runtimes (under 5 minutes total)
- Replace crate-level #![allow(missing_docs)] with #![expect(...)]
  to comply with workspace lint policy. This is the narrowest scope
  possible since Rust does not support applying lint attributes to
  macro invocations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai Bot commented Dec 29, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== Checking bench_support.rs for doctest ignore markers and simplifications ==="
cat wildside-solver-vrp/benches/bench_support.rs

echo -e "\n\n=== Checking solver_benchmarks.rs for Criterion config and lint suppressions ==="
cat wildside-solver-vrp/benches/solver_benchmarks.rs

echo -e "\n\n=== Checking roadmap.md for P95/P99 definitions ==="
grep -A 3 -B 3 "P95\|P99" docs/roadmap.md

echo -e "\n\n=== Checking for test files ==="
fd -e rs . wildside-solver-vrp/tests/

echo -e "\n\n=== Checking test file contents ==="
if [ -f wildside-solver-vrp/tests/bench_helpers.rs ]; then
    echo "--- bench_helpers.rs ---"
    cat wildside-solver-vrp/tests/bench_helpers.rs
fi

if [ -f wildside-solver-vrp/tests/bench_helpers_behaviour.rs ]; then
    echo -e "\n--- bench_helpers_behaviour.rs ---"
    cat wildside-solver-vrp/tests/bench_helpers_behaviour.rs
fi

if [ -f wildside-solver-vrp/tests/bench_setup.rs ]; then
    echo -e "\n--- bench_setup.rs ---"
    cat wildside-solver-vrp/tests/bench_setup.rs
fi

Length of output: 32058


@leynos: I've reviewed the current state of the code. Here's the resolution status:

Fully Resolved (6/8)

Overall Comment 1 - Doctest module paths

RESOLVED: Lines 47 and 123 in bench_support.rs now use ```ignore markers.

Overall Comment 2 - Criterion configuration

RESOLVED: Lines 72-73 in solver_benchmarks.rs now use:

group.sample_size(10);
group.measurement_time(Duration::from_secs(3));

This is substantially reduced from the original 100 samples / 10 seconds and is CI-friendly.

Overall Comment 3 - Duration::from_secs_f64

RESOLVED: Line 176 in bench_support.rs now uses:

let duration = Duration::from_secs_f64(time_secs.max(1.0));

Comment 1 - Simplification

RESOLVED: All four suggested simplifications are implemented:

  • Direct indexing for clusters (line 81) and themes (line 99)
  • Direct POI indexing in matrix generation (lines 153-158)
  • set_matrix_cell inlined (line 178)
  • rand_distr::Normal used instead of Box-Muller (lines 73-76)

Comment 5 - P95/P99 acronyms

RESOLVED: roadmap.md now expands them: "P95 (95th percentile) and P99 (99th percentile)".


⚠️ Partially Resolved (2/8)

Comment 2 - Unit tests for helpers

PARTIALLY RESOLVED: bench_helpers.rs contains comprehensive unit tests covering determinism, count, IDs, themes for generate_clustered_pois, and matrix shape, diagonal, off-diagonal for generate_travel_time_matrix. bench_helpers_behaviour.rs has BDD tests.

Missing: The full end-to-end integration test that "builds a small POI set, constructs a FixedMatrixTravelTimeProvider, creates a VrpSolver, and calls solve once" to verify the helpers produce inputs compatible with the solver without panics or shape mismatches.

Comment 3 - Tests for benchmark setup

PARTIALLY RESOLVED: bench_setup.rs has thorough unit tests for build_benchmark_request and create_depot.

Missing: The behavioural/integration-style test that wires POIs + matrix + FixedMatrixTravelTimeProvider + VrpSolver and calls solve() to verify end-to-end compatibility.


Not Resolved (1/8)

Comment 4 - Lint suppression violation

NOT RESOLVED: Line 13 in solver_benchmarks.rs now uses:

#![expect(missing_docs, reason = "Criterion macros generate undocumented code")]

Whilst this upgrades from #[allow] to #[expect], it remains a crate-level (file-wide) suppression. The review requirement states: "no blanket/file-wide suppression" and "narrowly scoped #[expect(lint, reason = "...")]". The #![...] syntax with the bang is file-wide, not narrowly scoped to specific items.


Summary: 6 out of 8 issues are fully resolved. The two partially resolved issues (Comments 2 and 3) need integration tests that exercise the full benchmark pipeline with the solver. The unresolved issue (Comment 4) requires replacing the crate-level lint suppression with item-level suppressions on the specific Criterion-generated items.

- Reduce criterion sample_size (100→10) and measurement_time (10s→3s)
  for CI-friendly runtimes (under 5 minutes total)
- Remove crate-level lint suppression by moving criterion_group! macro
  into a private module. Private module contents are not subject to
  the missing_docs lint, eliminating the need for any suppression.
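A sketch of the private-module pattern described above (assuming Criterion's `criterion_group!`/`criterion_main!` macros; this does not compile without the `criterion` dev-dependency):

```rust
// Sketch only: needs the `criterion` dev-dependency to compile.
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_solve_times(c: &mut Criterion) {
    c.bench_function("noop", |b| b.iter(|| 1 + 1));
}

// Private module: its items are not publicly reachable, so
// `missing_docs` does not apply to the macro-generated code.
mod benches {
    use super::*;
    criterion_group!(benches, bench_solve_times);
}

criterion_main!(benches::benches);
```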

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (1)
docs/roadmap.md (1)

175-177: LGTM!

The P95 and P99 acronyms are now properly expanded on first use, and the benchmark item is correctly marked as complete.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6c8445 and 968e9e1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • Cargo.toml
  • Makefile
  • docs/roadmap.md
  • wildside-solver-vrp/Cargo.toml
  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/benches/solver_benchmarks.rs
  • wildside-solver-vrp/tests/bench_helpers.rs
  • wildside-solver-vrp/tests/bench_helpers_behaviour.rs
  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/tests/features/bench_helpers.feature
🧰 Additional context used
📓 Path-based instructions (6)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing in Rust projects.
make check-fmt executes cargo fmt --workspace -- --check to validate formatting across the entire workspace without modifying files.
make lint executes cargo clippy --workspace --all-targets --all-features -- -D warnings to lint every target with all features enabled and deny all Clippy warnings.
make test executes cargo test --workspace to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)...

Files:

  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/tests/bench_helpers.rs
  • wildside-solver-vrp/benches/solver_benchmarks.rs
  • wildside-solver-vrp/tests/bench_helpers_behaviour.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/tests/bench_helpers.rs
  • wildside-solver-vrp/benches/solver_benchmarks.rs
  • wildside-solver-vrp/tests/bench_helpers_behaviour.rs
Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

Cargo.toml: Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
All crate versions specified in Cargo.toml must use SemVer-compatible caret requirements (e.g., some-crate = "1.2.3"). Wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) in Rust dependencies should only be used where a dependency must be locked to patch-level updates for a specific, documented reason.

Files:

  • Cargo.toml
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

docs/**/*.md: Use the markdown files within the docs/ directory as a knowledge base and source of truth for project requirements, dependency choices, and architectural decisions.
When new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve, proactively update the relevant file(s) in the docs/ directory to reflect the latest state. Ensure the documentation remains accurate and current.
Documentation must use en-GB-oxendict spelling and grammar ("-ize" / "-yse" / "-our") with the exception of the LICENSE filename.

docs/**/*.md: Use British English based on the Oxford English Dictionary locale en-GB, including -ize suffixes (e.g., realize, organization), -lyse suffixes (e.g., analyse, paralyse), -our suffixes (e.g., colour, behaviour), -re suffixes (e.g., centre, calibre), double 'l' (e.g., cancelled, counsellor), maintained 'e' (e.g., likeable, rateable), -ogue suffixes (e.g., analogue, catalogue)
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns, for example: 'Concordat Industries are expanding'
Write headings in sentence case
Use Markdown headings (#, ##, ###, and so on) in order without skipping levels
Follow markdownlint recommendations
Provide code blocks and lists using standard Markdown syntax
Always provide a language identifier for fenced code blocks; use 'plaintext' for non-code text
Use '-' as the first level bullet and renumber lists when items change
Prefer inline links using 'text' or angle brackets around the URL
Ensure blank lines before and after bulleted lists and fenced blocks
Ensure tables have a delimiter line below the header row
Expand any uncommon acronym on first use, for example, Continuous Integration (CI)
Wrap paragraphs at 80 columns
Do not wrap tables
Use GitHub-flavoured numeric footnotes referenced as '[^1]'
Footnotes must be numbered in order of appearance in the document
Caption ev...

Files:

  • docs/roadmap.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Validate Markdown files using make markdownlint.
Run make fmt after any documentation changes to format all Markdown files and fix table markup.
Validate Mermaid diagrams in Markdown files by running make nixie.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Markdown tables and headings must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

  • docs/roadmap.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/roadmap.md
docs/**/!(README).md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

Avoid first and second person personal pronouns outside the README.md file

Files:

  • docs/roadmap.md
docs/**/*.{md,rs}

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

Wrap code at 120 columns

Files:

  • docs/roadmap.md
🧬 Code graph analysis (3)
wildside-solver-vrp/tests/bench_helpers.rs (2)
wildside-solver-vrp/benches/bench_support.rs (2)
  • generate_clustered_pois (54-108)
  • generate_travel_time_matrix (127-180)
wildside-core/src/poi.rs (1)
  • with_empty_tags (148-150)
wildside-solver-vrp/benches/solver_benchmarks.rs (3)
wildside-solver-vrp/benches/bench_support.rs (2)
  • generate_clustered_pois (54-108)
  • generate_travel_time_matrix (127-180)
wildside-core/src/poi.rs (1)
  • with_empty_tags (148-150)
wildside-core/src/test_support.rs (1)
  • with_pois (37-44)
wildside-solver-vrp/tests/bench_helpers_behaviour.rs (1)
wildside-solver-vrp/benches/bench_support.rs (2)
  • generate_clustered_pois (54-108)
  • generate_travel_time_matrix (127-180)
🔍 Remote MCP

Excellent context gathered. Now let me search for best practices around integration testing with solvers and complex systems, and for lint suppression patterns in Rust. I now have sufficient context, so let me compile the key findings into a concise summary.

Summary of Additional Context

Based on comprehensive research of benchmarking best practices and Rust lint suppression patterns, here is the relevant context for reviewing this PR:

Criterion Benchmarking Best Practices

Criterion's default settings include: sample size of 100 measurements, warm-up time of 3 seconds, and measurement time of 5 seconds. The PR notes mention the configuration has been reduced to sample_size(10) and measurement_time(3s) for CI compatibility, which aligns with best practices—if measurement time is not achievable, Criterion will spend as long as needed to collect the desired samples. Sample size must be at least 10, so the reduced configuration is at the minimum acceptable threshold and appropriate for faster CI feedback.

Deterministic Seeding with ChaCha RNG

The PR correctly uses ChaCha RNG for reproducibility, not StdRng, since named RNGs with fixed algorithms are required for reproducible results. SeedableRng::seed_from_u64 is designed for simulations where a sequence of distinct, fixed random number seeds is needed, making the PR's approach of seeding with a constant BENCHMARK_SEED appropriate.

Lint Suppression with #[expect]

The PR addresses lint suppression concerns. The expect level suppresses lints but emits a warning if the expectation is unfulfilled, notifying when the lint is no longer triggered. Importantly, lint expectations are only fulfilled by emissions suppressed by the expect attribute itself—if lint levels are modified by other attributes like allow or warn in the same scope, the expectation remains unfulfilled. This means narrowly-scoped #[expect] attributes on individual items or functions are required rather than file-wide or crate-level suppressions.

Outstanding Issues

Per the PR context, two critical items remain:

  1. Missing Full Integration Test: An end-to-end behavioral test that constructs POIs via generate_clustered_pois, builds a FixedMatrixTravelTimeProvider from the travel-time matrix, instantiates VrpSolver, and calls solve() to verify the entire pipeline works together.

  2. Lint Suppression Scope: Any remaining file-wide or crate-level #[expect(...)] attributes need replacement with narrowly-scoped item-level suppression on specific functions or modules where missing_docs is legitimately unavoidable (e.g., inside Criterion macro-generated code or test-only helper functions).
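For item 1, such a test might take roughly the following shape (the function and type names are taken from the PR itself, but the exact constructor signatures here are assumptions, not the project's real API):

```rust
//! End-to-end check that the benchmark helpers feed the solver correctly.
// Names come from the PR; signatures are illustrative assumptions.

#[test]
fn benchmark_pipeline_solves() {
    let pois = generate_clustered_pois(50, BENCHMARK_SEED);
    let matrix = generate_travel_time_matrix(&pois, BENCHMARK_SEED);
    let provider = FixedMatrixTravelTimeProvider::new(matrix);
    let solver = VrpSolver::new(provider);
    let request = build_benchmark_request();
    let solution = solver.solve(&request).expect("solver should succeed");
    assert!(!solution.routes.is_empty());
}
```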


⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Sourcery review
  • GitHub Check: Sourcery review
  • GitHub Check: build
🔇 Additional comments (8)
Cargo.toml (1)

33-36: LGTM!

The workspace dependencies use implicit SemVer caret versioning as required. The criterion crate with html_reports feature and the rand ecosystem crates are appropriate additions for the benchmarking infrastructure.

Makefile (1)

21-23: LGTM!

The bench target is correctly wired to run Criterion benchmarks for the solver package, with appropriate PHONY declaration and help text.

wildside-solver-vrp/Cargo.toml (1)

36-38: LGTM!

The benchmark configuration is correct: harness = false is required for Criterion to provide its own harness, and the name matches the benchmark file.

wildside-solver-vrp/tests/features/bench_helpers.feature (1)

1-25: LGTM!

The Gherkin scenarios comprehensively cover the key behaviours of the benchmark helpers: POI generation, matrix generation, and determinism verification. The scenarios are well-structured and will serve as good living documentation.

wildside-solver-vrp/tests/bench_helpers.rs (2)

34-103: LGTM!

The POI generation tests are comprehensive, covering count verification, parameterised size handling, determinism, sequential ID assignment, cyclic theme distribution, and seed variation. Good use of rstest for parameterised cases.


109-207: LGTM!

The travel time matrix tests thoroughly validate the matrix properties: square dimensions, zero diagonal, positive off-diagonal entries, determinism, and distance-based sanity checking. Edge cases (empty input, single POI) are properly handled.

wildside-solver-vrp/benches/solver_benchmarks.rs (1)

108-122: Good use of the private module pattern to avoid lint issues.

Encapsulating the criterion_group! and criterion_main! macros within a private module prevents missing_docs warnings on the generated code without requiring file-wide lint suppression. This adheres to the coding guidelines.
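A sketch of the pattern (assuming the criterion crate; `bench_solve` is a placeholder):

```rust
use criterion::Criterion;

/// Benchmarks a placeholder workload.
fn bench_solve(c: &mut Criterion) {
    c.bench_function("solve", |b| b.iter(|| (0..100u64).sum::<u64>()));
}

// `criterion_group!` expands to undocumented items; placing it in a
// private module keeps them out of `missing_docs`' reach, because the
// lint only fires on publicly reachable items.
mod group {
    criterion::criterion_group!(benches, super::bench_solve);
}

criterion::criterion_main!(group::benches);
```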

wildside-solver-vrp/tests/bench_setup.rs (1)

44-152: Tests are thorough but verify duplicated code.

The unit tests comprehensively cover the helper functions' behaviour. Once the duplication is resolved (see previous comment), these tests will provide value by verifying the actual benchmark setup code.

Comment thread wildside-solver-vrp/benches/bench_support.rs
Comment thread wildside-solver-vrp/benches/bench_support.rs
Comment thread wildside-solver-vrp/benches/solver_benchmarks.rs
Comment thread wildside-solver-vrp/tests/bench_helpers_behaviour.rs Outdated
Comment thread wildside-solver-vrp/tests/bench_helpers.rs Outdated
Comment thread wildside-solver-vrp/tests/bench_setup.rs Outdated
Comment thread wildside-solver-vrp/tests/bench_setup.rs Outdated
@leynos
Owner Author

leynos commented Dec 29, 2025

@coderabbitai Have the following now been resolved?

❌ Not Resolved (1/8)
Comment 4 - Lint suppression violation

NOT RESOLVED: Line 13 in solver_benchmarks.rs now uses:

#![expect(missing_docs, reason = "Criterion macros generate undocumented code")]

Whilst this upgrades from #[allow] to #[expect], it remains a crate-level (file-wide) suppression. The review requirement states: "no blanket/file-wide suppression" and "narrowly scoped #[expect(lint, reason = "...")]". The #![...] syntax with the bang is file-wide, not narrowly scoped to specific items.

@coderabbitai

coderabbitai Bot commented Dec 29, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== Checking solver_benchmarks.rs for lint suppressions ==="
cat wildside-solver-vrp/benches/solver_benchmarks.rs | head -50

echo -e "\n\n=== Searching for all expect/allow attributes in solver_benchmarks.rs ==="
rg -n "#!?\[(expect|allow)" wildside-solver-vrp/benches/solver_benchmarks.rs || echo "No matches found"

echo -e "\n\n=== Full file content ==="
cat wildside-solver-vrp/benches/solver_benchmarks.rs

Length of output: 6502


@leynos: Yes, Comment 4 is now fully resolved

The file-wide lint suppression has been removed. The current state of solver_benchmarks.rs:

  1. No crate-level suppression: The previous #![expect(missing_docs, reason = "...")] at line 13 is gone.

  2. Structural solution for Criterion macros: The criterion_group! macro (which generates undocumented code) has been moved into a private module mod group (lines 109–116). Private modules don't trigger missing_docs warnings, eliminating the need for suppression.

  3. Only narrowly scoped suppressions remain:

    • Line 88: #[expect(clippy::as_conversions, ...)] scoped to a single variable
    • Line 96: #[expect(clippy::let_underscore_must_use, ...)] scoped to a single let statement

This fully complies with the review requirement: "Only narrowly scoped #[expect(lint, reason = "...")] is allowed (no blanket/file-wide suppression)."

All 8 review comments are now resolved (the two partially resolved integration test comments from earlier were addressed in separate discussion).

- Extract shared helpers (build_benchmark_request, create_depot) to
  bench_support.rs to eliminate duplication between benchmark and test
  files. Tests now use the actual benchmark code, preventing drift.

- Consolidate lint expectations in bench_support.rs by extracting helper
  functions (euclidean_distance, travel_time_with_noise) that each have
  a single #[expect] attribute, reducing visual noise from 16+ to 3
  expectations.

- Remove all file-wide #![expect(...)] suppressions from test files.
  Apply narrowly scoped #[expect] to individual functions or module
  declarations instead, as required by coding guidelines.

- Add integration test (bench_integration.rs) that validates the full
  benchmark data pipeline works with VrpSolver end-to-end:
  - Generates POIs using generate_clustered_pois
  - Builds travel time matrix using generate_travel_time_matrix
  - Constructs FixedMatrixTravelTimeProvider and VrpSolver
  - Calls solve() and asserts success

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Centre POI generation around BENCHMARK_START (0.05, 0.05) to ensure
all generated POIs fall within the solver's calculated bounding box.

Previously, POIs were generated in the range (0, 0) to (0.1, 0.1),
which could place some POIs outside the bbox calculated from the
request parameters, causing matrix dimension mismatches.

Changes:
- Reduce AREA_SIZE from 0.1 to 0.08 degrees
- Generate cluster centres centred on BENCHMARK_START

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Further reduce AREA_SIZE from 0.08 to 0.06 degrees to ensure POIs with
Gaussian spread (σ=0.005) stay within the solver's bounding box even
with larger POI counts (100+).

With AREA_SIZE=0.06:
- Cluster centres range from 0.02 to 0.08 (0.05 ± 0.03)
- With 3σ spread (0.015), POIs range from 0.005 to 0.095
- This matches the solver's bbox [0.005, 0.095] exactly
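The arithmetic above can be verified with a short stdlib sketch (the constant names mirror those quoted in the commit message):

```rust
fn main() {
    const BENCHMARK_START: f64 = 0.05;
    const AREA_SIZE: f64 = 0.06;
    const SIGMA: f64 = 0.005;

    // Cluster centres are drawn from BENCHMARK_START +/- AREA_SIZE / 2.
    let centre_min = BENCHMARK_START - AREA_SIZE / 2.0; // 0.02
    let centre_max = BENCHMARK_START + AREA_SIZE / 2.0; // 0.08

    // Allowing a 3-sigma Gaussian spread around each centre:
    let poi_min = centre_min - 3.0 * SIGMA; // 0.005
    let poi_max = centre_max + 3.0 * SIGMA; // 0.095

    assert!((poi_min - 0.005).abs() < 1e-12);
    assert!((poi_max - 0.095).abs() < 1e-12);
    println!("3-sigma POI range: [{poi_min:.3}, {poi_max:.3}]");
}
```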

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos
Owner Author

leynos commented Dec 29, 2025

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Dec 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 968e9e1 and dd5c595.

📒 Files selected for processing (6)
  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/benches/solver_benchmarks.rs
  • wildside-solver-vrp/tests/bench_helpers.rs
  • wildside-solver-vrp/tests/bench_helpers_behaviour.rs
  • wildside-solver-vrp/tests/bench_integration.rs
  • wildside-solver-vrp/tests/bench_setup.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing in Rust projects.
make check-fmt executes cargo fmt --workspace -- --check to validate formatting across the entire workspace without modifying files.
make lint executes cargo clippy --workspace --all-targets --all-features -- -D warnings to lint every target with all features enabled and deny all Clippy warnings.
make test executes cargo test --workspace to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)...

Files:

  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/benches/solver_benchmarks.rs
  • wildside-solver-vrp/tests/bench_integration.rs
  • wildside-solver-vrp/tests/bench_helpers.rs
  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/tests/bench_helpers_behaviour.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/benches/solver_benchmarks.rs
  • wildside-solver-vrp/tests/bench_integration.rs
  • wildside-solver-vrp/tests/bench_helpers.rs
  • wildside-solver-vrp/benches/bench_support.rs
  • wildside-solver-vrp/tests/bench_helpers_behaviour.rs
🧬 Code graph analysis (4)
wildside-solver-vrp/tests/bench_setup.rs (1)
wildside-solver-vrp/benches/bench_support.rs (2)
  • build_benchmark_request (53-66)
  • create_depot (82-84)
wildside-solver-vrp/benches/solver_benchmarks.rs (1)
wildside-solver-vrp/benches/bench_support.rs (4)
  • build_benchmark_request (53-66)
  • create_depot (82-84)
  • generate_clustered_pois (106-167)
  • generate_travel_time_matrix (213-245)
wildside-solver-vrp/tests/bench_integration.rs (1)
wildside-core/src/test_support.rs (1)
  • with_pois (37-44)
wildside-solver-vrp/benches/bench_support.rs (2)
wildside-core/src/route.rs (1)
  • start (130-130)
wildside-core/src/poi.rs (1)
  • with_empty_tags (148-150)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (1)
wildside-solver-vrp/benches/solver_benchmarks.rs (1)

70-78: Remove this review comment—the benchmark structure is correct.

VrpSolver::solve() is entirely stateless and idempotent. All fields in VrpSolver are immutable, the method signature takes only immutable references (&self, &SolveRequest), and every invocation creates fresh local state with no caching or memoisation. Creating the solver once per benchmark size and reusing it across iterations is the proper benchmarking approach; moving instantiation inside b.iter() would incorrectly include allocation overhead in the measurement. The current structure optimally measures solve performance alone.

Comment thread wildside-solver-vrp/tests/bench_integration.rs
Comment thread wildside-solver-vrp/tests/bench_setup.rs
Address code review feedback by replacing iteration-based tests with
rstest parameterized test cases:

- bench_integration.rs: Convert seed loop to #[case] parameterization
  for individual test case reporting
- bench_setup.rs: Consolidate four duplicate theme weight tests into
  a single parameterized test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos leynos changed the title Add Criterion benchmarks and bench tooling for VRP solver (phase 4) Plan phase 4 testing deployment polish Dec 29, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
wildside-solver-vrp/tests/bench_integration.rs (1)

69-117: Parameterised tests implemented correctly.

Both benchmark_helpers_work_with_various_problem_sizes and different_seeds_produce_valid_solver_inputs use #[rstest] parameterization properly, providing clear individual test case reporting. The past review concern about converting seed iteration to parameterized cases has been fully addressed.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd5c595 and 6fbcc20.

📒 Files selected for processing (2)
  • wildside-solver-vrp/tests/bench_integration.rs
  • wildside-solver-vrp/tests/bench_setup.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing in Rust projects.
make check-fmt executes cargo fmt --workspace -- --check to validate formatting across the entire workspace without modifying files.
make lint executes cargo clippy --workspace --all-targets --all-features -- -D warnings to lint every target with all features enabled and deny all Clippy warnings.
make test executes cargo test --workspace to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)...

Files:

  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/tests/bench_integration.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • wildside-solver-vrp/tests/bench_setup.rs
  • wildside-solver-vrp/tests/bench_integration.rs

Summary of Relevant Context for PR Review

Based on my research, here are key facts and best practices relevant to this PR's implementation:

Criterion Benchmarking Framework

Configuration for CI-Friendliness: A bigger sample should yield more accurate results if paired with a sufficiently large measurement time. If it is not possible to perform the measurement in the requested time then Criterion will spend as long as is needed to collect the desired number of samples. The PR's resolution to reduce defaults to 10 samples and 3 seconds measurement time aligns with making benchmarks suitable for CI environments.

Sample Size Requirement: Sample size must be at least 10. The PR's approach of using 10 samples meets this minimum requirement while remaining CI-friendly.

Measurement Configuration: In Criterion.rs 0.3.3, a new option was added to change the sampling mode to handle long-running benchmarks. The benchmark author can call BenchmarkGroup::sampling_mode(SamplingMode) to change the sampling mode. For benchmarks with varying problem sizes (50, 100, 200 POIs), this may be relevant if individual benchmarks become slow.

Duration API Best Practices

Avoiding Lossy Conversions: Creates a new Duration from the specified number of seconds represented as f64. This constructor will panic if secs is negative, overflows Duration or not finite. The PR's adoption of Duration::from_secs_f64 is the appropriate method for constructing durations from floating-point values, avoiding manual conversion logic.

Precision Concerns: When recording metrics that involve a span of time, the pattern duration.as_millis() as f64 is problematic because it loses the sub-milliseconds precision. The PR's approach of using deterministic, controlled Duration construction avoids these precision issues in travel-time matrix generation.
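The precision difference is easy to demonstrate with the standard library alone:

```rust
use std::time::Duration;

fn main() {
    // Constructing a travel time directly from a floating-point second
    // count keeps sub-millisecond precision.
    let travel = Duration::from_secs_f64(1.2345);
    assert_eq!(travel.as_micros(), 1_234_500);

    // The lossy round-trip via as_millis() discards the fractional part.
    let millis = travel.as_millis();
    assert_eq!(millis, 1234); // sub-millisecond precision lost

    println!("{travel:?}");
}
```

Note that `Duration::from_secs_f64` panics on negative, non-finite, or overflowing inputs, so callers feeding it computed values should validate them first.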

Rust Benchmarking Patterns

Testing Abstraction Levels: It is best to test at the lowest level of abstraction possible. In the case of benchmarks, this makes them easier to maintain and reduces the amount of noise in the measurements. The PR's strategy of creating isolated bench_support utilities and conducting unit/integration tests follows this principle.

Throughput Measurements: When benchmarking some types of code it is useful to measure the throughput as well as the iteration time. Criterion.rs can estimate the throughput of a benchmark, but it needs to know how many bytes or elements each iteration will process. Throughput measurements are only supported when using the BenchmarkGroup structure. The PR's use of BenchmarkGroup with throughput() calls is appropriate for the parameterised solver benchmarks.

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (4)
wildside-solver-vrp/tests/bench_setup.rs (3)

1-18: LGTM! Module structure and imports are well-organised.

The module-level documentation clearly states the purpose, and the #[path] inclusion of bench_support with a narrowly scoped #[expect(dead_code)] is appropriate for a shared test module where not all exports are consumed by every test file.


20-70: Excellent test coverage for build_benchmark_request.

The tests comprehensively validate all aspects of the benchmark request construction: duration, seed propagation, coordinates, optional fields, and theme weights. The parameterised theme test properly consolidates what would have been four duplicate tests.


72-111: Thorough testing of create_depot with good edge-case coverage.

The tests validate all properties of the depot POI (location, id, tags) and use parameterisation to check various coordinate values including zero, positive, and negative numbers. The narrowly scoped float_cmp expectation within a block (lines 106-110) is a particularly precise application of the guideline.

wildside-solver-vrp/tests/bench_integration.rs (1)

1-19: Module structure and imports look good.

The module doc comment clearly explains the integration test's purpose, and the use of #[path] to share bench_support utilities is appropriate for test code.

Comment thread wildside-solver-vrp/tests/bench_integration.rs Outdated
Comment thread wildside-solver-vrp/tests/bench_integration.rs Outdated
Address code review feedback:
- Extract duplicated setup logic into `build_solver_and_request` helper
- Remove redundant assert!(is_ok()) + expect() pattern, use expect() directly
- Convert for loop to #[rstest] parameterized test cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos leynos changed the title Plan phase 4 testing deployment polish Implement phase 4 Criterion benchmarks and bench tooling Dec 30, 2025
Address code review feedback:
- Fix walking speed comment to match constant (0.000014 not 0.0014)
- Use separate x/y distributions for POI cluster centres
- Replace flaky solve_time assertion with candidates_evaluated check
- Extract build_solver_and_request helper to reduce duplication
- Convert different_seeds test to parameterized #[rstest]

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos leynos changed the title Implement phase 4 Criterion benchmarks and bench tooling Plan phase 4 testing deployment polish Dec 30, 2025
@leynos leynos merged commit 881a319 into main Jan 2, 2026
4 checks passed
@leynos leynos deleted the terragon/implement-phase4-testing-deployment-85extp branch January 2, 2026 03:31