Conversation
Implement performance benchmarks to measure P95/P99 solve times for problem sizes of 50, 100, and 200 candidate POIs, as specified in Phase 4 of the roadmap.

The benchmark infrastructure includes:

- `bench_support.rs`: Deterministic POI generation with clustered distributions and distance-based travel time matrices using seeded RNG (rand_chacha)
- `solver_benchmarks.rs`: Criterion benchmarks with 100 samples per problem size for reliable percentile estimation
- Makefile target: `make bench` for running benchmarks

Design decisions:

- Clustered POI distribution (5 clusters) mimics real-world attraction patterns
- Distance-based travel times with ±20% noise for realistic routing scenarios
- Deterministic seeding (seed=42) ensures reproducible results across runs
- Uses `FixedMatrixTravelTimeProvider` for full benchmark reproducibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
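The distance-based matrix generation described above can be sketched in plain `std` Rust. This is a simplified stand-in, not the PR's code: the walking-speed constant's value is assumed, and the ±20% seeded noise is omitted, which keeps the output symmetric.

```rust
use std::time::Duration;

// Constant name taken from the benchmark support module; the value here is
// an assumption (roughly 1.4 m/s expressed in degrees of latitude).
const WALKING_SPEED_DEG_PER_SEC: f64 = 0.000_012_6;

/// Build a travel-time matrix from Euclidean distances between locations.
/// The real helper additionally perturbs each entry with seeded ±20% noise.
fn travel_time_matrix(points: &[(f64, f64)]) -> Vec<Vec<Duration>> {
    let n = points.len();
    let mut matrix = vec![vec![Duration::ZERO; n]; n];
    for i in 0..n {
        for j in 0..n {
            if i == j {
                continue;
            }
            let dx = points[j].0 - points[i].0;
            let dy = points[j].1 - points[i].1;
            let distance = dx.hypot(dy);
            // Clamp to at least one second so no pair is free to traverse.
            let secs = (distance / WALKING_SPEED_DEG_PER_SEC).max(1.0);
            matrix[i][j] = Duration::from_secs_f64(secs);
        }
    }
    matrix
}

fn main() {
    let pois = [(0.0, 0.0), (0.001, 0.0), (0.0, 0.002)];
    let matrix = travel_time_matrix(&pois);
    assert_eq!(matrix.len(), 3);
    assert_eq!(matrix[0][0], Duration::ZERO);
    // Without the noise term the matrix is symmetric.
    assert_eq!(matrix[0][1], matrix[1][0]);
    println!("0 -> 1: {:?}", matrix[0][1]);
}
```

Reintroducing seeded noise would break the symmetry shown here, so tests of the real helper check determinism per seed rather than symmetry.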
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Add Criterion-based benchmarks and deterministic benchmark utilities to the VRP solver workspace, update workspace dev-dependencies and

Changes
Sequence Diagram

sequenceDiagram
participant Criterion as Criterion
participant Bench as Bench Suite
participant Support as Bench Support
participant Store as MemoryStore
participant Provider as FixedMatrixProvider
participant Solver as VrpSolver
Note over Criterion: Start benchmark harness
Criterion->>Bench: trigger benchmark group
Bench->>Support: generate_clustered_pois(count, seed)
Support-->>Bench: Vec<PointOfInterest>
Bench->>Support: generate_travel_time_matrix(pois, seed)
Support-->>Bench: travel_time_matrix
Bench->>Store: create MemoryStore with candidate POIs
Bench->>Provider: create FixedMatrixProvider(matrix)
Bench->>Solver: instantiate VrpSolver(store, provider, scorer)
rect rgb(220,248,198)
note right of Solver: Timed solve phase
Bench->>Solver: solve(request)
Solver->>Provider: query travel times
Provider-->>Solver: Duration values
Solver->>Store: read/write solver state
Solver-->>Bench: solution + diagnostics
end
Bench->>Criterion: record and report timing
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details

Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro

📒 Files selected for processing (2)
🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)
Files:
⚙️ CodeRabbit configuration file
Files:
🧬 Code graph analysis (2)

wildside-solver-vrp/tests/bench_integration.rs (1)
wildside-solver-vrp/benches/bench_support.rs (4)
🔍 Remote MCP

The repository is not currently indexed in Deepwiki. Let me try searching for information about Criterion benchmarking best practices and related tooling to provide additional context for reviewing this PR.

Based on my research, I've gathered relevant technical context for reviewing this PR. Here's a concise summary of findings:

Key Technical Context for PR Review

Criterion Configuration for CI/Long-Running Benchmarks

The PR addresses a common challenge: Criterion has a default sample size of 100, reduced to 10 for functions that take significant time, since 100 samples would take excessive time. For long-running benchmarks (like the VRP solver), Criterion provides a SamplingMode option, which can be configured via BenchmarkGroup::sampling_mode(SamplingMode).

Deterministic RNG Implementation Validation

The use of ChaCha8Rng is appropriate for benchmarking reproducibility. ChaCha generators (8-, 12- and 20-round variants) are all deterministic and portable, with testing against reference vectors. Additionally, reproducibility requires using a named RNG with a fixed algorithm (e.g. ChaCha12Rng, not StdRng).

Throughput Measurement and Normal Distribution Usage

Criterion's throughput function is used for reporting input sizes and calculating benchmark throughput. The PR's approach of measuring POI counts and travel times aligns with this pattern.

Test Coverage Strategy

The multi-layered testing approach (unit tests, BDD tests, integration tests) is solid:
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (11)
Reviewer's Guide

Adds a Criterion-based benchmarking suite and supporting utilities for the VRP solver, wires it into the Rust workspace and Makefile, and updates the roadmap to mark benchmarking as complete, without touching production solver logic.

Sequence diagram for running VRP solver benchmarks via make bench

sequenceDiagram
actor Developer
participant Makefile
participant Cargo
participant CriterionHarness as Criterion_harness
participant BenchFn as bench_solve_times
participant Solver as VrpSolver
Developer->>Makefile: make bench
Makefile->>Cargo: cargo bench --package wildside-solver-vrp
Cargo->>CriterionHarness: build_and_run_bench_target solver_benchmarks
CriterionHarness->>BenchFn: invoke_bench_functions
loop for_each_problem_size
BenchFn->>BenchFn: generate_clustered_pois
BenchFn->>BenchFn: generate_travel_time_matrix
BenchFn->>Solver: solve(request)
Solver-->>BenchFn: solution
end
CriterionHarness-->>Cargo: benchmark_results_with_statistics
Cargo-->>Developer: console_output_and_html_report_links
Class diagram for new VRP benchmark support utilities and benchmarks

classDiagram
class bench_support {
<<module>>
+u64 BENCHMARK_SEED
+Theme THEMES[4]
+usize CLUSTER_COUNT
+f64 CLUSTER_SPREAD
+f64 AREA_SIZE
+f64 WALKING_SPEED_DEG_PER_SEC
+generate_clustered_pois(count usize, seed u64) Vec~PointOfInterest~
+generate_travel_time_matrix(pois Vec~PointOfInterest~, seed u64) Vec~Vec~Duration~~
-set_matrix_cell(matrix Vec~Vec~Duration~~, i usize, j usize, duration Duration) void
-box_muller(rng Rng, std_dev f64) (f64, f64)
}
class solver_benchmarks {
<<bench_module>>
+usize PROBLEM_SIZES[*]
+u16 DURATION_MINUTES
+build_benchmark_request(seed u64) SolveRequest
+create_depot(start Coord~f64~) PointOfInterest
+bench_solve_times(c Criterion) void
}
class VrpSolver {
+new(store MemoryStore, provider FixedMatrixTravelTimeProvider, scorer TagScorer) VrpSolver
+solve(request SolveRequest) Solution
}
class MemoryStore {
+with_pois(pois Vec~PointOfInterest~) MemoryStore
}
class FixedMatrixTravelTimeProvider {
+new(matrix Vec~Vec~Duration~~) FixedMatrixTravelTimeProvider
}
class SolveRequest {
+Coord~f64~ start
+Option~Coord~f64~~ end
+u16 duration_minutes
+InterestProfile interests
+u64 seed
+Option~usize~ max_nodes
}
class InterestProfile {
+new() InterestProfile
+with_weight(theme Theme, weight f64) InterestProfile
}
class PointOfInterest {
+new(id u64, location Coord~f64~, tags Tags) PointOfInterest
+with_empty_tags(id u64, location Coord~f64~) PointOfInterest
+Coord~f64~ location
}
class TagScorer
class Criterion
class Coord~f64~
class Tags
class Theme
class Duration
class Solution
solver_benchmarks --> bench_support : uses
solver_benchmarks --> VrpSolver : constructs_and_calls
solver_benchmarks --> MemoryStore : uses
solver_benchmarks --> FixedMatrixTravelTimeProvider : uses
solver_benchmarks --> Criterion : benchmark_group_api
solver_benchmarks --> SolveRequest : builds
solver_benchmarks --> InterestProfile : configures
bench_support --> PointOfInterest : creates
bench_support --> Theme : assigns
bench_support --> Tags : creates
bench_support --> Duration : fills_matrix
VrpSolver --> MemoryStore : has_store
VrpSolver --> FixedMatrixTravelTimeProvider : has_travel_time_provider
VrpSolver --> TagScorer : uses
SolveRequest --> InterestProfile : has
SolveRequest --> Coord~f64~ : uses
PointOfInterest --> Coord~f64~ : has
PointOfInterest --> Tags : has
InterestProfile --> Theme : keys
File-Level Changes
Tips and commands

Interacting with Sourcery
Customizing Your Experience

Access your dashboard to:
Getting Help
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Hey - I've found 5 issues and left some high-level feedback:
- The doctest module paths in `benches/bench_support.rs` (e.g. `use wildside_solver_vrp::benches::bench_support::...`) won’t resolve because `benches` isn’t part of the library API; consider marking these examples as `ignore` or moving them to a library module where the paths are valid.
- The Criterion configuration (`sample_size(100)` with `measurement_time(10s)` across three problem sizes) may lead to long-running benchmarks in CI; consider reducing these defaults or gating heavier runs behind a non-default flag/profile.
- In `generate_travel_time_matrix`, instead of casting `time_secs` to `u64` and using `Duration::from_secs`, consider `Duration::from_secs_f64(time_secs.max(1.0))` to avoid lossy casts and simplify the `clippy` suppression.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The doctest module paths in `benches/bench_support.rs` (e.g. `use wildside_solver_vrp::benches::bench_support::...`) won’t resolve because `benches` isn’t part of the library API; consider marking these examples as `ignore` or moving them to a library module where the paths are valid.
- The Criterion configuration (`sample_size(100)` with `measurement_time(10s)` across three problem sizes) may lead to long-running benchmarks in CI; consider reducing these defaults or gating heavier runs behind a non-default flag/profile.
- In `generate_travel_time_matrix`, instead of casting `time_secs` to `u64` and using `Duration::from_secs`, consider `Duration::from_secs_f64(time_secs.max(1.0))` to avoid lossy casts and simplify the `clippy` suppression.
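The effect of the swap suggested in the last point can be demonstrated with plain `std::time::Duration` (the value 2.7 is illustrative):

```rust
use std::time::Duration;

fn main() {
    let time_secs: f64 = 2.7;

    // Lossy: the `as u64` cast truncates the fractional part, so 2.7 s
    // collapses to 2 s, and clippy flags the cast.
    let truncated = Duration::from_secs(time_secs.max(1.0) as u64);
    assert_eq!(truncated, Duration::from_secs(2));

    // Lossless: sub-second precision is preserved and no cast suppression
    // is needed.
    let precise = Duration::from_secs_f64(time_secs.max(1.0));
    assert_eq!(precise.as_millis(), 2700);
}
```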
## Individual Comments
### Comment 1
<location> `wildside-solver-vrp/benches/bench_support.rs:66` </location>
<code_context>
+ clippy::integer_division_remainder_used,
+ reason = "Modulo for cyclic assignment is intentional"
+ )]
+ let cluster_idx = i % CLUSTER_COUNT;
+ let centre = cluster_centres
+ .get(cluster_idx)
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying indexing, helper functions, and random sampling to reduce control-flow noise and make the benchmark utilities more straightforward to read and maintain.
You can simplify a few spots without losing any safety, which should make this benchmark helper easier to read and maintain.
### 1. Direct indexing where bounds are guaranteed
`cluster_idx` and `theme_idx` are always in-bounds, so you can drop the `get`/`unwrap_or` noise and some `#[expect]`s:
```rust
// before
let cluster_idx = i % CLUSTER_COUNT;
let centre = cluster_centres
.get(cluster_idx)
.copied()
.unwrap_or(Coord { x: 0.0, y: 0.0 });
// after
let cluster_idx = i % CLUSTER_COUNT;
let centre = cluster_centres[cluster_idx];
```
```rust
// before
let theme_idx = i % THEMES.len();
let theme = THEMES.get(theme_idx).cloned().unwrap_or(Theme::History);
// after
let theme_idx = i % THEMES.len();
let theme = THEMES[theme_idx];
```
In `generate_travel_time_matrix`, `i` and `j` are also guaranteed in-bounds:
```rust
// before
let poi_i = pois.get(i);
let poi_j = pois.get(j);
let (loc_i, loc_j) = match (poi_i, poi_j) {
(Some(a), Some(b)) => (a.location, b.location),
_ => continue,
};
// after
let loc_i = pois[i].location;
let loc_j = pois[j].location;
```
This removes impossible branches and makes the “core math” easier to see.
### 2. Inline `set_matrix_cell`
Given the matrix is fully allocated as `n x n`, the helper adds indirection without extra safety:
```rust
// before
let duration = Duration::from_secs(time_secs.max(1.0) as u64);
set_matrix_cell(&mut matrix, i, j, duration);
// after
let duration = Duration::from_secs(time_secs.max(1.0) as u64);
matrix[i][j] = duration;
```
You can then remove `set_matrix_cell` entirely.
### 3. Use a normal distribution instead of manual Box–Muller
For benchmarks, using `rand_distr::Normal` reads more directly and lets you drop several float-`expect`s and the custom helper:
```rust
use rand_distr::{Normal, Distribution};
// inside generate_clustered_pois
let normal = Normal::new(0.0, CLUSTER_SPREAD).expect("valid std dev");
let dx = normal.sample(&mut rng);
let dy = normal.sample(&mut rng);
let location = Coord {
x: centre.x + dx,
y: centre.y + dy,
};
```
You can then remove `box_muller` entirely.
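For reference, the Box–Muller transform that the removed helper implements can be written in a self-contained form. The `Lcg` uniform source below is a deliberately crude stand-in for the seeded ChaCha RNG the benchmarks actually use:

```rust
use std::f64::consts::PI;

/// Minimal linear congruential generator standing in for the seeded ChaCha
/// RNG (illustrative only; not a quality random source).
struct Lcg(u64);

impl Lcg {
    /// Uniform sample in the open interval (0, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 + 1.0) / ((1u64 << 53) as f64 + 2.0)
    }
}

/// Box–Muller transform: turns two uniform samples into two independent
/// normal samples with the given standard deviation.
fn box_muller(rng: &mut Lcg, std_dev: f64) -> (f64, f64) {
    let u1 = rng.next_unit();
    let u2 = rng.next_unit();
    let r = (-2.0 * u1.ln()).sqrt() * std_dev;
    (r * (2.0 * PI * u2).cos(), r * (2.0 * PI * u2).sin())
}

fn main() {
    let mut rng = Lcg(42);
    let (dx, dy) = box_muller(&mut rng, 0.01);
    assert!(dx.is_finite() && dy.is_finite());
    println!("offsets: ({dx:.5}, {dy:.5})");
}
```

Replacing this with `rand_distr::Normal` trades the hand-rolled maths for a well-tested sampler, which is the thrust of the suggestion above.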
### 4. Trim `#[expect]` annotations after simplification
Once you simplify indexing and remove the helper functions, some `#[expect]`s become unnecessary or can be consolidated. For example, after inlining the distance calculation:
```rust
let dx = loc_j.x - loc_i.x;
let dy = loc_j.y - loc_i.y;
let distance = (dx * dx + dy * dy).sqrt();
let base_time_secs = distance / WALKING_SPEED_DEG_PER_SEC;
let noise_factor: f64 = rng.gen_range(0.8..1.2);
let time_secs = base_time_secs * noise_factor;
let duration = Duration::from_secs(time_secs.max(1.0) as u64);
```
You can selectively keep one or two `#[expect(clippy::float_arithmetic, ..)]` at the most relevant locations instead of many small ones, improving signal-to-noise.
</issue_to_address>
### Comment 2
<location> `wildside-solver-vrp/benches/bench_support.rs:47` </location>
<code_context>
+/// let pois = generate_clustered_pois(50, 42);
+/// assert_eq!(pois.len(), 50);
+/// ```
+#[must_use]
+pub fn generate_clustered_pois(count: usize, seed: u64) -> Vec<PointOfInterest> {
+ let mut rng = ChaCha8Rng::seed_from_u64(seed);
</code_context>
<issue_to_address>
**issue (review_instructions):** Add unit tests that exercise `generate_clustered_pois` and `generate_travel_time_matrix`, and a higher-level behavioural test that validates their end-to-end behaviour.
Add focused unit tests for `generate_clustered_pois` (e.g., deterministic output given a seed, correct length, use of all clusters/themes) and `generate_travel_time_matrix` (e.g., matrix shape, symmetry or expected bounds, non-zero off-diagonal entries).
In addition, add at least one behavioural test that uses these helpers together with a minimal solver setup (or a lightweight stub) to verify they produce coherent benchmark inputs end to end (e.g., depot + POIs -> matrix -> can be consumed by the solver/travel-time provider without panics or shape mismatches). Place these in the regular test suite so they run in CI rather than only via `cargo bench`.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
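A minimal shape for such tests, using a hypothetical stand-in generator (the real `generate_travel_time_matrix` lives in the benches tree, so the function below only imitates its seed-driven, square-matrix contract):

```rust
use std::time::Duration;

/// Hypothetical stand-in for `generate_travel_time_matrix`: deterministic
/// per seed, square, zero diagonal. The real helper derives times from POI
/// distances plus seeded noise.
fn generate_matrix(count: usize, seed: u64) -> Vec<Vec<Duration>> {
    (0..count)
        .map(|i| {
            (0..count)
                .map(|j| {
                    if i == j {
                        Duration::ZERO
                    } else {
                        // Mix indices with the seed so entries vary but
                        // repeat exactly for a given seed.
                        let secs = 1 + (seed ^ (i as u64 * 31 + j as u64)) % 600;
                        Duration::from_secs(secs)
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    // Same seed => identical output (reproducibility).
    assert_eq!(generate_matrix(10, 42), generate_matrix(10, 42));
    // Shape: square matrix, zero diagonal, non-zero off-diagonal entries.
    let m = generate_matrix(10, 42);
    assert!(m.iter().all(|row| row.len() == 10));
    assert!((0..10).all(|i| m[i][i] == Duration::ZERO));
    assert!(m[0][1] > Duration::ZERO);
}
```

Tests of this shape live in the regular test suite, so CI exercises them without running `cargo bench`.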
### Comment 3
<location> `wildside-solver-vrp/benches/solver_benchmarks.rs:66` </location>
<code_context>
+///
+/// The benchmark uses 100 samples and 10-second measurement windows for
+/// reliable P95/P99 estimation.
+fn bench_solve_times(c: &mut Criterion) {
+ let mut group = c.benchmark_group("solve_time");
+
</code_context>
<issue_to_address>
**issue (review_instructions):** Add tests that cover the benchmark setup logic (request construction, depot handling, and solver wiring) in addition to the benchmarks themselves.
Treat the benchmark harness as a new feature and cover its behaviour with tests. Add at least:
* Unit-style tests for `build_benchmark_request` and `create_depot` (e.g., verify start coordinate, interest weights, duration, and seed are set as expected; verify depot ID and location).
* A behavioural/integration-style test that builds a small POI set, constructs a `FixedMatrixTravelTimeProvider`, creates a `VrpSolver`, and calls `solve` once, asserting that it returns successfully and that the inputs are wired as expected (e.g., depot included, candidate POIs in store).
Ensure these tests live in the normal test suite so that changes to the benchmarking setup are exercised by CI, not only by running Criterion manually.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
### Comment 4
<location> `wildside-solver-vrp/benches/solver_benchmarks.rs:13` </location>
<code_context>
+//! ```
+
+// Criterion macros generate code that triggers missing_docs warnings.
+#![allow(missing_docs, reason = "Criterion macros generate undocumented code")]
+
+use std::time::Duration;
</code_context>
<issue_to_address>
**issue (review_instructions):** Using a crate-level `#![allow(missing_docs, ..)]` violates the lint-suppression rules (`#[allow]` forbidden and no blanket/file-wide suppression).
This line introduces a crate-level `#[allow]` for `missing_docs`, which conflicts with the instructions: `#[allow]` is forbidden, and only narrowly scoped `#[expect(lint, reason = "...")]` is allowed (no blanket/file-wide suppression).
If Criterion’s generated code is triggering `missing_docs`, please instead:
- Remove this `#![allow(..)]`, and
- Narrowly suppress on the specific item(s) or module(s) that need it using `#[expect(missing_docs, reason = "FIXME: <link> – document benchmark entrypoints")]`.
If a global suppression is truly unavoidable, it should still not use `#[allow]`, and would need a clear `FIXME:` with link and be scoped as narrowly as possible.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.rs`
**Instructions:**
Lints must not be silenced except as a **last resort**.
* `#[allow]` is **forbidden**.
* Only **narrowly scoped** `#[expect(lint, reason = "...")]` is allowed.
* No lint groups, no blanket or file-wide suppression.
* Include `FIXME:` with link if a fix is expected.
</details>
</issue_to_address>
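A sketch of the narrowly scoped form the instructions require; the function and lint are illustrative rather than taken from the PR, and `#[expect(lint, reason = "...")]` needs Rust 1.81 or later:

```rust
/// Suppression is scoped to one function and carries a reason. Unlike
/// `#[allow]`, the compiler warns if the expected lint never actually fires,
/// so stale suppressions are flagged.
#[expect(unused_variables, reason = "binding kept to show the lint firing")]
fn narrowly_scoped() -> &'static str {
    let generated_by_macro = 42;
    "ok"
}

fn main() {
    assert_eq!(narrowly_scoped(), "ok");
}
```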
### Comment 5
<location> `docs/roadmap.md:175` </location>
<code_context>
invariants like "total route duration must not exceed Tmax" and "route
must start and end at the same point".
- - [ ] Use `criterion` to create a benchmark suite that measures the P95 and
+ - [x] Use `criterion` to create a benchmark suite that measures the P95 and
P99 solve times for various problem sizes (e.g., 50, 100, 200 candidate
POIs).
</code_context>
<issue_to_address>
**issue (review_instructions):** P95 and P99 appear as uncommon acronyms and are not defined on first use in this sentence.
To follow the guideline on defining uncommon acronyms, consider expanding P95 and P99 on first use (for example, by explaining that they refer to 95th and 99th percentile solve times).
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.md`
**Instructions:**
Define uncommon acronyms on first use.
</details>
</issue_to_address>

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
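For context on the acronyms: P95 and P99 denote the 95th and 99th percentiles of the solve-time distribution. A hand-rolled nearest-rank computation (an illustration only; Criterion derives its estimates from its own measured sample distribution):

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of samples are less than or equal to it.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_by(|a, b| a.partial_cmp(b).expect("no NaN sample"));
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1)]
}

fn main() {
    // Hypothetical solve times in milliseconds for one problem size.
    let mut solve_ms: Vec<f64> = (1..=100).map(f64::from).collect();
    assert_eq!(percentile(&mut solve_ms, 95.0), 95.0);
    assert_eq!(percentile(&mut solve_ms, 99.0), 99.0);
}
```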
- Replace manual Box-Muller with rand_distr::Normal
- Use Duration::from_secs_f64 instead of lossy cast
- Simplify indexing to direct array access with #[expect]
- Mark doctests as ignore (not accessible from benchmark crate)
- Fix crate-level #![allow] to use narrowly scoped #[expect]
- Add rand_distr workspace dependency

Add comprehensive tests for benchmark helpers:

- Unit tests for generate_clustered_pois and generate_travel_time_matrix
- Unit tests for build_benchmark_request and create_depot
- BDD behavioural tests using rstest-bdd

Define P95/P99 acronyms in roadmap.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos: I've reviewed the current state of the code. Here's the status of each comment:

✅ Fully Resolved (6/8)

Overall Comment 1 - Doctest module paths

RESOLVED: The doctests in

Overall Comment 3 - Duration::from_secs_f64

RESOLVED: Line 176 in:

let duration = Duration::from_secs_f64(time_secs.max(1.0));

Comment 1 - Simplification

RESOLVED: All suggestions implemented:
Comment 2 - Unit tests for helpers

RESOLVED:
Comment 3 - Tests for benchmark setup

PARTIALLY RESOLVED:

Comment 5 - P95/P99 acronyms

RESOLVED:

❌ Not Resolved (2/8)

Overall Comment 2 - Criterion configuration

NOT RESOLVED: Lines 72-73 in:

group.sample_size(100);
group.measurement_time(Duration::from_secs(10));

This may still lead to long-running benchmarks in CI (potentially 30+ minutes across all problem sizes).

Comment 4 - Lint suppression violation

NOT RESOLVED: Line 13 in:

#![allow(missing_docs, reason = "Criterion macros generate undocumented code")]

This violates the lint-suppression rules requiring narrowly scoped

Summary: 6 out of 8 issues are fully resolved. The two remaining issues are the Criterion configuration (performance concern) and the forbidden
- Reduce criterion sample_size (100→10) and measurement_time (10s→3s) for CI-friendly runtimes (under 5 minutes total)
- Replace crate-level #![allow(missing_docs)] with #![expect(...)] to comply with workspace lint policy. This is the narrowest scope possible since Rust does not support applying lint attributes to macro invocations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@leynos: I've reviewed the current state of the code. Here's the resolution status:

✅ Fully Resolved (6/8)

Overall Comment 1 - Doctest module paths

RESOLVED: Lines 47 and 123 in

Overall Comment 2 - Criterion configuration

RESOLVED: Lines 72-73 in:

group.sample_size(10);
group.measurement_time(Duration::from_secs(3));

This is substantially reduced from the original 100 samples / 10 seconds and is CI-friendly.

Overall Comment 3 - Duration::from_secs_f64

RESOLVED: Line 176 in:

let duration = Duration::from_secs_f64(time_secs.max(1.0));

Comment 1 - Simplification

RESOLVED: All four suggested simplifications are implemented:
Comment 5 - P95/P99 acronyms

RESOLVED:
- Reduce criterion sample_size (100→10) and measurement_time (10s→3s) for CI-friendly runtimes (under 5 minutes total)
- Remove crate-level lint suppression by moving the criterion_group! macro into a private module. Private module contents are not subject to the missing_docs lint, eliminating the need for any suppression.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 7
♻️ Duplicate comments (1)
docs/roadmap.md (1)
175-177: LGTM! The P95 and P99 acronyms are now properly expanded on first use, and the benchmark item is correctly marked as complete.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- Cargo.toml
- Makefile
- docs/roadmap.md
- wildside-solver-vrp/Cargo.toml
- wildside-solver-vrp/benches/bench_support.rs
- wildside-solver-vrp/benches/solver_benchmarks.rs
- wildside-solver-vrp/tests/bench_helpers.rs
- wildside-solver-vrp/tests/bench_helpers_behaviour.rs
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/tests/features/bench_helpers.feature
🧰 Additional context used
📓 Path-based instructions (6)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Run `make check-fmt`, `make lint`, and `make test` before committing in Rust projects.
`make check-fmt` executes `cargo fmt --workspace -- --check` to validate formatting across the entire workspace without modifying files.
`make lint` executes `cargo clippy --workspace --all-targets --all-features -- -D warnings` to lint every target with all features enabled and deny all Clippy warnings.
`make test` executes `cargo test --workspace` to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using `Arc` to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (`//!`) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (`///`) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary `mut` bindings in Rust.
Handle errors with the `Result` type instead of panicking in Rust where feasible.
Avoid `unsafe` code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use `return` in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer `expect` over `allow` in Rust.
Use `rstest` fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)]` parameterised cases.
Files:
- wildside-solver-vrp/benches/bench_support.rs
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/tests/bench_helpers.rs
- wildside-solver-vrp/benches/solver_benchmarks.rs
- wildside-solver-vrp/tests/bench_helpers_behaviour.rs
⚙️ CodeRabbit configuration file
**/*.rs:

- Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar.
- Lints must not be silenced except as a last resort.
- `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()` in tests. `.expect()` and `.unwrap()` are forbidden outside of tests. Errors must be propagated.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.
- Files must not exceed 400 lines in length.
- Large modules must be decomposed.
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets.
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such.
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
- wildside-solver-vrp/benches/bench_support.rs
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/tests/bench_helpers.rs
- wildside-solver-vrp/benches/solver_benchmarks.rs
- wildside-solver-vrp/tests/bench_helpers_behaviour.rs
Cargo.toml
📄 CodeRabbit inference engine (AGENTS.md)
Cargo.toml: Use explicit version ranges in `Cargo.toml` and keep dependencies up-to-date.
All crate versions specified in `Cargo.toml` must use SemVer-compatible caret requirements (e.g., `some-crate = "1.2.3"`). Wildcard (`*`) or open-ended inequality (`>=`) version requirements are strictly forbidden.
Tilde requirements (`~`) in Rust dependencies should only be used where a dependency must be locked to patch-level updates for a specific, documented reason.
Files:
Cargo.toml
docs/**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
docs/**/*.md: Use the markdown files within the `docs/` directory as a knowledge base and source of truth for project requirements, dependency choices, and architectural decisions.
When new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve, proactively update the relevant file(s) in the `docs/` directory to reflect the latest state. Ensure the documentation remains accurate and current.
Documentation must use en-GB-oxendict spelling and grammar ("-ize" / "-yse" / "-our") with the exception of the LICENSE filename.
docs/**/*.md: Use British English based on the Oxford English Dictionary locale en-GB, including -ize suffixes (e.g., realize, organization), -lyse suffixes (e.g., analyse, paralyse), -our suffixes (e.g., colour, behaviour), -re suffixes (e.g., centre, calibre), double 'l' (e.g., cancelled, counsellor), maintained 'e' (e.g., likeable, rateable), -ogue suffixes (e.g., analogue, catalogue)
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns, for example: 'Concordat Industries are expanding'
Write headings in sentence case
Use Markdown headings (#, ##, ###, and so on) in order without skipping levels
Follow markdownlint recommendations
Provide code blocks and lists using standard Markdown syntax
Always provide a language identifier for fenced code blocks; use 'plaintext' for non-code text
Use '-' as the first level bullet and renumber lists when items change
Prefer inline links using 'text' or angle brackets around the URL
Ensure blank lines before and after bulleted lists and fenced blocks
Ensure tables have a delimiter line below the header row
Expand any uncommon acronym on first use, for example, Continuous Integration (CI)
Wrap paragraphs at 80 columns
Do not wrap tables
Use GitHub-flavoured numeric footnotes referenced as '[^1]'
Footnotes must be numbered in order of appearance in the document
Caption ev...
Files:
docs/roadmap.md
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
**/*.md: Validate Markdown files using `make markdownlint`.
Run `make fmt` after any documentation changes to format all Markdown files and fix table markup.
Validate Mermaid diagrams in Markdown files by running `make nixie`.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Markdown tables and headings must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.
Files:
docs/roadmap.md
⚙️ CodeRabbit configuration file
**/*.md:

- Avoid 2nd person or 1st person pronouns ("I", "you", "we")
- Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Headings must not be wrapped.
- Documents must start with a level 1 heading
- Headings must correctly increase or decrease by no more than one level at a time
- Use GitHub-flavoured Markdown style for footnotes and endnotes.
- Numbered footnotes must be numbered by order of appearance in the document.
Files:
docs/roadmap.md
docs/**/!(README).md
📄 CodeRabbit inference engine (docs/documentation-style-guide.md)
Avoid first and second person personal pronouns outside the README.md file
Files:
docs/roadmap.md
docs/**/*.{md,rs}
📄 CodeRabbit inference engine (docs/documentation-style-guide.md)
Wrap code at 120 columns
Files:
docs/roadmap.md
🧬 Code graph analysis (3)
wildside-solver-vrp/tests/bench_helpers.rs (2)
- wildside-solver-vrp/benches/bench_support.rs (2): `generate_clustered_pois` (54-108), `generate_travel_time_matrix` (127-180)
- wildside-core/src/poi.rs (1): `with_empty_tags` (148-150)

wildside-solver-vrp/benches/solver_benchmarks.rs (3)
- wildside-solver-vrp/benches/bench_support.rs (2): `generate_clustered_pois` (54-108), `generate_travel_time_matrix` (127-180)
- wildside-core/src/poi.rs (1): `with_empty_tags` (148-150)
- wildside-core/src/test_support.rs (1): `with_pois` (37-44)

wildside-solver-vrp/tests/bench_helpers_behaviour.rs (1)
- wildside-solver-vrp/benches/bench_support.rs (2): `generate_clustered_pois` (54-108), `generate_travel_time_matrix` (127-180)
🔍 Remote MCP
Summary of Additional Context
Based on comprehensive research of benchmarking best practices and Rust lint suppression patterns, here is the relevant context for reviewing this PR:
Criterion Benchmarking Best Practices
Criterion's default settings include: sample size of 100 measurements, warm-up time of 3 seconds, and measurement time of 5 seconds. The PR notes mention the configuration has been reduced to sample_size(10) and measurement_time(3s) for CI compatibility, which aligns with best practices—if measurement time is not achievable, Criterion will spend as long as needed to collect the desired samples. Sample size must be at least 10, so the reduced configuration is at the minimum acceptable threshold and appropriate for faster CI feedback.
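The percentile estimation that a batch of samples supports can be sketched as a nearest-rank computation over collected timings. Everything below (the sample data, the `percentile` helper) is illustrative and not part of the benchmark code, which gathers its timings through Criterion:

```rust
// Nearest-rank percentile over an already-sorted sample.
// `samples_ms` is hypothetical data standing in for collected solve times.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty(), "need at least one sample");
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    // 100 samples, as in the benchmark configuration before the CI reduction.
    let mut samples_ms: Vec<f64> = (1..=100).map(|i| i as f64).collect();
    samples_ms.sort_by(f64::total_cmp);
    println!("P95 = {}", percentile(&samples_ms, 95.0));
    println!("P99 = {}", percentile(&samples_ms, 99.0));
}
```

With only 10 samples, P95 and P99 both collapse onto the slowest one or two observations, which is why the larger sample count matters for percentile reporting while the reduced count remains acceptable for CI smoke-testing.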
Deterministic Seeding with ChaCha RNG
The PR correctly uses ChaCha RNG for reproducibility, not StdRng, since named RNGs with fixed algorithms are required for reproducible results. SeedableRng::seed_from_u64 is designed for simulations where a sequence of distinct, fixed random number seeds is needed, making the PR's approach of seeding with a constant BENCHMARK_SEED appropriate.
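The reproducibility property can be illustrated with a standard-library-only sketch. The benches use `rand_chacha`'s ChaCha generator seeded via `seed_from_u64`; the hand-rolled SplitMix64 below is only a stand-in for the fixed-algorithm principle, not the benchmark RNG:

```rust
// Illustrative fixed-algorithm generator: same seed, same stream, forever.
// The real benches use rand_chacha::ChaCha RNG for the same guarantee.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn seed_from_u64(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

fn main() {
    const BENCHMARK_SEED: u64 = 42;
    let mut a = SplitMix64::seed_from_u64(BENCHMARK_SEED);
    let mut b = SplitMix64::seed_from_u64(BENCHMARK_SEED);
    // Identical seeds yield identical streams, so generated POIs and
    // travel-time matrices are bit-for-bit reproducible across runs.
    assert_eq!(a.next_u64(), b.next_u64());
    println!("deterministic");
}
```

This is the reason `StdRng` is unsuitable here: its algorithm may change between `rand` releases, whereas a named generator pins the output stream to the seed.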
Lint Suppression with #[expect]
The PR addresses lint suppression concerns. The expect level suppresses lints but emits a warning if the expectation is unfulfilled, notifying when the lint is no longer triggered. Importantly, lint expectations are only fulfilled by emissions suppressed by the expect attribute itself—if lint levels are modified by other attributes like allow or warn in the same scope, the expectation remains unfulfilled. This means narrowly-scoped #[expect] attributes on individual items or functions are required rather than file-wide or crate-level suppressions.
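A minimal sketch of the narrowly scoped form the guideline requires (the item names are hypothetical; `#[expect]` is stable from Rust 1.81):

```rust
// The expectation sits on a single item, not the whole file. If the lint
// ever stops firing here, rustc warns that the expectation is unfulfilled,
// prompting removal of the attribute.
#[expect(dead_code, reason = "retained as a reference implementation")]
fn unused_helper() -> u32 {
    42
}

fn scoped_demo() -> &'static str {
    "expectation scoped to a single item"
}

fn main() {
    println!("{}", scoped_demo());
}
```

Contrast this with the file-wide `#![expect(...)]` form, which suppresses the lint everywhere in the crate or file and so cannot satisfy the "narrowly scoped" requirement.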
Outstanding Issues
Per the PR context, two critical items remain:
- Missing Full Integration Test: An end-to-end behavioural test that constructs POIs via `generate_clustered_pois`, builds a `FixedMatrixTravelTimeProvider` from the travel-time matrix, instantiates `VrpSolver`, and calls `solve()` to verify the entire pipeline works together.
- Lint Suppression Scope: Any remaining file-wide or crate-level `#[expect(...)]` attributes need replacement with narrowly-scoped item-level suppression on specific functions or modules where `missing_docs` is legitimately unavoidable (e.g., inside Criterion macro-generated code or test-only helper functions).
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Sourcery review
- GitHub Check: Sourcery review
- GitHub Check: build
🔇 Additional comments (8)
Cargo.toml (1)
33-36: LGTM! The workspace dependencies use implicit SemVer caret versioning as required. The criterion crate with the `html_reports` feature and the rand ecosystem crates are appropriate additions for the benchmarking infrastructure.
Makefile (1)
21-23: LGTM! The bench target is correctly wired to run Criterion benchmarks for the solver package, with appropriate PHONY declaration and help text.
wildside-solver-vrp/Cargo.toml (1)
36-38: LGTM! The benchmark configuration is correct: `harness = false` is required for Criterion to provide its own harness, and the name matches the benchmark file.
wildside-solver-vrp/tests/features/bench_helpers.feature (1)
1-25: LGTM! The Gherkin scenarios comprehensively cover the key behaviours of the benchmark helpers: POI generation, matrix generation, and determinism verification. The scenarios are well-structured and will serve as good living documentation.
wildside-solver-vrp/tests/bench_helpers.rs (2)
wildside-solver-vrp/tests/bench_helpers.rs (2)
34-103: LGTM! The POI generation tests are comprehensive, covering count verification, parameterised size handling, determinism, sequential ID assignment, cyclic theme distribution, and seed variation. Good use of `rstest` for parameterised cases.
109-207: LGTM! The travel time matrix tests thoroughly validate the matrix properties: square dimensions, zero diagonal, positive off-diagonal entries, determinism, and distance-based sanity checking. Edge cases (empty input, single POI) are properly handled.
wildside-solver-vrp/benches/solver_benchmarks.rs (1)
108-122: Good use of the private module pattern to avoid lint issues. Encapsulating the `criterion_group!` and `criterion_main!` macros within a private module prevents `missing_docs` warnings on the generated code without requiring file-wide lint suppression. This adheres to the coding guidelines.
wildside-solver-vrp/tests/bench_setup.rs (1)
44-152: Tests are thorough but verify duplicated code. The unit tests comprehensively cover the helper functions' behaviour. Once the duplication is resolved (see previous comment), these tests will provide value by verifying the actual benchmark setup code.
|
@coderabbitai Have the following now been resolved?

❌ Not Resolved (1/8)

NOT RESOLVED: Line 13 in solver_benchmarks.rs now uses:

#![expect(missing_docs, reason = "Criterion macros generate undocumented code")]

Whilst this upgrades from #[allow] to #[expect], it remains a crate-level (file-wide) suppression. The review requirement states: "no blanket/file-wide suppression" and "narrowly scoped #[expect(lint, reason = "...")]". The #![...] syntax with the bang is file-wide, not narrowly scoped to specific items. |
|
@leynos: Yes, Comment 4 is now fully resolved ✅ The file-wide lint suppression has been removed. The current state of
This fully complies with the review requirement: "Only narrowly scoped #[expect(lint, reason = "...")] is allowed." All 8 review comments are now resolved (the two partially resolved integration test comments from earlier were addressed in separate discussion). |
- Extract shared helpers (build_benchmark_request, create_depot) to bench_support.rs to eliminate duplication between benchmark and test files. Tests now use the actual benchmark code, preventing drift. - Consolidate lint expectations in bench_support.rs by extracting helper functions (euclidean_distance, travel_time_with_noise) that each have a single #[expect] attribute, reducing visual noise from 16+ to 3 expectations. - Remove all file-wide #![expect(...)] suppressions from test files. Apply narrowly scoped #[expect] to individual functions or module declarations instead, as required by coding guidelines. - Add integration test (bench_integration.rs) that validates the full benchmark data pipeline works with VrpSolver end-to-end: - Generates POIs using generate_clustered_pois - Builds travel time matrix using generate_travel_time_matrix - Constructs FixedMatrixTravelTimeProvider and VrpSolver - Calls solve() and asserts success 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Centre POI generation around BENCHMARK_START (0.05, 0.05) to ensure all generated POIs fall within the solver's calculated bounding box. Previously, POIs were generated in the range (0, 0) to (0.1, 0.1), which could place some POIs outside the bbox calculated from the request parameters, causing matrix dimension mismatches. Changes: - Reduce AREA_SIZE from 0.1 to 0.08 degrees - Generate cluster centres centred on BENCHMARK_START 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Further reduce AREA_SIZE from 0.08 to 0.06 degrees to ensure POIs with Gaussian spread (σ=0.005) stay within the solver's bounding box even with larger POI counts (100+). With AREA_SIZE=0.06: - Cluster centres range from 0.02 to 0.08 (0.05 ± 0.03) - With 3σ spread (0.015), POIs range from 0.005 to 0.095 - This matches the solver's bbox [0.005, 0.095] exactly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
- wildside-solver-vrp/benches/bench_support.rs
- wildside-solver-vrp/benches/solver_benchmarks.rs
- wildside-solver-vrp/tests/bench_helpers.rs
- wildside-solver-vrp/tests/bench_helpers_behaviour.rs
- wildside-solver-vrp/tests/bench_integration.rs
- wildside-solver-vrp/tests/bench_setup.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Run `make check-fmt`, `make lint`, and `make test` before committing in Rust projects.
`make check-fmt` executes `cargo fmt --workspace -- --check` to validate formatting across the entire workspace without modifying files.
`make lint` executes `cargo clippy --workspace --all-targets --all-features -- -D warnings` to lint every target with all features enabled and deny all Clippy warnings.
`make test` executes `cargo test --workspace` to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using `Arc` to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary `mut` bindings in Rust.
Handle errors with the `Result` type instead of panicking in Rust where feasible.
Avoid `unsafe` code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use `return` in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer `expect` over `allow` in Rust.
Use `rstest` fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)...
Files:
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/benches/solver_benchmarks.rs
- wildside-solver-vrp/tests/bench_integration.rs
- wildside-solver-vrp/tests/bench_helpers.rs
- wildside-solver-vrp/benches/bench_support.rs
- wildside-solver-vrp/tests/bench_helpers_behaviour.rs
⚙️ CodeRabbit configuration file
**/*.rs:
- Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort. `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()` in tests. `.expect()` and `.unwrap()` are forbidden outside of tests. Errors must be propagated.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`
- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/benches/solver_benchmarks.rs
- wildside-solver-vrp/tests/bench_integration.rs
- wildside-solver-vrp/tests/bench_helpers.rs
- wildside-solver-vrp/benches/bench_support.rs
- wildside-solver-vrp/tests/bench_helpers_behaviour.rs
🧬 Code graph analysis (4)
wildside-solver-vrp/tests/bench_setup.rs (1)
- wildside-solver-vrp/benches/bench_support.rs (2): `build_benchmark_request` (53-66), `create_depot` (82-84)

wildside-solver-vrp/benches/solver_benchmarks.rs (1)
- wildside-solver-vrp/benches/bench_support.rs (4): `build_benchmark_request` (53-66), `create_depot` (82-84), `generate_clustered_pois` (106-167), `generate_travel_time_matrix` (213-245)

wildside-solver-vrp/tests/bench_integration.rs (1)
- wildside-core/src/test_support.rs (1): `with_pois` (37-44)

wildside-solver-vrp/benches/bench_support.rs (2)
- wildside-core/src/route.rs (1): `start` (130-130)
- wildside-core/src/poi.rs (1): `with_empty_tags` (148-150)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (1)
wildside-solver-vrp/benches/solver_benchmarks.rs (1)
70-78: Remove this review comment: the benchmark structure is correct.
`VrpSolver::solve()` is entirely stateless and idempotent. All fields in `VrpSolver` are immutable, the method signature takes only immutable references (`&self`, `&SolveRequest`), and every invocation creates fresh local state with no caching or memoisation. Creating the solver once per benchmark size and reusing it across iterations is the proper benchmarking approach; moving instantiation inside `b.iter()` would incorrectly include allocation overhead in the measurement. The current structure optimally measures solve performance alone.
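The structure the comment defends can be sketched with the standard library alone: construct the solver once, then time only the solve calls. `FakeSolver` below is a hypothetical stand-in for `VrpSolver`, which is not available here, and the timing loop stands in for Criterion's `b.iter()`:

```rust
use std::time::Instant;

// Hypothetical stateless solver: all state is local to each solve() call.
struct FakeSolver;

impl FakeSolver {
    fn solve(&self, input: &[u64]) -> u64 {
        input.iter().sum()
    }
}

fn main() {
    // Setup happens outside the timed region, mirroring the benchmark layout.
    let solver = FakeSolver;
    let input: Vec<u64> = (0..1_000).collect();

    let start = Instant::now();
    let mut result = 0;
    for _ in 0..100 {
        // Only the solve calls fall inside the measurement window.
        result = solver.solve(&input);
    }
    let elapsed = start.elapsed();
    println!("result = {result}, elapsed = {elapsed:?}");
}
```

Because every call takes only immutable references and builds fresh local state, reusing one instance across iterations measures solve time without allocation noise.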
Address code review feedback by replacing iteration-based tests with rstest parameterized test cases: - bench_integration.rs: Convert seed loop to #[case] parameterization for individual test case reporting - bench_setup.rs: Consolidate four duplicate theme weight tests into a single parameterized test 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
wildside-solver-vrp/tests/bench_integration.rs (1)
69-117: Parameterised tests implemented correctly. Both `benchmark_helpers_work_with_various_problem_sizes` and `different_seeds_produce_valid_solver_inputs` use `#[rstest]` parameterization properly, providing clear individual test case reporting. The past review concern about converting seed iteration to parameterized cases has been fully addressed.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- wildside-solver-vrp/tests/bench_integration.rs
- wildside-solver-vrp/tests/bench_setup.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Run `make check-fmt`, `make lint`, and `make test` before committing in Rust projects.
`make check-fmt` executes `cargo fmt --workspace -- --check` to validate formatting across the entire workspace without modifying files.
`make lint` executes `cargo clippy --workspace --all-targets --all-features -- -D warnings` to lint every target with all features enabled and deny all Clippy warnings.
`make test` executes `cargo test --workspace` to run the full workspace test suite.
Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during tests in the Rust code itself rather than silencing them.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using `Arc` to reduce the amount of data returned.
Write unit and behavioural tests for new functionality in Rust. Run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary `mut` bindings in Rust.
Handle errors with the `Result` type instead of panicking in Rust where feasible.
Avoid `unsafe` code in Rust unless absolutely necessary, and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use `return` in single-line Rust functions.
Use predicate functions for conditional criteria in Rust with more than two branches.
Rust lints must not be silenced except as a last resort.
Rust lint rule suppressions must be tightly scoped and include a clear reason.
Prefer `expect` over `allow` in Rust.
Use `rstest` fixtures for shared setup in Rust tests.
Replace duplicated Rust tests with `#[rstest(...)...
Files:
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/tests/bench_integration.rs
⚙️ CodeRabbit configuration file
**/*.rs:
- Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort. `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()` in tests. `.expect()` and `.unwrap()` are forbidden outside of tests. Errors must be propagated.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`
- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
- wildside-solver-vrp/tests/bench_setup.rs
- wildside-solver-vrp/tests/bench_integration.rs
🔍 Remote MCP Context7, Deepwiki
Summary of Relevant Context for PR Review
Based on my research, here are key facts and best practices relevant to this PR's implementation:
Criterion Benchmarking Framework
Configuration for CI-Friendliness: A bigger sample should yield more accurate results if paired with a sufficiently large measurement time. If it is not possible to perform the measurement in the requested time then Criterion will spend as long as is needed to collect the desired number of samples. The PR's resolution to reduce defaults to 10 samples and 3 seconds measurement time aligns with making benchmarks suitable for CI environments.
Sample Size Requirement: Sample size must be at least 10. The PR's approach of using 10 samples meets this minimum requirement while remaining CI-friendly.
Measurement Configuration: In Criterion.rs 0.3.3, a new option was added to change the sampling mode to handle long-running benchmarks. The benchmark author can call BenchmarkGroup::sampling_mode(SamplingMode) to change the sampling mode. For benchmarks with varying problem sizes (50, 100, 200 POIs), this may be relevant if individual benchmarks become slow.
Duration API Best Practices
Avoiding Lossy Conversions: Creates a new Duration from the specified number of seconds represented as f64. This constructor will panic if secs is negative, overflows Duration or not finite. The PR's adoption of Duration::from_secs_f64 is the appropriate method for constructing durations from floating-point values, avoiding manual conversion logic.
Precision Concerns: When recording metrics that involve a span of time, the pattern duration.as_millis() as f64 is problematic because it loses the sub-milliseconds precision. The PR's approach of using deterministic, controlled Duration construction avoids these precision issues in travel-time matrix generation.
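The precision difference can be demonstrated with a small standard-library example (the values are illustrative):

```rust
use std::time::Duration;

fn main() {
    // Duration::from_secs_f64 preserves sub-millisecond precision:
    // 0.0015 s is stored as 1,500,000 ns.
    let d = Duration::from_secs_f64(0.0015);

    // as_millis() truncates to whole milliseconds, discarding the 0.5 ms.
    println!("as_millis = {}", d.as_millis()); // whole milliseconds only
    println!("as_micros = {}", d.as_micros()); // full precision retained
}
```

This is why constructing travel times directly as `Duration` values, rather than round-tripping through millisecond floats, keeps the matrix generation lossless.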
Rust Benchmarking Patterns
Testing Abstraction Levels: It is best to test at the lowest level of abstraction possible. In the case benchmarks, this makes them both easier to maintain, and it helps to reduce the amount of noise in the measurements. The PR's strategy of creating isolated bench_support utilities and conducting unit/integration tests follows this principle.
Throughput Measurements: When benchmarking some types of code it is useful to measure the throughput as well as the iteration time. Criterion.rs can estimate the throughput of a benchmark, but it needs to know how many bytes or elements each iteration will process. Throughput measurements are only supported when using the BenchmarkGroup structure. The PR's use of BenchmarkGroup with throughput() calls is appropriate for the parameterized solver benchmarks.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
wildside-solver-vrp/tests/bench_setup.rs (3)
1-18: LGTM! Module structure and imports are well-organised. The module-level documentation clearly states the purpose, and the `#[path]` inclusion of `bench_support` with a narrowly scoped `#[expect(dead_code)]` is appropriate for a shared test module where not all exports are consumed by every test file.
20-70: Excellent test coverage for `build_benchmark_request`. The tests comprehensively validate all aspects of the benchmark request construction: duration, seed propagation, coordinates, optional fields, and theme weights. The parameterised theme test properly consolidates what would have been four duplicate tests.
72-111: Thorough testing of `create_depot` with good edge-case coverage. The tests validate all properties of the depot POI (location, id, tags) and use parameterisation to check various coordinate values including zero, positive, and negative numbers. The narrowly scoped `float_cmp` expectation within a block (lines 106-110) is a particularly precise application of the guideline.
wildside-solver-vrp/tests/bench_integration.rs (1)
1-19: Module structure and imports look good. The module doc comment clearly explains the integration test's purpose, and the use of `#[path]` to share bench_support utilities is appropriate for test code.
Address code review feedback: - Extract duplicated setup logic into `build_solver_and_request` helper - Remove redundant assert!(is_ok()) + expect() pattern, use expect() directly - Convert for loop to #[rstest] parameterized test cases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address code review feedback: - Fix walking speed comment to match constant (0.000014 not 0.0014) - Use separate x/y distributions for POI cluster centres - Replace flaky solve_time assertion with candidates_evaluated check - Extract build_solver_and_request helper to reduce duplication - Convert different_seeds test to parameterized #[rstest] 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Changes
Benchmark framework
Bench utilities
VRP solver benches
Cargo/workspace integration
Tests and test scaffolding
Documentation
Testing plan
Notes
🌿 Generated by Terry
ℹ️ Tag @terragon-labs to ask questions and address PR feedback
📎 Task: https://www.terragonlabs.com/task/16a3d126-1bad-4011-b4cf-eae7599d09f2