feat: track and persist session peer outcomes by xdustinface · Pull Request #106 · xdustinface/rust-dashcore

xdustinface · 2026-04-17T10:40:34Z

Summary

Adds last_success, last_tried, and consecutive_failures to PeerReputation, wired into the connect/handshake lifecycle: attempts bump last_tried, successful handshakes set last_success and reset consecutive_failures, and both handshake and connect failures increment consecutive_failures.
Bumps AddrV2.time on successful handshake via AddrV2Handler::mark_seen, which preserves richer gossip services on known entries and uses the peer's actual handshake-negotiated services for first observations.
Consolidates the two-lock failure recording into an atomic record_failure_with_penalty, used from both the handshake-failure and connect-failure paths.
Unit tests cover every field transition, mark_seen behavior, and the atomic failure helper.

Backward compatibility with older reputations.json files is explicitly out of scope per #102. Stale local state can be discarded on upgrade.

Foundational work for #104 (capability-aware selection scoring) and #105 (capability-driven churn).

Closes #103

Part of #102

Adds `last_success`, `last_tried`, and `consecutive_failures` to `PeerReputation`, plus a `record_connection_failure` method on `PeerReputationManager`. All new fields use `#[serde(default)]` so existing `reputations.json` files load without migration. `record_connection_attempt` now sets `last_tried` and `record_successful_connection` sets `last_success` and resets `consecutive_failures`.

Adds unit tests for default values, `last_tried` on attempt, `last_success` plus `consecutive_failures` reset on success, failure streak increment preserving `last_success`, and legacy `reputations.json` decoding with missing fields.

Adds `AddrV2Handler::mark_seen` to refresh the stored timestamp for a directly observed peer, preserving existing services for known entries and inserting a fresh entry otherwise. `connect_to_peer` now calls `mark_seen` after a successful handshake so the `peers.dat` time reflects first-hand observation instead of gossip.

Calls `record_connection_failure` on both the TCP connect failure and the handshake failure paths in `PeerNetworkManager::connect_to_peer`, so the `consecutive_failures` streak reflects every unsuccessful attempt.

manki-review · 2026-04-17T10:40:48Z

Manki — Review complete

Planner (28s)
feature · 230 lines · 3 agents
review effort: medium · judge effort: medium

Review — 9 findings
    ✅ Security & Safety — 3 (75s)
    ✅ Correctness & Logic — 3 (210s)
    ✅ Architecture & Design — 3 (70s)

Judge — 8 kept · 0 dropped (50s)
kept: 1 required · 4 suggestion · 3 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 230 lines)
Team: Security & Safety, Correctness & Logic, Architecture & Design
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "mark_seen called with hardcoded NETWORK services, discarding handshake-negotiated capabilities" (required, high confidence) — "Impact: High (stored peer records will show only NETWORK for every directly-connected peer, silently breaking capability-aware selection once Capability-aware peer selection scoring #104 lands). Likelihood: Certain (happens on every successful handshake). The fix is clear: thread the negotiated ServiceFlags from the version message through to this call site. One reviewer flagged it, but the analysis is unambiguous."
✓ Kept: "consecutive_failures lacks a clamping deserializer, unlike analogous fields" (nit, medium confidence) — "Impact: Low today (just stored), Medium in future (enormous cooldown if Capability-aware peer selection scoring #104/Capability-driven churn in network maintenance loop #105 read the field). Likelihood: Unlikely (requires crafted or corrupted JSON). Reviewer called it nit; the concern is valid but speculative until the downstream consumer exists. Keep as nit — add the clamp when the cooldown logic lands."
✓ Kept: "SystemTime clock-error fallback sets peer timestamp to Unix epoch" (nit, high confidence) — "Impact: Medium (makes a live peer appear maximally stale). Likelihood: Unlikely (system clock before Unix epoch is essentially never). Reviewer rated nit; I agree — the suggested early-return guard is good hygiene but not urgent."
✓ Kept: "Redundant parallel fields track the same semantic concept with different types" (suggestion, high confidence) — "Impact: Medium — future contributors building cooldown logic on last_connection (Instant, resets on restart) instead of last_tried (SystemTime, persisted) would silently produce wrong behavior. Likelihood: Possible, especially since both fields are updated in the same method. A doc comment distinguishing the two, at minimum, is warranted."
✓ Kept: "AddrV2Message construction duplicated across mark_seen and add_known_address" (suggestion, medium confidence) — "Impact: Medium (third copy of the same IP-variant match; a future Tor/CJDNS variant must be added in multiple spots and will easily diverge). Likelihood: Probable as the address type set expands with Capability-aware peer management: churn and smarter selection #102 follow-ons. Extracting a private helper is a small, safe refactor."
✓ Kept: "mark_seen and add_known_address have inconsistent eviction placement" (nit, high confidence) — "Merged findings 6 and 9 — both describe the same observation: evict_if_needed is only called in the None branch of mark_seen (correct, map size unchanged on update) but unconditionally in add_known_address, and the Some branch has no comment explaining the omission. Behavior is correct; a one-line comment resolves the confusion. Low impact, cosmetic."
✓ Kept: "record_connection_failure does not defensively set last_tried" (suggestion, medium confidence) — "Impact: Medium — if called outside the current sequence (attempt always precedes failure), the resulting entry has consecutive_failures=1 but last_tried=None, an inconsistent state. Likelihood: Possible as the method gets reused. Making the method self-consistent is low-cost and prevents a future footgun."
✓ Kept: "Non-atomic failure recording: two separate write-lock acquisitions in failure paths" (suggestion, medium confidence) — "Impact: Low today (fields serve independent purposes, narrow window). Likelihood: Unlikely (race requires concurrent reader between two async awaits). However, once cooldown/backoff logic in Capability-aware peer selection scoring #104/Capability-driven churn in network maintenance loop #105 keys off consecutive_failures, a reader between the two locks could make a wrong cooldown decision. Documenting the deliberate two-step pattern or consolidating into a single lock is worth tracking."

Timing:

Parse: 1.2s
Review agents: 577.2s
Judge: 50.3s
Total: 628.8s

codecov-commenter · 2026-04-17T10:44:22Z

Codecov Report

❌ Patch coverage is 93.92265% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.99%. Comparing base (853d3e3) to head (6d1febf).

Files with missing lines	Patch %	Lines
dash-spv/src/network/addrv2.rs	94.38%	5 Missing ⚠️
dash-spv/src/network/reputation.rs	95.23%	4 Missing ⚠️
dash-spv/src/network/manager.rs	60.00%	2 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##           v0.42-dev     #106      +/-   ##
=============================================
+ Coverage      67.93%   67.99%   +0.06%     
=============================================
  Files            319      319              
  Lines          68002    68117     +115     
=============================================
+ Hits           46195    46318     +123     
+ Misses         21807    21799       -8

Flag	Coverage Δ
core	`75.25% <ø> (ø)`
ffi	`38.99% <ø> (ø)`
rpc	`20.00% <ø> (+15.72%)`	⬆️
spv	`85.36% <93.92%> (+0.02%)`	⬆️
wallet	`67.62% <ø> (ø)`

Files with missing lines	Coverage Δ
dash-spv/src/network/handshake.rs	`79.14% <100.00%> (-1.49%)`	⬇️
dash-spv/src/network/manager.rs	`63.71% <60.00%> (-1.64%)`	⬇️
dash-spv/src/network/reputation.rs	`79.28% <95.23%> (+5.10%)`	⬆️
dash-spv/src/network/addrv2.rs	`90.18% <94.38%> (+5.51%)`	⬆️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Removes `#[serde(default)]` on the new session outcome fields and the legacy-JSON load test. Backward compatibility of `reputations.json` across versions is no longer a requirement, so the shims and test are dead weight.

manki-review

The hardcoded ServiceFlags::NETWORK passed to mark_seen on every successful handshake will silently poison stored peer capabilities — defeating the capability-aware selection that is the stated goal of the parent issue #102. Everything else is polish.

📊 8 findings (1 required, 4 suggestion, 3 nit) · 230 lines · 629s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 628761,
  "diffLines": 230,
  "diffAdditions": 227,
  "diffDeletions": 3,
  "filesReviewed": 4,
  "agents": [
    "Security & Safety",
    "Correctness & Logic",
    "Architecture & Design"
  ],
  "findingsRaw": 9,
  "findingsKept": 8,
  "findingsDropped": 1,
  "severity": {
    "required": 1,
    "suggestion": 4,
    "nit": 3
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "836f45c301490f36a43ece4ff6dc715a130f2c25",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 3,
      "findingsKept": 3,
      "responseLength": 3276
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 3,
      "findingsKept": 3,
      "responseLength": 3953
    },
    {
      "name": "Architecture & Design",
      "findingsRaw": 3,
      "findingsKept": 3,
      "responseLength": 3427
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 4,
      "medium": 4,
      "low": 0
    },
    "severityChanges": 8,
    "mergedDuplicates": 1
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 4
    },
    "findingsPerFile": {
      "dash-spv/src/network/manager.rs": 2,
      "dash-spv/src/network/reputation.rs": 3,
      "dash-spv/src/network/addrv2.rs": 3
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

… sites

manki-review · 2026-04-17T11:00:27Z

Manki — Review complete

Planner
311 lines · 0 agents

Review — 13 findings

Judge — 7 kept · 0 dropped (161s)
kept: 1 required · 4 suggestion · 2 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 311 lines)
Team: Security & Safety, Architecture & Design, Correctness & Logic, Testing & Coverage, Performance & Efficiency
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "PR description contradicts code: consecutive_failures lacks #[serde(default)], causing hard deserialization failure on upgrade" (required, high confidence) — "Impact: High (upgrading users cannot load their reputation store, causing a startup error or silent loss of all ban/misbehavior history), Likelihood: Probable (any node upgrading from a pre-existing reputations.json hits this). Option<T> fields work fine when absent because serde implicitly treats them as None, but a bare u32 field fails with a hard missing field error. The issue spec says old state can be discarded, but nothing in the storage caller currently handles that error gracefully — adding #[serde(default)] on the field, or explicit error recovery in load_from_storage, is required. Merged findings 3 and 8 — same underlying problem."
✓ Kept: "consecutive_failures not clamped on deserialization, unlike all other numeric fields" (suggestion, medium confidence) — "Impact: Medium (an unbounded value drives exponential backoff, potentially locking a peer out indefinitely in forthcoming Capability-aware peer selection scoring #104/Capability-driven churn in network maintenance loop #105 work), Likelihood: Possible (requires a corrupted or adversarially crafted reputations.json). Every other numeric field (score, ban_count, connection_attempts) has a custom clamping deserializer; consecutive_failures is the lone exception. The issue is somewhat deferred until the hard-failure bug (finding 1) is resolved, but the clamping pattern should be applied for consistency. Merged findings 1 and 2."
✓ Kept: "record_failure_with_penalty duplicates scoring/ban logic from update_reputation" (suggestion, high confidence) — "Impact: Medium (any future scoring rule change — ban threshold, action accounting, log format — must be applied in two places and can silently diverge), Likelihood: Probable (scoring logic is actively evolving across this PR series). Two reviewers independently flagged this. A private apply_score_change(rep: &mut PeerReputation, ...) -> bool helper would let both callers hold exactly one write lock while eliminating the ~35 lines of duplication. Merged findings 4 and 12."
✓ Kept: "record_failure_with_penalty — the hot-path combined method — has no direct unit test" (suggestion, high confidence) — "Impact: Medium (the combined-lock method is the primary failure-recording path in manager.rs for both handshake and connection failures, and its four distinct behaviors — conditional last_tried init, streak increment, score mutation, ban detection — are tested nowhere), Likelihood: Probable (regressions in the compound path will not be caught by the existing suite, and the acceptance criteria explicitly requires comprehensive coverage). Two reviewers flagged this independently. Merged findings 9 and 10."
✓ Kept: "record_connection_failure does not update last_tried after the first failure" (suggestion, medium confidence) — "Impact: Medium (a stale last_tried would make a peer appear eligible for retry sooner than it should be, undermining future backoff/cooldown logic), Likelihood: Possible (the normal flow calls record_connection_attempt first so the guard rarely fires, but any direct call or test that bypasses the attempt step will leave last_tried stale after the first failure). The issue thread PRRT_kwDOQSlaXs57s9cb flagged exactly this pattern and remains unresolved."
✓ Kept: "last_tried "set only when None" contract is untested in both failure methods" (nit, low confidence) — "Impact: Low (the contract exists and is documented; the test gap means silent breakage if a refactor accidentally unconditionalizes the write, but the behavior itself is also questionable per finding 5), Likelihood: N/A. Given that finding 5 argues the behavior should change to always-update, writing a test that asserts the current "don't overwrite" behavior may be premature."
✓ Kept: "record_failure_with_penalty calls Instant::now() twice unnecessarily" (nit, high confidence) — "Impact: Low (two monotonic-clock syscalls instead of one; negligible in practice), Likelihood: Certain. Capturing let now = Instant::now() before the inner block and reusing it for both banned_until and the ReputationEvent::timestamp eliminates the duplicate call and keeps the two timestamps consistent."

Timing:

Parse: 1.8s
Review agents: 247.6s
Judge: 160.9s
Total: 410.4s

Superseded by new review

xdustinface · 2026-04-17T11:01:17Z

@manki review

manki-review · 2026-04-17T11:01:30Z

Review skipped — a review is currently in progress. Retry when it completes, or force now:

Force review

manki-review

consecutive_failures: u32 without #[serde(default)] will hard-fail deserialization on any upgrade from an existing reputations.json — Option<T> fields silently default, but u32 returns a missing-field error. Separately, record_failure_with_penalty is wired into every failure path in manager.rs but has zero test coverage despite the acceptance criteria requiring comprehensive tests. Two previous threads resolved: non-atomic failure recording (PRRT_kwDOQSlaXs57s9cf) via the new combined method, and addrv2 construction duplication (PRRT_kwDOQSlaXs57s9cZ) via the extracted make_addr_message helper.

📊 7 findings (1 required, 4 suggestion, 2 nit) · 311 lines · 410s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 410387,
  "diffLines": 311,
  "diffAdditions": 289,
  "diffDeletions": 22,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Architecture & Design",
    "Correctness & Logic",
    "Testing & Coverage",
    "Performance & Efficiency"
  ],
  "findingsRaw": 13,
  "findingsKept": 7,
  "findingsDropped": 6,
  "severity": {
    "required": 1,
    "suggestion": 4,
    "nit": 2
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "d6e23009c816c41433e3daa20fa34a8f9821a0db",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 3,
      "findingsKept": 1,
      "responseLength": 3283
    },
    {
      "name": "Architecture & Design",
      "findingsRaw": 4,
      "findingsKept": 2,
      "responseLength": 4945
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 2,
      "findingsKept": 2,
      "responseLength": 3359
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 2,
      "findingsKept": 2,
      "responseLength": 4077
    },
    {
      "name": "Performance & Efficiency",
      "findingsRaw": 2,
      "findingsKept": 2,
      "responseLength": 2781
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 4,
      "medium": 2,
      "low": 1
    },
    "severityChanges": 7,
    "mergedDuplicates": 6
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/reputation.rs": 5,
      "dash-spv/src/network/reputation_tests.rs": 2
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

… on failure

manki-review · 2026-04-17T11:17:16Z

Manki — Review complete

Planner (21s)
feature · 460 lines · 4 agents
review effort: medium · judge effort: medium

Review — 12 findings
    ✅ Correctness & Logic — 2 (318s)
    ✅ Security & Safety — 3 (233s)
    ✅ Architecture & Design — 3 (115s)
    ✅ Testing & Coverage — 4 (64s)

Judge — 9 kept · 0 dropped (144s)
kept: 7 suggestion · 2 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 460 lines)
Team: Correctness & Logic, Security & Safety, Architecture & Design, Testing & Coverage
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "consecutive_failures not bounded at runtime write sites" (suggestion, high confidence) — "Impact: Medium (silent in-memory/persisted divergence that will produce incorrect backoff durations in Capability-aware peer selection scoring #104), Likelihood: Certain (every failure past 1,000 goes unbounded). The clamp exists only in clamp_peer_consecutive_failures on the deserialization path; both record_connection_failure and record_failure_with_penalty use bare saturating_add(1). A one-liner .min(MAX_CONSECUTIVE_FAILURES) at each site closes the gap."
✓ Kept: "record_failure_with_penalty overwrites last_tried already set by record_connection_attempt, making semantics inconsistent" (suggestion, medium confidence) — "Impact: Low-Medium (semantic inconsistency — failure path records failure time rather than dial-initiation time, off by milliseconds), Likelihood: Certain (every failure goes through both calls). The issue matters once exponential backoff in Capability-aware peer selection scoring #104/Capability-driven churn in network maintenance loop #105 consumes last_tried; removing the redundant assignment from record_failure_with_penalty is a clean fix."
✓ Kept: "Both record_connection_failure and record_failure_with_penalty are public, creating a double-count footgun" (suggestion, medium confidence) — "Impact: Medium (double-increment of consecutive_failures silently corrupts the backoff streak used by Capability-aware peer selection scoring #104), Likelihood: Possible (requires a future caller mistake, but the current public API actively invites it). Downgrading to pub(crate) or adding a prominent doc-comment warning would contain the risk."
✓ Kept: "record_failure_with_penalty inlines record_connection_failure fields without calling it, creating silent divergence risk" (suggestion, high confidence) — "Impact: Medium (if record_connection_failure gains a new field in a follow-up PR, record_failure_with_penalty will silently diverge despite its doc-comment guarantee), Likelihood: Possible (likely given planned Capability-aware peer management: churn and smarter selection #102 follow-on work). Extracting a private apply_failure_fields helper would enforce the equivalence structurally."
✓ Kept: "apply_score_change accepts peer: SocketAddr solely for logging, polluting a pure mutation function" (nit, high confidence) — "Impact: Low (works correctly, minor design impurity), Likelihood: N/A. Passing a Display impl or moving the log call to the caller would be cleaner but does not affect correctness."
✓ Kept: "record_failure_with_penalty tests never assert last_success is preserved" (suggestion, high confidence) — "Impact: Medium (a regression that accidentally clears last_success inside the combined method would go undetected), Likelihood: Certain (no test exercises this invariant). The analogous check exists for record_connection_failure — parity is a single assert away."
✓ Kept: "No test for clamp_peer_consecutive_failures actually clamping on deserialization" (suggestion, high confidence) — "Impact: Medium (adversarial or corrupted reputations.json with a large value would silently bypass the clamp if the code regresses), Likelihood: Possible. A direct serde_json::from_str round-trip with a value above MAX_CONSECUTIVE_FAILURES would be a one-shot guard."
✓ Kept: "mark_seen eviction path (new-entry branch) has no test" (suggestion, medium confidence) — "Impact: Medium (without a capacity test, an off-by-one in evict_if_needed or a missing call would silently allow unbounded map growth), Likelihood: Possible. Pre-filling to MAX_ADDR_TO_STORE and asserting len() <= MAX_ADDR_TO_STORE after a new insert is straightforward."
✓ Kept: "mark_seen tests cover only IPv4; no IPv6 coverage for the new code path" (nit, medium confidence) — "Impact: Low (IPv6 is handled by make_addr_message which is separately exercised, so risk of regression is low), Likelihood: Unlikely. A single [::1]:9999 test would close the gap but is not urgent."

Timing:

Parse: 1.4s
Review agents: 342.4s
Judge: 143.9s
Total: 487.7s

Superseded by new review

xdustinface · 2026-04-17T11:18:01Z

@manki review

manki-review

Deserialization clamping, missing last_tried update, and test coverage for record_failure_with_penalty are all addressed. The standout new concern is that consecutive_failures is bounded on the deserialization path but not at the two write sites — in-memory values will silently exceed 1,000 and snap back on restart, corrupting the backoff streak that #104 depends on.

📊 9 findings (7 suggestion, 2 nit) · 460 lines · 488s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 487686,
  "diffLines": 460,
  "diffAdditions": 412,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Correctness & Logic",
    "Security & Safety",
    "Architecture & Design",
    "Testing & Coverage"
  ],
  "findingsRaw": 12,
  "findingsKept": 9,
  "findingsDropped": 3,
  "severity": {
    "required": 0,
    "suggestion": 7,
    "nit": 2
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "dcb69ffe9a7bf03de1660e4b58bd96428d94e0a7",
  "agentMetrics": [
    {
      "name": "Correctness & Logic",
      "findingsRaw": 2,
      "findingsKept": 1,
      "responseLength": 2429
    },
    {
      "name": "Security & Safety",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 3939
    },
    {
      "name": "Architecture & Design",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 4000
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 4,
      "findingsKept": 4,
      "responseLength": 4854
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 5,
      "medium": 4,
      "low": 0
    },
    "severityChanges": 9,
    "mergedDuplicates": 0
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/reputation.rs": 5,
      "dash-spv/src/network/reputation_tests.rs": 2,
      "dash-spv/src/network/addrv2.rs": 2
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

…ailure-field updates Extract private `record_failure_fields` that applies `last_tried = now` and `consecutive_failures.saturating_add(1).min(MAX_CONSECUTIVE_FAILURES)`. Both `record_connection_failure` and `record_failure_with_penalty` now delegate to it, eliminating duplicated mutations and capping the in-memory streak at the same 1000 limit enforced by the deserializer.

…tion, \`mark_seen\` eviction - Assert `last_success` is unchanged after `record_failure_with_penalty` - Deserialize a `PeerReputation` with `consecutive_failures: 99999` and assert it clamps to `MAX_CONSECUTIVE_FAILURES` - Fill `AddrV2Handler` to capacity and assert `mark_seen` stays bounded and includes the new entry

manki-review · 2026-04-17T11:35:26Z

Manki — Review complete

Planner (26s)
feature · 506 lines · 4 agents
review effort: medium · judge effort: medium

Review — 12 findings
    ✅ Security & Safety — 2 (264s)
    ✅ Correctness & Logic — 4 (257s)
    ✅ Testing & Coverage — 4 (140s)
    ✅ Architecture & Design — 2 (136s)

Judge — 6 kept · 0 dropped (130s)
kept: 4 suggestion · 2 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 506 lines)
Team: Security & Safety, Correctness & Logic, Testing & Coverage, Architecture & Design
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "record_failure_with_penalty overwrites last_tried already set by record_connection_attempt" (suggestion, high confidence) — "Merged findings 1, 3, and 11 — three independent reviewers flagged the same issue. Every failure path overwrites the attempt timestamp with the failure-detection timestamp, making last_tried semantically inconsistent: on success it holds attempt time, on failure it holds failure time. Impact: Medium (incorrect cooldown anchor for backoff consumers in Capability-aware peer management: churn and smarter selection #102), Likelihood: Certain (happens on every attempt → failure sequence). Three-reviewer consensus keeps this at suggestion."
✓ Kept: "Both record_connection_failure and record_failure_with_penalty are public, creating a double-count footgun" (suggestion, high confidence) — "Merged findings 2, 4, and 12 — three independent reviewers flagged the same issue. Any caller invoking both for the same event silently doubles consecutive_failures, corrupting backoff logic. Current manager.rs call sites are correct, but the public API surface provides no guardrail. Impact: Medium (corrupted streak count), Likelihood: Possible (future callers, especially given the doc comment's misleading framing of the two as equivalent). Three-reviewer consensus maintains suggestion."
✓ Kept: "Flaky eviction test: new address may be discarded when all timestamps are identical" (suggestion, high confidence) — "Merged findings 5 and 7 — two independent reviewers flagged the same race. All pre-filled entries and the mark_seen entry can share the same second-granularity timestamp, making eviction order non-deterministic and the assertion intermittently false. Impact: High (flaky CI blocker that can mask real regressions), Likelihood: Probable in fast CI environments. Two-reviewer consensus and real flakiness risk."
✓ Kept: "No runtime saturation test for consecutive_failures ceiling" (suggestion, medium confidence) — "The deserialization clamp is tested, but the runtime write path (saturating_add(1).min(MAX_CONSECUTIVE_FAILURES)) is never exercised past MAX. Removing the .min() call would silently break the cap with no test catching it. Acceptance criteria requires comprehensive coverage. Impact: Medium, Likelihood: Certain (the gap is structural)."
✓ Kept: "test_record_successful_connection does not assert last_tried is preserved" (nit, medium confidence) — "Only one reviewer flagged this. The omission is a minor coverage gap; record_successful_connection is simple enough that accidentally writing last_tried is unlikely. Impact: Low, Likelihood: Unlikely."
✓ Kept: "record_connection_attempt repeated-call behavior untested: last_tried advancement not verified" (nit, medium confidence) — "Only one reviewer flagged this. The failure path already has an equivalent test (test_record_connection_failure_always_updates_last_tried), so the gap is narrow. Impact: Low, Likelihood: Unlikely."

Timing:

Parse: 1.3s
Review agents: 293.3s
Judge: 130.0s
Total: 424.5s

Superseded by new review

…nalty` A value of 0 is a deliberate no-op for the reputation score but still records the failure counter and timestamp, which is useful for failures that should be tracked without contributing toward a ban.

manki-review · 2026-04-17T12:40:32Z

Manki — Review complete

Planner (25s)
feature · 591 lines · 3 agents
review effort: medium · judge effort: medium

Review — 8 findings
    ✅ Security & Safety — 2 (228s)
    ✅ Correctness & Logic — 3 (318s)
    ✅ Testing & Coverage — 3 (132s)

Judge — 4 kept · 0 dropped (95s)
kept: 2 suggestion · 2 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 591 lines)
Team: Security & Safety, Correctness & Logic, Testing & Coverage
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "clamp_future_system_time: unchecked subtraction panics on broken system clocks" (suggestion, high confidence) — "Merged findings 1 and 2 — same issue. Impact: Medium (panic in the deserialization/load path). Likelihood: Unlikely in production but Possible in CI with mocked or near-epoch clocks. Two independent reviewers flagged it; the fix (checked_sub with UNIX_EPOCH fallback) is one line and eliminates the panic path entirely. Per calibration guidance, borderline nit/suggestion goes up."
✓ Kept: "normalize_after_load has no end-to-end test through load_from_storage" (suggestion, high confidence) — "Merged findings 3 and 5 — same issue. Impact: Medium (peers can carry an incorrect consecutive_failures count into backoff decisions if last_tried is clamped away on load). Likelihood: Possible (requires a persisted record with a stale or far-future last_tried and non-zero consecutive_failures). Two independent reviewers flagged it; the logic exists but is never exercised by the test suite through the actual load path."
✓ Kept: "record_failure_with_penalty always emits a ReputationEvent even when score_change is 0" (nit, medium confidence) — "Impact: Low (spurious zero-change entries in the event log, misleading to log consumers). Likelihood: Probable for callers that pass score_change=0. One reviewer flagged it; the semantics are confusing but not harmful to correctness of backoff or selection logic."
✓ Kept: "Timing-sensitive assertions rely on 5 ms sleep for SystemTime ordering" (nit, medium confidence) — "Impact: Low (sporadic CI test failures). Likelihood: Possible on coarse-clock platforms like Windows (~15 ms tick) or under heavy load. One reviewer flagged it; a 20 ms sleep or a spin-until helper would make the ordering robust at negligible cost."

Timing:

Parse: 2.5s
Review agents: 346.5s
Judge: 94.5s
Total: 443.5s

manki-review

normalize_after_load correctly patches the last_tried/consecutive_failures invariant, but the unchecked subtraction panic in clamp_future_system_time and the missing persistence round-trip test for normalize_after_load both survive from the previous review. New nits on spurious zero-delta events and flaky timing assertions.

📊 4 findings (2 suggestion, 2 nit) · 591 lines · 444s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 443541,
  "diffLines": 591,
  "diffAdditions": 543,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Correctness & Logic",
    "Testing & Coverage"
  ],
  "findingsRaw": 8,
  "findingsKept": 4,
  "findingsDropped": 4,
  "severity": {
    "required": 0,
    "suggestion": 2,
    "nit": 2
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "90bd2fd2bda76e6402f4b55d4f322e46c8868060",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 2,
      "findingsKept": 0,
      "responseLength": 2133
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 3,
      "findingsKept": 3,
      "responseLength": 2774
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 4243
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 2,
      "medium": 2,
      "low": 0
    },
    "severityChanges": 4,
    "mergedDuplicates": 2
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/reputation.rs": 2,
      "dash-spv/src/network/reputation_tests.rs": 2
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

…n broken clocks

manki-review · 2026-04-17T12:54:09Z

Manki — Review complete

Planner (29s)
feature · 626 lines · 3 agents
review effort: medium · judge effort: medium

Review — 7 findings
    ✅ Security & Safety — 2 (153s)
    ✅ Correctness & Logic — 1 (261s)
    ✅ Testing & Coverage — 4 (126s)

Judge — 5 kept · 0 dropped (66s)
kept: 2 suggestion · 3 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 626 lines)
Team: Security & Safety, Correctness & Logic, Testing & Coverage
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "clamp_future_system_time: unchecked addition on now + FUTURE_TIMESTAMP_TOLERANCE can panic" (suggestion, high confidence) — "Impact: High (process crash from panic in deserialization), Likelihood: Unlikely under normal conditions but possible with broken RTC or adversarial NTP. The asymmetry with the adjacent checked_sub on the floor line makes this a clear oversight, and two reviewers independently flagged it. Fix is one-line with no tradeoffs. Merged findings 1 and 3 — same line, same root cause."
✓ Kept: "mark_seen: silent u32 truncation on timestamp after year 2106" (nit, high confidence) — "Impact: Medium (peers appear maximally stale), Likelihood: Unlikely (won't manifest for 80 years). The pattern already exists in unchanged sibling code paths, reducing urgency further."
✓ Kept: "normalize_after_load storage round-trip only covers future-timestamp path, not stale-timestamp path" (suggestion, high confidence) — "Impact: Medium (a bug in the floor comparison or normalize guard would not be caught), Likelihood: Certain (the code path is definitively untested). The new test is a meaningful improvement but the acceptance criterion requires comprehensive coverage of each field transition — the stale path is a distinct branch with its own consecutive_failures reset obligation."
✓ Kept: "test_consecutive_failures_saturates_at_runtime causes unintended ban as a side-effect" (nit, high confidence) — "Impact: Low (test clarity only, assertion still verifies saturation correctly), Likelihood: Certain. Using score_change=0 would make the test's purpose unambiguous at no cost."
✓ Kept: "New peer_services() getter in handshake.rs has no unit test" (nit, medium confidence) — "Impact: Low (trivial accessor, regression would surface quickly in integration tests), Likelihood: Possible. Useful to have but not blocking."

Timing:

Parse: 1.8s
Review agents: 293.5s
Judge: 66.1s
Total: 361.4s

manki-review

The unchecked subtraction flagged last review is gone — floor now uses checked_sub — but an analogous unchecked addition for the ceiling slipped in at the same line, and the storage round-trip test added for normalize_after_load only exercises the future-timestamp path, leaving the stale (>30 day) path untested.

📊 5 findings (2 suggestion, 3 nit) · 626 lines · 361s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 361388,
  "diffLines": 626,
  "diffAdditions": 578,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Correctness & Logic",
    "Testing & Coverage"
  ],
  "findingsRaw": 7,
  "findingsKept": 5,
  "findingsDropped": 2,
  "severity": {
    "required": 0,
    "suggestion": 2,
    "nit": 3
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "a2c4a705ce8fc80938056d4a8b66f5f0ce3abef3",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 2,
      "findingsKept": 1,
      "responseLength": 1870
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 1,
      "findingsKept": 1,
      "responseLength": 793
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 4,
      "findingsKept": 3,
      "responseLength": 4216
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 4,
      "medium": 1,
      "low": 0
    },
    "severityChanges": 5,
    "mergedDuplicates": 1
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/reputation.rs": 1,
      "dash-spv/src/network/addrv2.rs": 1,
      "dash-spv/src/network/reputation_tests.rs": 2,
      "dash-spv/src/network/handshake.rs": 1
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

…n far-future clocks

…-trip

manki-review · 2026-04-17T13:03:06Z

Manki — Review complete

Planner (28s)
feature · 655 lines · 3 agents
review effort: medium · judge effort: high

Review — 7 findings
    ✅ Security & Safety — 1 (180s)
    ✅ Correctness & Logic — 3 (490s)
    ✅ Testing & Coverage — 3 (124s)

Judge — 5 kept · 0 dropped (82s)
kept: 2 suggestion · 3 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 655 lines)
Team: Security & Safety, Correctness & Logic, Testing & Coverage
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "mark_seen silently preserves stale gossip services for existing entries" (suggestion, medium confidence) — "Impact: Medium-High (capability-aware selection in Capability-aware peer selection scoring #104 will make wrong decisions if service flags are stale), Likelihood: Possible (only manifests after a peer downgrades its capabilities). The comment explicitly documents the behavior as intentional, and the Track and persist session peer outcomes #103 acceptance criteria only requires bumping AddrV2.time — not updating services. However, calling this a 'direct observation beats gossip' function while not propagating the directly-observed services is an internal inconsistency that will silently corrupt the data Capability-aware peer selection scoring #104 depends on. Keeping as suggestion since it's one reviewer, behavior is documented, and it only matters at Capability-aware peer selection scoring #104 integration time."
✓ Kept: "test_consecutive_failures_saturates_at_runtime conflates saturation with banning" (nit, high confidence) — "Impact: Low (test-correctness confusion, not production code), Likelihood: Certain (the misbehavior score accumulates to MAX and bans the peer partway through the loop). The assertion still passes, so this is a test clarity issue, not a correctness bug."
✓ Kept: "test_mark_seen_evicts_when_at_capacity has implicit assumption on MAX_ADDR_TO_STORE bound" (nit, high confidence) — "Impact: Low (test fragility, not production), Likelihood: Unlikely (would only fail if MAX_ADDR_TO_STORE exceeded ~302,401, an implausible value). Fragile assumption but not a present-day risk."
✓ Kept: "normalize_after_load positive branch (consecutive_failures preserved when last_tried valid) has no test" (suggestion, high confidence) — "Impact: Medium (a regression that unconditionally zeroes consecutive_failures on load would go undetected, silently breaking the cooldown logic Capability-aware peer management: churn and smarter selection #102 depends on), Likelihood: Certain (the gap exists and will remain until a test is added). The two existing round-trip tests only exercise the reset branch. The suggested test is clear and self-contained."
✓ Kept: "test_consecutive_failures_saturates_at_runtime spins 1001 async lock acquisitions to test saturation" (nit, high confidence) — "Impact: Low (test suite slowdown, no production effect), Likelihood: Certain. Directionally valid but the overhead is acceptable for a single async test; not worth restructuring unless the full suite shows measurable impact."

Timing:

Parse: 2.0s
Review agents: 519.9s
Judge: 82.4s
Total: 604.4s

manki-review

Acceptance criteria are substantially met — last_tried/last_success/consecutive_failures are tracked and wired in. Two gaps remain: mark_seen preserves stale gossip service flags instead of updating them from the direct handshake observation, and the normalize_after_load path that preserves consecutive_failures on a valid last_tried has no test, leaving a silent regression risk.

📊 5 findings (2 suggestion, 3 nit) · 655 lines · 604s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 604383,
  "diffLines": 655,
  "diffAdditions": 607,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Correctness & Logic",
    "Testing & Coverage"
  ],
  "findingsRaw": 7,
  "findingsKept": 5,
  "findingsDropped": 2,
  "severity": {
    "required": 0,
    "suggestion": 2,
    "nit": 3
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "71d37487cc00d92f1cece6854e5c939235741981",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 1,
      "findingsKept": 0,
      "responseLength": 1060
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 3,
      "findingsKept": 3,
      "responseLength": 3061
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 4549
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 4,
      "medium": 1,
      "low": 0
    },
    "severityChanges": 5,
    "mergedDuplicates": 0
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/addrv2.rs": 2,
      "dash-spv/src/network/reputation_tests.rs": 3
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

…_time` Remove `TIMESTAMP_MAX_AGE` and the `floor` computation that rejected timestamps older than 30 days. The future-timestamp guard (10-second tolerance) is the only meaningful constraint. Update the `normalize_after_load` doc comment to drop the "stale" reference.

…ture_system_time` coverage Delete `test_normalize_after_load_via_storage_round_trip_stale` (tested the removed stale-floor path). Merge the three `test_clamp_future_system_time_*` tests into a single `test_clamp_future_system_time` covering future rejection and recent-past acceptance.

manki-review · 2026-04-17T13:19:08Z

Manki — Review complete

Planner (21s)
feature · 615 lines · 3 agents
review effort: medium · judge effort: medium

Review — 8 findings
    ✅ Correctness & Logic — 2 (811s)
    ✅ Security & Safety — 2 (137s)
    ✅ Testing & Coverage — 4 (84s)

Judge — 5 kept · 0 dropped (94s)
kept: 3 suggestion · 2 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 615 lines)
Team: Correctness & Logic, Security & Safety, Testing & Coverage
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "mark_seen silently preserves stale gossip services for existing entries" (suggestion, high confidence) — "Merged findings 1 and 2 — same issue. The behavior is intentional and documented in the docstring ('Existing services on a known entry are preserved'), but the docstring's justification (gossip aggregates more capabilities than the handshake version message) is not stated inline. Impact: Medium (capability-aware selection in follow-on issues Capability-aware peer selection scoring #104/Capability-driven churn in network maintenance loop #105 will read stale service flags as authoritative direct-observation data), Likelihood: Possible (only matters once selection logic uses services). The docstring partially mitigates the risk, but a brief inline comment at the match arm would prevent future editors from 'fixing' this as a bug when wiring up capability-aware churn."
✓ Kept: "normalize_after_load positive branch (consecutive_failures preserved when last_tried valid) has no test" (suggestion, high confidence) — "Merged findings 3 and 4 — same gap, flagged by two independent reviewers. The only normalize_after_load round-trip test exercises the discard/reset branch (future timestamp). The preserve branch — where a valid past last_tried should leave consecutive_failures intact — is uncovered. The acceptance criteria explicitly states 'Comprehensive test coverage is a requirement', and the backoff logic in Capability-aware peer management: churn and smarter selection #102 depends on this invariant surviving load. Impact: High (broken backoff silently re-hammers peers after a restart), Likelihood: Possible (any refactor of normalize_after_load). Two reviewers agreeing on a gap against an explicit AC requirement keeps this at suggestion rather than nit."
✓ Kept: "record_failure_with_penalty(score_change=0) — the no-penalty tracking path — has no test" (suggestion, medium confidence) — "The doc comment defines score_change=0 as an explicit, distinct mode. Every test in the PR uses a positive score_change. A regression that accidentally applies a penalty or skips the consecutive_failures/last_tried update in this branch would pass all current tests. Impact: Medium (incorrect penalty or missed tracking silently corrupts peer scoring), Likelihood: Possible (distinct code path with no test guard). The acceptance criteria also requires comprehensive coverage of each field transition."
✓ Kept: "test_consecutive_failures_saturates_at_runtime loops 1001 async write-lock acquisitions" (nit, medium confidence) — "Correctness is fine; the test verifies saturation at MAX_CONSECUTIVE_FAILURES. The 1001-iteration loop adds unnecessary async overhead with no additional coverage benefit. A targeted seed + two calls would test the same boundary. Impact: Low (test suite slowness only), Likelihood: Certain."
✓ Kept: "test_clamp_future_system_time does not cover the tolerance boundary (timestamps ~10 s ahead)" (nit, medium confidence) — "The test covers 1-hour-future (rejected) and 10-second-past (accepted) but misses the boundary region around FUTURE_TIMESTAMP_TOLERANCE. A change to the constant or the comparison direction would not be caught. Impact: Low (test coverage gap only; production behavior is unaffected by missing test), Likelihood: Unlikely."

Timing:

Parse: 1.9s
Review agents: 835.3s
Judge: 94.0s
Total: 931.2s

Since round 1 the caller passes the actual handshake-negotiated services, so preserving the gossip-sourced value was inverted. The handshake-observed value is authoritative and is now written on both new and existing entries. Rename \`test_mark_seen_bumps_time_and_preserves_services\` to \`test_mark_seen_bumps_time_and_updates_services\` and update its assertion to expect the handshake services, not the original gossip services.

Add \`test_normalize_after_load_preserves_failures_when_last_tried_valid\` to assert that a valid (non-future) \`last_tried\` and a non-zero \`consecutive_failures\` are both preserved through the load round-trip, complementing the existing reset-path test.

manki-review

Both previously open threads remain unaddressed: mark_seen still preserves existing services without explanation, and the normalize_after_load positive branch (valid last_tried preserving consecutive_failures) still has no test. The acceptance criteria explicitly requires comprehensive test coverage for each new field transition — that bar is not yet met.

📊 5 findings (3 suggestion, 2 nit) · 615 lines · 931s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 931175,
  "diffLines": 615,
  "diffAdditions": 567,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Correctness & Logic",
    "Security & Safety",
    "Testing & Coverage"
  ],
  "findingsRaw": 8,
  "findingsKept": 5,
  "findingsDropped": 3,
  "severity": {
    "required": 0,
    "suggestion": 3,
    "nit": 2
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "fe827b182bb4e68ef88d8defc36732a4ab95d3f9",
  "agentMetrics": [
    {
      "name": "Correctness & Logic",
      "findingsRaw": 2,
      "findingsKept": 1,
      "responseLength": 3682
    },
    {
      "name": "Security & Safety",
      "findingsRaw": 2,
      "findingsKept": 2,
      "responseLength": 1774
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 4,
      "findingsKept": 4,
      "responseLength": 6153
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 2,
      "medium": 3,
      "low": 0
    },
    "severityChanges": 5,
    "mergedDuplicates": 2
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/addrv2.rs": 1,
      "dash-spv/src/network/reputation_tests.rs": 4
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

manki-review · 2026-04-17T13:34:55Z

Manki — Review complete

Planner (28s)
feature · 651 lines · 4 agents
review effort: medium · judge effort: medium

Review — 9 findings
    ✅ Security & Safety — 1 (152s)
    ✅ Correctness & Logic — 2 (257s)
    ✅ Testing & Coverage — 3 (131s)
    ✅ Architecture & Design — 3 (83s)

Judge — 4 kept · 0 dropped (90s)
kept: 1 required · 1 suggestion · 2 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 651 lines)
Team: Security & Safety, Correctness & Logic, Testing & Coverage, Architecture & Design
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "manager.rs lifecycle wiring is not unit-tested" (required, high confidence) — "The acceptance criteria for issue Track and persist session peer outcomes #103 explicitly state 'Unit tests must cover each new field transition (set on attempt, set on success, increment on failure, reset on success).' The manager.rs wiring — including the peer_services().unwrap_or(ServiceFlags::NETWORK) fallback at line 270 — is not exercised by any unit test added in this PR. If the fallback fires silently after a successful handshake, negotiated capabilities would be discarded without detection. Impact: High (silent capability data loss), Likelihood: Probable (the fallback is a live code path on every success) → unmet acceptance criterion elevates this to required."
✓ Kept: "test_consecutive_failures_saturates_at_runtime runs 1001 async lock acquisitions" (nit, high confidence) — "Correct observation — 900+ iterations run in an already-banned state with no additional behavioral coverage — but it's a test efficiency concern, not a correctness issue. Impact: Low (slow CI), Likelihood: Certain but inconsequential → nit."
✓ Kept: "record_failure_fields misplaced on manager instead of PeerReputation" (suggestion, medium confidence) — "The inconsistency with apply_decay and normalize_after_load (both on PeerReputation) is real and weakens cohesion, but this is a design/maintainability concern rather than a correctness or safety issue. Only one reviewer flagged it. Impact: Medium (maintainability), Likelihood: N/A → suggestion."
✓ Kept: "apply_score_change accepts peer: SocketAddr solely for tracing, mixing mutation and observability" (nit, high confidence) — "The concern about mixing mutation and observability is valid style feedback, but it doesn't affect correctness or safety. Only one reviewer flagged it. Impact: Low (design cleanliness) → nit."

Timing:

Parse: 2.8s
Review agents: 287.4s
Judge: 90.3s
Total: 380.5s

manki-review

Substantial new test coverage in reputation_tests.rs (+334 lines) likely addresses the two open test-coverage threads. The manager.rs lifecycle wiring — specifically that handshake-negotiated services are stored (not silently defaulted to NETWORK) and that consecutive_failures resets on success — remains untested at the unit level, which the issue's acceptance criteria explicitly require.

📊 4 findings (1 required, 1 suggestion, 2 nit) · 651 lines · 380s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 380457,
  "diffLines": 651,
  "diffAdditions": 603,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Correctness & Logic",
    "Testing & Coverage",
    "Architecture & Design"
  ],
  "findingsRaw": 9,
  "findingsKept": 4,
  "findingsDropped": 5,
  "severity": {
    "required": 1,
    "suggestion": 1,
    "nit": 2
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "1f9661254e00ca78432d49d692bacaac4778dea1",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 1,
      "findingsKept": 0,
      "responseLength": 1516
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 2,
      "findingsKept": 0,
      "responseLength": 2411
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 4221
    },
    {
      "name": "Architecture & Design",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 3345
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 3,
      "medium": 1,
      "low": 0
    },
    "severityChanges": 4,
    "mergedDuplicates": 0
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/manager.rs": 1,
      "dash-spv/src/network/reputation_tests.rs": 1,
      "dash-spv/src/network/reputation.rs": 2
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

manki-review · 2026-04-17T13:47:03Z

Manki — Review complete

Planner (27s)
feature · 651 lines · 4 agents
review effort: medium · judge effort: high

Review — 13 findings
    ✅ Security & Safety — 3 (236s)
    ✅ Correctness & Logic — 3 (292s)
    ✅ Testing & Coverage — 4 (222s)
    ✅ Architecture & Design — 3 (87s)

Judge — 2 kept · 0 dropped (86s)
kept: 1 required · 1 suggestion

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 651 lines)
Team: Security & Safety, Correctness & Logic, Testing & Coverage, Architecture & Design
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "Manager lifecycle wiring is not unit-tested" (required, high confidence) — "Merged findings 1, 2, 3, and 4 — all four reviewers independently identified the same gap. The new hooks in connect_to_peer (peer_services capture, mark_seen call, both failure paths through record_failure_with_penalty) are exercised only by dashd integration tests. The acceptance criteria explicitly states 'Comprehensive test coverage is a requirement. Unit tests must cover each new field transition' and this is a carry-over open thread (PRRT_kwDOQSlaXs57v0u9). Impact: High (silent regression in services propagation or failure-path routing breaks backoff and selection correctness for downstream Capability-aware peer selection scoring #104/Capability-driven churn in network maintenance loop #105), Likelihood: Probable (any refactor of handshake or addrv2 interfaces could silently break the wiring) → required."
✓ Kept: "apply_score_change operates on PeerReputation data but lives on PeerReputationManager" (suggestion, medium confidence) — "The method takes rep: &mut PeerReputation as its primary operand — all mutation is on that struct — but is defined on the manager. The previous thread (PRRT_kwDOQSlaXs57v0vJ) flagged the same issue for record_failure_fields; this PR addressed some of that refactor but left apply_score_change with the same design inconsistency. Impact: Medium (reduces cohesion and makes PeerReputation harder to test in isolation), Likelihood: N/A (design issue, not a runtime bug) → suggestion."

Timing:

Parse: 2.3s
Review agents: 322.4s
Judge: 85.6s
Total: 410.4s

manki-review

reputation.rs now has thorough unit coverage for each field transition (+334 lines in reputation_tests.rs), satisfying most acceptance criteria. The gap flagged in the previous review persists: the orchestration in connect_to_peer — capturing peer_services, routing to mark_seen, and switching both failure paths to record_failure_with_penalty — has no unit test, only integration coverage against dashd.

📊 2 findings (1 required, 1 suggestion) · 651 lines · 410s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 410373,
  "diffLines": 651,
  "diffAdditions": 603,
  "diffDeletions": 48,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Correctness & Logic",
    "Testing & Coverage",
    "Architecture & Design"
  ],
  "findingsRaw": 13,
  "findingsKept": 2,
  "findingsDropped": 11,
  "severity": {
    "required": 1,
    "suggestion": 1,
    "nit": 0
  },
  "verdict": "REQUEST_CHANGES",
  "prNumber": 106,
  "commitSha": "0d2b54bad6fc2042b6014193dc77500fefe0d98e",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 3,
      "findingsKept": 1,
      "responseLength": 3255
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 3,
      "findingsKept": 1,
      "responseLength": 3323
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 4,
      "findingsKept": 0,
      "responseLength": 4533
    },
    {
      "name": "Architecture & Design",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 3618
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 1,
      "medium": 1,
      "low": 0
    },
    "severityChanges": 2,
    "mergedDuplicates": 3
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/manager.rs": 1,
      "dash-spv/src/network/reputation.rs": 1
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

manki-review · 2026-04-17T14:03:24Z

Manki — Review complete

Planner
675 lines · 0 agents

Review — 12 findings

Judge — 5 kept · 0 dropped (103s)
kept: 2 suggestion · 3 nit

Review metadata

Config:

Models: reviewer=claude-sonnet-4-6, judge=claude-sonnet-4-6
Review level: medium (auto, 675 lines)
Team: Security & Safety, Architecture & Design, Correctness & Logic, Testing & Coverage, Performance & Efficiency
Memory: disabled
Nit handling: issues

Judge decisions:

✓ Kept: "peer_services fallback silently misrepresents peer capabilities in AddrV2 store" (suggestion, medium confidence) — "Impact: Medium (downstream capability-aware selection sees incorrect service flags, directly undermining Capability-aware peer selection scoring #104's selection logic), Likelihood: Possible (requires a latent bug in the handshake path, but that's exactly the kind of invariant a debug_assert would catch in CI). The fix is a one-liner and the payoff — surfacing the invariant violation in testing rather than silently poisoning the peer store — is high relative to the cost."
✓ Kept: "apply_score_change takes SocketAddr solely for logging, coupling data and display" (nit, high confidence) — "Impact: Low (design coupling, no correctness consequence), Likelihood: N/A. The method is on PeerReputation (takes &mut self), so the architectural separation is already better than the previous review noted. The logging-parameter coupling is a style concern only."
✓ Kept: "add_known_address unconditionally overwrites mark_seen data" (suggestion, medium confidence) — "Impact: Medium (handshake-verified service flags and timestamp replaced by gossip data if call ordering is wrong), Likelihood: Possible (depends on call sequence, but the public API permits it and the two methods have semantically inconsistent behavior for existing entries). The acceptance criterion explicitly says 'direct observation beats gossip', making this a latent footgun against the stated design intent."
✓ Kept: "normalize_after_load called before corruption guard" (nit, high confidence) — "Impact: Low (unnecessary work on discarded entries only), Likelihood: Unlikely (corrupted entries are rare). Purely an ordering/readability concern with no correctness consequence."
✓ Kept: "Double hash lookup in mark_seen new-entry path" (nit, high confidence) — "Impact: Low (micro-optimization in a non-hot path — called only on successful handshake), Likelihood: Certain in the None branch but inconsequential. The Entry API suggestion is cleaner but not worth requiring."

Timing:

Parse: 2.0s
Review agents: 407.9s
Judge: 103.1s
Total: 513.0s

manki-review

Acceptance criteria are substantially met — new fields, lifecycle hookups, and 334 lines of new tests land. The most noteworthy new concern is unwrap_or(ServiceFlags::NETWORK) on the handshake success path: if peer_services is ever None post-handshake, capability-aware selection (the next sub-issue) will silently score peers against the wrong service flags with no diagnostic signal.

📊 5 findings (2 suggestion, 3 nit) · 675 lines · 513s

Review stats

{
  "model": "claude-sonnet-4-6",
  "reviewTimeMs": 512954,
  "diffLines": 675,
  "diffAdditions": 613,
  "diffDeletions": 62,
  "filesReviewed": 5,
  "agents": [
    "Security & Safety",
    "Architecture & Design",
    "Correctness & Logic",
    "Testing & Coverage",
    "Performance & Efficiency"
  ],
  "findingsRaw": 12,
  "findingsKept": 5,
  "findingsDropped": 7,
  "severity": {
    "required": 0,
    "suggestion": 2,
    "nit": 3
  },
  "verdict": "APPROVE",
  "prNumber": 106,
  "commitSha": "6d1febfa6a5fec11a0de37ad0ba48a40771c0952",
  "agentMetrics": [
    {
      "name": "Security & Safety",
      "findingsRaw": 3,
      "findingsKept": 2,
      "responseLength": 3102
    },
    {
      "name": "Architecture & Design",
      "findingsRaw": 3,
      "findingsKept": 1,
      "responseLength": 3343
    },
    {
      "name": "Correctness & Logic",
      "findingsRaw": 2,
      "findingsKept": 1,
      "responseLength": 2127
    },
    {
      "name": "Testing & Coverage",
      "findingsRaw": 3,
      "findingsKept": 0,
      "responseLength": 3364
    },
    {
      "name": "Performance & Efficiency",
      "findingsRaw": 1,
      "findingsKept": 1,
      "responseLength": 919
    }
  ],
  "judgeMetrics": {
    "confidenceDistribution": {
      "high": 3,
      "medium": 2,
      "low": 0
    },
    "severityChanges": 5,
    "mergedDuplicates": 0
  },
  "fileMetrics": {
    "fileTypes": {
      ".rs": 5
    },
    "findingsPerFile": {
      "dash-spv/src/network/manager.rs": 1,
      "dash-spv/src/network/reputation.rs": 2,
      "dash-spv/src/network/addrv2.rs": 2
    }
  },
  "reviewerModel": "claude-sonnet-4-6",
  "judgeModel": "claude-sonnet-4-6"
}

* feat: add session outcome fields to `PeerReputation` Adds `last_success`, `last_tried`, and `consecutive_failures` to `PeerReputation`, plus a `record_connection_failure` method on `PeerReputationManager`. All new fields use `#[serde(default)]` so existing `reputations.json` files load without migration. `record_connection_attempt` now sets `last_tried` and `record_successful_connection` sets `last_success` and resets `consecutive_failures`. * test: cover `PeerReputation` session outcome transitions Adds unit tests for default values, `last_tried` on attempt, `last_success` plus `consecutive_failures` reset on success, failure streak increment preserving `last_success`, and legacy `reputations.json` decoding with missing fields. * feat: bump `AddrV2.time` on successful handshake Adds `AddrV2Handler::mark_seen` to refresh the stored timestamp for a directly observed peer, preserving existing services for known entries and inserting a fresh entry otherwise. `connect_to_peer` now calls `mark_seen` after a successful handshake so the `peers.dat` time reflects first-hand observation instead of gossip. * feat: track peer connection outcomes in network manager Calls `record_connection_failure` on both the TCP connect failure and the handshake failure paths in `PeerNetworkManager::connect_to_peer`, so the `consecutive_failures` streak reflects every unsuccessful attempt. * refactor: drop backward-compat shims from `PeerReputation` Removes `#[serde(default)]` on the new session outcome fields and the legacy-JSON load test. Backward compatibility of `reputations.json` across versions is no longer a requirement, so the shims and test are dead weight. * refactor: thread peer-advertised services into `mark_seen` * docs: clarify `last_connection` vs `last_tried` on `PeerReputation` * refactor: extract `make_addr_message` helper in `addrv2` * refactor: defensively set `last_tried` in `record_connection_failure` * refactor: add atomic `record_failure_with_penalty` and use at failure sites * chore: apply `cargo fmt` * refactor: clamp `consecutive_failures` on deserialization * refactor: extract `apply_score_change` and always update `last_tried` on failure * test: cover `record_failure_with_penalty` directly * refactor: clamp \`consecutive_failures\` at runtime and consolidate failure-field updates Extract private `record_failure_fields` that applies `last_tried = now` and `consecutive_failures.saturating_add(1).min(MAX_CONSECUTIVE_FAILURES)`. Both `record_connection_failure` and `record_failure_with_penalty` now delegate to it, eliminating duplicated mutations and capping the in-memory streak at the same 1000 limit enforced by the deserializer. * test: cover \`consecutive_failures\` clamp, \`last_success\` preservation, \`mark_seen\` eviction - Assert `last_success` is unchanged after `record_failure_with_penalty` - Deserialize a `PeerReputation` with `consecutive_failures: 99999` and assert it clamps to `MAX_CONSECUTIVE_FAILURES` - Fill `AddrV2Handler` to capacity and assert `mark_seen` stays bounded and includes the new entry * refactor: remove unused `record_connection_failure` * test: stabilize eviction test and cover runtime `consecutive_failures` saturation * refactor: document and assert non-negative contract on `record_failure_with_penalty` * test: cover `last_tried` preservation on success and update on failure * fix: clamp negative `score_change` in `record_failure_with_penalty` * test: cover happy-path attempt to success lifecycle * refactor: tighten `clamp_future_system_time` bounds and enforce load invariants Add a 30-day lower bound to `clamp_future_system_time` so stale or corrupted timestamps (including epoch 0) are discarded on load, in addition to future ones. Add `PeerReputation::normalize_after_load` and call it from the storage load path. It resets `consecutive_failures` to 0 whenever `last_tried` is `None`, preventing the inconsistent state where a non-zero failure streak has no temporal anchor. * test: cover `clamp_future_system_time` edge cases Add three tests: future timestamp rejected, epoch-zero rejected (exercising the new lower bound), and recent-past timestamp preserved. * docs: clarify zero-`score_change` contract on `record_failure_with_penalty` A value of 0 is a deliberate no-op for the reputation score but still records the failure counter and timestamp, which is useful for failures that should be tracked without contributing toward a ban. * fix: use `checked_sub` in `clamp_future_system_time` to avoid panic on broken clocks * test: cover `normalize_after_load` via storage round-trip * fix: use `checked_add` in `clamp_future_system_time` to avoid panic on far-future clocks * test: also cover stale-timestamp path in `normalize_after_load` round-trip * refactor: drop 30-day stale-timestamp floor from `clamp_future_system_time` Remove `TIMESTAMP_MAX_AGE` and the `floor` computation that rejected timestamps older than 30 days. The future-timestamp guard (10-second tolerance) is the only meaningful constraint. Update the `normalize_after_load` doc comment to drop the "stale" reference. * test: remove obsolete stale-timestamp tests and consolidate `clamp_future_system_time` coverage Delete `test_normalize_after_load_via_storage_round_trip_stale` (tested the removed stale-floor path). Merge the three `test_clamp_future_system_time_*` tests into a single `test_clamp_future_system_time` covering future rejection and recent-past acceptance. * fix: \`mark_seen\` now overwrites \`services\` on existing entries Since round 1 the caller passes the actual handshake-negotiated services, so preserving the gossip-sourced value was inverted. The handshake-observed value is authoritative and is now written on both new and existing entries. Rename \`test_mark_seen_bumps_time_and_preserves_services\` to \`test_mark_seen_bumps_time_and_updates_services\` and update its assertion to expect the handshake services, not the original gossip services. * test: cover positive \`normalize_after_load\` branch Add \`test_normalize_after_load_preserves_failures_when_last_tried_valid\` to assert that a valid (non-future) \`last_tried\` and a non-zero \`consecutive_failures\` are both preserved through the load round-trip, complementing the existing reset-path test. * refactor: move `record_failure_fields` to `impl PeerReputation` * refactor: move `apply_score_change` to `impl PeerReputation`

xdustinface added 4 commits April 17, 2026 20:37

feat: track peer connection outcomes in network manager

836f45c

Calls `record_connection_failure` on both the TCP connect failure and the handshake failure paths in `PeerNetworkManager::connect_to_peer`, so the `consecutive_failures` streak reflects every unsuccessful attempt.

refactor: drop backward-compat shims from PeerReputation

0125768

Removes `#[serde(default)]` on the new session outcome fields and the legacy-JSON load test. Backward compatibility of `reputations.json` across versions is no longer a requirement, so the shims and test are dead weight.

manki-review Bot previously requested changes Apr 17, 2026

View reviewed changes

Comment thread dash-spv/src/network/manager.rs Outdated

Comment thread dash-spv/src/network/reputation.rs

Comment thread dash-spv/src/network/addrv2.rs

Comment thread dash-spv/src/network/reputation.rs Outdated

Comment thread dash-spv/src/network/manager.rs Outdated

xdustinface added 6 commits April 17, 2026 20:55

refactor: thread peer-advertised services into mark_seen

e4c8831

docs: clarify last_connection vs last_tried on PeerReputation

2f9b666

refactor: extract make_addr_message helper in addrv2

4c7b641

refactor: defensively set last_tried in record_connection_failure

7c8e434

refactor: add atomic record_failure_with_penalty and use at failure…

7e0cdb8

… sites

chore: apply cargo fmt

d6e2300

manki-review Bot previously requested changes Apr 17, 2026

View reviewed changes

Comment thread dash-spv/src/network/reputation.rs

Comment thread dash-spv/src/network/reputation.rs

Comment thread dash-spv/src/network/reputation.rs Outdated

Comment thread dash-spv/src/network/reputation_tests.rs

Comment thread dash-spv/src/network/reputation.rs Outdated

xdustinface added 3 commits April 17, 2026 21:13

refactor: clamp consecutive_failures on deserialization

a0c210c

refactor: extract apply_score_change and always update last_tried…

fc463f1

… on failure

test: cover record_failure_with_penalty directly

dcb69ff

manki-review Bot previously requested changes Apr 17, 2026

View reviewed changes

xdustinface added 2 commits April 17, 2026 21:34

docs: clarify zero-score_change contract on `record_failure_with_pe…

90bd2fd

…nalty` A value of 0 is a deliberate no-op for the reputation score but still records the failure counter and timestamp, which is useful for failures that should be tracked without contributing toward a ban.