fix: drift agent exhaustive release coverage#765
Conversation
Three fixes to ensure complete drift coverage: 1. Capped untracked_closed_issues to 20 in JSON output (was 7200+ at 1.2MB — the agent wasted turns parsing noise instead of reading release notes) 2. Restructured prompt with explicit 6-step workflow: - Extract release notes with python3 (not raw cat of 2.6MB) - Build exhaustive checklist of ALL changes before editing - Classify P0/P1/P2/P3 — implement ALL P0-P2, only skip P3 - Where-to-add guide for each change type - Verify completeness before committing - PR body must list every item implemented AND skipped 3. Removed 'note P2 in description' instruction that caused the agent to skip new features instead of implementing them Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Expert Code Review — PR #765: fix: drift agent exhaustive release coverage
Methodology: 3 independent reviewers with adversarial consensus. Findings required 2/3+ agreement to be included; disputed findings (1/3) were sent back to the other 2 reviewers for validation. All 6 findings survived consensus.
Findings Summary
| # | Severity | File | Finding | Agreement |
|---|---|---|---|---|
| 1 | 🟡 MODERATE | instruction-drift.agent.md |
Orphaned upstream scan — step still runs but output is never consumed after removing cat upstream.json |
3/3 |
| 2 | 🟡 MODERATE | instruction-drift.agent.md |
8000-char release notes truncation is silent — no indicator to the agent that content was lost, risking the same incomplete coverage this PR aims to fix | 2/3 |
| 3 | 🟡 MODERATE | instruction-drift.agent.md |
Python snippets use d['manifests'] (hard key access) — KeyError on malformed/error JSON |
2/3 |
| 4 | 🟡 MODERATE | Check-Staleness.ps1 |
Magic number caps (10, 20) duplicated without constants — arithmetic bug risk on future edits | 3/3 (after follow-up) |
| 5 | 🟡 MODERATE | Check-Staleness.ps1 |
JSON cap at 20 items may hide important untracked issues with no retrieval fallback | 2/3 (after follow-up) |
| 6 | 🟢 MINOR | Check-Staleness.ps1 |
untracked_closed_issues_total field added but never consumed by agent extraction snippets, and absent when count=0 |
2/3 |
Assessment
The PR correctly identifies the root causes of the drift agent missing features (1.2MB noise, weak prompt language, no structured extraction) and the core approach — capping console/JSON output and using Python extraction — is sound. The restructured step-by-step agent instructions (Steps 2–6) are a clear improvement over the previous loose bullet list.
The main concerns are around silent data loss at the boundaries: truncated release notes without indicators (#2), capped issue lists without fallback (#5), and an orphaned upstream scan step whose output is now dead code (#1). None are critical, but #1 and #2 directly undercut the PR's goal of exhaustive coverage.
CI / Review Status
- CI: 2 agent jobs in progress, activation/pre-activation steps succeeded
- Prior reviews: None
- PR state: Already merged
Generated by Expert Code Review (auto) for issue #765 · ● 5.6M
| foreach ($issue in ($closedResult.untracked | Select-Object -First 10)) { | ||
| $labels = if ($issue.labels) { " [$($issue.labels -join ', ')]" } else { '' } | ||
| Write-Host " #$($issue.number): $($issue.title)$labels" -ForegroundColor Yellow | ||
| } | ||
| $results[-1].untracked_closed_issues = @($closedResult.untracked) | ||
| if ($untrackedCount -gt 10) { | ||
| Write-Host " ... and $($untrackedCount - 10) more" -ForegroundColor Yellow |
There was a problem hiding this comment.
🟡 MODERATE — Magic number caps are fragile (Flagged by: 3/3 reviewers)
The literal 10 appears three times (Select-Object -First 10, -gt 10, $untrackedCount - 10) and must stay in sync. Similarly 20 for the JSON cap. If a maintainer changes one without the others, the "... and X more" message will silently produce incorrect arithmetic.
Fix: Extract named constants at the top of the block:
$displayCap = 10
$jsonCap = 20Then reference $displayCap and $jsonCap throughout.
| for r in s['result']['releases']: | ||
| print(f'=== {r[\"tag\"]} ({r[\"published_at\"]}) ===') | ||
| print(r.get('release_notes','')[:8000]) | ||
| print() |
There was a problem hiding this comment.
🟡 MODERATE — 8000-char truncation silently drops release note tail (Flagged by: 2/3 reviewers)
The [:8000] slice has no truncation indicator. A release with >8000 chars of notes will have features silently dropped — the agent has no signal that content was cut. This is ironic since the PR's stated goal is to fix the agent missing features from release notes.
Fix: Add a truncation marker so the agent knows content was lost:
notes = r.get('release_notes', '')
print(notes[:8000])
if len(notes) > 8000:
print(f'... TRUNCATED ({len(notes)} total chars, {len(notes) - 8000} chars omitted)')Also consider raising the limit (e.g., 50000 chars) or removing it entirely if token budget allows.
| } | ||
| # Cap at 20 in JSON to avoid bloating the report | ||
| $results[-1].untracked_closed_issues = @($closedResult.untracked | Select-Object -First 20) | ||
| $results[-1].untracked_closed_issues_total = $untrackedCount |
There was a problem hiding this comment.
🟢 MINOR — untracked_closed_issues_total not consumed by agent (Flagged by: 2/3 reviewers)
This new field is a useful addition, but it has two gaps:
- The agent's Python extraction snippets in
instruction-drift.agent.mdonly parsereleasessources — they never extract or display this field. The agent won't know it's seeing 20 of 7200 issues. - The field is only written inside the
if ($untrackedCount -gt 0)block. Downstream consumers that unconditionally accessuntracked_closed_issues_totalwill get$null/KeyError when there are zero untracked issues.
Fix: Add extraction of this total to the agent's summary snippet, and consider setting it to 0 in the else branch for a consistent contract.
| ```bash | ||
| cat /tmp/drift-results/upstream.json | ||
| python3 -c " | ||
| import json |
There was a problem hiding this comment.
🟡 MODERATE — Orphaned upstream scan step (Flagged by: 3/3 reviewers)
The old instruction cat /tmp/drift-results/upstream.json was removed here and replaced with Python extraction from staleness.json only. However, the steps: block earlier in this file still runs Scan-GhAwUpdates.ps1 which writes to /tmp/drift-results/upstream.json. That step now executes (consuming CI time and API quota) but its output is never consumed.
Commit-level change data captured in upstream.json but not in release notes (e.g., breaking changes between releases) is now silently lost.
Fix: Either (a) remove the upstream scan step entirely since its output is dead, or (b) add a Python extraction snippet for upstream.json similar to the staleness one.
| Write-Host " ... and $($untrackedCount - 10) more" -ForegroundColor Yellow | ||
| } | ||
| # Cap at 20 in JSON to avoid bloating the report | ||
| $results[-1].untracked_closed_issues = @($closedResult.untracked | Select-Object -First 20) |
There was a problem hiding this comment.
🟡 MODERATE — JSON cap at 20 may hide important untracked issues (Flagged by: 2/3 reviewers)
If a security-relevant closed issue ranks beyond position 20 in the API response order, it's silently dropped from the JSON artifact. The agent has no way to retrieve it — gh CLI is not authenticated in the agent container, and there is no pagination or full-list fallback.
The new untracked_closed_issues_total field (line 695) signals truncation occurred, but the agent can't act on issues it never sees.
Fix: Consider keeping the full list in the JSON (the size is manageable — 20 issues ≈ a few KB) and only capping the console display, or provide a secondary artifact file with the complete list.
| cat /tmp/drift-results/staleness.json | ||
| # Check if changes were detected | ||
| python3 -c "import json; d=json.load(open('/tmp/drift-results/staleness.json')); print('changes_detected:', d.get('changes_detected')); [print(f' {s[\"type\"]}: {s.get(\"last_reviewed_release\",\"\")} -> releases={len(s.get(\"result\",{}).get(\"releases\",[]))}') for m in d['manifests'] for s in m['sources'] if s['type']=='releases']" | ||
| ``` |
There was a problem hiding this comment.
🟡 MODERATE — d['manifests'] KeyError on malformed JSON (Flagged by: 2/3 reviewers)
Both Python extraction snippets (this line and line ~96) use d['manifests'] with hard key access. If staleness.json has an error structure (e.g., {"changes_detected": false, "error": "script failed"}) or is otherwise malformed, this raises a KeyError and the agent gets no extraction output.
Note: one reviewer argued a loud traceback is better than silent empty results, since the agent can investigate. This is a valid trade-off — if you keep hard access, consider adding an explicit error-check block before extraction:
if 'manifests' not in d:
print(f'ERROR: no manifests key. Keys: {list(d.keys())}')
Fixes agent skipping most v0.70.0 features. Root causes: 1) 1.2MB of untracked issues noise (7200 items), 2) prompt said 'note P2 in description' instead of 'implement P2', 3) no structured extraction of release notes from the massive JSON.