
Add prescan data preparation to duplicate-issue-detector and stale-issues#481

Merged
strawgate merged 8 commits intomainfrom
copilot/investigate-prep-workflows
Feb 28, 2026

Conversation

Contributor

Copilot AI commented Feb 28, 2026

This PR adds a prescan-first data preparation flow to issue-investigation workflows so the agent starts from concrete candidates before broader search.

duplicate-issue-detector

  • Added a Prescan issue index step that writes /tmp/gh-aw/agent/issues-index.tsv with issue number, title, and state.
  • Prescan collects the newest 500 and oldest 500 issues, then de-duplicates by issue number.
  • In .github/workflows/gh-aw-duplicate-issue-detector.md, added bash: true and updated the prompt flow to include Step 2: Scan the Issue Index before targeted search and candidate evaluation, renumbering the subsequent steps through Post Result.
  • Added the workflow-source copy github/workflows/gh-aw-duplicate-issue-detector.md with the same prescan-first investigation structure.
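The duplicate-detector prescan described above can be sketched in bash roughly as follows. This is not the step from the PR: the helper names `fetch_issue_index` and `dedupe_issue_index` are hypothetical, and the sketch assumes "newest" and "oldest" refer to creation date. De-duplication keeps the first row seen for each issue number, so overlapping pages (for example, in a repository with fewer than 1,000 issues) do not produce repeated entries.

```bash
#!/usr/bin/env bash
# Sketch only: merge "newest 500" and "oldest 500" listings into one TSV index.
set -eu

# dedupe_issue_index: reads "number<TAB>title<TAB>state" rows on stdin and
# prints each issue number once, keeping the first occurrence and input order.
dedupe_issue_index() {
  awk -F'\t' '!seen[$1]++'
}

# fetch_issue_index OUT_FILE: hypothetical helper; requires an authenticated
# gh CLI and is not invoked by the tests. "newest"/"oldest" = creation date
# (assumption).
fetch_issue_index() {
  local out="$1"
  {
    gh issue list --state all --limit 500 --search "sort:created-desc" \
      --json number,title,state --jq '.[] | [.number, .title, .state] | @tsv'
    gh issue list --state all --limit 500 --search "sort:created-asc" \
      --json number,title,state --jq '.[] | [.number, .title, .state] | @tsv'
  } | dedupe_issue_index > "$out"
}
```

Using `awk '!seen[$1]++'` rather than `sort -u` preserves the newest-first ordering of the first listing, which keeps recent issues at the top of the index.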

stale-issues

  • Added a Prescan open issues step that writes /tmp/gh-aw/agent/open-issues.tsv with number, title, updated_at, created_at, and label_names.
  • In both .github/workflows/gh-aw-stale-issues.md and github/workflows/gh-aw-stale-issues.md, the prescan now fetches up to 500 open issues, least recently updated first.
  • Added bash: true and updated the prompt flow to include Step 0, so the agent reads the prescanned index first and prioritizes candidates with the oldest updated_at.
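A minimal sketch of the stale-issues prescan, plus one way a script might pick the oldest candidates from the resulting TSV. The helper names `fetch_open_issues` and `oldest_candidates` are hypothetical, and joining label names with commas is an assumption; the column order (number, title, updated_at, created_at, label_names) follows the description above, and the `sort:updated-asc` qualifier mirrors a later commit in this PR.

```bash
#!/usr/bin/env bash
# Sketch only: stale-issues prescan TSV and oldest-first candidate selection.
set -eu

TAB=$(printf '\t')

# fetch_open_issues OUT_FILE: hypothetical helper; requires an authenticated
# gh CLI and is not invoked by the tests. Labels are comma-joined (assumption).
fetch_open_issues() {
  gh issue list --state open --limit 500 --search "sort:updated-asc" \
    --json number,title,updatedAt,createdAt,labels \
    --jq '.[] | [.number, .title, .updatedAt, .createdAt, ([.labels[].name] | join(","))] | @tsv' \
    > "$1"
}

# oldest_candidates N: print the N rows with the oldest updated_at (column 3).
# ISO 8601 UTC timestamps sort correctly as plain strings under LC_ALL=C.
# awk reads all input (no early exit), so sort never hits a closed pipe.
oldest_candidates() {
  LC_ALL=C sort -t "$TAB" -k3,3 | awk -v n="$1" 'NR <= n'
}
```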

Workflow source and docs updates

  • Added workflow source copies under github/workflows/ for both gh-aw-duplicate-issue-detector.md and gh-aw-stale-issues.md with the prescan-first investigation structure.
  • Regenerated both .lock.yml compiled workflows to include the prescan steps and prompt updates.
  • Updated workflow READMEs for both duplicate-issue-detector and stale-issues to document the prescan-first investigation flow.

Notes

Generated by Update PR Body for issue #481

…sues workflows

- duplicate-issue-detector: prescan fetches newest 500 + oldest 500 issues
  (number, title, state) into TSV, agent scans index before searching
- stale-issues: prescan fetches open issues sorted by least recently updated
  with metadata (labels, timestamps) into TSV for immediate candidate pool
- Add bash: true to duplicate-issue-detector tools for file reading
- Update READMEs to document prescan behavior

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Identify workflows needing data preparation" to "Add prescan data preparation to duplicate-issue-detector and stale-issues" Feb 28, 2026
@strawgate strawgate marked this pull request as ready for review February 28, 2026 18:43
@coderabbitai

coderabbitai bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR adds prescan steps and documentation for two GitHub Actions workflows. The duplicate-issue-detector workflow now creates an issue index by fetching the newest 500 and oldest 500 issues, deduplicating them, and writing a TSV index used as the initial scan input. The stale-issues workflow now prescans up to 500 open issues (sorted by least-recently-updated) into a TSV to seed the candidate pool. README and workflow docs/lock files were updated to describe the index-driven prescan flow and to adjust step numbering and labels.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name | Status | Explanation
Linked Issues check | ✅ Passed | PR implements prescan data preparation for duplicate-issue-detector and stale-issues workflows, enabling agents to start with concrete candidate pools [#479].
Out of Scope Changes check | ✅ Passed | All changes directly align with PR objectives: prescan steps for duplicate-issue-detector and stale-issues, updated documentation, and workflow copies.

@strawgate
Collaborator

/ai Aren't file-read tools included by default? Is enabling bash here a bad idea? Please investigate.

Let's make sure we have logic to avoid grabbing overlapping issues for the duplicate-issue detector, and for stale issues let's grab around 500 issues.

Please update the pr

github-actions[bot]

This comment was marked as resolved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cla-checker-service

❌ Author of the following commits did not sign a Contributor Agreement:
, , df76679

Please read and sign the agreement mentioned above if you want to contribute to this project.

coderabbitai[bot]

This comment was marked as resolved.

@strawgate strawgate closed this Feb 28, 2026
@strawgate strawgate reopened this Feb 28, 2026
github-actions[bot]

This comment was marked as resolved.

- Use --search "sort:updated-asc" for prescan queries to guarantee
  oldest-updated issues are fetched first (not just sorted client-side)
- Add bash: true to duplicate-issue-detector github/ copy
- Recompile lock files
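To illustrate why the prescan asks the API for `sort:updated-asc` rather than sorting the fetched page locally, the hypothetical bash sketch below compares the two orders of operations. Rows are `number<TAB>updated_at` and are assumed to arrive newest-created first, which is assumed here to match gh issue list's default ordering; the function names are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch only: "limit then sort" versus "sort then limit".
set -eu

TAB=$(printf '\t')

# client_side_oldest N: keep the first N rows in default (newest-created)
# order, then sort them locally by updated_at (column 2).
client_side_oldest() {
  awk -v n="$1" 'NR <= n' | LC_ALL=C sort -t "$TAB" -k2,2
}

# server_side_oldest N: sort the full set by updated_at first (what the
# sort:updated-asc search qualifier asks GitHub to do), then keep N rows.
server_side_oldest() {
  LC_ALL=C sort -t "$TAB" -k2,2 | awk -v n="$1" 'NR <= n'
}
```

With issues 3, 2, 1 (newest-created first) last updated in 2026, 2025, and 2024 respectively, and N=2, the client-side version returns issues 2 and 3 and misses issue 1, the least recently updated one; the server-side version returns issues 1 and 2.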

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai[bot]

This comment was marked as resolved.

strawgate and others added 2 commits February 28, 2026 13:26
The agent can read prescan files with its built-in file-reading tools.
Enabling bash unnecessarily expands agent capabilities beyond what's
needed for reading a TSV file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns the runtime workflow with the README and github/ copy
which both document a 500-issue prescan window.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/gh-aw-stale-issues.md:
- Around lines 115-117: The fenced code block containing the command "cat
/tmp/gh-aw/agent/open-issues.tsv" is missing a language tag; update the opening
fence from ``` to ```bash so the block declares the language (e.g., change the
snippet around the "cat /tmp/gh-aw/agent/open-issues.tsv" command to start with
```bash).
- Around lines 92-97: The command invoking gh issue list currently swallows all
errors via "2>/dev/null || true", which hides prescan failures and allows the
workflow to continue with an empty "$issues_file"; remove the silent suppression
and let failures surface by deleting the "2>/dev/null || true" tail (or replace
it with explicit error handling that writes stderr to logs and exits non‑zero),
so that the gh issue list invocation fails the job on error and preserves
visibility into problems when populating "$issues_file".

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea641b3 and 0db1c5c.

📒 Files selected for processing (2)
  • .github/workflows/gh-aw-stale-issues.lock.yml
  • .github/workflows/gh-aw-stale-issues.md

…anguage

Replaces 2>/dev/null || true with a ::warning annotation so failures
are visible in the Actions log. Adds bash language marker to code fence.
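A hedged sketch of the pattern these commits describe: instead of `2>/dev/null || true`, stderr is left visible and a failure emits a `::warning::` workflow command, which GitHub Actions renders as an annotation. The wrapper `prescan_or_warn` is hypothetical, and continuing after the warning (rather than failing the job) is an assumption based on the commit message.

```bash
#!/usr/bin/env bash
set -eu

# prescan_or_warn OUT_FILE CMD [ARGS...]
# Hypothetical wrapper: runs CMD with stdout redirected to OUT_FILE. stderr is
# not suppressed, so gh's error text still reaches the Actions log. On failure
# it prints a ::warning:: workflow command and returns 0 so later steps run.
prescan_or_warn() {
  local out="$1"
  shift
  if ! "$@" > "$out"; then
    printf '::warning::Prescan command failed (%s); %s may be empty or incomplete\n' "$*" "$out"
  fi
}

# Example (requires an authenticated gh CLI):
# prescan_or_warn /tmp/gh-aw/agent/open-issues.tsv \
#   gh issue list --state open --limit 500 --search "sort:updated-asc" \
#   --json number,title --jq '.[] | [.number, .title] | @tsv'
```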

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@github-actions github-actions bot left a comment

Requesting changes due to a verified prescan error-handling gap that can silently degrade duplicate detection quality.


What is this? | From workflow: PR Review

Replaces 2>/dev/null || true with ::warning annotations on both prescan
queries so API failures are visible in the Actions log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@strawgate strawgate merged commit 719b9fb into main Feb 28, 2026
16 of 17 checks passed
@strawgate strawgate deleted the copilot/investigate-prep-workflows branch February 28, 2026 20:17
strawgate added a commit that referenced this pull request Feb 28, 2026
Keeps both the stale-labeled issues collection step (from this PR)
and the prescan open issues step (from merged PR #481). Recompiled
lock file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Identify workflows we should prepare data for

2 participants