DO NOT MERGE — responder test sandbox v3 by microsasa · Pull Request #159 · microsasa/cli-tools

microsasa · 2026-03-19T07:34:16Z

Test PR for responder agent with REST API fallback instruction. DO NOT MERGE.

Exact match to the version that successfully addressed review threads (run 23117978475, 2026-03-15), with only two changes: 1. Added if: label gate for aw PRs 2. Removed resolve-pull-request-review-thread safe-output and step 6e (orchestrator handles resolution) No other changes. This commit should not be modified. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace pull_request_review trigger with workflow_dispatch + pr_number input. No more loops from stray review events. Orchestrator or manual dispatch controls when responder runs. Removed aw label check and triggering review check — no longer applicable with workflow_dispatch. Renumbered steps. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

If CI checks fail after addressing review comments, fix the issues and re-run until they pass. Do not push broken code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The reply-to-pull-request-review-comment and push-to-pull-request-branch safe output handlers default to target 'triggering' which requires a pull_request event context. Setting target to '*' lets the agent specify the PR number in each message, enabling workflow_dispatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Same fix as the responder: set target '*' on push-to-pull-request-branch so the agent specifies the PR number in its message (no event context needed). Added labels: [aw] to restrict pushes. Added checkout fetch config so all remote branches are available. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

MCP tools intermittently return empty results for review data. The agent sometimes falls back to REST API on its own (PR #152) but not always (PR #158). Explicitly instruct it to use the REST API endpoints for comments and reviews. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Intentionally imperfect code for testing the review-responder agent. DO NOT MERGE this branch to main.

Fix broken_function return type mismatch: wrap int result with str() to match declared return type. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR is a responder-agent sandbox/testing change set, introducing an intentionally imperfect Python “sandbox” module and altering gh-aw agent workflows to be manually dispatched (with expanded safe-output targeting and checkout ref fetching) for testing a REST-API-based review responder flow.

Changes:

Add tests/test_responder_sandbox.py with simple helper functions intended to be “imperfect” for responder testing.
Convert review-responder from a PR-review-triggered workflow to a workflow_dispatch workflow with pr_number input.
Update review-responder and ci-fixer workflows (and compiled lock files) to fetch full history/refs and set safe-outputs target: "*" (plus add aw labels via config).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/test_responder_sandbox.py	Adds a sandbox module under `tests/` with intentionally imperfect code (but currently not real pytest tests).
.github/workflows/review-responder.md	Switches responder to manual dispatch and updates instructions + safe-output targeting.
.github/workflows/review-responder.lock.yml	Compiled workflow updates reflecting new trigger/config (fetch depth/refs, safe-output target).
.github/workflows/ci-fixer.md	Adds checkout ref fetching and safe-output targeting (and config label addition).
.github/workflows/ci-fixer.lock.yml	Compiled workflow updates reflecting checkout ref fetching and safe-output target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

microsasa · 2026-03-19T07:55:16Z

tests/test_responder_sandbox.py

+    total = 0
+    for i in lst:
+        total = total + i
+    return total
+
+


Fixed. Replaced the manual loop/accumulator with sum(lst) for simplicity.

Generated by Review Responder

microsasa · 2026-03-19T07:55:23Z

tests/test_responder_sandbox.py

+    return result
+
+
+def broken_function(value: int) -> str:


Fixed. Renamed broken_function to increment_to_string to accurately describe its behavior (increments an int and returns it as a string).

Generated by Review Responder

microsasa · 2026-03-19T07:55:22Z

.github/workflows/review-responder.md

  push-to-pull-request-branch:
+    target: "*"
+    labels: [aw]
    github-token: ${{ secrets.GH_AW_WRITE_TOKEN }}


Partially addressed. Added an explicit aw label gate as step 1 in the workflow instructions — the agent will now verify the target PR has the aw label and stop if it doesn't.

However, removing labels: [aw] from the push-to-pull-request-branch frontmatter config requires recompilation of the .lock.yml file via gh aw compile, which I cannot run. This should be done manually as a follow-up.

Generated by Review Responder

microsasa · 2026-03-19T07:55:20Z

.github/workflows/ci-fixer.md

  push-to-pull-request-branch:
+    target: "*"
+    labels: [aw]
    github-token: ${{ secrets.GH_AW_WRITE_TOKEN }}


Partially addressed. Added an explicit aw label gate as step 1 in the ci-fixer instructions — the agent will now verify the target PR has the aw label before proceeding.

Same as the review-responder: removing labels: [aw] from the push-to-pull-request-branch frontmatter config requires .lock.yml recompilation via gh aw compile, which I cannot run. This should be done manually as a follow-up.

Generated by Review Responder

microsasa · 2026-03-19T07:55:21Z

.github/workflows/review-responder.md

+4. For each unresolved review comment thread (up to 10):
   a. Read the comment and understand what change is being requested
   b. Read the relevant file and surrounding code context
-   c. Make the requested fix in the code (edit the file locally — do NOT push yet)
+   c. Make the requested fix in the code
   d. Reply to the comment thread explaining what you changed


Fixed. Added step 4: "Fetch and checkout the PR's head branch before making any code changes. Use the PR number from $\{\{ inputs.pr_number }} to determine the head ref and switch to it." This ensures the agent explicitly checks out the correct branch rather than relying on the automatic checkout step (which doesn't fire for workflow_dispatch).

Generated by Review Responder

microsasa · 2026-03-19T07:55:20Z

.github/workflows/ci-fixer.md

+checkout:
+  fetch: ["*"]
+  fetch-depth: 0
+


Fixed. Added step 6: "Fetch and checkout the PR's head branch before making any code changes. Use the PR number from $\{\{ github.event.inputs.pr_number }} to determine the head ref and switch to it." This ensures the ci-fixer agent explicitly checks out the PR branch before applying fixes.

Generated by Review Responder

microsasa · 2026-03-19T07:55:18Z

tests/test_responder_sandbox.py

+# Responder Sandbox — intentionally imperfect code for testing
+
+
+def calculate_total(lst: list[int]) -> int:


Fixed. Renamed the file from tests/test_responder_sandbox.py to tests/responder_sandbox.py (removed the test_ prefix) so pytest no longer collects it as a test module.

Generated by Review Responder

microsasa · 2026-03-19T07:55:19Z

tests/test_responder_sandbox.py

+    result = ""
+    for item in data:
+        suffix = "\n" if verbose else ", "
+        result = result + str(item) + suffix
+    return result


Fixed. Replaced the quadratic string concatenation loop with separator.join(data). This also eliminates the trailing delimiter issue since join only places the separator between elements.

Generated by Review Responder

microsasa · 2026-03-19T07:55:17Z

.github/workflows/review-responder.md

+2. Add the label `review-response-attempted` to the PR.

-5. Read the unresolved review comment threads on the PR (not just the latest review — get all unresolved threads). If there are more than 10 unresolved threads, address the first 10 and leave a summary comment on the PR noting how many remain for manual follow-up.
+3. Read the unresolved review comment threads on the PR using the GitHub REST API: fetch `https://api.github.com/repos/$OWNER/$REPO/pulls/$PR_NUMBER/comments` and `https://api.github.com/repos/$OWNER/$REPO/pulls/$PR_NUMBER/reviews`. If there are more than 10 unresolved threads, address the first 10 and leave a summary comment on the PR noting how many remain for manual follow-up.


Fixed. Rewrote step 5 (previously step 3) to:

Use {owner}/{repo}/{pr_number} placeholder notation instead of non-existent $OWNER/$REPO/$PR_NUMBER env vars

Note that the REST API doesn't expose thread resolution state, and instruct the agent to use in_reply_to_id to identify top-level threads

Add guidance to fall back to REST when MCP tools return empty results

Removed the /reviews endpoint reference since comments are sufficient for identifying threads

Generated by Review Responder

microsasa · 2026-03-19T07:45:30Z

Quality Gate — Manual Review Required

Decision: NOT auto-approving (HIGH impact + explicit "DO NOT MERGE")

Evaluation Summary

Dimension	Assessment
`aw` label	✅ Present
Copilot review trigger	✅ Confirmed
PR title	⚠️ Explicitly says "DO NOT MERGE"
Code quality	N/A (test sandbox)
Impact	HIGH

High-Impact Areas

Workflow trigger change (review-responder.md): Switches from event-driven pull_request_review trigger to workflow_dispatch. This fundamentally changes how the responder is invoked and removes the automatic label gate (if: contains(...labels..., 'aw')).
Safe-outputs target: "*" (both workflows): Adds target: "*" to push-to-pull-request-branch and reply-to-pull-request-review-comment, broadening which PRs these agents can push to. This is a security-sensitive configuration change.
Full repo fetch (fetch: ["*"], fetch-depth: 0): Both workflows now fetch full history and all branches, which increases checkout time and attack surface.
Not real tests: tests/test_responder_sandbox.py contains sandbox functions for testing the responder agent — no actual test_ functions or assertions.

Recommendation

This is a test sandbox PR explicitly marked "DO NOT MERGE." A human reviewer should evaluate whether the workflow configuration changes (especially target: "*" and the trigger model change) are intentional and safe before any merge consideration.

Generated by Quality Gate for issue #159 · ◷

Sasa Junuzovic and others added 9 commits March 17, 2026 21:28

feat: responder fixes CI failures before pushing

af4ee63

If CI checks fail after addressing review comments, fix the issues and re-run until they pass. Do not push broken code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: add responder sandbox file for testing review response

b291a66

Intentionally imperfect code for testing the review-responder agent. DO NOT MERGE this branch to main.

test: add broken function for ci-fixer testing

8f0d9f9

fix: resolve CI failure (pyright reportReturnType error)

80d5003

Fix broken_function return type mismatch: wrap int result with str() to match declared return type. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

microsasa added the aw Created by agentic workflow label Mar 19, 2026

Copilot AI review requested due to automatic review settings March 19, 2026 07:34

Copilot AI reviewed Mar 19, 2026

View reviewed changes

microsasa added the review-response-attempted label Mar 19, 2026

github-actions bot mentioned this pull request Mar 19, 2026

[aw] Review Responder failed #153

Open

microsasa closed this Mar 19, 2026

microsasa mentioned this pull request Mar 19, 2026

feat: responder + ci-fixer fixes + orchestrator v3 #163

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DO NOT MERGE — responder test sandbox v3#159

DO NOT MERGE — responder test sandbox v3#159
microsasa wants to merge 9 commits intomainfrom
test/responder-sandbox

microsasa commented Mar 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa Mar 19, 2026

Uh oh!

microsasa commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		# Responder Sandbox — intentionally imperfect code for testing


		def calculate_total(lst: list[int]) -> int:

Conversation

microsasa commented Mar 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

microsasa commented Mar 19, 2026

Quality Gate — Manual Review Required

Evaluation Summary

High-Impact Areas

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants