
Add log-searching-agent workflow with inline preflight search #548

Open
Copilot wants to merge 6 commits into main from copilot/log-searching-agent-preflight

Conversation

Contributor

Copilot AI commented Mar 2, 2026

Summary

Adds a new reusable log-searching agent workflow that performs an inline preflight search over GitHub Actions logs, then hands structured results to the agent for investigation and issue creation when findings are actionable.

Changes in this PR

  • Added .github/workflows/gh-aw-log-searching-agent.md.
    • Defines workflow_call inputs for workflow targeting, exact-match search terms, lookback window, run limits, conclusion filtering, and optional setup/instruction controls.
    • Includes an inline preflight script that:
      • fetches matching workflow runs in the configured time window,
      • downloads and extracts run logs,
      • performs exact-match term searches with context,
      • writes per-run match files and search-manifest.json under /tmp/gh-aw/search-results.
  • Added gh-agent-workflows/log-searching-agent/example.yml with a workflow_dispatch trigger and required inputs.
  • Added gh-agent-workflows/log-searching-agent/README.md documenting behavior, inputs, and safe-output behavior.
  • Updated scripts/dogfood.sh to exclude log-searching-agent.
  • Compiled .github/workflows/gh-aw-log-searching-agent.lock.yml and recompiled it to pick up the messages-footer format change from main.
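For context, a minimal standalone caller of the reusable workflow might look like the sketch below. The `uses:` ref, input names, and secret name are taken from the matrix example quoted later in this thread; the target workflow name `ci.yml` and the trigger are placeholders.

```yaml
name: Search CI logs
on:
  workflow_dispatch:

jobs:
  search:
    uses: elastic/ai-github-actions/.github/workflows/gh-aw-log-searching-agent.lock.yml@v0
    with:
      workflow: ci.yml
      search-terms: "OutOfMemoryError, DEADLINE_EXCEEDED"
    secrets:
      COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
```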

The body of this PR is automatically managed by the Trigger Update PR Body workflow.

New workflow that downloads GitHub Actions logs for a specified workflow,
searches them for caller-supplied terms (comma-separated exact matches),
and hands the search results to an AI agent for pattern analysis and
triage issue creation.

All preflight logic is inline in the workflow frontmatter steps to
support workflow_call without external script references.
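The comma-separated exact-match handling can be sketched in isolation. This mirrors the parsing used in the preflight script: split on commas, trim surrounding whitespace from each term, then match with `grep -F` so terms are treated as literal strings rather than regexes.

```shell
# Split a comma-separated term list and trim whitespace (bash).
SEARCH_TERMS="OutOfMemoryError, DEADLINE_EXCEEDED"
IFS=',' read -ra TERMS <<< "$SEARCH_TERMS"
CLEAN_TERMS=()
for term in "${TERMS[@]}"; do
  cleaned=$(echo "$term" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
  if [ -n "$cleaned" ]; then CLEAN_TERMS+=("$cleaned"); fi
done
printf '%s\n' "${CLEAN_TERMS[@]}"
# grep -F matches the term literally, so regex metacharacters in terms are safe:
printf 'job failed: OutOfMemoryError at heap\n' | grep -c -F "${CLEAN_TERMS[0]}"  # prints 1
```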

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
@strawgate strawgate marked this pull request as ready for review March 2, 2026 18:18
@coderabbitai

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough


This pull request introduces a Log Searching Agent workflow system for GitHub Actions that downloads and searches workflow logs for specified terms within a configurable time window, then analyzes results using an AI agent to identify patterns and file triage issues. The implementation comprises a reusable workflow definition with multi-stage execution (pre-activation, activation, agent, safe outputs, conclusion), a Markdown specification, documentation, an example workflow file, and updates to the dogfood exclusion script.

Suggested labels

big_boom

🚥 Pre-merge checks: ✅ 2 passed
  • Linked Issues check: ✅ Passed. The PR fully implements all objectives from issue #547: the pre-flight script downloads logs, searches for comma-separated terms, writes results to an accessible location, and integrates with the agent workflow.
  • Out of Scope Changes check: ✅ Passed. All changes directly support the log-searching-agent implementation; the only scope-adjacent modification is the dogfood exclusion in scripts/dogfood.sh, which is appropriate for new agent workflows.



Copilot AI changed the title from "[WIP] Add log searching agent with pre-flight script" to "Add log-searching-agent workflow with inline preflight search" on Mar 2, 2026

## Safe Outputs

- `create-issue` — file a log search investigation report (max 1, auto-closes older reports)
Contributor


[LOW] Safe output behavior description does not match workflow config

This README says create-issue will "auto-close older reports", but the workflow config sets close-older-issues: false (in .github/workflows/gh-aw-log-searching-agent.md). That mismatch can mislead users about issue lifecycle behavior.

Suggested change
- `create-issue` — file a log search investigation report (max 1, auto-closes older reports)
- `create-issue` — file a log search investigation report (max 1)

@fr4nc1sc0-r4m0n
Contributor

@copilot Would it be feasible to execute this log-searching-agent over all the defined workflows in the repository?

…rom main)

Co-authored-by: fr4nc1sc0-r4m0n <215478872+fr4nc1sc0-r4m0n@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 4, 2026

@copilot Would it be feasible to execute this log-searching-agent over all the defined workflows in the repository?

Yes — the caller's trigger workflow can use a GitHub Actions matrix strategy to fan out over all (or a subset of) workflows. For example:

jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      workflows: ${{ steps.list.outputs.workflows }}
    steps:
      - id: list
        run: |
          workflows=$(gh api repos/${{ github.repository }}/actions/workflows \
            --jq '[.workflows[].path | split("/")[-1]] | tojson')
          echo "workflows=$workflows" >> "$GITHUB_OUTPUT"
        env:
          GH_TOKEN: ${{ github.token }}

  search:
    needs: discover
    strategy:
      matrix:
        workflow: ${{ fromJSON(needs.discover.outputs.workflows) }}
    uses: elastic/ai-github-actions/.github/workflows/gh-aw-log-searching-agent.lock.yml@v0
    with:
      workflow: ${{ matrix.workflow }}
      search-terms: "OutOfMemoryError, DEADLINE_EXCEEDED"
    secrets:
      COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}

Each matrix leg runs the preflight search + agent independently, so you get a separate triage issue per workflow when actionable findings exist. The discover job can also filter the list (e.g., skip workflows by name pattern) before passing to the matrix.
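The filtering mentioned above happens in the `discover` job's jq expression. A standalone sketch, assuming jq is available (the sample payload is a stand-in for the `gh api .../actions/workflows` response, and skipping `*.lock.yml` files is just one possible filter):

```shell
# Stand-in for the GitHub API workflows response.
PAYLOAD='{"workflows":[{"path":".github/workflows/ci.yml"},{"path":".github/workflows/gh-aw-log-searching-agent.lock.yml"},{"path":".github/workflows/release.yml"}]}'
# Keep only basenames, excluding generated *.lock.yml workflows:
workflows=$(echo "$PAYLOAD" | jq -c '[.workflows[].path | split("/")[-1] | select(endswith(".lock.yml") | not)]')
echo "$workflows"  # prints ["ci.yml","release.yml"]
```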

Contributor

@fr4nc1sc0-r4m0n fr4nc1sc0-r4m0n left a comment


LGTM

@fr4nc1sc0-r4m0n
Contributor

@strawgate Do you think we can merge this?

@strawgate strawgate enabled auto-merge (squash) March 14, 2026 18:30
auto-merge was automatically disabled March 14, 2026 18:30

Pull Request is not mergeable

github-actions bot added the "medium_boom" label (Medium PR blast radius; likely benefits from human review) on Mar 14, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/gh-aw-log-searching-agent.lock.yml:
- Around line 16-20: The committed lockfile
.github/workflows/gh-aw-log-searching-agent.lock.yml is out of sync with its
source .github/workflows/gh-aw-log-searching-agent.md; recompile the generated
workflow and update the lockfile by running the generator (gh aw compile or make
compile) on gh-aw-log-searching-agent.md, verify the resulting .lock.yml matches
the compiled output, and commit the updated
.github/workflows/gh-aw-log-searching-agent.lock.yml to unblock CI.
- Line 593: The manifest's runs_searched uses COLLECTED which counts selected
runs but not whether logs were successfully downloaded/extracted; change the
logic to track only successfully processed runs: introduce a new counter (e.g.
PROCESSED_RUNS=0), increment PROCESSED_RUNS only after a successful gh api
download and successful unzip (inside the loop after extraction, near the
existing unzip success path), use PROCESSED_RUNS in the manifest "runs_searched"
field and in the final summary echoes, and leave COLLECTED for limiting how many
run IDs are selected; alternatively, if you prefer to keep COLLECTED as the
processed count, decrement COLLECTED or avoid incrementing it until after
successful download/extract (update references to COLLECTED in
messages/manifests accordingly).
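The counting fix the prompt describes can be sketched in isolation. `fake_download` and `fake_extract` below are stand-ins for the real `gh api .../logs` download and `unzip` steps, and the selection loop is simplified to a fixed run list:

```shell
# Stand-in success/failure for the download and extract steps:
fake_download() { [ "$1" != "222" ]; }   # pretend run 222 fails to download
fake_extract()  { [ "$1" != "333" ]; }   # pretend run 333 fails to extract

COLLECTED=0; PROCESSED_RUNS=0; RUNS_FAILED=0
for RUN_ID in 111 222 333 444; do
  COLLECTED=$((COLLECTED + 1))                            # run selected for searching
  if ! fake_download "$RUN_ID"; then RUNS_FAILED=$((RUNS_FAILED + 1)); continue; fi
  if ! fake_extract  "$RUN_ID"; then RUNS_FAILED=$((RUNS_FAILED + 1)); continue; fi
  PROCESSED_RUNS=$((PROCESSED_RUNS + 1))                  # logs actually searched
done
echo "collected=$COLLECTED processed=$PROCESSED_RUNS failed=$RUNS_FAILED"
# prints collected=4 processed=2 failed=2
```

With this split, the manifest's `runs_searched` would report `PROCESSED_RUNS` (2) rather than `COLLECTED` (4), and `runs_failed` makes the skipped runs visible.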

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d033038f-3611-4cf5-9e5b-adaad3ee4423

📥 Commits

Reviewing files that changed from the base of the PR and between 0488272 and f66b22f.

📒 Files selected for processing (2)
  • .github/workflows/gh-aw-log-searching-agent.lock.yml
  • scripts/dogfood.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/dogfood.sh

Comment on lines +16 to +20
# This file was automatically generated by gh-aw. DO NOT EDIT.
#
# To update this file, edit the corresponding .md file and run:
# gh aw compile
# Not all edits will cause changes to this file.

⚠️ Potential issue | 🔴 Critical

Re-sync generated lockfile to unblock CI

Internal: CI reports this file is modified after make compile, so the committed lock output is out of sync with its .md source. Please regenerate from .github/workflows/gh-aw-log-searching-agent.md (gh aw compile / make compile) and commit the updated lockfile.

Based on learnings, .lock.yml files are generated artifacts and must match the compiled output from their corresponding .md source.


SEARCH_TERMS: ${{ inputs.search-terms }}
SEARCH_WORKFLOW: ${{ inputs.workflow }}
name: Search workflow logs
run: "set -euo pipefail\n\nLOGS_DIR=\"/tmp/gh-aw/logs\"\nRESULTS_DIR=\"/tmp/gh-aw/search-results\"\nmkdir -p \"$LOGS_DIR\" \"$RESULTS_DIR\"\n\n# Compute the since date (N days ago)\nSINCE_DATE=$(date -u -d \"$SEARCH_DAYS days ago\" +%Y-%m-%dT00:00:00Z 2>/dev/null \\\n || date -u -v-\"${SEARCH_DAYS}\"d +%Y-%m-%dT00:00:00Z)\n\necho \"Fetching up to $SEARCH_MAX_RUNS runs of '$SEARCH_WORKFLOW' since $SINCE_DATE (conclusion: $SEARCH_CONCLUSION)...\"\n\n# List matching workflow runs\nPAGE=1\nCOLLECTED=0\nRUN_IDS=\"\"\nwhile [ \"$COLLECTED\" -lt \"$SEARCH_MAX_RUNS\" ]; do\n RESPONSE=$(gh api \"repos/$GITHUB_REPOSITORY/actions/workflows/$SEARCH_WORKFLOW/runs?per_page=100&page=$PAGE&created=>=$SINCE_DATE\" --jq '.workflow_runs')\n COUNT=$(echo \"$RESPONSE\" | jq 'length')\n [ \"$COUNT\" -gt 0 ] || break\n\n for i in $(seq 0 $((COUNT - 1))); do\n RUN_CONCLUSION=$(echo \"$RESPONSE\" | jq -r \".[$i].conclusion\")\n RUN_ID=$(echo \"$RESPONSE\" | jq -r \".[$i].id\")\n\n if [ \"$SEARCH_CONCLUSION\" = \"any\" ] || [ \"$RUN_CONCLUSION\" = \"$SEARCH_CONCLUSION\" ]; then\n RUN_IDS=\"$RUN_IDS $RUN_ID\"\n COLLECTED=$((COLLECTED + 1))\n [ \"$COLLECTED\" -lt \"$SEARCH_MAX_RUNS\" ] || break\n fi\n done\n PAGE=$((PAGE + 1))\ndone\n\nif [ -z \"$RUN_IDS\" ]; then\n echo \"No matching workflow runs found.\"\n echo '{\"workflow\":\"'\"$SEARCH_WORKFLOW\"'\",\"search_terms\":[],\"since\":\"'\"$SINCE_DATE\"'\",\"runs_searched\":0,\"total_matches\":0,\"results\":[]}' > \"$RESULTS_DIR/search-manifest.json\"\n exit 0\nfi\n\necho \"Found $COLLECTED matching run(s). 
Downloading and searching logs...\"\n\n# Parse search terms (comma-separated) into an array\nIFS=',' read -ra TERMS <<< \"$SEARCH_TERMS\"\n# Trim whitespace from each term\nCLEAN_TERMS=()\nfor term in \"${TERMS[@]}\"; do\n cleaned=$(echo \"$term\" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')\n [ -n \"$cleaned\" ] && CLEAN_TERMS+=(\"$cleaned\")\ndone\n\nMANIFEST_ENTRIES=\"\"\nTOTAL_MATCHES=0\n\nfor RUN_ID in $RUN_IDS; do\n RUN_DIR=\"$LOGS_DIR/$RUN_ID\"\n mkdir -p \"$RUN_DIR\"\n\n # Download and extract logs\n echo \" Downloading logs for run $RUN_ID...\"\n if ! gh api \"repos/$GITHUB_REPOSITORY/actions/runs/$RUN_ID/logs\" > \"$RUN_DIR/logs.zip\" 2>/dev/null; then\n echo \" Warning: could not download logs for run $RUN_ID\"\n continue\n fi\n\n if ! unzip -q -o \"$RUN_DIR/logs.zip\" -d \"$RUN_DIR/\" 2>/dev/null; then\n echo \" Warning: could not extract logs for run $RUN_ID\"\n continue\n fi\n rm -f \"$RUN_DIR/logs.zip\"\n\n # Search each log file for each term\n RUN_MATCHES=\"\"\n RUN_MATCH_COUNT=0\n while IFS= read -r -d '' logfile; do\n for term in \"${CLEAN_TERMS[@]}\"; do\n MATCHES=$(grep -n -F \"$term\" \"$logfile\" 2>/dev/null || true)\n if [ -n \"$MATCHES\" ]; then\n MATCH_COUNT=$(echo \"$MATCHES\" | wc -l)\n RUN_MATCH_COUNT=$((RUN_MATCH_COUNT + MATCH_COUNT))\n RELATIVE_PATH=\"${logfile#$LOGS_DIR/}\"\n\n # Write matched lines with context to a result file\n RESULT_FILE=\"$RESULTS_DIR/run-${RUN_ID}-matches.txt\"\n {\n echo \"=== File: $RELATIVE_PATH | Term: $term | Matches: $MATCH_COUNT ===\"\n grep -n -F -B2 -A2 \"$term\" \"$logfile\" 2>/dev/null || true\n echo \"\"\n } >> \"$RESULT_FILE\"\n\n # Build JSON entry for this file match\n ESCAPED_PATH=$(echo \"$RELATIVE_PATH\" | jq -Rs '.')\n ESCAPED_TERM=$(echo \"$term\" | jq -Rs '.')\n RUN_MATCHES=\"$RUN_MATCHES{\\\"file\\\":$ESCAPED_PATH,\\\"term\\\":$ESCAPED_TERM,\\\"count\\\":$MATCH_COUNT},\"\n fi\n done\n done < <(find \"$RUN_DIR\" -name \"*.txt\" -print0)\n\n if [ \"$RUN_MATCH_COUNT\" -gt 0 ]; then\n 
TOTAL_MATCHES=$((TOTAL_MATCHES + RUN_MATCH_COUNT))\n RUN_MATCHES=\"${RUN_MATCHES%,}\"\n RUN_URL=\"https://github.com/$GITHUB_REPOSITORY/actions/runs/$RUN_ID\"\n MANIFEST_ENTRIES=\"$MANIFEST_ENTRIES{\\\"run_id\\\":$RUN_ID,\\\"url\\\":\\\"$RUN_URL\\\",\\\"match_count\\\":$RUN_MATCH_COUNT,\\\"file_matches\\\":[$RUN_MATCHES]},\"\n echo \" Run $RUN_ID: $RUN_MATCH_COUNT match(es)\"\n else\n echo \" Run $RUN_ID: no matches\"\n # Clean up log files for runs with no matches to save disk\n rm -rf \"$RUN_DIR\"\n fi\ndone\n\n# Build search terms JSON array\nTERMS_JSON=\"[\"\nfor term in \"${CLEAN_TERMS[@]}\"; do\n ESCAPED=$(echo \"$term\" | jq -Rs '.')\n TERMS_JSON=\"$TERMS_JSON$ESCAPED,\"\ndone\nTERMS_JSON=\"${TERMS_JSON%,}]\"\n\n# Write the search manifest\nMANIFEST_ENTRIES=\"${MANIFEST_ENTRIES%,}\"\ncat > \"$RESULTS_DIR/search-manifest.json\" <<MANIFEST_EOF\n{\n \"workflow\": \"$SEARCH_WORKFLOW\",\n \"search_terms\": $TERMS_JSON,\n \"since\": \"$SINCE_DATE\",\n \"conclusion_filter\": \"$SEARCH_CONCLUSION\",\n \"runs_searched\": $COLLECTED,\n \"total_matches\": $TOTAL_MATCHES,\n \"results\": [$MANIFEST_ENTRIES]\n}\nMANIFEST_EOF\n\necho \"\"\necho \"Search complete. Total matches: $TOTAL_MATCHES across $COLLECTED run(s).\"\necho \"Results written to $RESULTS_DIR/\"\n"
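One portability detail in the quoted script: the `SINCE_DATE` computation tries GNU `date -d` first and falls back to BSD `date -v` so the same step works on Linux and macOS runners. Isolated for illustration (only the branch matching the local `date` implementation actually runs):

```shell
SEARCH_DAYS=7
# GNU date (-d) first; if that fails, fall back to BSD date (-v).
SINCE_DATE=$(date -u -d "$SEARCH_DAYS days ago" +%Y-%m-%dT00:00:00Z 2>/dev/null \
  || date -u -v-"${SEARCH_DAYS}"d +%Y-%m-%dT00:00:00Z)
echo "$SINCE_DATE"
```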

⚠️ Potential issue | 🟠 Major

runs_searched can be incorrect when run log download/extract fails

The script counts a run as searched when it is selected, but not when logs are actually downloaded/extracted. If gh api .../logs or unzip fails, the manifest still reports it in runs_searched, which can understate false negatives in analysis.

Suggested fix (in source .md, then recompile)
- COLLECTED=0
+ COLLECTED=0
+ RUNS_SEARCHED=0
+ RUNS_FAILED=0

  for RUN_ID in $RUN_IDS; do
    ...
    if ! gh api "repos/$GITHUB_REPOSITORY/actions/runs/$RUN_ID/logs" > "$RUN_DIR/logs.zip" 2>/dev/null; then
+     RUNS_FAILED=$((RUNS_FAILED + 1))
      continue
    fi

    if ! unzip -q -o "$RUN_DIR/logs.zip" -d "$RUN_DIR/" 2>/dev/null; then
+     RUNS_FAILED=$((RUNS_FAILED + 1))
      continue
    fi
+   RUNS_SEARCHED=$((RUNS_SEARCHED + 1))
    ...
  done

- "runs_searched": $COLLECTED,
+ "runs_searched": $RUNS_SEARCHED,
+ "runs_failed": $RUNS_FAILED,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run: "set -euo pipefail\n\nLOGS_DIR=\"/tmp/gh-aw/logs\"\nRESULTS_DIR=\"/tmp/gh-aw/search-results\"\nmkdir -p \"$LOGS_DIR\" \"$RESULTS_DIR\"\n\n# Compute the since date (N days ago)\nSINCE_DATE=$(date -u -d \"$SEARCH_DAYS days ago\" +%Y-%m-%dT00:00:00Z 2>/dev/null \\\n || date -u -v-\"${SEARCH_DAYS}\"d +%Y-%m-%dT00:00:00Z)\n\necho \"Fetching up to $SEARCH_MAX_RUNS runs of '$SEARCH_WORKFLOW' since $SINCE_DATE (conclusion: $SEARCH_CONCLUSION)...\"\n\n# List matching workflow runs\nPAGE=1\nCOLLECTED=0\nRUN_IDS=\"\"\nwhile [ \"$COLLECTED\" -lt \"$SEARCH_MAX_RUNS\" ]; do\n RESPONSE=$(gh api \"repos/$GITHUB_REPOSITORY/actions/workflows/$SEARCH_WORKFLOW/runs?per_page=100&page=$PAGE&created=>=$SINCE_DATE\" --jq '.workflow_runs')\n COUNT=$(echo \"$RESPONSE\" | jq 'length')\n [ \"$COUNT\" -gt 0 ] || break\n\n for i in $(seq 0 $((COUNT - 1))); do\n RUN_CONCLUSION=$(echo \"$RESPONSE\" | jq -r \".[$i].conclusion\")\n RUN_ID=$(echo \"$RESPONSE\" | jq -r \".[$i].id\")\n\n if [ \"$SEARCH_CONCLUSION\" = \"any\" ] || [ \"$RUN_CONCLUSION\" = \"$SEARCH_CONCLUSION\" ]; then\n RUN_IDS=\"$RUN_IDS $RUN_ID\"\n COLLECTED=$((COLLECTED + 1))\n [ \"$COLLECTED\" -lt \"$SEARCH_MAX_RUNS\" ] || break\n fi\n done\n PAGE=$((PAGE + 1))\ndone\n\nif [ -z \"$RUN_IDS\" ]; then\n echo \"No matching workflow runs found.\"\n echo '{\"workflow\":\"'\"$SEARCH_WORKFLOW\"'\",\"search_terms\":[],\"since\":\"'\"$SINCE_DATE\"'\",\"runs_searched\":0,\"total_matches\":0,\"results\":[]}' > \"$RESULTS_DIR/search-manifest.json\"\n exit 0\nfi\n\necho \"Found $COLLECTED matching run(s). 
Downloading and searching logs...\"\n\n# Parse search terms (comma-separated) into an array\nIFS=',' read -ra TERMS <<< \"$SEARCH_TERMS\"\n# Trim whitespace from each term\nCLEAN_TERMS=()\nfor term in \"${TERMS[@]}\"; do\n cleaned=$(echo \"$term\" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')\n [ -n \"$cleaned\" ] && CLEAN_TERMS+=(\"$cleaned\")\ndone\n\nMANIFEST_ENTRIES=\"\"\nTOTAL_MATCHES=0\n\nfor RUN_ID in $RUN_IDS; do\n RUN_DIR=\"$LOGS_DIR/$RUN_ID\"\n mkdir -p \"$RUN_DIR\"\n\n # Download and extract logs\n echo \" Downloading logs for run $RUN_ID...\"\n if ! gh api \"repos/$GITHUB_REPOSITORY/actions/runs/$RUN_ID/logs\" > \"$RUN_DIR/logs.zip\" 2>/dev/null; then\n echo \" Warning: could not download logs for run $RUN_ID\"\n continue\n fi\n\n if ! unzip -q -o \"$RUN_DIR/logs.zip\" -d \"$RUN_DIR/\" 2>/dev/null; then\n echo \" Warning: could not extract logs for run $RUN_ID\"\n continue\n fi\n rm -f \"$RUN_DIR/logs.zip\"\n\n # Search each log file for each term\n RUN_MATCHES=\"\"\n RUN_MATCH_COUNT=0\n while IFS= read -r -d '' logfile; do\n for term in \"${CLEAN_TERMS[@]}\"; do\n MATCHES=$(grep -n -F \"$term\" \"$logfile\" 2>/dev/null || true)\n if [ -n \"$MATCHES\" ]; then\n MATCH_COUNT=$(echo \"$MATCHES\" | wc -l)\n RUN_MATCH_COUNT=$((RUN_MATCH_COUNT + MATCH_COUNT))\n RELATIVE_PATH=\"${logfile#$LOGS_DIR/}\"\n\n # Write matched lines with context to a result file\n RESULT_FILE=\"$RESULTS_DIR/run-${RUN_ID}-matches.txt\"\n {\n echo \"=== File: $RELATIVE_PATH | Term: $term | Matches: $MATCH_COUNT ===\"\n grep -n -F -B2 -A2 \"$term\" \"$logfile\" 2>/dev/null || true\n echo \"\"\n } >> \"$RESULT_FILE\"\n\n # Build JSON entry for this file match\n ESCAPED_PATH=$(echo \"$RELATIVE_PATH\" | jq -Rs '.')\n ESCAPED_TERM=$(echo \"$term\" | jq -Rs '.')\n RUN_MATCHES=\"$RUN_MATCHES{\\\"file\\\":$ESCAPED_PATH,\\\"term\\\":$ESCAPED_TERM,\\\"count\\\":$MATCH_COUNT},\"\n fi\n done\n done < <(find \"$RUN_DIR\" -name \"*.txt\" -print0)\n\n if [ \"$RUN_MATCH_COUNT\" -gt 0 ]; then\n 
TOTAL_MATCHES=$((TOTAL_MATCHES + RUN_MATCH_COUNT))\n RUN_MATCHES=\"${RUN_MATCHES%,}\"\n RUN_URL=\"https://github.com/$GITHUB_REPOSITORY/actions/runs/$RUN_ID\"\n MANIFEST_ENTRIES=\"$MANIFEST_ENTRIES{\\\"run_id\\\":$RUN_ID,\\\"url\\\":\\\"$RUN_URL\\\",\\\"match_count\\\":$RUN_MATCH_COUNT,\\\"file_matches\\\":[$RUN_MATCHES]},\"\n echo \" Run $RUN_ID: $RUN_MATCH_COUNT match(es)\"\n else\n echo \" Run $RUN_ID: no matches\"\n # Clean up log files for runs with no matches to save disk\n rm -rf \"$RUN_DIR\"\n fi\ndone\n\n# Build search terms JSON array\nTERMS_JSON=\"[\"\nfor term in \"${CLEAN_TERMS[@]}\"; do\n ESCAPED=$(echo \"$term\" | jq -Rs '.')\n TERMS_JSON=\"$TERMS_JSON$ESCAPED,\"\ndone\nTERMS_JSON=\"${TERMS_JSON%,}]\"\n\n# Write the search manifest\nMANIFEST_ENTRIES=\"${MANIFEST_ENTRIES%,}\"\ncat > \"$RESULTS_DIR/search-manifest.json\" <<MANIFEST_EOF\n{\n \"workflow\": \"$SEARCH_WORKFLOW\",\n \"search_terms\": $TERMS_JSON,\n \"since\": \"$SINCE_DATE\",\n \"conclusion_filter\": \"$SEARCH_CONCLUSION\",\n \"runs_searched\": $COLLECTED,\n \"total_matches\": $TOTAL_MATCHES,\n \"results\": [$MANIFEST_ENTRIES]\n}\nMANIFEST_EOF\n\necho \"\"\necho \"Search complete. Total matches: $TOTAL_MATCHES across $COLLECTED run(s).\"\necho \"Results written to $RESULTS_DIR/\"\n"
run: "set -euo pipefail\n\nLOGS_DIR=\"/tmp/gh-aw/logs\"\nRESULTS_DIR=\"/tmp/gh-aw/search-results\"\nmkdir -p \"$LOGS_DIR\" \"$RESULTS_DIR\"\n\n# Compute the since date (N days ago)\nSINCE_DATE=$(date -u -d \"$SEARCH_DAYS days ago\" +%Y-%m-%dT00:00:00Z 2>/dev/null \\\n || date -u -v-\"${SEARCH_DAYS}\"d +%Y-%m-%dT00:00:00Z)\n\necho \"Fetching up to $SEARCH_MAX_RUNS runs of '$SEARCH_WORKFLOW' since $SINCE_DATE (conclusion: $SEARCH_CONCLUSION)...\"\n\n# List matching workflow runs\nPAGE=1\nCOLLECTED=0\nRUNS_SEARCHED=0\nRUNS_FAILED=0\nRUN_IDS=\"\"\nwhile [ \"$COLLECTED\" -lt \"$SEARCH_MAX_RUNS\" ]; do\n RESPONSE=$(gh api \"repos/$GITHUB_REPOSITORY/actions/workflows/$SEARCH_WORKFLOW/runs?per_page=100&page=$PAGE&created=>=$SINCE_DATE\" --jq '.workflow_runs')\n COUNT=$(echo \"$RESPONSE\" | jq 'length')\n [ \"$COUNT\" -gt 0 ] || break\n\n for i in $(seq 0 $((COUNT - 1))); do\n RUN_CONCLUSION=$(echo \"$RESPONSE\" | jq -r \".[$i].conclusion\")\n RUN_ID=$(echo \"$RESPONSE\" | jq -r \".[$i].id\")\n\n if [ \"$SEARCH_CONCLUSION\" = \"any\" ] || [ \"$RUN_CONCLUSION\" = \"$SEARCH_CONCLUSION\" ]; then\n RUN_IDS=\"$RUN_IDS $RUN_ID\"\n COLLECTED=$((COLLECTED + 1))\n [ \"$COLLECTED\" -lt \"$SEARCH_MAX_RUNS\" ] || break\n fi\n done\n PAGE=$((PAGE + 1))\ndone\n\nif [ -z \"$RUN_IDS\" ]; then\n echo \"No matching workflow runs found.\"\n echo '{\"workflow\":\"'\"$SEARCH_WORKFLOW\"'\",\"search_terms\":[],\"since\":\"'\"$SINCE_DATE\"'\",\"runs_searched\":0,\"total_matches\":0,\"results\":[]}' > \"$RESULTS_DIR/search-manifest.json\"\n exit 0\nfi\n\necho \"Found $COLLECTED matching run(s). 
Downloading and searching logs...\"\n\n# Parse search terms (comma-separated) into an array\nIFS=',' read -ra TERMS <<< \"$SEARCH_TERMS\"\n# Trim whitespace from each term\nCLEAN_TERMS=()\nfor term in \"${TERMS[@]}\"; do\n cleaned=$(echo \"$term\" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')\n [ -n \"$cleaned\" ] && CLEAN_TERMS+=(\"$cleaned\")\ndone\n\nMANIFEST_ENTRIES=\"\"\nTOTAL_MATCHES=0\n\nfor RUN_ID in $RUN_IDS; do\n RUN_DIR=\"$LOGS_DIR/$RUN_ID\"\n mkdir -p \"$RUN_DIR\"\n\n # Download and extract logs\n echo \" Downloading logs for run $RUN_ID...\"\n if ! gh api \"repos/$GITHUB_REPOSITORY/actions/runs/$RUN_ID/logs\" > \"$RUN_DIR/logs.zip\" 2>/dev/null; then\n echo \" Warning: could not download logs for run $RUN_ID\"\n RUNS_FAILED=$((RUNS_FAILED + 1))\n continue\n fi\n\n if ! unzip -q -o \"$RUN_DIR/logs.zip\" -d \"$RUN_DIR/\" 2>/dev/null; then\n echo \" Warning: could not extract logs for run $RUN_ID\"\n RUNS_FAILED=$((RUNS_FAILED + 1))\n continue\n fi\n RUNS_SEARCHED=$((RUNS_SEARCHED + 1))\n rm -f \"$RUN_DIR/logs.zip\"\n\n # Search each log file for each term\n RUN_MATCHES=\"\"\n RUN_MATCH_COUNT=0\n while IFS= read -r -d '' logfile; do\n for term in \"${CLEAN_TERMS[@]}\"; do\n MATCHES=$(grep -n -F \"$term\" \"$logfile\" 2>/dev/null || true)\n if [ -n \"$MATCHES\" ]; then\n MATCH_COUNT=$(echo \"$MATCHES\" | wc -l)\n RUN_MATCH_COUNT=$((RUN_MATCH_COUNT + MATCH_COUNT))\n RELATIVE_PATH=\"${logfile#$LOGS_DIR/}\"\n\n # Write matched lines with context to a result file\n RESULT_FILE=\"$RESULTS_DIR/run-${RUN_ID}-matches.txt\"\n {\n echo \"=== File: $RELATIVE_PATH | Term: $term | Matches: $MATCH_COUNT ===\"\n grep -n -F -B2 -A2 \"$term\" \"$logfile\" 2>/dev/null || true\n echo \"\"\n } >> \"$RESULT_FILE\"\n\n # Build JSON entry for this file match\n ESCAPED_PATH=$(echo \"$RELATIVE_PATH\" | jq -Rs '.')\n ESCAPED_TERM=$(echo \"$term\" | jq -Rs '.')\n RUN_MATCHES=\"$RUN_MATCHES{\\\"file\\\":$ESCAPED_PATH,\\\"term\\\":$ESCAPED_TERM,\\\"count\\\":$MATCH_COUNT},\"\n fi\n 
done\n done < <(find \"$RUN_DIR\" -name \"*.txt\" -print0)\n\n if [ \"$RUN_MATCH_COUNT\" -gt 0 ]; then\n TOTAL_MATCHES=$((TOTAL_MATCHES + RUN_MATCH_COUNT))\n RUN_MATCHES=\"${RUN_MATCHES%,}\"\n RUN_URL=\"https://github.com/$GITHUB_REPOSITORY/actions/runs/$RUN_ID\"\n MANIFEST_ENTRIES=\"$MANIFEST_ENTRIES{\\\"run_id\\\":$RUN_ID,\\\"url\\\":\\\"$RUN_URL\\\",\\\"match_count\\\":$RUN_MATCH_COUNT,\\\"file_matches\\\":[$RUN_MATCHES]},\"\n echo \" Run $RUN_ID: $RUN_MATCH_COUNT match(es)\"\n else\n echo \" Run $RUN_ID: no matches\"\n # Clean up log files for runs with no matches to save disk\n rm -rf \"$RUN_DIR\"\n fi\ndone\n\n# Build search terms JSON array\nTERMS_JSON=\"[\"\nfor term in \"${CLEAN_TERMS[@]}\"; do\n ESCAPED=$(echo \"$term\" | jq -Rs '.')\n TERMS_JSON=\"$TERMS_JSON$ESCAPED,\"\ndone\nTERMS_JSON=\"${TERMS_JSON%,}]\"\n\n# Write the search manifest\nMANIFEST_ENTRIES=\"${MANIFEST_ENTRIES%,}\"\ncat > \"$RESULTS_DIR/search-manifest.json\" <<MANIFEST_EOF\n{\n \"workflow\": \"$SEARCH_WORKFLOW\",\n \"search_terms\": $TERMS_JSON,\n \"since\": \"$SINCE_DATE\",\n \"conclusion_filter\": \"$SEARCH_CONCLUSION\",\n \"runs_searched\": $RUNS_SEARCHED,\n \"runs_failed\": $RUNS_FAILED,\n \"total_matches\": $TOTAL_MATCHES,\n \"results\": [$MANIFEST_ENTRIES]\n}\nMANIFEST_EOF\n\necho \"\"\necho \"Search complete. Total matches: $TOTAL_MATCHES across $COLLECTED run(s).\"\necho \"Results written to $RESULTS_DIR/\"\n"
