
Add cost gate and pre-flight check to Documentation Unbloat workflow#26248

Merged
pelikhan merged 2 commits into main from copilot/deep-report-add-cost-gate on Apr 14, 2026

Copilot AI commented Apr 14, 2026

Documentation Unbloat runs daily with Claude (expensive) at a ~50% success rate, costing ~$55/week, because nothing guards against running when no actionable work exists or when a previous run's PR is still open.

Changes

Frontmatter cost gate (skip-if-match)

  • Skips the entire workflow in pre-activation when an open draft PR tagged doc-unbloat already exists — no agent invoked, zero cost
  • Adds doc-unbloat label to created PRs so the query can identify them (disambiguates from other [docs]-prefixed workflows)
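
For illustration, the condition the gate looks for can be checked by hand with the GitHub CLI. This is only a sketch of a comparable query, not the workflow's actual `skip-if-match` implementation. The function names `open_unbloat_pr_count` and `should_skip_unbloat` are hypothetical, and the sketch assumes an authenticated `gh` run inside a checkout of the repository.

```shell
# Hypothetical helper: count open draft PRs that carry the doc-unbloat label.
# Assumes `gh` is installed and authenticated for the current repository.
open_unbloat_pr_count() {
  gh pr list --state open --draft --label doc-unbloat \
    --json number --jq 'length'
}

# Mirror the gate: succeed (meaning "skip") when at least one match exists.
should_skip_unbloat() {
  local count
  count=$(open_unbloat_pr_count) || return 2   # query failed: do not guess
  [ "${count:-0}" -gt 0 ]
}
```

Calling `should_skip_unbloat && echo "skip"` would then print `skip` only while a labeled draft PR is still open.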

Agent prompt pre-flight section (## 0. Pre-flight Validation)
Three fast checks the agent must run before any expensive analysis:

| Check | Condition → action |
| --- | --- |
| 0.1 | docs/src/content/docs/ missing → noop |
| 0.2 | No editable .md candidates (excluding blog, generated, disable-agentic-editing files) → noop |
| 0.3 | All candidates cleaned within 7 days (via cache) → noop |
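
The three checks can be read as a short-circuiting gate. The sketch below is illustrative only: the function name `preflight`, its arguments, and the `noop:`/`proceed:` output strings are assumptions made for this example, and the real prompt instructs the agent rather than running a script like this one.

```shell
# Hypothetical pre-flight gate. Arguments (all illustrative):
#   $1 docs directory, $2 cache file, $3 cutoff date (YYYY-MM-DD)
preflight() {
  local docs_dir="$1" cache="$2" cutoff="$3" total cleaned=0

  # 0.1: the docs directory must exist.
  if [ ! -d "$docs_dir" ]; then
    echo "noop: docs directory missing"; return 0
  fi

  # 0.2: count editable candidates, skipping blog posts, the generated
  # frontmatter-full.md, and files that opt out of agentic editing.
  total=$(find "$docs_dir" -path '*/blog*' -prune \
    -o -name '*.md' -type f ! -name 'frontmatter-full.md' \
    -exec grep -L 'disable-agentic-editing: true' {} + 2>/dev/null \
    | wc -l | tr -d ' ')
  if [ "$total" -eq 0 ]; then
    echo "noop: no editable candidates"; return 0
  fi

  # 0.3: count cache lines dated on or after the cutoff (0 if no cache yet).
  if [ -f "$cache" ]; then
    cleaned=$(awk -v cutoff="$cutoff" \
      'NF>0 && $1>=cutoff{n++} END{print n+0}' "$cache")
  fi
  if [ $(( total - cleaned )) -le 0 ]; then
    echo "noop: all candidates cleaned recently"; return 0
  fi

  echo "proceed: $(( total - cleaned )) uncleaned"
}
```

Each check returns early, so the cheaper tests run first and the cache is only read when there is at least one candidate file.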

Check 0.3 computes UNCLEANED = TOTAL - CLEANED using awk against the cache's YYYY-MM-DD - Cleaned: <file> format:

CLEANED=$(awk -v cutoff="$RECENT_CUTOFF" \
  'NF>0 && $1>=cutoff{count++} END{print count+0}' \
  /tmp/gh-aw/cache-memory/cleaned-files.txt 2>/dev/null || echo "0")
UNCLEANED=$(( TOTAL - CLEANED ))
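
Because ISO `YYYY-MM-DD` dates sort lexicographically in chronological order, the `$1>=cutoff` string comparison is enough to select recent entries. A small worked example on made-up cache contents follows; the file names, dates, and the `TOTAL` value are invented for illustration.

```shell
# Sample cache in the "YYYY-MM-DD - Cleaned: <file>" format (invented data).
cache=$(mktemp)
cat > "$cache" <<'EOF'
2026-04-01 - Cleaned: guides/setup.md
2026-04-09 - Cleaned: reference/tools.md

2026-04-13 - Cleaned: guides/triggers.md
EOF

RECENT_CUTOFF="2026-04-07"   # fixed here; the workflow derives it with date
TOTAL=5                      # pretend five files are eligible

# NF>0 skips the blank line; $1>=cutoff is a plain string comparison.
CLEANED=$(awk -v cutoff="$RECENT_CUTOFF" \
  'NF>0 && $1>=cutoff{count++} END{print count+0}' "$cache")
UNCLEANED=$(( TOTAL - CLEANED ))

echo "cleaned=$CLEANED uncleaned=$UNCLEANED"   # prints: cleaned=2 uncleaned=3
rm -f "$cache"
```

Only the two entries dated on or after the cutoff are counted, which leaves three of the five hypothetical files as uncleaned.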

Bash allowlist expansion
Generalized find patterns and added xargs *, awk *, date *, grep * to support the new pre-flight commands.

…kflow

- Add skip-if-match to skip when a draft PR with doc-unbloat label is open
- Add doc-unbloat label to created PRs for skip-if-match to identify them
- Add ## 0. Pre-flight Validation section at start of agent prompt:
  - 0.1 Verify docs/src/content/docs/ directory exists
  - 0.2 Count editable candidate markdown files (noop if none)
  - 0.3 Check cache; compute uncleaned candidates (noop if all cleaned recently)
- Expand bash allowlist to support pre-flight commands (find, grep, xargs, awk, date)
- Recompile lock file

Closes #<issue_number>

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/12f6ef2e-92c8-467a-98be-a182e6b95023

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add cost gate and pre-flight check to Documentation Unbloat" to "Add cost gate and pre-flight check to Documentation Unbloat workflow" Apr 14, 2026
Copilot AI requested a review from pelikhan April 14, 2026 16:23
pelikhan marked this pull request as ready for review April 14, 2026 16:26
Copilot AI review requested due to automatic review settings April 14, 2026 16:26
pelikhan merged commit 7c60d4e into main Apr 14, 2026
53 checks passed
pelikhan deleted the copilot/deep-report-add-cost-gate branch April 14, 2026 16:26
This was referenced Apr 14, 2026

Copilot AI left a comment

Pull request overview

Adds a cost-control gate and “pre-flight” validation to the Documentation Unbloat agentic workflow to avoid invoking Claude when there’s already an in-flight PR or when there’s nothing actionable to edit.

Changes:

  • Add skip-if-match to avoid running when an open draft doc-unbloat PR already exists.
  • Add a new “0. Pre-flight Validation” section to the agent prompt (directory exists, editable candidates exist, and not all candidates were cleaned recently).
  • Expand the bash allowlist and ensure created PRs are labeled doc-unbloat for disambiguation/skip matching.

| File | Description |
| --- | --- |
| .github/workflows/unbloat-docs.md | Adds skip gate, PR labeling, bash allowlist updates, and the new pre-flight validation prompt section. |
| .github/workflows/unbloat-docs.lock.yml | Regenerates the compiled workflow to include the new label, expanded tool allowlist, and pre-activation skip check wiring. |

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/unbloat-docs.md:183

  • Same issue as above: if the eligible-file find returns no results, xargs may still run grep -rL without any file arguments and grep will recurse from the repo root, inflating TOTAL and breaking the UNCLEANED calculation. Update this pipeline to handle an empty list safely (or use find ... -exec ... {} + instead of xargs).
TOTAL=$(find docs/src/content/docs -path '*/blog*' -prune \
  -o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
  | xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
  | wc -l)
  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment on lines +179 to +189
# Get total eligible files
TOTAL=$(find docs/src/content/docs -path '*/blog*' -prune \
-o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
| xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
| wc -l)

# Count recently cleaned files (last 7 days from cache)
# Cache lines are in format: "YYYY-MM-DD - Cleaned: <filename>"
RECENT_CUTOFF=$(date -d '7 days ago' '+%Y-%m-%d' 2>/dev/null || date -v-7d '+%Y-%m-%d' 2>/dev/null || echo "0000-00-00")
CLEANED=$(awk -v cutoff="$RECENT_CUTOFF" 'NF>0 && $1>=cutoff{count++} END{print count+0}' \
/tmp/gh-aw/cache-memory/cleaned-files.txt 2>/dev/null || echo "0")

Copilot AI Apr 14, 2026

CLEANED is counted as the number of cache lines newer than the cutoff, not the number of unique eligible files cleaned recently. If the cache contains duplicate entries for the same file, or entries for files that are no longer eligible, CLEANED can exceed TOTAL and UNCLEANED becomes ≤ 0, causing a false noop. Consider extracting filenames from the cache, de-duping them, and intersecting with the current eligible-file list before computing UNCLEANED.

Suggested change
# Build the current eligible file list once so recent-cleaned entries can be
# de-duplicated and intersected against the files that are eligible right now.
ELIGIBLE_FILES=$(mktemp)
RECENT_FILES=$(mktemp)
trap 'rm -f "$ELIGIBLE_FILES" "$RECENT_FILES"' EXIT
find docs/src/content/docs -path '*/blog*' -prune \
-o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
| xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
| sort -u > "$ELIGIBLE_FILES"
TOTAL=$(wc -l < "$ELIGIBLE_FILES")
# Count unique recently cleaned files (last 7 days from cache) that are still eligible.
# Cache lines are in format: "YYYY-MM-DD - Cleaned: <filename>"
RECENT_CUTOFF=$(date -d '7 days ago' '+%Y-%m-%d' 2>/dev/null || date -v-7d '+%Y-%m-%d' 2>/dev/null || echo "0000-00-00")
awk -v cutoff="$RECENT_CUTOFF" '
NF>0 && $1 >= cutoff {
file = $0
sub(/^[0-9]{4}-[0-9]{2}-[0-9]{2} - Cleaned: /, "", file)
if (file != "") print file
}
' /tmp/gh-aw/cache-memory/cleaned-files.txt 2>/dev/null \
| sort -u > "$RECENT_FILES"
CLEANED=$(comm -12 "$ELIGIBLE_FILES" "$RECENT_FILES" | wc -l)
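
To see why counting cache lines can overshoot, consider a cache with a repeated entry plus an entry for a file that is no longer eligible. The sketch below uses invented file names and a hypothetical cutoff. It compares the raw line count against the de-duplicated intersection that the suggestion computes with `comm -12`. The prefix is stripped with `^[^ ]+ - Cleaned: ` rather than interval expressions, on the assumption that some awk implementations do not support `{4}`-style repetition.

```shell
eligible=$(mktemp); recent=$(mktemp); cache=$(mktemp)

# Two files are currently eligible (invented names, already sorted).
printf '%s\n' docs/a.md docs/b.md > "$eligible"

# Cache: a.md cleaned twice, plus a file that is no longer eligible.
cat > "$cache" <<'EOF'
2026-04-10 - Cleaned: docs/a.md
2026-04-12 - Cleaned: docs/a.md
2026-04-11 - Cleaned: docs/removed.md
EOF

cutoff="2026-04-07"
TOTAL=$(wc -l < "$eligible" | tr -d ' ')

# Naive count: every recent line counts, so duplicates inflate it.
NAIVE=$(awk -v cutoff="$cutoff" \
  'NF>0 && $1>=cutoff{n++} END{print n+0}' "$cache")

# De-duplicated count: strip the prefix, sort -u, intersect with eligible.
awk -v cutoff="$cutoff" \
  'NF>0 && $1>=cutoff{sub(/^[^ ]+ - Cleaned: /, ""); print}' "$cache" \
  | sort -u > "$recent"
UNIQUE=$(comm -12 "$eligible" "$recent" | wc -l | tr -d ' ')

echo "total=$TOTAL naive_uncleaned=$(( TOTAL - NAIVE )) unique_uncleaned=$(( TOTAL - UNIQUE ))"
# prints: total=2 naive_uncleaned=-1 unique_uncleaned=1
rm -f "$eligible" "$recent" "$cache"
```

With the naive count the result goes negative and would trigger a false noop, while the intersected count correctly reports one file still waiting to be cleaned.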

bash:
- "find docs/src/content/docs -name '*.md'"
- "find docs/src/content/docs *"
- "find /tmp/gh-aw/cache-memory *"

Copilot AI Apr 14, 2026

The bash allowlist includes wc -l * and wc, but the new pre-flight commands use wc -l with stdin (no filename). In other workflows (e.g., sergo.md) wc -l is explicitly allowlisted, suggesting wc -l * won’t match wc -l and the agent may be blocked from running the pre-flight checks. Add an explicit wc -l allowlist entry (or broaden the wc allowlist consistently) so the pre-flight commands are actually permitted.

Suggested change
- "find /tmp/gh-aw/cache-memory *"
- "find /tmp/gh-aw/cache-memory *"
- "wc -l"

Comment on lines +158 to +159
-o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
| xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \

Copilot AI Apr 14, 2026

The find ... | xargs grep -rL ... pipeline is unsafe when find returns no files: xargs can still invoke grep -rL with no paths, which makes grep recurse from the current directory and can produce a large, incorrect candidate count. Guard against an empty input set (e.g., use an xargs mode that does not run on empty input, or avoid xargs by using find ... -exec grep -L ... {} +).

This issue also appears on line 180 of the same file.

Suggested change
-o -name '*.md' -type f ! -name 'frontmatter-full.md' \
-exec grep -L 'disable-agentic-editing: true' {} + 2>/dev/null \
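
The empty-input hazard can be reproduced in a scratch directory. This sketch assumes GNU findutils, where plain `xargs` runs its command once even with no input; other `xargs` implementations may behave differently, so the unguarded count is computed but not given an expected value. The directory layout and variable names are invented for the demonstration.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/docs" "$scratch/other"
echo "unrelated" > "$scratch/other/notes.md"   # lives outside docs/

# docs/ contains no .md files, so find matches nothing in every case below.

# Unguarded pipeline, run from $scratch in a subshell. With GNU xargs, grep
# starts with no file operands and -r makes it search the current directory,
# so files outside docs/ can be counted. The result depends on the xargs
# implementation in use.
UNGUARDED=$(cd "$scratch" && find docs -name '*.md' -type f -print \
  | xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
  | wc -l | tr -d ' ')

# find -exec ... {} + never starts grep when nothing matches.
SAFE=$(find "$scratch/docs" -name '*.md' -type f \
  -exec grep -L 'disable-agentic-editing: true' {} + 2>/dev/null \
  | wc -l | tr -d ' ')

# xargs -r (--no-run-if-empty, a GNU extension) gives the same guarantee.
GUARDED=$(find "$scratch/docs" -name '*.md' -type f -print \
  | xargs -r grep -L 'disable-agentic-editing: true' 2>/dev/null \
  | wc -l | tr -d ' ')

echo "unguarded=$UNGUARDED safe=$SAFE guarded=$GUARDED"
rm -rf "$scratch"
```

Both guarded forms report zero candidates for an empty docs tree, which is the value the pre-flight check needs in order to noop correctly.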

Development

Successfully merging this pull request may close these issues.

[deep-report] Add cost gate and pre-flight check to Documentation Unbloat (~$55/week at 50% success rate)

3 participants