Add cost gate and pre-flight check to Documentation Unbloat workflow#26248
…kflow
- Add skip-if-match to skip when a draft PR with doc-unbloat label is open
- Add doc-unbloat label to created PRs for skip-if-match to identify them
- Add ## 0. Pre-flight Validation section at start of agent prompt:
  - 0.1 Verify docs/src/content/docs/ directory exists
  - 0.2 Count editable candidate markdown files (noop if none)
  - 0.3 Check cache; compute uncleaned candidates (noop if all cleaned recently)
- Expand bash allowlist to support pre-flight commands (find, grep, xargs, awk, date)
- Recompile lock file

Closes #<issue_number>

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/12f6ef2e-92c8-467a-98be-a182e6b95023
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Adds a cost-control gate and “pre-flight” validation to the Documentation Unbloat agentic workflow to avoid invoking Claude when there’s already an in-flight PR or when there’s nothing actionable to edit.
Changes:
- Add skip-if-match to avoid running when an open draft doc-unbloat PR already exists.
- Add a new “0. Pre-flight Validation” section to the agent prompt (directory exists, editable candidates exist, and not all candidates were cleaned recently).
- Expand the bash allowlist and ensure created PRs are labeled doc-unbloat for disambiguation/skip matching.
| File | Description |
|---|---|
| .github/workflows/unbloat-docs.md | Adds skip gate, PR labeling, bash allowlist updates, and the new pre-flight validation prompt section. |
| .github/workflows/unbloat-docs.lock.yml | Regenerates the compiled workflow to include the new label, expanded tool allowlist, and pre-activation skip check wiring. |
Copilot's findings
Comments suppressed due to low confidence (1)
.github/workflows/unbloat-docs.md:183
- Same issue as above: if the eligible-file find returns no results, xargs may still run grep -rL without any file arguments and grep will recurse from the repo root, inflating TOTAL and breaking the UNCLEANED calculation. Update this pipeline to handle an empty list safely (or use find ... -exec ... {} + instead of xargs).
TOTAL=$(find docs/src/content/docs -path '*/blog*' -prune \
-o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
| xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
| wc -l)
- Files reviewed: 2/2 changed files
- Comments generated: 3
# Get total eligible files
TOTAL=$(find docs/src/content/docs -path '*/blog*' -prune \
  -o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
  | xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
  | wc -l)

# Count recently cleaned files (last 7 days from cache)
# Cache lines are in format: "YYYY-MM-DD - Cleaned: <filename>"
RECENT_CUTOFF=$(date -d '7 days ago' '+%Y-%m-%d' 2>/dev/null || date -v-7d '+%Y-%m-%d' 2>/dev/null || echo "0000-00-00")
CLEANED=$(awk -v cutoff="$RECENT_CUTOFF" 'NF>0 && $1>=cutoff{count++} END{print count+0}' \
  /tmp/gh-aw/cache-memory/cleaned-files.txt 2>/dev/null || echo "0")
CLEANED is counted as the number of cache lines newer than the cutoff, not the number of unique eligible files cleaned recently. If the cache contains duplicate entries for the same file, or entries for files that are no longer eligible, CLEANED can exceed TOTAL and UNCLEANED becomes ≤ 0, causing a false noop. Consider extracting filenames from the cache, de-duping them, and intersecting with the current eligible-file list before computing UNCLEANED.
# Build the current eligible file list once so recent-cleaned entries can be
# de-duplicated and intersected against the files that are eligible right now.
ELIGIBLE_FILES=$(mktemp)
RECENT_FILES=$(mktemp)
trap 'rm -f "$ELIGIBLE_FILES" "$RECENT_FILES"' EXIT
find docs/src/content/docs -path '*/blog*' -prune \
  -o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
  | xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
  | sort -u > "$ELIGIBLE_FILES"
TOTAL=$(wc -l < "$ELIGIBLE_FILES")
# Count unique recently cleaned files (last 7 days from cache) that are still eligible.
# Cache lines are in format: "YYYY-MM-DD - Cleaned: <filename>"
RECENT_CUTOFF=$(date -d '7 days ago' '+%Y-%m-%d' 2>/dev/null || date -v-7d '+%Y-%m-%d' 2>/dev/null || echo "0000-00-00")
awk -v cutoff="$RECENT_CUTOFF" '
  NF>0 && $1 >= cutoff {
    file = $0
    sub(/^[0-9]{4}-[0-9]{2}-[0-9]{2} - Cleaned: /, "", file)
    if (file != "") print file
  }
' /tmp/gh-aw/cache-memory/cleaned-files.txt 2>/dev/null \
  | sort -u > "$RECENT_FILES"
CLEANED=$(comm -12 "$ELIGIBLE_FILES" "$RECENT_FILES" | wc -l)
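A small worked example may help show why the intersection matters. This is only a sketch: the fixture file names, the work directory, and the fixed cutoff date are hypothetical and chosen so the result is deterministic; only the cache line format and the awk/comm approach come from the review above.

```shell
#!/usr/bin/env bash
# Hypothetical fixtures: three eligible files, and a cache that lists one of
# them twice plus one file that is no longer eligible.
work=$(mktemp -d)
printf '%s\n' docs/a.md docs/b.md docs/c.md | sort -u > "$work/eligible.txt"
cat > "$work/cleaned-files.txt" <<'EOF'
2999-01-01 - Cleaned: docs/a.md
2999-01-02 - Cleaned: docs/a.md
2999-01-03 - Cleaned: docs/removed.md
1999-01-01 - Cleaned: docs/b.md
EOF
cutoff='2999-01-01'   # fixed cutoff (assumption) instead of "7 days ago"

# Naive count: every cache line dated on or after the cutoff.
naive=$(awk -v cutoff="$cutoff" 'NF>0 && $1>=cutoff{n++} END{print n+0}' \
  "$work/cleaned-files.txt")

# De-duplicated count: unique recent filenames that are still eligible.
awk -v cutoff="$cutoff" '$1 >= cutoff { sub(/^[^ ]+ - Cleaned: /, ""); print }' \
  "$work/cleaned-files.txt" | sort -u > "$work/recent.txt"
cleaned=$(comm -12 "$work/eligible.txt" "$work/recent.txt" | wc -l | tr -d ' ')
total=$(wc -l < "$work/eligible.txt" | tr -d ' ')

echo "naive=$naive cleaned=$cleaned uncleaned=$((total - cleaned))"
# prints: naive=3 cleaned=1 uncleaned=2
```

With the naive count, TOTAL - CLEANED would be 3 - 3 = 0 and the run would be skipped even though two eligible files have not been cleaned recently.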
bash:
  - "find docs/src/content/docs -name '*.md'"
  - "find docs/src/content/docs *"
  - "find /tmp/gh-aw/cache-memory *"
The bash allowlist includes wc -l * and wc, but the new pre-flight commands use wc -l with stdin (no filename). In other workflows (e.g., sergo.md) wc -l is explicitly allowlisted, suggesting wc -l * won’t match wc -l and the agent may be blocked from running the pre-flight checks. Add an explicit wc -l allowlist entry (or broaden the wc allowlist consistently) so the pre-flight commands are actually permitted.
  - "find /tmp/gh-aw/cache-memory *"
  - "wc -l"
  -o -name '*.md' -type f ! -name 'frontmatter-full.md' -print \
  | xargs grep -rL 'disable-agentic-editing: true' 2>/dev/null \
The find ... | xargs grep -rL ... pipeline is unsafe when find returns no files: xargs can still invoke grep -rL with no paths, which makes grep recurse from the current directory and can produce a large, incorrect candidate count. Guard against an empty input set (e.g., use an xargs mode that does not run on empty input, or avoid xargs by using find ... -exec grep -L ... {} +).
This issue also appears on line 180 of the same file.
  -o -name '*.md' -type f ! -name 'frontmatter-full.md' \
  -exec grep -L 'disable-agentic-editing: true' {} + 2>/dev/null \
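A minimal sketch of the -exec ... {} + form on an empty and a non-empty tree. The function name count_eligible and the temporary fixture directories are hypothetical, introduced only for illustration; the marker string, the blog prune, and the excluded file name are taken from the snippet under review.

```shell
#!/usr/bin/env bash
# Count markdown files that do NOT contain the opt-out marker.
# With -exec ... {} +, find does not run grep at all when nothing matches,
# so an empty candidate set yields 0 rather than a scan of the current directory.
count_eligible() {
  local root="$1"
  find "$root" -path '*/blog*' -prune \
    -o -name '*.md' -type f ! -name 'frontmatter-full.md' \
    -exec grep -L 'disable-agentic-editing: true' {} + 2>/dev/null \
    | wc -l | tr -d ' '
}

# Hypothetical fixture tree.
demo=$(mktemp -d)
mkdir -p "$demo/docs/guide" "$demo/docs/blog" "$demo/empty"
printf 'plain page\n' > "$demo/docs/guide/a.md"
printf 'disable-agentic-editing: true\n' > "$demo/docs/guide/b.md"
printf 'blog post\n' > "$demo/docs/blog/post.md"
printf 'generated\n' > "$demo/docs/frontmatter-full.md"

count_eligible "$demo/docs"    # prints 1: only guide/a.md qualifies
count_eligible "$demo/empty"   # prints 0: no candidates, grep never runs
```

GNU xargs also accepts -r (--no-run-if-empty) for the same purpose, but the -exec form avoids depending on which xargs implementation is installed.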
Documentation Unbloat runs daily with Claude (expensive), costing ~$55/week at a ~50% success rate. Nothing currently guards against running when nothing actionable exists or when a previous run's PR is still open.
Changes
Frontmatter cost gate (skip-if-match)
- Skip the run when an open draft PR labeled doc-unbloat already exists: no agent invoked, zero cost
- Add the doc-unbloat label to created PRs so the query can identify them (disambiguates from other [docs]-prefixed workflows)

Agent prompt pre-flight section (## 0. Pre-flight Validation)
Three fast checks the agent must run before any expensive analysis:
- 0.1: docs/src/content/docs/ missing → noop
- 0.2: no editable .md candidates (excluding blog, generated, disable-agentic-editing files) → noop
- 0.3: all candidates cleaned recently → noop

Check 0.3 computes UNCLEANED = TOTAL - CLEANED using awk against the cache's YYYY-MM-DD - Cleaned: <file> format.

Bash allowlist expansion
Generalized find patterns and added xargs *, awk *, date *, grep * to support the new pre-flight commands.
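The three pre-flight checks can be sketched as a single shell function. This is an illustration under stated assumptions, not the prompt text from the PR: the function name preflight, its arguments, the printed messages, and the fixture paths are hypothetical, and it assumes cache entries record the same paths that find prints. In the workflow itself, noop is something the agent reports; here the function only prints the decision.

```shell
#!/usr/bin/env bash
# Print "noop: <reason>" or "proceed: <n> uncleaned".
# Arguments (all hypothetical): docs directory, cache file, cutoff date YYYY-MM-DD.
preflight() {
  local docs="$1" cache="$2" cutoff="$3" tmp total cleaned
  # 0.1: the docs directory must exist.
  [ -d "$docs" ] || { echo "noop: docs directory missing"; return 0; }

  tmp=$(mktemp -d)
  # 0.2: list editable candidates (skip blog, generated, and opted-out files).
  find "$docs" -path '*/blog*' -prune \
    -o -name '*.md' -type f ! -name 'frontmatter-full.md' \
    -exec grep -L 'disable-agentic-editing: true' {} + 2>/dev/null \
    | sort -u > "$tmp/eligible"
  total=$(wc -l < "$tmp/eligible" | tr -d ' ')
  if [ "$total" -eq 0 ]; then
    rm -rf "$tmp"; echo "noop: no editable candidates"; return 0
  fi

  # 0.3: subtract unique, still-eligible files cleaned on or after the cutoff.
  if [ -f "$cache" ]; then
    awk -v cutoff="$cutoff" '$1 >= cutoff { sub(/^[^ ]+ - Cleaned: /, ""); print }' "$cache"
  fi | sort -u > "$tmp/recent"
  cleaned=$(comm -12 "$tmp/eligible" "$tmp/recent" | wc -l | tr -d ' ')
  rm -rf "$tmp"
  if [ "$((total - cleaned))" -le 0 ]; then
    echo "noop: all candidates cleaned recently"
  else
    echo "proceed: $((total - cleaned)) uncleaned"
  fi
}

# Hypothetical fixture: two pages, one of them recorded as recently cleaned.
demo=$(mktemp -d)
mkdir -p "$demo/docs"
printf 'page\n' > "$demo/docs/a.md"
printf 'page\n' > "$demo/docs/b.md"
printf '2999-01-01 - Cleaned: %s\n' "$demo/docs/a.md" > "$demo/cache.txt"

preflight "$demo/missing" "$demo/cache.txt" 2999-01-01   # noop: docs directory missing
preflight "$demo/docs" "$demo/cache.txt" 2999-01-01      # proceed: 1 uncleaned
```

Ordering the checks from cheapest to most expensive means a missing directory or an empty candidate list ends the run before the cache is read.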