
Count unique files in create_pull_request patch limit and add max-patch-files config #28472

Merged
pelikhan merged 4 commits into main from copilot/fix-pull-request-file-limit
Apr 25, 2026

Conversation

Contributor

Copilot AI commented Apr 25, 2026

The create_pull_request 100-file limit counted every diff --git header in the generated patch. Since git format-patch base..branch emits one header per (commit, file), long-running autoloop branches that merge main before committing trip the limit (E003: ...received 1768) even when the iteration only touches 1–3 files. There was also no escape hatch — only max-patch-size was configurable.
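
To make the over-counting concrete, here is a minimal sketch (not code from this PR). The two-commit patch text and the variable names are invented for illustration, and deduplicating whole header lines is a simplification that only works for unquoted, identically formatted headers.

```javascript
// Hypothetical format-patch style output: two commits, both touching src/app.js.
const patch = [
  "From 1111111 Mon Sep 17 00:00:00 2001",
  "diff --git a/src/app.js b/src/app.js",
  "--- a/src/app.js",
  "+++ b/src/app.js",
  "From 2222222 Mon Sep 17 00:00:00 2001",
  "diff --git a/src/app.js b/src/app.js",
  "diff --git a/README.md b/README.md",
].join("\n");

// Old behaviour described above: every "diff --git" header counts once.
const headers = patch.match(/^diff --git .*$/gm) || [];
const rawHeaderCount = headers.length;

// Simplified unique count: identical header lines collapse into one entry.
const uniqueFileCount = new Set(headers).size;

console.log({ rawHeaderCount, uniqueFileCount });
```

In this example the raw header count is 3 while only 2 distinct files are touched, which is the gap that grows with every extra commit on a long-running branch.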

Changes

  • Count unique files (actions/setup/js/create_pull_request.cjs)

    • New parseDiffGitHeader() helper performs a proper C-style quoted-token tokenization of diff --git headers, correctly handling quoted paths with embedded \" and \\ escapes.
    • New countUniquePatchFiles() walks every header, dedupes by post-image path, and conservatively counts each unparseable header as a synthetic unique entry (keyed by byte offset) so a malformed/quoted-with-escapes header line can never silently bypass the limit.
    • enforcePullRequestLimits(patch, maxFiles) now takes a configurable limit and uses unique counting.
  • New max-patch-files safe-outputs option (mirrors max-patch-size)

    • SafeOutputsConfig.MaximumPatchFiles (default 100), parsed in safe_outputs_config.go for int / int64 / uint64 / float64.
    • Explicit overflow / range guards before narrowing to int: int64/uint64 clamp to math.MaxInt with a log warning, and out-of-range floats / NaN / ±Inf are rejected with a log warning instead of producing an undefined narrowing.
    • Threaded into the create_pull_request handler config as max_patch_files, then read by the JS handler.
    • Schema entry added under safe-outputs in main_workflow_schema.json.
  • Tests

    • JS: multi-commit dedup regression, configurable raise/lower override, quoted/escaped path parsing, conservative counting of unparseable headers, empty-patch, boundary.
    • Go: TestHandlerConfigPatchFiles, TestParseSafeOutputsMaxPatchFiles (incl. MaxUint64 clamp, huge floats, NaN, ±Inf).
  • Recompiled all 202 .lock.yml files to surface max_patch_files: 100 in compiled handler configs.
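
The unique-counting approach described in the first bullet group can be sketched roughly as follows. This is an illustrative reconstruction, not the code merged in this PR: the function names are borrowed from the description, but the bodies, the `readToken` helper, and the `path:` / `unparseable@` key prefixes are assumptions. The sketch only decodes the `\"` and `\\` escapes; any other escape, and any unquoted path containing a space, is treated as unparseable and therefore counted conservatively.

```javascript
// Read one path token starting at index i: either a C-style quoted string
// or a bare run of non-space characters. Returns { value, next } or null.
function readToken(line, i) {
  if (line[i] !== '"') {
    let end = line.indexOf(" ", i);
    if (end === -1) end = line.length;
    return end > i ? { value: line.slice(i, end), next: end } : null;
  }
  let value = "";
  for (let j = i + 1; j < line.length; j++) {
    const ch = line[j];
    if (ch === '"') return { value, next: j + 1 };
    if (ch === "\\") {
      j++;
      if (j >= line.length) return null; // dangling backslash
      const esc = line[j];
      if (esc !== '"' && esc !== "\\") return null; // escape not decoded here
      value += esc;
    } else {
      value += ch;
    }
  }
  return null; // unterminated quote
}

// Parse "diff --git <a-token> <b-token>"; null means "could not parse".
function parseDiffGitHeader(line) {
  const prefix = "diff --git ";
  if (!line.startsWith(prefix)) return null;
  const a = readToken(line, prefix.length);
  if (!a || line[a.next] !== " ") return null;
  const b = readToken(line, a.next + 1);
  if (!b || b.next !== line.length) return null;
  if (!a.value.startsWith("a/") || !b.value.startsWith("b/")) return null;
  return { aPath: a.value.slice(2), bPath: b.value.slice(2) };
}

// Dedupe by post-image path; unparseable headers each get their own key
// (derived from the match offset) so they always add to the count.
function countUniquePatchFiles(patch) {
  const seen = new Set();
  const re = /^diff --git .*$/gm;
  let m;
  while ((m = re.exec(patch)) !== null) {
    const parsed = parseDiffGitHeader(m[0]);
    seen.add(parsed ? `path:${parsed.bPath}` : `unparseable@${m.index}`);
  }
  return seen.size;
}

const examplePatch = [
  "diff --git a/src/app.js b/src/app.js",
  "diff --git a/src/app.js b/src/app.js",
  'diff --git "a/foo\\"bar" "b/foo\\"bar"',
  'diff --git "a/broken b/broken',
].join("\n");
console.log(countUniquePatchFiles(examplePatch)); // 3
```

The design point is that a parse failure can only raise the count, never lower it, so a malformed header cannot be used to slip under the limit.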

Usage

safe-outputs:
  max-patch-size: 10240   # KB
  max-patch-files: 500    # new — override the 100-file default
  create-pull-request:
    ...
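
On the handler side, reading the compiled `max_patch_files` value and applying it might look like the sketch below. Everything here is hypothetical: the helper names, the shape of the config object, the error text, and the choice to treat the limit as inclusive (exactly `maxFiles` files is allowed) are assumptions rather than details confirmed by this PR.

```javascript
const DEFAULT_MAX_PATCH_FILES = 100;

// Hypothetical helper: fall back to the default unless the compiled handler
// config carries a positive integer under max_patch_files.
function resolveMaxPatchFiles(handlerConfig) {
  const raw = handlerConfig ? handlerConfig.max_patch_files : undefined;
  return Number.isInteger(raw) && raw >= 1 ? raw : DEFAULT_MAX_PATCH_FILES;
}

// Hypothetical check: reject only when the unique file count exceeds the limit.
function enforceFileLimit(uniqueFileCount, maxFiles) {
  if (uniqueFileCount > maxFiles) {
    throw new Error(
      `patch touches ${uniqueFileCount} unique files, limit is ${maxFiles}`
    );
  }
}

const limit = resolveMaxPatchFiles({ max_patch_files: 500 });
enforceFileLimit(3, limit); // passes: 3 unique files is well under 500
```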

Copilot AI changed the title from "[WIP] Fix file limit enforcement in create_pull_request handler" to "Count unique files in create_pull_request patch limit and add max-patch-files config" Apr 25, 2026
Copilot finished work on behalf of mrjf April 25, 2026 15:36
Copilot AI requested a review from mrjf April 25, 2026 15:36
@pelikhan pelikhan marked this pull request as ready for review April 25, 2026 15:57
Copilot AI review requested due to automatic review settings April 25, 2026 15:57
Contributor

Copilot AI left a comment

Pull request overview

Fixes create_pull_request patch file-limit enforcement to count unique files across multi-commit git format-patch output, and adds a new top-level safe-outputs config knob (max-patch-files, default 100) that flows through compilation and schema.

Changes:

  • Update JS PR creation handler to count unique patch file paths (with configurable max_patch_files) instead of raw diff --git header count.
  • Add max-patch-files to safe-outputs Go config parsing/types, thread into compiled handler config, and extend JSON schema.
  • Add Go/JS tests and regenerate compiled .lock.yml workflow outputs to include max_patch_files: 100.
| File | Description |
| --- | --- |
| actions/setup/js/create_pull_request.cjs | Implements unique-file counting for patches and enforces a configurable max patch files limit. |
| actions/setup/js/create_pull_request.test.cjs | Adds regression and override tests for unique-file counting and max-files enforcement. |
| pkg/workflow/safe_outputs_config.go | Parses new safe-outputs.max-patch-files and defaults it to 100. |
| pkg/workflow/compiler_types.go | Adds MaximumPatchFiles to SafeOutputsConfig. |
| pkg/workflow/compiler_safe_outputs_handlers.go | Threads MaximumPatchFiles into compiled create_pull_request handler config as max_patch_files. |
| pkg/workflow/compiler_safe_outputs_config_test.go | Tests config propagation to handler config and parsing of max-patch-files. |
| pkg/parser/schemas/main_workflow_schema.json | Adds schema entry for safe-outputs.max-patch-files with default/minimum. |
| .changeset/patch-create-pr-max-files-config.md | Documents the behavior change and new configuration option for release notes. |
| .github/workflows/weekly-safe-outputs-spec-review.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/weekly-editors-health-check.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/weekly-blog-post-writer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/update-astro.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/unbloat-docs.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/ubuntu-image-analyzer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/tidy.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/test-create-pr-error-handling.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/technical-doc-writer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/spec-extractor.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/spec-enforcer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/smoke-project.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/smoke-multi-pr.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/smoke-create-cross-repo-pr.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/slide-deck-maintainer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/schema-feature-coverage.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/refiner.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/q.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/poem-bot.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/layout-spec-maintainer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/jsweep.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/instructions-janitor.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/hourly-ci-cleaner.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/go-logger.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/glossary-maintainer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/github-mcp-tools-report.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/functional-pragmatist.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/dictation-prompt.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/developer-docs-consolidator.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/dead-code-remover.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-workflow-updater.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-safe-output-integrator.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-rendering-scripts-verifier.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-doc-updater.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-doc-healer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-community-attribution.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-astrostylelite-markdown-spellcheck.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/daily-architecture-diagram.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/code-simplifier.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/code-scanning-fixer.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/cloclo.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |
| .github/workflows/ci-coach.lock.yml | Regenerated compiled safe-outputs config to include max_patch_files: 100. |

Copilot's findings

  • Files reviewed: 50/50 changed files
  • Comments generated: 2

Comment on lines +259 to +276
// Match: diff --git a/<path> b/<path>
// Paths may be quoted when they contain unusual characters; we capture both
// forms and prefer the "b/" path. The non-greedy capture for the a-path is
// bounded by " b/" to handle paths that contain spaces.
const re = /^diff --git "?a\/(.+?)"? "?b\/(.+?)"?$/gm;
let match;
while ((match = re.exec(patchContent)) !== null) {
  const bPath = match[2] || match[1];
  if (bPath) {
    files.add(bPath);
  }
}
// Fallback: if the structured regex matched nothing (unexpected patch
// shape) but the patch contains diff headers, count those headers so we
// never silently skip the limit check.
if (files.size === 0) {
  const fallback = patchContent.match(/^diff --git /gm);
  return fallback ? fallback.length : 0;

Copilot AI Apr 25, 2026

countUniquePatchFiles() can undercount (and potentially let patches bypass the file limit) when git emits quoted paths that contain escaped quotes/backslashes (e.g. diff --git "a/foo\"bar" "b/foo\"bar"). The current regex does not handle C-style escapes, and the fallback only triggers when no headers match, so a single unparseable header can be silently dropped.

Consider either using a quoted-string aware pattern (handling \" and \\), or detecting any diff --git headers that didn't match and counting them conservatively so the safety limit can’t be evaded. Adding a regression test for an escaped-quote filename would help prevent reintroductions.

Comment on lines +448 to +470
if maxPatchFiles, exists := outputMap["max-patch-files"]; exists {
	switch v := maxPatchFiles.(type) {
	case int:
		if v >= 1 {
			config.MaximumPatchFiles = v
		}
	case int64:
		if v >= 1 {
			config.MaximumPatchFiles = int(v)
		}
	case uint64:
		if v >= 1 {
			config.MaximumPatchFiles = int(v)
		}
	case float64:
		intVal := int(v)
		if v != float64(intVal) {
			safeOutputsConfigLog.Printf("max-patch-files: float value %.2f truncated to integer %d", v, intVal)
		}
		if intVal >= 1 {
			config.MaximumPatchFiles = intVal
		}
	}

Copilot AI Apr 25, 2026

When parsing max-patch-files, the int64/uint64/float64 cases cast to int without any bounds checking. Very large values can overflow/wrap and end up as an unexpected (possibly negative) MaximumPatchFiles, which then falls back to the default later and makes the config appear to be ignored.

Suggestion: clamp to math.MaxInt (or reject with a log warning) before converting, and for float64 also guard against values outside the representable int range.

@pelikhan
Collaborator

@copilot review all comments

Contributor Author

Copilot AI commented Apr 25, 2026

@copilot review all comments

Addressed both reviewer comments in 46f994f:

  1. Quoted-path parsing: rewrote countUniquePatchFiles to use a proper C-style quoted-token tokenizer in a new parseDiffGitHeader helper that correctly handles \" and \\ escapes. Any header that still fails to parse now contributes a synthetic unique entry (keyed by its byte offset) so a malformed/unparseable header can never silently bypass the limit. Added regression tests for escaped-quote and escaped-backslash filenames, plus a mixed parseable/unparseable scenario.

  2. Int bounds checking: added explicit guards before narrowing int64/uint64/float64 to int for max-patch-files. uint64 clamps to math.MaxInt, and out-of-range floats and NaN/±Inf are rejected with a log warning instead of producing an undefined narrowing. Added test cases for MaxUint64, 1e30, -1e30, NaN, and +Inf.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw (http block)
  • https://api.github.com/repos/actions/ai-inference/git/ref/tags/v1
    • Triggering command: /usr/bin/gh gh api /repos/actions/ai-inference/git/ref/tags/v1 --jq [.object.sha, .object.type] | @tsv --noprofile (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v3
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v3 --jq [.object.sha, .object.type] | @tsv get --global rgo/bin/git http.https://gitgit (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v5
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq [.object.sha, .object.type] | @tsv (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v6
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv (http block)
  • https://api.github.com/repos/actions/github-script/git/ref/tags/v8
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
  • https://api.github.com/repos/actions/github-script/git/ref/tags/v9
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv HEAD (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv --noprofile (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv Cv8T/zg16L3uqOgPudXbrCv8T (http block)
  • https://api.github.com/repos/actions/setup-go/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-go/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv --noprofile (http block)
  • https://api.github.com/repos/actions/setup-node/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv HEAD (http block)
  • https://api.github.com/repos/actions/upload-artifact/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/upload-artifact/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv (http block)
  • https://api.github.com/repos/github/gh-aw-actions/git/ref/tags/v0.1.2
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw-actions/git/ref/tags/v0.1.2 --jq [.object.sha, .object.type] | @tsv HEAD (http block)
  • https://api.github.com/repos/github/gh-aw-actions/git/ref/tags/v1.0.0
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw-actions/git/ref/tags/v1.0.0 --jq [.object.sha, .object.type] | @tsv /\1/p (http block)
  • https://api.github.com/repos/github/gh-aw-actions/git/ref/tags/v1.2.3
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw-actions/git/ref/tags/v1.2.3 --jq [.object.sha, .object.type] | @tsv (http block)
  • https://api.github.com/repos/github/gh-aw/git/ref/tags/v0.47.4
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v0.47.4 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
  • https://api.github.com/repos/github/gh-aw/git/ref/tags/v1.0.0
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v1.0.0 --jq [.object.sha, .object.type] | @tsv (http block)
  • https://api.github.com/repos/nonexistent/action/git/ref/tags/v999.999.999
    • Triggering command: /usr/bin/gh gh api /repos/nonexistent/action/git/ref/tags/v999.999.999 --jq [.object.sha, .object.type] | @tsv (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from pelikhan April 25, 2026 16:13
@pelikhan
Collaborator

@copilot merge main fix conflicts recompile

…est-file-limit

# Conflicts:
#	.github/workflows/schema-feature-coverage.lock.yml

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot merge main fix conflicts recompile

Merged origin/main in 164b6ce. One conflict in .github/workflows/schema-feature-coverage.lock.yml (regenerated file), resolved by taking origin/main's version and recompiling all 202 workflow lock files via make build && make recompile. Targeted tests still pass.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw (http block)
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw pgYXNY9_Zjcl (http block)
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw wruUsjv/Zsup3-GC-C (http block)
  • https://api.github.com/repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b
    • Triggering command: /usr/bin/gh gh api /repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b --jq [.object.sha, .object.type] | @tsv (http block)
  • https://api.github.com/repos/github/gh-aw
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw --jq .default_branch (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Development

Successfully merging this pull request may close these issues.

create_pull_request 100-file limit counts full branch diff, not per-push diff

4 participants