Preserve angle brackets in code blocks and inline code spans during sanitization by Copilot · Pull Request #22005 · github/gh-aw

Copilot · 2026-03-20T17:54:48Z

convertXmlTags and removeXmlComments were applied to the entire content string, so type parameters and code-block content were rewritten as well (e.g. VBuffer<float32> → VBuffer(float32)). This mangled code samples and issue titles containing generic/template syntax.

Changes

`sanitize_content_core.cjs` — three new helpers

getFencedCodeRanges(s): returns [start, end) character ranges of fenced code blocks (``` or ~~~ fences) by scanning line by line
applyFnOutsideInlineCode(text, fn): applies a transform to a text segment, skipping backtick-delimited inline code spans (handles multi-backtick spans)
applyToNonCodeRegions(s, fn): composes both helpers; if parsing throws, it falls back to fn(s) on the whole string, so protection is never reduced

sanitizeContentCore updated to call applyToNonCodeRegions(sanitized, removeXmlComments) and applyToNonCodeRegions(sanitized, convertXmlTags) instead of the bare functions.
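A minimal sketch of how a fenced-range scan and a non-code wrapper can fit together, for illustration only. The names getFencedCodeRangesSketch and applyToNonCodeRegionsSketch are hypothetical, inline-code skipping is omitted, and an unclosed fence is assumed to produce no range (so its contents are still sanitized) rather than running to the end of the input. That last choice is an assumption that errs on the side of sanitizing, in the spirit of the "never reduces protection" fallback.

```javascript
// Illustrative sketch only; not the code merged in sanitize_content_core.cjs.
// Returns [start, end) character ranges of *closed* fenced code blocks.
// Assumption: an unclosed fence yields no range, so its contents are still sanitized.
function getFencedCodeRangesSketch(s) {
  const ranges = [];
  let pos = 0; // character offset of the current line's first character
  let inBlock = false;
  let blockStart = 0;
  let fenceChar = "";
  let fenceLen = 0;
  for (const line of s.split("\n")) {
    const lineEnd = pos + line.length; // offset of this line's "\n" (or end of input)
    const trimmed = line.trimStart();
    if (!inBlock) {
      const open = trimmed.match(/^(`{3,}|~{3,})/);
      if (open) {
        inBlock = true;
        blockStart = pos;
        fenceChar = open[1][0];
        fenceLen = open[1].length;
      }
    } else {
      // A closing fence uses the same character, is at least as long, and has nothing after it.
      const close = trimmed.match(/^(`{3,}|~{3,})[ \t]*$/);
      if (close && close[1][0] === fenceChar && close[1].length >= fenceLen) {
        ranges.push([blockStart, Math.min(lineEnd + 1, s.length)]);
        inBlock = false;
      }
    }
    pos = lineEnd + 1;
  }
  return ranges;
}

// Applies fn only outside fenced code; falls back to fn(s) if range detection throws.
function applyToNonCodeRegionsSketch(s, fn) {
  if (typeof s !== "string" || s === "") return "";
  try {
    const out = [];
    let cursor = 0;
    for (const [start, end] of getFencedCodeRangesSketch(s)) {
      out.push(fn(s.slice(cursor, start)), s.slice(start, end)); // sanitized text, then raw code
      cursor = end;
    }
    out.push(fn(s.slice(cursor)));
    return out.join("");
  } catch (_e) {
    return fn(s);
  }
}
```

With a transform that rewrites `<tag>` to `(tag)`, text before and after a closed cpp block is converted while `std::vector<int>` inside it is left alone; a lone unclosed fence leaves everything after it subject to conversion.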

`sanitize_content.cjs`

Same wrapping applied to the mention-filtering variant of the pipeline.

Scope

Only XML tag conversion and XML comment removal are made code-region aware. URL redaction, invisible-character stripping, and all other sanitizers are unchanged — they still apply everywhere.

Example

Before fix:
  `VBuffer<float32>` in inline code → `VBuffer(float32)`
  `std::vector<int> v;` in a fenced cpp block → `std::vector(int) v;`

After fix:
  `VBuffer<float32>` is preserved in inline code
  `std::vector<int> v;` is preserved in the fenced block



Angle brackets (e.g. VBuffer<float32>) inside fenced code blocks and backtick inline-code spans are no longer converted to parentheses by the XML-tag sanitizer. The same code-region awareness is applied to XML comment removal.

Three helpers are added to sanitize_content_core.cjs:
- getFencedCodeRanges: locates fenced code blocks by character position
- applyFnOutsideInlineCode: applies a transform skipping inline code spans
- applyToNonCodeRegions: composes both to skip all code regions

Both sanitizeContentCore and sanitizeContent now use these helpers when calling removeXmlComments and convertXmlTags, with a safe fallback to full-string processing if parsing fails.

Ten new tests verify the behaviour for fenced blocks, inline code, regular text (still sanitised), XML comments, and the VBuffer<float32> example from the original issue.

Co-authored-by: dsyme <7204669+dsyme@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/43a8350f-5a6b-41d6-b5cf-16233acc7ed4

pelikhan · 2026-03-20T18:22:08Z

Sorry, I closed this one accidentally. We need to be careful here, because markdown parsers may have different bugs that would allow someone to create unbalanced code regions. We should never allow the dangerous HTML elements.

github-actions · 2026-03-21T05:20:13Z

🤖 Contribution Check Report

Hey @Copilot 👋, great work on this sanitization fix! Preserving angle brackets inside fenced code blocks and inline code spans (e.g. VBuffer<float32>) is a real usability improvement: mangled type parameters in issue titles and code samples are a legitimate pain point worth fixing.

✅ Checklist

Check	Result
On-topic	✅ Yes — targeted bug fix in `actions/setup/js` sanitization pipeline
Follows process	✅ Yes — Copilot coding agent PR assigned to core team member (`dsyme`)
Focused	✅ Yes — all 3 changed files relate to the single XML sanitization concern
New dependencies	✅ No — no manifest changes
Has tests	✅ Yes — 10 test cases added to `sanitize_content.test.cjs`
Has description	✅ Yes — detailed body with Changes section, scope callout, and before/after example
Diff size	273 lines (265 additions, 8 deletions)

🟢 Verdict: Aligned

The PR is well-structured and complete. The three new helpers (getFencedCodeRanges, applyFnOutsideInlineCode, applyToNonCodeRegions) are clearly documented with JSDoc, the safe fallback (catch (_e) → fn(s)) ensures the security invariant is never weakened, and the test suite covers backtick fences, tilde fences, language-specifier fences, multi-backtick inline spans, mixed content, and the original VBuffer<float32> reproduction case.

One optional enhancement worth considering: the helper functions in sanitize_content_core.cjs are tested indirectly through the sanitizeContent integration tests. Adding a small dedicated test block directly for getFencedCodeRanges and applyFnOutsideInlineCode (e.g. in a sanitize_content_core.test.cjs) would give tighter feedback on edge cases like deeply nested backticks or malformed fences — but this is not a blocker.

This PR looks ready for maintainer review. 🎉

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

issue:#unknown (search_issues: has secrecy requirements that agent doesn't meet. The agent is not authorized to access private-scoped data.)

Generated by Contribution Check


Copilot

Pull request overview

Prevents XML tag conversion (<...> → (...)) and XML comment stripping from mangling markdown code samples and inline code spans that include angle brackets (e.g., generics/templates like VBuffer<float32>).

Changes:

Added markdown-aware helpers to identify fenced code blocks and inline code spans, and to apply transforms only outside those regions.
Updated both sanitization pipelines (sanitizeContentCore and the mention-filtering sanitizeContent) to run removeXmlComments / convertXmlTags via the new wrapper.
Added tests asserting angle brackets and XML comments are preserved inside fenced code blocks and inline code spans, while still sanitizing regular text.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
actions/setup/js/sanitize_content_core.cjs	Adds helpers for code-region detection and applies XML sanitizers only outside code regions; exports `applyToNonCodeRegions`.
actions/setup/js/sanitize_content.cjs	Uses `applyToNonCodeRegions` in the mention-filtering sanitization pipeline as well.
actions/setup/js/sanitize_content.test.cjs	Adds tests for preserving angle brackets/comments in inline and fenced code regions and continued sanitization in normal text.


Copilot · 2026-03-21T13:31:18Z

actions/setup/js/sanitize_content_core.cjs

+  // Remove XML comments – skip code blocks and inline code to avoid altering code content
+  sanitized = applyToNonCodeRegions(sanitized, removeXmlComments);

-  // Convert XML tags to parentheses format to prevent injection
-  sanitized = convertXmlTags(sanitized);
+  // Convert XML tags to parentheses format – skip code blocks and inline code so that
+  // type parameters (e.g. VBuffer<float32>) and code containing angle brackets are preserved
+  sanitized = applyToNonCodeRegions(sanitized, convertXmlTags);



applyToNonCodeRegions determines code regions before balanceCodeRegions runs later in the pipeline. If balanceCodeRegions modifies fence lengths / closes unclosed fences (a scenario this repo explicitly expects for AI-generated markdown), content that ends up inside a code block after balancing may still have had XML comment removal / tag conversion applied earlier, reintroducing the original mangling for malformed markdown. Consider balancing code regions before running the code-aware XML transforms, or running the XML transforms on the balanced markdown so both stages agree on code boundaries.
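The ordering concern can be reproduced with a small self-contained model. Everything here is a hypothetical stand-in: closeDanglingFence only mimics the "closes unclosed fences" behaviour the comment attributes to balanceCodeRegions, and applyOutsideClosedFences only recognises closed ``` fences.

```javascript
// Hypothetical stand-in for balanceCodeRegions: appends a closing ``` when a fence is left open.
function closeDanglingFence(s) {
  const fenceLines = s.split("\n").filter((line) => /^```/.test(line)).length;
  return fenceLines % 2 === 1 ? s + "\n```" : s;
}

// Applies fn outside closed ``` blocks only (simplified code-region detection).
function applyOutsideClosedFences(s, fn) {
  const fence = /^```[^\n]*\n[\s\S]*?^```[ \t]*$/gm;
  let out = "";
  let last = 0;
  let m;
  while ((m = fence.exec(s)) !== null) {
    out += fn(s.slice(last, m.index)) + m[0]; // sanitized prose, then the raw block
    last = m.index + m[0].length;
  }
  return out + fn(s.slice(last));
}

const convertTags = (t) => t.replace(/<([^<>]*)>/g, "($1)");
const input = "```cpp\nstd::vector<int> v;"; // fence is never closed

// Transform first, balance later: the code is mangled before the fence gets closed.
const transformFirst = closeDanglingFence(applyOutsideClosedFences(input, convertTags));
// Balance first: both stages agree on where the code block is.
const balanceFirst = applyOutsideClosedFences(closeDanglingFence(input), convertTags);

console.log(JSON.stringify(transformFirst)); // "```cpp\nstd::vector(int) v;\n```"
console.log(JSON.stringify(balanceFirst)); // "```cpp\nstd::vector<int> v;\n```"
```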

Copilot · 2026-03-21T13:31:19Z

actions/setup/js/sanitize_content.test.cjs

+      expect(result).toContain("`VBuffer<float>`");
+      expect(result).not.toContain("VBuffer(float32)");
+      expect(result).not.toContain("VBuffer(float)");
+    });


The new tests cover top-level fenced blocks and inline code spans, but they don’t cover fenced code blocks inside blockquotes (e.g. > ```yaml) or list items—both are common markdown patterns and currently not detected by getFencedCodeRanges. Adding coverage for these cases would prevent regressions where angle brackets are still converted inside those code blocks.

Suggested change

-    });
+    });
+    it("should treat fenced code blocks inside blockquotes as code regions", () => {
+      const markdown = [
+        "> ```yaml",
+        "> apiVersion: v1",
+        "> kind: Pod<V1>",
+        "> ```",
+      ].join("\n");
+      const result = sanitizeContent(markdown);
+      expect(result).toContain("kind: Pod<V1>");
+      expect(result).not.toContain("kind: Pod(V1)");
+    });
+    it("should treat fenced code blocks inside list items as code regions", () => {
+      const markdown = [
+        "- ```csharp",
+        "  var list = new List<string>();",
+        "  ```",
+      ].join("\n");
+      const result = sanitizeContent(markdown);
+      expect(result).toContain("List<string>");
+      expect(result).not.toContain("List(string)");
+    });

Copilot · 2026-03-21T13:31:19Z

actions/setup/js/sanitize_content_core.cjs

+    if (!inBlock) {
+      const m = trimmed.match(/^(`{3,}|~{3,})/);
+      if (m) {
+        inBlock = true;
+        blockStart = pos;
+        fenceChar = m[1][0];
+        fenceLen = m[1].length;
+      }


getFencedCodeRanges only recognizes fences when the trimmed line starts with backticks/tildes. This misses valid fenced code blocks inside blockquotes (e.g. > ```yaml) and list items (e.g. 1. ``` or - ```), so XML tag conversion/comment removal will still run inside those code blocks and continue mangling code samples in those common markdown constructs. Consider extending the fence detection to handle >-prefixed blockquote lines and list-item prefixes (or switching to a markdown fence parser that understands these cases).
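A rough sketch of the first option suggested here: strip blockquote markers and a single list-item marker before testing for a fence opener. The helper name matchFenceOpener is hypothetical and the prefix handling is approximate; CommonMark's list-item indentation rules are more involved, and a complete fix would also need the same stripping when matching the closing fence.

```javascript
// Hypothetical helper: detects a fence opener after optional ">" and list-item prefixes.
// Returns { fenceChar, fenceLen } or null. Approximate; not a full CommonMark parser.
function matchFenceOpener(line) {
  // Drop any nesting of blockquote markers, e.g. ">", "> ", "> > ".
  let rest = line.replace(/^(?:[ \t]*>[ \t]?)+/, "");
  // Drop one bullet ("-", "*", "+") or ordered ("1." / "1)") list marker.
  rest = rest.replace(/^[ \t]*(?:[-*+]|\d{1,9}[.)])[ \t]+/, "");
  const m = rest.trimStart().match(/^(`{3,}|~{3,})/);
  return m ? { fenceChar: m[1][0], fenceLen: m[1].length } : null;
}

for (const line of ["> ```yaml", "- ```csharp", "1. ~~~~", "plain <b>text</b>"]) {
  console.log(JSON.stringify(line), matchFenceOpener(line));
}
```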

Copilot · 2026-03-21T13:31:19Z

actions/setup/js/sanitize_content_core.cjs

+function applyToNonCodeRegions(s, fn) {
+  if (!s || typeof s !== "string") {
+    return s || "";
+  }


The JSDoc for applyToNonCodeRegions says it returns a string, but when s is truthy and not a string (e.g. Buffer/object) it returns s as-is (return s || ""). Since this helper is exported, that return-type mismatch can leak non-strings to callers. Consider always returning a string here (e.g. "" for non-strings) or updating the JSDoc/type contract accordingly.
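A minimal sketch of the first option (always return a string), assuming empty input should still produce "" as the current `return s || ""` does. applyToNonCodeRegionsGuarded is a hypothetical name, and the code-region splitting is elided so only the input guard is shown.

```javascript
/**
 * Hypothetical guarded wrapper: the return type is always a string.
 * @param {unknown} s
 * @param {(text: string) => string} fn
 * @returns {string}
 */
function applyToNonCodeRegionsGuarded(s, fn) {
  // Buffers, objects, null and undefined all collapse to "", matching @returns {string}.
  if (typeof s !== "string" || s.length === 0) {
    return "";
  }
  // The real helper would split s into code / non-code regions here; elided in this sketch.
  return fn(s);
}
```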

Copilot · 2026-03-21T13:31:19Z

actions/setup/js/sanitize_content_core.cjs

+  while (i < text.length) {
+    if (text[i] !== "`") {
+      i++;
+      continue;
+    }
+
+    // Count consecutive backticks at the current position
+    const btStart = i;
+    let btCount = 0;
+    while (i < text.length && text[i] === "`") {
+      btCount++;
+      i++;
+    }
+    // i is now past the opening backtick sequence
+
+    // Look for the matching closing sequence of exactly btCount backticks
+    let closeIdx = -1;
+    let j = i;
+    while (j < text.length) {
+      if (text[j] === "`") {
+        let closeCount = 0;
+        const jStart = j;
+        while (j < text.length && text[j] === "`") {
+          closeCount++;
+          j++;
+        }
+        if (closeCount === btCount) {
+          closeIdx = jStart;
+          break;
+        }
+        // Different length – keep scanning (j already advanced past these backticks)
+      } else {
+        j++;
+      }
+    }
+
+    if (closeIdx !== -1) {
+      // Valid inline code span found: apply fn to the text before it, then keep the code span
+      if (textStart < btStart) {
+        parts.push(fn(text.slice(textStart, btStart)));
+      }
+      parts.push(text.slice(btStart, closeIdx + btCount));
+      textStart = closeIdx + btCount;
+      i = textStart;
+    }
+    // If no matching close was found, the backticks are treated as regular text (i already advanced)


applyFnOutsideInlineCode can become O(n²) for pathological inputs with many unmatched/differently-sized backtick runs because for each opener it linearly scans the remainder of the string to find a matching closer. Given the sanitizer allows up to ~524k chars, this can become a noticeable CPU hotspot / potential DoS vector. Consider rewriting this as a single-pass state machine that advances j monotonically (or using a regex-based tokenizer) so the overall runtime is O(n).

Suggested change

+  // To avoid O(n^2) behavior when scanning for matching backtick runs,
+  // we pre-scan the string once to find all backtick runs, then pair them
+  // by length in a single pass. This preserves the original semantics:
+  // for a given run length, the first occurrence opens a span, the next
+  // closes it, and so on, ignoring intervening runs of different lengths.
+  /** @type {{ start: number, length: number }[]} */
+  const runs = [];
+  for (let idx = 0; idx < text.length;) {
+    if (text[idx] !== "`") {
+      idx++;
+      continue;
+    }
+    const start = idx;
+    let count = 0;
+    while (idx < text.length && text[idx] === "`") {
+      count++;
+      idx++;
+    }
+    runs.push({ start, length: count });
+  }
+  /**
+   * Map from backtick run length to the start index of the currently
+   * pending opener (if any) for that length.
+   * @type {Record<string, number | undefined>}
+   */
+  const pending = Object.create(null);
+  /** @type {{ openStart: number, openLen: number, closeStart: number }[]} */
+  const spans = [];
+  for (const run of runs) {
+    const lenKey = String(run.length);
+    const prev = pending[lenKey];
+    if (prev === undefined) {
+      // No pending opener for this length: mark this run as an opener.
+      pending[lenKey] = run.start;
+    } else {
+      // Found a closer for the existing opener of this length.
+      spans.push({
+        openStart: prev,
+        openLen: run.length,
+        closeStart: run.start,
+      });
+      pending[lenKey] = undefined;
+    }
+  }
+  // spans are generated in the order in which their closing runs appear.
+  // Since each openStart is always before its closeStart and runs are
+  // iterated in increasing order of start, spans are effectively ordered
+  // by appearance in the text and do not overlap for a given length.
+  // We now walk through the spans, emitting processed text and raw code.
+  for (const span of spans) {
+    const btStart = span.openStart;
+    const btCount = span.openLen;
+    const closeIdx = span.closeStart;
+    // Skip spans that start before the current textStart (e.g., if any
+    // earlier spans have already advanced textStart past them).
+    if (btStart < textStart) {
+      continue;
+    }
+    // Apply fn to non-code text before this code span.
+    if (textStart < btStart) {
+      parts.push(fn(text.slice(textStart, btStart)));
+    }
+    // Preserve the code span (including backticks) unchanged.
+    parts.push(text.slice(btStart, closeIdx + btCount));
+    textStart = closeIdx + btCount;
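A sketch of another linear-time approach, assuming the intended semantics are those of the original loop: an opener pairs with the next backtick run of exactly the same length, and scanning resumes after the closer. applyFnOutsideInlineCodeLinear is a hypothetical name. Runs are indexed by length, and each per-length cursor only moves forward, so total work stays proportional to the input length.

```javascript
// Hypothetical linear-time variant of applyFnOutsideInlineCode (illustrative only).
function applyFnOutsideInlineCodeLinear(text, fn) {
  // 1. Collect every maximal run of backticks in a single pass.
  const runs = [];
  for (let i = 0; i < text.length; ) {
    if (text[i] !== "`") {
      i++;
      continue;
    }
    const start = i;
    while (i < text.length && text[i] === "`") i++;
    runs.push({ start, length: i - start });
  }

  // 2. Index runs by length: length -> ascending list of run indices.
  const byLength = new Map();
  runs.forEach((run, idx) => {
    if (!byLength.has(run.length)) byLength.set(run.length, []);
    byLength.get(run.length).push(idx);
  });
  const cursor = new Map(); // length -> position in its list; never moves backwards

  // 3. Walk runs left to right; each opener takes the next equal-length run as its closer.
  const parts = [];
  let textStart = 0;
  let r = 0;
  while (r < runs.length) {
    const { start, length } = runs[r];
    const list = byLength.get(length);
    let p = cursor.get(length) || 0;
    while (p < list.length && list[p] <= r) p++; // skip runs at or before the opener
    cursor.set(length, p);
    if (p < list.length) {
      const close = runs[list[p]];
      if (textStart < start) parts.push(fn(text.slice(textStart, start)));
      textStart = close.start + length;
      parts.push(text.slice(start, textStart)); // code span kept verbatim
      r = list[p] + 1; // resume after the closer; runs inside the span are skipped
    } else {
      r++; // no closer: these backticks stay part of ordinary text
    }
  }
  if (textStart < text.length) parts.push(fn(text.slice(textStart)));
  return parts.join("");
}
```

Because scanning always resumes after an emitted span, a run inside one code span (for example the double backtick in `` `a``b` ``) is never reused as an opener, so it cannot pair with a run that follows the span.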

Initial plan

a85948b

Copilot AI assigned Copilot and dsyme Mar 20, 2026

Copilot started work on behalf of dsyme March 20, 2026 17:54

Copilot AI linked an issue Mar 20, 2026 that may be closed by this pull request

Angle-bracket redaction too strong in code and issue titles #22004

Closed

github-actions bot mentioned this pull request Mar 20, 2026

[aw] No-Op Runs #21483

Open

Copilot AI changed the title ~~[WIP] Fix redaction of angle brackets in code and issue titles~~ Preserve angle brackets in code blocks and inline code spans during sanitization Mar 20, 2026

Copilot finished work on behalf of dsyme March 20, 2026 18:16

Copilot AI requested a review from dsyme March 20, 2026 18:16

pelikhan closed this Mar 20, 2026

pelikhan reopened this Mar 20, 2026

This was referenced Mar 21, 2026

Smoke Test: Copilot - 23368835222 #22069

Closed

Smoke Test: Copilot - 23372042458 #22086

Closed

github-actions bot added the lgtm label Mar 21, 2026

This was referenced Mar 21, 2026

[Contribution Check Report] Contribution Check Report — 2026-03-21 #22115

Closed

[Daily Report] Daily Status Report — 2026-03-21 #22116

Open

dsyme added 2 commits March 21, 2026 13:18

Merge branch 'main' of https://github.com/github/gh-aw into copilot/f…

e3c4045

…ix-angle-bracket-redaction

fix test

93c319d

dsyme approved these changes Mar 21, 2026


dsyme marked this pull request as ready for review March 21, 2026 13:27

Copilot AI review requested due to automatic review settings March 21, 2026 13:27

dsyme merged commit 69d40fe into main Mar 21, 2026
51 checks passed

dsyme deleted the copilot/fix-angle-bracket-redaction branch March 21, 2026 13:27

Copilot started reviewing on behalf of dsyme March 21, 2026 13:27

Copilot AI reviewed Mar 21, 2026

This was referenced Mar 21, 2026

[actions] Update GitHub Actions versions - 2026-03-21 #22120

Merged

Smoke Test: Claude - 23382583484 #22143

Closed

github-actions bot mentioned this pull request Mar 21, 2026

[PR Triage Report] PR Triage Report - 2026-03-21 #22172

Closed

dsyme merged 4 commits into main from copilot/fix-angle-bracket-redaction
