Merged
47 changes: 42 additions & 5 deletions actions/setup/js/markdown_code_region_balancer.cjs
@@ -18,13 +18,20 @@
  * }
  * ```
  *
+ * Common AI-Generated Error Patterns (in order of frequency):
+ * 1. Unclosed code blocks at end of content (FIXED: adds closing fence)
+ * 2. Nested fences at same indentation level (FIXED: escapes by increasing fence length)
+ * 3. Mixed fence types causing confusion (HANDLED: treats ` and ~ separately)
+ * 4. Indented bare fences in markdown examples (HANDLED: preserves as content)
+ *
  * Rules:
  * - Supports both backtick (`) and tilde (~) fences
  * - Minimum fence length is 3 characters
  * - A fence must be at least as long as the opening fence to close it
  * - Fences can have optional language specifiers
  * - Indentation is preserved but doesn't affect matching
  * - Content inside code blocks should never contain valid fences
+ * - Indented fences (different indentation than opener) are treated as content
  *
  * @module markdown_code_region_balancer
  */
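Review note: the matching rules listed in the comment above can be sketched as a small predicate. This is a minimal illustration, not the module's implementation. `parseFence` and `closesFence` are hypothetical names. The indentation check assumes the newly added "indented fences are treated as content" rule takes precedence over the older "doesn't affect matching" line, and the requirement that a closing fence carries no language specifier is borrowed from CommonMark rather than stated in this diff.

```javascript
// Hypothetical helper: parse a line into a fence descriptor, or return null
// when the line is not a fence (fewer than 3 backticks or tildes).
function parseFence(line) {
  const match = /^(\s*)(`{3,}|~{3,})(.*)$/.exec(line);
  if (!match) return null;
  const [, indent, fence, info] = match;
  return {
    indent: indent.length, // leading whitespace width
    char: fence[0], // "`" or "~"
    length: fence.length, // number of fence characters
    info: info.trim(), // optional language specifier
  };
}

// Hypothetical helper: decide whether `candidate` closes the block opened by `opener`.
function closesFence(opener, candidate) {
  return (
    candidate !== null &&
    candidate.char === opener.char && // ` and ~ are tracked separately
    candidate.length >= opener.length && // at least as long as the opener
    candidate.indent === opener.indent && // assumption: differently indented fences are content
    candidate.info === "" // assumption (CommonMark): closers have no info string
  );
}
```

Under these assumptions, a four-backtick line closes a three-backtick opener, while a `~~~` line or an indented fence inside the same block is kept as content.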
@@ -33,10 +40,19 @@
  * Balance markdown code regions by attempting to fix mismatched fences.
  *
  * The algorithm:
- * 1. Parse through markdown line by line, skipping XML comment regions
- * 2. Track code block state (open/closed)
- * 3. When nested fences are detected, increase outer fence length by 1
- * 4. Ensure all opened code blocks are properly closed
+ * 1. Normalize line endings to ensure consistent processing
+ * 2. Parse through markdown line by line, skipping XML comment regions
+ * 3. Track code block state (open/closed)
+ * 4. When nested fences are detected, increase outer fence length by 1
+ * 5. Ensure all opened code blocks are properly closed
+ * 6. Quality check: Verify the result doesn't create more unbalanced regions
+ *    than the original input - if it does, return the original (normalized)
+ *
+ * Quality guarantees:
+ * - Never creates MORE unbalanced code regions than the input
+ * - Always normalizes line endings (\r\n -> \n)
+ * - If the algorithm would degrade quality, returns original content
+ * - Preserves indentation and fence character types
  *
  * @param {string} markdown - Markdown content to balance
  * @returns {string} Balanced markdown with properly matched code regions
@@ -345,7 +361,28 @@ function balanceCodeRegions(markdown) {
     result.push(closingFence);
   }

-  return result.join("\n");
+  const resultMarkdown = result.join("\n");
+
+  // Quality check: Verify we didn't make things worse
+  // Compare the unbalanced counts before and after
+  const originalCounts = countCodeRegions(normalizedMarkdown);
+  const resultCounts = countCodeRegions(resultMarkdown);
+
+  // If we created MORE unbalanced regions, give up and return original (normalized)
+  if (resultCounts.unbalanced > originalCounts.unbalanced) {
+    return normalizedMarkdown;
+  }
+
+  // If we didn't improve the balance at all (same unbalanced count),
+  // and we modified the markdown significantly, check if we should give up
+  if (resultCounts.unbalanced === originalCounts.unbalanced && resultMarkdown !== normalizedMarkdown) {
+    // If the total count increased (we added more fences somehow), give up
+    if (resultCounts.total > originalCounts.total) {
+      return normalizedMarkdown;
+    }
+  }
+
+  return resultMarkdown;
 }

 /**
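Review note: the fallback logic added in this hunk can be exercised in isolation. The sketch below is a simplified illustration under stated assumptions: `countRegionsSketch` is a hypothetical stand-in for `countCodeRegions` (whose real implementation is not shown in this diff and may count differently), and `withQualityCheck` is a hypothetical wrapper that mirrors the two guard conditions from the hunk.

```javascript
// Hypothetical stand-in for countCodeRegions. It opens a region at the first
// fence line and closes it at the next bare fence of the same character that
// is at least as long. Anything still open at the end counts as unbalanced.
function countRegionsSketch(markdown) {
  const fenceRe = /^\s*(`{3,}|~{3,})(.*)$/;
  let total = 0;
  let open = null; // { char, length } of the currently open fence, if any

  for (const line of markdown.split("\n")) {
    const match = fenceRe.exec(line);
    if (!match) continue;
    const [, fence, rest] = match;
    if (open === null) {
      open = { char: fence[0], length: fence.length };
      total += 1;
    } else if (fence[0] === open.char && fence.length >= open.length && rest.trim() === "") {
      open = null; // valid closing fence
    }
    // Any other fence-like line inside an open block is treated as content.
  }
  return { total, unbalanced: open === null ? 0 : 1 };
}

// Hypothetical wrapper: run `transform`, but return the normalized input when
// the result has more unbalanced regions, or the same number with more regions overall.
function withQualityCheck(markdown, transform) {
  const normalized = markdown.replace(/\r\n/g, "\n");
  const result = transform(normalized);
  const before = countRegionsSketch(normalized);
  const after = countRegionsSketch(result);

  if (after.unbalanced > before.unbalanced) return normalized;
  if (after.unbalanced === before.unbalanced && result !== normalized && after.total > before.total) {
    return normalized;
  }
  return result;
}

// Usage: a transform that wrongly appends a stray fence is rejected.
const balanced = "```\ncode\n```";
const guarded = withQualityCheck(balanced, md => md + "\n```");
console.log(guarded === balanced); // true
```

Passing the transform as a parameter keeps the guard independent of the balancing algorithm itself, which is one way to test the "never worse than the input" guarantee separately.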
104 changes: 104 additions & 0 deletions actions/setup/js/markdown_code_region_balancer.test.cjs
@@ -704,5 +704,109 @@ code
 `;
       expect(balancer.balanceCodeRegions(input)).toBe(input);
     });
+
+    it("should never create MORE unbalanced regions than input", () => {
+      // Test quality degradation detection
+      const testCases = [
+        "```\ncode\n```", // Balanced - should not modify
+        "```javascript\nunclosed", // Unclosed - should add closing
+        "```\ncode1\n```\n```\ncode2\n```", // Multiple balanced - should not modify
+        "```\nnested\n```\n```\n```", // Unbalanced sequence
+        "```markdown\n```\nexample\n```\n```", // Nested example
+        "```\nfirst\n```\nsecond\n```\nthird\n```", // Partially balanced
+      ];
+
+      testCases.forEach(input => {
+        const originalCounts = balancer.countCodeRegions(input);
+        const result = balancer.balanceCodeRegions(input);
+        const resultCounts = balancer.countCodeRegions(result);
+
+        // Key quality invariant: never create MORE unbalanced regions
+        expect(resultCounts.unbalanced).toBeLessThanOrEqual(originalCounts.unbalanced);
+      });
+    });
+
+    it("should preserve balanced markdown exactly (except line ending normalization)", () => {
+      const balancedExamples = ["```javascript\nconst x = 1;\n```", "~~~markdown\ntext\n~~~", "```\ngeneric\n```\n\n```python\ncode\n```", "# Title\n\n```bash\necho test\n```\n\nMore text", "````\nfour backticks\n````"];
+
+      balancedExamples.forEach(input => {
+        const result = balancer.balanceCodeRegions(input);
+        expect(result).toBe(input);
+      });
+    });
+
+    it("should handle AI-generated common error patterns", () => {
+      // Common error pattern: AI generates nested markdown examples without proper escaping
+      const aiPattern1 = `How to use code blocks:
+
+\`\`\`markdown
+You can write code like this:
+\`\`\`javascript
+code here
+\`\`\`
+\`\`\``;
+
+      const result1 = balancer.balanceCodeRegions(aiPattern1);
+      const counts1 = balancer.countCodeRegions(result1);
+
+      // Result should have fewer or equal unbalanced regions
+      const originalCounts1 = balancer.countCodeRegions(aiPattern1);
+      expect(counts1.unbalanced).toBeLessThanOrEqual(originalCounts1.unbalanced);
+
+      // Common error pattern: Unclosed code block at end of content
+      const aiPattern2 = `Here's some code:
+
+\`\`\`javascript
+function example() {
+console.log("test");
+}`;
+
+      const result2 = balancer.balanceCodeRegions(aiPattern2);
+      expect(balancer.isBalanced(result2)).toBe(true);
+
+      // Common error pattern: Mixed fence types causing confusion
+      const aiPattern3 = `\`\`\`markdown
+Example with tilde:
+~~~
+content
+~~~
+\`\`\``;
+
+      const result3 = balancer.balanceCodeRegions(aiPattern3);
+      const counts3 = balancer.countCodeRegions(result3);
+      expect(counts3.unbalanced).toBe(0);
+    });
+
+    it("should handle pathological cases without hanging", () => {
+      // Generate pathological input: alternating fences
+      let pathological = "";
+      for (let i = 0; i < 100; i++) {
+        pathological += i % 2 === 0 ? "```\n" : "~~~\n";
+      }
+
+      // Should complete in reasonable time (not hang)
+      const start = Date.now();
+      const result = balancer.balanceCodeRegions(pathological);
+      const elapsed = Date.now() - start;
+
+      expect(elapsed).toBeLessThan(1000); // Should complete in less than 1 second
+      expect(typeof result).toBe("string");
+    });
+
+    it("should handle random fence variations", () => {
+      // Generate random fence lengths and types
+      const fenceChars = ["`", "~"];
+      const fenceLengths = [3, 4, 5, 6, 10];
+
+      for (let i = 0; i < 20; i++) {
+        const char = fenceChars[i % fenceChars.length];
+        const length = fenceLengths[i % fenceLengths.length];
+        const fence = char.repeat(length);
+        const input = `${fence}javascript\ncode${i}\n${fence}`;
+
+        const result = balancer.balanceCodeRegions(input);
+        expect(balancer.isBalanced(result)).toBe(true);
+      }
+    });
   });
 });
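Review note: the nested-markdown pattern tested above (a `markdown` block that contains a `javascript` block) is the case the balancer repairs by lengthening the outer fence. For comparison, the sketch below shows the simpler generation-side technique of choosing a long enough fence up front. It is not the balancer's algorithm, and `wrapInFence` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: wrap `content` in a backtick fence that is longer than
// any backtick fence starting a line inside it, so inner fences cannot close
// the wrapper. Tilde fences inside need no special handling with a backtick wrapper.
function wrapInFence(content, lang = "") {
  let longest = 0;
  for (const line of content.split("\n")) {
    const match = /^\s*(`{3,})/.exec(line);
    if (match) longest = Math.max(longest, match[1].length);
  }
  // Minimum fence length is 3; otherwise use one more than the longest inner fence.
  const fence = "`".repeat(Math.max(3, longest + 1));
  return `${fence}${lang}\n${content}\n${fence}`;
}

// Usage: the inner three-backtick block ends up inside a four-backtick wrapper.
const inner = "```javascript\ncode here\n```";
console.log(wrapInFence(inner, "markdown"));
```

Content produced this way should already satisfy the balanced-input tests, so the balancer would be expected to return it unchanged.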