Background
The markdown-content-parity check now has formal escape hatches for intentional content divergence:
- data-markdown-ignore HTML attribute
- --parity-exclusions CSS selectors
- Configurable thresholds (including 0/0 informational mode)
This is documented in docs/checks/observability.md under "Audience segmentation" and "Configuring parity."
The baseline NOISE_PATTERNS list in src/checks/observability/markdown-content-parity.ts predates these escape hatches. It currently filters specific text segments before comparison:
/^last updated/,
/^was this page helpful/,
/^thank you for your feedback/,
/^previous\s+\S.*next\s+\S/, // pagination
/^start from the beginning$/,
/^join our .* server/,
/^loading video content/,
/^\/.+\/.+/, // breadcrumb paths
/^for ai agents:/, // agent-directive banner
The list grew organically: each entry was added to fix a specific site's chrome leaking into parity comparisons. With the escape hatches now available, the question is whether these baseline filters are still appropriate or whether some should move to site responsibility.
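As a rough illustration of what the baseline does, the sketch below applies the patterns listed above to a handful of text segments. The helper names isNoise and filterNoise are hypothetical, and trimming and lowercasing each segment before matching is an assumption (the patterns are lowercase and carry no i flag); the actual logic in markdown-content-parity.ts may differ.

```typescript
// Baseline patterns, copied from the list above.
const NOISE_PATTERNS: RegExp[] = [
  /^last updated/,
  /^was this page helpful/,
  /^thank you for your feedback/,
  /^previous\s+\S.*next\s+\S/, // pagination
  /^start from the beginning$/,
  /^join our .* server/,
  /^loading video content/,
  /^\/.+\/.+/, // breadcrumb paths
  /^for ai agents:/, // agent-directive banner
];

// Hypothetical helper. Assumption: segments are trimmed and lowercased
// before they are tested against the patterns.
function isNoise(segment: string): boolean {
  const normalized = segment.trim().toLowerCase();
  return NOISE_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Hypothetical helper: drop every segment the baseline treats as chrome.
function filterNoise(segments: string[]): string[] {
  return segments.filter((segment) => !isNoise(segment));
}

const kept = filterNoise([
  "Install the CLI",
  "Last updated on May 3",
  "Was this page helpful?",
  "Previous Setup Next Deploy",
]);
console.log(kept); // only "Install the CLI" survives
```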
The fairness problem
The current baseline favors sites whose chrome happens to use the filtered phrasing over sites that don't. For example, /^was this page helpful/ matches some platforms but not others, since equivalent widgets use varied phrasing ("Did this page help you?", "Was this useful?") or no text at all (star ratings, thumbs up/down). Adding more phrasings to the baseline shifts the inconsistency rather than resolving it.
The same issue applies to:
- /^join our .* server/ matches Discord CTAs but not Slack/Telegram/etc.
- /^for ai agents:/ matches one specific agent-directive banner phrasing.
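The bias is easy to reproduce. The sketch below runs two of the baseline patterns against equivalent chrome from different sites; the sample strings are illustrative and assumed to be lowercased already.

```typescript
const feedbackPattern = /^was this page helpful/;
const feedbackPhrasings: string[] = [
  "was this page helpful?",
  "did this page help you?",
  "was this useful?",
];
// Only the first phrasing is filtered by the baseline.
const feedbackMatches = feedbackPhrasings.map((text) => feedbackPattern.test(text));
console.log(feedbackMatches); // [ true, false, false ]

const communityPattern = /^join our .* server/;
const communityCtas: string[] = [
  "join our discord server",
  "join our slack workspace",
  "join our telegram group",
];
// Only the Discord-style wording is filtered.
const communityMatches = communityCtas.map((text) => communityPattern.test(text));
console.log(communityMatches); // [ true, false, false ]
```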
Proposal
Part 1: Document parityExclusions recipes
Add a "Common chrome patterns" section to docs/checks/observability.md under markdown-content-parity. Provide ready-to-use parityExclusions selectors for chrome that varies in phrasing across sites:
- Feedback widgets ([class*="feedback"], [aria-label*="feedback"], .was-this-helpful, etc.)
- "View as markdown" agent directive links
- Cookie banners and privacy bars
- Community CTAs (Discord/Slack invitations)
- Footer link bars (Edit this page, Report a bug, etc.)
This gives site owners a clear path: when chrome inflates their parity score, copy a recipe instead of needing the baseline updated for their phrasing.
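One possible shape for the recipes is sketched below. The CHROME_RECIPES table and the buildParityExclusions helper are hypothetical. Only the feedback-widget selectors come from this proposal; the cookie and community selectors are illustrative guesses that each site would need to adapt. Joining the result with commas yields a standard CSS selector list, but whether parityExclusions accepts that form is an assumption.

```typescript
// Hypothetical recipe table. Only the feedback-widget selectors come from the
// proposal; every other selector is an illustrative guess to adapt per site.
const CHROME_RECIPES: Record<string, string[]> = {
  feedbackWidgets: ['[class*="feedback"]', '[aria-label*="feedback"]', ".was-this-helpful"],
  cookieBanners: ['[class*="cookie"]', '[id*="cookie"]'],
  communityCtas: ['a[href*="discord.gg"]', 'a[href*="slack.com"]'],
};

// Collect the selectors for the chosen recipes, skipping duplicates.
function buildParityExclusions(recipeNames: string[]): string[] {
  const selectors: string[] = [];
  for (const name of recipeNames) {
    const recipe = CHROME_RECIPES[name];
    if (recipe === undefined) {
      throw new Error(`Unknown recipe: ${name}`);
    }
    for (const selector of recipe) {
      if (selectors.indexOf(selector) === -1) {
        selectors.push(selector);
      }
    }
  }
  return selectors;
}

const exclusions = buildParityExclusions(["feedbackWidgets", "communityCtas"]);
console.log(exclusions.join(", "));
```

Attribute-substring selectors such as [class*="feedback"] are broad by design, so a recipe should be checked against a real page before it is adopted, in case it also hides genuine content.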
Part 2: Decide on baseline shrinkage
Two consistent positions:
A. Keep only universal web infrastructure in baseline (recommended)
These are essentially impossible for a site to opt out of, and aren't really content:
- /^last updated/
- /^previous\s+\S.*next\s+\S/ (pagination)
- /^\/.+\/.+/ (breadcrumb paths, though this regex is fragile)
- /^loading video content/
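The fragility of the breadcrumb pattern shows up with a few inputs. In the sketch below, the sample strings are illustrative: the pattern filters any segment that starts with a slash and contains a second one, including prose that happens to begin with an API path, while missing breadcrumbs rendered with other separators.

```typescript
const BREADCRUMB_PATTERN = /^\/.+\/.+/;

const breadcrumbSamples: string[] = [
  "/docs/guides/install", // path-style breadcrumb: filtered, as intended
  "/v1/users/{id} returns the user record", // API reference prose: also filtered
  "docs / guides / install", // breadcrumb with spaced separators: not filtered
  "home > docs > install", // breadcrumb with chevrons: not filtered
];

const breadcrumbMatches = breadcrumbSamples.map((text) => BREADCRUMB_PATTERN.test(text));
console.log(breadcrumbMatches); // [ true, true, false, false ]
```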
Move the rest to site responsibility (their phrasing varies, so the baseline is biased toward sites using one specific wording):
- /^was this page helpful/
- /^thank you for your feedback/
- /^join our .* server/
- /^start from the beginning$/
- /^for ai agents:/
Site owners use parityExclusions recipes from Part 1.
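To gauge the migration risk of option A, the split can be written out and used to list which segments the baseline filters today but would stop filtering. This is a sketch: the constant names and the newlyExposed helper are hypothetical, and lowercasing segments before matching is an assumption.

```typescript
// Option A split, copied from the two lists above.
const UNIVERSAL_PATTERNS: RegExp[] = [
  /^last updated/,
  /^previous\s+\S.*next\s+\S/, // pagination
  /^\/.+\/.+/, // breadcrumb paths
  /^loading video content/,
];

const SITE_RESPONSIBILITY_PATTERNS: RegExp[] = [
  /^was this page helpful/,
  /^thank you for your feedback/,
  /^join our .* server/,
  /^start from the beginning$/,
  /^for ai agents:/,
];

function matchesAny(patterns: RegExp[], segment: string): boolean {
  return patterns.some((pattern) => pattern.test(segment));
}

// Hypothetical helper: segments filtered today that would start counting
// against parity once the site-responsibility patterns leave the baseline.
function newlyExposed(segments: string[]): string[] {
  return segments
    .map((segment) => segment.trim().toLowerCase())
    .filter(
      (segment) =>
        matchesAny(SITE_RESPONSIBILITY_PATTERNS, segment) &&
        !matchesAny(UNIVERSAL_PATTERNS, segment),
    );
}

const exposed = newlyExposed([
  "Last updated on May 3",
  "Was this page helpful?",
  "Join our Discord server",
  "Getting started",
]);
console.log(exposed); // [ "was this page helpful?", "join our discord server" ]
```

Running a helper like this over the pages of sites that currently pass would show which of them need a parityExclusions recipe before the baseline shrinks.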
B. Leave baseline as-is, don't grow it
Keep all current entries (sites already passing depend on them), but adopt a policy: no new patterns. Future requests for additional phrasings are answered with "use parityExclusions."
This is less consistent but lower-disruption for sites currently passing under the baseline.
Decision needed
- Is shrinking the baseline (option A) desirable, or is the migration risk too high (option B)?
- Either way, document parityExclusions recipes so the escape hatch is discoverable.
This issue is to make the decision and execute it.