
markdown-content-parity: revisit baseline NOISE_PATTERNS now that audience segmentation has escape hatches #87

@dacharyc

Background

The markdown-content-parity check now has formal escape hatches for intentional content divergence:

  • data-markdown-ignore HTML attribute
  • --parity-exclusions CSS selectors
  • Configurable thresholds (including 0/0 informational mode)

This is documented in docs/checks/observability.md under "Audience segmentation" and "Configuring parity."

The baseline NOISE_PATTERNS list in src/checks/observability/markdown-content-parity.ts predates these escape hatches. It currently filters specific text segments before comparison:

```typescript
/^last updated/,
/^was this page helpful/,
/^thank you for your feedback/,
/^previous\s+\S.*next\s+\S/,    // pagination
/^start from the beginning$/,
/^join our .* server/,
/^loading video content/,
/^\/.+\/.+/,                     // breadcrumb paths
/^for ai agents:/,               // agent-directive banner
```
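A minimal sketch of what "filters specific text segments before comparison" could look like. It assumes, hypothetically, that the check splits a page into text segments, trims and lowercases each one (every baseline pattern is lowercase), and drops any segment that matches a pattern. The helpers `normalizeSegment`, `isNoise`, and `filterNoise` are illustrative names, not the actual functions in markdown-content-parity.ts.

```typescript
// Hypothetical sketch: segmentation and normalization are assumptions,
// not the actual logic in markdown-content-parity.ts.
const NOISE_PATTERNS: readonly RegExp[] = [
  /^last updated/,
  /^was this page helpful/,
  /^thank you for your feedback/,
  /^previous\s+\S.*next\s+\S/, // pagination
  /^start from the beginning$/,
  /^join our .* server/,
  /^loading video content/,
  /^\/.+\/.+/, // breadcrumb paths
  /^for ai agents:/, // agent-directive banner
];

// Assumption: segments are trimmed and lowercased before matching,
// because every baseline pattern is written in lowercase.
function normalizeSegment(segment: string): string {
  return segment.trim().toLowerCase();
}

// True when a segment matches any baseline pattern. None of the patterns
// use the g flag, so RegExp.test carries no state between calls.
function isNoise(segment: string, patterns: readonly RegExp[] = NOISE_PATTERNS): boolean {
  const normalized = normalizeSegment(segment);
  return patterns.some((pattern) => pattern.test(normalized));
}

// Drop noise segments before the HTML and markdown versions are compared.
function filterNoise(
  segments: readonly string[],
  patterns: readonly RegExp[] = NOISE_PATTERNS,
): string[] {
  return segments.filter((segment) => !isNoise(segment, patterns));
}

const htmlSegments = [
  "Last updated on March 3",
  "Install the CLI with npm.",
  "Was this page helpful?",
  "/docs/guides/install",
];
const filtered = filterNoise(htmlSegments);
console.log(filtered); // ["Install the CLI with npm."]
```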

The list grew organically: each entry was added to stop a specific site's chrome from leaking into parity comparisons. With the escape hatches now available, the question is whether these baseline filters are still appropriate or whether some should become the site's responsibility.

The fairness problem

The current baseline favors sites whose chrome happens to use the filtered phrasing over sites that don't. For example, ^was this page helpful matches some platforms but not others, since equivalent widgets use varied phrasing ("Did this page help you?", "Was this useful?", star ratings, thumbs up/down). Adding more phrasings to the baseline shifts the inconsistency rather than resolves it.

The same issue applies to:

  • ^join our .* server matches Discord CTAs but not Slack/Telegram/etc.
  • ^for ai agents: matches one specific agent-directive banner phrasing.
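The fairness problem can be reproduced directly against the regexes. This sketch runs two of the current baseline patterns over equivalent widgets and CTAs; the sample phrasings are illustrative, and it assumes (as a hypothetical) that segments are lowercased before matching.

```typescript
// Two patterns copied from the current baseline.
const feedbackPattern = /^was this page helpful/;
const communityPattern = /^join our .* server/;

// Equivalent chrome from different sites (illustrative wording, lowercased).
const feedbackWidgets = ["was this page helpful?", "did this page help you?", "was this useful?"];
const communityCtas = ["join our discord server", "join our slack workspace", "join our telegram group"];

// Map each phrasing to whether the baseline pattern would filter it out.
function coverage(pattern: RegExp, phrasings: readonly string[]): Record<string, boolean> {
  const result: Record<string, boolean> = {};
  for (const phrasing of phrasings) {
    result[phrasing] = pattern.test(phrasing);
  }
  return result;
}

const feedbackCoverage = coverage(feedbackPattern, feedbackWidgets);
// { "was this page helpful?": true, "did this page help you?": false, "was this useful?": false }
const communityCoverage = coverage(communityPattern, communityCtas);
// { "join our discord server": true, "join our slack workspace": false, "join our telegram group": false }
console.log(feedbackCoverage, communityCoverage);
```

Only the one phrasing each pattern was written for is filtered; the equivalents from other sites still count against parity.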

Proposal

Part 1: Document parityExclusions recipes

Add a "Common chrome patterns" section to docs/checks/observability.md under markdown-content-parity. Provide ready-to-use parityExclusions selectors for chrome that varies in phrasing across sites:

  • Feedback widgets ([class*="feedback"], [aria-label*="feedback"], .was-this-helpful, etc.)
  • "View as markdown" agent directive links
  • Cookie banners and privacy bars
  • Community CTAs (Discord/Slack invitations)
  • Footer link bars (Edit this page, Report a bug, etc.)
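One possible way to package the recipes so docs and tooling share a single source. Everything here is hypothetical: the recipe names, the class names in the selectors (beyond those already listed above), and the assumption that --parity-exclusions accepts a standard comma-separated CSS selector list, which should be checked against docs/checks/observability.md.

```typescript
// Hypothetical recipe catalogue; selectors are illustrative starting points,
// not verified against any particular site.
const CHROME_RECIPES = {
  feedback: ['[class*="feedback"]', '[aria-label*="feedback"]', ".was-this-helpful"],
  markdownLinks: ['[class*="view-as-markdown"]'],
  cookieBanners: ['[id*="cookie"]', '[class*="cookie-banner"]'],
  communityCtas: ['a[href*="discord.gg"]', 'a[href*="join.slack.com"]'],
  footerLinks: ['[class*="edit-this-page"]', '[class*="report-bug"]'],
} as const;

type RecipeName = keyof typeof CHROME_RECIPES;

// Combine the chosen recipes into one CSS selector list (comma-separated),
// skipping duplicates while preserving order.
function buildParityExclusions(names: readonly RecipeName[]): string {
  const selectors: string[] = [];
  for (const name of names) {
    for (const selector of CHROME_RECIPES[name]) {
      if (selectors.indexOf(selector) === -1) {
        selectors.push(selector);
      }
    }
  }
  return selectors.join(", ");
}

const exclusions = buildParityExclusions(["feedback", "cookieBanners"]);
console.log(exclusions);
// [class*="feedback"], [aria-label*="feedback"], .was-this-helpful, [id*="cookie"], [class*="cookie-banner"]
```

Substring attribute selectors such as `[class*="feedback"]` are broad by design; a site whose real content uses "feedback" in a class name would want a narrower selector.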

This gives site owners a clear path: when chrome inflates their parity score, copy a recipe instead of needing the baseline updated for their phrasing.

Part 2: Decide on baseline shrinkage

Two consistent positions:

A. Keep only universal web infrastructure in baseline (recommended)

These are essentially impossible for a site to opt out of, and aren't really content:

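  • /^last updated/
  • /^previous\s+\S.*next\s+\S/ (pagination)
  • /^\/.+\/.+/ (breadcrumb paths, though this regex is fragile: it also matches any content segment that begins with a slash-delimited path, such as an API route)
  • /^loading video content/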
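A sketch of the option A baseline, using the hypothetical names `BASELINE_NOISE_PATTERNS` and `isBaselineNoise`, with the same lowercase-before-matching assumption as the current patterns. It also illustrates two ways the breadcrumb regex can misfire.

```typescript
// Hypothetical option A baseline: universal web infrastructure only.
const BREADCRUMB_PATTERN = /^\/.+\/.+/;
const BASELINE_NOISE_PATTERNS: readonly RegExp[] = [
  /^last updated/,
  /^previous\s+\S.*next\s+\S/, // pagination
  BREADCRUMB_PATTERN, // breadcrumb paths
  /^loading video content/,
];

// Assumption: segments are trimmed and lowercased before matching.
function isBaselineNoise(segment: string): boolean {
  const normalized = segment.trim().toLowerCase();
  return BASELINE_NOISE_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Feedback widgets are no longer filtered; sites exclude them via parityExclusions.
console.log(isBaselineNoise("Was this page helpful?")); // false

// Breadcrumb fragility:
console.log(BREADCRUMB_PATTERN.test("/docs/guides/install")); // true (intended)
console.log(BREADCRUMB_PATTERN.test("/api/v1/users returns a paginated list")); // true (real content dropped)
console.log(BREADCRUMB_PATTERN.test("docs > guides > install")); // false (breadcrumb missed)
```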

Make the rest the site's responsibility (their phrasing varies, so keeping them in the baseline biases it toward sites that use one specific wording):

  • /^was this page helpful/
  • /^thank you for your feedback/
  • /^join our .* server/
  • /^start from the beginning$/
  • /^for ai agents:/

Site owners would instead use the parityExclusions recipes from Part 1.

B. Leave baseline as-is, don't grow it

Keep all current entries (sites already passing depend on them), but adopt a policy: no new patterns. Future requests for additional phrasings are answered with "use parityExclusions."

This is less consistent but lower-disruption for sites currently passing under the baseline.
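Option B's "no new patterns" policy could be enforced mechanically rather than by review alone. A minimal sketch, assuming a guard test that pins each pattern's `source` string; `ACCEPTED_SOURCES` and `baselineIsFrozen` are hypothetical names.

```typescript
// Hypothetical option B guard: freeze the array so runtime code cannot push to it.
const NOISE_PATTERNS: readonly RegExp[] = Object.freeze([
  /^last updated/,
  /^was this page helpful/,
  /^thank you for your feedback/,
  /^previous\s+\S.*next\s+\S/, // pagination
  /^start from the beginning$/,
  /^join our .* server/,
  /^loading video content/,
  /^\/.+\/.+/, // breadcrumb paths
  /^for ai agents:/, // agent-directive banner
]);

// The accepted baseline, pinned as RegExp.source strings.
const ACCEPTED_SOURCES: readonly string[] = [
  "^last updated",
  "^was this page helpful",
  "^thank you for your feedback",
  "^previous\\s+\\S.*next\\s+\\S",
  "^start from the beginning$",
  "^join our .* server",
  "^loading video content",
  "^\\/.+\\/.+",
  "^for ai agents:",
];

// False if a pattern was added, removed, reordered, or edited.
function baselineIsFrozen(patterns: readonly RegExp[], accepted: readonly string[]): boolean {
  if (patterns.length !== accepted.length) return false;
  return patterns.every((pattern, i) => pattern.source === accepted[i]);
}

console.log(baselineIsFrozen(NOISE_PATTERNS, ACCEPTED_SOURCES)); // true
```

Comparing `source` strings rather than only the count means editing an existing pattern also trips the guard, so any change to the baseline becomes a deliberate, reviewed decision.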

Decision needed

  1. Is shrinking the baseline (option A) desirable, or is the migration risk too high (option B)?
  2. Either way, document parityExclusions recipes so the escape hatch is discoverable.

This issue is to make the decision and execute it.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests