Tighten fuzzy_match() to avoid accidental hyphen-stripped matches #80

@patrick-chinchill

Problem

fuzzy_match() in scripts/verify_test_fidelity.py counts word hits with a substring test (sum(1 for w in words if w in existing)). Python test names use _ as a separator, but the words come from lowercased/stripped TS titles, so a word like post matches any Python test name that contains the substring post (e.g. test_postable_object_...). Combined with the 60% threshold, this can produce surprising matches when TS test titles use hyphens or compound terms.
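To make the failure mode concrete, here is a minimal sketch that reuses only the expression quoted above. The helper name substring_hits, the sample TS title, and the sample Python test name are hypothetical, and treating the 60% threshold as "hits divided by word count" is an assumption about how the script applies it.

```python
def substring_hits(words, existing):
    # Same expression as quoted in this issue: a word counts as a hit
    # if it occurs anywhere inside the Python test name.
    return sum(1 for w in words if w in existing)


# Hypothetical TS title "post message to thread", lowercased and split into words.
words = ["post", "message", "to", "thread"]

# Hypothetical Python test name that covers something unrelated.
existing = "test_postable_object_stores_message"

hits = substring_hits(words, existing)
ratio = hits / len(words)  # assumption: threshold is applied to this ratio

# "post" hits inside "postable" and "to" hits inside "stores",
# so only "message" is a genuine whole-word hit.
print(hits, ratio)
```

Under these assumptions, two of the three hits are accidental, and they are enough to push the pair over a 60% threshold.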

Proposed work

  1. Switch from substring (w in existing) to whole-word matching (w in existing.split("_")).
  2. Add unit tests for the matcher covering (a) hyphen-stripped titles, (b) compound words, (c) the 60% threshold edge case.
  3. Re-run --strict to confirm no regressions after the tightening.
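Items 1 and 2 could look roughly like the sketch below. It is not the current code in scripts/verify_test_fidelity.py: the names whole_word_ratio, is_match, and THRESHOLD are hypothetical, the comparison is assumed to be "ratio >= 0.6", and the hyphen-stripped word lists are written out by hand because the real title normalization is not shown in this issue.

```python
import unittest

THRESHOLD = 0.6  # assumed to mirror the 60% threshold mentioned in this issue


def whole_word_ratio(words, existing):
    """Fraction of title words that equal a complete "_"-separated token."""
    if not words:
        return 0.0
    tokens = set(existing.split("_"))
    return sum(1 for w in words if w in tokens) / len(words)


def is_match(words, existing, threshold=THRESHOLD):
    # Assumption: a pair matches when the ratio reaches the threshold (>=).
    return whole_word_ratio(words, existing) >= threshold


class WholeWordMatcherTest(unittest.TestCase):
    def test_hyphen_stripped_title(self):
        # Assumed normalization: "re-post message" -> ["repost", "message"].
        self.assertFalse(is_match(["repost", "message"], "test_reposted_message_count"))
        self.assertTrue(is_match(["repost", "message"], "test_repost_message"))

    def test_compound_word(self):
        # "post" must not match the longer token "postable".
        self.assertFalse(is_match(["post"], "test_postable_object"))
        self.assertTrue(is_match(["postable", "object"], "test_postable_object"))

    def test_threshold_edge(self):
        words = ["send", "message", "to", "thread", "now"]
        # 3 of 5 words hit: exactly at the threshold.
        self.assertTrue(is_match(words, "test_send_message_to_channel"))
        # 2 of 5 words hit: below the threshold.
        self.assertFalse(is_match(words, "test_send_message_channel"))
```

If this lived in its own test module, it could be run with python -m unittest. Whether the real threshold check is inclusive is worth confirming against the script before copying the edge-case test.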

Why it matters

False-positive fuzzy matches silently hide real coverage gaps, which is exactly the kind of failure --strict is supposed to prevent. The current behavior holds up today (no known false positives at 588/588) but is brittle against future TS test additions.
