
feat: normalize Unicode in token system for consistent port matching#101

Merged
rororowyourboat merged 1 commit into dev from feat/96-unicode-token-normalization
Mar 4, 2026

@rororowyourboat
Collaborator

Summary

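  • Apply NFC normalization before lowercasing in tokenize() so that canonically equivalent Unicode representations (for example, precomposed "é" U+00E9 versus "e" followed by combining acute accent U+0301) produce identical tokens
  • 4 new tests: NFC/NFD equivalence, overlap across normalization forms, subset across normalization forms, and ASCII input left unchanged (no-op)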
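The PR page does not include the diff, so the following is only a minimal Python sketch of the idea in the summary, assuming a Python codebase and a tokenizer that returns a set of tokens. The delimiter regex and the `tokens_overlap` / `tokens_subset` helpers are hypothetical stand-ins for the "overlap" and "subset" checks the tests mention. Only `unicodedata.normalize` is a real standard-library API here.

```python
import re
import unicodedata

# Hypothetical delimiter set: the PR does not say how tokenize() splits names,
# so this sketch assumes runs of whitespace, underscores, hyphens and dots.
_SPLIT = re.compile(r"[\s_\-.]+")


def tokenize(name: str) -> frozenset[str]:
    """Split a port name into a set of normalized tokens (sketch).

    NFC normalization runs *before* lowercasing, so a precomposed "é"
    (U+00E9) and "e" + combining acute accent (U+0065 U+0301) produce
    the same token.
    """
    normalized = unicodedata.normalize("NFC", name).lower()
    return frozenset(t for t in _SPLIT.split(normalized) if t)


# Hypothetical helpers mirroring the overlap/subset checks named in the tests.
def tokens_overlap(a: str, b: str) -> bool:
    """True if the two names share at least one token."""
    return bool(tokenize(a) & tokenize(b))


def tokens_subset(a: str, b: str) -> bool:
    """True if every token of `a` also appears in `b`."""
    return tokenize(a) <= tokenize(b)


if __name__ == "__main__":
    composed = "Caf\u00e9 Temperature"      # NFC: single code point for "é"
    decomposed = "Cafe\u0301 Temperature"   # NFD: "e" + combining accent
    assert composed != decomposed                       # raw strings differ
    assert composed.lower() != decomposed.lower()       # lowercasing alone is not enough
    assert tokenize(composed) == tokenize(decomposed)   # NFC makes them match
    assert tokens_overlap("Cafe\u0301 Inlet", "Caf\u00e9 Outlet")
    assert tokens_subset("Cafe\u0301", composed)
    assert tokenize("Inlet_Temp") == frozenset({"inlet", "temp"})  # ASCII: no-op
```

Normalizing first matters because `str.lower()` operates on code points: without NFC, "café" written two ways lowercases to two different strings, so set intersection and subset checks between port names would silently fail.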

Closes #96

Apply NFC normalization before lowercasing in tokenize() so that
equivalent Unicode representations produce identical tokens.

Closes #96
@rororowyourboat rororowyourboat merged commit 199ba8e into dev Mar 4, 2026
@rororowyourboat rororowyourboat deleted the feat/96-unicode-token-normalization branch March 4, 2026 10:13

Development

enhancement: normalize Unicode in token system for port name matching