fix: add FLAG num support for agglutinative languages by tolgakaratas · Pull Request #1090 · vale-cli/vale

tolgakaratas · 2026-03-23T17:27:30Z

Summary

- The spell checker's affix parser used `rune` (a single character) as the map key for affix rules, which broke dictionaries that use the `FLAG num` format.
- Numeric flags such as `14308,10482,4720` were parsed by reading only the first digit, making most affix rules unreachable.
- This affected agglutinative languages (Turkish, Hungarian, Finnish, etc.) whose Hunspell dictionaries use `FLAG num` with tens of thousands of suffix groups.
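
To make the failure mode concrete, here is a minimal sketch of why a single-character key cannot represent a numeric flag. It is not Vale's actual code; `firstRune` is a hypothetical helper that only mimics "keep the first character of the flag".

```go
package main

import "fmt"

// firstRune returns the first character of a flag token, which is all a
// rune-keyed map can store. Hypothetical helper, for illustration only.
func firstRune(flag string) rune {
	rs := []rune(flag)
	if len(rs) == 0 {
		return 0
	}
	return rs[0]
}

func main() {
	// Three distinct numeric flags, taken from the example in the summary.
	for _, f := range []string{"14308", "10482", "4720"} {
		fmt.Printf("%s -> %q\n", f, firstRune(f))
	}
}
```

Here `14308` and `10482` both reduce to `'1'`, so two unrelated suffix groups would share one key, and a lookup for the full flag can never succeed.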

Changes

- Change `AffixMap` key from `rune` to `string` to support multi-character flags
- Change `compoundMap` key from `rune` to `string`
- Add `parseFlags()` method that correctly handles all Hunspell flag formats: ASCII, num, long, and UTF-8
- Update `expand()` to use parsed flag slices instead of rune iteration
- Update compound rule parsing in `gospell.go`

Impact

For the Turkish dictionary (tr_TR), this fix enables correct recognition of ~59,000 suffix groups and ~15.8 million inflected word forms that were previously unreachable. The tr_TR.aff file uses FLAG num with comma-separated numeric IDs.

Before: `hunspell -d tr_TR -l` accepts "belediyeye", "adaletli", and "ancak", but Vale flags them as unknown.
After: Vale correctly recognizes these words using the same dictionary files.
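
The effect on expansion can be sketched as follows. This is a simplified model, not the PR's `expand()`: `suffixRule` and `affixMap` are hypothetical types, and the flags `1001` and `1002` and their suffixes are invented for illustration rather than taken from `tr_TR.aff`.

```go
package main

import (
	"fmt"
	"strings"
)

// suffixRule is a simplified stand-in for a Hunspell SFX entry:
// strip Strip from the end of the stem, then append Add.
type suffixRule struct {
	Strip, Add string
}

// affixMap is keyed by the full flag string, so multi-digit flags work.
type affixMap map[string][]suffixRule

// expand returns the stem plus every form produced by the rules that the
// stem's flags point to. Conditions and prefixes are omitted for brevity.
func expand(stem string, flags []string, rules affixMap) []string {
	forms := []string{stem}
	for _, flag := range flags {
		for _, r := range rules[flag] {
			if !strings.HasSuffix(stem, r.Strip) {
				continue
			}
			forms = append(forms, strings.TrimSuffix(stem, r.Strip)+r.Add)
		}
	}
	return forms
}

func main() {
	// Invented rules: these flag numbers are NOT from the real dictionary.
	rules := affixMap{
		"1001": {{Add: "ye"}},
		"1002": {{Add: "li"}},
	}
	fmt.Println(expand("belediye", []string{"1001"}, rules)) // [belediye belediyeye]
	fmt.Println(expand("adalet", []string{"1002"}, rules))   // [adalet adaletli]
}
```

With a `rune` key, neither `"1001"` nor `"1002"` could be looked up as a whole, which is how forms like "belediyeye" ended up flagged as unknown.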

Test plan

- New unit tests for `parseFlags()` covering ASCII, num, long, and UTF-8 formats
- New unit test for `FLAG num` affix parsing and expansion
- New integration test: `newGoSpellReader` with a `FLAG num` dictionary
- Backward compatibility test: ASCII flag dictionaries still work correctly
- All 8 new tests pass

=== RUN   TestParseFlagsASCII
--- PASS: TestParseFlagsASCII (0.00s)
=== RUN   TestParseFlagsNum
--- PASS: TestParseFlagsNum (0.00s)
=== RUN   TestParseFlagsLong
--- PASS: TestParseFlagsLong (0.00s)
=== RUN   TestParseFlagsUTF8
--- PASS: TestParseFlagsUTF8 (0.00s)
=== RUN   TestFlagNumAffixParsing
--- PASS: TestFlagNumAffixParsing (0.00s)
=== RUN   TestFlagNumExpand
--- PASS: TestFlagNumExpand (0.00s)
=== RUN   TestFlagNumGoSpellReader
--- PASS: TestFlagNumGoSpellReader (0.00s)
=== RUN   TestASCIFlagBackwardCompatibility
--- PASS: TestASCIFlagBackwardCompatibility (0.00s)
PASS

The spell checker's affix parser used `rune` (single character) as the map key for affix rules. This broke dictionaries that use `FLAG num` format, where flags are comma-separated numbers (e.g., "14308,10482"). Only the first digit of each numeric flag was read, causing most affix rules to be unreachable. This affected all agglutinative languages (Turkish, Hungarian, Finnish, etc.) whose Hunspell dictionaries use `FLAG num` with tens of thousands of suffix groups.

Changes:

- Change AffixMap key from `rune` to `string`
- Change compoundMap key from `rune` to `string`
- Add parseFlags() method that handles ASCII, num, long, and UTF-8 formats
- Update expand() to use parsed flag slices instead of rune iteration
- Update compound rule parsing in gospell.go

For the Turkish dictionary (tr_TR), this enables correct recognition of ~59,000 suffix groups and ~15.8M inflected word forms that were previously unreachable.

tolgakaratas wants to merge 1 commit into vale-cli:v3 from Denomas:fix/flag-num-support
