fix: Korean keyboard unclickable + German ß — universal diacritic normalization by Hugo0 · Pull Request #155 · Hugo0/wordle

Hugo0 · 2026-03-14T19:22:51Z

Summary

Korean keyboard was completely non-functional (issue #153). Root cause: keyboard keys used Compatibility Jamo (U+3130 block) while the word list used Hangul Jamo (U+1100 block). The two look identical but are different Unicode codepoints, so every key press was silently rejected.
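A quick check with Python's standard `unicodedata` module makes the mismatch concrete. Escapes are used because the characters are visually indistinguishable. The check also hints at why plain NFKC normalization would not be enough on its own: NFKC folds the keyboard's ㄱ onto the initial-consonant form only, while the same key must also match the final-consonant form. That one-to-many relationship is something a diacritic_map entry can express.

```python
import unicodedata

KEYBOARD_KIYEOK = "\u3131"   # HANGUL LETTER KIYEOK (Compatibility Jamo, keyboard)
CHOSEONG_KIYEOK = "\u1100"   # HANGUL CHOSEONG KIYEOK (Hangul Jamo, initial)
JONGSEONG_KIYEOK = "\u11A8"  # HANGUL JONGSEONG KIYEOK (Hangul Jamo, final)

for ch in (KEYBOARD_KIYEOK, CHOSEONG_KIYEOK, JONGSEONG_KIYEOK):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Visually identical, but different codepoints: a naive membership test fails.
print(KEYBOARD_KIYEOK == CHOSEONG_KIYEOK)  # False

# NFKC maps the compatibility letter to the initial form only...
print(unicodedata.normalize("NFKC", KEYBOARD_KIYEOK) == CHOSEONG_KIYEOK)   # True
# ...so by itself it cannot match the same consonant in final position.
print(unicodedata.normalize("NFKC", KEYBOARD_KIYEOK) == JONGSEONG_KIYEOK)  # False
```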

Changes

1. Universal diacritic normalization fix (app.py)

After filtering acceptable_characters to chars used in words, also include diacritic base characters whose variants appear in the word list
Universal fix: any language where keyboard encoding ≠ word list encoding gets automatic support
Documents future applicability to Devanagari, Thai, Khmer
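A minimal sketch of the idea in plain Python. `acceptable_characters` and `diacritic_map` are names from this PR. The function name, its signature, and the assumption that `diacritic_map` maps one base character to a list of variants (as in the German `"s": ["ß"]` entry) are illustrative, not the actual app.py code.

```python
def expand_acceptable_characters(
    acceptable_characters: set[str],
    words: list[str],
    diacritic_map: dict[str, list[str]],
) -> set[str]:
    """Keep characters used in words, plus base characters whose variants are used.

    Hypothetical helper illustrating the app.py change; not the real code.
    """
    characters_used = {ch for word in words for ch in word}
    # Step 1 (existing behaviour): keep only characters that occur in some word.
    accepted = acceptable_characters & characters_used
    # Step 2 (the fix): a base key such as "s" or a Compatibility Jamo key is
    # accepted if any of its mapped variants occurs in the word list, even when
    # the base character itself never appears there.
    for base, variants in diacritic_map.items():
        if any(v in characters_used for v in variants):
            accepted.add(base)
    return accepted
```

For Korean, a keyboard containing only Compatibility Jamo would otherwise end up with an empty accepted set, because none of its keys occur literally in a Hangul Jamo word list.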

2. Korean diacritic_map (language_config.json)

Maps 50+ Compatibility Jamo (keyboard) ↔ Hangul Jamo (word list) equivalences
Reuses the existing diacritic normalization system — no new normalization code
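To illustrate how one keyboard key can stand for several word-list codepoints, here is a sketch of a char-by-char normalizer that inverts the map (variant to base). The three entries, with a consonant key mapping to both its initial and final Hangul Jamo forms, follow the ᄇ/ᆸ pairing mentioned later in this thread. They are a hypothetical excerpt: the real map has 50+ entries and its exact contents may differ.

```python
# Hypothetical excerpt: Compatibility Jamo key -> Hangul Jamo variants.
KO_DIACRITIC_MAP = {
    "\u3142": ["\u1107", "\u11B8"],  # ㅂ key -> initial ᄇ, final ᆸ
    "\u3131": ["\u1100", "\u11A8"],  # ㄱ key -> initial ᄀ, final ᆨ
    "\u314F": ["\u1161"],            # ㅏ key -> vowel ᅡ
}


def build_normalizer(diacritic_map: dict[str, list[str]]) -> dict[str, str]:
    """Invert base -> variants into variant -> base for constant-time lookups."""
    return {v: base for base, variants in diacritic_map.items() for v in variants}


def normalize(text: str, variant_to_base: dict[str, str]) -> str:
    """Replace each character by its base form; unmapped characters pass through."""
    return "".join(variant_to_base.get(ch, ch) for ch in text)


to_base = build_normalizer(KO_DIACRITIC_MAP)
word = "\u1107\u1161\u11B8"   # 밥 spelled with Hangul Jamo, as in the word list
typed = "\u3142\u314F\u3142"  # the same three keys pressed on the keyboard
print(normalize(word, to_base) == typed)  # True
```

The same ㅂ key matches both the initial ᄇ and the final ᆸ, which is why a single keyboard row can cover both positions.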

3. Korean keyboard expansion (ko_keyboard.json)

Default layout: 3→5 rows, adds double consonants + all compound vowels (98.6% word coverage)
"Full" layout for remaining compound jongseong characters

4. Korean compound jongseong blocklist (ko_blocklist.txt)

129 words with compound final consonants excluded from daily selection
Remain valid guesses via Full layout — 100% of daily words solvable on default keyboard
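The coverage figure and the blocklist both follow from one predicate: a word is typeable on a layout if every character, after mapping word-list codepoints back to keyboard keys, is on that layout. The sketch below assumes such a variant-to-base table already exists. The helper names and the two-word toy data are hypothetical. The 98.6% and 129-word numbers above come from the real word list, not from this code.

```python
def is_typeable(word: str, layout_keys: set[str], to_base: dict[str, str]) -> bool:
    """True if every character of `word` maps onto a key of the layout."""
    return all(to_base.get(ch, ch) in layout_keys for ch in word)


def split_by_layout(words, layout_keys, to_base):
    """Partition words into (typeable, blocklist candidates) for one layout."""
    typeable, untypeable = [], []
    for w in words:
        (typeable if is_typeable(w, layout_keys, to_base) else untypeable).append(w)
    return typeable, untypeable


def coverage(words, layout_keys, to_base) -> float:
    """Fraction of words fully typeable on the layout (1.0 for an empty list)."""
    typeable, _ = split_by_layout(words, layout_keys, to_base)
    return len(typeable) / len(words) if words else 1.0


# Toy data: Hangul Jamo (word list) -> Compatibility Jamo (keyboard key).
TO_BASE = {"\u1103": "\u3137", "\u1161": "\u314F", "\u11AF": "\u3139", "\u11B0": "\u313A"}
DEFAULT_KEYS = {"\u3137", "\u314F", "\u3139"}          # ㄷ ㅏ ㄹ, but no ㄺ key
WORDS = ["\u1103\u1161\u11AF", "\u1103\u1161\u11B0"]  # 달, 닭 as jamo sequences

ok, blocked = split_by_layout(WORDS, DEFAULT_KEYS, TO_BASE)
print(len(ok), len(blocked), coverage(WORDS, DEFAULT_KEYS, TO_BASE))  # 1 1 0.5
```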

5. Physical keyboard support (game.ts, physical_key_map)

Maps physical key codes (event.code) to jamo, bypassing OS IME composition
Standard Korean 2-set (Dubeolsik) layout: Q→ㅂ, W→ㅈ, Shift+Q→ㅃ, etc.
Universal config — any language needing IME bypass can add a physical_key_map
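The lookup itself is small. The sketch below is in Python to match the other examples here (the real logic lives in game.ts). The `(event.code, shift)` keyed table is an assumed shape for physical_key_map, not necessarily the config's actual format. Only a handful of Dubeolsik keys are listed.

```python
from typing import Optional

# Assumed shape: (KeyboardEvent.code, shift held) -> Compatibility Jamo.
PHYSICAL_KEY_MAP = {
    ("KeyQ", False): "\u3142",  # ㅂ
    ("KeyW", False): "\u3148",  # ㅈ
    ("KeyE", False): "\u3137",  # ㄷ
    ("KeyR", False): "\u3131",  # ㄱ
    ("KeyT", False): "\u3145",  # ㅅ
    ("KeyO", False): "\u3150",  # ㅐ
    ("KeyP", False): "\u3154",  # ㅔ
    ("KeyK", False): "\u314F",  # ㅏ
    ("KeyQ", True): "\u3143",   # ㅃ
    ("KeyW", True): "\u3149",   # ㅉ
    ("KeyE", True): "\u3138",   # ㄸ
    ("KeyR", True): "\u3132",   # ㄲ
    ("KeyT", True): "\u3146",   # ㅆ
    ("KeyO", True): "\u3152",   # ㅒ
    ("KeyP", True): "\u3156",   # ㅖ
}


def key_for_event(code: str, shift: bool) -> Optional[str]:
    """Translate a physical key press into a jamo, ignoring IME composition.

    Shift selects a dedicated entry when one exists; otherwise the unshifted
    jamo is used. Unknown keys return None.
    """
    if shift and (code, True) in PHYSICAL_KEY_MAP:
        return PHYSICAL_KEY_MAP[(code, True)]
    return PHYSICAL_KEY_MAP.get((code, False))
```

Keying on `event.code` rather than `event.key` is what sidesteps the IME: `code` names the physical key regardless of input method, whereas `key` reflects whatever the IME is composing.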

6. German ß fix (de/language_config.json)

Fixed broken "ss": ["ß"] (multi-char, ignored by normalizer) → "s": ["ß"]
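Why the old entry never fired: a char-by-char matcher looks up the single character the player typed, so a two-character key like "ss" can never be hit. The sketch below is a hypothetical forward-lookup matcher, not the project's normalizer, contrasting the broken and fixed configs on the 5-letter word "fleiß".

```python
def char_matches(typed: str, target: str, diacritic_map: dict[str, list[str]]) -> bool:
    """One typed character matches a word character exactly or via the map."""
    return typed == target or target in diacritic_map.get(typed, [])


def word_matches(typed: str, word: str, diacritic_map: dict[str, list[str]]) -> bool:
    """Position-by-position comparison of a typed guess against a word."""
    return len(typed) == len(word) and all(
        char_matches(t, w, diacritic_map) for t, w in zip(typed, word)
    )


BROKEN = {"ss": ["ß"]}  # the lookup key is always one typed char, never "ss"
FIXED = {"s": ["ß"]}
print(word_matches("fleis", "fleiß", BROKEN))  # False
print(word_matches("fleis", "fleiß", FIXED))   # True
```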

7. Test improvements

load_all_keyboard_chars(): coverage tests check ALL layouts, not just default
Removed ALL keyboard coverage xfails — 2033 tests pass (4 former xfails now pass)

Future work (documented in code)

Option C: Decompose compound jongseong → individual jamo, 6-cell grid (like kordle.kr)
Long-press popups: Universal key expansion UI for compound chars
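Option C's first step could be a table splitting each of the 11 modern compound finals into two simple finals, so a word such as 닭 becomes four jamo instead of three (hence a wider grid). This is only a sketch of the decomposition, assuming words are handled as Hangul Jamo sequences. Grid sizing and tile scoring are not covered, and nothing here is taken from the Option C notes in the code.

```python
import unicodedata

# Compound jongseong (Hangul Jamo, final position) -> its two simple finals.
COMPOUND_FINALS = {
    "\u11AA": "\u11A8\u11BA",  # ᆪ -> ᆨ ᆺ
    "\u11AC": "\u11AB\u11BD",  # ᆬ -> ᆫ ᆽ
    "\u11AD": "\u11AB\u11C2",  # ᆭ -> ᆫ ᇂ
    "\u11B0": "\u11AF\u11A8",  # ᆰ -> ᆯ ᆨ
    "\u11B1": "\u11AF\u11B7",  # ᆱ -> ᆯ ᆷ
    "\u11B2": "\u11AF\u11B8",  # ᆲ -> ᆯ ᆸ
    "\u11B3": "\u11AF\u11BA",  # ᆳ -> ᆯ ᆺ
    "\u11B4": "\u11AF\u11C0",  # ᆴ -> ᆯ ᇀ
    "\u11B5": "\u11AF\u11C1",  # ᆵ -> ᆯ ᇁ
    "\u11B6": "\u11AF\u11C2",  # ᆶ -> ᆯ ᇂ
    "\u11B9": "\u11B8\u11BA",  # ᆹ -> ᆸ ᆺ
}


def decompose_compound_finals(word: str) -> str:
    """Split compound finals so each grid cell holds one simple jamo.

    NFD first breaks precomposed syllables into conjoining jamo; input that
    is already a jamo sequence passes through NFD unchanged.
    """
    jamo = unicodedata.normalize("NFD", word)
    return "".join(COMPOUND_FINALS.get(ch, ch) for ch in jamo)


print([f"U+{ord(c):04X}" for c in decompose_compound_finals("\uB2ED")])  # 닭
# ['U+1103', 'U+1161', 'U+11AF', 'U+11A8']
```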

Test plan

uv run pytest tests/ — 2033 passed, 0 failed
pnpm test — 81 passed, 0 failed
uv run ruff check + pnpm format — clean
Manual: load /ko, verify on-screen keyboard works
Manual: load /ko, verify physical keyboard types jamo (bypasses IME)
Manual: load /de, verify typing s matches words with ß
Manual: verify other languages unaffected

Fixes #153

The Korean keyboard used Compatibility Jamo (U+3130 block) while the word list used Hangul Jamo (U+1100 block). These are visually identical but different Unicode codepoints, so keyboard input was silently rejected. Fix uses the existing diacritic_map system to normalize between the two encodings — the same universal mechanism used for accent normalization in European languages. Also adds compound jongseong and compound vowel keys to an extended keyboard layout so all Korean words are playable. Backend change (app.py) is universal: any language with a diacritic_map now automatically includes base characters in the accepted set, even when only variant forms appear in the word list. Fixes #153

coderabbitai · 2026-03-14T19:22:59Z

Warning

Rate limit exceeded

@Hugo0 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 55 seconds before requesting another review.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f67d9fe-2b98-450a-8259-1153d83cc8c5

📥 Commits

Reviewing files that changed from the base of the PR and between 4ddd907 and 6f3f7aa.

📒 Files selected for processing (4)

frontend/src/game.ts
frontend/src/types/index.ts
webapp/app.py
webapp/data/languages/ko/language_config.json

📝 Walkthrough


The PR refactors keyboard character handling and enhances Korean language support. It introduces a new utility function to aggregate typeable characters across all keyboard layouts, updates test coverage validation to use this function, modifies the Language class to normalize diacritical variants, replaces the Korean "double" keyboard layout with an expanded "full" layout, and adds comprehensive diacritical mappings for Korean.

Changes

Cohort / File(s)	Summary
Keyboard Character Aggregation `tests/conftest.py`	New `load_all_keyboard_chars()` function that collects typeable characters from all keyboard layouts for a language, supporting both multi-layout and legacy list formats, filtering out control keys.
Test Coverage Updates `tests/test_language_config.py`, `tests/test_word_lists.py`	Import and use `load_all_keyboard_chars()` for keyboard coverage validation; replace hard-coded exception sets with empty sets.
Language Initialization & Diacritical Mapping `webapp/app.py`	Convert `characters_used` from list to set for efficiency; add diacritic normalization logic to include base characters when their variants appear in word lists.
German Diacritic Configuration `webapp/data/languages/de/language_config.json`	Update diacritic_map key from "ss" to "s" for sharp S (ß) mapping.
Korean Keyboard Layout Enhancement `webapp/data/languages/ko/ko_keyboard.json`	Replace "korean_double" layout with expanded "korean_full" layout containing additional consonant clusters and extended character groupings.
Korean Language Configuration `webapp/data/languages/ko/language_config.json`	Add comprehensive diacritic_map for Korean, mapping base characters and syllables to variant forms for normalization.
Korean Word Blocklist `webapp/data/languages/ko/ko_blocklist.txt`	New blocklist containing 139 Korean syllables with compound jongseong (final consonant clusters) excluded from default keyboard but available in full layout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Multiple Keyboard Layouts #105: Both PRs modify keyboard handling logic and the Language class initialization behavior for character set management.

Poem

🐰 A keyboard grows, from double to full,
Korean syllables, now beautiful!
Diacritics dance, base forms unite,
Characters aggregate—test coverage bright! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main changes: fixing the Korean keyboard unclickability issue and addressing German ß via universal diacritic normalization, matching the PR's core objectives.
Linked Issues check	✅ Passed	The PR successfully addresses issue `#153` by implementing diacritic mapping for Korean Jamo characters, adding an extended keyboard layout, and ensuring keyboard characters are properly recognized in the allowed character set.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with fixing the Korean keyboard issue and implementing universal diacritic normalization; no out-of-scope changes detected.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


- Korean default keyboard now 5 rows with double consonants + all compound vowels (98.6% word coverage, up from 77%)
- Compound jongseong available on "Full" layout (remaining 1.4%)
- German: fix broken multi-char diacritic "ss"→"ß" to single-char "s"→"ß" which the char-by-char normalizer can actually process
- Remove all keyboard coverage xfails (both ko and de now pass)

Words with compound final consonants (ㄺ, ㄻ, ㄼ, etc.) are not typeable on the default keyboard. They remain valid guesses (via the Full layout) but are excluded from daily word selection so every daily word is always solvable on the default keyboard. Also adds documentation for future Option C: decomposing compound jongseong into individual jamo with 6-cell grid (like kordle.kr), which would eliminate compound keys entirely. Notes other languages (Devanagari, Thai, Khmer) that may need similar treatment.

Maps physical key codes (event.code) to jamo characters, bypassing the OS IME which would otherwise compose syllable blocks. Uses the standard Korean 2-set (Dubeolsik) layout: Q→ㅂ, W→ㅈ, E→ㄷ, etc. with Shift for double consonants (Shift+Q→ㅃ, Shift+T→ㅆ, etc.). The physical_key_map config is universal — any language needing IME bypass can add one to their language_config.json.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_word_lists.py (1)
128-152: ⚠️ Potential issue | 🟠 Major

Add a default-layout coverage check for daily candidates.

Unioning all layouts is fine for “guessable somewhere”, but it no longer protects the Korean contract introduced here: daily-selectable words must still be typeable on the default korean_2set layout. A word missed in ko_blocklist.txt will now pass CI via korean_full and still ship as an unsolvable daily. Please keep this all-layout assertion for guessability, but add a second check over the daily candidate pool (load_daily_words(lang) or load_word_list(lang) - load_blocklist(lang)) against the default layout only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_word_lists.py` around lines 128 - 152, The current test unions
characters from all layouts via load_all_keyboard_chars(lang), which misses the
requirement that daily-selectable words must be typeable on the default layout;
add a second assertion that computes the daily candidate set (use
load_daily_words(lang) or compute load_word_list(lang) minus
load_blocklist(lang)) and verify every character in those daily words appears on
the default keyboard layout (load_keyboard(lang, layout="korean_2set") or the
project’s equivalent API); keep the existing all-layout check for guessability
and add this default-layout check (reference
test_keyboard_covers_all_word_characters, KEYBOARD_COVERAGE_XFAIL,
load_word_list, load_daily_words, load_blocklist, load_keyboard,
load_all_keyboard_chars).
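One way to express the requested default-layout check without depending on the project's loaders is a pure function over the word list, blocklist, default-layout keys, and a variant-to-base table. A pytest test could then feed it from whatever `load_*` helpers the repo actually has. The function name, argument shapes, and toy data below are hypothetical.

```python
def untypeable_daily_words(words, blocklist, default_keys, to_base):
    """Daily candidates (words minus blocklist) that the default layout cannot type."""
    daily = set(words) - set(blocklist)
    return sorted(
        w for w in daily
        if any(to_base.get(ch, ch) not in default_keys for ch in w)
    )


def test_daily_words_typeable_on_default_layout():
    # Toy stand-ins for the real loaders: 닭 needs the compound final ㄺ,
    # which only the full layout offers, so it must be blocklisted.
    to_base = {"\u1103": "\u3137", "\u1161": "\u314F", "\u11AF": "\u3139", "\u11B0": "\u313A"}
    default_keys = {"\u3137", "\u314F", "\u3139"}
    words = ["\u1103\u1161\u11AF", "\u1103\u1161\u11B0"]
    blocklist = ["\u1103\u1161\u11B0"]
    assert untypeable_daily_words(words, blocklist, default_keys, to_base) == []
```

If 닭 were missing from the blocklist, the function would return it and the test would fail. The all-layout union cannot catch that case.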

🧹 Nitpick comments (2)

webapp/data/languages/ko/language_config.json (1)

56-63: Add the missing ㅒ → ᅤ normalization entry.

ko_keyboard.json now exposes ㅒ on both layouts, but this map jumps straight from ㅑ to ㅓ. Adding the Hangul Jamo variant keeps the keyboard/config pair symmetric and avoids another Compatibility Jamo mismatch if ᅤ ever lands in the guessable lists.
Possible change
         "ㅑ": ["ᅣ"],
+        "ㅒ": ["ᅤ"],
         "ㅓ": ["ᅥ"],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webapp/data/languages/ko/language_config.json` around lines 56 - 63, The JSON
vowel-to-Jamo map is missing the ㅒ→ᅤ normalization; add an entry mapping the
Compatibility Jamo key "ㅒ" to the Hangul Jamo array ["ᅤ"] (insert it between
the existing "ㅑ": ["ᅣ"] and "ㅓ": ["ᅥ"] entries) so the language_config
normalization matches the ko_keyboard layouts and avoids Compatibility Jamo
mismatches.
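The ㅒ gap suggests a cheap guard: every keyboard key should either occur in the word list directly or appear as a base in diacritic_map. A sketch with a hypothetical function name and toy data:

```python
def keys_without_mapping(keyboard_keys, word_chars, diacritic_map):
    """Keyboard keys that neither occur in the word list nor have a map entry.

    Such a key can never match any word character, which is how the
    missing ㅒ entry would show up.
    """
    return sorted(
        k for k in keyboard_keys
        if k not in word_chars and k not in diacritic_map
    )


keyboard = {"\u3151", "\u3152", "\u3153"}      # ㅑ ㅒ ㅓ (Compatibility Jamo)
word_chars = {"\u1163", "\u1164", "\u1165"}    # ᅣ ᅤ ᅥ (Hangul Jamo)
ko_map = {"\u3151": ["\u1163"], "\u3153": ["\u1165"]}  # ㅒ entry missing
print(keys_without_mapping(keyboard, word_chars, ko_map) == ["\u3152"])  # True
```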

tests/conftest.py (1)

159-181: Keep this helper in lockstep with the runtime keyboard parser.

webapp/app.py still accepts the legacy top-level dict keyboard shape, but this helper only handles {"layouts": ...} or raw lists. If a keyboard JSON uses the legacy form, the tests will under-report typeable chars compared to production.

Possible alignment

-    if isinstance(data, dict) and "layouts" in data:
-        for layout in data["layouts"].values():
-            for row in layout.get("rows", []):
-                chars.update(k for k in row if k not in control_keys)
+    if isinstance(data, dict):
+        layouts = data.get("layouts")
+        if not isinstance(layouts, dict):
+            layouts = {k: v for k, v in data.items() if k != "default"}
+        for layout in layouts.values():
+            rows = layout.get("rows", []) if isinstance(layout, dict) else layout
+            for row in rows:
+                chars.update(k for k in row if k not in control_keys)
     elif isinstance(data, list):
         for row in data:
             chars.update(k for k in row if k not in control_keys)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/conftest.py` around lines 159 - 181, The helper load_all_keyboard_chars
currently only handles data shaped as {"layouts": ...} or a raw list and thus
misses legacy top-level dict keyboard JSONs; update it to detect when data is a
dict but lacks "layouts" and treat the dict's values as rows (or lists of rows)
similarly to the other branches, iterating over those values and filtering out
control_keys (use the existing control_keys set and LANGUAGES_DIR reference) so
legacy keyboard files produce the same char set as the runtime parser.
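For reference, here is a self-contained variant of the aligned helper that works on already-parsed JSON, so it can be unit-tested without files. It handles the three shapes discussed in the suggested diff above: `{"layouts": ...}`, a legacy top-level dict of layouts with an optional "default" pointer, and a bare list of rows. The function name and the control-key names are assumptions, since the actual control_keys set in conftest.py isn't shown here.

```python
from typing import Any

CONTROL_KEYS = {"ENTER", "BACKSPACE"}  # assumed; reuse conftest's control_keys


def keyboard_chars_from_data(data: Any, control_keys: set[str] = CONTROL_KEYS) -> set[str]:
    """Collect typeable keys from every layout in a parsed keyboard JSON."""
    chars: set[str] = set()
    if isinstance(data, dict):
        layouts = data.get("layouts")
        if not isinstance(layouts, dict):
            # Legacy shape: layouts at top level, "default" names the default one.
            layouts = {k: v for k, v in data.items() if k != "default"}
        for layout in layouts.values():
            rows = layout.get("rows", []) if isinstance(layout, dict) else layout
            for row in rows:
                chars.update(k for k in row if k not in control_keys)
    elif isinstance(data, list):
        for row in data:
            chars.update(k for k in row if k not in control_keys)
    return chars
```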


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e4d480f-fdc2-4a21-9735-f8dc8bc01da9

📥 Commits

Reviewing files that changed from the base of the PR and between 01e0882 and 4ddd907.

📒 Files selected for processing (8)

tests/conftest.py
tests/test_language_config.py
tests/test_word_lists.py
webapp/app.py
webapp/data/languages/de/language_config.json
webapp/data/languages/ko/ko_blocklist.txt
webapp/data/languages/ko/ko_keyboard.json
webapp/data/languages/ko/language_config.json

Korean diacritic_map is for internal Unicode normalization (Compatibility Jamo ↔ Hangul Jamo), not player-visible accent variants. The sub-key hints (ᄇ/ᆸ) are meaningless to players and add visual noise. Adds hide_diacritic_hints config flag — keeps hints working for European languages (German ä/ö/ü/ß, etc.) where they're genuinely useful.
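A sketch of how such a flag might gate the hints. `hide_diacritic_hints` is the flag named in the commit. The function name and the config dict shape are assumptions.

```python
def sub_key_hints(key: str, config: dict) -> list[str]:
    """Variant hints to draw under an on-screen key, or none if hidden."""
    if config.get("hide_diacritic_hints", False):
        # e.g. Korean: the variants are internal encodings, not letters.
        return []
    return list(config.get("diacritic_map", {}).get(key, []))
```

German keeps its hints by leaving the flag unset, while the Korean config can set it to true without touching its diacritic_map.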

Same fix as German ß (PR #155): "oe"→["œ"] and "ae"→["æ"] were multi-char keys that the char-by-char normalizer silently ignored. Changed to single-char mappings: "o"→["ô","œ"] and "a"→["à","â","æ"]. Players can now type 'o' to match œ and 'a' to match æ in French words.



Hugo0 added 2 commits March 14, 2026 19:56

Hugo0 changed the title ~~fix: Korean keyboard unclickable due to Hangul Jamo Unicode mismatch~~ fix: Korean keyboard unclickable + German ß — universal diacritic normalization Mar 14, 2026

coderabbitai Bot reviewed Mar 14, 2026


Hugo0 mentioned this pull request Mar 14, 2026

Scalable sub-component tile coloring for composition-based scripts (Korean, Tamil, Hindi, Chinese) #157

Closed

Hugo0 merged commit ead824c into main Mar 14, 2026
4 checks passed

Hugo0 mentioned this pull request Mar 14, 2026

fix: French œ/æ diacritic mapping #160

Merged


Hugo0 added a commit that referenced this pull request Mar 14, 2026

fix: remove 19 blocklisted compound jongseong words from Korean daily words (6c51c61)

These words contain compound final consonants not typeable on the default keyboard, and were added to ko_blocklist.txt in PR #155.

Hugo0 mentioned this pull request Mar 14, 2026

feat: add 13 new languages (2B+ speakers), unified pipeline, word quality infrastructure #149

Merged


coderabbitai Bot mentioned this pull request Mar 16, 2026

fix: remove diacritic maps for languages with distinct alphabet letters #176

Merged


Hugo0 deleted the worktree-fix-korean-keyboard branch April 15, 2026 17:56

Hugo0 merged 5 commits into main from worktree-fix-korean-keyboard
