fix: Korean keyboard unclickable + German ß — universal diacritic normalization#155

Merged: Hugo0 merged 5 commits into main from worktree-fix-korean-keyboard on Mar 14, 2026

@Hugo0 Hugo0 commented Mar 14, 2026

Summary

The Korean keyboard was completely non-functional (issue #153). Root cause: keyboard keys used Hangul Compatibility Jamo (U+3130 block) while the word list used Hangul Jamo (U+1100 block). The two look identical but are different Unicode code points, so every key press was silently rejected.
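To make the mismatch concrete, here is a minimal Python check using only the standard library. The choice of ㅂ as the sample key is an illustration, not taken from the PR's code.

```python
import unicodedata

keyboard_key = "\u3142"   # ㅂ as sent by the on-screen keyboard (Compatibility Jamo)
wordlist_char = "\u1107"  # ᄇ as stored in the word list (Hangul Jamo, initial consonant)

# Show the code point and official Unicode name of each character.
for ch in (keyboard_key, wordlist_char):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The glyphs render alike, but a plain string comparison fails.
print(keyboard_key == wordlist_char)  # False
```

Any membership test of the keyboard character against the characters found in the word list fails for the same reason.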

Changes

1. Universal diacritic normalization fix (app.py)

  • After filtering acceptable_characters to chars used in words, also include diacritic base characters whose variants appear in the word list
  • Universal fix: any language where keyboard encoding ≠ word list encoding gets automatic support
  • Documents future applicability to Devanagari, Thai, Khmer
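
The filtering change described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in app.py: the function name `accepted_characters` and its parameters are hypothetical, and the real implementation may differ in detail.

```python
def accepted_characters(keyboard_chars, words, diacritic_map):
    """Return the keyboard characters the game should accept (hypothetical helper)."""
    characters_used = {ch for word in words for ch in word}
    # Previous behaviour: keep only keys that literally occur in the word list.
    accepted = {ch for ch in keyboard_chars if ch in characters_used}
    # Fix: also keep a base key when any of its variants occurs in the word list.
    for base, variants in diacritic_map.items():
        if base in keyboard_chars and any(v in characters_used for v in variants):
            accepted.add(base)
    return accepted


keyboard = {"\u3142", "\u314f", "\u314b"}  # ㅂ ㅏ ㅋ (Compatibility Jamo keys)
words = ["\u1107\u1161"]                   # one word stored as Hangul Jamo
ko_map = {"\u3142": ["\u1107", "\u11b8"], "\u314f": ["\u1161"]}

without_map = accepted_characters(keyboard, words, {})   # nothing survives the filter
with_map = accepted_characters(keyboard, words, ko_map)  # ㅂ and ㅏ are accepted
```

With an empty map every key is filtered out, which matches the "silently rejected" symptom. With the map, the keys whose variants occur in the word list survive.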

2. Korean diacritic_map (language_config.json)

  • Maps 50+ Compatibility Jamo (keyboard) ↔ Hangul Jamo (word list) equivalences
  • Reuses the existing diacritic normalization system — no new normalization code

3. Korean keyboard expansion (ko_keyboard.json)

  • Default layout: 3→5 rows, adds double consonants + all compound vowels (98.6% word coverage)
  • "Full" layout for remaining compound jongseong characters

4. Korean compound jongseong blocklist (ko_blocklist.txt)

  • 129 words with compound final consonants excluded from daily selection
  • Remain valid guesses via Full layout — 100% of daily words solvable on default keyboard
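
One way such a blocklist could be derived and applied is sketched below. It assumes the word list stores decomposed Hangul Jamo, as the summary suggests. The helper names are hypothetical and the PR does not say how ko_blocklist.txt was actually generated.

```python
# Compound final consonants in the Hangul Jamo block (for example U+11B0 is ᆰ).
COMPOUND_JONGSEONG = {chr(cp) for cp in (
    0x11AA, 0x11AC, 0x11AD, 0x11B0, 0x11B1, 0x11B2,
    0x11B3, 0x11B4, 0x11B5, 0x11B6, 0x11B9,
)}


def has_compound_jongseong(word):
    """True if any character of the word is a compound final consonant."""
    return any(ch in COMPOUND_JONGSEONG for ch in word)


def daily_candidates(words, blocklist):
    """Words eligible for daily selection: everything not on the blocklist."""
    blocked = set(blocklist)
    return [w for w in words if w not in blocked]


# Two decomposed syllables: 각 (simple final ㄱ) and 닭 (compound final ㄺ).
words = ["\u1100\u1161\u11a8", "\u1103\u1161\u11b0"]
blocklist = [w for w in words if has_compound_jongseong(w)]
candidates = daily_candidates(words, blocklist)
```

Blocked words stay in `words`, so they remain valid guesses. Only the daily pool shrinks.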

5. Physical keyboard support (game.ts, physical_key_map)

  • Maps physical key codes (event.code) to jamo, bypassing OS IME composition
  • Standard Korean 2-set (Dubeolsik) layout: Q→ㅂ, W→ㅈ, Shift+Q→ㅃ, etc.
  • Universal config — any language needing IME bypass can add a physical_key_map

6. German ß fix (de/language_config.json)

  • Fixed broken "ss": ["ß"] (multi-char, ignored by normalizer) → "s": ["ß"]
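
Why a multi-character key is ignored can be seen in a small sketch of a per-character comparison. `chars_match` is a hypothetical stand-in for the project's normalizer, whose exact code is not shown in this PR.

```python
def chars_match(typed, target, diacritic_map):
    """True if one typed character should match one target character (hypothetical)."""
    if typed == target:
        return True
    # The lookup key is always the single typed character.
    return target in diacritic_map.get(typed, [])


broken = {"ss": ["ß"]}  # key is two characters long, so a single "s" never finds it
fixed = {"s": ["ß"]}

print(chars_match("s", "ß", broken))  # False
print(chars_match("s", "ß", fixed))   # True
```

Because the lookup is done one character at a time, an entry keyed by "ss" can never be reached.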

7. Test improvements

  • load_all_keyboard_chars(): coverage tests check ALL layouts, not just default
  • Removed ALL keyboard coverage xfails — 2033 tests pass (4 former xfails now pass)
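
A rough sketch of collecting characters from every layout follows. It mirrors the two keyboard shapes mentioned in the review (a `layouts` dict and a legacy list of rows). The function name, the control-key labels, and the sample data are assumptions for illustration.

```python
CONTROL_KEYS = {"ENTER", "BACKSPACE"}  # hypothetical labels for non-character keys


def collect_keyboard_chars(data, control_keys=CONTROL_KEYS):
    """Union of typeable characters across all layouts of one keyboard file."""
    chars = set()
    if isinstance(data, dict) and "layouts" in data:
        # Multi-layout format: {"layouts": {name: {"rows": [[...], ...]}}}
        for layout in data["layouts"].values():
            for row in layout.get("rows", []):
                chars.update(k for k in row if k not in control_keys)
    elif isinstance(data, list):
        # Legacy format: a bare list of rows.
        for row in data:
            chars.update(k for k in row if k not in control_keys)
    return chars


multi = {"layouts": {
    "korean_2set": {"rows": [["ㅂ", "ㅈ"], ["ENTER", "ㅋ", "BACKSPACE"]]},
    "korean_full": {"rows": [["ㄺ"]]},
}}
legacy = [["a", "b"], ["ENTER", "c"]]

all_chars = collect_keyboard_chars(multi)
legacy_chars = collect_keyboard_chars(legacy)
```

Taking the union answers "is this character typeable on some layout", which is a weaker guarantee than "typeable on the default layout".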

Future work (documented in code)

  • Option C: Decompose compound jongseong → individual jamo, 6-cell grid (like kordle.kr)
  • Long-press popups: Universal key expansion UI for compound chars

Test plan

  • uv run pytest tests/ — 2033 passed, 0 failed
  • pnpm test — 81 passed, 0 failed
  • uv run ruff check + pnpm format — clean
  • Manual: load /ko, verify on-screen keyboard works
  • Manual: load /ko, verify physical keyboard types jamo (bypasses IME)
  • Manual: load /de, verify typing s matches words with ß
  • Manual: verify other languages unaffected

Fixes #153

The Korean keyboard used Compatibility Jamo (U+3130 block) while the
word list used Hangul Jamo (U+1100 block). These are visually identical
but different Unicode codepoints, so keyboard input was silently rejected.

Fix uses the existing diacritic_map system to normalize between the two
encodings — the same universal mechanism used for accent normalization
in European languages. Also adds compound jongseong and compound vowel
keys to an extended keyboard layout so all Korean words are playable.

Backend change (app.py) is universal: any language with a diacritic_map
now automatically includes base characters in the accepted set, even
when only variant forms appear in the word list.

Fixes #153

coderabbitai Bot commented Mar 14, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f67d9fe-2b98-450a-8259-1153d83cc8c5

📥 Commits

Reviewing files that changed from the base of the PR and between 4ddd907 and 6f3f7aa.

📒 Files selected for processing (4)
  • frontend/src/game.ts
  • frontend/src/types/index.ts
  • webapp/app.py
  • webapp/data/languages/ko/language_config.json
📝 Walkthrough

The PR refactors keyboard character handling and enhances Korean language support. It introduces a new utility function to aggregate typeable characters across all keyboard layouts, updates test coverage validation to use this function, modifies the Language class to normalize diacritical variants, replaces the Korean "double" keyboard layout with an expanded "full" layout, and adds comprehensive diacritical mappings for Korean.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Keyboard Character Aggregation: tests/conftest.py | New load_all_keyboard_chars() function that collects typeable characters from all keyboard layouts for a language, supporting both multi-layout and legacy list formats, filtering out control keys. |
| Test Coverage Updates: tests/test_language_config.py, tests/test_word_lists.py | Import and use load_all_keyboard_chars() for keyboard coverage validation; replace hard-coded exception sets with empty sets. |
| Language Initialization & Diacritical Mapping: webapp/app.py | Convert characters_used from list to set for efficiency; add diacritic normalization logic to include base characters when their variants appear in word lists. |
| German Diacritic Configuration: webapp/data/languages/de/language_config.json | Update diacritic_map key from "ss" to "s" for sharp S (ß) mapping. |
| Korean Keyboard Layout Enhancement: webapp/data/languages/ko/ko_keyboard.json | Replace "korean_double" layout with expanded "korean_full" layout containing additional consonant clusters and extended character groupings. |
| Korean Language Configuration: webapp/data/languages/ko/language_config.json | Add comprehensive diacritic_map for Korean, mapping base characters and syllables to variant forms for normalization. |
| Korean Word Blocklist: webapp/data/languages/ko/ko_blocklist.txt | New blocklist containing 139 Korean syllables with compound jongseong (final consonant clusters) excluded from default keyboard but available in full layout. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A keyboard grows, from double to full,
Korean syllables, now beautiful!
Diacritics dance, base forms unite,
Characters aggregate—test coverage bright! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately reflects the main changes: fixing the Korean keyboard unclickability issue and addressing German ß via universal diacritic normalization, matching the PR's core objectives. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #153 by implementing diacritic mapping for Korean Jamo characters, adding an extended keyboard layout, and ensuring keyboard characters are properly recognized in the allowed character set. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with fixing the Korean keyboard issue and implementing universal diacritic normalization; no out-of-scope changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |

Hugo0 added 2 commits March 14, 2026 19:56
- Korean default keyboard now 5 rows with double consonants + all
  compound vowels (98.6% word coverage, up from 77%)
- Compound jongseong available on "Full" layout (remaining 1.4%)
- German: fix broken multi-char diacritic "ss"→"ß" to single-char
  "s"→"ß" which the char-by-char normalizer can actually process
- Remove all keyboard coverage xfails (both ko and de now pass)

Words with compound final consonants (ㄺ, ㄻ, ㄼ, etc.) are not typeable
on the default keyboard. They remain valid guesses (via the Full layout)
but are excluded from daily word selection so every daily word is always
solvable on the default keyboard.

Also adds documentation for future Option C: decomposing compound
jongseong into individual jamo with 6-cell grid (like kordle.kr),
which would eliminate compound keys entirely. Notes other languages
(Devanagari, Thai, Khmer) that may need similar treatment.

Hugo0 changed the title from "fix: Korean keyboard unclickable due to Hangul Jamo Unicode mismatch" to "fix: Korean keyboard unclickable + German ß — universal diacritic normalization" on Mar 14, 2026

Maps physical key codes (event.code) to jamo characters, bypassing the
OS IME which would otherwise compose syllable blocks. Uses the standard
Korean 2-set (Dubeolsik) layout: Q→ㅂ, W→ㅈ, E→ㄷ, etc. with Shift
for double consonants (Shift+Q→ㅃ, Shift+T→ㅆ, etc.).

The physical_key_map config is universal — any language needing IME
bypass can add one to their language_config.json.
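
The lookup described in this commit can be modelled as below. The real handler lives in game.ts and reads the browser's `event.code`. This sketch uses Python only to show the table-driven idea, and the shape of `PHYSICAL_KEY_MAP` (the `base` and `shift` fields) is an assumption, not the project's actual config schema.

```python
# Hypothetical config shape: physical key code -> jamo, with an optional Shift variant.
PHYSICAL_KEY_MAP = {
    "KeyQ": {"base": "ㅂ", "shift": "ㅃ"},
    "KeyW": {"base": "ㅈ", "shift": "ㅉ"},
    "KeyE": {"base": "ㄷ", "shift": "ㄸ"},
    "KeyT": {"base": "ㅅ", "shift": "ㅆ"},
}


def resolve_physical_key(code, shift, key_map=PHYSICAL_KEY_MAP):
    """Translate a physical key code into a jamo, or None if the key is unmapped."""
    entry = key_map.get(code)
    if entry is None:
        return None
    if shift and "shift" in entry:
        return entry["shift"]
    return entry["base"]


print(resolve_physical_key("KeyQ", shift=False))  # ㅂ
print(resolve_physical_key("KeyQ", shift=True))   # ㅃ
```

Keying on the physical code rather than the produced character is what sidesteps IME composition: the key's position is known before the OS turns keystrokes into syllable blocks.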

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_word_lists.py (1)

128-152: ⚠️ Potential issue | 🟠 Major

Add a default-layout coverage check for daily candidates.

Unioning all layouts is fine for “guessable somewhere”, but it no longer protects the Korean contract introduced here: daily-selectable words must still be typeable on the default korean_2set layout. A word missed in ko_blocklist.txt will now pass CI via korean_full and still ship as an unsolvable daily. Please keep this all-layout assertion for guessability, but add a second check over the daily candidate pool (load_daily_words(lang) or load_word_list(lang) - load_blocklist(lang)) against the default layout only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_word_lists.py` around lines 128 - 152, The current test unions
characters from all layouts via load_all_keyboard_chars(lang), which misses the
requirement that daily-selectable words must be typeable on the default layout;
add a second assertion that computes the daily candidate set (use
load_daily_words(lang) or compute load_word_list(lang) minus
load_blocklist(lang)) and verify every character in those daily words appears on
the default keyboard layout (load_keyboard(lang, layout="korean_2set") or the
project’s equivalent API); keep the existing all-layout check for guessability
and add this default-layout check (reference
test_keyboard_covers_all_word_characters, KEYBOARD_COVERAGE_XFAIL,
load_word_list, load_daily_words, load_blocklist, load_keyboard,
load_all_keyboard_chars).
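
The second check suggested in this comment might look roughly like the sketch below. It does not use the project's real helpers: `untypeable_daily_words` and its parameters are hypothetical, and it assumes a default-layout key can stand for itself plus its diacritic_map variants.

```python
def untypeable_daily_words(words, blocklist, default_chars, diacritic_map):
    """Daily candidates containing a character the default layout cannot produce."""
    # Each default key covers itself and every variant listed for it.
    typeable = set(default_chars)
    for key in default_chars:
        typeable.update(diacritic_map.get(key, []))
    blocked = set(blocklist)
    return [w for w in words if w not in blocked and not set(w) <= typeable]


default_chars = {"ㄱ", "ㅏ", "ㄷ"}
ko_map = {
    "ㄱ": ["\u1100", "\u11a8"],  # initial and final forms
    "ㅏ": ["\u1161"],
    "ㄷ": ["\u1103", "\u11ae"],
}
gak = "\u1100\u1161\u11a8"  # 각, typeable on the default keys above
dak = "\u1103\u1161\u11b0"  # 닭, needs the compound final ㄺ

missing_from_blocklist = untypeable_daily_words([gak, dak], [], default_chars, ko_map)
after_blocking = untypeable_daily_words([gak, dak], [dak], default_chars, ko_map)
```

A test would assert that the returned list is empty for the daily pool, alongside the existing all-layout assertion for guessable words.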
🧹 Nitpick comments (2)
webapp/data/languages/ko/language_config.json (1)

56-63: Add the missing ㅒ normalization entry.

ko_keyboard.json now exposes ㅒ on both layouts, but this map jumps straight from ㅑ to ㅓ. Adding the Hangul Jamo variant keeps the keyboard/config pair symmetric and avoids another Compatibility Jamo mismatch if ㅒ ever lands in the guessable lists.

Possible change
         "ㅑ": ["ᅣ"],
+        "ㅒ": ["ᅤ"],
         "ㅓ": ["ᅥ"],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webapp/data/languages/ko/language_config.json` around lines 56 - 63, The JSON
vowel-to-Jamo map is missing the ㅒ→ᅤ normalization; add an entry mapping the
Compatibility Jamo key "ㅒ" to the Hangul Jamo array ["ᅤ"] (insert it between
the existing "ㅑ": ["ᅣ"] and "ㅓ": ["ᅥ"] entries) so the language_config
normalization matches the ko_keyboard layouts and avoids Compatibility Jamo
mismatches.
tests/conftest.py (1)

159-181: Keep this helper in lockstep with the runtime keyboard parser.

webapp/app.py still accepts the legacy top-level dict keyboard shape, but this helper only handles {"layouts": ...} or raw lists. If a keyboard JSON uses the legacy form, the tests will under-report typeable chars compared to production.

Possible alignment
-    if isinstance(data, dict) and "layouts" in data:
-        for layout in data["layouts"].values():
-            for row in layout.get("rows", []):
-                chars.update(k for k in row if k not in control_keys)
+    if isinstance(data, dict):
+        layouts = data.get("layouts")
+        if not isinstance(layouts, dict):
+            layouts = {k: v for k, v in data.items() if k != "default"}
+        for layout in layouts.values():
+            rows = layout.get("rows", []) if isinstance(layout, dict) else layout
+            for row in rows:
+                chars.update(k for k in row if k not in control_keys)
     elif isinstance(data, list):
         for row in data:
             chars.update(k for k in row if k not in control_keys)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/conftest.py` around lines 159 - 181, The helper load_all_keyboard_chars
currently only handles data shaped as {"layouts": ...} or a raw list and thus
misses legacy top-level dict keyboard JSONs; update it to detect when data is a
dict but lacks "layouts" and treat the dict's values as rows (or lists of rows)
similarly to the other branches, iterating over those values and filtering out
control_keys (use the existing control_keys set and LANGUAGES_DIR reference) so
legacy keyboard files produce the same char set as the runtime parser.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e4d480f-fdc2-4a21-9735-f8dc8bc01da9

📥 Commits

Reviewing files that changed from the base of the PR and between 01e0882 and 4ddd907.

📒 Files selected for processing (8)
  • tests/conftest.py
  • tests/test_language_config.py
  • tests/test_word_lists.py
  • webapp/app.py
  • webapp/data/languages/de/language_config.json
  • webapp/data/languages/ko/ko_blocklist.txt
  • webapp/data/languages/ko/ko_keyboard.json
  • webapp/data/languages/ko/language_config.json

Korean diacritic_map is for internal Unicode normalization (Compatibility
Jamo ↔ Hangul Jamo), not player-visible accent variants. The sub-key
hints (ᄇ/ᆸ) are meaningless to players and add visual noise.

Adds hide_diacritic_hints config flag — keeps hints working for European
languages (German ä/ö/ü/ß, etc.) where they're genuinely useful.
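
The effect of the flag can be sketched as a small lookup. The rendering code is in the frontend, so this Python version only models the decision. The function name `key_hints` and the config shapes are assumptions.

```python
def key_hints(key, config):
    """Variant hints to draw under a key; none when the language hides them."""
    if config.get("hide_diacritic_hints", False):
        return []
    return list(config.get("diacritic_map", {}).get(key, []))


ko_config = {"hide_diacritic_hints": True, "diacritic_map": {"ㅂ": ["\u1107", "\u11b8"]}}
de_config = {"diacritic_map": {"a": ["ä"], "s": ["ß"]}}

print(key_hints("ㅂ", ko_config))  # []
print(key_hints("s", de_config))   # ['ß']
```

The map itself stays in place for normalization. Only the visual hint is suppressed.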
@Hugo0 Hugo0 merged commit ead824c into main Mar 14, 2026
4 checks passed
Hugo0 added a commit that referenced this pull request Mar 14, 2026
Same fix as German ß (PR #155): "oe"→["œ"] and "ae"→["æ"] were
multi-char keys that the char-by-char normalizer silently ignored.

Changed to single-char mappings: "o"→["ô","œ"] and "a"→["à","â","æ"].
Players can now type 'o' to match œ and 'a' to match æ in French words.
Hugo0 added a commit that referenced this pull request Mar 14, 2026
Same fix as German ß (PR #155): "oe"→["œ"] and "ae"→["æ"] were
multi-char keys that the char-by-char normalizer silently ignored.

Changed to: "o"→["ô","œ"] and "a"→["à","â","æ"]. Players can now
type 'o' to match œ and 'a' to match æ in French words.
@Hugo0 Hugo0 mentioned this pull request Mar 14, 2026
1 task
Hugo0 added a commit that referenced this pull request Mar 14, 2026
… words

These words contain compound final consonants not typeable on the default
keyboard, and were added to ko_blocklist.txt in PR #155.
@Hugo0 Hugo0 deleted the worktree-fix-korean-keyboard branch April 15, 2026 17:56

Development

Successfully merging this pull request may close these issues.

Korean keyboard doesn't work
