Create a mapping from script to font language #767
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| # (C) Copyright 2005-2021 Enthought, Inc., Austin, TX | ||
| # All rights reserved. | ||
| # | ||
| # This software is provided without warranty under the terms of the BSD | ||
| # license included in LICENSE.txt and may be redistributed only under | ||
| # the conditions described in the aforementioned license. The license | ||
| # is also available online at http://www.enthought.com/licenses/BSD.txt | ||
| # | ||
| # Thanks for using Enthought open source! | ||
| import locale | ||
|
|
||
| from kiva.fonttools.text._data import SCRIPTS | ||
|
|
||
| # Derived from kiva.fonttools._util: | ||
| # `_ot_code_page_masks` and `_ot_unicode_range_bits` | ||
| # These are the font languages which we recognize | ||
| _FONT_LANGUAGES = [ | ||
| "Arabic", "Armenian", "Balinese", "Bengali", "Buginese", | ||
| "Canadian_Aboriginal", "Cherokee", "Coptic", "Cyrillic", "Deseret", | ||
| "Devanagari", "Ethiopic", "Georgian", "Glagolitic", "Gothic", "Greek", | ||
| "Gujarati", "Gurmukhi", "Hebrew", "Japanese", "Kannada", "Khmer", "Korean", | ||
| "Lao", "Latin", "Limbu", "Malayalam", "Math", "Mongolian", "Myanmar", | ||
| "New_Tai_Lue", "Nko", "Ogham", "Oriya", "Phoenician", "Runic", | ||
| "Simplified Chinese", "Sinhala", "Symbol", "Syriac", "Tai_Le", "Tamil", | ||
| "Telugu", "Thaana", "Thai", "Tibetan", "Tifinagh", "Traditional Chinese", | ||
| "Vai", "Vietnamese", | ||
| ] | ||
|
|
||
|
|
||
| def build_script_to_language_map(): | ||
| """ Create a dictionary which maps from script name (from `SCRIPTS`) to | ||
| font language. | ||
|
|
||
| NOTE: The language for a given script is locale dependent. | ||
| """ | ||
| locale_lang = locale.getdefaultlocale()[0] | ||
|
|
||
| if locale_lang in (None, "C"): | ||
| locale_lang = "en_US" | ||
|
|
||
| # Pick a language to use for "Han" script | ||
| han_lang = "Traditional Chinese" # Default | ||
| if locale_lang in ("zh_CN", "zh_SG"): | ||
| han_lang = "Simplified Chinese" | ||
| elif locale_lang.startswith("ja"): | ||
| han_lang = "Japanese" | ||
| elif locale_lang.startswith("ko"): | ||
| han_lang = "Korean" | ||
|
|
||
| # Mapping from script -> language that we're _mostly_ sure about | ||
| known_mappings = { | ||
| # Special script properties | ||
| "Common": "Common", | ||
| "Inherited": "Inherited", | ||
| "Unknown": "Unknown", | ||
|
|
||
| # Scripts which infer the writing system | ||
| "Bopomofo": "Traditional Chinese", # XXX: Taiwan only? | ||
| "Han": han_lang, | ||
| "Hangul": "Korean", | ||
| "Hiragana": "Japanese", | ||
| "Katakana": "Japanese", | ||
| } | ||
|
|
||
| mapping = {} | ||
| for script in SCRIPTS: | ||
| if script in known_mappings: | ||
| mapping[script] = known_mappings[script] | ||
| elif script in _FONT_LANGUAGES: | ||
| mapping[script] = script | ||
| else: | ||
| mapping[script] = "Latin" | ||
|
|
||
| return mapping | ||
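
The locale-based choice for the "Han" script in the diff above can be pulled out into a pure function, which makes it testable without touching the process locale. This is only a sketch: `pick_han_language` is a hypothetical helper name that does not appear in the PR, and the `None` guard is an assumption based on `locale.getdefaultlocale()` being able to return `None` for the language code.

```python
def pick_han_language(locale_lang):
    """Return the font language to use for the "Han" script.

    `locale_lang` is a locale code such as "zh_CN" or "ja_JP", or None
    when the locale could not be determined (assumed possible here).
    """
    # Treat an undetermined or "C" locale as US English
    if locale_lang in (None, "C"):
        locale_lang = "en_US"

    # Same precedence as in the diff: exact zh matches, then ja*, then ko*
    if locale_lang in ("zh_CN", "zh_SG"):
        return "Simplified Chinese"
    if locale_lang.startswith("ja"):
        return "Japanese"
    if locale_lang.startswith("ko"):
        return "Korean"

    # Default, as in the diff
    return "Traditional Chinese"
```

Under this sketch, `build_script_to_language_map()` would call `pick_han_language(locale.getdefaultlocale()[0])` and use the result as `han_lang`.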
Having played with this a bit more, we should only use this locale-based choice when the language is not otherwise clear from the context. For instance, if a string already contains Hiragana or Katakana, then Han should be mapped to "Japanese". If Hangul is encountered, Han maps to "Korean". Only if the Han is mixed with some non-CJK language should we fall back to this locale-based guess.
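
A rough sketch of that context-first rule, assuming the caller has already collected the script names found in the string. The function name `resolve_han_language` and its parameters are hypothetical and not part of the PR. Checking kana before Hangul is an arbitrary choice here, since the comment does not say what to do when both appear.

```python
def resolve_han_language(scripts_in_text, locale_guess):
    """Pick the font language for Han characters in one string.

    `scripts_in_text` is an iterable of script names found in the string
    (e.g. {"Han", "Hiragana"}); `locale_guess` is the locale-based
    fallback such as "Traditional Chinese".
    """
    scripts = set(scripts_in_text)

    # Kana alongside Han strongly suggests Japanese text
    if scripts & {"Hiragana", "Katakana"}:
        return "Japanese"

    # Hangul alongside Han suggests Korean text
    if "Hangul" in scripts:
        return "Korean"

    # No CJK context available: fall back to the locale-based guess
    return locale_guess
```

For example, `resolve_han_language({"Han", "Latin"}, "Simplified Chinese")` falls through to the locale guess, while the same call with `{"Han", "Katakana"}` returns `"Japanese"` regardless of locale.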