Dictionary curation 2025 03 06 #837
Terms containing lowercase non-English minor words, and terms containing hyphens, now work in the new system: Port-au-Prince, Porto-Novo, Dar es Salaam, Andorra la Vella, etc. (Terms with periods, such as St. George's, still don't work, and hyphens can't be added where they're missing.)
"andorra","la","vella" => "Andorra la Vella",
"Andorra","la","vella" => "Andorra la Vella",
"Andorra","La","Vella" => "Andorra la Vella",
"guinea","bissau" => "Guinea-Bissau",
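The mapping above needs one entry per casing variant of "Andorra la Vella". A minimal Rust sketch of a lookup keyed on lowercased words, so one canonical entry covers every casing, assuming input arrives as a slice of words. `key_of`, `build_table`, and `suggest` are hypothetical names, not Harper's API:

```rust
use std::collections::HashMap;

/// Hypothetical: lowercase each word and join with single spaces to form a key.
fn key_of(words: &[&str]) -> String {
    words
        .iter()
        .map(|w| w.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical: build the table from canonical spellings only. Hyphens are
/// treated as word breaks, so "Guinea-Bissau" is keyed as "guinea bissau".
fn build_table(canonical: &[&'static str]) -> HashMap<String, &'static str> {
    canonical
        .iter()
        .map(|&c| {
            let words: Vec<&str> = c.split(|ch: char| ch == ' ' || ch == '-').collect();
            (key_of(&words), c)
        })
        .collect()
}

/// Hypothetical: suggest the canonical form when the words match an entry
/// case-insensitively but are not already written exactly that way.
fn suggest(table: &HashMap<String, &'static str>, words: &[&str]) -> Option<&'static str> {
    let canon = *table.get(&key_of(words))?;
    if words.join(" ") == canon { None } else { Some(canon) }
}
```

With `build_table(&["Andorra la Vella", "Guinea-Bissau"])`, the three Andorra rows above collapse into one entry, and `["guinea", "bissau"]` maps to "Guinea-Bissau".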
These hyphenation rules can be made into `phrase_corrections` entries.
Cool. I'll look into that today then...
As it turns out, unless I'm missing something, `phrase_corrections` only adds the hyphens; it won't change the case. So it can correct "Guinea Bissau", but it will incorrectly "correct" "guinea bissau" to "guinea-bissau", "Guinea bissau" to "Guinea-bissau", etc.
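A small Rust model of the problem the comment above reports, assuming a correction that only joins the matched words with hyphens and keeps the writer's case, compared with one that replaces the match with a fixed canonical string. Both functions are hypothetical illustrations, not Harper's `phrase_corrections` code:

```rust
/// Hypothetical model of a hyphen-only correction: joins the matched words
/// with hyphens but keeps whatever case the writer used.
fn hyphenate_keep_case(words: &[&str]) -> String {
    words.join("-")
}

/// Hypothetical model of a correction that replaces a case-insensitive match
/// with one fixed canonical string, fixing case and hyphenation together.
fn replace_with_canonical(words: &[&str], canonical: &str) -> Option<String> {
    let typed: Vec<String> = words.iter().map(|w| w.to_lowercase()).collect();
    let wanted: Vec<String> = canonical.split('-').map(|w| w.to_lowercase()).collect();
    (typed == wanted).then(|| canonical.to_string())
}
```

Only the all-capitalized input comes out right with the first function; "guinea bissau" becomes "guinea-bissau".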
If you go with the quick fix for those, the canonical case linter will kick in, but that might not be what we're really aiming for.
I'll move the one case of each that can be fixed. But maybe the best solution is to turn "canonical case" into "canonicalize", which would handle at least case and hyphenation, and ideally also closed vs. open vs. hyphenated compounds, missing/extra apostrophes, missing/extra periods, and abbreviated vs. full spellings, such as "Saint" vs. "St."
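A minimal Rust sketch of the normalization key such a "canonicalize" linter might use, assuming it only needs to fold case, hyphens vs. spaces vs. closed compounds, apostrophes, periods, and "St" vs. "Saint". `canon_key` and `suggest` are hypothetical; expanding every "St" to "Saint" would also catch "St." meaning "Street", so a real rule would need more care:

```rust
/// Hypothetical key: spellings that differ only in case, hyphens vs. spaces
/// vs. closed compounds, apostrophes, periods, or "St" vs. "Saint" collide.
fn canon_key(phrase: &str) -> String {
    // Drop periods and straight/curly apostrophes; treat hyphens as spaces.
    let cleaned: String = phrase
        .chars()
        .filter(|c| !matches!(*c, '.' | '\'' | '\u{2019}'))
        .map(|c| if c == '-' { ' ' } else { c })
        .collect();
    // Lowercase each word, expand "st" to "saint", then join with no
    // separator so open, hyphenated, and closed compounds coincide.
    cleaned
        .split_whitespace()
        .map(|w| {
            let w = w.to_lowercase();
            if w == "st" { "saint".to_string() } else { w }
        })
        .collect::<Vec<_>>()
        .concat()
}

/// Hypothetical: return the canonical spelling whose key matches the typed
/// text, unless the text is already exactly that spelling. A linear scan
/// keeps the sketch short; a real table would be a HashMap keyed by canon_key.
fn suggest<'a>(canonical: &[&'a str], typed: &str) -> Option<&'a str> {
    let key = canon_key(typed);
    canonical
        .iter()
        .copied()
        .find(|c| canon_key(c) == key && *c != typed)
}
```

Joining words without a separator trades some precision for coverage: it makes "Dar es Salaam" and "Daressalaam" collide, which is the point, but could also make unrelated names collide.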
They can be made into separate rules: an entry for proper noun capitalization and an entry for hyphenation. In other words, just do the quick fix.
I may start looking into how we can collapse multiple Lints into one soon, since we're starting to hit the critical mass you mentioned a while ago.
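A hypothetical Rust sketch of the two-entry approach: a hyphenation rule that keeps the writer's case and a capitalization rule that fixes case, applied one after another until neither fires, roughly what a user sees when accepting one suggestion after another. The rule functions, the `Rule` alias, and `apply_all` are illustrative assumptions, not Harper's `Linter` API:

```rust
/// Hypothetical rule shape: a corrected phrase, or None if nothing to fix.
type Rule = fn(&str) -> Option<String>;

/// Hyphenation entry: inserts the hyphen but keeps the writer's case.
fn hyphenation(s: &str) -> Option<String> {
    (s.to_lowercase() == "guinea bissau").then(|| s.replacen(' ', "-", 1))
}

/// Capitalization entry: fixes case only, for the already-hyphenated spelling.
fn capitalization(s: &str) -> Option<String> {
    (s.to_lowercase() == "guinea-bissau" && s != "Guinea-Bissau")
        .then(|| "Guinea-Bissau".to_string())
}

/// Apply the first rule that fires, repeatedly, until no rule fires.
/// Assumes the rules never undo each other, otherwise this would not terminate.
fn apply_all(rules: &[Rule], input: &str) -> String {
    let mut current = input.to_string();
    loop {
        let fired = rules.iter().find_map(|rule| rule(current.as_str()));
        match fired {
            Some(next) => current = next,
            None => return current,
        }
    }
}
```

Starting from "guinea bissau", the hyphenation entry yields "guinea-bissau" and the capitalization entry then yields "Guinea-Bissau"; collapsing both into one lint would produce that result in a single suggestion.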
I'm not sure how to proceed with this. Is it possible to at least merge the `dictionary.dict` curation part? It's going to get more and more out of sync with later curation PRs.
Oops, I accidentally pushed
I'm a little confused by what you mean. Are you wanting to only submit the
I think one problem is that I keep forgetting which file can fix which things, because that isn't reflected in the name of the file, the name of the linter, or the comment at the top of the file. I'm going to try to narrow this down, include tests, and add comments, but I might miss something. I'll push my best effort and then merge everything that looks right to you. Maybe we could have a file somewhere just for testing place names and/or proper nouns, with tests for case, hyphenation, apostrophes, accents, etc., without having to know which source file is responsible.
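A hypothetical sketch of such a place-name test table in Rust: each case records a category (case, hyphenation, apostrophes, ...), the input, and the expected suggestion, and the runner takes whatever single entry point does the checking, so failures are reported by category rather than by source file. `Case`, `CASES`, and `failures` are illustrative names only:

```rust
/// One place-name case: what the writer typed and the suggestion we expect
/// (None means no lint should fire).
struct Case {
    category: &'static str,
    input: &'static str,
    expected: Option<&'static str>,
}

const CASES: &[Case] = &[
    Case { category: "case", input: "andorra la vella", expected: Some("Andorra la Vella") },
    Case { category: "case", input: "Andorra la Vella", expected: None },
    Case { category: "hyphenation", input: "Guinea Bissau", expected: Some("Guinea-Bissau") },
    Case { category: "hyphenation", input: "Porto-Novo", expected: None },
    Case { category: "apostrophes", input: "St. Georges", expected: Some("St. George's") },
];

/// Runs every case through `check` (whatever entry point does the linting)
/// and returns one message per mismatch, labelled with its category.
fn failures(check: impl Fn(&str) -> Option<String>) -> Vec<String> {
    CASES
        .iter()
        .filter_map(|c| {
            let got = check(c.input);
            if got.as_deref() == c.expected {
                None
            } else {
                Some(format!(
                    "[{}] {:?}: expected {:?}, got {:?}",
                    c.category, c.input, c.expected, got
                ))
            }
        })
        .collect()
}
```

A real test would call `failures` with a closure wrapping the linter and assert that the returned list is empty, printing the messages otherwise.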
I don't understand why I needed to change the logic of the `fst_map_contains_all_in_full_dict` test in `fst_dictionary.rs` to get `cargo test` to pass: the first word from the iterator is `""`.
and remove empty entry handling
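The comment above doesn't say where the empty first word comes from. One possible source, sketched in Rust under the assumption that dictionary entries look like `word/FLAGS`: a blank line, or a line that is only an annotation, yields `""` as its word part. `word_of` and `words` are hypothetical helpers, not the actual test code:

```rust
/// Hypothetical: take the part of a `word/FLAGS` entry before the first '/'.
/// `split` always yields at least one item, so this never panics.
fn word_of(line: &str) -> &str {
    line.split('/').next().unwrap_or("").trim()
}

/// Hypothetical: collect dictionary words, skipping entries whose word part
/// is empty (blank lines, or lines that are only an annotation like "/M").
fn words(source: &str) -> Vec<&str> {
    source.lines().map(word_of).filter(|w| !w.is_empty()).collect()
}
```

Without the `filter`, an input starting with a newline would make `""` the first word, which matches the symptom described, though the real cause may differ.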
…ctionary-curation-2025-03-06
Is the precommit really still failing the PHP test? I can't grok why. It says it gives the correct diagnostics, but then there's an error. After the last merge with master, this is no longer happening. 🎉
…ctionary-curation-2025-03-06
Issues
N/A
Description
Demo
N/A
How Has This Been Tested?
Tested on place names that previously didn't work or needed special rules.
Checklist