Dictionary curation 2025 03 06 #837

Merged
elijah-potter merged 11 commits into Automattic:master from hippietrail:dictionary-curation-2025-03-06
Mar 25, 2025

Conversation

@hippietrail
Collaborator

hippietrail commented Mar 6, 2025

Issues

N/A

Description

  • Add words to dictionary
  • Adjust affix annotations
  • Move placenames into the new canonical case system
  • Add "VS Code" to the canonical case system
  • Sort parts of the canonical case JSON file that I modified

Demo

N/A

How Has This Been Tested?

Tested on place names that previously didn't work or needed special rules.

Checklist

  • I have performed a self-review of my own code

Terms with lowercase minor words that are not English and terms with hyphens work in the new system: Port-au-Prince, Porto-Novo, Dar es Salaam, Andorra la Vella, etc.

(Terms with periods still don't work: St. George's etc. Hyphens can't be added.)
Comment thread harper-core/proper_noun_rules.json Outdated
Comment thread harper-core/src/linting/matcher.rs Outdated
"andorra","la","vella" => "Andorra la Vella",
"Andorra","la","vella" => "Andorra la Vella",
"Andorra","La","Vella" => "Andorra la Vella",
"guinea","bissau" => "Guinea-Bissau",
Collaborator

These hyphenation rules can be made into phrase_corrections entries.

Collaborator Author

hippietrail commented Mar 7, 2025

Cool. Will look into that today then....

As it turns out, unless I'm missing something, phrase_corrections will only add the hyphens, it won't change the case. So it can correct Guinea Bissau but will incorrectly "correct" guinea bissau to guinea-bissau, Guinea bissau to Guinea-bissau, etc.

If you go with the quick fix for those then the canonical case linter will kick in, but this might not be what we're really aiming for.

I'll move the one case of each that can be fixed. But maybe the best solution is to change "canonical case" into "canonicalize", which should handle at least case and hyphenation, and ideally also closed vs. open vs. hyphenated compounds, missing or extra apostrophes, missing or extra periods, and abbreviated vs. full spellings such as "Saint" vs. "St."
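
The "canonicalize" idea can be sketched as a lookup keyed on a normalized form of the phrase. This is only an illustration under stated assumptions, not Harper's implementation: `normalize_key`, `build_table`, and `canonicalize` are hypothetical names, and the sketch covers only case and hyphen-vs-space variation, not periods, apostrophes, or abbreviations.

```rust
use std::collections::HashMap;

/// Reduce a phrase to a lookup key: lowercase, with hyphens and runs of
/// whitespace both treated as a single space. (Hypothetical helper.)
fn normalize_key(phrase: &str) -> String {
    phrase
        .replace('-', " ")
        .split_whitespace()
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Build a table from normalized keys to canonical spellings.
fn build_table(canonical: &[&'static str]) -> HashMap<String, &'static str> {
    canonical.iter().map(|c| (normalize_key(c), *c)).collect()
}

/// Return the canonical spelling if the phrase is a variant of a known term.
fn canonicalize(table: &HashMap<String, &'static str>, phrase: &str) -> Option<&'static str> {
    table.get(&normalize_key(phrase)).copied()
}
```

With a table built from "Guinea-Bissau" and "Andorra la Vella", the variants "guinea bissau", "Guinea-bissau", and "Andorra La Vella" would all resolve to their canonical spellings in one step, so case and hyphenation no longer need separate rules.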

Collaborator

They can be made into separate rules. An entry for proper noun capitalization and an entry for hyphenation. In other words: just do the quick fix.

I may start looking into how we can collapse multiple Lints into one soon, since we're starting to hit the critical mass you mentioned a while ago.
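
The quick fix amounts to two independent passes whose results compose. The sketch below is a hedged illustration only: `hyphenate` and `fix_case` are hypothetical stand-ins, not the actual `phrase_corrections` or proper-noun linter code, and for brevity they split on single spaces and ignore punctuation attached to words.

```rust
/// Rule 1 (hyphenation only): join a known two-word compound with a hyphen,
/// leaving the case of each word exactly as it was typed.
fn hyphenate(text: &str, first: &str, second: &str) -> String {
    let words: Vec<&str> = text.split(' ').collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < words.len() {
        if i + 1 < words.len()
            && words[i].eq_ignore_ascii_case(first)
            && words[i + 1].eq_ignore_ascii_case(second)
        {
            out.push(format!("{}-{}", words[i], words[i + 1]));
            i += 2;
        } else {
            out.push(words[i].to_string());
            i += 1;
        }
    }
    out.join(" ")
}

/// Rule 2 (case only): replace any case variant of an already-hyphenated
/// term with its canonical spelling.
fn fix_case(text: &str, canonical: &str) -> String {
    text.split(' ')
        .map(|w| if w.eq_ignore_ascii_case(canonical) { canonical } else { w })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Applied to "flights to guinea bissau", the first pass yields "flights to guinea-bissau" and the second pass then yields "flights to Guinea-Bissau". The user sees two lints rather than one, which is the cost that collapsing multiple lints into one would remove.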

Collaborator Author

> They can be made into separate rules. An entry for proper noun capitalization and an entry for hyphenation. In other words: just do the quick fix.
>
> I may start looking into how we can collapse multiple Lints into one soon, since we're starting to hit the critical mass you mentioned a while ago.

I'm not sure how to proceed with this. Is it possible to at least merge the dictionary.dict curation part? It's going to get more and more out of sync with later curation PRs.

@hippietrail
Collaborator Author

Oops, I accidentally pushed 3d0c3c3 here instead of to my new curation branch. I tried to revert it with `git reset --hard HEAD~1` and `git push -f origin dictionary-curation-2025-03-06`, which I thought had worked, but apparently it didn't. I don't want to fumble around and make it even worse \-:

@elijah-potter
Collaborator

I'm a little confused as to your meaning. Do you want to submit only the dictionary.dict changes?

@hippietrail
Collaborator Author

> I'm a little confused to your meaning. Are you wanting to only submit the dictionary.dict changes?

I think one problem is I keep forgetting which file can fix which things because it's not in the name of the file, the name of the linter, or the comment at the top of the file.
By this I mean which file has lints that can add hyphens to phrases vs which has lints that can correct the case of hyphenated phrases.

I'm going to try to narrow this down, include tests, and include comments. But I might miss something. I'll push my best effort and then merge everything that looks right to you.

Maybe we could have a file somewhere just for testing place names and/or proper nouns that can include tests for cases, hyphenation, apostrophes, accents, etc, without having to know which source file is responsible.
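
Such a file could be as simple as a table of inputs and expected outputs that is checked against whatever combination of lints ends up fixing them. The sketch below is an assumption about how that might look, not existing Harper test code: `Case`, `PLACE_NAME_CASES`, and `failures` are hypothetical, and `fix` stands in for "run every linter and apply its suggestions", which is not shown.

```rust
/// One expectation: what the user typed and what the lints, taken together,
/// should turn it into.
struct Case {
    input: &'static str,
    expected: &'static str,
}

/// Cases grouped by what they test (place names), not by the source file
/// that happens to contain the rule.
const PLACE_NAME_CASES: &[Case] = &[
    Case { input: "guinea bissau", expected: "Guinea-Bissau" },
    Case { input: "port au prince", expected: "Port-au-Prince" },
    Case { input: "Dar Es Salaam", expected: "Dar es Salaam" },
    Case { input: "andorra la vella", expected: "Andorra la Vella" },
];

/// Run every case through `fix` and collect one readable message per failure.
fn failures(cases: &[Case], fix: impl Fn(&str) -> String) -> Vec<String> {
    cases
        .iter()
        .filter_map(|case| {
            let got = fix(case.input);
            (got != case.expected).then(|| {
                format!("{:?}: expected {:?}, got {:?}", case.input, case.expected, got)
            })
        })
        .collect()
}
```

A test would then assert that `failures(PLACE_NAME_CASES, fix)` is empty. Because the table only records inputs and outputs, a rule can move between files without the test needing to change.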

I don't understand why I needed to change the logic of the `fst_map_contains_all_in_full_dict` test in `fst_dictionary.rs` to get `cargo test` to pass. The first word from the iterator is `""`.
Comment thread harper-core/src/linting/phrase_corrections.rs
Comment thread harper-core/src/spell/fst_dictionary.rs Outdated
@elijah-potter elijah-potter enabled auto-merge March 21, 2025 13:40
elijah-potter previously approved these changes Mar 21, 2025
@elijah-potter elijah-potter added this pull request to the merge queue Mar 21, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Mar 21, 2025
@hippietrail
Collaborator Author

hippietrail commented Mar 21, 2025

Is the precommit really still failing the PHP test? I can't grok why. It says it gives correct diagnostics, but then there's an error.

Failures:
1) Languages > gives correct diagnostics for PHP files
  Message:
    Error: Expected 1 diagnostics, got 0.
  Stack:
    	at compareActualVsExpectedDiagnostics (/home/runner/work/harper/harper/packages/vscode-plugin/build/tests/suite/helper.js:38:15)
    	at UserContext.<anonymous> (/home/runner/work/harper/harper/packages/vscode-plugin/build/tests/suite/languages.test.js:50:61)

30 specs, 1 failure
Finished in 8.873 seconds
Error: Tests failed
	at Object.run (/home/runner/work/harper/harper/packages/vscode-plugin/build/tests/suite/index.js:19:15)
	at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Extension host test runner error Error: Tests failed
	at Object.run (/home/runner/work/harper/harper/packages/vscode-plugin/build/tests/suite/index.js:19:15)
	at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Asking native host service to exit with code 1.
[main 2025-03-21T15:32:42.370Z] Extension host with pid 11581 exited with code: 0, signal: unknown.
Exit code:   1
Failed to run tests TestRunFailedError: Test run failed with code 1
    at ChildProcess.onProcessClosed (/home/runner/work/harper/harper/node_modules/.pnpm/@vscode+test-electron@2.4.1/node_modules/@vscode/test-electron/out/runTest.js:110:24)
    at ChildProcess.emit (node:events:518:28)
    at ChildProcess._handle.onexit (node:internal/child_process:293:12) {
  code: 1,
  signal: undefined
}
 ELIFECYCLE  Test failed. See above for more details.
error: Recipe `test-vscode` failed with exit code 1
Error: Process completed with exit code 1.

After the last merge with master this is no longer happening. 🎉

@elijah-potter elijah-potter added this pull request to the merge queue Mar 25, 2025
Merged via the queue into Automattic:master with commit f768fa4 Mar 25, 2025
22 checks passed
@hippietrail hippietrail deleted the dictionary-curation-2025-03-06 branch March 25, 2025 14:10
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Apr 20, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [Automattic/harper/harper-ls](https://github.com/Automattic/harper) | minor | `v0.26.0` -> `v0.29.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>Automattic/harper (Automattic/harper/harper-ls)</summary>

### [`v0.29.1`](https://github.com/Automattic/harper/releases/tag/v0.29.1)

[Compare Source](Automattic/harper@v0.29.0...v0.29.1)

#### What's Changed

-   chore: "off of a" false positive by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1081
-   feat: phrase corrections: like a plague, have went, case and point, aswell by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1078
-   Dictionary curation 2025 04 17 by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1080
-   fix: [#1075](Automattic/harper#1075) package logo font for vscode extension by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1077

**Full Changelog**: Automattic/harper@v0.29.0...v0.29.1

### [`v0.29.0`](https://github.com/Automattic/harper/releases/tag/v0.29.0)

[Compare Source](Automattic/harper@v0.28.0...v0.29.0)

#### What's Changed

-   refactor: remove unneeded logic for repeated words by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1020
-   fix: improve to handle -s -es and -ed endings by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1003
-   feat: start clarifying affix system by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#972
-   refactor: improve logic and robustness of then→than by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1021
-   Curate existing rules by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1023
-   fix: new logic and false positives by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1024
-   build(deps): bump tokio from 1.44.1 to 1.44.2 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1040
-   build(deps): bump indexmap from 2.8.0 to 2.9.0 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1037
-   build(deps): bump uuid from 1.12.0 to 1.16.0 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1038
-   feat: use the old code if `parallel` unavailable by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1019
-   feat: mention which verb triggered lint in msg by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1034
-   feat: Expand "cuz", correct "on face value" by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1030
-   feat: exempt "you guys" from the possessive your linter by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1027
-   feat: trail and error→trial and error by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1044
-   chore: tweak priority/position of statusbar item and add Harper logo by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1043
-   chore: "side of a" false positive by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1029
-   feat: Add special cases to sentence capitalization by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1031
-   feat: don't allow "let's" to trigger a following compound by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1032
-   Dictionary curation 2025 04 04 by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1028
-   feature: highly kept (secret)→well-kept by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1045
-   Documentation updates by [@mcecode](https://github.com/mcecode) in Automattic/harper#1000
-   chore: manually spotted 3 things our lints would flag in doc comments by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1047
-   feat: sequential pronouns: don't detect "my US", make case insensitive by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1026
-   feat(devshell): init devshell by [@alDuncanson](https://github.com/alDuncanson) in Automattic/harper#1014
-   `harper.js` API reference generation improvements by [@mcecode](https://github.com/mcecode) in Automattic/harper#1050
-   fix(comments): ignore comments CSpell compatibility by [@mcecode](https://github.com/mcecode) in Automattic/harper#1046
-   fix: update vscode deps by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1042
-   feat: hone in on→home in on by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1059
-   Adj of a curation 2025 04 09 by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1053
-   Dictionary curation 2025 04 08 by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1052
-   fix: make modal-of linter case insensitive by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1048
-   build(deps): bump clap from 4.5.34 to 4.5.36 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1060
-   build(deps): bump anyhow from 1.0.97 to 1.0.98 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1063
-   build(deps): bump lru from 0.13.0 to 0.14.0 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1061
-   build(deps): bump smallvec from 1.14.0 to 1.15.0 by [@dependabot](https://github.com/dependabot) in Automattic/harper#1062
-   feat(core): add simple corrections from my notes by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1025
-   Fix bug 1066 "Stack Overflow", dictionary curation, adjective-of-a curation by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1068
-   feat: add for (a)while, after (a)while, unless if, suffice to say by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1071

**Full Changelog**: Automattic/harper@v0.28.0...v0.29.0

### [`v0.28.0`](https://github.com/Automattic/harper/releases/tag/v0.28.0)

[Compare Source](Automattic/harper@v0.27.0...v0.28.0)

#### What's Changed

-   fix(vscode-plugin): sleep a longer time after openUntitled by [@kiding](https://github.com/kiding) in Automattic/harper#1004
-   docs: update link to website by [@alDuncanson](https://github.com/alDuncanson) in Automattic/harper#1007
-   feat(harper-cli): make lint accept user & file-local dictionary by [@kiding](https://github.com/kiding) in Automattic/harper#987
-   feat(ls): use PlainEnglish parser for language id "text" by [@86xsk](https://github.com/86xsk) in Automattic/harper#968
-   docs: fix grammar by [@alDuncanson](https://github.com/alDuncanson) in Automattic/harper#1009
-   feat(comments): add `scala` support by [@tymcauley](https://github.com/tymcauley) in Automattic/harper#970
-   docs(core): update the `Author a Rule` page to align with code by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1008
-   feat(core): make `LintConfig` sorted by key by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1012
-   feat: wrote first draft of statistics logging by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#454
-   feat(obsidian): add debounce setting by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1015
-   feat(harper.js): significantly improve worker performance by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1016
-   feat: [#20](Automattic/harper#20) : comma spacing and [#498](Automattic/harper#498) : Asian commas by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#891
-   feat(core): added a bunch more common rules by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#940
-   chore: merge and sort original and non-US sections by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#977
-   Adj of a curation 2025 04 02 by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#1006

#### New Contributors

-   [@alDuncanson](https://github.com/alDuncanson) made their first contribution in Automattic/harper#1007
-   [@tymcauley](https://github.com/tymcauley) made their first contribution in Automattic/harper#970

**Full Changelog**: Automattic/harper@v0.27.0...v0.28.0

### [`v0.27.0`](https://github.com/Automattic/harper/releases/tag/v0.27.0)

[Compare Source](Automattic/harper@v0.26.0...v0.27.0)

#### What's Changed

-   fix(harper-ls): handle language mode change and VS Code auto detect by [@kiding](https://github.com/kiding) in Automattic/harper#966
-   feat: linters for common mistakes with "another" by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#963
-   Dictionary curation 2025 03 06 by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#837
-   feat(core): flag "<adjective> of a" by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#967
-   fix: include the full set of personal pronouns and possessives by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#964
-   fix: 2 words in curated dict lack / before annotations by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#958
-   feat: implement [#828](Automattic/harper#828) by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#853
-   VS Code Extension Updates by [@mcecode](https://github.com/mcecode) in Automattic/harper#960
-   doc(core): write up the difference between a `Linter` and a `PatternLinter` by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#973
-   fix(core): remove bad phrase by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#974
-   fix: handle another false positive in "adjective of a" by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#980
-   feat(harper-typst): ignore file path as arguments, regex, .display by [@kiding](https://github.com/kiding) in Automattic/harper#976
-   Improve doc coverage by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#979
-   feat: detect capitalized false positives by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#988
-   fix(harper-cli): set American as the default dialect by [@kiding](https://github.com/kiding) in Automattic/harper#986
-   build(deps): bump once_cell from 1.21.1 to 1.21.3 by [@dependabot](https://github.com/dependabot) in Automattic/harper#996
-   feat: more false positives: "inside of" & "out of" by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#992
-   template for feature requests on github by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#981
-   build(deps): bump clap from 4.5.32 to 4.5.34 by [@dependabot](https://github.com/dependabot) in Automattic/harper#995
-   feat: implement [#993](Automattic/harper#993) by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#994
-   Dialect indicator for VS Code extension by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#985
-   chore: "head of a" false positive + test by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#999
-   test(core): confirm that [#720](Automattic/harper#720) is no longer present by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#975
-   fix(wordpress): crashes when options menu is opened by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#1002

**Full Changelog**: Automattic/harper@v0.26.0...v0.27.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).