Skip to content

Conversation

@Uzlopak
Copy link
Contributor

@Uzlopak Uzlopak commented Nov 3, 2023

This PR adds a standardized way via the cli option --icu-locale to set the default locale of the ICU used in node process.

How to use:

node --icu-locale=fr-FR

then in the repl you can do

console.log(Intl?.Collator().resolvedOptions().locale) // output is 'fr-FR'

Todo:

  • get feedback if it is implemented correctly from other maintainers
  • add tests
  • add documentation

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Nov 3, 2023
@Uzlopak Uzlopak changed the title lib: set defautlt locale per cli option --locale lib: set default locale per cli option --locale Nov 3, 2023
@Uzlopak Uzlopak changed the title lib: set default locale per cli option --locale lib: set default locale per cli option --locale or LOCALE env var Nov 3, 2023
@Uzlopak Uzlopak force-pushed the set-locale-per-cli branch 2 times, most recently from a5b21c6 to ad61226 Compare November 3, 2023 15:21
@Uzlopak Uzlopak marked this pull request as ready for review November 3, 2023 15:22
Copy link
Member

@tniessen tniessen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the motivation here, as in, providing a portable way of defining the default ICU locale.

However, ICU already does a good job at determining the correct locale based on the environment, and changing ICU's locale may introduce inconsistencies within Node.js applications. It won't affect the locale used by the C and C++ standard libraries, it won't affect other dependencies of Node.js that don't use ICU, it won't correctly propagate to native addons, it won't correctly propagate to child processes, etc.

See also #28099 (comment). cc @bnoordhuis

@bnoordhuis
Copy link
Member

bnoordhuis commented Nov 3, 2023

Another thing: this should use NODE_OPTIONS instead of inventing a new environment variable.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 3, 2023

I will have a look later regarding implementing it as NODE_OPTIONS.

@Uzlopak Uzlopak force-pushed the set-locale-per-cli branch from ad61226 to 423f20b Compare November 3, 2023 22:53
@Uzlopak Uzlopak changed the title lib: set default locale per cli option --locale or LOCALE env var lib: set default locale per cli option --locale Nov 3, 2023
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 3, 2023

@bnoordhuis

I hope this is to your liking?

@Uzlopak Uzlopak force-pushed the set-locale-per-cli branch from 423f20b to 49423d0 Compare November 4, 2023 10:55
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 4, 2023

@bnoordhuis PTAL

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes themselves LGTM. I don't know if it's a good change for the reasons Tobias points out.

Copy link
Member

@tniessen tniessen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Ben, I am also not sure if this is a good change.

In any case, if this doesn't properly change the locale for the entire process tree, it should probably be called --icu-locale. See also #50538 (review) and #28099 (comment).

@Uzlopak Uzlopak force-pushed the set-locale-per-cli branch from 49423d0 to 9aea65e Compare November 6, 2023 10:20
@Uzlopak Uzlopak changed the title lib: set default locale per cli option --locale lib: set default ICU locale per cli option --icu-locale Nov 6, 2023
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 6, 2023

@tniessen
Renamed it to --icu-locale

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 9, 2023

Anybody knows how I could get the value of Intl?.Collator().resolvedOptions().locale without a roundtrip to js and back?

@bnoordhuis
Copy link
Member

It's icu::Locale::toLanguageTag<std::string>()

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Nov 15, 2023

@bnoordhuis

Do you have a code example, please? It doesnt seem trivial to use it.

@Uzlopak Uzlopak force-pushed the set-locale-per-cli branch 2 times, most recently from 70a8143 to 7863b37 Compare August 14, 2024 10:15
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Aug 14, 2024

@bnoordhuis
@tniessen

What about now?

@Uzlopak Uzlopak force-pushed the set-locale-per-cli branch from 7863b37 to fe4f31b Compare August 14, 2024 10:20
@codecov
Copy link

codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.09%. Comparing base (d0f5943) to head (fe4f31b).
Report is 5 commits behind head on main.

Files Patch % Lines
src/node_i18n.cc 88.88% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #50538   +/-   ##
=======================================
  Coverage   87.09%   87.09%           
=======================================
  Files         648      648           
  Lines      182213   182238   +25     
  Branches    34961    34968    +7     
=======================================
+ Hits       158703   158729   +26     
+ Misses      16794    16784   -10     
- Partials     6716     6725    +9     
Files Coverage Δ
src/node.cc 74.14% <100.00%> (+0.11%) ⬆️
src/node_i18n.h 80.00% <ø> (ø)
src/node_options.cc 88.16% <100.00%> (+0.01%) ⬆️
src/node_options.h 98.17% <ø> (ø)
src/node_i18n.cc 76.36% <88.88%> (+0.47%) ⬆️

... and 18 files with indirect coverage changes

Comment on lines +1374 to +1375
representing the language version as defined in [RFC 5646][] (also known as
BCP 47).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
representing the language version as defined in [RFC 5646][] (also known as
BCP 47).
representing the language version as defined in [IETF BCP 47][].

don't refer to 5646, that's the reason there's a BCP 47.

[OSSL_PROVIDER-legacy]: https://www.openssl.org/docs/man3.0/man7/OSSL_PROVIDER-legacy.html
[Permission Model]: permissions.md#permission-model
[REPL]: repl.md
[RFC 5646]: https://www.rfc-editor.org/rfc/rfc5646.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[RFC 5646]: https://www.rfc-editor.org/rfc/rfc5646.txt
[BCP 47]: https://www.rfc-editor.org/info/bcp47

Comment on lines +619 to +630
const char* newDefaultLocale = uloc_getDefault();
int newDefaultLocaleLen = strlen(newDefaultLocale);
int32_t locCount = 0;
const icu::Locale* supportedLocales =
icu::Locale::getAvailableLocales(locCount);
for (int32_t i = 0; i < locCount; ++i) {
if (strncmp(newDefaultLocale,
supportedLocales[i].getName(),
newDefaultLocaleLen) == 0) {
return;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the purpose of this. Just let the user set the default locale to whatever they want. The user's own system might be set to something not available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set the --icu-locale to invalid values and as long as it is in the right format, it will set it. It will only go to und if it is an invalid locale.This ensures, that the locale is existing and valid.

Tbh. I didnt know how I could get the BestMatcher Class running. So I wrote this logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of trying to protect the user from invalid values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment by @bnoordhuis

#50538 (comment)

'-p',
'new Date(Date.UTC(2012, 11, 20, 3, 0, 0)).toLocaleString(undefined, { timeZone: "UTC" })',
]);
assert.strictEqual(child.stdout.toString(), '2012. 12. 20. 오전 3:00:00\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like these kinds of tests. At all. You are hard coding linguistic data that is guaranteed to change with every release, and even with local modifications the user makes.

I would recommend NOT testing for unsupported locales (you can't predict which locales are included), and not testing for specific formats. We've discussed this elsewhere.

Copy link
Contributor Author

@Uzlopak Uzlopak Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took these examples from MDN to ensure it works as expected

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't expect it. The data changes every single time. See https://www.unicode.org/cldr/charts/latest/delta/index.html and click on any locale.

I think a better test would simply be to verify that the output is DIFFERENT when a different locale is chosen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. that is also a good approach.

'-p',
'new Intl.ListFormat(undefined, {style: "long", type:"conjunction"}).format(["a", "b", "c"])',
]);
assert.strictEqual(child.stdout.toString(), 'a, b und c\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps just keep ONE of these tests, to see that setting the locale makes a difference. Knowing that test might need updates. Delete the rest of the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think two are needed each?! On my machine the locale is de-DE so this test would pass always on my machine, even if the locale flag would not work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe two. you can also set LC_ALL on a unix type env to show the distinction betwen the env and the parameter

$ env LC_ALL=de node -p 'new Date().toLocaleString()'
6.9.2024, 12:14:35
$ env LC_ALL=en node -p 'new Date().toLocaleString()'
9/6/2024, 12:14:39 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then i could not test it on windows?

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Sep 6, 2024

@srl295
I followed you on twitter, if you like to chat about it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants