
uniq: fix -w to count bytes in C locale #11061

Merged
cakebaker merged 3 commits into uutils:main from aguimaraes:uniq-fix-w-locale-bytes
Feb 23, 2026


@aguimaraes (Contributor) commented Feb 23, 2026

Summary

uniq -w N should count bytes in the C/POSIX locale and characters in a UTF-8 locale. Currently it always counts UTF-8 characters, regardless of locale.
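
To make the difference concrete, here is a minimal sketch of the two counting modes. `prefix_key` and its `count_bytes` flag are hypothetical names used only for illustration; they are not the functions in this PR, and the sketch assumes valid UTF-8 input for simplicity.

```rust
/// Hypothetical helper (not part of this PR): returns the comparison key for
/// `line`, limited to `width` units. Units are bytes when `count_bytes` is
/// true, otherwise Unicode scalar values (Rust `char`s).
fn prefix_key(line: &str, width: usize, count_bytes: bool) -> &[u8] {
    let bytes = line.as_bytes();
    let end = if count_bytes {
        // Byte mode: cut after `width` bytes, even in the middle of a
        // multi-byte sequence.
        width.min(bytes.len())
    } else {
        // Character mode: the key ends where character number `width`
        // (0-based) starts, or at the end of the line if it is shorter.
        line.char_indices()
            .nth(width)
            .map_or(bytes.len(), |(index, _)| index)
    };
    &bytes[..end]
}
```

With the lines `é1` and `è2` and a width of 1, byte mode yields the single byte 0xC3 for both lines, so they compare equal; character mode yields `é` and `è`, so they compare different.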

Changes

  • Added an is_c_locale() helper that checks LC_ALL, LC_CTYPE, and LANG, in that order
  • Modified key_end_index() to count bytes when in the C locale
  • Added a test for byte counting in the C locale
  • Fixed test_stdin_w1_multibyte to explicitly set a UTF-8 locale (it previously relied implicitly on character counting)
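
For readers who want a picture of the locale check, a rough sketch follows. It is not the code from this PR: `is_c_locale_with` and its `lookup` parameter are hypothetical additions for testability, and treating an empty variable as unset, matching only the exact values `C` and `POSIX`, and defaulting to the C locale when nothing is set are all assumptions.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical, testable variant of a locale check: consults LC_ALL,
/// LC_CTYPE and LANG in that order and decides based on the first one that
/// is set and non-empty. `lookup` stands in for `std::env::var_os`.
fn is_c_locale_with<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<OsString>,
{
    for name in ["LC_ALL", "LC_CTYPE", "LANG"] {
        if let Some(value) = lookup(name) {
            // Assumption: an empty value counts as unset.
            if !value.is_empty() {
                let value = value.as_os_str();
                // Assumption: only the exact names "C" and "POSIX" select
                // byte counting; "C.UTF-8" and similar do not.
                return value == OsStr::new("C") || value == OsStr::new("POSIX");
            }
        }
    }
    // Assumption: with no locale variable set, the default is the C locale.
    true
}

/// Reads the real environment without allocating a `String`.
fn is_c_locale() -> bool {
    is_c_locale_with(|name| std::env::var_os(name))
}
```

Passing the lookup as a closure keeps the precedence logic testable without mutating the process environment; `is_c_locale()` is the thin wrapper a caller would actually use.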

Considerations

I chose to inline the locale check (~9 lines) rather than adding the i18n feature dependency. The check is simple enough that duplicating it seemed better than pulling in ICU dependencies just for this.

If you'd prefer I use uucore::i18n instead, let me know and I'll update.

Fixes #10184

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)


@cakebaker merged commit 3883cea into uutils:main on Feb 23, 2026
157 checks passed
@cakebaker (Contributor) commented

Thanks!

abendrothj pushed a commit to abendrothj/coreutils that referenced this pull request Feb 23, 2026
* uniq: fix -w to count bytes in C locale

* uniq: add CTYPE to spellchecker ignore list

* uniq: avoid String allocation by using std::env::var_os()
@aguimaraes aguimaraes deleted the uniq-fix-w-locale-bytes branch February 23, 2026 22:55
Development

Successfully merging this pull request may close these issues.

uniq: -w counts UTF-8 characters instead of bytes

3 participants