Skip to content

sort: gnu coreutils compatibility (sort float.sh)#9839

Merged
sylvestre merged 11 commits intouutils:mainfrom
mattsu2020:sort_sort-float.sh
Jan 14, 2026
Merged

sort: gnu coreutils compatibility (sort float.sh)#9839
sylvestre merged 11 commits intouutils:mainfrom
mattsu2020:sort_sort-float.sh

Conversation

@mattsu2020
Copy link
Contributor

We have made corrections to ensure the GNU coreutils tests pass.

sort-float.sh

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-float is no longer failing!

@sylvestre
Copy link
Contributor

Could you please add a test to make sure we don't regress in the future?
Thanks

@sylvestre
Copy link
Contributor

Did you run some benchmark on this btw?

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-float is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-float is no longer failing!

Add locale-aware handling for decimal separators (comma vs. period) in general numeric sorting mode. This allows proper sorting of numeric fields in locales where comma is the standard decimal separator, such as many European countries.

Changes include:
- Enabling the "i18n-decimal" feature in uucore dependencies
- Introducing locale_decimal_pt() to retrieve the locale-specific decimal separator
- Updating get_leading_gen() and general_bd_parse() to accept and use the decimal point parameter
- Converting input to standard period notation for parsing when locale uses comma

This fixes sorting inaccuracies for international users with comma-based decimal notation.
…g Option::then

Refactored the `general_bd_parse` function to replace manual mutable vector initialization and conditional assignment with `Option::then` and `as_deref`, making the code more concise and idiomatic while maintaining the same logic for normalizing decimal points.
Add `effective_decimal_pt` function to intelligently select the decimal point based on input content, ensuring correct sorting when locale uses comma but input uses period as decimal separator. This fixes parsing issues in general numeric mode by prioritizing input-based detection over strict locale adherence.
Updated various crates in fuzz/Cargo.lock: windows-sys to 0.61.2, bumpalo to 3.19.1, cc to 1.2.50, crc to 3.3.0, crc-fast to 1.9.0, icu_properties and data to 2.1.2, jiff to 0.2.17. Added fixed_decimal, icu_decimal, and icu_decimal_data for improved functionality and compatibility.
…ntains method

Use input.contains() instead of input.iter().any() for checking presence of comma and dot bytes, improving readability and reducing boilerplate.
Split the uucore dependency features into multiple lines to improve code formatting and maintain consistency with other multi-line dependency declarations.
Add test to verify that sort's general numeric (-g) option correctly handles
decimal separators based on locale settings, ensuring proper ordering of
numbers like "1,9" vs "1,10" in French locale (comma as separator).
}
}

fn effective_decimal_pt(input: &[u8], locale_decimal: u8) -> u8 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function used? Shouldn't it always just default to the locale?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK fix

Comment on lines +659 to +660
let locale_decimal = locale_decimal_pt();
let decimal_pt = effective_decimal_pt(initial_selection, locale_decimal);
Copy link
Collaborator

@ChrisDryden ChrisDryden Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can just be:

let decimal_pt = locale_decimal_pt();

An example of where this causes issues is:

  echo -e "1.9\n1.1" | LC_ALL=fr_FR.utf8 /usr/bin/sort -g --stable
  # GNU output: 1.9, 1.1 (both = 1, stable)
  # PR output: 1.1, 1.9 (WRONG - treats as floats)

  echo -e "2,5\n1.9\n1,1" | LC_ALL=fr_FR.utf8 /usr/bin/sort -g
  # GNU output: 1.9, 1,1, 2,5 (values: 1, 1.1, 2.5)
  # PR output: 1,1, 1.9, 2,5 (WRONG - inconsistent decimal handling per line)

  echo -e "1.5e10\n2" | LC_ALL=fr_FR.utf8 /usr/bin/sort -g
  # GNU output: 1.5e10, 2 (1 < 2)
  # PR output: 2, 1.5e10 (WRONG - treats as 15000000000 > 2)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add these cases to the GNU test suite, will follow up with that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix it

// Parse this number as BigDecimal, as this is the requirement for general numeric sorting.
Selection::AsBigDecimal(general_bd_parse(&range_str[get_leading_gen(range_str)]))
let locale_decimal = locale_decimal_pt();
let decimal_pt = effective_decimal_pt(range_str, locale_decimal);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for the comment above, you can take that entire helper function out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

…e_decimal_pt

Remove the effective_decimal_pt function and update its usages in Line and FieldSelector to directly call locale_decimal_pt, simplifying the code without altering functionality. This refactor eliminates unnecessary logic for determining decimal points in general numeric sort mode.
@github-actions
Copy link

github-actions bot commented Jan 7, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/sort/sort-float is no longer failing!

@sylvestre
Copy link
Contributor

Did you run some benchmark on this btw?

ping?

@mattsu2020
Copy link
Contributor Author

Did you run some benchmark on this btw?

ping?

Since regression is occurring, I will investigate.

@github-actions
Copy link

github-actions bot commented Jan 7, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/sort/sort-float is no longer failing!

@codspeed-hq
Copy link

codspeed-hq bot commented Jan 7, 2026

Merging this PR will not alter performance

✅ 141 untouched benchmarks
⏩ 38 skipped benchmarks1


Comparing mattsu2020:sort_sort-float.sh (2f14b2d) with main (574f6ba)

Open in CodSpeed

Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions
Copy link

github-actions bot commented Jan 8, 2026

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-float is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-float is no longer failing!

@sylvestre sylvestre merged commit 6b49ff9 into uutils:main Jan 14, 2026
132 checks passed
@mattsu2020 mattsu2020 deleted the sort_sort-float.sh branch January 14, 2026 07:03
mattsu2020 added a commit to mattsu2020/coreutils that referenced this pull request Jan 23, 2026
* feat(sort): support international decimal separators in numeric sorting

M
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants