sort: numeric sort (-n) does not recognize thousand separators #10316

@sylvestre

Component

sort

Description

GNU sort uses the locale's thousand separator when parsing numbers in numeric sort mode (-n). It retrieves the separator from localeconv() and passes it to the number comparison function.

GNU sort reads the separator from the locale:

struct lconv const *locale = localeconv ();
...
thousands_sep = locale->thousands_sep[0];

numcompare then passes this separator to the numeric comparison:

return strnumcmp (a, b, decimal_point, thousands_sep);

However, in uutils sort, the NumInfoParseSettings struct has a thousands_separator field, but it defaults to None.

impl Default for NumInfoParseSettings {
    fn default() -> Self {
        Self {
            accept_si_units: false,
            thousands_separator: None,
            decimal_pt: Some(b'.'),
        }
    }
}

When numbers are parsed for numeric sort at lines 898-903, only accept_si_units is overridden. The remaining fields come from Default::default(), so thousands_separator stays None and the locale's thousand separator is never applied.

let (info, num_range) = NumInfo::parse(
    range_str,
    &NumInfoParseSettings {
        accept_si_units: self.settings.mode == SortMode::HumanNumeric,
        ..Default::default()
    },
);

As a result, "1,000" is parsed as "1" because the comma terminates number parsing.
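
To make the failure mode concrete, here is a minimal sketch of a digit parser with an optional separator. It is a toy stand-in, not the uutils implementation: the function name parse_leading_integer and its signature are assumptions made for illustration only.

```rust
/// Toy stand-in for the integer part of numeric parsing.
/// Hypothetical helper, not taken from the uutils source.
fn parse_leading_integer(input: &[u8], thousands_separator: Option<u8>) -> u64 {
    let mut value: u64 = 0;
    for &byte in input {
        if byte.is_ascii_digit() {
            // Accumulate the next decimal digit.
            value = value * 10 + u64::from(byte - b'0');
        } else if Some(byte) == thousands_separator {
            // A configured separator is skipped and parsing continues.
            continue;
        } else {
            // Any other byte ends the number.
            break;
        }
    }
    value
}
```

With no separator configured, parse_leading_integer(b"1,000", None) stops at the comma and returns 1. With Some(b',') it reads the whole number and returns 1000.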

Test / Reproduction Steps

# GNU
$ printf '1,000\n500\n2,000\n100\n' | sort -n
100
500
1,000
2,000

# uutils
$ printf '1,000\n500\n2,000\n100\n' | coreutils sort -n
1,000
2,000
100
500
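
Both orderings above can be reproduced with a small sketch that sorts the same four lines by a numeric key. The helpers numeric_key and sort_numeric are hypothetical and only model the integer prefix of each line; they are not how either implementation is written.

```rust
/// Hypothetical sort key: the leading integer of a line, optionally
/// ignoring one thousand-separator character.
fn numeric_key(line: &str, thousands_separator: Option<char>) -> u64 {
    let digits: String = line
        .chars()
        // Drop the separator only when one is configured.
        .filter(|&c| Some(c) != thousands_separator)
        // Stop at the first remaining non-digit character.
        .take_while(|c| c.is_ascii_digit())
        .collect();
    // Lines without leading digits sort as zero in this sketch.
    digits.parse().unwrap_or(0)
}

/// Stable sort of the lines by their numeric key.
fn sort_numeric(lines: &mut [&str], thousands_separator: Option<char>) {
    lines.sort_by_key(|line| numeric_key(line, thousands_separator));
}
```

Sorting ["1,000", "500", "2,000", "100"] with None gives the keys 1, 500, 2, 100 and therefore the uutils order shown above. Sorting with Some(',') gives the keys 1000, 500, 2000, 100 and the GNU order.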

Impact

Numbers with thousand separators are parsed incorrectly, which produces a wrong sort order.

Recommendations

Retrieve the locale's thousand separator and pass it to NumInfoParseSettings.
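
One possible shape for this change is sketched below. The struct is a local mirror of the one quoted in this report, and the Option<u8> type of thousands_separator is an assumption inferred from decimal_pt. The helper settings_with_locale_separator is hypothetical, and its locale_thousands_sep argument stands for the bytes of the locale's thousands_sep string, however they end up being retrieved.

```rust
/// Local mirror of the struct quoted in this report, for illustration only.
/// The field types are assumptions.
#[derive(Debug, PartialEq)]
struct NumInfoParseSettings {
    accept_si_units: bool,
    thousands_separator: Option<u8>,
    decimal_pt: Option<u8>,
}

impl Default for NumInfoParseSettings {
    fn default() -> Self {
        Self {
            accept_si_units: false,
            thousands_separator: None,
            decimal_pt: Some(b'.'),
        }
    }
}

/// Hypothetical helper: build parse settings from the locale's separator.
fn settings_with_locale_separator(
    human_numeric: bool,
    locale_thousands_sep: &[u8],
) -> NumInfoParseSettings {
    NumInfoParseSettings {
        accept_si_units: human_numeric,
        // An empty string (as in the C locale) means "no separator".
        // Only the first byte is used, like thousands_sep[0] in GNU sort.
        thousands_separator: locale_thousands_sep.first().copied(),
        ..Default::default()
    }
}
```

Because only the first byte is kept, a locale whose separator is a multi-byte character is not handled by this sketch.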
