ptx: implement -S/--sentence-regexp #9682

Merged: sylvestre merged 6 commits into uutils:main from CrazyRoka:ptx-implement-sentence-regexp on Dec 28, 2025

Conversation

@CrazyRoka
Contributor

Description
This PR implements the -S / --sentence-regexp flag for ptx, bringing it closer to full GNU compatibility.

Previously, ptx only supported splitting input by lines. This change allows users to define a custom regular expression to split the input into sentences, as specified in the GNU documentation.
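
As a rough illustration of the difference between the two modes, the sketch below splits the same input first by lines and then by sentences. It is not the PR's code: a literal `. ` terminator stands in for the user-supplied regular expression so that the example needs only the standard library, and the function names `split_by_lines` and `split_by_sentences` are made up for this example.

```rust
// Illustration only: a literal terminator stands in for the regex given to -S,
// and both function names are hypothetical.

/// Default mode: each input line is one unit.
fn split_by_lines(input: &str) -> Vec<&str> {
    input.lines().collect()
}

/// Sentence mode: split wherever the stand-in terminator occurs,
/// ignoring line boundaries, and drop empty pieces.
fn split_by_sentences<'a>(input: &'a str, terminator: &str) -> Vec<&'a str> {
    input.split(terminator).filter(|s| !s.is_empty()).collect()
}

fn main() {
    let input = "A sentence that\nspans two lines. A second sentence.";
    // Line mode cuts the first sentence in half at the newline.
    println!("{:?}", split_by_lines(input));
    // Sentence mode keeps it whole, newline included.
    println!("{:?}", split_by_sentences(input, ". "));
}
```

The point of the comparison is that a sentence may span several input lines, which line-based splitting cannot express.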

Tests
GNU Compatibility: This fixes the previously failing tests/ptx/ptx.pl test case S-infloop.
Unit Tests: Added new Rust unit tests in tests/by-util/test_ptx.rs.

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/tail/inotify-dir-recreate is now passing!

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/ptx/ptx is no longer failing!

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/ptx/ptx is no longer failing!

@CrazyRoka force-pushed the ptx-implement-sentence-regexp branch from e0a5cd3 to 6009e8e on December 27, 2025 22:01

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/ptx/ptx is no longer failing!

@CrazyRoka force-pushed the ptx-implement-sentence-regexp branch from 6009e8e to b5e643b on December 27, 2025 22:48

Comment on lines +307 to +317
let lines = if let Some(re) = &sentence_splitter {
    let mut buffer = String::new();
    reader.read_to_string(&mut buffer)?;

    re.split(&buffer)
        .map(|s| s.replace('\n', " ")) // ptx behavior: newlines become spaces inside sentences
        .filter(|s| !s.is_empty()) // remove empty sentences
        .collect()
} else {
    reader.lines().collect::<std::io::Result<Vec<String>>>()?
};

Contributor

Maybe move that into a function.

Contributor Author

Good point, refactored in 81cc64d
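
One way such an extraction could look, as a minimal sketch: the helper name `read_units` is hypothetical (the actual function from 81cc64d is not shown in this thread), and a literal delimiter stands in for the compiled regex so the example runs with only the standard library. In the real code the split would go through the regex's `split` method, as in the snippet under review.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper, not the function from the PR.
/// `sentence_delimiter` is a literal stand-in for the compiled -S regex.
fn read_units<R: BufRead>(
    mut reader: R,
    sentence_delimiter: Option<&str>,
) -> io::Result<Vec<String>> {
    if let Some(delimiter) = sentence_delimiter {
        // Sentence mode needs the whole input, since a sentence can span lines.
        let mut buffer = String::new();
        reader.read_to_string(&mut buffer)?;
        Ok(buffer
            .split(delimiter)
            .map(|s| s.replace('\n', " ")) // newlines become spaces inside sentences
            .filter(|s| !s.is_empty()) // drop empty sentences
            .collect())
    } else {
        // Line mode: one unit per input line, propagating any I/O error.
        reader.lines().collect()
    }
}

fn main() -> io::Result<()> {
    let text = "one\ntwo. three";
    // A byte slice implements BufRead, so it can play the role of the input file.
    println!("{:?}", read_units(text.as_bytes(), None)?);
    println!("{:?}", read_units(text.as_bytes(), Some(". "))?);
    Ok(())
}
```

Keeping both branches behind one function leaves the call site with a single `let lines = ...?;` and makes the splitting logic testable on in-memory input.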

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/ptx/ptx is no longer failing!
Note: The gnu test tests/csplit/csplit-heap is now being skipped but was previously passing.
Note: The gnu test tests/cut/cut-huge-range is now being skipped but was previously passing.
Note: The gnu test tests/printf/printf-surprise is now being skipped but was previously passing.
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.

@CrazyRoka force-pushed the ptx-implement-sentence-regexp branch from 81cc64d to 4ef57f5 on December 27, 2025 23:10

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/ptx/ptx is no longer failing!

@sylvestre merged commit 5b70ed4 into uutils:main on Dec 28, 2025
124 of 128 checks passed
sylvestre added a commit to sylvestre/coreutils that referenced this pull request on Dec 28, 2025
@CrazyRoka deleted the ptx-implement-sentence-regexp branch on January 3, 2026 18:09