Skip to content

Implement most commands#64

Merged
sylvestre merged 86 commits intouutils:mainfrom
dspinellis:process
May 18, 2025
Merged

Implement most commands#64
sylvestre merged 86 commits intouutils:mainfrom
dspinellis:process

Conversation

@dspinellis
Copy link
Copy Markdown
Collaborator

@dspinellis dspinellis commented May 6, 2025

This PR implements the s, y, and most single-letter commands. This allows the porting and implementation of integration tests and initial benchmarking. Thanks to zero-copy mmapped I/O, when few changes are made to the file, this implementation processes its input much faster than GNU sed.

$ for i in $(seq 50) ; do cat /usr/share/dict/words ; done >words-50

$ time target/release/sedapp sed '' words-50  >a
real    0m0.272s
user    0m0.130s
sys     0m0.112s

$ time /bin/sed '' words-50  >a
real    0m0.643s
user    0m0.484s
sys     0m0.134s


$ time target/release/sedapp sed 's/supercalifrangolistic//' words-50  >a
real    0m0.662s
user    0m0.514s
sys     0m0.124s

$ time /bin/sed 's/supercalifrangolistic//' words-50  >a        
real    0m0.939s
user    0m0.770s
sys     0m0.140s

$ time /bin/sed 's/s/S/' words-50  >a
real    0m1.563s
user    0m1.358s
sys     0m0.179s

$ time target/release/sedapp sed 's/s/S/' words-50  >a
real    0m1.524s
user    0m1.351s
sys     0m0.147s

dspinellis added 26 commits May 3, 2025 23:40
Add test cases for output and zero-copy verification. While at it fix a
bug that used a hardcoded stdout for the output.  This required
introducing RawFd, a stronger separation between the Unix and the
fallback implementation, and the use of temporary output files for
testing.
This was required to fix the Unix implementation.
This reverts commit 5545e82.

Apparently there is no such thing.  It doesn't work on Windows
even with on the nightly build.
The preceding one, by bringing together input and output in the same
structure,  resulted in overlapping mutable borrowing.  The current
design separates the two and thus avoids this problem.
This is required for the b t and { commands, which point to others.
This demonstrates mutable patern space and starting a new cycle.
dspinellis added 3 commits May 7, 2025 15:32
This avoids the duplication of implementation methods, which will be
required for processing data.
@dspinellis dspinellis changed the title Implement a few key commands Implement most commands May 16, 2025
For the sake of consistency; we don't employ try in other methods with
similar semantics.
@dspinellis
Copy link
Copy Markdown
Collaborator Author

dspinellis commented May 16, 2025

@sylvestre Seeing that you hadn't started the review, I pushed recent work because it addresses some comments you might have. The PR is ready for review.

My planned next task is to try to change the compiled command data structure to use bumpalo. I hope that this will simplify some hairy code while also improving performance by removing reference counters and allowing the safe and easy construction of the more complex structures required for blocks and branch commands.

dspinellis added 11 commits May 16, 2025 13:10
This will also make it easier to call compile_thread recursively.
The runtime stack would not handle jumps within blocks.  Also, patched
links are more efficient at runtime.
While at it fix the label population to avoid defining labels for the
branch command targets.
@dspinellis
Copy link
Copy Markdown
Collaborator Author

dspinellis commented May 17, 2025

@sylvestre Using bumpalo proved to be too complex and disruptive, so I abandoned the idea. Instead, I replaced the pointer juggling of the C nested block compilation design with an elegant recursive compilation scheme.

I've completed most of the technically difficult and risky work. The few remaining commands and missing features should be easy to implement.

As long as you don't start reviewing, I'll be pushing here complete batches that leave the code in a reviewable state (no half-baked stuff).

@sylvestre
Copy link
Copy Markdown
Contributor

Impressive!

/// the sub-chain, then splice its tail back to the original
/// “next” pointer of the *parent* (falling back to its own
/// parent_next if its own next was `None`).
fn patch_block_endings(head: Option<Rc<RefCell<Command>>>) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to write unit test for this function ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have; search for "// patch_block_endings".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry

@sylvestre sylvestre merged commit 22fa99b into uutils:main May 18, 2025
13 checks passed
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (fb41cf3) to head (039d629).
Report is 58 commits behind head on main.

Additional details and impacted files
@@    Coverage Diff     @@
##   main   #64   +/-   ##
==========================
==========================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants