I'm new to Rust and I like this project, so I sometimes use parts of it as a reference for my Rust learning projects. During testing I noticed that the uutils version of tail is very fast when reading large files from disk, up to 15x faster than the GNU version, but when reading from stdin the performance drops significantly. My test file tests/inputs/bigger.txt is about 577 MB of random text in 10_000_000 fairly short lines. Please note that I ran these benchmarks only to get a first impression of the relative performance differences between the tested programs. Here's a quick overview of the test file tests/inputs/bigger.txt, tail and uu-tail:
❯ wc --lines --words --bytes --chars --max-line-length tests/inputs/bigger.txt
10000000 105001050 577418760 577418760 101 tests/inputs/bigger.txt
❯ tail --version
tail (GNU coreutils) 9.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.
❯ ~/workspace/external/uutils/coreutils/target/release/tail --version
/home/lenny/workspace/external/uutils/coreutils/target/release/tail 0.0.14
Benchmark of {tail,uu-tail} -n +{10,1000,100000,10000000} tests/inputs/bigger.txt, in which uu-tail is faster than GNU tail in every case except the last. With `-n +10000000`, at the end of the benchmark run, uu-tail's performance drops and it is slower than GNU tail (371.8 ms vs. 214.3 ms).
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n +{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n +10 tests/inputs/bigger.txt
Time (mean ± σ): 313.9 ms ± 3.4 ms [User: 61.7 ms, System: 251.6 ms]
Range (min … max): 310.9 ms … 322.5 ms 10 runs
Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt
Time (mean ± σ): 20.8 ms ± 0.7 ms [User: 2.9 ms, System: 19.7 ms]
Range (min … max): 20.2 ms … 24.5 ms 111 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (23.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Benchmark 3: tail -n +1000 tests/inputs/bigger.txt
Time (mean ± σ): 317.6 ms ± 4.7 ms [User: 62.6 ms, System: 254.4 ms]
Range (min … max): 312.0 ms … 329.3 ms 10 runs
Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt
Time (mean ± σ): 21.0 ms ± 1.8 ms [User: 2.5 ms, System: 20.1 ms]
Range (min … max): 20.1 ms … 34.1 ms 115 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 5: tail -n +100000 tests/inputs/bigger.txt
Time (mean ± σ): 318.0 ms ± 4.9 ms [User: 69.6 ms, System: 248.0 ms]
Range (min … max): 313.2 ms … 330.3 ms 10 runs
Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt
Time (mean ± σ): 25.6 ms ± 0.5 ms [User: 5.2 ms, System: 21.4 ms]
Range (min … max): 24.9 ms … 27.8 ms 97 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 7: tail -n +10000000 tests/inputs/bigger.txt
Time (mean ± σ): 214.3 ms ± 3.5 ms [User: 125.7 ms, System: 88.3 ms]
Range (min … max): 212.2 ms … 222.8 ms 13 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt
Time (mean ± σ): 371.8 ms ± 5.4 ms [User: 284.7 ms, System: 86.6 ms]
Range (min … max): 366.4 ms … 379.4 ms 10 runs
Summary
'~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt' ran
1.01 ± 0.09 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt'
1.23 ± 0.05 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt'
10.28 ± 0.37 times faster than 'tail -n +10000000 tests/inputs/bigger.txt'
15.06 ± 0.51 times faster than 'tail -n +10 tests/inputs/bigger.txt'
15.24 ± 0.54 times faster than 'tail -n +1000 tests/inputs/bigger.txt'
15.26 ± 0.54 times faster than 'tail -n +100000 tests/inputs/bigger.txt'
17.84 ± 0.63 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt'
With -n -{values} instead of -n +{values}, the GNU version and the uu-tail version are pretty close.
Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} tests/inputs/bigger.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 tests/inputs/bigger.txt
Time (mean ± σ): 0.4 ms ± 0.3 ms [User: 0.7 ms, System: 0.5 ms]
Range (min … max): 0.0 ms … 3.1 ms 803 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt
Time (mean ± σ): 0.5 ms ± 0.5 ms [User: 0.7 ms, System: 0.6 ms]
Range (min … max): 0.0 ms … 4.5 ms 714 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Benchmark 3: tail -n -1000 tests/inputs/bigger.txt
Time (mean ± σ): 0.5 ms ± 0.3 ms [User: 0.7 ms, System: 0.6 ms]
Range (min … max): 0.0 ms … 3.0 ms 803 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt
Time (mean ± σ): 0.7 ms ± 0.4 ms [User: 0.8 ms, System: 0.6 ms]
Range (min … max): 0.0 ms … 4.9 ms 803 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 5: tail -n -100000 tests/inputs/bigger.txt
Time (mean ± σ): 5.5 ms ± 0.7 ms [User: 3.5 ms, System: 3.5 ms]
Range (min … max): 4.9 ms … 9.7 ms 295 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt
Time (mean ± σ): 7.0 ms ± 0.6 ms [User: 5.9 ms, System: 2.0 ms]
Range (min … max): 5.7 ms … 10.6 ms 275 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 7: tail -n -10000000 tests/inputs/bigger.txt
Time (mean ± σ): 546.3 ms ± 6.3 ms [User: 207.4 ms, System: 338.1 ms]
Range (min … max): 537.4 ms … 554.5 ms 10 runs
Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt
Time (mean ± σ): 499.1 ms ± 4.2 ms [User: 416.0 ms, System: 82.4 ms]
Range (min … max): 495.6 ms … 506.0 ms 10 runs
Summary
'tail -n -10 tests/inputs/bigger.txt' ran
1.08 ± 1.03 times faster than 'tail -n -1000 tests/inputs/bigger.txt'
1.16 ± 1.33 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt'
1.61 ± 1.45 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt'
13.06 ± 8.12 times faster than 'tail -n -100000 tests/inputs/bigger.txt'
16.74 ± 10.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt'
1185.55 ± 722.84 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt'
1297.59 ± 791.22 times faster than 'tail -n -10000000 tests/inputs/bigger.txt'
But when reading from stdin, uu-tail is significantly slower than the GNU version:
Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} - < tests/inputs/bigger.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} - < tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 - < tests/inputs/bigger.txt
Time (mean ± σ): 0.1 ms ± 0.2 ms [User: 0.5 ms, System: 0.2 ms]
Range (min … max): 0.0 ms … 1.7 ms 647 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Warning: The first benchmarking run for this command was significantly slower than the rest (0.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt
Time (mean ± σ): 611.1 ms ± 8.4 ms [User: 500.8 ms, System: 109.8 ms]
Range (min … max): 596.3 ms … 621.8 ms 10 runs
Benchmark 3: tail -n -1000 - < tests/inputs/bigger.txt
Time (mean ± σ): 0.1 ms ± 0.1 ms [User: 0.5 ms, System: 0.2 ms]
Range (min … max): 0.0 ms … 0.8 ms 779 runs
Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Warning: The first benchmarking run for this command was significantly slower than the rest (0.0 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt
Time (mean ± σ): 671.2 ms ± 9.1 ms [User: 559.4 ms, System: 111.3 ms]
Range (min … max): 662.4 ms … 692.5 ms 10 runs
Benchmark 5: tail -n -100000 - < tests/inputs/bigger.txt
Time (mean ± σ): 5.6 ms ± 0.3 ms [User: 3.7 ms, System: 2.3 ms]
Range (min … max): 5.1 ms … 6.6 ms 287 runs
Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt
Time (mean ± σ): 1.196 s ± 0.013 s [User: 1.011 s, System: 0.184 s]
Range (min … max): 1.184 s … 1.226 s 10 runs
Benchmark 7: tail -n -10000000 - < tests/inputs/bigger.txt
Time (mean ± σ): 552.2 ms ± 6.9 ms [User: 223.1 ms, System: 328.0 ms]
Range (min … max): 544.2 ms … 566.8 ms 10 runs
Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt
Time (mean ± σ): 8.179 s ± 0.160 s [User: 3.442 s, System: 4.726 s]
Range (min … max): 8.025 s … 8.533 s 10 runs
Summary
'tail -n -1000 - < tests/inputs/bigger.txt' ran
1.97 ± 4.73 times faster than 'tail -n -10 - < tests/inputs/bigger.txt'
83.11 ± 140.53 times faster than 'tail -n -100000 - < tests/inputs/bigger.txt'
8259.42 ± 13958.62 times faster than 'tail -n -10000000 - < tests/inputs/bigger.txt'
9140.98 ± 15448.56 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt'
10040.23 ± 16968.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt'
17885.66 ± 30226.98 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt'
122339.26 ± 206764.49 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt'
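The sub-millisecond GNU timings for `-n -10 - < tests/inputs/bigger.txt` suggest that GNU tail does not read the whole input when stdin is redirected from a regular file, whereas the uu-tail timings look like a full read of the stream. That reading of the numbers is an assumption, not something verified against either code base. As a minimal sketch of the general technique (not the uutils or GNU implementation), the following hypothetical `last_n_lines` function finds the last `n` lines of any seekable input by scanning backwards from the end in chunks. Note that Rust's `std::io::Stdin` does not implement `Seek`, so using this on stdin would need a platform-specific way to obtain a seekable handle; the example uses an in-memory `Cursor` instead.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: returns the last `n` lines of a seekable input by
/// scanning backwards from the end, so only the tail of the input is read.
fn last_n_lines<R: Read + Seek>(input: &mut R, n: usize) -> io::Result<Vec<u8>> {
    const CHUNK: u64 = 8 * 1024; // arbitrary chunk size chosen for this sketch

    let len = input.seek(SeekFrom::End(0))?;
    if n == 0 || len == 0 {
        return Ok(Vec::new());
    }

    // A trailing newline terminates the last line instead of starting a new
    // one, so it is excluded from the backwards newline count.
    let mut last = [0u8; 1];
    input.seek(SeekFrom::Start(len - 1))?;
    input.read_exact(&mut last)?;
    let scan_end = if last[0] == b'\n' { len - 1 } else { len };

    let mut buf = vec![0u8; CHUNK as usize];
    let mut pos = scan_end; // bytes in [pos, scan_end) have been scanned
    let mut newlines = 0usize;
    let mut start = 0u64; // stays 0 if the input has fewer than `n` lines

    'scan: while pos > 0 {
        let size = CHUNK.min(pos);
        pos -= size;
        input.seek(SeekFrom::Start(pos))?;
        let chunk = &mut buf[..size as usize];
        input.read_exact(chunk)?;
        // Walk the chunk from its end towards its start, counting newlines.
        for (i, &byte) in chunk.iter().enumerate().rev() {
            if byte == b'\n' {
                newlines += 1;
                if newlines == n {
                    // The wanted output starts right after this newline.
                    start = pos + i as u64 + 1;
                    break 'scan;
                }
            }
        }
    }

    input.seek(SeekFrom::Start(start))?;
    let mut out = Vec::new();
    input.read_to_end(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for a seekable file.
    let mut input = Cursor::new(b"one\ntwo\nthree\nfour\n".to_vec());
    let tail = last_n_lines(&mut input, 2)?;
    print!("{}", String::from_utf8_lossy(&tail));
    Ok(())
}
```

With this approach the work depends on the size of the requested tail rather than on the size of the input, which would be consistent with the GNU timings growing with the line count in the stdin benchmark above.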
I hope these benchmarks help to figure out the problem, and that I haven't done anything wrong on my side.