
uu-tail performance drop when reading from stdin #3842

@Joining7943


I'm new to Rust and I like this project, so I sometimes use parts of it as a reference for my Rust learning projects. During tests I noticed that the uutils version of tail is very fast when reading large files from disk, up to 15x faster than the GNU version, but when reading from stdin the performance drops significantly. My test file tests/inputs/bigger.txt is about 577 MB of random text with 10_000_000 fairly short lines (at most 101 characters). Please note that I ran these benchmarks only to get a first impression of the relative performance differences between the tested programs. Here's a quick overview of the test file tests/inputs/bigger.txt, tail and uu-tail:

❯ wc --lines --words --bytes --chars --max-line-length tests/inputs/bigger.txt
 10000000 105001050 577418760 577418760       101 tests/inputs/bigger.txt
❯ tail --version
tail (GNU coreutils) 9.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.
❯ ~/workspace/external/uutils/coreutils/target/release/tail --version
/home/lenny/workspace/external/uutils/coreutils/target/release/tail 0.0.14

Benchmark of {tail,uu-tail} -n +{10,1000,100000,10000000} tests/inputs/bigger.txt, in which uu-tail is faster than GNU tail for every value except `-n +10_000_000`. In that last case, at the end of the benchmark run, uu-tail drops to 371.8 ms against 214.3 ms for GNU tail.
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n +{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n +10 tests/inputs/bigger.txt
  Time (mean ± σ):     313.9 ms ±   3.4 ms    [User: 61.7 ms, System: 251.6 ms]
  Range (min … max):   310.9 ms … 322.5 ms    10 runs

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt
  Time (mean ± σ):      20.8 ms ±   0.7 ms    [User: 2.9 ms, System: 19.7 ms]
  Range (min … max):    20.2 ms …  24.5 ms    111 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (23.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 3: tail -n +1000 tests/inputs/bigger.txt
  Time (mean ± σ):     317.6 ms ±   4.7 ms    [User: 62.6 ms, System: 254.4 ms]
  Range (min … max):   312.0 ms … 329.3 ms    10 runs

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt
  Time (mean ± σ):      21.0 ms ±   1.8 ms    [User: 2.5 ms, System: 20.1 ms]
  Range (min … max):    20.1 ms …  34.1 ms    115 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n +100000 tests/inputs/bigger.txt
  Time (mean ± σ):     318.0 ms ±   4.9 ms    [User: 69.6 ms, System: 248.0 ms]
  Range (min … max):   313.2 ms … 330.3 ms    10 runs

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt
  Time (mean ± σ):      25.6 ms ±   0.5 ms    [User: 5.2 ms, System: 21.4 ms]
  Range (min … max):    24.9 ms …  27.8 ms    97 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n +10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     214.3 ms ±   3.5 ms    [User: 125.7 ms, System: 88.3 ms]
  Range (min … max):   212.2 ms … 222.8 ms    13 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     371.8 ms ±   5.4 ms    [User: 284.7 ms, System: 86.6 ms]
  Range (min … max):   366.4 ms … 379.4 ms    10 runs

Summary
  '~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt' ran
    1.01 ± 0.09 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt'
    1.23 ± 0.05 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt'
   10.28 ± 0.37 times faster than 'tail -n +10000000 tests/inputs/bigger.txt'
   15.06 ± 0.51 times faster than 'tail -n +10 tests/inputs/bigger.txt'
   15.24 ± 0.54 times faster than 'tail -n +1000 tests/inputs/bigger.txt'
   15.26 ± 0.54 times faster than 'tail -n +100000 tests/inputs/bigger.txt'
   17.84 ± 0.63 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt'
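
To make the Summary figures easier to check, here is a minimal sketch of how a ratio such as "15.06 ± 0.51" can be reproduced from two mean ± σ lines. It assumes first-order error propagation for a ratio of independent measurements, which I believe is roughly what hyperfine does but have not verified. The function name `relative_speed` is made up for this example, and because the inputs are the rounded values printed above, the result differs slightly from hyperfine's own.

```rust
/// Ratio of two mean timings plus an uncertainty estimate.
/// Each argument is (mean, standard deviation) in the same unit.
/// Assumption: independent measurements, first-order error propagation.
fn relative_speed(slow: (f64, f64), fast: (f64, f64)) -> (f64, f64) {
    let (slow_mean, slow_sd) = slow;
    let (fast_mean, fast_sd) = fast;
    let ratio = slow_mean / fast_mean;
    // Relative errors add in quadrature for a quotient.
    let rel_err = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

Calling `relative_speed((313.9, 3.4), (20.8, 0.7))` with the values of Benchmark 1 and Benchmark 2 gives roughly 15.09 ± 0.53. The same formula also shows why the ± values become very large once a mean is about as small as its standard deviation, as in the sub-millisecond runs.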

With -n -{values} instead of -n +{values}, the GNU version and uu-tail are pretty close to each other.

Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} tests/inputs/bigger.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 tests/inputs/bigger.txt
  Time (mean ± σ):       0.4 ms ±   0.3 ms    [User: 0.7 ms, System: 0.5 ms]
  Range (min … max):     0.0 ms …   3.1 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt
  Time (mean ± σ):       0.5 ms ±   0.5 ms    [User: 0.7 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   4.5 ms    714 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Benchmark 3: tail -n -1000 tests/inputs/bigger.txt
  Time (mean ± σ):       0.5 ms ±   0.3 ms    [User: 0.7 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   3.0 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt
  Time (mean ± σ):       0.7 ms ±   0.4 ms    [User: 0.8 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   4.9 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n -100000 tests/inputs/bigger.txt
  Time (mean ± σ):       5.5 ms ±   0.7 ms    [User: 3.5 ms, System: 3.5 ms]
  Range (min … max):     4.9 ms …   9.7 ms    295 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt
  Time (mean ± σ):       7.0 ms ±   0.6 ms    [User: 5.9 ms, System: 2.0 ms]
  Range (min … max):     5.7 ms …  10.6 ms    275 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n -10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     546.3 ms ±   6.3 ms    [User: 207.4 ms, System: 338.1 ms]
  Range (min … max):   537.4 ms … 554.5 ms    10 runs

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     499.1 ms ±   4.2 ms    [User: 416.0 ms, System: 82.4 ms]
  Range (min … max):   495.6 ms … 506.0 ms    10 runs

Summary
  'tail -n -10 tests/inputs/bigger.txt' ran
    1.08 ± 1.03 times faster than 'tail -n -1000 tests/inputs/bigger.txt'
    1.16 ± 1.33 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt'
    1.61 ± 1.45 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt'
   13.06 ± 8.12 times faster than 'tail -n -100000 tests/inputs/bigger.txt'
   16.74 ± 10.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt'
 1185.55 ± 722.84 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt'
 1297.59 ± 791.22 times faster than 'tail -n -10000000 tests/inputs/bigger.txt'

But when reading from stdin, uu-tail is significantly slower than the GNU version: for example 611.1 ms against 0.1 ms for -n -10, and 8.179 s against 552.2 ms for -n -10000000.

Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} - < tests/inputs/bigger.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} - < tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 - < tests/inputs/bigger.txt
  Time (mean ± σ):       0.1 ms ±   0.2 ms    [User: 0.5 ms, System: 0.2 ms]
  Range (min … max):     0.0 ms …   1.7 ms    647 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: The first benchmarking run for this command was significantly slower than the rest (0.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt
  Time (mean ± σ):     611.1 ms ±   8.4 ms    [User: 500.8 ms, System: 109.8 ms]
  Range (min … max):   596.3 ms … 621.8 ms    10 runs

Benchmark 3: tail -n -1000 - < tests/inputs/bigger.txt
  Time (mean ± σ):       0.1 ms ±   0.1 ms    [User: 0.5 ms, System: 0.2 ms]
  Range (min … max):     0.0 ms …   0.8 ms    779 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: The first benchmarking run for this command was significantly slower than the rest (0.0 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt
  Time (mean ± σ):     671.2 ms ±   9.1 ms    [User: 559.4 ms, System: 111.3 ms]
  Range (min … max):   662.4 ms … 692.5 ms    10 runs

Benchmark 5: tail -n -100000 - < tests/inputs/bigger.txt
  Time (mean ± σ):       5.6 ms ±   0.3 ms    [User: 3.7 ms, System: 2.3 ms]
  Range (min … max):     5.1 ms …   6.6 ms    287 runs

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt
  Time (mean ± σ):      1.196 s ±  0.013 s    [User: 1.011 s, System: 0.184 s]
  Range (min … max):    1.184 s …  1.226 s    10 runs

Benchmark 7: tail -n -10000000 - < tests/inputs/bigger.txt
  Time (mean ± σ):     552.2 ms ±   6.9 ms    [User: 223.1 ms, System: 328.0 ms]
  Range (min … max):   544.2 ms … 566.8 ms    10 runs

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt
  Time (mean ± σ):      8.179 s ±  0.160 s    [User: 3.442 s, System: 4.726 s]
  Range (min … max):    8.025 s …  8.533 s    10 runs

Summary
  'tail -n -1000 - < tests/inputs/bigger.txt' ran
    1.97 ± 4.73 times faster than 'tail -n -10 - < tests/inputs/bigger.txt'
   83.11 ± 140.53 times faster than 'tail -n -100000 - < tests/inputs/bigger.txt'
 8259.42 ± 13958.62 times faster than 'tail -n -10000000 - < tests/inputs/bigger.txt'
 9140.98 ± 15448.56 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt'
10040.23 ± 16968.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt'
17885.66 ± 30226.98 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt'
122339.26 ± 206764.49 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt'
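
The 0.1 ms that GNU tail needs for `-n -10` on stdin suggests that it does not read the whole input when stdin is redirected from a regular file. I have not checked what either implementation actually does, so the following is only a sketch of one way to get that behaviour: seek to the end of a seekable input and scan backward in blocks until enough newlines are found. The names `last_lines_offset` and `tail_lines` and the 8192-byte block size are assumptions made for this example, not code from uutils or GNU coreutils.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Returns the byte offset at which the last `n` lines of `input` start.
/// Scans backward in fixed-size blocks, so only the end of the input is read.
fn last_lines_offset<R: Read + Seek>(input: &mut R, n: usize) -> io::Result<u64> {
    const BLOCK: u64 = 8192; // hypothetical block size
    let len = input.seek(SeekFrom::End(0))?;
    if n == 0 {
        return Ok(len);
    }
    let mut buf = vec![0u8; BLOCK as usize];
    let mut pos = len;
    let mut newlines = 0usize;
    while pos > 0 {
        let start = pos.saturating_sub(BLOCK);
        let size = (pos - start) as usize;
        input.seek(SeekFrom::Start(start))?;
        input.read_exact(&mut buf[..size])?;
        for i in (0..size).rev() {
            if buf[i] != b'\n' {
                continue;
            }
            // A newline as the very last byte only terminates the final line.
            if start + i as u64 == len - 1 {
                continue;
            }
            newlines += 1;
            if newlines == n {
                return Ok(start + i as u64 + 1);
            }
        }
        pos = start;
    }
    // Fewer than `n` lines in total: the tail is the whole input.
    Ok(0)
}

/// Writes the last `n` lines of a seekable `input` to `out`.
fn tail_lines<R: Read + Seek, W: Write>(input: &mut R, out: &mut W, n: usize) -> io::Result<()> {
    let offset = last_lines_offset(input, n)?;
    input.seek(SeekFrom::Start(offset))?;
    io::copy(input, out)?;
    Ok(())
}
```

With a `std::io::Cursor` over the bytes `a\nb\nc\n`, `last_lines_offset(&mut cursor, 2)` returns 2, the start of `b\nc\n`. Note that `std::io::Stdin` does not implement `Seek`, so a real tool would first need a seekable handle to the redirected file and would have to fall back to streaming for pipes. That part is left out here.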

I hope these benchmarks help in figuring out the problem, and that I haven't done anything wrong on my side.
