@nirandaperera
Contributor

This change adds a Bitmap::VisitWordsAndWrite method that writes the values returned by the visitor lambda to a provided output bitmap.
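
To illustrate the idea, here is a minimal sketch of a word-wise visitor that writes its results to a caller-provided output. It is not the code from this PR: the name VisitWordsAndWriteSketch and the use of std::vector<uint64_t> as a stand-in for a bitmap are assumptions made for the example, and it ignores bit offsets and lengths that are not a multiple of 64, which a real bitmap implementation would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap: a sequence of 64-bit words.
// Visits `left` and `right` one word at a time, calls `visitor` on each pair
// of words, and stores the returned word at the same position in `*out`.
template <typename Visitor>
void VisitWordsAndWriteSketch(const std::vector<uint64_t>& left,
                              const std::vector<uint64_t>& right,
                              std::vector<uint64_t>* out, Visitor&& visitor) {
  assert(out != nullptr);
  assert(left.size() == right.size());
  out->resize(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    (*out)[i] = visitor(left[i], right[i]);
  }
}
```

A bitwise AND kernel could then pass a lambda such as `[](uint64_t a, uint64_t b) { return a & b; }` and let the helper take care of writing the output words.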

@github-actions

github-actions bot commented Jun 9, 2021

@bkietz bkietz self-requested a review June 9, 2021 18:50
@nirandaperera
Contributor Author

@github-actions autotune

@nirandaperera nirandaperera marked this pull request as ready for review June 11, 2021 20:28
Member

@bkietz bkietz left a comment

This is looking good, thanks for working on this.

@bkietz
Member

bkietz commented Jun 15, 2021

@ursabot please benchmark

@ursabot

ursabot commented Jun 15, 2021

Benchmark runs are scheduled for baseline = 39dcb43 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

@nirandaperera
Contributor Author

@ursabot please benchmark

@ursabot

ursabot commented Jun 22, 2021

Benchmark runs are scheduled for baseline = c913aa3 and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

@nirandaperera
Contributor Author

@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark

@nirandaperera
Contributor Author

@bkietz I added the changes we discussed. The following are the latest benchmark results on my machine.

Results for master commit: c7c959a26a6512b0ad078a06df474617f1b306aa
----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ArrayArrayKernel<And>/32768/10000               9.21 us         9.21 us        69999 bytes_per_second=3.31485G/s items_per_second=28.4743G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<And>/32768/100                 9.11 us         9.11 us        74446 bytes_per_second=3.35138G/s items_per_second=28.7882G/s null_percent=1 size=32.768k
ArrayArrayKernel<And>/32768/10                  8.06 us         8.06 us       103589 bytes_per_second=3.78621G/s items_per_second=32.5233G/s null_percent=10 size=32.768k
ArrayArrayKernel<And>/32768/2                   7.27 us         7.27 us        96371 bytes_per_second=4.19765G/s items_per_second=36.0575G/s null_percent=50 size=32.768k
ArrayArrayKernel<And>/32768/1                   8.96 us         8.95 us        91746 bytes_per_second=3.40815G/s items_per_second=29.2758G/s null_percent=100 size=32.768k
ArrayArrayKernel<And>/32768/0                   7.87 us         7.87 us        78959 bytes_per_second=3.87712G/s items_per_second=33.3042G/s null_percent=0 size=32.768k
ArrayArrayKernel<And>/1048576/10000              335 us          335 us         2080 bytes_per_second=2.91231G/s items_per_second=25.0165G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<And>/1048576/100                334 us          334 us         2098 bytes_per_second=2.92463G/s items_per_second=25.1224G/s null_percent=1 size=1048.58k
ArrayArrayKernel<And>/1048576/10                 336 us          336 us         2089 bytes_per_second=2.90452G/s items_per_second=24.9496G/s null_percent=10 size=1048.58k
ArrayArrayKernel<And>/1048576/2                  336 us          336 us         2077 bytes_per_second=2.90715G/s items_per_second=24.9722G/s null_percent=50 size=1048.58k
ArrayArrayKernel<And>/1048576/1                  238 us          238 us         2944 bytes_per_second=4.10794G/s items_per_second=35.2869G/s null_percent=100 size=1048.58k
ArrayArrayKernel<And>/1048576/0                  239 us          239 us         2932 bytes_per_second=4.0919G/s items_per_second=35.1491G/s null_percent=0 size=1048.58k
ArrayArrayKernel<KleeneAnd>/32768/10000         15.7 us         15.7 us        44167 bytes_per_second=1.9399G/s items_per_second=16.6637G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/100           15.7 us         15.7 us        44447 bytes_per_second=1.94457G/s items_per_second=16.7038G/s null_percent=1 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/10            15.7 us         15.7 us        44661 bytes_per_second=1.94512G/s items_per_second=16.7085G/s null_percent=10 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/2             15.7 us         15.7 us        44260 bytes_per_second=1.94383G/s items_per_second=16.6974G/s null_percent=50 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/1             15.6 us         15.6 us        43756 bytes_per_second=1.95105G/s items_per_second=16.7594G/s null_percent=100 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/0             5.28 us         5.28 us       136108 bytes_per_second=5.78137G/s items_per_second=49.6616G/s null_percent=0 size=32.768k
ArrayArrayKernel<KleeneAnd>/1048576/10000        483 us          483 us         1447 bytes_per_second=2.02148G/s items_per_second=17.3643G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/100          484 us          484 us         1447 bytes_per_second=2.0196G/s items_per_second=17.3482G/s null_percent=1 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/10           483 us          483 us         1445 bytes_per_second=2.02036G/s items_per_second=17.3548G/s null_percent=10 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/2            484 us          484 us         1448 bytes_per_second=2.0172G/s items_per_second=17.3276G/s null_percent=50 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/1            484 us          484 us         1448 bytes_per_second=2.01958G/s items_per_second=17.3481G/s null_percent=100 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/0            198 us          198 us         3541 bytes_per_second=4.93652G/s items_per_second=42.4044G/s null_percent=0 size=1048.58k



Results for the PR:

----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ArrayArrayKernel<And>/32768/10000               8.64 us         8.64 us        82671 bytes_per_second=3.53361G/s items_per_second=30.3535G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<And>/32768/100                 10.4 us         10.2 us        68076 bytes_per_second=2.99687G/s items_per_second=25.7429G/s null_percent=1 size=32.768k
ArrayArrayKernel<And>/32768/10                  10.4 us         10.4 us        64431 bytes_per_second=2.94554G/s items_per_second=25.302G/s null_percent=10 size=32.768k
ArrayArrayKernel<And>/32768/2                   7.92 us         7.92 us       102414 bytes_per_second=3.85474G/s items_per_second=33.1119G/s null_percent=50 size=32.768k
ArrayArrayKernel<And>/32768/1                   7.37 us         7.37 us        97390 bytes_per_second=4.14169G/s items_per_second=35.5769G/s null_percent=100 size=32.768k
ArrayArrayKernel<And>/32768/0                   6.48 us         6.48 us       106870 bytes_per_second=4.71213G/s items_per_second=40.4769G/s null_percent=0 size=32.768k
ArrayArrayKernel<And>/1048576/10000              336 us          336 us         2086 bytes_per_second=2.90622G/s items_per_second=24.9642G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<And>/1048576/100                333 us          333 us         2091 bytes_per_second=2.93627G/s items_per_second=25.2224G/s null_percent=1 size=1048.58k
ArrayArrayKernel<And>/1048576/10                 333 us          333 us         2101 bytes_per_second=2.93204G/s items_per_second=25.186G/s null_percent=10 size=1048.58k
ArrayArrayKernel<And>/1048576/2                  333 us          333 us         2104 bytes_per_second=2.93041G/s items_per_second=25.172G/s null_percent=50 size=1048.58k
ArrayArrayKernel<And>/1048576/1                  238 us          238 us         2947 bytes_per_second=4.10724G/s items_per_second=35.2809G/s null_percent=100 size=1048.58k
ArrayArrayKernel<And>/1048576/0                  239 us          239 us         2939 bytes_per_second=4.09211G/s items_per_second=35.1509G/s null_percent=0 size=1048.58k
ArrayArrayKernel<KleeneAnd>/32768/10000         13.4 us         13.4 us        52481 bytes_per_second=2.28086G/s items_per_second=19.5925G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/100           13.3 us         13.3 us        52642 bytes_per_second=2.28768G/s items_per_second=19.651G/s null_percent=1 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/10            13.3 us         13.3 us        52251 bytes_per_second=2.29669G/s items_per_second=19.7284G/s null_percent=10 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/2             13.3 us         13.3 us        52684 bytes_per_second=2.29603G/s items_per_second=19.7227G/s null_percent=50 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/1             13.4 us         13.4 us        52555 bytes_per_second=2.28596G/s items_per_second=19.6362G/s null_percent=100 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/0             6.10 us         6.10 us       118983 bytes_per_second=5.00416G/s items_per_second=42.9854G/s null_percent=0 size=32.768k
ArrayArrayKernel<KleeneAnd>/1048576/10000        388 us          388 us         1807 bytes_per_second=2.51753G/s items_per_second=21.6254G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/100          389 us          389 us         1805 bytes_per_second=2.50864G/s items_per_second=21.5491G/s null_percent=1 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/10           390 us          390 us         1804 bytes_per_second=2.50681G/s items_per_second=21.5333G/s null_percent=10 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/2            391 us          391 us         1803 bytes_per_second=2.49759G/s items_per_second=21.4541G/s null_percent=50 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/1            388 us          388 us         1807 bytes_per_second=2.51458G/s items_per_second=21.6001G/s null_percent=100 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/0            238 us          238 us         2939 bytes_per_second=4.11143G/s items_per_second=35.3169G/s null_percent=0 size=1048.58k

@nirandaperera
Contributor Author

@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark

@ursabot

ursabot commented Jun 28, 2021

Benchmark runs are scheduled for baseline = c913aa3 and contender = 788bd495a8ca8c180355e9066387824fc972d734. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.25% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

Member

@bkietz bkietz left a comment

nits

@nirandaperera
Contributor Author

@ursabot please benchmark

@ursabot

ursabot commented Jun 28, 2021

Benchmark runs are scheduled for baseline = c913aa3 and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

@nirandaperera
Contributor Author

@ursabot please benchmark

@ursabot

ursabot commented Jun 28, 2021

Benchmark runs are scheduled for baseline = c913aa3 and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

@pitrou
Member

pitrou commented Jun 29, 2021

Locally, I seem to get varying results from run to run (and also depending on the compiler), but archery benchmark diff doesn't show very worrying regressions with clang 10:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (11)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                               benchmark      baseline     contender  change %                                                                                                                                                  counters
    ArrayArrayKernel<KleeneAnd>/524288/1 4.309 GiB/sec 5.868 GiB/sec    36.176    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 100.0}
   ArrayArrayKernel<KleeneAnd>/524288/10 4.337 GiB/sec 5.888 GiB/sec    35.749    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 10.0}
  ArrayArrayKernel<KleeneAnd>/524288/100 4.335 GiB/sec 5.850 GiB/sec    34.939    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6232, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000 4.318 GiB/sec 5.755 GiB/sec    33.290 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6221, 'null_percent': 0.01}
    ArrayArrayKernel<KleeneAnd>/524288/2 4.365 GiB/sec 5.762 GiB/sec    31.994     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6200, 'null_percent': 50.0}
     ArrayArrayKernel<KleeneAnd>/32768/1 3.204 GiB/sec 4.220 GiB/sec    31.726    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 75304, 'null_percent': 100.0}
     ArrayArrayKernel<KleeneAnd>/32768/2 3.370 GiB/sec 4.212 GiB/sec    24.961     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77611, 'null_percent': 50.0}
 ArrayArrayKernel<KleeneAnd>/32768/10000 3.360 GiB/sec 4.197 GiB/sec    24.906 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76897, 'null_percent': 0.01}
    ArrayArrayKernel<KleeneAnd>/32768/10 3.373 GiB/sec 4.208 GiB/sec    24.745    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76903, 'null_percent': 10.0}
   ArrayArrayKernel<KleeneAnd>/32768/100 3.383 GiB/sec 4.202 GiB/sec    24.212    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77559, 'null_percent': 1.0}
     ArrayArrayKernel<KleeneAnd>/32768/0 7.838 GiB/sec 7.956 GiB/sec     1.499     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 187182, 'null_percent': 0.0}

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (1)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                           benchmark       baseline      contender  change %                                                                                                                                              counters
ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec    -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}

(on Ubuntu 20.04, AMD Ryzen 3900)
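
For readers checking the tables, the "change %" column appears to be the relative difference between contender and baseline throughput. The helper below is a small sketch under that assumption; ChangePercent is a hypothetical name and is not part of archery. Because the metric is throughput, a negative value means the contender is slower.

```cpp
// Relative change of `contender` with respect to `baseline`, in percent.
// Assumes a "higher is better" metric such as GiB/sec, and baseline != 0.
inline double ChangePercent(double baseline, double contender) {
  return (contender - baseline) / baseline * 100.0;
}
```

Applied to the rounded figures of the single regression row (15.853 and 14.343 GiB/sec), this gives roughly -9.5, close to the reported -9.527; the small gap is expected because the printed throughputs are themselves rounded.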

@pitrou
Member

pitrou commented Jun 29, 2021

There are more regressions with gcc 9, though:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (2)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                           benchmark      baseline     contender  change %                                                                                                                                               counters
ArrayArrayKernel<KleeneAnd>/524288/1 2.149 GiB/sec 2.318 GiB/sec     7.887 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3713, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/524288/2 2.249 GiB/sec 2.262 GiB/sec     0.569  {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4816, 'null_percent': 50.0}

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (10)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                               benchmark       baseline      contender  change %                                                                                                                                                  counters
     ArrayArrayKernel<KleeneAnd>/32768/0  7.645 GiB/sec  6.972 GiB/sec    -8.804     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 178864, 'null_percent': 0.0}
    ArrayArrayKernel<KleeneAnd>/524288/0 16.468 GiB/sec 13.337 GiB/sec   -19.014     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 22855, 'null_percent': 0.0}
   ArrayArrayKernel<KleeneAnd>/524288/10  2.801 GiB/sec  2.040 GiB/sec   -27.172    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4221, 'null_percent': 10.0}
  ArrayArrayKernel<KleeneAnd>/524288/100  2.984 GiB/sec  2.147 GiB/sec   -28.031    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5191, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000  3.224 GiB/sec  2.250 GiB/sec   -30.221 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5014, 'null_percent': 0.01}
    ArrayArrayKernel<KleeneAnd>/32768/10  2.991 GiB/sec  1.970 GiB/sec   -34.143    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68647, 'null_percent': 10.0}
 ArrayArrayKernel<KleeneAnd>/32768/10000  2.991 GiB/sec  1.964 GiB/sec   -34.335 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68084, 'null_percent': 0.01}
     ArrayArrayKernel<KleeneAnd>/32768/2  2.999 GiB/sec  1.969 GiB/sec   -34.335     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68549, 'null_percent': 50.0}
     ArrayArrayKernel<KleeneAnd>/32768/1  2.997 GiB/sec  1.966 GiB/sec   -34.402    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68530, 'null_percent': 100.0}
   ArrayArrayKernel<KleeneAnd>/32768/100  3.000 GiB/sec  1.967 GiB/sec   -34.414    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68686, 'null_percent': 1.0}

@nirandaperera
Contributor Author

ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}

The regressions for ArrayArrayKernel<KleeneAnd>/32768/0 and ArrayArrayKernel<KleeneAnd>/524288/0 are expected: the exec infrastructure always allocates memory for the validity buffer, so we now always populate it, including in these null_percent=0 cases.
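
As a rough illustration of why the no-null case still costs something, here is a sketch of a word-wise Kleene AND that always produces a validity word alongside the data word. It follows the standard three-valued logic definition (the result is valid if both inputs are true or if either input is a valid false); the names KleeneWord and KleeneAndWord are hypothetical, and this is not the kernel code from the PR.

```cpp
#include <cstdint>

// One 64-bit word of output: validity bits and data bits.
struct KleeneWord {
  uint64_t valid;
  uint64_t data;
};

// Kleene AND on 64 slots at once. A set validity bit means "not null".
inline KleeneWord KleeneAndWord(uint64_t l_valid, uint64_t l_data,
                                uint64_t r_valid, uint64_t r_data) {
  const uint64_t l_true = l_valid & l_data;
  const uint64_t l_false = l_valid & ~l_data;
  const uint64_t r_true = r_valid & r_data;
  const uint64_t r_false = r_valid & ~r_data;
  KleeneWord out;
  out.data = l_true & r_true;
  // Valid when both are true, or when either side is a known false.
  out.valid = l_false | r_false | (l_true & r_true);
  return out;
}
```

When both inputs are fully valid, the computed validity word is all ones, but it is still computed and written for every word, which is extra work compared with skipping the validity buffer entirely.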

@pitrou
Member

pitrou commented Jun 29, 2021

What's weird as well is that, sometimes, L2-sized benchmarks are faster than L1-sized, but sometimes they are slower.

@pitrou
Member

pitrou commented Jun 29, 2021

@nirandaperera I see, thanks for the insight.

@pitrou
Member

pitrou commented Jun 29, 2021

In any case, I don't think the regressions are really terrible in themselves.

@nirandaperera
Contributor Author

I got archery running on my machine and I can confirm that gcc-9 is the problem there. With clang-10 the PR performs better than the baseline, but with gcc-9 it shows a lot of regressions.

@nirandaperera
Contributor Author

nirandaperera commented Jun 29, 2021

Did some further analysis on this. It turns out that gcc-10 works much better than gcc-9.
https://gist.github.com/nirandaperera/0bcd40c223fd32105d027a86a571334f

Member

@bkietz bkietz left a comment

LGTM. Since the performance regression is compiler-dependent, I don't think we need to worry about it here. Thanks for doing this!

I'll merge shortly
