ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels #10487

nirandaperera · 2021-06-09T04:24:25Z

This change adds a Bitmap::VisitWordsAndWrite method that writes the values returned by the visitor lambda function to a provided output bitmap.
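To illustrate the idea only, here is a minimal standalone sketch of a "visit words and write" loop. It is not the Arrow implementation: the name `VisitWordsAndWriteSketch`, its signature, and the visitor shape are all hypothetical, and it assumes word-aligned bitmaps with no bit offsets or trailing bits, which a method that writes into slices would also have to handle.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the Arrow API: visits N input bitmaps one 64-bit
// word at a time and writes the visitor's M result words to the outputs.
// Assumes every bitmap is word-aligned and holds at least num_words words.
template <std::size_t N, std::size_t M, typename Visitor>
void VisitWordsAndWriteSketch(const std::array<const std::uint64_t*, N>& inputs,
                              const std::array<std::uint64_t*, M>& outputs,
                              std::size_t num_words, Visitor&& visitor) {
  for (const auto* in : inputs) assert(in != nullptr);
  for (auto* out : outputs) assert(out != nullptr);

  std::array<std::uint64_t, N> in_words{};
  std::array<std::uint64_t, M> out_words{};
  for (std::size_t i = 0; i < num_words; ++i) {
    // Gather the i-th word of every input bitmap.
    for (std::size_t j = 0; j < N; ++j) in_words[j] = inputs[j][i];
    // The visitor computes the output words from the input words.
    visitor(in_words, &out_words);
    // Write the results into the caller-provided output bitmaps.
    for (std::size_t j = 0; j < M; ++j) outputs[j][i] = out_words[j];
  }
}
```

A caller would pass a lambda taking `(const std::array<std::uint64_t, N>&, std::array<std::uint64_t, M>*)`, for example one that ANDs two inputs into a single output word.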

github-actions · 2021-06-09T04:24:43Z

https://issues.apache.org/jira/browse/ARROW-13010

nirandaperera · 2021-06-11T01:07:38Z

@github-actions autotune

bkietz

This is looking good, thanks for working on this

cpp/src/arrow/compute/kernels/scalar_boolean.cc

cpp/src/arrow/compute/kernels/scalar_if_else.cc

cpp/src/arrow/util/bit_util.cc

cpp/src/arrow/util/bit_util_test.cc

cpp/src/arrow/util/bitmap.h

bkietz · 2021-06-15T18:31:21Z

@ursabot please benchmark

ursabot · 2021-06-15T18:32:14Z

Benchmark runs are scheduled for baseline = 39dcb43 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

cpp/src/arrow/util/bitmap.h

nirandaperera · 2021-06-22T20:10:15Z

@ursabot please benchmark

ursabot · 2021-06-22T20:11:18Z

Benchmark runs are scheduled for baseline = c913aa3 and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

nirandaperera · 2021-06-22T20:15:41Z

@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark

nirandaperera · 2021-06-28T19:46:20Z

@bkietz I added the changes we discussed. The following are the latest benchmark results on my machine.

Results for master commit: c7c959a26a6512b0ad078a06df474617f1b306aa
----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ArrayArrayKernel<And>/32768/10000               9.21 us         9.21 us        69999 bytes_per_second=3.31485G/s items_per_second=28.4743G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<And>/32768/100                 9.11 us         9.11 us        74446 bytes_per_second=3.35138G/s items_per_second=28.7882G/s null_percent=1 size=32.768k
ArrayArrayKernel<And>/32768/10                  8.06 us         8.06 us       103589 bytes_per_second=3.78621G/s items_per_second=32.5233G/s null_percent=10 size=32.768k
ArrayArrayKernel<And>/32768/2                   7.27 us         7.27 us        96371 bytes_per_second=4.19765G/s items_per_second=36.0575G/s null_percent=50 size=32.768k
ArrayArrayKernel<And>/32768/1                   8.96 us         8.95 us        91746 bytes_per_second=3.40815G/s items_per_second=29.2758G/s null_percent=100 size=32.768k
ArrayArrayKernel<And>/32768/0                   7.87 us         7.87 us        78959 bytes_per_second=3.87712G/s items_per_second=33.3042G/s null_percent=0 size=32.768k
ArrayArrayKernel<And>/1048576/10000              335 us          335 us         2080 bytes_per_second=2.91231G/s items_per_second=25.0165G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<And>/1048576/100                334 us          334 us         2098 bytes_per_second=2.92463G/s items_per_second=25.1224G/s null_percent=1 size=1048.58k
ArrayArrayKernel<And>/1048576/10                 336 us          336 us         2089 bytes_per_second=2.90452G/s items_per_second=24.9496G/s null_percent=10 size=1048.58k
ArrayArrayKernel<And>/1048576/2                  336 us          336 us         2077 bytes_per_second=2.90715G/s items_per_second=24.9722G/s null_percent=50 size=1048.58k
ArrayArrayKernel<And>/1048576/1                  238 us          238 us         2944 bytes_per_second=4.10794G/s items_per_second=35.2869G/s null_percent=100 size=1048.58k
ArrayArrayKernel<And>/1048576/0                  239 us          239 us         2932 bytes_per_second=4.0919G/s items_per_second=35.1491G/s null_percent=0 size=1048.58k
ArrayArrayKernel<KleeneAnd>/32768/10000         15.7 us         15.7 us        44167 bytes_per_second=1.9399G/s items_per_second=16.6637G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/100           15.7 us         15.7 us        44447 bytes_per_second=1.94457G/s items_per_second=16.7038G/s null_percent=1 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/10            15.7 us         15.7 us        44661 bytes_per_second=1.94512G/s items_per_second=16.7085G/s null_percent=10 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/2             15.7 us         15.7 us        44260 bytes_per_second=1.94383G/s items_per_second=16.6974G/s null_percent=50 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/1             15.6 us         15.6 us        43756 bytes_per_second=1.95105G/s items_per_second=16.7594G/s null_percent=100 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/0             5.28 us         5.28 us       136108 bytes_per_second=5.78137G/s items_per_second=49.6616G/s null_percent=0 size=32.768k
ArrayArrayKernel<KleeneAnd>/1048576/10000        483 us          483 us         1447 bytes_per_second=2.02148G/s items_per_second=17.3643G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/100          484 us          484 us         1447 bytes_per_second=2.0196G/s items_per_second=17.3482G/s null_percent=1 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/10           483 us          483 us         1445 bytes_per_second=2.02036G/s items_per_second=17.3548G/s null_percent=10 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/2            484 us          484 us         1448 bytes_per_second=2.0172G/s items_per_second=17.3276G/s null_percent=50 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/1            484 us          484 us         1448 bytes_per_second=2.01958G/s items_per_second=17.3481G/s null_percent=100 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/0            198 us          198 us         3541 bytes_per_second=4.93652G/s items_per_second=42.4044G/s null_percent=0 size=1048.58k



Results for the PR:

----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ArrayArrayKernel<And>/32768/10000               8.64 us         8.64 us        82671 bytes_per_second=3.53361G/s items_per_second=30.3535G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<And>/32768/100                 10.4 us         10.2 us        68076 bytes_per_second=2.99687G/s items_per_second=25.7429G/s null_percent=1 size=32.768k
ArrayArrayKernel<And>/32768/10                  10.4 us         10.4 us        64431 bytes_per_second=2.94554G/s items_per_second=25.302G/s null_percent=10 size=32.768k
ArrayArrayKernel<And>/32768/2                   7.92 us         7.92 us       102414 bytes_per_second=3.85474G/s items_per_second=33.1119G/s null_percent=50 size=32.768k
ArrayArrayKernel<And>/32768/1                   7.37 us         7.37 us        97390 bytes_per_second=4.14169G/s items_per_second=35.5769G/s null_percent=100 size=32.768k
ArrayArrayKernel<And>/32768/0                   6.48 us         6.48 us       106870 bytes_per_second=4.71213G/s items_per_second=40.4769G/s null_percent=0 size=32.768k
ArrayArrayKernel<And>/1048576/10000              336 us          336 us         2086 bytes_per_second=2.90622G/s items_per_second=24.9642G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<And>/1048576/100                333 us          333 us         2091 bytes_per_second=2.93627G/s items_per_second=25.2224G/s null_percent=1 size=1048.58k
ArrayArrayKernel<And>/1048576/10                 333 us          333 us         2101 bytes_per_second=2.93204G/s items_per_second=25.186G/s null_percent=10 size=1048.58k
ArrayArrayKernel<And>/1048576/2                  333 us          333 us         2104 bytes_per_second=2.93041G/s items_per_second=25.172G/s null_percent=50 size=1048.58k
ArrayArrayKernel<And>/1048576/1                  238 us          238 us         2947 bytes_per_second=4.10724G/s items_per_second=35.2809G/s null_percent=100 size=1048.58k
ArrayArrayKernel<And>/1048576/0                  239 us          239 us         2939 bytes_per_second=4.09211G/s items_per_second=35.1509G/s null_percent=0 size=1048.58k
ArrayArrayKernel<KleeneAnd>/32768/10000         13.4 us         13.4 us        52481 bytes_per_second=2.28086G/s items_per_second=19.5925G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/100           13.3 us         13.3 us        52642 bytes_per_second=2.28768G/s items_per_second=19.651G/s null_percent=1 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/10            13.3 us         13.3 us        52251 bytes_per_second=2.29669G/s items_per_second=19.7284G/s null_percent=10 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/2             13.3 us         13.3 us        52684 bytes_per_second=2.29603G/s items_per_second=19.7227G/s null_percent=50 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/1             13.4 us         13.4 us        52555 bytes_per_second=2.28596G/s items_per_second=19.6362G/s null_percent=100 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/0             6.10 us         6.10 us       118983 bytes_per_second=5.00416G/s items_per_second=42.9854G/s null_percent=0 size=32.768k
ArrayArrayKernel<KleeneAnd>/1048576/10000        388 us          388 us         1807 bytes_per_second=2.51753G/s items_per_second=21.6254G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/100          389 us          389 us         1805 bytes_per_second=2.50864G/s items_per_second=21.5491G/s null_percent=1 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/10           390 us          390 us         1804 bytes_per_second=2.50681G/s items_per_second=21.5333G/s null_percent=10 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/2            391 us          391 us         1803 bytes_per_second=2.49759G/s items_per_second=21.4541G/s null_percent=50 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/1            388 us          388 us         1807 bytes_per_second=2.51458G/s items_per_second=21.6001G/s null_percent=100 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/0            238 us          238 us         2939 bytes_per_second=4.11143G/s items_per_second=35.3169G/s null_percent=0 size=1048.58k

nirandaperera · 2021-06-28T19:48:33Z

@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark

ursabot · 2021-06-28T19:49:07Z

Benchmark runs are scheduled for baseline = c913aa3 and contender = 788bd495a8ca8c180355e9066387824fc972d734. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.25% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

bkietz

nits

cpp/src/arrow/compute/kernels/scalar_if_else_test.cc

cpp/src/arrow/util/bit_util.h

cpp/src/arrow/util/bitmap.h

nirandaperera · 2021-06-28T20:36:09Z

@ursabot please benchmark

ursabot · 2021-06-28T20:36:15Z

Benchmark runs are scheduled for baseline = c913aa3 and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

nirandaperera · 2021-06-28T23:24:20Z

@ursabot please benchmark

ursabot · 2021-06-28T23:25:21Z

Benchmark runs are scheduled for baseline = c913aa3 and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)

pitrou · 2021-06-29T17:06:13Z

Locally, I seem to get varying results from run to run (and also depending on the compiler), but archery benchmark diff doesn't show very worrying regressions with clang 10:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (11)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                               benchmark      baseline     contender  change %                                                                                                                                                  counters
    ArrayArrayKernel<KleeneAnd>/524288/1 4.309 GiB/sec 5.868 GiB/sec    36.176    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 100.0}
   ArrayArrayKernel<KleeneAnd>/524288/10 4.337 GiB/sec 5.888 GiB/sec    35.749    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 10.0}
  ArrayArrayKernel<KleeneAnd>/524288/100 4.335 GiB/sec 5.850 GiB/sec    34.939    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6232, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000 4.318 GiB/sec 5.755 GiB/sec    33.290 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6221, 'null_percent': 0.01}
    ArrayArrayKernel<KleeneAnd>/524288/2 4.365 GiB/sec 5.762 GiB/sec    31.994     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6200, 'null_percent': 50.0}
     ArrayArrayKernel<KleeneAnd>/32768/1 3.204 GiB/sec 4.220 GiB/sec    31.726    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 75304, 'null_percent': 100.0}
     ArrayArrayKernel<KleeneAnd>/32768/2 3.370 GiB/sec 4.212 GiB/sec    24.961     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77611, 'null_percent': 50.0}
 ArrayArrayKernel<KleeneAnd>/32768/10000 3.360 GiB/sec 4.197 GiB/sec    24.906 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76897, 'null_percent': 0.01}
    ArrayArrayKernel<KleeneAnd>/32768/10 3.373 GiB/sec 4.208 GiB/sec    24.745    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76903, 'null_percent': 10.0}
   ArrayArrayKernel<KleeneAnd>/32768/100 3.383 GiB/sec 4.202 GiB/sec    24.212    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77559, 'null_percent': 1.0}
     ArrayArrayKernel<KleeneAnd>/32768/0 7.838 GiB/sec 7.956 GiB/sec     1.499     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 187182, 'null_percent': 0.0}

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (1)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                           benchmark       baseline      contender  change %                                                                                                                                              counters
ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec    -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}

(on Ubuntu 20.04, AMD Ryzen 3900)
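For reference, the `change %` column is consistent with a plain relative difference between contender and baseline throughput. A small sketch of that arithmetic follows; the helper name `PercentChange` is hypothetical, and the exact formula archery uses internally is an assumption here.

```cpp
// Hypothetical helper: relative change of contender vs. baseline, in percent.
// For throughput (GiB/sec), a negative value means the contender is slower.
inline double PercentChange(double baseline, double contender) {
  return (contender - baseline) / baseline * 100.0;
}
```

Applied to the rounded figures reported for `ArrayArrayKernel<KleeneAnd>/524288/0` (15.853 and 14.343 GiB/sec), this gives roughly -9.5%, in line with the reported -9.527; the small difference is presumably because the tool works from unrounded values.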

pitrou · 2021-06-29T17:14:29Z

There are more regressions with gcc 9, though:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (2)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                           benchmark      baseline     contender  change %                                                                                                                                               counters
ArrayArrayKernel<KleeneAnd>/524288/1 2.149 GiB/sec 2.318 GiB/sec     7.887 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3713, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/524288/2 2.249 GiB/sec 2.262 GiB/sec     0.569  {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4816, 'null_percent': 50.0}

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (10)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                               benchmark       baseline      contender  change %                                                                                                                                                  counters
     ArrayArrayKernel<KleeneAnd>/32768/0  7.645 GiB/sec  6.972 GiB/sec    -8.804     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 178864, 'null_percent': 0.0}
    ArrayArrayKernel<KleeneAnd>/524288/0 16.468 GiB/sec 13.337 GiB/sec   -19.014     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 22855, 'null_percent': 0.0}
   ArrayArrayKernel<KleeneAnd>/524288/10  2.801 GiB/sec  2.040 GiB/sec   -27.172    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4221, 'null_percent': 10.0}
  ArrayArrayKernel<KleeneAnd>/524288/100  2.984 GiB/sec  2.147 GiB/sec   -28.031    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5191, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000  3.224 GiB/sec  2.250 GiB/sec   -30.221 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5014, 'null_percent': 0.01}
    ArrayArrayKernel<KleeneAnd>/32768/10  2.991 GiB/sec  1.970 GiB/sec   -34.143    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68647, 'null_percent': 10.0}
 ArrayArrayKernel<KleeneAnd>/32768/10000  2.991 GiB/sec  1.964 GiB/sec   -34.335 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68084, 'null_percent': 0.01}
     ArrayArrayKernel<KleeneAnd>/32768/2  2.999 GiB/sec  1.969 GiB/sec   -34.335     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68549, 'null_percent': 50.0}
     ArrayArrayKernel<KleeneAnd>/32768/1  2.997 GiB/sec  1.966 GiB/sec   -34.402    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68530, 'null_percent': 100.0}
   ArrayArrayKernel<KleeneAnd>/32768/100  3.000 GiB/sec  1.967 GiB/sec   -34.414    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68686, 'null_percent': 1.0}

nirandaperera · 2021-06-29T17:15:27Z

ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}

The regression for ArrayArrayKernel<KleeneAnd>/32768/0 and ArrayArrayKernel<KleeneAnd>/524288/0 is expected: we now always populate a validity buffer, because the exec infrastructure always allocates memory for it.
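As a rough illustration of what populating the validity buffer involves per word, here is a sketch of Kleene AND on 64-bit words. The names `KleeneWord` and `KleeneAndWord` are hypothetical and this is not taken from the PR's code; it only encodes the standard Kleene rule that a result is valid when both inputs are valid or when either input is a valid `false`.

```cpp
#include <cstdint>

// Hypothetical result type: one word of data bits and one word of validity bits.
struct KleeneWord {
  std::uint64_t data;
  std::uint64_t valid;
};

// Kleene AND over one word of each operand. Both output words are computed
// unconditionally, even when the inputs contain no nulls.
inline KleeneWord KleeneAndWord(std::uint64_t left_data, std::uint64_t left_valid,
                                std::uint64_t right_data, std::uint64_t right_valid) {
  const std::uint64_t left_true = left_valid & left_data;
  const std::uint64_t left_false = left_valid & ~left_data;
  const std::uint64_t right_true = right_valid & right_data;
  const std::uint64_t right_false = right_valid & ~right_data;
  // true AND true is true; any valid false forces a valid false result.
  const std::uint64_t out_data = left_true & right_true;
  const std::uint64_t out_valid = left_false | right_false | out_data;
  return {out_data, out_valid};
}
```

With no nulls in either input the validity word comes out as all ones, so writing it is extra work compared with skipping the validity buffer entirely, which fits the slowdown seen only in the `null_percent=0` cases.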

pitrou · 2021-06-29T17:15:40Z

What's weird as well is that, sometimes, L2-sized benchmarks are faster than L1-sized, but sometimes they are slower.

pitrou · 2021-06-29T17:16:06Z

@nirandaperera I see, thanks for the insight.

pitrou · 2021-06-29T17:16:33Z

In any case, I don't think the regressions are really terrible in themselves.

nirandaperera · 2021-06-29T17:57:49Z

I got archery running on my machine and I can confirm that gcc-9 is the problem there. With clang-10, the PR shows better performance, but with gcc-9 it shows a lot of regressions.

nirandaperera · 2021-06-29T21:22:23Z

Did some further analysis on this. It turns out that gcc-10 works much better than gcc-9.
https://gist.github.com/nirandaperera/0bcd40c223fd32105d027a86a571334f

bkietz

LGTM. Since the performance regression is compiler-dependent, I don't think we need to worry about it here. Thanks for doing this!

I'll merge shortly

github-actions bot added the Component: C++ label Jun 9, 2021

bkietz self-requested a review June 9, 2021 18:50

nirandaperera marked this pull request as ready for review June 11, 2021 20:28

bkietz requested changes Jun 15, 2021

bkietz reviewed Jun 15, 2021

cpp/src/arrow/util/bitmap.h (outdated)

nirandaperera requested a review from bkietz June 18, 2021 13:59

nirandaperera force-pushed the ARROW-13010 branch from 3a4f269 to 95c3688 Compare June 21, 2021 20:52

nirandaperera mentioned this pull request Jun 28, 2021

ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type #10585

Closed

bkietz requested changes Jun 28, 2021

cpp/src/arrow/compute/kernels/scalar_if_else_test.cc (outdated)

cpp/src/arrow/util/bit_util.h (outdated)

cpp/src/arrow/util/bitmap.h (outdated)

nirandaperera added 7 commits June 29, 2021 09:48

prelim

02e3d8d

working - not tested properly. requires clean up

fad0383

adding striding

1ffce3f

adding tests

35f6178

moving BitmapWordReader and BitmapWordWriter to header files

f0f3c83

adding multiple writers and testing w/ offsets

7c6a4ef

Autoformat/render all the things [automated commit]

1e22301

nirandaperera and others added 17 commits June 29, 2021 09:48

fixing errors

b12a20d

simplifying if-else

a20e295

simplifying if-else

e37be50

fixing errors

6c71c36

attempting to fix msvc error

aeb48ae

lint fix

4519dd3

fixing the down casting issue

30ec72e

fixing the down casting issue

0e4f1a0

refactor

f5a14c0

adding set/clearbitmap tests

cfb88f8

making if_else kernels write_to_slices

33444d1

fixing lint

984b7db

fixing performance issue

adfb0fd

dummy

6d48f7a

Revert "dummy"

d368866

This reverts commit 97091f85

Apply suggestions from code review

4324a73

Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>

applying PR comments

1b3144b

nirandaperera force-pushed the ARROW-13010 branch from 2663d92 to 1b3144b Compare June 29, 2021 13:48

bkietz approved these changes Jun 30, 2021

bkietz closed this in 23d19ce Jun 30, 2021

asfimport mentioned this pull request Jun 30, 2021

[C++][Compute] Support outputting to slices from kleene kernels #28726

Closed
