@nirandaperera
Contributor

@nirandaperera nirandaperera commented Jun 23, 2021

Adding array_sort_indices and partition_nth_indices for BooleanType using existing sort and Nth-partition utils.
This may be rather inefficient, since the values are traversed bit by bit rather than a byte or word at a time.

Maybe we could work on that as a separate improvement?
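
To illustrate the bit-by-bit versus word-at-a-time distinction, here is a minimal standalone sketch. It is not Arrow code: the function names and the use of a `std::vector<uint8_t>` as an LSB-first packed bitmap are assumptions made for illustration only.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: counts set bits by testing each bit individually.
int64_t CountSetBitsBitwise(const std::vector<uint8_t>& bitmap, int64_t length) {
  assert(static_cast<int64_t>(bitmap.size()) * 8 >= length);
  int64_t count = 0;
  for (int64_t i = 0; i < length; ++i) {
    count += (bitmap[i / 8] >> (i % 8)) & 1;
  }
  return count;
}

// Hypothetical helper: counts set bits 64 at a time, then finishes the tail
// bit by bit. The population count does not depend on byte order.
int64_t CountSetBitsWordwise(const std::vector<uint8_t>& bitmap, int64_t length) {
  assert(static_cast<int64_t>(bitmap.size()) * 8 >= length);
  int64_t count = 0;
  int64_t i = 0;
  for (; i + 64 <= length; i += 64) {
    uint64_t word;
    std::memcpy(&word, bitmap.data() + i / 8, sizeof(word));
    count += static_cast<int64_t>(std::bitset<64>(word).count());
  }
  for (; i < length; ++i) {
    count += (bitmap[i / 8] >> (i % 8)) & 1;
  }
  return count;
}
```

Both functions return the same count; the second touches each full 64-bit word once instead of 64 times, which is the kind of saving a byte/word-based traversal would aim for.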

@nirandaperera
Contributor Author

@pitrou Can you take a look at this?

@ianmcook
Member

@nirandaperera as part of this PR, could you please make two small changes to the R package tests to exercise this new capability?

  1. Remove the comment at the end of this line:

    lgl = c(rep(FALSE, 4L), rep(TRUE, 5L), NA), # bool is not supported (ARROW-12016)

  2. Remove this line:

    skip("Sorting by bool columns is not supported (ARROW-12016)")

Thanks!

@nirandaperera
Contributor Author

nirandaperera commented Jun 23, 2021

@ianmcook Done! :-)

@pitrou
Member

pitrou commented Jun 24, 2021

@nirandaperera You may want to add a benchmark in vector_sort_benchmark.cc.

@pitrou
Member

pitrou commented Jun 24, 2021

Also, if you want to work on performance, note that a dedicated counting sort for boolean should be really simple.
You can first call null_count, true_count and false_count, then you just have to walk individual bits and emit indices.
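
The counting sort described above could look roughly like the following standalone sketch. It is not the PR's implementation: `SortIndicesBool` is a hypothetical name, `std::optional<bool>` stands in for a nullable boolean array, and the counts are computed with a plain loop where Arrow code would presumably read them from the array. Placing nulls last is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch: stable sort indices for nullable booleans,
// ordered false < true < null.
std::vector<int64_t> SortIndicesBool(
    const std::vector<std::optional<bool>>& values) {
  // Step 1: count nulls and trues (a single pass here).
  int64_t null_count = 0;
  int64_t true_count = 0;
  for (const auto& v : values) {
    if (!v.has_value()) {
      ++null_count;
    } else if (*v) {
      ++true_count;
    }
  }
  const int64_t n = static_cast<int64_t>(values.size());
  const int64_t false_count = n - null_count - true_count;

  // Step 2: each bucket gets a write cursor at its starting offset.
  int64_t next_false = 0;
  int64_t next_true = false_count;
  int64_t next_null = false_count + true_count;

  // Step 3: walk the values once and emit each index into its bucket.
  std::vector<int64_t> indices(values.size());
  for (int64_t i = 0; i < n; ++i) {
    const auto& v = values[i];
    if (!v.has_value()) {
      indices[next_null++] = i;
    } else if (*v) {
      indices[next_true++] = i;
    } else {
      indices[next_false++] = i;
    }
  }
  assert(next_null == n);  // every slot was written exactly once
  return indices;
}
```

For the input `{true, null, false, true, false}` this yields the indices `{2, 4, 0, 3, 1}`: the two false positions, then the two true positions, then the null.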

@nirandaperera
Contributor Author

> @nirandaperera You may want to add a benchmark in vector_sort_benchmark.cc.

@pitrou I added a simple benchmark. I'll add the improved version and run it against that benchmark. RecordBatches and Tables needed little work because they already use the ::GetView methods to access values, so sorting worked out of the box for bools.

@nirandaperera
Contributor Author

@pitrou I think I'll open a new JIRA for the optimized bool sort implementation. I feel like I can reuse some of the stuff from the #10487 PR here.

@nirandaperera nirandaperera force-pushed the ARROW-12016 branch 2 times, most recently from c1ef90b to 9cffeae Compare July 2, 2021 23:04
@nirandaperera nirandaperera requested a review from pitrou July 2, 2021 23:05
@nirandaperera
Contributor Author

@pitrou I added a separate ArrayCountSorter impl for bool types.

Member

Really, you needn't write this yourself. Just call null_count() and true_count().

Contributor Author

Oh okay. I did it this way because we can count both nulls and trues in a single pass. But sure, I'll use that.

Member

This sounds rather weird. Are you sure about this?

Contributor

I added this comment very early on, when implementing the counting sort approach.
I think it's possibly because a 32-bit counter array is smaller and has a better chance of staying in the L1 cache.
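
For context, a counter-width branch of this kind might be sketched as below. This is a hypothetical, standalone illustration over `uint8_t` values, not the code under review; the names `CountingSortIndices` and `SortIndicesU8` are made up. With 257 counters, the array takes 1028 bytes at 32 bits and 2056 bytes at 64 bits. A boolean sorter needs only a handful of counters, so the footprint argument would presumably matter much less there.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sketch: stable counting sort returning sort indices,
// parameterized on the integer type used for the counters.
template <typename CounterType>
std::vector<int64_t> CountingSortIndices(const std::vector<uint8_t>& values) {
  assert(values.size() <= std::numeric_limits<CounterType>::max());
  // counts[v + 1] holds the number of occurrences of v.
  std::vector<CounterType> counts(256 + 1, 0);
  for (uint8_t v : values) {
    ++counts[v + 1];
  }
  // Prefix sum: counts[v] becomes the first output slot for value v.
  for (int i = 1; i <= 256; ++i) {
    counts[i] += counts[i - 1];
  }
  std::vector<int64_t> indices(values.size());
  for (int64_t i = 0; i < static_cast<int64_t>(values.size()); ++i) {
    indices[counts[values[i]]++] = i;
  }
  return indices;
}

// Hypothetical dispatch: use the narrower counters whenever every count
// is guaranteed to fit in 32 bits.
std::vector<int64_t> SortIndicesU8(const std::vector<uint8_t>& values) {
  if (values.size() <= std::numeric_limits<uint32_t>::max()) {
    return CountingSortIndices<uint32_t>(values);
  }
  return CountingSortIndices<uint64_t>(values);
}
```

Whether the narrower counters actually help is an empirical question, which is why a benchmark is the deciding factor rather than the sketch itself.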

Member

Unless it's benchmarked here as well, maybe let's remove the comment to avoid confusion.

Contributor Author

@nirandaperera nirandaperera Jul 16, 2021

This comment is in the primitive type impl as well. Should that one be removed too?

Member

That one was benchmarked!

More seriously…maybe at least edit them to reflect that it was done due to a benchmark. Though I worry about the comment effectively bitrotting.

Member

I don't think it's worth testing the non-null case separately. Dropping it leaves less test code to maintain.

Member

Same here.

@nirandaperera nirandaperera requested a review from pitrou July 6, 2021 14:19
Member

@lidavidm lidavidm left a comment

LGTM. One minor comment about that confusing comment.

@nirandaperera
Contributor Author

I made the suggested changes, and I think this is ready now.

@lidavidm
Member

If we're removing the 32 vs 64 bit counter branch, can we benchmark it to make sure there's no impact?

@nirandaperera
Contributor Author

> If we're removing the 32 vs 64 bit counter branch, can we benchmark it to make sure there's no impact?

I'm still thinking about how to do this benchmark 😄 We can't call separate ArrayCountSorter<BooleanType> impls from the bench suite, can we?

@lidavidm
Member

It would be a before vs. after benchmark, not a side-by-side one.

@kszucs
Member

kszucs commented Jul 19, 2021

@nirandaperera MSVC doesn't look happy.

@edponce
Contributor

edponce commented Jul 19, 2021

@nirandaperera @kszucs The MSVC error seems unrelated to this PR and is caused by a timeout in a Flight test.

@kszucs
Member

kszucs commented Jul 19, 2021

I'm referring to these errors (appveyor not mingw).

@nirandaperera
Contributor Author

> I'm referring to these errors (appveyor not mingw).

I believe this is due to a method being declared static. Let's see! Thanks @kszucs

@nirandaperera
Contributor Author

nirandaperera commented Jul 20, 2021

> It would be a before vs after benchmark not side by side

I tested this with the ArraySortIndicesBool benchmark and didn't see any significant difference in performance, so I think it's okay to leave it with int64_t.

int64_t

-------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------
ArraySortIndicesBool/32768/10000     856525 ns       790696 ns         1067 bytes_per_second=39.5221M/s items_per_second=331.536M/s null_percent=0.01 size=32.768k
ArraySortIndicesBool/32768/100      1068125 ns      1067932 ns          644 bytes_per_second=29.2622M/s items_per_second=245.469M/s null_percent=1 size=32.768k
ArraySortIndicesBool/32768/10       1322777 ns      1320088 ns          523 bytes_per_second=23.6727M/s items_per_second=198.581M/s null_percent=10 size=32.768k
ArraySortIndicesBool/32768/2        1898999 ns      1871787 ns          372 bytes_per_second=16.6953M/s items_per_second=140.05M/s null_percent=50 size=32.768k
ArraySortIndicesBool/32768/1         228503 ns       218945 ns         3266 bytes_per_second=142.73M/s items_per_second=1.19731G/s null_percent=100 size=32.768k
ArraySortIndicesBool/32768/0         755168 ns       735851 ns          960 bytes_per_second=42.4678M/s items_per_second=356.246M/s null_percent=0 size=32.768k
ArraySortIndicesBool/1048576/100   35812425 ns     35626885 ns           19 bytes_per_second=28.0687M/s items_per_second=235.457M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100  320588616 ns    320491041 ns            2 bytes_per_second=24.9617M/s items_per_second=209.394M/s null_percent=1 size=8.38861M


int32_t

-------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------
ArraySortIndicesBool/32768/10000     768462 ns       763410 ns         1005 bytes_per_second=40.9347M/s items_per_second=343.385M/s null_percent=0.01 size=32.768k
ArraySortIndicesBool/32768/100      1060476 ns      1026142 ns          765 bytes_per_second=30.4539M/s items_per_second=255.466M/s null_percent=1 size=32.768k
ArraySortIndicesBool/32768/10       1447835 ns      1375450 ns          499 bytes_per_second=22.7198M/s items_per_second=190.588M/s null_percent=10 size=32.768k
ArraySortIndicesBool/32768/2        2002536 ns      2001982 ns          343 bytes_per_second=15.6095M/s items_per_second=130.942M/s null_percent=50 size=32.768k
ArraySortIndicesBool/32768/1         274177 ns       268186 ns         2750 bytes_per_second=116.523M/s items_per_second=977.469M/s null_percent=100 size=32.768k
ArraySortIndicesBool/32768/0         735371 ns       734928 ns         1025 bytes_per_second=42.5211M/s items_per_second=356.693M/s null_percent=0 size=32.768k
ArraySortIndicesBool/1048576/100   42704206 ns     42695362 ns           12 bytes_per_second=23.4217M/s items_per_second=196.476M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100  310066885 ns    310040314 ns            2 bytes_per_second=25.8031M/s items_per_second=216.452M/s null_percent=1 size=8.38861M

@lidavidm
Member

@ursabot please benchmark lang=C++

@lidavidm
Member

Thanks for checking. Let's also check with Conbench and if that's alright, then let's merge.

@ursabot

ursabot commented Jul 20, 2021

Benchmark runs are scheduled for baseline = d7a8b46 and contender = 85445cf37da25a953bb15478938853613b64cc18. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Failed] ursa-thinkcentre-m75q (mimalloc)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@lidavidm
Member

@nirandaperera this apparently needs rebasing against master before we can run Conbench on it

@kszucs
Member

kszucs commented Jul 21, 2021

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Jul 21, 2021

Benchmark runs are scheduled for baseline = 737492e and contender = 992f8dcf38caf30405f100636760a77e5e98d056. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.43% ⬆️0.05%] ursa-thinkcentre-m75q (mimalloc)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@pitrou
Member

pitrou commented Jul 21, 2021

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Jul 21, 2021

Benchmark runs are scheduled for baseline = 737492e and contender = 257527c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.19% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True


8 participants