Skip to content

Conversation

@michalursa
Copy link
Contributor

Fixing 3 issues:

  • one in SchemaProjectionMaps - I simplified all of the code to get rid of thread synchronization at all
  • one in TaskScheduler - added (unnecessary) mutex
  • one in HashJoinImpl - switching from shared byte vector to local bit vectors and merge (for recording if a match for a hash table row has been seen)

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

Revision: 16ded42dddc5e654dd177707c1c47dd6d8005c0d

Submitted crossbow builds: ursacomputing/crossbow @ actions-902

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 16ded42 to 9682388 Compare October 7, 2021 04:47
@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

Revision: 96823881bfd42876dfbcc577415a1870611fee25

Submitted crossbow builds: ursacomputing/crossbow @ actions-903

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou pitrou requested a review from bkietz October 7, 2021 07:42
@michalursa michalursa changed the title ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node Oct 7, 2021
@github-actions
Copy link

github-actions bot commented Oct 7, 2021

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 9682388 to 7c80dbd Compare October 7, 2021 19:17
@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 679309bb681dd4e40caf5cccf608a867d97f3add

Submitted crossbow builds: ursacomputing/crossbow @ actions-908

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 679309b to 3c32de9 Compare October 11, 2021 10:06
@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 3c32de96b05171c126d1cf5e08ce019eaedb244f

Submitted crossbow builds: ursacomputing/crossbow @ actions-910

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 3c32de9 to f1dae79 Compare October 11, 2021 21:44
@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: f1dae79664dc0be394e0544b62ef79d7e1e34685

Submitted crossbow builds: ursacomputing/crossbow @ actions-914

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from f1dae79 to 1deb369 Compare October 11, 2021 22:47
@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 1deb369bd9dccd27475ee47574ec14015f35b532

Submitted crossbow builds: ursacomputing/crossbow @ actions-915

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 1deb369 to 3d1ce0c Compare October 12, 2021 06:43
@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 3d1ce0c31fdab96ef969d790fc3ac673c8749379

Submitted crossbow builds: ursacomputing/crossbow @ actions-917

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch 2 times, most recently from 37993cf to c94bd41 Compare October 12, 2021 22:39
@pitrou
Copy link
Member

pitrou commented Oct 14, 2021

@michalursa Is this ready for review? Ideally we would like to ship this fix in 6.0.0, which will hopefully be cut in a few days.

@github-actions
Copy link

Revision: c94bd41ed08dcbb3dd22585cf97fa536abc48262

Submitted crossbow builds: ursacomputing/crossbow @ actions-952

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou
Copy link
Member

pitrou commented Oct 14, 2021

Also, can you rebase from master to ensure that we get the latest CI fixes?

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from c94bd41 to 9d5bcee Compare October 14, 2021 23:11
@michalursa
Copy link
Contributor Author

@pitrou I rebased. It is ready for review.

@pitrou
Copy link
Member

pitrou commented Oct 15, 2021

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 9d5bceed669496e863bfde9e59c22d4fcaf3ea25

Submitted crossbow builds: ursacomputing/crossbow @ actions-966

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou
Copy link
Member

pitrou commented Oct 15, 2021

@github-actions crossbow submit -g r

@pitrou
Copy link
Member

pitrou commented Oct 15, 2021

@bkietz Are you available to give a quick review on this? If CI is ok, it would be nice to have this fix in 6.0.0.

@github-actions
Copy link

Revision: 9d5bceed669496e863bfde9e59c22d4fcaf3ea25

Submitted crossbow builds: ursacomputing/crossbow @ actions-967

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-ubuntu-21.04 Github Actions
test-r-versions Github Actions
test-ubuntu-18.04-r-sanitizer Azure

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, LGTM

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 08931a6 to 9fd1c02 Compare October 18, 2021 19:16
@kou
Copy link
Member

kou commented Oct 18, 2021

@github-actions crossbow submit -g cpp -g r

@github-actions
Copy link

Revision: 9fd1c024392207bc3622da6a3307c06272efbc4c

Submitted crossbow builds: ursacomputing/crossbow @ actions-994

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-ubuntu-21.04 Github Actions
test-r-versions Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-18.04-r-sanitizer Azure
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch from 9fd1c02 to 720c4f5 Compare October 18, 2021 21:39
@kou
Copy link
Member

kou commented Oct 19, 2021

@michalursa Can we merge this?

@michalursa
Copy link
Contributor Author

@michalursa Can we merge this?

@kou Yes, I believe so. This is a relatively simple change (more precisely 3 simple fixes put together) and I don't see any errors that could be related.

@kou
Copy link
Member

kou commented Oct 19, 2021

OK.
I merge this.

@kou kou closed this in d9ef519 Oct 19, 2021
@ursabot
Copy link

ursabot commented Oct 19, 2021

Benchmark runs are scheduled for baseline = 6e1293b and contender = d9ef519. d9ef519 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.36% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

lidavidm pushed a commit that referenced this pull request Oct 19, 2021
#11350 moved ThreadIndexer from `aggregate_node.cc` to `compute/exec/util.h` to allow usage from the join node and incremented its capacity, but did not remove the original implementation.

Closes #11463 from bkietz/remove-duplicated-ThreadIndexer

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants