Skip to content

Conversation

@lidavidm
Copy link
Member

No description provided.

@github-actions
Copy link

Comment on lines +84 to +86
cdef extern from "arrow/util/future.h" namespace "arrow" nogil:
cdef cppclass CFuture_Void" arrow::Future<>":
CStatus status()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprising we've made it this far without exposing Future. I wonder if we want to add a WaitForFinish method on ExecPlan instead of exposing Future (essentially a sync mirror-api) like we do in the Scanner.

I don't really have any problems with this approach though. Just making an observation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose at some point we'll want to expose async APIs in Python, so we're probably bound to interface with Future in Cython anyway.

@kszucs
Copy link
Member

kszucs commented May 2, 2022

How severe this issue is? Does this belong to a new feature? I'm not sure whether we should block the release on it or not.

cc @pitrou

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @lidavidm

@pitrou
Copy link
Member

pitrou commented May 2, 2022

@kszucs It should probably not block the release, but worth including if you cut another RC. And, yes, it's part of a new feature.

cc @amol-

@pitrou pitrou closed this in 3f2c33f May 2, 2022
@lidavidm lidavidm deleted the arrow-16419 branch May 2, 2022 11:55
@lidavidm
Copy link
Member Author

lidavidm commented May 2, 2022

This by itself is not blocking but see #13036 which is the fix for the segfault I was running into on the ML.

lidavidm added a commit that referenced this pull request May 2, 2022
This builds on top of #13035 which is also important for avoiding segmentation faults.  On top of that there were a few more problems:

 * The python was using `SourceNodeOptions::FromTable` which is a rather dangerous method (mainly useful for unit testing) as it doesn't share ownership of the input table (even worse, it takes a const ref).  Python was not keeping the table alive and it was maybe possible for the table to deleted out from under the plan (I'm not entirely sure this was causing issues but it seemed risky).  I switched to TableSourceNode which shares ownership of the table (and is a bit more efficient).
 * Setting use_threads to False did nothing because `_perform_join` was not passing the arg on to `execplan`.
 * When fixing the above and running with `use_threads=False` it was creating a single thread executor but the current best practice is to pass in nullptr.
 * Finally, the actual bug was my improper fix in #12894 .  I had still left a small window open for `End` to be called between `Submit` and `AddTask` which would allow the task to be submitted but not participate in setting `finished` on the node.

Closes #13036 from westonpace/bugfix/ARROW-16417--segfault-in-python-join

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@ursabot
Copy link

ursabot commented May 6, 2022

Benchmark runs are scheduled for baseline = 22a1885 and contender = 3f2c33f. 3f2c33f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.35% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.16% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3f2c33f5 ec2-t3-xlarge-us-east-2
[Finished] 3f2c33f5 test-mac-arm
[Failed] 3f2c33f5 ursa-i9-9960x
[Finished] 3f2c33f5 ursa-thinkcentre-m75q
[Finished] 22a18852 ec2-t3-xlarge-us-east-2
[Finished] 22a18852 test-mac-arm
[Finished] 22a18852 ursa-i9-9960x
[Finished] 22a18852 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants