Closed
Labels
Component: C++, Critical Fix (bugfixes for security vulnerabilities, crashes, or invalid data), Type: bug
Description
Describe the bug, including details regarding any error messages, version, and platform.
- Version: Repro'd on HEAD, v12.0.0, and v13.0.0
I've encountered a subtle race condition in the asof join node that is particularly common with large Parquet files that have many row groups:
- The left-hand side of the asof join completes, so InputFinished proceeds as expected. So far so good.
- The right-hand table(s) of the join are a huge dataset scan. They're still streaming and can legally still call AsofJoinNode::InputReceived all they want (doc ref).
- Each input batch is blindly pushed to the InputStates, which in turn defer to BackpressureHandlers to decide whether to pause inputs (code pointer).
- If enough batches come in right after EndFromProcessThread is called, then we might exceed the high_threshold and tell the input node to pause via the BackpressureController.
- At this point, the process thread has stopped for the asof joiner, so the right-hand table(s) won't be dequeued, meaning BackpressureController::Resume() will never be called. This causes a deadlock.
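To make the sequence above concrete, here is a minimal single-threaded toy model of it. Nothing in it is Arrow code: ToyProducer, ToyInputQueue, ReplayStuckPause, and the threshold values are hypothetical stand-ins, and the exact threshold comparison is an assumption. It only illustrates how a queue whose sole consumer has exited can leave its producer paused forever.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for an upstream scan node that can be paused.
struct ToyProducer {
  bool paused = false;
};

// Hypothetical stand-in for one right-hand input queue with backpressure.
class ToyInputQueue {
 public:
  ToyInputQueue(ToyProducer* producer, std::size_t low, std::size_t high)
      : producer_(producer), low_(low), high_(high) {}

  // The "blind push": always enqueue, pause once above the high mark.
  void Push(int batch) {
    queue_.push_back(batch);
    if (queue_.size() > high_) producer_->paused = true;
  }

  // Only the process thread pops; resume once below the low mark.
  bool TryPop() {
    if (queue_.empty()) return false;
    queue_.pop_front();
    if (queue_.size() < low_) producer_->paused = false;
    return true;
  }

 private:
  ToyProducer* producer_;
  std::size_t low_;
  std::size_t high_;
  std::deque<int> queue_;
};

// Replays the race: the consumer has already exited while batches keep
// arriving. Returns true if the producer ends up paused with nobody left
// to resume it.
bool ReplayStuckPause(int late_batches) {
  ToyProducer rhs;
  ToyInputQueue queue(&rhs, /*low=*/2, /*high=*/4);
  const bool process_thread_running = false;  // process thread already ended
  for (int i = 0; i < late_batches; ++i) {
    if (rhs.paused) break;  // a paused scan delivers nothing more
    queue.Push(i);
    if (process_thread_running) queue.TryPop();
  }
  return rhs.paused;
}
```

In this model, any burst of late batches that pushes the queue above the high mark leaves the producer paused, because TryPop is the only path that clears the flag and it is never reached again.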
I have hackily fixed this in a local checkout by storing an atomic<bool> that records whether EndFromProcessThread was called. Once it turns true, InputReceived short-circuits and returns Status::OK() without enqueueing the batch. Also, in EndFromProcessThread, I call ResumeProducing for all input nodes.
For good measure, I also call StopProducing() on all the inputs in EndFromProcessThread, though I don't know if it's necessary.
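A rough sketch of the shape of that workaround, again as a toy model rather than the actual patch: ToyInput, ToyAsofJoin, and ReplayWithFix are hypothetical names, the bool return value stands in for returning Status::OK() early, and clearing the paused flag stands in for calling ResumeProducing on each input. The end hook is named after the EndFromProcessThread call mentioned earlier in this report.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for one input: its queue plus a "paused" flag.
struct ToyInput {
  bool paused = false;
  std::deque<int> queue;
};

class ToyAsofJoin {
 public:
  ToyAsofJoin(std::size_t num_inputs, std::size_t high)
      : inputs_(num_inputs), high_(high) {}

  // Returns true if the batch was enqueued, false if it was dropped
  // because processing has already ended (the short-circuit).
  bool InputReceived(std::size_t input, int batch) {
    if (process_ended_.load()) return false;
    ToyInput& in = inputs_[input];
    in.queue.push_back(batch);
    if (in.queue.size() > high_) in.paused = true;
    return true;
  }

  // Marks processing as ended, then resumes every input so that none
  // of them stays paused with no consumer left to drain its queue.
  void EndFromProcessThread() {
    process_ended_.store(true);
    for (ToyInput& in : inputs_) in.paused = false;
  }

  bool AnyPaused() const {
    for (const ToyInput& in : inputs_) {
      if (in.paused) return true;
    }
    return false;
  }

  std::size_t Queued(std::size_t input) const {
    return inputs_[input].queue.size();
  }

 private:
  std::vector<ToyInput> inputs_;
  std::size_t high_;
  std::atomic<bool> process_ended_{false};
};

// One batch arrives, processing ends, then ten late batches arrive.
// Returns true if nothing is left paused and the late batches were dropped.
bool ReplayWithFix() {
  ToyAsofJoin join(/*num_inputs=*/2, /*high=*/4);
  join.InputReceived(1, 0);
  join.EndFromProcessThread();
  for (int i = 1; i <= 10; ++i) join.InputReceived(1, i);
  return !join.AnyPaused() && join.Queued(1) == 1;
}
```

This toy is single-threaded, so it sidesteps one detail a real change would probably need to handle: a batch that passes the flag check just before the flag is set could still be enqueued and trigger a pause after the resume loop has run. Doing the check and the enqueue under the same lock as the end-of-processing step would be one way to close that window.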
Happy to submit a PR once I find bandwidth, but reporting this early in case others run into it.
Component(s)
C++