…e#21068)

## Which issue does this PR close?

* Closes apache#20492.

## Rationale for this change

`HashJoinExec` currently continues polling and consuming the probe side even after the build side has completed with zero rows.

For join types whose output is guaranteed to be empty when the build side is empty, this work is unnecessary. In practice, it can trigger large avoidable scans and extra compute despite producing no output. This is especially costly for cases such as INNER, LEFT, LEFT SEMI, LEFT ANTI, LEFT MARK, and RIGHT SEMI joins.

This change makes the stream state machine aware of that condition so execution can terminate as soon as the build side is known to be empty and no probe rows are needed to determine the final result.

The change also preserves the existing behavior for join types that still require probe-side rows even when the build side is empty, such as RIGHT, FULL, RIGHT ANTI, and RIGHT MARK joins.

## What changes are included in this PR?

* Added `JoinType::empty_build_side_produces_empty_result` to centralize logic determining when an empty build side guarantees empty output.
* Updated `HashJoinStream` state transitions to:
  * Skip transitioning to `FetchProbeBatch` when the build side is empty and output is deterministically empty.
  * Immediately complete the stream in such cases.
* Refactored logic in `build_batch_empty_build_side` to reuse the new helper method and simplify match branches.
* Ensured probe-side consumption still occurs for join types that require probe rows (e.g., RIGHT, FULL).
* Added helper `state_after_build_ready` to unify post-build decision logic.
* Introduced reusable helper for constructing hash joins with dynamic filters in tests.

## Are these changes tested?

Yes, comprehensive tests have been added:

* Verified that probe side is **not consumed** when:
  * Build side is empty
  * Join type guarantees empty output
* Verified that probe side **is still consumed** when required by join semantics (e.g., RIGHT, FULL joins)
* Covered both filtered and non-filtered joins
* Added tests ensuring correct behavior with dynamic filters
* Added regression test ensuring correct behavior after partition bounds reporting

These tests validate both correctness and the intended optimization behavior.

## Are there any user-facing changes?

No API changes.

However, this introduces a performance optimization:

* Queries involving joins with empty build sides may complete significantly faster
* Reduced unnecessary IO and compute

No behavioral changes in query results.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

(cherry picked from commit 6c5e241)
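
For readers skimming the description, the post-build decision can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the `JoinType` and `StreamState` enums below are hypothetical stand-ins defined locally, and the signatures of `empty_build_side_produces_empty_result` and `state_after_build_ready` are assumptions (the real helpers may take different arguments, and the real state enum has more variants). Only the classification of join types is taken from the description above.

```rust
/// Hypothetical local stand-in for DataFusion's `JoinType`; this sketch does
/// not depend on the real crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
    LeftMark,
    RightMark,
}

impl JoinType {
    /// Returns true when an empty build side guarantees empty join output,
    /// following the join types listed in the PR description.
    /// The `self` receiver is an assumption of this sketch.
    pub fn empty_build_side_produces_empty_result(self) -> bool {
        matches!(
            self,
            JoinType::Inner
                | JoinType::Left
                | JoinType::LeftSemi
                | JoinType::LeftAnti
                | JoinType::LeftMark
                | JoinType::RightSemi
        )
    }
}

/// Simplified, hypothetical view of the stream states relevant here.
#[derive(Debug, PartialEq, Eq)]
pub enum StreamState {
    FetchProbeBatch,
    Completed,
}

/// Assumed shape of the post-build decision: finish immediately when the
/// build side is empty and the join type cannot produce output, otherwise
/// start consuming the probe side.
pub fn state_after_build_ready(join_type: JoinType, build_rows: usize) -> StreamState {
    if build_rows == 0 && join_type.empty_build_side_produces_empty_result() {
        StreamState::Completed
    } else {
        StreamState::FetchProbeBatch
    }
}
```

In this sketch, an INNER join with zero build rows goes straight to `Completed`, while a RIGHT join with zero build rows still moves to `FetchProbeBatch`, because its output depends on the probe rows.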
LiaCastaneda
approved these changes
Apr 16, 2026
Cherry picks apache#21068