Conversation
viirya
left a comment
There was a problem hiding this comment.
The fix looks good. It'd be better if we can add some e2e tests in sqllogictest.
@viirya Good suggestion. I tried added some sqllogictest, they work. But as far as I can tell they are not using HashJoinExec (even if the exact same query joining on only field is). Is there some way to force a HashJoin or help me understand why hashjoin is not used? |
Hmm, what join operator it is using? I think HashJoin is used by default. |
Just using |
|
Reading the explain more closely I see that the reason for it not using HashJoin seems to be that it fails to extract an equi-join condition when it is joining on structs. |
|
Seems like we need to add struct here: https://github.com/eejbyfeldt/datafusion/blob/2b851c5bbdaef7d1d78d91833d2a7e8d94809bc8/datafusion/expr/src/utils.rs#L830-L833 for hash join to be used. |
| # Join on struct | ||
| query ?? | ||
| select join_t3.s3, join_t4.s4 | ||
| from join_t3 | ||
| inner join join_t4 on join_t3.s3 = join_t4.s4 | ||
| ---- | ||
| {id: 2} {id: 2} |
There was a problem hiding this comment.
Maybe also add an explain query to make sure it is using HashJoin?
There was a problem hiding this comment.
Added for this query. The one using IS NOT DISTINCT FROM is still not using hash join. But that does not seem to be related to the struct, since it the same if just joining on a int.
There was a problem hiding this comment.
Hmm, so the query using IS NOT DISTINCT FROM is actually not testing HashJoin path.
There was a problem hiding this comment.
Sadly not. If we made some other improvements like #11272 it would.
But for not I just changed the test case to use EXCEPT instead and test that path. (The test case fails on master without the fixes in the branches)
77ba24a to
47a2e02
Compare
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* Fixes to 10749 and generalization * Add e2e tests for joins on struct * PR comments * Add Struct to can_hash method * Add explain query as well * Use EXCEPT to trigger failure * Update datafusion/sqllogictest/test_files/joins.slt Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> --------- Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* Fixes to 10749 and generalization * Add e2e tests for joins on struct * PR comments * Add Struct to can_hash method * Add explain query as well * Use EXCEPT to trigger failure * Update datafusion/sqllogictest/test_files/joins.slt Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> --------- Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Which issue does this PR close?
Does some follow up to #10749 and it PR #11117 and follow up #11149.
Closes #11233
Rationale for this change
In #11117 hash join was extended to handled nested types. The fixed contained a small issues around handling of
null_equals_nullwhere it only used the new compare function whennull_equals_nullwas true. But it also used theOperator::Eqwhich does not consider nulls equal.What changes are included in this PR?
This address the 2 described issues. Makes hash join also support nested types when
null_equals_nullis false and use the correctOperatorwhen it is true.Are these changes tested?
New test cases.
Are there any user-facing changes?
Yes, it fixes bugs in unreleased features.