Implement comparisons on nested data types such that distinct/except would work#11117
Implement comparisons on nested data types such that distinct/except would work#11117alamb merged 1 commit intoapache:mainfrom
Conversation
…would work This relies on newer functionality in arrow 52 and allows DataFrame.except() to properly work on schemas with structs and lists Closes apache#10749
alamb
left a comment
There was a problem hiding this comment.
Thank you @rtyler -- I think this is a nice improvement. I left some suggestions on how to improve comments / naming but I do think they could go in a follow on PR
It might also make sense to see if there are other kernels which need the same handling (e.g. eq_dyn for example)
| if left.data_type().is_nested() && null_equals_null { | ||
| let cmp = make_comparator(left, right, SortOptions::default())?; | ||
| let len = left.len().min(right.len()); | ||
| let values = (0..len).map(|i| cmp(i, i).is_eq()).collect(); |
There was a problem hiding this comment.
this is likely quite slow as it will be doing dynamic dispatch per row.
However, slow is better than not working at first.
Could you please: update the name of the function to reflect it isn't just for null anymore? Perhaps we could rename it to eq_dyn or something more generic
There was a problem hiding this comment.
I think other than the potential rename the PR is ready to go -- however I also think we could do the rename as a follow on PR
Note @jayzhan211 added similiar code to handle nested comparisons in eq_datum in #11091 -- I wonder if we would consolidate those implementations somehow
There was a problem hiding this comment.
We could do the comparison with datum function, I move it to physical-common in #11091
It will be a nice alternative for equal_rows_arr
|
I am quite indifferent to the solution here as long as #10749 is resolved 😄 Happy to have this closed out in favor of a better implementation! |
This PR is great and I think a step forward (the code no longer errors!) I'll make a follow on PR to try and simplify the implementation. |
|
Thanks again @rtyler and @jayzhan211 |
|
Filed #11149 with a proposed simpler implementation |
…would work (apache#11117) This relies on newer functionality in arrow 52 and allows DataFrame.except() to properly work on schemas with structs and lists Closes apache#10749
Which issue does this PR close?
Closes #10749
Rationale for this change
This relies on newer functionality in arrow 52 and allows DataFrame.except() to properly work on schemas with structs and lists. I'm not sure if this is the appropriate way to handle this change per se, but I included the regression case from the issue as a test in order to demonstrate the correction of the issue
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?