[GLUTEN-11330][VL] Make PartialProject support array and map with null values #11331
Conversation
jinchengchenghh left a comment:
Thanks for your enhancement!
Review comment on this hunk:

    import org.apache.spark.sql.vectorized.ColumnVector;

    /**
     * Because `get` method in `ColumnarArray` don't check whether the data to get is null and arrow
Is this a Spark shortcoming or a deliberate design? What is Spark's intended usage for ColumnarArray with null values?
I think it is a Spark shortcoming. If there are null values, ColumnarArray will still return a value (possibly a default or a previously set value), because calling get on ColumnarArray eventually calls getXXX on the underlying ColumnVector, and getXXX does not check for null either.
Could you also raise an issue in Spark?
OK, I will. Thanks.
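The failure mode discussed above can be sketched in plain Java. The classes below are hypothetical stand-ins for Spark's ColumnVector/ColumnarArray (not the real Spark classes): a getter that skips the null check surfaces a stale buffer value, while a handleNull-style getter checks isNullAt first, which is the behavior this PR adds.

```java
// Minimal stand-in for a columnar int vector with a validity (null) mask.
// Illustrative only; not the actual Spark or Gluten classes.
final class FakeIntVector {
    private final int[] data;
    private final boolean[] nulls;

    FakeIntVector(int[] data, boolean[] nulls) {
        this.data = data;
        this.nulls = nulls;
    }

    boolean isNullAt(int i) { return nulls[i]; }

    // Like ColumnVector.getInt as described above: does NOT consult the
    // null mask, so a null slot yields whatever the buffer happens to hold.
    int getInt(int i) { return data[i]; }
}

public class NullCheckDemo {
    // Mirrors a get without a null check: returns a stale value for null slots.
    static Object getNoNullCheck(FakeIntVector v, int i) {
        return v.getInt(i);
    }

    // Mirrors the handleNull = true behavior: check isNullAt before reading.
    static Object getWithNullCheck(FakeIntVector v, int i) {
        if (v.isNullAt(i)) {
            return null;
        }
        return v.getInt(i);
    }

    public static void main(String[] args) {
        // Slot 1 is null; its backing buffer still holds a stale value 99.
        FakeIntVector v = new FakeIntVector(new int[]{7, 99, 3},
                                            new boolean[]{false, true, false});
        System.out.println(getNoNullCheck(v, 1));   // prints 99 (stale value)
        System.out.println(getWithNullCheck(v, 1)); // prints null
    }
}
```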
Co-authored-by: jiangtian <JT2677636391@outlook.com>
What changes are proposed in this pull request?
This PR introduces a new class named ArrowColumnarArray. Its implementation is copied from Spark 4.0, except that the handleNull parameter is set to true when we call SpecializedGettersReader.read in get. This means that when accessing an array element we first check whether it is null, so we avoid throwing an exception when accessing a null value of an array.

Besides, this PR introduces another new class named ArrowColumnarMap. It defines two fields of type ArrowColumnarArray to represent the keys and the values separately. With this class, we can likewise avoid throwing an exception when accessing a null value of a map.

How was this patch tested?
Unit tests.
Related issue: #11330
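The key/value layout described for ArrowColumnarMap can be sketched as follows. The names (NullSafeIntArray, ColumnarMapSketch) are illustrative assumptions, not the actual Gluten classes: a map view backed by two parallel arrays, where each array's getter checks the null mask so a null value comes back as null instead of garbage.

```java
// Null-safe array view: get() consults the validity mask before reading,
// analogous to the handleNull = true behavior described in the PR.
final class NullSafeIntArray {
    private final int[] data;
    private final boolean[] nulls;

    NullSafeIntArray(int[] data, boolean[] nulls) {
        this.data = data;
        this.nulls = nulls;
    }

    // Returns null for a null slot instead of a stale buffer value.
    Integer get(int i) { return nulls[i] ? null : data[i]; }

    int length() { return data.length; }
}

// Sketch of a columnar map backed by two parallel arrays (keys, values),
// mirroring the two-ArrowColumnarArray design described above.
public class ColumnarMapSketch {
    private final NullSafeIntArray keys;
    private final NullSafeIntArray values;

    ColumnarMapSketch(NullSafeIntArray keys, NullSafeIntArray values) {
        this.keys = keys;
        this.values = values;
    }

    // Linear lookup over the parallel arrays; a null value no longer
    // throws or surfaces garbage, it is returned as null.
    Integer lookup(int key) {
        for (int i = 0; i < keys.length(); i++) {
            Integer k = keys.get(i);
            if (k != null && k == key) {
                return values.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NullSafeIntArray ks = new NullSafeIntArray(new int[]{1, 2},
                                                   new boolean[]{false, false});
        // The value for key 2 is null; its buffer slot still holds 42.
        NullSafeIntArray vs = new NullSafeIntArray(new int[]{10, 42},
                                                   new boolean[]{false, true});
        ColumnarMapSketch m = new ColumnarMapSketch(ks, vs);
        System.out.println(m.lookup(1)); // prints 10
        System.out.println(m.lookup(2)); // prints null
    }
}
```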