Description
Describe the bug
I am hitting an index out of bounds error in the Avro reader. From the stack trace I can see that the null buffer is initialized with the wrong size when the data type is a nested nullable struct array. The cause is that nullable values are Union[_, Array] instead of just Array, so array_item_count is calculated incorrectly in datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs:575.
The issue can be solved by applying the maybe_resolve_union function.
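To illustrate the miscount, here is a minimal sketch. It does not use the real DataFusion or apache_avro types: the `Value` enum, both counting functions, and `sample_rows` are hypothetical stand-ins, and the body of `maybe_resolve_union` is an assumption about what the function of that name does (unwrap one level of union). The counting logic, including the catch-all arm that counts a row as one item, is likewise assumed for illustration.

```rust
/// Hypothetical, simplified stand-in for an Avro value.
/// This is NOT the real `apache_avro::types::Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Long(i64),
    Array(Vec<Value>),
    /// A union carries the index of the selected branch and the wrapped value.
    Union(u32, Box<Value>),
}

/// Assumed behavior: unwrap one level of union, leave other values untouched.
fn maybe_resolve_union(value: &Value) -> &Value {
    match value {
        Value::Union(_, inner) => inner,
        other => other,
    }
}

/// Counts items without looking inside unions. A nullable array arrives as
/// `Union(_, Array(..))`, so it falls into the catch-all arm and counts as 1.
fn item_count_without_resolving(rows: &[Value]) -> usize {
    rows.iter()
        .map(|row| match row {
            Value::Array(values) => values.len(),
            _ => 1,
        })
        .sum()
}

/// Same count, but each row is unwrapped first, so the real array length is used.
fn item_count_with_resolving(rows: &[Value]) -> usize {
    rows.iter()
        .map(|row| match maybe_resolve_union(row) {
            Value::Array(values) => values.len(),
            _ => 1,
        })
        .sum()
}

/// Two rows of a nullable array column: one with two items, one null.
fn sample_rows() -> Vec<Value> {
    vec![
        Value::Union(
            1,
            Box::new(Value::Array(vec![Value::Long(1), Value::Long(2)])),
        ),
        Value::Union(0, Box::new(Value::Null)),
    ]
}
```

With `sample_rows()`, the first function undercounts the items because it never sees the inner array. A null buffer sized from that smaller count would be too short for the bits that are set later, which matches the out of bounds panic in `set_bit` shown in the stack trace.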
To Reproduce
Read an Avro file that contains a column with the following schema:
{
  "name": "some_array",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Item",
        "fields": [
          {
            "name": "id",
            "type": "long"
          }
        ]
      }
    }
  ]
}
Expected behavior
No response
Additional context
Stacktrace:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:74:14
   2: core::panicking::panic_bounds_check
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:276:5
   3: arrow_buffer::util::bit_util::set_bit
             at /Users/JONSchmi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-52.2.0/src/util/bit_util.rs:55:5
   4: datafusion::datasource::avro_to_arrow::arrow_array_reader::AvroArrowArrayReader<R>::build_nested_list_array::{{closure}}::{{closure}}
             at /Users/JONSchmi/data-platform/services/kafka-ingest-lambda/datafusion/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs:598:41