Index out of bounds error during read of an Avro file #12682

@JonasDev1

Describe the bug

I am hitting an index out of bounds error in the Avro reader. From the stack trace I can see that the null_buffer is initialized with the wrong size when the data type is a nested nullable array of structs. The cause is that nullable values are Union[_, Array] instead of just Array, so array_item_count is calculated incorrectly in datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs:575.

The issue can be solved by resolving the union with the maybe_resolve_union function before the array items are counted.
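
To illustrate the miscount described above, here is a minimal, self-contained sketch. It does not use the real crates: `Value` is a simplified stand-in for `apache_avro::types::Value`, the `maybe_resolve_union` shown here is a hypothetical reimplementation of the idea (unwrap one union layer), and `count_items_naive`, `count_items_resolved`, and `item` are illustrative names. It assumes the item count matches on `Value::Array` directly and counts everything else as one item.

```rust
// Simplified stand-in for apache_avro::types::Value (only the variants needed here).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Long(i64),
    Array(Vec<Value>),
    Union(u32, Box<Value>),
    Record(Vec<(String, Value)>),
}

// Hypothetical reimplementation: unwrap one union layer, otherwise return the value as-is.
fn maybe_resolve_union(value: &Value) -> &Value {
    match value {
        Value::Union(_, inner) => inner.as_ref(),
        other => other,
    }
}

// Assumed shape of the miscount: a Union-wrapped array falls into the catch-all arm.
fn count_items_naive(rows: &[Value]) -> usize {
    rows.iter()
        .map(|row| match row {
            Value::Array(values) => values.len(),
            _ => 1,
        })
        .sum()
}

// Same count, but the union is resolved first so the inner array is seen.
fn count_items_resolved(rows: &[Value]) -> usize {
    rows.iter()
        .map(|row| match maybe_resolve_union(row) {
            Value::Array(values) => values.len(),
            _ => 1,
        })
        .sum()
}

// Helper building a record with a single "id" field, like the Item record in the schema.
fn item(id: i64) -> Value {
    Value::Record(vec![("id".to_string(), Value::Long(id))])
}

fn main() {
    let rows = vec![
        // A non-null row: union branch 1 holds an array of three records.
        Value::Union(1, Box::new(Value::Array(vec![item(1), item(2), item(3)]))),
        // A null row: union branch 0 holds null.
        Value::Union(0, Box::new(Value::Null)),
    ];
    println!("naive count:    {}", count_items_naive(&rows)); // prints 2
    println!("resolved count: {}", count_items_resolved(&rows)); // prints 4
}
```

In this sketch the naive count is smaller than the number of items that are later written, which is the kind of mismatch that would leave a null buffer too small.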

To Reproduce

Read an Avro file that contains a column in the following format:

{
  "name": "some_array",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Item",
        "fields": [
          {
            "name": "id",
            "type": "long"
          }
        ]
      }
    }
  ]
}

Expected behavior

No response

Additional context

Stacktrace:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:74:14
   2: core::panicking::panic_bounds_check
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:276:5
   3: arrow_buffer::util::bit_util::set_bit
             at /Users/JONSchmi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-52.2.0/src/util/bit_util.rs:55:5
   4: datafusion::datasource::avro_to_arrow::arrow_array_reader::AvroArrowArrayReader<R>::build_nested_list_array::{{closure}}::{{closure}}
             at /Users/JONSchmi/data-platform/services/kafka-ingest-lambda/datafusion/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs:598:41
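
The panic message refers to a byte index, not an item index. The sketch below shows one way such a message could arise, assuming the validity bitmap is a byte slice with one bit per item and that a bit at position `i` lives in byte `i / 8`. `null_buffer_len` and `try_set_bit` are hypothetical helpers, not the arrow-buffer API, and the item counts are made up for illustration; the actual counts in the failing file are not known.

```rust
// Bytes needed for a validity bitmap with one bit per item.
fn null_buffer_len(item_count: usize) -> usize {
    (item_count + 7) / 8
}

// Non-panicking bit setter: reports the out-of-bounds byte access instead of panicking.
fn try_set_bit(data: &mut [u8], i: usize) -> Result<(), String> {
    let len = data.len();
    match data.get_mut(i / 8) {
        Some(byte) => {
            *byte |= 1u8 << (i % 8);
            Ok(())
        }
        None => Err(format!(
            "index out of bounds: the len is {} but the index is {}",
            len,
            i / 8
        )),
    }
}

fn main() {
    // Buffer sized from an undercount of 1 item, while 9 items are actually written.
    let mut buf = vec![0u8; null_buffer_len(1)];
    for i in 0..9 {
        if let Err(msg) = try_set_bit(&mut buf, i) {
            println!("bit {}: {}", i, msg);
        }
    }
}
```

With a one-byte buffer, bits 0 through 7 fit, and the first write past them targets byte 1, which matches the "len is 1 but the index is 1" wording.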

Labels: bug
