Regression in parquet 42.0.0: Bad parquet column indexes for all-null columns, resulting in "Parquet error: StructArrayReader out of sync" on read #4459

@tustvold

Describe the bug

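Writing a column that consists entirely of nulls produces an empty offset index for that column. The following test expects one page entry per column, but the all-null column `b` ends up with none: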

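// Imports assumed for a standalone test (paths as in the arrow-rs 42 crates).
use std::sync::Arc;

use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use arrow_buffer::NullBuffer;
use bytes::Bytes;
use parquet::arrow::ArrowWriter;
use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::{ReadOptionsBuilder, SerializedFileReader};

#[test]
fn test_writer_all_null() {
    // Column "a" has no nulls; column "b" is entirely null.
    let a = Int32Array::from(vec![1, 2, 3, 4, 5]);
    let b = Int32Array::new(vec![0; 5].into(), Some(NullBuffer::new_null(5)));
    let batch = RecordBatch::try_from_iter(vec![
        ("a", Arc::new(a) as ArrayRef),
        ("b", Arc::new(b) as ArrayRef),
    ])
    .unwrap();

    // Write the batch to an in-memory parquet file.
    let mut buf = Vec::with_capacity(1024);
    let mut writer = ArrowWriter::try_new(&mut buf, batch.schema(), None).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();

    // Read the metadata back with the page index enabled.
    let bytes = Bytes::from(buf);
    let options = ReadOptionsBuilder::new().with_page_index().build();
    let reader = SerializedFileReader::new_with_options(bytes, options).unwrap();
    let index = reader.metadata().offset_index().unwrap();

    assert_eq!(index.len(), 1); // 1 row group
    assert_eq!(index[0].len(), 2); // 2 columns
    assert_eq!(index[0][0].len(), 1); // 1 page
    assert_eq!(index[0][1].len(), 1); // 1 page, even though every value is null
}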
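
The assertions above index the offset index as `[row group][column][page]`. To check an arbitrary file for the same symptom, a small helper can list every column chunk whose page list is empty. This is only a sketch: `columns_missing_pages` is a hypothetical name, not part of the parquet crate, and it is generic over the page-location type so that it depends on nothing beyond the nested-`Vec` shape the test relies on.

```rust
/// Returns the `(row_group, column)` positions whose page list is empty.
///
/// `index` is assumed to be nested as `[row group][column][page]`;
/// `T` stands in for whatever page-location type the reader returns.
fn columns_missing_pages<T>(index: &[Vec<Vec<T>>]) -> Vec<(usize, usize)> {
    let mut missing = Vec::new();
    for (rg, columns) in index.iter().enumerate() {
        for (col, pages) in columns.iter().enumerate() {
            // A written column chunk should have at least one page entry.
            if pages.is_empty() {
                missing.push((rg, col));
            }
        }
    }
    missing
}
```

Assuming `offset_index()` returns the nested vectors used in the test, `columns_missing_pages(index)` would presumably report `(0, 1)` for the file written above while the regression is present, and an empty list once it is fixed.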

This appears to be a regression introduced by #4389, in particular by this change:

https://github.com/apache/arrow-rs/pull/4389/files#diff-b1859e4da1d85e57a4185dc407458ac83a369dac132285689c27e878e3695ad6R695

Labels: bug, parquet
