Java Arrow RecordBatch may have a logical row count that differs from the physical row count of its arrays #973

@viirya

Description

Describe the bug

Integrating Comet with Iceberg internally produces the following error when the Iceberg table contains deleted rows:

org.apache.comet.CometNativeException: Invalid argument error: all columns in a record batch must have the specified row count

This happens because Iceberg stores row mappings in its CometVector implementations and uses them to skip deleted rows while iterating over a batch. The row values are never physically removed from the arrays; the Iceberg batch reader simply sets the row count of the returned record batch to the "logical" number of rows remaining after deletion. Java Arrow accepts this.

However, arrow-rs performs a stricter check that every array's length matches the declared row count. Once it detects an inconsistency, an error like the one above is thrown.
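The mismatch can be sketched in plain Python (this mimics the arrow-rs validation behavior; the `RecordBatch` class, `row_mapping`, and data below are illustrative assumptions, not actual Arrow code):

```python
class RecordBatch:
    """Toy model of arrow-rs's RecordBatch validation: every column's
    physical length must equal the declared row count."""

    def __init__(self, columns, row_count):
        for col in columns:
            if len(col) != row_count:
                raise ValueError(
                    "all columns in a record batch must have the "
                    "specified row count"
                )
        self.columns = columns
        self.row_count = row_count


# Physical data: 5 rows, of which rows at index 1 and 3 were deleted
# in the Iceberg table. The values themselves are not removed.
ids = [10, 11, 12, 13, 14]
row_mapping = [0, 2, 4]  # logical row -> physical index after deletes

# Java Arrow tolerates declaring a "logical" row count of 3 on
# length-5 vectors; the arrow-rs style check above rejects it.
try:
    RecordBatch([ids], row_count=len(row_mapping))
except ValueError as e:
    print(e)  # -> all columns in a record batch must have the specified row count
```

One way to make the two sides agree is for the Java side to materialize the deletions (e.g. compact or take the surviving rows) before handing the batch to native code, so that the physical and logical row counts match.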

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response
