List Row Encoding Sorts Incorrectly #5807

@tustvold

Describe the bug

The list encoding concatenates the encoded rows of the child values consecutively when constructing a row, followed by the row lengths and then the element count:

/// ```text
///                         ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
///  [1_u8, 2_u8, 3_u8]     │01│01│01│02│01│03│00│00│00│02│00│00│00│02│00│00│00│02│00│00│00│03│
///                         └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
///                          └──── rows ────┘   └───────── row lengths ─────────┘  └─ count ─┘
///
///                         ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
///  [1_u8, null]           │01│01│00│00│00│00│00│02│00│00│00│02│00│00│00│02│
///                         └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
///```

When two rows are compared, this can cause the bytes of an element row in one list to be compared against the bytes of a row length in the other, leading to arbitrary results.
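To make the failure concrete, here is a minimal sketch that hand-builds the bytes according to the layout in the diagram above and compares them lexicographically. It is a simplified model, not the arrow crate's implementation: `encode_list` is a hypothetical helper, it uses `Option<u8>` elements as in the diagram, and it assumes a null element encodes as `00 00`, as the `[1_u8, null]` example shows.

```rust
use std::cmp::Ordering;

/// Hypothetical helper modelling the layout in the diagram:
/// element rows, then one big-endian u32 length per element, then a u32 count.
fn encode_list(values: &[Option<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    // Element rows: a validity byte followed by the value (nulls are 00 00).
    for v in values {
        match v {
            Some(x) => out.extend_from_slice(&[1, *x]),
            None => out.extend_from_slice(&[0, 0]),
        }
    }
    // Row lengths: every element row here is 2 bytes long.
    for _ in values {
        out.extend_from_slice(&2u32.to_be_bytes());
    }
    // Element count.
    out.extend_from_slice(&(values.len() as u32).to_be_bytes());
    out
}

fn main() {
    let a = encode_list(&[None]);
    let b = encode_list(&[None, None]);
    println!("{:02x?}", a);
    println!("{:02x?}", b);
    // The first row length of `a` lines up with the second element row of `b`,
    // so byte-wise comparison puts the shorter list after the longer one.
    assert_eq!(a.cmp(&b), Ordering::Greater);
}
```

In this model the row length `00 00 00 02` of `[null]` is compared against the second `00 00` element row and the start of the first length of `[null, null]`, which is the misalignment described above.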

To Reproduce

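use std::{cmp::Ordering, sync::Arc};
use arrow::array::{Int8Builder, ListBuilder};
use arrow::row::{RowConverter, SortField};

// Two lists of nulls: [null] and [null, null]
let mut a = ListBuilder::new(Int8Builder::new());
a.append_value([None]);
a.append_value([None, None]);
let a = a.finish();
let converter = RowConverter::new(vec![SortField::new(a.data_type().clone())]).unwrap();
let rows = converter.convert_columns(&[Arc::new(a) as _]).unwrap();
// [null] is a prefix of [null, null], so it should sort first
assert_eq!(rows.row(0).cmp(&rows.row(1)), Ordering::Less);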

Expected behavior

Additional context
