[C#] ArrowStreamWriter writes FieldNodes in wrong order #16803

@asfimport

Description

When ArrowStreamWriter writes a RecordBatch that contains nulls, it mixes up the columns' NullCount values, so a column's NullCount can end up attached to a different column.

You can see it in the loop that creates the field nodes:

for (var i = 0; i < fieldCount; i++)
{
    var fieldArray = recordBatch.Column(i);
    fieldNodeOffsets[i] =
        Flatbuf.FieldNode.CreateFieldNode(Builder, fieldArray.Length, fieldArray.NullCount);
}

It writes the field nodes in order, from 0 up to fieldCount, but further down it writes the fields in the opposite order, from fieldCount down to 0.

The Java implementation has this comment:

// struct vectors have to be created in reverse order
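
The reason for that comment is that the FlatBuffers builder fills its buffer from the end toward the start, so each struct added to a vector lands in front of the structs added before it. The C# sketch below models only that prepend behavior, as an assumption about how the builder lays out struct vectors; it does not use Apache.Arrow or the FlatBuffers library, and `BackToFrontBuilder` and `FieldNodeOrderDemo` are hypothetical names. It uses the two columns from the repro in this issue ("age" with one null, "CharCount" with none).

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a FlatBuffers builder. It only models one property:
// the buffer is filled back-to-front, so every new struct is placed *before*
// the structs that were already written.
public sealed class BackToFrontBuilder
{
    private readonly LinkedList<(long Length, long NullCount)> _buffer =
        new LinkedList<(long Length, long NullCount)>();

    public void AddStruct(long length, long nullCount) =>
        _buffer.AddFirst((length, nullCount));

    // A reader walks the finished vector front-to-back, index 0 first.
    public List<(long Length, long NullCount)> Finish() =>
        new List<(long Length, long NullCount)>(_buffer);
}

public static class FieldNodeOrderDemo
{
    // Columns from the issue's repro: "age" has one null, "CharCount" has none.
    public static readonly (string Name, long Length, long NullCount)[] Columns =
    {
        ("age", 1, 1),
        ("CharCount", 1, 0),
    };

    // Pattern from the quoted loop: structs written in column order 0..n-1.
    public static List<(long Length, long NullCount)> WriteForward()
    {
        var builder = new BackToFrontBuilder();
        for (var i = 0; i < Columns.Length; i++)
            builder.AddStruct(Columns[i].Length, Columns[i].NullCount);
        return builder.Finish();
    }

    // Pattern the Java comment describes: structs written in reverse, n-1..0.
    public static List<(long Length, long NullCount)> WriteReverse()
    {
        var builder = new BackToFrontBuilder();
        for (var i = Columns.Length - 1; i >= 0; i--)
            builder.AddStruct(Columns[i].Length, Columns[i].NullCount);
        return builder.Finish();
    }
}
```

In this model, `WriteForward()` reads back NullCount 0 at index 0 ("age") and NullCount 1 at index 1 ("CharCount"), which is the swap reported here, while `WriteReverse()` preserves the original counts.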

A simple round-trip test with the following RecordBatch shows the issue:

using Apache.Arrow;
using Apache.Arrow.Types;

var result = new RecordBatch(
    new Schema.Builder()
        .Field(f => f.Name("age").DataType(Int32Type.Default))
        .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
        .Build(),
    new IArrowArray[]
    {
        // "age": validity bitmap byte 0 marks the single value as null
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(0).Build(),
            new ArrowBuffer.Builder<byte>().Append(0).Build(),
            length: 1,
            nullCount: 1,
            offset: 0),
        // "CharCount": no nulls, so an empty validity bitmap is passed
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(7).Build(),
            ArrowBuffer.Empty,
            length: 1,
            nullCount: 0,
            offset: 0)
    },
    length: 1);

Here, the "age" column should contain a null. However, after this RecordBatch is written and read back, the "CharCount" column has NullCount == 1 and the "age" column has NullCount == 0.

Reporter: Eric Erhardt / @eerhardt
Assignee: Eric Erhardt / @eerhardt

PRs and other links:

Note: This issue was originally created as ARROW-5887. Please see the migration documentation for further details.
