C++: Writing sliced record batch to IPC writes the entire array #15387

@asfimport

Description

The bug can be triggered from Python:

import pyarrow
import pyarrow.ipc

array = pyarrow.array([1] * 1000000)

rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
rb2 = rb.slice(0, 2)  # keep only the first two rows

with open('/tmp/t.arrow', 'wb') as f:
  w = pyarrow.ipc.new_file(f, rb.schema)
  w.write_batch(rb2)  # write the two-row slice, not the full batch
  w.close()

This produces a large file, even though rb2 contains only two rows:

$ ll /tmp/t.arrow 
-rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow

Reporter: Itai Incze / @itaiin
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-809. Please see the migration documentation for further details.
