The bug can be triggered from Python:
import pyarrow.parquet
array = pyarrow.array.from_pylist([1] * 1000000)
rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
rb2 = rb.slice(0, 2)
with open('/tmp/t.arrow', 'wb') as f:
    w = pyarrow.ipc.FileWriter(f, rb.schema)
    w.write_batch(rb2)
    w.close()
This results in a large file, even though the slice written holds only two rows:
$ ll /tmp/t.arrow
-rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow
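For scale, here is a rough sanity check of how far the reported file size is from the data actually in the slice. It is only a sketch: it assumes the column is stored as 64-bit integers (8 bytes per value) with no nulls, and `payload_bytes` and `overhead_ratio` are hypothetical helpers, not part of pyarrow. Some fixed overhead for the schema and footer is expected in any file, so the ratio is only indicative.

```python
def payload_bytes(num_rows: int, bytes_per_value: int = 8) -> int:
    """Raw data size of a fixed-width column without nulls.

    Hypothetical helper; 8 bytes per value assumes 64-bit integers.
    """
    return num_rows * bytes_per_value


def overhead_ratio(file_size: int, num_rows: int, bytes_per_value: int = 8) -> float:
    """How many times larger the file is than the raw values it should hold."""
    return file_size / payload_bytes(num_rows, bytes_per_value)


reported_size = 800618  # bytes, taken from the directory listing above
sliced_rows = 2         # rb.slice(0, 2) keeps two rows

print(payload_bytes(sliced_rows))                  # 16
print(overhead_ratio(reported_size, sliced_rows))  # 50038.625
```

A file holding only the two sliced values would need 16 bytes of column data plus metadata, so a size in the hundreds of kilobytes indicates that far more than the slice was written.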
Reporter: Itai Incze / @itaiin
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-809. Please see the migration documentation for further details.