Title: Null values replaced with type defaults when loading multi-batch Arrow IPC streams
This repository is a minimal reproduction for an issue where loading an Arrow IPC stream containing multiple record batches into Perspective appears to replace some null values with type-specific defaults. The same logical data, encoded as a single record batch, preserves null values correctly.
This repro intentionally avoids the Perspective datagrid and any backend/networking code so the issue can be demonstrated with the smallest possible surface area.
Repository contents:
.
├── package.json
├── vite.config.js
└── repro.html
- repro.html: primary browser repro using @perspective-dev/client and apache-arrow
- package.json: JavaScript dependencies and scripts for the browser repro
- vite.config.js: Vite configuration
The repro constructs three one-row Arrow RecordBatch objects, each with the same schema. The input logical data is:
[
{ "Identifier": "A", "Value": null, "Date": null },
{ "Identifier": "B", "Value": 5, "Date": null },
{ "Identifier": "C", "Value": null, "Date": "2025-06-15" }
]
The repro compares two cases:
- Multi-batch Arrow IPC stream:
  - 3 record batches
  - 1 row per batch
- Single-batch Arrow IPC stream:
  - 1 record batch
  - 3 rows
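To make the two layouts concrete, here is a minimal sketch in plain JavaScript. It does not use any Arrow or Perspective API: plain arrays stand in for record batches, and `toBatches` is a hypothetical helper introduced only for illustration. It shows that both layouts carry the same logical rows, which is the property the repro relies on.

```javascript
// The three logical rows from the repro input.
const rows = [
  { Identifier: "A", Value: null, Date: null },
  { Identifier: "B", Value: 5, Date: null },
  { Identifier: "C", Value: null, Date: "2025-06-15" },
];

// Hypothetical helper: split rows into consecutive batches of `size` rows.
// Plain arrays stand in for Arrow RecordBatch objects here.
function toBatches(allRows, size) {
  const batches = [];
  for (let i = 0; i < allRows.length; i += size) {
    batches.push(allRows.slice(i, i + size));
  }
  return batches;
}

const multiBatch = toBatches(rows, 1);            // 3 batches, 1 row each
const singleBatch = toBatches(rows, rows.length); // 1 batch, 3 rows

// Concatenating the batches of either layout should give back the same rows.
const flatten = (batches) => batches.flat();
const sameLogicalData =
  JSON.stringify(flatten(multiBatch)) === JSON.stringify(flatten(singleBatch));

console.log(multiBatch.length, singleBatch.length, sameLogicalData);
```

In the actual repro the batches are Arrow RecordBatch objects serialized to IPC streams; this model only captures how the rows are grouped.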
Install dependencies and run the local dev server:
npm install
npm run dev
Open the local URL printed by Vite, usually:
http://127.0.0.1:5173/repro.html
Then open the browser developer console and compare the logged output for:
Input Values
Input Dates
Multi-batch output
Single-batch output
Both the multi-batch and single-batch Arrow IPC streams should preserve null values.
Expected output:
[
{ "Identifier": "A", "Value": null, "Date": null },
{ "Identifier": "B", "Value": 5, "Date": null },
{ "Identifier": "C", "Value": null, "Date": 1749945600000 }
]
I would expect multi-batch Arrow IPC streams to be supported equivalently to single-batch Arrow IPC streams, since both represent the same logical Arrow table.
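The input lists the date as "2025-06-15", while the expected output shows 1749945600000. As a quick check, and assuming the date is interpreted as midnight UTC and reported in milliseconds since the Unix epoch, the two can be related like this:

```javascript
// Date-only ISO strings are parsed as UTC midnight by Date.parse.
const inputDate = "2025-06-15";
const epochMs = Date.parse(inputDate);

// Equivalent explicit construction; months are zero-based in Date.UTC.
const epochMsExplicit = Date.UTC(2025, 5, 15);

console.log(epochMs, epochMsExplicit);
console.log(new Date(epochMs).toISOString());
```

Under that assumption, both values match the number shown in the expected output.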
The single-batch Arrow IPC stream preserves null values correctly.
The multi-batch Arrow IPC stream appears to replace some null values with type-specific defaults.
Observed:
Multi-batch output:
{'Identifier': 'A', 'Value': 0, 'Date': -62167305600000}
{'Identifier': 'B', 'Value': 5, 'Date': -62167305600000}
{'Identifier': 'C', 'Value': null, 'Date': 1749945600000}
Single-batch output:
{'Identifier': 'A', 'Value': null, 'Date': null}
{'Identifier': 'B', 'Value': 5, 'Date': null}
{'Identifier': 'C', 'Value': null, 'Date': 1749945600000}
The incorrect values are visible in the console output from view.to_json().
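To pinpoint which cells differ, the two outputs can be compared cell by cell. The following is a minimal sketch in plain JavaScript; `findLostNulls` is a hypothetical helper, not part of Perspective or Arrow, and the row data is copied from the observed output above rather than fetched from a live view.

```javascript
// Rows copied from the observed output above.
const multiBatchOutput = [
  { Identifier: "A", Value: 0, Date: -62167305600000 },
  { Identifier: "B", Value: 5, Date: -62167305600000 },
  { Identifier: "C", Value: null, Date: 1749945600000 },
];
const singleBatchOutput = [
  { Identifier: "A", Value: null, Date: null },
  { Identifier: "B", Value: 5, Date: null },
  { Identifier: "C", Value: null, Date: 1749945600000 },
];

// Hypothetical helper: list every cell that is null in `expected`
// but non-null in `actual`. Rows are matched by position.
function findLostNulls(expected, actual) {
  const lost = [];
  expected.forEach((expectedRow, rowIndex) => {
    for (const [column, value] of Object.entries(expectedRow)) {
      const got = actual[rowIndex][column];
      if (value === null && got !== null) {
        lost.push({ row: rowIndex, column, got });
      }
    }
  });
  return lost;
}

const lostNulls = findLostNulls(singleBatchOutput, multiBatchOutput);
console.table(lostNulls);
```

With the data above, the helper reports the Value and Date cells of row A and the Date cell of row B, while the null in row C's Value column is preserved.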
Browser repro:
- @perspective-dev/client: 4.4.1
- apache-arrow: 21.1.0
- vite: 5.4.11
- Browser: Chrome Version 145.0.7632.160
- OS: macOS 26.3.1
I originally encountered this while working on an application using the Perspective datagrid. In that application, Arrow IPC data is produced by a backend that may naturally produce multiple record batches, then loaded into a Perspective-backed UI. I intentionally did not include the datagrid, backend, HTTP streaming, or app-specific code in this repro. The problem does not appear to be caused by HTTP streaming or by splitting network chunks, because the minimal repro constructs the Arrow data entirely in memory and still reproduces the issue.
A workaround is to consolidate the Arrow data into a single batch before serialization. In PyArrow, calling something like table.combine_chunks() before writing the IPC stream avoids the issue. However, that requires an additional copy and is expensive for large datasets.
Question: is loading multi-batch Arrow IPC streams expected to be supported by client.table()? If so, this appears to be a null-handling bug across record batch boundaries.
Link to Issue: perspective-dev/perspective#3169