[C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns

As found while working on ARROW-18004: the dataset scanner and the Acero engine rely on `ExecBatch::ToRecordBatch` returning successfully when the given schema has fewer fields than the ExecBatch has columns.

This apparently allows the dataset-added columns (`kAugmentedFields` in `arrow/dataset/scanner.cc`) to be dropped implicitly from a scan's final result.
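To make the reported behaviour concrete, here is a minimal sketch using toy stand-ins rather than Arrow types. `ToyColumn`, `ToyBatch`, `ToySchema`, `LenientToRecordBatch` and `MakeScanBatch` are all hypothetical names invented for illustration, and the `__fragment_index` / `__batch_index` column names are assumed examples of dataset-added fields. It only models the truncation described in this issue, not the actual Arrow implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins, NOT Arrow types: a column is a name plus some values.
struct ToyColumn {
  std::string name;
  std::vector<int> values;
};

using ToySchema = std::vector<std::string>;  // field names only
using ToyBatch = std::vector<ToyColumn>;

// Lenient conversion modelled on the behaviour described above: only the
// first schema.size() columns are kept; surplus columns vanish silently.
// Precondition: the batch has at least as many columns as the schema.
ToyBatch LenientToRecordBatch(const ToyBatch& batch, const ToySchema& schema) {
  assert(batch.size() >= schema.size());
  ToyBatch out;
  for (std::size_t i = 0; i < schema.size(); ++i) {
    out.push_back(ToyColumn{schema[i], batch[i].values});
  }
  return out;
}

// A scan result: two user columns followed by two dataset-added columns
// (the column names are assumptions made for this sketch).
ToyBatch MakeScanBatch() {
  ToyBatch batch;
  batch.push_back(ToyColumn{"a", {1, 2}});
  batch.push_back(ToyColumn{"b", {3, 4}});
  batch.push_back(ToyColumn{"__fragment_index", {0, 0}});
  batch.push_back(ToyColumn{"__batch_index", {0, 0}});
  return batch;
}
```

Calling `LenientToRecordBatch(MakeScanBatch(), ToySchema{"a", "b"})` yields just the two user columns, which is the convenient case. The same call with a schema that is too short by mistake, for example `ToySchema{"a"}`, also succeeds, and nothing signals that column `b` was lost.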

However, doing this implicitly at the `ExecBatch::ToRecordBatch` level seems wrong and brittle, since it can hide potential errors. Instead, the columns should probably be dropped explicitly inside Acero/dataset.
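One possible shape of the explicit alternative is sketched below, again with toy stand-ins rather than Arrow types. `DropAugmented` and `StrictToRecordBatch` are hypothetical helpers, not existing Arrow functions, and the list of augmented names is passed in by the caller as an assumption. The idea is that the caller removes the dataset-added columns by name first, so the conversion itself can reject any column-count mismatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins, NOT Arrow types.
struct ToyColumn {
  std::string name;
  std::vector<int> values;
};

using ToySchema = std::vector<std::string>;  // field names only
using ToyBatch = std::vector<ToyColumn>;

// Explicit step (hypothetical): keep only columns whose name is not listed
// in `augmented`.
ToyBatch DropAugmented(const ToyBatch& batch,
                       const std::vector<std::string>& augmented) {
  ToyBatch out;
  for (const ToyColumn& col : batch) {
    const bool is_augmented =
        std::find(augmented.begin(), augmented.end(), col.name) !=
        augmented.end();
    if (!is_augmented) out.push_back(col);
  }
  return out;
}

// Strict conversion (hypothetical): a column-count mismatch is reported as
// failure instead of being papered over by truncation.
bool StrictToRecordBatch(const ToyBatch& batch, const ToySchema& schema,
                         ToyBatch* out) {
  assert(out != nullptr);
  if (batch.size() != schema.size()) return false;
  out->clear();
  for (std::size_t i = 0; i < schema.size(); ++i) {
    out->push_back(ToyColumn{schema[i], batch[i].values});
  }
  return true;
}
```

With this split, converting a batch that still carries a dataset-added column fails loudly, while `StrictToRecordBatch(DropAugmented(batch, names), schema, &out)` succeeds. The truncation becomes a visible decision of the dataset layer rather than a side effect of the conversion.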


**Reporter**: [Antoine Pitrou](https://issues.apache.org/jira/browse/ARROW-18037) / @pitrou

#### Related issues:
- [[C++] ExecBatch conversion to RecordBatch may go out of bounds](https://github.com/apache/arrow/issues/33208) (is related to)

<sub>**Note**: *This issue was originally created as [ARROW-18037](https://issues.apache.org/jira/browse/ARROW-18037). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns #33240
