As found while working on ARROW-18004: the dataset scanner and the Acero engine rely on ExecBatch::ToRecordBatch returning successfully when the given schema has fewer fields than the ExecBatch has columns.
This apparently allows the dataset-added columns (kAugmentedFields in arrow/dataset/scanner.cc) to be dropped implicitly from a scan's final result.
However, it seems wrong and brittle to do this implicitly at the ExecBatch::ToRecordBatch level (hiding potential errors). Instead, it should probably be done explicitly inside Acero/dataset.
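To make the proposal concrete, here is a minimal, self-contained sketch of the explicit approach. It does not use Arrow's actual types or APIs: `ExecBatchModel`, `SchemaModel`, `RecordBatchModel`, `ToRecordBatchStrict` and `DropAugmentedThenConvert` are hypothetical stand-ins, and the name list is only an illustrative substitute for kAugmentedFields. The idea is that the conversion rejects any column/field count mismatch, while the caller (Acero/dataset) drops the augmented columns before converting. Requires C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT Arrow types. Each int stands in for one column.
struct ExecBatchModel {
  std::vector<int> columns;
};
using SchemaModel = std::vector<std::string>;  // field names, by position
struct RecordBatchModel {
  SchemaModel schema;
  std::vector<int> columns;
};

// Illustrative substitute for kAugmentedFields (assumed names).
const std::vector<std::string> kAugmentedNames = {
    "__fragment_index", "__batch_index", "__last_in_fragment", "__filename"};

bool IsAugmented(const std::string& name) {
  return std::find(kAugmentedNames.begin(), kAugmentedNames.end(), name) !=
         kAugmentedNames.end();
}

// Strict conversion: a column/field count mismatch is an error (nullopt),
// never a silent truncation.
std::optional<RecordBatchModel> ToRecordBatchStrict(const ExecBatchModel& batch,
                                                    const SchemaModel& schema) {
  if (batch.columns.size() != schema.size()) return std::nullopt;
  return RecordBatchModel{schema, batch.columns};
}

// Explicit drop, done by the caller: filter out augmented columns using the
// full schema, then convert with the strict function.
std::optional<RecordBatchModel> DropAugmentedThenConvert(
    const ExecBatchModel& batch, const SchemaModel& full_schema) {
  if (batch.columns.size() != full_schema.size()) return std::nullopt;
  ExecBatchModel kept;
  SchemaModel kept_schema;
  for (std::size_t i = 0; i < full_schema.size(); ++i) {
    if (!IsAugmented(full_schema[i])) {
      kept.columns.push_back(batch.columns[i]);
      kept_schema.push_back(full_schema[i]);
    }
  }
  // Invariant: columns and field names are filtered in lockstep.
  assert(kept.columns.size() == kept_schema.size());
  return ToRecordBatchStrict(kept, kept_schema);
}
```

With this split, passing a three-column batch together with a two-field schema directly to the strict conversion fails, whereas going through the explicit drop step yields only the non-augmented columns.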
Reporter: Antoine Pitrou / @pitrou
Note: This issue was originally created as ARROW-18037. Please see the migration documentation for further details.