[C++][Dataset] Make Expressions available for projection #27080

@asfimport

Description

RecordBatchProjector should be replaced by an expression calling the "project" compute function.

Projection currently supports only reordering and subselection of fields, materializing virtual columns where necessary. Replacing it with an Expression will make it possible to specify arbitrary expressions for projected columns:

// project an explicit selection:
// SELECT a as "a", b as "b" ...
project({field_ref("a"), field_ref("b")}, {"a", "b"});

// project an arithmetic expression:
// SELECT a + b as "a + b" ...
project({add(field_ref("a"), field_ref("b"))}, {"a + b"});
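
To make the intended semantics concrete, here is a minimal, self-contained sketch of evaluating such a projection against a single row. It is a toy model and not the Arrow API: the `Expr` struct, the `Row` alias, and the `Evaluate` and `Apply` helpers are hypothetical names introduced only for illustration, and values are restricted to 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree (hypothetical, NOT the Arrow API).
struct Expr {
  enum Kind { kFieldRef, kLiteral, kAdd } kind;
  std::string name;                         // used by kFieldRef
  std::int64_t value;                       // used by kLiteral
  std::vector<std::shared_ptr<Expr>> args;  // used by kAdd
};
using ExprPtr = std::shared_ptr<Expr>;
using Row = std::map<std::string, std::int64_t>;  // one row: column name -> value

ExprPtr field_ref(std::string name) {
  return std::make_shared<Expr>(Expr{Expr::kFieldRef, std::move(name), 0, {}});
}
ExprPtr literal(std::int64_t v) {
  return std::make_shared<Expr>(Expr{Expr::kLiteral, "", v, {}});
}
ExprPtr add(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Expr::kAdd, "", 0, {std::move(l), std::move(r)}});
}

// Recursively evaluate an expression against one row.
std::int64_t Evaluate(const Expr& e, const Row& row) {
  switch (e.kind) {
    case Expr::kFieldRef: return row.at(e.name);
    case Expr::kLiteral:  return e.value;
    case Expr::kAdd:      return Evaluate(*e.args[0], row) + Evaluate(*e.args[1], row);
  }
  return 0;  // unreachable for valid kinds
}

// A projection pairs each output column name with the expression producing it.
struct Projection {
  std::vector<ExprPtr> exprs;
  std::vector<std::string> names;
};

Projection project(std::vector<ExprPtr> exprs, std::vector<std::string> names) {
  assert(exprs.size() == names.size());
  return Projection{std::move(exprs), std::move(names)};
}

// Apply a projection to a row, yielding only the projected columns.
Row Apply(const Projection& p, const Row& row) {
  Row out;
  for (std::size_t i = 0; i < p.exprs.size(); ++i) {
    out[p.names[i]] = Evaluate(*p.exprs[i], row);
  }
  return out;
}
```

In this sketch, subselection and computed columns go through the same code path: a plain `field_ref` copies a column, while `add(...)` computes a new one.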

This will also allow the same expression optimization machinery used for filters to be directly applied to projections. Virtual columns become a consequence of constant folding:

// project in a partition where a == 3:
assert(
  SimplifyWithGuarantee(
    project({field_ref("a"), field_ref("b")}, {"a", "b"}),
    equal(field_ref("a"), literal(3))
  )
  == project({literal(3), field_ref("b")}, {"a", "b"})
);
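
The substitution step behind this can be sketched in a few lines. The following is a toy model under stated assumptions, not Arrow's implementation: the `Expr` struct and the `Equals` helper are hypothetical, only guarantees of the exact form `equal(field_ref, literal)` are recognized, and structural comparison uses `Equals` instead of `operator==`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical, NOT the Arrow API): an expression is a field
// reference, an integer literal, or a named call over argument expressions.
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;
struct Expr {
  enum Kind { kFieldRef, kLiteral, kCall } kind;
  std::string name;                // field name, or function name for kCall
  std::int64_t value;              // literal value
  std::vector<ExprPtr> args;       // call arguments
  std::vector<std::string> names;  // output column names (used by "project")
};

ExprPtr field_ref(std::string n) {
  return std::make_shared<Expr>(Expr{Expr::kFieldRef, std::move(n), 0, {}, {}});
}
ExprPtr literal(std::int64_t v) {
  return std::make_shared<Expr>(Expr{Expr::kLiteral, "", v, {}, {}});
}
ExprPtr equal(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Expr::kCall, "equal", 0, {std::move(l), std::move(r)}, {}});
}
ExprPtr project(std::vector<ExprPtr> exprs, std::vector<std::string> names) {
  return std::make_shared<Expr>(
      Expr{Expr::kCall, "project", 0, std::move(exprs), std::move(names)});
}

// Structural (deep) equality of two expression trees.
bool Equals(const ExprPtr& l, const ExprPtr& r) {
  if (l->kind != r->kind || l->name != r->name || l->value != r->value ||
      l->names != r->names || l->args.size() != r->args.size()) {
    return false;
  }
  for (std::size_t i = 0; i < l->args.size(); ++i) {
    if (!Equals(l->args[i], r->args[i])) return false;
  }
  return true;
}

// Replace references to a field whose value is guaranteed with that literal.
// Any guarantee not shaped like equal(field_ref, literal) leaves expr unchanged.
ExprPtr SimplifyWithGuarantee(const ExprPtr& expr, const ExprPtr& guarantee) {
  if (guarantee->kind != Expr::kCall || guarantee->name != "equal" ||
      guarantee->args.size() != 2 ||
      guarantee->args[0]->kind != Expr::kFieldRef ||
      guarantee->args[1]->kind != Expr::kLiteral) {
    return expr;
  }
  if (expr->kind == Expr::kLiteral) return expr;
  if (expr->kind == Expr::kFieldRef) {
    return expr->name == guarantee->args[0]->name ? guarantee->args[1] : expr;
  }
  Expr out = *expr;  // copy the call, then rewrite its arguments recursively
  for (auto& arg : out.args) arg = SimplifyWithGuarantee(arg, guarantee);
  return std::make_shared<Expr>(std::move(out));
}
```

With these definitions, simplifying `project({field_ref("a"), field_ref("b")}, {"a", "b"})` under `equal(field_ref("a"), literal(3))` yields a tree that `Equals` the projection with `literal(3)` in place of `field_ref("a")`, which mirrors how a virtual column would fall out of a partition guarantee.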

 

Reporter: Ben Kietzman / @bkietz
Assignee: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-11174. Please see the migration documentation for further details.
