[R] Use WriteNode in write_dataset() #30990

@asfimport

Currently, write_dataset() uses the Scanner interface, which cannot handle everything that an ExecPlan can. If your arrow_dplyr_query contains aggregations or (more importantly) joins, you have to materialize the result as a Table in memory before you can write it to disk. The WriteNode added in ARROW-13542 is a special sink node that can be placed at the end of an ExecPlan, so data should be able to stream to disk in more cases, and writes will benefit from future improvements to ExecPlan memory usage and spillover.
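
To make the difference concrete, here is a minimal sketch in R, assuming the arrow and dplyr packages are installed. The table names, column names, and output directories are hypothetical and exist only for illustration. The first write materializes the join result explicitly with compute(). The second passes the query straight to write_dataset(), which is the call this issue proposes to back with a WriteNode. Whether that second call actually streams depends on the installed arrow version.

```r
library(arrow)
library(dplyr)

# Hypothetical in-memory tables standing in for larger datasets
orders <- Table$create(
  order_id = 1:3,
  customer_id = c(1L, 2L, 1L),
  amount = c(10, 20, 30)
)
customers <- Table$create(
  customer_id = 1:2,
  name = c("alice", "bob")
)

# A query containing a join, which the Scanner interface alone cannot evaluate
query <- orders %>%
  left_join(customers, by = "customer_id")

# Explicit materialization: compute() evaluates the query into a Table
# in memory, and only then is that Table written to disk
materialized_dir <- tempfile("materialized_")
query %>%
  compute() %>%
  write_dataset(materialized_dir)

# Passing the query directly: with a WriteNode at the end of the ExecPlan,
# the intent is that batches go to disk without the intermediate Table
# (assumption: the installed arrow version implements this behavior)
streamed_dir <- tempfile("streamed_")
query %>%
  write_dataset(streamed_dir)
```

Both calls should produce equivalent datasets on disk. The expected difference is in peak memory use, which only becomes visible when the join result is larger than a toy example like this one.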

Reporter: Neal Richardson / @nealrichardson
Assignee: Neal Richardson / @nealrichardson

Note: This issue was originally created as ARROW-15517. Please see the migration documentation for further details.
