[R] [C++] Allow me to write_parquet() from an arrow_dplyr_query  #29992

@asfimport

Description

Right now, I can:

ds <- open_dataset("some.parquet")
ds %>% 
  mutate(
    o_orderdate = cast(o_orderdate, date32())  
  ) %>% 
  write_dataset(path = "new.parquet")

but I can't:

tab <- read_parquet("some.parquet", as_data_frame = FALSE)
tab %>% 
  mutate(
    o_orderdate = cast(o_orderdate, date32())  
  ) %>% 
  write_parquet("new.parquet")

In this case, I can cast the column as a separate command and then call write_parquet() afterwards, but it would be nice to be able to use write_parquet() in a pipeline.
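
One way to approximate the pipeline today is sketched below, assuming the arrow and dplyr packages are installed. The toy data and temporary file paths are made up for illustration and stand in for "some.parquet" and "new.parquet". The sketch relies on compute() to turn the lazy query into an in-memory Table that write_parquet() accepts.

```r
library(arrow)
library(dplyr)

# Toy input standing in for "some.parquet" (hypothetical data and paths)
src <- tempfile(fileext = ".parquet")
dst <- tempfile(fileext = ".parquet")
orders <- data.frame(
  o_orderkey = 1:3,
  o_orderdate = as.POSIXct(
    c("2021-01-01", "2021-01-02", "2021-01-03"), tz = "UTC"
  )
)
write_parquet(orders, src)

tab <- read_parquet(src, as_data_frame = FALSE)

# mutate() on a Table yields a lazy arrow_dplyr_query; compute()
# materializes it as a Table, which write_parquet() accepts.
tab %>%
  mutate(o_orderdate = cast(o_orderdate, date32())) %>%
  compute() %>%
  write_parquet(dst)

# Read the output back to inspect the new column type
result <- read_parquet(dst, as_data_frame = FALSE)
```

Note that compute() holds the whole result in memory before writing, so this is only a stopgap for data that fits in memory, not the streaming behavior requested here.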

This will require an addition to libarrow: either an extension of WriteParquet or another version of it that takes a RecordBatchReader instead of a fully instantiated Table.

Reporter: Jonathan Keane / @jonkeane

Related issues:

Note: This issue was originally created as ARROW-14428. Please see the migration documentation for further details.
