Right now, I can:

```r
ds <- open_dataset("some.parquet")
ds %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_dataset(path = "new.parquet")
```

but I can't:

```r
tab <- read_parquet("some.parquet", as_data_frame = FALSE)
tab %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_parquet("new.parquet")
```

In this case, I can cast the column as a separate command and then call write_parquet() afterwards, but it would be nice to be able to use write_parquet() in a pipeline.
This will require a libarrow addition: another version of WriteParquet that takes a RecordBatchReader instead of a fully-instantiated Table.
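Once such an overload exists, the R-level usage might look something like the sketch below. This is purely hypothetical: as_record_batch_reader() is used only to illustrate streaming the query, and write_parquet() accepting a reader is exactly the missing piece this issue asks for.

```r
# Hypothetical once WriteParquet can consume a stream of batches:
reader <- tab %>%
  mutate(o_orderdate = cast(o_orderdate, date32())) %>%
  as_record_batch_reader()

# Would write batch by batch without materializing the whole Table.
write_parquet(reader, "new.parquet")
```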
Reporter: Jonathan Keane / @jonkeane
Related issues:
- write_parquet() / write_csv_arrow() cannot stream a dataset object back to S3 (is duplicated by)
- [C++] Allow ParquetWriter to take a RecordBatchReader as input (depends upon)
Note: This issue was originally created as ARROW-14428. Please see the migration documentation for further details.