Description
We have an internal filesystem that we interact with through objects that inherit from the connection class. These files aren't necessarily local, making it slightly more complicated to read and write parquet files, for example.
For now, we work around this by passing raw vectors between the connection and arrow. For example, to read a file:
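ReadParquet <- function(filename, ...) {
  # Open the connection in binary read mode and close it on exit.
  con <- file(filename, "rb")
  on.exit(close(con))
  # Read the whole file into a raw vector, then let arrow parse it.
  raw <- readBin(con, "raw", FileInfo(filename)$size)
  return(arrow::read_parquet(raw, ...))
}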
And to write:
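WriteParquet <- function(df, filepath, ...) {
  # Serialize the data frame to an in-memory buffer first.
  stream <- arrow::BufferOutputStream$create()
  arrow::write_parquet(df, stream, ...)
  raw <- stream$finish()$data()
  # Then push the bytes through the connection in binary write mode.
  con <- file(filepath, "wb")
  on.exit(close(con))
  writeBin(raw, con)
  return(invisible())
}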
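To make the workaround concrete, here is a minimal round-trip sketch. It assumes a local temporary file stands in for the internal filesystem and that base R's `file()` plays the role of our custom connection class. The `path`, `df`, and `df_back` names are only for illustration, and `file.size()` replaces the internal `FileInfo()` helper.

```r
library(arrow)

# Assumption: a local temp file stands in for a file on the internal filesystem.
path <- tempfile(fileext = ".parquet")
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Write: serialize to an in-memory buffer, then send the bytes through a connection.
stream <- BufferOutputStream$create()
write_parquet(df, stream)
bytes_out <- stream$finish()$data()
con <- file(path, "wb")
writeBin(bytes_out, con)
close(con)

# Read: pull the bytes back through a connection and parse the raw vector.
con <- file(path, "rb")
bytes_in <- readBin(con, "raw", n = file.size(path))
close(con)
df_back <- read_parquet(bytes_in)

# The round trip should preserve the shape of the data.
stopifnot(nrow(df_back) == nrow(df), ncol(df_back) == ncol(df))
```

Both directions hold the whole file in memory as a raw vector, which is the cost that direct connection support in arrow would avoid.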
At the C++ level, we are interacting with R_new_custom_connection defined here:
https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h
I've been very impressed with how feature-rich arrow is. It would be nice to see this API supported as well.
Reporter: Michael Quinn
Assignee: Dewey Dunnington / @paleolimbot
Related issues:
- [R] Stream reader/writer API that takes socket stream (is duplicated by)
Note: This issue was originally created as ARROW-9235. Please see the migration documentation for further details.