[R] Support for connection class when reading and writing files #25332

@asfimport

Description

We have an internal filesystem that we interact with through objects that inherit from R's connection class. These files aren't necessarily local, which makes it more complicated to read and write Parquet files, for example.

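For now, we work around this by going through raw vectors: we read the file's bytes into a raw vector and pass that to arrow, or serialize to an in-memory buffer and write its bytes out to the file. For example, to read a file: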


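ReadParquet <- function(filename, ...) {
   # Open a binary read connection and make sure it is closed on exit
   con <- file(filename, "rb")
   on.exit(close(con))
   # FileInfo(filename)$size supplies the number of bytes to read
   raw <- readBin(con, "raw", FileInfo(filename)$size)
   # read_parquet() accepts a raw vector holding the whole file
   return(arrow::read_parquet(raw, ...))
}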

And to write a file:


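WriteParquet <- function(df, filepath, ...) {
   # Serialize the data frame to an in-memory buffer first
   stream <- arrow::BufferOutputStream$create()
   arrow::write_parquet(df, stream, ...)
   # finish() returns the written bytes as a Buffer; $data() gives a raw vector
   raw <- stream$finish()$data()
   # Copy the bytes to the destination through a binary write connection
   con <- file(filepath, "wb")
   on.exit(close(con))
   writeBin(raw, con)
   return(invisible())
}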
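The reading example depends on knowing the file size in advance (via `FileInfo(filename)$size`). For connections where the size isn't available up front, a minimal sketch is to read in chunks until `readBin()` returns a zero-length vector. The helper name `read_connection_raw` and the 1 MiB default chunk size are hypothetical choices for illustration, and the sketch assumes a blocking connection (on a non-blocking one, an empty read may just mean "no data yet").

```r
# Hypothetical helper: read all remaining bytes from an open binary,
# blocking connection into one raw vector without knowing its size.
read_connection_raw <- function(con, chunk_size = 1024L * 1024L) {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    # A zero-length result means the end of the input was reached
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  # Concatenate the chunks; an empty input yields raw(0) rather than NULL
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}
```

With this, the body of `ReadParquet()` could become `arrow::read_parquet(read_connection_raw(con), ...)`, at the cost of growing a list of chunks instead of allocating the full vector once.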

At the C++ level, we are working with R_new_custom_connection, declared here:
https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h

I've been very impressed with how feature-rich arrow is. It would be nice to see this API supported as well, so that functions like read_parquet() and write_parquet() could take a connection directly.

Reporter: Michael Quinn
Assignee: Dewey Dunnington / @paleolimbot

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-9235. Please see the migration documentation for further details.
