
[R] Reading in Parquet files is 20x slower than reading fst files in R #22617

@asfimport


Problem

Loading any of the data files mentioned below from Parquet is about 20x slower than loading the same data from the fst format in R.
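The "20x" figure follows from the two elapsed times reported in the code section of this issue (4.61 s for fst, 99.19 s for Parquet). A quick check of that ratio, using only those reported numbers:

```r
# Elapsed timings reported in this issue (seconds)
fst_secs <- 4.61
parquet_secs <- 99.19

# Slowdown factor of Parquet relative to fst
slowdown <- parquet_secs / fst_secs
print(round(slowdown, 1))  # [1] 21.5
```

So "20x" is a slight understatement; the reported runs differ by roughly 21.5x.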

How to get the data

https://loanperformancedata.fanniemae.com/lppub/index.html

Register and download any of the data files listed there. I can't provide the data to you directly, so it's best that you register yourself.

(Attached screenshot: image-2019-08-14-10-04-56-834.png)

Code

path = "data/Performance_2016Q4.txt"

library(data.table)
library(arrow)

a = data.table::fread(path, header = FALSE)

fst::write_fst(a, "data/a.fst")

arrow::write_parquet(a, "data/a.parquet")

rm(a); gc()

# read-in test: fst
system.time(a <- fst::read_fst("data/a.fst")) # 4.61 seconds

rm(a); gc()

# read-in test: Parquet
system.time(a <- arrow::read_parquet("data/a.parquet")) # 99.19 seconds
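One way to narrow down where the Parquet time goes is to time the file decode separately from the conversion into an R data frame. The sketch below assumes a recent arrow release in which `read_parquet()` accepts `as_data_frame = FALSE` and returns an Arrow Table (the 2019 release used in this report may differ). The synthetic data frame and its column names (`loan_id`, `rate`, `state`) are hypothetical stand-ins for the Fannie Mae file, not the data used in the report.

```r
library(arrow)
library(fst)

# Hypothetical synthetic stand-in for the loan performance data:
# a mix of integer, double, and character columns.
set.seed(42)
n <- 1e6
df <- data.frame(
  loan_id = sample.int(1e5L, n, replace = TRUE),
  rate    = runif(n, 2, 6),
  state   = sample(state.abb, n, replace = TRUE),
  stringsAsFactors = FALSE
)

fst_path <- tempfile(fileext = ".fst")
pq_path  <- tempfile(fileext = ".parquet")
write_fst(df, fst_path)
write_parquet(df, pq_path)

# Baseline: fst read straight into an R data frame (elapsed seconds).
t_fst <- system.time(from_fst <- read_fst(fst_path))[["elapsed"]]

# Step 1: decode Parquet into an Arrow Table only, no R conversion yet.
t_decode <- system.time(
  tbl <- read_parquet(pq_path, as_data_frame = FALSE)
)[["elapsed"]]

# Step 2: convert the Arrow Table into an R data frame.
t_convert <- system.time(from_pq <- as.data.frame(tbl))[["elapsed"]]

print(c(fst = t_fst, parquet_decode = t_decode, to_r = t_convert))
```

If `parquet_decode` is small and `to_r` dominates, the bottleneck is the Arrow-to-R conversion (for example of character columns) rather than Parquet decoding itself. The absolute numbers will depend on the arrow version, the machine, and the column types.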

Environment: Windows 10 Pro and Ubuntu
Reporter: Zhuo Jia Dai
Assignee: Wes McKinney / @wesm


Note: This issue was originally created as ARROW-6230. Please see the migration documentation for further details.
