I'm trying to read data that Spark exported from Parquet to a set of CSV files.
In its infinite wisdom, Spark created some empty files, so this workflow failed:
read_f = list.files('path/to/csvs', pattern = 'csv$', full.names = TRUE)
DT = rbindlist(lapply(read_f, fread))
It's kind of a pain to have to single out empty files (basically by adding the line read_f = read_f[file.info(read_f)$size > 0]) when the vast majority of the time this operation works as intended, since it's rare for Spark to output empty files. Is there any reason fread can't just warn about such a file and skip it?
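For reference, the workaround described above can be wrapped up roughly as follows. This is only a minimal sketch: read_nonempty_csvs is a hypothetical helper name (not a data.table function), and it assumes that "empty" means a file size of zero bytes, as in the file.info filter above.

```r
library(data.table)

# Hypothetical helper, not part of data.table: read every non-empty CSV
# in a directory and stack the results, warning about any skipped files.
read_nonempty_csvs = function(dir) {
  read_f = list.files(dir, pattern = 'csv$', full.names = TRUE)

  # Assumption: an "empty" file is one with a size of zero bytes.
  is_empty = file.info(read_f)$size == 0

  if (any(is_empty)) {
    warning('Skipping ', sum(is_empty), ' empty file(s): ',
            paste(basename(read_f[is_empty]), collapse = ', '))
  }

  # Read only the non-empty files and bind them into one data.table.
  rbindlist(lapply(read_f[!is_empty], fread))
}
```

With that in place, the original two lines reduce to DT = read_nonempty_csvs('path/to/csvs'), which is the warn-and-skip behavior I would like fread to offer by itself.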