Add gzip support to fwrite#3278
Add gzip support to fwrite#3278philippechataignon wants to merge 9 commits intoRdatatable:masterfrom
Conversation
Use zlib and gzopen/gzwrite/gzclose function to write buffer directly in a gzipped csv file. zlib is thread-safe and the gzip compression use the fwrite threads. Option compress="gzip" is added to fwrite et is automatically set when file ends with ".gz"
Codecov Report
@@ Coverage Diff @@
## master #3278 +/- ##
==========================================
- Coverage 94.81% 94.68% -0.14%
==========================================
Files 65 65
Lines 12094 12125 +31
==========================================
+ Hits 11467 11480 +13
- Misses 627 645 +18
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #3278 +/- ##
==========================================
- Coverage 94.81% 94.72% -0.09%
==========================================
Files 65 65
Lines 12094 12125 +31
==========================================
+ Hits 11467 11486 +19
- Misses 627 639 +12
Continue to review full report at Codecov.
|
jangorecki
left a comment
There was a problem hiding this comment.
Very nice contribution. Reviewed mostly R code and UI. Also would be better to avoid formatting code like removing empty newlines.
| length(nThread)==1L && !is.na(nThread) && nThread>=1L | ||
| ) | ||
|
|
||
| is_gzip <- compress == "gzip" || grepl("\\.gz$", file) |
There was a problem hiding this comment.
Nice to recognise filename but we could warn if user explicitly set compress to none and uses gz filename
There was a problem hiding this comment.
In commit 75af89e, I propose this approch :
In fwrite, compress has now 3 options :
- default : gzip if file ends with .gz, else csv
- none : force csv
- gzip : force gzip
There was a problem hiding this comment.
Might be better "auto" instead of "default".
Anyway new argument is not required as we might do:
is_gzip <- compress == "gzip" || (missing(compress) && grepl("\\.gz$", file))In fwrite, compress has now 3 options : * default : gzip if file ends with .gz, else csv * none : force csv * gzip : force gzip
| if (.Platform$OS.type=="unix") { | ||
| f <- tempfile() | ||
| fwrite(data.table(a=c(1:3), b=c(1:3)), file=f, compress="gzip") | ||
| test(1658.38, system(paste("zcat", f), intern=T), output='[1] "a,b" "1,1" "2,2" "3,3"') |
There was a problem hiding this comment.
there is manual check that will make those line (and next test too) fail, related to T instead of TRUE, see
Line 61 in 7d2acb5
|
Thanks for reviews. I will close this PR for now to improve this enhancement. I saw 2 problems :
|
|
There is a program called |
This is a first attempt to implement gzipped csv output in fwrite (issue #2016).
It uses zlib library and replaces open/write/close used for csv file by gzopen/gzwrite/gzclose for gzipped csv. A second buffer, which size is around 10% bigger than main buffer, is allocated for each thread. zlib is thread-safe and the gzip compression use all the available threads.
Option
compress="gzip"is added to fwrite and is automatically set when file ends with.gz. Default iscompress = "none".On my system (Debian Linux), r-base-dev install zlib1g-dev, which is needed to compile. I don't need to add a -lz for gcc to compile but that may be not true for others systems and on Windows or Mac. I guess that file
cc.Rmust be modified to indicate the zlib dependence but it's too hard for me.I've added 2 tests but it seems to be difficult to test a binary output like a gzipped csv. Test uses command
zcatand runs only on unix platforms.Please feel free to test.