Merged
65 commits
aa848a7
Add gzip support to fwrite
Jan 12, 2019
92b9ca4
Add compress= option in fwrite documentation
Jan 12, 2019
a2d6969
Add tests for fwrite with compress="gzip" option
Jan 12, 2019
6010131
Rewrite test 1658.12
Jan 12, 2019
9f16e31
Add default option in compress
Jan 13, 2019
1ca3454
Adapt fwrite compress option documentation
Jan 13, 2019
70fdf8b
Replace 'default' by 'auto' in fwrite compress option
Jan 13, 2019
a279dae
Tests for gzip compression in fwrite and restore tests 1658.11,12
Jan 13, 2019
c8858e4
#endif was in wrong place
Jan 13, 2019
42b4964
Remove realloc sections
Jan 13, 2019
81cc49b
Header line is written in buff
Jan 13, 2019
9b575db
Header compression OK
Jan 13, 2019
1375a34
Tmp : fwrite csv doesn't end
Jan 13, 2019
38764d9
self-or for failed
Jan 13, 2019
7586e2f
Add if fail
Jan 13, 2019
72c70ae
Remove #endif
Jan 13, 2019
6dd28c8
Move compress buffer at the right place
Jan 13, 2019
7346875
Compress in thread, only write is ordered
Jan 13, 2019
98d90dc
Remove comment
Jan 13, 2019
9202717
Move bool is_gzip in fwrite.h
Jan 14, 2019
2564197
Add cast and remove unused variables
Jan 14, 2019
9e9de37
Test buffer size et adapt error messages
Jan 14, 2019
ffa26cb
Test buffer size after each line
Jan 14, 2019
8423996
Remove old comment
Jan 14, 2019
c161828
Use strlen to compute maxLineLen in fwrite
Jan 15, 2019
646535b
Compute mexHeaderLen and maxLineLen
Jan 15, 2019
ea68fd0
Compute output length for writeListe with buffer
Jan 15, 2019
7464a9a
Reinsert fwrite progress bar
Jan 15, 2019
e62b77e
Introduce buffLimit and buffSecure in fwrite
Jan 15, 2019
af44e9a
TRUE instead of T in tests.Rraw
Jan 15, 2019
dc2b825
Typo correction
Jan 15, 2019
49ee223
Alloc zbuff only if args.gzip
Jan 15, 2019
9200e8e
Test if string/factor is NA is line header length detection
Jan 15, 2019
5b93d91
Correct zbuff allocation
Jan 15, 2019
7ccc3e6
Add missing is_gzip test
Jan 15, 2019
47d2210
Use directly maxLineLen ; don't divide by 2
Jan 15, 2019
01f3db8
Initialize buffer pointer to NULL
Jan 16, 2019
e2a97ff
Add news entry for gzip support in fwrite
Jan 16, 2019
c24c154
Move enum WF in fwrite.h
Jan 16, 2019
cd8768b
Replace integer by WF enum
Jan 16, 2019
33ad794
Test size before writing in thread
Jan 17, 2019
739ac7e
Replace strlen by strnlen
Jan 20, 2019
f7f37e4
Use void* and size_t instead of Bytef* and uLongf
Jan 24, 2019
fa7d6c0
Merge branch 'master' into fwrite_gzip
philippechataignon Feb 9, 2019
07c0f6f
Merge branch 'master' into fwrite_gzip
philippechataignon Feb 22, 2019
3d37bad
Merge branch 'master' into fwrite_gzip
mattdowle Apr 16, 2019
46fd237
attempt to pass on travis/appveyor: added -lz to PKG_LIBS
mattdowle Apr 16, 2019
0e10df9
fixed fwrite.Rd warning (treated as fail by CI)
mattdowle Apr 16, 2019
031ebd0
news tidy
mattdowle Apr 17, 2019
c32249a
nocov and 2-space indentation
mattdowle Apr 17, 2019
1189568
removed not-used writer_len item in fwriteMainArgs
mattdowle Apr 17, 2019
25ffdb0
extra 1 width for sign of negative ints
mattdowle Apr 17, 2019
3c097ae
declare variables close to first usage, and added comment as to why u…
mattdowle Apr 18, 2019
c2efc8e
reduce diff; free(NULL) is no-op ok (removed if)
mattdowle Apr 18, 2019
01833e2
more diff reduction
mattdowle Apr 19, 2019
6a7d87b
free's were missing on STOP('compress gzip error'). Moved zbuffUsed d…
mattdowle Apr 19, 2019
42711f2
interm: no sample for maxLineLen, reorganized
mattdowle Apr 20, 2019
034149b
hot loop hot again; passes all tests
mattdowle Apr 23, 2019
96d6230
revert unrelated indentation fix to reduce diff (will do post-merge)
mattdowle Apr 23, 2019
a593abd
coverage
mattdowle Apr 23, 2019
8482bd1
compress tests made cross-platform, and example added to news item
mattdowle Apr 23, 2019
42ceb61
simpler example in news item
mattdowle Apr 23, 2019
ee3e595
tidy
mattdowle Apr 23, 2019
989740c
news item for bug fixes with links
mattdowle Apr 23, 2019
7665a8d
coverage
mattdowle Apr 23, 2019
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -43,6 +43,7 @@ Authors@R: c(
Depends: R (>= 3.1.0)
Imports: methods
Suggests: bit64, curl, R.utils, knitr, xts, nanotime, zoo
SystemRequirements: zlib
Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
License: MPL-2.0 | file LICENSE
URL: http://r-datatable.com
20 changes: 16 additions & 4 deletions NEWS.md
@@ -4,13 +4,21 @@

#### NEW FEATURES

1. New option `options(datatable.quiet = TRUE)` turns off the package startup message, [#3489](https://github.com/Rdatatable/data.table/issues/3489). `suppressPackageStartupMessages()` continues to work too. Thanks to @leobarlach for the suggestion inspired by `options(tidyverse.quiet = TRUE)`. We don't know of a way to make a package respect the `quietly=` option of `library()` and `require()` because the `quietly=` isn't passed through for use by the package's own `.onAttach`. If you can see how to do that, please submit a patch to R.
1. `rleid()` functions now support long vectors (length > 2 billion).

2. `rleid()` functions now support long vectors (length > 2 billion).

3. `fread()`:
2. `fread()`:
* now skips embedded `NUL` (`\0`), [#3400](https://github.com/Rdatatable/data.table/issues/3400). Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.

3. `fwrite()`:
* now writes compressed `.gz` files directly, [#2016](https://github.com/Rdatatable/data.table/issues/2016). Compression, like `fwrite()`, is multithreaded and compresses each chunk on-the-fly (a full size intermediate file is not created). Use a ".gz" extension, or the new `compress=` option. Many thanks to Philippe Chataignon for the significant PR. For example:

```R
DT = data.table(A=rep(1:2,each=100), B=rep(1:4,each=25))
fwrite(DT, "data.csv") # 804 bytes
fwrite(DT, "data.csv.gz") # 74 bytes
identical(DT, fread("data.csv.gz"))
```

4. Assigning to one item of a list column no longer requires the RHS to be wrapped with `list` or `.()`, [#950](https://github.com/Rdatatable/data.table/issues/950).
```R
> DT = data.table(A=1:3, B=list(1:2,"foo",3:5))
@@ -45,6 +53,8 @@

3. A missing item in `j` such as `j=.(colA, )` now gives a helpful error (`Item 2 of the .() or list() passed to j is missing`) rather than the unhelpful error `argument "this_jsub" is missing, with no default` (v1.12.2) or `argument 2 is empty` (v1.12.0 and before), [#3507](https://github.com/Rdatatable/data.table/issues/3507). Thanks to @eddelbuettel for the report.

4. `fwrite()` could crash when writing very long strings such as 30 million characters, [#2974](https://github.com/Rdatatable/data.table/issues/2974), and could be unstable in memory constrained environments, [#2612](https://github.com/Rdatatable/data.table/issues/2612). Thanks to @logworthy and @zachokeeffe for reporting and Philippe Chataignon for fixing in PR [#3288](https://github.com/Rdatatable/data.table/pull/3288).

#### NOTES

1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below.
@@ -58,6 +68,8 @@

3. `setorder` on a superset of a keyed `data.table`'s key now retains its key, [#3456](https://github.com/Rdatatable/data.table/issues/3456). For example, if `a` is the key of `DT`, `setorder(DT, a, -v)` will leave `DT` keyed by `a`.

4. New option `options(datatable.quiet = TRUE)` turns off the package startup message, [#3489](https://github.com/Rdatatable/data.table/issues/3489). `suppressPackageStartupMessages()` continues to work too. Thanks to @leobarlach for the suggestion inspired by `options(tidyverse.quiet = TRUE)`. We don't know of a way to make a package respect the `quietly=` option of `library()` and `require()` because the `quietly=` isn't passed through for use by the package's own `.onAttach`. If you can see how to do that, please submit a patch to R.


### Changes in [v1.12.2](https://github.com/Rdatatable/data.table/milestone/14?closed=1) (07 Apr 2019)

9 changes: 7 additions & 2 deletions R/fwrite.R
@@ -7,9 +7,11 @@ fwrite <- function(x, file="", append=FALSE, quote="auto",
dateTimeAs = c("ISO","squash","epoch","write.csv"),
buffMB=8, nThread=getDTthreads(verbose),
showProgress=getOption("datatable.showProgress", interactive()),
compress = c("auto", "none", "gzip"),
verbose=getOption("datatable.verbose", FALSE)) {
na = as.character(na[1L]) # fix for #1725
if (missing(qmethod)) qmethod = qmethod[1L]
if (missing(compress)) compress = compress[1L]
if (missing(dateTimeAs)) { dateTimeAs = dateTimeAs[1L] }
else if (length(dateTimeAs)>1L) stop("dateTimeAs must be a single string")
dateTimeAs = chmatch(dateTimeAs, c("ISO","squash","epoch","write.csv"))-1L
@@ -37,13 +39,17 @@ fwrite <- function(x, file="", append=FALSE, quote="auto",
dec != sep, # sep2!=dec and sep2!=sep checked at C level when we know if list columns are present
is.character(eol) && length(eol)==1L,
length(qmethod) == 1L && qmethod %chin% c("double", "escape"),
length(compress) == 1L && compress %chin% c("auto", "none", "gzip"),
isTRUEorFALSE(col.names), isTRUEorFALSE(append), isTRUEorFALSE(row.names),
isTRUEorFALSE(verbose), isTRUEorFALSE(showProgress), isTRUEorFALSE(logical01),
length(na) == 1L, #1725, handles NULL or character(0) input
is.character(file) && length(file)==1L && !is.na(file),
length(buffMB)==1L && !is.na(buffMB) && 1L<=buffMB && buffMB<=1024,
length(nThread)==1L && !is.na(nThread) && nThread>=1L
)

is_gzip <- compress == "gzip" || (compress == "auto" && grepl("\\.gz$", file))

file <- path.expand(file) # "~/foo/bar"
if (append && missing(col.names) && (file=="" || file.exists(file)))
col.names = FALSE # test 1658.16 checks this
@@ -70,7 +76,6 @@ fwrite <- function(x, file="", append=FALSE, quote="auto",
file <- enc2native(file) # CfwriteR cannot handle UTF-8 if that is not the native encoding, see #3078.
.Call(CfwriteR, x, file, sep, sep2, eol, na, dec, quote, qmethod=="escape", append,
row.names, col.names, logical01, dateTimeAs, buffMB, nThread,
showProgress, verbose)
showProgress, is_gzip, verbose)
invisible()
}

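The `R/fwrite.R` change above decides whether to gzip from two inputs: the `compress=` argument and the file extension. The following is a minimal base-R sketch of that decision only. The helper name `resolve_gzip` is hypothetical and not part of data.table, and it uses `match.arg()` as a stand-in for the package's own argument checks (so, unlike the PR, it would accept partial matches such as `"gz"`). It also does not model the console case (`file == ""`), which the documentation says is never gzipped.

```R
# Hypothetical helper (not in data.table): mirrors the is_gzip expression
# from the diff above using base R only.
resolve_gzip <- function(file, compress = c("auto", "none", "gzip")) {
  # Stand-in for the PR's checks; match.arg() picks "auto" when missing.
  compress <- match.arg(compress)
  # Same logic as the diff: explicit "gzip", or "auto" with a .gz extension.
  compress == "gzip" || (compress == "auto" && grepl("\\.gz$", file))
}

resolve_gzip("data.csv.gz")          # "auto" and .gz extension
resolve_gzip("data.csv.gz", "none")  # extension ignored when "none"
resolve_gzip("data.csv", "gzip")     # forced regardless of extension
resolve_gzip("data.csv")             # "auto" without .gz extension
```

Under this sketch, `compress = "none"` is the only way to write an uncompressed file whose name ends in `.gz`.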
20 changes: 19 additions & 1 deletion inst/tests/tests.Rraw
@@ -9467,6 +9467,23 @@ test(1658.34, fwrite(matrix(1:4, nrow=2, ncol=2), quote = TRUE), output = '"V1",
test(1658.35, fwrite(matrix(1:3, nrow=3, ncol=1), quote = TRUE), output = '"V1"\n.*1\n2\n3', message = "x being coerced from class: matrix to data.table")
test(1658.36, fwrite(matrix(1:4, nrow=2, ncol=2, dimnames = list(c("ra","rb"),c("ca","cb"))), quote = TRUE), output = '"ca","cb"\n.*1,3\n2,4', message = "x being coerced from class: matrix to data.table")

# fwrite compress
test(1658.37, fwrite(data.table(a=c(1:3), b=c(1:3)), compress="gzip"), output='a,b\n1,1\n2,2\n3,3') # compress ignored on console
DT = data.table(a=rep(1:2,each=100), b=rep(1:4,each=25))
fwrite(DT, file=f1<-tempfile(fileext=".gz"))
fwrite(DT, file=f2<-tempfile())
test(1658.38, file.size(f1)<file.size(f2)) # 74 < 804
test(1658.39, fread(f1), DT) # use fread to decompress gz (works cross-platform)
fwrite(DT, file=f3<-tempfile(), compress="gzip") # compress to filename not ending .gz
test(1658.40, file.size(f3), file.size(f1))
unlink(c(f1,f2,f3))
DT = data.table(a=1:3, b=list(1:4, c(3.14, 100e10), c("foo", "bar", "baz")))
test(1658.41, fwrite(DT), output=c("a,b","1,1|2|3|4","2,3.14|1e+12","3,foo|bar|baz"))
DT[3,b:=c(3i,4i,5i)]
test(1658.42, fwrite(DT), error="Row 3 of list column is type 'complex'")
DT[3,b:=factor(letters[1:3])]
test(1658.43, fwrite(DT), error="Row 3 of list column is type 'factor'")

## End fwrite tests

# tests for #679, inrange(), FR #707
@@ -10334,7 +10351,8 @@ DT = data.table(
D = as.POSIXct(dt<-paste(d,t), tz="UTC"),
E = as.POSIXct(paste0(dt,c(".999",".0",".5",".111112",".123456",".023",".0",".999999",".99",".0009")), tz="UTC"))

test(1740.1, fwrite(DT,dateTimeAs="iso"), error="dateTimeAs must be 'ISO','squash','epoch' or 'write.csv'")
test(1740.0, fwrite(DT,dateTimeAs="iso"), error="dateTimeAs must be 'ISO','squash','epoch' or 'write.csv'")
test(1740.1, fwrite(DT,dateTimeAs=c("ISO","squash")), error="dateTimeAs must be a single string")
test(1740.2, capture.output(fwrite(DT,dateTimeAs="ISO")), c(
"A,B,C,D,E",
"1907-10-21,1907-10-21,23:59:59,1907-10-21T23:59:59Z,1907-10-21T23:59:59.999Z",
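The compress tests above (1658.38 to 1658.40) compare file sizes and round-trip through `fread()`. A complementary check, sketched here as an assumption rather than something the PR does, is to look for the two-byte gzip magic number (`1f 8b`) at the start of the file. The helper `is_gzip_file` is hypothetical, and base R's `gzfile()` stands in for `fwrite(..., compress="gzip")` so the snippet runs without data.table installed.

```R
# Hypothetical helper (not in data.table or its test suite): TRUE when the
# file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip_file <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

csv_lines <- c("a,b", rep("1,1", 100))

# Plain CSV, written uncompressed.
plain_path <- tempfile()
writeLines(csv_lines, plain_path)

# Gzipped CSV under a name that does not end in .gz, as in test 1658.40.
# gzfile() is a base-R stand-in for fwrite(DT, file, compress = "gzip").
gz_path <- tempfile()
con <- gzfile(gz_path, "w")
writeLines(csv_lines, con)
close(con)

plain_is_gz <- is_gzip_file(plain_path)
gz_is_gz <- is_gzip_file(gz_path)
```

With a data.table build that includes this PR, the same helper could be pointed at a file written by `fwrite()` instead of `gzfile()`.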
2 changes: 2 additions & 0 deletions man/fwrite.Rd
@@ -17,6 +17,7 @@ fwrite(x, file = "", append = FALSE, quote = "auto",
dateTimeAs = c("ISO","squash","epoch","write.csv"),
buffMB = 8L, nThread = getDTthreads(verbose),
showProgress = getOption("datatable.showProgress", interactive()),
compress = c("auto", "none", "gzip"),
verbose = getOption("datatable.verbose", FALSE))
}
\arguments{
@@ -52,6 +53,7 @@ fwrite(x, file = "", append = FALSE, quote = "auto",
\item{buffMB}{The buffer size (MB) per thread in the range 1 to 1024, default 8MB. Experiment to see what works best for your data on your hardware.}
\item{nThread}{The number of threads to use. Experiment to see what works best for your data on your hardware.}
\item{showProgress}{ Display a progress meter on the console? Ignored when \code{file==""}. }
\item{compress}{If \code{compress = "auto"} and if \code{file} ends in \code{.gz} then output format is gzipped csv else csv. If \code{compress = "none"}, output format is always csv. If \code{compress = "gzip"} then format is gzipped csv. Output to the console is never gzipped even if \code{compress = "gzip"}. By default, \code{compress = "auto"}.}
\item{verbose}{Be chatty and report timings?}
}
\details{
2 changes: 1 addition & 1 deletion src/Makevars
@@ -1,6 +1,6 @@

PKG_CFLAGS = $(SHLIB_OPENMP_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CFLAGS) -lz

all: $(SHLIB)
mv $(SHLIB) datatable$(SHLIB_EXT)