diff --git a/NEWS.md b/NEWS.md index 0a9e8f0745..398819815d 100644 --- a/NEWS.md +++ b/NEWS.md @@ -80,6 +80,8 @@ 12. `as.data.table(table(NULL))` now returns `data.table(NULL)` rather than error `attempt to set an attribute on NULL`, [#4179](https://github.com/Rdatatable/data.table/issues/4179). The result differs slightly to `as.data.frame(table(NULL))` (0-row, 1-column) because 0-column works better with other `data.table` functions like `rbindlist()`. Thanks to Michael Chirico for the report and fix. +13. `melt` with a list for `measure.vars` would output `variable` inconsistently between `na.rm=TRUE` and `FALSE`, [#4455](https://github.com/Rdatatable/data.table/issues/4455). Thanks to @tdhock for reporting and fixing. + ## NOTES 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example : diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 43cf3d87ca..9f3bb5eb30 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -3014,6 +3014,9 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x)) error="Unknown 'id.vars' type raw") test(1035.012, melt(DT, id.vars=1:3, measure.vars=as.raw(0)), error="Unknown 'measure.vars' type raw") + test(1035.013, melt(data.table(a=1, b=1), id.vars=c(1,1)), data.table(a=1, a.1=1, variable=factor("b"), value=1)) + test(1035.014, melt(data.table(a1=1, b1=1, b2=2), na.rm=TRUE, measure.vars=list(a="a1", b=c("b1","b2"))), data.table(variable=factor(1,c("1","2")), a=1, b=1)) + test(1035.015, melt(data.table(a=1+2i, b=1), id.vars="a"), error="Unknown column type 'complex' for column 'a' in 'data'") ans1 = cbind(DT[, c(1,2,8), with=FALSE], variable=factor("l_1")) ans1[, value := DT$l_1] @@ -3175,9 +3178,9 @@ Sep,33.5,19.4,15.7,11.9,0,100.8,100.8,0,12.7,12.7,0,174.1") x[, c("y1","z1"):=NA] test(1037.405, dim(melt(x, measure.vars=patterns("^y", "^z"))), INT(4,5)) test(1037.406, dim(ans<-melt(x, measure.vars=patterns("^y", "^z"), na.rm=TRUE)), INT(2,5)) - test(1037.407, ans$variable, factor(c("1","1"))) + test(1037.407, ans$variable, factor(c("2","2"), c("1", "2"))) test(1037.408, dim(ans<-melt(x, measure.vars=patterns("^y", "^z"), na.rm=TRUE, variable.factor=FALSE)), INT(2,5)) - test(1037.409, ans$variable, c("1","1")) + test(1037.409, ans$variable, c("2","2")) test(1037.410, melt(data.table(NULL), verbose=TRUE), data.table(NULL), output="ncol(data) is 0. Nothing to melt") diff --git a/man/melt.data.table.Rd b/man/melt.data.table.Rd index e56a10e4e1..e51c61aaac 100644 --- a/man/melt.data.table.Rd +++ b/man/melt.data.table.Rd @@ -31,7 +31,7 @@ non-measure columns will be assigned to it. If integer, must be positive; see De } For convenience/clarity in the case of multiple \code{melt}ed columns, resulting column names can be supplied as names to the elements \code{measure.vars} (in the \code{list} and \code{patterns} usages). See also \code{Examples}. } -\item{variable.name}{name for the measured variable names column. The default name is \code{'variable'}.} +\item{variable.name}{name (default \code{'variable'}) of output column containing information about which input column(s) were melted. If \code{measure.vars} is an integer/character vector, then each entry of this column contains the name of a melted column from \code{data}. If \code{measure.vars} is a list of integer/character vectors, then each entry of this column contains an integer indicating an index/position in each of those vectors.} \item{value.name}{name for the molten data values column(s). The default name is \code{'value'}. Multiple names can be provided here for the case when \code{measure.vars} is a \code{list}, though note well that the names provided in \code{measure.vars} take precedence. } \item{na.rm}{If \code{TRUE}, \code{NA} values will be removed from the molten data.} diff --git a/src/fmelt.c b/src/fmelt.c index 22a4ac1fc5..39b93d2e81 100644 --- a/src/fmelt.c +++ b/src/fmelt.c @@ -538,7 +538,6 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str } else { for (int j=0, ansloc=0, level=1; jlmax; ++j) { const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow; - if (thislen==0) continue; // so as not to bump level char buff[20]; snprintf(buff, 20, "%d", level++); SEXP str = PROTECT(mkChar(buff)); @@ -546,11 +545,11 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str UNPROTECT(1); } } - } else { + } else {// varfactor==TRUE SET_VECTOR_ELT(ansvars, 0, target=allocVector(INTSXP, data->totlen)); SEXP levels; int *td = INTEGER(target); - if (data->lvalues == 1) { + if (data->lvalues == 1) {//single output column. SEXP thisvaluecols = VECTOR_ELT(data->valuecols, 0); int len = length(thisvaluecols); levels = PROTECT(allocVector(STRSXP, len)); protecti++; @@ -573,24 +572,16 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow; for (int k=0; klmax)); protecti++; for (int j=0, ansloc=0; jlmax; ++j) { const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow; - if (thislen==0) continue; // so as not to bump level char buff[20]; snprintf(buff, 20, "%d", nlevel+1); SET_STRING_ELT(levels, nlevel++, mkChar(buff)); // generate levels = 1:nlevels for (int k=0; klmax) { - // data->narm is true and there are some all-NA items causing at least one 'if (thislen==0) continue' above - // shrink the levels - SEXP newlevels = PROTECT(allocVector(STRSXP, nlevel)); protecti++; - for (int i=0; i