From 8b13f3eec48569c222099438b51d40be256628aa Mon Sep 17 00:00:00 2001
From: mattdowle
Date: Thu, 23 May 2019 00:10:41 -0700
Subject: [PATCH 1/2] retain unused levels from zero-row data.table in rbindlist

---
 NEWS.md | 2 ++
 inst/tests/tests.Rraw | 9 +++++++++
 src/rbindlist.c | 10 +++++-----
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 14b1316302..8411e8d1f2 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -112,6 +112,8 @@
 
 14. Subassigning using `$<-` to a `data.table` embedded in a list column of a single-row `data.table` could fail, [#3474](https://github.com/Rdatatable/data.table/issues/3474). Note that `$<-` is not recommended; please use `:=` instead which already worked in this case. Thanks to Jakob Richter for reporting.
 
+15. `rbind` and `rbindlist` of zero-row `data.table` now retain again the unused levels of any zero-length factor columns, [#3508](https://github.com/Rdatatable/data.table/issues/3508). This was a regression in v1.12.2 just when stacking zero-row items. Unused factor levels when nrow>=1 were already retained. Thanks to Gregory Demin for reporting.
+
 #### NOTES
 
 1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below.
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
index 82d7a1ce26..ee35ec302e 100644
--- a/inst/tests/tests.Rraw
+++ b/inst/tests/tests.Rraw
@@ -14847,6 +14847,15 @@ test(2049.2, outer$ab, list(data.table(a=1:3, b=4L)))
 test(2049.3, outer$ab[[1]][, b := 5L], data.table(a=1:3, b=5L))
 test(2049.4, outer$ab, list(data.table(a=1:3, b=5L)))
 
+# rbindlist zero row DT should retain its (unused) levels, #3508
+DT = data.table(f = factor(c("a", "b", "c")))
+test(2050.1, rbind(DT[1], DT[1])[,levels(f)], c("a","b","c")) # ok before (unused levels when nrow>0 were retained)
+test(2050.2, rbind(DT[1], DT[0])[,levels(f)], c("a","b","c")) # ok before
+test(2050.3, rbind(DT[0], DT[1])[,levels(f)], c("a","b","c")) # ok before
+test(2050.4, rbind(DT[0], DT[0])[,levels(f)], c("a","b","c")) # now ok again (only when nrow=0 were unused levels dropped)
+test(2050.5, rbindlist(list(DT[0], DT[0]))[,levels(f)], c("a","b","c")) # now ok again
+test(2050.6, rbind(DT[1], data.table(f=factor(letters[10:11]))[0])[,levels(f)], c("a","b","c","j","k")) # now includes "j","k" again
+
 ###################################
 # Add new tests above this line #
diff --git a/src/rbindlist.c b/src/rbindlist.c
index 5c80d21e38..1a0cffae31 100644
--- a/src/rbindlist.c
+++ b/src/rbindlist.c
@@ -392,13 +392,13 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
 }
 for (int i=0; i Date: Thu, 23 May 2019 00:21:34 -0700
Subject: [PATCH 2/2] news item tweak

---
 NEWS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/NEWS.md b/NEWS.md
index 8411e8d1f2..ac1c30b9b4 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -112,7 +112,7 @@
 
 14. Subassigning using `$<-` to a `data.table` embedded in a list column of a single-row `data.table` could fail, [#3474](https://github.com/Rdatatable/data.table/issues/3474). Note that `$<-` is not recommended; please use `:=` instead which already worked in this case. Thanks to Jakob Richter for reporting.
 
-15. `rbind` and `rbindlist` of zero-row `data.table` now retain again the unused levels of any zero-length factor columns, [#3508](https://github.com/Rdatatable/data.table/issues/3508). This was a regression in v1.12.2 just when stacking zero-row items. Unused factor levels when nrow>=1 were already retained. Thanks to Gregory Demin for reporting.
+15. `rbind` and `rbindlist` of zero-row items now retain (again) the unused levels of any (zero-length) factor columns, [#3508](https://github.com/Rdatatable/data.table/issues/3508). This was a regression in v1.12.2 just for zero-row items. Unused factor levels were already retained for items having `nrow>=1`. Thanks to Gregory Demin for reporting.
 
 #### NOTES