From dd4fa9712901787a92b5113330a984041d510ced Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 03:29:58 -0500 Subject: [PATCH 01/10] Allows as.data.frame move ID column to rownames. --- R/data.table.R | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/R/data.table.R b/R/data.table.R index e671a208df..a11f1f5ec6 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -2207,7 +2207,7 @@ tail.data.table = function(x, n=6L, ...) { set(x,j=name,value=value) # important i is missing here } -as.data.frame.data.table = function(x, ...) +as.data.frame.data.table = function(x, row.names = NULL, ...) { ans = copy(x) setattr(ans,"row.names",.set_row_names(nrow(x))) # since R 2.4.0, data.frames can have non-character row names @@ -2216,6 +2216,10 @@ as.data.frame.data.table = function(x, ...) setattr(ans,"index",NULL) #4889 #5042 setattr(ans,".internal.selfref",NULL) # leave tl intact, no harm, + if(!is.null(row.names)) { + rownames(ans) = ans[, row.names] + ans[, row.names] = NULL + } ans } From 66be7e308bd25964a9987fc94874ea02aabb1a90 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 03:36:14 -0500 Subject: [PATCH 02/10] as.data.frame.data.table move column to rownames added to news. --- NEWS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/NEWS.md b/NEWS.md index aad41a0ccb..07a966805c 100644 --- a/NEWS.md +++ b/NEWS.md @@ -544,6 +544,8 @@ 51. `merge.data.table()` silently ignored the `incomparables` argument, [#2587](https://github.com/Rdatatable/data.table/issues/2587). It is now implemented and any other ignored arguments (e.g. misspellings) are now warned about. Thanks to @GBsuperman for the report and @ben-schwen for the fix. +56. `as.data.frame.data.table()` now allows for the use of `row.names` argument; it moves the specified column to the `rownames` of the resulting `data.frame`. + ## NOTES 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example : From cab9ff01251065a33a92417bb44163a698d839e3 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 19:07:41 -0500 Subject: [PATCH 03/10] as.data.frame.data.table no longer ignores row.names and simplifies with setDF. --- NEWS.md | 2 +- R/data.table.R | 12 +----------- 2 files changed, 2 insertions(+), 12 deletions(-) diff --git a/NEWS.md b/NEWS.md index 07a966805c..bd8d23d706 100644 --- a/NEWS.md +++ b/NEWS.md @@ -544,7 +544,7 @@ 51. `merge.data.table()` silently ignored the `incomparables` argument, [#2587](https://github.com/Rdatatable/data.table/issues/2587). It is now implemented and any other ignored arguments (e.g. misspellings) are now warned about. Thanks to @GBsuperman for the report and @ben-schwen for the fix. -56. `as.data.frame.data.table()` now allows for the use of `row.names` argument; it moves the specified column to the `rownames` of the resulting `data.frame`. +56. `as.data.frame.data.table()` bug fix `as.data.frame.data.table` `S3` method no longer silently ignores `row.names` argument; issue [#5319](https://github.com/Rdatatable/data.table/issues/5319). Usage as such: `as.data.frame(dt, row.names = dt$sample)`. Thanks to @dereckdemezquita for the fix and pull request. ## NOTES diff --git a/R/data.table.R b/R/data.table.R index a11f1f5ec6..08b9afb0dd 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -2209,17 +2209,7 @@ tail.data.table = function(x, n=6L, ...) { as.data.frame.data.table = function(x, row.names = NULL, ...) { - ans = copy(x) - setattr(ans,"row.names",.set_row_names(nrow(x))) # since R 2.4.0, data.frames can have non-character row names - setattr(ans,"class","data.frame") - setattr(ans,"sorted",NULL) # remove so if you convert to df, do something, and convert back, it is not sorted - setattr(ans,"index",NULL) #4889 #5042 - setattr(ans,".internal.selfref",NULL) - # leave tl intact, no harm, - if(!is.null(row.names)) { - rownames(ans) = ans[, row.names] - ans[, row.names] = NULL - } + ans = setDF(copy(x), rownames = row.names) ans } From be1245c48fe25e1996b1cfd9c258cad501837e10 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 21:03:41 -0500 Subject: [PATCH 04/10] Moved news update to bug section. --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index bd8d23d706..2e6131f7f7 100644 --- a/NEWS.md +++ b/NEWS.md @@ -544,7 +544,7 @@ 51. `merge.data.table()` silently ignored the `incomparables` argument, [#2587](https://github.com/Rdatatable/data.table/issues/2587). It is now implemented and any other ignored arguments (e.g. misspellings) are now warned about. Thanks to @GBsuperman for the report and @ben-schwen for the fix. -56. `as.data.frame.data.table()` bug fix `as.data.frame.data.table` `S3` method no longer silently ignores `row.names` argument; issue [#5319](https://github.com/Rdatatable/data.table/issues/5319). Usage as such: `as.data.frame(dt, row.names = dt$sample)`. Thanks to @dereckdemezquita for the fix and pull request. +52. `as.data.frame()` bug fix in `as.data.frame.data.table` `S3` method no longer silently ignores `row.names` argument; issue [#5319](https://github.com/Rdatatable/data.table/issues/5319). Usage as such: `as.data.frame(dt, row.names = dt$sample)`. Thanks to @ben-schwen for guidance and @dereckdemezquita for the fix and pull request. ## NOTES From c3c1d87ee14b875f4cfc1e557766b9269d6c802d Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 21:32:04 -0500 Subject: [PATCH 05/10] Test for issue #5320. --- inst/tests/tests.Rraw | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index fc7e14f753..47b413d28e 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18702,3 +18702,8 @@ test(2234.7, DT[, min(.SD), by=c(.I)], data.table(I=1L:5L, V1=c(1L, 2L, 3L, 2L, test(2234.8, DT[, min(.SD), by=.I%%2L], error="by.*contains .I.*supported") # would be nice to support in future; i.e. by odd/even rows, and by=(.I+1L)%/%2L for pairs of rows; i.e. any expression of .I test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") +# test for #5320 `as.data.table(x)` `s3` method fixed, no longer ignoring `row.names` argument and simplified with use of `setDT` +test = data.table::data.table(iris) +test$sample = paste("sample", 1:nrow(test), sep = "_") + +test2 = as.data.frame(test, row.names = test$sample) \ No newline at end of file From 557479f4fe24d6a9a38d3d9318dc5d7524294ad0 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 22:32:48 -0500 Subject: [PATCH 06/10] Fixed my test; compare as.data.frame from DT to DF when using row.names. --- inst/tests/tests.Rraw | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 47b413d28e..7d1c33eb1e 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18703,7 +18703,7 @@ test(2234.8, DT[, min(.SD), by=.I%%2L], error="by.*contains .I.*supported") # w test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") # test for #5320 `as.data.table(x)` `s3` method fixed, no longer ignoring `row.names` argument and simplified with use of `setDT` -test = data.table::data.table(iris) -test$sample = paste("sample", 1:nrow(test), sep = "_") +DT = data.table::data.table(iris) +DF = data.frame(iris) -test2 = as.data.frame(test, row.names = test$sample) \ No newline at end of file +test(2234, as.data.frame(DT, row.names = paste("sample", 1:nrow(DT), sep = "_")), as.data.frame(DF, row.names = paste("sample", 1:nrow(DF), sep = "_"))) \ No newline at end of file From ae00bb1b428d5efe42f9b05d1be7a9bd44169700 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Sun, 30 Jan 2022 22:34:33 -0500 Subject: [PATCH 07/10] Mistake in test number. --- inst/tests/tests.Rraw | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 7d1c33eb1e..3cb8c2c031 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18706,4 +18706,4 @@ test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") DT = data.table::data.table(iris) DF = data.frame(iris) -test(2234, as.data.frame(DT, row.names = paste("sample", 1:nrow(DT), sep = "_")), as.data.frame(DF, row.names = paste("sample", 1:nrow(DF), sep = "_"))) \ No newline at end of file +test(2235.1, as.data.frame(DT, row.names = paste("sample", 1:nrow(DT), sep = "_")), as.data.frame(DF, row.names = paste("sample", 1:nrow(DF), sep = "_"))) \ No newline at end of file From 341a75a82f4b02faf757662249b5ebaf2ae8bb70 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Tue, 1 Feb 2022 13:01:06 -0500 Subject: [PATCH 08/10] Corrected tests to use data.table's test function; testing for row.names=c('x','y')|NULL. --- inst/tests/tests.Rraw | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 3cb8c2c031..ee46cb7ff8 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18703,7 +18703,10 @@ test(2234.8, DT[, min(.SD), by=.I%%2L], error="by.*contains .I.*supported") # w test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") # test for #5320 `as.data.table(x)` `s3` method fixed, no longer ignoring `row.names` argument and simplified with use of `setDT` -DT = data.table::data.table(iris) -DF = data.frame(iris) +dt = data.table(a=1:2, b=3:4) +df = structure(list(a=1:2, b=3:4), row.names=c("x", "y"), class="data.frame") -test(2235.1, as.data.frame(DT, row.names = paste("sample", 1:nrow(DT), sep = "_")), as.data.frame(DF, row.names = paste("sample", 1:nrow(DF), sep = "_"))) \ No newline at end of file +test(2235.1, as.data.frame(dt, row.names=c("x", "y")), df) + +df = data.frame(a=1:2, b=3:4) +test(2235.2, as.data.frame(dt, row.names=NULL), df) From 52b01dff8615776c1e4adef1cd6f0205b43f01b7 Mon Sep 17 00:00:00 2001 From: dereckdemezquita Date: Tue, 1 Feb 2022 13:48:14 -0500 Subject: [PATCH 09/10] Added git issue #5319 to modification and formatted testing code. --- R/data.table.R | 2 +- inst/tests/tests.Rraw | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/R/data.table.R b/R/data.table.R index 08b9afb0dd..2b557e366d 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -2209,7 +2209,7 @@ tail.data.table = function(x, n=6L, ...) { as.data.frame.data.table = function(x, row.names = NULL, ...) { - ans = setDF(copy(x), rownames = row.names) + ans = setDF(copy(x), rownames = row.names) # issue #5319 ans } diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index ee46cb7ff8..f6a2d77217 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18705,8 +18705,6 @@ test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") # test for #5320 `as.data.table(x)` `s3` method fixed, no longer ignoring `row.names` argument and simplified with use of `setDT` dt = data.table(a=1:2, b=3:4) df = structure(list(a=1:2, b=3:4), row.names=c("x", "y"), class="data.frame") - test(2235.1, as.data.frame(dt, row.names=c("x", "y")), df) - df = data.frame(a=1:2, b=3:4) test(2235.2, as.data.frame(dt, row.names=NULL), df) From 934c1ea67adce813f7e28daee167cd9b002fdd21 Mon Sep 17 00:00:00 2001 From: mattdowle Date: Tue, 15 Mar 2022 19:31:18 -0600 Subject: [PATCH 10/10] add Dereck to contirbutor list in DESCRIPTION: great PR and welcome! --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 88dfd46140..924bdeb2dc 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -70,7 +70,8 @@ Authors@R: c( person("Kyle","Haynes", role="ctb"), person("Boniface Christian","Kamgang", role="ctb"), person("Olivier","Delmarcell", role="ctb"), - person("Josh","O'Brien", role="ctb")) + person("Josh","O'Brien", role="ctb"), + person("Dereck","de Mezquita", role="ctb")) Depends: R (>= 3.1.0) Imports: methods Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), curl, R.utils, xts, nanotime, zoo (>= 1.8-1), yaml, knitr, rmarkdown, markdown