as.data.frame but keep row names #5320
|
Hey, Assigning Might be also worth to take a look at NEWS item misses your name. |
|
Hi, thanks for your feedback. That is true,
as.data.frame.data.table = function(x, row.names = NULL, ...) {
  ans = copy(x)
  setattr(ans,"class","data.frame")
  setattr(ans,"sorted",NULL) # remove so if you convert to df, do something, and convert back, it is not sorted
  setattr(ans,"index",NULL) #4889 #5042
  setattr(ans,".internal.selfref",NULL)
  # leave tl intact, no harm,
  if(!is.null(row.names) && is.character(row.names)) {
    setattr(ans, "row.names", row.names)
  } else {
    setattr(ans,"row.names",.set_row_names(nrow(x))) # since R 2.4.0, data.frames can have non-character row names
  }
  ans
}
Could you help me understand what this line of code is doing? I tried it, and it returns this, which I don't understand the purpose of:
test <- data.table::data.table(iris)
test$sample <- paste("sample", 1:nrow(test), sep = "_")
.set_row_names(nrow(test))
You also mentioned looking at |
|
The result for .Internal(inspect(iris))
# @55fa91c185d8 19 VECSXP g1c4 [OBJ,MARK,REF(27),ATT] (len=5, tl=0)
# @55fa8ecbe2e0 14 REALSXP g1c7 [MARK,REF(7)] (len=150, tl=0) 5.1,4.9,4.7,4.6,5,...
# @55fa910bca50 14 REALSXP g1c7 [MARK,REF(7)] (len=150, tl=0) 3.5,3,3.2,3.1,3.6,...
# @55fa90e857e0 14 REALSXP g1c7 [MARK,REF(7)] (len=150, tl=0) 1.4,1.4,1.3,1.5,1.4,...
# @55fa90c821e0 14 REALSXP g1c7 [MARK,REF(7)] (len=150, tl=0) 0.2,0.2,0.2,0.2,0.2,...
# @55fa8e5a9a70 13 INTSXP g1c7 [OBJ,MARK,REF(11),ATT] (len=150, tl=0) 1,1,1,1,1,...
# ATTRIB:
# @55fa90729090 02 LISTSXP g1c0 [MARK,REF(1)]
# TAG: @55fa8e27a330 01 SYMSXP g1c0 [MARK,REF(334),LCK,gp=0x4000] "levels" (has value)
# @55fa91c50d48 16 STRSXP g1c3 [MARK,REF(65535)] (len=3, tl=0)
# @55fa91c54d20 09 CHARSXP g1c1 [MARK,REF(1588),gp=0x60] [ASCII] [cached] "setosa"
# @55fa913c5f48 09 CHARSXP g1c2 [MARK,REF(1498),gp=0x60] [ASCII] [cached] "versicolor"
# @55fa91355d88 09 CHARSXP g1c2 [MARK,REF(1498),gp=0x60] [ASCII] [cached] "virginica"
# TAG: @55fa8e27a720 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x6000] "class" (has value)
# @55fa91c54d58 16 STRSXP g1c1 [MARK,REF(65535)] (len=1, tl=0)
# @55fa8e31dad8 09 CHARSXP g1c1 [MARK,REF(409),gp=0x61] [ASCII] [cached] "factor"
# ATTRIB:
# @55fa90729100 02 LISTSXP g1c0 [MARK,REF(1)]
# TAG: @55fa8e27a1e0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x6000] "names" (has value)
# @55fa91c18648 16 STRSXP g1c4 [MARK,REF(65535)] (len=5, tl=0)
# @55fa91355dc8 09 CHARSXP g1c2 [MARK,REF(169),gp=0x61] [ASCII] [cached] "Sepal.Length"
# @55fa91355e08 09 CHARSXP g1c2 [MARK,REF(161),gp=0x61] [ASCII] [cached] "Sepal.Width"
# @55fa91355e48 09 CHARSXP g1c2 [MARK,REF(169),gp=0x61] [ASCII] [cached] "Petal.Length"
# @55fa91355e88 09 CHARSXP g1c2 [MARK,REF(161),gp=0x61] [ASCII] [cached] "Petal.Width"
# @55fa91c54d90 09 CHARSXP g1c1 [MARK,REF(162),gp=0x61] [ASCII] [cached] "Species"
# TAG: @55fa8e27a720 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x6000] "class" (has value)
# @55fa91c54dc8 16 STRSXP g1c1 [MARK,REF(65535)] (len=1, tl=0)
# @55fa8e324408 09 CHARSXP g1c2 [MARK,REF(1722),gp=0x61,ATT] [ASCII] [cached] "data.frame"
# TAG: @55fa8e279fb0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "row.names" (has value)
# @55fa91c54e00 13 INTSXP g1c1 [MARK,REF(65535)] (len=2, tl=0) -2147483648,-150
The interesting part is the last line, where we can see that the rownames are represented by the length-2 integer vector -2147483648,-150 rather than by 150 separate values.
setDF
The pointer towards
as.data.frame.data.table = function(x, row.names = NULL, ...) {
  ans = setDF(copy(x), rownames=row.names)
  ans
}
DT = as.data.table(head(iris))
as.data.frame(DT)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
as.data.frame(DT, row.names=paste0("n", seq(nrow(DT))))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# n1 5.1 3.5 1.4 0.2 setosa
# n2 4.9 3.0 1.4 0.2 setosa
# n3 4.7 3.2 1.3 0.2 setosa
# n4 4.6 3.1 1.5 0.2 setosa
# n5 5.0 3.6 1.4 0.2 setosa
# n6 5.4 3.9 1.7 0.4 setosa |
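For anyone else puzzled by the `-2147483648,-150` pair in the `inspect` output, here is a minimal sketch using only base R. `-2147483648` is how `NA_integer_` is stored internally, so the attribute is the compact form `c(NA, -150L)`. `.row_names_info()` and `.set_row_names()` are base R helpers; the variable name `compact` is just for illustration.

```r
# Internal (compact) form of automatic row names: c(NA, -n)
compact <- .row_names_info(iris, type = 0L)
print(compact)

# .set_row_names(n) builds the same compact vector for n rows
print(identical(compact, .set_row_names(nrow(iris))))

# The public accessors expand the compact form on demand
print(head(rownames(iris)))

# type = 2L reports the number of rows without expanding anything
print(.row_names_info(iris, type = 2L))
```

This is why `.set_row_names(nrow(test))` looks odd when printed on its own: it is a storage shortcut for "rows 1 to n", not a set of usable names.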
|
Thank you for that. Some of it is still foggy, but I do have a better understanding of
You originally pointed me to
Looking at
My current iteration, thanks to you, is now as you stated above:
as.data.frame.data.table = function(x, row.names = NULL, ...)
{
  ans = setDF(copy(x), rownames = row.names)
  ans
}
test <- data.table::data.table(iris)
test$sample <- paste("sample", 1:nrow(test), sep = "_")
test2 <- as.data.frame(test, row.names = test$sample) |
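A small sketch of why the `copy(x)` stays in the simplified method, assuming a data.table version whose `setDF()` has the `rownames` argument (as used above). `setDF()` converts its argument by reference, so without the copy the caller's data.table would itself be turned into a data.frame. The object names `DT` and `DF` are illustrative only.

```r
library(data.table)

DT <- data.table(a = 1:2, b = 3:4)

# setDF() converts in place, so copy() first to leave DT untouched
DF <- setDF(copy(DT), rownames = c("x", "y"))

print(class(DT))     # the original keeps its data.table class
print(class(DF))     # the copy is now a plain data.frame
print(rownames(DF))  # row names come from the rownames argument
```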
|
Well if you take a look at the source of Updating the NEWS item with the fix (no longer silently ignoring |
|
Hi, thank you again. I made the necessary changes to my pull request:
I simply added these lines
I believe this should be sufficient (I updated my test; I believe this is how one writes a test):
# test for #5320 `as.data.frame(x)` `s3` method fixed, no longer ignoring `row.names` argument and simplified with use of `setDF`
DT = data.table::data.table(iris)
DF = data.frame(iris)
test(2235.1, as.data.frame(DT, row.names = paste("sample", 1:nrow(DT), sep = "_")), as.data.frame(DF, row.names = paste("sample", 1:nrow(DF), sep = "_")))
Testing issues
I did have issues when running my
First of all, thank you for your patience; this is my first PR to
My system and sessionInfo
First of all, this is my system: macOS 11.4, MacBook Air (M1, 2020), 16GB RAM. And my
Testing is failing for me;
|
|
Regarding the test cases, there should be at least two: 1. where we assign rownames to non-standard values and 2. where
So the first one could look like this.
dt = data.table(a=1:2, b=3:4)
df = structure(list(a = 1:2, b = 3:4), row.names = c("x", "y"), class = "data.frame")
test(2235.1, as.data.frame(dt, row.names=c("x","y")), df)
Regarding the testing issues, there are some issues with installing on Mac, but in any case I have now clicked approve to allow GitHub CI to run, so you can also check the test cases once you have changed them (simply push them to the branch). |
…mes=c('x','y')|NULL.
|
Hi, thank you. The tests look like this after your guidance:
dt = data.table(a=1:2, b=3:4)
df = structure(list(a=1:2, b=3:4), row.names=c("x", "y"), class="data.frame")
test(2235.1, as.data.frame(dt, row.names=c("x", "y")), df)
df = data.frame(a=1:2, b=3:4)
test(2235.2, as.data.frame(dt, row.names=NULL), df)
I found that
structure(list(a=1:2, b=3:4), row.names=NULL, class="data.frame")
I found that these were
Here I tested the equivalence of all these ways of creating a
df = data.frame(a=1:2, b=3:4)
df2 = data.frame(a=1:2, b=3:4, row.names = NULL)
df3 = structure(list(a=1:2, b=3:4), row.names=c("x", "y"), class="data.frame")
rownames(df1) = NULL
sapply(list(df, df1, df2), function(x) { identical(df, x)})
For my education: why did you create a |
|
There is something wrong with your equivalence checking code, since you never create an object for
The reason for creating the |
|
You're right, this is the correct equivalency code; pardon that:
df1 = data.frame(a=1:2, b=3:4)
df2 = data.frame(a=1:2, b=3:4, row.names = NULL)
df3 = structure(list(a=1:2, b=3:4), row.names=c("x", "y"), class="data.frame")
rownames(df3) = NULL
sapply(list(df1, df2, df3), function(x) { identical(df1, x) })
As for the
data.frame(a=1:2, b=3:4, row.names = c("x", "y"))
And no way, that is really cool; I will be using this for sure: |
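To make the two cases from this exchange concrete, here is a base-R-only sketch comparing the `structure()` and `data.frame()` routes for explicit row names, and the default form for automatic row names. The object names (`named_a`, `named_b`, `auto_a`, `auto_b`) are made up for illustration and are not part of the PR's tests.

```r
# Explicit character row names: two ways to build the same data.frame
named_a <- structure(list(a = 1:2, b = 3:4),
                     row.names = c("x", "y"), class = "data.frame")
named_b <- data.frame(a = 1:2, b = 3:4, row.names = c("x", "y"))
print(identical(named_a, named_b))

# Automatic row names: resetting with NULL restores the default form
auto_a <- data.frame(a = 1:2, b = 3:4)
auto_b <- named_b
rownames(auto_b) <- NULL
print(identical(auto_a, auto_b))
```

Either construction should work as the expected value in a test; `structure()` just spells out the attributes explicitly.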
Codecov Report
@@ Coverage Diff @@
## master #5320 +/- ##
==========================================
- Coverage 99.51% 99.51% -0.01%
==========================================
Files 78 78
Lines 14761 14756 -5
==========================================
- Hits 14689 14684 -5
Misses 72 72
|
|
Ah true, didn't remember that |
|
@mattdowle Codecov decrease is spurious since duplicated code is deleted. |
|
Great PR, thanks @dereckdemezquita and welcome to the project. Have invited you to be project member; the invite should be a button in your profile or projects page that you need to click to accept. Then you can create branches in the main project in future. Thanks @ben-schwen for guiding here. |
|
@mattdowle thank you for the invite, I am a big fan so naturally very happy to contribute! |
Closes #5319
Hello, I created an issue in the repo a few hours ago. I want to contribute and thus read the documentation and guidelines for pull requests.
This is a new feature I'm adding here to the s3 method as.data.frame.data.table; it allows the user to specify a column to move to the row names of the resulting data.frame. My code looks like this:
I also added this feature to the NEWS.md document. I hope I've satisfied all requests and am always open to critique.