Skip to content

all.equal no longer error when not a DT#4072

Merged
mattdowle merged 12 commits intoRdatatable:masterfrom
d-sci:all-equal-shouldnt-error
Jan 8, 2020
Merged

all.equal no longer error when not a DT#4072
mattdowle merged 12 commits intoRdatatable:masterfrom
d-sci:all-equal-shouldnt-error

Conversation

@d-sci
Copy link
Copy Markdown
Contributor

@d-sci d-sci commented Nov 21, 2019

Closes #4042
Fix so that all.equal(DT, not_a_DT) returns the typical messages rather than error.
I had to modify old tests 1613.59 and 1613.60, which were checking for this error.

Comment thread R/setops.R Outdated
stopifnot(is.logical(trim.levels), is.logical(check.attributes), is.logical(ignore.col.order), is.logical(ignore.row.order), is.numeric(tolerance))
if (!is.data.table(target) || !is.data.table(current)) stop("'target' and 'current' must both be data.tables")
stopifnot(is.logical(trim.levels), is.logical(check.attributes), is.logical(ignore.col.order), is.logical(ignore.row.order), is.numeric(tolerance), is.data.table(target))
if (!is.data.table(current)) return(NextMethod())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return all.equal(data.table(a=1), data.frame(a=1)) to TRUE right? Shouldn't that be FALSE with reason "current is not a data.table"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will return

[1] "Attributes: < Names: 2 string mismatches >"                              
[2] "Attributes: < Length mismatch: comparison on first 2 components >"       
[3] "Attributes: < Component 1: target is externalptr, current is character >"
[4] "Attributes: < Component 2: Modes: character, numeric >"                  
[5] "Attributes: < Component 2: Lengths: 2, 1 >"                              
[6] "Attributes: < Component 2: target is character, current is numeric >"    

which is the same (just with some terms swapped) as the result of all.equal(data.frame(a=1), data.table(a=1))

Comment thread R/setops.R Outdated
@codecov
Copy link
Copy Markdown

codecov bot commented Nov 21, 2019

Codecov Report

Merging #4072 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4072      +/-   ##
==========================================
+ Coverage   99.54%   99.54%   +<.01%     
==========================================
  Files          72       72              
  Lines       13918    13932      +14     
==========================================
+ Hits        13854    13868      +14     
  Misses         64       64
Impacted Files Coverage Δ
R/data.table.R 100% <100%> (ø) ⬆️
src/fifelse.c 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa0ca17...fb8c57d. Read the comment docs.

@d-sci
Copy link
Copy Markdown
Contributor Author

d-sci commented Nov 21, 2019

@jangorecki raised an interesting point with regards to check.attributes = F. And what the result of all.equal should be on other-wise identical data.table vs data.frame. Base R can actually be oddly inconsistent here on all.equal(x, y) vs all.equal(y,x):

all.equal(data.frame(a=1), c(a=1), check.attributes = F)
#> [1] TRUE
all.equal(c(a=1), data.frame(a=1), check.attributes = F)
#> [1] "target is numeric, current is data.frame"

Current data.table always errors when the second argument isn't a data.table (regardless of check.attributes), and most of all.equal.data.table seems written based on the assumption that both args are data.tables. With new commit, this PR takes an easy middle ground, relaxing the error to a message, but still not returning TRUE for all.equal(data.table(a=1), data.frame(a=1), check.attributes = F), which would require more work:

all.equal(data.table(a=1), c(a=1), check.attributes = F) # would ERROR before
#> [1] "target is data.table, current is numeric"
all.equal(c(a=1), data.table(a=1), check.attributes = F)
#> [1] "target is numeric, current is data.table"

all.equal(data.table(a=1), data.frame(a=1), check.attributes = F) # would ERROR before
#> [1] "target is data.table, current is data.frame"
all.equal(data.frame(a=1), data.table(a=1), check.attributes = F)
#> [1] TRUE

Comment thread R/setops.R Outdated
@jangorecki jangorecki added the WIP label Nov 24, 2019
@d-sci
Copy link
Copy Markdown
Contributor Author

d-sci commented Nov 24, 2019

Ok, new attempt to handle check.attributes appropriately. Basically relying on the fact that setDT(x) does a good job of checking whether xis at all data.table-like and if so, coercing to a data.table without actually changing any non-attribute data. So, if current is not a data.table:

  • if check.attributes = TRUE return quickly with class mismatch info
  • if check.attributes = FALSE butcurrent is nothing like a data.table, return quickly with message to that effect
  • if check.attributes = FALSE butcurrent can be coerced via setDT, do so and continue on (thus still honouring ignore.col.order, ignore.row.order etc.)

Comment thread R/setops.R Outdated
Comment thread inst/tests/tests.Rraw Outdated
Comment thread inst/tests/tests.Rraw Outdated
@d-sci
Copy link
Copy Markdown
Contributor Author

d-sci commented Jan 6, 2020

Addressed most of your comments as suggested. I didn't change setDT(copy(current)) to as.data.table(current) because as.data.table seems like it could do quite a lot of coercion that really changes underlying values. In particular, it means that both list(a = 1, b = c(2,2)) and list(a = c(1,1), b = c(2,2)) would be "all equal" with data.table(a = c(1,1), b = c(2,2)) even though they wouldn't be with each other (since as.data.table can expand columns but setDT will error if any list elements are unequal length). setDT seems to be really good at telling "what could possibly be a data.table as it currently is, even if its class isn't data.table", and I can't see a situation where it would fail on X but you would still expect all.equal(DT, X) to be TRUE.

@jangorecki
Copy link
Copy Markdown
Member

@d-sci thanks, I am getting your point. The thing is that if we supply a list into current one might actually expect it to be expanded, because this is how the list behaves when being coerced to data.table. If one wants to have behavior you described then probably all.equal(as.list(target), current) is more appropriate (then of course we dispatch to all.equal.list).
If as.data.table way handles your use case then I would go with it. Otherwise probably best to hear what others think about try() coercion and the consistency of list/data.table equality.

@d-sci
Copy link
Copy Markdown
Contributor Author

d-sci commented Jan 6, 2020

Ok, fair enough, switched over to as.data.table. This will certainly work for my use case -- as long as I get a string and not an error for all.equal(DT, not_a_DT) I'm happy!

@jangorecki
Copy link
Copy Markdown
Member

jangorecki commented Jan 6, 2020

We need documentation update for current argument, to explicitly mention as.data.table method used for coercion when check.attributes=F. Probably also an example of all.equal(dt(a=1), 1, check.attributes=FALSE)) which might be surprising. Also news entry is out dated now.

@jangorecki
Copy link
Copy Markdown
Member

Thanks for fixing missing S3 registration. Unless you have something more in mind I think we can remove WIP label already.

@d-sci
Copy link
Copy Markdown
Contributor Author

d-sci commented Jan 7, 2020

Yep, this is ready in my mind! Thanks for all your comments and suggestions.

@jangorecki jangorecki removed the WIP label Jan 8, 2020
@mattdowle mattdowle added this to the 1.12.9 milestone Jan 8, 2020
@mattdowle mattdowle changed the title all.equal doesnt error when not a DT, fixes #4042 all.equal no longer error when not a DT Jan 8, 2020
@mattdowle
Copy link
Copy Markdown
Member

Thanks, @d-sci. I added you to DESCRIPTION in this PR and invited you to be project member. Please accept the invite displayed in your GitHub repositories page and then you can create branches in the main project so we can more easily commit to each other's branches. Welcome!

@mattdowle mattdowle merged commit 7a9eaf6 into Rdatatable:master Jan 8, 2020
@jangorecki jangorecki modified the milestones: 1.12.11, 1.12.9 May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

all.equal shouldn't ERROR when comparing a data.table to a non-data.table

4 participants