Fix #431: Make drop_redundant_dims safe for data.table by danschrage · Pull Request #434 · stan-dev/rstanarm

danschrage · 2020-05-03T21:12:15Z

Modify drop_redundant_dims to avoid an error when data is a data.table, as reported in #431. The error occurs because drop_redundant_dims selects columns of a data.frame with a logical vector held in a variable, whereas data.table treats an unquoted name in the column position as a column name. To fix this, my code skips the dimension-reduction step if data is a data.table.
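For readers unfamiliar with the step being discussed, here is a minimal base-R sketch of dropping redundant dimensions from a plain data.frame. The function name `drop_redundant_dims_sketch` and the toy data are assumptions made for illustration; this is not copied from the rstanarm source and may differ from it in detail.

```r
# Hypothetical re-creation for illustration; not the rstanarm implementation.
drop_redundant_dims_sketch <- function(data) {
  # TRUE for each column that is a matrix with exactly one column
  drop_dim <- vapply(data, function(v) is.matrix(v) && NCOL(v) == 1,
                     logical(1))
  if (any(drop_dim)) {
    # Column selection by a logical vector stored in a variable:
    # this relies on data.frame indexing semantics
    data[, drop_dim] <- lapply(data[, drop_dim, drop = FALSE], drop)
  }
  data
}

df <- data.frame(x = 1:3)
df$y <- matrix(c(0.5, 1.5, 2.5), ncol = 1)  # one-column matrix stored as a column

out <- drop_redundant_dims_sketch(df)
is.matrix(df$y)   # the column is a matrix before the call
is.matrix(out$y)  # and a plain numeric vector afterwards
```

The `data[, drop_dim]` form is the part that assumes data.frame semantics, which is why an object with different indexing rules can fail at that line.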

This shouldn't cause problems with data.table. In general, dimension reduction like this isn't necessary for a data.table, because data.table makes it really difficult to add a matrix as a column. The only way to do this is to explicitly coerce an existing data.frame that has matrices as columns using setDT(), and that will warn the user against doing this (see Rdatatable/data.table#3851). In any other case, data.table will automatically coerce matrices into columns. In other words, data.table does this dimension reduction automatically, so it can be safely skipped.

mespe · 2020-05-04T14:37:15Z

I am not sure this is the best solution for this issue. Specifically, I worry it is not robust because it relies on implicit behavior of the data.table package. It would be more robust to either issue a more informative error and let the user coerce the input data, or force coercion to a data.frame.

Additionally, it would be nice if this PR included an associated test so if/when data.table changes its behavior, it will flag this as needing attention.

danschrage · 2020-05-04T17:12:11Z

Thanks for the comments! I updated the pull request, and now validate_data behaves identically to validate_newdata and simply coerces anything (data.table, tibble, etc.) into a data.frame before calling drop_redundant_dims. So it should no longer depend on or be disrupted by any future changes in data.table. That should eliminate the need for an associated test, too.

I tested it on your MWE from #431 as well as on my own data.
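The coerce-first approach described above can be sketched in base R as follows. The names `validate_data_sketch` and `drop_redundant_dims_sketch` are hypothetical, and the made-up class `my_tbl` only stands in for a data.frame subclass such as a data.table or tibble; the actual change in the pull request may differ.

```r
# Hypothetical helpers for illustration; not the rstanarm implementation.
drop_redundant_dims_sketch <- function(data) {
  # TRUE for each column that is a matrix with exactly one column
  drop_dim <- vapply(data, function(v) is.matrix(v) && NCOL(v) == 1,
                     logical(1))
  if (any(drop_dim)) {
    data[, drop_dim] <- lapply(data[, drop_dim, drop = FALSE], drop)
  }
  data
}

validate_data_sketch <- function(data) {
  # Coerce first, so the indexing below always follows data.frame rules
  data <- as.data.frame(data)
  drop_redundant_dims_sketch(data)
}

df <- data.frame(x = 1:3)
df$y <- matrix(c(0.5, 1.5, 2.5), ncol = 1)
# "my_tbl" is a made-up subclass standing in for data.table, tibble, etc.
class(df) <- c("my_tbl", "data.frame")

out <- validate_data_sketch(df)
class(out)        # the subclass is stripped by as.data.frame()
is.matrix(out$y)  # the one-column matrix has been reduced to a vector
```

Coercing at the entry point keeps the helper free of class-specific branches, at the cost of returning a plain data.frame regardless of what the caller passed in.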

jgabry · 2020-05-13T16:58:50Z

@danschrage and @mespe, thanks for sorting this out!

Make drop_redundant_dims safe for data.table

9d81223

danschrage mentioned this pull request May 3, 2020

drop_rendudant_dims() issues error with data.table #431

Closed

danschrage added 2 commits May 4, 2020 09:49

Make drop_redundant_dims safe for data.table

a53ec59

Instead of checking for data.table in `drop_redundant_dims`, follow the behavior of `validate_newdata` and coerce everything to a data.frame in `validate_data` before calling `drop_redundant_dims`.

Merge pull request #1 from danschrage/testing

78ba95f

Make drop_redundant_dims safe for data.table

jgabry approved these changes May 13, 2020

jgabry merged commit 2ab134f into stan-dev:master May 13, 2020

jgabry merged 3 commits into stan-dev:master from danschrage:master