Skip to content

Fix #431: Make drop_redundant_dims safe for data.table#434

Merged
jgabry merged 3 commits intostan-dev:masterfrom
danschrage:master
May 13, 2020
Merged

Fix #431: Make drop_redundant_dims safe for data.table#434
jgabry merged 3 commits intostan-dev:masterfrom
danschrage:master

Conversation

@danschrage
Copy link
Copy Markdown
Contributor

Modify drop_redundant_dims to avoid an error when data is a data.table, as reported in #431. This happens because drop_redundant_dims indexes into a data.frame using a vector of logicals, but data.table expects an unquoted name to be a column name. To fix this, my code skips the dimension-reduction step if data is a data.table.

This shouldn't cause problems with data.table: In general, dimension reduction like this isn't necessary for a data.table, because data.table makes it really difficult to add a matrix as a column. The only way to do this is to explicitly coerce an existing data.frame that has matrices as columns using setDT(), and that will warn the user against doing this (see Rdatatable/data.table#3851). For any other case, data.table will automatically coerce matrices into columns. In other words, data.table does this dimension reduction automatically, so it can be safely skipped.

@mespe
Copy link
Copy Markdown

mespe commented May 4, 2020

I am not sure this is the best solution for this issue. Specifically, I worry it is not robust because it relies on implicit behavior of the data.table package. It would be more robust to either issue a more informative error and let the user coerce the input data, or force coercion to a data.frame.

Additionally, it would be nice if this PR included an associated test so if/when data.table changes its behavior, it will flag this as needing attention.

danschrage added 2 commits May 4, 2020 09:49
Instead of checking for data.table in `drop_redundant_dims`, follow the behavior of `validate_newdata` and coerce everything to a data.frame in `validate_data` before calling `drop_redundant_dims`.
Make drop_redundant_dims safe for data.table
@danschrage
Copy link
Copy Markdown
Contributor Author

Thanks for the comments! I updated the pull request, and now validate_data behaves identically to validate_newdata and simply coerces anything (data.table, tibble, etc.) into a data.frame before calling drop_redundant_dims. So it should no longer depend on or be disrupted by any future changes in data.table. That should eliminate the need for an associated test, too.

I tested it on your MWE from #431 as well as on my own data.

@jgabry
Copy link
Copy Markdown
Member

jgabry commented May 13, 2020

@danschrage and @mespe, thanks for sorting this out!

@jgabry jgabry merged commit 2ab134f into stan-dev:master May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants