Fix #431: Make drop_redundant_dims safe for data.table (#434)
jgabry merged 3 commits into stan-dev:master
I am not sure this is the best solution for this issue. Specifically, I worry it is not robust because it relies on implicit behavior of the data.table package. It would be more robust to either issue a more informative error and let the user coerce the input data, or force coercion to a data.frame. Additionally, it would be nice if this PR included an associated test so if/when data.table changes its behavior, it will flag this as needing attention.
Instead of checking for data.table in `drop_redundant_dims`, follow the behavior of `validate_newdata` and coerce everything to a data.frame in `validate_data` before calling `drop_redundant_dims`.
Make drop_redundant_dims safe for data.table
Thanks for the comments! I updated the pull request, and now I tested it on your MWE from #431 as well as on my own data.
@danschrage and @mespe, thanks for sorting this out!
Modify `drop_redundant_dims` to avoid an error when data is a data.table, as reported in #431. This happens because `drop_redundant_dims` indexes into a data.frame using a vector of logicals, but data.table expects an unquoted name to be a column name. To fix this, my code skips the dimension-reduction step if data is a data.table.

This shouldn't cause problems with data.table: In general, dimension reduction like this isn't necessary for a data.table, because data.table makes it really difficult to add a matrix as a column. The only way to do this is to explicitly coerce an existing data.frame that has matrices as columns using `setDT()`, and that will warn the user against doing this (see Rdatatable/data.table#3851). For any other case, data.table will automatically coerce matrices into columns. In other words, data.table does this dimension reduction automatically, so it can be safely skipped.
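
For readers following the thread, here is a rough sketch of the coercion-based approach from the review: coerce to a data.frame first, then drop redundant dimensions with ordinary data.frame indexing. The function names `drop_redundant_dims_sketch` and `validate_data_sketch` are hypothetical stand-ins written for illustration; they are not rstanarm's actual source, and the real functions may differ in detail.

```r
# Hypothetical stand-in for the helper discussed here (not rstanarm's code).
drop_redundant_dims_sketch <- function(data) {
  # Flag columns that are stored as one-column matrices.
  drop_dim <- vapply(data, function(v) is.matrix(v) && NCOL(v) == 1, logical(1))
  if (any(drop_dim)) {
    # Logical column indexing; this relies on data.frame semantics.
    data[, drop_dim] <- lapply(data[, drop_dim, drop = FALSE], drop)
  }
  data
}

# Hypothetical stand-in for the validation step: coerce before indexing,
# so a data.table (or other data.frame subclass) is handled as a data.frame.
validate_data_sketch <- function(data) {
  data <- as.data.frame(data)
  drop_redundant_dims_sketch(data)
}

df <- data.frame(y = 1:3)
df$x <- matrix(c(0.1, 0.2, 0.3), ncol = 1)  # one-column matrix as a column
out <- validate_data_sketch(df)

# Only runs if data.table is installed; checks that the result is a plain data.frame.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::data.table(y = 1:3, x = c(0.1, 0.2, 0.3))
  out_dt <- validate_data_sketch(dt)
  stopifnot(is.data.frame(out_dt), !data.table::is.data.table(out_dt))
}
```

Because the coercion happens before any column indexing, the helper never has to know about data.table's `j` semantics, which is the robustness concern raised in the review.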