Some ideas:
- We do not need to store both Y and Yorig in the flash data object.
- We should probably store tau as a vector when var_type = by_column or by_row. This could be tricky, but it's probably worth it since flash fit objects are frequently copied.
- It shouldn't be too difficult to allow Y to be a dgCMatrix, and likewise for S.

If we do the above, then the only large dense matrices will be the matrices of residuals and squared residuals. (Or rather, R2, Rk, and R2k for the greedy step.) So, optimistically, we might be able to shoot for a memory requirement of 5x the size of the original data (measured as a dense matrix) when Y is sparse and 6-8x otherwise.
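
To get a rough feel for the savings, here is a back-of-envelope sketch in Python (arithmetic only, not flashr code). It assumes 8-byte doubles and 4-byte integers as in R, ignores object headers, and assumes that by_column means one tau value per column and by_row one per row. The function names and the example dimensions are made up for illustration.

```python
# Back-of-envelope storage sizes in bytes. Object headers and attributes
# are ignored, so real R objects will be slightly larger.
DOUBLE_BYTES = 8  # size of an R numeric (double)
INT_BYTES = 4     # size of an R integer


def dense_bytes(n_rows, n_cols):
    """Bytes needed for a dense n_rows x n_cols matrix of doubles."""
    return DOUBLE_BYTES * n_rows * n_cols


def csc_bytes(n_cols, n_nonzero):
    """Approximate bytes for a dgCMatrix (compressed sparse column):
    one double and one integer row index per nonzero entry, plus
    n_cols + 1 integer column pointers."""
    return (DOUBLE_BYTES + INT_BYTES) * n_nonzero + INT_BYTES * (n_cols + 1)


def tau_bytes(n_rows, n_cols, var_type):
    """Bytes for tau stored as compactly as var_type allows.
    Assumption: by_column needs one value per column and by_row one
    value per row; anything else is treated here as a full matrix."""
    if var_type == "by_column":
        return DOUBLE_BYTES * n_cols
    if var_type == "by_row":
        return DOUBLE_BYTES * n_rows
    return dense_bytes(n_rows, n_cols)


# Hypothetical problem size: 10,000 x 2,000 with 5% of entries nonzero.
n, p = 10_000, 2_000
nnz = n * p // 20

print("dense Y:        ", dense_bytes(n, p))
print("sparse Y:       ", csc_bytes(p, nnz))
print("tau as matrix:  ", tau_bytes(n, p, "full"))
print("tau by_column:  ", tau_bytes(n, p, "by_column"))
print("tau by_row:     ", tau_bytes(n, p, "by_row"))
```

In this example the vector form of tau is negligible next to a single dense matrix, which fits the point above that only the residual matrices would remain as large dense objects.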