Reduce memory footprint for large datasets #90

Some ideas:
1. We do not need to store both `Y` and `Yorig` in the flash data object.
2. We should probably store `tau` as a vector rather than a full matrix when `var_type` is `by_column` or `by_row`. This could be tricky, but it's probably worth it since flash fit objects are frequently copied.
3. It shouldn't be too difficult to allow `Y` to be a `dgCMatrix`, and likewise for `S`.
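To make idea 2 concrete, here is a minimal sketch in base R of why a vector `tau` could stand in for a full matrix in the `by_column` case. The names `R2`, `tau_vec`, and `tau_mat` are illustrative stand-ins rather than the package's actual internals, and the precision estimate used is just a placeholder.

```r
# Hypothetical sketch: object names are illustrative, not package internals.
n <- 1000
p <- 200
set.seed(1)
R2 <- matrix(rexp(n * p), n, p)  # stand-in for a matrix of squared residuals

# With by_column variance structure there is one precision per column,
# so a length-p vector carries the same information as an n x p matrix.
tau_vec <- 1 / colMeans(R2)                     # placeholder estimate
tau_mat <- matrix(tau_vec, n, p, byrow = TRUE)  # the dense equivalent

# Sums of the form sum(tau * R2) never need the dense matrix.
full    <- sum(tau_mat * R2)
compact <- sum(colSums(R2) * tau_vec)
stopifnot(isTRUE(all.equal(full, compact)))

# When an elementwise product is needed, sweep() scales columns directly.
scaled <- sweep(R2, 2, tau_vec, `*`)
stopifnot(isTRUE(all.equal(scaled, tau_mat * R2)))

# Compare the storage cost of the two representations.
print(object.size(tau_mat), units = "Kb")
print(object.size(tau_vec), units = "Kb")
```

The `by_row` case would be symmetric, using `rowSums()` and `sweep(R2, 1, ...)`. The tricky part the issue alludes to is presumably that every place that reads `tau` would need to handle both the vector and matrix forms.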

If we do the above, the only large dense matrices will be the matrices of residuals and squared residuals (or rather, `R2`, `Rk`, and `R2k` for the greedy step). So, optimistically, we might be able to aim for a memory requirement of 5x the size of the original data (measured as a dense matrix) when `Y` is sparse and 6-8x otherwise.
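As a rough illustration of what these multiples mean in bytes, the sketch below uses the Matrix package to build a sparse `dgCMatrix`, measures its dense equivalent, and scales that by the 5x and 6-8x targets. The dimensions and the 2% density are arbitrary assumptions chosen for the example, not figures from real data.

```r
# Hypothetical sketch: dimensions and density are arbitrary assumptions.
library(Matrix)

n <- 2000
p <- 500
set.seed(1)
Y_sparse <- rsparsematrix(n, p, density = 0.02)  # a dgCMatrix
Y_dense  <- as.matrix(Y_sparse)                  # same data stored densely

dense_bytes  <- as.numeric(object.size(Y_dense))
sparse_bytes <- as.numeric(object.size(Y_sparse))

# The targets are stated relative to the data measured as a dense matrix.
target_sparse <- 5 * dense_bytes        # when Y is stored sparsely
target_dense  <- c(6, 8) * dense_bytes  # when Y is stored densely

cat("Y stored densely (MB): ", dense_bytes / 2^20, "\n")
cat("Y stored sparsely (MB):", sparse_bytes / 2^20, "\n")
cat("5x target (MB):        ", target_sparse / 2^20, "\n")
cat("6-8x target (MB):      ", target_dense / 2^20, "\n")
```

Because the residual matrices stay dense either way, storing `Y` sparsely saves at most roughly one dense copy of the data, which is consistent with the gap between the 5x and 6-8x figures.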
