This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Support Sparse Matrix Input #773

@szilard


Categorical variables need to be 1-hot encoded, and the resulting matrix can be 100x or even 1000x bigger in a dense representation than in a sparse one. It seems mxnet currently supports only dense input, so it's easy to hit RAM limitations even with fairly small datasets on the largest EC2 GPU box, g2.8xlarge, which has 60GB RAM.
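To make the dense vs. sparse gap concrete, here is a minimal pure-Python sketch of 1-hot encoding straight into CSR (compressed sparse row) arrays. The helper name `one_hot_csr` and the toy rows are hypothetical and only for illustration; this is not mxnet code and says nothing about how mxnet would store such input.

```python
def one_hot_csr(rows):
    """One-hot encode rows of categorical values into CSR arrays.

    Hypothetical helper for illustration. Returns (data, indices, indptr,
    n_cols), the three standard CSR arrays plus the encoded column count.
    """
    col_of = {}  # (variable position, level) -> one-hot column index
    data, indices, indptr = [], [], [0]
    for row in rows:
        cols = []
        for j, level in enumerate(row):
            key = (j, level)
            if key not in col_of:
                col_of[key] = len(col_of)  # assign the next free column
            cols.append(col_of[key])
        for c in sorted(cols):
            indices.append(c)   # column of each stored 1
            data.append(1.0)    # the stored value itself
        indptr.append(len(indices))  # where the next row starts
    return data, indices, indptr, len(col_of)


# Toy data: two categorical variables per row (made-up values).
rows = [("AA", "SFO"), ("UA", "JFK"), ("AA", "JFK")]
data, indices, indptr, n_cols = one_hot_csr(rows)

stored = len(data)                # entries kept by the sparse form
dense_cells = len(rows) * n_cols  # entries a dense matrix would hold
print(stored, dense_cells)
```

Each row stores exactly one entry per categorical variable, so the sparse size grows with the number of variables, while the dense size grows with the total number of distinct levels.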

For example, a 10M-row sample of the well-known airline dataset is ~2GB in sparse representation, but over 60GB in dense.
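A rough back-of-the-envelope estimate shows where a gap like this comes from. The issue does not state the number of variables or 1-hot columns, so `n_vars` and `n_cols` below are illustrative assumptions, as are the 4-byte value and index sizes; the results will not match the figures above exactly.

```python
def dense_bytes(n_rows, n_cols, value_bytes=4):
    # A dense matrix stores every cell, zero or not.
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=4, index_bytes=4):
    # CSR stores one value and one column index per nonzero,
    # plus one row pointer per row (and one extra at the end).
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


n_rows = 10_000_000  # row count from the example above
n_vars = 8           # assumption: categorical variables per row
n_cols = 700         # assumption: total 1-hot columns after encoding

nnz = n_rows * n_vars  # 1-hot encoding gives one nonzero per variable per row
dense = dense_bytes(n_rows, n_cols)
sparse = csr_bytes(n_rows, nnz)

print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.2f} GB")
print(f"ratio:  {dense / sparse:.0f}x")
```

The ratio scales roughly with `n_cols / n_vars`, which is why high-cardinality categorical variables push the dense form toward the 100x to 1000x blow-up described above.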

More info here:
szilard/benchm-ml#30
and some code here:
https://github.com/szilard/benchm-ml/blob/master/4-DL/2-mxnet.R
