parse_gctx.py performance improvements #12

@DavidTingley

I'm not sure whether this is the same issue that @shababo brought up the other week. Parsing the GCTX file takes ~8x longer than loading the same data via pd.read_table when loading subsets of the data, and ~2x longer when loading the full matrices. It's unclear whether the compression is the same for the two sets of files. I hadn't noticed this before because I typically loaded the data once, and often loaded only methylation via the *.tsv.gz files.
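For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the benchmark I ran, and the names `time_loader` and `compare_loaders` are hypothetical helpers, not part of parse_gctx.py. The two loaders in the example are stand-ins so the snippet runs anywhere; swap in the real pd.read_table call and the GCTX parse call to get meaningful numbers.

```python
import statistics
import time
from typing import Callable, Dict


def time_loader(load: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds taken by `load` over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_loaders(
    loaders: Dict[str, Callable[[], object]],
    baseline: str,
    repeats: int = 5,
) -> Dict[str, float]:
    """Time each loader and return its slowdown factor relative to `baseline`."""
    medians = {name: time_loader(fn, repeats) for name, fn in loaders.items()}
    return {name: t / medians[baseline] for name, t in medians.items()}


if __name__ == "__main__":
    # Stand-in workloads (assumption: replace these with the real loaders,
    # e.g. a lambda wrapping pd.read_table on the *.tsv.gz file and a lambda
    # wrapping the GCTX parser on the matching *.gctx file).
    loaders = {
        "tsv": lambda: sum(range(10_000)),
        "gctx": lambda: sum(range(80_000)),
    }
    for name, ratio in compare_loaders(loaders, baseline="tsv").items():
        print(f"{name}: {ratio:.1f}x relative to tsv")
```

Using the median over several repeats should reduce the effect of disk caching on the first read, though it may be worth timing a cold first load separately since that is what most of us hit in practice.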

Tagging @bsiranosian @ANaka for visibility and to bring the discussion into GitHub.
