parse_gctx.py performance improvements #12

I'm not sure whether this is the same issue @shababo brought up the other week or a different one, but parsing the GCTX file takes ~8x longer than loading the same data via `pd.read_table` when loading subsets of the data, and ~2x longer when loading the full matrices. It's unclear whether the compression is the same on these files. I hadn't noticed this previously because I typically loaded the data only once, and often loaded only methylation via the `*.tsv.gz` files.

![image](https://user-images.githubusercontent.com/5696262/169830481-35033629-f47a-4eb5-8d0f-9d7da063dac7.png)
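
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the code behind the screenshot above; `time_loader` and `slowdown_vs_baseline` are hypothetical helper names, and the loaders are passed in as zero-argument callables so the same harness can wrap `pd.read_table` and whatever entry point `parse_gctx.py` exposes.

```python
import time
from typing import Callable, Dict


def time_loader(loader: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls.

    Taking the minimum reduces noise from disk caching and other processes.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        loader()  # the loaded object is discarded; only elapsed time matters
        best = min(best, time.perf_counter() - start)
    return best


def slowdown_vs_baseline(timings: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express every timing as a multiple of the baseline loader's timing."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


# Hypothetical usage (paths, column list, and the GCTX parse call are
# placeholders, not names taken from this repo):
#
#   import pandas as pd
#   timings = {
#       "tsv": time_loader(lambda: pd.read_table(TSV_PATH, index_col=0, usecols=COLS)),
#       "gctx": time_loader(lambda: parse_gctx_subset(GCTX_PATH, COLS)),
#   }
#   print(slowdown_vs_baseline(timings, baseline="tsv"))
```

Running it once with a column subset and once with the full matrix should show whether the ~8x and ~2x gaps hold on other machines.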

Tagging @bsiranosian @ANaka for visibility and to bring the discussion into GitHub.
