I'm not sure whether this is the same issue @shababo brought up the other week or a different one, but parsing the GCTX file takes ~8 times longer than loading the same data via `pd.read_table` when loading subsets of the data, and ~2X longer when loading the full matrices. It's unclear whether the two file types use the same compression. I hadn't noticed this before because I typically loaded the data only once, and often loaded only the methylation data via the `*.tsv.gz` files.
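To make the comparison reproducible, here is a minimal timing sketch. It writes a synthetic gzipped TSV (the file name, matrix size, and `sample_*`/`probe_*` labels are all hypothetical), then times `pd.read_table` for the full matrix and for a 10-column subset. The GCTX side is only shown as a comment, assuming cmapPy's `parse` (with its `cid` argument for column subsets) is the reader in use; swap in whatever loader we actually call.

```python
import os
import tempfile
import time
from statistics import median

import numpy as np
import pandas as pd


def median_runtime(load, repeats=3):
    """Call load() `repeats` times and return the median wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        times.append(time.perf_counter() - start)
    return median(times)


def write_synthetic_tsv_gz(path, n_rows=1000, n_cols=100, seed=0):
    """Write a hypothetical probes-by-samples matrix as a gzipped TSV and return it."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        rng.random((n_rows, n_cols)),
        index=[f"probe_{i}" for i in range(n_rows)],
        columns=[f"sample_{j}" for j in range(n_cols)],
    )
    df.index.name = "id"
    df.to_csv(path, sep="\t", compression="gzip")
    return df


with tempfile.TemporaryDirectory() as tmpdir:
    tsv_path = os.path.join(tmpdir, "synthetic.tsv.gz")
    full = write_synthetic_tsv_gz(tsv_path)
    # Keep the index column plus the first 10 samples.
    subset_cols = ["id"] + list(full.columns[:10])

    def load_full():
        # Compression is inferred from the .gz extension.
        return pd.read_table(tsv_path, index_col="id")

    def load_subset():
        return pd.read_table(tsv_path, usecols=subset_cols, index_col="id")

    # A GCTX subset would be timed the same way (assumption: cmapPy is installed):
    # from cmapPy.pandasGEXpress.parse import parse
    # def load_gctx_subset():
    #     return parse("data.gctx", cid=subset_cols[1:])

    t_full = median_runtime(load_full)
    t_subset = median_runtime(load_subset)
    subset_df = load_subset()
    full_df = load_full()

print(f"read_table full: {t_full:.3f}s, subset: {t_subset:.3f}s")
```

Taking the median of a few repeats smooths out first-read effects such as OS file caching, which matters when the goal is a ratio like 8x versus 2x.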

Tagging @bsiranosian @ANaka for visibility and to bring the discussion into GitHub.