Try out the parquet format for some tables

We currently write all the things marked with `@table` to tab-delimited CSV files. This has a number of problems:

 - The format does not contain enough information to restore the table by itself. Instead we need to write inconvenient and buggy code in `tableloader.py`.
 - Parsing CSV is very slow compared to sensible binary formats.
 - The files are huge compared to sensibly compressed binary formats.
 - There is no built-in functionality to store or load metadata, so we cannot easily implement sensible checks without loading the whole file.
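
To illustrate the first point, here is a minimal sketch of how a tab-delimited CSV round trip loses the index structure unless the reader already knows it. The table contents and the names `dataset`, `point` and `c0`..`c3` are made up for the example and are not taken from our code.

```python
import io

import numpy as np
import pandas as pd

# A made-up table with a two-level index, loosely shaped like a covariance matrix.
index = pd.MultiIndex.from_product(
    [["DATASET_A", "DATASET_B"], [0, 1]], names=["dataset", "point"]
)
df = pd.DataFrame(np.eye(4), index=index, columns=["c0", "c1", "c2", "c3"])

# Write to an in-memory tab-delimited CSV.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t")

# Reading it back with no extra hints flattens the index levels into columns.
buffer.seek(0)
naive = pd.read_csv(buffer, sep="\t")

# The structure only comes back if the reader is told which columns form the index.
buffer.seek(0)
restored = pd.read_csv(buffer, sep="\t", index_col=[0, 1])
```

The `index_col=[0, 1]` argument is the kind of per-table knowledge that has to live outside the file.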

These problems make it very inconvenient to work with theory covariance matrices, for example.

CSV does have the advantage that you can open it with any text editor or spreadsheet software, but that is not such a frequent use case. 

We should instead start depending on binary formats. The one that seems best for our needs is parquet: it is supported directly by pandas, as well as by a number of other tools, and it has answers to all of the problems above. It is probably what we are going to use for fktables as well (see #404).

At a minimum we would need something similar to `reportengine.table` but writing parquet files. How that should interact with the existing table decorator is unclear to me. One option would be to always write parquet, but that would break a number of things (such as the as analysis code) unless some compatibility layer were added. In any case I do think that the covariance matrix files should be smaller and faster (and smarter).

Try out the parquet format for some tables #449

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Try out the parquet format for some tables #449

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions