We currently write all the things marked with @table to tab delimited csv files. This has a number of problems:
- The format does not contain enough information to restore the table by itself. Instead we need to write inconvenient and buggy code in tableloader.py.
- Parsing csv is very slow compared to sensible binary formats.
- The files are huge compared to sensibly compressed binary formats.
- There is no built in functionality to store or load metadata, so we cannot easily implement sensible checks without loading the whole file.
These things make it very inconvenient to work with theory covariance matrices, for example.
CSV does have the advantage that you can open it with any text editor or spreadsheet software, but that is not such a frequent use case.
We should instead start depending on binary formats. The one that seems best for our needs is parquet: it is supported directly by pandas, as well as a number of other things, and has answers to all of the problems above. It is probably what we are going to use for fktables as well (see #404).
At minimum we would need something similar to reportengine.table but writing parquet files. How that should interact with the existing table decorator is unclear to me. One option would be to just write parquet always, but that would break a number of things (such as the as analysis code) unless some compatibility layer was added. In any case I do think that the covariance matrix files should be smaller and faster (and smarter).
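One possible shape for this, as a rough sketch only: a second marker decorator next to the existing one, plus a small save/load pair that dispatches on the format, which could serve as the compatibility layer. All names here (`csv_table`, `parquet_table`, `table_format`, `save_table`, `load_table`) are hypothetical and do not reflect how reportengine actually implements its table decorator.

```python
import os

import pandas as pd


def csv_table(f):
    """Hypothetical marker: the result is written as tab-delimited CSV."""
    f.table_format = "csv"
    return f


def parquet_table(f):
    """Hypothetical marker: the result is written as a parquet file."""
    f.table_format = "parquet"
    return f


def save_table(func, outdir, *args, **kwargs):
    """Call a marked function and write its DataFrame in the marked format."""
    df = func(*args, **kwargs)
    fmt = getattr(func, "table_format", "csv")
    if fmt == "parquet":
        path = os.path.join(outdir, f"{func.__name__}.parquet")
        df.to_parquet(path)  # needs pyarrow or fastparquet installed
    else:
        path = os.path.join(outdir, f"{func.__name__}.csv")
        df.to_csv(path, sep="\t")
    return path


def load_table(path):
    """Compatibility loader: pick the reader from the file extension."""
    if path.endswith(".parquet"):
        return pd.read_parquet(path)
    # Assumes a single index column, which is not true for every table.
    return pd.read_csv(path, sep="\t", index_col=0)
```

With something like this, existing consumers could keep calling a single loader while individual tables (for example the covariance matrices) are switched over to parquet one at a time.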