Backwards compatibility
First, let's define backwards compatibility:
Grid::read must be able to read every PineAPPL grid that was generated with a released version of PineAPPL. Released versions are the ones on the Releases page.
PineAPPL file format
PineAPPL doesn't have a dedicated file format, but instead relies on serde for (de)serialization and on bincode for actually writing bytes to and reading them from files. This has the disadvantage that, for the sake of backwards compatibility, a struct that has the #[derive(Deserialize,Serialize)] attributes must never be changed once it has been released. The only flexibility is adding further variants to enums; that's the reason why there are multiple versions of a struct as V1 and V2 variants.
Requirements change, of course, and design mistakes have been made and will be made again. To mention a few examples:
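The versioned-variant pattern described above can be sketched roughly as follows. The type names ExampleV1, ExampleV2 and Example, their fields, and the choice of default normalization are hypothetical and only serve as illustration; they are not PineAPPL's actual definitions. The serde derives are mentioned in a comment only, so that the sketch compiles with the standard library alone.

```rust
// Hypothetical types for illustration; they are not PineAPPL's actual definitions.
// In a real crate each type would also carry #[derive(Deserialize, Serialize)].

/// First released layout; frozen once grids have been written with it.
#[derive(Clone, Debug, PartialEq)]
pub struct ExampleV1 {
    pub limits: Vec<f64>,
}

/// Second layout, added later without touching `ExampleV1`.
#[derive(Clone, Debug, PartialEq)]
pub struct ExampleV2 {
    pub limits: Vec<f64>,
    pub normalizations: Vec<f64>,
}

/// Wrapper enum: new variants are only ever appended, existing ones stay as they are.
#[derive(Clone, Debug, PartialEq)]
pub enum Example {
    V1(ExampleV1),
    V2(ExampleV2),
}

impl Example {
    /// Converts whichever variant was stored into the newest layout.
    pub fn upgrade(self) -> ExampleV2 {
        match self {
            Example::V1(v1) => {
                // Assumption made for this sketch: the default normalization is the bin width.
                let normalizations = v1.limits.windows(2).map(|w| w[1] - w[0]).collect();
                ExampleV2 {
                    limits: v1.limits,
                    normalizations,
                }
            }
            Example::V2(v2) => v2,
        }
    }
}
```

Code that consumes such a value would call upgrade once after reading and from then on only deal with the newest layout, while old files that contain the V1 variant remain readable.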
- The MoreMembers enum was added to support a BinRemapper. This struct basically supersedes BinLimits, which only supports one-dimensional distributions that are contiguous (the right bin limit is the left limit of the next bin). BinRemapper supports, at least in principle, an arbitrary number of dimensions and also normalizations that are not necessarily tied to bin sizes. Yet another struct BinInfo is needed to abstract the differences between the two, as shown in Grid::bin_info.
- Furthermore, the MoreMembers enum is needed for metadata, which was previously missing. As a result, the methods Grid::key_values and Grid::key_values_mut return an Option, depending on whether the Grid has metadata or not.
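To illustrate why the metadata accessors end up returning an Option, here is a minimal sketch. The names ExampleGrid, Extras, ExtrasV1 and ExtrasV2 and the HashMap-based storage are assumptions made for this example and do not mirror the real Grid or MoreMembers definitions.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; names and fields are assumptions, not PineAPPL's definitions.

/// Oldest variant: written before metadata existed, so it stores nothing.
#[derive(Debug, Default)]
pub struct ExtrasV1 {}

/// Newer variant: carries key-value metadata.
#[derive(Debug, Default)]
pub struct ExtrasV2 {
    pub key_values: HashMap<String, String>,
}

#[derive(Debug)]
pub enum Extras {
    V1(ExtrasV1),
    V2(ExtrasV2),
}

#[derive(Debug)]
pub struct ExampleGrid {
    pub extras: Extras,
}

impl ExampleGrid {
    /// Returns `None` for grids stored with the old variant, which has no metadata.
    pub fn key_values(&self) -> Option<&HashMap<String, String>> {
        match &self.extras {
            Extras::V1(_) => None,
            Extras::V2(v2) => Some(&v2.key_values),
        }
    }

    /// Mutable counterpart; callers have to handle the `None` case as well.
    pub fn key_values_mut(&mut self) -> Option<&mut HashMap<String, String>> {
        match &mut self.extras {
            Extras::V1(_) => None,
            Extras::V2(v2) => Some(&mut v2.key_values),
        }
    }
}
```

Every caller has to deal with the None branch, which is the kind of API friction that arises when old variants have to be kept around.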
Planned changes
To make file handling more flexible and to support different designs without sacrificing backwards compatibility, we need to implement a few changes:
- We need a file header and a file version. The file header precedes the remaining data and can be as simple as the byte string ['P', 'i', 'n', 'e', 'A', 'P', 'P', 'L']. This is needed to let Grid::read detect whether a grid can be deserialized immediately or whether it first has to be decompressed. The file version, on the other hand, lets us determine exactly how the read is performed.
- Depending on the file version, the read method of the correct struct is called, followed by upgrade, which converts the grid from that specific version to the latest one. The upgrade method must also be offered by the CLI so that one can batch-convert grids to the newest version.
- At some point we might have different versions of the Grid struct in the crate, possibly as pineappl::grid::v0::Grid, pineappl::grid::v1::Grid and so forth, and a type definition pineappl::grid::Grid for the most recent version.
- As soon as a new file version is released, all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in a bootstrap kind of way (install the latest version that still supports the file format that needs to be upgraded, upgrade to the most recent version supported, etc.). This could and should probably be automated.
- To make this work, the supported file versions need to be documented, ideally in the upgrade subcommand of the CLI itself as error messages (something along the lines of error: tried to upgrade grid with file version 0. You need pineappl 0.5.0 to upgrade this version).
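The planned read path (check the header, read the file version, dispatch to the matching struct, then upgrade) could look roughly like the sketch below. Only the magic byte string is taken from the text above. The 8-byte little-endian version field, the GridV0 and GridV1 stand-ins, the functions split_header and read_grid, and the error texts are assumptions made for this example. Real deserialization is replaced by copying the raw payload.

```rust
// Magic bytes as proposed in the text. Everything else (the u64 little-endian
// version field, the struct layouts, the error texts) is a hypothetical layout.
const MAGIC: &[u8; 8] = b"PineAPPL";

/// Stand-in for the oldest supported layout.
#[derive(Debug, PartialEq)]
pub struct GridV0 {
    pub payload: Vec<u8>,
}

/// Stand-in for the newest layout, which additionally carries metadata.
#[derive(Debug, PartialEq)]
pub struct GridV1 {
    pub payload: Vec<u8>,
    pub metadata: Vec<(String, String)>,
}

/// Alias that always points at the most recent version.
pub type Grid = GridV1;

impl GridV0 {
    /// Converts the old layout into the newest one, starting with empty metadata.
    pub fn upgrade(self) -> GridV1 {
        GridV1 {
            payload: self.payload,
            metadata: Vec::new(),
        }
    }
}

/// Splits off the header and returns `(file_version, remaining_bytes)`.
pub fn split_header<'a>(bytes: &'a [u8]) -> Result<(u64, &'a [u8]), String> {
    if bytes.len() < 16 || bytes[..8] != MAGIC[..] {
        return Err(
            "no PineAPPL header found; the file may need to be decompressed first".to_string(),
        );
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[8..16]);
    Ok((u64::from_le_bytes(raw), &bytes[16..]))
}

/// Dispatches on the file version and always returns the newest layout.
pub fn read_grid(bytes: &[u8]) -> Result<Grid, String> {
    let (version, rest) = split_header(bytes)?;
    match version {
        // Placeholder "deserialization": the payload is copied as-is.
        0 => Ok(GridV0 { payload: rest.to_vec() }.upgrade()),
        1 => Ok(GridV1 {
            payload: rest.to_vec(),
            metadata: Vec::new(),
        }),
        v => Err(format!("unsupported file version {}", v)),
    }
}
```

In this sketch the error branch for unknown versions is where a message like the one quoted above, naming the PineAPPL release needed for the upgrade, would be produced.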