
PineAPPL file format and backwards compatibility issues #83

@cschwan

Backwards compatibility

First, let's define backwards compatibility:

Grid::read must be able to read every PineAPPL grid that was generated with a released version of PineAPPL. Released versions are the ones listed on the Releases page.

PineAPPL file format

PineAPPL doesn't have a dedicated file format, but instead relies on serde for (de)serialization and on bincode for actually reading and writing the bytes from and to files. This has the disadvantage that, for the sake of backwards compatibility, every struct that has the #[derive(Deserialize,Serialize)] attributes must never be changed, and the only flexibility is adding further enum variants; that's the reason why there are multiple versions of a struct as V1 and V2 variants.
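To illustrate why adding an enum variant is the one safe change, here is a minimal std-only sketch. It loosely mimics a tagged binary encoding in the spirit of bincode: a variant index followed by the payload. The `Members` type, the `encode`/`decode` helpers and the exact byte layout are assumptions made for illustration; they are not PineAPPL's or bincode's actual definitions.

```rust
/// Hypothetical stand-in for a versioned member of a grid.
#[derive(Debug, PartialEq)]
enum Members {
    /// Layout written by the first release.
    V1(u32),
    /// Added in a later release; files written earlier never contain this tag.
    V2(u32, u32),
}

/// Write the variant index first, then the payload, all little-endian.
fn encode(members: &Members) -> Vec<u8> {
    let mut out = Vec::new();
    match members {
        Members::V1(a) => {
            out.extend_from_slice(&0u32.to_le_bytes());
            out.extend_from_slice(&a.to_le_bytes());
        }
        Members::V2(a, b) => {
            out.extend_from_slice(&1u32.to_le_bytes());
            out.extend_from_slice(&a.to_le_bytes());
            out.extend_from_slice(&b.to_le_bytes());
        }
    }
    out
}

/// Read a little-endian u32 starting at byte offset `at`, if enough bytes exist.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let chunk = bytes.get(at..at + 4)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(chunk);
    Some(u32::from_le_bytes(buf))
}

/// Dispatch on the variant index; unknown tags or truncated input yield `None`.
fn decode(bytes: &[u8]) -> Option<Members> {
    match read_u32(bytes, 0)? {
        0 => Some(Members::V1(read_u32(bytes, 4)?)),
        1 => Some(Members::V2(read_u32(bytes, 4)?, read_u32(bytes, 8)?)),
        _ => None,
    }
}
```

Bytes produced for `Members::V1` before `V2` existed still decode unchanged, because the tag `0` and its payload layout were never touched. Changing the fields of `V1` instead would silently break every old file.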

Obviously, requirements change, and mistakes in the design have been made and will be made again. To mention a few examples:

  • The MoreMembers enum was added to support a BinRemapper. This struct basically supersedes BinLimits, which only supports one-dimensional distributions that are contiguous (the right bin limit is the left limit of the next bin). BinRemapper supports, at least in principle, an arbitrary number of dimensions and also normalizations that are not necessarily tied to bin sizes. Yet another struct BinInfo is needed to abstract the differences between the two, as shown in Grid::bin_info.
  • Furthermore, the MoreMembers enum is needed for metadata, which was previously missing. As a result, the methods Grid::key_values and Grid::key_values_mut return an Option, depending on whether the Grid has metadata or not.
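The second point can be sketched as follows. The types below are heavily simplified, hypothetical stand-ins (the real `Grid` and `MoreMembers` carry far more data, and their variants may be laid out differently); the sketch only shows how a member that exists in one variant but not the other forces the accessors to return an `Option`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified version of the versioned members enum.
enum MoreMembers {
    /// Grids written before metadata existed.
    V1,
    /// Grids written after metadata was added.
    V2 { key_values: HashMap<String, String> },
}

/// Hypothetical, simplified grid that only holds the versioned members.
struct Grid {
    more_members: MoreMembers,
}

impl Grid {
    /// Returns the metadata, or `None` for grids that predate it.
    fn key_values(&self) -> Option<&HashMap<String, String>> {
        match &self.more_members {
            MoreMembers::V1 => None,
            MoreMembers::V2 { key_values } => Some(key_values),
        }
    }

    /// Mutable counterpart; callers must handle old grids explicitly.
    fn key_values_mut(&mut self) -> Option<&mut HashMap<String, String>> {
        match &mut self.more_members {
            MoreMembers::V1 => None,
            MoreMembers::V2 { key_values } => Some(key_values),
        }
    }
}
```

Every caller has to deal with the `None` case, which is exactly the kind of friction that an explicit upgrade step to the latest layout would remove.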

Planned changes

To make file handling more flexible and to support different designs without sacrificing backwards compatibility, we need to implement a few changes:

  1. We need a file header and a file version. The file header precedes the remaining data and can be as simple as the byte string ['P', 'i', 'n', 'e', 'A', 'P', 'P', 'L']. This is needed to let Grid::read detect whether a grid can be deserialized immediately or whether it first has to be decompressed. The file version, on the other hand, lets us determine exactly how the read is performed.
  2. Depending on the file version, the read method of the matching struct is called, followed by upgrade, which converts the grid from that specific version to the latest one. The upgrade method must also be offered by the CLI so that one can batch-convert grids into the newest version.
  3. At some point we might have different versions of the Grid struct in the crate, possibly as pineappl::grid::v0::Grid, pineappl::grid::v1::Grid and so forth, and a type alias pineappl::grid::Grid for the most recent version.
  4. As soon as a new file version is released, all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in a bootstrap kind of way (install the latest version that still supports the file format that needs to be upgraded, upgrade to the most recent version it supports, and repeat). This could and probably should be automated.
  5. To make this work, the supported file versions need to be documented, ideally in the upgrade subcommand of the CLI itself as error messages (something along the lines of error: tried to upgrade grid with file version 0. You need pineappl 0.5.0 to upgrade this version).
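The plan above can be sketched in a few lines of std-only Rust. Everything here is an assumption for illustration: the 8-byte little-endian version field after the magic bytes, the `parse_header` and `check_version` helpers, the constants, and the two toy `Grid` layouts. The release number 0.5.0 is simply taken from the example error message in point 5, not from a real compatibility table.

```rust
/// Magic bytes proposed in point 1.
const MAGIC: [u8; 8] = *b"PineAPPL";

/// File versions this hypothetical build can still read.
const OLDEST_SUPPORTED: u64 = 1;
const LATEST: u64 = 2;

#[derive(Debug, PartialEq)]
enum Header {
    /// Magic bytes found; the payload uses this file version.
    Versioned(u64),
    /// No (complete) header: a legacy grid, or one that must be decompressed first.
    Missing,
}

/// Inspect the first 16 bytes: 8 magic bytes plus an assumed u64 version.
fn parse_header(bytes: &[u8]) -> Header {
    if bytes.len() >= 16 && bytes.starts_with(&MAGIC) {
        let mut version = [0u8; 8];
        version.copy_from_slice(&bytes[8..16]);
        Header::Versioned(u64::from_le_bytes(version))
    } else {
        Header::Missing
    }
}

/// Reject versions this build cannot upgrade, with a message like in point 5.
fn check_version(version: u64) -> Result<(), String> {
    if version < OLDEST_SUPPORTED {
        // "0.5.0" is the placeholder release from the issue's example message.
        Err(format!(
            "error: tried to upgrade grid with file version {}. \
             You need pineappl 0.5.0 to upgrade this version",
            version
        ))
    } else if version > LATEST {
        Err(format!("error: file version {} is newer than this build supports", version))
    } else {
        Ok(())
    }
}

/// Toy layout of an older file version (point 3).
mod v1 {
    #[derive(Debug, PartialEq)]
    pub struct Grid {
        pub bins: Vec<f64>,
    }
}

/// Toy layout of the latest file version, with an upgrade path from v1 (point 2).
mod v2 {
    #[derive(Debug, PartialEq)]
    pub struct Grid {
        pub bins: Vec<f64>,
        pub metadata: Vec<(String, String)>,
    }

    impl From<super::v1::Grid> for Grid {
        fn from(old: super::v1::Grid) -> Self {
            // Old grids have no metadata, so the upgrade starts with an empty list.
            Grid { bins: old.bins, metadata: Vec::new() }
        }
    }
}

/// The unversioned name always refers to the most recent layout.
type Grid = v2::Grid;
```

Expressing each upgrade step as a `From` conversion between adjacent versions keeps the chain composable: upgrading from a very old file is then a sequence of single-step conversions, which is also what the bootstrap procedure in point 4 does across CLI releases.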
