Conversation
Integers in R are always 32 bits wide, but the integer sequence altrep was being unpacked as an array of 64-bit integers. This was detected when comparing tests for R format version 2 (which has no altreps) and version 3 (which has them).
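As a generic illustration of why the element width matters, here is a small sketch using only Python's standard `struct` module. It is not the library's code, and the actual bug and fix may look different. It only shows that the same bytes decode to different values depending on whether they are read as 32-bit or 64-bit integers.

```python
import struct

# Four small integers written as big-endian 32-bit values (4 bytes each).
# The values are arbitrary and chosen only for illustration.
values = (1, 2, 3, 4)
payload = struct.pack(">4i", *values)

# Reading with the correct width recovers the original values.
as_int32 = struct.unpack(">4i", payload)

# Reading the same 16 bytes as 64-bit integers fuses neighbouring pairs.
as_int64 = struct.unpack(">2q", payload)

print(as_int32)  # (1, 2, 3, 4)
print(as_int64)  # (4294967298, 12884901892)
```

A comparison between a version 2 file and a version 3 file holding the same vector would flag this kind of mismatch, which fits how the problem was found.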
We now use pytest plus some extra magic to make our read tests more reproducible. The R code that generates each dataset is now defined in the docstrings, and the datasets can be (re-)generated by passing the argument `--generate-datasets` to pytest.

The tests are automatically executed with all possible combinations of the R save format, including:

- Versions 2 or 3
- RDS or RDA
- XDR, ASCII and binary encodings

This should improve our coverage and ensure that we do not miss a feature for some combination of these parameters.

NOTE: Apparently binary and RDA cannot be combined, so that combination is skipped.
The generated datasets have been added, so that an R installation is not required to check that everything works. The old versions of the datasets have been removed.

NOTE: Binary datasets are by their nature machine-dependent. We upload the datasets for little-endian machines, as these are the most common (and the ones used by GitHub pipelines). Users in a big-endian environment need to run the tests with the `--generate-datasets` option if they want to check the datasets generated by their machine. No attempt will be made to also generate big-endian datasets, as I do not have access to a big-endian machine. I only kept the simple test that checks that big-endian works.
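To illustrate why native binary files depend on the machine, here is a minimal sketch with Python's standard `struct` module. It is independent of the library and only shows how the same integer is laid out under each byte order.

```python
import struct
import sys

value = 1

little = struct.pack("<i", value)  # least significant byte first
big = struct.pack(">i", value)     # most significant byte first
native = struct.pack("=i", value)  # byte order of the current machine

print(sys.byteorder)
print(little.hex(), big.hex())  # 01000000 00000001

# Decoding with the wrong byte order silently yields a different number.
misread = struct.unpack(">i", little)[0]
print(misread)  # 16777216
```

XDR always uses big-endian order, so XDR files do not have this problem.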
@traversc I improved the tests so that they now run with all possible format combinations. However, I do not think it is possible to save an object from R in the RDA format with native binary encoding. I noticed that you actually provided a test to check that combination. Is there a way to generate such a dataset that I do not know about?
Now that we have write tests too, the old name feels inappropriate.
There is no normal way, but it is officially part of the R serialization spec: https://github.com/wch/r-source/blob/c1a5096885cb8c55a1df78d660e0828f83bc68f9/doc/NEWS.3#L1546 Someone could write a custom function that writes it, and R would load it properly, so I feel that it should still be supported.
Ok, then that part would only be tested with your handcrafted example, which I think is enough. If R adds support for writing binary RDA files in the future, we can stop skipping that combination.
Describe the proposed changes
We now use pytest plus some extra magic to make our read tests more reproducible.
Now, the code to generate the datasets (in R) is defined in the docstrings, and the datasets can be (re-)generated by passing the argument `--generate-datasets` to pytest.
The tests will automatically be executed with all possible combinations of the R save format, including:
- Versions 2 or 3
- RDS or RDA
- XDR, ASCII and binary encodings
This should improve our coverage and ensure that we do not miss some features for some combination of these parameters.
NOTE: Apparently binary and RDA cannot be combined, so that combination is skipped.
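The combination matrix described above could be built roughly as follows. This is a minimal sketch, not the code from this PR. The names `VERSIONS`, `FILE_FORMATS`, `ENCODINGS`, `is_supported` and `COMBINATIONS` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical parameter names; the actual test suite may use different ones.
VERSIONS = (2, 3)
FILE_FORMATS = ("rds", "rda")
ENCODINGS = ("xdr", "ascii", "binary")


def is_supported(version, file_format, encoding):
    """Return False for the combination that R cannot write (binary RDA)."""
    return not (file_format == "rda" and encoding == "binary")


# Every (version, file format, encoding) triple except the skipped one.
COMBINATIONS = [
    combo
    for combo in product(VERSIONS, FILE_FORMATS, ENCODINGS)
    if is_supported(*combo)
]

print(len(COMBINATIONS))  # 10
```

A list like this can be passed as the argument values of `pytest.mark.parametrize`, so that each read test runs once per supported combination.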
Checklist before requesting a review