Improve testing of read datasets by vnmabus · Pull Request #61 · vnmabus/rdata

vnmabus · 2026-04-18T11:03:23Z

Describe the proposed changes

We now use pytest plus some extra magic to make our read tests more reproducible.
Now, the code to generate the datasets (in R) is defined in the docstrings, and the datasets can be (re-)generated passing the argument --generate-datasets to pytest.
The tests will automatically be executed with all possible combinations of the R save format, including:

Versions 2 or 3
RDS or RDA
XDR, ASCII and binary encodings

This should improve our coverage and ensure that we do not miss some features for some combination of these parameters.

NOTE: Apparently binary and RDA cannot be combined, so that combination is skipped.

Checklist before requesting a review

I have performed a self-review of my code
The code conforms to the style used in this package (checked with Ruff)
The code is fully documented and typed (type-checked with Mypy)
I have added thorough tests for the new/changed functionality

Integers in R are always 32-bits, but the integer sequence altrep was being unpacked as an array of 64-bit integers. This has been detected when comparing tests for R format versions 2 (that has no altreps) and version 3 (with altreps).

We now use pytest plus some extra magic to make our read tests more reproducible. Now, the code to generate the datasets (in R) is defined in the docstrings, and the datasets can be (re-)generated passing the argument `--generate-datasets` to pytest. The tests will automatically be executed with all possible combinations of the R save format, including: - Versions 2 or 3 - RDS or RDA - XDR, ASCII and binary encodings This should improve our coverage and ensure that we do not miss some features for some combination of these parameters. NOTE: Apparently binary and RDA cannot be combined, so that combination is skipped.

The generated tests have been added, so that an R installation is not required for checking that everything works. Removed the old versions of the datasets. NOTE: Binary datasets are by their nature machine-dependent. We upload the datasets for little-endian machines, as these are the most common (and the ones used by Github pipelines). Users in a big-endian environment would need to test with the `--generate-datasets` option if they want to check the datasets generated by their machine: no attempt will be made to generate also big-endian datasets as I do not have access to a big-endian machine. I only kept the simple test for checking that big-endian works.

vnmabus · 2026-04-18T11:09:43Z

@traversc I improved the tests so that they are now tried with all possible format combinations. However, I do not think it is possible to save in R an object in the RDA format with native binary encoding. I noticed that you actually provided a test to check that combination. Is there a way to generate such a dataset that I do not know about?

Now that we have write tests too, the old name feels inappropriate.

traversc · 2026-04-19T03:40:38Z

No normal way but it is officially part of the R serialization spec: https://github.com/wch/r-source/blob/c1a5096885cb8c55a1df78d660e0828f83bc68f9/doc/NEWS.3#L1546

Someone could write a custom function that writes it and R will properly load, so I feel that it should still be supported.

vnmabus · 2026-04-20T07:29:50Z

Ok, then that part would only be tested with your handcrafted example, which I think is enough. If R adds support for writing binary RDA files in the future we can then stop skipping that combination.

vnmabus added 3 commits April 4, 2026 11:42

Change integer size for sequence altrep

d77186f

Integers in R are always 32-bits, but the integer sequence altrep was being unpacked as an array of 64-bit integers. This has been detected when comparing tests for R format versions 2 (that has no altreps) and version 3 (with altreps).

Rename test_rdata to test_read

73da518

Now that we have write tests too, the old name feels inappropriate.

vnmabus merged commit a88194b into develop Apr 20, 2026
22 checks passed

vnmabus deleted the feature/tests_multiple branch April 20, 2026 07:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve testing of read datasets#61

Improve testing of read datasets#61
vnmabus merged 4 commits intodevelopfrom
feature/tests_multiple

vnmabus commented Apr 18, 2026 •

edited

Loading

Uh oh!

vnmabus commented Apr 18, 2026 •

edited

Loading

Uh oh!

traversc commented Apr 19, 2026

Uh oh!

vnmabus commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vnmabus commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe the proposed changes

Checklist before requesting a review

Uh oh!

vnmabus commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

traversc commented Apr 19, 2026

Uh oh!

vnmabus commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vnmabus commented Apr 18, 2026 •

edited

Loading

vnmabus commented Apr 18, 2026 •

edited

Loading