[C++][Python] Sporadic asof_join failures in PyArrow #41149

@pitrou

Describe the bug, including details regarding any error messages, version, and platform.

We see sporadic CI failures in test_dataset_join_asof_empty_by. Example:
https://github.com/ursacomputing/crossbow/actions/runs/8597009571/job/23554761859#step:10:787

=================================== FAILURES ===================================
_______________________ test_dataset_join_asof_empty_by ________________________

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join_asof_empty_b0')

    @pytest.mark.dataset
    def test_dataset_join_asof_empty_by(tempdir):
        t1 = pa.table({
            "on": [1, 2, 3],
        })
        ds.write_dataset(t1, tempdir / "t1", format="ipc")
        ds1 = ds.dataset(tempdir / "t1", format="ipc")
    
        t2 = pa.table({
            "colVals": ["Z", "B", "A"],
            "on": [2, 3, 4],
        })
        ds.write_dataset(t2, tempdir / "t2", format="ipc")
        ds2 = ds.dataset(tempdir / "t2", format="ipc")
    
        result = ds1.join_asof(
            ds2, on="on", by=[], tolerance=1
        )
>       assert result.to_table() == pa.table({
            "on": [1, 2, 3],
            "colVals": ["Z", "Z", "B"],
        })
E       assert pyarrow.Table...null,"Z","B"]] == pyarrow.Table...["Z","Z","B"]]
E         
E         Use -v to get more diff

usr/local/lib/python3.9/site-packages/pyarrow/tests/test_dataset.py:5154: AssertionError

The useful information is unfortunately truncated by stupid pytest, but we can still see that there is a null value in the colVals result: the visible tail of the actual table is null,"Z","B" where the test expects "Z","Z","B". That null is probably unexpected.
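For reference, here is a minimal pure-Python sketch of what the test expects. It does not use PyArrow and is not how Arrow implements the join. It assumes that a positive `tolerance` means a "future" as-of join, where a right row matches a left row when `0 <= right.on - left.on <= tolerance` and the closest such row wins. The helper name `asof_join_future` is hypothetical.

```python
def asof_join_future(left_on, right_rows, tolerance):
    """Reference "future" as-of join with an empty `by` key (assumed semantics).

    left_on:    list of left "on" keys
    right_rows: list of (on, value) pairs from the right side
    Returns the matched right value for each left key, or None if no
    right row falls within the tolerance window.
    """
    result = []
    for key in left_on:
        # Right rows at or after the left key, no further away than `tolerance`.
        candidates = [
            (r_on, value)
            for r_on, value in right_rows
            if 0 <= r_on - key <= tolerance
        ]
        # Pick the closest candidate, i.e. the smallest right "on" value.
        match = min(candidates, key=lambda c: c[0], default=None)
        result.append(match[1] if match is not None else None)
    return result


# Same data as test_dataset_join_asof_empty_by.
left_on = [1, 2, 3]
right_rows = [(2, "Z"), (3, "B"), (4, "A")]
col_vals = asof_join_future(left_on, right_rows, tolerance=1)
print(col_vals)
```

Under these assumed semantics every left row has a match within the tolerance, so a null anywhere in colVals means a match was missed.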

Component(s)

C++, Python
