Skip to content

Conversation

@phofl
Copy link
Collaborator

@phofl phofl commented Aug 31, 2023

Our await logic doesn't play nicely with the intermediate computes, hence the different test logic here. dask/dask suffers the same issue



def test_sort_values():
with LocalCluster() as cluster:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to leverage gen_cluster here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's async

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more? Also, do we really want to allow a test to use all the available cores on our machine like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You maybe want these fixtures from distributed.utils_test

@pytest.fixture
def cluster_fixture(loop):
    with cluster() as (scheduler, workers):
        yield (scheduler, workers)


@pytest.fixture
def s(cluster_fixture):
    scheduler, workers = cluster_fixture
    return scheduler


@pytest.fixture
def a(cluster_fixture):
    scheduler, workers = cluster_fixture
    return workers[0]


@pytest.fixture
def b(cluster_fixture):
    scheduler, workers = cluster_fixture
    return workers[1]


@pytest.fixture
def client(loop, cluster_fixture):
    scheduler, workers = cluster_fixture
    with Client(scheduler["address"], loop=loop) as client:
        yield client

They handle cleanup and various sanity checks.

Alternatively, if you want things to be very fast, you might also want with LocalCluster(processes=False)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the gen_cluster decorator creates a async cluster and async tests will fail for sort_values/set_index because of intermediate computes. That's not a dask-expr problem, dask/dask has the same issue.

Sure, we can restrict the number of workers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the fast version, this is only supposed to test that the function serialisation works as expected

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can restrict the number of workers

Sorry for mentioning that - I care less about that detail (for now).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See dask/distributed#8167 This one has the same issue in dask/dask

@phofl phofl merged commit 37f1eae into dask:main Sep 6, 2023
@phofl phofl deleted the sort_values_keyword branch September 6, 2023 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants