-
-
Notifications
You must be signed in to change notification settings - Fork 27
Fix bug in sort_values in distributed setting #288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
|
|
||
| def test_sort_values(): | ||
| with LocalCluster() as cluster: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way to leverage gen_cluster here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's async
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain more? Also, do we really want to allow a test to use all the available cores on our machine like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You maybe want these fixtures from distributed.utils_test
@pytest.fixture
def cluster_fixture(loop):
with cluster() as (scheduler, workers):
yield (scheduler, workers)
@pytest.fixture
def s(cluster_fixture):
scheduler, workers = cluster_fixture
return scheduler
@pytest.fixture
def a(cluster_fixture):
scheduler, workers = cluster_fixture
return workers[0]
@pytest.fixture
def b(cluster_fixture):
scheduler, workers = cluster_fixture
return workers[1]
@pytest.fixture
def client(loop, cluster_fixture):
scheduler, workers = cluster_fixture
with Client(scheduler["address"], loop=loop) as client:
yield clientThey handle cleanup and various sanity checks.
Alternatively, if you want things to be very fast, you might also want with LocalCluster(processes=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the gen_cluster decorator creates a async cluster and async tests will fail for sort_values/set_index because of intermediate computes. That's not a dask-expr problem, dask/dask has the same issue.
Sure, we can restrict the number of workers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the fast version, this is only supposed to test that the function serialisation works as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can restrict the number of workers
Sorry for mentioning that - I care less about that detail (for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See dask/distributed#8167 This one has the same issue in dask/dask
Our await logic doesn't play nicely with the intermediate computes, hence the different test logic here. dask/dask suffers the same issue