@fjetter fjetter commented Jan 25, 2023

Closes #7498

@fjetter fjetter left a comment

Note: There are a handful of tests that still use the old Scheduler.update_graph, but I will adapt them accordingly.

Comment on lines -464 to -469
    def __setstate__(self, state):
        key, address = state
        try:
            c = Client.current(allow_global=False)
        except ValueError:
            c = get_client(address)

This is the actual change I want to make.

Comment on lines -471 to -478
        c._send_to_scheduler(
            {
                "op": "update-graph",
                "tasks": {},
                "keys": [stringify(self.key)],
                "client": c.id,
            }
        )

I noticed this message and believe it is entirely redundant, since the initialized future already lets the scheduler know that it exists. This is the only place where this message is submitted, so I started to consolidate the two update_graph methods on the scheduler.

Comment on lines 4288 to 4315
        self,
        client=None,
        tasks=None,
        keys=None,
        dependencies=None,
        restrictions=None,
        priority=None,
        loose_restrictions=None,
        resources=None,
        submitting_task=None,
        retries=None,
        user_priority=0,
        actors=None,
        fifo_timeout=0,
        annotations=None,
        code=None,
        stimulus_id=None,

I noticed a couple of unused arguments and started to clean up the signature, which led me to add type annotations. All of the changes in this method are merely there to make mypy happy and should not alter any behavior.

Comment on lines -5330 to -5363
@gen_cluster(client=True)
async def test_serialize_collections_of_futures(c, s, a, b):
    pd = pytest.importorskip("pandas")
    dd = pytest.importorskip("dask.dataframe")
    from dask.dataframe.utils import assert_eq

    df = pd.DataFrame({"x": [1, 2, 3]})
    ddf = dd.from_pandas(df, npartitions=2).persist()
    future = await c.scatter(ddf)

    ddf2 = await future
    df2 = await c.compute(ddf2)

    assert_eq(df, df2)


def test_serialize_collections_of_futures_sync(c):
    pd = pytest.importorskip("pandas")
    dd = pytest.importorskip("dask.dataframe")
    from dask.dataframe.utils import assert_eq

    df = pd.DataFrame({"x": [1, 2, 3]})
    ddf = dd.from_pandas(df, npartitions=2).persist()
    future = c.scatter(ddf)

    result = future.result()
    assert_eq(result.compute(), df)

    assert future.type == dd.DataFrame
    assert c.submit(lambda x, y: assert_eq(x.compute(), y), future, df).result()

I believe these tests are just wrong

Comment on lines +2207 to +2208
    except NoCurrentClient:
        raise NoCurrentClient(

I know we typically do not work with custom exception types. However, in this case it lets us easily tell the user what's going wrong, which would otherwise be much messier.
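
For illustration only, here is a minimal sketch of that pattern, not the code in this PR. It assumes `NoCurrentClient` subclasses `ValueError` (so existing `except ValueError` handlers keep working). `current_client` and `deserialize_future` are hypothetical stand-ins, and the message wording is invented.

```python
class NoCurrentClient(ValueError):
    """Hypothetical: raised when no client is active in the current context."""


def current_client():
    # Stand-in for a strict client lookup; in this sketch no client is active.
    raise NoCurrentClient("No clients found")


def deserialize_future(key):
    try:
        return current_client(), key
    except NoCurrentClient:
        # Re-raise the same type, but with context the user can act on.
        # The wording below is invented for this sketch.
        raise NoCurrentClient(
            f"Future {key!r} was passed inside a collection as an argument "
            "to a task; no current client is available to restore it."
        ) from None


try:
    deserialize_future("x-123")
except NoCurrentClient as exc:
    message = str(exc)
```

Because the type is dedicated, callers can catch exactly this failure without string-matching on a generic `ValueError` message.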

fjetter commented Jan 25, 2023

FWIW I believe I can split off the "remove implicit client instantiation" from the update_graph refactoring.

github-actions bot commented Jan 25, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       24 files   ±  0        24 suites  ±0   10h 2m 6s ⏱️ - 44m 19s
  3 334 tests   -  1    3 229 ✔️  ±  0     104 💤 ±0   1 ❌ - 1
39 306 runs    - 12   37 430 ✔️  - 18   1 875 💤 +7   1 ❌ - 1

For more details on these failures, see this check.

Results for commit f2e3696. ± Comparison against base commit 1c6fb84.

♻️ This comment has been updated with latest results.

@fjetter fjetter force-pushed the collections_futures_arguments branch from 9e9d4a5 to d2e5127 Compare January 26, 2023 11:11

fjetter commented Jan 26, 2023

Broke out the update_graph refactoring to #7502

Comment on lines +4048 to +4049
    with temp_default_client(ci), pytest.raises(NoCurrentClient):
        future2 = pickle.loads(pickle.dumps(future))

There is a subtle difference between default clients and current clients. The current client is stricter and better in almost all circumstances. temp_default_client is also only used in testing.
I think this change makes everything much more predictable and should not have an effect on actual UX.
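
To make the distinction concrete, here is a toy model. It is not distributed's implementation. It assumes the "current" client is scoped to a context manager and the "default" client is a process-wide fallback. Every name except the `allow_global` flag is hypothetical.

```python
import contextvars
from contextlib import contextmanager

# "Current" client: only set inside an explicit context.
_current = contextvars.ContextVar("current_client", default=None)
# "Default" client: a process-wide fallback.
_default = None


def set_default(client):
    global _default
    _default = client


@contextmanager
def as_current(client):
    token = _current.set(client)
    try:
        yield client
    finally:
        _current.reset(token)


def current(allow_global=True):
    client = _current.get()
    if client is not None:
        return client
    if allow_global and _default is not None:
        return _default
    raise ValueError("no current client")


set_default("global-client")

# Lenient lookup silently falls back to the global default.
lenient = current()

# Strict lookup refuses the fallback outside an explicit context.
try:
    current(allow_global=False)
    strict_failed = False
except ValueError:
    strict_failed = True

# Inside the context, the strict lookup succeeds.
with as_current("scoped-client"):
    scoped = current(allow_global=False)
```

In this model a default client alone never satisfies the strict lookup, which is why a test that only sets a temporary default would still be expected to raise.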

Comment on lines +5169 to +5171
    with pytest.raises(NoCurrentClient, match=r"Future.*argument.*persist"):
        future = c.submit(f, x)
        result = await future

If users actually provide a collection as an argument, they will receive a helpful exception message. Unfortunately, we can only raise once the task is deserialized on the worker. That's pretty late, but it's the best I can do right now and still much better than the spurious failures described in #7498.
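
A rough sketch of why the error can only surface late, using plain `pickle` and no cluster. `ToyFuture`, `ACTIVE_CLIENT`, and this `NoCurrentClient` definition are hypothetical stand-ins, not distributed's classes. Serializing on the submitting side succeeds. Only the `__setstate__` call on the receiving side can notice that no client is available.

```python
import pickle


class NoCurrentClient(ValueError):
    """Hypothetical exception type for this sketch."""


# Stand-in for "the current client"; None means no client is active.
ACTIVE_CLIENT = None


class ToyFuture:
    def __init__(self, key):
        self.key = key

    def __getstate__(self):
        # A non-empty tuple, so pickle always calls __setstate__ on load.
        return (self.key,)

    def __setstate__(self, state):
        (key,) = state
        if ACTIVE_CLIENT is None:
            # Raised at load time, i.e. wherever the bytes are deserialized.
            raise NoCurrentClient(f"cannot restore future {key!r} without a client")
        self.key = key


# Serializing works fine: nothing is checked at this point.
payload = pickle.dumps(ToyFuture("x"))

# Deserializing without a client is where the failure shows up.
try:
    pickle.loads(payload)
    raised = False
except NoCurrentClient:
    raised = True

# With a client available, the same payload loads normally.
ACTIVE_CLIENT = "client"
restored = pickle.loads(payload)
```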

async def test_retire_state_change(c, s, a, b):
    np = pytest.importorskip("numpy")
    y = c.map(lambda x: x**2, range(10))
    await c.scatter(y)

No idea why this scatter is here. Am I missing something? Should this be a "replicate"? Why would somebody scatter a future?

@jrbourbeau any ideas?

@fjetter fjetter marked this pull request as ready for review January 26, 2023 11:24
@fjetter fjetter self-assigned this Jan 27, 2023

fjetter commented Mar 10, 2023

See #7580 instead.

@fjetter fjetter closed this Mar 10, 2023

Development

Successfully merging this pull request may close these issues.

Race conditions in implicit creation of worker clients when serializing futures resulting in distributed.CancelledErrors