Forbid collections of futures to be passed as arguments #7500

fjetter · 2023-01-25T18:44:19Z

fjetter

Note: A handful of tests still use the old Scheduler.update_graph; I will adapt them accordingly.

fjetter · 2023-01-25T18:46:02Z

distributed/client.py

-    def __setstate__(self, state):
-        key, address = state
-        try:
-            c = Client.current(allow_global=False)
-        except ValueError:
-            c = get_client(address)


This is the actual change I want to do

fjetter · 2023-01-25T18:47:14Z

distributed/client.py

-        c._send_to_scheduler(
-            {
-                "op": "update-graph",
-                "tasks": {},
-                "keys": [stringify(self.key)],
-                "client": c.id,
-            }
-        )


I noticed this message. I believe it is entirely redundant, since the initialized future already lets the scheduler know that it exists. This is the only place where this message is sent, so I started to consolidate the two update_graph methods on the scheduler.

fjetter · 2023-01-25T18:48:23Z

distributed/scheduler.py

        self,
-        client=None,
-        tasks=None,
-        keys=None,
-        dependencies=None,
-        restrictions=None,
-        priority=None,
-        loose_restrictions=None,
-        resources=None,
-        submitting_task=None,
-        retries=None,
-        user_priority=0,
-        actors=None,
-        fifo_timeout=0,
-        annotations=None,
-        code=None,
-        stimulus_id=None,


I noticed a couple of unused arguments and started to clean up the signature, which led me to add type annotations. All of the changes in this method are merely there to make mypy happy and should not alter any behavior.

fjetter · 2023-01-25T18:48:40Z

distributed/tests/test_client.py

-@gen_cluster(client=True)
-async def test_serialize_collections_of_futures(c, s, a, b):
-    pd = pytest.importorskip("pandas")
-    dd = pytest.importorskip("dask.dataframe")
-    from dask.dataframe.utils import assert_eq
-
-    df = pd.DataFrame({"x": [1, 2, 3]})
-    ddf = dd.from_pandas(df, npartitions=2).persist()
-    future = await c.scatter(ddf)
-
-    ddf2 = await future
-    df2 = await c.compute(ddf2)
-
-    assert_eq(df, df2)
-
-
-def test_serialize_collections_of_futures_sync(c):
-    pd = pytest.importorskip("pandas")
-    dd = pytest.importorskip("dask.dataframe")
-    from dask.dataframe.utils import assert_eq
-
-    df = pd.DataFrame({"x": [1, 2, 3]})
-    ddf = dd.from_pandas(df, npartitions=2).persist()
-    future = c.scatter(ddf)
-
-    result = future.result()
-    assert_eq(result.compute(), df)
-
-    assert future.type == dd.DataFrame
-    assert c.submit(lambda x, y: assert_eq(x.compute(), y), future, df).result()


I believe these tests are just wrong

fjetter · 2023-01-25T18:49:30Z

distributed/worker.py

+        except NoCurrentClient:
+            raise NoCurrentClient(


I know we typically do not work with custom exception types. However, in this case it lets us easily tell the user what is going wrong, which would otherwise be much messier.
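A minimal sketch of the pattern, written as a toy that does not import distributed. Subclassing `ValueError`, the `registry` dict, the helper names, and the message text are all assumptions made for illustration; only the `NoCurrentClient` name and the catch-and-re-raise shape come from the hunk above.

```python
class NoCurrentClient(ValueError):
    """Toy exception type. Subclassing ValueError is an assumption."""


def current_client(registry: dict) -> str:
    """Return the current client from a toy registry, or raise."""
    try:
        return registry["client"]
    except KeyError:
        raise NoCurrentClient("no current client found") from None


def deserialize_future(key: str, registry: dict) -> tuple:
    """Rebuild a (key, client) pair, adding context if no client is current."""
    try:
        client = current_client(registry)
    except NoCurrentClient:
        # Re-raise the same type with a message aimed at the user.
        # The wording here is hypothetical, not the message used in the PR.
        raise NoCurrentClient(
            f"Future {key!r} was deserialized without a current client"
        ) from None
    return key, client
```

Because the dedicated type is caught by name, the handler does not need to inspect message strings to tell this failure apart from unrelated errors.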

fjetter · 2023-01-25T18:51:11Z

FWIW I believe I can split off the "remove implicit client instantiation" from the update_graph refactoring.

github-actions · 2023-01-25T19:33:37Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

24 files ±0, 24 suites ±0, 10h 2m 6s ⏱️ (-44m 19s)
3 334 tests (-1): 3 229 ✔️ (±0), 104 💤 (±0), 1 ❌ (-1)
39 306 runs (-12): 37 430 ✔️ (-18), 1 875 💤 (+7), 1 ❌ (-1)

For more details on these failures, see this check.

Results for commit f2e3696. ± Comparison against base commit 1c6fb84.

♻️ This comment has been updated with latest results.

fjetter · 2023-01-26T11:21:09Z

Broke out the update_graph refactoring to #7502

fjetter · 2023-01-26T11:22:28Z

distributed/tests/test_client.py

+            with temp_default_client(ci), pytest.raises(NoCurrentClient):
+                future2 = pickle.loads(pickle.dumps(future))


There is a subtle difference between default clients and current clients. The current client is stricter and better in almost all circumstances. temp_default_client is also only used in testing.
I think this change makes everything much more predictable and should not have an effect on actual UX.
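The distinction can be sketched with a toy model that does not import distributed. Everything here is hypothetical (`as_current`, `temp_default`, `current`, `ToyFuture`); it only assumes that a "current" client is scoped to a context while a "default" client is a process-wide fallback, which is how the comment above describes them.

```python
import contextvars
import pickle
from contextlib import contextmanager

# Context-scoped "current" client and process-wide "default" client (toy model).
_current = contextvars.ContextVar("current_client", default=None)
_default = {"client": None}


class NoCurrentClient(ValueError):
    """Toy exception; the base class is an assumption."""


@contextmanager
def as_current(client):
    token = _current.set(client)
    try:
        yield client
    finally:
        _current.reset(token)


@contextmanager
def temp_default(client):
    previous = _default["client"]
    _default["client"] = client
    try:
        yield client
    finally:
        _default["client"] = previous


def current(allow_global):
    """Return the current client, optionally falling back to the default."""
    client = _current.get()
    if client is None and allow_global:
        client = _default["client"]
    if client is None:
        raise NoCurrentClient("no current client")
    return client


class ToyFuture:
    def __init__(self, key, client):
        self.key = key
        self.client = client

    def __getstate__(self):
        # Only the key travels; the client is looked up again on load.
        return (self.key,)

    def __setstate__(self, state):
        (self.key,) = state
        # Strict lookup: a default client alone is not enough.
        self.client = current(allow_global=False)
```

With this toy, `pickle.loads(pickle.dumps(future))` succeeds inside `as_current(...)` and raises `NoCurrentClient` inside `temp_default(...)`, which mirrors the shape of the assertion in the hunk above.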

fjetter · 2023-01-26T11:23:27Z

distributed/tests/test_client.py

+    with pytest.raises(NoCurrentClient, match=r"Future.*argument.*persist"):
+        future = c.submit(f, x)
+        result = await future


If users actually provide a collection as an argument, they will receive a helpful exception message. Unfortunately, we can only raise once the task is deserialized on the worker. That's pretty late, but it is the best I can do right now and still much better than the spurious failures described in #7498.

fjetter · 2023-01-26T11:24:10Z

distributed/tests/test_scheduler.py

 async def test_retire_state_change(c, s, a, b):
     np = pytest.importorskip("numpy")
     y = c.map(lambda x: x**2, range(10))
-    await c.scatter(y)


No idea why this scatter is here. Am I missing something? Should this be a "replicate", or why would somebody scatter a future?

@jrbourbeau any ideas?

fjetter · 2023-03-10T14:49:10Z

See #7580 instead.

fjetter mentioned this pull request Jan 25, 2023

Race conditions in implicit creation of worker clients when serializing futures resulting in distributed.CancelledErrors #7498

Closed

Forbid collections of futures to be passed as arguments

d2e5127

fjetter force-pushed the collections_futures_arguments branch from 9e9d4a5 to d2e5127 Compare January 26, 2023 11:11

fjetter marked this pull request as ready for review January 26, 2023 11:24

use worker client for nested calls

696da04

fjetter self-assigned this Jan 27, 2023

Merge branch 'main' into collections_futures_arguments

f2e3696

fjetter closed this Mar 10, 2023
