Skip to content

Conversation

@pentschev
Copy link
Member

Closes #7130

As was previously discussed in #6996, some of the exception catching could be improved for both read and write tasks which are now addressed by the changes contained here.

  • Tests added / passed
  • Passes pre-commit run --all-files

@pentschev
Copy link
Member Author

cc @wence-

@github-actions
Copy link
Contributor

github-actions bot commented Oct 10, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±0         15 suites  ±0   6h 51m 25s ⏱️ + 21m 25s
  3 146 tests ±0    3 061 ✔️ +1    85 💤 ±0  0  - 1 
23 272 runs  ±0  22 357 ✔️ +3  915 💤  - 2  0  - 1 

Results for commit 61e5af0. ± Comparison against base commit 37fd8b5.

♻️ This comment has been updated with latest results.

@wence-
Copy link
Contributor

wence- commented Oct 10, 2022

Lints fails are addressed by #7131

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor typo, but otherwise looks good, thanks!

@wence-
Copy link
Contributor

wence- commented Oct 11, 2022

I think if you merge trunk (or rebase) the linting errors will disappear

@quasiben
Copy link
Member

The test_shutdown_localcluster failure is a known flaky tests:
#6625 (comment)

The test_sni failure is also unrelated but I've not seen it before. I'm going to restart CI but it would be good to keep an eye on it:

___________________________________ test_sni ___________________________________

loop = <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7fa96cb75730>

    def test_sni(loop):
        port = open_port()
        with popen(["dask-scheduler", "--no-dashboard", f"--port={port}"] + tls_args) as s:
            with popen(
                [
                    "dask-worker",
                    "--no-dashboard",
                    "--scheduler-sni",
                    "localhost",
                    f"tls://127.0.0.1:{port}",
                ]
                + tls_args
            ) as w:
                with Client(
                    f"tls://127.0.0.1:{port}", loop=loop, security=tls_security()
                ) as c:
>                   wait_for_cores(c)

distributed/cli/tests/test_tls_cli.py:60: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

c = <Client: 'tls://10.213.3.122:50230' processes=0 threads=0, memory=0 B>
nthreads = 1

    def wait_for_cores(c, nthreads=1):
        start = time()
        while len(c.nthreads()) < 1:
            sleep(0.1)
>           assert time() < start + 10
E           assert 1665511027.440665 < (1665511017.4086459 + 10)
E            +  where 1665511027.440665 = time()

distributed/cli/tests/test_tls_cli.py:30: AssertionError

@wence-
Copy link
Contributor

wence- commented Oct 12, 2022

>           assert time() < start + 10
E           assert 1665511027.440665 < (1665511017.4086459 + 10)
E            +  where 1665511027.440665 = time()

This seems like the kind of test that is very likely to fail on heavily loaded virtualised systems.

@quasiben
Copy link
Member

All tests pass now -- merging in.

@wence- if you have thoughts on improving either of the flaky tests they would be extremely appreciated

@quasiben quasiben merged commit 470f42c into dask:main Oct 12, 2022
@pentschev pentschev deleted the ucx-base-exception-read-frames branch October 17, 2022 09:57
gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Catch exceptions on both send and recv in UCX comm

3 participants