Skip to content

Preload teardown failure interferes with scheduler idle timeout #6437

@jrbourbeau

Description

@jrbourbeau

Earlier today I ran into a situation where a scheduler preload script had a teardown method failure which caused the scheduler's automatic idle timeout to not work. Here's a quick reproducer in the form of a test:

import asyncio

from distributed.utils_test import gen_cluster
from distributed.core import Status

bad_preload_text = """
def dask_setup(scheduler):
    scheduler.foo = 'setup'

def dask_teardown(scheduler):
    raise ValueError('ASDF')
"""

@gen_cluster(
    client=True,
    config={
        "distributed.scheduler.idle-timeout": "5s",
        "distributed.scheduler.preload": [bad_preload_text],
    },
)
async def test_idle_timeout_preload_error(c, s, a, b):
    assert s.foo == "setup"  # Confirm preload was run on setup

    while s.status != Status.closed:
        print(f"{s.status = }")
        await asyncio.sleep(0.1)

Currently this test hangs due to the ValueError raised in the preload dask_teardown method. If you replace, for example, raise ValueError('ASDF') with a simple return then the test passes

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething is broken

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions