@jrbourbeau
Member

Today if a user attempts to create a Variable or Queue from within a task they will get the following error:

ValueError: No clients found
Start a client and point it to the scheduler address
  from distributed import Client
  client = Client('ip-addr-of-scheduler:8786')

This PR adds fallback logic that attempts to get a Client from the worker running the task, which allows users to do the following:

In [6]: x = Variable("my-var")

In [7]: def bar():
   ...:     from distributed import Variable
   ...:     v = Variable("my-var")
   ...:     return v.get()
   ...:

In [8]: x.set(987)

In [9]: client.submit(bar).result()
987

This behavior is in line with how Locks, Events, and Semaphores work today:

def __init__(self, name=None, client=None):
    try:
        self.client = client or Client.current()
    except ValueError:
        # Initialise new client
        self.client = get_worker().client
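
For readers who want to trace the fallback order without a running cluster, here is a minimal sketch of the same control flow. `FakeClient`, `FakeWorker`, this `get_worker`, and `resolve_client` are hypothetical stand-ins written for illustration; they are not part of `distributed` and only mimic the shape of `Client.current()` and `get_worker().client` as used in the snippet above.

```python
class FakeClient:
    """Stand-in for distributed.Client (illustrative only)."""

    _current = None  # hypothetical slot for the "current" client

    def __init__(self, label):
        self.label = label

    @classmethod
    def current(cls):
        # Raises ValueError when no client exists, like the error quoted in this PR.
        if cls._current is None:
            raise ValueError("No clients found")
        return cls._current


class FakeWorker:
    """Stand-in for a worker that owns its own client."""

    active = None  # hypothetical slot for the worker running the task

    def __init__(self):
        self.client = FakeClient("worker-client")


def get_worker():
    # Stand-in for distributed.get_worker().
    if FakeWorker.active is None:
        raise ValueError("No workers found")
    return FakeWorker.active


def resolve_client(client=None):
    """Explicit client first, then the current client, then the worker's client."""
    try:
        return client or FakeClient.current()
    except ValueError:
        return get_worker().client


# Simulate running inside a task: no current client, but a worker exists.
FakeWorker.active = FakeWorker()
print(resolve_client().label)
```

In this sketch an explicitly passed client always wins, and the worker's client is consulted only after the lookup of a current client raises `ValueError`.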

@mrocklin
Member

mrocklin commented Feb 8, 2021

Two quick comments:

  1. We should have a test for this
  2. @fjetter noticed that we're less scalable around having many clients than we are around having many workers. We don't need to fix this right now, but in the future we may want to either make clients more scalable or make it easier to create variables/queues from workers without a client present.

@jrbourbeau
Member Author

Thanks for reviewing -- test added

@fjetter noticed that we're less scalable around having many clients than we are around having many workers

Good to know. Is there a discussion around this somewhere? Or is it more of a theme @fjetter has noticed when running at scale?

@fjetter
Member

fjetter commented Feb 9, 2021

There is no big discussion on the subject since I changed the implementation of the semaphore to not use clients anymore, see #4195

The scenario back then was that we ended up with 100% CPU utilization on the scheduler side once we reached about 200 workers with an active semaphore. I'm not entirely certain whether it was solely due to the clients themselves or to too many connections, maybe both.

One low-hanging fruit that should already help here is adjusting the heartbeat intervals of the clients. They do not scale dynamically with an increasing number of connected clients, and this might already be it, but I decided to refactor the semaphore instead of triaging the client scalability.
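
To make the heartbeat idea concrete, here is a rough sketch of one way an interval could grow with the number of connected clients so that the total heartbeat rate seen by the scheduler stays bounded. The function name, the 5 second base interval, and the cap of 50 heartbeats per second are all assumptions chosen for illustration; none of them are taken from `distributed`.

```python
def scaled_heartbeat_interval(n_clients, base_interval=5.0, max_heartbeats_per_second=50.0):
    """Return a per-client heartbeat interval in seconds (hypothetical helper).

    With n clients each sending one heartbeat per interval, the scheduler
    receives n / interval heartbeats per second. Keeping that rate at or below
    the cap requires interval >= n / cap; below that point the base interval
    is used unchanged.
    """
    if n_clients <= 0:
        return base_interval
    return max(base_interval, n_clients / max_heartbeats_per_second)


for n in (10, 200, 1000):
    print(n, scaled_heartbeat_interval(n))
```

With these illustrative numbers the interval stays at the base value for small clusters and only starts growing once the client count alone would push the scheduler past the chosen cap.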

@jrbourbeau
Member Author

Thanks for reviewing @fjetter @mrocklin!

Also, xref #4498 for the failing CI builds

@jrbourbeau jrbourbeau merged commit 725f001 into dask:master Feb 11, 2021
@jrbourbeau jrbourbeau deleted the variable-client branch February 11, 2021 01:45