Managing priorities manually #4185
When defining priorities manually, somehow they are added with a negative sign. In the example below, the last task should have the highest priority, but ends up with the lowest stored value (-4):
```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
client = Client(cluster)

def test_func(i):
    return

futures = [client.submit(test_func, i, priority=i) for i in range(5)]
```
jrbourbeau left a comment:
Thanks for the PR @SultanOrazbayev. IIUC, the `-` is intentional: when the scheduler constructs task state transition recommendations, it adds tasks to the recommendations dictionary in reverse order of their priority.

distributed/scheduler.py, lines 2018 to 2020 in 18fff8b:

```python
for ts in sorted(runnables, key=operator.attrgetter("priority"), reverse=True):
    if ts.state == "released" and ts.run_spec:
        recommendations[ts.key] = "waiting"
```

This is so the highest-priority tasks, those with the largest `priority=` value in `client.submit`, are added to the recommendations dict last. This is important because later, when the tasks are actually transitioned (e.g. to the "waiting" state) using `recommendations.popitem()`:
distributed/scheduler.py, lines 4787 to 4790 in 18fff8b:

```python
while recommendations:
    key, finish = recommendations.popitem()
    keys.add(key)
    new = self.transition(key, finish)
```
the higher-priority tasks are processed first, since `dict.popitem()` removes the most recently inserted entry.
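To illustrate the mechanism, here is a standalone sketch (not actual scheduler code): because stored priorities are negated, reverse-sorting inserts the highest user priority last, and `dict.popitem()` then pops it first.

```python
# Standalone sketch of the ordering trick, not actual scheduler code.
# The scheduler stores user priorities negated: priority=4 becomes -4.
stored = {"task-0": 0, "task-1": -1, "task-2": -2, "task-3": -3, "task-4": -4}

recommendations = {}
# Reverse-sorting the stored (negated) values puts -4, i.e. user
# priority 4, at the end, so it is inserted into the dict last.
for key in sorted(stored, key=stored.get, reverse=True):
    recommendations[key] = "waiting"

# dict.popitem() removes entries in LIFO order (Python 3.7+), so the
# highest user priority comes off first.
while recommendations:
    key, finish = recommendations.popitem()
    print(key, "->", finish)  # prints task-4 first, task-0 last
```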
@SultanOrazbayev I should have asked: did you find tasks were being run in the opposite order you expected?
Thank you for the quick feedback. I noticed the reverse ordering in the code, but got confused by the negative priority values in the task details of the dashboard. With regards to task completion... I think the order of completion is not correct, but maybe I am doing it the wrong way. Consider a modification of the above:

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=1, resources={'foo': 1})
client = Client(cluster)

def test_func(i):
    import time
    time.sleep(i)
    return i

futures = [client.submit(test_func, i, priority=i, resources={'foo': 1}) for i in range(5)]
```

This should complete the longest-running task first, but in the dashboard I see that the completion of the first task is instant. Or am I coding it wrong?
Probably the scheduler just got the first task first and so started running it right away. The scheduler then got the subsequent tasks a fraction of a millisecond later, but the first task was already running. If you want to submit many tasks at once, then you might consider using the dask.delayed interface.
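As a sketch of that suggestion (reusing the single-worker setup from above): building the tasks with dask.delayed and handing them to the scheduler in one call means the scheduler sees the whole batch before anything starts running.

```python
# Sketch of the dask.delayed suggestion: build all tasks first, then
# submit the whole graph in a single call so the scheduler sees every
# task before any of them starts running.
import time

import dask
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=1)
client = Client(cluster)

def test_func(i):
    time.sleep(i)
    return i

tasks = [dask.delayed(test_func)(i) for i in range(5)]
# client.compute accepts a priority= keyword, but it applies to the
# whole batch; the point here is simply that all five tasks arrive
# at the scheduler together.
futures = client.compute(tasks)
results = client.gather(futures)
```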
OK, thanks! Just to make sure I understand it correctly: with futures it's not possible to affect the order of execution? Changing the priority sign in the code below gives the same execution order (the order of submission):

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=1, resources={'foo': 1})
client = Client(cluster)

# a small delay is added to check whether the scheduler needs more time
# to accept all tasks before sorting them
def test_func(i):
    import time
    time.sleep(1 + i)
    return i

# executed in the order of submission (priorities are in the same order)
futures = [client.submit(test_func, i, priority=-i, resources={'foo': 1}) for i in range(5)]

# also executed in the order of submission, even though priorities of later tasks are higher
futures = [client.submit(test_func, i, priority=i, resources={'foo': 1}) for i in range(5)]
```
You can affect the order of execution, but Dask will also start working on things right away. Dask won't stop things that have already started running.
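A minimal sketch of that distinction (assuming a single-threaded worker): if the only worker is kept busy by a first "blocker" task, all five prioritized tasks are queued before any can start, and the priorities then determine the order.

```python
# Sketch: priorities do take effect for tasks that are still queued.
# A "blocker" task occupies the only worker thread, so the prioritized
# tasks all reach the scheduler before any of them can start.
import time

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=1, threads_per_worker=1)
client = Client(cluster)

def test_func(i):
    time.sleep(1)
    return i

blocker = client.submit(time.sleep, 2)  # keeps the worker busy
futures = [client.submit(test_func, i, priority=i) for i in range(5)]
# Once the blocker finishes, the queued tasks should run highest
# priority first: i = 4, 3, 2, 1, 0.
print(client.gather(futures))
```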
Hmm, after removing the `resources` constraint:

```python
# this executes in the order of priorities (long-running task first)
futures = [client.submit(test_func, i, priority=i) for i in range(5)]
```
I got confused by the negative sign in the task dashboard, but priorities now work as expected. The only remaining problem is combining resources and priority tags, but that is more a matter of understanding how to use resources, so I opened a discussion here: #4209
If you want to submit a small patch negating the priority again in the info page, that would be welcome:

https://github.com/dask/distributed/blob/cee4e3c99c34c1e515b9e80032d19c24b5f390ed/distributed/http/templates/task.html#L42-L45
SultanOrazbayev closed #4185.