Closed
Labels
bug (Something isn't working)
Description
Describe the bug
There are two bugs:
- If the SchedulerServerEvent::JobSubmitted event is lost, the job is never scheduled again.
- ExecutorData can be updated by several callers at the same time without synchronization.
For the first bug:
In SchedulerServerEventAction::offer_resources, every executor in the returned available_executors may have 0 available_task_slots. In that case no tasks are scheduled for the job, and no SchedulerServerEvent::JobSubmitted event is resent to the channel. As a result, the job gets stuck.
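One possible way to avoid the stuck job is to put the event back when nothing could be scheduled. The following is a minimal sketch of that idea, not the actual scheduler code: the `SchedulerServerEvent` enum is reduced to one variant carrying a job id, and `ExecutorSlots`, the `VecDeque` queue, and this `offer_resources` signature are simplified assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical, reduced stand-in for the scheduler event type.
#[derive(Debug, Clone, PartialEq)]
enum SchedulerServerEvent {
    JobSubmitted(String),
}

/// Hypothetical view of an executor; only the free-slot count matters here.
struct ExecutorSlots {
    available_task_slots: u32,
}

/// Takes every free slot for `job_id` and returns how many were taken.
/// If no slot was free, the JobSubmitted event is pushed back onto the
/// queue so the job is offered resources again later instead of being lost.
fn offer_resources(
    job_id: &str,
    executors: &mut [ExecutorSlots],
    queue: &mut VecDeque<SchedulerServerEvent>,
) -> u32 {
    let mut assigned = 0;
    for executor in executors.iter_mut() {
        assigned += executor.available_task_slots;
        executor.available_task_slots = 0;
    }
    if assigned == 0 {
        // Nothing could be scheduled: re-enqueue rather than drop the job.
        queue.push_back(SchedulerServerEvent::JobSubmitted(job_id.to_string()));
    }
    assigned
}
```

In a real scheduler, resending immediately could spin while all executors are busy, so a delay or a trigger on freed slots would probably be needed as well.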
For the second bug:
The get_executor_data and save_executor_data operations are not atomic as a pair, so concurrent updates to the same ExecutorData may conflict, for example with one update overwriting another.
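A common remedy for this kind of race is to perform the read and the write under a single lock. The sketch below illustrates that with an in-memory map guarded by a `Mutex`. `ExecutorStore` and `reserve_slots` are hypothetical names, `ExecutorData` is reduced to one field, and the real scheduler state may live in a different backend that needs its own locking or transaction mechanism.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, reduced version of the executor record.
#[derive(Debug, Clone, PartialEq)]
struct ExecutorData {
    available_task_slots: u32,
}

/// Hypothetical in-memory store; one mutex guards the whole map.
struct ExecutorStore {
    inner: Mutex<HashMap<String, ExecutorData>>,
}

impl ExecutorStore {
    fn new() -> Self {
        ExecutorStore { inner: Mutex::new(HashMap::new()) }
    }

    fn save_executor_data(&self, id: &str, data: ExecutorData) {
        self.inner.lock().unwrap().insert(id.to_string(), data);
    }

    fn get_executor_data(&self, id: &str) -> Option<ExecutorData> {
        self.inner.lock().unwrap().get(id).cloned()
    }

    /// Reserves up to `wanted` slots and returns how many were granted.
    /// The read and the write happen while the same lock is held, so two
    /// callers cannot both observe and consume the same free slot.
    fn reserve_slots(&self, id: &str, wanted: u32) -> u32 {
        let mut guard = self.inner.lock().unwrap();
        match guard.get_mut(id) {
            Some(data) => {
                let granted = wanted.min(data.available_task_slots);
                data.available_task_slots -= granted;
                granted
            }
            // Unknown executor: nothing to reserve.
            None => 0,
        }
    }
}
```

By contrast, calling `get_executor_data` and then `save_executor_data` as two separate steps releases the lock in between, which is the window in which a second caller can act on stale data.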
To Reproduce
Run a load test with the push-based task scheduling policy, as described in #1983.