-
-
Notifications
You must be signed in to change notification settings - Fork 609
sched: low-level support for POSIX-compatible real-time scheduling #1223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
wkozaczuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what specific changes but two nitpicks are required.
core/sched.cc
Outdated
| } else if (!called_from_yield) { | ||
| auto &t = *runqueue.begin(); | ||
| if (p->_runtime.get_local() < t._runtime.get_local()) { | ||
| auto &t = *runqueue.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpick! Please shift one space to the right :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how this happens, I didn't need to change that line at all ;-) Thanks.
core/sched.cc
Outdated
| return; | ||
| #endif | ||
| } | ||
| } else if (p->_runtime.get_local() < t._runtime.get_local()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we handling the case when p->_realtime._priority == 0 && t._realtime._priority > 0? The below seems to indicate we are not treating this case in any special case, should we? In other words, should _runtime override the priority of threads? What if the priority of t is greater than 0? Should it then preempt p?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I should && t._realtime._priority == 0 in this if (see also thread_runtime_compare()). We only stay in this thread - which we know is not real time, if the next thread is not real time AND it has a higher runtime.
core/sched.cc
Outdated
| #endif | ||
| } | ||
| if (t._realtime._priority == p->_realtime._priority && | ||
| !called_from_yield) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the !called_from_yield is unnecessary (see line 291/292).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, and it's even worse - I think the if of line 291/292 is not the right thing. I think I have a bug. According to https://man7.org/linux/man-pages/man2/sched_yield.2.html, "If the calling thread is the only thread in the highest priority list at that time, it will continue to run after a call to sched_yield().". So the code wrongly runs some other thread even in non-real-time priorities. I'll need to fix this.
core/sched.cc
Outdated
| if (p->_realtime._priority > 0) { | ||
| // Only switch to a higher-priority realtime thread, or to | ||
| // the next one with equal priority if yielding. | ||
| if (t._realtime._priority < p->_realtime._priority) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that if somehow the current thread p has a high enough priority (for example 99) and all other threads in the run queue have a lower priority, this thread p will keep running forever, not even kernel threads will get a chance to do anything. Am I wrong in my analysis? Is this by design?
I am especially concerned about the kernel threads in this context. I am not sure if any of the thoughts raised in this article apply - https://lwn.net/Articles/818388/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is by design that a real-time thread that never yields, is never preempted by any thread with a lower priority or non-real-time priority.
However, the question about kernel threads is a good one, as in Linux/Unix the things that happen in the kernel weren't called "threads" and the priorities didn't apply to them. In particular cpu::load_balance() which can't even move the poor threads to other CPUs.
I think the idea is that real-time threads should be short-running - and if a real-time thread just starts an endless loop that never returns, it will cause problems also on Linux.
Maybe the right thing to set some sort of special priority to cpu::load_balance (and anything else?) to allow it to run also when a real-time thread is running. I don't know. I think we can improve this later.
core/sched.cc
Outdated
| return; | ||
| #endif | ||
| } | ||
| if (t._realtime._priority == p->_realtime._priority && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this condition right? If the priority of t is equal to p, then we keep p running. Is this correct?
This patch adds "real-time priority" to OSv threads, which can be used to implement Posix's SCHED_FIFO and SCHED_RR scheduling policies. Refs cloudius-systems#386. It doesn't yet "Fixes" it because this patch does not yet add the Posix front-end to this implementation (sched_setscheduler() et al.) - this will be added in a separate patch. We add to each thread a _realtime._priority, an unsigned integer. Normal (non-real-time) threads have _realtime._priority of 0, and are time-shared fairly as before, while realtime threads have _realtime._priority > 0, and preempt any thread with a lower realtime priority, including all normal threads. We add a new API, t->set_realtime_priority(priority), to make a thread real-time scheduled. The "priority" is an unsigned int as explained above (0 means normal thread). The resulting scheduling policy is fully compatible with POSIX's SCHED_FIFO. This includes following the intricate rules on what happens to the running thread's place in the queue when there are several real-time threads with the same priority and the running thread yields, waits, or is preempted by a higher-priority thread. This patch also adds an exercise for these cases in tests/misc-scheduler.cc. This patch does not yet implement time slices as needed for supporting POSIX's SCHED_RR policy. This patch already includes a time-slice setting API, t->set_realtime_time_slice(duration), but if used it produces a warning message, and the scheduler currently ignores this setting. Adding support for real-time time slices is left for a future patch: To implement time slices, we will need to keep in _realtime the total time this thread ran since starting this slice (only zeroed when the thread is removed from the run queue), and each time we switch to a realtime thread with timeslice >= 0, we'll set an appropriate timer (for timeslice minus already used time). I tested that although this patch adds code into the scheduler it does not noticably slow down context switches of normal threads. E.g., tests/misc-ctxsw.so measured colocated context switch time of 240ns before this patch, and exactly the same after it. While developing this patch I tested it using added workloads in tests/misc-scheduler.cc, included in this patch, but unfortunately this is not a real regression test - it has no assertions and cannot automatically detect failure (and, like all the "misc-*" tests, it doesn't run automatically in "make check"). Signed-off-by: Nadav Har'El <nyh@scylladb.com>
|
Pushed a new version with the bugs that @wkozaczuk noticed (thanks!) fixed. While fixing these bugs, I decided the scheduler is written in a too complicated way - e.g., it's ugly and confusing that we have two separate places in the context switch code which re-queue the outgoing thread. But unfortunately, after doing this cleanup the resulting scheduler's context switches wre much slower - and I couldn't figure out why even though I spent several hours trying to fix it :-( So the version I pushed now just fixes the bugs that Waldek found, without trying to improve the understandability of the code. |
|
By the way, @wkozaczuk, the Travis CI fails saying that "Error: Cannot perform an interactive login from a non TTY device" when trying to pass docker a password from a TTY. Do you have any idea where/how to fix it? |
|
@nyh I believe I have fixed the issue with the Travis build. |
|
Though, honestly, I am not sure if it is actually building and testing your pull request changes or what is on master. |
|
I think your changes look good and the unit tests pass on |
|
Definitely, no hurry, it's been waiting for half a decade already 😂 |
|
I have just committed and pushed this MR into the master branch. I meant to do it properly using the github UI but somehow accidentally did it by manually applying the patch version of it (the side effect of local testing :-)). So I am closing it manually. Anyhow I have run quite extensive tests including the ones driven by https://github.com/cloudius-systems/osv/wiki/Automated-Testing-Framework and others from Closed with 894ceb2 |
Previously, pull request cloudius-systems#1223 added support for the SCHED_FIFO but didn't implement SCHED_RR. This PR attempts to follow-up on that by proposing an implementation of SCHED_RR. Time slicing support, as mandated by SCHED_RR, is implemented though the `set_realtime_time_slice(duration)` API added in the aforementioned pull request. Within the scheduler, the amount of time the thread ran so far is tracked if it exceeds the set duration, the thread is preempted (if there is a runnable thread of same or higher priority). The thread's run time is reset on preemption. This is my first OSv pull request, I added some basic tests which I can expand further. Additionally, I checked that the tests added in know what you think.
Previously, pull request cloudius-systems#1223 added support for the SCHED_FIFO but didn't implement SCHED_RR. This PR attempts to follow-up on that by proposing an implementation of SCHED_RR. Time slicing support, as mandated by SCHED_RR, is implemented though the `set_realtime_time_slice(duration)` API added in the aforementioned pull request. Within the scheduler, the amount of time the thread ran so far is tracked if it exceeds the set duration, the thread is preempted (if there is a runnable thread of same or higher priority). The thread's run time is reset on preemption. This is my first OSv pull request, I added some basic tests which I can expand further. Additionally, I checked that the tests added in know what you think. Signed-off-by: Sören Tempel <tempel@ibr.cs.tu-bs.de>
Previously, pull request cloudius-systems#1223 added support for the SCHED_FIFO but didn't implement SCHED_RR. This PR attempts to follow-up on that by proposing an implementation of SCHED_RR. Time slicing support, as mandated by SCHED_RR, is implemented though the `set_realtime_time_slice(duration)` API added in the aforementioned pull request. Within the scheduler, the amount of time the thread ran so far is tracked if it exceeds the set duration, the thread is preempted (if there is a runnable thread of same or higher priority). The thread's run time is reset on preemption. Signed-off-by: Sören Tempel <tempel@ibr.cs.tu-bs.de>
Previously, pull request cloudius-systems#1223 added support for the SCHED_FIFO but didn't implement SCHED_RR. This PR attempts to follow-up on that by proposing an implementation of SCHED_RR. Time slicing support, as mandated by SCHED_RR, is implemented though the `set_realtime_time_slice(duration)` API added in the aforementioned pull request. Within the scheduler, the amount of time the thread ran so far is tracked if it exceeds the set duration, the thread is preempted (if there is a runnable thread of same or higher priority). The thread's run time is reset on preemption. Signed-off-by: Sören Tempel <tempel@ibr.cs.tu-bs.de>
Previously, pull request cloudius-systems#1223 added support for the SCHED_FIFO but didn't implement SCHED_RR. This PR attempts to follow-up on that by proposing an implementation of SCHED_RR. Time slicing support, as mandated by SCHED_RR, is implemented though the `set_realtime_time_slice(duration)` API added in the aforementioned pull request. Within the scheduler, the amount of time the thread ran so far is tracked if it exceeds the set duration, the thread is preempted (if there is a runnable thread of same or higher priority). The thread's run time is reset on preemption. Signed-off-by: Sören Tempel <tempel@ibr.cs.tu-bs.de>
This patch adds "real-time priority" to OSv threads, which can be used to implement Posix's SCHED_FIFO and SCHED_RR scheduling policies.
Refs #386. It doesn't yet "Fixes" it because this patch does not yet add the Posix front-end to this implementation (sched_setscheduler() et al.) - this will be added in a separate patch.
We add to each thread a _realtime._priority, an unsigned integer. Normal (non-real-time) threads have _realtime._priority of 0, and are time-shared fairly as before, while realtime threads have _realtime._priority > 0, and preempt any thread with a lower realtime priority, including all normal threads.
We add a new API, t->set_realtime_priority(priority), to make a thread real-time scheduled. The "priority" is an unsigned int as explained above (0 means normal thread). The resulting scheduling policy is fully compatible with POSIX's SCHED_FIFO. This includes following the intricate rules on what happens to the running thread's place in the queue when there are several real-time threads with the same priority and the running thread yields, waits, or is preempted by a higher-priority thread. This patch also adds an exercise for these cases in tests/misc-scheduler.cc.
This patch does not yet implement time slices as needed for supporting POSIX's SCHED_RR policy. This patch already includes a time-slice setting API, t->set_realtime_time_slice(duration), but if used it produces a warning message, and the scheduler currently ignores this setting. Adding support for real-time time slices is left for a future patch: To implement time slices, we will need to keep in _realtime the total time this thread ran since starting this slice (only zeroed when the thread is removed from the run queue), and each time we switch to a realtime thread with timeslice >= 0, we'll set an appropriate timer (for timeslice minus already used time).
I tested that although this patch adds code into the scheduler it does not noticably slow down context switches of normal threads. E.g., tests/misc-ctxsw.so measured colocated context switch time of 308ns before this patch, and exactly the same after it.
While developing this patch I tested it using added workloads in tests/misc-scheduler.cc, included in this patch, but unfortunately this is not a real regression test - it has no assertions and cannot automatically detect failure (and, like all the "misc-*" tests, it doesn't run automatically in "make check").