test(scheduler): repro signal watcher failure during SIGINT cleanup #14023
anmonteiro wants to merge 2 commits into ocaml:main
|
Could you explain the bug that is being fixed here? |
|
Glancing over the fix, the changes don't really make sense as they essentially introduce a separate signal watching thread for SIGCHLD. Why would we want that unless using |
|
For sure, perhaps I haven't gotten to the absolute root cause of the issue, but the 1st commit reproduces the issue in our macOS tests. I will do more digging to see if I can unearth something, but essentially under high process concurrency / cancelation, I observe the |
|
That's concerning, but I'm not sure I'd suspect the issue is in dune. I would suspect there's something wrong with the OCaml runtime if it's returning us a garbage signal. Why can't we just ignore such junk signals? |
|
We may just be able to ignore the junk signals. I'll push an alternative fix doing that instead. In parallel, I'm trying to track down the specific bug; at this point I agree it's not in Dune. |
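For illustration only, here is a minimal sketch of what "ignoring junk signals" could look like. It assumes the watcher hands back raw OCaml signal numbers; the `known_signal` type and the `classify` and `filter_known` helpers are hypothetical names and are not taken from Dune's scheduler.

```ocaml
(* Hypothetical sketch: these names are illustrative, not Dune's actual API. *)
type known_signal = Sigint | Sigquit | Sigterm | Sigchld

(* Map a raw OCaml signal number to a signal we expect to watch.
   Anything else is treated as junk and yields None. *)
let classify (signum : int) : known_signal option =
  if signum = Sys.sigint then Some Sigint
  else if signum = Sys.sigquit then Some Sigquit
  else if signum = Sys.sigterm then Some Sigterm
  else if signum = Sys.sigchld then Some Sigchld
  else None

(* Keep only recognised signals, silently dropping unexpected values
   instead of failing on them. *)
let filter_known (raw : int list) : known_signal list =
  List.filter_map classify raw
```

The set of recognised signals is an assumption made for the example. A mitigation along these lines only hides the garbage value; it does not explain where the value comes from.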
|
FWIW, we don't necessarily need to watch signals in a separate thread. If watching signals in the main thread works better on macOS, then so be it. But we need to do it for all signals. Moreover, we need to justify why we can't write the signal handlers in OCaml. |
Signed-off-by: Antonio Nuno Monteiro <anmonteiro@gmail.com>
|
I haven't found a good way to write this:
So far, the only solution that I've seen work well is handling only so I'm proceeding with the mitigation in #14047, which at least gets Dune unstuck. |
alternative to #14023 Signed-off-by: Antonio Nuno Monteiro <anmonteiro@gmail.com>
Job_complete_ready.assert false on SIGKILL #13370