Use a dedicated thread for timers in rather than a Timer#21490
Conversation
davidfowl
commented
May 5, 2020
- This make it possible to still timeout various operations when there's thread pool starvation occurring.
- One downside is that the "heartbeatslow" warning that usually can communicate starvation to users is now gone.
- This make it possible to still timeout various operations when there's thread pool starvation occuring. - One downside is that the heartbeatslow warning that usually can communicate starvation to users is now gone.
Co-authored-by: Günther Foidl <gue@korporal.at>
|
Interesting theory. How can we tell if it's effective? Tests need updating. |
There's still risk of the thread being preempted and not getting rescheduled for long periods when there's threadpool starvation. I think if a heartbeat takes longer than the _interval (1s), we should still log the "heartbeatslow" warning. Today there are two possible causes for the warning:
I was hoping to get some customer reports of the heartbeat slow log now that we started logging the heartbeat duration in 5.0 (#15273) to see which of these causes is more common. |
| public void Dispose() | ||
| { | ||
| _timer?.Dispose(); | ||
| _stopped = true; |
There was a problem hiding this comment.
Why not implement DisposeAsync and wait? Maybe throw after a timeout.
There was a problem hiding this comment.
Because it’s not important. Any hung thread will be visible in the debugger/dump.
| _timerThread = new Thread(state => ((Heartbeat)state).TimerLoop()) | ||
| { | ||
| Name = "Kestrel Timer", | ||
| IsBackground = true |
There was a problem hiding this comment.
In LibuvThread, we set IsBackground to false in debug so we can find issues where the thread isn't properly stopping.
There was a problem hiding this comment.
Sure but the libuv thread does more than this. I wouldn’t compare the 2. We could set it to true in debug if you’re paranoid about the timer not shutting down.
There was a problem hiding this comment.
How/when does this get initiated? Is it lazy and triggered by a first request? i.e. does it need executioncontext flow supression?
There was a problem hiding this comment.
Don't want an HttpContext stuck hanging around on a Thread 😉
There was a problem hiding this comment.
How/when does this get initiated? Is it lazy and triggered by a first request? i.e. does it need executioncontext flow supression?
Startup. Could suppress execution context but that isn't new behavior. The Timer before didn't suppress it 😄 , nothing new here.
|
You're going to want to look at and the other tests in that class. |
Sure. What’s great about this is that this thread will continue to run during the ramp up to a high number of threadpool threads. Today the thread pool needs to dequeue the timer callback as threads are slowly being injected into the thread pool (fighting other threads). |
The thread could still get preempted. A developer could block too long in a custom IConnectionHeartbeatFeature callback. We could introduce a deadlock. There are a bunch of reasons we should still time OnHeartbeat() and log "heartbeatslow" if it takes over a second. I agree that "heartbeatslow" warnings are likely more often caused because bunch of timer callbacks get dequeued in rapid succession as the threadpool works through a backlog of work items, but what's the harm in timing OnHeartbeat()? Are we sure that already-dequeued timer callback never gets preempted by the OS for over a second? What's wrong with logging something if and when this does happen? Again, getting #15273 in the hands of customers would be telling. |
Yes all 4 of those developers writing connection middleware 😬. |
|
You're right about the connection middleware. I'm sure there are less that use the feature. The warning wouldn't catch a deadlock either because we don't have another thread to observe it on. We should still log if OnHeartbeat() takes over _interval though. If it never happens, no harm. |
- Print heartbeat slow if duration > interval - Fix tests
|
Is there overhead in reserving a dedicated thread? e.g. is it reserving a bunch of memory for its stack that it will never use |
There's definitely overhead but not enough to matter. This thread runs {number of connection} callbacks every second, so it's busy. |
|
Yeah, whether the overhead matters or not is more my question. I recall threads grabbing a megabyte of memory for its stack. I was wondering whether that was still a thing, and if it is, whether this is worth it. |
I think it's just reserved pretend memory these days? |
|
@dougbu what gives? |
|
@davidfowl what gives with what? |
|
There’s an OOM in msbuild blocking my merge |
|
Ah, that's a known problem w/ And, no, we don't know why the problem is suddenly much worse. @rainersigwald any comment on that side of the issue? |
|
I think dotnet/msbuild#3577 is tracking the issue (even though it's really old @wtgodbe is talking about opening a new issue) and dotnet/arcade#5421 and/or #21478 should mitigate it. There have been a lot of these failures reported in the ASP.NET Build Teams channel. |
|
OK well somebody with admin merge my stuff 😄 |
|
Just retry the failing leg. The OOM errors are flaky, not failing all the time. |