Lock continuation before calling event handler #4142
When EThread::process_event(Event*, int) is called, does it not acquire the HttpSM mutex? The DNS/HostDB mutex should be the same as the HttpSM's, right?
No. The lock in
In any case, this stack shows that the HttpSM mutex was not held, which is why the assert went off.
It makes sense to hold the bucket mutex. If the theory is right, we should be able to reproduce the scenario, since it should crash whenever DNS takes longer than the transaction.
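To make the failure mode concrete, here is a deliberately simplified, single-threaded model. It is not Traffic Server code: `Mutex`, `Continuation`, `process_event`, and `run_scenario` are hypothetical names, and a "thread" is just an integer id standing in for the `t` passed to MUTEX_TRY_LOCK. It assumes that the event loop locks only the mutex attached to the event (modeled as the dispatched continuation's own mutex), and that a lookup continuation guarded by a bucket mutex calls the state machine back directly. A check in the spirit of the handleEvent asserts then flags the unlocked callee.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <functional>
#include <memory>

// Hypothetical model; a "thread" is an integer id.
using ThreadId = int;
constexpr ThreadId kNoThread = -1;

// Minimal re-entrant try-lock keyed on the calling thread id.
struct Mutex {
  ThreadId holder = kNoThread;
  int depth = 0;

  bool try_lock(ThreadId t) {
    if (holder == t) { ++depth; return true; }  // re-entry by the holder
    if (holder != kNoThread) return false;      // held by another thread
    holder = t;
    depth = 1;
    return true;
  }
  void unlock() {
    if (--depth == 0) holder = kNoThread;
  }
};

struct Continuation {
  std::shared_ptr<Mutex> mutex;
  std::function<bool(ThreadId)> handler;  // returns false if a callee found its lock missing

  // Debug check: the callee's own mutex must be held by the calling thread.
  bool handleEvent(ThreadId t) {
    if (mutex && mutex->holder != t) return false;
    return handler(t);
  }
};

enum class Outcome { Ok, Rescheduled, UnlockedCallee };

// Event loop step: lock only the event's mutex, retry later if it is busy.
Outcome process_event(Continuation& c, ThreadId t) {
  if (!c.mutex->try_lock(t)) return Outcome::Rescheduled;
  const bool ok = c.handleEvent(t);
  c.mutex->unlock();
  return ok ? Outcome::Ok : Outcome::UnlockedCallee;
}

// A lookup continuation guarded by a bucket mutex replies directly to a state
// machine. Unless both share one mutex, the state machine is entered unlocked.
Outcome run_scenario(bool sm_shares_bucket_mutex) {
  const ThreadId t = 1;
  auto bucket_mutex = std::make_shared<Mutex>();
  auto sm_mutex = sm_shares_bucket_mutex ? bucket_mutex : std::make_shared<Mutex>();

  Continuation http_sm{sm_mutex, [](ThreadId) { return true; }};
  Continuation lookup{bucket_mutex, [&http_sm](ThreadId tid) { return http_sm.handleEvent(tid); }};
  return process_event(lookup, t);
}
```

In this model `run_scenario(false)` returns `Outcome::UnlockedCallee` and `run_scenario(true)` returns `Outcome::Ok`: the direct callback is only safe when the lookup's mutex and the state machine's mutex are the same object.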
```cpp
MUTEX_TRY_LOCK(lock, action.mutex, t);
if (!lock.is_locked()) {
// Go ahead and grab the continuation mutex or just grab the action mutex again if there is no continuation mutex
MUTEX_TRY_LOCK(lock2, (action.continuation && action.continuation->mutex) ? action.continuation->mutex : action.mutex, t);
```
Does this only happen with the global session pool? I was talking to @zwoop about this earlier today.
I have only tested with the global pool environment.
Cherry-picked to 8.1.0.
After further deploying builds with the asserts in handleEvent, we have seen several variants of the following stack. In the cases I've dug into, the continuation called from reply_to_cont is an HttpSM, and it is indeed unlocked.
Grabbing the locks at the top of probeEvent, and rescheduling if either lock cannot be obtained, should solve the problem. To simplify the lock scope problem, we take the action.mutex lock again if there is no continuation mutex. That should only cost a compare and an increment.
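A minimal sketch of that fix, assuming a re-entrant mutex for which a try-lock by the current holder is just a compare (is the holder this thread?) and an increment (bump the hold depth). `Mutex`, `TryLock`, `Action`, and `probe_event` are hypothetical stand-ins for the real mutex type, MUTEX_TRY_LOCK, the pending action, and probeEvent; only the locking logic is modeled, and the callback is reduced to a counter.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <memory>

// Hypothetical model; a "thread" is an integer id.
using ThreadId = int;
constexpr ThreadId kNoThread = -1;

// Re-entrant try-lock: re-acquisition by the holder is a compare plus an increment.
struct Mutex {
  ThreadId holder = kNoThread;
  int depth = 0;

  bool try_lock(ThreadId t) {
    if (holder == t) { ++depth; return true; }
    if (holder != kNoThread) return false;
    holder = t;
    depth = 1;
    return true;
  }
  void unlock() {
    if (--depth == 0) holder = kNoThread;
  }
};

// Scoped try-lock guard; releases the mutex on destruction only if it was taken.
class TryLock {
public:
  TryLock(const std::shared_ptr<Mutex>& m, ThreadId t) : m_(m), locked_(m && m->try_lock(t)) {}
  ~TryLock() { if (locked_) m_->unlock(); }
  TryLock(const TryLock&) = delete;
  TryLock& operator=(const TryLock&) = delete;
  bool is_locked() const { return locked_; }

private:
  std::shared_ptr<Mutex> m_;
  bool locked_;
};

struct Action {
  std::shared_ptr<Mutex> mutex;               // the action mutex
  std::shared_ptr<Mutex> continuation_mutex;  // null when there is no continuation mutex
};

enum class Probe { Delivered, Rescheduled };

// Take both locks up front; if either is unavailable, reschedule instead of
// calling back a continuation whose mutex is not held.
Probe probe_event(const Action& action, ThreadId t, int& callbacks) {
  TryLock lock(action.mutex, t);
  // Without a continuation mutex, lock the action mutex again so that lock2
  // is always meaningful.
  TryLock lock2(action.continuation_mutex ? action.continuation_mutex : action.mutex, t);
  if (!lock.is_locked() || !lock2.is_locked()) {
    return Probe::Rescheduled;
  }
  ++callbacks;  // stands in for replying to the continuation
  return Probe::Delivered;
}
```

When there is no continuation mutex, `lock2` re-enters the action mutex, so its depth briefly reaches 2 and returns to 0 once both guards go out of scope. Keeping both guards in one scope is what makes the "reschedule if either lock fails" rule easy to apply.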