fix the deadlock issue when mailbox trace is configured. #4700
keyonjie commented Aug 31, 2021
lgirdwood left a comment
Good find @keyonjie. I agree we have a race between the trace work being scheduled and the trace_off() call. I've had a quick check and it's made even more complex due to the following:
1. We have a trace->enabled flag and a dma_trace_data->enabled flag; we should only have one flag.
2. It does not look like we are flushing when we turn trace off (so we miss the last data).
3. trace_work() is not checking the trace->enabled flag at entry. Checking it there would be a big simplification against scheduling races, i.e.

    if (!trace->enabled)
        return SOF_TASK_STATE_CANCEL;

If we fix 1, 2 & 3 then we do not need IRQ-off locking around the trace_on() and trace_off() calls.
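A minimal single-threaded sketch of points 2 and 3, assuming a simplified trace context. The names struct trace_ctx, trace_flush(), TASK_CANCEL and TASK_RESCHEDULE are hypothetical stand-ins (SOF's real state is SOF_TASK_STATE_CANCEL); the sketch only shows the shape of the idea: the periodic work exits at entry when tracing is off, and turning trace off flushes first so the tail of the log is not lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SOF task states; not the real SOF enum. */
enum task_state { TASK_RESCHEDULE, TASK_CANCEL };

/* Hypothetical, simplified trace context with a single enabled flag. */
struct trace_ctx {
	bool enabled;
	size_t pending; /* bytes buffered but not yet copied to the host */
	size_t flushed; /* bytes copied to the (simulated) host so far */
};

/* Copy everything that is currently buffered to the simulated host. */
static void trace_flush(struct trace_ctx *t)
{
	t->flushed += t->pending;
	t->pending = 0;
}

/* Periodic work (point 3): check the flag first and cancel if disabled. */
static enum task_state trace_work(struct trace_ctx *t)
{
	if (!t->enabled)
		return TASK_CANCEL;
	trace_flush(t);
	return TASK_RESCHEDULE;
}

/* Turning trace off (point 2): flush before clearing the flag. */
static void trace_off(struct trace_ctx *t)
{
	trace_flush(t);
	t->enabled = false;
}
```

In a real multi-core or interrupt-driven setting the plain bool would need the same care as any other shared flag; this sketch deliberately ignores concurrency.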
    schedule_task(&trace_data->dmat_work, DMA_TRACE_PERIOD,
                  DMA_TRACE_PERIOD);
    trace_data->enabled = 1;
You're setting this flag while holding the spinlock, but you test it on line 480 without any protection.
Is this intentional?
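If the flag really is meant to be read without the lock, one common way to make that read well-defined is a C11 atomic with release/acquire ordering. This is a sketch under that assumption, not what the PR does; dma_trace_enabled and the two helpers are hypothetical names, and whether the SOF toolchain and memory model call for this is not established in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag shared between the on/off path and the trace work. */
static atomic_bool dma_trace_enabled = false;

/* Writer side: publish the new value with release semantics, so writes
 * made before enabling are visible to a reader that sees "true". */
static void set_dma_trace_enabled(bool on)
{
	atomic_store_explicit(&dma_trace_enabled, on, memory_order_release);
}

/* Reader side: safe without the spinlock; acquire pairs with release. */
static bool dma_trace_is_enabled(void)
{
	return atomic_load_explicit(&dma_trace_enabled, memory_order_acquire);
}
```

An atomic flag only makes the single read consistent; it does not make a check-then-act sequence atomic, so anything that acts on the flag still needs its own protection.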
src/trace/trace.c
    spin_lock_irq(&trace->lock, flags);
    trace->enable = 1;
    /* should not do this with trace->lock held as there is trace calling internal */
I was not able to parse the sentence above. 'there is trace calling internal' is missing a complement. Or was it meant to be 'trace calling internally'? That doesn't make much more sense.
    dma_trace_on();
    spin_lock_irq(&trace->lock, flags);
    trace->enable = 1;
So the trace is enabled after the DMA trace? That seems surprising.
src/trace/trace.c
    spin_lock_irq(&trace->lock, flags);
    trace->enable = 0;
    /* should not do this with trace->lock held as there is trace calling internal */
Same here: this comment needs to be reworded.
    return;
    trace_data->enabled = 1;
    spin_lock_irq(&trace_data->lock, flags);
Do we have a good understanding of what this lock is protecting, vs. what the other trace lock is protecting? The use of this lock doesn't seem to be fully consistent. E.g. in dtrace_event() this lock also covers the test of .copy_in_progress, but not in a consistent way: neither of the two places where .copy_in_progress is set to 1 is protected by that lock, so holding the lock while testing it doesn't help. Why do we have to lock here at all?
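A lock only helps if both the test and every set of .copy_in_progress go through it. As a swapped-in alternative (not what the SOF code does), the test and the set can be fused into one atomic exchange, so exactly one caller wins the claim and no lock is needed for that flag. copy_in_progress, try_start_copy() and finish_copy() here are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag modelling .copy_in_progress. */
static atomic_bool copy_in_progress = false;

/* Test and set in one atomic step: returns true only for the caller
 * that flipped the flag from false to true, i.e. that owns the copy. */
static bool try_start_copy(void)
{
	return !atomic_exchange(&copy_in_progress, true);
}

/* Release the claim once the copy has completed. */
static void finish_copy(void)
{
	atomic_store(&copy_in_progress, false);
}
```

Whichever mechanism is chosen, the point from the review stands: every writer and every reader of the flag has to use the same one.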
We could have different use cases, e.g. disabling DMA trace while mailbox trace is enabled; that's why we need two separate 'enabled' flags. The race here is not at the flushing point: the schedule_task() call in dma_trace_on() will try to log something with trace_info(), which requires taking trace->lock again.
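The deadlock described above can be reproduced in a small single-threaded model. Instead of spinning, the hypothetical struct depth_lock records how deeply it is held; a depth of 2 marks the point where a real non-recursive spinlock would hang. log_model() stands in for trace_info() and dma_on_model() for dma_trace_on(); these are assumptions, not SOF code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: track nesting depth instead of spinning, so a
 * re-entry that would hang a real non-recursive spinlock becomes visible. */
struct depth_lock {
	int depth;
	int max_depth;
};

static void dl_acquire(struct depth_lock *l)
{
	if (++l->depth > l->max_depth)
		l->max_depth = l->depth;
}

static void dl_release(struct depth_lock *l)
{
	l->depth--;
}

static struct depth_lock trace_lock; /* models trace->lock */

/* Stand-in for trace_info(): every log call takes trace_lock. */
static void log_model(void)
{
	dl_acquire(&trace_lock);
	dl_release(&trace_lock);
}

/* Stand-in for dma_trace_on(): scheduling the work logs something. */
static void dma_on_model(void)
{
	log_model();
}

/* Problematic ordering: trace_lock is held across dma_on_model(). */
static void trace_on_locked(void)
{
	dl_acquire(&trace_lock);
	dma_on_model();
	dl_release(&trace_lock);
}
```

After trace_on_locked() the model reports a maximum depth of 2, i.e. the same context tried to take trace->lock while already holding it.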
OK, so we don't need 1, but 2 & 3 are needed.
Doing 3 above means no locking is needed for the on/off calls.
Oh, this code looks wrong. We are entering atomic context (no IRQs).
So it looks like we are doing things the opposite way here?
Hold the dma-trace lock when performing on/off switching, to make sure the status is consistent.
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
As dma_trace_on() and dma_trace_off() call trace functions internally, we should not call them with trace->lock held, to avoid a deadlock.
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
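The ordering proposed in the two commit messages above can be sketched with the same kind of hypothetical depth-tracking lock model: the DMA on path runs with trace->lock not held, the dma-trace lock guards only the DMA state, and trace->lock is taken afterwards just to flip the trace flag. That logging happens while the dma-trace lock is held (giving a lock order of dma-trace lock, then trace->lock) is an assumption of this sketch; it stays deadlock-free only as long as nothing takes the two locks in the reverse order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: record nesting depth instead of spinning. */
struct depth_lock {
	int depth;
	int max_depth;
};

static void dl_acquire(struct depth_lock *l)
{
	if (++l->depth > l->max_depth)
		l->max_depth = l->depth;
}

static void dl_release(struct depth_lock *l)
{
	l->depth--;
}

static struct depth_lock trace_lock; /* models trace->lock */
static struct depth_lock dma_lock;   /* models the dma-trace lock */
static bool trace_enabled, dma_enabled;

/* Stand-in for trace_info(): logging takes trace_lock. */
static void log_model(void)
{
	dl_acquire(&trace_lock);
	dl_release(&trace_lock);
}

/* Stand-in for dma_trace_on(): the dma-trace lock guards its own state,
 * and scheduling the work logs while that lock is held. */
static void dma_on_model(void)
{
	dl_acquire(&dma_lock);
	dma_enabled = true;
	log_model(); /* lock order: dma_lock, then trace_lock */
	dl_release(&dma_lock);
}

/* Fixed ordering: call dma_on_model() with trace_lock NOT held,
 * then take trace_lock only to flip the trace flag. */
static void trace_on_fixed(void)
{
	dma_on_model();
	dl_acquire(&trace_lock);
	trace_enabled = true;
	dl_release(&trace_lock);
}
```

With this ordering neither lock is ever held twice by the same context, which is what the depth counters check.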
marc-hb left a comment
I'm sure there are plenty of valid points in this discussion, but let's please first do a couple of small reverts and get back to the "last known good state", fixing both #4676 and #4699 with a single line. The two recent commits that I'm reverting in #4760 started a "chain reaction" that may end up completely refactoring the existing trace, and I believe we don't want that. See the longer commit message in #4760.
Oh, this code looks wrong. We are entering atomic context (no IRQs) if (!send_atomic) ??
send_atomic is very confusing; see the revert of a very old confusion in #4246.