-
Notifications
You must be signed in to change notification settings - Fork 349
Revert trace enable commits, fix DEBUG_TRACE_PTR() macro so it can be used early #4760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As reported in thesofproject#4759, thesofproject#4636 and a few others linked from there. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
This reverts commit 7df3674. This restores the ability to use CONFIG_TRACEM (copy everything to mailbox) without crashing, in other words it fixes thesofproject#4699 This also fixes the other DSP panic thesofproject#4676 and removes the need for logical changes in thesofproject#4678, which can be reverted too. commit 7df3674 ("trace: enable trace after it is ready") was meant to fix a crash when tr_xxx() was used early. However I've used very early tracing for months and it never caused any crash (see thesofproject#4334) I tried adding a tr_err() statement immediately after trace_init(sof) in primary_core_init() and it works just fine. primary_core_init() runs extremely early so I don't think it's too demanding not to use an tr_XXX() before the trace even exists. The reverted commits confused initializing and enabling. Reproduction thesofproject#4683 did not seem to demonstrate anything obvious, there's not even a link to a failed test run. I don't understand how playing with spin locks is relevant to this. Later, reproduction thesofproject#4759 finally demonstrated the real issue: through DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusal debug circumstances specific to the original author) before the trace is initialized. The previous commit in this series fixes that by simply guarding it with if(trace_get()) -------- I am _not_ pretending that these reverts make the tracing code bug-free and perfect again, absolutely not and very far from it. I'm merely saying that: - The first reverted commit caused at least two regressions: thesofproject#4676 and thesofproject#4699 - These two commits added yet another variable (time) in an already complex situation with an already existing combinatorial "explosion": compile-time Kconfigs, run-time settings, platform-specific bugs (thesofproject#4333, thesofproject#4573, ...), various races, mbox + DMA, different DMA engines, Zephyr vs XTOS, etc. - Last but not least, we don't want to invest in making the exist trace implementation better. We want to switch to the Zephyr implementation instead So let's go back to a previous known good state, I mean _relatively_ good and stay there if we can. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
|
https://sof-ci.01.org/sofpr/PR4760/build10326/devicetest/ is all green The checkpatch warnings in https://sof-ci.01.org/sofpr/PR4760/build10326/checkpatch/ all come from the reverted commits. |
|
Extra testing in #4758 is good: |
Once CI has passed the trace update, which order should they be applied in ? |
I don't understand sorry: I don't see any failure. The only PR to apply is this one: 2 reverts + 1 line change. |
|
|
||
| #define DEBUG_TRACE_PTR(ptr, bytes, zone, caps, flags) \ | ||
| do { \ | ||
| if (trace_get()) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but it looks odd to add this check for each caller. Can we implement this inside the tr_xx()? Then we can fix the crash for any other potential callers e.g. the one you are demonstrating in src/init/init.c: 139
/* tr_err(&buffer_tr, "CRASHES (and requires a Linux reboot!)"); */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be more complicated because it would have to be done in a few different places because of the number of layers and configurability of the code. It will also affect every tracing statement for something that basically never happens. If done at the macro level it would also grow the code size significantly.
I don't think it's worth it to guard against a crash that happened only once in years and only in your local workspace where you turned DEBUG_HEAP on and added one extra log statement in the source. The fewer changes, the better.
trace_init() is called extremely early so it's not very demanding not to use the trace before it.
odd to add this check for each caller.
So far we found two instances, one that I made up and the other one only in your workspace.
I must have been imagining two other trace PRs yesterday.... |
This reverts commit 7df3674 + reverts its followup commit + 1 line fix for DEBUG_TRACE_PTR()
It has been successfully tested in TEST PR #4758
See commit messages, the main one copied here:
This restores the ability to use CONFIG_TRACEM (copy everything to
mailbox) without crashing, in other words it fixes #4699
This also fixes the other DSP panic #4676 and removes the need for
logical changes in #4678, which can be reverted too.
commit 7df3674 ("trace: enable trace after it is ready") was meant
to fix a crash when tr_xxx() was used early. However I've used very
early tracing for months and it never caused any crash (see #4334)
I tried adding a tr_err() statement immediately after trace_init(sof) in
primary_core_init() and it works just fine. primary_core_init() runs
extremely early so I don't think it's too demanding not to use an
tr_XXX() before the trace even exists.
The reverted commits confused initializing and enabling.
Reproduction #4683 did not seem to demonstrate anything obvious,
there's not even a link to a failed test run. I don't understand how
playing with spin locks is relevant to this.
Later, reproduction #4759 finally demonstrated the real issue: through
DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusal
debug circumstances specific to the original author) before the trace is
initialized. The previous commit in this series fixes that by simply
guarding it with if(trace_get())
I am not pretending that these reverts make the tracing code bug-free
and perfect again, absolutely not and very far from it. I'm merely
saying that:
The first reverted commit caused at least two regressions: [BUG] DSP panic when initializing the DMA trace #4676 and
[BUG] FW boot failure with SOF main when trace mailbox CONFIG_TRACEM is enabled #4699
These two commits added yet another variable (time) in an already
complex situation with an already existing combinatorial "explosion":
compile-time Kconfigs, run-time settings, platform-specific bugs
([BUG]Empty or stuck DMA trace - Initial FW ABI banner message not found in logger.data.txt #4333, Revert trace changes to fix CML suspend resume #4573, ...), various races, mbox + DMA, different DMA engines,
Zephyr vs XTOS, etc.
Last but not least, we don't want to invest in making the exist trace
implementation better. We want to switch to the Zephyr implementation
instead
So let's go back to a previous known good state, I mean relatively
good and stay there if we can.
Signed-off-by: Marc Herbert marc.herbert@intel.com