Revert trace enable commits, fix DEBUG_TRACE_PTR() macro so it can be used early #4760

marc-hb · 2021-09-14T06:20:23Z

This reverts commit 7df3674 and its follow-up commit, and adds a one-line fix for DEBUG_TRACE_PTR().

It has been tested successfully in TEST PR #4758.

See the commit messages; the main one is copied here:

This restores the ability to use CONFIG_TRACEM (copy everything to
mailbox) without crashing; in other words, it fixes #4699.

This also fixes the other DSP panic #4676 and removes the need for
logical changes in #4678, which can be reverted too.

commit 7df3674 ("trace: enable trace after it is ready") was meant
to fix a crash when tr_xxx() was used early. However, I've used very
early tracing for months and it never caused any crash (see #4334).

I tried adding a tr_err() statement immediately after trace_init(sof) in
primary_core_init() and it works just fine. primary_core_init() runs
extremely early, so I don't think it's too demanding to avoid calling
tr_XXX() before the trace even exists.

The reverted commits confused initializing and enabling.
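To make the distinction concrete, here is a minimal sketch assuming a trace context that is created once by an init step and then separately switched on and off. The names (`trace_ctx`, `ctx_init()`, `ctx_enable()`, `emit()`) are hypothetical and are not SOF's actual API: "initialized" means the context exists, "enabled" means records may be emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace context; not SOF's real struct trace. */
struct trace_ctx {
	bool enabled; /* may records be emitted right now? */
	int emitted;  /* number of records accepted so far */
};

static struct trace_ctx ctx_storage;
static struct trace_ctx *ctx; /* NULL means "not initialized yet" */

/* Initialization: the context comes into existence (once, early). */
static void ctx_init(void)
{
	assert(ctx == NULL);
	ctx_storage.enabled = false;
	ctx_storage.emitted = 0;
	ctx = &ctx_storage;
}

/* Enabling: an already initialized context starts or stops accepting records. */
static void ctx_enable(bool on)
{
	assert(ctx != NULL);
	ctx->enabled = on;
}

/* Returns true if the record was accepted. */
static bool emit(void)
{
	if (!ctx)          /* not initialized: nothing to dereference */
		return false;
	if (!ctx->enabled) /* initialized but disabled: drop silently */
		return false;
	ctx->emitted++;
	return true;
}
```

In this model, delaying `ctx_enable()` does nothing to protect a caller that runs before `ctx_init()`; only the NULL check does, which is the kind of guard the DEBUG_TRACE_PTR() fix adds.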

Reproduction #4683 did not seem to demonstrate anything obvious;
there's not even a link to a failed test run. I don't understand how
playing with spin locks is relevant to this.

Later, reproduction #4759 finally demonstrated the real issue: through
DEBUG_TRACE_PTR(), some tr_XXX() can indeed be called (in very unusual
debug circumstances specific to the original author) before the trace is
initialized. The previous commit in this series fixes that by simply
guarding it with if (trace_get()).
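The guard can be sketched in isolation as below. This is a minimal model, not SOF code: it assumes, as the commit message implies, that `trace_get()` returns NULL until `trace_init()` has run; `tr_info_mock()` is a hypothetical stand-in for a real tr_XXX() call, and the macro takes fewer parameters than the real DEBUG_TRACE_PTR(ptr, bytes, zone, caps, flags). The actual patch only replaced `do {` with `if (trace_get()) {`; here the guard is additionally wrapped in the usual `do { } while (0)` so the macro stays a single statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical trace context standing in for SOF's; illustration only. */
struct trace {
	int records; /* records emitted so far */
};

static struct trace trace_storage;
static struct trace *trace_ptr; /* NULL until trace_init() (assumption) */

static struct trace *trace_get(void)
{
	return trace_ptr;
}

static void trace_init(void)
{
	trace_ptr = &trace_storage;
}

/* Stand-in for a tr_XXX() call: dereferences the context unconditionally,
 * so calling it before trace_init() would crash. */
static void tr_info_mock(const char *what, void *ptr, size_t bytes)
{
	struct trace *t = trace_get();

	assert(t != NULL); /* models the crash as an assertion failure */
	t->records++;
	printf("%s ptr %p 0x%zx bytes\n", what, ptr, bytes);
}

/* Guarded debug macro: silently does nothing before the trace exists. */
#define DEBUG_TRACE_PTR(ptr, bytes)                                \
	do {                                                       \
		if (trace_get())                                   \
			tr_info_mock("allocated", (ptr), (bytes)); \
	} while (0)
```

An early allocation then simply produces no trace record instead of dereferencing a trace context that does not exist yet.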

 --------

I am not pretending that these reverts make the tracing code bug-free
and perfect again, absolutely not and very far from it. I'm merely
saying that:

- The first reverted commit caused at least two regressions: [BUG] DSP panic when initializing the DMA trace #4676 and
  [BUG] FW boot failure with SOF main when trace mailbox CONFIG_TRACEM is enabled #4699.
- These two commits added yet another variable (time) to an already
  complex situation with an existing combinatorial "explosion":
  compile-time Kconfigs, run-time settings, platform-specific bugs
  ([BUG] Empty or stuck DMA trace - Initial FW ABI banner message not found in logger.data.txt #4333, Revert trace changes to fix CML suspend resume #4573, ...), various races, mbox + DMA, different DMA engines,
  Zephyr vs XTOS, etc.
- Last but not least, we don't want to invest in making the existing trace
  implementation better. We want to switch to the Zephyr implementation
  instead.

So let's go back to a previous known good state, I mean relatively
good and stay there if we can.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>


marc-hb · 2021-09-14T07:11:22Z

https://sof-ci.01.org/sofpr/PR4760/build10326/devicetest/ is all green

The checkpatch warnings in https://sof-ci.01.org/sofpr/PR4760/build10326/checkpatch/ all come from the reverted commits.

marc-hb · 2021-09-14T07:23:31Z

Extra testing in #4758 is good:
https://sof-ci.01.org/sofpr/PR4758/build10327/devicetest/ is all green and has the expected traces, for instance in https://sof-ci.01.org/sofpr/PR4758/build10327/devicetest/?model=CML_RVP_SDW&testcase=check-sof-logger

 TIMESTAMP      (us)           DELTA  C# COMPONENT          LOCATION                      CONTENT	ktime=185.422s  @  2021-09-14 06:47:01 +0000 UTC
[   225575306.974] (           0.000) c0 dma-trace             src/trace/dma-trace.c:392  INFO DMA: FW ABI 0x3013000 DBG ABI 0x5003000 tag v1.9-rc1-9-ge3234fe475ec src hash 0xdbe1afa7 (ldc hash 0xdbe1afa7)
[          14.115] (          14.115) c0 memory                      src/lib/alloc.c:787  INFO MARC: allocated ptr 0xbe090300 0x1800 bytes at zone 0x0
[         169.844] (         155.729) c0 ll-schedule        ./schedule/ll_schedule.c:390  INFO task add 0x9e17e340 dma-trace-task <2b972272-c5b1-4b7e-926f-0fc5cb4c4690>

 TIMESTAMP      (us)           DELTA  C# COMPONENT          LOCATION                      CONTENT	ktime=187.450s  @  2021-09-14 06:47:03 +0000 UTC
[           2.604] (           0.000) c0 buffer                      src/init/init.c:146  ERROR SUPER EARLY trace does not crash
[   225575294.109] (   225575296.000) c0 dma-trace             src/trace/dma-trace.c:386  INFO SHM: FW ABI 0x3013000 DBG ABI 0x5003000 tag v1.9-rc1-9-ge3234fe475ec src hash 0xdbe1afa7 (ldc hash 0xdbe1afa7)
Skipped 8124 bytes after the last statement.

lgirdwood · 2021-09-14T14:20:46Z

> So let's go back to a previous known good state, I mean relatively
> good and stay there if we can.

Once CI has passed the trace update, which order should they be applied in?

marc-hb · 2021-09-14T14:27:45Z

> Once CI has passed the trace update, which order should they be applied in?

I don't understand, sorry: I don't see any failure. The only PR to apply is this one: 2 reverts + 1 line change.

keyonjie · 2021-09-15T01:21:17Z

src/lib/alloc.c


 #define DEBUG_TRACE_PTR(ptr, bytes, zone, caps, flags) \
-	do { \
+	if (trace_get()) { \


I thought about this, but it looks odd to add this check for each caller. Can we implement this inside tr_xx() instead? Then we could fix the crash for any other potential callers, e.g. the one you are demonstrating in src/init/init.c:139

/* tr_err(&buffer_tr, "CRASHES (and requires a Linux reboot!)"); */

marc-hb · 2021-09-15

That would be more complicated: because of the number of layers and the configurability of the code, it would have to be done in several different places. It would also affect every tracing statement for something that basically never happens. If done at the macro level, it would also grow the code size significantly.

I don't think it's worth guarding against a crash that happened only once in years, and only in your local workspace, where you had turned DEBUG_HEAP on and added one extra log statement to the source. The fewer changes, the better.

trace_init() is called extremely early, so it's not very demanding to avoid using the trace before it.

> odd to add this check for each caller.

So far we have found two instances: one that I made up, and the other one only in your workspace.
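For comparison, the alternative suggested in review (checking inside the tracing function itself) might look like the minimal sketch below. The names are hypothetical stand-ins, not SOF's API, and this is not what was merged. A single check in the central function protects every caller, but it is evaluated on every trace statement, which is the runtime cost objected to above; putting the same check in a macro instead would duplicate it at every call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace context; illustration only. */
struct trace {
	int records;
};

static struct trace trace_storage;
static struct trace *trace_ptr; /* NULL until trace_init() (assumption) */
static int checks;              /* how many times the guard was evaluated */

static void trace_init(void)
{
	assert(trace_ptr == NULL); /* initialize only once */
	trace_ptr = &trace_storage;
}

/* Central tracing function with the guard inside it: every caller is
 * protected, but every call pays for the check, even long after init. */
static bool trace_event_guarded(const char *msg)
{
	checks++;
	if (!trace_ptr)
		return false; /* too early: drop the record */
	(void)msg;            /* a real backend would format and copy msg */
	trace_ptr->records++;
	return true;
}
```

After one early call and three calls following initialization, `checks` is 4 even though only the first call needed the guard.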

lgirdwood · 2021-09-15T13:26:23Z

> Once CI has passed the trace update, which order should they be applied in?

> I don't understand, sorry: I don't see any failure. The only PR to apply is this one: 2 reverts + 1 line change.

I must have been imagining two other trace PRs yesterday...

marc-hb added 3 commits September 14, 2021 05:12

alloc.c: fix DEBUG_TRACE_PTR() not to trace before trace is initialized

8b61686

As reported in thesofproject#4759, thesofproject#4636 and a few others linked from there. Signed-off-by: Marc Herbert <marc.herbert@intel.com>

Revert "dma-trace: add check to avoid dereference from NULL"

878bd7f

This reverts commit 89ec377. As commit 7df3674 ("trace: enable trace after it is ready") is reverted this is not required anymore. See long previous commit message. Signed-off-by: Marc Herbert <marc.herbert@intel.com>

marc-hb mentioned this pull request Sep 14, 2021

[DRAFT][TEST][DNM] Super early trace #4758

Closed

marc-hb marked this pull request as ready for review September 14, 2021 07:24

marc-hb requested review from akloniex, bardliao, dbaluta, lbetlej, lgirdwood, libinyang, mmaka1 and plbossart as code owners September 14, 2021 07:24

marc-hb requested review from iuliana-prodan, keyonjie, kv2019i, lyakh and paulstelian97 September 14, 2021 07:33

This was referenced Sep 14, 2021

dma-trace: add check to avoid dereference from NULL #4678

Merged

trace: enable trace after it is ready #4636

Merged

[BUG] FW boot failure with SOF main when trace mailbox CONFIG_TRACEM is enabled #4699

Closed

marc-hb requested a review from ranj063 September 14, 2021 08:06

plbossart approved these changes Sep 14, 2021

keyonjie reviewed Sep 15, 2021

lgirdwood approved these changes Sep 15, 2021

lgirdwood merged commit a487ca9 into thesofproject:main Sep 15, 2021

keyonjie mentioned this pull request Sep 16, 2021

fix the deadlock issue when mailbox trace is configured. #4700

Closed

marc-hb mentioned this pull request Sep 21, 2021

stable-1.9: cherry-pick "Revert trace enable commits, fix DEBUG_TRACE_PTR() macro so it can be used early #4760" #4788

Merged

marc-hb deleted the revert-trace-enable branch September 28, 2021 00:09
