Description
Describe the bug
When the DSP hits an exception, it dumps oops data for the driver to read out.
What actually happens is that I get a dump file that is only partially valid. "struct sof_ipc_panic_info" is correctly filled: SOF_IPC_PANIC_MAGIC is present and the rest of the struct looks plausible. The start of the dump, however, is all zeroes, and there do not seem to be any valid values for "struct sof_ipc_dsp_oops_xtensa". The stack dump seems OK and I can find the correct functions by manually looking up symbols from it, but the coredumper Python scripts cannot make sense of the dump because many key fields are zero.
To Reproduce
Cause the FW to hit an oops. I did this by adding the following code to volume.c:volume_copy():
        panic(1234);
        while(1) {}
Expected behavior
I can extract the oops file by running:
scp root@dut:/sys/kernel/debug/sof/exception oops.bin
and then feed the data to the coredumper tool.
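Before feeding the file to the coredumper, a quick sanity check can show whether a dump has the symptom described above (zeroed start, panic info present). The following is only a minimal sketch: the function name `inspect_dump` is hypothetical, and the magic value and mask (0x0dead000 and 0x0ffff000) and the little-endian 32-bit word layout are assumptions that should be checked against the SOF headers for the firmware in use.

```python
import struct

# Assumed values; verify against the SOF IPC headers for your firmware.
PANIC_MAGIC = 0x0DEAD000
PANIC_MAGIC_MASK = 0x0FFFF000


def inspect_dump(data: bytes) -> dict:
    """Report the leading run of zero bytes and the first panic-magic word."""
    # Count the zero bytes at the very start of the dump.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))

    # Scan aligned little-endian 32-bit words for the (assumed) panic magic.
    magic_offset = None
    for offset in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from("<I", data, offset)
        if (word & PANIC_MAGIC_MASK) == PANIC_MAGIC:
            magic_offset = offset
            break

    return {
        "size": len(data),
        "leading_zero_bytes": leading_zeros,
        "magic_offset": magic_offset,
    }
```

Calling `inspect_dump(open("oops.bin", "rb").read())` on an affected dump would be expected to show a large `leading_zero_bytes` value together with a non-empty `magic_offset`, though the exact numbers depend on the platform and dump layout.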
Impact
If the DSP oops cannot be successfully saved, debugging hard-to-reproduce bugs is severely impacted.
Environment
- Branch name and commit hash of 3 repositories: sof (firmware), linux (kernel driver) and soft (tools & topology).
  linux sofdev 2e94569
  sof master e14ab70
- Name of the topology file
  n/a
- Name of the platform(s) on which the bug is observed.
  APL UP2
- Reproducibility Rate. If you can only reproduce it randomly, it’s useful to report how many times the bug has been reproduced vs. the number of attempts it’s taken to reproduce the bug.
  100%
Screenshots or console output
Two example dumps attached.
Highlights from the comments below:
The problem is definitely in how the dump routine handles WINDOWBASE updates. If ROTW is called even once (as happens with 1 iteration of store_register_loop), the result is an invalid dump. Code looks current and I fail to see how a single ROTW can have such an impact (there are only a few ops on the core after this
Got basic gdb working, at least to a degree, within QEMU, and it seems ROTW causes another exception and we end up in the DoubleExceptionVector handler. But that's probably just a symptom; the same code works when compiled with XT-CC. I
I now got an OK exception dump (for another bug) on WHL (cnl image), built with GCC, so at least this is not happening in all cases. Root cause still unknown.
Fwiw, the ABI differs slightly between GCC and XCC with respect to calling convention and register windows, hence there are some incompatibilities with some of the dump data.
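For readers less familiar with the WINDOWBASE and ROTW terms in the comments above, here is a simplified model of Xtensa register windowing. It is a sketch under stated assumptions, not the SOF dump routine: it assumes a 64-entry physical register file, all function names are hypothetical, and it ignores WINDOWSTART and exception state entirely. It only illustrates how a routine can reach every physical register by reading a few visible registers and rotating the window.

```python
NUM_PHYSICAL_REGS = 64  # assumed size of the physical AR register file
WINDOW_UNIT = 4         # WINDOWBASE counts in units of four registers


def rotw(window_base: int, step: int) -> int:
    """Return WINDOWBASE after rotating by a signed step, with wrap-around."""
    return (window_base + step) % (NUM_PHYSICAL_REGS // WINDOW_UNIT)


def physical_index(window_base: int, n: int) -> int:
    """Map visible register a<n> (0..15) to its physical register index."""
    return (window_base * WINDOW_UNIT + n) % NUM_PHYSICAL_REGS


def walk_register_file(start_base: int) -> list:
    """Visit every physical register once: read a0..a3, then rotate by one."""
    visited = []
    base = start_base
    for _ in range(NUM_PHYSICAL_REGS // WINDOW_UNIT):
        visited.extend(physical_index(base, n) for n in range(WINDOW_UNIT))
        base = rotw(base, 1)
    return visited
```

In this model the visible registers a0..a15 always refer to different physical registers after a rotation, which is why any code that keeps working state in registers across a ROTW has to account for the shift.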