Merged
2 changes: 1 addition & 1 deletion tools/sof-kernel-log-check.sh
@@ -86,7 +86,7 @@ ignore_str="$ignore_str"'|usb 2-.: .'
#
# Buglink: https://github.com/thesofproject/sof/issues/3395

-ignore_str="$ignore_str"'|sof-audio-pci 0000:00:..\..: status = 0x[0]{8} panic = 0x[0]{8}'
+ignore_str="$ignore_str"'|sof-audio-pci 0000:00:..\..: status = 0x[0-f]{8} panic = 0x[0-f]{8}'
Collaborator

Did you observe random panic values too?

Contributor Author

I can't remember. I turned both values into regex patterns to avoid random failures.

Collaborator

How will ignoring any status and any panic not ignore other, totally unrelated panics?

Contributor Author (@xiulipan, Nov 6, 2020)

All "sof-audio-pci 0000:00:..\..: status" lines should be ignored, as they all come from DSP reset attempts. A real panic dump starts with "sof-audio-pci 0000:00:..\..: error: status". See thesofproject/linux#2382
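The distinction described above can be checked with a minimal sketch. It assumes the pattern is applied with `grep -E` under the C locale; the helper name `is_ignored` is hypothetical, and the two sample lines are taken from elsewhere on this page.

```shell
#!/bin/bash
# New pattern from the diff above.
pattern='sof-audio-pci 0000:00:..\..: status = 0x[0-f]{8} panic = 0x[0-f]{8}'

# Sample lines quoted on this page: a plain status dump and an error-tagged dump.
reset_line='sof-audio-pci 0000:00:1f.3: status = 0x00000000 panic = 0x00000000'
error_line='sof-audio-pci 0000:00:1f.3: error: status = 0x00000000 panic = 0x00000000'

# Hypothetical helper: succeeds when the ignore pattern matches the given line.
is_ignored() {
    printf '%s\n' "$1" | LC_ALL=C grep -qE -- "$pattern"
}

is_ignored "$reset_line" && echo "reset line: ignored"
is_ignored "$error_line" || echo "error line: still reported"
```

The pattern requires "status" to follow the PCI address directly, so a line carrying the "error: " prefix falls through and remains visible to the log check.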

Contributor

It indeed seems we are getting multiple panic codes for #3395. @marc-hb This is potentially OK, as in all current cases the dump is printed with sof_dev_dbg_or_err() and will always have the "error: " prefix in the message if the dump is really for an error case. The problem is that this relies on the callers (of hda_dsp_dump()) to set the error flag, which might break in the future.

@xiulipan Would it be OK to limit this to the ICL platform only? We shouldn't have random DSP resets on any other platforms currently, right?

Contributor

For ICL,
sof-audio-pci 0000:00:1f.3: status = 0x00000000 panic = 0x00000000
For GLK,
sof-audio-pci 0000:00:0e.0: status = 0xecc00301 panic = 0x00000000
So shouldn't we stop ignoring panic codes other than 0? I can't imagine there is a panic code we would want to ignore.
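To make the difference between the two patterns concrete, here is a small sketch that runs the old and new expressions from the diff against the ICL and GLK lines quoted above. It assumes `grep -E` in the C locale; the helper names `matches` and `report` are hypothetical.

```shell
#!/bin/bash
# Old and new patterns, copied from the diff.
old_pattern='sof-audio-pci 0000:00:..\..: status = 0x[0]{8} panic = 0x[0]{8}'
new_pattern='sof-audio-pci 0000:00:..\..: status = 0x[0-f]{8} panic = 0x[0-f]{8}'

# Sample lines quoted in the comment above.
icl_line='sof-audio-pci 0000:00:1f.3: status = 0x00000000 panic = 0x00000000'
glk_line='sof-audio-pci 0000:00:0e.0: status = 0xecc00301 panic = 0x00000000'

# Hypothetical helper: succeeds when pattern $1 matches line $2.
matches() {
    printf '%s\n' "$2" | LC_ALL=C grep -qE -- "$1"
}

# Hypothetical helper: print one result row for a labelled pattern/line pair.
report() {
    if matches "$2" "$3"; then echo "$1: match"; else echo "$1: no match"; fi
}

report "old/ICL" "$old_pattern" "$icl_line"
report "old/GLK" "$old_pattern" "$glk_line"
report "new/ICL" "$new_pattern" "$icl_line"
report "new/GLK" "$new_pattern" "$glk_line"

# Side note: in the C locale the range 0-f spans more than hex digits
# (it also covers uppercase letters, for example).
printf 'Z\n' | LC_ALL=C grep -qE '^[0-f]$' && echo "[0-f] also matches Z"
```

The old pattern misses the GLK line because its status is nonzero, while the new one accepts both. If only hexadecimal digits are intended, `[0-9a-f]` would be the narrower class; that is a suggestion, not something the PR itself does.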

Contributor Author

@kv2019i The DSP reset retry attempts were not added only for ICL platforms; this is a common issue on all platforms (at a lower rate on the others). We do see the issue on GLK, CNL, ICL and TGL. If ICCMAX is enabled, we may see the same failure rate for DSP reset.

PS: "DSP reset" is not only the DSP reset itself; it also includes the initial communication with CSME.

Contributor

@xiulipan Sorry for the late response to this. Ack, I can see you added a mention of other platforms to linux#3395. So that does complicate matters.

I'm still a bit concerned that this could hide problems at some point in the future, as the dbg_dump() method is a generic one. E.g. we have snd_sof_handle_fw_exception(), which calls snd_sof_dsp_dbg_dump(). I did check all current instances where this is called, and in every one there is a dev_err() print on the same code path, so CI would catch the error.

Given the options on the table, I'd say it's OK to ignore panic prints that are not tagged as errors. Conversely, if the DSP status is dumped because of an error (such as an exception), the driver is expected to emit at least one error trace, which CI can catch.

# There will be debug logs at each failed initializaiton of DSP before Linux 5.9
# sof-audio-pci 0000:00:1f.3: error: cl_dsp_init: timeout HDA_DSP_SRAM_REG_ROM_STATUS read
# sof-audio-pci 0000:00:1f.3: error: status = 0x00000000 panic = 0x00000000