-
Notifications
You must be signed in to change notification settings - Fork 59
sof-kernel-log-check.sh: update the regex for DSP reset fails random logs #487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you observe random
panicvalues too?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could not remember, make all into regex to avoid random failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will ignoring any status and any panic not ignore other, totally unrelated panics?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All
sof-audio-pci 0000:00:..\..: statusshould be ignored as they are all from DSP reset attempts. Real panic dump start withsof-audio-pci 0000:00:..\..: error: statusthesofproject/linux#2382There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It indeed seems we are getting multiple panic codecs for #3395 . @marc-hb This is ok potentially ok as in all current cases, dump is printed with sof_dev_dbg_or_err() and will always has "error: " prefix in the message if the dump is really for an error case. The problem is that this relies on the callers (of hda_dsp_dump()) to set the error flag, which might break in the future.
@xiulipan Would it be ok to limit this only for ICL platform? We shouldn't have random DSP resets on any other platforms currently, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ICL,
sof-audio-pci 0000:00:1f.3: status = 0x00000000 panic = 0x00000000
For GLK,
sof-audio-pci 0000:00:0e.0: status = 0xecc00301 panic = 0x00000000
then, should we NOT ignore panic code other than 0? I can't imagine there is panic code to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i The DSP reset retry attempts is not only added for ICL platforms, it is a common issue for all platforms (but lower rate on others). We do see issue on GLK, CNL, ICL and TGL. If ICCMAX is enabled, we may have same fail rate for DSP reset.
PS: DSP reset is not only DSP reset, it also include init communication with CSME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiulipan Sorry for late response to this. Ack, I can see you added mention of other platforms to linux#3395. So that does complicate matters.
I'm still a bit concerned that this could hide problems at some point in future as dbg_dump() method is a generic one. E.g. we have snd_sof_handle_fw_exception() which calls snd_sof_dsp_dbg_dump(). I did check all current instances where this is called, and in each and every place, there is a dev_err() print on the same code path, so CI would catch the error.
Given options on the tables, I'd say it's ok to ignore panic prints that are not tagged as errors. And reverse, if DSP status is dumped because of an error (like exception), driver is expected to emit at least one error trace, which CI can catch.