[BUG][ADL-P] DSP panic on device boot-up or resume-from-suspend for sof-adl-max98357a-rt5682-2way tplg #6964

@johnylin76

Describe the bug
ChromeOS device: Brya/Taniks (ADL-P)
ChromeOS image version: R111-15313.0.0

This OS version includes the backported v5.10 kernel commit chain.
The Waves-integrated SOF build is generated from the Google-internal rpl-001-drop-stable branch (checkout head: https://chrome-internal-review.googlesource.com/c/chromeos/third_party/sound-open-firmware-private/+/5325552/1).
While the issues are being fixed on other devices (e.g. Vell, Primus), we found that the DSP panic is now observable on Taniks devices. It seems to be reproducible on Taniks only (not strictly verified), and only after the kernel backport commits landed (the latest SOF build passes with OS version 15308.0.0).

To Reproduce
Can be reproduced on Taniks (tplg: sof-adl-max98357a-rt5682-waves-2way). Verified that the issue is still reproducible even with Waves removed (using sof-adl-max98357a-rt5682-2way.tplg instead).

  1. Flash OS image R111-15313.0.0
  2. DSP panic may be observed after the device reboots (not 100%)
  3. DSP panic is ~100% reproducible after resume-from-suspend with the command suspend_stress_test -c 1

Observation
To summarize the observation so far:

| Proposed fix | Attached conf/tplg | Core SPK | Core DMIC48K | Core DMIC16K/KWD | DSP panic? |
|---|---|---|---|---|---|
| The present tplg | sof-adl-max98357a-rt5682-waves-2way | 1 | 1 | 0 | Has DSP panic |
| tplg w/ removing Waves | sof-adl-max98357a-rt5682-2way | 1 | 1 | 0 | Has DSP panic |
| tplg running on one core | sof-adl-max98357a-rt5682-waves-2way-core0 | 0 | 0 | 0 | NO |
| tplg w/ removing dmic16k/KWD | sof-adl-max98357a-rt5682-waves-2way-nohotword | 1 | 1 | 0 | NO |

(tplg/conf files can be found in the attached zip-file)
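
The observation table can be read as a small truth table. The sketch below encodes the four rows and checks one candidate explanation against them: the panic shows up only when some pipeline runs on a non-zero core and the DMIC16K/KWD pipelines are present. The `has_kwd` field and the `suspect` rule are assumptions derived from the row descriptions, not something the firmware reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    tplg: str
    core_spk: int
    core_dmic48k: int
    core_kwd: int
    has_kwd: bool   # assumption: False only for the "-nohotword" topology
    panic: bool     # observed result from the table


CONFIGS = [
    Config("sof-adl-max98357a-rt5682-waves-2way", 1, 1, 0, True, True),
    Config("sof-adl-max98357a-rt5682-2way", 1, 1, 0, True, True),
    Config("sof-adl-max98357a-rt5682-waves-2way-core0", 0, 0, 0, True, False),
    Config("sof-adl-max98357a-rt5682-waves-2way-nohotword", 1, 1, 0, False, False),
]


def uses_secondary_core(c: Config) -> bool:
    """True if the speaker or DMIC48K pipeline is placed on a non-zero core."""
    return c.core_spk != 0 or c.core_dmic48k != 0


def suspect(c: Config) -> bool:
    """Hypothetical rule: multi-core placement combined with KWD pipelines."""
    return uses_secondary_core(c) and c.has_kwd


# The rule agrees with every observed row.
mismatches = [c.tplg for c in CONFIGS if suspect(c) != c.panic]
```

With only four data points this does not prove the rule; it only shows that neither multi-core placement nor the KWD pipelines alone explains the observed panics.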

Impact
Audio broken on Taniks devices

Log Analysis

The observed DSP panic information looks like the following:
[image: DSP panic information]

Although the error is reported on DMIC0 (DMIC48K), comparing the sof-logger output of the present tplg (left) with that of the one-core tplg (right) suggests that the error originates from suspicious behavior while re-loading the DMIC16K/KWD topology of the present tplg after resume-from-suspend. (Logs can be found in the attached zip file.)

The following sof-logger excerpts are taken from right after resume-from-suspend. Yellow arrows mark the points where loading of a new pipeline starts. In both logs the first pipeline loaded has pipe_id 12 (KWD pipeline). The logs on the right (one-core tplg) then show the creation of the selector (green arrow) and google-hotword-detect, both located on the KWD pipeline, followed by the loading of pipe_id 11 (DMIC16K pipeline).
However, the logs on the left (present tplg) show different behavior: they jump to Core#1 for edf_scheduler_init (red arrow) and then start loading pipe_id 10 (DMIC48K pipeline). Loading of the KWD/DMIC16K pipelines seems to be skipped, since I could not find it in the subsequent logs.
[image: sof-logger output after resume, present tplg (left) vs. one-core tplg (right)]
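
The comparison described above amounts to finding the first point where two event sequences diverge. A minimal sketch, assuming the event labels below are hand-written summaries of the description (they are not literal sof-logger lines, and the real logs contain many more entries):

```python
from itertools import zip_longest

# Summarized event order after resume, as described in the text (assumed labels).
PRESENT = [
    "load pipe_id 12",
    "edf_scheduler_init on core 1",
    "load pipe_id 10",
]
ONE_CORE = [
    "load pipe_id 12",
    "create selector",
    "create google-hotword-detect",
    "load pipe_id 11",
]


def first_divergence(a, b):
    """Return (index, a_event, b_event) at the first mismatch, or None."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i, x, y
    return None


def missing_events(reference, observed):
    """Events present in the reference sequence but absent from the observed one."""
    return [e for e in reference if e not in observed]


divergence = first_divergence(PRESENT, ONE_CORE)
skipped = missing_events(ONE_CORE, PRESENT)
```

Here `divergence` points at the step right after pipe_id 12 starts loading, and `skipped` lists the selector, google-hotword-detect, and pipe_id 11 steps that appear only in the one-core log.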

At the end of the logs from the present tplg we can find an error message from ipc_comp_connect, which leads to the DSP issue, like the example below (sink_id 59 is the selector located on the KWD pipeline, and source_id 60 is its source buffer).
[image: ipc_comp_connect error in the sof-logger output]
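
One possible reading of this error is that the connect request refers to components that were never re-created after resume. The toy model below illustrates that reading only; `ToyIpc`, `comp_new`, and `comp_connect` are hypothetical names and this is not SOF firmware code. Whether the firmware actually fails for this reason is not confirmed by the logs.

```python
class ToyIpc:
    """Hypothetical stand-in for a component registry (not SOF code)."""

    def __init__(self):
        self.components = set()

    def comp_new(self, comp_id: int) -> None:
        self.components.add(comp_id)

    def comp_connect(self, source_id: int, sink_id: int):
        # A connection needs both endpoints to exist already.
        missing = [i for i in (source_id, sink_id) if i not in self.components]
        if missing:
            raise LookupError(
                f"connect {source_id} -> {sink_id}: unknown component(s) {missing}"
            )
        return (source_id, sink_id)


# One-core case: buffer 60 and selector 59 are created, then connected.
healthy = ToyIpc()
healthy.comp_new(60)
healthy.comp_new(59)
link = healthy.comp_connect(60, 59)

# Present tplg, as hypothesized: the KWD pipeline components were skipped.
skipped = ToyIpc()
try:
    skipped.comp_connect(60, 59)
    error = None
except LookupError as exc:
    error = str(exc)
```

In the second case the connect fails because neither ID was registered, which would be consistent with the KWD pipeline loading being skipped earlier in the log.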

From my perspective, the missing logs for loading the KWD/DMIC16K pipelines might be a side effect of multi-core processing. However, the ipc_comp_connect error is not expected, which implies a defect in DSP recovery during suspend-resume. I have no idea why it is only observed in the sof-adl-max98357a-rt5682-2way cases (with and without Waves). Could it be hitting a corner case for AMP_SSP=2 or for the codec in TDM mode?

rpl-001-waves-dsp-panic-issue-taniks.zip

Metadata

Labels

ADL (Applies to Alder Lake platform); P2 (Critical bugs or normal features); bug (Something isn't working as expected); chrome (Chromebooks or ChromeOS); stale (Issue/PR marked as stale and will be closed after 14 days if there is no activity)
