Skip to content

Conversation

@plbossart
Copy link
Member

This PR is the companion of kernel PR thesofproject/linux#2965

This builds on previous PRs from @bardliao (PR ##4219) and @brentlu (PR #4262)

We also build on previous work from @juimonen for the multi-ssp config where we send a DAI_CONFIG IPC in the hw_params and hw_free stages. Currently we send the same information in the two cases, so the firmware cannot take any action wrt. early clock enablement. Adding a flag is enough to provide the firmware with the information missing.

This is tagged as an ABI 3.18 change.

Results with a 50ms delay added in the kernel shows the bclock starting early and stopping late. The FSYNC starts with the data but that's explained (see commit messages).

start
stop1

@plbossart plbossart added ABI ABI change involved APL Applies to Apollolake platform CML Applies to Comet Lake platform CNL Applies to Cannonlake platform EHL Applies to Elkhart Lake GLK Applies to Gemini Lake platform ICL Applies to IceLake platform JSL Applies to Jasper Lake platform TGL Applies to Tiger Lake TGL-H TGL-H platform WHL Applies to WhiskeyLake platform ADL Applies to Alder Lake platform CFL Applies to Coffee Lake platform labels Jun 3, 2021
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious change

plbossart and others added 2 commits June 15, 2021 16:46
set clks_control to (SOF_DAI_INTEL_SSP_CLKCTRL_MCLK_ES |
SOF_DAI_INTEL_SSP_CLKCTRL_BCLK_ES) to enable MCLK/BCLK early start
feature.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Enable the bclk clock control for SSP2.

Note that this impacts existing GLK-based chromebooks as well as newer
hardware.

Signed-off-by: Brent Lu <brent.lu@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
@plbossart
Copy link
Member Author

plbossart commented Jun 15, 2021

@bardliao @lgirdwood @brentlu this was indeed a backwards-compatibility issue. We only want to use the _ES bits when the kernel makes use of the HW_PARAMS and HW_FREE stages, so I introduced an indirection with a new set of flags. See the difference here compared to the last version:
ssp-req.diff.txt

I tested with an 'old' kernel and a 'new' one, in the first case the bclk/fsync start at the usual 4.2ms earlier than data (pipeline delay), and in the second case the bclk starts earlier.

@plbossart plbossart force-pushed the ssp/dai-config-status branch from 0031285 to 7199c68 Compare June 15, 2021 21:58
@bardliao
Copy link
Collaborator

@bardliao @lgirdwood @brentlu this was indeed a backwards-compatibility issue. We only want to use the _ES bits when the kernel makes use of the HW_PARAMS and HW_FREE stages, so I introduced an indirection with a new set of flags. See the difference here compared to the last version:
ssp-req.diff.txt

I tested with an 'old' kernel and a 'new' one, in the first case the bclk/fsync start at the usual 4.2ms earlier than data (pipeline delay), and in the second case the bclk starts earlier.

Good finding and good fix

@keyonjie
Copy link
Contributor

Sorry I am not quite confident about this, the merging of the Linux PR is a one of my suspects which introduces Xrun regression on TGLU_RVP_NOCODEC daily test.

@lgirdwood
Copy link
Member

@keyonjie please keep us updated. I dont think this side should cause an xrun on one platform only.

@plbossart
Copy link
Member Author

Sorry I am not quite confident about this, the merging of the Linux PR is a one of my suspects which introduces Xrun regression on TGLU_RVP_NOCODEC daily test.

what xrun and what issue @keyonjie?

The tests show an IPC error

[ 2470.317120] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60050000: GLB_STREAM_MSG: TRIG_STOP
[ 2470.820686] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: ipc timed out for 0x60050000 size 12
[ 2470.820701] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: info: preventing DSP entering D3 state to preserve context
[ 2470.820709] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: status: fw entered - code 00000005
[ 2470.820824] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: unexpected fault 0x00000000 trace 0x00004000
[ 2470.820841] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: hda irq intsts 0x00000000 intlctl 0xc0000081 rirb 00
[ 2470.820847] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 2470.820856] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 2470.820861] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: waking up any trace sleepers
[ 2470.820881] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x140000 successful
[ 2470.820886] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: trace IO error
[ 2470.820893] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 2470.820918] kernel:  Port1: ASoC: trigger FE cmd: 0 failed: -110

@plbossart
Copy link
Member Author

@keyonjie I've run tests with the latest kernel with and without this PR, I don't see any xrun on APL_NOCODEC (same results as CI).
Bugs are always possible but this is common code used on many platforms, and only TGL_NOCODEC has an IPC error.

I will note btw that your TGL_NOCODEC topology does not use dynamic pipelines but this is configured for EHL:

"sof-tgl-nocodec\;sof-tgl-nocodec"
"sof-tgl-nocodec\;sof-ehl-nocodec\;-DDYNAMIC=1"

Something clearly misaligned. @ranj063 was this intentional?

@ranj063
Copy link
Collaborator

ranj063 commented Jun 16, 2021

@keyonjie I've run tests with the latest kernel with and without this PR, I don't see any xrun on APL_NOCODEC (same results as CI).
Bugs are always possible but this is common code used on many platforms, and only TGL_NOCODEC has an IPC error.

I will note btw that your TGL_NOCODEC topology does not use dynamic pipelines but this is configured for EHL:

"sof-tgl-nocodec\;sof-tgl-nocodec"
"sof-tgl-nocodec\;sof-ehl-nocodec\;-DDYNAMIC=1"

Something clearly misaligned. @ranj063 was this intentional?

absolutely not. I seem to have missed the sof-tgl-nocodec case. But I've added tihs in my other PR to enable more dynamic pipeline topologies

@plbossart
Copy link
Member Author

I seem to have missed the sof-tgl-nocodec case. But I've added tihs in my other PR to enable more dynamic pipeline topologies

Ack @ranj063. there's also the TGL_NOCODEC_CI topology in development/CMakefiles.txt

@keyonjie
Copy link
Contributor

@lgirdwood @plbossart if you check the FW etrace, you will find xrun reported and that's why the IPC error happen in dmesg.

@lgirdwood
Copy link
Member

@lgirdwood @plbossart if you check the FW etrace, you will find xrun reported and that's why the IPC error happen in dmesg.

Can you paste the link ? Not sure which test ?

@lgirdwood
Copy link
Member

@keyonjie any updates here ?

@keyonjie
Copy link
Contributor

let me start a daily-test coverage level test plan with this today, and come back with the result.

@keyonjie
Copy link
Contributor

@plbossart @lgirdwood @bardliao @brentlu there are more than 10 TGLU_RVP_NOCODEC cases failed with this PR applied, according to the CI plan test result 4783.

@bardliao
Copy link
Collaborator

@plbossart @lgirdwood @bardliao @brentlu there are more than 10 TGLU_RVP_NOCODEC cases failed with this PR applied, according to the CI plan test result 4783.

@keyonjie Do you know what 488520, 487496 , and 488008 registers are?

[     2098505.958280] (     2098506.000000) c1 wait                         src/lib/wait.c:46   ERROR poll timeout reg 488520 mask 63 val 0 us 937
[     4192460.666740] (     2093954.750000) c0 wait                         src/lib/wait.c:46   ERROR poll timeout reg 487496 mask 63 val 0 us 937
[     6276479.542262] (     2084018.875000) c3 wait                         src/lib/wait.c:46   ERROR poll timeout reg 488008 mask 63 val 0 us 937

@keyonjie
Copy link
Contributor

@plbossart @lgirdwood @bardliao @brentlu there are more than 10 TGLU_RVP_NOCODEC cases failed with this PR applied, according to the CI plan test result 4783.

@keyonjie Do you know what 488520, 487496 , and 488008 registers are?

[     2098505.958280] (     2098506.000000) c1 wait                         src/lib/wait.c:46   ERROR poll timeout reg 488520 mask 63 val 0 us 937
[     4192460.666740] (     2093954.750000) c0 wait                         src/lib/wait.c:46   ERROR poll timeout reg 487496 mask 63 val 0 us 937
[     6276479.542262] (     2084018.875000) c3 wait                         src/lib/wait.c:46   ERROR poll timeout reg 488008 mask 63 val 0 us 937

These are SSP TFL/TNF registers, but these errors happen on each ssp_stop() both with and without the changes here.

I think the problem is that in some cases SSP dai DMA will hit xrun more easily with the changes here applied.

@brentlu
Copy link
Contributor

brentlu commented Jun 22, 2021

the only timing change is the set config command moved from hw_params() to prepare()

BTW, does the TGLU_RVP_NOCODEC means there is no codec connected to the SSP port?

@bardliao
Copy link
Collaborator

the only timing change is the set config command moved from hw_params() to prepare()

But the change is on kernel side and it is already merged. The only difference I can tell is that we enabled MCLK and BCLK early start feature in this PR. But I have no idea how will it cause XRUN.

BTW, does the TGLU_RVP_NOCODEC means there is no codec connected to the SSP port?

Correct.

@brentlu
Copy link
Contributor

brentlu commented Jun 22, 2021

the only timing change is the set config command moved from hw_params() to prepare()

But the change is on kernel side and it is already merged. The only difference I can tell is that we enabled MCLK and BCLK early start feature in this PR. But I have no idea how will it cause XRUN.

So this PR is included in the test plan and cause XRUN?

@bardliao
Copy link
Collaborator

These are SSP TFL/TNF registers, but these errors happen on each ssp_stop() both with and without the changes here.

But I don't see those error in the PASS cases.

I think the problem is that in some cases SSP dai DMA will hit xrun more easily with the changes here applied.

@plbossart
Copy link
Member Author

plbossart commented Jun 22, 2021

@plbossart @lgirdwood @bardliao @brentlu there are more than 10 TGLU_RVP_NOCODEC cases failed with this PR applied, according to the CI plan test result 4783.

@keyonjie I am afraid I will have to give up. The results show failures on SoundWire platforms as well, so I just can't trust the results of those tests. The results on TGL nocodec make no sense either, there's no explanation for xruns when the only change is to start the SSP earlier.

And now there are conflicts?

I've spent enough time on this PR, someone will have to take over.

@bardliao
Copy link
Collaborator

I reserved a TGLU_RVP_NOCODEC device from CI pool and manually ran the check-playback-100times.sh test and no issue found. I just created a new PR to fix the conflict and will trigger a new CI test with that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ABI ABI change involved ADL Applies to Alder Lake platform APL Applies to Apollolake platform CFL Applies to Coffee Lake platform CML Applies to Comet Lake platform CNL Applies to Cannonlake platform EHL Applies to Elkhart Lake enhancement New feature or request GLK Applies to Gemini Lake platform ICL Applies to IceLake platform JSL Applies to Jasper Lake platform TGL Applies to Tiger Lake TGL-H TGL-H platform WHL Applies to WhiskeyLake platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants