Skip to content

Conversation

@marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Feb 24, 2022

This reverts commit 58ce6e6.

This commit has caused the DMA trace to be frequently empty on a number
of ADL and TGL in daily runs for two days in a row (test runs 10429 and
10459).

The DMA trace could be completely empty in the past but that was on
different platforms and a very long time ago, this had been fixed in
bug #4333 and had not happened for many months.

The only commit difference between the last successful test
run (10402) and the first failing one (10459) is this commit.

Signed-off-by: Marc Herbert marc.herbert@intel.com

This reverts commit 58ce6e6.

This commit has caused the DMA trace to be frequently empty on a number
of ADL and TGL in daily runs for two days in a row (test runs 10429 and
10459).

The DMA trace could be completely empty in the past but that was on
different platforms and a very long time ago, this had been fixed in
bug thesofproject#4333 and had not happened for many months.

The _only_ commit difference between the last successful test
run (10402) and the first failing one (10459) is this commit.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb marc-hb force-pushed the revert-trace-preserve branch from 369fd10 to dd9d8f9 Compare February 24, 2022 01:05
@marc-hb marc-hb marked this pull request as ready for review February 24, 2022 01:08
@marc-hb marc-hb requested a review from akloniex as a code owner February 24, 2022 01:08
@marc-hb
Copy link
Collaborator Author

marc-hb commented Feb 24, 2022

@ujfalusi
Copy link
Contributor

This commit has caused the DMA trace to be frequently empty on a number of ADL and TGL in daily runs for two days in a row (test runs 10429 and 10459).

I have randomly clicked on results in them and by luck I only saw test reports where there is slogger data.
However I have found this for example:
10429: https://sof-ci.sh.intel.com/#/result/planresultdetail/10429?model=ADLP_RVP_NOCODEC_CI&testcase=check-runtime-pm-status-15
10459: https://sof-ci.sh.intel.com/#/result/planresultdetail/10459?model=ADLP_RVP_NOCODEC_CI&testcase=check-runtime-pm-status-15

10429 is PASS, 10459 FAILED. and 10459 does not have slogger. I have looked back to try to see what happened in the past when the same test on the same device failed, but I yet to find any.
Suspend-resume tests are fine (even if they FAIL), by the look of it this looks like runtime_pm suspend-resume related?

The DMA trace could be completely empty in the past but that was on different platforms and a very long time ago, this had been fixed in bug #4333 and had not happened for many months.

If the issue has not been seen for many months then what #5329 fixed? It was merged two weeks ago.

The only commit difference between the last successful test run (10402) and the first failing one (10459) is this commit.

It is hard to argue against test results...
I'm genuinely surprised that commit 58ce6e6 have this side effect.
We don't need the dtrace client conversion in sof-dev (I have it again running) and the patch was to fix the issue when the dma-trace module removed and inserted back while the DSP is kept alive with audio playback for example.
Without that patch when the dtrace is re-enabled and I forced some dtrace activity via gain change I was seeing the DMA trace data from the start of the internal buffer coming in again instead of the fresh prints I was expecting.
The DMA got released and re-allocated, configured when the dtrace is removed and then re-insterted. We kept placing the dtrace logs wherever we left when the dtrace was disabled, but the DMA starts to copy from the start of the buffer.

Now, in sof-dev we can not do this. The dtrace is disabled when before the DSP powers off (we send the IPC) and the only explanation for this is that time to time for some reason the DSP does not power off, retains the local buffer and we overwrite the old banner in there -> sof-logger will not find it anymore.

ADLP_RVP_NOCODEC:check-sof-logger.sh is pretty consistent on not having dtrace data and this is what dmesg has to say:

[  118.935161] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x90030000: GLB_TRACE_MSG: DMA_PARAMS_EXT
[  118.935723] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx succeeded: 0x90030000: GLB_TRACE_MSG: DMA_PARAMS_EXT
[  118.935762] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x14001e successful
[  118.935802] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x40020000: GLB_PM_MSG: CTX_RESTORE
[  118.936050] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx succeeded: 0x40020000: GLB_PM_MSG: CTX_RESTORE
[  118.936077] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[  118.936130] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx done: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[  118.946245] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
[  118.946268] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x90050000: GLB_TRACE_MSG: DMA_FREE
[  118.946455] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx succeeded: 0x90050000: GLB_TRACE_MSG: DMA_FREE
[  118.946489] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x40010000: GLB_PM_MSG: CTX_SAVE
[  118.946734] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx succeeded: 0x40010000: GLB_PM_MSG: CTX_SAVE
[  118.946869] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x9010f0f successful
[  118.946883] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
[  118.946889] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask 1
...
[  120.072015] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x90030000: GLB_TRACE_MSG: DMA_PARAMS_EXT
[  120.072642] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx succeeded: 0x90030000: GLB_TRACE_MSG: DMA_PARAMS_EXT
[  120.073148] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[  120.073207] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx done: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[  120.073229] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x2014001e successful
[  120.073591] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: created machine sof-nocodec
[  120.074041] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Completing SOF PCI probe

The read/write pointer got reset when we sent DMA_FREE and the dtrace got re-enabled with DMA_PARAMS_EXT.

I really like to keep my commit, let's see if #5416 works, but if not then I bow before the proof and I need to revisit the issue again to unblock the dtrace client support.

@gkbldcig
Copy link
Collaborator

Can one of the admins verify this patch?

@lgirdwood lgirdwood merged commit e3e038d into thesofproject:main Mar 1, 2022
@ujfalusi
Copy link
Contributor

ujfalusi commented Mar 2, 2022

@marc-hb, just to point out that the dtrace is still empty randomly with this revert in.
For example: http://sof-ci.sh.intel.com/#/result/planresultdetail/10633?model=ADLP_RVP_NOCODEC&testcase=check-sof-logger

Based on my analysis in #5416, that reverted patch had 0 effect on the probability of missing dtrace log.

@marc-hb marc-hb deleted the revert-trace-preserve branch March 2, 2022 22:38
@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 2, 2022

Indeed, it's still empty in https://sof-ci.01.org/softestpr/PR873/build1001/devicetest/?model=ADLP_RVP_NOCODEC&testcase=check-sof-logger

Based on my analysis in #5416, that reverted patch had 0 effect on the probability of missing dtrace log.

Sorry about that. I gave the git evidence above showing that this commit was the only change when the probability suddenly jumped from 0% to 10% but it seems we live in a "post-truth" world... :-(

I haven't had time to look at your analysis, sorry again.

@ujfalusi
Copy link
Contributor

ujfalusi commented Mar 3, 2022

Sorry about that. I gave the git evidence above showing that this commit was the only change when the probability suddenly jumped from 0% to 10% but it seems we live in a "post-truth" world... :-(

I would have came to the same conclusion, likely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants