Skip to content

Conversation

@iuliana-prodan
Copy link
Contributor

@iuliana-prodan iuliana-prodan commented Feb 23, 2021

This pull request fixes #3809.

The first commit updates the correct order for DAI and DMA start/stop.
Before this update, on multiple start/stop runs for playback or record or combined,
we get an underrun/overflow.
That's because the DAI makes a DMA request, before the DMA channel is
enabled.

The second commit updates the start/stop operations for SAI.
Please see a more detailed explanation in the commit message.

After the first 3 commits, after multiple playback/capture runs (~700, ~1000)
everything seems ok.
But, sometimes, when running a second/third time, the same 700 loop I get an "ipc timed out for 0x60050000 size 12".
Here's a log:

Playing raw data '16/j16.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Playing raw data '16/j16.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Recording WAVE 'test_record.pcm' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
[  582.814139] sof-audio-of 556e8000.dsp: error: ipc timed out for 0x60050000 size 12
[  582.821802] sof-audio-of 556e8000.dsp: info: preventing DSP entering D3 state to preserve context
[  582.830906] sof-audio-of 556e8000.dsp: invalid header size 0x2000200. FW oops is bogus
[  582.838989] sof-audio-of 556e8000.dsp: error: unexpected fault 0x00000000 trace 0x00000000
[  582.847530] sof-audio-of 556e8000.dsp: error: waking up any trace sleepers
[  582.854601] sof-audio-of 556e8000.dsp: error: trace IO error
[  582.861198] sof-audio-of 556e8000.dsp: ASoC: error at snd_soc_pcm_component_trigger on 556e8000.dsp: -110
[  582.871466]  Port0: ASoC: trigger FE cmd: 0 failed: -110
Playing raw data '16/j16.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Recording WAVE 'test_record.pcm' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
aplay: pcm_write:2061: write error: Input/output error
(null)      1  TFAIL  :  ltpapicmd.c:188: Test 420 Playback failed
Recording WAVE 'test_record.pcm' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Recording WAVE 'test_record.pcm' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo

By looking in the trace I saw that the stop cmd is not triggered in DSP
and in SAI we get two start operations one after another.
Therefore, in the 3rd commit, we check in SAI, on start, if a start was already
triggered, disable just the DMA request and the data channel.

@aiChaoSONG
Copy link
Collaborator

SOFCI TEST

@keyonjie
Copy link
Contributor

Looks this fails on some BYT platforms according to CI?

@aiChaoSONG
Copy link
Collaborator

aiChaoSONG commented Feb 24, 2021

Edit: Wrong comment, please ignore.
The failures on BYT are all related to pipeline PCM: PCM Deep Buffer [hw:0,1]

@iuliana-prodan
Copy link
Contributor Author

iuliana-prodan commented Feb 24, 2021

@aiChaoSONG @keyonjie @lgirdwood @ranj063 @plbossart

The only thing that might impact the CI tests it's the order from dai (commit 1).
I saw this commit 5c8c804 ("dai: change starting sequence"), but it says "The purpose of this change is to allow slave interfaces to prepare their FIFOs before DMA starts transferring to them.".
I believe there are multiple Intel's topologies that uses slave interfaces, not just sof-byt-nocodec.tplg (this is from the failed test, but I cannot find it in sof repo).

Therefore, can somebody explain why, in the first place, the order for DAI and DMA was changed?
Where is this sof-byt-nocodec.tplg? Is it just for testing?

Thanks!

@iuliana-prodan
Copy link
Contributor Author

iuliana-prodan commented Feb 24, 2021

@aiChaoSONG @keyonjie @lgirdwood @ranj063 @plbossart

I looked further into the error and the problems seems to be on dw_dma_stop, which expects for the fifo to be cleared.
Most probably this is not drained anymore because the peripheral was stopped before the DMA.
Why do you need to clear the fifo? We just stopped the operation, can we just reset the fifo?
Or, isn’t this fifo emptied by default? Sorry, I don’t know the hardware.

In the imx case, this is not applicable.

If there is no other way to empty the fifo, maybe we can add the “#if CONFIG_DMA_SUSPEND_DRAIN” in dai.c and, for this case, we keep the old order for DAI and DMA.

@lgirdwood
Copy link
Member

@aiChaoSONG @keyonjie @lgirdwood @ranj063 @plbossart

I looked further into the error and the problems seems to be on dw_dma_stop, which expects for the fifo to be cleared.
Most probably this is not drained anymore because the peripheral was stopped before the DMA.
Why do you need to clear the fifo?

The DW DMA needs the FIFO clear so it's in a empty state prior to next usage. i.e. no stale data is sent out.

We just stopped the operation, can we just reset the fifo?

Or, isn’t this fifo emptied by default? Sorry, I don’t know the hardware.

In the imx case, this is not applicable.

ok, we need to differentiate then.

If there is no other way to empty the fifo, maybe we can add the “#if CONFIG_DMA_SUSPEND_DRAIN” in dai.c and, for this case, we keep the old order for DAI and DMA.

The best place for this is in the DMA driver and not any core logic. Btw, I dont see any reference to draining in dai.c, can you tell me where you would put the ifdef ?

@iuliana-prodan
Copy link
Contributor Author

If there is no other way to empty the fifo, maybe we can add the “#if CONFIG_DMA_SUSPEND_DRAIN” in dai.c and, for this case, we keep the old order for DAI and DMA.

The best place for this is in the DMA driver and not any core logic. Btw, I dont see any reference to draining in dai.c, can you tell me where you would put the ifdef ?

@lgirdwood There is already a CONFIG_DMA_SUSPEND_DRAIN in dw/dma.c.
I was thinking to add the same config (CONFIG_DMA_SUSPEND_DRAIN) in dai.c, in start and stop operations.
If the CONFIG_DMA_SUSPEND_DRAIN is set, we keep the order as before this change, otherwise we use the new order (added by commit 1 from this pull request).

@dbaluta
Copy link
Collaborator

dbaluta commented Feb 26, 2021

@lgirdwood Linux kernel ALSA order of DAI/DMA start is as @iuliana-prodan fixed in her patch. Any idea, why the ALSA with a non-dsp setup works but the DSP has problems?

@lgirdwood
Copy link
Member

@lgirdwood Linux kernel ALSA order of DAI/DMA start is as @iuliana-prodan fixed in her patch. Any idea, why the ALSA with a non-dsp setup works but the DSP has problems?

Any update, could it be Pulseaudio searching for working configs on each PCM ?

@lgirdwood
Copy link
Member

If there is no other way to empty the fifo, maybe we can add the “#if CONFIG_DMA_SUSPEND_DRAIN” in dai.c and, for this case, we keep the old order for DAI and DMA.

The best place for this is in the DMA driver and not any core logic. Btw, I dont see any reference to draining in dai.c, can you tell me where you would put the ifdef ?

@iuliana-prodan So looking at the code again this is high level ordering and IIRC we also have a similar configuration in ASoC.

I would agree now and say here that we cant easily do in dma driver, it's a higher level framework CONFIG_ Kconfig setting. i.e. IMX should enable this in the Kconfig, I'm gathering patches for v1.7-final so we should add this fix.

@iuliana-prodan iuliana-prodan requested a review from dbaluta March 4, 2021 11:27
To stop/suspend an active DMA channel:
1. Stop the DMA service request at the peripheral first (stop the DAI);
2. Disable the hardware service request on the appropriate DMA channel.

For start/resume:
1. Enable the DMA service request on the appropriate channel;
2. Enable the DMA service request at the peripheral (enable DAI).

When the start/stop order for DMA and DAI is different, on multiple
start/stop runs for playback or record or combined, we get an
underrun/overflow.
That's because the DAI makes a DMA request, before the DMA channel is
enabled.

Some platforms cannot just simple disable DMA channel during
the transfer, because it will hang the whole DMA controller.
Therefore, for DMA_SUSPEND_DRAIN, stop the DMA first
and let the DAI drain the FIFO in order to stop the channel
as soon as possible.

Fixes: thesofproject#3809

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Set SAI watermark only once, on sai_set_config().
There is no need to set it each time, on start().

SAI watermark is kept on half FIFO size.

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
For SAI, we have the synchronous mode enabled: the transmitter is
configured for asynchronous operation and the receiver for
synchronous operation.
In this case, transmitter bit clock and frame sync are used by both
the transmitter and receiver. So, when enabling RX we need to enable TX
(if not already enabled).

Therefore, for a clear start, we first do a software reset for the current
direction, but checking the state of RX and TX.
This will reset the internal transmitter/receiver logic including the FIFO
pointers.

For capture we can disable the receiver data channel, but on playback,
we can disable the transmitter only if the RX has the DMA requests
disabled.
Also, for capture, there's no need to enable the transmit data channel.
It's sufficient to enable only the transmitter, which enables the bit
clock (shared with RX).

On stop, we just need to disable the DMA request, the transmit/receive data
channel, the interrupts and the receiver and/or the transmitter.

Fixes: thesofproject#3809

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
On start W1C the Work Start Flag, Sync Error Flag and
FIFO Error Flag.

Write a logic 1 to this field to clear each of this
flags and have a clean start for SAI.

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
@iuliana-prodan
Copy link
Contributor Author

@dbaluta I've added a new version.
As per our discussion, I've dropped the 4th commit from my previous version (the extra checks for ipc timed out issue).
This needs more debugging and it will be handled in a separate pull request.

I've also separated the "sai start/stop operations" in 3 patches since some changes are not linked.

@lgirdwood
Copy link
Member

@zrombel this PR should have 0 impact on Intel systems but I do see it failing all codec PR tests ? Is this a false positive ?

@iuliana-prodan
Copy link
Contributor Author

@lgirdwood @dbaluta
"All checks have passed" - I think is good to be merged :)

@dbaluta dbaluta merged commit 5f93ad4 into thesofproject:master Mar 15, 2021
@dbaluta
Copy link
Collaborator

dbaluta commented Mar 15, 2021

Done. Iulia now we should backport these changes on 1.5 and 1.6 :D

@iuliana-prodan
Copy link
Contributor Author

Done. Iulia now we should backport these changes on 1.5 and 1.6 :D

Yes, I will backport these patches.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG IMX] Playback and capture occasionally fails

6 participants