Skip to content

Conversation

@ktrzcinx
Copy link
Member

Bug was caused because sdw controller is stopped first,
then DSP is stopped, so DW FIFO will never be consumed,
so timeout occurs, and watch dog will reset hardware.
Moreover polling for FIFO emty in duch a place should have positive
result only when pause take shorten than 1ms what is not
reasonable value.

Signed-off-by: Karol Trzcinski karolx.trzcinski@linux.intel.com

Bug was caused because sdw controller is stopped first,
then DSP is stopped, so DW FIFO will never be consumed,
so timeout occurs, and watch dog will reset hardware.
Moreover polling for FIFO empty in duch a place should have positive
result only when pause take shorten than 1ms what is not
reasonable value.

Signed-off-by: Karol Trzcinski <karolx.trzcinski@linux.intel.com>
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the fix be stopping DMA then SDW ? What do we do with DMIC and SSP ?

@slawblauciak
Copy link
Collaborator

Shouldn't the fix be stopping DMA then SDW ? What do we do with DMIC and SSP ?

@plbossart would that be a good solution to you in your eyes? I'm not entirely sure if we could guarantee correct flow if we were to do that.

@lgirdwood
Copy link
Member

@slawblauciak @ktrzcinx we need to have something consistent and not reorder the ops depending on DAI. e.g.

  1. Receive IPC STOP from host.
  2. Set DMA to STOP on next DMA period consumed IRQ
  3. Stop the DAI

Can you confirm the stop flow today.

@slawblauciak
Copy link
Collaborator

Thing is, the SDW controller is managed on the host side. So, this really has to be synched with kernel.

@lgirdwood
Copy link
Member

@slawblauciak which parts of SDW are controlled by host and which are FW ?

@slawblauciak
Copy link
Collaborator

The entirety of SDW is controlled by host. The FW only handles the DMA.

@lgirdwood
Copy link
Member

@slawblauciak I'm assuming DW DMA is used here ?

The programming partitioning flow with host would be

  1. User stops stream.
  2. Driver sends stop IPC to FW
  3. FW Receive IPC STOP from host.
  4. FW Sets DMA to STOP on next DMA period consumed IRQ
  5. FW IPC replies STOP complete.
  6. Driver stops the DAI.

@slawblauciak
Copy link
Collaborator

That's right, we use DW DMA for SDW/ALH.

@lgirdwood
Copy link
Member

@slawblauciak can you confirm the current flow like 1 - 6 above, it may mean we need to make driver and FW changes here.

@slawblauciak
Copy link
Collaborator

Yeah, I think that's the kind of flow we'd want here.

@plbossart
Copy link
Member

Sorry I don't understand this entire thread. We ALREADY stop the DMA before stopping the SoundWire transfers, and that creates a pop noise. thesofproject/linux#1897

@lgirdwood
Copy link
Member

@plbossart so does thesofproject/linux#1897 fix the pop issue for you ?
This will still pop if absolute value of the last sample is large.
The problem here for FW is that DMA FIFO cant be drained if DAI IO is disabled but I've no idea why @slawblauciak is seeing a panic. I would expect to just the the error messages.

@plbossart
Copy link
Member

@plbossart so does thesofproject/linux#1897 fix the pop issue for you ?
This will still pop if absolute value of the last sample is large.

No, the RT1308 will deal with this. What it can't deal with is a sustained output to -1 as is currently the case.

The problem here for FW is that DMA FIFO cant be drained if DAI IO is disabled but I've no idea why @slawblauciak is seeing a panic. I would expect to just the the error messages.

I have no idea how this error happens since it's different from what is being used today. Unless this is with the python scripts?

At any rate I started an Intel internal thread on the recommended programming sequence. let's not continue here until we know what direction to take.

@slawblauciak
Copy link
Collaborator

slawblauciak commented Apr 6, 2020

I haven't seen any DSP panics actually. Haven't tried to reproduce the problem so far.

@lgirdwood
Copy link
Member

I haven't seen any DSP panics actually. Haven't tried to reproduce the problem so far.

ok, then the PR title is misleading. Please align internally with @plbossart, I will close this for the moment, we can re-open once solution is agreed.

@lgirdwood lgirdwood closed this Apr 6, 2020
@slawblauciak
Copy link
Collaborator

Uhm, I believe there's a misunderstanding, this is not my PR :)

@paulstelian97
Copy link
Collaborator

@ktrzcinx You do still get the FW panics right? If so please reopen if this is the fix you want to go forward with.

@RanderWang
Copy link
Collaborator

the DSP panic is caused by FW for thesofproject/linux#1897

[563056340.885417] (563056320.000000) c0 WAIT                         src/lib/wait.c:39    ERROR ewt
[563056354.583333] (       13.697917) c0 DMA                    src/drivers/dw/dma.c:362   ERROR dw_dma_release() error: dma 0 channel 0 timeout
[563492634.635417] (   436280.062500) c0 WAIT                         src/lib/wait.c:39    ERROR ewt
[563492648.229167] (       13.593750) c0 DMA                    src/drivers/dw/dma.c:362   ERROR dw_dma_release() error: dma 0 channel 0 timeout
[563951513.281250] (   458865.062500) c0 WAIT                         src/lib/wait.c:39    ERROR ewt
[563951526.822917] (       13.541667) c0 DMA                    src/drivers/dw/dma.c:362   ERROR dw_dma_release() error: dma 0 channel 0 timeout
[563490940.260417] (             nan) c0 IPC                       src/ipc/handler.c:441   ipc: comp 0 -> trigger cmd 0x70000
[563490945.416667] (        5.156250) c0 PIPE         1.5       src/audio/pipeline.c:725   pipe trigger cmd 3
[563490953.072917] (        7.656250) c0 DMA                    src/drivers/dw/dma.c:343   dw_dma_release(): dma 0 channel 0 release
[563492634.635417] (     1681.562500) c0 WAIT                         src/lib/wait.c:39    ERROR ewt
[563492641.979167] (        7.343750) c0 DMA                    src/drivers/dw/dma.c:413   dw_dma_stop(): dma 0 channel 0 stop
[563492648.229167] (        6.250000) c0 DMA                    src/drivers/dw/dma.c:362   ERROR dw_dma_release() error: dma 0 channel 0 timeout
```

@lgirdwood
Copy link
Member

@RanderWang please provide more logs, I dont see a panic above. I only see that we cannot release a DMA channel. Please discuss internally with @plbossart

@tlauda
Copy link
Contributor

tlauda commented Apr 9, 2020

This code for sure causes agent panic. Such long wait shouldn't be done with interrupts disabled. Every timeout will kill DSP immediately. I don't understand why we're trying to close this.
Please also note that this problem is reported only with thesofproject/linux#1897, which changes start/stop sequence for ALH and not on master.

@lgirdwood
Copy link
Member

@tlauda it was closed until internal alignment.

@RanderWang
Copy link
Collaborator

RanderWang commented Apr 9, 2020

@tlauda it was closed until internal alignment.

@lgirdwood Pierre, @slawblauciak and me got a conclusion that we need to change the start/stop sequence for sdw. With the sequence of HDA was changed (just tested HDA, not changed. But HDA doesn't use GP-DMA), QA also reported DSP panic. All the kernel or FW logs are at https://sof-ci.01.org/linuxpr/PR1897/build3520/devicetest/CML_RVP_SDW/check-pause-resume-playback-10/

@lgirdwood
Copy link
Member

@RanderWang I've not seen any internal alignment. Please make sure I am on any email alongside @plbossart and @lbetlej .

@mengdonglin mengdonglin added the P1 Blocker bugs or important features label Apr 10, 2020
@RanderWang
Copy link
Collaborator

@RanderWang I've not seen any internal alignment. Please make sure I am on any email alongside @plbossart and @lbetlej .

@lgirdwood we discussed int microsoft team. please check the picture (edited)

1897

@lgirdwood
Copy link
Member

@RanderWang this make no sense, it's unreadable.

@plbossart
Copy link
Member

@RanderWang this make no sense, it's unreadable.

@lgirdwood the sequence used so far for SoundWire is broken and there's consensus to change it. thesofproject/linux#1897 was updated to use the same sequence for all DAIs, and keep the weird sequence for HDAudio - we still don't have a clue about the firmware underflows and panic issues for the HDaudio link DMA.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktrzcinx what happens here when the DMA channel FIFO has data when the stream is stopped ? How can the data be cleared and channel recovered ?

@ktrzcinx
Copy link
Member Author

@lgirdwood As I know FIFO should be cleared after disabling DMA enable bit, what is going to be done in dw_dma_stop() called from dw_dma_release()

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

@xiulipan I guess Jenkins is back now ? Do we need to restart CI ?

@xiulipan
Copy link
Contributor

@lgirdwood I think for this PR, we may need to restart the test.

@xiulipan
Copy link
Contributor

SOFCI TEST

@lgirdwood
Copy link
Member

CI known issues.

@lgirdwood lgirdwood merged commit ab0c9e0 into thesofproject:master Apr 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P1 Blocker bugs or important features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants