Skip to content

Conversation

@ujfalusi
Copy link
Collaborator

@ujfalusi ujfalusi commented Dec 9, 2024

The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this likely need to be changed to:

		if (cmd == SNDRV_PCM_TRIGGER_STOP &&
		    sdev->pdata->ipc_type == SOF_IPC_TYPE_4) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi why not just remove the call for cleanup then? What is the idea behind cleaning up the DMA and reclaiming it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - should we not just give up the link DMA ID in close() ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see the problem. Maybe we only need to clear the stream ID for stop without giving up the link DMA?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, we need to reset the linkDMA, that is the only way to reset it's pointers and this is needed to fix the random metallic noise issue.
The firmware for IPC4 expects the linkDMA to be reset but not released, the same channel must remain to be used if we have a start after the stop.
The firmware keeps it's side of the DMA channel on stop, we need to do the same on the host side.

Copy link
Collaborator

@ranj063 ranj063 Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi do you think this will work?

diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index f5f2f832d117..e14647e8fd16 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -320,10 +320,22 @@ static int __maybe_unused hda_dai_trigger(struct snd_pcm_substream *substream, i
 
        switch (cmd) {
        case SNDRV_PCM_TRIGGER_STOP:
-               ret = hda_link_dma_cleanup(substream, hext_stream, dai);
-               if (ret < 0) {
-                       dev_err(sdev->dev, "%s: failed to clean up link DMA\n", __func__);
-                       return ret;
+               /* Clear the stream ID in the case of playback without giving up the link DMA ID */
+               if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
+                       struct hdac_ext_stream *hext_stream;
+                       struct hdac_ext_link *hlink;
+                       int stream_tag;
+
+                       hlink = ops->get_hlink(sdev, substream);
+                       if (!hlink)
+                               return -EINVAL;
+
+                       hext_stream = ops->get_hext_stream(sdev, dai, substream);
+                       if (!hext_stream)
+                               return -EINVAL;
+
+                       stream_tag = hdac_stream(hext_stream)->stream_tag;
+                       snd_hdac_ext_bus_link_clear_stream_id(hlink, stream_tag);
                }
                break;
        case SNDRV_PCM_TRIGGER_SUSPEND:
@@ -359,7 +371,11 @@ static int hda_dai_prepare(struct snd_pcm_substream *substream, struct snd_soc_d
        struct snd_soc_dapm_widget *w = snd_soc_dai_get_widget(dai, substream->stream);
        struct snd_sof_widget *swidget = w->dobj.private;
        struct snd_sof_dev *sdev = widget_to_sdev(w);
+       struct hdac_ext_stream *hext_stream;
+       struct hdac_stream *hstream;
+       struct hdac_ext_link *hlink;
        int stream = substream->stream;
+       int stream_tag;
        int ret;
 
        if (!ops) {
@@ -378,6 +394,17 @@ static int hda_dai_prepare(struct snd_pcm_substream *substream, struct snd_soc_d
        if (ret < 0)
                return ret;
 
+       hlink = ops->get_hlink(sdev, substream);
+       if (!hlink)
+               return -EINVAL;
+
+       hext_stream = ops->get_hext_stream(sdev, dai, substream);
+
+       hstream = &hext_stream->hstream;
+       stream_tag = hstream->stream_tag;
+       if (hext_stream->hstream.direction == SNDRV_PCM_STREAM_PLAYBACK)
+               snd_hdac_ext_bus_link_set_stream_id(hlink, stream_tag);
+
        if (ops && ops->set_up_be_pipeline)
                return ops->set_up_be_pipeline(w, substream->stream);
 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I tested this and it works for me on my HDA TGL machine

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, this does not apply and it might re-introduce the metallic noise as we need to reset the link DMA, this patch will not do that, right?

What about this one:

diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index 2a6f821c9e14..1a6f13feef4c 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -103,8 +103,10 @@ hda_dai_get_ops(struct snd_pcm_substream *substream, struct snd_soc_dai *cpu_dai
 	return sdai->platform_private;
 }
 
-int hda_link_dma_cleanup(struct snd_pcm_substream *substream, struct hdac_ext_stream *hext_stream,
-			 struct snd_soc_dai *cpu_dai)
+static int
+hda_link_dma_cleanup(struct snd_pcm_substream *substream,
+		     struct hdac_ext_stream *hext_stream,
+		     struct snd_soc_dai *cpu_dai, bool release)
 {
 	const struct hda_dai_widget_dma_ops *ops = hda_dai_get_ops(substream, cpu_dai);
 	struct sof_intel_hda_stream *hda_stream;
@@ -128,6 +130,12 @@ int hda_link_dma_cleanup(struct snd_pcm_substream *substream, struct hdac_ext_st
 		snd_hdac_ext_bus_link_clear_stream_id(hlink, stream_tag);
 	}
 
+	if (!release) {
+		/* force stream reconfiguration, including LinkDMA reset */
+		hext_stream->link_prepared = 0;
+		return 0;
+	}
+
 	if (ops->release_hext_stream)
 		ops->release_hext_stream(sdev, cpu_dai, substream);
 
@@ -211,7 +219,7 @@ static int __maybe_unused hda_dai_hw_free(struct snd_pcm_substream *substream,
 	if (!hext_stream)
 		return 0;
 
-	return hda_link_dma_cleanup(substream, hext_stream, cpu_dai);
+	return hda_link_dma_cleanup(substream, hext_stream, cpu_dai, true);
 }
 
 static int __maybe_unused hda_dai_hw_params_data(struct snd_pcm_substream *substream,
@@ -304,7 +312,8 @@ static int __maybe_unused hda_dai_trigger(struct snd_pcm_substream *substream, i
 	switch (cmd) {
 	case SNDRV_PCM_TRIGGER_STOP:
 	case SNDRV_PCM_TRIGGER_SUSPEND:
-		ret = hda_link_dma_cleanup(substream, hext_stream, dai);
+		ret = hda_link_dma_cleanup(substream, hext_stream, dai,
+					   cmd == SNDRV_PCM_TRIGGER_STOP ? false : true);
 		if (ret < 0) {
 			dev_err(sdev->dev, "%s: failed to clean up link DMA\n", __func__);
 			return ret;
@@ -673,7 +682,7 @@ static int hda_dai_suspend(struct hdac_bus *bus)
 
 			ret = hda_link_dma_cleanup(hext_stream->link_substream,
 						   hext_stream,
-						   cpu_dai);
+						   cpu_dai, true);
 			if (ret < 0)
 				return ret;
 		}
diff --git a/sound/soc/sof/intel/hda.h b/sound/soc/sof/intel/hda.h
index 22bd9c3c8216..ee4ccc1a5490 100644
--- a/sound/soc/sof/intel/hda.h
+++ b/sound/soc/sof/intel/hda.h
@@ -1038,8 +1038,6 @@ const struct hda_dai_widget_dma_ops *
 hda_select_dai_widget_ops(struct snd_sof_dev *sdev, struct snd_sof_widget *swidget);
 int hda_dai_config(struct snd_soc_dapm_widget *w, unsigned int flags,
 		   struct snd_sof_dai_config_data *data);
-int hda_link_dma_cleanup(struct snd_pcm_substream *substream, struct hdac_ext_stream *hext_stream,
-			 struct snd_soc_dai *cpu_dai);
 
 static inline struct snd_sof_dev *widget_to_sdev(struct snd_soc_dapm_widget *w)
 {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on HDA (IPC3/4) and LNL SDW (IPC4) with playback and capture and suspend

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @ujfalusi my patch didn't apply as it was on top of my pr. But I love this version. Thanks!

bardliao
bardliao previously approved these changes Dec 9, 2024
@ujfalusi ujfalusi force-pushed the peter/sof/pr/hda-dai-fix-linkdma-bleed-01 branch from 3658d2f to 0c490ec Compare December 10, 2024 11:15
@ujfalusi
Copy link
Collaborator Author

Changes since v1:

  • only reclaim the LinkDMA channel on stop.

@ujfalusi ujfalusi changed the title ASoC: SOF: Intel: hda-dai: Reclaim the link DMA on STOP/SUSPEND with … ASoC: SOF: Intel: hda-dai: Reclaim the link DMA on STOP with … Dec 10, 2024
@ujfalusi
Copy link
Collaborator Author

The -fanalyzer gets it completely wrong: hda_link_dma_setup() is only called from two function and the call location is after a check for ops == NULL, in hda_link_dma_setup() ops cannot be NULL as the caller functions would have failed earlier.

Comment on lines 329 to 344
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth commenting the link DMA flow in a bit more detail since this is complex (if not already detailed in code).

@ujfalusi ujfalusi force-pushed the peter/sof/pr/hda-dai-fix-linkdma-bleed-01 branch from 0c490ec to a940ab9 Compare December 10, 2024 13:12
@ujfalusi
Copy link
Collaborator Author

Changes since v2:

  • Update comment
  • Add unlikely NULL check for ops to please static analyzer

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cautious +1 on this. The need to coordinate 1) host-side programming of the link DMA, with 2) IPCs sent to FW, is just complex. A few comments inline.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder @ranj063 @ujfalusi if we should modify the logic in ab55937 and delay the hda_link_dma_cleanup() on PCM_TRIGGER_STOP. At this point, FW pipeline is put to PAUSED state so it is still hanging on to the link DMA channel (see annotated log in #5198 (comment) ).

This now works in the original case (same stream restarted later with a new prepare), but indeed if another stream is allocated on host side (like the bug this PR is address), the link DMA channel is allocated to other purposes and the flow won't work anymore.

So could this be delayed to either close() or new prepare() (whichever comes first). For suspend this needs to be immediate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was delayed before ab55937 and we needed to move it ahead to fix metallic noise (on restart w/o stop) and the explosion of the delay reporting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, you mean the hda_link_dma_cleanup() ? I think we have tried that but it was causing other issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, this does seem correct and should work in all cases. It just looks a bit weird to first clean and then immediately setup again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add a flag parameter for hda_link_dma_cleanup() (reclaim_channel?) and hide it there? I'm not sure if that is better or worse

The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
@ujfalusi ujfalusi force-pushed the peter/sof/pr/hda-dai-fix-linkdma-bleed-01 branch from a940ab9 to 8c1ab8e Compare December 11, 2024 13:07
@ujfalusi ujfalusi changed the title ASoC: SOF: Intel: hda-dai: Reclaim the link DMA on STOP with … ASoC: SOF: Intel: hda-dai: Do not release the link DMA on STOP Dec 11, 2024
@ujfalusi
Copy link
Collaborator Author

Changes since v3:

  • New patch to not even release the LinkDMA channel (new flag added to hda_link_dma_cleanup() )

@ranj063
Copy link
Collaborator

ranj063 commented Dec 11, 2024

@ujfalusi is the LNL HDA alsabat failure a known issue?

@lgirdwood
Copy link
Member

@ujfalusi is the LNL HDA alsabat failure a known issue?

Yes, suspect atm is AC noise on a DUT. Team looking at it.

@ranj063 ranj063 requested a review from bardliao December 13, 2024 04:17
@ujfalusi
Copy link
Collaborator Author

@ujfalusi is the LNL HDA alsabat failure a known issue?

It appears to be present in all PR tests

@ranj063 ranj063 merged commit 89772eb into thesofproject:topic/sof-dev Dec 13, 2024
9 of 14 checks passed
@ujfalusi ujfalusi deleted the peter/sof/pr/hda-dai-fix-linkdma-bleed-01 branch May 19, 2025 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] DSP Panic on Intel MTL

5 participants