Skip to content

Conversation

@RanderWang
Copy link

Currently we use runtime->hw_ptr and BE dai link position to calculate
delay and find runtime->hw_ptr is smaller than BE dai link position for
playback. This is incorrect since snd_pcm_delay function is called
first and then hw_ptr is updated to latest one, so hw_ptr here is for
the last stream position update. This results to hw_ptr is smaller than
latest BE dai link position. This bug is exposed by a 10 microseconds
delay in fw.

#4781
#4686

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit title: ASoC: SOF: ipc4-pcm: ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it's not clear to the average reviewer what snd_sof_pcm_platform_pointer() might do either or what this improves.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use snd_sof_pcm_platform_pointer() to get latest host dma position. Updated commit message, thanks for review

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, at least I understand the commit description now.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more work @RanderWang, it doesn't pass basic upstream expectations for comments and explanations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it's not clear to the average reviewer what snd_sof_pcm_platform_pointer() might do either or what this improves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This begs the question "how is the boundary taken into account"?

This looks like a controversial change without much justification. That is unlikely to be accepted upstream....

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boundary is used for app_ptr and hw_ptr wrapping. Now we use snd_sof_pcm_platform_pointer which is wrapped with runtime->buffer_size, so we need to wrap be dai link position with runtime->buffer_size so that we can compare the snd_sof_pcm_platform_pointer result with be dai link position

Copy link
Author

@RanderWang RanderWang Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of boundary is always multiple of the size of period in intermediate buffer for user space,up to UINT_MAX. It is not fit for us now. Please check the dump of hw:0,0 on my device

Plug PCM: Hardware PCM card 0 'sof-nocodec' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 16384
  period_size  : 4096
  period_time  : 85333
 ......
  boundary     : 4611686018427387904 // 4000000000000000₁₆
  appl_ptr     : 0
  hw_ptr       : 0

@RanderWang
Copy link
Author

Thanks for your comments. Updated commit message and try to make it more clear

@RanderWang
Copy link
Author

updated commit message. Thanks!

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very important fix, I can confirm this fixes video playback to PCM0 on MTL SDW device (with e.g. mplayer). I have a proposal on the implementation, please see inline.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, at least I understand the commit description now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, buffer-size and boundary-size could be both used as modulo basis, but ALSA uses the adhoc large value of "boundary" so that differences larger than buffer_size can be detected. With just the above check, you cannot detect whether the value wrapped around or whether link/host position difference is just larger than buffer_size. I think we need to cover this case as well (especially given FW just had a bug around this area so it is possible we get a completely wrong value from FW and kernel will have to recover gracefully). So I think we should keep using boundary.

OTOH, if the delay difference is larger than buffer_size, something is seriously wrong, and we should not return such delay values, but gap the value. So in this sense this patch is much safer way to calculate the delay.

How about we keep the current logic in tact but add add a check:

if (head_ptr < tail_ptr)
     delay = substream->runtime->boundary - tail_ptr + head_ptr;
else
     delay = head_ptr - tail_ptr;

if (delay > substream->runtime->buffers_size) {
      dev_dbg(sdev->dev, "Link position exceeding buffer size, head %lu tail %lu\n", head_ptr, tail_ptr);
      delay = 0;
}

return delay;

So we'd never return larger delay than buffer-size, but we'd also detect the case where link position drift too far from host position, and not ignore this silently (buffer_size module would hide this).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i snd_sof_pcm_platform_pointer() return the position in [0, buffer_size], we need to wrap tmp_ptr to [0, buffer_size] so that we can compare them. Or after few cycles, head_ptr is alway < tail_ptr for playback.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dump the log for driver,

[521646.446395] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 32 tail 16402
[521646.446696] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 32 tail 16402
[521646.531253] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 4112 tail 20482
[521646.531503] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 4112 tail 20482
[521646.616390] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 8192 tail 24562
[521646.616660] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 8192 tail 24562
[521646.702375] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 12320 tail 28690
[521646.702642] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 12320 tail 28690
[521646.787340] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 16 tail 32770
[521646.787616] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 16 tail 32770
[521646.872362] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 4096 tail 36850
[521646.872613] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 4096 tail 36850
[521646.958319] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 8224 tail 40978
[521646.958682] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 8224 tail 40978
[521647.043383] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 12304 tail 45058
[521647.043649] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 12304 tail 45058
[521647.128393] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 0 tail 49138
[521647.128650] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 0 tail 49138
[521647.214363] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 4128 tail 53266
[521647.214616] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 4128 tail 53266
[521647.299364] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 8208 tail 57346
[521647.299613] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 8208 tail 57346
[521647.384365] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 12288 tail 61426
[521647.384584] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 12288 tail 61426
[521647.470367] sof-audio-pci-intel-mtl Link position exceeding buffer size, head 32 tail 65554

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry @RanderWang , I missed you changed also to use the snd_sof_pcm_platform_pointer.

Hmm, we do lose the ability to express delays longer than buffer_size (and ability to detect such cases reliably, I don't think we in practise need to support such delays with SOF).

OTOH, so with now, if tmp_ptr is read with get_position, it is a value between 0 and buffersize, while the LLP read case will return a value that will wrap only at UINT64_MAX?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pointer callback is returning the DMA position between [0 , buffer_size - 1] in frames. The LLP is a linear position in frames, so the LLP needs to be normalized down with buffer_size to be usable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry @RanderWang , I missed you changed also to use the snd_sof_pcm_platform_pointer.

Hmm, we do lose the ability to express delays longer than buffer_size (and ability to detect such cases reliably, I don't think we in practise need to support such delays with SOF).

Hrm, that is an interesting problem. Indeed we could have a long DSP processing pipeline which would take longer to pass through than the ALSA buffer, it is a valid concern I think

OTOH, so with now, if tmp_ptr is read with get_position, it is a value between 0 and buffersize, while the LLP read case will return a value that will wrap only at UINT64_MAX?

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong button, see comment above. So fixes the issue, but change request on the implementation.

Copy link
Collaborator

@ujfalusi ujfalusi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RanderWang, I think the two patch needs to be squashed as the change from substream->runtime->status->hw_ptr to snd_sof_pcm_platform_pointer() mandates the second patch, I think the hw_ptr is a linear position within the boundary while the pointer callback return value is within the buffer?

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, re-reviewed and the boundary removal is indeed correct. We lose the ability to detect larger delta, but that could be done in a later fixup.

I'd drop the "This bug is exposed by a regression issue which results to 10ms delay in fw." sentence from the second git commit. If you want to add context, you can add a "Link: #4781" .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry @RanderWang , I missed you changed also to use the snd_sof_pcm_platform_pointer.

Hmm, we do lose the ability to express delays longer than buffer_size (and ability to detect such cases reliably, I don't think we in practise need to support such delays with SOF).

OTOH, so with now, if tmp_ptr is read with get_position, it is a value between 0 and buffersize, while the LLP read case will return a value that will wrap only at UINT64_MAX?

Currently we use runtime->hw_ptr for host dma position and decrease it with
BE dai link position to calculate delay and find runtime->hw_ptr is smaller
than BE dai link position for playback. This is incorrect since
snd_pcm_delay function is called first and then hw_ptr is updated to latest
host dma position, so hw_ptr here is for the last host dma position update.
This results to hw_ptr is smaller than latest BE dai link position. Now use
snd_sof_pcm_platform_pointer directly to get the latest host dma position
and calculate delay with latest BE dai link position.

The BE stream position is increased from zero to MAX_UINT and now we
compare it with snd_sof_pcm_platform_pointer which is also increased from
zero to MAX_UINT and wrapped with runtime->buffer_size, so we need to wrap
BE stream position with runtime->buffer_size instead of runtime->boundary
which is used for hw_ptr wrapping.

Link: thesofproject#4781
      thesofproject#4686

Signed-off-by: Rander Wang <rander.wang@intel.com>
@RanderWang
Copy link
Author

combine two patches into one and refined the commit message. Thanks!

@RanderWang
Copy link
Author

@RanderWang, I think the two patch needs to be squashed as the change from substream->runtime->status->hw_ptr to snd_sof_pcm_platform_pointer() mandates the second patch, I think the hw_ptr is a linear position within the boundary while the pointer callback return value is within the buffer?

sure, thanks! updated

@ujfalusi
Copy link
Collaborator

ujfalusi commented Jan 31, 2024

@RanderWang, @kv2019i, now that I have had time to think about this a bit more I came to the realization that we are doing this completely wrong.
The delay has nothing to do with the ALSA buffer size, it is entirely detached from it.

The delay is the difference between what the host DMA copied into the DSP memory and what the link DMA copied out in case of playback.
PPHCxLDPx - (PPHCxLLPx - start_offset) = delay

I would rename the .get_stream_position() callback to something more precise, it is too vague, like .get_stream_dai_frame_count() / .get_stream_dai_llp() / .get_stream_llp() [1] [2]
Introduce a new callback (which is mandatory for delay reporting to be available) to get the frame count on the host side, so the number of frames copied so far (LDP - Linear DMA Position in HDA terms): .get_stream_host_frame_count() / .get_stream_host_ldp() / .get_stream_ldp() [2]

[1] we already use LLP (Linear Link Position) in generic code, which is the number of frames sent, document that it is in frames.

[2] probably better to use non Intel term for the callbacks, so *_frame_count?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants