Collaborator

@kv2019i kv2019i commented Jul 30, 2025

When the sampling rates going in (host) and out (dai) from the DSP are different, the IPC4 delay reporting does not work correctly. Add support for this case by scaling the host position values to a common timebase before calculating real-time delay for the PCM.

Link: #5498
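
As an illustration of what scaling a position value to a common timebase can look like, here is a minimal user-space sketch. It is not the patch's code: `scale_frames` and its parameters are hypothetical names, and it only assumes that a position is a frame count taken at a known sampling rate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the driver's implementation): convert a frame
 * count measured at src_rate into the equivalent count at dst_rate.
 */
static uint64_t scale_frames(uint64_t frames, uint32_t src_rate, uint32_t dst_rate)
{
	assert(src_rate != 0);

	if (src_rate == dst_rate)
		return frames;

	/*
	 * Split into whole seconds and remainder so that the intermediate
	 * product cannot overflow 64 bits even for large counters.
	 */
	return (frames / src_rate) * dst_rate +
	       (frames % src_rate) * dst_rate / src_rate;
}
```

For example, 96000 frames counted at 96 kHz correspond to 48000 frames at 48 kHz, so a DAI-side counter scaled this way can be compared directly with a host-side counter.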

Collaborator Author

kv2019i commented Aug 4, 2025

V2:

  • resolved the TODO in first version related to boundary condition handling
  • now ready for review

Collaborator

@ujfalusi ujfalusi left a comment

@kv2019i, this indeed was not handled correctly, thanks for fixing it!
A few comments, but the patch looks good otherwise. I would add a Fixes tag if applicable.

return value;
}

static u64 sof_ipc4_time_dai_to_host(struct sof_ipc4_timestamp_info *time_info, u64 dai_time)
Collaborator

dai_time is not a time but a number of frames; name it 'frames'?

return sof_ipc4_time_scale(time_info, dai_time, true);
}

static u64 sof_ipc4_time_host_to_dai(struct sof_ipc4_timestamp_info *time_info, u64 host_time)
Collaborator

same here, host_time is not time, but frames

Collaborator Author

I coined the "sof_ipc4_time_" prefix as this is a helper for timekeeping functionality, but I see you interpreted this differently. Will change.

host_ptr = host_cnt;

/* Scale value to DAI time in case DAI running at different rate */
host_cnt = sof_ipc4_time_host_to_dai(time_info, host_cnt);
Collaborator

Why don't we scale the dai_cnt to the host time-scale instead and save on the back-and-forth scaling (here host to dai and later from dai to host)?

Collaborator Author

Hmm, true, now that we can't assume which will wrap around faster, we might as well use the host scale as the default, and then we can avoid converting the resulting delay. Let me work on this. I have an additional fix in the works as well, will add it to this series.

Collaborator Author

@kv2019i kv2019i left a comment

Thanks @ujfalusi , will work on an update.

@kv2019i kv2019i force-pushed the 202507-src-delay-reporting branch from 552f865 to f66cb85 Compare August 11, 2025 09:52
Collaborator Author

kv2019i commented Aug 11, 2025

V3 pushed:

  • addressed comments from @ujfalusi
  • one fix/change added to the series; this will filter out bogus delay values when the delay is queried during an xrun


if (time_info->delay > (DELAY_BOUNDARY >> 1)) {
dev_dbg_ratelimited(sdev->dev, "inaccurate delay, host %llu dai_cnt %llu",
host_cnt, dai_cnt);
Collaborator Author

This gets hit quite often with alsa_conformance_test(.py) at high sampling rates. By default alsa_conformance_test uses very small buffers, and the default is set in frames, so the amount of buffering between CPU and HW in terms of wall-clock time is short at 48000 and gets super short at higher rates. This doesn't really interfere with testing, but xruns are sometimes hit, and if we report delays of 2^31 frames, this completely confuses alsa_conformance_test. Returning zero in these cases seems like the more correct thing to do. E.g. with apps like video playback, a crazy high delay value can freeze video playback, while a transient zero might just throw the audio track off for a few seconds.

Collaborator

Does this happen on stream start or during a running stream?

With any PCM or only with deepbuffer?

Collaborator Author

@ujfalusi It happens both at start and during runtime. It can be hit with HDMI and nocodec SSP PCMs as well.

Collaborator

Hrm, does it happen on ChainDMA w/o the first patch?
ChainDMA runs with the same rate and both host and dai positions are queried from the host side as FW does not provide this information.
ChainDMA and SSP use cases are different when it comes to delay reporting...

Collaborator Author

@ujfalusi Yes, it still happens. I get this with alsa_conformance_test, and on some machines (not all, and not related to SOF in any way) it just hits xruns, e.g. on playback hw_ptr catches appl_ptr. This is its own problem of course, but when it does happen, we report a crazily large value as delay. As a result, even if there is a single xrun, all the alsa_conformance_rate metrics will be out-of-bounds and the test will fail.

If it were just alsa_conformance_test, maybe we could ignore this and mark it as an app bug, but I think other apps could hit this as well. E.g. if you are playing back video and some odd system glitch causes a transient xrun, we shouldn't make matters worse by reporting a 2^25 delay and causing the video lip-sync to freeze the screen.
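
The filtering discussed in this thread (report zero instead of an implausibly large delay) can be sketched in isolation as below. This is an assumption-laden illustration rather than the driver code: `filter_delay` and `EXAMPLE_DELAY_BOUNDARY` are hypothetical names, and the threshold of half the boundary is taken from the `DELAY_BOUNDARY >> 1` check quoted in the diff.

```c
#include <stdint.h>

/* Stand-in for the DELAY_BOUNDARY value shown in the diff (U32_MAX) */
#define EXAMPLE_DELAY_BOUNDARY UINT32_MAX

/*
 * Hypothetical helper: treat any delay above half the boundary as the
 * result of an xrun (the counters crossed each other) and report zero.
 */
static uint64_t filter_delay(uint64_t delay)
{
	if (delay > (EXAMPLE_DELAY_BOUNDARY >> 1))
		return 0;

	return delay;
}
```

A plausible delay such as 480 frames passes through unchanged, while a value like 2^31 is replaced with zero.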


/*
* Modulus to use to compare host and link counters. This is required
* as host/link counters use different units (bytes/frames) and the
Collaborator

The bytes as the unit on the host side are not relevant here imho; it is converted to frames. What matters is that the rate might be different, and thus the counters on dai/host might wrap at different times for the same duration.

IOW, you introduce an artificial wrapping point that is low enough and high enough for both to be used, right?

Previously we moved the wrapping point of the DAI earlier to match with how the host will wrap.

Collaborator Author

@kv2019i kv2019i Aug 11, 2025

@ujfalusi I left the note on bytes as this modulus calculation is still needed even if there is no difference in rates.
I can drop it if it confuses too much. UPDATE: I will drop it. The conversion to frames is a separate step, you are right.

The idea is to use a made-up wrap point that is smaller than the wrap point of any possible hw counter (e.g. our link and host DMAs) and much larger than any possible valid delay (we can express a 93 min delay at 768 kHz with U32_MAX as the wrap point).

The calculation is the same whether DAI or host side wraps first.
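
A minimal sketch of the idea above, assuming both counters are already in the same rate domain: reduce both to a shared wrap point and take the distance between them. `wrapped_delay` and `max_delay_minutes` are hypothetical helpers written for this illustration, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: distance from tail to head when both counters are
 * reduced modulo the same boundary. The result is only meaningful while
 * the true distance is smaller than the boundary.
 */
static uint64_t wrapped_delay(uint64_t head_cnt, uint64_t tail_cnt, uint64_t boundary)
{
	assert(boundary != 0);

	uint64_t head = head_cnt % boundary;
	uint64_t tail = tail_cnt % boundary;

	if (head >= tail)
		return head - tail;

	/* head has already wrapped past the boundary, tail has not */
	return boundary - tail + head;
}

/* Longest delay, in whole minutes, that fits below the boundary */
static uint64_t max_delay_minutes(uint64_t boundary, uint32_t rate_hz)
{
	assert(rate_hz != 0);

	return boundary / rate_hz / 60;
}
```

The same expression works regardless of which counter crosses the wrap point first, and `max_delay_minutes(UINT32_MAX, 768000)` reproduces the 93 minute figure mentioned above.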


/* Wrap the dai counter at the boundary where the host counter wraps */
div64_u64_rem(dai_cnt, time_info->boundary, &dai_cnt);
/* dai/host_cnt converted to same unit, but the values will
Collaborator

The dai counter is moved to host rate domain, make sure that they are wrapped at the same value?

Collaborator Author

@ujfalusi This is what we do here, right? DELAY_BOUNDARY is an ad hoc boundary that will work with any sampling rate, and we scale both values to the same base before comparison.

Collaborator

I think the 'unit' confuses me, can we use rate domain to leave no room for misunderstanding?


* be smaller than the wrap-around point of any hardware counter, as
* expressed in units of the host frame counter.
*/
#define DELAY_BOUNDARY U32_MAX
Collaborator

It is preferred to have the define at the top of the file and not embedded in the middle.

Collaborator Author

Ack, will move up.

Collaborator Author

@kv2019i kv2019i left a comment

Replying to review comments.


@kv2019i kv2019i force-pushed the 202507-src-delay-reporting branch from f66cb85 to b085c3b Compare August 11, 2025 11:22
Collaborator Author

kv2019i commented Aug 11, 2025

V4 uploaded:

  • moved DELAY_BOUNDARY def at start
  • reworded some of the commentary, please @ujfalusi check if this helps readability

Collaborator Author

kv2019i commented Aug 11, 2025

V5:

  • dropped the second patch -- let's focus on the SRC case first

@kv2019i kv2019i force-pushed the 202507-src-delay-reporting branch 2 times, most recently from 64358e9 to fc7a307 Compare August 11, 2025 12:27
Collaborator Author

kv2019i commented Aug 11, 2025

V6:

  • forgot to fix the stream_start_offset calculation for the ChainDMA case. The value is already in the host sampling rate (as there can be no SRC with ChainDMA), so the conversion from dai_to_host format is bogus (the function would not have done anything given the rates are the same, but this is still wrong)

When the sampling rates going in (host) and out (dai) from the DSP
are different, the IPC4 delay reporting does not work correctly.
Add support for this case by scaling all the raw position values to
a common timebase before calculating real-time delay for the PCM.

Fixes: 0ea0668 ("ASoC: SOF: ipc4-pcm: Correct the delay calculation")
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i force-pushed the 202507-src-delay-reporting branch from fc7a307 to f666a0a Compare August 18, 2025 13:35
Collaborator Author

kv2019i commented Aug 18, 2025

V7:

  • minor change to use simple AND bitmask instead of div64_u64_rem() -- we can do this now as DELAY_BOUNDARY is a fixed value
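
To illustrate why a fixed boundary allows a mask in place of a 64-bit division, here is a small sketch assuming the mask is U32_MAX as in the define shown earlier. `wrap_mask` and `wrap_rem` are hypothetical names. Note that masking with U32_MAX keeps the low 32 bits, so the effective wrap point of the mask is 2^32, one more than U32_MAX itself.

```c
#include <stdint.h>

/* Wrap a 64-bit counter by keeping only its low 32 bits */
static uint64_t wrap_mask(uint64_t cnt)
{
	return cnt & UINT32_MAX;
}

/* Equivalent remainder form: modulo 2^32 (not modulo U32_MAX) */
static uint64_t wrap_rem(uint64_t cnt)
{
	return cnt % ((uint64_t)UINT32_MAX + 1);
}
```

Both functions return the same value for any input; the mask form avoids the division helper entirely, which is only possible because the boundary is a compile-time constant of the form 2^n - 1.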

Collaborator Author

kv2019i commented Aug 25, 2025

@bardliao @ujfalusi ok to merge?

@ujfalusi ujfalusi merged commit aba6ee4 into thesofproject:topic/sof-dev Aug 25, 2025
10 of 16 checks passed