-
Notifications
You must be signed in to change notification settings - Fork 140
ASoC: SOF: ipc4-pcm: fix delay calculation when DSP resamples #5499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: ipc4-pcm: fix delay calculation when DSP resamples #5499
Conversation
be6ab7b to
552f865
Compare
|
V2:
|
ujfalusi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i, this is indeed was not handled correctly, thanks for fixing it!
Few comments, but the patch looks good otherwise. I would add fixes tag if applicable
sound/soc/sof/ipc4-pcm.c
Outdated
| return value; | ||
| } | ||
|
|
||
| static u64 sof_ipc4_time_dai_to_host(struct sof_ipc4_timestamp_info *time_info, u64 dai_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dai_time is is not time, but number of frames, name it as 'frames' ?
sound/soc/sof/ipc4-pcm.c
Outdated
| return sof_ipc4_time_scale(time_info, dai_time, true); | ||
| } | ||
|
|
||
| static u64 sof_ipc4_time_host_to_dai(struct sof_ipc4_timestamp_info *time_info, u64 host_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, host_time is not time, but frames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I coined this as a "sof_ipc4_time_" prefix as this is a helper for timekeeping functionality, but I see you interprted this differently. Will change.
sound/soc/sof/ipc4-pcm.c
Outdated
| host_ptr = host_cnt; | ||
|
|
||
| /* Scale value to DAI time in case DAI running at different rate */ | ||
| host_cnt = sof_ipc4_time_host_to_dai(time_info, host_cnt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we don't scale the dai_cnt to host time-scale instead and save on back-forth scaling (here host to dai and later from dai to host)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, true, now that we can't assume which will wrap around faster, we might as well use host scale as the defaul and then we can avoid converting the resulting delay. Let me work on this. I have an additional fix in the works as well, will add it to this series.
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ujfalusi , will work on an update.
sound/soc/sof/ipc4-pcm.c
Outdated
| host_ptr = host_cnt; | ||
|
|
||
| /* Scale value to DAI time in case DAI running at different rate */ | ||
| host_cnt = sof_ipc4_time_host_to_dai(time_info, host_cnt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, true, now that we can't assume which will wrap around faster, we might as well use host scale as the defaul and then we can avoid converting the resulting delay. Let me work on this. I have an additional fix in the works as well, will add it to this series.
sound/soc/sof/ipc4-pcm.c
Outdated
| return sof_ipc4_time_scale(time_info, dai_time, true); | ||
| } | ||
|
|
||
| static u64 sof_ipc4_time_host_to_dai(struct sof_ipc4_timestamp_info *time_info, u64 host_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I coined this as a "sof_ipc4_time_" prefix as this is a helper for timekeeping functionality, but I see you interprted this differently. Will change.
552f865 to
f66cb85
Compare
|
V3 pushed:
|
sound/soc/sof/ipc4-pcm.c
Outdated
|
|
||
| if (time_info->delay > (DELAY_BOUNDARY >> 1)) { | ||
| dev_dbg_ratelimited(sdev->dev, "inaccurate delay, host %llu dai_cnt %llu", | ||
| host_cnt, dai_cnt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets hit quite often with alsa_conformance_test(.py) and high sampling rates as alsa_conformance_test by default uses very small buffers and default is set in frames, so the amount of buffering between CPU and HW in terms of wall-clock time is short at 48000, but it gets super short at higher rates. This doesn't really interfere with testing, but xruns are sometimes hit and if we report delays of 2^31 frames, this completely confuses alsa_conformance_test. Returning zero in these cases seems more right thing to do. E.g. wth appls like video playback, a crazy high delay value can freeze video playback, while a transient zero might just through the audio track off for a few seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this happens on stream start or during a running stream?
With any PCM or only with deepbuffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi Does happen both at start and during runtime. Can be hit with HDMI and nocodec SSP PCMs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, does it happen on ChainDMA w/o the first patch?
ChainDMA runs with the same rate and both host and dai position is queried from the host side as FW does not provide this information.
ChainDMA and SSP use cases are different when it comes to delay reporting...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi Yes, it still happens. I get this with alsa_conformance_test and on some machines (not all, not related to SOF in anyway), it just hits xruns and e.g. on playback hw_ptr catches appl_ptr. This is its own problem of course, but when this does happen, we report a crazily large value as delay. As a result, even if there is a single xrun, all the alsa_conformance_rate metrics will be out-of-bounds and test will fail.
If it would be just the alsa_conformance_test, maybe we could ignore and just mark this as an app bug, but I think other apps could hit this as well. E.g. if you ar eplaying back video and there is some odd system glitch that causes a transient xrun, we shouldn't make matters worse by reporting a 2^25 delay and cause the video lip-sync to freeze the screen.
sound/soc/sof/ipc4-pcm.c
Outdated
|
|
||
| /* | ||
| * Modulus to use to compare host and link counters. This is required | ||
| * as host/link counters use different units (bytes/frames) and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bytes on the host side as unit is not relevant here imho, it is converted to frames, what matters is the fact that the rate might be different and the thus the counter on dai/host might wrap at different time for the same duration.
IOW, you introduce an artificial low and hogh enough wrapping point for both to be used, right?
Previously we moved the wrapping point of the DAI earlier to match with how the host will wrap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi I left the note on bytes as this modulus calculation is still needed even if there is no difference in rates.
I can drop it if it confuses too much. UPDATE: I will drop it. The conversion to frames is a separate step, you are right.
The idea is to use a made-up wrap point that is smaller than any possible hw counter (e.g. our link and host DMAs), and is much larger than any possible valid delay (we can express 93min delay at 768khz with the U32_MAX as wrap point).
The calculation is the same whether DAI or host side wraps first.
sound/soc/sof/ipc4-pcm.c
Outdated
|
|
||
| /* Wrap the dai counter at the boundary where the host counter wraps */ | ||
| div64_u64_rem(dai_cnt, time_info->boundary, &dai_cnt); | ||
| /* dai/host_cnt converted to same unit, but the values will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dai counter is moved to host rate domain, make sure that they are wrapped at the same value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi This is what we do here, right? DELAY_BOUNDARY is a adhoc boundary that will work with any sampling rate and we scale both values to same base before comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 'unit' confuses me, can we use rate domain to leave no room for misunderstanding?
sound/soc/sof/ipc4-pcm.c
Outdated
|
|
||
| if (time_info->delay > (DELAY_BOUNDARY >> 1)) { | ||
| dev_dbg_ratelimited(sdev->dev, "inaccurate delay, host %llu dai_cnt %llu", | ||
| host_cnt, dai_cnt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this happens on stream start or during a running stream?
With any PCM or only with deepbuffer?
sound/soc/sof/ipc4-pcm.c
Outdated
| * be smaller than the wrap-around point of any hardware counter, as | ||
| * expressed in units of the host frame counter. | ||
| */ | ||
| #define DELAY_BOUNDARY U32_MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is preferred to have the define at the top of the file and not embedded in middle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, will move up.
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replying to review comments.
sound/soc/sof/ipc4-pcm.c
Outdated
|
|
||
| /* | ||
| * Modulus to use to compare host and link counters. This is required | ||
| * as host/link counters use different units (bytes/frames) and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi I left the note on bytes as this modulus calculation is still needed even if there is no difference in rates.
I can drop it if it confuses too much. UPDATE: I will drop it. The conversion to frames is a separate step, you are right.
The idea is to use a made-up wrap point that is smaller than any possible hw counter (e.g. our link and host DMAs), and is much larger than any possible valid delay (we can express 93min delay at 768khz with the U32_MAX as wrap point).
The calculation is the same whether DAI or host side wraps first.
sound/soc/sof/ipc4-pcm.c
Outdated
| * be smaller than the wrap-around point of any hardware counter, as | ||
| * expressed in units of the host frame counter. | ||
| */ | ||
| #define DELAY_BOUNDARY U32_MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, will move up.
sound/soc/sof/ipc4-pcm.c
Outdated
|
|
||
| /* Wrap the dai counter at the boundary where the host counter wraps */ | ||
| div64_u64_rem(dai_cnt, time_info->boundary, &dai_cnt); | ||
| /* dai/host_cnt converted to same unit, but the values will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi This is what we do here, right? DELAY_BOUNDARY is a adhoc boundary that will work with any sampling rate and we scale both values to same base before comparison.
f66cb85 to
b085c3b
Compare
|
V4 uploaded:
|
|
V5:
|
64358e9 to
fc7a307
Compare
|
V6:
|
When the sampling rates going in (host) and out (dai) from the DSP are different, the IPC4 delay reporting does not work correctly. Add support for this case by scaling the all raw position values to a common timebase before calculating real-time delay for the PCM. Fixes: 0ea0668 ("ASoC: SOF: ipc4-pcm: Correct the delay calculation") Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
fc7a307 to
f666a0a
Compare
|
V7:
|
When the sampling rates going in (host) and out (dai) from the DSP are different, the IPC4 delay reporting does not work correctly. Add support for this case by scaling the host position values to a common timebase before calculating real-time delay for the PCM.
Link: #5498