Describe the bug
This is a meta bug to track problems found by https://chromium.googlesource.com/chromiumos/platform/audiotest/+/HEAD/alsa_conformance_test.md with the SOF driver. As similar discussions happen in multiple bugs, I filed a meta bug in addition to just linking the issues, so that we can provide common commentary here.
Issues related to this meta bug:
- [BUG][RPL][CHROMEOS] alsa_conformance_test fails for test rates #8512
- [BUG][MTL][ChromeOS][Rex] - alsa conformance test rate error exceeds the limit for DMIC with 4 channel #8458
- [BUG][mtl-002] alsa_conformance_test: test_rates failed for both accuracy and stability on I2S interfaces #6756
- [BUG] [TGL] rate error > 200 by alsa_conformance_test for PCM99 capture #4617
- MTL - Rex - Chrome - alsa conformance test rate error exceeds the limit for DMIC with 4 channel linux#4691
- ASoC: SOF: Intel: hda_pcm: Regulate DPIB reading at frame boundary for playback stream linux#2381
What is the test about?
The ALSA conformance rate test busy-loops on a host CPU thread, polling snd_pcm_avail() and recording every change of the hardware pointer. It then fits a linear regression to the recorded points to estimate the sample rate, and calculates the error of that estimate.
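As a rough illustration of the regression step, here is a minimal sketch in Python. It is not the code used by alsa_conformance_test: the function name `estimate_rate`, the input format, and the choice of RMS residual as the error metric are all assumptions made for this example.

```python
def estimate_rate(samples):
    """Least-squares fit of frames = rate * t + offset.

    samples: list of (time_seconds, cumulative_frames) pairs, one per
    observed hardware-pointer change (hypothetical input format).
    Returns (rate_hz, rms_residual_frames).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all timestamps are identical")
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    rate = cov / var_t                      # slope of the fitted line
    offset = mean_f - rate * mean_t         # intercept of the fitted line
    residual = sum((f - (rate * t + offset)) ** 2 for t, f in samples)
    return rate, (residual / n) ** 0.5


# Idealized device: the pointer advances by 48 frames every 1 ms (48 kHz).
ideal = [(i * 0.001, i * 48) for i in range(100)]
rate, err = estimate_rate(ideal)
```

With perfectly regular updates the fitted slope matches the nominal rate and the residual is close to zero. Irregular (bursty) updates leave the slope roughly intact over long runs but inflate the residual.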
Why do many SOF targets fail the test? Is this a bug?
The very purpose of SOF is to provide a framework to manage a programmable DSP, sitting between the application and the audio codec(s). SOF devices must adhere to ALSA PCM API semantics, but within the limits set, the DSP can optimize data transfers based on its needs and the needs of the overall SoC. On many SOF platforms, the driver can report an accurate hardware pointer value even during DMA bursts. From alsa_conformance_test's point of view, this shows up as a high degree of burstiness in the snd_pcm_avail() values when it polls them many times within one ALSA period.
The test has been a useful tool for analyzing SOF driver and firmware behaviour, but the rate error calculation in particular has proven to be complex, and we have multiple cases where SOF is behaving correctly but the conformance test reports a failure.
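To make the burstiness concrete, the following sketch generates the sequence of pointer changes a polling test might observe. All numbers (48 frames per 1 ms tick, four chunks per burst, 10 µs between chunks) are illustrative assumptions, not measurements from any SOF platform, and `bursty_pointer_changes` is a hypothetical helper.

```python
def bursty_pointer_changes(duration_ms, frames_per_tick=48,
                           chunks=4, chunk_gap_us=10):
    """Return (time_s, cumulative_frames) for each pointer change.

    Models a DMA that moves frames_per_tick frames at the start of every
    1 ms tick, split into `chunks` quick steps, then stays idle.
    """
    changes = []
    frames = 0
    chunk_frames = frames_per_tick // chunks
    for tick in range(duration_ms):
        for c in range(chunks):
            frames += chunk_frames
            t = tick * 1e-3 + c * chunk_gap_us * 1e-6
            changes.append((t, frames))
    return changes


changes = bursty_pointer_changes(3)
# Time between consecutive pointer changes, in microseconds.
gaps_us = [round((b[0] - a[0]) * 1e6) for a, b in zip(changes, changes[1:])]
```

In this model the gaps alternate between a few very short steps inside a burst and one long idle stretch until the next tick, which is the pattern the test sees when it polls many times per period.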
Ways for the driver to alert user-space of its particular behaviour
ALSA has a capability interface that lets a driver declare its particular characteristics. The following flags could potentially help test apps like alsa_conformance_test better understand the device under test:
| Driver flag | alsa-lib | Usage in applications | Can help with SOF and alsa_conformance_test |
|---|---|---|---|
| SNDRV_PCM_INFO_BATCH | snd_pcm_hw_params_is_batch() | Widely used by apps to detect drivers that update hw_ptr only once per period. | Not helpful: SOF drivers can update hw_ptr continuously. |
| SNDRV_PCM_INFO_DOUBLE | snd_pcm_hw_params_is_double() | No known usage in apps. Only a few kernel drivers set it. | Not helpful: SOF does not do double-buffering like the RME drivers in the kernel. |
| SNDRV_PCM_INFO_BLOCK_TRANSFER | snd_pcm_hw_params_is_block_transfer() | No known usage in apps. Set by most ALSA drivers, but not by any SOF driver. | Semantics are very unclear: the device transfers data in blocks, but that's it. Given that so many drivers set this flag without any actual user-space usage, starting to use it with more specific semantics does not seem like a good idea. |
In summary, there is no means to relay the size of DMA bursts from the driver to a user-space application. The driver could set INFO_BATCH, but that currently disables other functionality in applications (e.g. in PulseAudio and PipeWire).
Existing solutions/workarounds -- merge threshold
A merge feature was added to alsa_conformance_test. By setting the --merge_threshold_sz and/or --merge_threshold options, the test can be instructed to merge measurements that are very close to each other. In most cases, setting the merge threshold equal to the period size results in a passing test with a low rate error. This is consistent with the ALSA PCM API, which requires the driver to update the hardware pointer position at least every period. In practice, the tool's existing method of doing a dry run and detecting the median step increase works for the majority of cases (see comment #8717 (comment)).
This method is not bulletproof: for example, #8512 failed despite the merge threshold being set.
alsa_conformance_test also had a change request to make this the default, but the change was reverted (https://chromium-review.googlesource.com/c/chromiumos/platform/audiotest/+/4220795) as it also caused failures on platforms where the test was passing without the merge threshold being set.
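The merge idea described in this section can be sketched as follows. This assumes a simple time-based rule (collapse consecutive points that are closer together than a threshold in seconds and keep the last one); the real merging rule in alsa_conformance_test and the units of its options may differ, and `merge_close_points` is a hypothetical name.

```python
def merge_close_points(points, threshold_s):
    """Collapse runs of points separated by less than threshold_s.

    points: time-ordered (time_s, cumulative_frames) pairs.
    Each run is represented by its last point, i.e. the pointer value
    once the burst has finished.
    """
    merged = []
    for point in points:
        if merged and point[0] - merged[-1][0] < threshold_s:
            merged[-1] = point      # same burst: keep only the latest point
        else:
            merged.append(point)    # gap is large enough: start a new run
    return merged


# Illustrative data: 3 ticks of 1 ms, each a burst of 4 x 12 frames, 10 us apart.
points = [(tick * 1e-3 + c * 1e-5, tick * 48 + (c + 1) * 12)
          for tick in range(3) for c in range(4)]
merged = merge_close_points(points, threshold_s=1e-4)
```

After merging, each burst contributes a single point, so the regression sees one regularly spaced sample per tick instead of a cluster followed by a gap.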
Solutions under investigation: aligning to period size
Instead of using a merge threshold, align the analysis to blocks of a full period of data. A big part of the problem seems to be that the hardware pointer may be sampled in the middle of a DMA burst. So what if the analysis is forced to happen only at period intervals? This would ensure samples are not taken in the middle of DMA bursts, while the rest of the analysis can be done with the existing logic.
This solution could hide problems on non-SOF platforms, so it would have to be a dedicated option.
UPDATE Jan 12th: updating merge threshold from median step size to period size does seem to solve any fundamental issue at least for the standard 1ms LL tick configuration (see comment #8717 (comment)). To cover cases with a larger host buffer ("deepbuffer" in some SOF topologies), this may still be applicable.
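A minimal sketch of period-aligned sampling, under the assumption that the period size is a whole multiple of the DMA burst size so that the first sample at or past a period boundary lands at the end of a burst. The helper `align_to_periods` is hypothetical and not part of alsa_conformance_test.

```python
def align_to_periods(points, period_frames):
    """Keep one point per completed period.

    points: time-ordered (time_s, cumulative_frames) pairs.
    A point is kept when the pointer first reaches or passes the next
    period boundary; everything in between is dropped.
    """
    aligned = []
    next_boundary = period_frames
    for t, frames in points:
        if frames >= next_boundary:
            aligned.append((t, frames))
            # Advance past every boundary covered by this single step.
            next_boundary = (frames // period_frames + 1) * period_frames
    return aligned


# Illustrative data: 6 ticks of 1 ms, each a burst of 4 x 12 frames.
points = [(tick * 1e-3 + c * 1e-5, tick * 48 + (c + 1) * 12)
          for tick in range(6) for c in range(4)]
aligned = align_to_periods(points, period_frames=96)
```

The remaining points can be fed to the existing rate analysis unchanged. If the period is not a multiple of the burst size, a kept point may still fall inside a burst, which is one reason this remains a sketch.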
Solutions under investigation: exposing size of host DMA buffer
In many SOF setups, the host DMA buffer size is a small multiple of the Low-Latency scheduler tick length (typically 1ms). This is not always the case: "deep buffer" configurations may define a much longer buffer (e.g. 10ms). Notably, this duration will be shorter than the ALSA period size. If this information were available to user-space, measurements could be aligned (either via the merge threshold or a new method) to avoid DMA bursts. OTOH, it's not clear how much benefit this has over just using the period size.
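For a sense of scale, the durations above translate into frame counts as follows. The 48 kHz rate and the 2400-frame (50 ms) ALSA period are assumptions chosen for illustration only.

```python
def ms_to_frames(duration_ms, rate_hz):
    """Convert a duration in milliseconds to a whole number of frames."""
    return rate_hz * duration_ms // 1000


rate_hz = 48000                              # assumed sample rate
ll_tick_frames = ms_to_frames(1, rate_hz)    # 1 ms LL tick    -> 48 frames
deep_frames = ms_to_frames(10, rate_hz)      # 10 ms deep buffer -> 480 frames
period_frames = 2400                         # hypothetical 50 ms ALSA period
bursts_per_period = period_frames // deep_frames  # 5 deep-buffer bursts per period
```

If the host DMA buffer size were exposed, a test could use a value like `deep_frames` as its merge or alignment granularity instead of the full period.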
Alternative solutions
In the linked bugs, it has been pointed out that a more robust test for sample rate correctness and stability is to play back or capture a known reference tone and analyze its characteristics (as described in #4617 (comment)). This is especially important because the ALSA conformance test relies on the system clock (MONOTONIC_RAW source) to measure audio time, and this clock is not guaranteed to be aligned with the audio subsystem's clock. Analysis of a reference tone is immune to this error. alsa_conformance_test can catch failures of this type without an external test setup (reference tone generator/capture), so it has its own merits and uses (and it has long been used in ChromeOS and Android), but the limitations of its methodology should be noted.
New Tools to Debug
- alsa_conformance_test: add debug mode for step/avail analysis, kv2019i/cros-audiotest@d4d6f42. Allows visually plotting the snd_pcm_avail() updates and adds new debug output for analyzing behaviour with bursty DMA. Added Jan 12 2024.