-
Notifications
You must be signed in to change notification settings - Fork 349
Audio: Google RTC Audio Proc: Process copy() without read/write frag #5579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audio: Google RTC Audio Proc: Process copy() without read/write frag #5579
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just samples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is I was wondering how to name this. In first loop it's samples, in second loop it's frames. It seems that the optimizer makes variables more local so it should be the same to have two variables without increase in stack usage: samples and frames.
lkoenig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice pull request. Thanks a lot @singalsu to put it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename it to num_sample_remaining ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was rethinking it, and maybe we should stick to frames and handle the number of samples in the inner loop. The reason is that we can have different number of channel for:
- aec reference
- capture input
- capture output
But the number of frame should match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll rename if no issues with too long lines. Yep, I kept the possibility for different number of channels in ref, src, and snk.
The samples counting is more efficient so I used it in the reference loop. The frames without wrap function is using division that is slow on xtensa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also write: remain = cl.frames * cd->num_capture_channels where we initialize cd->num_capture_channels = 1 in google_rtc_audio_processing_create
I think that would make more sense.
Technically we could have (That might be for a follow up PR) different number of channels for capture input and capture output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I leave that up to you. I'm OK to change but counting frames here preserves possibility for different number of source and sink channes. What is your preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer counting frame everywhere converting to samples when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, here the frames counting is needed. The implementation is not most efficient, there's divide and a switch case in audio_stream_frame_bytes().
static inline uint32_t
audio_stream_frames_without_wrap(const struct audio_stream *source,
const void *ptr)
{
uint32_t bytes = audio_stream_bytes_without_wrap(source, ptr);
uint32_t frame_bytes = audio_stream_frame_bytes(source);
return bytes / frame_bytes;
}
While this function is a lot lighter, so I used in reference where it was possible.
static inline int
audio_stream_samples_without_wrap_s16(const struct audio_stream *source, const void *ptr)
{
int to_end = (int16_t *)source->end_addr - (int16_t *)ptr;
assert((intptr_t)source->end_addr >= (intptr_t)ptr);
return to_end;
}
e44c18a to
08045a4
Compare
|
looks good to me. |
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cujomalainey @lkoenig wont merge until Google approves
lkoenig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the code as it is and it did not work due to the small details.
Once I corrected those details, it work as intended.
Thanks a lot for putting the work in that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the number of AEC reference channels passed to the AEC might be lower than the number of channel in the stream. See comment in prepare about that.
Should be like:
num_samples_remaining = num_aec_reference_frames * cd->aec_reference->stream.channels;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here as cd->num_aec_reference_channels could be lower or equal cd->aec_reference->stream.channels one should make sure ref is incremented by cd->aec_reference->stream.channels at the end of the loop.
In that implementation, it is only incremented by cd->num_aec_reference_channels.
|
@singalsu Here is the diff I used to make it work: |
|
ping to @singalsu |
|
I'm sorry @lkoenig @cujomalainey @lgirdwood ! I got twice ill this spring that impacted my work, plus next week is vacation. I plan to start with with this after. If you feel this is needed earlier please take over this patch. |
|
No worries, just didnt want it to drop to the bottom of your inbox. Hope you are taking care of yourself. |
|
@singalsu Please do take care of you and if you are on vacation do enjoy your vacations. |
|
Can one of the admins verify this patch? |
|
@lkoenig Thanks! I've now included your changes and will test tomorrow that it works here before update the PR. |
This patch optimizes the copy() function by avoiding the per sample done circular wrap check. The source, sink, and reference buffers are read and written in maximum size blocks returned by audio stream function audio_stream_samples_without_wrap_s16() and audio_stream_frames_without_wrap(). The saving with mock-up algorithm version is 16 MCPS, from 29 to 13 MCPS in test with stereo 48 kHz 16 bit playback and capture. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
08045a4 to
5da6554
Compare
|
@lkoenig This version seemed to work here with mock-up algorithm, playback was mixed to first capture channel. Can you please check it's OK with the real version. |
|
Ping to @lkoenig |
cujomalainey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good, but I definitely think we can improve the code with some macros/helpers across all of the codebase, especially if we are getting c99 support.
This patch optimizes the copy() function by avoiding the per sample
done circular wrap check. The source, sink, and reference buffers
are read and written in maximum size blocks returned by audio stream
function audio_stream_samples_without_wrap_s16() and
audio_stream_frames_without_wrap().
The saving with mock-up algorithm version is 16 MCPS, from 29 to 13 MCPS.
Signed-off-by: Seppo Ingalsuo seppo.ingalsuo@linux.intel.com