Skip to content

Conversation

@singalsu
Copy link
Collaborator

This patch optimizes the copy() function by avoiding the per sample
done circular wrap check. The source, sink, and reference buffers
are read and written in maximum size blocks returned by audio stream
function audio_stream_samples_without_wrap_s16() and
audio_stream_frames_without_wrap().

The saving with mock-up algorithm version is 16 MCPS, from 29 to 13 MCPS.

Signed-off-by: Seppo Ingalsuo seppo.ingalsuo@linux.intel.com

@singalsu
Copy link
Collaborator Author

@lkoenig This draft will need a rebase after merge of your #5576.

@singalsu singalsu requested a review from lkoenig March 21, 2022 17:46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just samples?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is I was wondering how to name this. In first loop it's samples, in second loop it's frames. It seems that the optimizer makes variables more local so it should be the same to have two variables without increase in stack usage: samples and frames.

Copy link
Contributor

@lkoenig lkoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice pull request. Thanks a lot @singalsu to put it up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename it to num_sample_remaining ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was rethinking it, and maybe we should stick to frames and handle the number of samples in the inner loop. The reason is that we can have different number of channel for:

  • aec reference
  • capture input
  • capture output
    But the number of frame should match.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll rename if no issues with too long lines. Yep, I kept the possibility for different number of channels in ref, src, and snk.

The samples counting is more efficient so I used it in the reference loop. The frames without wrap function is using division that is slow on xtensa.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also write: remain = cl.frames * cd->num_capture_channels where we initialize cd->num_capture_channels = 1 in google_rtc_audio_processing_create
I think that would make more sense.

Technically we could have (That might be for a follow up PR) different number of channels for capture input and capture output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave that up to you. I'm OK to change but counting frames here preserves possibility for different number of source and sink channes. What is your preference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer counting frame everywhere converting to samples when needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, here the frames counting is needed. The implementation is not most efficient, there's divide and a switch case in audio_stream_frame_bytes().

static inline uint32_t
audio_stream_frames_without_wrap(const struct audio_stream *source,
				 const void *ptr)
{
	uint32_t bytes = audio_stream_bytes_without_wrap(source, ptr);
	uint32_t frame_bytes = audio_stream_frame_bytes(source);

	return bytes / frame_bytes;
}

While this function is a lot lighter, so I used in reference where it was possible.

static inline int
audio_stream_samples_without_wrap_s16(const struct audio_stream *source, const void *ptr)
{
	int to_end = (int16_t *)source->end_addr - (int16_t *)ptr;

	assert((intptr_t)source->end_addr >= (intptr_t)ptr);
	return to_end;
}

@andrula-song
Copy link
Contributor

looks good to me.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cujomalainey @lkoenig wont merge until Google approves

Copy link
Contributor

@lkoenig lkoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the code as it is and it did not work due to the small details.

Once I corrected those details, it work as intended.
Thanks a lot for putting the work in that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the number of AEC reference channels passed to the AEC might be lower than the number of channel in the stream. See comment in prepare about that.
Should be like:
num_samples_remaining = num_aec_reference_frames * cd->aec_reference->stream.channels;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as cd->num_aec_reference_channels could be lower or equal cd->aec_reference->stream.channels one should make sure ref is incremented by cd->aec_reference->stream.channels at the end of the loop.

In that implementation, it is only incremented by cd->num_aec_reference_channels.

@lkoenig
Copy link
Contributor

lkoenig commented Mar 24, 2022

@singalsu Here is the diff I used to make it work:

@@ -372,16 +377,16 @@ static int google_rtc_audio_processing_copy(struct comp_dev *dev)
 
        buffer_stream_invalidate(cd->aec_reference, num_aec_reference_bytes);
 
-       num_samples_remaining = num_aec_reference_frames * cd->num_aec_reference_channels;
+       num_samples_remaining = num_aec_reference_frames * cd->aec_reference->stream.channels;
        while (num_samples_remaining) {
                nmax = audio_stream_samples_without_wrap_s16(&cd->aec_reference->stream, ref);
                n  = MIN(num_samples_remaining, nmax);
                for (i = 0; i < n; i += cd->num_aec_reference_channels) {
                        j = cd->num_aec_reference_channels * cd->aec_reference_frame_index;
                        for (channel = 0; channel < cd->num_aec_reference_channels; ++channel) {
-                               cd->aec_reference_buffer[j++] = *ref;
-                               ref++;
+                               cd->aec_reference_buffer[j++] = ref[channel];
                        }
+                       ref += cd->aec_reference->stream.channels;
                        ++cd->aec_reference_frame_index;
 
                        if (cd->aec_reference_frame_index == cd->num_frames) {

@cujomalainey
Copy link
Contributor

ping to @singalsu

@singalsu
Copy link
Collaborator Author

I'm sorry @lkoenig @cujomalainey @lgirdwood ! I got twice ill this spring that impacted my work, plus next week is vacation. I plan to start with with this after. If you feel this is needed earlier please take over this patch.

@cujomalainey
Copy link
Contributor

No worries, just didnt want it to drop to the bottom of your inbox. Hope you are taking care of yourself.

@lkoenig
Copy link
Contributor

lkoenig commented Apr 11, 2022

@singalsu Please do take care of you and if you are on vacation do enjoy your vacations.
Have a look at this one once you are rested and recovered.
Take care !

@lgirdwood lgirdwood added this to the v2.2 milestone Apr 19, 2022
@sys-pt1s
Copy link

Can one of the admins verify this patch?

@singalsu
Copy link
Collaborator Author

@lkoenig Thanks! I've now included your changes and will test tomorrow that it works here before update the PR.

This patch optimizes the copy() function by avoiding the per sample
done circular wrap check. The source, sink, and reference buffers
are read and written in maximum size blocks returned by audio stream
function audio_stream_samples_without_wrap_s16() and
audio_stream_frames_without_wrap().

The saving with mock-up algorithm version is 16 MCPS, from 29 to
13 MCPS in test with stereo 48 kHz 16 bit playback and capture.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
@singalsu singalsu force-pushed the grtcproc_read_write_frag_optimize branch from 08045a4 to 5da6554 Compare April 22, 2022 08:31
@singalsu
Copy link
Collaborator Author

@lkoenig This version seemed to work here with mock-up algorithm, playback was mixed to first capture channel. Can you please check it's OK with the real version.

@singalsu singalsu requested a review from lkoenig April 22, 2022 08:34
@cujomalainey
Copy link
Contributor

Ping to @lkoenig

Copy link
Contributor

@cujomalainey cujomalainey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good, but I definitely think we can improve the code with some macros/helpers across all of the codebase, especially if we are getting c99 support.

@lgirdwood lgirdwood merged commit 89e50e0 into thesofproject:main May 18, 2022
@singalsu singalsu deleted the grtcproc_read_write_frag_optimize branch August 26, 2022 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants