Skip to content

Conversation

@singalsu
Copy link
Collaborator

@singalsu singalsu commented Dec 3, 2021

This patch replaces the per sample done circular copies with
audio_stream_read/write_frag_s16/32() functions calls with
max. length linear copies.

The original copy function for >1ch sink is just 1:1 copy
from source to sink without controllable channel select. This
patch preserves the functionality and implements the copy with
memcpy().

Signed-off-by: Seppo Ingalsuo seppo.ingalsuo@linux.intel.com

@singalsu singalsu marked this pull request as ready for review December 7, 2021 13:43
@singalsu singalsu requested review from lgirdwood and removed request for lgirdwood December 7, 2021 13:43
@singalsu singalsu changed the title Audio: Select: Optimize data copy operations [Do not merge]Audio: Select: Optimize data copy operations Dec 7, 2021
@singalsu
Copy link
Collaborator Author

singalsu commented Dec 7, 2021

I'm now able to test in device, there's some problem in this version that needs to be fixed.

@singalsu singalsu force-pushed the selector_buffers_access_optimize branch from 40e2560 to 244f466 Compare December 9, 2021 14:34
@singalsu singalsu requested a review from keyonjie December 9, 2021 14:34
@singalsu singalsu changed the title [Do not merge]Audio: Select: Optimize data copy operations Audio: Select: Optimize data copy operations Dec 9, 2021
@singalsu
Copy link
Collaborator Author

singalsu commented Dec 9, 2021

Here's plot of processing times for s16 and s24 data before and after optimize:
Screenshot from 2021-12-09 17-13-20

@singalsu
Copy link
Collaborator Author

singalsu commented Dec 9, 2021

For s16 data the processing time dropped from average 43 us to 10 us, or 17 MCPS to 4 MCPS.

For s24 data processing times drop is quite similar from 42 us to 10 us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not assert here, we should return an error with a trace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have config check before copy() streaming starts in params or prepare. There's some checks against max channel count but I could not find this. I'll see if I can add it there and remove this assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this will always be a multiple of 2 can we shift by sample_size (2 or 4) and mult by chans ? Means no divide.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samples count can be computed with shift but to get frames count would need to divide with channels. I'll see if I can build the inner loop other way to avoid divide here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of way to make this work with e.g. 3 channel to 1 without divide. The inner loop must operate with frames count and I can't change it to bytes (increments of frame bytes) because bytes max from source and sink are not "same thing". At least now I think so. The divide happens only max. twice per copy() so the cost is not high.

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor comment, nice optimisations otherwise!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a huge optimisation, but since you're using memcpy() now, src and dst don't need to be 16-bit pointers. You could make them 8-bit pointers and avoid shifting here. It might seem a bit counterintuitive to some, that we use 8-bit pointers in an S16LE copying function, so, it's up to you whether to make this change. There are more cases like this in this PR, possibly also in other your copy optimisation PRs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would allow in compiled code to go from bytes to samples to bytes. I didn't do it because I was wondering if int8_t would get criticism. Let's re-try with bytes, it's at least most optimal code.

This patch replaces the per sample done circular copies with
audio_stream_read/write_frag_s16/32() functions calls with
max. length linear copies.

The original copy function for >1ch sink is just 1:1 copy
from source to sink without controllable channel select. This
patch preserves the functionality and implements the copy with
memcpy().

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The optimized copy() functions could output data outside source
buffer without this check. The test in params() checked only
channel select parameter vs. max channels count. The changed code
checks it vs. actual stream channels count. There's in params() a
channels count check so this one could be modified.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
@singalsu singalsu force-pushed the selector_buffers_access_optimize branch from 244f466 to a8c2960 Compare December 14, 2021 16:50
@lgirdwood lgirdwood merged commit e917018 into thesofproject:main Dec 16, 2021
@singalsu singalsu deleted the selector_buffers_access_optimize branch September 16, 2022 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants