Audio: Optimize IIR performance #4940
Note: There's still plenty of read/write frag buffer access remaining elsewhere. I will address them later in another PR.
0d11189 to
203160e
Compare
src/math/iir_df2t_generic.c
Outdated
These should all be static inline in the header.
Sure, I need to think about how to handle the generic vs. HiFi3 headers in a nice way. Here I was hoping the compiler would do the inlining, but of course that's not guaranteed.
Not when this is an archive. You can do:
/* the header file */
#if USE_THE_STATIC_INLINE_VERSION
static inline int16_t iir_df2t_s16(struct iir_state_df2t *iir, int16_t x)
{
	return sat_int16(Q_SHIFT_RND(iir_df2t(iir, ((int32_t)x) << 16), 31, 15));
}
#else
/* just declare func */
int16_t iir_df2t_s16(struct iir_state_df2t *iir, int16_t x);
#endif
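To make the output scaling in the wrapper above concrete, here is a minimal sketch of converting a Q1.31 filter output to a 16-bit sample with rounding and saturation. The helper names `q_shift_rnd`, `sat_s16`, and `q31_to_s16` are hypothetical stand-ins written for this illustration; they are not the project's `Q_SHIFT_RND` or `sat_int16`, whose exact definitions are not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: convert x from src_q to dst_q fractional bits
 * (src_q > dst_q) with round-to-nearest. Assumes >> on a negative value
 * is an arithmetic shift, which is implementation-defined in C. */
static inline int32_t q_shift_rnd(int32_t x, int src_q, int dst_q)
{
	return ((x >> (src_q - dst_q - 1)) + 1) >> 1;
}

/* Hypothetical helper: clamp a 32-bit value to the int16_t range. */
static inline int16_t sat_s16(int32_t x)
{
	if (x > INT16_MAX)
		return INT16_MAX;
	if (x < INT16_MIN)
		return INT16_MIN;
	return (int16_t)x;
}

/* Q1.31 filter output to Q1.15 sample. Rounding can push the largest
 * inputs to 32768, so the saturation step is required. */
static inline int16_t q31_to_s16(int32_t y)
{
	return sat_s16(q_shift_rnd(y, 31, 15));
}
```

Because all three helpers are `static inline` and live in a header in this scheme, the compiler can expand them at each call site even when the filter core is built into a separate archive.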
@lgirdwood I'm testing an approach where iir_df2t.h contains the following code chunk:
/* Inline functions with or without HiFi3 intrinsics */
#if IIR_HIFI3
#include "iir_df2t_hifi3.h"
#else
#include "iir_df2t_generic.h"
#endif
The inline functions, e.g. in iir_df2t_hifi3.h, look like this:
static inline int16_t iir_df2t_s16(struct iir_state_df2t *iir, int16_t x)
{
	ae_f32x2 y = iir_df2t(iir, ((int32_t)x) << 16);
	return AE_ROUND16X4F32SSYM(y, y);
}
Is this OK?
Should be, but what is the iir_df2t definition?
Looks like a nice improvement to me, thanks a lot @singalsu
A nice optimisation! Not very obvious either - this mostly just seems to remove checking for buffer wrapping on each sample access! But yes, making some of those one-line wrapper functions inline would help a bit more!
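The idea described above, moving the wrap check out of the per-sample path, can be sketched as follows. This is an illustration under stated assumptions: `struct ring`, `samples_to_wrap`, `ring_advance`, and `process_chunked` are hypothetical names, not the project's buffer API, and the halving operation is only a placeholder for the per-sample filter call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical circular buffer of int32_t samples. */
struct ring {
	int32_t *base;	/* start of storage */
	size_t size;	/* capacity in samples */
	size_t pos;	/* current index, 0 <= pos < size */
};

/* Number of samples from pos to the end of storage. */
static size_t samples_to_wrap(const struct ring *r)
{
	return r->size - r->pos;
}

/* Advance by n <= samples_to_wrap(r) samples, wrapping at the end. */
static void ring_advance(struct ring *r, size_t n)
{
	r->pos += n;
	if (r->pos >= r->size)
		r->pos -= r->size;
}

/* Process n samples. The wrap check runs once per contiguous chunk,
 * so the inner loop works on plain pointers with no wrap test. */
static void process_chunked(struct ring *src, struct ring *dst, size_t n)
{
	while (n > 0) {
		size_t chunk = n;
		size_t s = samples_to_wrap(src);
		size_t d = samples_to_wrap(dst);

		if (chunk > s)
			chunk = s;
		if (chunk > d)
			chunk = d;

		const int32_t *in = src->base + src->pos;
		int32_t *out = dst->base + dst->pos;

		for (size_t i = 0; i < chunk; i++)
			out[i] = in[i] / 2;	/* placeholder for the filter */

		ring_advance(src, chunk);
		ring_advance(dst, chunk);
		n -= chunk;
	}
}
```

With typical period sizes the outer loop runs only once or twice per call, while the inner loop is a tight pointer walk that the compiler can schedule freely.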
src/math/iir_df2t_hifi3.c
Outdated
Here's a mistake. This instruction does not saturate to 24 bits. In addition, it needs a shift left by 8 with saturation followed by a shift right by 8. I think my cmocka test for 24 bit treated S24_LE as S32_LE, so it can't detect overflow in 24 bits.
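The fix described above, a saturating shift left by 8 followed by a shift right by 8, can be illustrated in portable C. This is only a sketch of the arithmetic, not the HiFi3 code from the patch; `sat_shl8` and `sat_s24` are hypothetical names.

```c
#include <stdint.h>

/* Shift left by 8 with saturation to 32 bits. The multiply in 64 bits
 * avoids left-shifting a negative value, which is undefined in C. */
static inline int32_t sat_shl8(int32_t x)
{
	int64_t y = (int64_t)x * 256;

	if (y > INT32_MAX)
		return INT32_MAX;
	if (y < INT32_MIN)
		return INT32_MIN;
	return (int32_t)y;
}

/* Saturate x to the signed 24-bit range [-8388608, 8388607]. Any value
 * outside that range overflows the saturating left shift, and the right
 * shift brings the clamped result back to 24-bit scale. Assumes >> on a
 * negative value is an arithmetic shift (implementation-defined). */
static inline int32_t sat_s24(int32_t x)
{
	return sat_shl8(x) >> 8;
}
```

A unit test that feeds values just outside the 24-bit range, such as 8388608 and -8388609, would catch the missing saturation that a test treating the data as 32-bit cannot.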
This patch optimizes the buffer copying and the output scaling to formats other than int32_t. The main saving comes from not calling the read/write frag buffer access functions for every sample. The savings in processing cycles vary per platform, but a second-order IIR stereo EQ on TGL shows a 43% improvement in average copy() duration. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
203160e to
53ed0ff
Compare
kv2019i
left a comment
Nice!