Skip to content

Conversation

@singalsu
Copy link
Collaborator

This patch optimizes the buffer copying and output scaling to
other format than int32_t. The main saving is from not using
read/write frag buffer access functions for every sample.

The saving of processing cycles consumption varies per platform
but a second order IIR stereo EQ on TGL shows 43% improvement for
average copy() duration.

Signed-off-by: Seppo Ingalsuo seppo.ingalsuo@linux.intel.com

@singalsu singalsu requested a review from keyonjie October 29, 2021 17:06
@singalsu
Copy link
Collaborator Author

Note: There's still plenty of read/write frag buffer access remaining elsewhere. I will address them later in another PR.

@singalsu singalsu force-pushed the eqiir_readwritefrags_optimize branch from 0d11189 to 203160e Compare October 29, 2021 17:22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be static inline in the header.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, need to think how to make in a nice way the generic vs. hifi3 headers. Here I was hoping compiler does the inline but it's of course not guaranteed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not when this is an archive. you can do

/* the header file */
#if USE_THE_STATIC_INLINE_VERSION
static inline int16_t iir_df2t_s16(struct iir_state_df2t *iir, int16_t x)
{
	return sat_int16(Q_SHIFT_RND(iir_df2t(iir, ((int32_t)x) << 16), 31, 15));
}
#else
/* just declare func */
int16_t iir_df2t_s16(struct iir_state_df2t *iir, int16_t x);
#endif  

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood I'm testing approach to have in iir_df2t.h next code chunk.

/* Inline functions with or without HiFi3 intrinsics */
#if IIR_HIFI3
#include "iir_df2t_hifi3.h"
#else
#include "iir_df2t_generic.h"
#endif

The inline functions e.g. in iir_df2t_hifi3.h are like:

static inline int16_t iir_df2t_s16(struct iir_state_df2t *iir, int16_t x)
{
	ae_f32x2 y = iir_df2t(iir, ((int32_t)x) << 16);

	return AE_ROUND16X4F32SSYM(y, y);
}

Is such OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be , but what is the iir_df2t definition ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keyonjie
Copy link
Contributor

keyonjie commented Nov 1, 2021

Looks nice improvement to me, thanks a lot @singalsu

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nice optimisation! Not very obvious either - this mostly just seems to remove checking for buffer wrapping on each sample access! But yes, making some of those one-line wrapper functions inline would help a bit more!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a mistake. This instruction does not saturate to 24 bits. Need in addition to shift left with saturation by 8, shift right by 8. I think my cmocka test for 24 bit treated S24_LE as S32_LE so it can't detect overflow in 24 bits.

This patch optimizes the buffer copying and output scaling to
other format than int32_t. The main saving is from not using
read/write frag buffer access functions for every sample.

The saving of processing cycles consumption varies per platform
but a second order IIR stereo EQ on TGL shows 43% improvement for
average copy() duration.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
@singalsu singalsu force-pushed the eqiir_readwritefrags_optimize branch from 203160e to 53ed0ff Compare November 3, 2021 17:00
@singalsu singalsu marked this pull request as ready for review November 4, 2021 10:05
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@lgirdwood lgirdwood merged commit 73f9814 into thesofproject:main Nov 8, 2021
@singalsu singalsu deleted the eqiir_readwritefrags_optimize branch September 15, 2022 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants