@andrula-song
Contributor

Using the Xtensa intrinsic instructions directly saves at least 10% of the cycles for those functions and about 0.92 MCPS for the DRC component.

@andrula-song requested a review from singalsu on November 2, 2023 02:40
@andrula-song
Contributor Author

andrula-song commented Nov 2, 2023

Here are the Xtensa simulator results for drc_math. Compared with the original functions, using the instructions directly saves about:
log10_fixed: 12.8% of cycles
drc_lin2db_fixed: 11.1% of cycles
drc_log_fixed: 10.0% of cycles
drc_asin_fixed: 17.1% of cycles
drc_inv_fixed: 12.7% of cycles

Tested with xt-testbench (32-bit, 48 kHz) on TGL: before the optimization we measured 150.05 MCPS (including many trace prints and module adapter operations) and after it 149.13 MCPS, a saving of about 0.92 MCPS for the DRC component.

ae_f32 exp; /* Q7.25 */
ae_f32 acc; /* Q6.26 */
ae_f32 tmp; /* Q6.26 */
ae_f64 tmp64;
Collaborator

So there is overhead in the inline functions; maybe it comes from using 26 as a literal instead of a variable? Now that the function has been removed, you could add a comment saying that these instructions normalize the value.

Contributor Author

I tested this in the Xtensa test bench; the function wrapper really does cost more cycles than using the instructions directly.

x = drc_mult_lshift(x, ONE_OVER_SQRT2_Q30, lshift);
tmp64 = AE_MULF32R_LL(x, ONE_OVER_SQRT2_Q30);
/* drc_get_lshift(30, 30, 30) = 1 */
tmp64 = AE_SLAI64S(tmp64, 1);
Collaborator

You could use a #define macro for the magic shift values, to make it clear that they are for a Qx to Qy conversion.

Contributor Author

If I use a macro, I cannot use AE_SLAI64S, and with AE_SLAA64S the cycle saving for log10_fixed drops from 12.8% to 12.2%. So it is better to bring back the lshift calculation for code readability.
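
For readers less familiar with Q formats, the shift discussed in this thread can be illustrated in portable C. This is a minimal sketch and not the PR's code: `q_mul` is a hypothetical helper that uses plain 64-bit arithmetic instead of the HiFi intrinsics, it does not saturate, and the value given for `ONE_OVER_SQRT2_Q30` is computed here as round(2^30 / sqrt(2)) rather than copied from the SOF sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: round(2^30 / sqrt(2)), i.e. 1/sqrt(2) in Q2.30. */
#define ONE_OVER_SQRT2_Q30 759250125

/* Hypothetical helper (not in the PR): multiply a Qx value by a Qy value
 * and return the result in Qz, using ordinary integer arithmetic.
 * No saturation is applied; an arithmetic right shift is assumed for
 * negative products.
 */
static int32_t q_mul(int32_t a, int32_t b, int qx, int qy, int qz)
{
	int rshift = qx + qy - qz;	/* the raw product has qx + qy fractional bits */
	int64_t p = (int64_t)a * b;

	assert(rshift > 0 && rshift < 63);
	p += (int64_t)1 << (rshift - 1);	/* round to nearest */
	return (int32_t)(p >> rshift);
}
```

For fixed input and output formats the shift depends only on `qx`, `qy` and `qz`, which is why it can be written as a constant at each call site, as the diff does.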

int32_t lshift;
int32_t e;
/* drc_get_lshift(25, 30, 25) = 1 */
int32_t lshift = 1;
Collaborator

Use a #define macro for the shift value of a Qx by Qy multiply with a Qz result?
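
One way the suggested macro could look is sketched below. The macro name is hypothetical, and the formula `qz - qx - qy + 31` is an assumption: it reproduces both values quoted in the diff comments (`drc_get_lshift(30, 30, 30) = 1` and `drc_get_lshift(25, 30, 25) = 1`), but the body of `drc_get_lshift` is not shown in this conversation. Whether the Xtensa compiler accepts such a macro as the immediate operand of `AE_SLAI64S` is not verified here.

```c
#include <assert.h>

/* Hypothetical macro: left shift needed after a fractional 32x32 multiply
 * of a Qx value by a Qy value to obtain a Qz result. The "+ 31" assumes
 * the multiply treats both operands as Q1.31.
 */
#define DRC_Q_LSHIFT(qx, qy, qz) ((qz) - (qx) - (qy) + 31)

/* Compile-time checks against the two values quoted in the diff comments. */
static_assert(DRC_Q_LSHIFT(30, 30, 30) == 1, "Q30 x Q30 -> Q30");
static_assert(DRC_Q_LSHIFT(25, 30, 25) == 1, "Q25 x Q30 -> Q25");
```

Because the macro expands to an integer constant expression, the formats stay visible at the call site while the value is still folded at compile time in standard C.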

Using the Xtensa intrinsic instructions directly saves
at least 10% of the cycles for those functions, and about
0.9 MCPS for the DRC component.

Signed-off-by: Andrula Song <andrula.song@intel.com>
Member

@lgirdwood left a comment

Good improvement.

@lgirdwood
Member

@andrula-song what's the reason for closing this if it saves MCPS?

@andrula-song
Contributor Author

andrula-song commented Jan 18, 2024

> @andrula-song what's the reason for closing this if it saves MCPS?

Sorry, this was closed by mistake. I force-pushed to the branch and cannot reopen it, so I created a new one: #8757
