@LouisYZK LouisYZK commented Jan 23, 2022

Run on (8 X 1400 MHz CPU s), 4 cores / 8 threads, macOS

| Before | After | Speedup |
| --- | --- | --- |
| fill: 1.23536s | fill: 0.256878s | 4.80 |
| fill: 1.25023s | fill: 0.264094s | 4.80 |
| saxpy: 0.048762s | saxpy: 0.03981s | 1.22 |
| sqrtdot: 0.086141s | sqrtdot: 0.031361s | 2.74 |
| 5165.4 | 5792.62 | x |
| minvalue: 0.07746s | minvalue: 0.01506s | 5.14 |
| -1.11803 | -1.11803 | - |
| magicfilter: 0.426949s | magicfilter: 0.183792s | 2.32 |
| 55924034 | 55924034 | - |
| scanner: 0.080046s | scanner: 0.032603s | 2.45 |
| 5.28566e+07 | 5.28591e+07 | - |

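The sqrtdot results differ, and the speedup is not as good as hoped. I haven't figured out why yet.

Figured it out: it's a floating-point precision issue.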
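One way a precision issue like this can show up: floating-point addition is not associative, so a reduction that adds up per-chunk partial sums (as a parallel reduction does) can give a different result from a serial left-to-right loop. The sketch below is not the PR's code. The function names `sequential_sum`, `chunked_sum`, and `sample_input` are hypothetical, and it assumes IEEE 754 single-precision `float` without fast-math reassociation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Left-to-right sum, as a plain serial loop would compute it.
float sequential_sum(const std::vector<float>& v) {
    float total = 0.0f;
    for (float x : v) total += x;
    return total;
}

// Sum each chunk separately, then add the partial sums together.
// This mimics the order of operations in a chunked (parallel) reduction.
float chunked_sum(const std::vector<float>& v, std::size_t chunk) {
    float total = 0.0f;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, v.size());
        float partial = 0.0f;
        for (std::size_t i = begin; i < end; ++i) partial += v[i];
        total += partial;
    }
    return total;
}

// Hypothetical input: one large value followed by fifteen ones.
// Near 1e8 adjacent floats are 8 apart, so adding 1.0f to 1e8f
// one at a time is rounded away, while adding 8.0f at once is not.
std::vector<float> sample_input() {
    std::vector<float> v(16, 1.0f);
    v[0] = 1e8f;
    return v;
}
```

Under those assumptions, `sequential_sum(sample_input())` stays at 100000000 while `chunked_sum(sample_input(), 8)` reaches 100000008, even though both add exactly the same numbers. Neither matches the exact sum of 100000015; the two orders simply round differently.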

tbb::parallel_for(
    tbb::blocked_range<size_t>(0, std::min(x.size(), y.size())),
    [&] (tbb::blocked_range<size_t> r) {
        std::vector<T> tmp_vec;

How about adding tmp_vec.reserve(r.size())?
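The reason behind this suggestion can be sketched without TBB: a `std::vector` that grows through `push_back` reallocates several times, whereas calling `reserve` with the known element count first makes a single allocation. The helper `count_reallocations` and its `reserve_first` flag are hypothetical names for this illustration only, and `n` stands in for `r.size()`.

```cpp
#include <cstddef>
#include <vector>

// Append n values to a local vector and count how many times its capacity
// changes during the push_back loop, i.e. how many reallocations occurred.
// `reserve_first` is a hypothetical switch used only for this comparison.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> tmp_vec;
    if (reserve_first) tmp_vec.reserve(n);  // single allocation up front

    std::size_t reallocations = 0;
    std::size_t last_capacity = tmp_vec.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        tmp_vec.push_back(static_cast<int>(i));
        if (tmp_vec.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = tmp_vec.capacity();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, the loop triggers no reallocation, since `reserve(n)` guarantees room for `n` elements. Without it, the count depends on the standard library's growth policy but is at least one. Inside a parallel loop body this cost is paid once per chunk, so it can add up.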
