@QifanWang

Platform: WSL2 running Ubuntu 20.04; CPU with 8 cores / 16 threads

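Changes made with the TBB library:

  • The functions fill and saxpy are parallelized with parallel_for;
  • The functions sqrtdot and minvalue are parallelized with parallel_reduce;
  • The function magicfilter follows the approach recommended in the slides: space is reserved in advance for res and for the temporary vector, and res is locked while it is written to;
  • The function scanner uses parallel_scan.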

Timing comparison:

| function    | serial     | parallel   | speedup |
|-------------|------------|------------|---------|
| fill(1)     | 0.975237s  | 0.244887s  | 3.98    |
| fill(2)     | 1.07456s   | 0.264752s  | 4.06    |
| saxpy       | 1.25815s   | 0.0610303s | 20.62   |
| sqrtdot     | 0.0876327s | 0.0469867s | 1.87    |
| minvalue    | 0.0868141s | 0.0236882s | 3.66    |
| magicfilter | 0.34048s   | 0.262888s  | 1.30    |
| scanner     | 0.0916984s | 0.0658602s | 1.39    |

A document that is fairly easy to read and understand
