@daohwang commented Feb 4, 2022

  1. fill: uses parallel_for
  2. saxpy: uses parallel_for
  3. sqrtdot: uses parallel_reduce
  4. minvalue: uses parallel_reduce
  5. magicfilter: uses the method from Xiao Peng's (小彭老师) slides
  6. scanner: uses parallel_scan directly
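To illustrate the reduction pattern that sqrtdot and minvalue rely on, here is a minimal sketch of a chunked minimum. It is not the code from this PR and it does not use TBB: plain std::thread stands in for parallel_reduce so the example has no external dependency. The function name chunked_min and the num_workers parameter are hypothetical. The structure is the same as a parallel reduction: an identity value, a per-chunk local reduction, and a final join of the partial results.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical helper (not from the PR): minimum of x computed chunk by chunk.
// Each worker thread reduces one contiguous chunk into its own slot, then the
// partial results are joined serially. std::thread is used here as a stand-in
// for a TBB parallel_reduce.
float chunked_min(const std::vector<float>& x, std::size_t num_workers) {
    // Identity of the min operation: min(identity, v) == v for every v.
    const float identity = std::numeric_limits<float>::infinity();
    if (x.empty() || num_workers == 0) return identity;
    num_workers = std::min(num_workers, x.size());

    std::vector<float> partial(num_workers, identity);
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    const std::size_t chunk = (x.size() + num_workers - 1) / num_workers;

    for (std::size_t w = 0; w < num_workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(begin + chunk, x.size());
        workers.emplace_back([&x, &partial, w, begin, end, identity] {
            // Reduce into a local variable and write the slot once at the end.
            float local = identity;
            for (std::size_t i = begin; i < end; ++i) local = std::min(local, x[i]);
            partial[w] = local;
        });
    }
    for (auto& t : workers) t.join();

    // Join step: combine the per-chunk minima.
    return *std::min_element(partial.begin(), partial.end());
}
```

Calling chunked_min(data, 16) would use one worker per hardware thread on the machine described here. A library scheduler such as TBB would normally split the range adaptively rather than into fixed equal chunks, which this sketch does not attempt.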

Hardware: AMD, 8 cores / 16 threads
Speedups:

  1. fill: 6.61x
  2. saxpy: 1.04x
  3. sqrtdot: 2.50x
  4. minvalue: 2.30x
  5. magicfilter: 0.76x (actually slower)
  6. scanner: 1.36x
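One way to read these numbers is as parallel efficiency, that is, speedup divided by the number of workers. The sketch below only restates the figures reported above. The worker count is an assumption, since the PR does not say how many threads were actually used; 8 (physical cores) and 16 (hardware threads) are both plausible choices. The names Result, kResults, parallel_efficiency, and print_efficiencies are hypothetical.

```cpp
#include <cstdio>

// One benchmark entry: task name and the speedup reported in this PR.
struct Result {
    const char* name;
    double speedup;
};

constexpr Result kResults[] = {
    {"fill", 6.61},     {"saxpy", 1.04},       {"sqrtdot", 2.50},
    {"minvalue", 2.30}, {"magicfilter", 0.76}, {"scanner", 1.36},
};

// Parallel efficiency: measured speedup divided by the number of workers.
// A value of 1.0 would mean perfect linear scaling.
double parallel_efficiency(double speedup, int workers) {
    return speedup / workers;
}

// Print every task with its efficiency for an assumed worker count.
void print_efficiencies(int workers) {
    for (const Result& r : kResults) {
        std::printf("%-12s %.2fx  efficiency %.1f%%\n", r.name, r.speedup,
                    100.0 * parallel_efficiency(r.speedup, workers));
    }
}
```

With an assumed 8 workers, fill at 6.61x is the only task close to linear scaling, and any speedup below 1.0x (magicfilter) means the parallel version is slower than the serial one regardless of the worker count.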
