@delve-wang

The code uses a task group with at most 8 threads.
Before optimization:
fill: 0.840091s
fill: 0.761348s
saxpy: 0.0335867s
sqrtdot: 0.0801362s
5165.4
minvalue: 0.124584s
-1.11803
magicfilter: 0.21705s
55924034
scanner: 0.0785354s
6.18926e+07
After optimization:
fill: 0.130924s, 6.416x
fill: 0.132913s, 5.728x
saxpy: 0.0101733s, 3.325x
sqrtdot: 0.0241576s, 3.317x
5792.62
minvalue: 0.0222686s, 5.596x
-1.11803
magicfilter: 0.0409125s, 5.305x
55924034
scanner: 0.0292774s, 2.689x
0

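  • Used parallel for to speed up fill and saxpy; the speedup is good for fill but weaker for saxpy, because saxpy does too little computation;
  • Used parallel reduce to speed up sqrtdot and minvalue;
  • Applied parallel scan directly to scanner;
  • magicfilter: sped up with a lock-free approach, using the pod wrapper.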
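
As a rough illustration of the lock-free idea in the magicfilter item, here is a minimal sketch, not the PR's actual code. It assumes plain `std::thread` workers in place of the task group, and the names `chunked_parallel_for` and `lockfree_filter` are hypothetical. Each worker collects its matches locally, then reserves a slice of the shared output with a single atomic `fetch_add`, so no mutex is needed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: split [0, n) into at most `max_threads` contiguous
// chunks and run body(begin, end) on each chunk in its own thread.
template <class Body>
void chunked_parallel_for(std::size_t n, std::size_t max_threads, Body body) {
    assert(max_threads > 0);
    std::size_t nthreads = std::min(n, max_threads);
    if (nthreads == 0) return;  // nothing to do for an empty range
    std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([=] { body(begin, end); });
    }
    for (auto &w : workers) w.join();
}

// Hypothetical filter: keep every x[i] for which pred(x[i]) is true.
// No mutex is used; the order of the output elements is not preserved.
template <class Pred>
std::vector<float> lockfree_filter(const std::vector<float> &x, Pred pred) {
    std::vector<float> out(x.size());  // worst case: every element passes
    std::atomic<std::size_t> count{0};
    chunked_parallel_for(x.size(), 8, [&](std::size_t begin, std::size_t end) {
        // Stage matches locally so the shared counter is touched once per chunk.
        std::vector<float> local;
        for (std::size_t i = begin; i < end; ++i)
            if (pred(x[i])) local.push_back(x[i]);
        // fetch_add hands this worker a private slice [base, base + local.size()).
        std::size_t base = count.fetch_add(local.size(), std::memory_order_relaxed);
        std::copy(local.begin(), local.end(), out.begin() + base);
    });
    out.resize(count.load());  // safe: all workers have been joined
    return out;
}
```

A call such as `lockfree_filter(x, [](float v) { return v > 0.0f; })` returns the passing elements in an unspecified order. Note that `std::vector<float> out(x.size())` zero-fills the buffer before the workers overwrite it; the pod wrapper mentioned in the list is presumably there to skip that initialization, which this sketch does not attempt.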
