Conversation
|
Do we understand why? I'm guessing by adding the +1 you forced lots of things into the 'total > NumThreads()' branch which doesn't re-use the current thread. |
I didn't add the +1; I removed it. So the logic should stay the same, and it should still reuse the current thread. However, that little trick (reusing the thread) only improved performance by 10%, and I'm seeing a much bigger perf drop. |
If the machine has 4 CPUs but 5 threads, the number of CPU migrations will increase, all the math functions will slow down, and instructions per cycle will drop. So,
But the second one is 10% slower than the first one on the mlperf_resnet50 model. |
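For readers unfamiliar with the "reuse the current thread" trick discussed above, here is a minimal sketch of the general idea. It is not the onnxruntime or Eigen code: `ParallelRun` and its `reuse_caller` flag are hypothetical names, and plain `std::thread` stands in for a real thread pool.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper for illustration only (not an onnxruntime/Eigen API).
// Calls fn(i) for every i in [0, total). When reuse_caller is true, the last
// index runs on the calling thread, so only total - 1 extra threads are
// started. Returns the number of threads that were spawned.
int ParallelRun(int total, bool reuse_caller,
                const std::function<void(int)>& fn) {
  const int spawned = (reuse_caller && total > 0) ? total - 1 : total;
  std::vector<std::thread> workers;
  workers.reserve(spawned > 0 ? spawned : 0);
  for (int i = 0; i < spawned; ++i) {
    workers.emplace_back(fn, i);  // each worker gets its own copy of fn
  }
  if (reuse_caller && total > 0) {
    fn(total - 1);  // the caller does one share of the work itself
  }
  for (auto& w : workers) {
    w.join();
  }
  return spawned;
}
```

With `total = 4` and `reuse_caller = true`, three threads are started and the caller runs the fourth share, so a 4-core machine has four runnable threads rather than five.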
|
I wrote a test benchmark program without using onnxruntime (just Eigen).
Task:
    int sum = 0;
    for (int i = 0; i != 100000; ++i)
      sum += i;
    total += sum;
4 tasks, on a 4-core CPU.
The number in each benchmark name is the number of threads.
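A rough, self-contained approximation of the benchmark described above is sketched here. It uses `std::thread` instead of Eigen's thread pool, so absolute numbers will differ from the original. The names `RunTask`, `RunTasks`, and `TimeMicros` are made up for this sketch. A 64-bit accumulator is assumed because the sum of 0..99999 (4999950000) does not fit in a 32-bit `int`.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// One task: sum the integers 0..99999 (64-bit to avoid signed overflow).
inline std::int64_t RunTask() {
  std::int64_t sum = 0;
  for (int i = 0; i != 100000; ++i) sum += i;
  return sum;
}

// Runs num_tasks copies of the task on num_threads freshly started threads
// and returns the combined total. Workers claim tasks from a shared counter.
std::int64_t RunTasks(int num_tasks, int num_threads) {
  std::atomic<int> next{0};
  std::atomic<std::int64_t> total{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&next, &total, num_tasks] {
      while (next.fetch_add(1) < num_tasks) total += RunTask();
    });
  }
  for (auto& w : workers) w.join();
  return total.load();
}

// Average wall-clock time in microseconds for one RunTasks call.
// Note that this includes thread start-up cost, unlike a persistent pool.
double TimeMicros(int num_tasks, int num_threads, int repeats) {
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repeats; ++r) RunTasks(num_tasks, num_threads);
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(end - start).count() /
         repeats;
}
```

Calling `TimeMicros(4, n, repeats)` for `n` from 1 to 5 on a 4-core machine gives one timing per thread count, mirroring the "4 tasks, thread count in the benchmark name" layout.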
|
This reverts commit 166b1f8.
Description:
This reverts commit 166b1f8 because our perf dashboard shows it caused a significant perf degradation.
Note:
The second column was run with thread pool size = 4
The third column was run with thread pool size = 3
Motivation and Context