Add Timer class unifying CPU and GPU timer and use it in net_speed_benchmark #136
@mavenlin, I observed no significant difference among a pure CPU timer, a CPU timer with cudaDeviceSynchronize, and a cudaEvent_t-based GPU timer. Are these the weird results you mentioned?
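Why the three methods can differ: CUDA kernel launches return to the host before the GPU finishes, so a pure CPU timer around GPU code may measure only launch overhead unless the device is synchronized. They can also agree when the timed code already ends in a blocking call, such as a synchronous device-to-host cudaMemcpy. Below is a minimal C++ sketch of the two host-clock variants. `HostTimeMs` and the `USE_CUDA` macro are hypothetical names used only for illustration. Without `USE_CUDA`, the sync flag is ignored and only the pure CPU timer remains.

```cpp
#include <chrono>
#include <functional>
#ifdef USE_CUDA  // hypothetical build flag: define it and compile with nvcc
#include <cuda_runtime.h>
#endif

// Times `work` with the host wall clock and returns milliseconds.
// sync_device=false: pure CPU timer.
// sync_device=true: CPU timer with cudaDeviceSynchronize, which drains the
// device before starting and before stopping the clock (CUDA builds only).
double HostTimeMs(const std::function<void()>& work, bool sync_device) {
#ifdef USE_CUDA
  if (sync_device) cudaDeviceSynchronize();  // exclude previously queued GPU work
#endif
  const auto start = std::chrono::steady_clock::now();
  work();
#ifdef USE_CUDA
  if (sync_device) cudaDeviceSynchronize();  // wait for GPU work launched by `work`
#else
  (void)sync_device;  // unused in a CPU-only build
#endif
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To check whether asynchronous launches are skewing a CPU-side number, compare `HostTimeMs(work, false)` with `HostTimeMs(work, true)` on the same GPU work.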
Reopening to simplify future benchmarking work.
Please rebase on the latest dev and we'll merge. Thanks.
@shelhamer, it has been rebased and polished to pass the newly added cpplint checks.
So sorry, but this needs another rebase because of a complicated merge. If it's any consolation, that merge has prepared the reconciliation of the MKL and non-MKL versions of Caffe, brought in support for DAGs, improved the documentation, and reorganized the project. We are adopting a new development strategy that will not require constant rebasing. It will be documented shortly, but the bottom line is that we will no longer rewrite the history of
I am glad that Caffe is approaching version 1.0. The workflow that I used to rebase on the most recent merge is as follows. Absolutely clean history. |
@shelhamer, this utility class has been rebased and tested again. Please merge it for anyone interested in benchmarking runtime. Thanks!
@kloudkl Thanks! We're catching up on PRs now, so we hope to merge many of the new developments soon.
Add Timer class unifying CPU and GPU timer and use it in net_speed_benchmark
This resolves the concern about timing CUDA code raised in the discussion of #128.
http://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/
http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#performance-metrics
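The commit's idea is one timing interface over two clocks. The C++ sketch below illustrates it, but it is hypothetical and is not the Timer class merged into Caffe. The `USE_CUDA` macro, the `use_gpu` constructor flag, and the method names `Start`, `Stop`, and `MilliSeconds` are assumptions. On the GPU path, the sketch follows the cudaEvent_t pattern from the NVIDIA links above:

1. Record a start event and a stop event on the default stream.
2. Wait on the stop event with cudaEventSynchronize.
3. Read the interval with cudaEventElapsedTime.

Without `USE_CUDA`, the sketch falls back to std::chrono::steady_clock.

```cpp
#include <chrono>
#ifdef USE_CUDA  // hypothetical build flag; define it and compile with nvcc
#include <cuda_runtime.h>
#endif

// Hypothetical unified timer: the same Start/Stop/MilliSeconds calls work
// for both backends, so benchmark code need not branch on CPU vs. GPU.
class Timer {
  using Clock = std::chrono::steady_clock;

 public:
  explicit Timer(bool use_gpu) : use_gpu_(use_gpu) {
#ifdef USE_CUDA
    if (use_gpu_) {
      cudaEventCreate(&start_event_);
      cudaEventCreate(&stop_event_);
    }
#else
    use_gpu_ = false;  // no CUDA in this build: fall back to the host clock
#endif
  }

  ~Timer() {
#ifdef USE_CUDA
    if (use_gpu_) {
      cudaEventDestroy(start_event_);
      cudaEventDestroy(stop_event_);
    }
#endif
  }

  Timer(const Timer&) = delete;
  Timer& operator=(const Timer&) = delete;

  void Start() {
#ifdef USE_CUDA
    if (use_gpu_) {
      cudaEventRecord(start_event_, 0);  // timestamp taken on the GPU stream
      return;
    }
#endif
    start_cpu_ = Clock::now();
  }

  void Stop() {
#ifdef USE_CUDA
    if (use_gpu_) {
      cudaEventRecord(stop_event_, 0);
      cudaEventSynchronize(stop_event_);  // block until the GPU reaches the stop event
      return;
    }
#endif
    stop_cpu_ = Clock::now();
  }

  // Elapsed time between the last Start() and Stop(), in milliseconds.
  float MilliSeconds() const {
#ifdef USE_CUDA
    if (use_gpu_) {
      float ms = 0.f;
      cudaEventElapsedTime(&ms, start_event_, stop_event_);
      return ms;
    }
#endif
    return std::chrono::duration<float, std::milli>(stop_cpu_ - start_cpu_).count();
  }

  bool use_gpu() const { return use_gpu_; }

 private:
  bool use_gpu_;
  Clock::time_point start_cpu_{}, stop_cpu_{};
#ifdef USE_CUDA
  cudaEvent_t start_event_{}, stop_event_{};
#endif
};
```

A benchmark loop would construct `Timer timer(use_gpu);`, wrap each layer's forward or backward call in `Start()`/`Stop()`, and then read `MilliSeconds()`. Keeping the backend choice inside the class lets the benchmark code stay the same in both modes.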