Add steps to install multi-threaded OpenBLAS on Ubuntu #80
kloudkl wants to merge 1 commit into BVLC:master from kloudkl:multi_threaded_blas
Conversation
Are you sure that, when using boost-eigen, you are compiling with multi-threading enabled? boost-eigen naturally comes with a multithreaded gemm, which would probably account for most of the gain you are observing.
To make it clear whether OpenBLAS or Eigen contributed to the performance improvements in the boost-eigen branch, three groups of benchmark experiments with different compilation flags were conducted using the lenet*.prototxt files. In all the experiments, max iter is set to 3,000 and solver_mode is set to 0 in lenet_solver.prototxt.
To check the effect of the number of threads, three combinations of runtime environment variables were tested.
Comparing the results of compilation flags 1 and 3, it is evident that multi-threaded OpenBLAS runs about 5 times faster than stock ATLAS. The similar performance of compilation flags 2 and 3 shows that enabling OpenMP for Eigen does not help at all in this setting.
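The runtime thread-count combinations described above can be swept purely through environment variables. A minimal sketch, assuming the standard OpenBLAS and OpenMP knobs; the exact thread counts benchmarked and the Caffe invocation are not shown in this thread, so both are placeholders here:

```shell
# Sweep the BLAS/OpenMP thread count via environment variables.
# OPENBLAS_NUM_THREADS controls OpenBLAS; OMP_NUM_THREADS controls
# OpenMP-enabled code paths such as Eigen's parallel gemm.
for n in 1 4 8; do
  export OPENBLAS_NUM_THREADS=$n
  export OMP_NUM_THREADS=$n
  echo "benchmarking with $n thread(s)"
  # ./train_net.bin lenet_solver.prototxt   # hypothetical benchmark invocation
done
```

Setting the variables per run (rather than recompiling) is what makes the three runtime combinations cheap to compare against a single build.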
I still do not think you are using the multithreaded version of eigen3 (see https://plafrim.bordeaux.inria.fr/doku.php?id=people:guenneba); it would be extremely unlikely that eigen itself is bad at multithreading. Again, using lenet is not a good idea to benchmark things. — Yangqing
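As the linked page describes, Eigen only parallelizes its matrix products when the code is built with OpenMP. A hedged sketch of the compile- and run-time settings involved; the source file name and include path are placeholders, not taken from the PR:

```shell
# Eigen multithreads its gemm only if the translation unit is
# compiled with OpenMP support (-fopenmp); the thread count is
# then a runtime knob, not a compile-time one.
EIGEN_CXXFLAGS="-O3 -fopenmp -I/usr/include/eigen3"
echo "compile: g++ $EIGEN_CXXFLAGS bench_gemm.cpp -o bench_gemm"
export OMP_NUM_THREADS=8
echo "run: OMP_NUM_THREADS=$OMP_NUM_THREADS ./bench_gemm"
```

If the `-fopenmp` flag is missing from the benchmark build, Eigen silently falls back to single-threaded gemm, which would explain the numbers above.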
I'd like to make my arguments clear: (1) I am not comparing ATLAS with OpenBLAS - it is known that ATLAS is inherently single-threaded; (2) small datasets like MNIST do not reflect actual use cases such as … — Yangqing
I looked at the code more closely and now have a clearer picture of what caused this. In caffe/util/math_functions.cpp the gemm calls are still made through cblas_gemm instead of the Eigen functions, so the framework is effectively still using ATLAS rather than Eigen to carry out gemm. I will close this issue and open a separate issue indicating this necessary change for boost-eigen. If you would like to do a more detailed comparison, please feel free to. Thanks for finding this bug!
Thank you for all this benchmarking work!
INSTALL.md has been replaced with a pointer to the online installation documentation to avoid the overhead of duplication, so refer to #81.
This statement is categorically false: "it is known that ATLAS is inherently single-threaded." ATLAS has been threaded for 5+ years: http://math-atlas.sourceforge.net/faq.html#tnum
Multi-threaded OpenBLAS makes a huge performance difference. The benchmarks with and without it in the comments on #16 demonstrated more than a 5x speed-up over boost-eigen and MKL on a machine with 4 Hyper-Threading CPU cores (supporting 8 threads).
This fixes #79.
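The installation steps themselves are not visible in this conversation view. A hedged sketch of what they likely amount to on Ubuntu; the package name, repository URL, and build flags are assumptions based on standard OpenBLAS practice, so verify against the actual diff:

```shell
# Option 1: the prebuilt Ubuntu package (commented out; requires
# root and network access):
#   sudo apt-get install libopenblas-dev
#
# Option 2: build from source with threading enabled explicitly:
#   git clone https://github.com/xianyi/OpenBLAS.git
#   cd OpenBLAS && make USE_THREAD=1 && sudo make PREFIX=/usr/local install
#
# Either way, cap the runtime thread count to the available cores:
export OPENBLAS_NUM_THREADS=8
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"
```

Oversubscribing threads beyond the physical core count usually hurts gemm throughput, so matching `OPENBLAS_NUM_THREADS` to the hardware (8 logical threads on the 4-core Hyper-Threading machine above) is the sensible default.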