
Proposition: After the OpenCL branch maturing for over 3 years, maybe it's time to give Caffe a bump #6427

Closed

naibaf7 wants to merge 6150 commits into master from opencl

Conversation

@naibaf7 (Member) commented Jun 12, 2018

The OpenCL branch has been maturing for 3 years. In that time, it has accumulated many features that are absent from the main branch. Notably, there are many features (and more to come) that would positively differentiate Caffe from other frameworks such as TensorFlow, Caffe2 and Torch. Caffe has been lacking such features lately.

I'm also standing at a crossroads now, where adding more features to my branch would ultimately make it too difficult to merge upstream changes from the master branch into the opencl branch.

A selection of core features:

  • Virtual pointers and complete device abstraction.
  • Multiplexer supporting different BLAS libraries on GPUs and CPUs.
  • A single, device-abstracted code path for the GPU layers.
  • Runtime compiled kernels for CUDA and OpenCL with a SQLite3 kernel cache.
  • LibDNN with support for convolution, pooling and a complete BLAS, which can act as a drop-in replacement for cuDNN and runs on all OpenCL and CUDA GPUs.
  • Extended Python interface.
  • Vastly improved NetSpec.
  • Quantized data types (INT8, INT16) and half-precision float (FP16).
  • Quantization with support for pseudo-quantization, quantization parameter estimation and mixed-precision neural networks.
  • Gemmlowp support on the CPU (INT8 inference).
  • Topological MALIS-loss layer.
  • Mixture-of-experts (MOE) layer.
  • Explicit support for AMD, NVIDIA, ARM Mali and Intel GPUs.
  • CPU OpenCL support.
  • Extended OpenMP support.
  • Dropping the Makefile build system.
  • CMake cross-compile support for Android and ARM (Raspberry Pi, Asus Tinkerboard).
  • Reduced memory inference.
  • Runtime-compiled kernels mean no more conflicts between NVCC and the host compiler, making builds much faster and easier.
  • Forward and backward compatibility to existing networks and trained weights.
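The runtime-compiled kernel cache listed above can be pictured roughly like this. The following is a minimal, hypothetical Python sketch (the branch's actual implementation is in C++ and compiles real OpenCL/CUDA kernels): compiled kernel binaries are stored in an SQLite3 database keyed by a hash of the device name and kernel source, so each kernel is compiled at most once per device and reused across runs.

```python
import hashlib
import sqlite3

class KernelCache:
    """Toy sketch of an SQLite3-backed cache for runtime-compiled kernels."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kernels (key TEXT PRIMARY KEY, binary BLOB)"
        )
        self.compile_count = 0  # counts real compilations (cache misses)

    def _compile(self, source):
        # Stand-in for a real OpenCL/CUDA runtime compilation step.
        self.compile_count += 1
        return source.encode("utf-8")[::-1]  # pretend this is a device binary

    def get(self, device_name, source):
        # Key on device + source, so a device change triggers recompilation.
        key = hashlib.sha256((device_name + source).encode()).hexdigest()
        row = self.db.execute(
            "SELECT binary FROM kernels WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no compilation needed
        binary = self._compile(source)
        self.db.execute(
            "INSERT INTO kernels (key, binary) VALUES (?, ?)", (key, binary)
        )
        self.db.commit()
        return binary

cache = KernelCache()
src = "__kernel void axpy(...) { /* ... */ }"
first = cache.get("gfx900", src)   # compiles and stores the kernel
second = cache.get("gfx900", src)  # served from the SQLite3 cache
```

With an on-disk database path instead of `:memory:`, the cache survives process restarts, which is what removes NVCC from the build entirely.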

Upcoming features:

  • Raspberry Pi VideoCore IV support
  • Writing GPU layers in Python

Currently broken features:

  • NCCL
  • Multi-GPU
  • HDF5 (in some cases)
  • Some runtests (easy to fix)

Missing/incomplete features:

  • Some unit tests for quantized types are not implemented yet
  • Quantization is only implemented for some layers (LeNet and AlexNet work quantized).
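For context, the pseudo-quantization and quantization-parameter estimation mentioned in the feature list generally work along these lines. This is a generic, hypothetical Python sketch of symmetric INT8 quantization, not code from this branch: estimate a scale from the observed value range, quantize to int8, then dequantize back to float so the rounding error can be simulated inside an otherwise float pipeline.

```python
def estimate_scale(values, num_bits=8):
    """Estimate a symmetric quantization scale from observed values."""
    max_abs = max(abs(v) for v in values)
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Map floats to integers clamped to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def pseudo_quantize(values, num_bits=8):
    """Quantize and immediately dequantize: float in, float out,
    but carrying the INT8 rounding error."""
    scale = estimate_scale(values, num_bits)
    return [q * scale for q in quantize(values, scale, num_bits)]

activations = [0.0, 0.5, -1.27, 1.27]
fq = pseudo_quantize(activations)  # floats snapped onto an INT8 grid
```

The unimplemented unit tests would presumably check exactly this kind of round-trip property: that the pseudo-quantized values stay within half a quantization step of the originals.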

So I hope we can get some maintainers to help me hunt down the remaining few bugs in this branch and make it ready for a merge into the master branch (perhaps by December?).

I hope this finds consideration among other Caffe maintainers. I will follow up with all changes and additions made to this branch once the core maintainers agree to go down this path for the future of Caffe.

cypof and others added 30 commits August 16, 2017 18:24
Mention SKX support
The logic for setting the library RPATH checks whether or not
${CMAKE_INSTALL_PREFIX}/lib is a system directory, and if not
adds it to the library RPATH. However, caffe does not install
to ${CMAKE_INSTALL_PREFIX}/lib, it installs to
${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR} (from
GNUInstallDirs). CMAKE_INSTALL_LIBDIR may be something like
"lib/x86_64-linux-gnu"
Fix division operator for Python 3 compatibility in classifier.py
The pointers could be used by CUDA wrapper libraries in Python such as
PyCUDA, gnumpy, Theano etc.
This line is needed for Ubuntu 16.04:

    sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

For reference:

* https://github.com/BVLC/caffe/wiki/Ubuntu-16.04-or-15.10-Installation-Guide
* https://youtu.be/DnIs4DRjNL4
Expose GPU pointers to Python
fix bilinear filler (and make constant filler more strict, as it should be)
Signed-off-by: Finnian Anderson <get@finnian.io>
…op_k, no need for fancy priority_queue etc. (2) GPU implementation
…ch-1

[docs] packages needed by Ubuntu 16.04, not just Ubuntu 14.04
Add absolute tolerance to test_net.py to prevent random Travis fails
fixes "NameError: name 'int_tp' is not defined" when running plot_training_log.py
Fix default mode warning in io.resize_image
Fixed bug where make distribute duplicates python files
fix web demo install instruction link
Fix: mean shape incompatible with input shape
@naibaf7 naibaf7 requested review from jeffdonahue and shelhamer June 12, 2018 05:56
@naibaf7 naibaf7 self-assigned this Jun 12, 2018
@naibaf7 naibaf7 requested a review from Noiredd June 12, 2018 05:57
@BVLC BVLC deleted a comment from skn123 Jun 26, 2018
@shelhamer (Member)
@naibaf7 Thank you for dedicating so much effort and engineering to OpenCL Caffe! This branch has only improved since the old times of #2610. Throughout the discussion and comments you have always contributed code, and that takes a lot of time and thought.

I have clearly not had the bandwidth for Caffe development that I once did. I still have my sights on a 1.1 to include module layers (and by extension Halide layers), because these are excellent directions that I was always interested in—among many other good directions, of course—but it has been a long while now. While I still hope to rally the time to take care of last bits of grooming and merge those, I do not expect to be able to contribute as necessary to make a difference in the ~100k line patch that you are hammering on here.

That said, do not let that be an obstacle to your hacking! The only constraint I would advise is that existing capabilities not go missing, like multi-GPU training, HDF5 IO, and so forth. Whether this line of development is a branch, fork, or merged in master can be up to you and the others who keep pushing, pulling, reviewing, and of course brewing ☕

@TechnikEmpire

"Thanks for all the hard work" - Lets it rot as a dangling PR forever.

@Coderx7 (Contributor) commented Oct 11, 2018

@naibaf7: what happened? What's the ultimate decision?

@TechnikEmpire

I have to say that right now, there are compiler issues. MSVC rejects the function instantiation macros used in some layers.

@naibaf7 (Member, Author) commented Oct 11, 2018

@TechnikEmpire
Currently I do not have time to continue developing it.
Nor does interest seem big enough to do so, and since this was an academic exercise, I doubt it will go on as it is.
However, I am looking into porting the Caffe OpenCL code into LibDNN, and then trying to get PyTorch onto that. But that's not certain yet.
Don't expect any news short-term.

@TechnikEmpire commented Oct 11, 2018

Thanks for your contribution @naibaf7. It's unfortunate, but NVIDIA has a stranglehold on deep learning, and now Intel has joined the ranks, even undermining OpenCV by making it depend on proprietary acceleration that doesn't even work on AMD CPUs.

People like you with your contributions are opening these technologies to the general public while others hold back progress to line their corporate pockets. Will watch your personal repos. Thanks again.

@skn123 commented Oct 12, 2018

@naibaf7 This was indeed a welcome commit. Can you please check the PR that I made? If a few issues related to "double" are solved, the code will be fully functional. However, there is an HSA version of Caffe being developed under ROCm! I am waiting for that. The older version of the code is also in my personal repo and should work without a hitch.

@naibaf7 (Member, Author) commented Oct 13, 2018

Thanks for your support. Stay tuned for future developments in deep learning from me; it may just take a while longer. :)

@willyd willyd mentioned this pull request Nov 28, 2018
5 tasks
@naibaf7 naibaf7 closed this Jan 10, 2023
@dragonQian

This comment was marked as off-topic.
