Speeding up ADAM (and maybe other solvers too)#3519
Conversation
Thanks @philkr, this is a nice simple speedup for GPU training, and it passes the TestGradientBasedSolver checks, indicating all the solvers should work as they did before, besides the performance improvement. Looks like Travis is failing due to some minor lint issues. Also, could you CamelCase the new GPU kernel names and remove the
Lint and CamelCase should be fixed now, and the changes are squashed into a single commit.
OK, SGD is now slightly faster too (about 10%). I still don't quite understand why this worked, but I suspect it comes from minimizing the number of kernel launches (2 before, 1 now). Ran all the tests again and they passed.
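The kernel-count point above can be illustrated with a rough NumPy sketch (not Caffe's actual CUDA code; the function names are mine). The two-pass version mirrors launching one kernel per element-wise operation, while the fused version does the same arithmetic in a single traversal, corresponding to a single kernel launch:

```python
import numpy as np

def sgd_two_pass(param, grad, hist, lr, momentum):
    # First pass (one kernel): update the momentum/history buffer.
    hist[:] = momentum * hist + lr * grad
    # Second pass (second kernel): apply the history to the parameters.
    param -= hist

def sgd_fused(param, grad, hist, lr, momentum):
    # One fused pass: compute and apply each element's update in a
    # single traversal, i.e. a single kernel launch on the GPU.
    for i in range(param.size):
        hist[i] = momentum * hist[i] + lr * grad[i]
        param[i] -= hist[i]
```

Both produce identical results; the win on the GPU comes from halving the launch overhead and reading each buffer once instead of twice.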
Sweet, thanks again!
Speeding up ADAM (and maybe other solvers too)
It has bothered me for a while that ADAM is quite a bit slower than SGD. This PR speeds up the GPU implementation of ADAM (and if desired I can do the same for other solvers too).
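For context, the Adam step in question is the usual element-wise update; here is a rough NumPy sketch (variable names are mine, not Caffe's). Since every line operates element-wise over the same buffers, the whole body can be fused into a single GPU kernel launch per parameter blob:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First and second moment estimates, updated element-wise.
    m[:] = beta1 * m + (1 - beta1) * grad
    v[:] = beta2 * v + (1 - beta2) * grad * grad
    # Bias-corrected learning rate, computed once per iteration.
    lr_t = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Parameter update, again purely element-wise.
    theta -= lr_t * m / (np.sqrt(v) + eps)
```

Running this unfused means several kernel launches and buffer traversals per blob per iteration, which is where Adam's extra cost over SGD comes from.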
I created a simple Python benchmark for solvers, using random input data and a couple of large inner products: solver_bench.zip. Before this PR:
After this PR: