Speeding up ADAM (and maybe other solvers too)#3519
Conversation
Thanks @philkr, this is a nice simple speedup for GPU training, and it passes the TestGradientBasedSolver checks, indicating all the solvers should work as they did before, besides the performance improvement. Looks like Travis is failing due to some minor lint issues. Also, could you CamelCase the new GPU kernel names and remove the
Lint and CamelCase should be fixed now, and the changes are squashed into a single commit.
OK, SGD is now slightly faster too (about 10%). I still don't quite understand why this worked, but I suspect it comes from minimizing the number of kernel launches (2 before, 1 now). Ran all the tests again and they passed.
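The kernel-count point above can be illustrated with a rough NumPy sketch (not Caffe's actual CUDA code; the function names are mine). The two-pass version mirrors launching one kernel per element-wise operation, while the fused version does the same arithmetic in a single traversal, corresponding to a single kernel launch:

```python
import numpy as np

def sgd_two_pass(param, grad, hist, lr, momentum):
    # First pass (one kernel): update the momentum/history buffer.
    hist[:] = momentum * hist + lr * grad
    # Second pass (second kernel): apply the history to the parameters.
    param -= hist

def sgd_fused(param, grad, hist, lr, momentum):
    # One fused pass: compute and apply each element's update in a
    # single traversal, i.e. a single kernel launch on the GPU.
    for i in range(param.size):
        hist[i] = momentum * hist[i] + lr * grad[i]
        param[i] -= hist[i]
```

Both produce identical results; the win on the GPU comes from halving the launch overhead and reading each buffer once instead of twice.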
Sweet, thanks again!
Speeding up ADAM (and maybe other solvers too)
It has bothered me for a while that ADAM is quite a bit slower than SGD. This PR speeds up the GPU implementation of ADAM (and if desired I can do the same for other solvers too).
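For context, the Adam step in question is the usual element-wise update; here is a rough NumPy sketch (variable names are mine, not Caffe's). Since every line operates element-wise over the same buffers, the whole body can be fused into a single GPU kernel launch per parameter blob:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First and second moment estimates, updated element-wise.
    m[:] = beta1 * m + (1 - beta1) * grad
    v[:] = beta2 * v + (1 - beta2) * grad * grad
    # Bias-corrected learning rate, computed once per iteration.
    lr_t = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Parameter update, again purely element-wise.
    theta -= lr_t * m / (np.sqrt(v) + eps)
```

Running this unfused means several kernel launches and buffer traversals per blob per iteration, which is where Adam's extra cost over SGD comes from.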
I created a simple Python benchmark for solvers, using random input data and a couple of large inner products: solver_bench.zip. Before this PR:
After this PR: