Conversation
Force-pushed from 7feef51 to b1d42c8.
This looks good to me once #2836 is merged. The switch of test data to accommodate the accumulation checks is fine by me. Thanks for sorting this out Jeff!
This makes progress on #1211. @jeffdonahue I think this addresses the https://github.com/BVLC/caffe/pull/546/files#r16817721 issue by the …
This could loop over learnable_params now, as could the loop on 477. Now that this is a loop over learnable_params, the condition on param_owners()[i] is wrong.
agreed... see the above diff :)
In which we learn I have a short-term memory of 1 line... sorry haha.
@shelhamer later pointed out that the remaining check on param_owners below was wrong -- just fixed that, thanks!
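For anyone following the thread, here is a minimal standalone sketch of the pattern under discussion, using a hypothetical Blob stand-in rather than the actual Caffe classes: once learnable_params is built to hold only the owner blobs, loops over it no longer need any param_owners condition.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for caffe::Blob, just enough to show the pattern.
struct Blob {
  void Update() { /* apply the accumulated diff to the data */ }
};

// Sketch: learnable_params keeps only owner blobs. By convention here,
// param_owners[i] < 0 marks an owner; >= 0 is the index of the owner.
std::vector<Blob*> BuildLearnableParams(
    const std::vector<std::shared_ptr<Blob> >& params,
    const std::vector<int>& param_owners) {
  std::vector<Blob*> learnable;
  for (size_t i = 0; i < params.size(); ++i) {
    if (param_owners[i] < 0) {  // owner: include it
      learnable.push_back(params[i].get());
    }
  }
  return learnable;
}

// Update loops can then drop the param_owners condition entirely.
void UpdateAll(const std::vector<Blob*>& learnable_params) {
  for (size_t i = 0; i < learnable_params.size(); ++i) {
    learnable_params[i]->Update();
  }
}
```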
This is very nice. Does learnable_params depend on the HDF5 snapshots in #2836? If not, can we separate learnable_params out and merge it independently? Thanks!
Force-pushed from b1d42c8 to d8ed02a.
@raingo if you want, you should be able to cherry-pick my last commit (you probably also need my first two commits to avoid conflicts in …)
… params
- Params now share diffs as well as data (works due to layers accumulating gradients into param diffs, rather than overwriting)
- It's now required that any shared params with specified lr_mults/decay_mults match
- TestGradientBasedSolver checks that behavior remains correct with shared weights
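As a rough illustration of the first point, the snippet below uses Caffe's Blob::ShareData and Blob::ShareDiff to alias one blob's buffers to another's; the blob shapes and setup are invented for the example, not taken from the PR.

```cpp
#include "caffe/blob.hpp"

using caffe::Blob;

int main() {
  // Two parameter blobs of identical shape: one owner, one shared.
  Blob<float> owner(1, 1, 2, 3);
  Blob<float> shared(1, 1, 2, 3);

  shared.ShareData(owner);  // already done pre-PR: data is aliased
  shared.ShareDiff(owner);  // new in this PR: diffs are aliased too

  // Because layers accumulate (+=) gradients into param diffs rather than
  // overwriting them, every layer touching either blob now sums its
  // gradient into the same underlying diff buffer.
  return 0;
}
```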
Force-pushed from d8ed02a to d5b42bf.
Rebased now that #2836 is merged, and unrelated changes to …
I took a last glance and didn't catch anything to change, so go ahead and merge.
Great, thanks for the review @shelhamer!
On further thought I think that … The harm in changing it could be if there is downstream code such as solvers or serializers that made use of … What's your take @jeffdonahue?
Yeah, I think I agree. Downstream code that is actually aware of weight sharing and conditions on …
This fixes a couple issues with weight sharing:

… params_ vector. Someone can do the math to figure out exactly how it was incorrect if they want to; I'll just say that the added tests definitely fail without this fix.

The one possible downside is that you can no longer specify different lr_mults for shared parameters; but using this "feature" was probably a bad idea before, and if you were also using momentum or weight decay it was probably behaving in an incorrect/unexpected way.

This is based on @erictzeng's PR #2836; adding just the last commit on top of that.
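To make the lr_mult restriction concrete, here is a hedged sketch of the kind of check described; the simplified ParamSpec struct and function name are hypothetical (the real ParamSpec is a Caffe proto message, and the real check lives inside the net-construction code).

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical, simplified stand-in for Caffe's ParamSpec proto message.
struct ParamSpec {
  float lr_mult;
  float decay_mult;
};

// Shared params alias a single diff, which the solver can only scale one
// way, so a shared param's multipliers must match its owner's.
void CheckSharedParamSpecs(const ParamSpec& owner, const ParamSpec& shared) {
  if (owner.lr_mult != shared.lr_mult) {
    std::cerr << "Shared param lr_mult must match its owner's" << std::endl;
    std::abort();
  }
  if (owner.decay_mult != shared.decay_mult) {
    std::cerr << "Shared param decay_mult must match its owner's" << std::endl;
    std::abort();
  }
}

int main() {
  ParamSpec owner = {1.0f, 1.0f};
  ParamSpec shared = {1.0f, 1.0f};
  CheckSharedParamSpecs(owner, shared);  // passes; a mismatch would abort
  return 0;
}
```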