Conversation
Force-pushed from 7feef51 to b1d42c8.
This looks good to me once #2836 is merged. The switch of test data to accommodate the accumulation checks is fine by me. Thanks for sorting this out Jeff!
This makes progress on #1211. @jeffdonahue I think this addresses the https://github.com/BVLC/caffe/pull/546/files#r16817721 issue by the …
This could loop over learnable_params now, as could the loop on 477. Now that this is a loop over learnable_params, the condition on param_owners()[i] is wrong.
agreed... see the above diff :)
In which we learn I have a short-term memory of 1 line... sorry haha.
@shelhamer later pointed out that the remaining check on param_owners below was wrong -- just fixed that, thanks!
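For anyone following the thread, here is a minimal standalone sketch of the pattern under discussion, using a hypothetical Blob stand-in rather than the actual Caffe classes: once learnable_params is built to hold only the owner blobs, loops over it no longer need any param_owners condition.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for caffe::Blob, just enough to show the pattern.
struct Blob {
  void Update() { /* apply the accumulated diff to the data */ }
};

// Sketch: learnable_params keeps only owner blobs. By convention here,
// param_owners[i] < 0 marks an owner; >= 0 is the index of the owner.
std::vector<Blob*> BuildLearnableParams(
    const std::vector<std::shared_ptr<Blob> >& params,
    const std::vector<int>& param_owners) {
  std::vector<Blob*> learnable;
  for (size_t i = 0; i < params.size(); ++i) {
    if (param_owners[i] < 0) {  // owner: include it
      learnable.push_back(params[i].get());
    }
  }
  return learnable;
}

// Update loops can then drop the param_owners condition entirely.
void UpdateAll(const std::vector<Blob*>& learnable_params) {
  for (size_t i = 0; i < learnable_params.size(); ++i) {
    learnable_params[i]->Update();
  }
}
```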
This is very nice. Does learnable_params depend on the HDF5 snapshots in #2836? If not, can we separate learnable_params out and merge it independently? Thanks!
Force-pushed from b1d42c8 to d8ed02a.
@raingo if you want, you should be able to cherry-pick my last commit (you probably also need my first two commits to avoid conflicts in …)
… params
- Params now share diffs as well as data (works due to layers accumulating gradients into param diffs, rather than overwriting)
- It's now required that any shared params with specified lr_mults/decay_mults match
- TestGradientBasedSolver checks that behavior remains correct with shared weights
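As a rough illustration of the first point, the snippet below uses Caffe's Blob::ShareData and Blob::ShareDiff to alias one blob's buffers to another's; the blob shapes and setup are invented for the example, not taken from the PR.

```cpp
#include "caffe/blob.hpp"

using caffe::Blob;

int main() {
  // Two parameter blobs of identical shape: one owner, one shared.
  Blob<float> owner(1, 1, 2, 3);
  Blob<float> shared(1, 1, 2, 3);

  shared.ShareData(owner);  // already done pre-PR: data is aliased
  shared.ShareDiff(owner);  // new in this PR: diffs are aliased too

  // Because layers accumulate (+=) gradients into param diffs rather than
  // overwriting them, every layer touching either blob now sums its
  // gradient into the same underlying diff buffer.
  return 0;
}
```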
Force-pushed from d8ed02a to d5b42bf.
Rebased now that #2836 is merged, and unrelated changes to …
I took a last glance and didn't catch anything to change, so go ahead and merge.
Great, thanks for the review @shelhamer!
On further thought I think that … The harm in changing it could be if there is downstream code such as solvers or serializers that made use of … What's your take @jeffdonahue?
Yeah, I think I agree. Downstream code that is actually aware of weight sharing and conditions on …
This fixes a couple issues with weight sharing:

… params_ vector. Someone can do the math to figure out exactly how it was incorrect if they want to; I'll just say that the added tests definitely fail without this fix.

The one possible downside is that you can no longer specify different lr_mults for shared parameters; but using this "feature" was probably a bad idea before, and if you were also using momentum or weight decay it was probably behaving in an incorrect/unexpected way.

This is based on @erictzeng's PR #2836; adding just the last commit on top of that.
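To make the lr_mult restriction concrete, here is a hedged sketch of the kind of check described; the simplified ParamSpec struct and function name are hypothetical (the real ParamSpec is a Caffe proto message, and the real check lives inside the net-construction code).

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical, simplified stand-in for Caffe's ParamSpec proto message.
struct ParamSpec {
  float lr_mult;
  float decay_mult;
};

// Shared params alias a single diff, which the solver can only scale one
// way, so a shared param's multipliers must match its owner's.
void CheckSharedParamSpecs(const ParamSpec& owner, const ParamSpec& shared) {
  if (owner.lr_mult != shared.lr_mult) {
    std::cerr << "Shared param lr_mult must match its owner's" << std::endl;
    std::abort();
  }
  if (owner.decay_mult != shared.decay_mult) {
    std::cerr << "Shared param decay_mult must match its owner's" << std::endl;
    std::abort();
  }
}

int main() {
  ParamSpec owner = {1.0f, 1.0f};
  ParamSpec shared = {1.0f, 1.0f};
  CheckSharedParamSpecs(owner, shared);  // passes; a mismatch would abort
  return 0;
}
```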