Conversation
...including siamese networks [1] as asked in #316.

[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. CVPR 2005.
@jeffdonahue this is excellent! Thanks for taking a second pass on this extension -- the implementation this time around is elegant and easy to digest.
How about "param" for the field name? This fits the same scheme as "top" and "bottom" since these all name blobs. Two other naming suggestions are:
(The following is purely about workflow policy -- disregard if uninterested.) @jeffdonahue in the future, let's pull request branches with dependencies all against dev and rebase them as they're included. Since github doesn't allow re-heading open PRs, I have to replace this PR with #546, and I think the disconnect is unfortunate. While the github interface may show spurious commits on the branches with dependencies until they are rebased, actually consulting the diff still shows the real changes. Sound good?
Replaced by #546 for merge.
Built on top of #497 -- I pushed that to a new branch here (`fix-backward-interface`) and PRed this against that; once #497 is finished and merged I will pull this against dev.

This adds the ability to share parameters between layers, which has a number of applications, the canonical one perhaps being recurrent neural network (RNN) training.
To share weights between two or more layers with parameters (currently just `InnerProductLayer`s and `ConvolutionLayer`s), specify the same `blob_name` for all of these layers. (You can also name the biases with a second `blob_name`, as in the `blobs_lr` and `weight_decay` parameters.) You can see a very simple example of this in `src/caffe/test/test_net.cpp`: see the unit test named `InitDiffDataSharedWeightsNet`. There, layers innerproduct1 and innerproduct2 share the same set of weights, as they've both specified `blob_name: 'sharedweights'`. In this case they also take the same bottom blob (data), so their outputs, the top blobs innerproduct1 and innerproduct2, should be identical (so this is not actually something you'd ever want to do; I do it there just for testing purposes).
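For concreteness, the shared-weight net in that test is along these lines (a sketch only -- the exact prototxt in `test_net.cpp` differs in details, and the `num_output` and filler settings below are placeholders):

```
# Two InnerProduct layers sharing one weight blob (illustrative sketch).
layers {
  name: 'innerproduct1'
  type: INNER_PRODUCT
  bottom: 'data'
  top: 'innerproduct1'
  blob_name: 'sharedweights'   # names this layer's weight blob
  inner_product_param {
    num_output: 10
    bias_term: false
    weight_filler { type: 'gaussian' std: 10 }
  }
}
layers {
  name: 'innerproduct2'
  type: INNER_PRODUCT
  bottom: 'data'               # same bottom blob as innerproduct1
  top: 'innerproduct2'
  blob_name: 'sharedweights'   # same name, so the weights are shared
  inner_product_param {
    num_output: 10
    bias_term: false
    weight_filler { type: 'gaussian' std: 10 }
  }
}
```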
Note that in this case we specify only one blob name because we've set `bias_term: false`; if we didn't have `bias_term: false` we'd need to specify two `blob_name`s, but probably the second one should be empty unless we actually want to share biases. (Specifying the empty string as a `blob_name` is equivalent to not specifying a `blob_name` in my implementation.)
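If the layer did keep its bias term, the same repeated-field pattern would (again, roughly) look like this, sharing the weights while leaving the bias unshared:

```
layers {
  name: 'innerproduct1'
  type: INNER_PRODUCT
  bottom: 'data'
  top: 'innerproduct1'
  blob_name: 'sharedweights'   # first param blob (weights): shared by name
  blob_name: ''                # second param blob (bias): empty name, unshared
  inner_product_param {
    num_output: 10
  }
}
```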
The entire implementation is in `Net::Init`, `Net::AppendParam`, and `Net::Update`. `Init` figures out which layer will actually "own" the shared param (the first one to list its `blob_name`), and `Update` adds the non-owned layers' computed diffs into the diff of the owner blob, then only actually performs updates on owned blobs. Memory-wise, all shared blobs point to the same memory location for the parameter's data, but still have separately allocated diff blobs, as the logic to handle learning rate, weight decay, etc. is still handled by the Solver (which is blissfully unaware that parameters can be shared).
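To make the owner/sharer bookkeeping concrete, here is a minimal standalone sketch of that update scheme -- this is not Caffe's actual `Net` code; the `Blob` struct, the `param_owners` vector, and the bare SGD step are simplified stand-ins:

```cpp
// Standalone illustration of the share-by-owner update scheme described above
// (not Caffe's actual Net code; Blob and the update step are simplified).
#include <cstddef>
#include <memory>
#include <vector>

struct Blob {
  std::shared_ptr<std::vector<float> > data;  // may alias an owner blob's data
  std::vector<float> diff;                    // always separately allocated
  explicit Blob(std::size_t n)
      : data(new std::vector<float>(n, 0.f)), diff(n, 0.f) {}
  void ShareData(const Blob& owner) { data = owner.data; }
};

// param_owners[i] is -1 if params[i] owns its data (its layer was the first to
// list the blob_name); otherwise it is the index of the owning parameter.
void Update(std::vector<Blob>& params, const std::vector<int>& param_owners) {
  // Accumulate each non-owned parameter's diff into its owner's diff.
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (param_owners[i] < 0) continue;
    Blob& owner = params[param_owners[i]];
    for (std::size_t k = 0; k < owner.diff.size(); ++k)
      owner.diff[k] += params[i].diff[k];
  }
  // Apply the update only to owned parameters; sharers see the new values
  // automatically because their data pointer aliases the owner's data.
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (param_owners[i] >= 0) continue;
    std::vector<float>& data = *params[i].data;
    for (std::size_t k = 0; k < data.size(); ++k)
      data[k] -= params[i].diff[k];  // learning rate etc. assumed already applied
  }
}

int main() {
  std::vector<Blob> params;
  params.push_back(Blob(3));
  params.push_back(Blob(3));
  params[1].ShareData(params[0]);  // second layer shares the first layer's weights
  std::vector<int> param_owners;
  param_owners.push_back(-1);      // params[0] owns its data
  param_owners.push_back(0);       // params[1] is owned by params[0]
  params[0].diff.assign(3, 1.f);
  params[1].diff.assign(3, 2.f);
  Update(params, param_owners);    // owner's data ends up at -3 in every element,
  return 0;                        // and params[1] sees it through the shared data
}
```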
Open to hearing feedback on the interface, implementation, etc. I'm not sure I'm happy with `blob_name` as the name of the field; I think it would be less ambiguous to use `param_name` or something, but that would be inconsistent with the other per-parameter field `blobs_lr` (and actually, to be consistent with that, it should be `blobs_name`, but I strongly prefer the singular here...).