
Implement regularizers #258

Closed

kloudkl wants to merge 16 commits into BVLC:dev from kloudkl:regularizer_class_hierarchy

Conversation

@kloudkl (Contributor) commented Mar 25, 2014

This PR replaces #113 in order to target the dev branch for merging. Please refer to #113 for the discussion of the design decisions.

@kloudkl (Contributor, Author) commented Mar 25, 2014

The original PR #113 passed all the tests. But to accommodate the change in the return types of Layer::Forward and Backward, the regularization in RegularizerAsLossLayer was moved from the latter to the former. After that change, many tests failed; the root cause and a solution have not been determined.

@shelhamer added this to the 1.1 milestone Mar 28, 2014
@longjon mentioned this pull request Apr 8, 2014
@longjon (Contributor) commented Apr 17, 2014

I wanted this so I took a look:

  • The main issue is that RegularizerAsLossLayer is redundant with the regularizer implementation. Making it almost trivial (just setting diff to zero in Forward) mostly fixes things, except for the points below.
  • Is there a reason for using sigma=10 Gaussian initialization in RegularizationAsLossTest? I can get things to pass iff I use the default sigma=1 instead.
  • The kink parameters of GradientChecker need to be used to avoid the nonsmooth region of the L1 loss (how did these tests pass before?); see the sketch after this list.
  • Making the changes described above fixes the tests but still does not leave a correct implementation, because the regularizer gradient is computed in Forward, and could be clobbered by the layer's Backward.
  • There are many redundant checks to see if things are nonzero (the number of regularizers or the size of a blob). Is there a good reason for these? To me they feel like noise.
  • Why regularize the bottom blob? To me the top feels more natural (but it does complicate the implementation of RegularizerAsLossLayer...)
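
For concreteness, here is a small standalone sketch of the kink issue above (illustrative only; this is not the Caffe test code, and the step and range values are made up): the centered finite difference of the L1 penalty |w| is unreliable for weights lying within the step size of the kink at zero, so a gradient checker has to skip those entries.

```cpp
// Why a nonzero kink range matters when checking the gradient of |w|.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const double step = 1e-2;        // finite-difference step size
  const double kink = 0.0;         // the nondifferentiable point of |w|
  const double kink_range = 1e-2;  // skip weights this close to the kink
  const std::vector<double> weights = {0.5, -0.3, 0.004, -1.2};

  for (double w : weights) {
    if (std::fabs(w - kink) <= kink_range) {
      std::printf("w = % .3f skipped (within kink_range of the kink)\n", w);
      continue;
    }
    // Analytic subgradient of |w| away from zero is sign(w).
    const double analytic = (w > 0) ? 1.0 : -1.0;
    // Centered finite difference; only trustworthy away from the kink.
    const double numeric = (std::fabs(w + step) - std::fabs(w - step)) / (2 * step);
    std::printf("w = % .3f  analytic = % .3f  numeric = % .3f\n", w, analytic, numeric);
  }
  return 0;
}
```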

@aravindhm commented

I changed the gradient checker call from Single to Exhaustive and all the tests passed. I don't think I changed anything else. Call CheckGradientExhaustive with only the first three arguments.

As for the gradient getting clobbered (Forward vs. Backward): is this addressed by the automatic insertion of a split node creating a dedicated bottom blob for the regularizer? If not, I don't mind calling the regularize_gpu/cpu function again in the layer's backward_cpu/gpu.

@longjon (Contributor) commented Apr 21, 2014

@aravindhm, CheckGradientExhaustive loops over top blobs, of which RegularizerAsLossLayer has none. The tests pass, but they don't test anything.
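
A standalone sketch of that point (illustrative only; this is not the GradientChecker code): an exhaustive check that loops over top blobs degenerates to a no-op when the layer produces no tops, so it can never fail.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct Blob {};  // stand-in for caffe::Blob

// Pretend "exhaustive" checker: would compare analytic and numeric
// gradients for every top blob, if there were any.
bool CheckAllTops(const std::vector<Blob*>& top) {
  bool checked_anything = false;
  for (std::size_t top_id = 0; top_id < top.size(); ++top_id) {
    // ... gradient comparison would happen here ...
    checked_anything = true;
  }
  return checked_anything;
}

int main() {
  std::vector<Blob*> top;  // a layer with no top blobs
  std::puts(CheckAllTops(top) ? "gradients were checked"
                              : "nothing was checked; the test passes vacuously");
  return 0;
}
```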

To fix the clobbering problem, no extra storage or computation is needed. In my branch I've split Regularize into Loss and Gradient functions, calling the former in Forward and the latter in Backward. (I've also switched regularization to work against the top blob.) If @kloudkl wants, I can clean this up and send him a PR.
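
A minimal standalone sketch of that split, assuming a hypothetical L1 regularizer working against the top blob (class and method names here are illustrative, not the code in the actual branch):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct SimpleBlob {              // stand-in for caffe::Blob
  std::vector<double> data;
  std::vector<double> diff;
};

class L1Regularizer {
 public:
  explicit L1Regularizer(double coeff) : coeff_(coeff) {}

  // Called from Forward: reads data only and returns the penalty term.
  double Loss(const SimpleBlob& top) const {
    double loss = 0;
    for (double v : top.data) loss += std::fabs(v);
    return coeff_ * loss;
  }

  // Called from Backward, after the layer's own backward pass has
  // written diff: accumulates the subgradient instead of overwriting it.
  void Gradient(SimpleBlob* top) const {
    for (std::size_t i = 0; i < top->data.size(); ++i) {
      top->diff[i] += coeff_ * ((top->data[i] > 0) - (top->data[i] < 0));
    }
  }

 private:
  double coeff_;
};

int main() {
  SimpleBlob top;
  top.data = {0.5, -2.0, 0.0};
  top.diff = {0.1, 0.1, 0.1};  // pretend gradient from layers above
  L1Regularizer reg(0.01);
  std::printf("regularization loss = %f\n", reg.Loss(top));
  reg.Gradient(&top);
  std::printf("diff[0] after Backward = %f\n", top.diff[0]);
  return 0;
}
```

Because the gradient is only ever added in Backward, nothing computed in Forward can be clobbered, and no extra storage is needed.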

@kloudkl (Contributor, Author) commented Apr 22, 2014

@longjon, I'm too busy and exhausted to take care of this at the moment. You can modify the code into whatever shape satisfies your needs and open a new PR.

@rodrigob (Contributor) commented Oct 6, 2014

It seems that I also implemented something similar to this on my private branch (by adding a "PostUpdate" stage to the layer class). I am somewhat confused by the current design. How is L1Regularizer different from an L1Loss? (And by the way, there is no L1Loss in the dev branch, right?)

@longjon (Contributor) commented Mar 9, 2015

Closing, as this is now out-of-date/abandoned and can be achieved through less intrusive means, e.g., explicit loss layers or per-param regularization options.

@longjon closed this Mar 9, 2015
@shelhamer removed this from the Future milestone Mar 10, 2015
@zhaogengyan commented

Hello @longjon, I have been searching all night for how to include both L1 and L2 norm regularization on W in the cost function, but have found no answer. Many people ask the same question in the user group, but nobody answers. I'm quite confused about how to use the

explicit loss layers or per-param regularization options

that you mentioned.

Can Caffe apply both L1 and L2 regularization at the same time? And can Caffe use different regularizers for different layers (L1 for some layers and L2 for others)? Thank you very much.

@robmosh commented Apr 24, 2017

plus one interested here for a regularization layer

1 similar comment
@foolwood commented

plus one interested here for a regularization layer
