Conversation
The original PR #113 passed all the tests. But to accommodate the switch in the return types of Layer::Forward and Backward, regularization in the RegularizerAsLossLayer was moved from the latter to the former. After the change, many tests failed. The root cause and the solution have not yet been determined.
I wanted this, so I took a look:
I changed the gradient checker call from Single to Exhaustive and all the tests passed; I don't think I changed anything else. Call CheckGradientExhaustive with only the first three arguments. As for the gradient getting clobbered (forward vs. backward): is this addressed by the automatic insertion of a split node creating a dedicated bottom blob for the regularizer? If not, I don't mind calling the regularize_gpu/cpu function again in the layer's backward_cpu/gpu.
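(For reference, a minimal sketch of that switch, assuming the layout of Caffe's gradient-check test utility; the helper name `CheckAllGradients` is mine, and the exact signatures on the dev branch of that era may differ slightly.)

```cpp
// Hedged sketch, not the actual test from this PR: check gradients for every
// bottom blob and parameter instead of a single top/bottom pair.
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/test/test_gradient_check_util.hpp"

template <typename Dtype>
void CheckAllGradients(caffe::Layer<Dtype>* layer,
                       const std::vector<caffe::Blob<Dtype>*>& bottom,
                       const std::vector<caffe::Blob<Dtype>*>& top) {
  caffe::GradientChecker<Dtype> checker(1e-2, 1e-3);  // step size, threshold
  // Only the first three arguments are passed (layer, bottom, top), so the
  // optional check_bottom index keeps its default and everything is checked.
  checker.CheckGradientExhaustive(layer, bottom, top);
}
```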
@aravindhm, to fix the clobbering problem, no extra storage or computation is needed. In my branch I've split
@longjon, I'm too busy and exhausted to take care of this at the moment. You can modify the code into whatever shape satisfies your needs and open a new PR.
It seems that I also implemented something similar to this on my private branch (by adding a "PostUpdate" stage to the layer class). I am somewhat confused by the current design. How is L1Regularizer different from an L1Loss? (And by the way, there is no L1Loss in the dev branch, right?)
Closing as this is now out-of-date/abandoned; the same goal can be achieved through less intrusive means, e.g., explicit loss layers or per-param regularization options.
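(For anyone finding this later, a rough sketch of the term the explicit-loss-layer route adds to the objective, using Caffe's math helpers; the function name `L1Penalty` is illustrative and not part of Caffe.)

```cpp
// Hedged sketch: an explicit L1 regularizer contributes lambda * ||W||_1,
// i.e. one asum over the weight blob (an L2 term would use
// 0.5 * lambda * caffe_cpu_dot(n, W, W) instead).
#include "caffe/blob.hpp"
#include "caffe/util/math_functions.hpp"

template <typename Dtype>
Dtype L1Penalty(const caffe::Blob<Dtype>& weights, Dtype lambda) {
  return lambda *
         caffe::caffe_cpu_asum(weights.count(), weights.cpu_data());
}
```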
Hello @longjon, I have been searching all night for how to include both L1 and L2 norm regularization on W in the cost function, but have found no answer. Many people ask the same question in the users group, but nobody answers. I'm quite confused about how to do what you mentioned. Can Caffe apply both L1 and L2 regularization at the same time? And can Caffe apply layer-wise different regularizers (some layers get L1 and others get L2)? Thank you very much.
+1, interested here in a regularization layer.
This PR replaces #113 to set the merge target to the dev branch. Please refer to #113 for the discussion of the design decisions.