New lr policies, MultiStep and StepEarly #190
Conversation
|
Nice policies Sergio. Thanks for the examples. Could you also include tests? Learning rate policies and termination criteria (#76) are both scheduled parts of the solver, and the conversation about the best way to add these kind of stalled. The options were observer/notify classes, coding them right into the solver, or making learning rate and termination factories like the layer factory. I think refactoring to a LearningRateFactory could be nice and orderly, and then the solver would call the LearningRate for any updates. What do you think? Re: naming, StepPlateau or StepFlat might be more descriptive than StepEarly. Or, as you suggested elsewhere, EarlyStep has a nice relationship to early stopping. |
|
@shelhamer, I agree with you, since this PR increases the number of learning rate policies to @Yangqing's refactoring threshold. I will use AdaptiveLearningRateFactory and AdaptiveLearningRate when I get the time to solve #30. AdaptiveLearningRate cannot be mixed with LearningRate because of the different APIs:

template <typename Dtype>
class LearningRate {
 public:
  // returns the global learning rate for the given iteration
  Dtype schedule(const int iteration);
};

template <typename Dtype>
class AdaptiveLearningRate {
 public:
  // returns a parameter-wise learning rate
  shared_ptr<Blob<Dtype> > schedule(const int iteration,
                                    const shared_ptr<Blob<Dtype> > gradient);
}; |
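For concreteness, here is a minimal sketch, not part of this PR, of how a MultiStep policy could sit behind the proposed LearningRate API; the class name MultiStepLearningRate and its members are hypothetical:

#include <vector>

// Hypothetical sketch: a MultiStep policy behind the proposed
// LearningRate::schedule(iteration) API above.
template <typename Dtype>
class MultiStepLearningRate {
 public:
  MultiStepLearningRate(Dtype base_lr, Dtype gamma,
                        const std::vector<int>& stepvalues)
      : base_lr_(base_lr), gamma_(gamma), stepvalues_(stepvalues) {}
  // Multiply the base rate by gamma once for every stepvalue already passed.
  Dtype schedule(const int iteration) const {
    Dtype rate = base_lr_;
    for (size_t i = 0; i < stepvalues_.size(); ++i) {
      if (iteration >= stepvalues_[i]) rate *= gamma_;
    }
    return rate;
  }

 private:
  Dtype base_lr_;
  Dtype gamma_;
  std::vector<int> stepvalues_;
};

A LearningRateFactory along the lines @shelhamer suggests would then map the lr_policy string to one of these classes, the same way the layer factory maps layer types, and the solver would simply call schedule(iter) at each update.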
|
Having a multistep decrease is definitely useful. The only thing I'd like to add is that having an unlimited number of steps makes parametrizing Caffe more difficult. (The reason I bring this up is that I am running hyperparameter optimization on Caffe.) So maybe instead of having to set each step, the step size could, just like the learning rate, follow a parametric function, e.g. decay exponentially or linearly, as sketched below. Let me know what you think. |
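To illustrate @tdomhan's suggestion, the list of steps could be generated from just two hyperparameters instead of being enumerated; this is a hypothetical sketch, not Caffe code, and first_step and step_growth are made-up parameter names:

#include <vector>

// Hypothetical sketch: derive the stepvalue sequence from two numbers,
// so a hyperparameter optimizer only has to tune two knobs.
// Assumes step_growth > 1 so the sequence terminates.
std::vector<int> GeometricSteps(int first_step, float step_growth,
                                int max_iter) {
  std::vector<int> steps;
  float next = static_cast<float>(first_step);
  while (next <= max_iter) {
    steps.push_back(static_cast<int>(next));
    next *= step_growth;  // exponentially growing gaps between steps
  }
  return steps;
}

With first_step = 1000 and step_growth = 2.0, for example, this yields steps at 1000, 2000, 4000, 8000, and so on.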
|
@tdomhan I will first fix the current new_lr_decay policies and add more later. |
|
@sguada it'd be great to include these policies, and multistep would simplify the cifar-10 example. |
force-pushed from 4278286 to c01f07a
force-pushed from 77de8d7 to 523a39c
|
Hi Caffe! Just a heads up to say I did try the merge on my own repo and ran the tests just as Travis did, and I am not getting any errors. FYI, Travis reports this after all tests are successful: |
force-pushed from 523a39c to be36e40
Conflicts:
  include/caffe/solver.hpp
  src/caffe/proto/caffe.proto
  src/caffe/solver.cpp
force-pushed from be36e40 to b025da7
|
When will this commit be available approximately? |
force-pushed from b025da7 to 6e20aa3
|
@Mezn it is available, let me know if you have any problems. |
|
@sguada my understanding is that stepearly is not part of the commit. Also, the *.prototxt files for MNIST are in examples/lenet instead of examples/mnist. My 2 cents :) and thanks for this. |
|
@sguada thanks for the explanations. I'm into stochastic optimization, so I'd be interested in looking at the old stepearly code. FYI, I am experimenting with a 'stagnation' policy that relies on the median losses and/or test results in order to speed up the overall training time; a rough sketch of the idea follows below. |
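For what it's worth, a stagnation check of that flavour might look like the sketch below; Median and Stagnated are hypothetical helpers that compare the median test accuracy of the latest window against the previous one:

#include <algorithm>
#include <vector>

// Hypothetical sketch: report stagnation when the median accuracy of the
// most recent window of tests is no better than that of the window before it.
float Median(std::vector<float> v) {
  std::sort(v.begin(), v.end());
  return v[v.size() / 2];
}

bool Stagnated(const std::vector<float>& previous_window,
               const std::vector<float>& latest_window) {
  return Median(latest_window) <= Median(previous_window);
}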
|
Let's remove it. |
This `examples/lenet/lenet_stepearly_solver.prototxt` was introduced in BVLC#190 by mistake, since stepearly was never actually merged.
lenet_multistep_solver.prototxt allows defining multiple steps in the solver.prototxt by setting lr_policy: "multistep" and defining a stepvalue for each iteration at which the learning rate should be decreased. This allows steps that are not evenly distributed; the sequence of stepvalue entries should be given in increasing order. An illustrative configuration follows below.

lenet_stepearly_solver.prototxt allows decreasing the lr_rate dynamically based on the behaviour of the test accuracy: the learning rate is decreased when the maximum accuracy has not increased for a number of tests defined by stepearly.
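For example, a multistep solver configuration could look like the following; the iteration values here are illustrative, not necessarily the ones shipped in the example:

# lr_policy "multistep": multiply the rate by gamma at each stepvalue
base_lr: 0.01
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000   # steps need not be evenly spaced,
stepvalue: 7000   # but must be listed in increasing order
stepvalue: 8000
max_iter: 10000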