Conversation
Evan Shelhamer:
Good start! Comment once you've investigated the initial issues for review.
Hello, I'm currently using this PR in my project (https://github.com/muupan/dqn-in-the-caffe). I think allowing base_lr and lr_policy will be helpful in case AdaDelta does not converge. In my case, using the original AdaDelta caused divergence, so I multiplied the [...]

With respect to the slow tests, I wonder why kNumIters for AdaDeltaSolver is very large (=500), https://github.com/mohomran/caffe/blob/adadelta/src/caffe/test/test_gradient_based_solver.cpp#L566 while those for other solvers are small (=4): https://github.com/mohomran/caffe/blob/adadelta/src/caffe/test/test_gradient_based_solver.cpp#L390. Are these 500 iterations necessary?
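To make the base_lr suggestion concrete, here is a minimal standalone sketch, not this PR's actual solver code: one AdaDelta step over a flat parameter vector, with an optional scale factor applied to the computed update. The function `adadelta_step` and its parameter names are hypothetical; `rho` and `eps` are the decay rate and conditioning constant from Zeiler (2012).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the PR's code): one AdaDelta step over a flat
// parameter vector. base_lr is the extra scale factor suggested above;
// base_lr = 1.0 recovers the original, learning-rate-free rule.
void adadelta_step(std::vector<double>& param,
                   const std::vector<double>& grad,
                   std::vector<double>& sq_grad_avg,    // running E[g^2]
                   std::vector<double>& sq_update_avg,  // running E[dx^2]
                   double rho = 0.95, double eps = 1e-6,
                   double base_lr = 1.0) {
  for (std::size_t i = 0; i < param.size(); ++i) {
    const double g = grad[i];
    // Decaying average of squared gradients.
    sq_grad_avg[i] = rho * sq_grad_avg[i] + (1.0 - rho) * g * g;
    // The step size is the ratio of the two RMS estimates, so no global
    // learning rate appears in the original method.
    const double dx = -std::sqrt((sq_update_avg[i] + eps) /
                                 (sq_grad_avg[i] + eps)) * g;
    // Decaying average of squared updates.
    sq_update_avg[i] = rho * sq_update_avg[i] + (1.0 - rho) * dx * dx;
    // Scaling by base_lr < 1 shrinks every step uniformly, which is one
    // way to damp the divergence described above.
    param[i] += base_lr * dx;
  }
}
```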
Thank you for the feedback!
Original PR description:

Initial implementation of the AdaDelta solver as proposed in "ADADELTA: An Adaptive Learning Rate Method" (Zeiler, 2012). Motivation: http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html
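For reference, the per-parameter update from Zeiler (2012): the step size is the ratio of two running RMS estimates, with decay rate \(\rho\) and conditioning constant \(\epsilon\), which is why the method has no explicit learning rate.

```latex
\begin{align*}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta x_t &= -\frac{\mathrm{RMS}[\Delta x]_{t-1}}{\mathrm{RMS}[g]_t}\, g_t,
  \qquad \mathrm{RMS}[y]_t = \sqrt{E[y^2]_t + \epsilon} \\
E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1-\rho)\, \Delta x_t^2 \\
x_{t+1} &= x_t + \Delta x_t
\end{align*}
```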
Performance on the MNIST autoencoder demo is more or less on par with standard SGD+momentum, but not as good as the Nesterov solver. The lack of a learning rate does seem to be a problem in later iterations, in that the loss/accuracy don't fully converge, though this could be due to an implementation issue.
(for comparison see: #741 (comment))
A couple of things to note: