Conversation
|
The callback sounds good for separating concerns and giving control to different interfaces. Please rebase / squash your commits to simplify your contribution and make it a clean merge; once that's done a reviewer will be assigned. |
|
This seems a little bit heavy in terms of engineering... Maybe we should implement it in a separate, more "fancy" solver instead of adding on to the current solver? I somewhat like having an educative, simple solver in caffe for people to test water with, and then if one wants, s/he can move to use the more powerful solvers. |
|
It seems to me that proto message(s) could easily be used in place of re: @Yangqing, I think it could be possible to write this such that the existing solver doesn't become to muddled, but the code does look a bit overwhelming to me in the current state. Also @leelurch, note that in #800 I made the training iterations 0-indexed and modified accordingly the order in which the solver executes its actions -- Test and Snapshot now come before ForwardBackward/Update in the main training loop. |
|
Yikes. I didn't change 136 files. Or at least didn't mean to. Trying to figure out what happened and how to fix it. But git is still mostly black magic to me. |
Various lint et al fixes. Fix bad conflict resolution.
|
Thanks for the comments everyone. @shelhamer re squashed changes: I had to look that up. But I think it's fixed now, as well as the error where I had screwed up the feature branch and it was showing > 100 files changed. @Yangqing regarding fancy and regular solvers: maybe I don't understand you, but that sounds like it would entail duplicating the solver logic. They'd have to be maintained in parallel which increases maintenance effort and makes changes more error prone. So.... I'm not wildly enthusiastic about that. Please see last paragraph below about alternative approach I could take. @jeffdonahue regarding the solver being overwhelming: from my perspective it is simpler. Whereas before it had logic to decide when it should do things like run test nets, resume and so on. With these changes it never has to "think" about that: it just delegates to the IterActions object to tell it what to do. For a couple weeks until today I had screwed up my feature branch and it showed over a hundred files changed. I fixed that and now it's just the 5 files that changed. Any chance you are saying it looked overwhelming because of that error? Regarding the classes IterActions, DefaultSolverActions, & TrainingStats being just setters and getters that could be automated with protobuf: good point. I have a reservation about that. Under the hood protobuf does a lot of small object heap allocation for the individual fields if I understand correctly. Whereas the classes I provided are mostly just a single contiguous object in memory and are stack allocated. Stack allocation is faster, enjoys better locality-of-reference and is more cache friendly. Since there's no need to be serializing these objects, no reason to use protobuf. Probably a trivial difference either way. Well there is the advantage that there's just a protobuf definition file instead of .h and .cpp file to maintain, but is it harder to maintain classes with just setters and getters than a .proto file? Shrug. If you strongly prefer the proto form I can do that. Regarding the 0-index modification I think I carried that through. Another design approach occurs to me that maybe is better and simpler that doesn't require a callback. Instead of only exposing a Solve() method that takes over and never returns until it's done, the Solver could also expose separate public methods for Iterate(), Resume(), Snapshot() and cetera. That way a client, instead of calling Solve(), could call Iterate() in its own loop, and decide when to stop, when to run the test nets and so on. I'd still like to have a TrainingStats object so it could get those results and store them in something easier to work with like a .csv file; otherwise we're stuck parsing a log to extract that info. |
|
Thanks @leelurch for checking this. The thing with the current IterAction is that it wraps a decision multiple rounds - imagine a first-year graduate students starting to learn SGD. S/he looks at the code, expecting to see a procedural call like learning_rate = initial_learning_rate * 1 / iter (or something similar). Instead, what s/he will see is that solver needs to write everything to a Stats, and then calls a callback to get actions, and then query the actions to determine what actions it should carry out - for an algorithm as simple as SGD, this is, with all due respect and in Chinese vernacular, killing a chicken with a machete :) Yes, code-wise it is not any longer, but think about the logic flow - it makes things quite hard to understand. Another problem is that, the current TrainingStats and IterActions are not as flexible as one may expect - for example, the key parameters are hard-coded as functions, which is not that good. What if a specific solver algorithm need additional stats such as the momentum of the second order gradient (yes, I am making things up...)? The current implementation will need to hard-code again the TrainingStats and IterActions, so such hard-coding takes place at too many places. I am actually not worried about protobuf's speed problem - sure, it adds a little overhead, but well, we are already paying a lot of function call overheads. Protobuf is MUCH easier to maintain and document than writing trivial setters and getters. And if one day we need to serialize them (imagine the case: we have a separate tool that collects stats from a running solver and shows more complicated analysis. A hand-written setter and getter class will then struggle to send every and all its values, while for protobufs? SerializeToString(). There is a reason why google uses protobufs extensively). You also mentioned storing the TrainingStats to a csv file, which is sort of serialization - why write your own serialization code instead of using a well accepted and efficient one? Another minor comment - I am not quite sure if I understand the SGDSolverEx and SGDSolver classes, aren't they just the same class with SGDSolverEx being a thin virtual class? Is there a compelling reason to separate them two? Last but not least, although I wrote a virtual Solver class, I never meant to use it as a general solver - it fits the SGD type algorithm, but not others. I may have made a wrong design choice in the first place because there is never more than a derived class (SGDSolver, although at one point I have an AdagradSolver in my private branch). I am totally fine with not having a solver base class at all: every solver just writes its own functions and decisions. Scikit-learn takes a similar approach too: https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/linear_model see coordinate_descent and stochastic_gradient: the algorithms are different enough that trying to have a unified interface above them brings more pain than good. |
|
Thank you @Yangqing for the extensive explanation regarding preference for protobufs and problem with IterActions. I think I get it now. Regarding your question about SGDSolverEx vs SGDSolver: the base Solver's Solve method was changed in this PR to accept the callback function. SGDSolverEx inherits that method, so SGDSolverEx was the version of the class that allows the client to specify a callback. But I wanted to avoid breaking existing code that already uses SGDSolver, so its interface couldn't change. So the solution was to make SGDSolver a specialization of SGDSolverEx that has an implementation of the callback, where that implementation provides the same behavior as SGDSolver had before the PR. Anyway, I guess all that is moot now. Interesting point about not trying to force a common interface on all solvers. I guess the way forward will probably be to close this PR and take another wack at it in another feature branch that is less intrusive to the design than this one. I think it's simpler to do it without a callback. But I'll wait a while to see if others care to express any opinions before I close it. |
The goal of this PR is provide a mechanism for a client of the library (such as a GUI) to take control of the training process to decide when to start and stop training, resume, and to be notified of training stats.
A callback function is passed to the Solve method. Solver calls the callback function on each training iteration. It passes the current training statistics to the callback. The callback returns an object that tells the solver what actions it should take (make a snapshot, resume, continue training or stop and cetera) as well as learning rate and weight decay.
A default implementation of the callback is provided (DefaultSolverActions) that preserves the existing behavior of the solver.
I think this improves a separation-of-concerns with Solver. It seems to monolithic, trying to do too much. This change factors the logic for the decisions related to stopping, learning rate, weight decay, snapshots and resuming into a dependency on the callback and moves the current logic for doing these things into DefaultSolverActions.