Adding error handling to model traits #120
Conversation
Have you considered separating the notions of model and modelling method (or 'trainer')? That would get rid of the 'model not trained' errors at the type level. `Trainer::train : training data -> Model`. This would also help make serialisation clearer when that's implemented - models become just lumps of data, and they're the things that get serialised. Modelling methods/trainers/other-better-names run algorithms to produce models from training data and to produce predictions from data and a model.
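A minimal sketch of the trainer/model split described above (all names here are illustrative, not rusty-machine's actual API): the trainer holds configuration and produces a model, and a model only ever exists in a trained state, so "model not trained" becomes unrepresentable.

```rust
// Hypothetical trainer: holds hyperparameters, produces models.
struct LinearRegressionTrainer {
    // e.g. a regularisation strength (illustrative only)
    alpha: f64,
}

// Hypothetical model: just fitted data, always usable for prediction.
struct LinearRegressionModel {
    weights: Vec<f64>,
}

impl LinearRegressionTrainer {
    // training data -> Model: there is no untrained model state to check.
    fn train(&self, inputs: &[Vec<f64>], targets: &[f64]) -> LinearRegressionModel {
        // Placeholder "fit": use the mean target as a single intercept weight.
        let mean = targets.iter().sum::<f64>() / targets.len() as f64;
        let _ = (inputs, self.alpha); // unused in this toy fit
        LinearRegressionModel { weights: vec![mean] }
    }
}

impl LinearRegressionModel {
    fn predict(&self, _input: &[f64]) -> f64 {
        // Placeholder prediction: just return the intercept.
        self.weights[0]
    }
}
```

Serialisation then only has to touch `LinearRegressionModel`, which is plain data.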
@theotherphil thanks for the comment! Someone else recommended this a while ago and at the time I was really excited about it, but a couple of things prevented me from moving forward.
Honestly, at this point the first point feels like something we could work around cleanly enough. As for the second, I think I'd like to try converting an existing model to see what it looks like and how complex the new API feels. Maybe I'll try to get some eyes on it before merging to see how people feel about it. There were some other issues I ran into while trying this before, but none were too severe from memory.
Can't we have both alternatives with a wrapper over the
@tafia - Yeah, that is certainly an option too, though I'm not sure I'd like both options to exist like that. If we make the change it should be to enforce safety with the type system, but this way we'd end up with the same problem if the API is invoked directly on the wrapper.
@AtheMathmo: your first point sounds more like an argument in favour of splitting the model out. As a user I wouldn't be sure what calling train twice is supposed to do. Before you mentioned retraining I'd have guessed that calling train twice should either panic or retrain from scratch on the new data only. But in some cases it might mean:

- 'train a new model on the data and combine it with the old model, equivalently to having trained from scratch using all the data seen thus far' (e.g. if fitting a normal to your data you can just do a weighted combination of the two fitted normals), or
- 'update the model to reflect new data without a full refit' (e.g. just updating the leaf distributions for a random forest, without retraining the node functions), or
- 'train from scratch using all the new data and all the previous training data we've seen, because we've cached it all somewhere'.

It should be fairly easy to make these distinct in the type system, but at the cost of more surface complexity. I think offering a wrapper option could lead to a lot of initial confusion for people learning the library.
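One way the first two alternatives could be made distinct at the type level (an illustrative sketch, not the library's API): plain batch training is one trait, and exact merging of two fitted models is a separate opt-in trait, so a caller has to choose a semantics explicitly. The fitted-normal example from the comment works here because combining two fitted normals by a count-weighted mean matches training on the pooled data.

```rust
// Fitted model: just the statistics needed to combine exactly later.
#[derive(Debug)]
struct FittedNormal {
    mean: f64,
    count: usize,
}

// Batch training: data -> model, always from scratch.
trait Train {
    type Model;
    fn train(&self, data: &[f64]) -> Self::Model;
}

// Exact merge: combine two fitted models as if trained on the union
// of their data. Only types where this is actually possible implement it.
trait Merge {
    fn merge(&self, other: &Self) -> Self;
}

struct NormalTrainer;

impl Train for NormalTrainer {
    type Model = FittedNormal;
    fn train(&self, data: &[f64]) -> FittedNormal {
        FittedNormal {
            mean: data.iter().sum::<f64>() / data.len() as f64,
            count: data.len(),
        }
    }
}

impl Merge for FittedNormal {
    fn merge(&self, other: &FittedNormal) -> FittedNormal {
        // Count-weighted combination: equivalent to a fit on all the data.
        let total = self.count + other.count;
        FittedNormal {
            mean: (self.mean * self.count as f64 + other.mean * other.count as f64)
                / total as f64,
            count: total,
        }
    }
}
```

The incremental-update case would get its own trait (say, an `update(&mut self, ...)` method) in the same spirit, which is where the surface-complexity cost shows up.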
@theotherphil I agree with your points totally! However, I wasn't very clear about what I meant: I avoided the change before because I couldn't see a way to handle the code organization if we require training functionality on models that have already been trained (like the cases you discussed above). We'd like to reuse much of the code in the training algorithm, but it seems difficult to organize this effectively. I think it still makes sense to explore this change, and we can figure out the online-training complications later.
I'm fairly happy with this PR now. I decided to go with roughly the following criteria:
I'd be happy to receive feedback on places where it seems I'm not following this!
```diff
-return (post_mean, post_var);
+Ok((post_mean, post_var))
+} else {
+    Err(Error::new(ErrorKind::UntrainedModel, "The model has not been trained."))
```
This is really happening a lot. Can't we have a special constructor, or a more advanced `ErrorKind` that optionally carries a `&'static str` message?
I more or less copied the `std::io` implementation, but I agree that this case doesn't really fit. I'll see how easy it is to modify - but it seems this particular error will disappear if we modify the model traits.
@tafia - I added a new constructor for the untrained model errors. We may end up changing the API to remove these completely though.
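A sketch of what such a dedicated constructor could look like, so call sites stop repeating the kind and message. The names follow the diff in the review comment above, but the exact shape of the real `Error`/`ErrorKind` types and constructor is an assumption, not the merged code.

```rust
use std::fmt;

// Assumed error kind; the real enum likely has more variants.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    UntrainedModel,
}

#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    message: &'static str,
}

impl Error {
    // General constructor, mirroring the std::io-style API in the diff.
    fn new(kind: ErrorKind, message: &'static str) -> Error {
        Error { kind, message }
    }

    // Dedicated constructor: call sites shrink to
    // Err(Error::new_untrained_model())
    fn new_untrained_model() -> Error {
        Error::new(ErrorKind::UntrainedModel, "The model has not been trained.")
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}
```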
To be honest, while it is better it still doesn't look good. But it doesn't
@tafia I feel the same way. But if you have any suggestions on how I can make it better without overhauling the

Otherwise I'll merge this in (at least it's not worse than our current approach of panicking with a message inside train!)
Sorry for not answering. I have very limited access to the internet. I don't
This PR adds some error handling to the `SupModel` and `UnSupModel` traits. It's flagged as WIP as I have a few tweaks planned - but the core content is ready for review and I'd appreciate some feedback.

Note that I haven't attempted to cover all possible panics from within the `train` and `predict` functions. I think it is non-trivial to decide when such things should occur, so I'd prefer to take a lightweight approach for now and add them in as we go (as this will not cause breakage in the API). With that said - if you see something that should be returning an `Error` here, please comment so I can address it or create an issue.
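Roughly, the change means the supervised-model trait's methods return `Result` instead of panicking. The sketch below shows that shape with a toy implementor; the trait and error names follow this PR's text, but the exact generics and error type in the merged code may differ.

```rust
// Assumed error type for the sketch; the PR's real error type is richer.
#[derive(Debug, PartialEq)]
enum Error {
    UntrainedModel,
}

// Supervised model trait with fallible train/predict, as described above.
trait SupModel<T, U> {
    fn train(&mut self, inputs: &T, targets: &U) -> Result<(), Error>;
    fn predict(&self, inputs: &T) -> Result<U, Error>;
}

// Toy implementor: predicts the stored mean, errs if never trained.
struct MeanModel {
    mean: Option<f64>,
}

impl SupModel<Vec<f64>, f64> for MeanModel {
    fn train(&mut self, inputs: &Vec<f64>, _targets: &f64) -> Result<(), Error> {
        self.mean = Some(inputs.iter().sum::<f64>() / inputs.len() as f64);
        Ok(())
    }

    fn predict(&self, _inputs: &Vec<f64>) -> Result<f64, Error> {
        // The "model not trained" case becomes an Err instead of a panic.
        self.mean.ok_or(Error::UntrainedModel)
    }
}
```

Callers now handle the untrained case explicitly (e.g. with `?` or a `match`) rather than hitting a panic at runtime.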