Add Transformer.fit by sinhrks · Pull Request #166 · AtheMathmo/rusty-machine

sinhrks · 2016-12-27T03:01:07Z

Based on the discussion in #158, this adds a .fit method to Transformer. It also allows basic transformers to fit to training data and then transform test data.

Decisions needed:

Shuffler doesn't need to be fit. One option is to split Transformer depending on whether fitting is needed, but I feel that is more complex than convenient from the user's point of view.
To keep backward compat, .transform internally calls .fit if it is not fitted yet. Or should this be a breaking change that forces users to call .fit first?
Is it OK to change MinMaxScaler to store trained data as Vector rather than Vec, for consistency with the other transformers?

Once decisions are made, I'll add tests to validate the changes.
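For readers following along, here is a minimal sketch of the backward-compatible behavior described above, where transform fits on its own input if fit was never called. It is not the code in this PR: the `MinMax` type, the `String` error type, and the use of `Vec<f64>` in place of `Matrix<T>` are assumptions made to keep the sketch dependency-free.

```rust
/// Hypothetical stand-in for the crate's trait. `Vec<f64>` replaces
/// `Matrix<T>` and `String` replaces the crate's error type.
pub trait Transformer {
    fn fit(&mut self, inputs: &[f64]) -> Result<(), String>;
    fn transform(&mut self, inputs: Vec<f64>) -> Result<Vec<f64>, String>;
}

/// Scales values using the min and max seen during `fit`.
#[derive(Default)]
pub struct MinMax {
    // `None` until `fit` has succeeded.
    range: Option<(f64, f64)>,
}

impl Transformer for MinMax {
    fn fit(&mut self, inputs: &[f64]) -> Result<(), String> {
        if inputs.is_empty() {
            return Err("cannot fit on empty input".to_string());
        }
        let min = inputs.iter().cloned().fold(f64::INFINITY, f64::min);
        let max = inputs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        if min == max {
            return Err("constant input has no range".to_string());
        }
        self.range = Some((min, max));
        Ok(())
    }

    fn transform(&mut self, inputs: Vec<f64>) -> Result<Vec<f64>, String> {
        // Backward compatibility: if the caller never called `fit`,
        // fit on the data being transformed.
        if self.range.is_none() {
            self.fit(&inputs)?;
        }
        let (min, max) = self.range.expect("range is set after a successful fit");
        Ok(inputs.into_iter().map(|x| (x - min) / (max - min)).collect())
    }
}
```

Calling `fit` on training data and then `transform` on test data reuses the training range, while calling `transform` alone behaves like the pre-PR API.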

AtheMathmo

Overall I think this is a good change.

I'm a little put off because, as you point out, some transformers do not need to be fitted. However, on the other side of the spectrum we have things like PCA, which prefer to be fitted so that the components can be extracted without necessarily doing a transform.

I have a few other review comments but nothing huge.

AtheMathmo · 2016-12-27T08:45:26Z

        let features = inputs.cols();

+        // ToDo: can use min, max
+        // https://github.com/AtheMathmo/rulinalg/pull/115


Just to note that this implementation may actually be more efficient as we handle it on one pass of the rows.

Certainly worth checking though (besides - we will have some other breaking changes by the time this PR lands).

AtheMathmo · 2016-12-27T08:46:13Z

+            // if Transformer is not fitted to the data, fit for backward-compat.
+            (&None, &None) => {
+                let res = self.fit(&inputs);
+                match res {


Can remove this match block by using try!(self.fit(&inputs)).
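To illustrate the suggestion, here is a small before/after sketch. The free function `fit` and the `FitError` type are hypothetical stand-ins for `self.fit(&inputs)` and the crate's error type. The sketch uses the `?` operator, which later replaced `try!` and has the same early-return behavior.

```rust
#[derive(Debug, PartialEq)]
pub struct FitError;

// Hypothetical stand-in for `self.fit(&inputs)`: fails on empty input.
fn fit(inputs: &[f64]) -> Result<(), FitError> {
    if inputs.is_empty() {
        Err(FitError)
    } else {
        Ok(())
    }
}

// Explicit match: the error arm does nothing except forward the error.
pub fn transform_with_match(inputs: Vec<f64>) -> Result<Vec<f64>, FitError> {
    let res = fit(&inputs);
    match res {
        Ok(()) => {}
        Err(e) => return Err(e),
    }
    Ok(inputs)
}

// Same control flow: `?` returns early from the function on `Err`.
pub fn transform_with_question_mark(inputs: Vec<f64>) -> Result<Vec<f64>, FitError> {
    fit(&inputs)?;
    Ok(inputs)
}
```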

AtheMathmo · 2016-12-27T08:47:18Z

-                *x = *x + y;
-            });
+    fn transform(&mut self, mut inputs: Matrix<T>) -> Result<Matrix<T>, Error> {
+        match (&self.scale_factors, &self.const_factors) {


I think using if let ... would be a little clearer here.
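A rough sketch of what the `if let` form could look like. The field names `scale_factors` and `const_factors` come from the diff above, but the `Scaler` struct, the flat `Vec<f64>` data, the `String` error, and the choice to return an error when unfitted are assumptions for illustration only.

```rust
pub struct Scaler {
    pub scale_factors: Option<Vec<f64>>,
    pub const_factors: Option<Vec<f64>>,
}

impl Scaler {
    pub fn transform(&self, mut inputs: Vec<f64>) -> Result<Vec<f64>, String> {
        // A single pattern covers the only case that does real work;
        // every other combination falls through to `else`.
        if let (Some(scales), Some(consts)) = (&self.scale_factors, &self.const_factors) {
            if scales.len() != inputs.len() || consts.len() != inputs.len() {
                return Err("input length does not match fitted factors".to_string());
            }
            for ((x, s), c) in inputs.iter_mut().zip(scales).zip(consts) {
                *x = *x * s + c;
            }
            Ok(inputs)
        } else {
            Err("transformer has not been fitted".to_string())
        }
    }
}
```

Compared with a `match` on the tuple, this avoids spelling out the unfitted arms when they all share one outcome.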

AtheMathmo · 2016-12-27T08:48:23Z

 pub trait Transformer<T> {
+    /// Fit Transformer to input data, and stores the transformation in the Transformer
+    fn fit(&mut self, inputs: &T) -> Result<(), error::Error>;
    /// Transforms the inputs and stores the transformation in the Transformer


I would add to this comment and state that if this function is used without fitting first then the Transformer will call fit itself.

sinhrks · 2016-12-28

The open question is whether calling transform before fit should be allowed as standard behavior.

I'm +1 on removing it in the future. So would it be better to show a warning and not describe the behavior in the doc?

AtheMathmo · 2016-12-27T08:52:23Z

+
+    #[allow(unused_variables)]
+    fn fit(&mut self, inputs: &Matrix<T>) -> Result<(), Error> {
+        unimplemented!();


This is a tricky one.

I think we would prefer to just have the function be a no-op. My only concern would be that users might expect to be able to retrieve information about the transformation after fitting - for example a list of row-indices to be swapped.

Otherwise without any additional documentation informing them of the panic I think almost all users would be confused and tripped up by this.
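A minimal sketch of the no-op alternative, assuming a simplified trait over `Vec<u32>` instead of `Matrix<T>`. `Reverser` is a hypothetical, deterministic stand-in for `Shuffler` so the example needs no random number generator.

```rust
/// Hypothetical, simplified version of the trait under discussion.
pub trait Transformer {
    fn fit(&mut self, inputs: &[u32]) -> Result<(), String>;
    fn transform(&mut self, inputs: Vec<u32>) -> Result<Vec<u32>, String>;
}

/// Stand-in for a transformer that learns nothing from its input.
pub struct Reverser;

impl Transformer for Reverser {
    // The leading underscore marks the parameter as intentionally unused,
    // so `#[allow(unused_variables)]` is not needed.
    fn fit(&mut self, _inputs: &[u32]) -> Result<(), String> {
        // Nothing to learn: succeed instead of panicking with `unimplemented!()`.
        Ok(())
    }

    fn transform(&mut self, mut inputs: Vec<u32>) -> Result<Vec<u32>, String> {
        inputs.reverse();
        Ok(inputs)
    }
}
```

With this shape, generic code that always calls `fit` before `transform` works for every transformer, at the cost that `fit` exposes no information about the transformation.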

AtheMathmo · 2016-12-28T06:50:34Z

This was originally a comment reply but I've moved it here so that it persists.

If we want to disallow using transform without fit then I believe the best way is to control this with the traits, i.e. add a new trait called TransformFitter (or something better named), which has the following signature (or similar):

pub trait TransformFitter<U, T: Transformer<U>> {
    fn fit(self, inputs: U) -> T;
}

We can then make it so that any Transformers which require a fit before use cannot be directly instantiated (no constructor functions, and private fields). If a transform does not need a fit (like Shuffler) then we can give it constructor functions instead of a TransformFitter struct.

Another advantage of this approach is that we do not have to deal with Option fields all the time.
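To make the proposal concrete, here is a sketch under stated assumptions: `Transformer<U>` is reduced to a single `transform` method, and `CenterFitter` / `Center` are hypothetical types invented for illustration (they are not part of rusty-machine).

```rust
/// Simplified, hypothetical version of the transformer trait.
pub trait Transformer<U> {
    fn transform(&self, inputs: U) -> U;
}

/// The proposed trait: consuming a fitter yields a ready-to-use transformer.
pub trait TransformFitter<U, T: Transformer<U>> {
    fn fit(self, inputs: U) -> T;
}

/// Hypothetical fitter with no parameters of its own.
pub struct CenterFitter;

/// Subtracts the mean learned during fitting. The field is private, so
/// code outside this module can only obtain a `Center` through `fit`.
pub struct Center {
    mean: f64, // plain value, no `Option` needed
}

impl TransformFitter<Vec<f64>, Center> for CenterFitter {
    fn fit(self, inputs: Vec<f64>) -> Center {
        // Assumption for the sketch: an empty input is treated as mean 0.
        let mean = if inputs.is_empty() {
            0.0
        } else {
            inputs.iter().sum::<f64>() / inputs.len() as f64
        };
        Center { mean }
    }
}

impl Transformer<Vec<f64>> for Center {
    fn transform(&self, inputs: Vec<f64>) -> Vec<f64> {
        inputs.into_iter().map(|x| x - self.mean).collect()
    }
}
```

Usage would look like `let center = CenterFitter.fit(train);` followed by `center.transform(test)`. Because a `Center` only exists after fitting, the "transform before fit" case cannot be expressed at all.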

What do you think?

sinhrks · 2017-01-02T12:21:21Z

Splitting traits sounds good. Maybe it should be done at the same time as #124?

Is any change required for this PR at the moment?

AtheMathmo · 2017-01-02T12:52:29Z

As this PR introduces breaking changes anyway I think there isn't much reason to delay the change?

I'll read through the PR again properly soon - am a little tied up at the moment.

sinhrks · 2017-01-02T13:02:29Z

@AtheMathmo I don't think the PR breaks anything at the moment, because all existing tests pass. Users can still call transform directly, the same as in previous versions.

AtheMathmo · 2017-01-06T12:10:17Z

Ah yes, I see your point! I'll take one last look through and hopefully get this merged.

Add Transformer.fit

b8029bc

AtheMathmo suggested changes Dec 27, 2016

update reviewed points

7eae6e2

sinhrks mentioned this pull request Jan 2, 2017

ENH: Add Normalizer #170

Closed

AtheMathmo mentioned this pull request Jan 6, 2017

Add a new trait for fitting a transformer #171

Closed

AtheMathmo merged commit b8277f6 into AtheMathmo:master Jan 6, 2017

sinhrks deleted the transformer branch January 29, 2017 02:50

AtheMathmo merged 2 commits into AtheMathmo:master from sinhrks:transformer
