Skip to content
This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Add Transformer.fit#166

Merged
AtheMathmo merged 2 commits intoAtheMathmo:masterfrom
sinhrks:transformer
Jan 6, 2017
Merged

Add Transformer.fit#166
AtheMathmo merged 2 commits intoAtheMathmo:masterfrom
sinhrks:transformer

Conversation

@sinhrks
Copy link
Copy Markdown
Contributor

@sinhrks sinhrks commented Dec 27, 2016

Based on the discussion in #158, added .fit method to Transformer. This also allows basic transformers to fit to training data and transform test data.

Needs to decide:

  • Shuffler doesn't needs to be fit. There is an option to split Transformer depending on needs to fit, but I feel it is complex rather than convenient from user's point of view.
  • To keep backward compat, .transform internally calls .fit if it is not fitted yet. Or should this be breaking change to force users to call .fit first.
  • Is it ok to change MinMaxScaler to store trained data as Vector rather than Vec, which should be compat with others.

Once decisions are made, let me add tests to validate changes.

Copy link
Copy Markdown
Owner

@AtheMathmo AtheMathmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is a good change.

I'm a little put-off because as you point out - some transformers do not need to be fitted. However, on the other side of the spectrum we have things like PCA which prefer to be fitted so that the components can be extracted without necessarily doing a transform.

I have a few other review comments but nothing huge.

let features = inputs.cols();

// ToDo: can use min, max
// https://github.com/AtheMathmo/rulinalg/pull/115
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that this implementation may actually be more efficient as we handle it on one pass of the rows.

Certainly worth checking though (besides - we will have some other breaking changes by the time this PR lands).

Comment thread src/data/transforms/minmax.rs Outdated
// if Transformer is not fitted to the data, fit for backward-compat.
(&None, &None) => {
let res = self.fit(&inputs);
match res {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this match block by using try!(self.fit(&inputs)).

Comment thread src/data/transforms/minmax.rs Outdated
*x = *x + y;
});
fn transform(&mut self, mut inputs: Matrix<T>) -> Result<Matrix<T>, Error> {
match (&self.scale_factors, &self.const_factors) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using if let ... would be a little clearer here.

pub trait Transformer<T> {
/// Fit Transformer to input data, and stores the transformation in the Transformer
fn fit(&mut self, inputs: &T) -> Result<(), error::Error>;
/// Transforms the inputs and stores the transformation in the Transformer
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add to this comment and state that if this function is used without fitting first then the Transformer will call fit itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A point is whether calling transform before fit is allowed as standard behavior.

I'm + 1 to remove it in future. thus better to show warning and do not describe the behavior in the doc?

Comment thread src/data/transforms/shuffle.rs Outdated

#[allow(unused_variables)]
fn fit(&mut self, inputs: &Matrix<T>) -> Result<(), Error> {
unimplemented!();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky one.

I think we would prefer to just have the function be a no-op. My only concern would be that users might expect to be able to retrieve information about the transformation after fitting - for example a list of row-indices to be swapped.

Otherwise without any additional documentation informing them of the panic I think almost all users would be confused and tripped up by this.

@AtheMathmo
Copy link
Copy Markdown
Owner

AtheMathmo commented Dec 28, 2016

This was originally a comment reply but I've moved it here so it can persists.

If we want to disallow using transform without fit then I believe the best way is to control this with the traits. I.e. add a new trait called TransformFitter (or something better named). Which has the following signature (or similar):

pub trait TransformFitter<U, T: Transformer> {
    fn fit(self, inputs: U) -> T;
}

We can then make it so that any Transformers which require a fit before use cannot be directly instantiated (no constructor functions, and private fields). If a transform does not need a fit (like Shuffler) then we can give it constructor functions instead of a TransformFitter struct.

Another advantage of this approach is that we do not have to deal with Option fields all the time.

What do you think?

@sinhrks
Copy link
Copy Markdown
Contributor Author

sinhrks commented Jan 2, 2017

Splitting traits sounds good. Maybe it should be done at the same time with #124?

Any change is required for the PR atm?

@sinhrks sinhrks mentioned this pull request Jan 2, 2017
@AtheMathmo
Copy link
Copy Markdown
Owner

As this PR introduces breaking changes anyway I think there isn't much reason to delay the change?

I'll read through the PR again properly soon - am a little tied up at the moment.

@sinhrks
Copy link
Copy Markdown
Contributor Author

sinhrks commented Jan 2, 2017

@AtheMathmo I don't think the PR breaks something ATM, because all existing tests are passed. Users can call transform directly as the same as previous versions.

@AtheMathmo
Copy link
Copy Markdown
Owner

Ah yes, I see your point! I'll take one last look through and hopefully get this merged.

@AtheMathmo AtheMathmo merged commit b8277f6 into AtheMathmo:master Jan 6, 2017
@sinhrks sinhrks deleted the transformer branch January 29, 2017 02:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants