AtheMathmo left a comment:
Overall I think this is a good change.
I'm a little put off because, as you point out, some transformers do not need to be fitted. On the other side of the spectrum, though, we have things like PCA, which prefer to be fitted so that the components can be extracted without necessarily doing a transform.
I have a few other review comments but nothing huge.
    let features = inputs.cols();

    // ToDo: can use min, max
    // https://github.com/AtheMathmo/rulinalg/pull/115

Just to note that this implementation may actually be more efficient, as we handle it in one pass over the rows.
Certainly worth checking, though (besides, we will have some other breaking changes by the time this PR lands).
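
To illustrate the one-pass point, here is a minimal sketch that collects per-column minima and maxima while walking the rows once. It assumes a plain row-major `&[f64]` slice rather than the `Matrix` type, and `column_min_max` is a hypothetical helper, not a function from rulinalg or this PR.

```rust
/// Hypothetical helper (not part of rulinalg or this PR): computes the
/// per-column minimum and maximum of a row-major matrix in one pass over rows.
/// Returns `None` for empty input, zero columns, or a length that is not a
/// multiple of `cols`.
fn column_min_max(data: &[f64], cols: usize) -> Option<(Vec<f64>, Vec<f64>)> {
    if cols == 0 || data.is_empty() || data.len() % cols != 0 {
        return None;
    }
    let mut mins = vec![f64::INFINITY; cols];
    let mut maxs = vec![f64::NEG_INFINITY; cols];
    // Each row is visited exactly once; both accumulators are updated together.
    for row in data.chunks(cols) {
        for (j, &x) in row.iter().enumerate() {
            if x < mins[j] {
                mins[j] = x;
            }
            if x > maxs[j] {
                maxs[j] = x;
            }
        }
    }
    Some((mins, maxs))
}
```

Whether this beats separate min and max calls would still need to be measured, as noted above.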
    // if Transformer is not fitted to the data, fit for backward-compat.
    (&None, &None) => {
        let res = self.fit(&inputs);
        match res {

This match block can be removed by using `try!(self.fit(&inputs))`.
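
A small sketch of the simplification, written with the `?` operator, which is the current equivalent of `try!` (in the 2018 edition and later, `try` is a reserved word and the macro has to be spelled `r#try!`). The `FitError` type and the free-standing `fit` function are stand-ins invented for this example, not the crate's types.

```rust
/// Stand-in for the crate's error type (an assumption for this sketch).
#[derive(Debug, PartialEq)]
struct FitError(String);

/// Stand-in for `self.fit(&inputs)`: fails on empty input.
fn fit(inputs: &[f64]) -> Result<(), FitError> {
    if inputs.is_empty() {
        Err(FitError("cannot fit on empty input".to_string()))
    } else {
        Ok(())
    }
}

/// Explicit error propagation with a `match` block.
fn transform_with_match(inputs: &[f64]) -> Result<usize, FitError> {
    match fit(inputs) {
        Ok(()) => {}
        Err(e) => return Err(e),
    }
    Ok(inputs.len())
}

/// The same control flow with early return handled by `?`.
fn transform_with_question_mark(inputs: &[f64]) -> Result<usize, FitError> {
    fit(inputs)?;
    Ok(inputs.len())
}
```

Both functions return the same result for every input; the second only removes the boilerplate.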
        *x = *x + y;
    });
    fn transform(&mut self, mut inputs: Matrix<T>) -> Result<Matrix<T>, Error> {
        match (&self.scale_factors, &self.const_factors) {

I think using `if let ...` would be a little clearer here.
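
Roughly what the `if let` form could look like, as a sketch only. The `Scaler` struct below is a simplified stand-in that scales a single row stored in a `Vec<f64>`; the field names mirror the diff, but the element-wise formula and the error handling are assumptions made for the example.

```rust
/// Simplified stand-in for the scaler in the diff (assumption: one row only).
struct Scaler {
    scale_factors: Option<Vec<f64>>,
    const_factors: Option<Vec<f64>>,
}

impl Scaler {
    fn transform(&self, mut inputs: Vec<f64>) -> Result<Vec<f64>, String> {
        // Both options must be `Some` for the transformer to count as fitted.
        if let (Some(scales), Some(consts)) = (&self.scale_factors, &self.const_factors) {
            if inputs.len() != scales.len() || scales.len() != consts.len() {
                return Err("input width does not match fitted parameters".to_string());
            }
            for ((x, s), c) in inputs.iter_mut().zip(scales).zip(consts) {
                *x = *x * s + c;
            }
            Ok(inputs)
        } else {
            Err("transformer has not been fitted".to_string())
        }
    }
}
```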
    pub trait Transformer<T> {
        /// Fit Transformer to input data, and stores the transformation in the Transformer
        fn fit(&mut self, inputs: &T) -> Result<(), error::Error>;
        /// Transforms the inputs and stores the transformation in the Transformer

I would add to this comment and state that if this function is used without fitting first then the Transformer will call fit itself.
There was a problem hiding this comment.
One question is whether calling `transform` before `fit` should be allowed as standard behavior.
I'm +1 on removing it in the future. So would it be better to show a warning and not describe the behavior in the docs?
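
For reference, the fit-on-first-use behavior being discussed can be sketched like this. `MinMax1D` is a hypothetical single-column scaler written for this example, not the crate's `MinMaxScaler`; it only shows how `transform` falls back to `fit` when nothing has been fitted yet, and that later calls reuse the stored parameters.

```rust
/// Hypothetical single-column min-max scaler; not the crate's `MinMaxScaler`.
#[derive(Default)]
struct MinMax1D {
    /// `(min, max)` learned by `fit`; `None` until the scaler is fitted.
    fitted: Option<(f64, f64)>,
}

impl MinMax1D {
    fn fit(&mut self, inputs: &[f64]) -> Result<(), String> {
        if inputs.is_empty() {
            return Err("cannot fit on empty input".to_string());
        }
        let min = inputs.iter().cloned().fold(f64::INFINITY, f64::min);
        let max = inputs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        if min == max {
            return Err("constant input cannot be scaled".to_string());
        }
        self.fitted = Some((min, max));
        Ok(())
    }

    /// Backward-compatible behavior: fits on `inputs` first if not yet fitted.
    fn transform(&mut self, inputs: &[f64]) -> Result<Vec<f64>, String> {
        if self.fitted.is_none() {
            self.fit(inputs)?;
        }
        match self.fitted {
            Some((min, max)) => Ok(inputs.iter().map(|x| (x - min) / (max - min)).collect()),
            None => Err("transformer has not been fitted".to_string()),
        }
    }
}
```

Dropping the fallback later would amount to replacing the `if self.fitted.is_none()` branch with an early error return.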
    #[allow(unused_variables)]
    fn fit(&mut self, inputs: &Matrix<T>) -> Result<(), Error> {
        unimplemented!();

This is a tricky one.
I think we would prefer to just have the function be a no-op. My only concern would be that users might expect to be able to retrieve information about the transformation after fitting - for example a list of row-indices to be swapped.
Otherwise, without any additional documentation informing them of the panic, I think almost all users would be confused and tripped up by this.
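
A sketch of the no-op option, under simplifying assumptions: the trait below is a cut-down stand-in for `Transformer` that works on `Vec<f64>`, and `Reverser` is a hypothetical stateless transformer that reverses its input instead of shuffling, so the example needs no random-number crate.

```rust
/// Cut-down stand-in for the `Transformer` trait (assumption for this sketch).
trait Transformer {
    fn fit(&mut self, inputs: &[f64]) -> Result<(), String>;
    fn transform(&mut self, inputs: Vec<f64>) -> Result<Vec<f64>, String>;
}

/// Hypothetical stateless transformer standing in for `Shuffler`.
struct Reverser;

impl Transformer for Reverser {
    // Nothing is learned from the data, so fitting succeeds without doing work.
    fn fit(&mut self, _inputs: &[f64]) -> Result<(), String> {
        Ok(())
    }

    fn transform(&mut self, mut inputs: Vec<f64>) -> Result<Vec<f64>, String> {
        inputs.reverse();
        Ok(inputs)
    }
}
```

With this shape, calling `fit` is harmless rather than a panic, at the cost of not exposing any information about the transformation afterwards.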

This was originally a comment reply, but I've moved it here so it can persist.

If we want to disallow using [...]

    pub trait TransformFitter<U, T: Transformer> {
        fn fit(self, inputs: U) -> T;
    }

We can then make it so that any [...]

Another advantage of this approach is that we do not have to deal with [...]

What do you think?
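
To make the split-trait idea concrete, here is one possible shape, offered as a sketch rather than the proposed design. It keeps the `TransformFitter` name and the consuming `fit(self, inputs: U)` from the snippet above, but the `Result` return type, the `Transformer<U>` parameterization, and the `MinMaxFitter`/`MinMaxScaler` pair working on `Vec<f64>` are all assumptions for illustration.

```rust
/// A fitted transformer: it can only transform.
pub trait Transformer<U> {
    fn transform(&mut self, inputs: U) -> Result<U, String>;
}

/// Consumes its configuration and the training data, yielding a fitted transformer.
pub trait TransformFitter<U, T: Transformer<U>> {
    fn fit(self, inputs: U) -> Result<T, String>;
}

/// Hypothetical configuration: the target range of the scaled data.
pub struct MinMaxFitter {
    pub scaled_min: f64,
    pub scaled_max: f64,
}

/// Hypothetical fitted scaler: parameters are stored directly, always present.
pub struct MinMaxScaler {
    scale: f64,
    offset: f64,
}

impl TransformFitter<Vec<f64>, MinMaxScaler> for MinMaxFitter {
    fn fit(self, inputs: Vec<f64>) -> Result<MinMaxScaler, String> {
        let min = inputs.iter().cloned().fold(f64::INFINITY, f64::min);
        let max = inputs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        if inputs.is_empty() || min == max {
            return Err("need at least two distinct values to fit".to_string());
        }
        let scale = (self.scaled_max - self.scaled_min) / (max - min);
        Ok(MinMaxScaler { scale, offset: self.scaled_min - min * scale })
    }
}

impl Transformer<Vec<f64>> for MinMaxScaler {
    fn transform(&mut self, inputs: Vec<f64>) -> Result<Vec<f64>, String> {
        Ok(inputs.into_iter().map(|x| x * self.scale + self.offset).collect())
    }
}
```

In this sketch an unfitted `MinMaxScaler` cannot be constructed from outside the module, so `transform` never has to check whether fitting happened.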

Splitting traits sounds good. Maybe it should be done at the same time as #124? Is any change required for this PR at the moment?

As this PR introduces breaking changes anyway, I think there isn't much reason to delay the change? I'll read through the PR again properly soon; I'm a little tied up at the moment.

@AtheMathmo I don't think the PR breaks anything at the moment, because all existing tests pass. Users can call [...]

Ah yes, I see your point! I'll take one last look through and hopefully get this merged.

Based on the discussion in #158, added a `.fit` method to `Transformer`. This also allows basic transformers to fit to training data and transform test data.

Needs to decide:

- `Shuffler` doesn't need to be fit. There is an option to split `Transformer` depending on whether fitting is needed, but I feel that is more complex than convenient from the user's point of view.
- `.transform` internally calls `.fit` if it is not fitted yet. Or should this be a breaking change that forces users to call `.fit` first?
- Whether `MinMaxScaler` should store trained data as `Vector` rather than `Vec`, which should be compatible with the others.

Once decisions are made, let me add tests to validate the changes.