Conversation
Thanks for the PR! I'm really looking forward to reviewing it :D As for your question, rulinalg has an

Updated to use
As the weekend is finally here I'll have some time tomorrow to go over all of these PRs. Thank you again for the amount of effort you've put into them - there are a lot of things here that I've been trying to find time to do for a while!
AtheMathmo
left a comment
So first of all I must confess that I don't know too much about PCA - having studied it little and used it less.
Overall the code looks good to me and I'm encouraged by what looks like some solid tests.
There are a few comments which need addressing in the code. I also have a more general question.
I was under the impression that with PCA you should be able to choose how many components you wish to keep (by selecting the largest singular values). I also think that this PCA implementation does not take into account the case where input rows > input columns?
If I am correct then we need to do a little more work here. Check out the sklearn documentation for some ideas.
```rust
#[test]
fn test_centering() {
```
This test should live in a tests module to follow convention:

```rust
#[cfg(test)]
mod tests {
    #[test]
    fn test_centering() {
        // ...
    }
}
```

````rust
//!
//! # Examples
//!
//! ```
````
I think this example will be a little confusing for people who are not very familiar with PCA, in particular due to some API confusion that I'll bring up in another comment.
I think it could be improved by explaining a little more what the prediction part does, and moreover what the assertion means: what is this value -0.6686 that we are comparing against?
Fixed the example to compare with the mapped matrix.
```rust
match self.centers {
    None => return Err(Error::new_untrained()),
    Some(ref centers) => {
        let data = centering(inputs, &centers);
```
This could be dangerous.
You are assuming that inputs and centers have the same number of columns and using get_unchecked in the centering function. However, our user could give an input to the predict function which does not match up.
To correct this we should check that the column counts of the internal centers and new input are the same.
Thx. Added dimension check.
```rust
} else {
    inputs.clone()
};
let (_, _, v) = data.svd().unwrap();
```
Just so that you are aware, there are some known issues with SVD: AtheMathmo/rulinalg#48
These affect performance and in some cases accuracy. It might be worth adding a note to the module-level documentation of PCA stating that this will not work on large data sets and is experimental.
Thanks for updating this so quickly! I won't be able to take a look for a couple of days sadly. I had a quick glance and it looked good to me. There was just one thing (that I might have missed). Have you handled the case where a matrix has more columns than rows?
```rust
/// Subtract center Vector from each row
fn centering(inputs: &Matrix<f64>, centers: &Vector<f64>) -> Matrix<f64> {
```
This is minor but I think we should make this function unsafe. This way the emphasis is on whoever is using it to make sure it works properly. In your case it looks fine but if someone else makes changes later it would be a welcome reminder.
I have been feeling a little conflicted about having

```rust
// The data that I want to reduce to principal components
let data = Matrix::new(...);
// Train a new pca model
let mut pca = PCA::default();
pca.train(&data);
// Reduce the data to get components
let p_components = pca.predict(&data);
```

I was thinking that maybe the

What do you think? I think it's safe to say that you know more about this than I do.
Thanks for making these changes! The code looks good to me now - but I'd still like to discuss whether the
Yeah, I feel
I agree. The approach I would take would be to have the

```rust
impl<T: Float> Transformer<Matrix<T>> for PCA {
    fn transform(&mut self, inputs: Matrix<T>) -> LearningResult<Matrix<T>> {
        let (components, transformed_inputs) = do_pca(inputs);
        self.components = Some(components);
        Ok(transformed_inputs)
    }
}
```

This is definitely not perfect. Sadly the API needs a little tweaking, but really it is impossible to describe all possible machine learning actions with a small set of traits. Because of this we need to try and stretch the traits a little. With all that said, I think updating the
I think that with the new
@sinhrks just pinging you on this. Let me know if you plan to complete the work now that the Transformer changes are in place.
Sure, will do.
Note that #176 will land soon too (hopefully I'll check it over and merge tomorrow)
This is great. May it be successfully rebased and merged. I am going to start using it for some machine learning work.
I'm inclined to merge this now and let making it use
Closes #104.
Is there a better way to compare whether float matrices are almost equal?