AtheMathmo
left a comment
Just need to add a feature flag - or at least discuss this.
Otherwise this looks ready to go.
| /// Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. | ||
| /// Irvine, CA: University of California, School of Information and Computer Science. | ||
| pub fn load_iris() -> Dataset<Matrix<f64>, Vector<usize>> { | ||
| let data: Vec<f64> = vec![5.1, 3.5, 1.4, 0.2, |
Minor: It might be easier to use the matrix! macro here. My thinking is that if we need to add a row there's a little less work.
|
|
||
| /// Dataset container | ||
| #[derive(Clone, Debug)] | ||
| pub struct Dataset<D, T> where D: Clone + Debug, T: Clone + Debug { |
I think this makes sense for now. We might want to be more strict in future if we want to be generic over DataSets. However, this is something that I don't think we will ever want to do.
| } | ||
|
|
||
| /// Module for datasets. | ||
| pub mod datasets; |
We should feature gate this. My thinking is that if we have a few datasets users will not want to download all of this data by default.
To do this:
- Add a new feature to Cargo.toml
- In lib.rs add a feature flag
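A minimal sketch of the two steps above, assuming the feature is called datasets (the name used later in this thread). The Cargo.toml fragment is shown as a comment. The inline module body and the datasets_enabled helper are hypothetical stand-ins so the snippet compiles on its own; they are not taken from the PR.

```rust
// Cargo.toml (assumed fragment):
//
// [features]
// datasets = []

// In lib.rs: the module is only compiled when the crate is built with
// `--features datasets`. The inline body is a placeholder; in the PR the
// declaration is `pub mod datasets;` and the module lives in its own file.
#[cfg(feature = "datasets")]
pub mod datasets {
    /// Placeholder listing; not part of the PR.
    pub fn available() -> &'static [&'static str] {
        &["iris"]
    }
}

/// Hypothetical helper: reports whether the feature was enabled at compile time.
pub fn datasets_enabled() -> bool {
    cfg!(feature = "datasets")
}

fn main() {
    println!("datasets feature enabled: {}", datasets_enabled());
}
```

With a gate like this, users would opt in with `cargo build --features datasets`, and tests that need the data would run with `cargo test --features datasets`.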
|
Added feature gates. Is it ok to be included by default ATM (as it is likely to be used in most tests)? |
|
Thanks for the update. It looks good but I'm a little cautious about having the datasets flag included by default. I wanted it feature flagged specifically so that it had to be opted-in. I can see that we will probably want to use it in some tests but I'd try a few ways around this first.
Finally note that we will need to modify the travis CI matrix to include the "datasets" flag. |
43d6e2d to
90e1944
Compare
|
OK, made |
|
This looks good to me but before merging I'd like to check out the branch and play around with it a little. Thanks! |
|
I checked out the code and I have a few thoughts. I am happy to merge this in without any further changes but we should at least write up a tracking issue for improvements. I think the description of
Also I think it might be a good idea to organize the datasets module a little differently. If we add more datasets the module is going to get large quickly and become difficult to manage. I think we should move the iris data into a new
    use rusty_machine::datasets::iris;
    let (inputs, targets) = iris::load();
But we could also use the current format and have a
Let me know what you think. If you don't want to make any of these changes now I'll merge and move this information to a separate ticket. |
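To illustrate the layout being proposed, here is a rough sketch with one submodule per dataset. It assumes plain Vec types in place of the crate's Matrix<f64> and Vector<usize>, and it carries only the single row visible in the diff rather than the full iris data.

```rust
// Sketch of a per-dataset module layout. Module and function names mirror
// the suggestion above; the return types are stand-ins, not the crate's API.
pub mod datasets {
    pub mod iris {
        /// Returns (inputs, targets). Only the first row from the diff is
        /// included here; the real loader would carry the whole dataset.
        pub fn load() -> (Vec<[f64; 4]>, Vec<usize>) {
            let inputs = vec![[5.1, 3.5, 1.4, 0.2]];
            let targets = vec![0];
            (inputs, targets)
        }
    }
}

use datasets::iris;

fn main() {
    let (inputs, targets) = iris::load();
    // Every input row should have exactly one target label.
    assert_eq!(inputs.len(), targets.len());
    println!("loaded {} row(s)", inputs.len());
}
```

Keeping each dataset in its own submodule means a new dataset adds a new file instead of growing one large module.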
4cdb18a to
06e52c4
Compare
|
Thanks for the comment. I've made the requested change. Please take a look when you have time. |
AtheMathmo
left a comment
Looks good to me now!
I have a minor nitpick for the features section but so far as I can tell it makes no real difference.
| default = [] | ||
| stats = [] | ||
| datasets = [] | ||
| test = [] |
We don't need to include the test or default features. These already exist as defined here.
|
|
||
| use super::Dataset; | ||
|
|
||
| /// Load iris dataset. |
|
Thank you! Merging now. |
Closes #115. Added Dataset struct which has data() and target() impl (intended for supervised learning).
Adding more data once API looks OK.
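A rough sketch of what such a container could look like, using the struct declaration and bounds shown in the diff. The field names, the new constructor, and the accessor return types are assumptions made for illustration; Vec is used in place of Matrix and Vector so the snippet stands alone.

```rust
use std::fmt::Debug;

/// Dataset container, declared with the bounds shown in the diff.
#[derive(Clone, Debug)]
pub struct Dataset<D, T> where D: Clone + Debug, T: Clone + Debug {
    // Field names are assumed; the diff does not show them.
    data: D,
    target: T,
}

impl<D, T> Dataset<D, T> where D: Clone + Debug, T: Clone + Debug {
    /// Hypothetical constructor, added only for this sketch.
    pub fn new(data: D, target: T) -> Self {
        Dataset { data, target }
    }

    /// Borrow the input data (returning a reference is an assumption).
    pub fn data(&self) -> &D {
        &self.data
    }

    /// Borrow the targets used for supervised learning.
    pub fn target(&self) -> &T {
        &self.target
    }
}

fn main() {
    // One row of four features with a single class label.
    let ds = Dataset::new(vec![5.1, 3.5, 1.4, 0.2], vec![0usize]);
    println!("{:?} -> {:?}", ds.data(), ds.target());
}
```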