-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13033][ML][PySpark] Add import/export for ml.regression #11000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13033][ML][PySpark] Add import/export for ml.regression #11000
Conversation
python/pyspark/ml/regression.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Wenpei Please check wether it supports save/load for the peer Scala implementation. Some algorithms such as DecisionTree did not support it currently. And you should add doc test that will test the correctness of your modification.
|
@yanboliang Sorry for last PR that I didn't check scala side. For regression, there are only three algorithm support MLRead/MLWrite: I add export/import api, and doc test currently. But there is one issues here that doctest failed with below exception. It was caused by we didn't set default value for "weightCol" (IsotonicRegression), "quantilesCol"(AFTSurvivalRegression) on scala code side. I add value when constructure instance to make doctest pass, but I thought we should submit a jira for this. How about your idea? Exception detail. |
|
@Wenpei It looks like |
|
Sure, I will submit a jira, I thought we need fix it in scala side that ensure all parameter has default value. |
|
It should not make all parameters have default value because of some params are not setting default value on purpose. I think we should modify |
|
Sure, good catch. I have submit a jira 13153 and submit a pr later |
|
Ping @yanboliang @mengxr |
python/pyspark/ml/regression.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to check model.boundaries == model2.boundaries
|
@yanboliang OK. |
784e315 to
9cddc98
Compare
|
@yanboliang I complete this pr, please take a look |
| >>> model.save(model_path) | ||
| >>> model2 = IsotonicRegressionModel.load(model_path) | ||
| >>> model.boundaries == model2.boundaries | ||
| True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also test model.predictions == model2.predictions.
|
Jenkins, test this please. |
|
Jenkins, test this please. |
|
Test build #51873 has finished for PR 11000 at commit
|
|
LGTM |
|
Merged into master. Thanks! |
Add export/import for all estimators and transformers(which have Scala implementation) under pyspark/ml/regression.py.
@yanboliang Please help to review.
For doctest, I though it's enough to add one since it's common usage. But I can add to all if we want it.