Conversation

@Wenpei Wenpei commented Feb 1, 2016

Add export/import for all estimators and transformers (those that have a Scala implementation) under pyspark/ml/regression.py.

@yanboliang Please help to review.
For the doctest, I thought adding one would be enough, since it covers the common usage. But I can add one to all of them if we want.
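
For context, the doctests discussed here are plain Python doctest examples embedded in docstrings. A minimal, self-contained illustration of how such an embedded example gets verified (the roundtrip function here is a hypothetical stand-in, not Spark code):

```python
import doctest

def roundtrip(value):
    """Serialize a value to a string and load it back.

    >>> roundtrip(3)
    3
    """
    # Hypothetical stand-in for a save/load round trip.
    return int(str(value))

# DocTestFinder extracts the >>> examples from the docstring and
# DocTestRunner executes them, comparing the printed output against
# the expected output written in the docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(roundtrip):
    runner.run(test)
print(runner.failures)  # 0 when every embedded example passes
```

This is why a single doctest per class is meaningful: it both documents the save/load usage and acts as a regression test when the suite runs.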

A reviewer commented:

@Wenpei Please check whether the peer Scala implementation supports save/load. Some algorithms, such as DecisionTree, do not currently support it. And you should add doctests that test the correctness of your modification.

Wenpei commented Feb 2, 2016

@yanboliang Sorry about the last PR, where I didn't check the Scala side.

For regression, only three algorithms support MLReadable/MLWritable:
LinearRegression
IsotonicRegression
AFTSurvivalRegression

I have added the export/import API and a doctest.

But there is one issue: the doctest failed with the exception below. It was caused by not setting a default value for "weightCol" (IsotonicRegression) and "quantilesCol" (AFTSurvivalRegression) on the Scala side. I set a value when constructing the instance to make the doctest pass, but I think we should file a JIRA for this. What do you think?

Exception detail:
ir2 = IsotonicRegression.load(ir_path)
Exception raised:
Traceback (most recent call last):
  File "C:\Python27\lib\doctest.py", line 1289, in __run
    compileflags, 1) in test.globs
  File "<doctest __main__.IsotonicRegression[11]>", line 1, in <module>
    ir2 = IsotonicRegression.load(ir_path)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py", line 194, in load
    return cls.read().load(path)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py", line 148, in load
    instance._transfer_params_from_java()
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\wrapper.py", line 82, in _transfer_params_from_java
    value = _java2py(sc, self._java_obj.getOrDefault(java_param))
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py", line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\sql\utils.py", line 45, in deco
    return f(*a, **kw)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o351.getOrDefault.
: java.util.NoSuchElementException: Failed to find a default value for weightCol
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:646)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
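
The failure mode in the traceback comes from getOrDefault semantics: an explicitly set value wins, else the default, else it throws. A simplified pure-Python model of that lookup (not Spark's code; class and attribute names here are illustrative) reproduces it:

```python
class SimpleParams:
    """Simplified model of Spark ML's Params.getOrDefault lookup:
    an explicitly set value wins, else the default value, else it
    throws, mirroring the NoSuchElementException in the traceback."""

    def __init__(self):
        self.param_map = {}    # params explicitly set by the user
        self.default_map = {}  # params that have a default value

    def get_or_default(self, name):
        if name in self.param_map:
            return self.param_map[name]
        if name in self.default_map:
            return self.default_map[name]
        raise KeyError("Failed to find a default value for " + name)

p = SimpleParams()
p.default_map["featuresCol"] = "features"
print(p.get_or_default("featuresCol"))  # prints: features

# IsotonicRegression's weightCol has no default and was never set,
# so unconditionally reading every declared param blows up, exactly
# as getOrDefault did during load:
try:
    p.get_or_default("weightCol")
except KeyError as err:
    print(err)
```

This is why setting a value at construction time made the doctest pass: it moves weightCol from "undefined" into the explicitly set map.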

@yanboliang

@Wenpei It looks like _transfer_params_from_java did not consider params which do not have a default value; we should handle them. Would you mind creating a JIRA to track this issue?

Wenpei commented Feb 3, 2016

Sure, I will submit a JIRA. I thought we needed to fix it on the Scala side by ensuring every parameter has a default value.

@yanboliang

We should not make all parameters have default values, because some params intentionally have no default. I think we should modify _transfer_params_from_java so that it does not fetch params which do not have default values.
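
A sketch of the guarded transfer described here, in pure Python with a stand-in for the py4j-wrapped Java object (function and class names are hypothetical; the actual fix lives in pyspark/ml/wrapper.py):

```python
def transfer_params_from_java(param_names, java_obj):
    """Copy param values from the Java side into a Python dict,
    skipping any param that is neither explicitly set nor given a
    default on the Java side (simplified sketch, not Spark's code)."""
    transferred = {}
    for name in param_names:
        # Guard: only read params that actually have a value, so
        # getOrDefault can never throw NoSuchElementException.
        if java_obj.is_defined(name):
            transferred[name] = java_obj.get_or_default(name)
    return transferred

class FakeJavaParams:
    """Stand-in for the py4j-wrapped Scala Params object."""
    def __init__(self, values):
        self._values = values
    def is_defined(self, name):
        return name in self._values
    def get_or_default(self, name):
        return self._values[name]

java_side = FakeJavaParams({"isotonic": True})
# "weightCol" has no value on the Java side, so it is silently
# skipped instead of raising:
print(transfer_params_from_java(["isotonic", "weightCol"], java_side))
# prints: {'isotonic': True}
```

The design point is that an unset optional param stays unset on the Python side too, preserving the Scala object's semantics after load.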

Wenpei commented Feb 3, 2016

Sure, good catch. I have submitted JIRA 13153 and will submit a PR later.

Wenpei commented Feb 15, 2016

Ping @yanboliang @mengxr
This PR is done. Please help review and launch a test.

A reviewer commented:

It's better to check model.boundaries == model2.boundaries

@yanboliang

@Wenpei Please pay attention to the status of #11197 and update this PR correspondingly when it gets merged.

Wenpei commented Feb 15, 2016

@yanboliang OK.

@Wenpei Wenpei force-pushed the spark-13033-ml.regression-exprot-import branch from 784e315 to 9cddc98 Compare February 22, 2016 06:53
Wenpei commented Feb 22, 2016

@yanboliang I have completed this PR; please take a look.

>>> model.save(model_path)
>>> model2 = IsotonicRegressionModel.load(model_path)
>>> model.boundaries == model2.boundaries
True
A reviewer commented:

We should also test model.predictions == model2.predictions.

@yanboliang

Jenkins, test this please.

Wenpei commented Feb 24, 2016

@mengxr @srowen Can you add me to the whitelist, or help launch a Jenkins test for this?

srowen commented Feb 24, 2016

Jenkins, test this please.

SparkQA commented Feb 24, 2016

Test build #51873 has finished for PR 11000 at commit 3646b36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang

LGTM

mengxr commented Feb 26, 2016

Merged into master. Thanks!

@asfgit asfgit closed this in f3be369 Feb 26, 2016
@Wenpei Wenpei deleted the spark-13033-ml.regression-exprot-import branch June 16, 2016 02:45