[SPARK-35142][PYTHON][ML] Fix incorrect return type for rawPredictionUDF in OneVsRestModel
#32245
        predArray.append(x)
    return Vectors.dense(predArray)

rawPredictionUDF = udf(func)
Should I add a test here to ensure that the `rawPrediction` column is no longer a string?
spark/python/pyspark/ml/tests/test_algorithms.py
Lines 108 to 117 in 0494dc9
def test_output_columns(self):
    df = self.spark.createDataFrame([(0.0, Vectors.dense(1.0, 0.8)),
                                     (1.0, Vectors.sparse(2, [], [])),
                                     (2.0, Vectors.dense(0.5, 0.5))],
                                    ["label", "features"])
    lr = LogisticRegression(maxIter=5, regParam=0.01)
    ovr = OneVsRest(classifier=lr, parallelism=1)
    model = ovr.fit(df)
    output = model.transform(df)
    self.assertEqual(output.columns, ["label", "features", "rawPrediction", "prediction"])
Yeah, I think we'd better add a test if possible.
Got it, added a test.
@HyukjinKwon
Why does only `transformed_df.head()` trigger this error?
Does it indicate a bug in PySpark SQL UDFs?
It seems like `pred.show()` triggers an exception too? What does it return with other methods?
ok to test

add to whitelist

cc @WeichenXu123 FYI

Test build #137665 has finished for PR 32245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #137666 has finished for PR 32245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #137668 has finished for PR 32245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure
WeichenXu123 left a comment
LGTM

Test build #137708 has finished for PR 32245 at commit

Kubernetes integration test unable to build dist. exiting with code: 1

Test build #137713 has finished for PR 32245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

LGTM

Looks good. @harupy, would you mind filling in the PR description per the template?

@viirya, are you preparing the Spark 2.4 RC now? This fix is supposed to go into Spark 2.4 too, but it isn't a regression, so it doesn't block the release. It's just a nice-to-have, so if you're preparing the RC, it's fine not to backport it.

BTW, the tests passed at https://github.com/harupy/spark/actions/runs/769366516. For some reason, GitHub Actions didn't link that run properly. I'll leave it to @WeichenXu123, then.
…nUDF` in `OneVsRestModel`
### What changes were proposed in this pull request?
Fixes incorrect return type for `rawPredictionUDF` in `OneVsRestModel`.
### Why are the changes needed?
Bugfix
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test.
Closes #32245 from harupy/SPARK-35142.
Authored-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
(cherry picked from commit b6350f5)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

Backporting to branch-3.1 causes conflicts.

@WeichenXu123 Opened a PR: #32269

I don't see a backport to 2.4. Do you plan to backport it, @WeichenXu123 @harupy?

@viirya Got it. I'll open another PR for 2.4. Wait, does spark/python/pyspark/ml/classification.py Lines 1964 to 2009 in 1630d64

Okay, looks like we can skip Spark 2.4.

Thanks for confirming, @harupy @HyukjinKwon.
…ictionUDF` in `OneVsRestModel`
### What changes were proposed in this pull request?
This PR backports #32245. Fixes incorrect return type for `rawPredictionUDF` in `OneVsRestModel`.
### Why are the changes needed?
Bugfix
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test.
Closes #32275 from harupy/backport-35142-3.0.
Authored-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
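What changes were proposed in this pull request?
Fixes incorrect return type for `rawPredictionUDF` in `OneVsRestModel`.
Why are the changes needed?
Bugfix: `rawPredictionUDF` was created with `udf(func)` and no return type, so it defaulted to `StringType` and the `rawPrediction` column came out as a string instead of a vector.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test.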