[SPARK-17697][ML] Fixed bug in summary calculations that pattern match against label without casting #15288
@jkbradley I checked the other algorithms for similar issues.
Test build #66072 has finished for PR 15288 at commit
I'll take a look, thanks!
jkbradley left a comment
Thanks! Just a few tiny comments.
test("evaluate with labels that are not doubles") {
  // Evaluate a test set with Label that is a numeric type other than Double
  val lr = new LogisticRegression()
    .setMaxIter(10)
How about just 1 iteration to be a little faster? Also, there's no need to set the threshold.
Indent 2 spaces, not 4.
test("evaluate with labels that are not doubles") {
  // Evaluate with a dataset whose labels are not doubles to verify correct casting
  val datasetWithWeight = Seq(
There's no need to have weights in this test, so it could be simplified a bit.
Thanks for the review @jkbradley! I simplified the tests as you suggested.
LGTM pending tests
Test build #66113 has finished for PR 15288 at commit
Merging with master and branch-2.0
…ranch-2.0
[SPARK-17697][ML] Fixed bug in summary calculations that pattern match against label without casting

In calling LogisticRegression.evaluate and GeneralizedLinearRegression.evaluate using a Dataset where the Label is not of a double type, calculations pattern match against a double and throw a MatchError. This fix casts the Label column to a DoubleType to ensure there is no MatchError.

Added unit tests to call evaluate with a dataset that has Label as other numeric types.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #15288 from BryanCutler/binaryLOR-numericCheck-SPARK-17697.

(cherry picked from commit 2f73956)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
OK done with cherry-pick to 2.0. Also, I just noticed there are some other select() calls for labelCol in GeneralizedLinearRegression.scala without casts. Would you mind sending a follow-up PR for those? Thank you!
Thanks @jkbradley. Are you referring to other select() calls like that used in
Oh, you're right! I didn't realize that the UDF would handle casting automatically. I think it's fine then. I'll mark the JIRA as resolved. Thanks!
What changes were proposed in this pull request?
When LogisticRegression.evaluate or GeneralizedLinearRegression.evaluate is called on a Dataset whose label column is not of type Double, the summary calculations pattern match the label against Double and throw a MatchError. This fix casts the label column to DoubleType so that the match always succeeds.
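A minimal plain-Scala analogy of the failure and the fix, with no Spark involved: values read out of a `Row` are typed `Any`, so a `case label: Double` pattern only matches boxed `java.lang.Double` values and throws a `scala.MatchError` for an `Int` or `Long` label. `LabelCastDemo`, `strictLabel`, `castLabel`, and `failsWithoutCast` are hypothetical names; the conversion through `java.lang.Number` only stands in for the cast of the label column to `DoubleType` that the actual fix performs.

```scala
// Hypothetical analogy, not Spark code: labels arrive as Any, as they do when read from a Row.
object LabelCastDemo {
  // Mirrors the buggy pattern: only a boxed java.lang.Double matches,
  // so Int, Long, Float, ... labels throw scala.MatchError.
  def strictLabel(value: Any): Double = value match {
    case label: Double => label
  }

  // Stands in for casting the label column to DoubleType before matching:
  // any numeric label is converted to a Double first.
  def castLabel(value: Any): Double = value match {
    case n: java.lang.Number => n.doubleValue()
  }

  // Returns true if the strict (uncast) pattern fails on the given label
  def failsWithoutCast(value: Any): Boolean =
    try { strictLabel(value); false }
    catch { case _: MatchError => true }
}
```

Converting once, up front, keeps every downstream pattern match on `Double` valid instead of requiring a separate case for each numeric type.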
How was this patch tested?
Added unit tests that call evaluate with a dataset whose label column has a numeric type other than Double.
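A hypothetical plain-Scala sketch of the same testing idea (not the actual Spark test): the same labels encoded as several numeric types should produce the same summary metric once they are converted to Double. Accuracy is used only as an illustrative metric, and `NumericLabelCheck` with its members are assumed names.

```scala
// Hypothetical stand-in for the test idea: identical labels stored as different
// numeric types must give identical summary metrics after conversion to Double.
object NumericLabelCheck {
  // Convert any numeric label to Double, standing in for the column cast
  private def toDouble(value: Any): Double = value match {
    case n: java.lang.Number => n.doubleValue()
  }

  // Illustrative summary metric: fraction of labels equal to the prediction
  def accuracy(labels: Seq[Any], predictions: Seq[Double]): Double = {
    val correct = labels.map(toDouble).zip(predictions).count { case (l, p) => l == p }
    correct.toDouble / predictions.size
  }

  val predictions: Seq[Double] = Seq(1.0, 0.0, 1.0, 1.0)

  // The same four labels encoded as Double, Int, Long, Short, Byte, and Float
  val labelsByType: Map[String, Seq[Any]] = Map(
    "Double" -> Seq(1.0, 0.0, 0.0, 1.0),
    "Int"    -> Seq(1, 0, 0, 1),
    "Long"   -> Seq(1L, 0L, 0L, 1L),
    "Short"  -> Seq[Short](1, 0, 0, 1),
    "Byte"   -> Seq[Byte](1, 0, 0, 1),
    "Float"  -> Seq(1.0f, 0.0f, 0.0f, 1.0f)
  )

  // Accuracy computed separately for each label encoding
  def accuracies: Map[String, Double] =
    labelsByType.map { case (name, labels) => name -> accuracy(labels, predictions) }
}
```

Every encoding should yield the same value (0.75 for these labels), which is the property a test over several numeric label types checks.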