[SPARK-16055][SPARKR] warning added while using sparkPackages with spark-submit #14179
Conversation
cc @shivaram
R/pkg/R/sparkR.R (Outdated)

```r
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (existingPort != "") {
  if(sparkPackages != ""){
    warning("--packages flag should be used with with spark-submit")
```
Two-space indent here.
And "spark-submit or sparkR shell"
@felixcheung the sparkPackages argument should work from the SparkR shell, shouldn't it? I'm not sure we should add that to the warning message.
@shivaram @felixcheung how about something like: "sparkPackages cannot be used as an argument within sparkR.init; please use the --packages flag with spark-submit or the sparkR shell"?
@shivaram maybe it should, but sparkR.session() is already called in the sparkR shell, and calling sparkR.session again with sparkPackages does nothing:

```r
> sparkR.session(sparkPackages = "com.databricks:spark-avro_2.10:2.0.1")
Java ref type org.apache.spark.sql.SparkSession id 1
> read.df("", source = "avro")
16/07/14 23:55:43 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Please use Spark package http://spark-packages.org/package/databricks/spark-avro;
```
That's because sparkPackages is passed only when the SparkContext is created initially.
https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L164
@krishnakalyan3 something like: "sparkPackages has no effect when using spark-submit or sparkR shell, please use the --packages commandline instead"
Jenkins, ok to test
Test build #62254 has finished for PR 14179 at commit
Test build #62338 has finished for PR 14179 at commit
@shivaram @felixcheung My patch fails the SparkR unit tests (./R/run-tests.sh).
Test build #62385 has finished for PR 14179 at commit
The errors in the link you provided (https://gist.github.com/krishnakalyan3/6585a1007b731e82fede1b942ea00bec) are odd; I have not seen them. What version of testthat do you have installed?

As for the build failure, it looks more straightforward:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62385/console
```
R/pkg/R/group.R'
lines should not be more than 100 characters.
warning("sparkPackages has no effect when using spark-submit or sparkR shell, please use the --packages commandline instead")
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lintr checks failed.
```
@felixcheung my local unit tests still fail; anyway, thanks for the clarification.
Test build #62409 has finished for PR 14179 at commit
Test build #62410 has finished for PR 14179 at commit
@felixcheung @shivaram I am not sure the warning message is clear enough; I did the best I could within the 100-character limit. I am also not sure which SparkR unit tests fail from the logs below.
@krishnakalyan3 We don't need to modify the message. You can keep the original message and just split it across two lines, like spark/R/pkg/inst/tests/testthat/test_sparkSQL.R Line 1599 in 5ec0d69.

Regarding the test error, it looks like some unit test is running into this warning, and since the warning isn't expected it gets promoted to an error. We should first track down which test is causing this.
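One way to do that split (a minimal sketch; the message text follows the suggestion earlier in this thread) is to build the string with paste0 so each source line stays under lintr's 100-character limit without changing the warning's content:

```r
# Build the long warning message across two source lines with paste0,
# keeping each line under the 100-character lintr limit.
msg <- paste0("sparkPackages has no effect when using spark-submit or sparkR shell, ",
              "please use the --packages commandline instead")
warning(msg)
```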
Test build #62414 has finished for PR 14179 at commit
A couple of test files are reusing an existing SparkSession/SparkContext by calling
R/pkg/R/sparkR.R (Outdated)

```r
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (existingPort != "") {
  if (length(sparkPackages) != 0) {
```
I looked into this, and we get the unit test error because the check here isn't correct. sparkPackages arrives as "", and in R the length of an empty string is not 0 but 1:

```r
> length("")
[1] 1
```

I think we should instead check that the length of packages is zero (packages is the list created by splitting the input in processSparkPackages).
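The gotcha is easy to reproduce in any R session; an empty string is a length-1 character vector, not an empty vector:

```r
# length() counts vector elements, not characters:
length("")            # 1: one element, which happens to be empty
length(character(0))  # 0: a zero-length character vector

# nzchar() is the idiomatic test for non-empty strings:
nzchar("")            # FALSE
```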
Checking `if (length(packages) != 0)` sounds like a much better idea!
@shivaram yes, you are right, thanks. @felixcheung I will make the change.
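A sketch of the corrected guard, using a hypothetical stand-in for processSparkPackages (the real helper lives in R/pkg/R/sparkR.R; this version just splits on commas and drops empty entries, as described above):

```r
# Hypothetical stand-in for processSparkPackages: split a comma-separated
# package string and drop empty entries, so the default "" yields a
# zero-length vector.
processSparkPackages <- function(packages) {
  Filter(nzchar, unlist(strsplit(packages, ",")))
}

existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
packages <- processSparkPackages("")  # default: no packages requested

# length(packages) is 0 here, so the shell no longer sees a spurious warning:
if (existingPort != "" && length(packages) != 0) {
  warning(paste0("sparkPackages has no effect when using spark-submit or ",
                 "sparkR shell, please use the --packages commandline instead"))
}
```

With the old `sparkPackages != ""` check, the default "" compared unequal only by accident of test setup; checking the split result's length handles both "" and real package lists uniformly.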
Test build #62424 has finished for PR 14179 at commit
Test build #62425 has finished for PR 14179 at commit
@felixcheung @shivaram Is the current state okay?
LGTM.
LGTM. BTW, for future reference, you could either or
…ark-submit

What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-16055
When the sparkPackages argument is passed and we detect that we are in R script mode, we should print a warning like "--packages flag should be used with spark-submit".

How was this patch tested?
Tested locally on my system.

Author: krishnakalyan3 <krishnakalyan3@gmail.com>
Closes #14179 from krishnakalyan3/spark-pkg.
(cherry picked from commit 8ea3f4e)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@shivaram @felixcheung thanks for the reviews. I will keep the feedback in mind.
What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-16055
When the sparkPackages argument is passed and we detect that we are in R script mode, we should print a warning like "--packages flag should be used with spark-submit".

How was this patch tested?
Tested locally on my system.