
Conversation

@krishnakalyan3 (Member) commented Jul 13, 2016

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-16055
When the sparkPackages argument is passed and we detect that we are in R script mode, we should print a warning such as: the --packages flag should be used with spark-submit

How was this patch tested?

Tested locally on my system.

@krishnakalyan3 (Member Author)

cc @shivaram

R/pkg/R/sparkR.R Outdated
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (existingPort != "") {
if(sparkPackages != ""){
warning("--packages flag should be used with with spark-submit")
Contributor

two space indent here

Member

And "spark-submit or sparkR shell"

Contributor

@felixcheung shouldn't the sparkPackages argument work from the SparkR shell? Not sure we should add that to the warning message.

Member Author

@shivaram @felixcheung how about something like: "sparkPackages cannot be used as an argument within sparkR.init, please use the --packages flag with spark-submit or the sparkR shell"

@felixcheung (Member) commented Jul 15, 2016

@shivaram maybe it should, but sparkR.session() is already called in the sparkR shell, and calling sparkR.session again with sparkPackages does nothing:

> sparkR.session(sparkPackages = "com.databricks:spark-avro_2.10:2.0.1")
Java ref type org.apache.spark.sql.SparkSession id 1
> read.df("", source = "avro")
16/07/14 23:55:43 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Please use Spark package http://spark-packages.org/package/databricks/spark-avro;

That's because sparkPackages is only applied when the SparkContext is first created.
https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L164

@krishnakalyan3 something like "sparkPackages has no effect when using spark-submit or sparkR shell, please use the --packages commandline instead"
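The behavior being discussed can be sketched in R as follows. This is an assumed shape, not the exact patch; `warnIfPackagesIgnored` and `splitPackages` are hypothetical stand-ins (the latter for SparkR's internal comma-splitting helper):

```r
# Sketch of where the warning would go: if EXISTING_SPARKR_BACKEND_PORT is
# set, the backend was started by spark-submit or the sparkR shell, so a
# sparkPackages argument passed now is too late to take effect.
warnIfPackagesIgnored <- function(sparkPackages) {
  existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
  if (existingPort != "" && length(splitPackages(sparkPackages)) != 0) {
    warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell,",
                  "please use the --packages commandline instead"))
  }
}

# Stand-in for SparkR's internal helper that splits "pkg1,pkg2" into a vector:
splitPackages <- function(packages) strsplit(packages, ",")[[1]]
```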

@shivaram (Contributor)

Jenkins, ok to test

@SparkQA commented Jul 13, 2016

Test build #62254 has finished for PR 14179 at commit 3d5d49b.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 14, 2016

Test build #62338 has finished for PR 14179 at commit 259c4af.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@krishnakalyan3 (Member Author) commented Jul 15, 2016

@shivaram @felixcheung My patch fails the SparkR unit tests (./R/run-tests.sh).
Logs https://gist.github.com/krishnakalyan3/6585a1007b731e82fede1b942ea00bec
I am not sure how to go about resolving this.

@SparkQA commented Jul 15, 2016

Test build #62385 has finished for PR 14179 at commit 76fa14b.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member) commented Jul 16, 2016 via email

@krishnakalyan3 (Member Author)

@felixcheung my local unit tests still fail; anyway, thanks for the clarification.

@SparkQA commented Jul 16, 2016

Test build #62409 has finished for PR 14179 at commit c0dca31.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 16, 2016

Test build #62410 has finished for PR 14179 at commit 204fa25.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@krishnakalyan3 (Member Author)

@felixcheung @shivaram I am not sure the warning message is clear enough; I did the best I could within the 100-character line limit. I also cannot tell which SparkR unit tests fail from the logs below:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62410/console

@shivaram (Contributor)

@krishnakalyan3 We don't need to modify the message. You can keep the original message and just split it across two lines with something like the paste function used in

paste("The following column name: name_y occurs more than once in the 'DataFrame'.",

Regarding the test error, it looks like some unit test is running into this warning; the warning isn't expected there, so it gets promoted to an error. We should first track down which test is causing this.

@SparkQA commented Jul 16, 2016

Test build #62414 has finished for PR 14179 at commit b425540.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member)

A couple of test files reuse an existing SparkSession/SparkContext by calling sparkR.sparkContext, and I think they are now somehow hitting the warning statement added in this PR. I'm not sure which ones call sparkR.sparkContext with sparkPackages set; if you find them, you can add suppressWarnings() around the call.

SerDe functionality : Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap,  : 
  (converted from warning) sparkPackages has no effect when using spark-submit or sparkR shell,please use the --packages commandline instead
Calls: test_package ... eval -> eval -> sparkR.session -> sparkR.sparkContext
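The reason suppressWarnings() helps here: the test harness promotes warnings to errors (the "converted from warning" in the log above), and suppressWarnings() keeps the condition from surfacing at all. A self-contained sketch, with `f` as a hypothetical stand-in for the sparkR.session call in the affected tests:

```r
# f simulates a call that emits the new warning added in this PR.
f <- function() {
  warning("sparkPackages has no effect when using spark-submit or sparkR shell")
  "session"
}

# Wrapping the call in suppressWarnings() lets it complete normally,
# even when the surrounding test framework treats warnings as errors.
res <- suppressWarnings(f())
```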

R/pkg/R/sparkR.R Outdated

existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (existingPort != "") {
if (length(sparkPackages) != 0) {
Contributor

I looked into this, and we get the unit test error because the check here isn't correct: sparkPackages comes in as "", and in R the length of an empty string is not 0 but 1.

> length("")
[1] 1

I think we should instead check that the length of packages is zero? (packages is the character vector created by splitting the input in processSparkPackages)
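A quick illustration of the distinction being pointed out here, using `strsplit` as a stand-in for the comma-splitting done in processSparkPackages:

```r
# An empty string is still a length-1 character vector in R,
# so length(sparkPackages) != 0 is TRUE even with no packages given:
stopifnot(length("") == 1)

# Splitting the empty string first yields a zero-length vector,
# so a length check on the split result behaves as intended:
packages <- strsplit("", ",")[[1]]
stopifnot(length(packages) == 0)
```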

Member

checking

if (length(packages) != 0)

sounds like a much better idea!

Member Author

@shivaram yes you are right, thanks. @felixcheung will make the change.

@SparkQA commented Jul 17, 2016

Test build #62424 has finished for PR 14179 at commit aefdb45.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 17, 2016

Test build #62425 has finished for PR 14179 at commit 12bd700.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@krishnakalyan3 (Member Author)

@felixcheung @shivaram Is the current state okay?

@shivaram (Contributor)

LGTM.

@felixcheung (Member)

LGTM.

btw, for future reference, you could do either

# sep defaults to " "
paste("sparkPackages has no effect when using spark-submit or sparkR shell,",
      "please use the --packages commandline instead")

or

paste0("sparkPackages has no effect when using spark-submit or sparkR shell, ",
       "please use the --packages commandline instead")

@asfgit closed this in 8ea3f4e on Jul 18, 2016
asfgit pushed a commit that referenced this pull request Jul 18, 2016
…ark-submit

## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-16055
When the sparkPackages argument is passed and we detect that we are in R script mode, we should print a warning such as: the --packages flag should be used with spark-submit

## How was this patch tested?
In my system locally

Author: krishnakalyan3 <krishnakalyan3@gmail.com>

Closes #14179 from krishnakalyan3/spark-pkg.

(cherry picked from commit 8ea3f4e)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@krishnakalyan3 krishnakalyan3 deleted the spark-pkg branch July 20, 2016 08:13
@krishnakalyan3 (Member Author)

@shivaram @felixcheung thanks for the reviews. Will keep the feedback in mind.
