[SPARK-20726][SPARKR] wrapper for SQL broadcast #17965
Points to discuss:
Test build #76874 has finished for PR 17965 at commit
Test build #76875 has finished for PR 17965 at commit
    sdf <- callJMethod(object@sdf, "alias", data)
    dataFrame(sdf)
  })
nit: one empty line instead of two
Done.
R/pkg/R/DataFrame.R
Outdated
#'
#' Return a new SparkDataFrame marked as small enough for use in broadcast joins.
#'
#' Equivalent to hint(x, "broadcast).
"broadcast"
\code{hint(x, "broadcast")}
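For context, a minimal sketch of how the wrapper documented above might be used in a join. It assumes a local Spark installation and a SparkR build that includes this PR's `broadcast`; the data frames and column names (`large`, `small`, `id`, `label`) are hypothetical and chosen only for illustration.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

# Hypothetical data: one larger table and one small lookup table.
large <- createDataFrame(data.frame(id = 1:1000, value = rnorm(1000)))
small <- createDataFrame(data.frame(id = 1:10, label = letters[1:10]))

# Mark the small side as a broadcast candidate before joining.
# Per the doc comment under review, this is equivalent to hint(small, "broadcast").
joined <- join(large, broadcast(small), large$id == small$id)

# Print the physical plan to check which join strategy the planner chose.
explain(joined)

sparkR.session.stop()
```

The marker is only a hint to the planner, so inspecting the plan with `explain` is a reasonable way to confirm what actually happened.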
R/pkg/R/context.R
Outdated
  #' sumRDD <- lapply(rdd, useBroadcast)
  #'}
- broadcast <- function(sc, object) {
+ broadcast_ <- function(sc, object) {
please change this to broadcastRDD like other functions
right, generally this is how we have handled name conflict with an existing RDD method.
we should be removing the internal only RDD methods at some point
#' @rdname broadcast
#' @export
setGeneric("broadcast", function(x) { standardGeneric("broadcast") })
this list is sorted alphabetically within this section
there is a rd for broadcast already though https://github.com/zero323/spark/blob/397ab1f7b4b4e2b9e51b697c92e3be197fed4554/R/pkg/R/generics.R#L376
we probably need to remove that one
> this list is sorted alphabetically within this section

Looks like it used to be at some point, but those days are long gone. I can reorder it right now, but this means rearranging a whole section.
ouch it is # and not #' on this line https://github.com/zero323/spark/blob/397ab1f7b4b4e2b9e51b697c92e3be197fed4554/R/pkg/R/generics.R#L376
let's leave the sorting for now. we really need to stick with one method
let's fix up the sorting when 2.2.0 is released - it would help to minimize major changes for now to make it easier to merge fixes, just in case.
Test build #76898 has finished for PR 17965 at commit
felixcheung
left a comment
I think we need to add broadcast to NAMESPACE
https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE
(testthat is running inside the SparkR namespace)
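A sketch of what the requested export could look like. This assumes `broadcast` is exported as an S4 method alongside the other SparkDataFrame methods; the surrounding entries of the real file are elided and the exact placement is not shown in this thread.

```r
# Fragment of R/pkg/NAMESPACE (sketch only; existing entries elided).
# Assumption: broadcast is exported as an S4 method, like other
# SparkDataFrame methods, rather than through a plain export() directive.
exportMethods("broadcast")
```

Because testthat runs inside the SparkR namespace, a missing export would not fail the unit tests, which is presumably why it needs to be checked by hand.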
Test build #76912 has finished for PR 17965 at commit
felixcheung
left a comment
LGTM, Jenkins passed, AppVeyor passed before
merged to master
## What changes were proposed in this pull request?

- Adds R wrapper for `o.a.s.sql.functions.broadcast`.
- Renames `broadcast` to `broadcast_`.

## How was this patch tested?

Unit tests, check `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes apache#17965 from zero323/SPARK-20726.

What changes were proposed in this pull request?
- Adds R wrapper for `o.a.s.sql.functions.broadcast`.
- Renames `broadcast` to `broadcast_`.
How was this patch tested?
Unit tests, check `check-cran.sh`.