Skip to content

Conversation

@zero323
Copy link
Member

@zero323 zero323 commented May 12, 2017

What changes were proposed in this pull request?

  • Adds R wrapper for o.a.s.sql.functions.broadcast.
  • Renames broadcast to broadcast_.

How was this patch tested?

Unit tests, check check-cran.sh.

@zero323
Copy link
Member Author

zero323 commented May 12, 2017

Points to discuss:

  • Do we really need this. It gives us full API parity but is not strictly necessary. hint(df, "broadcast") should be equivalent.

  • Is this the best implementation? Some alternatives:

    • Use generics for both and signature(x = "SparkDataFrame", "missing") for DataFrame version and signature(x = "jobj", object = "Any") for general version. This would keep internal API intact, but is hard to document without leaking internal details.

    • Use different name for DataFrame version, for example broadcast_table. This is a bit verbose, and slightly harder to port for users.

  • Is dataframe.R the best location? It is generic on SparkDataFrame so functions.R don't feel like a right choice.

@SparkQA
Copy link

SparkQA commented May 12, 2017

Test build #76874 has finished for PR 17965 at commit f190d62.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 12, 2017

Test build #76875 has finished for PR 17965 at commit 397ab1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323 zero323 closed this May 13, 2017
@zero323 zero323 reopened this May 13, 2017
sdf <- callJMethod(object@sdf, "alias", data)
dataFrame(sdf)
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: one empty line instead of two

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#'
#' Return a new SparkDataFrame marked as small enough for use in broadcast joins.
#'
#' Equivalent to hint(x, "broadcast).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"broadcast"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\code{hint(x, "broadcast")}

#' sumRDD <- lapply(rdd, useBroadcast)
#'}
broadcast <- function(sc, object) {
broadcast_ <- function(sc, object) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change this to broadcastRDD like other functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, generally this is how we have handled name conflict with an existing RDD method.
we should be removing the internal only RDD methods at some point


#' @rdname broadcast
#' @export
setGeneric("broadcast", function(x) { standardGeneric("broadcast") })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list is sorted alphabetically within this section

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a rd for broadcast already though https://github.com/zero323/spark/blob/397ab1f7b4b4e2b9e51b697c92e3be197fed4554/R/pkg/R/generics.R#L376
we probably need to remove that one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to affect the docs so I don't think we have to touch this for now:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list is sorted alphabetically within this section

Looks like it used to be at some point, but these days are long gone. I can reorder it right now, but this means rearranging a whole section.

Copy link
Member

@felixcheung felixcheung May 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch it is # and not #' on this line https://github.com/zero323/spark/blob/397ab1f7b4b4e2b9e51b697c92e3be197fed4554/R/pkg/R/generics.R#L376

let's leave the sorting for now. we really need to stick with one method

Copy link
Member

@felixcheung felixcheung May 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's fix up the sorting when 2.2.0 is released - it would help to minimize major changes for now to make it easier to merge fixes, just in case.

@SparkQA
Copy link

SparkQA commented May 13, 2017

Test build #76898 has finished for PR 17965 at commit 1530785.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323 zero323 closed this May 13, 2017
@zero323 zero323 reopened this May 13, 2017
Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add broadcast to NAMESPACE
https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE

(testthat is running inside the SparkR namespace)

@SparkQA
Copy link

SparkQA commented May 14, 2017

Test build #76912 has finished for PR 17965 at commit d6c3435.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Jenkins passed, AppVeyor passed before

@felixcheung
Copy link
Member

merged to master

@asfgit asfgit closed this in 5a799fd May 14, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
## What changes were proposed in this pull request?

- Adds R wrapper for `o.a.s.sql.functions.broadcast`.
- Renames `broadcast` to `broadcast_`.

## How was this patch tested?

Unit tests, check `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes apache#17965 from zero323/SPARK-20726.
lycplus pushed a commit to lycplus/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?

- Adds R wrapper for `o.a.s.sql.functions.broadcast`.
- Renames `broadcast` to `broadcast_`.

## How was this patch tested?

Unit tests, check `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes apache#17965 from zero323/SPARK-20726.
@zero323 zero323 deleted the SPARK-20726 branch February 2, 2020 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants