Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Mar 7, 2016

JIRA: https://issues.apache.org/jira/browse/SPARK-13717

What changes were proposed in this pull request?

Currently RandomSampler.sample only accepts Scala iterator. We should also let it accept Java iterator for better compatibility.

How was this patch tested?

Some tests are added into RandomSamplerSuite.

@viirya
Copy link
Member Author

viirya commented Mar 7, 2016

cc @mengxr @rxin @srowen

@SparkQA
Copy link

SparkQA commented Mar 7, 2016

Test build #52555 has finished for PR 11559 at commit e54d050.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Mar 7, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 7, 2016

Test build #52565 has finished for PR 11559 at commit e54d050.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Mar 7, 2016

Why was this made a public API?

@holdenk
Copy link
Contributor

holdenk commented Mar 7, 2016

I think this was made a public API so that people could implement custom sampler logic. I'm not sure why we would want to take Java iterators.

@rxin
Copy link
Contributor

rxin commented Mar 7, 2016

Yea - I don't see why either.

@rxin
Copy link
Contributor

rxin commented Mar 7, 2016

@viirya can you close this? Thanks.

@viirya
Copy link
Member Author

viirya commented Mar 7, 2016

@holdenk @rxin I added the Java iterator support due to the wholestage codegen version of Sample in #11517. I was using Scala iterator but as RandomSampler interface can't accept Java iterator it is quite annoying that I need to use Scala mutable collection such as ArrayBuffer in codegen's Java code. By doing this, there seems weird janino error happened. The solution approach is to Java list and iterator and let RandomSampler interface simply accepts Java iterator like this.

@viirya
Copy link
Member Author

viirya commented Mar 7, 2016

I will close this now if you really think this is not necessary to better Java compatibility. However, I will go to implement RandomSampler interface without iterator due to the need in #11517. Because it will cause performance regression in wholestage codegen.

@viirya viirya closed this Mar 7, 2016
@rxin
Copy link
Contributor

rxin commented Mar 7, 2016

It seems like we shouldn't use iterators for whole stage codegen. The point is to get rid of iterators. Shouldn't we just have a filter there?

@rxin
Copy link
Contributor

rxin commented Mar 7, 2016

BTW next time it'd be great if you can provide the important context for this pull request directly in the description. Thanks.

@viirya
Copy link
Member Author

viirya commented Mar 7, 2016

Yea, thanks! I will do it next time. As you said, so I will implement RandomSampler interface without iterator for wholestage codegen of Sampler.

@viirya viirya deleted the random-sampler-java-compatability branch December 27, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants