PARQUET-480: Update for Cascading 3.0 #284
|
Thanks @cchepelov
|
Thanks for your feedback, @julienledem. Agreed, making a "cp -R" fork as I did is not very elegant; let's find the best way to resolve this.
$ diff -urN parquet-cascading parquet-cascading3 | diffstat
REVIEWERS.md | 4 ++
pom.xml | 8 ++---
src/main/java/org/apache/parquet/cascading/ParquetTBaseScheme.java | 4 +-
src/main/java/org/apache/parquet/cascading/ParquetTupleScheme.java | 14 +++++-----
src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java | 6 ++--
src/test/java/org/apache/parquet/cascading/TestParquetTBaseScheme.java | 2 -
6 files changed, 21 insertions(+), 17 deletions(-)
That's 6 changed files out of 15. Now for the tricky things:
So, the question is: do you want me to split the unchanged code into a "parquet-cascading-23common.jar" (built against, e.g., Cascading 2.x as a "provided" dependency), keeping the Cascading 2- and 3-specific code in parquet-cascading and parquet-cascading3 respectively? I'd be happy to do so if needed. Provisionally, it would look like this:
Does this look like what you had in mind?
|
@cchepelov I'm looking for a good way to do this without making it too complicated. What does the diff between the Scheme classes look like?
|
Sure, here's one:
--- ./parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetTupleScheme.java 2015-10-21 13:29:53.725167935 +0200
+++ ./parquet-cascading3/src/main/java/org/apache/parquet/cascading/ParquetTupleScheme.java 2015-10-21 14:40:37.830312750 +0200
@@ -101,7 +101,7 @@
@SuppressWarnings("rawtypes")
@Override
- public void sourceConfInit(FlowProcess<JobConf> fp,
+ public void sourceConfInit(FlowProcess<? extends JobConf> fp,
Tap<JobConf, RecordReader, OutputCollector> tap, JobConf jobConf) {
if (filterPredicate != null) {
@@ -114,7 +114,7 @@
}
@Override
- public Fields retrieveSourceFields(FlowProcess<JobConf> flowProcess, Tap tap) {
+ public Fields retrieveSourceFields(FlowProcess<? extends JobConf> flowProcess, Tap tap) {
MessageType schema = readSchema(flowProcess, tap);
SchemaIntersection intersection = new SchemaIntersection(schema, getSourceFields());
@@ -123,7 +123,7 @@
return getSourceFields();
}
- private MessageType readSchema(FlowProcess<JobConf> flowProcess, Tap tap) {
+ private MessageType readSchema(FlowProcess<? extends JobConf> flowProcess, Tap tap) {
try {
Hfs hfs;
@@ -144,7 +144,7 @@
}
}
- private List<Footer> getFooters(FlowProcess<JobConf> flowProcess, Hfs hfs) throws IOException {
+ private List<Footer> getFooters(FlowProcess<? extends JobConf> flowProcess, Hfs hfs) throws IOException {
JobConf jobConf = flowProcess.getConfigCopy();
DeprecatedParquetInputFormat format = new DeprecatedParquetInputFormat();
format.addInputPath(jobConf, hfs.getPath());
@@ -153,7 +153,7 @@
@SuppressWarnings("unchecked")
@Override
- public boolean source(FlowProcess<JobConf> fp, SourceCall<Object[], RecordReader> sc)
+ public boolean source(FlowProcess<? extends JobConf> fp, SourceCall<Object[], RecordReader> sc)
throws IOException {
Container<Tuple> value = (Container<Tuple>) sc.getInput().createValue();
boolean hasNext = sc.getInput().next(null, value);
@@ -169,7 +169,7 @@
@SuppressWarnings("rawtypes")
@Override
- public void sinkConfInit(FlowProcess<JobConf> fp,
+ public void sinkConfInit(FlowProcess<? extends JobConf> fp,
Tap<JobConf, RecordReader, OutputCollector> tap, JobConf jobConf) {
DeprecatedParquetOutputFormat.setAsOutputFormat(jobConf);
jobConf.set(TupleWriteSupport.PARQUET_CASCADING_SCHEMA, parquetSchema);
@@ -182,7 +182,7 @@
}
@Override
- public void sink(FlowProcess<JobConf> fp, SinkCall<Object[], OutputCollector> sink)
+ public void sink(FlowProcess<? extends JobConf> fp, SinkCall<Object[], OutputCollector> sink)
throws IOException {
TupleEntry tuple = sink.getOutgoingEntry();
OutputCollector outputCollector = sink.getOutput();
(Very similar things happen in the rest; it's really mostly mechanical.)
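Why every one of these signatures has to change comes down to Java's generic invariance. The minimal sketch below uses hypothetical stand-ins (JobConf, CustomJobConf, FlowProcess, WildcardDemo), not the real Hadoop or Cascading classes. It assumes, consistent with the diff, that Cascading 3.0 declares these Scheme methods with a FlowProcess<? extends ...> parameter, so an override has to use that same wildcard type.

```java
// Hypothetical stand-ins for the real Hadoop/Cascading types.
class WildcardDemo {
  static class JobConf {}

  // Hypothetical subclass, standing in for any JobConf subtype a caller might use.
  static class CustomJobConf extends JobConf {}

  static class FlowProcess<C> {
    private final C config;

    FlowProcess(C config) { this.config = config; }

    C getConfig() { return config; }
  }

  // 2.x-style parameter: accepts exactly FlowProcess<JobConf> and nothing else.
  static String describeInvariant(FlowProcess<JobConf> fp) {
    return fp.getConfig().getClass().getSimpleName();
  }

  // 3.x-style parameter: accepts FlowProcess<JobConf> or FlowProcess<any subclass>.
  static String describeWildcard(FlowProcess<? extends JobConf> fp) {
    JobConf conf = fp.getConfig(); // reading the config as a JobConf is always safe
    return conf.getClass().getSimpleName();
  }

  public static void main(String[] args) {
    FlowProcess<JobConf> plain = new FlowProcess<JobConf>(new JobConf());
    FlowProcess<CustomJobConf> custom = new FlowProcess<CustomJobConf>(new CustomJobConf());
    System.out.println(describeInvariant(plain)); // prints JobConf
    System.out.println(describeWildcard(plain));  // prints JobConf
    System.out.println(describeWildcard(custom)); // prints CustomJobConf
    // describeInvariant(custom); // does not compile: FlowProcess<CustomJobConf> is not a FlowProcess<JobConf>
  }
}
```

Since an override must match the inherited parameter types, a FlowProcess<JobConf> method would presumably clash with, rather than override, a base method taking FlowProcess<? extends JobConf>, hence the mechanical edits.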
|
I wonder if we could avoid duplicating this part by just using the raw type in the parameters. This would produce compiler warnings, but it should compile and work in both cases.
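The raw-type idea relies on a Java rule: a method whose parameter types equal the erasure of an inherited method's parameter types still counts as an override (JLS §8.4.2, subsignatures). A minimal sketch, where Scheme2 and Scheme3 are hypothetical stand-ins shaped like the 2.x and 3.x signatures from the diff, not the real Cascading API:

```java
// Hypothetical stand-ins; not the real Cascading Scheme classes.
class RawTypeDemo {
  static class JobConf {}

  static class FlowProcess<C> {}

  // Shaped like the Cascading 2.x signature.
  abstract static class Scheme2 {
    abstract String sourceConfInit(FlowProcess<JobConf> fp);
  }

  // Shaped like the Cascading 3.x signature.
  abstract static class Scheme3 {
    abstract String sourceConfInit(FlowProcess<? extends JobConf> fp);
  }

  // The raw parameter type is the erasure of both generic signatures, so the
  // same method text overrides either base method; javac may emit raw-type lint warnings.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static class Impl2 extends Scheme2 {
    @Override
    String sourceConfInit(FlowProcess fp) { return "v2"; }
  }

  @SuppressWarnings({"rawtypes", "unchecked"})
  static class Impl3 extends Scheme3 {
    @Override
    String sourceConfInit(FlowProcess fp) { return "v3"; }
  }
}
```

This only demonstrates the Java compiler's side; it says nothing about how Scala code sees such raw signatures.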
|
Um, would it, including for Scala consumers? It's worth a try; let me get back to you tomorrow.
On 8 Dec 2015 at 20:55, Julien Le Dem notifications@github.com wrote:
|
|
(Got sidetracked; I will come back to this issue once I have an initial "Scalding on Cascading 3" build up and running and ready to go into a "polish issues" phase.)
|
-1 to the raw type suggestion in general. Scala can have a lot of issues with that.
|
Hi @johnynek! Nice to see you around :)
|
This file should not be added, right? parquet-cascading3/.cache (I could not comment on it inline because it is a binary file.)
|
@julienledem Hello, sir, I hope all is well. I don't know a good way to deal with the duplication without some templating and build-system hacking. That said, I don't have a big issue with it, since I hope we can all move on to Cascading 3 soon. Also, there is not a ton of duplication, and the code is not high-velocity, so I'd make the compromise. :/ My idealism wanes?
|
Whoopsie. Just pushed a correction.
|
(branch force-pushed from 4cc15d0 to 87c570f)
|
Hi @julienledem; apparently Maven and I found a way to agree on whether and how to share unchanged code between parquet-cascading and parquet-cascading3. cc @johnynek
|
Looks good to me! Thanks for doing the work to minimize duplication! @julienledem, cool?
|
Thanks @cchepelov, it looks much better. Here is the diff for the remaining duplicated code:
$ diff parquet-cascading/src/main/java/org/apache/parquet/cascading/ parquet-cascading3/src/main/java/org/apache/parquet/cascading/
diff parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetTBaseScheme.java parquet-cascading3/src/main/java/org/apache/parquet/cascading/ParquetTBaseScheme.java
60c60
< public void sourceConfInit(FlowProcess<JobConf> fp,
---
> public void sourceConfInit(FlowProcess<? extends JobConf> fp,
69c69
< public void sinkConfInit(FlowProcess<JobConf> fp,
---
> public void sinkConfInit(FlowProcess<? extends JobConf> fp,
diff parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetTupleScheme.java parquet-cascading3/src/main/java/org/apache/parquet/cascading/ParquetTupleScheme.java
104c104
< public void sourceConfInit(FlowProcess<JobConf> fp,
---
> public void sourceConfInit(FlowProcess<? extends JobConf> fp,
117c117
< public Fields retrieveSourceFields(FlowProcess<JobConf> flowProcess, Tap tap) {
---
> public Fields retrieveSourceFields(FlowProcess<? extends JobConf> flowProcess, Tap tap) {
126c126
< private MessageType readSchema(FlowProcess<JobConf> flowProcess, Tap tap) {
---
> private MessageType readSchema(FlowProcess<? extends JobConf> flowProcess, Tap tap) {
147c147
< private List<Footer> getFooters(FlowProcess<JobConf> flowProcess, Hfs hfs) throws IOException {
---
> private List<Footer> getFooters(FlowProcess<? extends JobConf> flowProcess, Hfs hfs) throws IOException {
156c156
< public boolean source(FlowProcess<JobConf> fp, SourceCall<Object[], RecordReader> sc)
---
> public boolean source(FlowProcess<? extends JobConf> fp, SourceCall<Object[], RecordReader> sc)
172c172
< public void sinkConfInit(FlowProcess<JobConf> fp,
---
> public void sinkConfInit(FlowProcess<? extends JobConf> fp,
185c185
< public void sink(FlowProcess<JobConf> fp, SinkCall<Object[], OutputCollector> sink)
---
> public void sink(FlowProcess<? extends JobConf> fp, SinkCall<Object[], OutputCollector> sink)
diff parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java parquet-cascading3/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java
141c141
< public void sourceConfInit(FlowProcess<JobConf> jobConfFlowProcess, Tap<JobConf, RecordReader, OutputCollector> jobConfRecordReaderOutputCollectorTap, final JobConf jobConf) {
---
> public void sourceConfInit(FlowProcess<? extends JobConf> jobConfFlowProcess, Tap<JobConf, RecordReader, OutputCollector> jobConfRecordReaderOutputCollectorTap, JobConf jobConf) {
156c156
< public boolean source(FlowProcess<JobConf> fp, SourceCall<Object[], RecordReader> sc)
---
> public boolean source(FlowProcess<? extends JobConf> fp, SourceCall<Object[], RecordReader> sc)
171c171
< public void sink(FlowProcess<JobConf> fp, SinkCall<Object[], OutputCollector> sc)
---
> public void sink(FlowProcess<? extends JobConf> fp, SinkCall<Object[], OutputCollector> sc)
diff parquet-cascading/src/main/java/org/apache/parquet/cascading/TupleWriteSupport.java parquet-cascading3/src/main/java/org/apache/parquet/cascading/TupleWriteSupport.java
45,49d44
< public String getName() {
< return "cascading";
< }
<
< @Override
|
I'd rather go with the second route (stick @deprecated("This
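The @deprecated("...") form in this comment resembles Scala's annotation syntax. For Java classes such as those left in parquet-cascading, the usual equivalent is the @Deprecated annotation paired with a Javadoc @deprecated tag. A minimal sketch; the class name and notice wording are hypothetical, and the actual text added in the PR is not shown here:

```java
/**
 * Hypothetical example of a Cascading 2.x-only class kept in parquet-cascading.
 *
 * @deprecated Hypothetical wording: Cascading 2.x support may be dropped in a
 *     future release; prefer the equivalent class in parquet-cascading3.
 */
@Deprecated
class ExampleCascading2Scheme {
  // Behavior is unchanged; only the deprecation markers are added.
  String describe() {
    return "Cascading 2.x scheme";
  }
}
```

The annotation makes javac warn at use sites, and the Javadoc tag tells readers what to use instead.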
|
@cchepelov Yep, that's fine. Thanks again.
|
Thanks for the feedback, @julienledem!
|
@cchepelov Did you have a JIRA for this?
|
@julienledem Aah, forgot that. Done.
|
@cchepelov Could you rebase your branch? Thank you!
This reverts commit b1b7719.
…et-cascading-common23
(branch force-pushed from 0212b87 to e7d1304)
|
@julienledem Done!
|
Thank you, @cchepelov!
The code in parquet-cascading is adapted to the API as of Cascading 2.5.3.
Some incompatible changes were introduced in Cascading 3.0. This patch forks the parquet-cascading module to also provide a parquet-cascading3 module, which is nearly identical except for overridden methods whose parameters changed from Foo<JobConf> to Foo<? extends JobConf>.
Author: Cyrille Chépélov (TP12) <cch@transparencyrights.com>
Closes apache#284 from cchepelov/try_cascading3 and squashes the following commits:
e7d1304 [Cyrille Chépélov (TP12)] Adding a @deprecated notice on parquet-cascading's remaining classes
05a417d [Cyrille Chépélov (TP12)] cascading2/3: share back TupleWriteSupport.java (accidentally unmerged)
7fff2d4 [Cyrille Chépélov (TP12)] cascading/cascading3: remove duplicates, push common files into parquet-cascading-common23
338a416 [Cyrille Chépélov (TP12)] Removing unwanted file (what?!) + .gitignoring this kind of files
d9f0455 [Cyrille Chépélov (TP12)] TupleEntry#get is now TupleEntry#getObject
a7f490a [Cyrille Chépélov (TP12)] Revert "Missing test conversion to Cascading 3.0"
cc8b870 [Cyrille Chépélov (TP12)] Missing test conversion to Cascading 3.0
2d73512 [Cyrille Chépélov (TP12)] conflicting values can come in one order or the other. Accept both.
33355d5 [Cyrille Chépélov (TP12)] Fix version mismatch (duh!)
7128639 [Cyrille Chépélov (TP12)] non-C locale can break tests implementation (decimal formats)
53aa2f9 [Cyrille Chépélov (TP12)] Adding a parquet-cascading3 module (forking the parquet-cascading module and accounting for API changes)
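Commit 7128639 notes that a non-C locale can break the tests through decimal formatting. The actual fix is not shown in this thread. As a minimal sketch of the general pitfall, String.format uses the locale it is given (or the JVM default), so pinning Locale.ROOT keeps expected strings stable across machines. LocaleFormatDemo and oneDecimal are hypothetical names.

```java
import java.util.Locale;

class LocaleFormatDemo {
  // Formats a value to one decimal place using the given locale's decimal separator.
  static String oneDecimal(Locale locale, double value) {
    return String.format(locale, "%.1f", value);
  }

  public static void main(String[] args) {
    // French uses ',' as its decimal separator.
    System.out.println(oneDecimal(Locale.FRANCE, 1.5)); // prints 1,5
    // Locale.ROOT always uses '.', whatever the machine's default locale is.
    System.out.println(oneDecimal(Locale.ROOT, 1.5));   // prints 1.5
  }
}
```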