Skip to content

Remove Apache Pig from the tests#7810

Merged
clintropolis merged 6 commits intoapache:masterfrom
Fokko:remove-pig-from-the-tests
Jun 14, 2019
Merged

Remove Apache Pig from the tests#7810
clintropolis merged 6 commits intoapache:masterfrom
Fokko:remove-pig-from-the-tests

Conversation

@Fokko
Copy link
Copy Markdown
Contributor

@Fokko Fokko commented May 31, 2019

I strongly feel we should remove Pig from the tests since it is blocking #7772

Currently, we write the Avro to a file, load it with Apache Pig, and then read it again with Druid. I think this is the wrong way of testing. Avro is a language agnostic format for storing data. So if we're able to read the data with Druid, this should be sufficient. I've replaced the Pig part with just writing and reading Avro.

Besides that, I think Pig is not that commonly used anymore. Currently, we're using a version that has been released in 2015.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented May 31, 2019

Hmm, ordinarily I would suggest to keep the flag in, and instead remove the Pig test dependency and test it some other way. But Pig looks like it is not under very active development anymore. There hasn't been a release in almost two years, and I only see a handful of commits made to the repo in 2019. So for those reasons, removing the flag seems fine to me. We would need to mention it as an incompatibility in the release notes.

@Fokko Fokko force-pushed the remove-pig-from-the-tests branch from b02876c to 8a31061 Compare May 31, 2019 21:58
@Fokko Fokko force-pushed the remove-pig-from-the-tests branch from dd57ec4 to 0fdd7be Compare June 3, 2019 07:20
@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 3, 2019

@gianm Can you shed some light on the flattening part? I noticed that you've written parts of the original code. I do understand this one: https://github.com/apache/incubator-druid/blob/master/extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/AvroFlattenerMaker.java#L113
Where you align the types that are being returned. For example, removing null's from a list, and changing an Utf8 into a String type.

However, this one looks awkward: https://github.com/apache/incubator-druid/blob/master/extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/AvroFlattenerMaker.java#L120
Here we pass a GenericRecord into the JsonPath extraction function. It compiles since it accepts an Object, but it doesn't feel right. WDYT?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jun 4, 2019

@Fokko Check out the JSONPATH_CONFIGURATION object: it sets up a GenericAvroJsonProvider, which teaches JsonPath how to read GenericRecords. AvroStreamInputRowParserTest has some tests for this.

@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 4, 2019

Thanks @gianm for the pointer. That is actually pretty neat. I've added some additional tests.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm 👍

/**
* imitate avro extension {@link org.apache.druid.data.input.avro.AvroParsers#parseGenericRecord}
*/
@Nonnull
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't super consistent about this in the codebase but we strive to be better, I think this isn't necessary because we all assume @Nonnull is the default. You can enforce this with a package-info.java if you'd like, but it should be fine to leave out as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, what is the path forward here? For me, explicit is better than implicit. WDYT

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy as well to make this more consistent in the codebase. Maybe we can also use SpotBugs to enforce this a bit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now, thanks for pointing this out. I think the setting everything to NonNull by default is the best practice, I'll pick this up in another PR if that is fine by you.

@Fokko Fokko force-pushed the remove-pig-from-the-tests branch from 7ae3fe2 to d1a1ce7 Compare June 10, 2019 14:19
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks 👍

@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 12, 2019

@gianm Anything more required to get this merged? Thanks!

@clintropolis clintropolis merged commit f581118 into apache:master Jun 14, 2019
@clintropolis
Copy link
Copy Markdown
Member

Apologies for the delay getting this merged, got busy with some stuff, thanks!

@Fokko Fokko deleted the remove-pig-from-the-tests branch June 15, 2019 07:31
@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 15, 2019

No problem, thanks!

@clintropolis clintropolis added this to the 0.16.0 milestone Aug 8, 2019
jihoonson pushed a commit to implydata/druid-public that referenced this pull request Aug 23, 2019
* Remove Apache Pig from the tests

* Remove the Pig specific part

* Fix the Checkstyle issues

* Cleanup a bit

* Add an additional test

* Revert the abstract class
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants