Add integration test for protobuf #11126

Merged

clintropolis merged 32 commits into apache:master from bananaaggle:add_integration_test_for_protobuf on May 17, 2021
Conversation

@bananaaggle
Contributor

Currently there is no streaming integration test for Protobuf data. This PR adds one to the integration tests.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@bananaaggle
Contributor Author

@clintropolis Hi, happy weekend! I've added an integration test for Protobuf, but I've run into some trouble. The test with the schema registry works, but it pulls kafka-protobuf-provider into the pom, which is licensed under the Confluent Community License. Is that OK since it is only used in tests? If not, how can I add it to the integration tests? Another problem: the file-based Protobuf decoder needs a descriptor file for the test, and I couldn't find how to supply one in code, so that test doesn't actually work. Can you give me some advice on solving these? Thanks.

@clintropolis
Member

Thanks so much for taking this on, this fills one of the last test coverage gaps in the core ingestion extensions!

The test with the schema registry works, but it pulls kafka-protobuf-provider into the pom, which is licensed under the Confluent Community License. Is that OK since it is only used in tests? If not, how can I add it to the integration tests?

I think we might be able to add the jar(s) the same way we do for the MySQL connector, which is GPL licensed, by downloading them directly in the Dockerfile; see https://github.com/apache/druid/blob/master/integration-tests/docker/Dockerfile#L45.
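As a sketch of that approach (mirroring the MySQL connector lines; the exact version and URL layout here are assumptions, though the merged diff below takes essentially this shape):

```dockerfile
# Sketch: fetch the Confluent-licensed jar at image build time instead of
# declaring it as a bundled Maven dependency.
ARG CONFLUENT_VERSION=5.5.1
RUN wget -q "https://packages.confluent.io/maven/io/confluent/kafka-protobuf-provider/${CONFLUENT_VERSION}/kafka-protobuf-provider-${CONFLUENT_VERSION}.jar" \
    -O /usr/local/druid/lib/kafka-protobuf-provider.jar
```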

Another problem: the file-based Protobuf decoder needs a descriptor file for the test, and I couldn't find how to supply one in code, so that test doesn't actually work. Can you give me some advice on solving these?

Hmm, I think a .proto file with the wikipedia schema could be converted into the .desc file needed for ingestion following the instructions here: https://druid.apache.org/docs/latest/development/extensions-core/protobuf.html#metricsdesc. This descriptor file would then need to be added to git, maybe in https://github.com/apache/druid/tree/master/integration-tests/docker or https://github.com/apache/druid/tree/master/integration-tests/docker/test-data. Next, it would need to be copied to the Docker container as part of setup; the sample wikipedia data for batch ingestion and some lookup configuration are copied in a similar manner, see https://github.com/apache/druid/blob/master/integration-tests/script/copy_resources.sh#L81. After that, wherever the file is copied to (the path on the container) should be available to specify in the ingestion spec.
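The steps above can be sketched as a setup-script fragment (file names and paths here are illustrative, not taken from the PR):

```shell
# 1. Compile the .proto schema into a descriptor file, per the Druid docs:
protoc -o wikipedia.desc wikipedia.proto

# 2. Commit wikipedia.desc (e.g. under integration-tests/docker/test-data/),
#    then copy it into the shared volume during container setup, similar to
#    what integration-tests/script/copy_resources.sh does for batch test data:
cp /test-data/wikipedia.desc /shared/wikiticker-it/wikipedia.desc

# 3. Reference the container-side path in the ingestion spec, e.g.:
#    "descriptor": "file:///shared/wikiticker-it/wikipedia.desc"
```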

@bananaaggle
Contributor Author

@clintropolis Hi. I can run each of these tests successfully on its own, but when I run two tests together, such as 'protobuf/input_format' and 'protobuf/parser', they fail. I checked the Druid logs and found no errors. Can you help me look into it? I use an absolute path in the test file, like 'file:///shared/wikiticker-it/wikipedia.desc'. Should I replace it with an alias?

@bananaaggle
Contributor Author

bananaaggle commented Apr 20, 2021

@clintropolis Is there anything else I can work on after I finish this PR? I'm glad to contribute more code to the community; any area is fine, such as core, ingestion, or tests. Where can I get more information about this?

Member

@clintropolis left a comment


Hi. I can run each of these tests successfully on its own, but when I run two tests together, such as 'protobuf/input_format' and 'protobuf/parser', they fail. I checked the Druid logs and found no errors. Can you help me look into it? I use an absolute path in the test file, like 'file:///shared/wikiticker-it/wikipedia.desc'. Should I replace it with an alias?

Sorry I haven't had a chance to get back to this until now. Hmm, this looks right to me, but I am not terribly familiar with this extension. I'll try to pull your branch today and see if I can help determine why this test is failing. If it proves too troublesome, I'm also ok if you want to split out adding the file based test into a separate PR, since the schema registry test is useful on its own.

@clintropolis Is there anything else I can work on after I finish this PR? I'm glad to contribute more code to the community; any area is fine, such as core, ingestion, or tests. Where can I get more information about this?

One thing I can think of related to some of the stuff you've done so far, would be switching the 'config' and 'headers' map properties of the schema-registry based parsers and input formats (avro, protobuf) to use the DynamicConfigProvider added in #10309 (see also #9351 and #10658). It doesn't really change much by itself, but will allow for building extensions which can provide this potentially sensitive information directly via a secure channel, such as a cloud encryption service, instead of needing to embed it in the JSON spec as plain text. I think it is the preferred alternative to using PasswordProvider these days, which provides similar facilities for single properties.
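For illustration only (the provider type and property names below are assumptions for the sketch, not something this PR or comment specifies), the idea is that a schema-registry decoder spec could reference a DynamicConfigProvider instead of embedding credentials as plain text:

```json
{
  "type": "schema_registry",
  "url": "http://schema-registry:8081",
  "config": {
    "type": "environment",
    "variables": {
      "basic.auth.user.info": "SCHEMA_REGISTRY_AUTH"
    }
  }
}
```

Here a hypothetical environment-variable-backed provider resolves the sensitive value at runtime, so the JSON spec itself never contains the secret.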

You can also check out https://github.com/apache/druid/issues?q=is%3Aopen+is%3Aissue+label%3A%22Contributions+Welcome%22 and https://github.com/apache/druid/issues?q=is%3Aopen+is%3Aissue+label%3AStarter, though I'm not sure how current those lists are, and some of the older issues might no longer be relevant or may already have been refactored. Let me get back to you on this as well; there is always something to do, so I'll see if I can find anything else. It's great that you want to help out!

Comment thread: integration-tests/pom.xml (outdated)
<DRUID_INTEGRATION_TEST_SKIP_RUN_DOCKER>${docker.run.skip}</DRUID_INTEGRATION_TEST_SKIP_RUN_DOCKER>
<DRUID_INTEGRATION_TEST_INDEXER>${it.indexer}</DRUID_INTEGRATION_TEST_INDEXER>
<MYSQL_VERSION>${mysql.version}</MYSQL_VERSION>
<MYSQL_VERSION>${mysql.version}</MYSQL_VERSION>
Member


nit: duplicate line

Comment thread: integration-tests/pom.xml
<DRUID_INTEGRATION_TEST_INDEXER>${it.indexer}</DRUID_INTEGRATION_TEST_INDEXER>
<MYSQL_VERSION>${mysql.version}</MYSQL_VERSION>
<MYSQL_VERSION>${mysql.version}</MYSQL_VERSION>
<CONFLUENT_VERSION>5.5.1</CONFLUENT_VERSION>
Member


it might make sense to pull this up all the way to the top-level pom.xml, so that this, the avro extension, and the protobuf extension all share it and keep the version synced, instead of each defining their own. I do notice that protobuf is using a newer version than avro and this one, so I would be OK saving this for a follow-up PR since it wouldn't be directly related to adding this integration test.

Contributor Author


Oh, 5.5.1 supports Protobuf, and the Avro extension uses 5.5.1 too. The protobuf extension uses 6.0.1. I think both versions are OK because there is no change to the core functionality. So which version is better?

Member


probably best to align the Confluent version with the Kafka version we use since Confluent dependencies often require Kafka dependencies. We still need to exclude those Kafka dependencies, but keeping them close should avoid compatibility issues.

Member


we already define a Confluent version here, https://github.com/apache/druid/blob/master/extensions-core/protobuf-extensions/pom.xml#L37, which is newer, so we should at least align with that
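The suggested consolidation could look roughly like this (the property name `confluent.version` is illustrative; Druid's poms may use a different one):

```xml
<!-- Top-level pom.xml: define the version once -->
<properties>
  <confluent.version>6.0.1</confluent.version>
</properties>

<!-- integration-tests, avro, and protobuf poms then reference it -->
<dependency>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-protobuf-provider</artifactId>
  <version>${confluent.version}</version>
  <scope>provided</scope>
</dependency>
```

Bumping the shared property then updates every module at once, which is the version-sync benefit the comment describes.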

@bananaaggle
Contributor Author

Hi. I can run each of these tests successfully on its own, but when I run two tests together, such as 'protobuf/input_format' and 'protobuf/parser', they fail. I checked the Druid logs and found no errors. Can you help me look into it? I use an absolute path in the test file, like 'file:///shared/wikiticker-it/wikipedia.desc'. Should I replace it with an alias?

Sorry I haven't had a chance to get back to this until now. Hmm, this looks right to me, but I am not terribly familiar with this extension. I'll try to pull your branch today and see if I can help determine why this test is failing. If it proves too troublesome, I'm also ok if you want to split out adding the file based test into a separate PR, since the schema registry test is useful on its own.

Sorry, my earlier description was not very clear. When I test 'protobuf/input_format' and 'protobuf/parser' with the schema registry, the problem exists too. I don't know what is happening, because each test works when run alone. I don't think this bug will be easy to find. It would be easy to split the file-based tests into a separate PR, but I want to figure out what is causing this problem first.

@bananaaggle
Contributor Author

Hi @clintropolis, I've solved the problems. They were caused by a static DynamicMessage.Builder; I fixed it and all tests pass now.

@clintropolis
Member

Hi @clintropolis, I've solved the problems. They were caused by a static DynamicMessage.Builder; I fixed it and all tests pass now.

Glad you figured it out! The PR looks good to me, but the spotbugs check is failing:

[ERROR] High: org.apache.druid.testing.utils.ProtobufEventSerializer.SCHEMA isn't final but should be [org.apache.druid.testing.utils.ProtobufEventSerializer] At ProtobufEventSerializer.java:[line 61] MS_SHOULD_BE_FINAL

I did manually trigger the kafka format integration tests and they passed, so after the spotbugs thing is fixed up I think it should be good to merge 👍
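Both issues in this exchange come down to shared static mutable state. As a standalone sketch (this is not Druid's actual ProtobufEventSerializer; a plain Map stands in for DynamicMessage.Builder), a static builder reused across events leaks fields between them, while spotbugs' MS_SHOULd_BE_FINAL complaint is resolved by declaring the never-reassigned static field final:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch, not Druid code: a plain Map stands in for
// DynamicMessage.Builder to show why sharing one statically is unsafe.
public class BuilderSketch {
    // The spotbugs complaint: a static field that is never reassigned
    // should be declared final. (Schema id is hypothetical.)
    public static final String SCHEMA = "wikipedia";

    // BAD: one mutable builder shared by every caller.
    private static final Map<String, Object> SHARED_BUILDER = new HashMap<>();

    public static Map<String, Object> serializeShared(String key, Object value) {
        SHARED_BUILDER.put(key, value); // fields from earlier events linger here
        return new HashMap<>(SHARED_BUILDER);
    }

    // GOOD: build a fresh message per event; only immutable state is shared.
    public static Map<String, Object> serializeFresh(String key, Object value) {
        Map<String, Object> builder = new HashMap<>();
        builder.put(key, value);
        return builder;
    }

    public static void main(String[] args) {
        serializeShared("page", "A");
        // The second event carries the first event's field: cross-test pollution.
        System.out.println("shared: " + serializeShared("user", "B").size()); // 2
        serializeFresh("page", "A");
        System.out.println("fresh: " + serializeFresh("user", "B").size());   // 1
    }
}
```

This is the shape of the fix described above: keep only immutable state (the descriptor) static and final, and construct the builder per event.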

@bananaaggle
Contributor Author

Hi @clintropolis, I've solved the problems. They were caused by a static DynamicMessage.Builder; I fixed it and all tests pass now.

Glad you figured it out! The PR looks good to me, but the spotbugs check is failing:

[ERROR] High: org.apache.druid.testing.utils.ProtobufEventSerializer.SCHEMA isn't final but should be [org.apache.druid.testing.utils.ProtobufEventSerializer] At ProtobufEventSerializer.java:[line 61] MS_SHOULD_BE_FINAL

I did manually trigger the kafka format integration tests and they passed, so after the spotbugs thing is fixed up I think it should be good to merge 👍

Fixed.

Member

@clintropolis left a comment


👍 lgtm, thanks!

@clintropolis clintropolis merged commit 3be8e29 into apache:master May 17, 2021
Comment on lines +50 to +52
RUN wget -q "https://packages.confluent.io/maven/io/confluent/kafka-protobuf-provider/$CONFLUENT_VERSION/kafka-protobuf-provider-$CONFLUENT_VERSION.jar" \
-O /usr/local/druid/lib/kafka-protobuf-provider.jar

Member


we should be able to avoid this by enabling the protobuf extension, which already includes this dependency

Member


It doesn't actually include the dependency: there are licensing issues, since that jar isn't solely Apache licensed but dual-licensed under Apache and the Confluent Community License, so people need to fetch it themselves, similar to MySQL; see #10839 (comment)

Member


Good point, I did not realize that. Maybe a comment would be worthwhile.

Comment thread: integration-tests/pom.xml
Comment on lines +375 to +385
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-protobuf-provider</artifactId>
<version>5.5.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>3.11.0</version>
</dependency>
Member


we can remove all this by enabling the protobuf extension, no?

@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021