Add protobuf schema registry #10839
Conversation
Hi @bananaaggle, thanks for the contribution. It seems like a useful addition and I would like to review it, but that is impossible if you keep closing it and opening a new PR. Could you please stick to one PR and make any updates in place so that we can review it?
Sorry, I will stick to one PR.
| compatible_licenses['Apache License, version 2.0'] = 'Apache License version 2.0' | ||
| compatible_licenses['Apache 2.0 License'] = 'Apache License version 2.0' | ||
| compatible_licenses['Apache License, 2.0'] = 'Apache License version 2.0' | ||
| compatible_licenses['Confluent Community License'] = 'Confluent Community License' |
hmm, what requires the confluent community license? I don't see anything in licenses.yaml mentioning it. Last i knew, the clients to confluent things were still apache licensed (i'm not sure we can distribute anything with this license in Druid packages, would have to check into that, but I suspect not since it has field of use restrictions)
I looked at the source files for kafka-protobuf-provider here, https://github.com/confluentinc/schema-registry/tree/master/protobuf-provider/src/main/java/io/confluent/kafka/schemaregistry/protobuf
and there appears to be a mix of Apache and Confluent Community License headers, which I assume means both licenses apply.
Unfortunately, I think this means that we cannot distribute this jar, but I think that instead we can provide instructions in the documentation on how users can download it (see https://www.apache.org/legal/resolved.html#optional), similar to how we do with the MySQL connector.
The pom.xml will need to mark this jar as provided, you will need to remove this license from this Python script, and remove kafka-protobuf-provider from licenses.yaml. I think maybe it is ok if tests still depend on it, since it wouldn't cause packaging to distribute it? But I'm unsure exactly...
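The provided-scope change suggested here might look like the following in the extension's pom.xml (a sketch; the 6.0.1 version matches the dependency version mentioned elsewhere in this PR):

```xml
<!-- Marked provided so the jar is available for compilation and tests
     but is not bundled into the Druid distribution -->
<dependency>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-protobuf-provider</artifactId>
  <version>6.0.1</version>
  <scope>provided</scope>
</dependency>
```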
| @JsonProperty("parseSpec") ParseSpec parseSpec, | ||
| @JsonProperty("descriptor") String descriptorFilePath, | ||
| @JsonProperty("protoMessageType") String protoMessageType | ||
| @JsonProperty("protoBytesDecoder") ProtobufBytesDecoder protobufBytesDecoder |
So that this change is not incompatible with previous druid versions, I think we should probably consider leaving the old constructor properties in place and marking them as deprecated instead of removing them, and then enforce that only one of protoBytesDecoder or descriptor/protoMessageType can be set, and then automatically construct a 'FileBasedProtobufBytesDecoder' if the old properties are set and the new is null.
Then in a future version we can consider dropping them
Good idea, I'll implement it.
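The backward-compatibility suggestion above could be sketched roughly like this (simplified, with hypothetical class names and the Jackson annotations omitted): keep the deprecated `descriptor`/`protoMessageType` properties, reject specs that mix old and new styles, and fall back to a file-based decoder when only the old properties are set.

```java
public class ParserCompatSketch {
  interface ProtobufBytesDecoder {}

  static class FileBasedProtobufBytesDecoder implements ProtobufBytesDecoder {
    final String descriptorFilePath;
    final String protoMessageType;

    FileBasedProtobufBytesDecoder(String descriptorFilePath, String protoMessageType) {
      this.descriptorFilePath = descriptorFilePath;
      this.protoMessageType = protoMessageType;
    }
  }

  final ProtobufBytesDecoder decoder;

  ParserCompatSketch(
      ProtobufBytesDecoder protoBytesDecoder, // new property
      String descriptorFilePath,              // deprecated
      String protoMessageType                 // deprecated
  ) {
    if (protoBytesDecoder != null) {
      if (descriptorFilePath != null || protoMessageType != null) {
        // only one of the two configuration styles may be used
        throw new IllegalArgumentException(
            "Cannot use both protoBytesDecoder and descriptor/protoMessageType");
      }
      this.decoder = protoBytesDecoder;
    } else {
      // old-style spec: build the file-based decoder automatically
      this.decoder = new FileBasedProtobufBytesDecoder(descriptorFilePath, protoMessageType);
    }
  }
}
```

Old specs keep working unchanged, while mixing the two styles fails fast at deserialization time.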
| } | ||
| } | ||
| @SuppressWarnings("checkstyle:RightCurly") |
what is this suppression for?
| DynamicMessage message = DynamicMessage.parseFrom(descriptor, rawMessage); | ||
| return message; | ||
| } | ||
| catch (Exception e) { |
maybe this "try" block should be split up so that the error handling for failing to fetch the schema from the registry, which doesn't exactly seem like a parse exception, can be distinguished from an actual parse exception
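This suggestion could be sketched as follows (simplified, with hypothetical helper names; the real code would call the schema registry client and `DynamicMessage.parseFrom`): split the single try block so a registry-lookup failure is reported separately from a genuine parse failure.

```java
import java.io.IOException;

public class DecodeSketch {
  // stands in for the schema registry lookup
  static String fetchSchema(byte[] payload) throws IOException {
    if (payload == null) {
      throw new IOException("registry unreachable");
    }
    return "schema";
  }

  // stands in for DynamicMessage.parseFrom
  static String parseMessage(String schema, byte[] payload) {
    if (payload.length == 0) {
      throw new IllegalArgumentException("empty message");
    }
    return new String(payload);
  }

  static String decode(byte[] payload) {
    final String schema;
    try {
      schema = fetchSchema(payload);
    } catch (IOException e) {
      // registry failure gets its own message, distinct from a parse error
      throw new RuntimeException("Failed to fetch schema from registry", e);
    }
    try {
      return parseMessage(schema, payload);
    } catch (IllegalArgumentException e) {
      throw new RuntimeException("Failed to parse protobuf message", e);
    }
  }
}
```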
@clintropolis Hi, I changed the code following your suggestion two weeks ago. Could you please help me review this change?
Very sorry for the delay @bananaaggle! The code changes lgtm 👍, but I think we need to fix the license stuff. One minor nitpick: some of the exception handling could maybe be a bit more specific than catching Exception.
Also, could you update the documentation for the protobuf extension, https://github.com/apache/druid/blob/master/docs/development/extensions-core/protobuf.md, to include the new stuff, and instructions to get the jar to use it? You can do something similar to here: https://github.com/apache/druid/blob/master/docs/development/extensions-core/mysql.md#installing-the-mysql-connector-library.
Good idea. I've separated out this jar and written documentation for its usage.
@clintropolis Hi, I've written documentation for this feature. Can you help me refine it? Thanks.
clintropolis left a comment
lgtm, just a few minor suggestions on docs, thanks 👍
| ``` | ||
| ### Descriptor file | ||
| ### When use descriptor file |
| ### When use descriptor file | |
| ### When using a descriptor file |
| ### Proto file | ||
| The corresponding proto file for our 'metrics' dataset looks like this. | ||
| The corresponding proto file for our 'metrics' dataset looks like this. You can use Protobuf parser with a proto file or Confluent schema registry. |
| The corresponding proto file for our 'metrics' dataset looks like this. You can use Protobuf parser with a proto file or Confluent schema registry. | |
| The corresponding proto file for our 'metrics' dataset looks like this. You can use Protobuf parser with a proto file or [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html). |
| } | ||
| ``` | ||
| This feature uses Confluent's Protobuf provider which is not included in the Druid distribution and must be installed separately. You can fetch it and its dependencies from Maven Central at: |
| This feature uses Confluent's Protobuf provider which is not included in the Druid distribution and must be installed separately. You can fetch it and its dependencies from Maven Central at: | |
| This feature uses Confluent's Protobuf provider which is not included in the Druid distribution and must be installed separately. You can fetch it and its dependencies from the Confluent repository and Maven Central at: |
| } | ||
| ``` | ||
| ### When use schema registry |
| ### When use schema registry | |
| ### When using Schema Registry |
| Below is the complete Supervisor spec JSON to be submitted to the Overlord. | ||
| Make sure these keys are properly configured for successful ingestion. | ||
| ### When use descriptor file |
| ### When use descriptor file | |
| ### When using a descriptor file |
| protoc -o /tmp/metrics.desc ./quickstart/protobuf/metrics.proto | ||
| ``` | ||
| ### When use schema registry |
| ### When use schema registry | |
| ### When using Schema Registry |
| ### When use schema registry | ||
| At first make sure your schema registry version is later than 5.5. Next, we post this schema to schema registry. |
| At first make sure your schema registry version is later than 5.5. Next, we post this schema to schema registry. | |
| Make sure your Schema Registry version is later than 5.5. Next, we can post a schema to add it to the registry: |
@clintropolis Hi, thanks for your review. I've refined the documentation for this feature. What else should I do before merging?
I think this is g2g after CI. If you are interested in doing a follow-up PR or two, the protobuf extension is one of the few ingestion formats that has not been ported to the newer InputFormat interface. Another thing: after #10929 goes in it would probably be possible to add an integration test for the functionality added in this PR. A fair warning, our integration tests are a bit rough around the edges to work with, but I think most of the parts would be there (I guess the protobuf jar we can't include would need to be added to the Dockerfile, similar to mysql). I'm not sure we actually have any integration tests using protobuf currently.
Yeah, I will learn about the InputFormat interface and implement it for protobuf. As for the integration test, I'm not very familiar with that component, but I think I can finish it.
| @JsonCreator | ||
| public SchemaRegistryBasedProtobufBytesDecoder( | ||
| @JsonProperty("url") String url, | ||
| @JsonProperty("capacity") Integer capacity |
Oops, I forgot about one thing, could you add support for supplying config and headers to the schema registry client? See https://github.com/apache/druid/pull/10314/files#diff-29d909859e829594d32e7d39a72d233c9947cedf9b35ddf9e4566547799adb0fR51-R52 for details (and docs could be more or less copied too probably https://github.com/apache/druid/pull/10314/files#diff-b5e5f4cf38f24309777ff1750a3ed7d5f55492afe2e8eded4f7f71337b2c4a90R1041)
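The requested change could look something like this (a simplified, hypothetical sketch with assumed field and property names; Jackson annotations and the actual registry client are omitted): accept optional config and headers maps in the decoder's constructor and keep them so they can be forwarded to the schema registry client, e.g. for authentication headers.

```java
import java.util.Collections;
import java.util.Map;

public class RegistryDecoderSketch {
  final String url;
  final int capacity;
  final Map<String, ?> config;        // client config, e.g. basic auth settings
  final Map<String, String> headers;  // extra HTTP headers sent to the registry

  RegistryDecoderSketch(String url, Integer capacity,
                        Map<String, ?> config, Map<String, String> headers) {
    this.url = url;
    // default to an effectively unbounded schema cache when unset
    this.capacity = capacity == null ? Integer.MAX_VALUE : capacity;
    // treat missing maps as empty so older specs keep working
    this.config = config == null ? Collections.emptyMap() : config;
    this.headers = headers == null ? Collections.emptyMap() : headers;
  }
}
```

Defaulting absent maps to empty keeps specs written before this change valid.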
Hi, @clintropolis. I've added support for supplying config and headers, and updated the documentation accordingly. Can you review it? Thanks.
clintropolis left a comment
changes lgtm, thanks 👍
CI is failing asking for druid-protobuf-extensions to have
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
<version>2.0.1</version>
<scope>provided</scope>
</dependency>
added to the pom.xml, not really sure which of the changes caused that to happen...



Confluent's Schema Registry now supports deserializing Protobuf, so I added this feature to the druid-protobuf-extensions module. Previously, the module only worked when the user provided a .desc file; now it can fetch schemas from the Schema Registry dynamically.
This submission adds the io.confluent:kafka-schema-registry-client:6.0.1 and io.confluent:kafka-protobuf-provider:6.0.1 dependencies to the druid-protobuf-extensions module.
Old spec is:
New spec is: