Conversation
|
...Sorry, I did not notice that there is already a pull request on thrift support (#3418) |
|
Check pass on jdk8, I will fix problem with jdk7 later |
|
@justsmoke can we please also add documentation similar to the other extensions and also update extensions.md? |
| @@ -0,0 +1,117 @@ | |||
| <?xml version="1.0" encoding="UTF-8"?> | |||
| <module>extensions-contrib/parquet-extensions</module> | ||
| <module>extensions-contrib/statsd-emitter</module> | ||
| <module>extensions-contrib/orc-extensions</module> | ||
| <module>extensions-contrib/thrift-extensions</module> |
There was a problem hiding this comment.
while you are in this code, can you sort the extensions lexicographically by name?
|
@fjy documentation is added |
| /** | ||
| * Attempt to determine the protocol used to serialize some data. | ||
| * <p> | ||
| * The guess algorithm is copied from TProtocolUtil.java of thrift java runtime library. |
There was a problem hiding this comment.
Can we attach a license file for this module that provides attribution for where this code is copied from?
There was a problem hiding this comment.
Thanks for the review, I will fix it later
There was a problem hiding this comment.
@fjy This guess algorithm is copied and revised from guessProtocolFactory in https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TProtocolUtil.java
I have no idea on how to merge these two licenses, can you give some suggestions ?
| * | ||
| * @return a deserializer with correct protocol for the current thread. | ||
| */ | ||
| private static ThreadLocal<TDeserializer> guessProtocol(final byte[] data) |
There was a problem hiding this comment.
why does this need to be ThreadLocal?
There was a problem hiding this comment.
Thrift serializer/deserializers are not thread-safe, I thought it would be better to return ThreadLocal variable in help functions.
|
Generally I'm 👍 on the changes. I made some minor comments but given this is an extension, and if there's been some basic testing on things, I think we should get this in for 0.9.3. |
|
I just realized there is overlap work between this and #3418 |
|
@justsmoke Good job. Protocol aware is awesome, and thread local deserializer makes it thread safe. But I don't quite understand how and when a thrift class is added to classpath? |
|
@du00cs In my company, we have a project containing all the thrift generated code, and I just copy this project's jar file into extension folder manually to make it work. It seems that thrift does not have tools similar to Avro schema repo, so I am still trying to find better solutions for this problem. |
|
@du00cs I noticed that there is an issue on flattenSpec, it is quite hard for thrift to do it. As far as I know, thrift does not have official tools on this function. And in the code I uploaded, I use a.b.c to represent field c in struct field b in struct field a, what is your suggestions on this ? |
|
@justsmoke We use thrift exactly as you do in your company. I have implement a simple one in #3418 . I just "serialize" it to simple json and use JSONSpec to do all the left. Specifying a thrift jar in runtime is needed for the thrift jar changes over time. Or you will have to restart middle manager nodes to make it work again. |
|
@du00cs Sorry, I did not fully understand your meaning. IMHO, serializing to simple json depends on the thrift schema, and how do you avoid restarting middle manager when updating the schema ? |
|
@justsmoke I mean a new class only exist in newer thrift jar and you have to do more things do make it work properly if a thrift jar is not specified in runtime. |
|
@du00cs you are right, specify thrift jar in runtime is a better solution |
|
@du00cs @justsmoke which PR should be pulled in for thrift support? Is it possible you guys merge the changes into one PR? |
|
@fjy we decide to merge together, still some works to do. |
|
@fjy this pr can be closed |
thrift support for stream and Hadoop-based batch ingestion