Skip to content

druid thrift extensions#3633

Closed
justsmoke wants to merge 3 commits intoapache:masterfrom
justsmoke:feature-thrift-extensions
Closed

druid thrift extensions#3633
justsmoke wants to merge 3 commits intoapache:masterfrom
justsmoke:feature-thrift-extensions

Conversation

@justsmoke
Copy link
Copy Markdown

thrift support for stream and Hadoop-based batch ingestion

@justsmoke
Copy link
Copy Markdown
Author

...Sorry, I did not notice that there is already a pull request on thrift support (#3418)

@justsmoke
Copy link
Copy Markdown
Author

Check pass on jdk8, I will fix problem with jdk7 later
Thrift Protocol is detected automatically (binaryProtocol, compactProtocol, jsonProtocol is supported)
Batch Ingestion is based on elephant-bird LzoThriftBlockInputFormat
Hope this helps....

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 1, 2016

@justsmoke can we please also add documentation similar to the other extensions and also update extensions.md?

@@ -0,0 +1,117 @@
<?xml version="1.0" encoding="UTF-8"?>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing header

Comment thread pom.xml
<module>extensions-contrib/parquet-extensions</module>
<module>extensions-contrib/statsd-emitter</module>
<module>extensions-contrib/orc-extensions</module>
<module>extensions-contrib/thrift-extensions</module>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you are in this code, can you sort the extensions lexicographically by name?

@justsmoke
Copy link
Copy Markdown
Author

@fjy documentation is added

@drcrallen drcrallen added this to the 0.9.3 milestone Nov 2, 2016
/**
* Attempt to determine the protocol used to serialize some data.
* <p>
* The guess algorithm is copied from TProtocolUtil.java of thrift java runtime library.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we attach a license file for this module that provides attribution for where this code is copied from?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I will fix it later

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fjy This guess algorithm is copied and revised from guessProtocolFactory in https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TProtocolUtil.java
I have no idea on how to merge these two licenses, can you give some suggestions ?

*
* @return a deserializer with correct protocol for the current thread.
*/
private static ThreadLocal<TDeserializer> guessProtocol(final byte[] data)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to be ThreadLocal?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thrift serializer/deserializers are not thread-safe, I thought it would be better to return ThreadLocal variable in help functions.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 4, 2016

Generally I'm 👍 on the changes. I made some minor comments but given this is an extension, and if there's been some basic testing on things, I think we should get this in for 0.9.3.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 8, 2016

I just realized there is overlap work between this and #3418

@du00cs
Copy link
Copy Markdown
Contributor

du00cs commented Nov 9, 2016

@justsmoke Good job. Protocol aware is awesome, and thread local deserializer makes it thread safe. But I don't quite understand how and when a thrift class is added to classpath?

@justsmoke
Copy link
Copy Markdown
Author

@du00cs In my company, we have a project containing all the thrift generated code, and I just copy this project's jar file into extension folder manually to make it work. It seems that thrift does not have tools similar to Avro schema repo, so I am still trying to find better solutions for this problem.

@justsmoke
Copy link
Copy Markdown
Author

@du00cs I noticed that there is an issue on flattenSpec, it is quite hard for thrift to do it. As far as I know, thrift does not have official tools on this function. And in the code I uploaded, I use a.b.c to represent field c in struct field b in struct field a, what is your suggestions on this ?

@du00cs
Copy link
Copy Markdown
Contributor

du00cs commented Nov 9, 2016

@justsmoke We use thrift exactly as you do in your company. I have implement a simple one in #3418 . I just "serialize" it to simple json and use JSONSpec to do all the left. Specifying a thrift jar in runtime is needed for the thrift jar changes over time. Or you will have to restart middle manager nodes to make it work again.

@justsmoke
Copy link
Copy Markdown
Author

@du00cs Sorry, I did not fully understand your meaning. IMHO, serializing to simple json depends on the thrift schema, and how do you avoid restarting middle manager when updating the schema ?

@du00cs
Copy link
Copy Markdown
Contributor

du00cs commented Nov 9, 2016

@justsmoke I mean a new class only exist in newer thrift jar and you have to do more things do make it work properly if a thrift jar is not specified in runtime.

@justsmoke
Copy link
Copy Markdown
Author

@du00cs you are right, specify thrift jar in runtime is a better solution

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 10, 2016

@du00cs @justsmoke which PR should be pulled in for thrift support? Is it possible you guys merge the changes into one PR?

@du00cs
Copy link
Copy Markdown
Contributor

du00cs commented Nov 11, 2016

@fjy we decide to merge together, still some works to do.

@justsmoke
Copy link
Copy Markdown
Author

@fjy this pr can be closed

@fjy fjy closed this Nov 16, 2016
@gianm gianm removed this from the 0.10.0 milestone Feb 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants