In our scenario, a query against one Druid segment matched 400 million of 900 million rows: it took 79 ms to find the matching data, but 19 s to serialize it all to JSON, with high CPU load along the way. Although JSON is more readable, it's unacceptable for large result sets.
Others have also asked for a binary transfer protocol, since HTTP is not well suited to high-speed data transfer.
I will try to define an RPC protocol to replace HTTP, and to replace the JSON output format with Apache Arrow. Since the output data is expressed as ValueVectors, it does not need to be serialized or deserialized. Many other systems, such as Kudu, Spark, and Drill, are collaborating to support Arrow as a standard for exchanging data between systems.
I think an RPC-plus-Arrow combination could be a candidate for a new interchange standard for the broker, historical, and realtime nodes to exchange data.