[Proposal] Define an RPC protocol for querying data, support Apache Arrow as the data exchange interface #3891

In our scenario, a query against one Druid segment matched 400 million rows out of 900 million. It took 79ms to find the matching data, but 19s to serialize all of it into JSON, with high CPU load as well. JSON is more readable, but this cost is unacceptable for large queries.

I have also noticed that other users have asked for a binary transfer protocol, since HTTP is not well suited to high-speed data transfer.

I will try to define an RPC protocol to replace the HTTP protocol, and to replace the JSON output format with Apache ```Arrow```. Because the output data would be expressed as a ```ValueVector```, it would not need to be serialized or deserialized. Other systems such as Kudu, Spark, and Drill are collaborating to support ```Arrow``` as a standard for exchanging data between systems.
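To illustrate why a columnar binary payload avoids the per-value formatting cost of JSON, here is a minimal Python sketch that uses only the standard library. It is not Arrow and not a proposed Druid wire format: the frame layout, the function names `encode_columns` and `decode_columns`, and the sample column names are all hypothetical and chosen only for illustration. The idea it shows is that each column is copied as one contiguous buffer instead of being rendered value by value as text.

```python
import json
import struct
from array import array

# Hypothetical frame layout (an assumption for illustration, not Arrow's format):
#   4-byte column count, then for each column:
#   2-byte name length, name bytes, 1-byte array typecode,
#   8-byte payload length, raw column payload.
# The payload uses the machine's native byte order, so this sketch only
# round-trips between machines with the same endianness.

def encode_columns(columns):
    """Pack a {name: array} mapping into one binary frame."""
    parts = [struct.pack(">I", len(columns))]
    for name, values in columns.items():
        name_bytes = name.encode("utf-8")
        payload = values.tobytes()  # one bulk copy, no per-value formatting
        parts.append(struct.pack(">H", len(name_bytes)))
        parts.append(name_bytes)
        parts.append(values.typecode.encode("ascii"))
        parts.append(struct.pack(">Q", len(payload)))
        parts.append(payload)
    return b"".join(parts)

def decode_columns(frame):
    """Rebuild the {name: array} mapping from a frame made by encode_columns."""
    view = memoryview(frame)
    (count,) = struct.unpack_from(">I", view, 0)
    offset = 4
    columns = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from(">H", view, offset)
        offset += 2
        name = bytes(view[offset:offset + name_len]).decode("utf-8")
        offset += name_len
        typecode = chr(view[offset])
        offset += 1
        (payload_len,) = struct.unpack_from(">Q", view, offset)
        offset += 8
        values = array(typecode)
        values.frombytes(view[offset:offset + payload_len])  # bulk load
        offset += payload_len
        columns[name] = values
    return columns

# Made-up sample data: a long column and a double column.
columns = {
    "timestamp": array("q", [1483228800000, 1483228860000, 1483228920000]),
    "clicks": array("d", [3.0, 7.0, 1.0]),
}
frame = encode_columns(columns)
decoded = decode_columns(frame)

# The row-oriented JSON equivalent, which formats every value as text.
json_rows = json.dumps(
    [{"timestamp": t, "clicks": c}
     for t, c in zip(columns["timestamp"], columns["clicks"])]
)
```

Arrow goes further than this sketch: it standardizes the in-memory layout, so a receiver can in principle read the buffers in place rather than copying them into a new structure.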

I think the combination of an RPC protocol and ```Arrow``` should be considered as a candidate, or as a new standard, for exchanging data between the ```broker```, ```historical```, and ```realtime``` nodes.
