[Proposal] Define an RPC protocol for querying data, with Apache Arrow as the data exchange interface #3891

@weijietong

In our scenario, a query against one Druid segment matched 400 million of 900 million rows. Retrieving the matched data took 79 ms, but serializing all of it into JSON took 19 s, with a high CPU load as well. JSON is more readable, but that cost is unacceptable for large queries.

I also noticed that others have asked for a binary transfer protocol, since HTTP is not well suited to high-speed data transfer.

I will try to define an RPC protocol to replace HTTP, and to replace the JSON output format with Apache Arrow. Because the output data would be expressed as ValueVectors, it would not need to be serialized or deserialized. Other systems such as Kudu, Spark, and Drill are collaborating to support Arrow as a standard for exchanging data between systems.
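To illustrate the difference in encoding work, here is a minimal sketch using only the Python standard library. It is not Arrow and not Druid code: the function names and the frame layout (an 8-byte row count followed by one contiguous buffer per column, native byte order) are made up for this example. It only shows the general idea that a columnar binary layout copies whole buffers, while JSON has to format every value as text.

```python
import json
import struct
from array import array


def encode_json(rows):
    """Row-oriented text encoding: every value is formatted as text."""
    return json.dumps(rows).encode("utf-8")


def encode_columnar(timestamps, values):
    """Column-oriented binary encoding: each column is one contiguous buffer.

    Hypothetical frame layout (NOT Arrow's IPC format), native byte order:
    8-byte row count, then all int64 timestamps, then all float64 values.
    """
    if len(timestamps) != len(values):
        raise ValueError("columns must have the same length")
    ts = array("q", timestamps)  # signed 64-bit integers
    vs = array("d", values)      # 64-bit floats
    return struct.pack("=Q", len(ts)) + ts.tobytes() + vs.tobytes()


def decode_columnar(frame):
    """Rebuild both columns by copying raw bytes, with no per-value parsing."""
    (n,) = struct.unpack_from("=Q", frame, 0)
    ts, vs = array("q"), array("d")
    ts.frombytes(frame[8:8 + 8 * n])
    vs.frombytes(frame[8 + 8 * n:8 + 16 * n])
    return ts, vs


if __name__ == "__main__":
    timestamps = list(range(1000))
    values = [i * 0.5 for i in timestamps]
    rows = [{"t": t, "v": v} for t, v in zip(timestamps, values)]
    print("json bytes:", len(encode_json(rows)))
    print("columnar bytes:", len(encode_columnar(timestamps, values)))
```

A real implementation would use Arrow's own vectors and wire format rather than this ad hoc frame; the sketch is only meant to make the cost difference concrete.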

I think a combination of RPC and Arrow should be a candidate for a new standard by which the broker, historical, and realtime nodes exchange data.
