Motivation
When I used the Druid histogram extension to compute quantiles, I found that performance was much worse than computing exact quantiles in ClickHouse.
Monitoring the aggregate function of FixedBucketsHistogramBufferAggregator with Arthas's monitor tool showed that nearly 40% of the time is spent on serialization and deserialization. To avoid this overhead, I tried keeping the intermediate results in JVM on-heap memory instead of off-heap memory, so that they no longer have to be serialized and deserialized on every update. Query time dropped from more than 10 s to more than 3 s.
[Figure: offheap]

[Figure: onheap]

Proposed changes
Add a cache to the FixedBucketsHistogramBufferAggregator class that holds the intermediate results on heap.
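A minimal sketch of the caching idea, under stated assumptions: `SimpleHistogram` and `CachingHistogramAggregator` are hypothetical, simplified stand-ins written for illustration and are not Druid's classes or its `BufferAggregator` interface. The cache is keyed by buffer position, updates touch only the on-heap object, and the histogram is written to the buffer only when `flush` is called.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified fixed-buckets histogram (not Druid's implementation).
final class SimpleHistogram {
    final double lower;
    final double upper;
    final long[] counts;

    SimpleHistogram(double lower, double upper, int numBuckets) {
        this.lower = lower;
        this.upper = upper;
        this.counts = new long[numBuckets];
    }

    void add(double value) {
        int idx = (int) ((value - lower) / (upper - lower) * counts.length);
        // For brevity, out-of-range values are clamped into the first or last bucket.
        counts[Math.max(0, Math.min(counts.length - 1, idx))]++;
    }

    // Serialize the bucket counts into the buffer using absolute writes.
    void writeTo(ByteBuffer buf, int position) {
        for (int i = 0; i < counts.length; i++) {
            buf.putLong(position + i * Long.BYTES, counts[i]);
        }
    }
}

// Hypothetical aggregator that keeps one on-heap histogram per buffer position.
final class CachingHistogramAggregator {
    private final double lower;
    private final double upper;
    private final int numBuckets;
    private final Map<Integer, SimpleHistogram> cache = new HashMap<>();

    CachingHistogramAggregator(double lower, double upper, int numBuckets) {
        this.lower = lower;
        this.upper = upper;
        this.numBuckets = numBuckets;
    }

    void init(ByteBuffer buf, int position) {
        cache.put(position, new SimpleHistogram(lower, upper, numBuckets));
    }

    void aggregate(ByteBuffer buf, int position, double value) {
        // Hot path: update the cached object; the buffer is neither read nor written.
        cache.get(position).add(value);
    }

    // Serialize once, when the intermediate result is actually needed.
    void flush(ByteBuffer buf, int position) {
        cache.get(position).writeTo(buf, position);
    }

    void close() {
        cache.clear();
    }
}
```

In this sketch the buffer is stale until `flush` is called, so every reader has to go through the aggregator. A real implementation would presumably also need to keep the cache consistent if the underlying buffer is moved, and to account for the extra heap usage.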
Rationale
Keeping temporary results in JVM on-heap memory avoids unnecessary serialization and deserialization overhead.