Using ObjectReader in the SqlSegmentsMetadataManager.doPollSegments #17732
Conversation
Thanks for creating the PR, @umisan! Have you been able to compare the performance of this code when polling real segments, before and after the change? In my experience, most of the time in the poll is spent in the I/O itself rather than in Jackson deserialization.
@kfaraz Sorry, I haven't tested this change on our Druid cluster yet. I completely agree that most of the polling time is spent on I/O, and that the improvement from optimizing deserialization may be negligible. Our Druid cluster has about 1 million segments and takes several minutes to load newly added segments. Unfortunately, we don't have a staging Druid cluster, so I haven't been able to test this change in an environment with a large number of segments. I am considering setting up a test Druid cluster to evaluate this change. However, it's possible that the results will show that this PR doesn't provide meaningful improvements.
@umisan, yes, that's what I fear as well. FYI, we have recently merged a segment caching feature in #17653.
|
I understand the current situation. Thank you for reviewing my PR and sharing your insights! |
|
Thanks a lot, @umisan! I am really glad to hear that you have enjoyed using Druid.
|
@umisan I didn't find any information about ObjectReader at the link you gave. If ObjectReader does perform better, I think we could also apply it to the ingestion module, which uses ObjectMapper heavily.
|
@FrankChen021 |
|
While the documentation states that ObjectReader is more performant than ObjectMapper, the gain is trivial. But I think we can still accept and merge this change, as it can be seen as a best practice for using Jackson.
Description
This PR aims to speed up metadata reading, improving performance during metadata polling.
This patch changes the code to use ObjectReader instead of ObjectMapper when reading multiple JSON objects. Since ObjectReader is slightly faster in this scenario, this change should improve the performance of metadata polling.
Jackson documentation
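The change described above can be sketched as follows. This assumes the polling loop deserializes one JSON payload per metadata row and that Jackson 2.x (`jackson-databind`) is on the classpath. `SegmentPayload`, `readWithMapper` and `readWithReader` are hypothetical names for illustration, not Druid code. The sketch contrasts calling `ObjectMapper.readValue(String, Class)` for every row with building an `ObjectReader` once via `ObjectMapper.readerFor(Class)` and reusing it.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ObjectReaderSketch {

  // Hypothetical stand-in for a segment payload; Druid's real DataSegment is far richer.
  public static class SegmentPayload {
    public String id;
    public long size;
  }

  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Per-row style: each call resolves the target type and looks up its
  // root deserializer (both are cached internally, so the per-call cost is small).
  static List<SegmentPayload> readWithMapper(List<String> rows) throws IOException {
    List<SegmentPayload> out = new ArrayList<>();
    for (String row : rows) {
      out.add(MAPPER.readValue(row, SegmentPayload.class));
    }
    return out;
  }

  // Reader style: an immutable, thread-safe ObjectReader is configured once
  // for the target type and then reused for every row.
  static List<SegmentPayload> readWithReader(List<String> rows) throws IOException {
    ObjectReader reader = MAPPER.readerFor(SegmentPayload.class);
    List<SegmentPayload> out = new ArrayList<>();
    for (String row : rows) {
      SegmentPayload payload = reader.readValue(row);
      out.add(payload);
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical sample rows standing in for the "payload" column of the segments table.
    List<String> rows = List.of(
        "{\"id\":\"seg_a\",\"size\":100}",
        "{\"id\":\"seg_b\",\"size\":250}");
    List<SegmentPayload> viaMapper = readWithMapper(rows);
    List<SegmentPayload> viaReader = readWithReader(rows);
    // Both approaches should produce the same objects; only the per-call overhead differs.
    System.out.println(viaMapper.get(1).id + " / " + viaReader.get(1).id);
  }
}
```

Because an `ObjectReader` is immutable and thread-safe, it can also be held in a field and shared across polls. As noted in the conversation, any saving is likely small compared with the I/O cost of reading the segments table.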
Release note
Improved: You can now load newly added segments more quickly.
Key changed/added classes in this PR
SqlSegmentsMetadataManager
This PR has: