MINOR: avoid unnecessary collection copies between java and scala in metadata cache#6397
ijuma merged 4 commits into apache:trunk from radai-rosenblatt:ilovescala
this is also expected to have a positive impact on GC (less garbage produced in copied Buffer[Int])
ijuma left a comment
LGTM. I think the win is about avoiding the collection copy more than the unboxing. Good change anyway.
One more thing, let's add a comment explaining this or else someone will revert the change.
Actually, do you think we can switch this to be a plain j.u.List? See next comment as well.
basically, these fields are all originally j.u.Lists. There seems to be an unnecessary conversion to Scala collections and then back to Java (https://github.com/apache/kafka/pull/6397/files#diff-bfeebf48d90e86c1ffb850fbbe019dd6R112). I think we can avoid these altogether. Although there are Scala-collection-specific filters applied for debug logging (https://github.com/apache/kafka/pull/6397/files#diff-bfeebf48d90e86c1ffb850fbbe019dd6R107 for example), you can just use JavaConverters.collectionAsScalaIterable which IIUC does not do any copying.
actually, nm - these aren't copying conversions except for the type conversion which your patch eliminates. So I think we are good.
Yeah, none of the converters do copying. It's just unfortunate that there's no way to have Int as a collection parameter without copying. We can live with Integer here.
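To illustrate the point in this thread, here is a minimal sketch (not code from the patch) of the difference between wrapping and copying. The object and method names are hypothetical; the only library calls used are `asScala` from `scala.collection.JavaConverters` and the standard `java.util.ArrayList`. The wrapper stays backed by the Java list, so later additions show through it, while `map(_.toInt)` builds a separate collection.

```scala
import java.util.{ArrayList => JArrayList}
import scala.collection.JavaConverters._

// Hypothetical demo object, not part of Kafka.
object ConverterViewSketch {
  // Returns (size seen through the wrapper, size of the mapped copy)
  // after one more element is added to the underlying Java list.
  def sizes(): (Int, Int) = {
    val javaList = new JArrayList[Integer]()
    javaList.add(1)
    javaList.add(2)

    // Wrapper only: no elements are copied here.
    val wrapped = javaList.asScala
    // map() allocates a new buffer and fills it element by element.
    val copied = javaList.asScala.map(_.toInt)

    javaList.add(3)
    (wrapped.size, copied.size)
  }

  def main(args: Array[String]): Unit = {
    val (wrappedSize, copiedSize) = sizes()
    println(s"wrapped sees $wrappedSize elements, copy has $copiedSize")
  }
}
```

The wrapper reflects the third element because it delegates to the Java list; the mapped collection was materialized before the addition.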
@ijuma - commit edited to explain the issue a little better. also - i never implied the win was in avoiding the boxing - i was pointing out the map() calls, not the _.toInt bits (sorry for being unclear)
ijuma left a comment
Thanks, can you please add a comment to the method explaining why we use Iterable[Integer]? That will help ensure people don't change this in the future.
the metadata cache used map() to convert java.util.List<Integer>s into Iterable[Int]. the map() calls removed by this patch represent 11% of total CPU time measured under load for us. we also expect a positive impact on GC.
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
@ijuma - i've added a comment explaining why the method signature for getEndpoints() retains the java types.
* apache/trunk: (23 commits)
  KAFKA-7986: Distinguish logging from different ZooKeeperClient instances (apache#6493)
  KAFKA-8102: Add an interval-based Trogdor transaction generator (apache#6444)
  MINOR: Fix misspelling in protocol documentation
  KAFKA-8150: Fix bugs in handling null arrays in generated RPC code (apache#6489)
  KAFKA-8014: Extend Connect integration tests to add and remove workers dynamically (apache#6342)
  MINOR: Remove line for testing repartition topic name (apache#6488)
  MINOR: add MacOS requirement to Streams docs
  MINOR: fix message protocol help text for ElectPreferredLeadersResult (apache#6479)
  MINOR: list-topics should not require topic param
  MINOR: Clean up ThreadCacheTest (apache#6485)
  MINOR: Avoid unnecessary collection copy in MetadataCache (apache#6397)
  KAFKA-8142: Fix NPE for nulls in Headers (apache#6484)
  KAFKA-7243: Add unit integration tests to validate metrics in Kafka Streams (apache#6080)
  MINOR: Add verification step for Streams archetype to Jenkins build (apache#6431)
  KAFKA-7819: Improve RoundTripWorker (apache#6187)
  KAFKA-7989: RequestQuotaTest should wait for quota config change before running tests (apache#6482)
  KAFKA-8098: Fix Flaky Test testConsumerGroups
  KAFKA-6958: Add new NamedOperation interface to enforce consistency in naming operations (apache#6409)
  MINOR: capture result timestamps in Kafka Streams DSL tests (apache#6447)
  MINOR: updated names for deprecated streams constants (apache#6466)
  ...
`map` was being used to convert `Iterable[Integer]` to `Iterable[Int]`. That operation represented 11% of total CPU time measured under load for us. We also expect a positive impact on GC.
Reviewers: Joel Koshy <jjkoshy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Signed-off-by: radai-rosenblatt radai.rosenblatt@gmail.com
a profiler run on one of our brokers under load showed that ~11% of the global cpu time for the entire broker was taken by these 3 .map() calls:
5.36% in replicas.asScala.map(_.toInt) (line 79)
2.31% in offlineReplicas.asScala.map(_.toInt) (line 81)
3.58% in isr.asScala.map(_.toInt) (line 97)
for a total of 11.25%
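For readers skimming the numbers above, a rough sketch of the two shapes being compared. This is not the actual MetadataCache code: the object and method names are hypothetical, and only the `asScala.map(_.toInt)` expression is taken from the profiler lines quoted above. The "before" variant allocates and fills a new collection on every call; the "after" variant only wraps the Java list and keeps `Integer` as the element type.

```scala
import java.util.{List => JList}
import scala.collection.JavaConverters._

// Hypothetical illustration, not the real Kafka method signatures.
object ReplicaConversionSketch {
  // Before: every call copies the list into a new Scala collection of Int.
  def replicasCopied(replicas: JList[Integer]): Iterable[Int] =
    replicas.asScala.map(_.toInt)

  // After: the Java list is only wrapped, so no per-call copy is made.
  def replicasWrapped(replicas: JList[Integer]): Iterable[Integer] =
    replicas.asScala

  def main(args: Array[String]): Unit = {
    val replicas: JList[Integer] = Seq(1, 2, 3).map(Int.box).asJava
    println(replicasCopied(replicas).toList)
    println(replicasWrapped(replicas).size)
  }
}
```

Callers that need a primitive can still unbox individual elements when they read them, which avoids paying for a full copy on every metadata lookup.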