Skip to content

MINOR: avoid unnecessary collection copies between java and scala in metadata cache#6397

Merged
ijuma merged 4 commits intoapache:trunkfrom
radai-rosenblatt:ilovescala
Mar 22, 2019
Merged

MINOR: avoid unnecessary collection copies between java and scala in metadata cache#6397
ijuma merged 4 commits intoapache:trunkfrom
radai-rosenblatt:ilovescala

Conversation

@radai-rosenblatt
Copy link
Copy Markdown
Contributor

@radai-rosenblatt radai-rosenblatt commented Mar 7, 2019

Signed-off-by: radai-rosenblatt radai.rosenblatt@gmail.com

a profiler run on one of our brokers under load showed that ~11% of the global cpu time for the entire broker was taken by these 3 .map() calls:

5.36% in replicas.asScala.map(_.toInt) (line 79)
2.31% in offlineReplicas.asScala.map(_.toInt) (line 81)
3.58% in isr.asScala.map(_.toInt) (line 97)

for a total of 11.25%

@radai-rosenblatt
Copy link
Copy Markdown
Contributor Author

this is also expected to have a positive impact on GC (less garbage produced in copied Buffer[Int])

@jjkoshy jjkoshy changed the title stop wasting CPU copying java collections in scala MINOR: avoid unnecessary Java to Scala int conversion in metadata cache Mar 8, 2019
Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think the win is about avoiding the collection copy more than the unboxing. Good change anyway.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Mar 8, 2019

One more thing, let's add a comment explaining this or else someone will revert the change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, do you think we can switch this to be a plain j.u.List? See next comment as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically, these fields are all originally j.u.Lists. There seems to be an unnecessary conversion to Scala collections and then back to Java (https://github.com/apache/kafka/pull/6397/files#diff-bfeebf48d90e86c1ffb850fbbe019dd6R112). I think we can avoid these altogether. Although there are Scala-collection-specific filters applied for debug logging (https://github.com/apache/kafka/pull/6397/files#diff-bfeebf48d90e86c1ffb850fbbe019dd6R107 for example), you can just use JavaConverters.collectionAsScalaIterable which IIUC does not do any copying.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, nm - these aren't copying conversions except for the type conversion which your patch eliminates. So I think we are good.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, none of the converters do copying. It's just unfortunate that there's no way to have Int as a collection parameter without copying. We can live with Integer here.

@radai-rosenblatt
Copy link
Copy Markdown
Contributor Author

radai-rosenblatt commented Mar 8, 2019

@ijuma - commit edited to explain the issue a little better. also - i never implied the win was in avoiding the boxing - i was pointing out the map() calls, not the _toInt bits (sorry for being unclear)

@radai-rosenblatt radai-rosenblatt changed the title MINOR: avoid unnecessary Java to Scala int conversion in metadata cache MINOR: avoid unnecessary collection copies between java and scala in metadata cache Mar 8, 2019
Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, can you please add a comment to the method explaining why we use Iterable[Integer]? That will help ensure people don't change this in the future.

Radai Rosenblatt and others added 2 commits March 14, 2019 19:16
the metadata cache used map() to convert java.util.List<Integer>s into Iterable[Int].
the map() calls removed by this patch represent 11% of total CPU time measured under load for us.
we also expect a positive impact on GC.

Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
@radai-rosenblatt
Copy link
Copy Markdown
Contributor Author

@ijuma - i've added a comment explaining why the method signature for getEndpoints() retains the java types.

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ijuma ijuma merged commit 1deb072 into apache:trunk Mar 22, 2019
@radai-rosenblatt radai-rosenblatt deleted the ilovescala branch March 22, 2019 15:57
jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* apache/trunk: (23 commits)
  KAFKA-7986: Distinguish logging from different ZooKeeperClient instances (apache#6493)
  KAFKA-8102: Add an interval-based Trogdor transaction generator (apache#6444)
  MINOR: Fix misspelling in protocol documentation
  KAFKA-8150: Fix bugs in handling null arrays in generated RPC code (apache#6489)
  KAFKA-8014: Extend Connect integration tests to add and remove workers dynamically (apache#6342)
  MINOR: Remove line for testing repartition topic name (apache#6488)
  MINOR: add MacOS requirement to Streams docs
  MINOR: fix message protocol help text for ElectPreferredLeadersResult (apache#6479)
  MINOR: list-topics should not require topic param
  MINOR: Clean up ThreadCacheTest (apache#6485)
  MINOR: Avoid unnecessary collection copy in MetadataCache (apache#6397)
  KAFKA-8142: Fix NPE for nulls in Headers (apache#6484)
  KAFKA-7243: Add unit integration tests to validate metrics in Kafka Streams (apache#6080)
  MINOR: Add verification step for Streams archetype to Jenkins build (apache#6431)
  KAFKA-7819: Improve RoundTripWorker (apache#6187)
  KAFKA-7989: RequestQuotaTest should wait for quota config change before running tests (apache#6482)
  KAFKA-8098: Fix Flaky Test testConsumerGroups
  KAFKA-6958: Add new NamedOperation interface to enforce consistency in naming operations (apache#6409)
  MINOR: capture result timestamps in Kafka Streams DSL tests (apache#6447)
  MINOR: updated names for deprecated streams constants (apache#6466)
  ...
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
`map` was being used to convert `Iterable[Integer]` to `Iterable[Int`]. That
operation represented 11% of total CPU time measured under load for us.
We also expect a positive impact on GC.

Reviewers: Joel Koshy <jjkoshy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants