MINOR: reduce impact of trace logging in replica hot path #8468
junrao merged 3 commits into apache:trunk
Conversation
override def processPartitionData(topicPartition: TopicPartition,
                                  fetchOffset: Long,
                                  partitionData: FetchData): Option[LogAppendInfo] = {
  val logTrace = isTraceEnabled
Saves 3 getEffectiveLevel checks.
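For illustration, here is a minimal sketch of that pattern using plain slf4j (the object and method names are hypothetical, not the actual ReplicaFetcherThread code): the level is looked up once per invocation and the cached result guards every trace statement in the method.

import org.slf4j.LoggerFactory

object HotPathLoggingSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  // Hypothetical hot-path method; mirrors the processPartitionData change above.
  def processPartition(partition: String, recordCount: Int): Unit = {
    // Pay the level lookup (getEffectiveLevel) once per invocation...
    val logTrace = logger.isTraceEnabled

    // ...then reuse the cached result for each trace statement below.
    if (logTrace)
      logger.trace(s"Processing $recordCount records for partition $partition")

    // ... the actual append/fetch work happens here ...

    if (logTrace)
      logger.trace(s"Finished processing partition $partition")
  }
}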
| trace(s"${records.sizeInBytes} written to log $topicPartition beginning at offset " + | ||
| s"${info.firstOffset.getOrElse(-1)} and ending at offset ${info.lastOffset}") | ||
| if (isTraceEnabled) |
Looking up the trace level outside of the map allows us to save an isTraceEnabled lookup for every partition being produced to.
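As a sketch of what that looks like (hypothetical names, plain slf4j instead of Kafka's Logging trait), the level check is hoisted out of the per-partition map so it runs once per request:

import org.slf4j.LoggerFactory

object PerRequestTraceCheckSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  // Hypothetical shape of the produce path: one request touches many partitions.
  def appendToPartitions(sizesByPartition: Map[String, Int]): Unit = {
    // Checked once per request, outside the loop over partitions...
    val traceEnabled = logger.isTraceEnabled

    sizesByPartition.foreach { case (partition, sizeInBytes) =>
      // ...rather than calling logger.isTraceEnabled here, once per partition.
      if (traceEnabled)
        logger.trace(s"$sizeInBytes bytes written to log $partition")
      // append to the local log for this partition...
    }
  }
}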
| trace(s"Fetching log segment for partition $tp, offset $offset, partition fetch size $partitionFetchSize, " + | ||
| s"remaining response limit $limitBytes" + | ||
| (if (minOneMessage) s", ignoring response/partition size limits" else "")) | ||
| if (isTraceEnabled) |
Looking up the trace level above means we only need to get the effective level once even though read() may be called once for each partition.
lastFetchLeaderLogEndOffset = leaderEndOffset
lastFetchTimeMs = followerFetchTimeMs
updateLastSentHighWatermark(lastSentHighwatermark)
trace(s"Updated state of replica to $this")
I don't think this trace call is worthwhile given the cost. It would be hard to turn it on without being deluged with logs.
 */
private def updateLastSentHighWatermark(highWatermark: Long): Unit = {
  _lastSentHighWatermark = highWatermark
  trace(s"Updated HW of replica to $highWatermark")
I don't think this trace call is worthwhile given the cost. It would be hard to turn it on without being deluged with logs.
At minimum, I think we could keep the trace I deleted here: #8468 (comment), which should end up logging the lastSentWatermark anyway?
Yes, I agree this may not be that useful. We have other logging like the following in Partition.updateFollowerFetchState() that provides similar traceability.
debug(s"Recorded replica $followerId log end offset (LEO) position " +
s"${followerFetchOffsetMetadata.messageOffset} and log start offset $followerStartOffset.")
junrao left a comment
@lbradstreet: Thanks for the PR. LGTM
ok to test
The impact of trace logging is normally small, on the order of 40ns per getEffectiveLevel check; however, this adds up when trace is called multiple times per partition in the replica fetch hot path.
This PR removes some trace logs that are not very useful and reduces the number of times the log level is checked repeatedly for a single fetch request.
Back-of-envelope savings calculation
Assume 1500 leader replicas on a broker and 8 followers, each fetching 200 times per second.
Partitions per fetch request = 1500 * 2 (replication) / 8 (followers) = 375
Partition reads per second = 375 * 8 * 200 = 600,000
CPU time saved per second for a single trace call = 40ns * 600,000 / 1,000,000 = 24ms
Assuming 4 cores, each trace call saved recovers approximately 24 / 1000 / 4 = 0.6% of total CPU.
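For reference, here is the same back-of-envelope estimate as a small self-contained Scala calculation; the inputs (40ns per check, 1500 leader replicas, replication factor 2, 8 followers, 200 fetches/s, 4 cores) are the assumptions stated above, not measured values.

object TraceSavingsEstimate extends App {
  // Assumptions from the description above.
  val leaderReplicas     = 1500
  val replicationFactor  = 2
  val followers          = 8
  val fetchesPerSecond   = 200
  val nanosPerLevelCheck = 40L
  val cores              = 4

  val partitionsPerFetch   = leaderReplicas * replicationFactor / followers         // 375
  val partitionReadsPerSec = partitionsPerFetch * followers * fetchesPerSecond      // 600,000
  val millisSavedPerSec    = nanosPerLevelCheck * partitionReadsPerSec / 1000000.0  // 24 ms

  // 24 ms of CPU per wall-clock second, spread over 4 cores, is about 0.6% of total CPU.
  val percentOfTotalCpu = millisSavedPerSec / 1000 / cores * 100
  println(f"Each avoided trace-level check saves ~$percentOfTotalCpu%.1f%% of total CPU")
}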