Skip to content

MINOR: reduce impact of trace logging in replica hot path#8468

Merged
junrao merged 3 commits intoapache:trunkfrom
lbradstreet:minor-trace-logger-fixes
Apr 17, 2020
Merged

MINOR: reduce impact of trace logging in replica hot path#8468
junrao merged 3 commits intoapache:trunkfrom
lbradstreet:minor-trace-logger-fixes

Conversation

@lbradstreet
Copy link
Copy Markdown
Contributor

@lbradstreet lbradstreet commented Apr 12, 2020

The impact of trace logging is normally small, on the order of 40ns per getEffectiveLevel check, however this adds up with trace is called multiple times per partition in the replica fetch hot path.

This PR removes some trace logs that are not very useful and reduces cases where the level is checked over and over for one fetch request.

Back of envelope savings calculation

Assume 1500 leader replicas on a broker, 8 followers fetching 200 times per second.

Partitions per fetch request = 1500 * 2 (replication) / 8 (followers) = 375
Partition reads per second = 375 * 8 * 200 = 6000000
CPU time in ms used per second for single trace call saved = 40 * 6000000 / 1000000 = 24ms

Assuming 4 cores each trace call we save saves approximately 24/1000/4 = 0.6%.

Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
override def processPartitionData(topicPartition: TopicPartition,
fetchOffset: Long,
partitionData: FetchData): Option[LogAppendInfo] = {
val logTrace = isTraceEnabled
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saves 3 getEffectiveLevel checks.


trace(s"${records.sizeInBytes} written to log $topicPartition beginning at offset " +
s"${info.firstOffset.getOrElse(-1)} and ending at offset ${info.lastOffset}")
if (isTraceEnabled)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking up the trace level outside of the map allows us to save a isTraceEnabled lookup for every partition being produced to.

trace(s"Fetching log segment for partition $tp, offset $offset, partition fetch size $partitionFetchSize, " +
s"remaining response limit $limitBytes" +
(if (minOneMessage) s", ignoring response/partition size limits" else ""))
if (isTraceEnabled)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking up the trace level above means we only need to get the effective level once even though read() may be called once for each partition.

lastFetchLeaderLogEndOffset = leaderEndOffset
lastFetchTimeMs = followerFetchTimeMs
updateLastSentHighWatermark(lastSentHighwatermark)
trace(s"Updated state of replica to $this")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this trace call is worthwhile given the cost. It would be hard to turn it on without being deluged with logs.

*/
private def updateLastSentHighWatermark(highWatermark: Long): Unit = {
_lastSentHighWatermark = highWatermark
trace(s"Updated HW of replica to $highWatermark")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this trace call is worthwhile given the cost. It would be hard to turn it on without being deluged with logs..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junrao Thoughts? Can we remove these calls?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At minimum I think we could keep the trace I deleted here: #8468 (comment) which should end up logging the lastSentWatermark anyway?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree this may not be that useful. We have other logging like the following in Partition.updateFollowerFetchState() that provides similar traceability.

        debug(s"Recorded replica $followerId log end offset (LEO) position " +
          s"${followerFetchOffsetMetadata.messageOffset} and log start offset $followerStartOffset.")

Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbradstreet : Thanks for the PR. LGTM

@junrao
Copy link
Copy Markdown
Contributor

junrao commented Apr 17, 2020

ok to test

@junrao junrao merged commit 851b45c into apache:trunk Apr 17, 2020
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request Apr 19, 2020
* 'trunk' of github.com:apache/kafka: (28 commits)
  MINOR: cleanup RocksDBStore tests  (apache#8510)
  KAFKA-9818: Fix flaky test in RecordCollectorTest (apache#8507)
  MINOR: reduce impact of trace logging in replica hot path (apache#8468)
  KAFKA-6145: KIP-441: Add test scenarios to ensure rebalance convergence (apache#8475)
  KAFKA-9881: Convert integration test to verify measurements from RocksDB to unit test (apache#8501)
  MINOR: improve test coverage for dynamic LogConfig(s) (apache#7616)
  MINOR: Switch order of sections on tumbling and hopping windows in streams doc. Tumbling windows are defined as "special case of hopping time windows" - but hopping windows currently only explained later in the docs. (apache#8505)
  KAFKA-9819: Fix flaky test in StoreChangelogReaderTest (apache#8488)
  HOTFIX: fix active task process ratio metric recording
  KAFKA-9796; Ensure broker shutdown is not stuck when Acceptor is waiting on connection queue (apache#8448)
  MINOR: Use streaming iterator with decompression buffer when building offset map (apache#8494)
  Add log message in release.py (apache#8461)
  KAFKA-9854 Re-authenticating causes mismatched parse of response (apache#8471)
  KAFKA-9838; Add log concurrency test and fix minor race condition (apache#8476)
  KAFKA-9703; Free up compression buffer after splitting a large batch
  KAFKA-9779: Add Stream system test for 2.5 release (apache#8378)
  KAFKA-7885: TopologyDescription violates equals-hashCode contract. (apache#6210)
  MINOR: KafkaApis#handleOffsetDeleteRequest does not group result correctly (apache#8485)
  HOTFIX: don't close or wipe out someone else's state (apache#8478)
  MINOR: add process(Test)Messages to the README (apache#8480)
  ...
ijuma added a commit to confluentinc/kafka that referenced this pull request Apr 21, 2020
* apache-github/trunk:
  MINOR: Fix grammar in error message for InvalidRecordException (apache#8465)
  KAFKA-9868: Reduce number of transaction log partitions for embed broker (apache#8522)
  MINOR: Adding github whitelist (apache#8523)
  MINOR: Upgrade gradle plugins and test libraries for Java 14 support (apache#8519)
  MINOR: Further reduce runtime for metrics integration tests (apache#8514)
  MINOR: Use .asf.yaml to direct github notifications to JIRA and mailing lists (apache#8521)
  MINOR: Update to Gradle 6.3 (apache#7677)
  HOTFIX: fix checkstyle error of RocksDBStoreTest and flaky RocksDBTimestampedStoreTest.shouldOpenExistingStoreInRegularMode (apache#8515)
  MINOR: cleanup RocksDBStore tests  (apache#8510)
  KAFKA-9818: Fix flaky test in RecordCollectorTest (apache#8507)
  MINOR: reduce impact of trace logging in replica hot path (apache#8468)
  KAFKA-6145: KIP-441: Add test scenarios to ensure rebalance convergence (apache#8475)
  KAFKA-9881: Convert integration test to verify measurements from RocksDB to unit test (apache#8501)
  MINOR: improve test coverage for dynamic LogConfig(s) (apache#7616)
  MINOR: Switch order of sections on tumbling and hopping windows in streams doc. Tumbling windows are defined as "special case of hopping time windows" - but hopping windows currently only explained later in the docs. (apache#8505)
  KAFKA-9819: Fix flaky test in StoreChangelogReaderTest (apache#8488)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants