KAFKA-7557: optimize LogManager.truncateFullyAndStartAt() #5848
junrao merged 8 commits into apache:trunk from
Conversation
This is a good fix and we should keep it. However, I think @junrao was suggesting something even better. We call checkpointLogRecoveryOffsetsInDir from a number of places where we just want to do it for a single partition. In such cases, we can do deleteSnapshotsAfterRecoveryPointCheckpoint for that partition only.
A few suggestions:
A few suggestions:
- Maybe pass an `Option[Seq[Log]]` here, remove the default argument, and call it `affectedLogs`.
- In the documentation, we should specify that if a subset of logs is passed, we can optimize the producer snapshot deletion process. The actual checkpointing always involves all the logs.
@huxihx : Thanks for the patch. I realized that `LogManager.truncateTo()` has a similar problem since it calls `checkpointLogRecoveryOffsetsInDir` on every disk dir. To properly fix this, we should probably change `AbstractFetcherThread.maybeTruncate` to call truncate on a batch of partitions. In `LogManager.truncateTo()`, we would only call `checkpointLogRecoveryOffsetsInDir` on the disk dirs that have at least one truncated partition, and also pass along a set of partitions per disk dir to `checkpointLogRecoveryOffsetsInDir` to optimize the number of calls to `log.deleteSnapshotsAfterRecoveryPoint`. Do you think you could address that in the same PR?
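The batching idea above might look roughly like the following (a hypothetical sketch, not the actual Kafka code; the truncation bookkeeping and `getLog` lookup are simplified):

```scala
// Hypothetical sketch (simplified): truncate a batch of partitions, then
// checkpoint each affected disk dir exactly once, limiting snapshot
// deletion to the logs that were actually truncated.
def truncateTo(partitionOffsets: Map[TopicPartition, Long]): Unit = {
  val truncatedLogs = partitionOffsets.toSeq.flatMap { case (tp, offset) =>
    // assume truncateTo reports whether any data was removed
    getLog(tp).filter(_.truncateTo(offset))
  }
  truncatedLogs.groupBy(_.dir.getParentFile).foreach { case (dir, logs) =>
    checkpointLogRecoveryOffsetsInDir(dir, Some(logs))
  }
}
```

This touches each disk dir at most once per batch, instead of once per truncated partition.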
It seems that `log.dir.getAbsolutePath` should be `log.dir.getParentFile.getAbsolutePath`? Could we make this method visible at the package level and add a unit test?
Could we make `offsetTruncationStates` non-optional and get rid of `tp` and `offsetTruncationState`?
@junrao Are we safe to call
@huxihx : That's a good question. We are already calling `checkpointLogRecoveryOffsets()` in the scheduler in `LogManager` without the partition level `leaderIsrUpdateLock`. The purpose of the partition level `leaderIsrUpdateLock` is to prevent leader changes while the partition is being used. `checkpointLogRecoveryOffsets()` just writes the current per partition recovery offset to a file, which is independent of partition level leader changes. So, this seems ok.
retest this please
```diff
  // Close the log, update checkpoint files, and enqueue this log to be deleted.
  sourceLog.close()
- checkpointLogRecoveryOffsetsInDir(sourceLog.dir.getParentFile)
+ checkpointLogRecoveryOffsetsInDir(sourceLog.dir.getParentFile, Some(Seq(sourceLog)))
```
Since `sourceLog` will be deleted soon, there is no need to clean the snapshot for this partition. We could just pass in `None`.
I think it'd be better to pass it explicitly, since passing `None` could also mean doing it for all logs.
```diff
  }
  removedLog.renameDir(Log.logDeleteDirName(topicPartition))
- checkpointLogRecoveryOffsetsInDir(removedLog.dir.getParentFile)
+ checkpointLogRecoveryOffsetsInDir(removedLog.dir.getParentFile, Some(Seq(removedLog)))
```
Similar to the above, since `removedLog` will be deleted soon, there is no need to clean the snapshot for this partition. We could just pass in `None`.
Ditto. I think it'd be better to pass it explicitly, since passing `None` could also mean doing it for all logs. In `checkpointLogRecoveryOffsetsInDir`, all logs ending with `-delete` will be excluded.
retest this please
```scala
 * @param affectedLogs logs whose snapshots need to be cleaned. If it's None, the snapshots for all logs in the directory will be cleaned
 */
// Only for testing
private[log] def checkpointLogRecoveryOffsetsInDir(dir: File, affectedLogs: Option[Seq[Log]] = None): Unit = {
```
Thinking about this a bit more. I feel that `Option[Seq[Log]]` is a bit hard to understand. The issue is that we are trying to do 2 separate things (writing the recovery checkpoint file and deleting the snapshots) in a single method. Perhaps it's cleaner to split them into 2 separate methods. I am thinking of the following.

```scala
// write the recovery checkpoint in the provided directory
def checkpointLogRecoveryOffsetsInDir(dir: File): Unit

// clean the producer snapshot files in the provided logs
def cleanSnapshot(logs: Seq[Log]): Unit
```

Would that be better?
That's a good point, Jun. My only concern is that we lose the fact that these two things should happen at the same time. Is that ever not true? Another option would be to replace `Option[Seq[Log]]` with `Seq[Log]` and then just have the caller always pass the sequence.
It's mostly true that these two things should happen together. However, in the case where we want to delete a partition (`asyncDelete`), we just want to checkpoint the recovery offsets without the to-be-deleted partition. There is no need to delete the snapshot since the partition will be deleted.

But I agree, perhaps the better compromise is to have a single method like the following and force the caller to explicitly pass in the logs to clean the snapshots.

```scala
def checkpointRecoveryOffsetsAndCleanSnapshot(dir: File, logsToCleanSnapshot: Seq[Log]): Unit
```
I guess it's a good idea to have this method split into two subroutines focusing on their own jobs.
```scala
// Only for testing
private[log] def checkpointRecoveryOffsetsAndCleanSnapshot(dir: File, logsToCleanSnapshot: Seq[Log]): Unit = {
  try {
    checkpointLogRecoveryOffsetsInDir(dir)
```
It's probably better to fold the logic in `checkpointLogRecoveryOffsetsInDir(dir: File)` here and let all callers go through `checkpointRecoveryOffsetsAndCleanSnapshot()`. Currently, there are still a couple of callers to `checkpointLogRecoveryOffsetsInDir(dir: File)` directly. The issue is that `IOException` is not handled properly there as it is in `checkpointRecoveryOffsetsAndCleanSnapshot()`.
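The folding suggested above might look roughly like this (a hypothetical sketch; the real `LogManager` error handling marks the dir offline via `logDirFailureChannel`, which is simplified to an `error` log here):

```scala
// Hypothetical sketch: inline the checkpoint write into the combined method
// so every caller gets the same IOException handling.
private[log] def checkpointRecoveryOffsetsAndCleanSnapshot(dir: File, logsToCleanSnapshot: Seq[Log]): Unit = {
  try {
    // write the recovery checkpoint for all logs in this directory
    for {
      partitionToLog <- logsByDir.get(dir.getAbsolutePath)
      checkpoint <- recoveryPointCheckpoints.get(dir)
    } {
      checkpoint.write(partitionToLog.map { case (tp, log) => tp -> log.recoveryPoint })
    }
    // clean producer snapshots only for the logs the caller asked about
    logsToCleanSnapshot.foreach(_.deleteSnapshotsAfterRecoveryPointCheckpoint())
  } catch {
    case e: IOException =>
      // the real code would route this through logDirFailureChannel
      error(s"Disk error while writing recovery offsets checkpoint in dir $dir", e)
  }
}
```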
ijuma left a comment
One thing I noticed:

```scala
private def logsByDir: Map[String, Map[TopicPartition, Log]] = {
  (this.currentLogs.toList ++ this.futureLogs.toList).groupBy {
    case (_, log) => log.dir.getParent
  }.mapValues(_.toMap)
}
```

We should replace `mapValues` with `map`. The reason is that `mapValues` is lazy, and it should generally only be used for cheap operations to ensure the performance model is understandable. With the current implementation, every time someone does a `get` it triggers a `toMap` call.
Instead of calling deleteSnapshotsAfterRecoveryPointCheckpoint for allLogs, invoking it only for the logs being truncated. Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>