KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354) #6009
jjkoshy merged 4 commits into apache:trunk
Conversation
Does anyone have time to review the code? @cmccabe @jjkoshy @lindong28
This works, but I think we should try to avoid passing in a new preCleanStats parameter with a default value.
E.g., we could separate out updating the max compaction delay (i.e., a separate function from this one) whose only job is to update the stat; alternately, just have a volatile maxCompactionDelay member and update it from this method. A minor disadvantage of a snapshot stats object is that its progress has to be driven from the cleaner thread. For bulk stats such as cleaner stats that is okay, but for a metric you might rely on for alerting, its true value could lag by up to log.cleaner.backoff.ms in low-volume scenarios.
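To make the suggested alternative concrete, here is a minimal sketch of a volatile member that a metrics gauge could read directly, instead of threading a preCleanStats parameter through grabFilthiestCompactedLog. The class name and method names here are illustrative assumptions, not the actual LogCleanerManager code:

```scala
// Hypothetical sketch of the reviewer's alternative: a volatile field the
// metric gauge reads, updated by the cleaner thread on each scan.
class CleanerDelayTracker {
  @volatile private var maxCompactionDelayMs: Long = 0L

  // Called from the cleaner thread each time it scans for cleanable logs.
  def updateMaxCompactionDelay(delaysMs: Iterable[Long]): Unit = {
    maxCompactionDelayMs = if (delaysMs.isEmpty) 0L else delaysMs.max
  }

  // Read by the metrics gauge; @volatile guarantees visibility across threads.
  def currentMaxCompactionDelayMs: Long = maxCompactionDelayMs
}
```

The trade-off discussed above still applies: the gauge only moves when the cleaner thread runs, so its value is a step function rather than growing continuously with time.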
Since grabFilthiestCompactedLog and the cleaner stats are defined in two different classes, a volatile global variable doesn't make our life easier. The default "new PreCleanStats()" is mainly to avoid changing the many test cases that call this function directly.
In terms of delay: the max delay can only be safely populated in the log cleaner thread, and it reflects the correct view at the time the delay is calculated. So the next update of the max delay may be delayed by the backoff plus the time spent in the actual compaction.
I still dislike the "step"-effect that this has. i.e., from the point in time any log is due for compaction, the maxCompactionDelay metric should be increasing with time. This is minor in the sense that you will record it the next time the cleaner gets around to computing it. I think this can be addressed in a follow-up.
I think it would also be useful to log a count of how many logs are unclean and, of those, how many are cleanable due to violating the max compaction lag constraint.
I added related logging: see PreCleanStats.
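As a rough illustration of what such a stats snapshot might record (counts of cleanable logs, logs cleanable due to the max-lag violation, and the max delay), here is a sketch. The field and method names are assumptions modeled on this discussion, not the committed PreCleanStats class:

```scala
// Illustrative PreCleanStats-style snapshot, built up during one scan of the
// logs by the cleaner thread and then logged or exposed as metrics.
class PreCleanStats {
  var cleanablePartitions: Int = 0
  var delayedPartitions: Int = 0   // cleanable because they exceeded max lag
  var maxCompactionDelayMs: Long = 0L

  def recordCleanablePartitions(n: Int): Unit = cleanablePartitions = n

  // A positive delay means this log has gone past its max compaction lag.
  def updateMaxCompactionDelay(delayMs: Long): Unit = {
    maxCompactionDelayMs = math.max(maxCompactionDelayMs, delayMs)
    if (delayMs > 0) delayedPartitions += 1
  }

  override def toString: String =
    s"$cleanablePartitions cleanable partitions, " +
    s"$delayedPartitions delayed partitions, max delay $maxCompactionDelayMs ms"
}
```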
Sorry I lost track of this - as of my last review I'm +1 but will look over it again in the next couple of days before checking in. cc @cmccabe since I heard he was interested in taking a look as well.
@jjkoshy The feature freeze is in 1 week so you should hurry if you want this in the next release. :)
address review comments
      None
    } else {
      preCleanStats.recordCleanablePartitions(cleanableLogs.size)
      val filthiest = cleanableLogs.max
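For context on the `cleanableLogs.max` line above: the elements carry a dirty ratio and an Ordering compares on it, so `max` returns the filthiest log. The following is a simplified stand-in for the real kafka.log.LogToClean class, reduced to just the comparison:

```scala
// Stand-in for LogToClean: ordered by cleanableRatio, so Seq.max picks the
// log with the highest dirty ratio.
case class LogToClean(topicPartition: String, cleanableRatio: Double)
    extends Ordered[LogToClean] {
  override def compare(that: LogToClean): Int =
    this.cleanableRatio.compare(that.cleanableRatio)
}

val cleanableLogs = Seq(
  LogToClean("t-0", 0.15),
  LogToClean("t-1", 0.60),
  LogToClean("t-2", 0.40)
)
val filthiest = cleanableLogs.max  // highest dirty ratio wins
```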
Sorry I didn't notice earlier: should we actually prioritize a log that is past its max compaction delay over a log that is more dirty?
The original idea was to sort the logs based on how far the compaction delay has passed the max delay. But a log with a very short max compaction lag may always take priority over a very dirty log (one with a high dirty ratio). I think it is better not to prioritize it, since the compaction finish time is not actually guaranteed anyway: the log cleaner thread can take a long time on a single compaction, and it may be working on another log when a log goes beyond its max compaction lag.
This really depends on the interpretation of the config. For PII data, for example, you should be able to provide some guarantee. Either way, there is the possibility of starvation. However, we do have sensors to indicate this situation, so I think we can leave it as is and revisit if people want harder guarantees.
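For comparison, the stricter policy debated here could look like the sketch below: clean overdue logs first (longest violation first), then fall back to dirty-ratio order. This is a hypothetical alternative, not what the PR merged; the merged code keeps pure dirty-ratio ordering:

```scala
// Hypothetical two-tier selection policy: max-lag violations take priority
// over dirty ratio. Names here are illustrative, not from the PR.
case class CandidateLog(name: String, cleanableRatio: Double, delayBeyondMaxLagMs: Long)

def selectNext(candidates: Seq[CandidateLog]): Option[CandidateLog] = {
  val overdue = candidates.filter(_.delayBeyondMaxLagMs > 0)
  if (overdue.nonEmpty) Some(overdue.maxBy(_.delayBeyondMaxLagMs))      // worst violation first
  else if (candidates.nonEmpty) Some(candidates.maxBy(_.cleanableRatio)) // otherwise, filthiest
  else None
}
```

The starvation risk mentioned above is visible in this sketch: a steady stream of overdue low-volume logs could keep a very dirty log waiting indefinitely.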
|
@xiowu0 can you also provide a follow-up patch to update the website documentation?
…es-14-May

* AK_REPO/trunk: (24 commits)
  KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354) (apache#6009)
  KAFKA-8335; Clean empty batches when sequence numbers are reused (apache#6715)
  KAFKA-6455: Session Aggregation should use window-end-time as record timestamp (apache#6645)
  KAFKA-6521: Use timestamped stores for KTables (apache#6667)
  [MINOR] Consolidate in-memory/rocksdb unit tests for window & session store (apache#6677)
  MINOR: Include StickyAssignor in system tests (apache#5223)
  KAFKA-7633: Allow Kafka Connect to access internal topics without cluster ACLs (apache#5918)
  MINOR: Align KTableAgg and KTableReduce (apache#6712)
  MINOR: Fix code section formatting in TROGDOR.md (apache#6720)
  MINOR: Remove unnecessary OptionParser#accepts method call from PreferredReplicaLeaderElectionCommand (apache#6710)
  KAFKA-8352: Fix Connect System test failure 404 Not Found (apache#6713)
  KAFKA-8348: Fix KafkaStreams JavaDocs (apache#6707)
  MINOR: Add missing option for running vagrant-up.sh with AWS to vagrant/README.md
  KAFKA-8344; Fix vagrant-up.sh to work with AWS properly
  MINOR: docs typo in '--zookeeper myhost:2181--execute'
  MINOR: Remove header and key/value converter config value logging (apache#6660)
  KAFKA-8231: Expansion of ConnectClusterState interface (apache#6584)
  KAFKA-8324: Add close() method to RocksDBConfigSetter (apache#6697)
  KAFKA-6789; Handle retriable group errors in AdminClient API (apache#5578)
  KAFKA-8332: Refactor ImplicitLinkedHashSet to avoid losing ordering when converting to Scala
  ...
KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354) Records become eligible for compaction after the specified time interval. Author: Xiongqi Wu <xiowu@linkedin.com> Reviewer: Joel Koshy <jjkoshy@gmail.com>
Implement the change described in KIP-354
Added unit tests.