KAFKA-17584: Fix incorrect synonym handling for dynamic log configurations #17258
…tions
Several Kafka log configurations have synonyms. For example, log retention can be configured by log.retention.ms, by log.retention.minutes, or by log.retention.hours. There is also a facility in Kafka to dynamically change broker configurations without restarting the broker. These dynamically set configurations are stored in the metadata log and override what is in the broker properties file.
Unfortunately, these two features interacted poorly: there was a bug where the dynamic log configuration update code ignored synonyms. For example, if you set log.retention.minutes and then reconfigured something unrelated that triggered the LogConfig update path, the retention value that you had configured was overwritten.
The reason for this was incorrect handling of synonyms. The code treated the Kafka broker configuration as a bag of key/value entries rather than extracting the correct retention time (or other setting with overrides) from the KafkaConfig object.
Separately from the above bug, the code did not honor the value of dynamically configured synonyms: setting log.retention.minutes had no effect; only log.retention.ms was honored.
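To make the synonym problem concrete, here is a minimal sketch of the difference between a key-by-key lookup and a synonym-aware one. It is an illustration only, not the code in this PR: the object and method names (RetentionSynonyms, resolveRetentionMs, naiveRetentionMs) are hypothetical, and the sketch assumes the documented precedence of log.retention.ms over log.retention.minutes over log.retention.hours, with 168 hours as the fallback.

```scala
// Illustrative sketch only; these names are hypothetical and are not Kafka APIs.
object RetentionSynonyms {
  private val MsPerMinute = 60L * 1000L
  private val MsPerHour = 60L * MsPerMinute

  // Assumed fallback for this sketch: 168 hours (7 days).
  val DefaultRetentionMs: Long = 168L * MsPerHour

  /** Synonym-aware lookup: ms wins over minutes, minutes wins over hours. */
  def resolveRetentionMs(props: Map[String, String]): Long =
    props.get("log.retention.ms").map(_.toLong)
      .orElse(props.get("log.retention.minutes").map(_.toLong * MsPerMinute))
      .orElse(props.get("log.retention.hours").map(_.toLong * MsPerHour))
      .getOrElse(DefaultRetentionMs)

  /** "Bag of key/value" lookup: only the primary key is consulted,
    * so a value set through a synonym is silently ignored. */
  def naiveRetentionMs(props: Map[String, String]): Long =
    props.get("log.retention.ms").map(_.toLong).getOrElse(DefaultRetentionMs)
}
```

With only log.retention.minutes=10 set, the synonym-aware lookup yields ten minutes in milliseconds, while the naive lookup falls back to the default. That is the kind of overwrite described above.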
b182d44 to 8cf31e0
showuon left a comment
LGTM! Minor comment left.
The DynamicBrokerReconfigurationTest.testConfigDescribeUsingAdminClient test failed. I had a look, and it looks like we were validating the wrong thing before. log.retention.hours should not be a read-only config. It can be dynamically changed, as can log.roll.hours.
results.asScala
}

val KafkaConfigToLogConfigName: Map[String, String] = {
nit: After this PR, this variable is only used in DynamicBrokerReconfigurationTest. We can move it there.
I think we should also update the documentation, as they are both reported as read-only.
ServerTopicConfigSynonyms.TOPIC_CONFIG_SYNONYMS.values.asScala.toSet - ServerLogConfigs.LOG_MESSAGE_FORMAT_VERSION_CONFIG
val KafkaConfigToLogConfigName: Map[String, String] =
  ServerTopicConfigSynonyms.TOPIC_CONFIG_SYNONYMS.asScala.map { case (k, v) => (v, k) }
val ReconfigurableConfigs: Set[String] = {
nit (immutable style):
val ReconfigurableConfigs: Set[String] = {
ServerTopicConfigSynonyms.ALL_TOPIC_CONFIG_SYNONYMS
.values()
.asScala
.flatMap(v => v.asScala.map(configSynonym => configSynonym.name()))
.filterNot(_ == ServerLogConfigs.LOG_MESSAGE_FORMAT_VERSION_CONFIG)
.toSet
}
Good point! But the doc is actually generated, and I just confirmed that, after this PR, the doc will be correctly updated.
clolov left a comment
log.retention.hours should not be a read-only config. It can be dynamically changed, as can log.roll.hours.
I think we should also update the documentation, as they are both reported as read-only.
Good point! But the doc is actually generated, and I just confirmed, after this PR, the doc will be correctly updated
My only call out is that currently (in trunk) log.retention.hours/minutes are documented as read-only and behave as read-only. After this change they become cluster-wide in both documentation and behaviour. I am happy with this change as long as we publicly call it out in some form.
Agreed. This is what happens right now, so it would be a behavior change. I think a simple release note would be fine.
$ bin/kafka-configs.sh --bootstrap-server :9092 --entity-type brokers --entity-name 2 --alter --add-config log.retention.minutes=1
Error while executing config command with args '--bootstrap-server :9092 --entity-type brokers --entity-name 2 --alter --add-config log.retention.minutes=1'
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidRequestException: Cannot update these configs dynamically: Set(log.retention.minutes)
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:390)
at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:351)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:100)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Cannot update these configs dynamically: Set(log.retention.minutes)
I've sent a mail to the dev/user mailing lists to discuss it here. FYI.
Looks like there is a failing test:
For now, I'm going to change it back to the current behavior where we ignore the value of dynamically configured synonyms such as
showuon left a comment
LGTM! It's fine that we keep the old behavior, since it won't block any operation.
This issue was first reported in KAFKA-15266, and PR #14119 was opened for it.
amangandhi94 left a comment
This looks good to me and addresses the bug I had raised - https://issues.apache.org/jira/browse/KAFKA-15266
Just wondering: how can I get more attention on bugs or PRs I raise in the future?
override def reconfigure(oldConfig: KafkaConfig, newConfig: KafkaConfig): Unit = {
  val originalLogConfig = logManager.currentDefaultConfig
  val originalUncleanLeaderElectionEnable = originalLogConfig.uncleanLeaderElectionEnable
  val newBrokerDefaults = new util.HashMap[String, Object](originalLogConfig.originals)
Just curious: do we know why this logic was even added? I can't see any benefit to doing this.
In this case, by raising the bug as a blocker, given the potential data loss. In general, sending an email to the dev mailing list is enough to catch some more attention, in my experience. Hope it helps.
val originalUncleanLeaderElectionEnable = originalLogConfig.uncleanLeaderElectionEnable
val newBrokerDefaults = new util.HashMap[String, Object](originalLogConfig.originals)
newConfig.valuesFromThisConfig.forEach { (k, v) =>
  if (DynamicLogConfig.ReconfigurableConfigs.contains(k)) {
It has been a long time, so I may not remember correctly. But I think we used to only update configs which are ReconfigurableConfigs even if you updated ZooKeeper directly with new configs that included others. Non-reconfigurable configs would be picked up only on the next broker restart. Do we think it is safe to include all configs now either because we don't allow other configs to be updated in KRaft or because all log configs are reconfigurable now?
Thanks for this comment, @rajinisivaram. I think you're right that we should be excluding changes to configurations that don't appear in DynamicLogConfig.ReconfigurableConfigs. I have added some code to do this now. I think from a practical point of view, this only affects message.format.version, which we explicitly excluded from being dynamically reconfigurable.
Do we think it is safe to include all configs now either because we don't allow other configs to be updated in KRaft or because all log configs are reconfigurable now?
KRaft has the same behavior as ZK, in that it lets you set any broker configuration you want, whether it's valid or not. There were too many people depending on this for us to change it in 3.x. Maybe in the future if someone creates a KIP...
Of course the command-line tool does its own validation.
…urable - Add testLogRetentionTimeMinutesIsNotDynamicallyReconfigurable - clean up some cases where we were using zkconnect but did not need to
rajinisivaram left a comment
@cmccabe Thanks for the PR, LGTM
val newBrokerDefaults = new util.HashMap[String, Object](newConfig.extractLogConfigMap)
originalLogConfig.originals().forEach((k, v) => {
  if (!DynamicLogConfig.ReconfigurableConfigs.contains(k)) {
    newBrokerDefaults.put(k, v)
- Do we need to translate the name from KafkaConfig to LogConfig as the original code does?
- We started newBrokerDefaults with newConfig.extractLogConfigMap, which includes non-reconfigurable configs. Where is the logic to remove them?
Let me fix this a bit. It should be sufficient to just remove the non-reconfigurable configs (really, singular config).
val newBrokerDefaults = new util.HashMap[String, Object](newConfig.extractLogConfigMap)
val originalLogConfigMap = originalLogConfig.originals()
DynamicLogConfig.NonReconfigrableLogConfigs.foreach(k => {
  Option(originalLogConfigMap.get(k)) match {
Hmm, originalLogConfigMap is the current config. We should check from newConfig, right?
Non-reconfigurable configs are copied over from the current (not new) configuration.
DynamicLogConfig.NonReconfigrableLogConfigs.foreach(k => {
  Option(originalLogConfigMap.get(k)) match {
    case None => newBrokerDefaults.remove(k)
    case Some(v) => newBrokerDefaults.put(k, v)
Not sure that I follow here. Why are we putting a non-reconfigurable config to newBrokerDefaults?
Because we want the non-reconfigurable configuration to have the same value (or lack of value) that it had previously. It should not change.
    case Some(v) => newBrokerDefaults.put(k, v)
  }
}
})
newBrokerDefaults has the config name with log prefix, right? Should we translate the name from KafkaConfig to LogConfig as the original code does?
newBrokerDefaults has the config name with log prefix, right?
No. It comes from KafkaConfig.extractLogConfigMap, which creates a map of log (aka topic) configurations, not broker configurations.
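The approach discussed in this thread (start from the freshly extracted log config map, then pin non-reconfigurable keys to whatever the current config holds) can be sketched on plain immutable maps. This is a simplified illustration under assumptions, not the PR's code: PreserveNonReconfigurable and mergeDefaults are hypothetical names, and the values are strings rather than the Object values used in the diff.

```scala
// Illustrative sketch only; these names are hypothetical and are not Kafka APIs.
object PreserveNonReconfigurable {
  /** Start from the newly extracted topic-level config map, then force every
    * non-reconfigurable key back to its current value. If the key is absent
    * from the current config, it is removed from the result as well. */
  def mergeDefaults(
      current: Map[String, String],
      extracted: Map[String, String],
      nonReconfigurable: Set[String]): Map[String, String] =
    nonReconfigurable.foldLeft(extracted) { (acc, key) =>
      current.get(key) match {
        case None        => acc - key            // stays unset, as before
        case Some(value) => acc + (key -> value) // keeps its previous value
      }
    }
}
```

The keys here are topic-level names such as retention.ms, matching the point above that the extracted map holds log (topic) configuration names rather than broker names with the log. prefix.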
All tests passed.
…tions (#17258)
Several Kafka log configurations have synonyms. For example, log retention can be configured by log.retention.ms, by log.retention.minutes, or by log.retention.hours. There is also a facility in Kafka to dynamically change broker configurations without restarting the broker. These dynamically set configurations are stored in the metadata log and override what is in the broker properties file.
Unfortunately, these two features interacted poorly: there was a bug where the dynamic log configuration update code ignored synonyms. For example, if you set log.retention.minutes and then reconfigured something unrelated that triggered the LogConfig update path, the retention value that you had configured was overwritten.
The reason for this was incorrect handling of synonyms. The code treated the Kafka broker configuration as a bag of key/value entries rather than extracting the correct retention time (or other setting with overrides) from the KafkaConfig object.
Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <junrao@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>, Federico Valeri <fedevaleri@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, amangandhi94 <>
…tions (#17258) (#17278)
Several Kafka log configurations have synonyms. For example, log retention can be configured by log.retention.ms, by log.retention.minutes, or by log.retention.hours. There is also a facility in Kafka to dynamically change broker configurations without restarting the broker. These dynamically set configurations are stored in the metadata log and override what is in the broker properties file.
Unfortunately, these two features interacted poorly: there was a bug where the dynamic log configuration update code ignored synonyms. For example, if you set log.retention.minutes and then reconfigured something unrelated that triggered the LogConfig update path, the retention value that you had configured was overwritten.
The reason for this was incorrect handling of synonyms. The code treated the Kafka broker configuration as a bag of key/value entries rather than extracting the correct retention time (or other setting with overrides) from the KafkaConfig object.
Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <junrao@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>, Federico Valeri <fedevaleri@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, amangandhi94 <>
Co-authored-by: Colin Patrick McCabe <cmccabe@apache.org>
Reviewers: Josep Prat <josep.prat@aiven.io>
…tions
This is a cherry-pick of #17258 to 3.7.2. This commit differs from the original by using the old (read: 3.7) references to the configurations and by not changing as many unit tests.
Reviewers: Divij Vaidya <diviv@amazon.com>, Colin Patrick McCabe <cmccabe@apache.org>