
MINOR: Prevent creating partition.metadata until ID can be written #10041

Merged
junrao merged 7 commits into apache:trunk from jolshan:cleanupPartitionMetadataFile on Feb 10, 2021
Conversation

@jolshan (Member) commented Feb 3, 2021

Currently the partition.metadata file is created when the log is created. However, clusters with older inter-broker protocols will never use this file. This PR moves the creation of the file to when we write to the file.

This PR also deletes the partition.metadata file on startup if the IBP version is lower than 2.8.

I will be looking at benchmarks to see how these changes affect LISR request processing.
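
The lazy-creation approach described above can be sketched as follows. This is an illustrative sketch, not the actual Kafka implementation; the class and method names are invented for the example:

```scala
import java.nio.file.{Files, Path}

// Hypothetical sketch of deferring partition.metadata creation: nothing is
// created at log-creation time, so brokers on older inter-broker protocol
// versions (which never write a topic ID) never create the file at all.
class PartitionMetadataSketch(logDir: Path) {
  private val file: Path = logDir.resolve("partition.metadata")

  def exists: Boolean = Files.exists(file)

  // The file comes into existence only on the first write of the topic ID.
  def record(topicId: String): Unit =
    Files.write(file, s"version: 0\ntopic_id: $topicId\n".getBytes("UTF-8"))
}
```

The point of the restructuring is simply that construction has no side effects; the write path, which only runs when the inter-broker protocol supports topic IDs, is what materializes the file.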

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jolshan force-pushed the cleanupPartitionMetadataFile branch from 603a074 to 58ee278 on February 5, 2021
@junrao (Contributor) left a comment

@jolshan : Thanks for the PR. A few comments below.

Resolved review threads (outdated):

  • core/src/main/scala/kafka/log/Log.scala (two threads)
  • core/src/main/scala/kafka/server/PartitionMetadataFile.scala
  • core/src/test/scala/unit/kafka/log/LogManagerTest.scala
@junrao (Contributor) left a comment

@jolshan : Thanks for the updated PR. Just one more comment below.

  logDirFailureChannel: LogDirFailureChannel,
- private val hadCleanShutdown: Boolean = true) extends Logging with KafkaMetricsGroup {
+ private val hadCleanShutdown: Boolean = true,
+ val keepPartitionMetadataFile: Boolean = true) extends Logging with KafkaMetricsGroup {
@junrao (Contributor):

Could we add the new param to the javadoc? In the javadoc, it would be useful to explain a bit how this helps with re-upgrade.

@jolshan (Member, Author):

Good point, will do!

@jolshan (Member, Author) commented Feb 9, 2021

Would something like this work as an explanation?

* boolean flag to indicate whether the partition.metadata file should be kept in the 
* log directory. A partition.metadata file is only created when the controller's 
* inter-broker protocol version is at least 2.8. This file will persist the topic ID on
* the broker. If inter-broker protocol is downgraded below 2.8, a topic ID may be lost
* and a new ID generated upon re-upgrade. If the inter-broker protocol version is below
* 2.8, partition.metadata will be deleted to avoid ID conflicts upon re-upgrade. 
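
Under those semantics, the startup-time cleanup could look roughly like the following. The function name and flag wiring are assumptions for illustration, not the actual Kafka code:

```scala
import java.nio.file.{Files, Path}

// Hypothetical sketch: when keepPartitionMetadataFile is false (inter-broker
// protocol below 2.8), delete any leftover partition.metadata so a stale topic
// ID cannot conflict with a newly generated one on re-upgrade.
def maybeCleanPartitionMetadata(logDir: Path, keepPartitionMetadataFile: Boolean): Unit = {
  val file = logDir.resolve("partition.metadata")
  if (!keepPartitionMetadataFile)
    Files.deleteIfExists(file)
}
```

Deleting eagerly on startup, rather than ignoring the stale file, is what prevents an old topic ID from colliding with a freshly generated one after a downgrade/re-upgrade cycle.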

@junrao (Contributor):

Yes, looks good to me.

@junrao (Contributor) left a comment

@jolshan : Thanks for the latest PR. LGTM

Are the test failures related? Also, I guess you want this PR in the 2.8 branch too?

@jolshan (Member, Author) commented Feb 9, 2021

Yes, I will want this on the 2.8 branch. I'll check the failed tests. I've also had trouble building at least one of the three JDK builds, but it doesn't seem to be the same one each time.

@jolshan (Member, Author) commented Feb 9, 2021

MirrorConnectorsIntegrationSSLTest.testOneWayReplicationWithAutoOffsetSync() seems to be flaky (it is failing on other open PRs); it passed when I ran it locally.
ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit() seems to be unrelated; it also passed when I ran it locally.

@junrao (Contributor) commented Feb 9, 2021

@jolshan : Thanks. For those transient failures, could you file a JIRA if it's not tracked already?

Are the JDK 8 failures also transient?

@jolshan (Member, Author) commented Feb 9, 2021

I had two JDK 11 failures previously. I also noticed JDK failures on other PRs. I'll check JIRA for these issues.

@junrao (Contributor) commented Feb 10, 2021

@jolshan : Thanks. Were the JDK 8 tests ok too?

@jolshan (Member, Author) commented Feb 10, 2021

@junrao On the previous commit, JDK 8 built and only MirrorConnectorsIntegrationSSLTest.testOneWayReplicationWithAutoOffsetSync() failed. (The only difference between that commit and the most recent one is the javadoc change.) Before that, all JDK 8 tests passed.

@junrao (Contributor) commented Feb 10, 2021

@jolshan Could you rebase the PR on trunk? Also, I am not sure if the PR ports to 2.8 cleanly; if not, could you submit a separate PR for 2.8? Thanks.

@junrao (Contributor) commented Feb 10, 2021

@jolshan : Do you have any benchmark results that you want to share?

@jolshan (Member, Author) commented Feb 10, 2021

Sure. Using the LeaderAndIsrRequestBenchmark from #10071 (with the async profiler; results attached as LISRbench.zip), I had these results:

Benchmark                                                          (partitionCount)  (topicCount)  Mode  Cnt        Score       Error  Units
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       10            10  avgt   15    50920.020 ±  6450.121  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       10            20  avgt   15    81505.310 ±  3821.451  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       10           100  avgt   15   451907.331 ±  2157.775  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       20            10  avgt   15    75708.157 ±  4316.357  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       20            20  avgt   15   136747.799 ±  3886.797  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       20           100  avgt   15   980846.811 ±  4017.757  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       50            10  avgt   15   156896.027 ±  4952.014  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       50            20  avgt   15   318294.323 ±  4729.402  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       50           100  avgt   15  2904827.652 ± 74264.592  ns/op

I took a look at the flame graphs for these tests, and the file I/O impact seemed to be minimal.

This was trunk before the changes:

Benchmark                                                          (partitionCount)  (topicCount)  Mode  Cnt        Score        Error  Units
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       10            10  avgt   15    53887.240 ±   5645.067  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       10            20  avgt   15    83015.916 ±   3560.633  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       10           100  avgt   15   481109.947 ±  18767.713  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       20            10  avgt   15    75932.544 ±   6528.281  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       20            20  avgt   15   148323.073 ±   6009.839  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       20           100  avgt   15  1146604.372 ±  54073.673  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       50            10  avgt   15   171529.923 ±   9940.990  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       50            20  avgt   15   327926.279 ±   9292.803  ns/op
LeaderAndIsrRequestBenchmark.testHandleLeaderAndIsrRequest                       50           100  avgt   15  3512983.407 ± 570652.231  ns/op

junrao merged commit 39dcdef into apache:trunk on Feb 10, 2021
@junrao (Contributor) commented Feb 10, 2021

@jolshan : It seems this PR can't be applied cleanly in 2.8. Could you submit a separate PR for 2.8? Thanks.

@jolshan (Member, Author) commented Feb 10, 2021

@junrao #10100

ijuma added a commit to ijuma/kafka that referenced this pull request Feb 14, 2021