Skip to content

KAFKA-13092: Perf regression in LISR requests#11073

Merged
dajac merged 5 commits intoapache:2.8from
jolshan:KAFKA-13092-2.8
Aug 31, 2021
Merged

KAFKA-13092: Perf regression in LISR requests#11073
dajac merged 5 commits intoapache:2.8from
jolshan:KAFKA-13092-2.8

Conversation

@jolshan
Copy link
Copy Markdown
Member

@jolshan jolshan commented Jul 17, 2021

Cherry pick of 584213e

Main difference is that we do not try to write to partition metadata file when initializing the Log. Since we only do so in checkOrSet topic ID and there is a small bug in that path, the benchmark attempts to run that method last -- so that the log is created and we can actually record the ID to write.

Some quick runs of the benchmarks --
Master

Benchmark                                   (numPartitions)  (useTopicIds)  Mode  Cnt     Score   Error  Units
PartitionCreationBench.makeFollower                    2000          false  avgt    2  2576.278          ms/op
PartitionCreationBench.makeFollower                    2000           true  avgt    2  5378.075          ms/op

PR

Benchmark                                   (numPartitions)  (useTopicIds)  Mode  Cnt     Score   Error  Units
PartitionCreationBench.makeFollower                    2000          false  avgt    2  2678.498          ms/op
PartitionCreationBench.makeFollower                    2000           true  avgt    2  2962.268          ms/op

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

…artition.metadata file (apache#11056)

After noticing increased LISR times, we discovered a lot of time was spent synchronously flushing the partition metadata file. This PR changes the code so we asynchronously flush the files.

We ensure files are flushed before appending, renaming or closing the log to ensure we have the partition metadata information on disk. Three new tests have been added to address these cases.

Reviewers:  Lucas Bradstreet <lucas@confluent.io>, Jun Rao <junrao@gmail.com>
@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 18, 2021

@jolshan is it something that we must add to 2.8.1?

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Aug 18, 2021

Ah yes. I had a feeling I still had a pr open for this. I think I can clean it up a bit and it should be ready for review by tomorrow.

@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 24, 2021

@jolshan All the builds have failed. Could you take a look?

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Aug 24, 2021

Looks like an issue with more recent changes. I'll fix it up

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Aug 24, 2021

Fixed my issue locally but looks like 2.8 branch is broken in general.

> Task :streams:compileTestJava
/Users/jolshan/kafka/streams/src/test/java/org/apache/kafka/streams/integration/MetricsReporterIntegrationTest.java:73: error: stop() has private access in EmbeddedKafkaCluster
        CLUSTER.stop();
               ^
1 error

Will try to contact relevant person.

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Aug 25, 2021

fix here: #11257

@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 26, 2021

Do you need to rebase the PR to include it?

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Aug 26, 2021

I think you may be able to just restart the build. (I think only committers can do that). If that doesn't work, I'll rebase.

@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 26, 2021

I think you may be able to just restart the build. (I think only committers can do that). If that doesn't work, I'll rebase.

Done, thanks.

@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 26, 2021

@jolshan All tests failed, again. I think that you have to rebase to include the fix.

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Aug 26, 2021

Ok will do.

@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 30, 2021

@lbradstreet @junrao Could you take a look at this one as you both reviewed the original PR?

Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jolshan : Thanks for the PR. LGTM

Copy link
Copy Markdown
Contributor

@lbradstreet lbradstreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dajac
Copy link
Copy Markdown
Member

dajac commented Aug 31, 2021

@junrao @lbradstreet Thanks for your reviews.

The failures are not related. I will merge it to 2.8.

@dajac dajac merged commit fb47f6a into apache:2.8 Aug 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants