KAFKA-13092: Perf regression in LISR requests #11056

Merged: junrao merged 1 commit into apache:trunk from jolshan:KAFKA-13092 on Jul 15, 2021

@jolshan jolshan commented Jul 15, 2021

After noticing increased LeaderAndIsr (LISR) request times, we found that much of the time was spent synchronously flushing the partition metadata file. This PR changes the code so that these files are flushed asynchronously.

We make sure the files are flushed before appending to, renaming, or closing the log, so that the partition metadata information is on disk by then. Three new tests have been added to cover these cases.
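
The idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `DeferredMetadataFile`, `SketchLog`, their method names, the `.tmp` suffix, and the `version`/`topic_id` file layout are all hypothetical. The sketch records the topic ID in memory on the request path, hands the actual flush to a background `Executor`, and forces any pending flush before an append or a close (a rename would call the same method first).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical stand-in for a partition metadata file; names are illustrative.
class DeferredMetadataFile {
    private final Path file;
    private String pending; // non-null while a write has not reached disk yet

    DeferredMetadataFile(Path file) {
        this.file = file;
    }

    // Cheap: only remembers the content, performs no disk I/O.
    synchronized void record(String topicId) {
        pending = "version: 0\ntopic_id: " + topicId + "\n"; // assumed layout
    }

    synchronized boolean isDirty() {
        return pending != null;
    }

    // Expensive: writes a temp file, forces it to disk, then moves it into place.
    // A no-op when nothing is pending, so callers can invoke it defensively.
    synchronized void maybeFlush() throws IOException {
        if (pending == null) {
            return;
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(pending.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            channel.force(true);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        pending = null;
    }
}

// Hypothetical log that defers the metadata flush but forces it before
// append and close.
class SketchLog {
    private final DeferredMetadataFile metadata;
    private final Executor background;
    private final List<String> records = new ArrayList<>();

    SketchLog(Path dir, Executor background) {
        this.metadata = new DeferredMetadataFile(dir.resolve("partition.metadata"));
        this.background = background;
    }

    // Request path: record in memory and hand the flush to a background executor.
    void assignTopicId(String topicId) {
        metadata.record(topicId);
        background.execute(() -> {
            try {
                metadata.maybeFlush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    void append(String record) throws IOException {
        metadata.maybeFlush(); // metadata must be on disk before data is appended
        records.add(record);
    }

    void close() throws IOException {
        metadata.maybeFlush();
    }

    boolean metadataDirty() {
        return metadata.isDirty();
    }
}
```

With this shape, the request handler only pays for an in-memory assignment. If the background task has not run by the time the first append arrives, `append` performs the flush itself, so the file is never missing once data has been written.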

A benchmark by @lbradstreet is included to compare the times. We expect the score for the mode that uses topic IDs (useTopicIds = true) to decrease compared to trunk. Here are the results comparing trunk to this PR:

Trunk

Benchmark                            (numPartitions)  (useTopicIds)  Mode  Cnt     Score     Error  Units
PartitionCreationBench.makeFollower             2000          false  avgt   15  2421.390 ± 120.005  ms/op
PartitionCreationBench.makeFollower             2000           true  avgt   15  5197.590 ±  97.346  ms/op

PR

Benchmark                            (numPartitions)  (useTopicIds)  Mode  Cnt     Score     Error  Units
PartitionCreationBench.makeFollower             2000          false  avgt   15  2451.289 ± 144.459  ms/op
PartitionCreationBench.makeFollower             2000           true  avgt   15  2848.374 ± 177.074  ms/op
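
To make the comparison between the two tables above explicit, the relative changes can be computed directly from the reported scores. The sketch below is only arithmetic on those four numbers; the class and constant names are hypothetical, and the error bars from the tables are not taken into account.

```java
// Scores copied from the two benchmark tables above (ms/op, 2000 partitions).
class LisrBenchmarkDelta {
    static final double TRUNK_NO_IDS = 2421.390;
    static final double TRUNK_WITH_IDS = 5197.590;
    static final double PR_NO_IDS = 2451.289;
    static final double PR_WITH_IDS = 2848.374;

    // Relative change from `before` to `after`, in percent.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // How much the topic-ID case improved from trunk to the PR.
        System.out.printf("useTopicIds=true, trunk -> PR: %.1f%%%n",
                percentChange(TRUNK_WITH_IDS, PR_WITH_IDS));
        // How much the no-topic-ID case moved (expected to be roughly flat).
        System.out.printf("useTopicIds=false, trunk -> PR: %.1f%%%n",
                percentChange(TRUNK_NO_IDS, PR_NO_IDS));
        // Cost of enabling topic IDs within each build.
        System.out.printf("topic ID overhead on trunk: %.1f%%%n",
                percentChange(TRUNK_NO_IDS, TRUNK_WITH_IDS));
        System.out.printf("topic ID overhead on PR: %.1f%%%n",
                percentChange(PR_NO_IDS, PR_WITH_IDS));
    }
}
```

Read this way, the topic-ID case drops by roughly 45% from trunk to the PR, while the case without topic IDs moves by about 1%, which is smaller than the reported error. The extra cost of topic IDs falls from roughly 115% on trunk to roughly 16% with the PR.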

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@lbradstreet lbradstreet left a comment

LGTM. We'll need a committer to give the final OK.

@junrao junrao left a comment

@jolshan: Thanks for the PR. LGTM.

@junrao junrao merged commit 584213e into apache:trunk Jul 15, 2021

junrao commented Jul 15, 2021

@jolshan: The PR doesn't apply cleanly to 3.0. Could you provide a separate PR for 3.0? Thanks.

jolshan added a commit to jolshan/kafka that referenced this pull request Jul 15, 2021
…artition.metadata file (apache#11056)

Reviewers:  Lucas Bradstreet <lucas@confluent.io>, Jun Rao <junrao@gmail.com>
junrao pushed a commit that referenced this pull request Jul 15, 2021
…artition.metadata file (#11056) (#11065)

jolshan added a commit to jolshan/kafka that referenced this pull request Jul 17, 2021
…artition.metadata file (apache#11056)

xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
…artition.metadata file (apache#11056)
