Skip to content

KAFKA-17801: RemoteLogManager may compute inaccurate upperBoundOffset for aborted txns#17676

Merged
kamalcph merged 5 commits intoapache:trunkfrom
kamalcph:KAFKA-17801
Nov 9, 2024
Merged

KAFKA-17801: RemoteLogManager may compute inaccurate upperBoundOffset for aborted txns#17676
kamalcph merged 5 commits intoapache:trunkfrom
kamalcph:KAFKA-17801

Conversation

@kamalcph
Copy link
Copy Markdown
Contributor

@kamalcph kamalcph commented Nov 3, 2024

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@kamalcph kamalcph requested review from junrao and satishd November 3, 2024 16:23
@github-actions github-actions Bot added core Kafka Broker small Small PRs labels Nov 3, 2024
@kamalcph
Copy link
Copy Markdown
Contributor Author

kamalcph commented Nov 3, 2024

Not sure how to cover this patch with unit tests, please suggest me if you have ideas. Thanks!

@kamalcph kamalcph added the tiered-storage Related to the Tiered Storage feature label Nov 3, 2024
@kamalcph kamalcph closed this Nov 3, 2024
@kamalcph kamalcph reopened this Nov 3, 2024
Comment thread core/src/main/java/kafka/log/remote/RemoteLogManager.java Outdated
Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamalcph : Thanks for the PR. Left a comment.

For unit test, RemoteLogManagerTest already calls remoteLogManager.read. Could we add a unit test there?

Comment thread core/src/main/java/kafka/log/remote/RemoteLogManager.java Outdated
@github-actions github-actions Bot removed the small Small PRs label Nov 6, 2024
@kamalcph
Copy link
Copy Markdown
Contributor Author

kamalcph commented Nov 6, 2024

@junrao

Thanks for the review! Addressed the review comments and also added an unit test. PTAL.

@kamalcph
Copy link
Copy Markdown
Contributor Author

kamalcph commented Nov 6, 2024

Reran the failed test locally with Java 23, it got succeeded:

Gradle Test Run :storage:test > Gradle Test Executor 3 > TransactionsWithTieredStoreTest > testBumpTransactionalEpochWithTV2Disabled(String, boolean) > "testBumpTransactionalEpochWithTV2Disabled(String, boolean).quorum=zk, isTV2Enabled=false" PASSED

Gradle Test Run :storage:test > Gradle Test Executor 3 > TransactionsWithTieredStoreTest > testBumpTransactionalEpochWithTV2Disabled(String, boolean) > "testBumpTransactionalEpochWithTV2Disabled(String, boolean).quorum=kraft, isTV2Enabled=false" PASSED

% java --version
openjdk 23.0.1 2024-10-15
OpenJDK Runtime Environment (build 23.0.1+11-39)
OpenJDK 64-Bit Server VM (build 23.0.1+11-39, mixed mode, sharing)

Re-triggered the failed job.

Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamalcph : Thanks for the updated PR. A few more comments.

Comment thread core/src/main/java/kafka/log/remote/RemoteLogManager.java Outdated
Comment thread core/src/main/java/kafka/log/remote/RemoteLogManager.java Outdated
Comment thread core/src/test/java/kafka/log/remote/RemoteLogManagerTest.java
Comment thread core/src/test/java/kafka/log/remote/RemoteLogManagerTest.java
Comment thread core/src/main/java/kafka/log/remote/RemoteLogManager.java
@kamalcph
Copy link
Copy Markdown
Contributor Author

kamalcph commented Nov 7, 2024

@junrao

Addressed your review comments. PTAL.

Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamalcph : Thanks for the updated PR. Just a minor comment.

Comment thread core/src/main/java/kafka/log/remote/RemoteLogManager.java
Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamalcph : Thanks for the updated PR. LGTM

@kamalcph kamalcph merged commit 324a74d into apache:trunk Nov 9, 2024
@kamalcph kamalcph deleted the KAFKA-17801 branch November 10, 2024 02:57
kamalcph added a commit to kamalcph/kafka that referenced this pull request Nov 10, 2024
… for aborted txns (apache#17676)

Reviewers: Jun Rao <junrao@gmail.com>
kamalcph added a commit that referenced this pull request Nov 10, 2024
… for aborted txns (#17676) (#1 7733)

Reviewers: Jun Rao <junrao@gmail.com>
chiacyu pushed a commit to chiacyu/kafka that referenced this pull request Nov 30, 2024
… for aborted txns (apache#17676)

Reviewers: Jun Rao <junrao@gmail.com>
eduwercamacaro added a commit to littlehorse-enterprises/kafka that referenced this pull request Dec 17, 2024
* KAFKA-17926 Improve the documentation explaining why max.in.flight.requests.per.connection should not exceed 5 (apache#17719)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* MINOR: Cleanup GroupCoordinatorRecordHelpers (apache#17718)

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

* rebase to fix merge conflict (apache#17702)

Fixes an issue with the TTD in the specific case where users don't specify an initial time for the driver and also don't specify a start timestamp for the TestInputTopic, then pipe input records without timestamps. This combination results in a slight mismatch in the expected timestamps for the piped records, which can be noticeable when writing tests with very small time deltas.

The problem is that, while both the TTD and the TestInputTopic will be initialized to the "current time" when not otherwise specified, it's possible for some milliseconds to have passed between the creation of the TTD and the creation of the TestInputTopic. This can result in a TestInputTopic getting a start timestamp that's several ms larger than the driver's time, and ultimately causing the piped input records to have timestamps slightly in the future relative to the driver.

In practice even those who hit this issue might not notice it if they aren't manipulating time in their tests, or are advancing time by enough to negate the several-milliseconds of difference. However we noticed a test fail due to this because we were testing a ttl-based processor and had advanced the driver time by only 1 millisecond past the ttl. The piped record should have been expired, but because it's timestamp was a few milliseconds longer than the driver's start time, this test ended up failing.

Reviewers: Matthias Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>, Lucas Brutschy < lbrutschy@confluent.io>

* KAFKA-17801: RemoteLogManager may compute inaccurate upperBoundOffset for aborted txns (apache#17676)

Reviewers: Jun Rao <junrao@gmail.com>

* MINOR: log fix in SnapshottableCoordinator (apache#17726)

Reviewers: donaldzhu-cc, Chia-Ping Tsai <chia7712@gmail.com>

* KAFKA-17570 Rewrite StressTestLog by Java (apache#17249)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* MINOR: Delete unused member from KafkaAdminClient (apache#17729)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* KAFKA-17970 Moving some share purgatory classes from core to share module (apache#17722)

As part of PR: apache#17636 where purgatory has been moved from core to server-common hence move some existing classes used in Share Fetch to Share module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* KAFKA-17837 Rewrite DeleteTopicTest (apache#17579)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* MINOR: Move StopPartition to server-common (apache#17704)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* KAFKA-15549 Bump swagger dependency version from 2.2.8 to 2.2.25 (apache#17730)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* KAFKA-17872: Update consumed offsets on records with invalid timestamp (apache#17710)

TimestampExtractor allows to drop records by returning a timestamp of -1. For this case, we still need to update consumed offsets to allows us to commit progress.

Reviewers: Bill Bejeck <bill@confluent.io>

* KAFKA-17925 Convert Kafka Client integration tests to use KRaft (apache#17670)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

* KAFKA-17779: Fix flaky RemoteLogManager test (apache#17724)

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>

* KAFKA-17455: fixes stuck producer by polling again after pollDelayMs in NetworkUtils#awaitReady()

* clarifies comments

* attempts to add test

* Adds a test but my changes to MockClient.java broke all sorts of stuff

* test that passes on my branch and fails on trunk

* addresses PR feedback: rename MockClient#setAdvanceTimeDuringPoll to advanceTimeDuringPoll()

---------

Co-authored-by: PoAn Yang <payang@apache.org>
Co-authored-by: David Jacot <djacot@confluent.io>
Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Co-authored-by: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Co-authored-by: Jeff Kim <kimkb2011@gmail.com>
Co-authored-by: TengYao Chi <kitingiao@gmail.com>
Co-authored-by: Apoorv Mittal <apoorvmittal10@gmail.com>
Co-authored-by: Mickael Maison <mimaison@users.noreply.github.com>
Co-authored-by: Yung <yungyung7654321@gmail.com>
Co-authored-by: Matthias J. Sax <matthias@confluent.io>
Co-authored-by: Kirk True <kirk@kirktrue.pro>
Co-authored-by: wperlichek <61857706+wperlichek@users.noreply.github.com>
Co-authored-by: Colt McNealy <colt@littlehorse.io>
tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025
… for aborted txns (apache#17676)

Reviewers: Jun Rao <junrao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Kafka Broker tiered-storage Related to the Tiered Storage feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants