Skip to content

KAFKA-10085: correctly compute lag for optimized source changelogs#8787

Merged
vvcephei merged 15 commits intoapache:trunkfrom
ableegoldman:10085-compute-lag-for-optimized-source-changelogs
Jun 11, 2020
Merged

KAFKA-10085: correctly compute lag for optimized source changelogs#8787
vvcephei merged 15 commits intoapache:trunkfrom
ableegoldman:10085-compute-lag-for-optimized-source-changelogs

Conversation

@ableegoldman
Copy link
Copy Markdown
Member

@ableegoldman ableegoldman commented Jun 3, 2020

Split out the optimized source changelogs and fetch the committed offsets rather than the end offset for task lag computation

Must be cherrypicked to 2.6

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jun 3, 2020

Ok to test

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ableegoldman ! It LGTM overall. I had a couple of comments. Also, it seems like we're missing a fair amount of testing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a step backwards, actually. Why wrap it as a StreamsException only just to immediately unwrap it again?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this might raise some eyebrows. I wanted to keep the ClientUtils methods consistent, and thought wrapping everything as a StreamsException would be cleaner. But maybe it makes more sense to throw the TimeoutException separately...

@ableegoldman
Copy link
Copy Markdown
Member Author

Hey @vvcephei, I addressed your comments and added tests. Let me know if there's any test coverage that still seems missing

@ableegoldman ableegoldman force-pushed the 10085-compute-lag-for-optimized-source-changelogs branch from bc50399 to 859c4c5 Compare June 5, 2020 01:41
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvcephei I've been wondering if maybe we should only catch the TimeoutException, and interpret a StreamsException as fatal (like IllegalStateException for example). This is how we were using Consumer#committed in the StoreChangelogReader, and AFAICT that only throws KafkaException on "unrecoverable errors" (quoted from javadocs)
But I can't tell whether the Admin's listOffsets might throw on transient errors, so I'm leaning towards catching both just to be safe. WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable, but I think if you throw an exception in the assignor, it just calls the assignor again in a tight loop, which seems worse than backing off and trying again later.

If you want to propose this change, maybe you can verify what exactly happens if we throw.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you throw an exception in the assignor, it just calls the assignor again in a tight loop

Wouldn't the leader thread just die? Not saying that that's ideal, either. But it's at least in line with how exceptions thrown by other admin client requests in the assignment are currently handled.

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jun 5, 2020

Test this please

@ableegoldman
Copy link
Copy Markdown
Member Author

Java14 build passed, Java 11 and 8 builds failed with env issue

@ableegoldman ableegoldman force-pushed the 10085-compute-lag-for-optimized-source-changelogs branch from 859c4c5 to 2fa5d78 Compare June 9, 2020 19:43
@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jun 9, 2020

Test this please

1 similar comment
@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jun 9, 2020

Test this please

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still haven’t managed to finish a review, but I did have this comment:

Comment on lines 143 to 145
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon retrospect, I'm not sure if this is possible. The javadoc for Future#get indicates that any exception would be wrapped in an ExecutionException.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Do you think it should still be thrown/treated separately, though? See also my comment in StreamsPartitionAssignor below

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

@ableegoldman ableegoldman force-pushed the 10085-compute-lag-for-optimized-source-changelogs branch from 0fb2736 to 303a6e7 Compare June 10, 2020 17:26
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jun 10, 2020

Retest this please.

1 similar comment
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jun 10, 2020

Retest this please.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jun 10, 2020

Retest this please.

@vvcephei vvcephei merged commit 42f46ab into apache:trunk Jun 11, 2020
vvcephei pushed a commit that referenced this pull request Jun 11, 2020
…8787)

Split out the optimized source changelogs and fetch the committed offsets rather than the end offset for task lag computation

Reviewers: John Roesler <vvcephei@apache.org>
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request Jun 13, 2020
* 'trunk' of github.com:apache/kafka: (42 commits)
  HOTFIX: Fix compile error in TopicAdminTest (apache#8866)
  KAFKA-10144: clean up corrupted standby tasks before attempting a commit (apache#8849)
  KAFKA-10157: Fix broken tests due to InterruptedException from FinalizedFeatureChangeListener (apache#8857)
  KAFKA-9432: automated protocol for DescribeConfigs (apache#8312)
  KAFKA-10049: Fixed FKJ bug where wrapped serdes are set incorrectly when using default StreamsConfig serdes (apache#8764)
  KAFKA-10027: Implement read path for feature versioning system (KIP-584) (apache#8680)
  KAFKA-10085: correctly compute lag for optimized source changelogs (apache#8787)
  KAFKA-10086: Integration test for ensuring warmups are effective (apache#8818)
  KAFKA-9374: Make connector interactions asynchronous (apache#8069)
  MINOR: reduce sizeInBytes for percentiles metrics (apache#8835)
  KAFKA-10115: Incorporate errors.tolerance with the Errant Record Reporter (apache#8829)
  KAFKA-9216: Enforce that Connect’s internal topics use `compact` cleanup policy (apache#8828)
  KAFKA-9845: Warn users about using config providers with plugin.path property (apache#8455)
  KAFKA-7833: Add missing test (apache#8847)
  KAFKA-9066: Retain metrics for failed tasks (apache#8502)
  KAFKA-9841: Revoke duplicate connectors and tasks when zombie workers return with an outdated assignment (apache#8453)
  KAFKA-9985: Sink connector may exhaust broker when writing in DLQ (apache#8663)
  KAFKA-9441: remove prepareClose() to simplify task management (apache#8833)
  KAFKA-7833: Add Global/StateStore name conflict check (apache#8825)
  KAFKA-9969: Exclude ConnectorClientConfigRequest from class loading isolation (apache#8630)
  ...
@ableegoldman ableegoldman deleted the 10085-compute-lag-for-optimized-source-changelogs branch June 26, 2020 22:39
ijuma added a commit to ijuma/kafka that referenced this pull request Nov 17, 2020
…t-for-generated-requests

* apache-github/trunk: (248 commits)
  KAFKA-10049: Fixed FKJ bug where wrapped serdes are set incorrectly when using default StreamsConfig serdes (apache#8764)
  KAFKA-10027: Implement read path for feature versioning system (KIP-584) (apache#8680)
  KAFKA-10085: correctly compute lag for optimized source changelogs (apache#8787)
  KAFKA-10086: Integration test for ensuring warmups are effective (apache#8818)
  KAFKA-9374: Make connector interactions asynchronous (apache#8069)
  MINOR: reduce sizeInBytes for percentiles metrics (apache#8835)
  KAFKA-10115: Incorporate errors.tolerance with the Errant Record Reporter (apache#8829)
  KAFKA-9216: Enforce that Connect’s internal topics use `compact` cleanup policy (apache#8828)
  KAFKA-9845: Warn users about using config providers with plugin.path property (apache#8455)
  KAFKA-7833: Add missing test (apache#8847)
  KAFKA-9066: Retain metrics for failed tasks (apache#8502)
  KAFKA-9841: Revoke duplicate connectors and tasks when zombie workers return with an outdated assignment (apache#8453)
  KAFKA-9985: Sink connector may exhaust broker when writing in DLQ (apache#8663)
  KAFKA-9441: remove prepareClose() to simplify task management (apache#8833)
  KAFKA-7833: Add Global/StateStore name conflict check (apache#8825)
  KAFKA-9969: Exclude ConnectorClientConfigRequest from class loading isolation (apache#8630)
  KAFKA-9991: Fix flaky unit tests (apache#8843)
  KAFKA-10014; Always try to close all channels in Selector#close (apache#8685)
  KAFKA-10079: improve thread-level stickiness (apache#8775)
  MINOR: Print all removed dynamic members during join complete (apache#8816)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants