MINOR: Update Consumer and Producer JavaDocs for committing offsets #18336

Merged
mjsax merged 4 commits into apache:trunk from mjsax:update-consumer-producer-javadocs
Jan 6, 2025

Conversation

Member

@mjsax mjsax commented Dec 28, 2024

The consumer/producer JavaDocs still contain instructions for naively computing the offset to be committed.
This PR updates the JavaDocs with regard to the improvements of KIP-1094.

Member

@AndrewJSchofield AndrewJSchofield left a comment

Thanks for the PR. A few small comments, but looks like a good improvement.

* which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5. There
* are actually two notions of position relevant to the user of the consumer:
* which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5.
* Note that offsets are not guaranteed to be consecutive (eg., for compacted topic, or—independent of "read_committed"
Member

I suggest the following for the parentheses: "(such as compacted topic or when records have been produced using transactions)".

* which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5.
* Note that offsets are not guaranteed to be consecutive (eg., for compacted topic, or—independent of "read_committed"
* mode— transactional topics). For example, if the consumer did read a record with offset 4, but 5 is not an offset
* with a record, it's position might advance to 6 (or higher) directly. Similarly, if the consumer's position is 5,
Member

nit: its not it's
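
To illustrate the JavaDoc text quoted above, here is a minimal, Kafka-free sketch in plain Java. The class `OffsetGapSketch`, its method names, and the sample log (in which offset 5 holds no record) are all made up for illustration. The log is modelled as a sorted set of offsets; this is not how the consumer is implemented, it only shows that "last processed offset + 1" and the offset of the next record actually delivered can differ.

```java
import java.util.List;
import java.util.TreeSet;

public class OffsetGapSketch {

    /** Made-up log for illustration: offset 5 holds no consumable record. */
    public static TreeSet<Long> sampleLog() {
        return new TreeSet<>(List.of(0L, 1L, 2L, 3L, 4L, 6L, 7L));
    }

    /** Offset of the record a consumer at {@code position} would receive next, or -1 if there is none yet. */
    public static long nextDeliveredOffset(TreeSet<Long> log, long position) {
        Long next = log.ceiling(position); // smallest existing offset >= position
        return next == null ? -1L : next;
    }

    /** The "last processed offset + 1" computation that the old JavaDocs described. */
    public static long lastProcessedPlusOne(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    public static void main(String[] args) {
        TreeSet<Long> log = sampleLog();
        System.out.println(lastProcessedPlusOne(4));     // prints 5
        System.out.println(nextDeliveredOffset(log, 5)); // prints 6
    }
}
```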

* should not be used. The committed offset should be the next message your application will consume,
* i.e. lastProcessedMessageOffset + 1. If automatic group management with {@link #subscribe(Collection)} is used,
* i.e. {@code nextRecordToBeProcessed.offset()} (or {@link ConsumerRecords#nextOffsets()}).
* You should also add the {@link ConsumerRecord#leaderEpoch()} (or {@code nextOffsets().get(...).leaderEpoch()})
Member

Maybe "You should also add the leader epoch as commit metadata, which can be obtained from {@link ConsumerRecord#leaderEpoch()} or {@link ConsumerRecords#nextOffsets}." I didn't find the nextOffsets().get(...).leaderEpoch() that easy to follow and the hyperlink to the ConsumerRecords seems nicer to me.

* should not be used. The committed offset should be the next message your application will consume,
* i.e. lastProcessedMessageOffset + 1. If automatic group management with {@link #subscribe(Collection)} is used,
* i.e. {@code nextRecordToBeProcessed.offset()} (or {@link ConsumerRecords#nextOffsets()}).
* You should also add the {@link ConsumerRecord#leaderEpoch()} (or {@code nextOffsets().get(...).leaderEpoch()})
Member

Same point as above about nextOffsets().get(...).leaderEpoch().

* should not be used. The committed offset should be the next message your application will consume,
* i.e. lastProcessedMessageOffset + 1. If automatic group management with {@link #subscribe(Collection)} is used,
* i.e. {@code nextRecordToBeProcessed.offset()} (or {@link ConsumerRecords#nextOffsets()}).
* You should also add the {@link ConsumerRecord#leaderEpoch()} (or {@code nextOffsets().get(...).leaderEpoch()})
Member

Ditto.

* be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
* be the next message your application will consume, i.e. {@code nextRecordToBeProcessed.offset()}
* (or {@link ConsumerRecords#nextOffsets()}). You should also add the {@link ConsumerRecord#leaderEpoch()}
* (or {@code nextOffsets().get(...).leaderEpoch()}) as commit metadata.
Member

Ditto

Member

@chia7712 chia7712 left a comment

@mjsax thanks for this patch. One small comment is left. PTAL

* }
* long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
* consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
* consumer.commitSync(Collections.singletonMap(partition, partitionRecords.nextOffsets().get(partition));
Member

typo: partitionRecords -> records

Member

And a missing parenthesis; I guess we should end with:

Suggested change
* consumer.commitSync(Collections.singletonMap(partition, partitionRecords.nextOffsets().get(partition));
* consumer.commitSync(Collections.singletonMap(partition, records.nextOffsets().get(partition)));

Member Author

@mjsax mjsax Jan 2, 2025

partitionRecords is actually correct; nextOffsets() was added to ConsumerRecord[s] (not ConsumerRecord) -- that's also why we need to call get(partition) -- ConsumerRecord is already data for a single partition, and get(partition) would not make sense.

Member

@mjsax Could you please take a look at the following attachment? The type of partitionRecords is List<ConsumerRecord<String, String>> rather than ConsumerRecords.

Screenshot From 2025-01-03 10-42-00

Btw, it is also missing a ).

Member Author

Thanks for sanity checking... I did mix up the variable names.
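
The type mix-up resolved in this thread can be sketched without the Kafka client on the classpath. In the sketch below, `Batch` is a hypothetical stand-in for `ConsumerRecords` (partitions are plain strings and records are bare offsets), and all class, method, and partition names are made up for illustration. Only the shape matters: `records(partition)` hands back a plain `List`, so the offset to commit has to come from the batch-level `nextOffsets()` map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommitPerPartitionSketch {

    /** Hypothetical stand-in for ConsumerRecords; this is NOT the Kafka class. */
    public static final class Batch {
        private final Map<String, List<Long>> offsetsByPartition = new LinkedHashMap<>();
        private final Map<String, Long> nextOffsets = new LinkedHashMap<>();

        public void add(String partition, List<Long> recordOffsets, long nextOffset) {
            offsetsByPartition.put(partition, recordOffsets);
            nextOffsets.put(partition, nextOffset);
        }

        public Iterable<String> partitions() {
            return offsetsByPartition.keySet();
        }

        /** Per-partition view: a plain List, which has no nextOffsets() method. */
        public List<Long> records(String partition) {
            return offsetsByPartition.get(partition);
        }

        /** Batch-level view: partition -> offset to commit. */
        public Map<String, Long> nextOffsets() {
            return nextOffsets;
        }
    }

    /** Processes each partition's records, then picks the commit offset from the batch, not from the list. */
    public static Map<String, Long> offsetsToCommit(Batch records) {
        Map<String, Long> commits = new LinkedHashMap<>();
        for (String partition : records.partitions()) {
            List<Long> partitionRecords = records.records(partition);
            for (long offset : partitionRecords) {
                System.out.println(partition + " @ " + offset); // stands in for real processing
            }
            commits.put(partition, records.nextOffsets().get(partition));
        }
        return commits;
    }

    /** Made-up data: the next offset for demo-0 is 15 although the last record sits at 13. */
    public static Batch sampleBatch() {
        Batch batch = new Batch();
        batch.add("demo-0", List.of(10L, 11L, 13L), 15L);
        batch.add("demo-1", List.of(7L), 8L);
        return batch;
    }

    public static void main(String[] args) {
        System.out.println(offsetsToCommit(sampleBatch()));
    }
}
```

With the real client, the corresponding line is the one from the suggested change in this thread: `consumer.commitSync(Collections.singletonMap(partition, records.nextOffsets().get(partition)));`.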

Member

@lianetm lianetm left a comment

Thanks for the nice update! Just a couple of comments (and I agree with the previous suggestions).

* mode&mdash; transactional topics). For example, if the consumer did read a record with offset 4, but 5 is not an offset
* with a record, it's position might advance to 6 (or higher) directly. Similarly, if the consumer's position is 5,
* but there is no record with offset 5, the consumer will return the record with the next higher offset.
* There are actually two notions of position relevant to the user of the consumer:
Member

Related to this but on ln 84 right below (sorry couldn't add comment there):

The {@link #commitSync() committed position} is the last offset that has been stored securely

shouldn't that refer to committed(..) instead?

The {@link #committed(Set) committed position} is the last offset that has been stored securely

Member Author

Not sure... I did not modify this part... "There are actually two notions of position relevant to the user of the consumer:" did not change.

But I think it's OK as is? I guess it works both ways... It's just the difference between the "write path" and the "read path", right?

Let me know what you think. If you think using #committed(Set) is better, I'm happy to update it, but it's orthogonal to what I want to do in this PR and would be a side improvement.

Member Author

mjsax commented Jan 3, 2025

Thanks for all the input. Pushed an update.

Member

@AndrewJSchofield AndrewJSchofield left a comment

Just one final comment.

* Thus, when calling {@link #commitSync(Map) commitSync(offsets)} you should add one to the offset of the last message processed.
* Thus, when calling {@link #commitSync(Map) commitSync(offsets)} you should use {@code nextRecordToBeProcessed.offset()}
* or if {@link ConsumerRecords} is exhausted already {@link ConsumerRecords#nextOffsets()} instead.
* You should also pass in the {@link ConsumerRecord#leaderEpoch()} (or {@code nextOffsets().get(...).leaderEpoch()})
Member

In the other similar places, you use "You should also add the leader epoch as commit metadata, which can be obtained from {@link ConsumerRecord#leaderEpoch()} or {@link ConsumerRecords#nextOffsets()}." I think consistency would be good here.

Member

@AndrewJSchofield AndrewJSchofield left a comment

Thanks for the PR. Looks good to me.

@mjsax mjsax merged commit 3918f37 into apache:trunk Jan 6, 2025
@mjsax mjsax deleted the update-consumer-producer-javadocs branch January 6, 2025 21:39
mjsax added a commit that referenced this pull request Jan 6, 2025
…18336)

The consumer/producer JavaDocs still contain instruction for naively
computing the offset to be committed.

This PR updates the JavaDocs with regard to the improvements of KIP-1094.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
Member Author

mjsax commented Jan 6, 2025

Merged to trunk and cherry-picked to 4.0 branch.

ijuma added a commit to ijuma/kafka that referenced this pull request Jan 8, 2025
…og-compaction-write-record-v2

* apache-github/trunk: (34 commits)
  MINOR: Bump year to 2025 in NOTICE file (apache#18427)
  KAFKA-18411 Remove ZkProducerIdManager (apache#18413)
  KAFKA-18408 tweak the 'tag' field for BrokerHeartbeatRequest.json, BrokerRegistrationChangeRecord.json and RegisterBrokerRecord.json (apache#18421)
  KAFKA-18414 Remove KRaftRegistrationResult (apache#18401)
  KAFKA-17921 Support SASL_PLAINTEXT protocol with java.security.auth.login.config (apache#17671)
  KAFKA-18384 Remove ZkAlterPartitionManager (apache#18364)
  KAFKA-10790: Add deadlock detection to producer#flush (apache#17946)
  KAFKA-18412: Remove EmbeddedZookeeper (apache#18399)
  MINOR : Improve Exception log in NotEnoughReplicasException(apache#12394)
  MINOR: Improve PlaintextAdminIntegrationTest#testConsumerGroups (apache#18409)
  MINOR: Remove unused local variable (apache#18410)
  MINOR: Remove RaftManager.maybeDeleteMetadataLogDir and AutoTopicCreationManagerTest.scala (apache#17365)
  KAFKA-18368 Remove TestUtils#MockZkConnect and remove zkConnect from TestUtils#createBrokerConfig (apache#18352)
  MINOR: Update Consumer group timeout default to 30 sec (apache#16406)
  MINOR: Fix typo in CommitRequestManager (apache#18407)
  MINOR: cleanup JavaDocs for deprecation warnings (apache#18402)
  KAFKA-18303; Update ShareCoordinator to use new record format (apache#18396)
  MINOR: Update Consumer and Producer JavaDocs for committing offsets (apache#18336)
  KAFKA-16446: Improve controller event duration logging (apache#15622)
  KAFKA-18388 test-kraft-server-start.sh should use log4j2.yaml (apache#18370)
  ...
manoj-mathivanan pushed a commit to manoj-mathivanan/kafka that referenced this pull request Feb 19, 2025