KAFKA-12330; FetchSessionCache may cause starvation for partitions when FetchResponse is full #10318

Merged
rajinisivaram merged 1 commit into apache:trunk from dajac:KAFKA-12330
Mar 16, 2021

Conversation

@dajac
Member

@dajac dajac commented Mar 15, 2021

Incremental fetch sessions in the FetchSessionCache deprioritize partitions for which a response is returned. This can happen when log metadata such as the log start offset or the high watermark (hwm) is returned, or when data for that partition is returned.

When a fetch response fills to maxBytes, data may not be returned for some partitions even if the fetch offset is lower than the fetch upper bound. However, the fetch response will still contain metadata updates, such as the hwm, if that metadata has changed. This can lead to degenerate behavior where an update to a partition's hwm or log start offset causes that partition to be unnecessarily skipped in the next fetch. At first this appeared to be worse than it is, since hwm updates occur frequently. However, starvation should eventually block hwm movement, which allows a fetch to go through and unstick the partition. That still requires one more fetch request than necessary. Consumers may be affected more than replica fetchers; however, they often remove partitions with fetched data from the next fetch request, which may help prevent starvation.
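To make the starvation scenario concrete, here is a toy model in Java. It is not the code in FetchSession.scala: the class `FetchSessionSketch`, its fields, and the `deprioritizeOnMetadataOnly` flag are all hypothetical names invented for illustration. The model assumes every partition always has data available and expresses the maxBytes limit as a number of partitions that can receive records per fetch. Moving a partition to the end only when it actually returned records is shown as one possible way to avoid the problem, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of partition ordering in an incremental fetch session.
// None of these names come from Kafka.
class FetchSessionSketch {
    // Iteration order is the priority order: earlier partitions are served first.
    private final LinkedHashSet<String> order = new LinkedHashSet<>();
    // true: any response (even metadata only) moves the partition to the end.
    // false: only partitions that returned records are moved (assumed alternative).
    private final boolean deprioritizeOnMetadataOnly;

    FetchSessionSketch(List<String> partitions, boolean deprioritizeOnMetadataOnly) {
        this.order.addAll(partitions);
        this.deprioritizeOnMetadataOnly = deprioritizeOnMetadataOnly;
    }

    // Runs one fetch and returns the partitions that received records.
    // maxPartitionsWithData stands in for the maxBytes limit.
    // metadataChanged holds partitions whose hwm or log start offset changed.
    List<String> fetch(int maxPartitionsWithData, Set<String> metadataChanged) {
        List<String> withRecords = new ArrayList<>();
        List<String> toMove = new ArrayList<>();
        for (String partition : order) {
            boolean gotRecords = withRecords.size() < maxPartitionsWithData;
            if (gotRecords) {
                withRecords.add(partition);
            }
            boolean metadataOnly = !gotRecords && metadataChanged.contains(partition);
            if (gotRecords || (metadataOnly && deprioritizeOnMetadataOnly)) {
                toMove.add(partition);
            }
        }
        // Re-inserting into a LinkedHashSet moves the element to the end.
        for (String partition : toMove) {
            order.remove(partition);
            order.add(partition);
        }
        return withRecords;
    }

    public static void main(String[] args) {
        for (boolean policy : new boolean[] {true, false}) {
            FetchSessionSketch session = new FetchSessionSketch(List.of("A", "B"), policy);
            System.out.println("deprioritizeOnMetadataOnly=" + policy);
            for (int round = 1; round <= 3; round++) {
                // B's hwm changes on every round in this scenario.
                System.out.println("  round " + round + ": " + session.fetch(1, Set.of("B")));
            }
        }
    }
}
```

In this model, with the flag set to true and B's metadata changing on every round, both A and B are moved to the end after each fetch, so the order never changes and only A receives records. Once B's metadata stops changing, B stays in place while A moves behind it, and B is served on the following fetch, which matches the "one more fetch request than necessary" observation above. With the flag set to false, B is served on the second fetch.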

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Comment thread core/src/main/scala/kafka/server/FetchSession.scala
Contributor

@lbradstreet lbradstreet left a comment

LGTM

Contributor

@rajinisivaram rajinisivaram left a comment

@dajac Thanks for the PR, LGTM

Member

@chia7712 chia7712 left a comment

LGTM

@rajinisivaram
Contributor

@dajac Thanks for the PR, LGTM. Merging to trunk.

@rajinisivaram rajinisivaram merged commit d80a87f into apache:trunk Mar 16, 2021
@dajac dajac deleted the KAFKA-12330 branch March 17, 2021 06:49
ijuma added a commit to confluentinc/kafka that referenced this pull request Mar 17, 2021
Conflicts:
* Jenkinsfile: `install` -> `publishToMavenLocal`, drop ARM build and
other changes that don't make sense for Confluent's version of
`Jenkinsfile`.
* build.gradle: keep Confluent changes for automatic skipping signing
for specific version patterns (upstream only does it if the version ends
with `SNAPSHOT`).

Commits:
* apache-github/trunk: (59 commits)
  MINOR: Remove redundant allows in import-control.xml (apache#10339)
  MINOR: remove some specifying types in tool command (apache#10329)
  KAFKA-12455: Fix OffsetValidationTest.test_broker_rolling_bounce failure with Raft (apache#10322)
  MINOR: Add toString to various Kafka Metrics classes (apache#10330)
  KAFKA-12330; FetchSessionCache may cause starvation for partitions when FetchResponse is full (apache#10318)
  KAFKA-12427: Don't update connection idle time for muted connections (apache#10267)
  MINOR; Various code cleanups (apache#10319)
  HOTFIX: timeout issue in removeStreamThread() (apache#10321)
  revert stream logging level back to ERROR (apache#10320)
  KAFKA-12352: Make sure all rejoin group and reset state has a reason (apache#10232)
  KAFKA-10348: Share client channel between forwarding and auto creation manager (apache#10135)
  MINOR: Update year in NOTICE (apache#10308)
  KAFKA-12398: Fix flaky test `ConsumerBounceTest.testClose` (apache#10243)
  MINOR: Remove redundant inheritance from FilteringJmxReporter #onMetricRemoved (apache#10303)
  KAFKA-12462: proceed with task revocation in case of thread in PENDING_SHUTDOWN (apache#10311)
  KAFKA-12460; Do not allow raft truncation below high watermark (apache#10310)
  MINOR: Log project, gradle, java and scala versions at the start of the build (apache#10307)
  KAFKA-10357: Add missing repartition topic validation (apache#10305)
  MINOR: Improve error message in MirrorConnectorsIntegrationBaseTest (apache#10268)
  MINOR: Add missing unit tests for Mirror Connect (apache#10192)
  ...

4 participants