Track IngestionState more accurately in realtime tasks. #16934

Merged
kfaraz merged 3 commits into apache:master from gianm:fix-stream-task-report on Aug 22, 2024

gianm commented Aug 21, 2024

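Previously, SeekableStreamIndexTaskRunner set ingestion state to COMPLETED when it finished reading data from Kafka. This is incorrect, since at that point the task still has to build and publish its final segments and wait for handoff. After the changes in this patch, the transitions go: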

  1. The task stays in BUILD_SEGMENTS after it finishes reading from Kafka, while it is building its final set of segments to publish.

  2. The task transitions to SEGMENT_AVAILABILITY_WAIT after publishing, while waiting for handoff.

  3. The task transitions to COMPLETED immediately before exiting, when truly done.
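The transitions above can be sketched roughly as follows. This is an illustration only, not the code from this patch: the `State` enum is a local stand-in that lists just the three states named here, and the `Phase` enum, `stateFor`, and `simulate` are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not the actual SeekableStreamIndexTaskRunner code.
class IngestionStateSketch {
    // Local stand-in for the ingestion states named in the description.
    enum State { BUILD_SEGMENTS, SEGMENT_AVAILABILITY_WAIT, COMPLETED }

    // Hypothetical phase names, introduced here for illustration.
    enum Phase { READING, BUILDING_FINAL_SEGMENTS, AWAITING_HANDOFF, EXITING }

    // Maps each phase of the task's life to the state it should report.
    static State stateFor(Phase phase) {
        switch (phase) {
            case READING:
            case BUILDING_FINAL_SEGMENTS:
                // Finishing the read does not change the state (step 1).
                return State.BUILD_SEGMENTS;
            case AWAITING_HANDOFF:
                // Segments are published; waiting for handoff (step 2).
                return State.SEGMENT_AVAILABILITY_WAIT;
            default:
                // EXITING: the task is truly done (step 3).
                return State.COMPLETED;
        }
    }

    // Walks the phases in order and records "PHASE:STATE" for each.
    static List<String> simulate() {
        List<String> seen = new ArrayList<>();
        for (Phase p : Phase.values()) {
            seen.add(p + ":" + stateFor(p));
        }
        return seen;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

The point of the sketch is that COMPLETED is reported only in the last phase; before this patch, the equivalent of `BUILDING_FINAL_SEGMENTS` would already have reported COMPLETED.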


gianm commented Aug 21, 2024

The CI failures were one flaky test case in FrameFileWriterTest (raised a PR here: #16938) and a failed coverage check for SeekableStreamIndexTaskRunner. I suggest we ignore the coverage check: because of how the code is structured, SeekableStreamIndexTaskRunner is tested in other modules (kafka-indexing-service and kinesis-indexing-service have its tests), and the coverage checker doesn't look across modules.

@kfaraz kfaraz merged commit a83125e into apache:master Aug 22, 2024
@gianm gianm deleted the fix-stream-task-report branch August 22, 2024 18:02
hevansDev pushed a commit to hevansDev/druid that referenced this pull request Aug 29, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024