
Enable segments read/published stats on Compaction task completion reports #15947

Merged
kfaraz merged 27 commits into apache:master from ac9817:adithyachakilam/enable-segment-stats-for-compaction
Mar 7, 2024

Contributor

@ac9817 ac9817 commented Feb 22, 2024

Description

This adds visibility into the number of segments read and published by each parallel compaction task.

This PR adds a new subclass of IngestionStatsAndErrorsTaskReportData to carry the new fields, and updates ParallelIndexSupervisorTask to populate the segments read/published counts for the different parallel compaction algorithms.


Release note

Parallel compaction task completion reports now have segmentsRead and segmentsPublished fields, which show how effective a compaction task was.
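
As a rough illustration of how these two fields might be consumed, the sketch below computes the fraction of segments a compaction task eliminated. The report nesting (an `ingestionStatsAndErrors` entry with a `payload` object), the sample values, and the helper name `segment_reduction` are assumptions made for illustration, not taken from this PR; check the actual report output of your Druid version.

```python
import json
from typing import Optional

# Hypothetical report body: the nesting and the numbers are assumptions.
SAMPLE_REPORT = """
{
  "ingestionStatsAndErrors": {
    "type": "ingestionStatsAndErrors",
    "payload": {
      "ingestionState": "COMPLETED",
      "segmentsRead": 12,
      "segmentsPublished": 3
    }
  }
}
"""


def segment_reduction(report: dict) -> Optional[float]:
    """Return the fraction of segments eliminated, or None if stats are absent."""
    payload = report.get("ingestionStatsAndErrors", {}).get("payload", {})
    read = payload.get("segmentsRead")
    published = payload.get("segmentsPublished")
    # Missing or null fields (non-compaction tasks) and zero reads give no ratio.
    if read is None or published is None or read == 0:
        return None
    return 1 - published / read


if __name__ == "__main__":
    print(segment_reduction(json.loads(SAMPLE_REPORT)))
```

Returning None rather than 0 for missing or null fields keeps "no stats reported" distinct from "nothing was compacted".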


Key changed/added classes in this PR
  • ParallelIndexSupervisorTask.java
  • IngestionStatsAndErrorsTaskReportData.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@ac9817 ac9817 marked this pull request as draft February 22, 2024 22:51
@ac9817 ac9817 force-pushed the adithyachakilam/enable-segment-stats-for-compaction branch from d1794dc to 1a8de2a Compare February 26, 2024 20:17
@ac9817 ac9817 force-pushed the adithyachakilam/enable-segment-stats-for-compaction branch from 1a8de2a to 3a945ce Compare February 26, 2024 21:31
@ac9817 ac9817 marked this pull request as ready for review February 26, 2024 21:36
@ac9817 ac9817 changed the title [WIP] Enable segments read/published stats on Compaction task completion reports Enable segments read/published stats on Compaction task completion reports Feb 27, 2024
@ac9817 ac9817 requested a review from YongGang February 28, 2024 17:36
Contributor

@YongGang YongGang left a comment

LGTM

…druid/indexing/common/task/batch/parallel/GeneratedPartitionsReport.java
Comment thread docs/ingestion/tasks.md Outdated
Contributor

@suneet-s suneet-s left a comment

Screenshot 2024-02-29 at 11 31 57 AM

I tried these scenarios:

  • index_parallel with no additional subtasks (wikipedia) - the task report has segmentsRead: null and segmentsPublished: null
  • index_parallel with more than 1 subtask (kttm1) - the task report has segmentsRead: null and segmentsPublished: null in the subtask, and segmentsRead: 0 and segmentsPublished: 1 in the index_parallel task
  • Issued a compaction task with no subtasks (kttm1) - the task report has segmentsRead: null and segmentsPublished: null
  • Issued a compaction task with subtasks (wikipedia) - no reports for the subtasks, but the compaction task has segmentsRead: 1 and segmentsPublished: 1
  • index_kafka on kttm - the report has segmentsRead: null and segmentsPublished: null

Since this change is meant to be scoped to just compaction tasks, I am -1 on its current implementation: many different task reports now have the segmentsRead and segmentsPublished fields added to them with null values, and index_parallel tasks have a report that says they read 0 segments (while true, that's a little confusing).

If these fields did not show up in the task reports where they are not meant to, I think the change would be good.
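
The scoping check described above could be scripted roughly as follows. This is only a sketch: the helper name `unexpected_segment_fields` and the idea of passing an `is_compaction` flag are hypothetical, and the payloads are hand-written stand-ins for the scenarios listed, not real report output.

```python
from typing import List

# The two fields this change introduces.
SEGMENT_FIELDS = ("segmentsRead", "segmentsPublished")


def unexpected_segment_fields(payload: dict, is_compaction: bool) -> List[str]:
    """List segment-stat keys present in a report payload where they are not expected."""
    if is_compaction:
        # Compaction reports are expected to carry the fields.
        return []
    # A key that is present with a null value still counts as showing up.
    return [field for field in SEGMENT_FIELDS if field in payload]


if __name__ == "__main__":
    # Stand-in for the index_kafka scenario: both fields present but null.
    kafka_payload = {"segmentsRead": None, "segmentsPublished": None}
    print(unexpected_segment_fields(kafka_payload, is_compaction=False))
```

The check tests for key presence rather than for a non-null value, since the objection is to the fields appearing at all in reports where they do not apply.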

@ac9817 ac9817 requested a review from suneet-s March 5, 2024 03:24
Contributor

@suneet-s suneet-s left a comment

Looks better! The only other thing I noticed is that index_parallel jobs with no subtasks, which used to have task reports before this change, no longer show a task report after it. Can you look into that issue, please?

Comment thread docs/ingestion/tasks.md Outdated
@ac9817
Contributor Author

ac9817 commented Mar 5, 2024

@suneet-s, the missing report you pointed out is actually related to change #15981. I'll put up a follow-up PR to address it.

@ac9817
Contributor Author

ac9817 commented Mar 5, 2024

Here is the fix: #16042

@ac9817 ac9817 requested a review from suneet-s March 5, 2024 06:10
Contributor

@kfaraz kfaraz left a comment

Thanks for adding the new fields, @adithyachakilam. I have left some comments suggesting changes that would be more in line with the existing report class structure.

Comment thread docs/ingestion/tasks.md Outdated
Contributor

@kfaraz kfaraz left a comment

Thanks for the changes, @adithyachakilam! I have requested minor changes; otherwise looks good.

Comment thread docs/ingestion/tasks.md Outdated
Adithya Chakilam added 2 commits March 6, 2024 08:02
Contributor

@kfaraz kfaraz left a comment

LGTM, thanks for addressing the comments.

@kfaraz kfaraz merged commit 564c44e into apache:master Mar 7, 2024
@ac9817 ac9817 deleted the adithyachakilam/enable-segment-stats-for-compaction branch March 14, 2024 23:46
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024