Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction by AmatyaAvadhanula · Pull Request #15699 · apache/druid

AmatyaAvadhanula · 2024-01-17T06:14:44Z

This PR aims to let the original RetrieveUsedSegmentsAction work with REPLACE locks by introducing a boolean flag.
When the task does not hold any REPLACE locks, the behaviour is identical to the existing behaviour.
However, when the task holds REPLACE locks, only segments created before the time corresponding to each lock's version are fetched for that lock's interval.
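
The filtering rule described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `ReplaceLockFilterSketch`, `Segment`, and `visibleToReplaceTask` are hypothetical names, intervals are reduced to plain string keys, and each REPLACE lock version is assumed to be readable as a timestamp.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ReplaceLockFilterSketch {

    /** Hypothetical stand-in for a used segment; this is not Druid's DataSegment. */
    static final class Segment {
        final String id;
        final String interval;  // simplified interval key, e.g. "2024-01-01/2024-01-02"
        final Instant created;  // creation time of the segment

        Segment(String id, String interval, Instant created) {
            this.id = id;
            this.interval = interval;
            this.created = created;
        }
    }

    /**
     * Keeps a segment if no REPLACE lock is registered for its interval, or if
     * it was created strictly before the time encoded in that lock's version.
     * With an empty lock map every segment is kept, matching the old behaviour.
     */
    static List<Segment> visibleToReplaceTask(
            List<Segment> usedSegments,
            Map<String, Instant> replaceLockVersionByInterval) {
        List<Segment> result = new ArrayList<>();
        for (Segment segment : usedSegments) {
            Instant lockVersion = replaceLockVersionByInterval.get(segment.interval);
            if (lockVersion == null || segment.created.isBefore(lockVersion)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> used = Arrays.asList(
            new Segment("older", "2024-01-01/2024-01-02", Instant.parse("2024-01-10T00:00:00Z")),
            new Segment("newer", "2024-01-01/2024-01-02", Instant.parse("2024-01-20T00:00:00Z")));
        Map<String, Instant> locks = Collections.singletonMap(
            "2024-01-01/2024-01-02", Instant.parse("2024-01-15T00:00:00Z"));
        // Only "older" predates the lock version, so only it is printed.
        for (Segment segment : visibleToReplaceTask(used, locks)) {
            System.out.println(segment.id);
        }
    }
}
```

In this sketch, segments in intervals without a REPLACE lock pass through unfiltered; that choice is an assumption made for the illustration.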

Using the original action helps with rolling upgrades where the flag is simply ignored when the overlord has not been upgraded.
It also helps eliminate the need for the undocumented parameter introduced in #15430.

kfaraz

We should also remove the other task action RetrieveSegmentsToReplaceAction as that is not needed anymore.
A rolling upgrade would upgrade MMs first, so we would not have an issue of firing a task action that the OL doesn't recognize.

kfaraz · 2024-01-20T03:38:22Z

    // Defaulting to the former behaviour when visibility wasn't explicitly specified for backward compatibility
    this.visibility = visibility != null ? visibility : Segments.ONLY_VISIBLE;
+
+    this.replace = replace != null ? replace : false;


Can this not be determined automatically while running the action? Only tasks with a REPLACE lock would have this as true, right?

Thanks, I think this can always be true.
If we need to fetch all the unfiltered segments for a task holding replace locks, we could use a task action client created using a different task that holds no locks.

kfaraz · 2024-01-20T03:42:24Z

  @Override
  public Collection<DataSegment> perform(Task task, TaskActionToolbox toolbox)
+  {
+    if (!replace) {


Cleaner to do:

    if (isReplaceTask()) {
      return newMethodWhichDoesTheRightThingForReplaceTask();
    } else {
      return retrieveUsedSegments(toolbox);
    }

    private boolean isReplaceTask()
    {
      return replace && task.getDatasource().equals(dataSource);
    }

The new method should also have a javadoc saying that it returns a consistent view of segments and why it is needed.

I think the current structure makes it more readable as it is obvious that we are using the old logic whenever possible before trying to filter the segments using the locks.

    if (!datasourceToReplace.equals(datasourceToRead)) {
      return retrieveUsedSegments();
    }
    Set<ReplaceLock> replaceLocks = fetchReplaceLocksForTask();
    if (replaceLocks.isEmpty()) {
      return retrieveUsedSegments();
    }
    return retrieveUsedSegmentsCreatedBeforeReplaceVersions(replaceLocks);

kfaraz · 2024-01-21T03:20:28Z

+    final String supervisorId;
+    if (task instanceof AbstractBatchSubtask) {
+      supervisorId = ((AbstractBatchSubtask) task).getSupervisorTaskId();
+    } else {
+      supervisorId = task.getId();
+    }


For later, can we confirm if this logic is really needed? I think the task action is always fired using the supervisor task ID.

kfaraz · 2024-01-21T03:28:25Z

+          allSegmentsToBeReplaced.addAll(createdAndSegments.getValue());
+        } else {
+          for (DataSegment segment : createdAndSegments.getValue()) {
+            log.info("Ignoring segment[%s] as it has created_date[%s] greater than the REPLACE lock version[%s]",


Instead of logging all segment IDs separately, add them to a set and just log once.

The intent was to keep it verbose with the segment id and created date with the replace version available in each log.

I fear it might end up being too verbose. Better to just log the replace lock version for each interval and point out that anything newer than that would not be considered.
The segment IDs if needed can go in a debug log.
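
A rough sketch of the logging shape suggested here: one summary line per interval, with the individual segment IDs kept at a debug-like level. It uses `java.util.logging` only to stay self-contained (Druid has its own logger), and `IgnoredSegmentLogSketch`, `summarize`, and `logIgnored` are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

final class IgnoredSegmentLogSketch {

    private static final Logger LOG =
        Logger.getLogger(IgnoredSegmentLogSketch.class.getName());

    /** Builds the single summary line for one interval. */
    static String summarize(String interval, String lockVersion, int ignoredCount) {
        return String.format(
            Locale.ROOT,
            "Ignoring [%d] segments in interval[%s] created after REPLACE lock version[%s]",
            ignoredCount, interval, lockVersion);
    }

    /** Logs the summary once at INFO; the full ID list only at FINE (debug-like). */
    static void logIgnored(String interval, String lockVersion, List<String> ignoredSegmentIds) {
        if (ignoredSegmentIds.isEmpty()) {
            return;  // nothing was ignored, so there is nothing to report
        }
        LOG.info(summarize(interval, lockVersion, ignoredSegmentIds.size()));
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Ignored segment IDs in interval[" + interval + "]: " + ignoredSegmentIds);
        }
    }
}
```

This keeps the info-level output bounded by the number of locked intervals rather than the number of segments.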

kfaraz

Changes look okay. I left minor comments, and the MSQ tests need to be handled.

kfaraz · 2024-01-22T10:13:54Z

-            dataSource,
-            intervals
-        ));
+        // Additional check as the task action does not accept empty intervals


Don't think this comment is really needed or accurate. The real reason we do not fire a task action when the intervals are empty is that we know we would get back an empty result. Why perform an unnecessary round trip?
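
The point about avoiding the round trip can be illustrated with a small guard. This is a sketch under assumptions: `EmptyIntervalGuardSketch` and `retrieveUsedSegments` are hypothetical, intervals and segments are reduced to strings, and the `remoteFetch` function stands in for submitting the task action to the Overlord.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class EmptyIntervalGuardSketch {

    /**
     * Returns an empty result without invoking remoteFetch when there are no
     * intervals, since the remote answer would be empty anyway.
     */
    static Collection<String> retrieveUsedSegments(
            List<String> intervals,
            Function<List<String>, Collection<String>> remoteFetch) {
        if (intervals.isEmpty()) {
            return Collections.emptyList();  // skip the unnecessary round trip
        }
        return remoteFetch.apply(intervals);
    }
}
```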

kfaraz · 2024-01-22T10:14:35Z

+          publishedUsedSegments = context.taskActionClient().submit(new RetrieveUsedSegmentsAction(
+              dataSource,
+              intervals
+          ));


style:

Suggested change

-          publishedUsedSegments = context.taskActionClient().submit(new RetrieveUsedSegmentsAction(
-              dataSource,
-              intervals
-          ));
+          publishedUsedSegments = context.taskActionClient().submit(
+              new RetrieveUsedSegmentsAction(dataSource, intervals)
+          );

kfaraz · 2024-01-22T10:20:56Z

+      return retrieveUsedSegments(toolbox);
+    }
+
+    Map<Interval, Map<String, Set<DataSegment>>> intervalToCreatedToSegments = new HashMap<>();


I recall putting a comment here but can't find it anywhere.
I would prefer it if we moved the code from here on down into a new method, retrieveUsedSegmentsForReplace(toolbox, replaceLocks).

Verify action segmentListUsed with retrieveSegmentsToReplace

37c8a43

github-actions Bot added the Area - Ingestion label Jan 17, 2024

Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegments…

80fc872

…Action

github-actions Bot added the Area - Batch Ingestion and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Jan 17, 2024

AmatyaAvadhanula added 2 commits January 17, 2024 18:39

Clean up unneeded config

9a048cc

Fix compilation

50a1a5d

AmatyaAvadhanula marked this pull request as ready for review January 19, 2024 10:33

AmatyaAvadhanula changed the title ~~[Do not merge] Verify action segmentListUsed with retrieveSegmentsToReplace~~ Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction Jan 19, 2024

AmatyaAvadhanula requested a review from kfaraz January 19, 2024 10:46

kfaraz reviewed Jan 20, 2024

AmatyaAvadhanula added 3 commits January 20, 2024 10:06

Merge remote-tracking branch 'upstream/master' into replace_retrieve_…

71e2e40

…segments_action

Address feedback

d079367

Revert accidental change

369f0fd

AmatyaAvadhanula requested a review from kfaraz January 20, 2024 05:39

kfaraz reviewed Jan 21, 2024

Address feedback

e78bf51

kfaraz reviewed Jan 22, 2024

AmatyaAvadhanula added 3 commits January 23, 2024 12:02

Fix concurrent tests

e75c831

Try to fix MSQ tests

09ab5a0

Merge remote-tracking branch 'upstream/master' into replace_retrieve_…

7da3183

…segments_action

abhishekagarwal87 added this to the 29.0.0 milestone Jan 29, 2024

abhishekagarwal87 approved these changes Jan 29, 2024

AmatyaAvadhanula merged commit 54d0e48 into apache:master Jan 29, 2024

AmatyaAvadhanula added a commit to AmatyaAvadhanula/druid that referenced this pull request Jan 30, 2024

Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegments…

e5ca65e

…Action (apache#15699) Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction

AmatyaAvadhanula mentioned this pull request Jan 30, 2024

[Backport] Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction (#15699) #15784

Merged

LakshSingla pushed a commit that referenced this pull request Jan 30, 2024

Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegments…

b2510f2

…Action (#15699) (#15784) Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction

pagrawal10 pushed a commit to pagrawal10/druid that referenced this pull request Jan 30, 2024

Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegments…

6f00727

…Action (apache#15699) Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction


3 participants
