Temporarily skip compaction for locked intervals #11190

Merged
maytasm merged 16 commits into apache:master from
kfaraz:compaction_skip_locked_intervals
Jun 21, 2021

Conversation

@kfaraz (Contributor) commented May 3, 2021

Compaction tasks, both auto and manual, try to acquire locks on the datasource intervals that they need to compact. If a higher-priority ingestion task is in progress for an overlapping interval, the compaction task waits until it can acquire a lock. This can lead to the following potential issues:

  • Due to poor configuration of skipOffsetFromLatest, compaction can get stuck for long periods of time.
  • Once the lock is released by the ingestion task, it is possible that some new segments are published (and/or removed). This invalidates the segment spec in the compaction task, causing it to fail. Such failures are unnecessary and often difficult to debug.

Resolution

Rather than waiting on already locked intervals, we can proceed to compact other intervals (these would be older intervals, if the compaction policy is Newest First). When the locked intervals are freed up, subsequent compaction runs can submit compaction tasks for them.

In every compaction run (invocation of CompactSegments.run()):

  • the Coordinator makes a call to the Overlord API /lockedIntervals
  • the Overlord returns the list of intervals locked by each currently running task, using the in-memory state of the TaskLockbox
  • the Coordinator then skips the locked intervals while submitting compaction tasks
  • for simplicity, Segment Locks are treated the same as Time Chunk Locks, i.e. if a segment in an interval is locked by a higher-priority task, the whole interval is skipped while submitting compaction tasks
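The skip step above can be sketched as a small stand-alone filter. This is not the actual Druid code: class and method names here are illustrative stand-ins, and intervals are modeled as simple [start, end) millisecond ranges rather than Joda-Time Intervals.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the coordinator-side skip logic: candidate intervals
// that overlap any locked interval are excluded from compaction-task submission.
class SkipLockedIntervals
{
  // Returns the candidate intervals that do not overlap any locked interval.
  static List<long[]> eligibleForCompaction(
      List<long[]> candidates,
      List<long[]> lockedIntervals
  )
  {
    return candidates.stream()
                     .filter(c -> lockedIntervals.stream().noneMatch(l -> overlaps(c, l)))
                     .collect(Collectors.toList());
  }

  // Half-open interval overlap check: [a0, a1) intersects [b0, b1)
  static boolean overlaps(long[] a, long[] b)
  {
    return a[0] < b[1] && b[0] < a[1];
  }

  public static void main(String[] args)
  {
    List<long[]> candidates = List.of(new long[]{0, 10}, new long[]{10, 20}, new long[]{20, 30});
    List<long[]> locked = List.of(new long[]{12, 18});
    // The middle candidate overlaps the lock, so only two intervals survive.
    System.out.println(eligibleForCompaction(candidates, locked).size()); // prints 2
  }
}
```

The skipped intervals are not lost: they simply remain uncompacted until a later run finds them unlocked.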

Code Changes:

  • Add Overlord REST endpoint /druid/indexer/v1/lockedIntervals
  • Use the above API in CompactSegments.run() to skip locked intervals
  • Add config druid.coordinator.compaction.skipLockedIntervals with a default value of true

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

(interval, taskLockPosses) -> taskLockPosses.forEach(
    taskLockPosse -> taskLockPosse.taskIds.forEach(taskId -> {
      // Do not proceed if the lock is revoked
      if (taskLockPosse.getTaskLock().isRevoked()) {
Contributor:

what is the rationale behind this?

Contributor Author:

A lock that is revoked is effectively not locking any interval. So we don't need to consider revoked locks while looking for locked intervals.

Revoked locks are kept in the TaskLockbox to notify that those locks are revoked to the callers when they acquire the same locks again.
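The rationale can be illustrated with a minimal stand-in for the lockbox traversal. TaskLock here is a simplified placeholder, not the real Druid class; only the revoked-lock check mirrors the code under review.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: revoked locks are skipped when collecting locked intervals,
// because a revoked lock no longer protects its interval.
class RevokedLockFilter
{
  static class TaskLock
  {
    final String interval;
    final boolean revoked;

    TaskLock(String interval, boolean revoked)
    {
      this.interval = interval;
      this.revoked = revoked;
    }
  }

  static List<String> activeLockedIntervals(List<TaskLock> locks)
  {
    List<String> intervals = new ArrayList<>();
    for (TaskLock lock : locks) {
      // Do not proceed if the lock is revoked
      if (lock.revoked) {
        continue;
      }
      intervals.add(lock.interval);
    }
    return intervals;
  }

  public static void main(String[] args)
  {
    List<TaskLock> locks = List.of(
        new TaskLock("2021-01-01/2021-01-02", false),
        new TaskLock("2021-01-02/2021-01-03", true)   // revoked: effectively not locking
    );
    System.out.println(activeLockedIntervals(locks)); // only the first interval remains
  }
}
```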

);

final CompactionSegmentIterator iterator =
policy.reset(compactionConfigs, dataSources, compactionTaskIntervals);
Contributor:

maybe compactionTaskIntervals needs to be called something else now since it can also include intervals for which there is no lock.

@kfaraz (Contributor Author), May 4, 2021:

Yeah, should we just call it compactionSkipIntervals?

Comment thread docs/operations/api-reference.md Outdated
* `/druid/indexer/v1/lockedIntervals`

Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
Task IDs to the list of Intervals locked by the respective Tasks.
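As documented at this point in the review, a response might look like the following. This is purely illustrative: the task IDs and intervals are made up, and the response shape is only the Map-of-task-ID-to-intervals described above.

```json
{
  "index_parallel_mydatasource_abc123": [
    "2021-04-01T00:00:00.000Z/2021-04-02T00:00:00.000Z"
  ],
  "index_kafka_mydatasource_def456": [
    "2021-04-02T00:00:00.000Z/2021-04-03T00:00:00.000Z",
    "2021-04-03T00:00:00.000Z/2021-04-04T00:00:00.000Z"
  ]
}
```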
Contributor:

why do we need task ids? can we just return a map of datasource --> intervals?

Contributor Author:

In CompactSegments.run(), some currently running compaction tasks can get cancelled if their spec is out of date. We use the taskId to identify the intervals for these tasks getting cancelled so that we don't consider those intervals as locked.

This is the only case where taskId is useful. Otherwise, datasource would be sufficient.

Contributor Author:

See here:

Contributor:

Do we need to return intervals locked by compaction tasks?
If an existing compaction task is out of date, we already have code to cancel it and submit a new task for that interval (hence the locks of those to-be-cancelled compaction tasks do not matter). If they are not cancelled, there is already code to skip the intervals of running compaction tasks.

In the above case, we can just return a datasource --> intervals map of intervals locked by non-compaction tasks

Contributor Author:

Good point, @maytasm . I will make the changes.

@maytasm (Contributor) left a comment:

  • I think what we can do here is use the taskPriority set for the auto compaction of each dataSource. The lockedIntervals API can then return only the locked intervals of tasks with priority greater than the taskPriority set for auto compaction. For example, if the task holding the lock has a lower priority than the compaction task to be scheduled, we should schedule the compaction task anyway (and not skip the interval), as it will revoke the lock of the running task. The priority of compaction tasks scheduled by auto compaction can also be set by the user (taskPriority field) to revoke locks of batch/stream ingestion tasks.
  • Can you add integration tests too? Maybe a simple one for the lockedIntervals API and a simple one to check that auto compaction skips locked intervals.


final Map<String, String> taskToDatasource = new HashMap<>();

// Take a lock and populate the maps
giant.lock();
Contributor:

why do we need the lock here?

public Response getTaskLockedIntervals(@Context HttpServletRequest request)
{
  // Perform authorization check
  final ResourceAction resourceAction = new ResourceAction(
Contributor:

Any particular reason this API needs finer-grained access control rather than using the StateResourceFilter?
The StateResourceFilter is already used for APIs like /taskStatus, which I think are similar in access control to this API.

Contributor Author:

Fixed.

final LockedIntervalsResponse response = new LockedIntervalsResponse(
    taskStorageQueryAdapter.getLockedIntervals()
);
log.warn("Found Intervals: %s", response.getLockedIntervals());
Contributor:

Why is this a WARN log?

Contributor Author:

Removed.

}

// Build the response
final LockedIntervalsResponse response = new LockedIntervalsResponse(
Contributor:

Can the API just return Map<String, DatasourceIntervals> (removing the need for another class)?

Contributor:

Response.ok(taskStorageQueryAdapter.getLockedIntervals()).build()

Contributor Author:

Fixed.

indexingServiceClient.cancelTask(status.getId());

// Remove this from the locked intervals
taskToLockedIntervals.remove(status.getId());
Contributor:

As mentioned earlier, this is not really needed. It would be the same as taskToLockedIntervals not containing locks for compaction tasks in the first place.

Contributor Author:

Removed.

*
* @return Map from Task Id to locked intervals.
*/
public Map<String, DatasourceIntervals> getLockedIntervals()
Contributor:

getLockedIntervals() seems a misnomer because this method returns the intervals of segment locks as well, but those don't lock whole intervals. I don't have a better suggestion though.

"Skipping the following intervals for Compaction as they are currently locked: %s",
taskToLockedIntervals
);
taskToLockedIntervals.forEach(
Contributor:

I think it should behave differently depending on what lockGranularity is used. If both the compaction task to run and the task that is already running use the segment lock, the compaction task can safely run. Otherwise, the entire locked interval should be skipped as what this code does.

Contributor Author:

For simplicity, we are treating Segment Locks the same as Time Chunk Locks i.e. the whole interval would be skipped while submitting compaction tasks even if there is just one Segment in that interval that is locked by a higher priority task.

Added this as a javadoc comment here:

private Map<String, List<Interval>> getLockedIntervalsToSkip(

public Map<String, List<Interval>> getLockedIntervals(Map<String, Integer> minTaskPriority)
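A simplified model of what this signature implies, under the "priority strictly greater than minTaskPriority" contract documented at this point in the review (the exact cutoff for equal priorities is debated later in the thread). The Lock class and string intervals are stand-ins; the real method walks the TaskLockbox.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of getLockedIntervals(minTaskPriority): a lock's interval is reported
// only if its priority is strictly greater than the minimum priority registered
// for its datasource. Datasources absent from the map are not returned at all.
class LockedIntervalsByPriority
{
  static class Lock
  {
    final String datasource;
    final String interval;
    final Integer priority;

    Lock(String datasource, String interval, Integer priority)
    {
      this.datasource = datasource;
      this.interval = interval;
      this.priority = priority;
    }
  }

  static Map<String, List<String>> getLockedIntervals(
      List<Lock> locks,
      Map<String, Integer> minTaskPriority
  )
  {
    Map<String, List<String>> result = new HashMap<>();
    for (Lock lock : locks) {
      Integer minPriority = minTaskPriority.get(lock.datasource);
      if (minPriority == null) {
        continue; // datasource not requested by the caller
      }
      if (lock.priority == null || lock.priority <= minPriority) {
        continue; // lock does not outrank the compaction task
      }
      result.computeIfAbsent(lock.datasource, ds -> new ArrayList<>()).add(lock.interval);
    }
    return result;
  }

  public static void main(String[] args)
  {
    List<Lock> locks = List.of(
        new Lock("wiki", "2021-01-01/2021-01-02", 75),  // outranks compaction priority 25
        new Lock("wiki", "2021-01-02/2021-01-03", 25),  // equal priority: not reported here
        new Lock("logs", "2021-01-01/2021-01-02", 75)   // datasource not requested
    );
    System.out.println(getLockedIntervals(locks, Map.of("wiki", 25)));
  }
}
```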

Comment thread docs/operations/api-reference.md Outdated

Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.

* `/druid/indexer/v1/lockedIntervals`
Contributor:

Is this supposed to be called by users? What is the use case?

Contributor:

Actually, instead of exposing this API, maybe what we really need is a way to communicate to the user that compaction skipped an interval because some task was holding a lock: which intervals were skipped for which datasource, and which task held the lock.

Contributor Author:

For now, we are just logging the intervals that were skipped due to locks.
The part about notifying the user about skipped intervals will most likely be done through Task Reports in a follow up PR.

@kfaraz (Contributor Author), May 18, 2021:

Is this supposed to be called by users? What is the use case?

Removed changes to api-reference.md as this API is for internal use (between Coordinator and Overlord) only.

@kfaraz (Contributor Author) commented May 18, 2021

Fixed lockedIntervals API to use taskPriority
Added integration tests.

@kfaraz kfaraz requested review from jihoonson and maytasm May 19, 2021 18:45
@maytasm (Contributor) left a comment:

Minor comments

@@ -62,9 +62,9 @@ public List<Task> getActiveTasks()
*
* @return Map from Task Id to locked intervals.
Contributor:

Is this Map from datasource to locked intervals?

* priority are returned. Tasks for datasources that
* are not present in this Map are not returned.
* @return Map from Datasource to List of Intervals locked by Tasks that have
* priority strictly greater than the {@code minTaskPriority} for that datasource.
Contributor:

should this be > or >= the {@code minTaskPriority}?
Do we want to submit auto compaction task if there is a task with equal task priority already running?

@@ -1149,29 +1149,24 @@ public void testGetLockedIntervals()
);

// Verify the locked intervals
Contributor:

Can you add a test where the existing taskLock priority is lower than the minTaskPriority argument (and hence the lock should not be returned)?

indexingServiceClient.getLockedIntervals());

// Skip all the intervals locked by higher priority tasks for each datasource
getLockedIntervalsToSkip(compactionConfigList).forEach(
Contributor:

Is this needed? Doesn't getLockedIntervalsToSkip already return dataSource -> list of intervals

@abhishekagarwal87 (Contributor):

Thank you for the PR, @kfaraz. Will there be a way to disable this code path in a running Druid cluster (either dynamically or by restarting services)?

@rohangarg (Member):

@kfaraz: Thanks a lot for the PR - I have a couple of questions regarding the new config:

  1. does the config druid.coordinator.compaction.skipLockedIntervals indicate that the segments which are locked during compaction are missed forever? or is it just for that run?
  2. what would be the user impact of this config? for example, if I'm a user when should I enable/disable this config and what will be its benefits/problems?

@kfaraz (Contributor Author) commented Jun 14, 2021

1. does the config `druid.coordinator.compaction.skipLockedIntervals` indicate that the segments which are locked during compaction are missed forever? or is it just for that run?

@rohangarg, the locked intervals would be skipped only for that run; they would be retried in the next run.

2. what would be the user impact of this config? for example, if I'm a user when should I enable/disable this config and what will be its benefits/problems?

Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.
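For reference, disabling the feature would be a one-line change in the Coordinator's runtime.properties. This is only a sketch; the property name and its default of true come from this PR's description.

```properties
# Disable skipping of locked intervals during auto-compaction (default: true)
druid.coordinator.compaction.skipLockedIntervals=false
```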

@kfaraz kfaraz requested a review from maytasm June 15, 2021 08:31
@maytasm (Contributor) commented Jun 16, 2021

  • Can you add a test in ITAutoCompactionLockContentionTest that verifies that setting the new flag druid.coordinator.compaction.skipLockedIntervals to false actually disables this new feature?
  • Actually, if getLockedIntervalsToSkip does return the locks of compaction tasks, then you will need the change from #11190 (comment) (as you mentioned, in CompactSegments.run(), some currently running compaction tasks can get cancelled, and hence we have to return their locks).

@rohangarg (Member) commented Jun 16, 2021

Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.

Ok, but given that the feature/change-set is not too big, I'm not sure whether adding a config would help. Is it possible to know some scenarios or unknowns where this could fail that might be hard to cover in tests?

Apologies, but one more question: if this flag is turned on, will it always ensure that there is no locking conflict between compaction tasks and other tasks? Or is it still possible that a compaction task might get stuck due to an unavailable lock? My initial guess is that there is still a chance that compaction might get stuck, so just confirming. :)

  } else if (taskLockPosse.getTaskLock().getPriority() == null
-            || taskLockPosse.getTaskLock().getPriority() <= minTaskPriority.get(datasource)) {
     // Do not proceed if the lock has a priority less than or equal to the minimum
+            || taskLockPosse.getTaskLock().getPriority() < minTaskPriority.get(datasource)) {
Contributor:

Why is this changed back?

Contributor Author:

I had missed changing it the first time.

The behaviour after this change is that intervals of tasks with equal priority will be considered locked. That is the required behaviour, right?

Contributor:

LGTM

@maytasm maytasm merged commit f0b105e into apache:master Jun 21, 2021
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
@kfaraz kfaraz deleted the compaction_skip_locked_intervals branch August 23, 2021 06:47
