Temporarily skip compaction for locked intervals #11190

Merged
maytasm merged 16 commits into apache:master from
kfaraz:compaction_skip_locked_intervals
Jun 21, 2021

Conversation

@kfaraz (Contributor) commented May 3, 2021

Compaction tasks, both auto and manual, try to acquire locks on the datasource intervals that they need to compact. If a higher-priority ingestion task is in progress for an overlapping interval, the compaction task waits until it can acquire a lock. This can lead to the following potential issues:

  • Due to poor configuration of skipOffsetFromLatest, compaction can get stuck for long periods of time.
  • Once the lock is released by the ingestion task, it is possible that some new segments are published (and/or removed). This invalidates the segment spec in the compaction task, causing it to fail. Such failures are unnecessary and often difficult to debug.

Resolution

Rather than waiting on already locked intervals, we can proceed to compact other intervals (these would be older intervals, if the compaction policy is Newest First). When the locked intervals are freed up, subsequent compaction runs can submit compaction tasks for them.

In every compaction run (invocation of CompactSegments.run()):

  • the Coordinator makes a call to the Overlord API /lockedIntervals
  • the Overlord returns the list of intervals locked by each currently running task, using the in-memory state of the TaskLockbox
  • the Coordinator then skips the locked intervals while submitting compaction tasks
  • for simplicity, Segment Locks are treated the same as Time Chunk Locks, i.e. if a segment in an interval is locked by a higher-priority task, the whole interval is skipped while submitting compaction tasks
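The skip step above can be sketched as a small stand-alone filter. This is not the actual Druid code: class and method names here are illustrative stand-ins, and intervals are modeled as simple [start, end) millisecond ranges rather than Joda-Time Intervals.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the coordinator-side skip logic: candidate intervals
// that overlap any locked interval are excluded from compaction-task submission.
class SkipLockedIntervals
{
  // Returns the candidate intervals that do not overlap any locked interval.
  static List<long[]> eligibleForCompaction(
      List<long[]> candidates,
      List<long[]> lockedIntervals
  )
  {
    return candidates.stream()
                     .filter(c -> lockedIntervals.stream().noneMatch(l -> overlaps(c, l)))
                     .collect(Collectors.toList());
  }

  // Half-open interval overlap check: [a0, a1) intersects [b0, b1)
  static boolean overlaps(long[] a, long[] b)
  {
    return a[0] < b[1] && b[0] < a[1];
  }

  public static void main(String[] args)
  {
    List<long[]> candidates = List.of(new long[]{0, 10}, new long[]{10, 20}, new long[]{20, 30});
    List<long[]> locked = List.of(new long[]{12, 18});
    // The middle candidate overlaps the lock, so only two intervals survive.
    System.out.println(eligibleForCompaction(candidates, locked).size()); // prints 2
  }
}
```

The skipped intervals are not lost: they simply remain uncompacted until a later run finds them unlocked.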

Code Changes:

  • Add Overlord REST endpoint /druid/indexer/v1/lockedIntervals
  • Use the above API in CompactSegments.run() to skip locked intervals
  • Add config druid.coordinator.compaction.skipLockedIntervals with a default value of true

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

(interval, taskLockPosses) -> taskLockPosses.forEach(
    taskLockPosse -> taskLockPosse.taskIds.forEach(taskId -> {
      // Do not proceed if the lock is revoked
      if (taskLockPosse.getTaskLock().isRevoked()) {
Contributor:

what is the rationale behind this?

Contributor Author:

A lock that is revoked is effectively not locking any interval. So we don't need to consider revoked locks while looking for locked intervals.

Revoked locks are kept in the TaskLockbox to notify that those locks are revoked to the callers when they acquire the same locks again.
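The rationale can be illustrated with a minimal stand-in for the lockbox traversal. TaskLock here is a simplified placeholder, not the real Druid class; only the revoked-lock check mirrors the code under review.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: revoked locks are skipped when collecting locked intervals,
// because a revoked lock no longer protects its interval.
class RevokedLockFilter
{
  static class TaskLock
  {
    final String interval;
    final boolean revoked;

    TaskLock(String interval, boolean revoked)
    {
      this.interval = interval;
      this.revoked = revoked;
    }
  }

  static List<String> activeLockedIntervals(List<TaskLock> locks)
  {
    List<String> intervals = new ArrayList<>();
    for (TaskLock lock : locks) {
      // Do not proceed if the lock is revoked
      if (lock.revoked) {
        continue;
      }
      intervals.add(lock.interval);
    }
    return intervals;
  }

  public static void main(String[] args)
  {
    List<TaskLock> locks = List.of(
        new TaskLock("2021-01-01/2021-01-02", false),
        new TaskLock("2021-01-02/2021-01-03", true)   // revoked: effectively not locking
    );
    System.out.println(activeLockedIntervals(locks)); // only the first interval remains
  }
}
```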

);

final CompactionSegmentIterator iterator =
policy.reset(compactionConfigs, dataSources, compactionTaskIntervals);
Contributor:

maybe compactionTaskIntervals needs to be called something else now since it can also include intervals for which there is no lock.

@kfaraz (Contributor Author), May 4, 2021:

Yeah, should we just call it compactionSkipIntervals?

Comment thread docs/operations/api-reference.md Outdated
* `/druid/indexer/v1/lockedIntervals`

Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
Task IDs to the list of Intervals locked by the respective Tasks.
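As documented at this point in the review, a response might look like the following. This is purely illustrative: the task IDs and intervals are made up, and the response shape is only the Map-of-task-ID-to-intervals described above.

```json
{
  "index_parallel_mydatasource_abc123": [
    "2021-04-01T00:00:00.000Z/2021-04-02T00:00:00.000Z"
  ],
  "index_kafka_mydatasource_def456": [
    "2021-04-02T00:00:00.000Z/2021-04-03T00:00:00.000Z",
    "2021-04-03T00:00:00.000Z/2021-04-04T00:00:00.000Z"
  ]
}
```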
Contributor:

why do we need task ids? can we just return a map of datasource --> intervals?

Contributor Author:

In CompactSegments.run(), some currently running compaction tasks can get cancelled if their spec is out of date. We use the taskId to identify the intervals for these tasks getting cancelled so that we don't consider those intervals as locked.

This is the only case where taskId is useful. Otherwise, datasource would be sufficient.

Contributor Author:

See here:

Contributor:

Do we need to return intervals locked by compaction tasks?
If an existing compaction task is out of date, we already have code to cancel it and submit a new task for that interval (hence the locks of those to-be-cancelled compaction tasks do not matter). If they are not cancelled, there is already code to skip the intervals of running compaction tasks.

In the above case, we can just return a datasource --> intervals map of intervals locked by non-compaction tasks

Contributor Author:

Good point, @maytasm . I will make the changes.

@maytasm (Contributor) left a comment:

  • I think what we can do here is use the taskPriority set for the auto compaction of each dataSource. The lockedIntervals API can then return only the locked intervals of tasks with priority greater than the taskPriority set for auto compaction. For example, if the task holding the lock has a lower priority than the compaction task to be scheduled, we should schedule the compaction task anyway (and not skip the interval), as it will revoke the lock of the running task. The priority of compaction tasks scheduled by auto compaction can also be set by the user (taskPriority field) to revoke locks of batch/stream ingestion tasks.
  • Can you add integration tests too? Maybe a simple one for the lockedIntervals API and a simple one to check that auto compaction skips locked intervals.


final Map<String, String> taskToDatasource = new HashMap<>();

// Take a lock and populate the maps
giant.lock();
Contributor:

why do we need the lock here?

public Response getTaskLockedIntervals(@Context HttpServletRequest request)
{
  // Perform authorization check
  final ResourceAction resourceAction = new ResourceAction(
Contributor:

Any particular reason this API needs finer-grained access control rather than using the StateResourceFilter?
The StateResourceFilter is already used for APIs like /taskStatus, which I think are similar in access control to this API.

Contributor Author:

Fixed.

final LockedIntervalsResponse response = new LockedIntervalsResponse(
    taskStorageQueryAdapter.getLockedIntervals()
);
log.warn("Found Intervals: %s", response.getLockedIntervals());
Contributor:

Why is this a WARN log?

Contributor Author:

Removed.

}

// Build the response
final LockedIntervalsResponse response = new LockedIntervalsResponse(
Contributor:

Can the API just return Map<String, DatasourceIntervals> (removing the need for another class)?

Contributor:

Response.ok(taskStorageQueryAdapter.getLockedIntervals()).build()

Contributor Author:

Fixed.

indexingServiceClient.cancelTask(status.getId());

// Remove this from the locked intervals
taskToLockedIntervals.remove(status.getId());
Contributor:

As mentioned earlier, this is not really needed. It would be the same as taskToLockedIntervals not containing locks for compaction tasks in the first place.

Contributor Author:

Removed.

*
* @return Map from Task Id to locked intervals.
*/
public Map<String, DatasourceIntervals> getLockedIntervals()
Contributor:

getLockedIntervals() seems a misnomer because this method returns the intervals of segment locks as well, but those don't lock whole intervals. I don't have a better suggestion though.

"Skipping the following intervals for Compaction as they are currently locked: %s",
taskToLockedIntervals
);
taskToLockedIntervals.forEach(
Contributor:

I think it should behave differently depending on what lockGranularity is used. If both the compaction task to run and the task that is already running use the segment lock, the compaction task can safely run. Otherwise, the entire locked interval should be skipped as what this code does.

Contributor Author:

For simplicity, we are treating Segment Locks the same as Time Chunk Locks i.e. the whole interval would be skipped while submitting compaction tasks even if there is just one Segment in that interval that is locked by a higher priority task.

Added this as a javadoc comment here:

private Map<String, List<Interval>> getLockedIntervalsToSkip(

public Map<String, List<Interval>> getLockedIntervals(Map<String, Integer> minTaskPriority)
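A simplified model of what this signature implies, under the "priority strictly greater than minTaskPriority" contract documented at this point in the review (the exact cutoff for equal priorities is debated later in the thread). The Lock class and string intervals are stand-ins; the real method walks the TaskLockbox.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of getLockedIntervals(minTaskPriority): a lock's interval is reported
// only if its priority is strictly greater than the minimum priority registered
// for its datasource. Datasources absent from the map are not returned at all.
class LockedIntervalsByPriority
{
  static class Lock
  {
    final String datasource;
    final String interval;
    final Integer priority;

    Lock(String datasource, String interval, Integer priority)
    {
      this.datasource = datasource;
      this.interval = interval;
      this.priority = priority;
    }
  }

  static Map<String, List<String>> getLockedIntervals(
      List<Lock> locks,
      Map<String, Integer> minTaskPriority
  )
  {
    Map<String, List<String>> result = new HashMap<>();
    for (Lock lock : locks) {
      Integer minPriority = minTaskPriority.get(lock.datasource);
      if (minPriority == null) {
        continue; // datasource not requested by the caller
      }
      if (lock.priority == null || lock.priority <= minPriority) {
        continue; // lock does not outrank the compaction task
      }
      result.computeIfAbsent(lock.datasource, ds -> new ArrayList<>()).add(lock.interval);
    }
    return result;
  }

  public static void main(String[] args)
  {
    List<Lock> locks = List.of(
        new Lock("wiki", "2021-01-01/2021-01-02", 75),  // outranks compaction priority 25
        new Lock("wiki", "2021-01-02/2021-01-03", 25),  // equal priority: not reported here
        new Lock("logs", "2021-01-01/2021-01-02", 75)   // datasource not requested
    );
    System.out.println(getLockedIntervals(locks, Map.of("wiki", 25)));
  }
}
```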

Comment thread docs/operations/api-reference.md Outdated

Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.

* `/druid/indexer/v1/lockedIntervals`
Contributor:

Is this supposed to be called by users? What is the use case?

Contributor:

Actually, instead of exposing this API, maybe what we really need is a way to communicate to the user that compaction skipped an interval because some task was holding a lock: which intervals were skipped for which datasource, and which task held the lock.

Contributor Author:

For now, we are just logging the intervals that were skipped due to locks.
The part about notifying the user about skipped intervals will most likely be done through Task Reports in a follow up PR.

@kfaraz (Contributor Author), May 18, 2021:

Is this supposed to be called by users? What is the use case?

Removed changes to api-reference.md as this API is for internal use (between Coordinator and Overlord) only.

@kfaraz (Contributor Author) commented May 18, 2021

Fixed lockedIntervals API to use taskPriority
Added integration tests.

@kfaraz kfaraz requested review from jihoonson and maytasm May 19, 2021 18:45
@maytasm (Contributor) left a comment:

Minor comments

@@ -62,9 +62,9 @@ public List<Task> getActiveTasks()
*
* @return Map from Task Id to locked intervals.
Contributor:

Is this Map from datasource to locked intervals?

* priority are returned. Tasks for datasources that
* are not present in this Map are not returned.
* @return Map from Datasource to List of Intervals locked by Tasks that have
* priority strictly greater than the {@code minTaskPriority} for that datasource.
Contributor:

should this be > or >= the {@code minTaskPriority}?
Do we want to submit auto compaction task if there is a task with equal task priority already running?

@@ -1149,29 +1149,24 @@ public void testGetLockedIntervals()
);

// Verify the locked intervals
Contributor:

Can you add a test where the existing taskLock priority is lower than the minTaskPriority argument (and hence the lock should not be returned)?

indexingServiceClient.getLockedIntervals());

// Skip all the intervals locked by higher priority tasks for each datasource
getLockedIntervalsToSkip(compactionConfigList).forEach(
Contributor:

Is this needed? Doesn't getLockedIntervalsToSkip already return dataSource -> list of intervals

@abhishekagarwal87 (Contributor):

Thank you for the PR, @kfaraz. Will there be a way to disable this code path in a running Druid cluster (either dynamically or by restarting services)?

@rohangarg (Member):

@kfaraz: Thanks a lot for the PR - I have a couple of questions regarding the new config:

  1. does the config druid.coordinator.compaction.skipLockedIntervals indicate that the segments which are locked during compaction are missed forever? or is it just for that run?
  2. what would be the user impact of this config? for example, if I'm a user when should I enable/disable this config and what will be its benefits/problems?

@kfaraz (Contributor Author) commented Jun 14, 2021

1. does the config `druid.coordinator.compaction.skipLockedIntervals` indicate that the segments which are locked during compaction are missed forever? or is it just for that run?

@rohangarg, the locked intervals would be skipped only for that run; they would be retried in the next run.

2. what would be the user impact of this config? for example, if I'm a user when should I enable/disable this config and what will be its benefits/problems?

Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.
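For reference, disabling the feature would be a one-line change in the Coordinator's runtime.properties. This is only a sketch; the property name and its default of true come from this PR's description.

```properties
# Disable skipping of locked intervals during auto-compaction (default: true)
druid.coordinator.compaction.skipLockedIntervals=false
```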

@kfaraz kfaraz requested a review from maytasm June 15, 2021 08:31
@maytasm (Contributor) commented Jun 16, 2021

  • Can you add a test in ITAutoCompactionLockContentionTest that verifies that setting the new flag druid.coordinator.compaction.skipLockedIntervals to false actually disables this new feature?
  • Actually, if getLockedIntervalsToSkip does return the locks of compaction tasks, then you will need the change from #11190 (comment) (as you mentioned, in CompactSegments.run(), some currently running compaction tasks can get cancelled, and hence we have to return their locks).

@rohangarg (Member) commented Jun 16, 2021

Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.

Ok, but given that the feature/change-set is not too big, I'm not sure whether adding a config would help. Is it possible to know some scenarios or unknowns where this could fail that might be hard to cover in tests?

Apologies, but one more question: if this flag is turned on, will it always ensure that there is no locking conflict between compaction tasks and other tasks? Or is it still possible that a compaction task might get stuck due to an unavailable lock? My initial guess is that there is still a chance that compaction might get stuck, so just confirming. :)

  } else if (taskLockPosse.getTaskLock().getPriority() == null
-            || taskLockPosse.getTaskLock().getPriority() <= minTaskPriority.get(datasource)) {
     // Do not proceed if the lock has a priority less than or equal to the minimum
+            || taskLockPosse.getTaskLock().getPriority() < minTaskPriority.get(datasource)) {
Contributor:

Why is this changed back?

Contributor Author:

I had missed changing it the first time.

The behaviour after this change is that intervals of tasks with equal priority will be considered locked. That is the required behaviour, right?

Contributor:

LGTM

@maytasm maytasm merged commit f0b105e into apache:master Jun 21, 2021
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
@kfaraz kfaraz deleted the compaction_skip_locked_intervals branch August 23, 2021 06:47
