Allow kill task to mark segments as unused #11501
| "id": <task_id>, | ||
| "dataSource": <task_datasource>, | ||
| "interval" : <all_segments_in_this_interval_will_die!>, | ||
| "markAsUnused": <true|false>, |
It seems worth mentioning what the default is.
| } | ||
| ``` | ||
| If `markAsUnused` is true, the kill task will first mark any segments within the specified interval as unused, before deleting the unused segments within the interval. |
Perhaps we could make this scarier, since it cannot be undone once segments are deleted with markAsUnused set?
I added a WARNING section to the kill task
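For readers following this thread, here is a minimal sketch of how a caller might assemble a kill task spec with the new flag. The helper name is hypothetical, the `"type": "kill"` field is assumed because it is not shown in the excerpt above, and the example data source and interval are made up. Since the default for `markAsUnused` is not stated in this excerpt, the sketch always sets it explicitly.

```java
public class KillTaskSpec {
    // Hypothetical helper, not part of this PR.
    // Assumes dataSource and interval contain no characters that need JSON escaping.
    static String build(String dataSource, String interval, boolean markAsUnused) {
        return String.format(
            "{\"type\": \"kill\", \"dataSource\": \"%s\", \"interval\": \"%s\", \"markAsUnused\": %b}",
            dataSource, interval, markAsUnused);
    }

    public static void main(String[] args) {
        // With markAsUnused=true, every segment in the interval is deleted, used or not.
        System.out.println(build("wikipedia", "2021-01-01/2021-02-01", true));
    }
}
```

Setting the flag explicitly in every spec keeps the destructive behavior visible to whoever reviews the payload.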
| .createStatement( | ||
| StringUtils.format( | ||
| "UPDATE %s SET used=false WHERE dataSource = :dataSource " | ||
| + "AND start >= :start AND %2$send%2$s <= :end", |
Hmm, should the end be exclusive? I see IndexerSQLMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval() uses the same filter of "end" <= :end, which seems to break the contract of IndexerMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval() because the end time is exclusive for segment intervals. Maybe this hasn't caused much trouble so far because retrieveUnusedSegmentsForInterval returns only unused segments, even though it seems like a bug. But this method will unset the used flag and thus probably should respect the exclusivity of the end time?
Hm, I used the same query as run by the markUnused endpoint on the coordinator:
https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java#L836
Sorry NVM. I was confused.
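To illustrate why `end <= :end` is consistent with exclusive end times, here is a small sketch of the same predicate applied to half-open intervals. It assumes both the segment interval and the kill interval are `[start, end)`. The class and method names are hypothetical, and `Instant` comparison stands in for however the metadata store actually compares the values.

```java
import java.time.Instant;

public class KillIntervalFilter {
    // Mirrors the SQL filter: start >= :start AND end <= :end.
    // With half-open intervals, this selects segments fully contained in the kill interval.
    static boolean matches(Instant segStart, Instant segEnd, Instant killStart, Instant killEnd) {
        return !segStart.isBefore(killStart) && !segEnd.isAfter(killEnd);
    }

    public static void main(String[] args) {
        Instant jan1 = Instant.parse("2021-01-01T00:00:00Z");
        Instant jan2 = Instant.parse("2021-01-02T00:00:00Z");
        Instant jan3 = Instant.parse("2021-01-03T00:00:00Z");
        Instant jan4 = Instant.parse("2021-01-04T00:00:00Z");

        // Kill interval is [jan1, jan3).
        System.out.println(matches(jan2, jan3, jan1, jan3)); // true: segment ends exactly at the exclusive end
        System.out.println(matches(jan2, jan4, jan1, jan3)); // false: segment extends past the kill interval
        System.out.println(matches(jan3, jan4, jan1, jan3)); // false: segment starts at the exclusive end
    }
}
```

A segment whose end equals the kill interval's end covers no instant outside that interval, so including it does not violate the exclusive-end contract.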
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Looks good, just had a few comments. Also, is it doable to add an integration test for this? Can we piggyback on an existing integration test?
This PR adds a new `markAsUnused` option for the Kill Task, which allows it to mark any segments within the specified interval as unused before deleting the unused segments within that interval.

This is useful for allowing the mark unused -> delete sequence to happen with a single API call for the caller, as well as allowing the unmark action to occur under a task interval lock.
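The ordering described above can be modeled with a toy in-memory sketch. This is not the task's implementation: the `Segment` class and `kill` method are hypothetical, the list stands in for the segments already selected by the interval filter, and locking is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class KillSequenceSketch {
    // Hypothetical stand-in for a segment's metadata row.
    static final class Segment {
        final String id;
        boolean used;

        Segment(String id, boolean used) {
            this.id = id;
            this.used = used;
        }
    }

    // Returns the ids of the segments that were deleted.
    static List<String> kill(List<Segment> segmentsInInterval, boolean markAsUnused) {
        // Step 1 (optional): mark everything in the interval as unused.
        if (markAsUnused) {
            for (Segment s : segmentsInInterval) {
                s.used = false;
            }
        }
        // Step 2: delete only the segments that are unused at this point.
        List<String> deleted = new ArrayList<>();
        segmentsInInterval.removeIf(s -> {
            if (!s.used) {
                deleted.add(s.id);
                return true;
            }
            return false;
        });
        return deleted;
    }
}
```

In this model, a call with `markAsUnused` set to false leaves used segments in place, while a call with it set to true removes every segment in the interval.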
This PR has: