Remove task action audit logging and druid_taskLog metadata table#16309
Merged
kfaraz merged 16 commits intoapache:masterfrom Jul 17, 2024
Merged
Remove task action audit logging and druid_taskLog metadata table#16309kfaraz merged 16 commits intoapache:masterfrom
kfaraz merged 16 commits intoapache:masterfrom
Conversation
| { | ||
| OverlordResource overlordResource = | ||
| new OverlordResource(null, null, null, null, null, null, null, null, null, null); | ||
| final Response response = overlordResource.getTaskSegments("taskId"); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
| handler.getLogs("non_exist_entry") | ||
| Exception exception = Assert.assertThrows( | ||
| UnsupportedOperationException.class, | ||
| () -> handler.addLog("abcd", ImmutableMap.of("logentry", "created")) |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
| { | ||
| Exception exception = Assert.assertThrows( | ||
| UnsupportedOperationException.class, | ||
| () -> handler.getLogs("abcd") |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
| { | ||
| Exception exception = Assert.assertThrows( | ||
| DruidException.class, | ||
| () -> handler.addLog("abcd", "logentry") |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
| { | ||
| Exception exception = Assert.assertThrows( | ||
| DruidException.class, | ||
| () -> handler.getLogs("abcd") |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
clintropolis
approved these changes
Jul 17, 2024
Contributor
Author
|
Thanks for the review, @clintropolis ! |
10 tasks
edgar2020
pushed a commit
to edgar2020/druid
that referenced
this pull request
Jul 19, 2024
…ache#16309) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, apache#6368. As called out in the original discussion apache#5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.
edgar2020
pushed a commit
to edgar2020/druid
that referenced
this pull request
Jul 19, 2024
…ache#16309) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, apache#6368. As called out in the original discussion apache#5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.
sreemanamala
pushed a commit
to sreemanamala/druid
that referenced
this pull request
Aug 6, 2024
…ache#16309) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, apache#6368. As called out in the original discussion apache#5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.
10 tasks
kfaraz
added a commit
that referenced
this pull request
Sep 6, 2025
Description: The `MetadataStorageActionHandler` deals with persistence of `Task` and related objects. But those model classes exist in `indexing-service` module which forces the `MetadataStorageActionHandler` to handle generics unnecessarily. This makes the code cluttered and difficult to understand at times. Changes: - Move `MetadataStorageActionHandler` and factory classes to `indexing-service` module - Move implementations to respective extensions - Simplify class `TaskInfo` by removing type parameters - Add class `TaskIdStatus` - Remove class `TaskStuff` - Add `DerbyTaskStorageModule` to bind the `MetadataStorageActionHandler` to its default impl on `CliOverlord` - Remove references to the tasklog table as it had been disabled in #16309
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
druid_auditmetadata table (ifdruid.audit.manager.type=sql) or simply logged.druid_taskLogmetadata table.Description
Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368.
As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs.
/indexer/v1/task/{taskId}/segmentswhich returns the list of segments created by a task.
segment/added/byteswhich reports both the segment ID and task IDwhen a segment is committed by a task. We could also include committed segment IDs in task reports.
on metadata storage.
Changes
TaskAuditLogConfigTaskAction.isAudited(). No task action is audited anymore.SegmentInsertActionas it is not used anymore.SegmentTransactionalInsertActionis the new incarnation which has been in use for a while.MetadataStorageActionHandler.addLog()andgetLogs(). These are not used anymore but need to be retained for backward compatibility of extensions.druid_taskLogmetadata table anymore.Release notes
/indexer/v1/task/{taskId}/segmentsis not supported anymore and will give a 404 NOT FOUND response.druid_taskLoganymore.druid.indexer.auditlog.enabledwill be ignored by Druid.task/action/log/timewill not be emitted anymore.Extension dev notes
The changes in this PR are backward compatible with all existing metadata storage extensions.
The methods
addLogandgetLogsofMetadataStorageActionHandlerare now deprecatedand not used by the Druid code.
Any new metadata storage extension need not implement these methods.
Rolling upgrade concerns
No upgrade concerns as none of the tasks use the
SegmentInsertAction.Future solutions
Which task created a segment?
A more preferable approach would be to simply add a
task_idcolumn in thesegmentstable.Something similar has been recently done for pending segments in #16144.
Alternatively, it could also be possible to determine the list of segments committed by a task by inspecting
the reports of the task or emitted metrics.
Which user created a segment?
Task submission is already logged and/or persisted depending on configuration by the Druid audit system.
Once we can associate segments to task IDs, we would also be able to identify which user created a given
segment.
This PR has: