-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](audit) fix potential audit log missing issue #50357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces logging adjustments to help debug missing audit logs. The key changes include:
- Removing the boolean flag from handleAuditEvent calls to adjust audit event processing.
- Updating log messages and their levels to provide more informative warnings.
- Correcting a comment typo regarding “nereids” in the query processor.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/resource/workloadschedpolicy/WorkloadRuntimeStatusMgr.java | Updated audit event handling and log warn message when queue is full. |
| fe/fe-core/src/main/java/org/apache/doris/qe/MysqlConnectProcessor.java | Fixed typo in log comment. |
| fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java | Changed log level to warn in exception handling for query execution. |
| fe/fe-core/src/main/java/org/apache/doris/qe/AuditLogHelper.java | Refactored audit event submission and added debug logging for event query. |
| fe/fe-core/src/main/java/org/apache/doris/qe/AuditEventProcessor.java | Added a queue threshold check and updated exception log levels. |
| fe/fe-core/src/main/java/org/apache/doris/plugin/audit/AuditLogBuilder.java | Updated log level from debug to warn on audit event process failures. |
Comments suppressed due to low confidence (5)
fe/fe-core/src/main/java/org/apache/doris/resource/workloadschedpolicy/WorkloadRuntimeStatusMgr.java:92
- Removal of the boolean flag in the handleAuditEvent call changes the event processing logic. Confirm that this behavior change is intentional and does not lead to unintended audit logging issues.
boolean ret = Env.getCurrentAuditEventProcessor().handleAuditEvent(auditEvent);
fe/fe-core/src/main/java/org/apache/doris/qe/MysqlConnectProcessor.java:212
- Corrected the spelling of 'nereids'.
// nereids
fe/fe-core/src/main/java/org/apache/doris/qe/ConnectProcessor.java:384
- [nitpick] Changing the log level from debug to warn may generate excessive logs in production. Verify that warn is appropriate for this exception handling context.
LOG.warn("throw exception for query: {}", DebugUtil.printId(ctx.queryId), throwable);
fe/fe-core/src/main/java/org/apache/doris/qe/AuditEventProcessor.java:98
- The manual check for queue size to discard audit events may inadvertently drop important events. Confirm that this threshold logic aligns with the intended audit event processing behavior.
if (eventQueue.size() >= Config.audit_event_log_queue_size) {
fe/fe-core/src/main/java/org/apache/doris/plugin/audit/AuditLogBuilder.java:96
- [nitpick] Ensure that changing the log level to warn adequately signals failures without creating excessive noise in production logs.
LOG.warn("failed to process audit event: {}", event.queryId, e);
|
run buildall |
TPC-H: Total hot run time: 34094 ms |
TPC-DS: Total hot run time: 192735 ms |
ClickBench: Total hot run time: 29.5 s |
wangbo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
Problem Summary: After query finished, the audit event will be created and: 1. Try to put into `queryAuditEventList` in `WorkloadRuntimeStatusMgr` `WorkloadRuntimeStatusMgr` has a background thread to visit `queryAuditEventList`, try to wait for the statistic info of the event(such as scan rows, cpu time, etc.) before logging this event. The size of `queryAuditEventList` is very large, default is `250000`, which is configured by FE config `audit_event_log_queue_size`. So that it can always hold all events within a certain period (typically, 5 seconds). 2. If `queryAuditEventList` is full, the statistic info of this audit event will be ignored, and event will be logged directly. Either the event has the statistic info or not, it will be put into a `eventQueue` in `AuditEventProcessor`. But this queue only has capacity of `10000`, which may not enough to cache all events in a certain periods. So some of audit event will be discarded and lost. This PR mainly changes: 1. Use the same config `audit_event_log_queue_size` as the max size of `eventQueue`. 2. Change some log message for easy debugging in future.
Problem Summary: After query finished, the audit event will be created and: 1. Try to put into `queryAuditEventList` in `WorkloadRuntimeStatusMgr` `WorkloadRuntimeStatusMgr` has a background thread to visit `queryAuditEventList`, try to wait for the statistic info of the event(such as scan rows, cpu time, etc.) before logging this event. The size of `queryAuditEventList` is very large, default is `250000`, which is configured by FE config `audit_event_log_queue_size`. So that it can always hold all events within a certain period (typically, 5 seconds). 2. If `queryAuditEventList` is full, the statistic info of this audit event will be ignored, and event will be logged directly. Either the event has the statistic info or not, it will be put into a `eventQueue` in `AuditEventProcessor`. But this queue only has capacity of `10000`, which may not enough to cache all events in a certain periods. So some of audit event will be discarded and lost. This PR mainly changes: 1. Use the same config `audit_event_log_queue_size` as the max size of `eventQueue`. 2. Change some log message for easy debugging in future.
### What problem does this PR solve? Problem Summary: After query finished, the audit event will be created and: 1. Try to put into `queryAuditEventList` in `WorkloadRuntimeStatusMgr` `WorkloadRuntimeStatusMgr` has a background thread to visit `queryAuditEventList`, try to wait for the statistic info of the event(such as scan rows, cpu time, etc.) before logging this event. The size of `queryAuditEventList` is very large, default is `250000`, which is configured by FE config `audit_event_log_queue_size`. So that it can always hold all events within a certain period (typically, 5 seconds). 2. If `queryAuditEventList` is full, the statistic info of this audit event will be ignored, and event will be logged directly. Either the event has the statistic info or not, it will be put into a `eventQueue` in `AuditEventProcessor`. But this queue only has capacity of `10000`, which may not enough to cache all events in a certain periods. So some of audit event will be discarded and lost. This PR mainly changes: 1. Use the same config `audit_event_log_queue_size` as the max size of `eventQueue`. 2. Change some log message for easy debugging in future.
What problem does this PR solve?
Problem Summary:
After query finished, the audit event will be created and:
Try to put into
queryAuditEventListinWorkloadRuntimeStatusMgrWorkloadRuntimeStatusMgrhas a background thread to visitqueryAuditEventList, try to wait forthe statistic info of the event(such as scan rows, cpu time, etc.) before logging this event.
The size of
queryAuditEventListis very large, default is250000, which is configured by FE configaudit_event_log_queue_size. So that it can always hold all events within a certain period (typically, 5 seconds).If
queryAuditEventListis full, the statistic info of this audit event will be ignored, and event will be logged directly.Either the event has the statistic info or not, it will be put into a
eventQueueinAuditEventProcessor.But this queue only has capacity of
10000, which may not enough to cache all events in a certain periods.So some of audit event will be discarded and lost.
This PR mainly changes:
audit_event_log_queue_sizeas the max size ofeventQueue.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)