-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](memory) Fix BE OOM when load -238 fail #12666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](memory) Fix BE OOM when load -238 fail #12666
Conversation
| error->set_tablet_id(writers[i]->tablet_id()); | ||
| error->set_msg(err_msg); | ||
| _broken_tablets.insert(writers[i]->tablet_id()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If some error happened while flush_memtable_and_wait, we should not call wait_flush on that delta_writer in L260, in that situation, wait_flush might wait infinitely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait_flush should not wait infinitely, and the flush token that fails to submit the flush will not actually wait.
Do not return error directly, pass the error to the sink to decide whether to terminate the load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, _broken_tablets does not wait flush
zhannngchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
Proposed changes
Issue Number: close #12661
Problem summary
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated.
Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
Before fix:

After fix:

Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...