

xinyiZzz (Contributor) commented on Sep 16, 2022

Proposed changes

Issue Number: close #12661

Problem summary

When the load channel exceeds its memory limit, it triggers a memtable flush. Before this fix, if that flush failed, an error was returned and the whole load was terminated.

The failed flush usually reports error code -238: because memtables are flushed frequently once the load channel exceeds its memory limit, the number of segments exceeds the maximum allowed.
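The following is a rough model of why frequent flushing hits the segment limit. It is not Doris code. It assumes that each memtable flush writes exactly one segment, and it uses an illustrative per-tablet segment cap of 200 for a single load. The names `kAssumedMaxSegments`, `segments_for`, and `exceeds_segment_cap` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed per-tablet segment cap for a single load; 200 is illustrative only.
constexpr std::size_t kAssumedMaxSegments = 200;

// Assumes each memtable flush writes exactly one segment, so a tablet that
// receives `tablet_bytes` and is flushed every `flush_bytes` ends up with
// ceil(tablet_bytes / flush_bytes) segments.
std::size_t segments_for(std::size_t tablet_bytes, std::size_t flush_bytes) {
    assert(flush_bytes > 0);
    return (tablet_bytes + flush_bytes - 1) / flush_bytes;
}

// Over the memory limit, memtables are flushed while still small, which
// shrinks flush_bytes and can push the segment count past the cap.
bool exceeds_segment_cap(std::size_t tablet_bytes, std::size_t flush_bytes) {
    return segments_for(tablet_bytes, flush_bytes) > kAssumedMaxSegments;
}
```

Take 1024 MiB written to one tablet as an example. Flushing every 100 MiB yields 11 segments. If memory pressure forces a flush every 4 MiB, the same data yields 256 segments, which is above the assumed cap.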

Before fix: [screenshot]

After fix: [screenshot]

Checklist (Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Have unit tests been added:
    • Yes
    • No
    • No Need
  3. Has documentation been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did, what alternatives you considered, etc.

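```cpp
// Review-diff excerpt (enclosing error branch elided): on a failed flush,
// report a per-tablet error and mark the tablet broken instead of failing the load.
    error->set_tablet_id(writers[i]->tablet_id());
    error->set_msg(err_msg);
    _broken_tablets.insert(writers[i]->tablet_id());
}
```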
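The excerpt shows only part of the diff. Below is a self-contained sketch of the same pattern, under stated assumptions. `TabletError` stands in for the protobuf-style `error` object in the excerpt. `Writer` and `flush_all` are hypothetical names, and `flush_ok` simulates the flush result. The function does not return on the first failed flush. It records one error per tablet and remembers broken tablets so that each one is reported only once.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Plain-struct stand-in for the per-tablet error message sent back to the sink.
struct TabletError {
    int64_t tablet_id;
    std::string msg;
};

// Hypothetical minimal writer: only what the sketch needs.
struct Writer {
    int64_t tablet_id;
    bool flush_ok;  // simulated outcome of flushing this writer's memtable
};

// Flush every writer. On failure, record an error for that tablet and mark it
// broken instead of aborting the whole batch; the caller forwards `errors` to
// the sink, which decides whether the load must be terminated.
void flush_all(const std::vector<Writer>& writers,
               std::vector<TabletError>* errors,
               std::unordered_set<int64_t>* broken_tablets) {
    assert(errors != nullptr && broken_tablets != nullptr);
    for (const Writer& w : writers) {
        if (broken_tablets->count(w.tablet_id) > 0) {
            continue;  // already reported; do not flush or report it again
        }
        if (!w.flush_ok) {
            errors->push_back({w.tablet_id, "flush memtable failed"});
            broken_tablets->insert(w.tablet_id);
        }
    }
}
```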
zhannngchen (Contributor) commented on Sep 16, 2022:

If an error occurs in flush_memtable_and_wait, we should not call wait_flush on that delta_writer in L260. In that case, couldn't wait_flush wait indefinitely?

xinyiZzz (Author):

wait_flush should not wait indefinitely: a flush token that failed to submit the flush does not actually wait.
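The author's point holds if a flush token counts only the tasks it actually accepted. The simplified token below is a hypothetical `SketchFlushToken`, not Doris's actual flush token. Its `wait()` blocks only while accepted tasks are still pending. If `submit()` failed, nothing is pending and `wait()` returns immediately.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical flush token that counts only tasks it actually accepted.
class SketchFlushToken {
public:
    // `executor_accepts` simulates whether the flush executor took the task.
    // On rejection nothing is registered and false is returned.
    bool submit(bool executor_accepts) {
        std::lock_guard<std::mutex> lock(mu_);
        if (!executor_accepts) return false;
        ++pending_;
        return true;
    }

    // Called by the flush worker when an accepted task finishes.
    void finish_one() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            --pending_;
        }
        cv_.notify_all();
    }

    // Blocks only while accepted tasks are outstanding, so a token whose
    // submit() failed returns immediately instead of waiting forever.
    void wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }

    int pending() const {
        std::lock_guard<std::mutex> lock(mu_);
        return pending_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    int pending_ = 0;
};
```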

Do not return the error directly; pass it to the sink, which decides whether to terminate the load.
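This PR does not spell out how the sink makes that decision. The sketch below assumes a majority-quorum rule: a load can succeed only while every tablet can still have a majority of its replicas written. Treat that rule as an assumption. `ReplicaError` and `should_terminate_load` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// One error report: a backend node says a tablet replica failed on it.
struct ReplicaError {
    int64_t node_id;
    int64_t tablet_id;
};

// Assumed policy: terminate as soon as (num_replicas + 1) / 2 distinct
// replicas of any single tablet have failed, because that tablet can then
// no longer reach a majority of successful writes.
bool should_terminate_load(const std::vector<ReplicaError>& errors, int num_replicas) {
    assert(num_replicas > 0);
    std::map<int64_t, std::set<int64_t>> failed_nodes_by_tablet;
    for (const ReplicaError& e : errors) {
        failed_nodes_by_tablet[e.tablet_id].insert(e.node_id);  // dedupe repeats
    }
    const std::size_t max_tolerated = static_cast<std::size_t>((num_replicas + 1) / 2);
    for (const auto& entry : failed_nodes_by_tablet) {
        if (entry.second.size() >= max_tolerated) return true;
    }
    return false;
}
```

With 3 replicas, a single failed replica of a tablet is tolerated. A second failure on a different node for the same tablet terminates the load.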

xinyiZzz (Author):

Done: tablets in _broken_tablets no longer wait for the flush.
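The follow-up change amounts to a wait loop that skips tablets already marked broken. In this sketch, `SketchDeltaWriter` and `wait_healthy_flushes` are hypothetical. `wait_flush()` only records that it was called, standing in for a blocking wait on the writer's flush.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical writer that records whether wait_flush() was called on it.
struct SketchDeltaWriter {
    int64_t tablet_id;
    bool waited;
    void wait_flush() { waited = true; }  // stands in for a blocking wait
};

// After triggering flushes, wait only on writers whose tablets are healthy;
// tablets in broken_tablets had their flush fail and are skipped.
// Returns how many writers were waited on.
int wait_healthy_flushes(std::vector<SketchDeltaWriter>& writers,
                         const std::unordered_set<int64_t>& broken_tablets) {
    int waited = 0;
    for (SketchDeltaWriter& w : writers) {
        if (broken_tablets.count(w.tablet_id) > 0) continue;
        w.wait_flush();
        ++waited;
    }
    return waited;
}
```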

zhannngchen (Contributor) left a comment:

LGTM

dataroaring (Contributor) left a comment:

LGTM

github-actions bot added the approved label (indicates a PR has been approved by one committer) on Sep 16, 2022.
github-actions:

PR approved by at least one committer and no changes requested.

github-actions:

PR approved by anyone and no changes requested.

dataroaring merged commit 942b310 into apache:master on Sep 16, 2022.
FreeOnePlus pushed 6 commits to FreeOnePlus/doris that referenced this pull request on Oct 8, 2022; each commit message repeats the problem summary above.
yiguolei mentioned this pull request on Oct 9, 2022.

Labels: affects-1.1, approved (indicates a PR has been approved by one committer), dev/merged-1.1.3-deprecated, reviewed


Development: successfully merging this pull request may close the issue "[Bug] BE OOM when load fails".

4 participants