Skip to content

Conversation

@xinyiZzz
Copy link
Contributor

@xinyiZzz xinyiZzz commented Mar 22, 2022

Proposed changes

Issue Number: close #7196 (step 3.1/3)

Problem Summary:

In pr #8476, all memory usage of a process is recorded in the process mem tracker, and all memory usage of a query is recorded in the query mem tracker, and it is still necessary to manually call transfer to to track the cached memory size.

We hope to separate out more detailed memory usage based on Hook TCMalloc new/delete + TLS mem tracker.

In this pr, the more detailed mem tracker is switched to TLS, which automatically and accurately counts more detailed memory usage than before.

Checklist(Required)

  1. Does it affect the original behavior: (Yes)
  2. Has unit tests been added: (Yes)
  3. Has document been added or modified: (No)
  4. Does it need to update dependencies: (No)
  5. Are there any changes that cannot be rolled back: (No)

Further comments

  1. Added the method of switching mem tracker to TLS;
  • More accurate statistics of cache memory size. It is no longer necessary to manually calculate the memory size of each add/free cache, which can easily miss memory such as vector resize.
  • (TODO) Separate out the memory usage of each operator from the query mem tracker.
  1. Added callback behavior when switching TLS mem tracker consume fails;
  • Replace all methods of manually determining whether the instance mem tracker has exceeded limit in the loop, such as join node and agg node.

Statistical effect:

  • Initially (after BE restart)
    image

  • Query is executing
    image

  • After Query is over
    image

  • Load is executing
    Note: LoadChannelMgr consumption is 0, because this pr has not really called switch tls mem tracker, and the next pr will be modified in batches.
    image

  • After Load is over
    image

Note: If QueryPool and LoadPool are found to have negative values for a short time, this is normal, refer to the comments in MemTrackerTaskPool::logout_task_mem_tracker.

@morningman morningman added the dev/backlog waiting to be merged in future dev branch label Mar 23, 2022
nullptr, "In TCMalloc Hook, " + _consume_err_call_back.action_type, mem_usage, st);
if (_consume_err_call_back.call_back_func != nullptr) {
_consume_err_call_back.call_back_func();
nullptr, "In TCMalloc Hook, " + _consume_err_cb.cancel_msg, mem_usage, st);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nullptr, "In TCMalloc Hook, " + _consume_err_cb.cancel_msg, mem_usage, st);
nullptr, fmt::format("In TCMalloc Hook, {}", _consume_err_cb.cancel_msg), mem_usage, st);

Copy link
Contributor Author

@xinyiZzz xinyiZzz Mar 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix~, for faster.

}

void ShardedLRUCache::release(Handle* handle) {
SCOPED_SWITCH_THREAD_LOCAL_MEM_TRACKER(_mem_tracker);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a metric counter for this SWITCH operation?
I want to know how many SWITCH may be called.

Copy link
Contributor Author

@xinyiZzz xinyiZzz Mar 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I try to add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@xinyiZzz xinyiZzz force-pushed the switch_tls_tracker branch from 33e77da to b814868 Compare March 23, 2022 13:46
_shards[s] = new LRUCache(type);
_shards[s]->set_capacity(per_shard);
}
thread_local_ctx.get()->_thread_mem_tracker_mgr->clear_untracked_mems();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clear untracked mem here?
you can add some comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the lru cache is created in the main thread, the main thread will not switch the lru cache mem tracker again.

After the non-query thread switches the mem tracker, if the thread will not switch the mem tracker again in the short term, can consider manually clear_untracked_mems.

The query thread will automatically clear_untracked_mems when detach_task.

@xinyiZzz xinyiZzz force-pushed the switch_tls_tracker branch from 0ac01eb to eb818f8 Compare March 24, 2022 03:48
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 24, 2022
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman
Copy link
Contributor

merge for quick test

@morningman morningman merged commit aaaaae5 into apache:master Mar 24, 2022
@xinyiZzz xinyiZzz changed the title [feature] (memory) Switch TLS mem tracker to separate more detailed memory usage [feature-wip] (memory) Switch TLS mem tracker to separate more detailed memory usage (Part1) Mar 30, 2022
@xinyiZzz xinyiZzz changed the title [feature-wip] (memory) Switch TLS mem tracker to separate more detailed memory usage (Part1) [feature-wip] (memory tracker) (step3) Switch TLS mem tracker to separate more detailed memory usage Apr 7, 2022
morningman pushed a commit that referenced this pull request Apr 8, 2022
…rate more detailed memory usage (#8669)

Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
weizhengte pushed a commit to weizhengte/incubator-doris that referenced this pull request Apr 22, 2022
…rate more detailed memory usage (apache#8669)

Based on apache#8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
zhengshiJ pushed a commit to zhengshiJ/incubator-doris that referenced this pull request Apr 27, 2022
…rate more detailed memory usage (apache#8669)

Based on apache#8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
starocean999 pushed a commit to starocean999/incubator-doris that referenced this pull request May 19, 2022
…rate more detailed memory usage (apache#8669)

Based on apache#8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/memory-consumption dev/backlog waiting to be merged in future dev branch reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Refactored memory statistics framework MemTracker

3 participants