-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query #9145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query #9145
Conversation
dbc64c5 to
077bdbb
Compare
|
|
||
| inline void ThreadMemTrackerMgr::add_tracker(const std::shared_ptr<MemTracker>& mem_tracker) { | ||
| DCHECK(_mem_trackers.find(mem_tracker->id()) == _mem_trackers.end()) << print_debug_string(); | ||
| if (_mem_trackers.find(mem_tracker->id()) == _mem_trackers.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add a DCHECK before, than I don't think we need to call find again. Just insert the mem tracker directly into the _mem_trackers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll remove if find .
But this is intentional. Because duplication of add tracker in some places may be unavoidable.
(Not currently, future prs will include this, but I'll try to avoid it)
| _mem_trackers[mem_tracker->id()] = mem_tracker; | ||
| DCHECK(_mem_trackers[mem_tracker->id()]) << print_debug_string(); | ||
| _untracked_mems[mem_tracker->id()] = 0; | ||
| _mem_tracker_labels[_temp_tracker_id] = mem_tracker->label(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is _temp_tracker_id mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reusable variable, Avoid memory allocation in functions and fall into an infinite loop. Has annotations.
| int64_t switch_count = 0; | ||
|
|
||
| std::string print_debug_string() { | ||
| std::stringstream mem_trackers_str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use fmt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt doesn't seem to support an indeterminate number of string concatenations.
I changed to string +=, which seems to be faster than stringstream. StringBuilder seems to be a better solution, but not necessary here = =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can search fmt in be/src/exec/tablet_sink.cpp to see example of concating string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool!
done, I learned.
b18f1c3 to
a3173b0
Compare
morningman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ectorized query (apache#9145) 1. fix track bthread - Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS). - This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker. Ref: https://github.com/apache/incubator-brpc/blob/731730da85f6af5c25012b4c83ab5bb371320cf8/docs/en/server.md#bthread-local 2. fix track vectorized query - Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine. - Refactored ThreadContext to avoid dependency conflicts and make it easier to debug. - Fix some bugs.
…ectorized query (apache#9145) 1. fix track bthread - Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS). - This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker. Ref: https://github.com/apache/incubator-brpc/blob/731730da85f6af5c25012b4c83ab5bb371320cf8/docs/en/server.md#bthread-local 2. fix track vectorized query - Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine. - Refactored ThreadContext to avoid dependency conflicts and make it easier to debug. - Fix some bugs.
…ectorized query (apache#9145) 1. fix track bthread - Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS). - This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker. Ref: https://github.com/apache/incubator-brpc/blob/731730da85f6af5c25012b4c83ab5bb371320cf8/docs/en/server.md#bthread-local 2. fix track vectorized query - Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine. - Refactored ThreadContext to avoid dependency conflicts and make it easier to debug. - Fix some bugs.
…ectorized query (apache#9145) 1. fix track bthread - Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS). - This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker. Ref: https://github.com/apache/incubator-brpc/blob/731730da85f6af5c25012b4c83ab5bb371320cf8/docs/en/server.md#bthread-local 2. fix track vectorized query - Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine. - Refactored ThreadContext to avoid dependency conflicts and make it easier to debug. - Fix some bugs.
Proposed changes
Issue Number: close #7196
Problem Summary:
Ref: https://github.com/apache/incubator-brpc/blob/731730da85f6af5c25012b4c83ab5bb371320cf8/docs/en/server.md#bthread-local
Checklist(Required)
Further comments
1. Stability
1) mem_limit=10M
This means that the memory limit of the BE process is 10M. At this time, the BE process will still start normally, but cannot do anything, and the query will return an error:
This means that after BE is started, the process mem limit has left -156M.
2) mem_limit=80% (default), set exec_mem_limit=10M
The query returns the error:
At this time, the query mem limit is 10485760B, 10464520B has been used, and the re-allocation of 32768B fails.
The memory application information reported when switching the MemTracker is
aggregator, while execute get_next.The tracker belongs to
ExecNode:VAGGREGATION_NODE, and the consumption of the tracker in the TCMalloc Hook fails.3) mem_limit=99%, set exec_mem_limit=20G
(waiting for the test..., expecting BE will not crash)
2. Performance
Similar to #8669 test conclusion on row storage, the new memory statistics framework will also bring about a 2% performance loss on vectorized queries.
For POC performance testing, consider turning off the detailed memory track
memory_verbose_track=false, which will avoid a 1% performance loss, and further completely turn off the memory tracktrack_new_delete=false, which will further avoid a 1% performance loss.1) TEST 1 - SSB 600w
2) TEST 2 - SSB 60003w
3) TEST 3 - small query
3. Observability
see: BeIP:HttpPort/mem_tracker