Contributor

@xinyiZzz xinyiZzz commented Apr 21, 2022

Proposed changes

Issue Number: close #7196

Problem Summary:

  1. Fix track bthread
  2. Fix track vectorized query
     • Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
     • Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
     • Fix some bugs.

Checklist(Required)

  1. Does it affect the original behavior: (Yes)
  2. Has unit tests been added: (No)
  3. Has document been added or modified: (No)
  4. Does it need to update dependencies: (No)
  5. Are there any changes that cannot be rolled back: (Yes)

Further comments

1. Stability

1) mem_limit=10M

With this setting the memory limit of the BE process is 10M. The BE process still starts normally but cannot do any work, and queries return the error below.
The error shows that once BE has started, the memory left in the process limit is already negative (-156100257 B, about -156M).

ERROR 1105 (HY000): errCode = 2, detailMessage = Memory exceed limit. fragment=89c882f82fa74db1-b8010703e7974fed, details=PartitionedHashTableCtx::ExprValuesCache failed to allocate 53760 bytes, on backend=10.81.85.89. Memory left in process limit=-156100257.00 B . failed alloc=<Memory limit exceeded: ExecNode:Exprs:AGGREGATION_NODE (id=65): TryConsume failed, bytes=53760 process whole consumption=244424704 mem limit=10485760>. current tracker=ExecNode:Exprs:AGGREGATION_NODE (id=65) . If this is a query, can change the limit b

2) mem_limit=80% (default), set exec_mem_limit=10M

The query returns the error below.
The query memory limit is 10485760 B, of which 10464520 B is already in use, so the next allocation of 32768 B fails.
The message recorded when the MemTracker was switched is "aggregator, while execute get_next.".
The current tracker is ExecNode:VAGGREGATION_NODE, and the consumption on that tracker failed inside the TCMalloc hook.

ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded: Memory exceed limit. fragment=, details=In TCMalloc Hook, aggregator, while execute get_next., on backend=10.81.85.89. Memory left in process limit=300.81 GB. failed alloc=<Memory limit exceeded: label=queryId=fdb6664525fe40f9-8bc275d2c86d2546 TryConsume failed size=32768, used=10464520, limit=10485760>. current tracker=ExecNode:VAGGREGATION_NODE (id=24). If this is a query, can change the limit by session variable exec_mem_limit.

3) mem_limit=99%, set exec_mem_limit=20G

(Test pending; the expectation is that BE does not crash.)

2. Performance

As in the row-storage tests in #8669, the new memory statistics framework costs about 2% performance on vectorized queries.

For POC performance testing, consider turning off detailed memory tracking with memory_verbose_track=false, which avoids about 1% of the loss, and turning off memory tracking completely with track_new_delete=false, which avoids a further 1%.

1) TEST 1 - SSB 600w

Env: 1 FE, 1 BE;
Test Set: ssb LINEORDER 600w (about 6 million rows);
Default session variables;
jmeter thread=20;
| conf (qps, avg time) | Q1.1 | Q1.2 | Q1.3 | Q2.1 | Q2.2 | Q3.1 | Q4.1 |
|---|---|---|---|---|---|---|---|
| track_new_delete=false | 175.0/s, 110 ms | 214.1/s, 90 ms | 225.7/s, 86 ms | 72.5/s, 268 ms | 71.6/s, 272 ms | 68.0/s, 286 ms | 45.8/s, 424 ms |
| track_new_delete=true | 170.1/s, 114 ms | 211.6/s, 91 ms | 222.9/s, 87 ms | 72.1/s, 269 ms | 71.3/s, 273 ms | 67.5/s, 288 ms | 45.7/s, 427 ms |

2) TEST 2 - SSB 60003w

Env: 1 FE, 1 BE;
Test Set: ssb LINEORDER 60003w (about 600 million rows);
Default session variables;
jmeter thread=1;
| conf (qps, avg time) | Q1.1 | Q1.2 | Q1.3 | Q2.1 | Q2.2 | Q3.1 | Q4.1 |
|---|---|---|---|---|---|---|---|
| track_new_delete=false | 2.0/s, 490 ms | 12.7/s, 77 ms | 13.8/s, 72 ms | 0.0/s, 28306 ms | 0.0/s, 20577 ms | 0.0/s, 41081 ms | 0.0/s, 48236 ms |
| track_new_delete=true | 1.9/s, 525 ms | 11.8/s, 84 ms | 13.0/s, 76 ms | 0.0/s, 28531 ms | 0.0/s, 20970 ms | 0.0/s, 42959 ms | 0.0/s, 48723 ms |

3) TEST 3 - small query

Env: 1 FE, 1 BE;
Test Set: ssb LINEORDER 600w (about 6 million rows);
Default session variables;
jmeter thread=100;
Test SQL: select LO_EXTENDEDPRICE from LINEORDER2 where LO_EXTENDEDPRICE = 5273584 limit 1;

| conf | qps, avg time |
|---|---|
| track_new_delete=false | 8275.7/s, 11 ms |
| track_new_delete=true | 8203.6/s, 11 ms |

3. Observability

Env: 1 FE, 1 BE
Test Set: ssb LINEORDER 60003w
track_new_delete=true
memory_verbose_trace=true
set parallel_fragment_exec_instance_num=2;
Test SQL: SSB Q3.1
         SELECT C_NATION, S_NATION, D_YEAR,
          SUM(LO_REVENUE)  AS  REVENUE
          FROM customer, lineorder, supplier, dates
          WHERE  LO_CUSTKEY = C_CUSTKEY
          AND LO_SUPPKEY = S_SUPPKEY
          AND  LO_ORDERDATE = D_DATEKEY
          AND C_REGION = 'ASIA'
          AND S_REGION = 'ASIA'
          AND D_YEAR >= 1992 AND D_YEAR <= 1997
          GROUP BY C_NATION, S_NATION, D_YEAR
          ORDER BY D_YEAR ASC,  REVENUE DESC;

see: BeIP:HttpPort/mem_tracker

// The level decides whether a tracker is shown on the web page.
// Each MemTracker has a level less than or equal to its parent's; the level can only be set explicitly.
// TASK covers query, import, compaction, etc.
enum class MemTrackerLevel { OVERVIEW = 0, TASK, INSTANCE, VERBOSE };
  1. mem_tracker_level=0 (default, OVERVIEW)

init: [image]

querying: [image]

query done: [image]

  2. mem_tracker_level=1 (TASK)

querying: [image]

  3. mem_tracker_level=2 (INSTANCE)

querying: [image]

  4. mem_tracker_level=3 (VERBOSE)

querying: [image] (four screenshots)

query done: [image]

querying, with set parallel_fragment_exec_instance_num=10; the query fails with:

ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded: Memory exceed limit. fragment=, details=In TCMalloc Hook, aggregator, while execute get_next., on backend=10.81.85.89. Memory left in process limit=293.08 GB. failed alloc=<Memory limit exceeded: label=queryId=5e72ea753ed5438b-89adf8b2a56f6bf1 TryConsume failed size=4096, used=2147853311, limit=2147483648>. current tracker=ExecNode:VAGGREGATION_NODE (id=7). If this is a query, can change the limit by session variable exec_mem_limit.

querying, with set parallel_fragment_exec_instance_num=10; and set exec_mem_limit=21474836480;
[image]

@xinyiZzz xinyiZzz force-pushed the switch_tls_tracker5_fix_bthread_fix_vec branch from dbc64c5 to 077bdbb Compare April 21, 2022 03:43
@xinyiZzz
Contributor Author

cc @morningman @yangzhg

@xinyiZzz xinyiZzz changed the title from "[feature-wip] (memory tracker) (step5) Track track bthread, fix track vectorized query" to "[feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query" Apr 21, 2022
@morningman morningman added kind/fix Categorizes issue or PR as related to a bug. kind/improvement area/memory-consumption dev/backlog waiting to be merged in future dev branch labels Apr 21, 2022

inline void ThreadMemTrackerMgr::add_tracker(const std::shared_ptr<MemTracker>& mem_tracker) {
    DCHECK(_mem_trackers.find(mem_tracker->id()) == _mem_trackers.end()) << print_debug_string();
    if (_mem_trackers.find(mem_tracker->id()) == _mem_trackers.end()) {
Contributor

If you add a DCHECK before it, then I don't think we need to call find again. Just insert the mem tracker directly into _mem_trackers.

Contributor Author

@xinyiZzz xinyiZzz Apr 24, 2022

OK, I'll remove the if find check.

It was intentional, though: in some places adding the same tracker twice may be unavoidable.
(Not in the current code; future PRs may include such cases, but I'll try to avoid them.)

_mem_trackers[mem_tracker->id()] = mem_tracker;
DCHECK(_mem_trackers[mem_tracker->id()]) << print_debug_string();
_untracked_mems[mem_tracker->id()] = 0;
_mem_tracker_labels[_temp_tracker_id] = mem_tracker->label();
Contributor

What does _temp_tracker_id mean?

Contributor Author

It is a reusable variable. It avoids allocating memory inside these functions, which could otherwise fall into an infinite loop. It is explained in the code comments.

int64_t switch_count = 0;

std::string print_debug_string() {
    std::stringstream mem_trackers_str;
Contributor

use fmt

Contributor Author

fmt doesn't seem to support concatenating an indeterminate number of strings.

I changed it to string +=, which seems to be faster than stringstream. A StringBuilder seems like a better solution, but it is not necessary here = =

Contributor

You can search for fmt in be/src/exec/tablet_sink.cpp to see an example of concatenating strings.

Contributor Author

cool!
done, I learned.

@xinyiZzz xinyiZzz force-pushed the switch_tls_tracker5_fix_bthread_fix_vec branch from b18f1c3 to a3173b0 Compare April 27, 2022 06:57
Contributor

@morningman morningman left a comment

LGTM

@morningman morningman merged commit 26bc462 into apache:master Apr 27, 2022
zhengshiJ pushed a commit to zhengshiJ/incubator-doris that referenced this pull request Apr 29, 2022
…ectorized query (apache#9145)

1. fix track bthread
- bthread is a high-performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, which may be scheduled on multiple pthreads. Currently, MemTracker consumption relies on pthread thread-local storage (TLS).
- As a result, switching the pthread TLS MemTracker inside a brpc server response could leave the MemTracker in a confused state. The brpc server response now saves the MemTracker in bthread TLS instead of pthread TLS.
Ref: https://github.com/apache/incubator-brpc/blob/731730da85f6af5c25012b4c83ab5bb371320cf8/docs/en/server.md#bthread-local

2. fix track vectorized query
- Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
Kikyou1997 pushed a commit to Kikyou1997/incubator-doris that referenced this pull request May 9, 2022
…ectorized query (apache#9145)

starocean999 pushed a commit to starocean999/incubator-doris that referenced this pull request May 19, 2022
…ectorized query (apache#9145)

englefly pushed a commit to englefly/incubator-doris that referenced this pull request May 23, 2022
…ectorized query (apache#9145)


Labels

area/memory-consumption area/vectorization dev/backlog waiting to be merged in future dev branch kind/fix Categorizes issue or PR as related to a bug. kind/improvement

Development

Successfully merging this pull request may close these issues.

[Feature] Refactored memory statistics framework MemTracker

2 participants