[Enhancement] [Memory] [Vectorized] Stress test and optimize memory allocation #9581
    } while (!_reserved_bytes.compare_exchange_weak(old_reserved_bytes, new_reserved_bytes));

    // Reduce set metric frequency
    if (_reserved_bytes % 100 == 32) {
How do we make sure this is correct?
At first glance, whenever _reserved_bytes % 100 is less than or greater than 32, the metric is not updated.
ChunkAllocator allocates and frees memory many times, and the size of each allocation or free is a multiple of 2, so _reserved_bytes % 100 == 32 will definitely happen, and each time it does the latest _reserved_bytes value is set.
A real-time, exact _reserved_bytes value is not required. Usually the value of _reserved_bytes equals that of the ChunkAllocator MemTracker. The _reserved_bytes metric only matters when verifying the accuracy of MemTracker.
Therefore, setting the metric less often reduces the performance impact.
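To make the trade-off in this thread concrete, here is a minimal sketch of a CAS loop followed by a sampled metric update. It is not the PR's code: ReservedBytesCounter, metric, and metric_updates are hypothetical names, and the check uses the value just written (new_value) rather than re-reading the atomic as the diff does.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the allocator's reserved-bytes bookkeeping.
struct ReservedBytesCounter {
    std::atomic<int64_t> reserved_bytes{0};
    std::atomic<int64_t> metric{0};          // stand-in for the exported metric
    std::atomic<int64_t> metric_updates{0};  // how many times the metric was refreshed

    void add(int64_t delta) {
        int64_t old_value = reserved_bytes.load();
        int64_t new_value;
        do {
            new_value = old_value + delta;
            // On failure, compare_exchange_weak reloads old_value,
            // so the next iteration recomputes new_value from fresh data.
        } while (!reserved_bytes.compare_exchange_weak(old_value, new_value));

        // Sampled update: refresh the metric only when the value ends in 32.
        // Between matches the metric is stale, which is the reviewer's concern.
        if (new_value % 100 == 32) {
            metric.store(new_value);
            metric_updates.fetch_add(1);
        }
    }
};
```

With repeated 4096-byte additions, the first refresh in this sketch happens on the 17th call (4096 * 17 = 69632), and the metric keeps that value until the next match.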
    void* buf;

    if (size >= MMAP_THRESHOLD) {
        if (alignment > MMAP_MIN_ALIGNMENT)
Why not call populate to pre-fault the memory and avoid too many page faults during use?
No need for zero-fill, because mmap guarantees it.
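The question and the reply touch two separate properties of mmap, and a small sketch may help tell them apart. On Linux, an anonymous mapping is zero-initialized, while pre-faulting pages is a different matter controlled by the Linux-specific MAP_POPULATE flag. The helpers map_anonymous and is_zero_filled below are hypothetical names for illustration, not part of the PR.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper: map `size` bytes of anonymous, private memory.
// Returns nullptr on failure.
inline void* map_anonymous(std::size_t size, bool populate) {
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    // Linux-specific: ask the kernel to pre-fault the pages at map time.
    if (populate) flags |= MAP_POPULATE;
#endif
    void* buf = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return buf == MAP_FAILED ? nullptr : buf;
}

// Hypothetical helper: check that every byte of the mapping reads as zero.
inline bool is_zero_filled(const void* buf, std::size_t size) {
    const unsigned char* p = static_cast<const unsigned char*>(buf);
    for (std::size_t i = 0; i < size; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```

Without populate, the pages are still zero-filled, but each one is faulted in on first touch; whether pre-faulting pays off here would need to be measured.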
Based on the latest master (commit id: 7898c81)
yiguolei left a comment
lgtm
Proposed changes
Issue Number: close #9540 #9580
Problem Summary:
Ran a high-concurrency stress test on SSB and a wide table, comparing performance with the vectorization engine turned on and off. With the vectorization engine on, most SSB queries are slower.
Optimized the Allocator in the vectorization engine. For most queries, performance improves by about 10%.
Allocations between 4KB and 64MB go through ChunkAllocator, those smaller than 4KB go through malloc, and those larger than 64MB go through mmap.
Optimized ChunkAllocator: increased the limit that allows chunks to be stolen from other cores' arenas, and tuned the reserved bytes conf.
Checklist(Required)
Further comments
Stress testing the vectorization engine.
1. Env and Test Set
2. Test
3. Detailed description
In allocator.h, allocations between 4KB and 64MB go through ChunkAllocator, those smaller than 4KB go through malloc (for example, tcmalloc), and those larger than 64MB go through mmap.
In actual tests, ChunkAllocator is slower than malloc for allocations under 4KB and slower than mmap for allocations over 64MB. The 4KB threshold is an empirical value, however, and needs to be confirmed by more detailed testing later.
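The size-based routing described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: AllocPath, CHUNK_THRESHOLD, and choose_path are hypothetical names, and the handling of the exact 4KB boundary is assumed. MMAP_THRESHOLD and the >= comparison follow the diff snippet in the review, and its 64MB value follows the description.

```cpp
#include <cstddef>

// Hypothetical enum naming the three allocation paths from the description.
enum class AllocPath { Malloc, ChunkAllocator, Mmap };

// Threshold values taken from the PR description.
constexpr std::size_t CHUNK_THRESHOLD = 4096;         // 4KB, described as empirical
constexpr std::size_t MMAP_THRESHOLD = 64ULL << 20;   // 64MB

// Pick the allocation path for a request of `size` bytes.
// Assumption: exactly 4KB goes to ChunkAllocator; exactly 64MB goes to mmap,
// matching the `size >= MMAP_THRESHOLD` check shown in the review.
constexpr AllocPath choose_path(std::size_t size) {
    if (size >= MMAP_THRESHOLD) return AllocPath::Mmap;
    if (size >= CHUNK_THRESHOLD) return AllocPath::ChunkAllocator;
    return AllocPath::Malloc;
}
```

Keeping the thresholds as named constants makes it easy to rerun the stress test with a different small-allocation cutoff.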