emmalloc: Fix allocations of high alignment#20704
Conversation
system/lib/emmalloc.c
Outdated
| @@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment) | |||
| // Cannot perform allocations that are less than 4 byte aligned, because the Region | |||
There was a problem hiding this comment.
Perhaps replace 4 byte with MALLOC_ALIGNMENT here
There was a problem hiding this comment.
I actually think this might be accurate, as in wasm32 those data structures contain 32-bit pointers, and so they need 4-byte alignment (to avoid being unaligned). At least that's how I interpret the comment?
There was a problem hiding this comment.
Somehow the code and comments have become out of sync then.
Why not just say "Also round up to minimum outputted alignment." since that is the only thing the code is doing. It is not specifically handling 4-byte alignment.
system/lib/emmalloc.c
Outdated
| alignment = MAX(alignment, MALLOC_ALIGNMENT); | ||
| // Arbitrary upper limit on alignment - very likely a programming bug if alignment is higher than this. | ||
| assert(alignment <= 1024*1024); | ||
| return alignment; |
There was a problem hiding this comment.
Just return MAX(alignment, MALLOC_ALIGNMENT);?
| // MALLOC_ALIGNMENT), but it can matter if the alignment is very high. In that | ||
| // case, not adding the alignment can lead to this sbrk not giving us enough | ||
| // (in which case, the next attempt fails and will sbrk the same amount again, | ||
| // potentially allocating a lot more memory than necessary). |
There was a problem hiding this comment.
I'm not sure I understand the status quo.. does did this work at all without this change?
There was a problem hiding this comment.
It worked, but it would do the recursion in the earlier code twice. So it allocated a lot more than it needed, until it finally found enough room for the right size with the right allocation.
There was a problem hiding this comment.
Wouldn't iteratively doing this actually use less memory in some cases? i.e. you could get lucky and get a good alignmenty without quite as much over allocation?
With this change are are guaranteed to waste alignment bytes on each allocation aren't we? In the mimalloc case that would be ~4Mb for each allocation. Or we we return the over-allocated chunk to the free list?
There was a problem hiding this comment.
I believe we potentially reuse the extra space, Here is where the early unused space is added to the region before us:
emscripten/system/lib/emmalloc.c
Lines 604 to 617 in 87b24d7
Overall emmalloc's goal is to be compact and not the most optimal allocator, so I think this is a reasonable compromise. It's true that we might get lucky without this, as you said, and just happen to be aligned, but in large alignments that is less and less likely, and it's better to do the more straightforward and less risky thing.
There was a problem hiding this comment.
I think this we should wait for @juj to weigh in there.
There was a problem hiding this comment.
Talking offline, @juj might not be available to review this for a bit and suggested it land for now if it is urgent, and be reviewed later. I think that makes sense as I'd like to get this in to avoid mimalloc using 2x the memory it needs to, in the first release with mimalloc. @sbc100 does that sound ok?
system/lib/emmalloc.c
Outdated
| // case, not adding the alignment can lead to this sbrk not giving us enough | ||
| // (in which case, the next attempt fails and will sbrk the same amount again, | ||
| // potentially allocating a lot more memory than necessary). | ||
| numBytesToClaim += alignment - MALLOC_ALIGNMENT; |
There was a problem hiding this comment.
So alignment - MALLOC_ALIGNMENT is the most we would ever need to add to the resulting allocation to achieve and an alignment of alignment?
There was a problem hiding this comment.
I think so, yes. What I am not certain of is whether the 3* figure a few lines above this include slack or if they are precise, which could matter here.
Another way to go could is if alignment > malloc alignment then add the alignment?
There was a problem hiding this comment.
I think doing it that way would make the code more straight forwardly correct to a reader.
|
Code size changes here appear on |
|
I'll do a separate codesize update on the main branch if thats ok with you? I do these from time to time and push them directly to avoid mixing code size change with unrelated changes like this one. |
|
Makes sense, sounds good to update those expectations separately. I see you've done that already. |
|
Tests failed here in an interesting way temporarily:
It seems odd that
Based on that, I think this PR is actually a bugfix. However, I am confused as to why we do not combine regions after a first failed allocation - that seems wrong, so it's either a bug or a misunderstanding of mine. Still looking into that. |
|
#20759 fixes that issue: indeed an emmalloc bug caused us to fail to combined adjacent regions, due to an sbrk alignment issue. With that PR, the test passes in the middle case ( |
If we give it an unaligned size, then it will align it, and then memory will seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it returns 0 (the start of the allocation), and logically we reserved the range 0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the address to multiples of 8. So it seems like we have a region 0-4 and another 8-. Because of this emmalloc would generate separate root regions for them, that is, they would not get merged because of the 4 bytes of dead space in the middle. With this PR, the amount of new regions created in test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure if it's worth testing that, as it would make updating the test difficult, and this is such an internal detail of emmalloc. Also add some verbose logging that helps debug this. Noticed in #20704 (comment)
If we give it an unaligned size, then it will align it, and then memory will seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it returns 0 (the start of the allocation), and logically we reserved the range 0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the address to multiples of 8. So it seems like we have a region 0-4 and another 8-. Because of this emmalloc would generate separate root regions for them, that is, they would not get merged because of the 4 bytes of dead space in the middle. With this PR, the amount of new regions created in test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure if it's worth testing that, as it would make updating the test difficult, and this is such an internal detail of emmalloc. Also add some verbose logging that helps debug this. Noticed in emscripten-core#20704 (comment)
Take the alignment into account when requesting new memory from the OS. If the alignment is high then we need to ask for enough to ensure we end up aligned, or else we can end up allocating double the memory we need (see emscripten-core#20645). This only changes the amount we allocate from the OS if the alignment is non- standard, that is, this is NFC for normal calls to malloc. Also remove an assertion that limited the maximum alignment. mimalloc wants 4 MB alignment. Fixes emscripten-core#20654
#20704 removed the maximum alignment limit from emmalloc, so we no longer need it in the mimalloc port. Without it, we can properly use the 4GB-aligned pages that mimalloc requests.
Take the alignment into account when requesting new memory from the OS. If the
alignment is high then we need to ask for enough to ensure we end up aligned, or
else we can end up allocating double the memory we need (see #20645).
This only changes the amount we allocate from the OS if the alignment is non-
standard, that is, this is NFC for normal calls to
malloc.Also remove an assertion that limited the maximum alignment. mimalloc wants
4 MB alignment.
Fixes #20654