[cwksp] Align all allocated "tables" and "aligneds" to 64 bytes#2546
senhuang42 merged 6 commits into facebook:dev
lib/compress/zstd_compress.c
Outdated
size_t const slackSpace = ZSTD_cwksp_slack_space_required();

/* tables are guaranteed to be sized in multiples of 64 */
ZSTD_STATIC_ASSERT(ZSTD_HASHLOG_MIN >= 6 && ZSTD_WINDOWLOG_MIN >= 6 && ZSTD_CHAINLOG_MIN >= 6);
Technically you should be checking hash & chain logs against 4, since they allocate U32's
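The arithmetic behind this point can be sketched as follows: a table of 2^log entries of 4-byte U32 values occupies 4 << log bytes, which is already a multiple of 64 once log reaches 4. The helper names `table_bytes` and `is_multiple_of_64` are hypothetical and exist only for this illustration; they are not part of zstd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte size of a table holding 2^log 4-byte entries. */
static size_t table_bytes(unsigned log) {
    assert(log < 32);  /* keep the shift well-defined in this sketch */
    return ((size_t)1 << log) * sizeof(uint32_t);
}

/* Hypothetical helper: nonzero when n is a whole number of 64-byte units. */
static int is_multiple_of_64(size_t n) {
    return (n & 63) == 0;
}
```

Under these assumptions, `table_bytes(4)` is 64, so a minimum log of 4 would already be enough for U32 tables; a minimum of 6 is stricter than needed but still safe.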
lib/compress/zstd_compress.c
Outdated
 */
assert(ZSTD_cwksp_used(ws) >= neededSpace &&
       ZSTD_cwksp_used(ws) <= neededSpace + 3);
assert(ZSTD_cwksp_used(ws) >= neededSpace && ZSTD_cwksp_used(ws) <= neededSpace + 3);
Is this 3 extra bytes comment still true?
Nope! I guess we can be exactly precise now.
lib/compress/zstd_cwksp.h
Outdated
if (ws->tableExtraAlignmentBytes == ZSTD_CWKSP_ALIGNMENT_BYTES) {
    /* Due to ASAN fixed-size allocation size increase, we must always
       perform two allocations that sum to 64 bytes. */
    ZSTD_cwksp_reserve_internal(ws, ws->tableExtraAlignmentBytes/2, ZSTD_cwksp_alloc_tables);
Could this be cleaned up by allocating a 0-sized object when bytesToAlign == 0?
lib/compress/zstd_cwksp.h
Outdated
size_t bytesToAlign = ZSTD_CWKSP_ALIGNMENT_BYTES - ZSTD_cwksp_bytes_to_align_ptr(ws->allocStart, ZSTD_CWKSP_ALIGNMENT_BYTES);
if (bytesToAlign == ZSTD_CWKSP_ALIGNMENT_BYTES) bytesToAlign = 0;
nit: This would be more simply represented as (uintptr_t)ws->allocStart & ZSTD_CWKSP_ALIGNMENT_MASK.
Nice! though iirc uintptr_t isn't portable enough for the CI tests.
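A minimal sketch of the mask idea, assuming a power-of-two alignment of 64. The macro and function names below are illustrative, not zstd's. The helpers take the address as a `size_t` to sidestep the `uintptr_t` concern raised above; whether a pointer-to-`size_t` cast is acceptable in this codebase is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT_BYTES ((size_t)64)            /* illustrative; must be a power of two */
#define ALIGNMENT_MASK  (ALIGNMENT_BYTES - 1)

/* Bytes by which addr sits past the previous 64-byte boundary (0 if aligned).
 * Subtracting this from a downward-growing pointer aligns it. */
static size_t bytes_past_boundary(size_t addr) {
    assert((ALIGNMENT_BYTES & ALIGNMENT_MASK) == 0);  /* power-of-two check */
    return addr & ALIGNMENT_MASK;
}

/* Bytes needed to reach the next 64-byte boundary (0 if aligned).
 * The final mask folds the "== 64 means 0" special case into one expression. */
static size_t bytes_to_next_boundary(size_t addr) {
    assert((ALIGNMENT_BYTES & ALIGNMENT_MASK) == 0);
    return (ALIGNMENT_BYTES - (addr & ALIGNMENT_MASK)) & ALIGNMENT_MASK;
}
```

For an address of 130, the first helper yields 2 and the second yields 62; for an already aligned address such as 128, both yield 0.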
lib/compress/zstd_cwksp.h
Outdated
#if ZSTD_ADDRESS_SANITIZER && !defined (ZSTD_ASAN_DONT_POISON_WORKSPACE)
    alloc = (BYTE*)alloc - 2 * ZSTD_CWKSP_ASAN_REDZONE_SIZE;
#endif
    if (alloc < ws->tableValidEnd) {
        ws->tableValidEnd = alloc;
    }
    ws->allocStart = alloc;
#if ZSTD_ADDRESS_SANITIZER && !defined (ZSTD_ASAN_DONT_POISON_WORKSPACE)
    alloc = (BYTE *)alloc + ZSTD_CWKSP_ASAN_REDZONE_SIZE;
    if (ws->isStatic == ZSTD_cwksp_dynamic_alloc) {
        __asan_unpoison_memory_region(alloc, bytesToAlign);
    }
#endif
}
Why do we need the ASAN stuff here? We are never actually going to use this memory, so we shouldn't need to unpoison it, or reserve extra redzone space, right?
I thought it'd be nice for consistency's sake, though yeah, we aren't really supposed to access this memory - removing this will make the code a lot nicer, and we can refactor it a bit to share the common code.
a2366f7 to f4c34f1
terrelln left a comment
This looks good to me, but @felixhandte can you take a look?
felixhandte left a comment
So, as I understand it, this makes things:
[objects][padding][tables->][padding] ... gap? ... [padding][<-aligned][padding][buffers]
However, if everything in tables and aligned is a multiple of 64 bytes, which you enforce AFAICT, then you're guaranteed to not need any padding between them. So you could remove 64 bytes of overhead and just do:
[objects][padding1][tables->] ... gap? ... [<-aligned][padding2][buffers]
And padding1 + padding2 == 64.
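One way to check this arithmetic is sketched below, under the assumption that the workspace carries exactly 64 bytes of slack and that the tables and aligned sections together are a multiple of 64 bytes. All names and addresses are illustrative, not taken from zstd.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN ((size_t)64)

/* padding1: bytes from the end of the objects up to the next 64-byte boundary. */
static size_t padding_front(size_t objects_end) {
    return (ALIGN - (objects_end & (ALIGN - 1))) & (ALIGN - 1);
}

/* padding2: bytes from the previous 64-byte boundary up to the start of buffers. */
static size_t padding_back(size_t buffers_start) {
    return buffers_start & (ALIGN - 1);
}

/* Space left for tables + aligned once both paddings are taken out. */
static size_t aligned_region_size(size_t objects_end, size_t buffers_start) {
    assert(objects_end <= buffers_start);
    return buffers_start - objects_end
         - padding_front(objects_end) - padding_back(buffers_start);
}
```

With objects ending at address 1000 and buffers starting at 1320 (that is, 1000 + 64 bytes of slack + 256 bytes of tables and aligned), `padding_front` is 24, `padding_back` is 40, their sum is 64, and the region between them is exactly 256 bytes.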
@felixhandte so as discussed offline, I've changed this to just be padding in front of the tables and aligneds, rather than in front and after. This does widen our estimation error range though, in cases where we reuse a context (or if it's static) and it's too large, i.e. there is a gap between the tables and aligned that we can't fully accurately account for (is there a way to detect cwksp re-use when it hadn't been resized?)
b9b64dd to a2274cf
…ce as needed for tables/aligned
felixhandte left a comment
It doesn't seem necessary to enforce sequencing of aligned -> table allocations. You could apply both alignment allocations when you transition from buffers -> aligned. Maybe you should back that out?
Otherwise this looks good.
6c52be2 to 78a7481
Sure thing - do we want to adjust the comment: Or is there some other mechanism that depends on this order somewhere else? The code as-is should be able to allocate aligned/tables in whatever order (tables also have the "…
Ping @felixhandte
felixhandte left a comment
Yeah it might be nice to update the comment.
The other thing we had talked about was applying the stricter used == needed assertion when the workspace is newly allocated. Not necessary though.
Looks good!
9d822a3 to 22a6f9d
The general strategy is:
… ZSTD_cwksp_internal_advance_phase().
… ZSTD_cwksp_aligned_alloc_size() - however, the tables don't actually need this since they're already guaranteed to be in multiples of 64 (we additionally static assert this to be true).
… ZSTD_cwksp_slack_space_required() explain this mechanism.
The two public functions added: ZSTD_cwksp_slack_space_required() and ZSTD_cwksp_finalize() are meant to be generic - the former returning all space required for internal purposes, and the latter handling any final things to take care of in the wksp after we're done allocating the buffers. In both cases we currently just deal with alignment.
Test Plan: