Skip to content

[cwksp] Align all allocated "tables" and "aligneds" to 64 bytes#2546

Merged
senhuang42 merged 6 commits intofacebook:devfrom
senhuang42:wksp_alignment
Apr 2, 2021
Merged

[cwksp] Align all allocated "tables" and "aligneds" to 64 bytes#2546
senhuang42 merged 6 commits intofacebook:devfrom
senhuang42:wksp_alignment

Conversation

@senhuang42
Copy link
Copy Markdown

The general strategy is:

  1. When we begin allocating aligneds or tables, move the start of that section so that it is 64-byte aligned, done through ZSTD_cwksp_internal_advance_phase().
  2. Guarantee that the size of every table allocated in the aligned or tables section are all 64-bytes, therefore all of them will be 64-byte aligned. This is done with ZSTD_cwksp_aligned_alloc_size() - however, the tables don't actually need this since they're already guaranteed to be in multiples of 64 (we additionally static assert this to be true).
  3. Since we want our wksp estimation to still be accurate, we also will always reserve 64+64 bytes for the purposes of aligning the beginnings of the tables and aligned. The comments in ZSTD_cwksp_slack_space_required() explain this mechanism.

The two public functions added: ZSTD_cwksp_slack_space_required() and ZSTD_cwksp_finalize() are meant to be generic - the former returning all space required for internal purposes, and the latter handling any final things to take care of in the wksp after we're done allocating the buffers. In both cases we currently just deal with alignment.

Test Plan:

  • Unit tests, manual spot checking of 64-byte alignment (and asserts that guarantee them)

size_t const slackSpace = ZSTD_cwksp_slack_space_required();

/* tables are guaranteed to be sized in multiples of 64 */
ZSTD_STATIC_ASSERT(ZSTD_HASHLOG_MIN >= 6 && ZSTD_WINDOWLOG_MIN >= 6 && ZSTD_CHAINLOG_MIN >= 6);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically you should be checking hash & chain logs against 4, since they allocate U32's

*/
assert(ZSTD_cwksp_used(ws) >= neededSpace &&
ZSTD_cwksp_used(ws) <= neededSpace + 3);
assert(ZSTD_cwksp_used(ws) >= neededSpace && ZSTD_cwksp_used(ws) <= neededSpace + 3);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 3 extra bytes comment still true?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! I guess we can be exactly precise now.

if (ws->tableExtraAlignmentBytes == ZSTD_CWKSP_ALIGNMENT_BYTES) {
/* Due to ASAN fixed-size allocation size increase, we must always
perform two allocations that sum to 64 bytes. */
ZSTD_cwksp_reserve_internal(ws, ws->tableExtraAlignmentBytes/2, ZSTD_cwksp_alloc_tables);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be cleaned up by allocating a 0-sized object when bytesToAlign == 0?

Comment on lines +268 to +269
size_t bytesToAlign = ZSTD_CWKSP_ALIGNMENT_BYTES - ZSTD_cwksp_bytes_to_align_ptr(ws->allocStart, ZSTD_CWKSP_ALIGNMENT_BYTES);
if (bytesToAlign == ZSTD_CWKSP_ALIGNMENT_BYTES) bytesToAlign = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This would be more simply represented as (uintptr_t)ws->allocStart & ZSTD_CWSKP_ALIGNMENT_MASK.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! though iirc uintptr_t isn't portable enough for the CI tests.

Comment on lines +273 to +286
#if ZSTD_ADDRESS_SANITIZER && !defined (ZSTD_ASAN_DONT_POISON_WORKSPACE)
alloc = (BYTE*)alloc - 2 * ZSTD_CWKSP_ASAN_REDZONE_SIZE;
#endif
if (alloc < ws->tableValidEnd) {
ws->tableValidEnd = alloc;
}
ws->allocStart = alloc;
#if ZSTD_ADDRESS_SANITIZER && !defined (ZSTD_ASAN_DONT_POISON_WORKSPACE)
alloc = (BYTE *)alloc + ZSTD_CWKSP_ASAN_REDZONE_SIZE;
if (ws->isStatic == ZSTD_cwksp_dynamic_alloc) {
__asan_unpoison_memory_region(alloc, bytesToAlign);
}
#endif
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need thee ASAN stuff here? We are never actually going to use this memory, so we shouldn't need to unpoison it, or reserve extra redzone space, right?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it'd be nice for consistencies sake, though yeah, we aren't really supposed to access this memory - removing this will make the code a lot nicer, and we can refactor it a bit to share the common code.

@senhuang42 senhuang42 force-pushed the wksp_alignment branch 3 times, most recently from a2366f7 to f4c34f1 Compare March 19, 2021 17:31
Copy link
Copy Markdown
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but @felixhandte can you take a look?

Copy link
Copy Markdown
Contributor

@felixhandte felixhandte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as I understand it, this makes things:

[objects][padding][tables->][padding] ... gap? ... [padding][<-aligned][padding][buffers]

However, if everything in tables and aligned is a multiple of 64 bytes, which you enforce AFAICT, then you're guaranteed to not need any padding between them. So you could remove 64 bytes of overhead and just do:

[objects][padding1][tables->] ... gap? ... [<-aligned][padding2][buffers]

And padding1 + padding2 == 64.

@senhuang42
Copy link
Copy Markdown
Author

senhuang42 commented Mar 24, 2021

[objects][padding1][tables->] ... gap? ... [<-aligned][padding2][buffers]

And padding1 + padding2 == 64.

@felixhandte so as discussed offline, I've changed this to just be padding in front of the tables and aligneds, rather than in front and after. This does widen our estimation error range though, in cases where we reuse a context (or if it's static) and it's too large, i.e. there is a gap between the tables and aligned that we can't fully accurately account for (is there a way to detect cwksp re-use when it hadn't been resized?)

Copy link
Copy Markdown
Contributor

@felixhandte felixhandte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem necessary to enforce sequencing of aligned -> table allocations. You could apply both alignment allocations when you transition from buffers -> aligned. Maybe you should back that out?

Otherwise this looks good.

@senhuang42
Copy link
Copy Markdown
Author

It doesn't seem necessary to enforce sequencing of aligned -> table allocations. You could apply both alignment allocations when you transition from buffers -> aligned. Maybe you should back that out?

Sure thing - do we want to adjust the comment:

* The various types of objects must be allocated in order, so they can be
 * correctly packed into the workspace buffer. That order is:
 *
 * 1. Objects
 * 2. Buffers
 * 3. Aligned
 * 4. Tables
 *
 * Attempts to reserve objects of different types out of order will fail.

Or is there some other mechanism that depends on this order somewhere else? The code as-is should be allocate aligned/tables in whatever order (tables also have the "aligned" enum phase).

@terrelln
Copy link
Copy Markdown
Contributor

terrelln commented Apr 1, 2021

Ping @felixhandte

Copy link
Copy Markdown
Contributor

@felixhandte felixhandte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it might be nice to update the comment.

Then other thing we had talked about was applying the stricter used == needed assertion when the workspace is newly allocated. Not necessary though.

Looks good!

@senhuang42 senhuang42 merged commit 980f3bb into facebook:dev Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants