-
-
Notifications
You must be signed in to change notification settings - Fork 14.1k
Document MaybeUninit bit validity #140463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
|
Cc @rust-lang/opsem |
library/core/src/mem/maybe_uninit.rs
Outdated
| /// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these | ||
| /// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with | ||
| /// uninitialized bytes in these positions. This is an active area of discussion, and this code | ||
| /// may become sound in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to say that a type "contains initialized bytes" at some offset. That's a property of a representation.
The typical term for representation bytes that are lost here is "padding". I don't think we have rigorously defined padding anywhere yet, but the term is sufficiently widely-used (and generally with a consistent meaning) that we may just be able to use it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, you're making two points:
- We should speak about a type's representation containing bytes, not about the type itself containing bytes
- In a representation, we should speak about padding bytes rather than uninitialized bytes
Is that right?
One thing that's probably worth distinguishing here is between values and layouts. In my mental model, an uninit byte is one of the possible values that a byte can have (e.g., it's the 257th value that can legally appear in a MaybeUninit<u8>). By contrast, padding is a property of a layout - namely, it's a sequence of bytes in a type's layout that happen to have the validity [MaybeUninit<u8>; PADDING_LEN].
Based on this, maybe it's best to say:
If byte offsets exists at which
T's representation does not permit uninitialized bytes butU's representation does (e.g. due to padding), then the bytes inTat these offsets may not be preserved inu, and sotransmute(u)may produce aTwith uninitialized bytes at these offsets. This is an active area of discussion, and this code may become sound in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that right?
No. I think both of the following concepts make sense:
- The representation of a particular value at a particular type contains uninitialized bytes.
- A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)
But it makes less sense to talk about padding of a representation, or to talk about uninitialized bytes in a type.
So for this PR, the two key points (and they are separate points) are:
- If
Uhas padding, those bytes may be reset to "uninitialized" as part of the round-trip. If those same bytes are not padding inT, this can therefore mean some of the information of the originalTvalue is lost. - If
Tdoes not permit uninitialized bytes on those positions, the round-trip is UB.
The second point is just a logical consequence of the first, it does not add any new information. Not sure if it is worth mentioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The representation of a particular value at a particular type contains uninitialized bytes.
- A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)
Does this imply that a type contains padding bytes, not a type's representation?
I'm thinking through the implications of what you said, and I think I understand something new that I didn't before, and I want to run it by you: In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid (enums complicate this model, but I don't think that complication is relevant for this discussion - we can just stick to thinking about structs). The problem with this mental model is that, interpreted naively, it implies that different byte values in a padding byte could correspond to different logical values of the type. So e.g. in the type #[repr(C)] struct T(u8, u16), [0, 0, 0, 0] and [0, 1, 0, 0] would correspond to different values of the type since we're treating the padding byte itself as part of the representation relation. Of course, that is not something we want.
IIUC, by contrast your model is that the representation relation simply doesn't include padding bytes at all. So it'd be more accurate to describe the representation of T as consisting of three bytes - at offsets 0, 2, and 3. Every representation of T has a "hole" at offset 1 which is not part of the representation. This ensures that there's a 1:1 mapping between logical values and representations. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply that a type contains padding bytes, not a type's representation?
That's how I think about it. We can't tell which byte is a padding byte by looking at one representation -- it's a property of the type.
In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid
That would make the only byte of MaybeUninit<u8> a padding byte, so I don't think this is the right definition.
That's why I said above: a padding byte is a byte that is ignored by the representation relation. Slightly more formally: if r is some representation valid for type T, and r' is equal to r everywhere except for padding bytes, then r and r' represent the same value.
So it'd be more accurate to describe the representation of T as consisting of three bytes
The representation has 4 bytes. But only 3 of them actually affect the represented value (which is a tuple of two [mathematical] integers).
We seem to be using the term "representation" slightly differently. For me, that's list a List<Byte> of appropriate length. You may be using that term to refer to what I call "representation relation"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be using the term "representation" slightly differently. For me, that's list a
List<Byte>of appropriate length. You may be using that term to refer to what I call "representation relation"?
That's helpful, thank you!
To avoid rabbit holing too much on the definitions (although it's interesting and useful – just maybe a bit of a distraction here), maybe you could propose language you'd prefer to see in place of what I've written here?
|
@rustbot ready |
|
@rustbot author |
|
Reminder, once the PR becomes ready for a review, use |
|
I raised the question on Zulip whether it is wise to make a guarantee here that isn't, strictly speaking, documented in the LLVM LangRef. Nikita says he thinks that that's fine -- we may have to adjust how exactly we compile MaybeUninit in the future, but LLVM currently intends do support this case in a somewhat roundabout and incomplete way that seems to work well enough in practice, and LLVM can't more aggressively exploit the fuzziness along the edges of that approximation until a proper alternative exists. |
|
Thanks for your contribution @joshlf from wg-triage. |
I likely won't have time to move this forward until mid-September or October, but I'll follow up at that point. |
This comment has been minimized.
This comment has been minimized.
Nvm, found some time 🙂 I've responded to various comment threads. |
|
Awesome, you should also rebase your changes onto master, and do |
This comment has been minimized.
This comment has been minimized.
|
@rustbot ready |
|
Checking in on checkboxes... cc @rust-lang/opsem |
|
🔔 This is now entering its final comment period, as per the review above. 🔔 |
|
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
Rollup merge of #140463 - joshlf:patch-13, r=RalfJung Document MaybeUninit bit validity Partially addresses rust-lang/unsafe-code-guidelines#555 by clarifying that it is sound to write any byte values (initialized or uninitialized) to any `MaybeUninit<T>` regardless of `T`. r? `@RalfJung`
Document MaybeUninit bit validity Partially addresses rust-lang/unsafe-code-guidelines#555 by clarifying that it is sound to write any byte values (initialized or uninitialized) to any `MaybeUninit<T>` regardless of `T`. r? `@RalfJung`
Rollup of 2 pull requests Successful merges: - rust-lang#140463 (Document MaybeUninit bit validity) - rust-lang#148017 (Add TidyFlags and merge DiagCtx) r? `@ghost` `@rustbot` modify labels: rollup
This MR contains the following updates: | Package | Update | Change | |---|---|---| | [rust](https://github.com/rust-lang/rust) | minor | `1.91.1` -> `1.92.0` | MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.** --- ### Release Notes <details> <summary>rust-lang/rust (rust)</summary> ### [`v1.92.0`](https://github.com/rust-lang/rust/blob/HEAD/RELEASES.md#Version-1920-2025-12-11) [Compare Source](rust-lang/rust@1.91.1...1.92.0) \========================== <a id="1.92.0-Language"></a> ## Language - [Document `MaybeUninit` representation and validity](rust-lang/rust#140463) - [Allow `&raw [mut | const]` for union field in safe code](rust-lang/rust#141469) - [Prefer item bounds of associated types over where-bounds for auto-traits and `Sized`](rust-lang/rust#144064) - [Do not materialize `X` in `[X; 0]` when `X` is unsizing a const](rust-lang/rust#145277) - [Support combining `#[track_caller]` and `#[no_mangle]` (requires every declaration specifying `#[track_caller]` as well)](rust-lang/rust#145724) - [Make never type lints `never_type_fallback_flowing_into_unsafe` and `dependency_on_unit_never_type_fallback` deny-by-default](rust-lang/rust#146167) - [Allow specifying multiple bounds for same associated item, except in trait objects](rust-lang/rust#146593) - [Slightly strengthen higher-ranked region handling in coherence](rust-lang/rust#146725) - [The `unused_must_use` lint no longer warns on `Result<(), Uninhabited>` (for instance, `Result<(), !>`), or `ControlFlow<Uninhabited, ()>`](rust-lang/rust#147382). This avoids having to check for an error that can never happen. <a id="1.92.0-Compiler"></a> ## Compiler - [Make `mips64el-unknown-linux-muslabi64` link dynamically](rust-lang/rust#146858) - [Remove current code for embedding command-line args in PDB](rust-lang/rust#147022) Command-line information is typically not needed by debugging tools, and the removed code was causing problems for incremental builds even on targets that don't use PDB debuginfo. <a id="1.92.0-Libraries"></a> ## Libraries - [Specialize `Iterator::eq{_by}` for `TrustedLen` iterators](rust-lang/rust#137122) - [Simplify `Extend` for tuples](rust-lang/rust#138799) - [Added details to `Debug` for `EncodeWide`](rust-lang/rust#140153). - [`iter::Repeat::last`](rust-lang/rust#147258) and [`count`](rust-lang/rust#146410) will now panic, rather than looping infinitely. <a id="1.92.0-Stabilized-APIs"></a> ## Stabilized APIs - [`NonZero<u{N}>::div_ceil`](https://doc.rust-lang.org/stable/std/num/struct.NonZero.html#method.div_ceil) - [`Location::file_as_c_str`](https://doc.rust-lang.org/stable/std/panic/struct.Location.html#method.file_as_c_str) - [`RwLockWriteGuard::downgrade`](https://doc.rust-lang.org/stable/std/sync/struct.RwLockWriteGuard.html#method.downgrade) - [`Box::new_zeroed`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html#method.new_zeroed) - [`Box::new_zeroed_slice`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html#method.new_zeroed_slice) - [`Rc::new_zeroed`](https://doc.rust-lang.org/stable/std/rc/struct.Rc.html#method.new_zeroed) - [`Rc::new_zeroed_slice`](https://doc.rust-lang.org/stable/std/rc/struct.Rc.html#method.new_zeroed_slice) - [`Arc::new_zeroed`](https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.new_zeroed) - [`Arc::new_zeroed_slice`](https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.new_zeroed_slice) - [`btree_map::Entry::insert_entry`](https://doc.rust-lang.org/stable/std/collections/btree_map/enum.Entry.html#method.insert_entry) - [`btree_map::VacantEntry::insert_entry`](https://doc.rust-lang.org/stable/std/collections/btree_map/struct.VacantEntry.html#method.insert_entry) - [`impl Extend<proc_macro::Group> for proc_macro::TokenStream`](https://doc.rust-lang.org/stable/proc_macro/struct.TokenStream.html#impl-Extend%3CGroup%3E-for-TokenStream) - [`impl Extend<proc_macro::Literal> for proc_macro::TokenStream`](https://doc.rust-lang.org/stable/proc_macro/struct.TokenStream.html#impl-Extend%3CLiteral%3E-for-TokenStream) - [`impl Extend<proc_macro::Punct> for proc_macro::TokenStream`](https://doc.rust-lang.org/stable/proc_macro/struct.TokenStream.html#impl-Extend%3CPunct%3E-for-TokenStream) - [`impl Extend<proc_macro::Ident> for proc_macro::TokenStream`](https://doc.rust-lang.org/stable/proc_macro/struct.TokenStream.html#impl-Extend%3CIdent%3E-for-TokenStream) These previously stable APIs are now stable in const contexts: - [`<[_]>::rotate_left`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rotate_left) - [`<[_]>::rotate_right`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rotate_right) <a id="1.92.0-Cargo"></a> ## Cargo - [Added a new chapter](rust-lang/cargo#16119) to the Cargo book, ["Optimizing Build Performance"](https://doc.rust-lang.org/stable/cargo/guide/build-performance.html). <a id="1.92.0-Rustdoc"></a> ## Rustdoc - [If a trait item appears in rustdoc search, hide the corresponding impl items](rust-lang/rust#145898). Previously a search for "last" would show both `Iterator::last` as well as impl methods like `std::vec::IntoIter::last`. Now these impl methods will be hidden, freeing up space for inherent methods like `BTreeSet::last`. - [Relax rules for identifiers in search](rust-lang/rust#147860). Previously you could only search for identifiers that were valid in rust code, now searches only need to be valid as part of an identifier. For example, you can now perform a search that starts with a digit. <a id="1.92.0-Compatibility-Notes"></a> ## Compatibility Notes - [Fix backtraces with `-C panic=abort` on Linux by generating unwind tables by default](rust-lang/rust#143613). Build with `-C force-unwind-tables=no` to keep omitting unwind tables. * As part of the larger effort refactoring compiler built-in attributes and their diagnostics, [the future-compatibility lint `invalid_macro_export_arguments` is upgraded to deny-by-default and will be reported in dependencies too.](rust-lang/rust#143857) * [Update the minimum external LLVM to 20](rust-lang/rust#145071) * [Prevent downstream `impl DerefMut for Pin<LocalType>`](rust-lang/rust#145608) * [Don't apply temporary lifetime extension rules to the arguments of non-extended `pin!` and formatting macros](rust-lang/rust#145838) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this MR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi40Ny4wIiwidXBkYXRlZEluVmVyIjoiNDIuNDcuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
Partially addresses rust-lang/unsafe-code-guidelines#555 by clarifying that it is sound to write any byte values (initialized or uninitialized) to any
MaybeUninit<T>regardless ofT.r? @RalfJung