Skip to content

Arbitrary for arrays using just stabilized minimal const generics#282

Closed
regexident wants to merge 2 commits into
BurntSushi:masterfrom
regexident:arbitrary-array
Closed

Arbitrary for arrays using just stabilized minimal const generics#282
regexident wants to merge 2 commits into
BurntSushi:masterfrom
regexident:arbitrary-array

Conversation

@regexident
Copy link
Copy Markdown

No description provided.

Copy link
Copy Markdown
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I won't merge this until const generics have been stable for at least 2 releases, to give folks time to migrate to the new Rust release. Feel free to ping me when that time comes if I don't remember to merge it.

Also, could you please add a detailed comment for each unsafe block justifying why it is safe? It should be in this form:

// SAFETY: yadda yadda yadda

Also, I haven't scrutinized the code too carefully yet, but is unsafe required here? There is no safe way to create arrays in a generic context?

@regexident
Copy link
Copy Markdown
Author

I won't merge this until const generics have been stable for at least 2 releases, to give folks time to migrate to the new Rust release. Feel free to ping me when that time comes if I don't remember to merge it.

That's alright with me. :)

Also, could you please add a detailed comment for each unsafe block justifying why it is safe? It should be in this form:

// SAFETY: yadda yadda yadda

Done!

Currently there is no way of collecting an iterator into a fixed-size array, so we're forced to using MaybeUninit, unless we simply initialize an array with default values and then override those with the actual values.

The latter has two disadvantages, due to which I went with the former:

  • we would be initializing a possibly large number of values, which would be discarded immediately after
  • the type T of [T; N] might not actually be T: Default to begin with

Also, I haven't scrutinized the code too carefully yet, but is unsafe required here? There is no safe way to create arrays in a generic context?

I feel reasonably confident in the code, but you might want to give the unit test a check as I'm not 100% sure it actually does test what needs to be tested.

@jakoschiko
Copy link
Copy Markdown
Contributor

Would it be ok to use a single-purpose crate for the array initialization? array_init allows to initialize the array using a closure and also fixes the memory leak.

Copy link
Copy Markdown
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for using array-init, I'm actually a little torn. On the one hand, I'm not usually a fan of using small crates like that. On the other, this particular piece of code is a little bit tricky. However, our use case almost exactly matches the example given in std's docs.

I think my view is that this comes down to the leak-on-panic issue. Does quickcheck need to concern itself with that? If so, then using array-init absolutely seems worth it, since that gets quite a bit hairier. If not, though, then I think we can manage with just a couple lines of unsafe on our own.

So when it comes to the leak-on-panic, I think it would only be able to happen when calling T::arbitrary(). Almost certainly, a panic inside of an arbitrary() impl is a bug.

On the flip side, quickcheck does specifically catch panics and attempt to shrink the witnesses. So if there is a leak, it could build up. But, unless that leak is gratuitous, quickcheck is going to present some kind of witness as a failure, and ultimately, the panic should be fixed.

Do people buy my reasoning here?

Comment thread src/arbitrary.rs
// assignment instead of `ptr::write` does not cause the old
// uninitialized value to be dropped. Also if there is a panic during
// this loop, we have a memory leak, but there is no memory safety
// issue.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is a little confusing to me. It says "Thus using raw pointer assignment instead of ptr::write" and that makes me think that the code below is going to be using raw pointer assignment. But it in fact uses ptr::write.

It looks like the wording was copied from the example in std's docs, but the actual code here is using ptr::write instead of just *elem = MaybeUninit::new(T::arbitrary(g)). It seems like the latter would be preferrable?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/arbitrary.rs

// SAFETY: Everything is initialized (i.e. safe). Transmute the array to the initialized type.
// (We need to use `transmute_copy` here as `transmute` has some limitations around generic types)
unsafe { std::mem::transmute_copy::<_, [T; N]>(&maybe_uninit) }
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to spell out in more detail why transmute_copy is needed here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is described in this issue and the recommendation to use transmute_copy instead was taken from here.

Should I just add a link to the rust issue in the code comment?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least add a link, yes. But if we can't grok and regurgitate the specific reasoning for why it's safe in this case, then that makes me worried.

I don't have time to dig into this now unfortunately though.

@jakoschiko
Copy link
Copy Markdown
Contributor

So when it comes to the leak-on-panic, I think it would only be able to happen when calling T::arbitrary(). Almost certainly, a panic inside of an arbitrary() impl is a bug.

The documentation of Arbitrary::arbitrary doesn't say that a panic inside the impl is a bug. Maybe this should be added.

On the flip side, quickcheck does specifically catch panics and attempt to shrink the witnesses. So if there is a leak, it could build up. But, unless that leak is gratuitous, quickcheck is going to present some kind of witness as a failure, and ultimately, the panic should be fixed.

If T::arbitrary panics, there is no value. How can quickcheck catch the panic and shrink the witness? Does quickcheck call T::arbitrary again in this case?

@BurntSushi
Copy link
Copy Markdown
Owner

If T::arbitrary panics, there is no value. How can quickcheck catch the panic and shrink the witness? Does quickcheck call T::arbitrary again in this case?

Oh good point. it would just tear down the entire testing infrastructure. Which probably makes "leaks may happen on panic" more tolerable.

@regexident
Copy link
Copy Markdown
Author

@BurntSushi is there anything I can do to get this merged, or should I close it?

@regexident regexident closed this by deleting the head repository Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants