Skip to content

Split rmc::nondet into safe and unsafe functions(#607)#657

Merged
celinval merged 11 commits intomodel-checking:mainfrom
celinval:issue-607
Jan 10, 2022
Merged

Split rmc::nondet into safe and unsafe functions(#607)#657
celinval merged 11 commits intomodel-checking:mainfrom
celinval:issue-607

Conversation

@celinval
Copy link
Copy Markdown
Contributor

@celinval celinval commented Nov 19, 2021

Description of changes:

The return value of rmc::nondet doesn't always respect the invariant of a type which could be confusing for the user. To fix the issue, we decided to provide two functions that create symbolic variables and one new trait that allows user to specify invariant of a type. The rmc::any::<T>() function generates a value that must satisfy the type invariant (specified by the new rmc::Invariant trait), while the unsafe function, rmc::any_raw::<T>() one generates an unconstrained return value which maps to any possible bit value assignment.

Resolved issues:

Resolves #607

Call-outs:

  1. I kept the rmc::nondet::<T>() for now so I can finish updating the tests in a follow up PR.
  2. I am still adding tests to the implementations of the Invariant. I'll update the PR once that it's done.

Testing:

  • How is this change tested? I modified all occurrences of the function in our tests.

  • Is this a refactor change? Yes

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@celinval celinval requested a review from a team as a code owner November 19, 2021 05:59
Copy link
Copy Markdown
Contributor

@camshaft camshaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this is the correct approach.

@celinval
Copy link
Copy Markdown
Contributor Author

A few folks have brought up some concerns with the message unsafe {} statement actually provides. Since for some cases RMC does provide assurance that the value is valid.

Instead we decided to create two different functions, one that is safe and it should only represent valid values and another one that we will mark with unsafe that represents all possible bit value for the memory occupied by the object.

@camshaft
Copy link
Copy Markdown
Contributor

Instead we decided to create two different functions, one that is safe and it should only represent valid values and another one that we will mark with unsafe that represents all possible bit value for the memory occupied by the object.

Is there going to be a marker trait for the safe function? If so that should be unsafe to implement (similar to Send and Sync).

The return value of rmc::nondet doesn't respect the invariant of a type.
Until we have a better way to create symbolic variables, users must
enforce the type invariant in their proof harness before using the
value returned.
We decided to replace the old method rmc::nondet because its behavior
wasn't clear. For some types it would respect the type invariants but
not for others. Additionally, there was no way for the user to specify
how to create safe non-deterministic values.
@celinval celinval force-pushed the issue-607 branch 2 times, most recently from c9f3360 to 620c80e Compare January 4, 2022 03:26
@celinval celinval changed the title Mark rmc::nondet as unsafe (#607) Split rmc::nondet into two functions, one that respect invariants and one that doesn't (#607) Jan 4, 2022
@celinval celinval changed the title Split rmc::nondet into two functions, one that respect invariants and one that doesn't (#607) Split rmc::nondet into two functions, one that respect type invariant and one that doesn't (#607) Jan 4, 2022
Users will be able to control how RMC generate symbolic values to their
variables.

- `rmc::any::<T>()` will provide a safe way to generate values
that respect any invariant specified to type T (Given by the
implementation of Invariant type.)

- `rmc::any_raw::<T>()` will provide any bitvalue possible for the
memory layout of the type. This is an unsafe method and users must
ensure correctness after this call.

Note: I still need to add tests to the implementations of Invariant we
provide.
@celinval celinval changed the title Split rmc::nondet into two functions, one that respect type invariant and one that doesn't (#607) Split rmc::nondet into safe and unsafe functions(#607) Jan 4, 2022
@celinval celinval requested a review from zhassan-aws January 4, 2022 18:19
@@ -1 +1 @@
** 0 of 1 failed (1 iterations)
** 0 of 2 failed (1 iterations)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is going on here? This was supposed to have a single assertion 😆

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know... we just can't keep this going, eh? We really need that report filter. This is due to a compiler sanity check. I could just use any_raw here to keep the 1 assertion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, no need to change for any_raw in my opinion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad :( The purpose of this test was to make sure we don't get all the noisy successes for builtin functions, e.g.:

<builtin-library-calloc> function calloc
[calloc.assertion.1] assertion false: SUCCESS

<builtin-library-ceil> function ceil
[ceil.assertion.1] assertion false: SUCCESS

<builtin-library-ceilf> function ceilf
[ceilf.assertion.1] assertion false: SUCCESS

<builtin-library-copysign> function copysign
[copysign.assertion.1] assertion false: SUCCESS
...

which we got rid of with --drop-unused-functions. So to make the test more robust, we need to check if the output does not include those lines. Is there a way to do that with our testing infrastructure?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present, there is no way to do that. Not that it would be hard to, but I would wait until we get structured output to implement this kind of checks. The test is still ensuring there are only a couple of successful assertions so I would leave as is.

{
#[inline(always)]
fn is_valid(&self) -> bool {
match self {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the generated discriminant/tag is invalid and the match would be undefined behavior? Or does the RMC backend always generate valid enum tags?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the implementation to avoid the issue I believe you are talking about.

Copy link
Copy Markdown
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! What is unclear to me is my comment on the removal of the assumptions module.

unsafe impl Invariant for u8 {
#[inline(always)]
fn is_valid(&self) -> bool {
true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this the default implementation in the trait definition to reduce boiler-plate code, or is that risky? If we do that, many of impl's would just be:

unsafe impl Invariant for u8 { }

unsafe impl Invariant for u16 { }

...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll generate these with a macro. I don't know if we should provide a default implementation that returns true. Is the macro option OK?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A macro is fine.

fn new() -> Self {
let arr: [T; MAX_SLICE_LENGTH] = nondet();
let slice_len: usize = nondet();
let arr: [T; MAX_SLICE_LENGTH] = unsafe { any_raw() };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to implement any for arrays, e.g.:

unsafe impl<T, const LEN: usize> Invariant for [T; LEN]
where
    T: Invariant,
{
    #[inline(always)]
    fn is_valid(&self) -> bool {
        let valid = true;
        for i in *self {
            valid &= i.is_valid();
        }
        valid
    }
}

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I am adding a variant of that. Thanks!

false
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumptions module handled a lot more types than the ones implemented here, e.g. enums. Do we lose assumptions for those other types?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do. For enums and structs, @danielsn and I were thinking about adding a custom derive: #726

For other cases, I would prefer being safe. I don't think we should infer anything based on the type of fields without any sort of user intervention.

Copy link
Copy Markdown
Contributor

@zhassan-aws zhassan-aws Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents: it would be inconvenient if the user needs to annotate all the enums in their code with #[derive(Invariant)]. Automatically handling that during codegen is more user friendly (assuming this was done in the assumptions module).

}
}
rmc::nondet()
if rmc::any() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change maintaining the old behavior?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understood... any insight?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to replace this with just:

rmc::any()

we would have to add an Invariant trait bound to T, is that correct? Not sure which option is better (assuming what I mentioned is correct).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. This does remove some constraints from the proof. That said, the proof still succeeds, so I don't think there is any harm done.

I'll add the Invariant and use rmc::any since that is the prefeed method.

///
/// Implementations of Invariant traits must ensure that the current bit values of the given type
/// is valid and that all its invariants hold.
pub unsafe trait Invariant {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this trait itself need to be unsafe? I agree that it looks like it should be, but is there a way to do this without that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a suggestion from @camshaft which I thought made a lot of sense. So I would prefer keeping it.

Is there going to be a marker trait for the safe function? If so that should be unsafe to implement (similar to Send and Sync).

Copy link
Copy Markdown
Contributor

@camshaft camshaft Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally thinking it would be something like:

/// Types that implement this can be generated without any constraints or invariants
pub unsafe trait NondetTransmute { }

pub fn any<T: NondetTransmute>() -> T {
    unsafe { any_raw() }
}

You would then implement that for the primitives.

I'm trying to decide if the unsafe makes sense in its current form, though... My concern is the &self argument is completely unvalidated and without knowing the RMC compiler internals, may result in UB when using it.

Is there a reason the trait method doesn't return an owned value instead? It would make the trait safe to implement and each implementation may or may not need to use unsafe to generate a value:

pub trait Arbitrary {
    fn any() -> Self;
}

impl Arbitrary for u8 {
    fn any() -> Self {
        unsafe { any_raw() }
    }
}

impl Arbitrary for char {
    fn any() -> Self {
        let v = unsafe { any_raw() };
        let v = char::from_u32(v);
        assume(v.is_ok());
        v.unwrap()
    }
}

struct MyThing {
    field: u8
}

impl Arbitrary for MyThing {
    fn any() - Self {
        Self { field: u8::any() }
    }
}

Copy link
Copy Markdown
Contributor

@camshaft camshaft Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another concern is: in its current form, it's very easy to miss validating a field when implementing the trait:

struct MyThing {
    field: char,
    other: char,
}

impl Invariant for MyThing {
    fn is_valid(&self) -> bool {
        self.field.is_valid() // missing `other`!
    }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. This is another possibility which uses a builder pattern. I thought the Invariant approach was a bit cleaner and easier to use. It also adds a method that can be used in assert statements which I thought was nice. I do agree that it has the downside of having a potential invalid Self and I think the unsafe impl is proper in this case.

I think they are somewhat equivalent though. For an arbitrary enum, it would go from:

unsafe impl Invariant for MyEnum {
    #[inline(always)]
    fn is_valid(&self) -> bool {
        matches!(*self, MyEnum::V1 | MyEnum::V2 | MyEnum::V3)
    }
}

to

impl Arbitrary for MyEnum {
    #[inline(always)]
    fn any() -> Self {
        match u8::any() {
            0 => MyEnum::V1,
            1 => MyEnum::V2,
            _ => MyEnum::V3,
        }
    }
}

Maybe we should check which one performs better or even offer both.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the concern, and I debated between the two options for a long time. Maybe we can have a quick sync tomorrow to discuss the options.

Another possibility is to add both methods to the type with default implementations. Something like:

pub unsafe trait Arbitrary {
    fn is_valid(&self) -> bool { true }
    fn any() -> Self {
        let s = unsafe { rmc::any_raw::<Self>() };
        rmc::assume(s.is_valid());
        s
    } 
}

pub fn any<T: Arbritary>() -> T {
    T::any()
}

Users can pick and choose which one to implement / use.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm good to sync up whenever works for you!

Comment on lines 8 to 9
// TODO: This test should ideally work with nondeterminstic values but for
// for the moment it does not.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still true?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know.

let table: [[u64; 1]; 1] = [[rmc::nondet::<u64>()]];
table[0][rmc::nondet::<usize>()]; // EXPECTED FAIL
let table: [[u64; 1]; 1] = [[rmc::any::<u64>()]];
table[0][rmc::any::<usize>()]; // EXPCECTED FAIL
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the turbofish needed here?

Comment on lines 5 to 14
let input: &[u8] = &vec![
rmc::nondet(),
rmc::nondet(),
rmc::nondet(),
rmc::nondet(),
rmc::nondet(),
rmc::nondet(),
rmc::nondet(),
rmc::nondet(),
rmc::any(),
rmc::any(),
rmc::any(),
rmc::any(),
rmc::any(),
rmc::any(),
rmc::any(),
rmc::any(),
];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use any_slice

// Should return a nondet string of up to n characters
// Currently RMC does not support strings
fn rmc::nondet_string(n: u32) -> String {
fn rmc::any_string(n: u32) -> String {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this belong in the prelude?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently support generating strings. We should consider doing something similar to what @zhassan-aws did for slices. But I think this is out of the scope of this PR.

// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0 OR MIT

/// This module introduces the Invariant trait and as well as implementation for commonly used types.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any standard types missing here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the list of primitive types here, the types missing are:

  • never
  • fn
  • pointer
  • reference
  • slice
  • str
  • tuple
  • unit

Except for tuple, I don't see a use case for invariant. The tuple is a bit more painful to generate since rust doesn't have variadic templates.

In the next revision, I will include array which I was missing.

The standard library have a lot of other types that we are not implementing yet. Things with pointers and references are a bit trickier to implement. I think having a builder pattern for them will make more sense.

celinval and others added 3 commits January 4, 2022 17:31
Some typos

Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>
Co-authored-by: Adrian Palacios <73246657+adpaco-aws@users.noreply.github.com>
rmc::expect_fail(v2.get() != 0, "Any raw may generate invalid value.");

let option = Some(v2);
rmc::expect_fail(option.is_some(), "Yep. This can happen due to niche optimization");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising!

@celinval
Copy link
Copy Markdown
Contributor Author

celinval commented Jan 7, 2022

I created the following issues to capture the mitigation tasks for the drawback of the current approach:
#726: Create custom derive attribute.
#735: Validate all fields and enum variants are checked in the Invariant implementation.

Conflicts:
    - src/rmc-compiler/rustc_codegen_rmc/src/codegen/assumptions.rs
@celinval
Copy link
Copy Markdown
Contributor Author

celinval commented Jan 7, 2022

I added the Arbitrary trait with an implementation for the Invariant trait per @camshaft and @danielsn suggestion. Thanks!

Copy link
Copy Markdown
Contributor

@camshaft camshaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this turned out! nice work!

Copy link
Copy Markdown
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@celinval celinval merged commit 176a9c6 into model-checking:main Jan 10, 2022
tedinski pushed a commit to tedinski/rmc that referenced this pull request Apr 26, 2022
…odel-checking#657)

* Introduce rmc::any and rmc::any_raw and deprecate rmc::nondet

We decided to replace the old method rmc::nondet because its behavior
wasn't clear. For some types it would respect the type invariants but
not for others. Additionally, there was no way for the user to specify
how to create safe non-deterministic values.

Users will be able to control how RMC generate symbolic values to their
variables.

- `rmc::any::<T>()` will provide a safe way to generate values
that respect any invariant specified to type T (Given by the
implementation of Arbitrary trait.)

- `rmc::any_raw::<T>()` will provide any possible bit value for the
memory layout of the type. This is an unsafe method and users must
ensure correctness after this call.

Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>
Co-authored-by: Adrian Palacios <73246657+adpaco-aws@users.noreply.github.com>
tedinski pushed a commit that referenced this pull request Apr 27, 2022
* Introduce rmc::any and rmc::any_raw and deprecate rmc::nondet

We decided to replace the old method rmc::nondet because its behavior
wasn't clear. For some types it would respect the type invariants but
not for others. Additionally, there was no way for the user to specify
how to create safe non-deterministic values.

Users will be able to control how RMC generate symbolic values to their
variables.

- `rmc::any::<T>()` will provide a safe way to generate values
that respect any invariant specified to type T (Given by the
implementation of Arbitrary trait.)

- `rmc::any_raw::<T>()` will provide any possible bit value for the
memory layout of the type. This is an unsafe method and users must
ensure correctness after this call.

Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>
Co-authored-by: Adrian Palacios <73246657+adpaco-aws@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide safe and unsafe functions to generate symbolic values

5 participants