sumtype: remove forced @system for opAssign preventing @safe code #8146

Closed
ljmf00 wants to merge 1 commit into dlang:master from ljmf00:remove-stupid-check

@ljmf00
Member

@ljmf00 ljmf00 commented Jun 18, 2021

Fixes #22041 .

The D programming language is moving towards @safe code, and introducing new
@system code makes std.sumtype fundamentally useless under @safe code.
Forcing @system whenever users store pointers is just wrong.

The simple example to prove it is the following:

void main() @safe
{
    int* thisIsNotUnsafe = new int;
}

Using pointers in D is not necessarily @system.

Moreover, this check prevents the following code from compiling in @safe:

alias FooBar = SumType!string;
auto foobar = FooBar("foo");
foobar = FooBar("bar");

Signed-off-by: Luís Ferreira <contact@lsferreira.net>

@dlang-bot
Contributor

dlang-bot commented Jun 18, 2021

Thanks for your pull request and interest in making D better, @ljmf00! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
22041 normal SumType assignments should work with pointers in @safe code

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8146"

@ljmf00
Member Author

ljmf00 commented Jun 18, 2021

Sorry, I wrongly categorized the issue importance. The label should be updated.

@ljmf00 ljmf00 force-pushed the remove-stupid-check branch 2 times, most recently from 197a55b to 97f2adf Compare June 18, 2021 19:55
@ljmf00 ljmf00 marked this pull request as ready for review June 18, 2021 20:13
@ljmf00 ljmf00 requested a review from pbackus as a code owner June 18, 2021 20:13
@dnadlinger
Contributor

The rationale for this behaviour is described in the comments that your PR removes. If I have a int*, float sum type that stores a pointer, get a reference to that, and then store a float, how does that not lead to corruption?

@ljmf00
Member Author

ljmf00 commented Jun 18, 2021

The rationale for this behaviour is described in the comments that your PR removes. If I have a int*, float sum type that stores a pointer, get a reference to that, and then store a float, how does that not lead to corruption?

Firstly, unless you are doing pretty weird stuff or breaking the language with undefined behaviour, there's no reason to write cast(void) () @system {}(); to force @system. If you are doing normal code and need to write this stuff, then, there's something wrong with the compiler or the specification. I could be wrong about whether SumType is doing strange things here, but from what I analyzed, it's not.

The whole point of SumType is to match the right type, if you assign a float to the SumType, it will change the tid internally and match the correct one.

You should trust the compiler and respect the language's definition of @safe unless you are just afraid of unsafe code and think that every pointer is unsafe. According to the language specification:

19.24.5. Safe Values:
[...]
2. A pointer is a safe value when it is one of:

  • null
  • it points to a memory object that is live and the pointed to value in that memory object is safe.

When allocating with new, you know that the memory you are pointing to is live, because it relies on a garbage collector implementation and the language ensures that.

Particularly, SumType is using an union to share storage. Looking again into the specification:

28.3. Pointers and the Garbage Collector:
[...]
4. Things that are reliable and can be done:

  • Use a union to share storage with a pointer
    union U { void* ptr; int value }

So I don't see any problem here rather than a restricted idea of @safe code.

@dnadlinger
Contributor

Firstly, unless you are doing pretty weird stuff or breaking the language with undefined behaviour, there's no reason to write cast(void) () @system {}(); to force @system. If you are doing normal code and need to write this stuff, then, there's something wrong with the compiler or the specification.

get() is @trusted. Wanting to be able to mark it as such might have implications.

You should trust the compiler and respect the language's definition of @safe unless you are just afraid of unsafe code and think that every pointer is unsafe.

See above. Watch your tone, too.

Particularly, SumType is using an union to share storage. Looking again into the specification: […]

That section is specifically about compatibility with the garbage collector and not relevant to the discussion here. All it says is that the GC doesn't miss pointers when stored in unions.

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

get() is @trusted. Wanting to be able to mark it as such might have implications.

get() is @trusted because it is covered by the assert() before any possible memory corruption; otherwise it is @system. Also, memory corruption only happens if we are overlapping memory with pointers. In this case, SumType has checks with the tid mechanism. Moreover, get() is private. It's another discussion to have: some interface like get() should be public, with properly documented behaviour, for people to use in performance-critical cases.

Again, you should trust the language rather than forcing the @system attribute in this case. If there's any possibility of memory corruption with it, please come up with an example; I would be happy to discuss it.

You should trust the compiler and respect the language's definition of @safe unless you are just afraid of unsafe code and think that every pointer is unsafe.

See above. Watch your tone, too.

Sorry if I was aggressive, it is not my intention to offend, and if I've done it, sincere apologies.

Particularly, SumType is using an union to share storage. Looking again into the specification: […]

That section is specifically about compatibility with the garbage collector and not relevant to the discussion here. All it says is that the GC doesn't miss pointers when stored in unions.

Yes, I just related that with allocating with new. SumType internally uses a union that's why I came up with it to confirm that the GC can handle this situation and using pointers is safe.

@dnadlinger
Contributor

get() is @trusted because it is covered by the assert() before any possible memory corruption; otherwise it is @system. […] Again, you should trust the language rather than forcing the @system attribute in this case.

The point is that with one @trusted function handling that piece of data, you are already telling the language not to check your work, so you are responsible for appropriately restricting access to that piece of data so that the overall system is still safe. In that sense, the presence of a piece of @trusted code is relevant.

In particular, you need to prove that with your change, there is no way that there can be any references to the memory left over that are overwritten. The comment, presumably by @pbackus, seems to suggest that there is a way to get into that situation if the data type itself has indirections. Perhaps he already has an example in mind (possibly involving self-referential data structures)?

Myself, I don't have a counter-example ready, but safety doesn't seem obvious to prove either.

Sorry if I was aggressive, it is not my intention to offend, and if I've done it, sincere apologies.

No apology needed, but you seemed to be arguing from a position seeing the intentional, documented check as "obviously stupid", which is rarely a productive way to go about things (and probably offensive to the original author of the piece of code – which I am not).

@pbackus
Contributor

pbackus commented Jun 19, 2021

This change would allow undefined behavior in @safe code:

int n;

int example() @safe
{
    SumType!(int*, int) x = &n;
    return x.match!(
        (int n) => n,
        (ref int* p) {
            x = 123456789; // overwrites p (currently @system)
            return *p; // kaboom
        }
    );
}

@pbackus pbackus closed this Jun 19, 2021
@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

get() is @trusted because it is covered by the assert() before any possible memory corruption; otherwise it is @system. […] Again, you should trust the language rather than forcing the @system attribute in this case.

The point is that with one @trusted function handling that piece of data, you are already telling the language not to check your work, so you are responsible for appropriately restricting access to that piece of data so that the overall system is still safe. In that sense, the presence of a piece of @trusted code is relevant.

Yes, @trusted could lead to very complicated edge cases, I agree.

In particular, you need to prove that with your change, there is no way that there can be any references to the memory left over that are overwritten. The comment, presumably by @pbackus, seems to suggest that there is a way to get into that situation if the data type itself has indirections. Perhaps he already has an example in mind (possibly involving self-referential data structures)?

Myself, I don't have a counter-example ready, but safety doesn't seem obvious to prove either.

@pbackus just pointed out an example that we can discuss further. Would be cool to discuss the self-referential data structures too!

Sorry if I was aggressive, it is not my intention to offend, and if I've done it, sincere apologies.

No apology needed, but you seemed to be arguing from a position seeing the intentional, documented check as "obviously stupid", which is rarely a productive way to go about things (and probably offensive to the original author of the piece of code – which I am not).

You are right, I should have chosen another name for the branch.

This change would allow undefined behavior in @safe code:

int n;

int example() @safe
{
    SumType!(int*, int) x = &n;
    return x.match!(
        (int n) => n,
        (ref int* p) {
            x = 123456789; // overwrites p (currently @system)
            return *p; // kaboom
        }
    );
}

The issue you described proves that the problem is not in the opAssign and rather in the match. The matcher passes a reference of the shared union memory, that's why you get a memory corruption here. You are not covering this particular case. If you don't pass the reference of the shared memory to the lambda, everything is just fine:

int example() @safe
{
    SumType!(int*, int) x = &n;
    return x.match!(
        (int n) => n,
        (int* p) {
            x = 123456789;
            return *p;
        }
    );
}

I see what you are facing but making assignment @system is wrong, the problem is accessing the overwritten data rather than assigning it. The assignment is fine.

@pbackus
Contributor

pbackus commented Jun 19, 2021

I see what you are facing but making assignment @system is wrong, the problem is accessing the overwritten data rather than assigning it. The assignment is fine.

Either one is fine on its own. Allowing both is what causes the problem. Since making the match call @system would be a breaking change at this point, opAssign must remain @system.

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

I see what you are facing but making assignment @system is wrong, the problem is accessing the overwritten data rather than assigning it. The assignment is fine.

Either one is fine on its own. Allowing both is what causes the problem. Since making the match call @system would be a breaking change at this point, opAssign must remain @system.

No. Assignment does never lead to memory corruption unless that memory is not considered live. Assignment with new or other considered live memory is always fine. The problem is accessing references of shared storage.

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

Also, @pbackus, I would suggest you to not close PRs/issues during active discussions. I see it unproductive and kinda disrespectful, in my perspective.

@pbackus
Contributor

pbackus commented Jun 19, 2021

Allowing @safe code to access an invalid pointer is memory corruption:

When it is only called with safe values and safe aliasing, a function has a safe interface when:

[...]
2. it cannot create unsafe values that are accessible from other parts of the program (e.g., via return values, global variables, or ref parameters), and

From https://dlang.org/spec/function.html#safe-interfaces

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

Allowing @safe code to access an invalid pointer is memory corruption:

When it is only called with safe values and safe aliasing, a function has a safe interface when:
[...]
2. it cannot create unsafe values that are accessible from other parts of the program (e.g., via return values, global variables, or ref parameters), and

From dlang.org/spec/function.html#safe-interfaces

You should revise your argument, "access an invalid pointer". You only access invalid memory because you assigned unlived memory. You are restricting the assignment of live memory, which is wrong. If the user assigns invalid memory, it's because it came from an unsafe interface.

Doing this is @safe:

void foobar(int* f) @safe
{
    *f = 5;
}

There's nothing wrong here. If the user passes invalid memory, yes you have memory corruption, but this is @safe code, the users just used unsafe interfaces to pass unlived memory to your @safe interface.

A more complete example, perfectly @safe:

int* b;

void foobar(int* f) @safe
{
    *f = 5;
    b = f;
    *f = *b;
}

You will probably consider this very unsafe, but it's not. The problem is if the user passes an invalid memory to foobar(). The interface of foobar() is @safe.

@pbackus
Contributor

pbackus commented Jun 19, 2021

You should revise your argument, "access an invalid pointer". You only access invalid memory because you assigned unlived memory. You are restricting the assignment of live memory, which is wrong.

The restriction is necessary to ensure that undefined behavior cannot occur in @safe code. The fact that it also prevents some memory-safe assignments is unfortunate, but unavoidable. There is simply not enough information available at compile time to distinguish between memory-safe assignments and those that would cause memory corruption if allowed.

It would be possible to lift the restriction in certain special cases, like the one in pbackus/sumtype#67 where the pointer is the only member of the SumType.

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

You should revise your argument, "access an invalid pointer". You only access invalid memory because you assigned unlived memory. You are restricting the assignment of live memory, which is wrong.

The restriction is necessary to ensure that undefined behavior cannot occur in @safe code. The fact that it also prevents some memory-safe assignments is unfortunate, but unavoidable. There is simply not enough information available at compile time to distinguish between memory-safe assignments and those that would cause memory corruption if allowed.

This is possible and the restriction is, again, not in the right place.

There is simply not enough information available at compile time

It is, you can check if you're escaping the reference in the matcher, as I have shown above. If the lambda takes a reference parameter to some SumType field, the matcher is @system, because there is a path to memory corruption. I would say this change should be discussed.

It would be possible to lift the restriction in certain special cases, like the one in pbackus/sumtype#67 where the pointer is the only member of the SumType.

A SumType with only one member is impractical to use. In the same way, it is not useful with arrays, strings, pointers, classes or structs that may include arrays, strings, pointers, classes or other nested similar structs.

Everything I listed above is impossible to use in @safe code with SumType right now. This is fundamentally poor community discussion. SumType joined the standard library in a flash, without barely any community appreciation. D programming language, its users and contributors are trying to move forward in @safe'ty not backwards.

And don't get me wrong, SumType has a really cool concept behind it and thanks a ton for bringing up the idea, but the practicality of it should be properly discussed.

Discussion about these things is very crucial for the success of the language. If we keep on the same path, we end up having the same interface as the containers in the standard library.

@pbackus
Contributor

pbackus commented Jun 19, 2021

It is, you can check if you're escaping the reference in the matcher, as I have shown above.

If the handler is a template, there is no way to tell via introspection whether it accepts its parameter by reference or by value. So this is not possible in the general case.
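
A minimal sketch of that limitation, assuming the check would be written with std.traits. The names byRef, generic, and takesRef are illustrative and not part of std.sumtype. A handler with an explicitly typed parameter exposes its storage class, while a template handler has no parameter list to inspect until it is instantiated with a concrete type.

```d
import std.traits : ParameterStorageClass, ParameterStorageClassTuple;

// Handler with an explicit parameter type: `ref` is visible to introspection.
alias byRef = (ref int* p) { p = null; };

// Template handler: the parameter type is only known once it is instantiated.
alias generic = (p) { };

// Hypothetical helper: true when a non-template handler takes its first
// parameter by reference.
enum takesRef(alias handler) =
    (ParameterStorageClassTuple!handler[0] & ParameterStorageClass.ref_) != 0;

static assert(takesRef!byRef);

// For the template handler there is nothing to query yet; all introspection
// can tell is that it is a template.
static assert(__traits(isTemplate, generic));

void main() {}
```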

Everything I listed above is impossible to use in @safe code with SumType right now.

It's not impossible, just inconvenient. If your assignment satisfies the conditions outlined in the documentation of SumType.opAssign, you can use @trusted to allow it.
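
A minimal sketch of that workaround, assuming a SumType!string like the one in the PR description. The helper name reassign is hypothetical. The @trusted lambda is where the caller vouches that no reference to the old value is still alive when the assignment happens.

```d
import std.sumtype;

alias FooBar = SumType!string;

// Hypothetical helper: reassigns a SumType from @safe code.
string reassign() @safe
{
    auto foobar = FooBar("foo");

    // Assumption: no reference to the old value is alive at this point.
    // That is the guarantee the programmer takes on by writing @trusted.
    () @trusted { foobar = "bar"; }();

    return foobar.match!((string s) => s);
}

void main() @safe
{
    assert(reassign() == "bar");
}
```

The compiler does not check the assumption in the comment; keeping the @trusted lambda as small as possible limits what has to be reviewed by hand.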

This is fundamentally poor community discussion. SumType joined the standard library in a flash, without barely any community appreciation.

The addition of SumType to the standard library was discussed publicly on the forums and on Github. I was also available for discussion on the D community Discord server and the Dlang Slack for the full duration of the submission process—which was over 3 months, from the day the PR was opened to the day it was merged. Anyone who wished to participate in this discussion had ample opportunity to do so, and many members of the community did.

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

It is, you can check if you're escaping the reference in the matcher, as I have shown above.

If the handler is a template, there is no way to tell via introspection whether it accepts its parameter by reference or by value. So this is not possible in the general case.

I'm not sure if this is the case, but you have access to the handlers right away. It could be that, because it is a template, the compiler has a forward reference to it, but that's a compiler implementation problem. You can check if the lambda has references to the variable just fine.

I can do a change and we can discuss it but as I've seen you are sceptical about changes.

Everything I listed above is impossible to use in @safe code with SumType right now.

It's not impossible, just inconvenient. If your assignment satisfies the conditions outlined in the documentation of SumType.opAssign, you can use @trusted to allow it.

It is impossible in @safe code. @trusted is not the same as @safe, in this context. The user should avoid @trusted and they should not use it for perfectly @safe code. It's just a bad design. Restricting assignment as an unsafe call is a workaround and it is a bad design.

This is fundamentally poor community discussion. SumType joined the standard library in a flash, without barely any community appreciation.

The addition of SumType to the standard library was discussed publicly on the forums and on Github. I was also available for discussion on the D community Discord server and the Dlang Slack for the full duration of the submission process—which was over 3 months, from the day the PR was opened to the day it was merged. Anyone who wished to participate in this discussion had ample opportunity to do so, and many members of the community did.

It was discussed in the forum for 3 days and the PR has more discussions for the core devs of D, not the community actually trying and testing it extensively. One good point raised there was to pass sumtype to an experimental stage, but it was rather directly to the standard library and declared as stable.

The same attitude you had in pbackus/sumtype#64 is disappointing to the community who use D. Making get public could have really good benefits in terms of performance and saying radically a "no" and closing stuff right away makes them moving to other places. This is something a lot of people in the development team don't see, caring about the community, and the language is pretty much community-driven. Moreover, in the article you posted there:

a well-designed system makes it easy to do the right things and annoying (but not impossible) to do the wrong things

Your argument is controversial according to the concept of get. I could open a PR fixing it and have an open discussion but you go there and close it right away so I won't do it. @RazvanN7, for example, doesn't do it.

See this discussion, for example. I made a change to the compiler that people don't agree with, and we got sort of a conclusion, but the codeowners don't just go there and close the PR right away.

@pbackus
Contributor

pbackus commented Jun 19, 2021

I can do a change and we can discuss it but as I've seen you are sceptical about changes.

I am skeptical about allowing undefined behavior in @safe code.

It is impossible in @safe code. @trusted is not the same as @safe, in this context. The user should avoid @trusted and they should not use it for perfectly @safe code. It's just a bad design. Restricting assignment as an unsafe call is a workaround and it is a bad design.

If you or anyone else can present an alternative design that (a) does not break backwards compatibility and (b) does not allow undefined behavior in @safe code, I will be happy to consider it.

@pbackus pbackus reopened this Jun 19, 2021
Contributor

@pbackus pbackus left a comment

The example given in this comment must not be allowed to compile.

@iK4tsu
Contributor

iK4tsu commented Jun 19, 2021

Assigning should not be the focus here. Reading a field that contains pointers is the problem, in fact this was supported from 2.065.0 to 2.071.2, not sure why it was removed https://dlang.org/changelog/2.065.0.html#ctfe-overlapped-field. But following this, which I think it's the correct path, reading the fields from the SumType that contain pointers should be @system, even if they are being verified by the tid. So, assigning would remain implementation dependent for safety but any handler that depended on a SumType's type that was a pointer would be @system. The example you gave would still be @system code.

So, my proposal is, remove the check from opAssign and change get to be @system when T is or contains a pointer, otherwise follow the current documentation.

void main() @safe
{
	alias ST = SumType!(int, string);
	auto st = ST("@safe assignment"); // was `SM`, an undefined name

	st.match!(
		(int i) => true,
		(ref string s) @trusted => false // @system handler
	);
}

Would this work?

@pbackus
Contributor

pbackus commented Jun 19, 2021

So, my proposal is, remove the check from opAssign and change get to be @system when T is or contains a pointer, otherwise follow the current documentation.

This would work from a safety perspective, but it breaks backward compatibility.

@pbackus
Contributor

pbackus commented Jun 19, 2021

This is the only way to do this, right now. The coverage of @safe'ty code with this is way wider. The code that will break is very minor due to the fact that currently SumType can't be majorly used in @safe code with arrays, strings, pointers, classes or structs that may include arrays, strings, pointers, classes or other nested similar structs, anyway.

Some code that is currently @system would be made @safe, and some code that is currently @safe would be made @system. If you have concrete evidence that the amount of code that would be made @safe is much larger than the amount of code that would be made @system, that would be a reasonable argument in favor of breaking compatibility, but merely asserting that the code that would be made @system is "minor" is not particularly convincing.

The whole problem of making a breaking change at this stage is because there was no experimental stage.

The experimental stage was the 3 years sumtype spent on code.dlang.org prior to its 1.0 release.

@ljmf00
Member Author

ljmf00 commented Jun 19, 2021

This is the only way to do this, right now. The coverage of @safe'ty code with this is way wider. The code that will break is very minor due to the fact that currently SumType can't be majorly used in @safe code with arrays, strings, pointers, classes or structs that may include arrays, strings, pointers, classes or other nested similar structs, anyway.

Some code that is currently @system would be made @safe, and some code that is currently @safe would be made @system. If you have concrete evidence that the amount of code that would be made @safe is much larger than the amount of code that would be made @system, that would be a reasonable argument in favor of breaking compatibility, but merely asserting that the code that would be made @system is "minor" is not particularly convincing.

Yes, I can describe a list of benefits for this change.

The whole problem of making a breaking change at this stage is because there was no experimental stage.

The experimental stage was the 3 years sumtype spent on code.dlang.org prior to its 1.0 release.

Even though the package was on code.dlang.org for 3 years prior to its 1.0 release, doesn't mean it passed an experimental stage on the standard library. Skipping such a thing for a package so-called "magic" is not good. The experimental stage should be on the standard library itself or some kind of official way to do so, otherwise, people may not trust packages on code.dlang.org. To be honest, I've never used SumType until this point, for example.

@atilaneves
Contributor

I am skeptical about allowing undefined behavior in @safe code.

It should be a given that we can't allow that to happen.

@SirNickolas

I would like to highlight two other points that have not been discussed yet.

Let’s look at Rust. Unsurprisingly, it also has a concept of unsafe operations, but it takes a different approach to them regarding pointers. Creating a pointer in any way, even with casting an arbitrary integer to it, is considered safe in Rust since it cannot crash the process or harm in some other way. Dereferencing a (raw) pointer, on the other hand, is unsafe. That is, every block of code that works with raw pointers must be wrapped in unsafe.

On the contrary, pointers are much more common in D than in Rust, which makes it impractical to force every dereferencing operation to be @system. (Note that member access operator, i.e., ., when used on a pointer to a struct, dereferences it, too.) To keep some memory safety guarantees, D declares several ways of constructing pointers unsafe instead.

So, @system opAssign / @safe match is on par with D’s semantics for its basic language features.

Another thing to note is that only mutation of an existing SumType object is @system. With that means, sumtype encourages writing immutable code. Of course, this is not always practical or even possible, but it is usually worth trying.

@ljmf00
Member Author

ljmf00 commented Jun 23, 2021

I'm sorry for the late response, I'm currently experiencing some struggle with school stuff.

I am skeptical about allowing undefined behavior in @safe code.

It should be a given that we can't allow that to happen.

My implementation is considering that. I just mentioned that @pbackus seems sceptical to changes because he closes the issues and PRs too quickly. Just that.

I would like to highlight two other points that have not been discussed yet.

Let’s look at Rust. Unsurprisingly, it also has a concept of unsafe operations, but it takes a different approach to them regarding pointers. Creating a pointer in any way, even with casting an arbitrary integer to it, is considered safe in Rust since it cannot crash the process or harm in some other way. Dereferencing a (raw) pointer, on the other hand, is unsafe. That is, every block of code that works with raw pointers must be wrapped in unsafe.

We shouldn't mix Rust here, IMO.

On the contrary, pointers are much more common in D than in Rust, which makes it impractical to force every dereferencing operation to be @system. (Note that member access operator, i.e., ., when used on a pointer to a struct, dereferences it, too.) To keep some memory safety guarantees, D declares several ways of constructing pointers unsafe instead.

Yes, it's just a different approach and it makes sense.

So, @system opAssign / @safe match is on par with D’s semantics for its basic language features.

No. You are partially right, because, yes, accessing and dereferencing a pointer is @safe in D, but this case is different. SumType uses unions internally. Accessing pointers from unions is unsafe because memory is shared. I already explained that above. Assigning has the same rules: as long as it's live memory, or a null reference, it is @safe to assign. SumType is considering assignment unsafe, by explicitly forcing it, even though it doesn't follow the @safe rules for unions.

void main() @safe
{
    union Foo {
        int* a;
        int b;
    }
    
    Foo f;
    f.a;
}

This code doesn't compile.

Another thing to note is that only mutation of an existing SumType object is @system. With that means, sumtype encourages writing immutable code. Of course, this is not always practical or even possible, but it is usually worth trying.

That's an illogical argument. Immutability and @safe are different concepts. Also, encouragement is very different from enforcement. We should not consider using SumType as immutable just because its opAssign is @system. If the user wants it mutable and the code is completely @safe, why force it?

My whole point on this thread is that. I was just writing completely @safe code and SumType reported it as @system. That's why I'm into this change. As I saw, it's not just me; I have already seen other people complaining about it. And this whole thing just happens because get is @trusted and the language doesn't allow accessing it.

@SirNickolas

Also, I’d like to show an example of breaking changes that @pbackus is talking about.

I often use scope ref const handlers for matching my sumtypes to avoid copying large structures. scope prevents me from escaping pointers, and, given I don’t reassign the sumtype in the handler, this pattern is presently inferred to be @safe.

This is not entirely about optimizing out a shallow copy: a type may have a non-trivial copy constructor and/or destructor we don’t want to execute. It might even be non-copyable.

If we were to move @systemness from opAssign to match, examples you provided will start to compile and this one will stop. They are all memory-safe, but the D’s type system is not advanced enough to prove that.
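
A minimal sketch of the pattern described above, assuming a hypothetical payload type Big and helper firstElement (neither appears in the thread). The handler takes its argument by scope ref const, so Big is never copied, and the SumType is not reassigned inside the handler.

```d
import std.sumtype;

// Hypothetical payload type: large, and containing an indirection.
struct Big
{
    int[64] payload;
    int* link;
}

alias Value = SumType!(Big, int);

// Reads through the SumType without copying Big. The handler receives the
// stored value by reference and never reassigns `v`.
int firstElement(ref Value v) @safe
{
    return v.match!(
        (scope ref const Big b) => b.payload[0],
        (int n) => n
    );
}

void main() @safe
{
    Big big;
    big.payload[0] = 42;

    auto a = Value(big);
    auto b = Value(7);

    assert(firstElement(a) == 42);
    assert(firstElement(b) == 7);
}
```

Under the proposal to make match @system for ref handlers on types with indirections, firstElement is the kind of function that would stop being inferred @safe.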

@ljmf00
Member Author

ljmf00 commented Jun 23, 2021

Also, I’d like to show an example of breaking changes that @pbackus is talking about.

I often use scope ref const handlers for matching my sumtypes to avoid copying large structures. scope prevents me from escaping pointers, and, given I don’t reassign the sumtype in the handler, this pattern is presently inferred to be @safe.

This is not entirely about optimizing out a shallow copy: a type may have a non-trivial copy constructor and/or destructor we don’t want to execute. It might even be non-copyable.

If we were to move @systemness from opAssign to match, the examples you provided will start to compile and this one will stop. They are all memory-safe, but D’s type system is not advanced enough to prove that.

Yes.

I'm working on minimizing the coverage of @system code and maximizing the coverage of @safe code in this situation. match will only be @system if the handler is matching a type with pointers or nested pointers and also has the storage class ref; otherwise it will be @safe. We can also make this @safe by checking if the parameter of the handler is immutable. These checks are needed only because the compiler doesn't check for escaping references. I can also do further testing about scope ref. If we can check these situations, your example will be @safe.

And I understand your point, although, we should discuss this breaking change because:

  1. The current behaviour breaks the semantics of @safe'ty in the language when using a union.
  2. The @safe cases outnumber the @system ones, and we can minimize the latter by doing checks.
  3. This breaking change only occurs because the compiler is not capable of tracking those references, at the moment.

@ljmf00 ljmf00 marked this pull request as draft July 3, 2021 17:25
@RazvanN7
Collaborator

What is the status of this PR?

@PetarKirov
Member

I'd say we should make the {tryM,m}atch overload that accepts delegates @system and remove @system from opAssign, given that opAssign + match accepting pure function (pointers) (not delegates) does not cause memory safety issues, if I remember correctly.

@pbackus
Contributor

pbackus commented Jul 29, 2021

@PetarKirov It does not matter whether the handler is a delegate, a normal function, or an object with overloaded opCall. As long as @safe handlers are allowed to receive the value by ref, we cannot allow opAssign to be @safe, or we will allow undefined behavior to occur as in my original example.

It is of course possible to resolve this by either (a) always passing the SumType's value to the handler by value, rather than by reference, or (b) making match @system and opAssign @safe instead of the other way around, but both of these are breaking changes to code that is currently correct—and both introduce their own new forms of inconvenience, which someone else will inevitably complain about, leading us right back to where we are now.

If D ever gets a real ownership/borrowing system, it will be possible to make opAssign @safe by having it mutably (i.e., exclusively) borrow this. Until and unless that happens, my recommendation is to leave things as they are.

@ljmf00
Member Author

ljmf00 commented Jul 29, 2021

What is the status of this PR?

This PR is pretty much stuck because my modification requires changes in the compiler to be able to fetch storage classes from call expressions, see here #8146 (comment) and dlang/dmd#12712 for context.

I'd say we should make the {tryM,m}atch overload that accepts delegates @system and remove @system from opAssign, given that opAssign + match accepting pure function (pointers) (not delegates) does not cause memory safety issues, if I remember correctly.

@PetarKirov It does not matter whether the handler is a delegate, a normal function, or an object with overloaded opCall. As long as @safe handlers are allowed to receive the value by ref, we cannot allow opAssign to be @safe, or we will allow undefined behavior to occur as in my original example.

Yes, there is this problem @pbackus described. I agree that this shouldn't happen and we should have a way to circumvent these situations.

It is of course possible to resolve this by either (a) always passing the SumType's value to the handler by value, rather than by reference, or (b) making match @system and opAssign @safe instead of the other way around, but both of these are breaking changes to code that is currently correct—and both introduce their own new forms of inconvenience, which someone else will inevitably complain about, leading us right back to where we are now.

I don't agree with (a) because it is restrictive. As for (b), match will not be completely @system: it will be @safe unless a handler has the ref storage class and the matched type has indirections. This approach opens up many more @safe cases.

Saying "we won't implement this change because it is a breaking change" is illogical, from my point of view. After all, there was no experimental phase in the first place. Saying that the experimental phase happened in the dub registry is also illogical. Take any highly used dub package, even a random one from 3 years ago: it can still have design flaws that should be discussed and heavily tested. You shouldn't assume that this package is safe to go straight to production because it spent 3 years on the dub registry.

If D ever gets a real ownership/borrowing system, it will be possible to make opAssign @safe by having it mutably (i.e., exclusively) borrow this. Until and unless that happens, my recommendation is to leave things as they are.

There is no need for a complete ownership/borrowing system, only a way to track if the pointer is being used, but I understand what you are saying. I read that Walter, for whatever reason (I know it is not an easy task either), doesn't want to implement this tracking mechanism, here https://issues.dlang.org/show_bug.cgi?id=22045#c1 .


As I said earlier, I'm not the only one complaining about this issue. This makes SumType useless in simple cases like mutable SumTypes with strings because strings have indirections. The point of making match @system in the situations described is that there is a way to corrupt memory there, whereas assignments alone cannot cause memory corruption under the current system.

SumType is very beneficial for situations where people want to create @safe code with SumType and don't want to depend on TypeInfo, because of (a) betterC compatibility or (b) some other reasons related to TypeInfo being badly designed and sometimes being non-@nogc for whatever reason that I'm also trying to figure out. I'm talking about TypeInfo here because of the really clever way SumType avoids depending on TypeInfo through its tagging system.

@pbackus
Contributor

pbackus commented Jul 29, 2021

We won't implement this change because it is a breaking change is illogical, in my point of view.

It's not a totally black-and-white thing, but implementing a breaking change has a much higher cost than implementing a non-breaking change. We will only do so if we are confident that the benefit is great enough to outweigh that cost.

So far, I have not seen enough evidence to convince me that the benefit is worth the cost for this change.

After all, there were no experimental phases in the first place. Also saying that the experimental phase happened in the dub registry is illogical.

@atilaneves is the one who made the final decision on this. If you have ideas for how the Phobos submission process can be improved, you should send them to him, since he's the one best-positioned to do something about it.

This makes SumType useless in simple cases like mutable SumTypes with strings because strings have indirections.

Well, no; it just means you have to write

// Safe because there are no other references to mySumType
() @trusted { mySumType = "hello"; }();

...instead of

mySumType = "hello";

I agree that this is ugly and inconvenient, but saying it makes SumType useless is a bit of an exaggeration.
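For anyone who wants to try the workaround in isolation, here is a minimal self-contained sketch. The alias `Value` and the helper `reassignAndRead` are hypothetical names used only for illustration, and the sketch assumes a Phobos release that ships std.sumtype and infers this assignment as @system; on a release where the assignment is already @safe, the @trusted wrapper is merely redundant.

```d
import std.sumtype;

// Hypothetical alias: string has indirections, which is what makes
// reassignment @system in the behaviour discussed in this thread.
alias Value = SumType!(int, string);

// Hypothetical helper: replaces the stored value and reads it back.
string reassignAndRead() @safe
{
    Value v = 42; // construction is @safe

    // Trusted because no other reference to `v` exists at this point.
    () @trusted { v = "hello"; }();

    // By-value handlers, so nothing can hold a reference into `v`.
    return v.match!(
        (string s) => s,
        (int _) => "not a string"
    );
}

void main() @safe
{
    assert(reassignAndRead() == "hello");
}
```

The @trusted lambda is kept as small as possible so that the manual safety argument covers a single statement.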

@ljmf00
Member Author

ljmf00 commented Jul 29, 2021

We won't implement this change because it is a breaking change is illogical, in my point of view.

It's not a totally black-and-white thing, but implementing a breaking change has a much higher cost than implementing a non-breaking change. We will only do so if we are confident that the benefit is great enough to outweigh that cost.

So far, I have not seen enough evidence to convince me that the benefit is worth the cost for this change.

Ok. If you want, we can come up with a list of advantages and disadvantages and discuss them after I come up with my new proposal.


After all, there were no experimental phases in the first place. Also saying that the experimental phase happened in the dub registry is illogical.

@atilaneves is the one who made the final decision on this. If you have ideas for how the Phobos submission process can be improved, you should send them to him, since he's the one best-positioned to do something about it.

I'm not singling you out. You are just a maintainer of this package and I totally understand your position. However, you presented that as an argument, and I'm against it because that does not count as an experimental phase for a standard library. I mean, maybe it does, according to Walter, Atila or Andrei; I don't really know at this point.

The whole community and core contributors governance model in D is broken. You look at Python, for example, you have PEP 13, 8000, 8001, 8002, ... which are documents explaining the decision process, controversial decision process, voting mechanics,...

Because D doesn't have a concise decision model, every decision is pretty much made by its "dictators". Then unfortunate things like this happen: dlang/dmd#12828

More in-depth into the Python Software Foundation philosophy, dealing with new packages into the standard library is also accomplished with the help of PEP 411 using provisional packages.

This PEP describes a methodology for marking a standard library package "provisional" for the period of a single feature release. A provisional package may have its API modified prior to "graduating" into a "stable" state. On one hand, this state provides the package with the benefits of being formally part of the Python distribution. On the other hand, the core development team explicitly states that no promises are made with regards to the stability of the package's API, which may change for the next release. While it is considered an unlikely outcome, such packages may even be removed from the standard library without a deprecation period if the concerns regarding their API or maintenance prove well-founded.

I'm not a Python user nor a Python developer, but, from what I've seen, this is a good example of governance, organization and commitment for what is called a community-driven language.

Read more about those referred PEPs in their PEP index: https://www.python.org/dev/peps/


This makes SumType useless in simple cases like mutable SumTypes with strings because strings have indirections.

Well, no; it just means you have to write

// Safe because there are no other references to mySumType
() @trusted { mySumType = "hello"; }();

...instead of

mySumType = "hello";

I agree that this is ugly and inconvenient, but saying it makes SumType useless is a bit of an exaggeration.

Recommending this code to the end-user is really bad practice. Your case presents a simple string literal. In a complex scenario, things can be hard to track, and tracking them manually is sometimes not feasible. @trusted should be used only if the user really knows what they are doing.

In this particular case, a user who is not aware of this has to consult the SumType documentation to properly check whether their code is really @safe, and if they end up consulting the code, like I did, they will see this:

cast(void) () @system {}();

Furthermore, the user can end up wrapping an assignment whose right-hand side is @system or causes undefined behaviour. It can be numerous things, and thinking about all of them every time the user writes an assignment can be impractical.

@PetarKirov
Member

PetarKirov commented Jul 29, 2021

If D ever gets a real ownership/borrowing system, it will be possible to make opAssign @safe by having it mutably (i.e., exclusively) borrow this.

Yes, ultimately this is what needs to happen for @safe to really mean safe.

@PetarKirov It does not matter whether the handler is a delegate, a normal function, or an object with overloaded opCall. As long as @safe handlers are allowed to receive the value by ref, we cannot allow opAssign to be @safe, or we will allow undefined behavior to occur as in my original example.

Hmm, unless I'm missing something, your example won't be possible if the (ref int* p) handler was a pure function, as it wouldn't have access to the enclosing scope and so there wouldn't be a mutable aliasing of SumType's tag field.

It is of course possible to resolve this by either (a) always passing the SumType's value to the handler by value, rather than by reference, or (b) making match @system and opAssign @safe instead of the other way around

Well, if opAssign is currently @system, then it means that SumType is completely unusable in @safe code, so I think option (b) is only a theoretical breaking change (not a real one in practice) and the better option in my opinion.

P.S. Also, the fact that BuildKite is completely green confirms that this breaking change has a very narrow impact in practice.

@ljmf00 While I agree with your sentiment about D's process, please let's have this discussion somewhere else, as this PR is not the right place for that.

@ljmf00
Member Author

ljmf00 commented Jul 29, 2021

@PetarKirov It does not matter whether the handler is a delegate, a normal function, or an object with overloaded opCall. As long as @safe handlers are allowed to receive the value by ref, we cannot allow opAssign to be @safe, or we will allow undefined behavior to occur as in my original example.

Hmm, unless I'm missing something, your example won't be possible if the (ref int* p) handler was a pure function, as it wouldn't have access to the enclosing scope and so there wouldn't be a mutable aliasing of SumType's tag field.

I guess you missed something. I'm not sure if you are talking about strong purity or weak purity tho. But I don't see pure interfering here, AFAIK. Also, I'm covering @pbackus case on the unittests, see here https://github.com/dlang/phobos/pull/8146/files#diff-da9a9337d4295ecd14fdfb456a91a64e84279bb0bb4897afc68b7e9ddd046772R2508 .

@ljmf00 While I agree with your sentiment about D's process, please let's have this discussion somewhere else, as this PR is not the right place for that.

You're right, this is a recurrent thing tho and is delaying this PR, as the biggest argument I see is breaking changes. I won't discuss this here <EOF>

@pbackus
Contributor

pbackus commented Jul 30, 2021

Hmm, unless I'm missing something, your example won't be possible if the (ref int* p) handler was a pure function, as it wouldn't have access to the enclosing scope and so there wouldn't be a mutable aliasing of SumType's tag field.

It's still possible for a pure handler to escape a reference to a SumType member:

static SumType!(int*, int) x; // global lifetime - scope does not apply

int** pp = x.match!(
    (ref int* p) => &p,
    (int _) => null
);

x = 12345; // overwrite *pp (currently @system)
int oops = **pp; // kaboom

Well, if opAssign is currently @system, then it means that SumType is completely unusable in @safe code, so I think option (b) is only a theoretical breaking change (not a real one in practice) and the better option in my opinion.

SumType is perfectly usable in @safe code as long as you do not assign to a SumType that contains a pointer, or are willing to use @trusted as described in the documentation.

In fact, I would argue that it is much easier to avoid opAssign than to avoid match, since there are two ways to put a value into a SumType (constructor and opAssign), but only one way to get a value out (match). So implementing option (b) would actually make SumType harder to use in @safe code, rather than easier.

P.S. Also, the fact that BuildKite is completely green confirms that this breaking change has a very narrow impact in practice.

Given that std.sumtype has only been recently added to Phobos, I do not think we can rely on BuildKite to give good data here—I would be surprised if any of the projects it tests make use of std.sumtype at all, let alone in @safe code specifically.

On the other hand, one user has already commented in this thread that they have code that would be broken by this change. So we actually have concrete evidence that the breakage is not "theoretical."

@RazvanN7
Collaborator

This PR is pretty much stuck because my modification requires changes on the compiler to be able to fetch storage classes from call expressions, see here #8146 (comment) and dlang/dmd#12712 for context.

@ljmf00 @pbackus Now that dlang/dmd#12712 has been fixed, how do we proceed here?

@pbackus
Contributor

pbackus commented Sep 13, 2021

If I understand correctly, the current proposal is:

  • By-ref access to SumType members with unsafe values becomes @system.
  • By-value access to those members remains @safe.
  • SumType.opAssign becomes @safe.

This is an improvement over previous proposals, which either made access unconditionally @system or forbade by-ref access entirely. Broken code has a straightforward upgrade path of switching to by-value access (although it comes with a potential hit to performance).
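To make the distinction concrete, here is a small sketch of the two access styles. The alias `Slot` and the functions `readByValue` and `readByRef` are hypothetical names for illustration only. Both functions are expected to compile as @safe under the current behaviour; under the proposal as summarized above, only `readByRef` would need @trusted, because its handlers take the stored value by ref.

```d
import std.sumtype;

// Hypothetical alias: int* is a member with indirections.
alias Slot = SumType!(int*, int);

// By-value access: each handler receives a copy of the stored member.
int readByValue(ref Slot s) @safe
{
    return s.match!(
        (int* p) => p is null ? 0 : *p,
        (int n) => n
    );
}

// By-ref access: each handler receives a reference into the SumType's storage.
int readByRef(ref Slot s) @safe
{
    return s.match!(
        (ref int* p) => p is null ? 0 : *p,
        (ref int n) => n
    );
}

void main() @safe
{
    int* p = new int;
    *p = 7;
    Slot s = p;

    assert(readByValue(s) == 7);
    assert(readByRef(s) == 7);
}
```

Switching a handler from `ref int* p` to `int* p` is the upgrade path mentioned above; it costs a copy of the member, which matters mostly for large or non-copyable types.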

The downsides are:

  • It breaks existing code.
  • @trusted is still necessary if you really need by-ref access (e.g. for performance).
  • Writing a correct @trusted call to match is much more difficult than writing a correct @trusted assignment.

It is worth pointing out that the breakage will not just affect direct calls to match but also calls to methods like SumType.opEquals and SumType.toString, since they use by-ref access internally. (Of course, changing them to use by-value access is possible, but then they will no longer work for non-copyable types—another breaking change.)

Ultimately, I am still not convinced this is a good idea. The current proposal is strong enough that I think it could potentially be justified in isolation, but the precedent it sets is concerning. If we're willing to allow your PR to introduce a regression in order to fix your use-case, what do we say to the next contributor who wants to break someone else's code to fix their use-case?

Right now, given D's "dictatorship" governance model, I think the answer is, "it's up to Atila to decide on a case-by-case basis."

@atilaneves We need a decision here—is making SumType.opAssign @safe an acceptable justification for introducing regressions elsewhere?

@SirNickolas

It is worth pointing out that the breakage will not just affect direct calls to match but also calls to methods like SumType.opEquals and SumType.toString, since they use by-ref access internally.

Thanks for an excellent point. @ljmf00, I’d like to ask, how would one write generic code with match-by-ref that propagates safety? You cannot declare those methods @trusted unconditionally—that would be a violation of the type system. You likely do not want to declare them @system unconditionally either. So how would you achieve that?

This affects not only two methods mentioned above but any generic code written by sumtype’s users as well.

Of course, changing them to use by-value access is possible

In my opinion, copying an object just to compare it to another seems inappropriate for a reusable library. Especially the stdlib.

@RazvanN7
Collaborator

RazvanN7 commented Jan 7, 2022

@atilaneves I think that you are the only one who can sort this out.

@atilaneves
Contributor

I agree with @pbackus and don't think this should be merged.

@RazvanN7
Collaborator

I agree with @pbackus and don't think this should be merged.

@ljmf00 @pbackus I think that this settles it. Thank you all for the time you have invested in this.

@ljmf00 I hope that this will not discourage and stop the streak of awesome contributions that you have made up until now.

@RazvanN7 RazvanN7 closed this Jan 10, 2022
@ljmf00
Member Author

ljmf00 commented Jan 10, 2022

I agree with @pbackus and don't think this should be merged.

@ljmf00 @pbackus I think that this settles it. Thank you all for the time you have invested in this.

@ljmf00 I hope that this will not discourage and stop the streak of awesome contributions that you have made up until now.

This doesn't discourage me from contributing to a unique language like D; it just drains the batteries a bit. My point with this is to make D a better language.

Even though I disagree, I can understand the point about breaking changes. I also think we can learn from this when adding a new module to Phobos in the future. For this module in particular, we should have had some experimental phase to accommodate breaking changes more easily. My disagreement with the breaking-change argument comes from the fact that the module had been introduced only in the last release at the time of this PR.

The current behaviour breaks the way D handles safety, and the current form is more restrictive. I already tried to prove it above. Ideally this should all be safe, but edge cases are hard to handle. I think this PR makes fewer things @system than the current behaviour does, which justifies the breaking change.

I already have a fork in mind for these changes, but I'm always open to upstreaming them. The idea is not to diverge from Phobos but to present a proposal for change with a practical concept of Phobos 2.0, both in code concepts and in robustness through extensive testing.

I think we should learn one more thing from this PR: we should think about having tagged unions in the language.
