sumtype: remove forced @system for opAssign preventing @safe code#8146
sumtype: remove forced @system for opAssign preventing @safe code#8146ljmf00 wants to merge 1 commit intodlang:masterfrom
Conversation
|
Thanks for your pull request and interest in making D better, @ljmf00! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla references
Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + phobos#8146" |
|
Sorry, I wrongly categorized the issue importance. The label should be updated. |
197a55b to
97f2adf
Compare
|
The rationale for this behaviour is described in the comments that your PR removes. If I have a |
Firstly, unless you are doing pretty weird stuff or breaking the language with undefined behaviour, there's no reason to write The whole point of SumType is to match the right type, if you assign a You should trust the compiler and respect the language's definition of @safe unless you are just afraid of unsafe code and think that every pointer is unsafe. According to the language specification:
When allocating with Particularly, SumType is using an
So I don't see any problem here rather than a restricted idea of @safe code. |
See above. Watch your tone, too.
That section is specifically about compatibility with the garbage collector and not relevant to the discussion here. All it says is that the GC doesn't miss pointers when stored in |
Again, you should trust the language rather than forcing @System attribute in this case. If there's any possibility of having memory corruption with it, please come up with an example, I would be happy to discuss it.
Sorry if I was aggressive, it is not my intention to offend, and if I've done it, sincere apologies.
Yes, I just related that with allocating with |
The point is that with one In particular, you need to prove that with your change, there is no way that there can be any references to the memory left over that are overwritten. The comment, presumably by @pbackus, seems to suggest that there is a way to get into that situation if the data type itself has indirections. Perhaps he has already has an example in mind (possibly involving self-referential data structures)? Myself, I don't have a counter-example ready, but safety doesn't seem obvious to prove either.
No apology needed, but you seemed to be arguing from a position seeing the intentional, documented check as "obviously stupid", which is rarely a productive way to go about things (and probably offensive to the original author of the piece of code – which I am not). |
|
This change would allow undefined behavior in int n;
int example() @safe
{
SumType!(int*, int) x = &n;
return x.match!(
(int n) => n,
(ref int* p) {
x = 123456789; // overwrites p (currently @system)
return *p; // kaboom
}
);
} |
Yes, @trusted could lead to very complicated edge cases, I agree.
@pbackus just pointed an example that we can discuss further. Would be cool to discuss the self-referencial data structures too!
You are right, I should have chosen another name for the branch.
The issue you described proves that the problem is not in the opAssign and rather in the match. The matcher passes a reference of the shared union memory, that's why you get a memory corruption here. You are not covering this particular case. If you don't pass the reference of the shared memory to the lambda, everything is just fine: int example() @safe
{
SumType!(int*, int) x = &n;
return x.match!(
(int n) => n,
(int* p) {
x = 123456789;
return *p;
}
);
}I see what you are facing but making assignment @System is wrong, the problem is accessing the overwritten data rather than assigning it. The assignment is fine. |
Either one is fine on its own. Allowing both is what causes the problem. Since making the |
No. Assignment does never lead to memory corruption unless that memory is not considered live. Assignment with |
|
Also, @pbackus, I would suggest you to not close PRs/issues during active discussions. I see it unproductive and kinda disrespectful, in my perspective. |
|
Allowing
|
You should revise your argument, "access an invalid pointer". You only access invalid memory because you assigned unlived memory. You are restricting the assignment of live memory, which is wrong. If the user assigns invalid memory, it's because it came from an unsafe interface. Doing this is @safe: void foobar(int* f) @safe
{
*f = 5;
}There's nothing wrong here. If the user passes invalid memory, yes you have memory corruption, but this is @safe code, the users just used unsafe interfaces to pass unlived memory to your @safe interface. A more complete example, perfectly @safe: int* b;
void foobar(int* f) @safe
{
*f = 5;
b = f;
*f = *b;
}You will probably consider this very unsafe, but it's not. The problem is if the user passes an invalid memory to |
The restriction is necessary to ensure that undefined behavior cannot occur in It would be possible to lift the restriction in certain special cases, like the one in pbackus/sumtype#67 where the pointer is the only member of the |
This is possible and the restriction is, again, not in the right place.
It is, you can check if you're escaping the reference in the matcher, as I have shown above. If the lambda is a reference parameter to some SumType field, the matcher is @System, because there is a way of memory corruption. I would say this change should be discussed.
SumType with only one member makes it unpractical to use. In the same way not useful to use it with arrays, strings, pointers, classes or structs that may include arrays, strings, pointers, classes or other nested similar structs. Everything I listed above is impossible to use in @safe code with SumType right now. This is fundamentally poor community discussion. SumType joined the standard library in a flash, without barely any community appreciation. D programming language, its users and contributors are trying to move forward in @safe'ty not backwards. And don't get me wrong, SumType has a really cool concept behind it and thanks a ton for bringing up the idea, but the practicality of it should be properly discussed. Discussion about these things is very crucial for the success of the language. If we keep on the same path, we end up having the same interface as the containers in the standard library. |
If the handler is a template, there is no way to tell via introspection whether it accepts its parameter by reference or by value. So this is not possible in the general case.
It's not impossible, just inconvenient. If your assignment satisfies the conditions outlined in the documentation of
The addition of SumType to the standard library was discussed publicly on the forums and on Github. I was also available for discussion on the D community Discord server and the Dlang Slack for the full duration of the submission process—which was over 3 months, from the day the PR was opened to the day it was merged. Anyone who wished to participate in this discussion had ample opportunity to do so, and many members of the community did. |
I'm not sure if this is the case, but you have access to the handlers right away. Could be possible that, because it is a template, the compile has forward reference to it, but that's a compiler implementation problem. You can check if the lambda has references to the variable just fine. I can do a change and we can discuss it but as I've seen you are sceptical about changes.
It is impossible in @safe code. @trusted is not the same as @safe, in this context. The user should avoid @trusted and they should not use it for perfectly @safe code. It's just a bad design. Restricting assignment as an unsafe call is a workaround and it is a bad design.
It was discussed in the forum for 3 days and the PR has more discussions for the core devs of D, not the community actually trying and testing it extensively. One good point raised there was to pass The same attitude you had in pbackus/sumtype#64 is disappointing to the community who use D. Making
Your argument is controversial according to the concept of See this discussion, for example. I made a change to the compiler that people don't agree with, and we got sort of a conclusion, but the codeowners don't just go there and close the PR right away. |
I am skeptical about allowing undefined behavior in
If you or anyone else can present an alternative design that (a) does not break backwards compatibility and (b) does not allow undefined behavior in |
pbackus
left a comment
There was a problem hiding this comment.
The example given in this comment must not be allowed to compile.
|
Assigning should not be the focus here. Reading a field that contains pointers is the problem, in fact this was supported from So, my proposal is, remove the check from opAssign and change void main() @safe
{
alias ST = SumType!(int, string);
auto st = SM("@safe assignment");
st.match!(
(int i) => true,
(ref string s) @trusted => false // @system handler
);
}Would this work? |
This would work from a safety perspective, but it breaks backward compatibility. |
Some code that is currently
The experimental stage was the 3 years |
Yes, I can describe a list of benefits for this change.
Even though the package was on code.dlang.org for 3 years prior to its 1.0 release, doesn't mean it passed an experimental stage on the standard library. Skipping such a thing for a package so-called "magic" is not good. The experimental stage should be on the standard library itself or some kind of official way to do so, otherwise, people may not trust packages on code.dlang.org. To be honest, I've never used SumType until this point, for example. |
It should be a given that we can't allow that to happen. |
|
I would like to highlight two other points that have not been discussed yet. Let’s look at Rust. Unsurprisingly, it also has a concept of unsafe operations, but it takes a different approach to them regarding pointers. Creating a pointer in any way, even with casting an arbitrary integer to it, is considered safe in Rust since it cannot crash the process or harm in some other way. Dereferencing a (raw) pointer, on the other hand, is unsafe. That is, every block of code that works with raw pointers must be wrapped in On the contrary, pointers are much more common in D than in Rust, which makes it impractical to force every dereferencing operation to be So, Another thing to note is that only mutation of an existing |
|
I'm sorry for the late response, I'm currently experiencing some struggle with school stuff.
My implementation is considering that. I just mentioned that @pbackus seems sceptical to changes because he closes the issues and PRs too quickly. Just that.
We shouldn't mix Rust here, IMO.
Yes, it's just a different approach and it makes sense.
No. You are partially right, because, yes accessing and dereferencing a pointer is @safe in D, but in this case, it is different, however. Sumtype uses void main() @safe
{
union Foo {
int* a;
int b;
}
Foo f;
f.a;
}This code doesn't compile.
That's an illogical argument. Immutability and @safe are different concepts. Also, encouragement is very different from enforcement. We should not consider using SumType as immutable just because its opAssign is @System . If the user wants it mutable and the code is completely @safe why forcing it? My whole point on this thread is that. I was just writing completely @safe code and Sumtype reported it as @System. That's why I'm into this change. As I saw, it's not just me, I have already seen other people complaining about it. And this whole thing just happens because |
|
Also, I’d like to show an example of breaking changes that @pbackus is talking about. I often use This is not entirely about optimizing out a shallow copy: a type may have a non-trivial copy constructor and/or destructor we don’t want to execute. It might even be non-copyable. If we were to move |
Yes. I'm working on minimizing the coverage of @System code and maximizing the coverage of @safe code in this situation. And I understand your point, although, we should discuss this breaking change because:
|
|
What is the status of this PR? |
|
I'd say we should make the |
|
@PetarKirov It does not matter whether the handler is a delegate, a normal function, or an object with overloaded It is of course possible to resolve this by either (a) always passing the If D ever gets a real ownership/borrowing system, it will be possible to make |
This PR is pretty much stuck because my modification requires changes on the compiler to be able to fetch storage classes from call expressions, see here #8146 (comment) and dlang/dmd#12712 for context.
Yes, there is this problem @pbackus described. I agree that this shouldn't happen and we should have a way to circumvent these situations.
I don't agree with (a) because it is restrictive and (b), matcher will not be completely We won't implement this change because it is a breaking change is illogical, in my point of view. After all, there were no experimental phases in the first place. Also saying that the experimental phase happened in the dub registry is illogical. Even if it has a highly used dub package, you can pick a random one from 3 years ago or something. They can have design flaws that should be discussed and heavily tested. You shouldn't assume that this package is safe to go straight to production because it was 3 years on the dub registry.
There is no need for a complete ownership/borrowing system, only a way to track if the pointer is being used, but I understand what you are saying. I read that Walter, for whatever reason (I know it is not an easy task either), doesn't want to implement this tracking mechanism, here https://issues.dlang.org/show_bug.cgi?id=22045#c1 . As I said earlier, I'm not the only one complaining about this issue. This makes SumType useless in simple cases like mutable SumTypes with strings because strings have indirections. The point of making the SumType is very beneficial for situations where people want to create |
It's not a totally black-and-white thing, but implementing a breaking change has a much higher cost than implementing a non-breaking change. We will only do so if we are confident that the benefit is great enough to outweigh that cost. So far, I have not seen enough evidence to convince me that the benefit is worth the cost for this change.
@atilaneves is the one who made the final decision on this. If you have ideas for how the Phobos submission process can be improved, you should send them to him, since he's the one best-positioned to do something about it.
Well, no; it just means you have to write // Safe because there are no other references to mySumType
() @trusted { mySumType = "hello"; }();...instead of mySumType = "hello";I agree that this is ugly and inconvenient, but saying it makes |
Ok. If you want, we can come up with a list of advantages and disadvantages and discuss them after I come up with my new proposal.
I'm not pointing out you in specific. You are just a maintainer of this package and totally understand your position. Although you present that as an argument and I'm just against it because that is not considered an experimental phase for a standard library. I mean, maybe it is, according to Walter, Atila or Andrea, I don't really know at this point. The whole community and core contributors governance model in D is broken. You look at Python, for example, you have PEP 13, 8000, 8001, 8002, ... which are documents explaining the decision process, controversial decision process, voting mechanics,... Because D doesn't have a concise decision model, every decision is pretty much done by their "dictators". Then unfortunate things like this happen: dlang/dmd#12828 More in-depth into the Python Software Foundation philosophy, dealing with new packages into the standard library is also accomplished with the help of PEP 411 using provisional packages.
I'm not a Python user nor Python developer, but, for what I've seen, this is a good example of some governance, organization and commitment for what is called a community-driven language. Read more about those referred PEPs in their PEP index: https://www.python.org/dev/peps/
Recommending this code to the end-user is a really bad practice. Your case presents a simple string literal. In a complex scenario, things can be hard to track and manually track which is, sometimes, not feasible. In this particular case, a user that is not aware of this, to properly check if their code is really cast(void) () @system {}();Furthermore, the user can fall into the situation of wrapping this assignment where the right side is |
Yes, ultimately this is what needs to happen for
Hmm, unless I'm missing something, your example won't be possible if the
Well if P.S. Also the fact that BuildKite is completely green is also confirms that this breaking change has a very narrow impact in practice. @ljmf00 While I agree with your sentiment about D's process, please let's have this discussion somewhere else, as this PR is not the right place for that. |
I guess you missed something. I'm not sure if you are talking about strong purity or weak purity tho. But I don't see pure interfering here, AFAIK. Also, I'm covering @pbackus case on the unittests, see here https://github.com/dlang/phobos/pull/8146/files#diff-da9a9337d4295ecd14fdfb456a91a64e84279bb0bb4897afc68b7e9ddd046772R2508 .
You're right, this is a recurrent thing tho and is delaying this PR, as the biggest argument I see is breaking changes. I won't discuss this here |
It's still possible for a pure handler to escape a reference to a static SumType!(int*, int) x; // global lifetime - scope does not apply
int** pp = x.match!(
(ref int* p) => &p,
(int _) => null
);
x = 12345; // overwrite *pp (currently @system)
int oops = **pp; // kaboom
In fact, I would argue that it is much easier to avoid
Given that On the other hand, one user has already commented in this thread that they have code that would be broken by this change. So we actually have concrete evidence that the breakage is not "theoretical." |
@ljmf00 @pbackus Now that dlang/dmd#12712 has been fixed, how do we proceed here? |
|
If I understand correctly, the current proposal is:
This is an improvement to previous proposals, which either made access unconditionally The downsides are:
It is worth pointing out that the breakage will not just affect direct calls to Ultimately, I am still not convinced this is a good idea. The current proposal is strong enough that I think it could potentially be justified in isolation, but the precedent it sets is concerning. If we're willing to allow your PR to introduce a regression in order to fix your use-case, what do we say to the next contributor who wants to break someone else's code to fix their use-case? Right now, given D's "dictatorship" governance model, I think the answer is, "it's up to Atila to decide on a case-by-case basis." @atilaneves We need a decision here—is making |
Thanks for an excellent point. @ljmf00, I’d like to ask, how would one write generic code with match-by-ref that propagates safety? You cannot declare those methods This affects not only two methods mentioned above but any generic code written by
In my opinion, copying an object just to compare it to another seems inappropriate for a reusable library. Especially the stdlib. |
|
@atilaneves I think that you are the only one who can sort this out. |
|
I agree with @pbackus and don't think this should be merged. |
This doesn't discourage me to contribute to a unique language like D, just drains the batteries a bit. My point with this is to make D a better language. Even though I disagree I can understand the point about breaking changes although I also think we can learn from this in the future, when adding a new module to the Phobos. In particularly this module, we should have had some experimental phase to accommodate breaking changes more easily. The whole point of the breaking change I disagree is the fact that the module was introduced in the last release, at the time of this PR. The current behaviour is breaking the way D handles safety plus the current form is more restrictive. I already tried to prove it above. Ideally this should all be safe but edge cases are hard to handle. I think that the amount of things this PR makes system is less than the current behaviour to justify the breaking change. I already have in mind a fork with upstreamed changes but I'm always welcome to try upstreaming them. The idea is not to diverge from Phobos but present a proposal to change with a practical concept of Phobos 2.0, both from code concepts but also robustness in extensively testing it. I think we should learn one more thing from this PR: we should think about having tagged unions in the language. |
Fixes #22041 .
D programming language is moving towards @safe code and introducing new
@System code makes std.sumtype fundamentally useless when using it under @safe
code. Forcing @System when the users are using pointers is just wrong.
The simple example to prove it is the following:
Using pointers in D is not necessarily @System.
Moreover, this code is preventing the following code to compile in @safe:
Signed-off-by: Luís Ferreira contact@lsferreira.net