Skip to content

Comments

add support for Unrestricted Unions#5830

Merged
andralex merged 1 commit intodlang:masterfrom
WalterBright:unrestricted-unions
Jun 17, 2016
Merged

add support for Unrestricted Unions#5830
andralex merged 1 commit intodlang:masterfrom
WalterBright:unrestricted-unions

Conversation

@WalterBright
Copy link
Member

Unrestricted Unions are a feature added to C++11. Unrestricted unions can have fields that have destructors, postblits, or invariants. It's up to the user to call them correctly, the compiler does not automatically generate calls to them.

I added the "Needs Work" label because it is a bit unfinished, I want to see how it passes the autotester so far.

@WalterBright WalterBright force-pushed the unrestricted-unions branch 5 times, most recently from 127f809 to a7f254e Compare June 3, 2016 04:05
@WalterBright
Copy link
Member Author

Ready 2 rock.

@PetarKirov
Copy link
Member

Nice!
Can you add a runnable test? I would prefer a case where one of the union members is struct which contains a field with elaborate destructor. One assert that checks that RAII with union is not triggered and one to verify that manually calling destroy on union member correctly destroys it.
And maybe one more test where the union is a member of a struct.

@JackStouffer
Copy link
Contributor

Please add a change log entry, as marketing this change is important :)

@braddr
Copy link
Member

braddr commented Jun 3, 2016

This change concerns me. Not that the ability to do this is bad, but shouldn't it be distinguishable from a standard union? How's it interplay with other aspects of D. Just because it exists in c++ doesn't mean it's automatically good for D.

@yebblies
Copy link
Contributor

yebblies commented Jun 4, 2016

What's the justification behind this? What's the use case? Does this make any un-@safe things legal? Where's the spec change?

@andralex
Copy link
Member

andralex commented Jun 4, 2016

@yebblies refer to https://issues.dlang.org/show_bug.cgi?id=16104. The problem with the current definition of union is it's too restrictive to be useful. This has been also the case in C++ for many years (and D's union copied C++'s). As a result union was impossible to use in the exact places where it could have been useful (e.g. variant types) and people needed to actually not use union and instead resort to nonportable tricks. Relaxing union semantics marked a good step forward for low-level layout design in C++, and will mark for us too.

@braddr
Copy link
Member

braddr commented Jun 4, 2016

My problem with this is that while unions had limitations before, this opens a gigantic can of usability problems without any mechanisms to warn the user. Before unions were a fancy reinterpret casting construct. Now you've got one that also allows the inclusion of types that must have helper functions be invoked in exactly the right places with no warning that you've done that or that any of the types involve have potentially been changed from pod behavior to non-pod behavior. Yes, the limitations are too restrictive for some very interesting use cases, but they made unions a lot less likely to be used improperly. I'm not at all opposed to the new capabilities, as they are useful, I just want them separated or at least the previous protections kept available (and probably the default).

@WalterBright WalterBright force-pushed the unrestricted-unions branch from a7f254e to fa7fd2f Compare June 12, 2016 07:03
@WalterBright
Copy link
Member Author

Added changelog.dd entry

@WalterBright WalterBright force-pushed the unrestricted-unions branch from fa7fd2f to 3725a66 Compare June 12, 2016 07:05
@mihails-strasuns-sociomantic
Copy link
Contributor

I agree with @braddr and disapprove of this change (feature is good, but it must be a different type of union). Union restrictiveness helped me found some nasty bugs while porting code from D1 to D2 and that was very appreciated. I'd like to keep that safety tool in my arsenal.

@WalterBright
Copy link
Member Author

@mihails-strasuns-sociomantic I've been improving the safety checks to prevent more unsafe uses of unions.

@andralex
Copy link
Member

We've been light on process on this one because the C++ community has been through the same stages and has done all the ground work of discussing the relaxation, going through all arguments. etc. (Those might be worth looking at: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2544.pdf.) I myself have lived through the pain of pre-C++11 unions.

There is no doubt this relaxation improves the language. Disallowing types with destructors in a union provides at best a false sense of security. The moment two distinct types are allocated at the same address indiscriminately, a core tenet of the language (i.e. memory is typed, each object has a distinct address) is broken. So union is but a core mechanism for creating layout without attendance to the usual language rules.

I'll pull this now. If, after perusing http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2544.pdf, the conclusion is reached that that work does not apply to D, feel free to raise the issue.

@andralex
Copy link
Member

Auto-merge toggled on

@andralex andralex merged commit f49a6ef into dlang:master Jun 17, 2016
@braddr
Copy link
Member

braddr commented Jun 17, 2016

Grump, in a major way. This pull doesn't seem to have any of the extra checks and balances that the c++ world has. The c++ version, as described in that doc, I'd have no problems with.

@WalterBright
Copy link
Member Author

extra checks and balances

What in particular would you like to see?

@JackStouffer
Copy link
Contributor

Thinking about this more, this feature would have been a great candidate for the new DIP process that is being discussed in the forums. Especially since that new language features of this nature are supposed to go through DIPs, which this would have benefited from considering only Andrei and Walter think that merging as is was a good idea and three other core team members disagree.

@MartinNowak
Copy link
Member

MartinNowak commented Oct 9, 2016

Somewhat annoying to only find this b/c the changelog entry left so many questions.

As stated before the implementation is fairly simplistic, and we should try to add a few safety mechanisms. C++ also has several so the comparison w/ C++'s unrestricted unions is incorrect.

We don't seem to have the initialization problem of C++.

  • by default first union field is initialized
  • default initialization of unions is disabled if one of it's fields has a @disable this();
  • unions can be explicitly initialized with non-default constructed fields auto u = U(Point(1, 2)); or U u = { .p = Point(1, 2) }; if p isn't the first field
  • unions can have their own (templated or overloaded) constructors

We do have a destructor/postblit/invariant problem

  • it's disallowed to define any of these in a union
  • those operations are now silently ignored or incorrect 🙄

I can think of 3 kinds of unions, untagged, externally tagged, and internally tagged (e.g. using lower pointer bits) ones.
The first 2 require consistent lifecycle management outside of the union.
An internally tagged union could define correct postblit/destructor/invariant functions itself.

Based on that, here is an idea:

  • like in C++11 if any of the fields require an explicit function we @disable that functions on the union by default
  • operations that require postblit/destructors will show the usual cannot ... because is marked @disable error
  • I've proposed an unsafe intrinsic to make dmd's internal STCnodtor flag accessible for a more efficient move (see Issue 14466).
    This is also useful in a few other places, e.g. for more efficient library array and AA initialization.
    Here it could be required to explicitly mark a union variable as destructed/nodtor for dmd to not insist on calling the disabled destructor.
    And since __nodtor is unsafe, destructing unrestricted unions would correctly become unsafe.
  • unrestricted unions could remain non-copyable, b/c only assignment/copying of a field makes sense
  • possible future enhancement, not needed b/c internally tagged unions can be managed externally as well:
    It could be allowed to define postblit/destructors in the union, thus overriding the @disabled functions, to implement internally tagged unions
    The programmer would be responsible for the definitions as the compiler can't check correctness (@System).

Just rough ideas, but this feature clearly needs to make a small design roundtrip.
I'd propose to remove it from the upcoming 2.072 release so that we don't have to support it for the next 2 years.

@andralex
Copy link
Member

andralex commented Oct 10, 2016

As stated before the implementation is fairly simplistic, and we should try to add a few safety mechanisms. C++ also has several so the comparison w/ C++'s unrestricted unions is incorrect.

What would be C++'s safety mechanisms with unions, and how do they render the comparison incorrect? Thanks.

like in C++11 if any of the fields require an explicit function we @disable that functions on the union by default

Could you please put this another way? I didn't get it.

operations that require postblit/destructors will show the usual cannot ... because is marked @disable error

Construction itself (e.g. on the stack or as a member inside another object) requires the destructor. The point is to allow that. What other operations do you have in mind?

I've proposed an unsafe intrinsic to make dmd's internal STCnodtor flag accessible for a more efficient move (see Issue 14466). This is also useful in a few other places, e.g. for more efficient library array and AA initialization. Here it could be required to explicitly mark a union variable as destructed/nodtor for dmd to not insist on calling the disabled destructor. And since __nodtor is unsafe, destructing unrestricted unions would correctly become unsafe.

Trying to understand this. Does this mean letting a stack-allocated union go out of scope is unsafe?

unrestricted unions could remain non-copyable, b/c only assignment/copying of a field makes sense

Consider:

struct Optional(T) {
    union { T value; }
    bool isValid;
    ...
}

The goal here is to make sure Optional can implement the usual semantics (if isValid is true then the value is a T object, otherwise it's just trash). Would the non-copyability of the union make things difficult for the Optional?

possible future enhancement, not needed b/c internally tagged unions can be managed externally as well:
It could be allowed to define postblit/destructors in the union, thus overriding the @disabled functions, to implement internally tagged unions
The programmer would be responsible for the definitions as the compiler can't check correctness (@System).

That sounds like a lot of busywork for no good outcome.

To put it simply, unless we want to "get real" with built-in tagged unions, I think we're wasting time embellishing unions (which currently are untagged by the language). The C, C++, and D union notion allows allocating objects of unrelated types at the same address, and leaves the responsibility of tracking types and lifetimes squarely on the shoulders of the facility's user. This is the way it's always been, and it's the way it's always going to be. It's a simple contract. About the only proper use of such unions is in libraries that add a tag and implement said tracking. Any attempt to make things cleverer has the only net effect of making life unduly difficult for people who implement such libraries and is a mistake. (This is exactly the arc of the feature in C++.) I think the state of affairs with untagged unions in D is just right as of this time, and we should move on to doing something else.

If we were to support unions tagged implicitly by the language, the situation would be very different. I see at least one good use in better GC traceability, not to mention dynamic introspection etc. That kind of effort might be worth getting into.

@MartinNowak
Copy link
Member

MartinNowak commented Oct 11, 2016

This is the actual design work somebody should've done before opening this PR, but not us in last minute before a release. This alone should be reason enough to throw this out of the release.
I don't have time to design such a feature atm. (Beta, DIP1000) and GH isn't the right place to do it, so I'll only take 5 minutes to answer your questions.

What would be C++'s safety mechanisms with unions, and how do they render the comparison incorrect? Thanks.

From the C++ specs you quoted (Unrestricted Unions (Revision 2))

We have also changed the way implicitly declared special member functions of unions are
generated in the following way: if a non-trivial special member function is defined for any
member of a union, or a member of an anonymous union inside a class, that special member
function will be implicitly deleted (8.4 ¶10) for the union or class. This prevents the compiler
from trying to write code that it cannot know how to write, and forces the programmer to write
that code if it’s needed. The fact that the compiler can’t write such a function is no reason not to
let the programmer do so.

This is a good idea to avoid silent leaking or unsafe behavior. I tried to emulate that with the @disable concept above.

Construction itself (e.g. on the stack or as a member inside another object) requires the destructor. The point is to allow that. What other operations do you have in mind?

The compiler cannot generate such functions, it doesn't know which field to destruct/postblit.
So this requires the programmer to do it, and I've proposed ways to do that.

struct SomeExample
{
    static union U
    {
        int fd;
        File file; // RefCounted resource
    }
    bool file;
    U u;

    ~this() // you need to write this destructor b/c u.~this is @disabled
    {
        // destroy the correct u field
        if (file)
            .destroy(file);
        else
            .close(fd);
        __nodtor(u); // tells the compiler to skip u.~this
        /** It doesn't seem required to make __nodtor unsafe, but not using it correctly can result in memory/resource leaks. A closer analysis of __nodtor @safety would be required.
        */
    }

    this(this) // you need to write this postblit b/c u.this(this) is @disabled
    {
        if (file)
            u.file.__xpostblit; // just postblit the correct field
        else
            // not the best example :)
            throw new AssertException("Illegal copy, raw file descriptors aren't refcounted.");
    }
}

Trying to understand this. Does this mean letting a stack-allocated union go out of scope is unsafe?

That was a hasty conclusion, seems like __nodtor must not be unsafe (though it's dangerous b/c you can easily leak).

Consider:

struct Optional(T) {
union { T value; }
bool isValid;
...
}

I won't ask why you want to use a union for that (instead of simply align(T.alignof) ubyte[T.sizeof] storage = void; and cast(T*) storage.ptr), but let's take it as an example for a typical algebraic type use-case in function programming languages (e.g. Just a | Nothing, see Maybe).
You should be able to answer your question yourself following my example above.

struct Optional(T)
{
    union { T value = void; } // disable default construction of first field (doesn't work see Bugzilla 11331)
    bool isValid = false; // just to be explicit

    this(T value)
    {
        this.value = value;
        isValid = true;
    }

    ~this()
    {
        // mmh, seems my __nodtor idea doesn't integrate that nicely with anonymous unions,
        // b/c they have no name you should prolly have to call it on all union fields w/ dtor
        if (isValid)
            destroy(value);
        __nodtor(value);
    }

    this(this)
    {
        if (isValid)
            value.__xpostblit;
    }
}

Here is what the current compiler 2.071.2-b1 does for the File example.
It does not call any destructor and doesn't insist on any be written, even though a destructor is obviously needed.
For copying it doesn't insist on destructing the target, blits the raw memory, and doesn't insist on a postblit be written, even though one is obviously needed.

The solution I proposed is somewhat complex (comes from translating C++'s default deleted operators), and would require at least 2 new compiler features (__nodtor, defining ~this()/this(this) even though one of the fields has those @disabled).
Any (simple) solution that solves the "doesn't insist ... even though one is obviously needed" parts would make the 4 people opposed to this PR in the current state happy. It seems that the compiler couldn't realistically do more than that anyhow. This could maybe just be implemented a simple diagnostic error check in the compiler.
The former solution seems a bit more orthogonal and mostly reuses existing language feats. while the latter adds new single-purpose language rules (structs containing a union field w/ dtor/postblit must define a dtor/postblit themselves).

If we were to support unions tagged implicitly by the language, the situation would be very different.

Sure tagged unions could be fully managed by the compiler and have a lot of interesting properties, e.g. very efficient pattern matching (for D maybe allowing some final switch). Tagged unions are OT though.

That was 45-50 min. instead of 5, means 40-45 min. less sleep for me today.
Please spent just a little bit more time on convincing arguments when there are 3 people raising concerns, and in any case write more than 2 lines of unclear changelog notice w/o example for such a language change.
I know it's unfair to ask Walter to do even more, but if we don't have the time to properly implement a feat., we should prolly not do it.

@andralex
Copy link
Member

andralex commented Oct 11, 2016

That was 45-50 min. instead of 5, means 40-45 min. less sleep for me today.
Please spent just a little bit more time on convincing arguments when there are 3 people raising concerns, and in any case write more than 2 lines of unclear changelog notice w/o example for such a language change.
I know it's unfair to ask Walter to do even more, but if we don't have the time to properly implement a feat., we should prolly not do it.

Bummer about your losing sleep.

I really don't know what to say or do. I think no restriction is the way to go and we're really good, and am out of new ways to explain so. union has undefined behavior left, right, and center. There's no two ways about it. Attempting to claim some safety gets added by disabling things here is futile. Just let union be something inert, as it should be: a mechanism for defining layout. No implicit ctor, postblit, dtor, no clever requirements of things that should be defined - nothing will improve safety of an untagged union! It's fundamentally unsafe.

I think we'll have to agree to disagree. All that cleverness and __nodtor and subtleties, it's just not proportional response. It's not the kind of language design we want to do. Look over it again - is really union where we want to unleash our cleverness?

The question is now that we just have an irreducible disagreement what is the next step. Please advise.

@MartinNowak
Copy link
Member

I think we'll have to agree to disagree. All that cleverness and __nodtor and subtleties, it's just not proportional response. It's not the kind of language design we want to do. Look over it again - is really union where we want to unleash our cleverness?

As said above, I think it's complex myself, and it was just an attempt to emulate C++'s spec.
The problem quite a few people moan about is silently allowing obviously incorrect behavior.

union U
{
    RC rc;
    int num;
}

void leak(ref U a, ref U b)
{
    a = b; // assignments only bitblit and might leak w/ non-POD fields, why allow it
    // either do
    a.rc = b.rc;
    // or
    a.num = b.num;
}

struct S
{
    // not implementing `this(this)` and `~this()` in S is wrong and might leak
    U u; 
}

Those are diagnostic errors and could be added later on.

@MartinNowak
Copy link
Member

See dlang/dlang.org#1503 for changelog entry.
Still think this feature is incomplete/poorly implemented, and unnecessarily allows to make many mistakes (bugs).

MartinNowak added a commit to MartinNowak/dlang.org that referenced this pull request Oct 24, 2016
@JohanEngelen
Copy link
Contributor

JohanEngelen commented May 7, 2018

I believe this may have been a widely unnoticed language change, as it has lead to a nasty memory leak bug in Weka's code that was only discovered by lucky coincidence...

Below I've put my current addition to prevent bugs when people accidentally add a dtor to some type that ends up as part of a union (possibly nested).

Seems very useful to add to Phobos IMO.
Note there is this outstanding bug such that it doesn't work with @disable this(this) types: https://issues.dlang.org/show_bug.cgi?id=18628 Fixed it with addition of isCopyable!T.

template reportDangerousMembers(T, string text, OuterUnion, bool outputMsg)
{
    import std.meta;
    import std.traits;
    alias FilteredDangerousMembers = Filter!(ApplyRight!(.reportDangerousInsideUnion, OuterUnion, outputMsg), FieldTypeTuple!T);
    enum bool reportDangerousMembers = !is(FilteredDangerousMembers == AliasSeq!());

    static if (reportDangerousMembers)
    {
        alias DangerousType = FilteredDangerousMembers[0];
        enum dangerousMemberIndex = staticIndexOf!(DangerousType, FieldTypeTuple!T);
        static if (outputMsg)
            pragma(msg, "Member " ~ FieldNameTuple!(T)[dangerousMemberIndex] ~ " of type " ~ DangerousType.stringof ~ " in " ~ text ~ " is dangerous inside union " ~ OuterUnion.stringof ~ ".");
    }
}

template reportDangerousInsideUnion(T, OuterUnion, bool outputMsg)
{
    import std.traits;
    static if (is(T == union))
    {
        // Recurse deeper
        enum bool reportDangerousInsideUnion = reportDangerousMembers!(T, "union " ~ T.stringof, T, outputMsg);
        static if (reportDangerousInsideUnion && outputMsg)
            pragma(msg, "Union " ~ T.stringof ~ " has dangerous members.");
    }
    else static if (isStaticArray!T && T.length)
    {
        // Recurse with element type
        enum bool reportDangerousInsideUnion = reportDangerousInsideUnion!(typeof(T.init[0]), OuterUnion, outputMsg);
        static if (reportDangerousInsideUnion && outputMsg)
            pragma(msg, "Static array " ~ T.stringof ~ " has dangerous element type");
    }
    else static if (is(T == struct))
    {
        enum isDangerous = (hasElaborateCopyConstructor!T && isCopyable!T) || hasElaborateDestructor!T;
        enum anyMemberIsDangerous = reportDangerousMembers!(T, "struct " ~ T.stringof, OuterUnion, outputMsg);

        static if (isDangerous && outputMsg)
            pragma(msg, "Struct " ~ T.stringof ~ " is dangerous inside union " ~ OuterUnion.stringof ~ ".");

        enum bool reportDangerousInsideUnion = isDangerous || anyMemberIsDangerous;
    }
    else
    {
        // Basic types are not dangerous.
        // Dynamic objects like classes and dynamic arrays with dtors, etc, are not dangerous inside a union.
        enum bool reportDangerousInsideUnion = false;
    }
}

// Returns true when union is dangerous.
// A dangerous union is a union where at least one member has a destructor or postblit.
template reportDangerousUnion(U, bool outputMsg = false)
{
    static if (is(U == union))
    {
        enum bool reportDangerousUnion = reportDangerousInsideUnion!(U, U, outputMsg);
    }
    else
    {
        enum bool reportDangerousUnion = false;
    }
}

Simple testcase:

struct WithDtor
{
    ~this() {}
}
struct WithPostblit
{
    this(this) {}
}

struct WithDtor2 {
    union A2Union {
        int b;
        WithDtor[4] a2a;
    }
    A2Union a2union;
}

union B
{
    int i;
    WithDtor2 a;

    // B is dangerous. Also output pragma messages.
    static assert(reportDangerousUnion!(B, true) == true);
}

union C
{
    int i;
    WithPostblit a;

    // C is dangerous
    static assert(reportDangerousUnion!C == true);
}

union D
{
    int i;
    ulong a;

    static assert(reportDangerousUnion!D == false);
}

    struct NoPostblit {
        int i;
        @disable this(this);
    }

    union E
    {
        int i;
        NoPostblit a;

        static assert(reportDangerousUnion!E == false);
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants