Skip to content

Fix union layout and initialization#1846

Merged
dnadlinger merged 4 commits intoldc-developers:masterfrom
kinke:union
Oct 25, 2016
Merged

Fix union layout and initialization#1846
dnadlinger merged 4 commits intoldc-developers:masterfrom
kinke:union

Conversation

@kinke
Copy link
Member

@kinke kinke commented Oct 21, 2016

No description provided.

@kinke
Copy link
Member Author

kinke commented Oct 21, 2016

Doesn't fix #1829.

@kinke
Copy link
Member Author

kinke commented Oct 21, 2016

The _init type should now be always identical to the normal type (the used fields for the normal type are based on the field initializers). Only the initializer of globals may use a different type, based on which fields are set (this could be optimized to use the normal type in most cases - most aggregates don't feature any unions anyway).
Getting rid of _init types may work around #1829 in that particular case, as that config global is default-initialized IIRC, so the type wouldn't change and the global wouldn't need to be replaced.
But #1829 is a more general issue - either a bug in LLVM 3.9 (unlikely I guess) or caused by us keeping around a reference to the old, replaced global, and using it after the replacement.

Copy link
Member

@dnadlinger dnadlinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the red (although I'm still not too big a fan of removing braces from block statements).

I haven't had too close a look at the priorities management code, but assuming the test coverage is good, I think this is good to go. As far as I can see, the whole _init type scheme was based on the idea that we could deterministically construct that type without looking at the values themselves. This would be very desirable for forward-references, etc. – but unfortunately, we've already figured out that that this can't be done in general (e.g. for unions). Thus, we should just remove the excess baggage/potential for bugs.

}

namespace {
enum FieldPriority {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum class? <3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have had to implement the < and -- operators then, so I changed my initial enum class to enum. ;)

}

void AggrTypeBuilder::addAggregate(AggregateDeclaration *ad) {
addAggregate(ad, nullptr, /* addAliases = */ true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make the parameter an enum then to make it self-documenting?

continue;
if (v_begin < f_end && v_end > f_begin) {
if (addAliases && v_begin == f_begin &&
DtoMemType(data[i]->type) == DtoMemType(field->type)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this just be vd? In fact, if you loop over all of data anyway, can't you just turn the whole thing into a range-based for loop?

if (field->_init != nullptr && !field->_init->isVoidInitializer()) {
IF_LOG Logger::println("adding explicit initializer for struct field %s",
field->toChars());
unsigned errors = global.errors;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick in old code: const)

By refactoring IrAggr::addFieldInitializers() and making it use an
extended and refactored AggrTypeBuilder::addAggregate().

AggrTypeBuilder::addAggregate() can now optionally detect alias fields
in unions (same offset and LL type as a dominant union field) and add
those to the variable-to-GEP-index mapping.
These alias fields can then be indexed directly with a GEP instead of
resorting to casting the pointer and applying the byte offset.
@kinke kinke force-pushed the union branch 2 times, most recently from c39a298 to f675a3c Compare October 22, 2016 22:53
char[2][4] multidim; // multidimensional init based on a single 0xff char
}
// CHECK-DAG: %union.S = type { i8, [3 x i8], i32, [2 x i8], [4 x [2 x i8]], [2 x i8] }
// CHECK-DAG: @_D5union1S6__initZ = constant %union.S { i8 -1, [3 x i8] zeroinitializer, i32 0, [2 x i8] zeroinitializer, [4 x [2 x i8]] {{\[}}[2 x i8] c"\FF\FF", [2 x i8] c"\FF\FF", [2 x i8] c"\FF\FF", [2 x i8] c"\FF\FF"], [2 x i8] zeroinitializer }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was really necessary to escape the [[ (FileCheck variable matcher) in [4 x [2 x i8]] [[2 x i8] c"\FF\FF", ...] via {{\[}}[. ;(

@kinke
Copy link
Member Author

kinke commented Oct 22, 2016

Alright, the 2nd commit fixes #1829 for DMD at least, as expected. It should restrict that bug to only a few exotic cases.

ir/iraggr.cpp Outdated

auto llStructType = getLLStructType();
bool notCompatible = (types.size() != llStructType->getNumElements());
if (!notCompatible) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nitpick: I'd remove the negation here and call the variable just compatible – my brain parsed !notCompatible parsed in interesting ways when just skimming the source. (Or at least, incompatible.)

@dnadlinger
Copy link
Member

dnadlinger commented Oct 23, 2016

Feel free to merge once tests are green. Codegen should effectively be the same, and there shouldn't be any compile time pessimisations either, right? (Of course, avoiding to switch out the global in general would be a slight improvement, even.)

I'm still hoping somebody will sit down and nicely abstract away all the bool zext stuff soon, though.

There's no <Type>_init type for aggregates (structs and classes) anymore,
effectively eliminating a *lot* of named LLVM types, some bitcasts as well
as replacements of globals etc.

To get there, it was even required to use the regular type for compatible
literals too, otherwise structs embedded as fields in other aggregates had
an anonymous type (well, the LLVM constant for the field initializer had)
and so the container initializer wasn't compatible with the regular type
anymore.

What was also necessary was a fix wrt. static arrays of bools (LLVM
constant of type `[N x i1]` vs. `[N x i8]` for regular type).
I also had to change the initializer for `char[2][3] x = 0xff` from
`[6 x i8]` to `[3 x [2 x i8]]`, i.e., NOT flattening multi-dimensional
inits from a scalar.

So only literals with overlapping (union) fields and an explicit
initializer initializing dominated non-alias union fields should still
have a mismatching anonymous type - i.e., very, very few cases.
@kinke
Copy link
Member Author

kinke commented Oct 23, 2016

Thanks for the blazingly fast review. ;)

compile time pessimisations

Based on the admittedly vague CI results, there seem to be no significant changes.

nicely abstract away all the bool zext stuff

I was thinking the same, it's a pain. Using i8 generally and, in case it's really worth the trouble, truncating right before the actual operations (and zexting immediately again when storing) should decomplicate things substantially. That was quite a nasty bug and invisible due to the (pointer) bitcasting...

@kinke kinke changed the title WIP: Fix union layout and initialization Fix union layout and initialization Oct 23, 2016
@dnadlinger
Copy link
Member

Using i8 generally

Boolean values (true/false/comparison results/…) are i1 on the LLVM level. I don't think it would work out well to represent all these as i8.

That was quite a nasty bug and invisible due to the (pointer) bitcasting...

Was there an actual bug – as in, wrong codegen – anywhere?

@dnadlinger
Copy link
Member

We might also want to rebase this onto 0.17.x to keep DMD hostable with a C++-based compiler.

@kinke
Copy link
Member Author

kinke commented Oct 23, 2016

I also had to zext scalar bools to prevent type mismatches. Furthermore, I've had a look at how the dynamic struct literals are generated, noticed an issue and fixed + refactored that part too. I hope this PR now fixes our union-related bugs for good and gets stuff like #1719 working again.

@kinke
Copy link
Member Author

kinke commented Oct 23, 2016

Was there an actual bug – as in, wrong codegen – anywhere?

I did a quick test with bools with an intermediate commit, which seemed to suggest that LLVM didn't pack scalar i1 and [N x i1] struct fields, using a byte for each i1 instead. So that behavior for constant initializers may have been okay, although not desirable.

Apart from that, there was at least one major issue wrt. potentially diverging LL type layouts (even in size, and according initializer garbage) between _init and regular types for unions, as I posted here.

// initialize any padding so struct comparisons work
if (vd->offset != offset)
if (vd->offset != offset) {
assert(vd->offset > offset);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, this assertion doesn't hold for the code in #1324. :(

@kinke
Copy link
Member Author

kinke commented Oct 24, 2016

I've amended the last commit to match DMD behavior for DMD issue 16471 referenced by #1324 (which is resolved by this, as we depend on a front-end fix to properly fix this).

The PR should now be complete. ;)

The front-end fills in missing init expressions (except for the nested
context pointer in most cases), so there are no implicitly initialized
fields for dynamically initialized struct literals.

The front-end is also supposed to make sure there are no overlapping
initializers by using null-expressions for overridden fields, but doesn't
in some cases (DMD issue 16471).
Instead of preferring the first one in lexical field order, now use the
lexically last one to mimic DMD.

The previous code iterated over the fields in lexical order and ignored
the initializer for a field if there were earlier declared fields with
greater offset (and an initializer expression), which is wrong for
anonymous structs inside a union:

union {
  struct { int i1;            long l = 123; }
  struct { int i2; int x = 1;               }
}

`x` was previously initialized with 0 (treated as padding).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants