Skip to content

compiler: move struct types into InternPool proper#17172

Merged
andrewrk merged 26 commits intomasterfrom
ip-structs
Sep 22, 2023
Merged

compiler: move struct types into InternPool proper#17172
andrewrk merged 26 commits intomasterfrom
ip-structs

Conversation

@andrewrk
Copy link
Member

@andrewrk andrewrk commented Sep 16, 2023

Structs were previously using SegmentedList to be given indexes, but were not actually backed by the InternPool arrays.

After this, the only remaining uses of SegmentedList in the compiler are Module.Decl and Module.Namespace. Once those last two are migrated to become backed by InternPool arrays as well, we can introduce state serialization via writing these arrays to disk all at once.

Unfortunately there are a lot of source code locations that touch the struct type API, so this commit is still work-in-progress. Once I get it compiling and passing the test suite, I can provide some interesting data points such as how it affected the InternPool memory size and performance comparison against master branch.

I also couldn't resist migrating over a bunch of alignment API over to use the log2 Alignment type rather than a mismash of u32 and u64 byte units with 0 meaning something implicitly different and special at every location. Turns out you can do all the math you need directly on the log2 representation of alignments.

This is a sub-task of the Performance Project.

@andrewrk andrewrk force-pushed the ip-structs branch 3 times, most recently from dc7d75a to 76f0557 Compare September 20, 2023 06:41
andrewrk and others added 21 commits September 21, 2023 14:48
Structs were previously using `SegmentedList` to be given indexes, but
were not actually backed by the InternPool arrays.

After this, the only remaining uses of `SegmentedList` in the compiler
are `Module.Decl` and `Module.Namespace`. Once those last two are
migrated to become backed by InternPool arrays as well, we can introduce
state serialization via writing these arrays to disk all at once.

Unfortunately there are a lot of source code locations that touch the
struct type API, so this commit is still work-in-progress. Once I get it
compiling and passing the test suite, I can provide some interesting
data points such as how it affected the InternPool memory size and
performance comparison against master branch.

I also couldn't resist migrating over a bunch of alignment API over to
use the log2 Alignment type rather than a mismash of u32 and u64 byte
units with 0 meaning something implicitly different and special at every
location. Turns out you can do all the math you need directly on the
log2 representation of alignments.
This also modifies AstGen so that struct types use 1 bit each from the
flags to communicate if there are nonzero inits, alignments, or comptime
fields. This allows adding a struct type to the InternPool without
looking ahead in memory to find out the answers to these questions,
which is easier for CPUs as well as for me, coding this logic right now.
for the new struct and packed struct encodings.
Not sure what that code was supposed to be doing, it doesn't seem to be
reachable.
Let's try to reduce the explosive scope of this branch.
All of the logic in `Value.elemValue` is quite questionable, but
printing an error is definitely better than crashing. Notably, this
should stop us from hitting crashes when dumping AIR.
We're hitting false compile errors, but this is progress!
Carve out a path forward for being a little bit more intentional about
handling "default" alignment values.
Previously it would canonicalize or not depending on some volatile
internal state of the compiler, now it forces resolution of the element
type to determine the alignment if it needs to.
This changeset fixes the handling of alignment in several places. The
new rules are:
* `@alignOf(T)` where `T` is a runtime zero-bit type is at least 1,
  maybe greater.
* Zero-bit fields in `extern` structs *do* force alignment, potentially
  offsetting following fields.
* Zero-bit fields *do* have addresses within structs which can be
  observed and are consistent with `@offsetOf`.

These are not necessarily all implemented correctly yet (see disabled
test), but this commit fixes all regressions compared to master, and
makes one new test pass.
Zero-byte alignment is no longer valid for runtime types. I made most of
these changes in an earlier commit, but missed this case.
@andrewrk andrewrk marked this pull request as ready for review September 21, 2023 21:49
@andrewrk
Copy link
Member Author

ghostty perf data point:

Benchmark 1 (3 runs): master/zig build-exe ghostty/src/main.zig ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          6.66s  ± 39.5ms    6.62s  … 6.70s           0 ( 0%)        0%
  peak_rss            634MB ±  136KB     634MB …  634MB          0 ( 0%)        0%
  cpu_cycles         27.8G  ± 44.4M     27.7G  … 27.8G           0 ( 0%)        0%
  instructions       38.2G  ± 3.63M     38.2G  … 38.2G           0 ( 0%)        0%
  cache_references   1.65G  ± 5.04M     1.64G  … 1.65G           0 ( 0%)        0%
  cache_misses       99.5M  ±  537K     99.0M  …  100M           0 ( 0%)        0%
  branch_misses       202M  ±  206K      202M  …  202M           0 ( 0%)        0%
Benchmark 2 (3 runs): ip-structs/zig build-exe ghostty/src/main.zig ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          6.57s  ± 48.9ms    6.51s  … 6.61s           0 ( 0%)          -  1.5% ±  1.5%
  peak_rss            634MB ±  342KB     634MB …  635MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles         26.8G  ±  184M     26.6G  … 26.9G           0 ( 0%)        ⚡-  3.5% ±  1.1%
  instructions       36.2G  ± 2.29M     36.2G  … 36.2G           0 ( 0%)        ⚡-  5.2% ±  0.0%
  cache_references   1.65G  ± 19.2M     1.63G  … 1.67G           0 ( 0%)          +  0.0% ±  1.9%
  cache_misses        100M  ± 1.02M     98.9M  …  101M           0 ( 0%)          +  0.5% ±  1.9%
  branch_misses       195M  ±  227K      195M  …  196M           0 ( 0%)        ⚡-  3.1% ±  0.2%

self-hosted compiler perf data point:

Benchmark 1 (3 runs): master/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          53.4s  ± 2.87s     51.0s  … 56.6s           0 ( 0%)        0%
  peak_rss           3.94GB ± 32.7MB    3.92GB … 3.98GB          0 ( 0%)        0%
  cpu_cycles          217G  ± 1.54G      216G  …  219G           0 ( 0%)        0%
  instructions        292G  ± 1.81G      291G  …  294G           0 ( 0%)        0%
  cache_references   13.2G  ± 32.4M     13.2G  … 13.2G           0 ( 0%)        0%
  cache_misses       1.13G  ±  540K     1.13G  … 1.13G           0 ( 0%)        0%
  branch_misses      1.57G  ± 11.3M     1.57G  … 1.59G           0 ( 0%)        0%
Benchmark 2 (3 runs): ip-structs/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          53.9s  ± 72.1ms    53.8s  … 53.9s           0 ( 0%)          +  0.8% ±  8.6%
  peak_rss           3.92GB ±  247KB    3.92GB … 3.92GB          0 ( 0%)          -  0.4% ±  1.3%
  cpu_cycles          209G  ±  102M      209G  …  209G           0 ( 0%)        ⚡-  3.9% ±  1.1%
  instructions        273G  ± 89.8M      273G  …  273G           0 ( 0%)        ⚡-  6.6% ±  1.0%
  cache_references   13.0G  ± 59.4M     12.9G  … 13.0G           0 ( 0%)          -  1.3% ±  0.8%
  cache_misses       1.13G  ± 5.54M     1.12G  … 1.13G           0 ( 0%)          +  0.5% ±  0.8%
  branch_misses      1.52G  ± 2.32M     1.51G  … 1.52G           0 ( 0%)        ⚡-  3.5% ±  1.2%

mlugg and others added 2 commits September 21, 2023 16:27
When struct types have no field names, the names are implicitly
understood to be strings corresponding to the field indexes in
declaration order. It used to be the case that a NullTerminatedString
would be stored for each field in this case, however, now, callers must
handle the possibility that there are no names stored at all. This
commit introduces `legacyStructFieldName`, a function to fake the
previous behavior. Probably something better could be done by reworking
all the callsites of this function.
Gotta call the get() function inside the loop if the loop adds anything
to InternPool.
It seems the webassembly backend does not want the exception that
`structFieldAlignmentExtern` makes for 128-bit integers. Perhaps that
logic should be modified to check if the target is wasm.

Without this, this branch fails the C ABI tests for wasm, causing this:
```
wasm-ld: warning: function signature mismatch: zig_struct_u128
>>> defined as (i64, i64) -> void in cfuncs.o
>>> defined as (i32) -> void in test-c-abi-wasm32-wasi-musl-ReleaseFast.wasm.o
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants