InternPool: begin conversion to thread-safe data structure#20528

Merged
andrewrk merged 15 commits into ziglang:master from jacobly0:tsip on Jul 8, 2024

jacobly0 (Member) commented Jul 7, 2024

This is another step towards first putting codegen on a separate thread from sema and later making both sema and codegen multi-threaded. Currently, the multi-threaded features of the new data structure are disabled by InternPool.want_multi_threaded because they have a large performance impact on the compiler and are not yet in use. It is still undecided how to minimize the impact of

The main conflicting change is replacing *Zcu with Zcu.PerThread in any code that wants to be able to mutate the intern pool. I chose to pass this information around everywhere instead of attempting something with threadlocal variables because in the fast case that would turn the intern pool into a global singleton, and in the slow case it would introduce an extra lookup on every intern pool mutation. Interestingly, if there is a desire to remove intern pool mutation from the backends, the per-thread change can be reverted over time in parts of the code, which guarantees that mutations can no longer happen in those sections.
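The handle-passing approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Zcu` struct here is a hypothetical stand-in, and `items_per_thread`, `intern`, and `countAll` are names invented for the example.

```zig
const std = @import("std");

/// Hypothetical stand-in for the shared compilation state; not the real `Zcu`.
const Zcu = struct {
    /// Hypothetical stand-in for the intern pool: one counter per thread.
    items_per_thread: [2]u32 = .{ 0, 0 },

    /// A handle that bundles the shared state with the id of the thread using it.
    const PerThread = struct {
        zcu: *Zcu,
        tid: usize,

        /// Mutation goes through the handle, so the thread id is always
        /// available without a threadlocal lookup.
        fn intern(pt: PerThread) u32 {
            pt.zcu.items_per_thread[pt.tid] += 1;
            return pt.zcu.items_per_thread[pt.tid];
        }
    };
};

/// Code that only reads can take `*const Zcu`; without a `PerThread`
/// it has no way to call `intern`.
fn countAll(zcu: *const Zcu) u32 {
    var total: u32 = 0;
    for (zcu.items_per_thread) |n| total += n;
    return total;
}
```

In this sketch, a function that accepts only `*const Zcu` cannot mutate the pool, which mirrors the point that reverting a section of code from the per-thread handle back to a plain pointer rules out mutation there.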

The current performance impact is in a range where I expect to be able to recoup the loss with more work: when timing individual commits, the current result is close to the timing from just before the change that ended up having the worst performance impact, which I mitigated by disabling features of the data structure as mentioned above. Even with the current slowdown, there may be a desire to get some or all of this change merged sooner to reduce future conflicts, and to continue the work separately.

Benchmark 1 (28 runs): master/bin/zig build-exe -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=options.zig -fno-emit-bin
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          4.43s  ± 69.4ms    4.28s  … 4.52s           1 ( 4%)        0%
  peak_rss            323MB ± 1.06MB     320MB …  325MB          1 ( 4%)        0%
  cpu_cycles         23.1G  ±  233M     22.8G  … 23.5G           0 ( 0%)        0%
  instructions       45.0G  ± 34.3K     45.0G  … 45.0G           2 ( 7%)        0%
  cache_references   2.13G  ± 18.6M     2.11G  … 2.19G           2 ( 7%)        0%
  cache_misses       86.9M  ± 1.15M     84.5M  … 89.7M           2 ( 7%)        0%
  branch_misses      85.7M  ±  577K     84.6M  … 87.1M           2 ( 7%)        0%
Benchmark 2 (26 runs): tsip/bin/zig build-exe -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=options.zig -fno-emit-bin
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          4.68s  ± 54.9ms    4.50s  … 4.74s           5 (19%)        💩+  5.7% ±  0.8%
  peak_rss            324MB ±  682KB     323MB …  325MB          0 ( 0%)          +  0.4% ±  0.2%
  cpu_cycles         23.3G  ±  168M     23.1G  … 23.6G           0 ( 0%)          +  0.9% ±  0.5%
  instructions       40.1G  ± 32.4K     40.1G  … 40.1G           1 ( 4%)        ⚡- 11.0% ±  0.0%
  cache_references   2.15G  ± 15.4M     2.13G  … 2.20G           2 ( 8%)          +  1.2% ±  0.4%
  cache_misses       79.2M  ± 1.55M     77.0M  … 82.8M           0 ( 0%)        ⚡-  8.9% ±  0.9%
  branch_misses      73.4M  ±  564K     72.6M  … 74.8M           0 ( 0%)        ⚡- 14.3% ±  0.4%

andrewrk merged commit ab4eeb7 into ziglang:master on Jul 8, 2024
andrewrk added the "release notes" label (This PR should be mentioned in the release notes.) on Jul 8, 2024

andrewrk (Member) commented Jul 8, 2024

Data point: building the zig compiler with the x86_64 backend with codegen threading enabled:

--- a/src/InternPool.zig
+++ b/src/InternPool.zig
@@ -93,7 +93,7 @@ files: std.AutoArrayHashMapUnmanaged(Cache.BinDigest, OptionalDeclIndex) = .{},
 /// Whether a multi-threaded intern pool is useful.
 /// Currently `false` until the intern pool is actually accessed
 /// from multiple threads to reduce the cost of this data structure.
-const want_multi_threaded = false;
+const want_multi_threaded = true;
 
 /// Whether a single-threaded intern pool impl is in use.
 pub const single_threaded = builtin.single_threaded or !want_multi_threaded;
diff --git a/src/target.zig b/src/target.zig
index 2accc100b8..e236cea616 100644
--- a/src/target.zig
+++ b/src/target.zig
@@ -572,6 +572,7 @@ pub inline fn backendSupportsFeature(backend: std.builtin.CompilerBackend, compt
             else => false,
         },
         .separate_thread => switch (backend) {
+            .stage2_x86_64 => true,
             else => false,
         },
     };
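
The `single_threaded` constant in the diff above is a compile-time switch. A minimal sketch of how such a flag can select an implementation at compile time, under stated assumptions: the `Lock` and `Counter` types below are hypothetical and are not taken from `InternPool.zig`; only the two constant declarations mirror the diff.

```zig
const std = @import("std");
const builtin = @import("builtin");

// These two declarations mirror the ones shown in the diff.
const want_multi_threaded = false;
pub const single_threaded = builtin.single_threaded or !want_multi_threaded;

/// Hypothetical lock type: a no-op when single-threaded, a real mutex otherwise.
/// Only the branch selected at compile time is analyzed.
const Lock = if (single_threaded) struct {
    fn lock(_: *@This()) void {}
    fn unlock(_: *@This()) void {}
} else std.Thread.Mutex;

/// Hypothetical shared counter guarded by `Lock`.
const Counter = struct {
    mutex: Lock = .{},
    value: u32 = 0,

    fn increment(c: *Counter) u32 {
        c.mutex.lock();
        defer c.mutex.unlock();
        c.value += 1;
        return c.value;
    }
};
```

With the flag set to `false`, the locking calls compile down to empty functions, which is one way a data structure can avoid paying for synchronization until it is actually shared between threads.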

before: 8f20e81
after: ab4eeb7
(before/after this PR merged)

Benchmark 1 (3 runs): /home/andy/dev/zig/build-release/stage3/bin/zig build-exe -fno-llvm -fno-lld --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=/home/andy/dev/zig/src/main.zig -Maro=/home/andy/dev/zig/lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=/home/andy/dev/zig/lib/compiler/aro_translate_c.zig -Mbuild_options=/home/andy/dev/zig/.zig-cache/c/ecf36f9c39c1cbfb3043b07b8688debd/options.zig --cache-dir /home/andy/dev/zig/.zig-cache --global-cache-dir /home/andy/.cache/zig --name zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.8s  ±  189ms    12.7s  … 13.1s           0 ( 0%)        0%
  peak_rss            403MB ±  463KB     403MB …  404MB          0 ( 0%)        0%
  cpu_cycles         64.3G  ±  124M     64.2G  … 64.5G           0 ( 0%)        0%
  instructions        133G  ±  171K      133G  …  133G           0 ( 0%)        0%
  cache_references   4.07G  ± 4.88M     4.06G  … 4.07G           0 ( 0%)        0%
  cache_misses        419M  ± 3.33M      416M  …  423M           0 ( 0%)        0%
  branch_misses       435M  ±  831K      434M  …  436M           0 ( 0%)        0%
Benchmark 2 (3 runs): /home/andy/src/zig/build-release/stage3/bin/zig build-exe -fno-llvm -fno-lld --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=/home/andy/src/zig/src/main.zig -Maro=/home/andy/src/zig/lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=/home/andy/src/zig/lib/compiler/aro_translate_c.zig -Mbuild_options=/home/andy/src/zig/.zig-cache/c/f18315ac6d459d8944cdba2c81e890df/options.zig --cache-dir /home/andy/src/zig/.zig-cache --global-cache-dir /home/andy/.cache/zig --name zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          8.57s  ± 66.9ms    8.52s  … 8.64s           0 ( 0%)        ⚡- 33.3% ±  2.5%
  peak_rss            517MB ± 3.73MB     513MB …  520MB          0 ( 0%)        💩+ 28.1% ±  1.5%
  cpu_cycles         65.3G  ±  131M     65.2G  … 65.4G           0 ( 0%)        💩+  1.5% ±  0.4%
  instructions        138G  ± 1.22M      138G  …  138G           0 ( 0%)        💩+  3.5% ±  0.0%
  cache_references   4.01G  ± 17.6M     4.00G  … 4.03G           0 ( 0%)          -  1.4% ±  0.7%
  cache_misses        220M  ± 4.17M      215M  …  223M           0 ( 0%)        ⚡- 47.5% ±  2.0%
  branch_misses       340M  ± 2.84M      338M  …  343M           0 ( 0%)        ⚡- 21.9% ±  1.1%

jacobly0 deleted the tsip branch July 8, 2024 21:36

Jarred-Sumner (Contributor) commented Jul 9, 2024

wow 12.8s end-to-end before

with -fno-emit-bin, bun currently takes ~11s (it's about 1m 15s end-to-end, most of which is the "Emit LLVM" step)

 time .cache/zig/zig build check --summary new --verbose
info: zig compiler v0.13.0
/root/bun/.cache/zig/zig build-obj -freference-trace=16 -fno-strip -fno-stack-check -fno-stack-protector -fno-omit-frame-pointer -fvalgrind -fPIC -ODebug -target native-native-gnu.2.27 -mcpu native --dep async_io --dep zlib-internal --dep async --dep ZigGeneratedClasses --dep ResolvedSourceTag --dep build_options -Mroot=/root/bun/root.zig -Masync_io=/root/bun/src/io/io_linux.zig -Mzlib-internal=/root/bun/src/deps/zlib.posix.zig -Masync=/root/bun/src/async/posix_event_loop.zig -MZigGeneratedClasses=/root/bun/build/codegen/ZigGeneratedClasses.zig -MResolvedSourceTag=/root/bun/build/codegen/ResolvedSourceTag.zig -Mbuild_options=/root/bun/.zig-cache/c/e16e8fad5a41dcf3d3a150b03d8ff201/options.zig -lc++ -lc -fno-emit-bin -fformatted-panics --eh-frame-hdr --emit-relocs -ffunction-sections --cache-dir /root/bun/.zig-cache --global-cache-dir /root/.cache/zig --name bun-debug -fno-compiler-rt --zig-lib-dir /root/bun/src/deps/zig/lib --listen=-
Build Summary: 3/3 steps succeeded
check success
└─ zig build-obj bun-debug Debug native-native-gnu.2.27 success 11s MaxRSS:563M
   └─ options success

________________________________________________________
Executed in   14.32 secs    fish           external
   usr time   14.19 secs    0.00 micros   14.19 secs
   sys time    0.74 secs  482.00 micros    0.74 secs

any ideas immediately come to mind why bun's compilation time is much slower than the zig compiler despite less code?

andrewrk (Member) commented Jul 9, 2024

Note that my 12.8s data point above is with the x86_64 backend. You can try enabling it, but it's not the default yet due to machine code quality, debug info correctness, and behavior test coverage.

zig/build.zig

Lines 218 to 220 in 854e86c

const use_llvm = b.option(bool, "use-llvm", "Use the llvm backend");
exe.use_llvm = use_llvm;
exe.use_lld = use_llvm;

If you're on new Apple hardware you'll have to wait for the aarch64 backend to get a similar speedup.
