InternPool: begin conversion to thread-safe data structure by jacobly0 · Pull Request #20528 · ziglang/zig

jacobly0 · 2024-07-07T13:11:38Z

This is another step towards first putting codegen on a separate thread from sema and later making both sema and codegen multi-threaded. Currently, the multi-threaded features of the new data structure are disabled by InternPool.want_multi_threaded because they have a large performance impact on the compiler and are not yet in use. It is still undecided how to minimize the impact of

The main conflicting change is replacing *Zcu with Zcu.PerThread in any code that wants to be able to mutate the intern pool. I chose to pass this information around everywhere instead of attempting something with threadlocal variables because in the fast case that would turn the intern pool into a global singleton, and in the slow case it would introduce an extra lookup on every intern pool mutation. Interestingly, if there is a desire to remove intern pool mutation from the backends, the per-thread change can be reverted over time in parts of the code, which guarantees that mutations can no longer happen in those sections.
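To illustrate the idea of passing a per-thread handle explicitly, here is a minimal sketch. It is not the compiler's actual code: the `Zcu` struct below is a hypothetical stand-in, and the field `items_per_thread` and the functions `intern`, `totalItems`, `analyze`, and `emit` are invented for this example. A read-only pointer plays the role of code that has been "reverted" and so can no longer mutate the pool.

```zig
const std = @import("std");

/// Hypothetical stand-in for the compiler's shared state; not the real `Zcu`.
const Zcu = struct {
    /// Stand-in for intern pool storage: one counter per thread.
    items_per_thread: [2]u32 = .{ 0, 0 },

    /// Hypothetical handle pairing the shared state with a thread id.
    pub const PerThread = struct {
        zcu: *Zcu,
        tid: usize,

        /// Mutation goes through the handle, so no global or threadlocal lookup is needed.
        pub fn intern(pt: PerThread) void {
            pt.zcu.items_per_thread[pt.tid] += 1;
        }
    };

    /// Read-only query that needs no per-thread handle.
    pub fn totalItems(zcu: *const Zcu) u32 {
        return zcu.items_per_thread[0] + zcu.items_per_thread[1];
    }
};

/// May mutate the pool: the signature asks for a `PerThread`.
fn analyze(pt: Zcu.PerThread) void {
    pt.intern();
}

/// Cannot call `intern`: it only receives a read-only pointer.
fn emit(zcu: *const Zcu) u32 {
    return zcu.totalItems();
}

pub fn main() void {
    var zcu: Zcu = .{};
    analyze(.{ .zcu = &zcu, .tid = 0 });
    analyze(.{ .zcu = &zcu, .tid = 1 });
    std.debug.print("items interned: {d}\n", .{emit(&zcu)});
}
```

With this shape, whether a function can mutate the pool is visible in its signature, which is the property the paragraph above relies on when it talks about reverting the change in parts of the code.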

The current performance impact is in the range where I expect to be able to recoup the loss with more work. While timing individual commits, I found that the current time is close to the time measured just before the commit that ended up having the worst performance impact, which I mitigated by disabling features of the data structure as mentioned above. Even with this current slowdown, there may be a desire to get some or all of this change merged sooner to reduce future conflicts, and to continue the work separately.

Benchmark 1 (28 runs): master/bin/zig build-exe -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=options.zig -fno-emit-bin
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          4.43s  ± 69.4ms    4.28s  … 4.52s           1 ( 4%)        0%
  peak_rss            323MB ± 1.06MB     320MB …  325MB          1 ( 4%)        0%
  cpu_cycles         23.1G  ±  233M     22.8G  … 23.5G           0 ( 0%)        0%
  instructions       45.0G  ± 34.3K     45.0G  … 45.0G           2 ( 7%)        0%
  cache_references   2.13G  ± 18.6M     2.11G  … 2.19G           2 ( 7%)        0%
  cache_misses       86.9M  ± 1.15M     84.5M  … 89.7M           2 ( 7%)        0%
  branch_misses      85.7M  ±  577K     84.6M  … 87.1M           2 ( 7%)        0%
Benchmark 2 (26 runs): tsip/bin/zig build-exe -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=options.zig -fno-emit-bin
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          4.68s  ± 54.9ms    4.50s  … 4.74s           5 (19%)        💩+  5.7% ±  0.8%
  peak_rss            324MB ±  682KB     323MB …  325MB          0 ( 0%)          +  0.4% ±  0.2%
  cpu_cycles         23.3G  ±  168M     23.1G  … 23.6G           0 ( 0%)          +  0.9% ±  0.5%
  instructions       40.1G  ± 32.4K     40.1G  … 40.1G           1 ( 4%)        ⚡- 11.0% ±  0.0%
  cache_references   2.15G  ± 15.4M     2.13G  … 2.20G           2 ( 8%)          +  1.2% ±  0.4%
  cache_misses       79.2M  ± 1.55M     77.0M  … 82.8M           0 ( 0%)        ⚡-  8.9% ±  0.9%
  branch_misses      73.4M  ±  564K     72.6M  … 74.8M           0 ( 0%)        ⚡- 14.3% ±  0.4%

andrewrk · 2024-07-08T20:52:46Z

Data point: building the zig compiler with the x86_64 backend with codegen threading enabled:

--- a/src/InternPool.zig
+++ b/src/InternPool.zig
@@ -93,7 +93,7 @@ files: std.AutoArrayHashMapUnmanaged(Cache.BinDigest, OptionalDeclIndex) = .{},
 /// Whether a multi-threaded intern pool is useful.
 /// Currently `false` until the intern pool is actually accessed
 /// from multiple threads to reduce the cost of this data structure.
-const want_multi_threaded = false;
+const want_multi_threaded = true;
 
 /// Whether a single-threaded intern pool impl is in use.
 pub const single_threaded = builtin.single_threaded or !want_multi_threaded;
diff --git a/src/target.zig b/src/target.zig
index 2accc100b8..e236cea616 100644
--- a/src/target.zig
+++ b/src/target.zig
@@ -572,6 +572,7 @@ pub inline fn backendSupportsFeature(backend: std.builtin.CompilerBackend, compt
             else => false,
         },
         .separate_thread => switch (backend) {
+            .stage2_x86_64 => true,
             else => false,
         },
     };
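The `want_multi_threaded` constant flipped in the diff above is known at compile time, which is what lets the single-threaded configuration avoid paying for synchronization. The following is a minimal sketch of that general pattern only, not the InternPool implementation: `NoopMutex`, `Mutex`, and `Counter` are hypothetical names introduced for illustration, and only the two flag declarations mirror the diff.

```zig
const std = @import("std");
const builtin = @import("builtin");

// These two declarations mirror the flag shown in the diff; everything else is hypothetical.
const want_multi_threaded = false;
const single_threaded = builtin.single_threaded or !want_multi_threaded;

/// Hypothetical no-op lock with the same method names as `std.Thread.Mutex`.
const NoopMutex = struct {
    fn lock(_: *NoopMutex) void {}
    fn unlock(_: *NoopMutex) void {}
};

/// Chosen at compile time, so the single-threaded build contains no real locking.
const Mutex = if (single_threaded) NoopMutex else std.Thread.Mutex;

/// Hypothetical shared structure guarded by whichever lock type was selected.
const Counter = struct {
    mutex: Mutex = .{},
    value: u32 = 0,

    fn increment(c: *Counter) void {
        c.mutex.lock();
        defer c.mutex.unlock();
        c.value += 1;
    }
};

pub fn main() void {
    var c: Counter = .{};
    c.increment();
    std.debug.print("value = {d}, lock size = {d} bytes\n", .{ c.value, @sizeOf(Mutex) });
}
```

Changing `want_multi_threaded` to `true` in this sketch swaps in `std.Thread.Mutex` without touching `Counter`, which is the kind of one-line switch the diff performs.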

before: 8f20e81
after: ab4eeb7
(before/after this PR merged)

Benchmark 1 (3 runs): /home/andy/dev/zig/build-release/stage3/bin/zig build-exe -fno-llvm -fno-lld --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=/home/andy/dev/zig/src/main.zig -Maro=/home/andy/dev/zig/lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=/home/andy/dev/zig/lib/compiler/aro_translate_c.zig -Mbuild_options=/home/andy/dev/zig/.zig-cache/c/ecf36f9c39c1cbfb3043b07b8688debd/options.zig --cache-dir /home/andy/dev/zig/.zig-cache --global-cache-dir /home/andy/.cache/zig --name zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.8s  ±  189ms    12.7s  … 13.1s           0 ( 0%)        0%
  peak_rss            403MB ±  463KB     403MB …  404MB          0 ( 0%)        0%
  cpu_cycles         64.3G  ±  124M     64.2G  … 64.5G           0 ( 0%)        0%
  instructions        133G  ±  171K      133G  …  133G           0 ( 0%)        0%
  cache_references   4.07G  ± 4.88M     4.06G  … 4.07G           0 ( 0%)        0%
  cache_misses        419M  ± 3.33M      416M  …  423M           0 ( 0%)        0%
  branch_misses       435M  ±  831K      434M  …  436M           0 ( 0%)        0%
Benchmark 2 (3 runs): /home/andy/src/zig/build-release/stage3/bin/zig build-exe -fno-llvm -fno-lld --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=/home/andy/src/zig/src/main.zig -Maro=/home/andy/src/zig/lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=/home/andy/src/zig/lib/compiler/aro_translate_c.zig -Mbuild_options=/home/andy/src/zig/.zig-cache/c/f18315ac6d459d8944cdba2c81e890df/options.zig --cache-dir /home/andy/src/zig/.zig-cache --global-cache-dir /home/andy/.cache/zig --name zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          8.57s  ± 66.9ms    8.52s  … 8.64s           0 ( 0%)        ⚡- 33.3% ±  2.5%
  peak_rss            517MB ± 3.73MB     513MB …  520MB          0 ( 0%)        💩+ 28.1% ±  1.5%
  cpu_cycles         65.3G  ±  131M     65.2G  … 65.4G           0 ( 0%)        💩+  1.5% ±  0.4%
  instructions        138G  ± 1.22M      138G  …  138G           0 ( 0%)        💩+  3.5% ±  0.0%
  cache_references   4.01G  ± 17.6M     4.00G  … 4.03G           0 ( 0%)          -  1.4% ±  0.7%
  cache_misses        220M  ± 4.17M      215M  …  223M           0 ( 0%)        ⚡- 47.5% ±  2.0%
  branch_misses       340M  ± 2.84M      338M  …  343M           0 ( 0%)        ⚡- 21.9% ±  1.1%

Jarred-Sumner · 2024-07-09T04:42:14Z

wow 12.8s end-to-end before

with -fno-emit-bin, bun currently takes ~11s (it's about 1m 15s end-to-end, most of which is the "Emit LLVM" step)

❯ time .cache/zig/zig build check --summary new --verbose
info: zig compiler v0.13.0
/root/bun/.cache/zig/zig build-obj -freference-trace=16 -fno-strip -fno-stack-check -fno-stack-protector -fno-omit-frame-pointer -fvalgrind -fPIC -ODebug -target native-native-gnu.2.27 -mcpu native --dep async_io --dep zlib-internal --dep async --dep ZigGeneratedClasses --dep ResolvedSourceTag --dep build_options -Mroot=/root/bun/root.zig -Masync_io=/root/bun/src/io/io_linux.zig -Mzlib-internal=/root/bun/src/deps/zlib.posix.zig -Masync=/root/bun/src/async/posix_event_loop.zig -MZigGeneratedClasses=/root/bun/build/codegen/ZigGeneratedClasses.zig -MResolvedSourceTag=/root/bun/build/codegen/ResolvedSourceTag.zig -Mbuild_options=/root/bun/.zig-cache/c/e16e8fad5a41dcf3d3a150b03d8ff201/options.zig -lc++ -lc -fno-emit-bin -fformatted-panics --eh-frame-hdr --emit-relocs -ffunction-sections --cache-dir /root/bun/.zig-cache --global-cache-dir /root/.cache/zig --name bun-debug -fno-compiler-rt --zig-lib-dir /root/bun/src/deps/zig/lib --listen=-
Build Summary: 3/3 steps succeeded
check success
└─ zig build-obj bun-debug Debug native-native-gnu.2.27 success 11s MaxRSS:563M
   └─ options success

________________________________________________________
Executed in   14.32 secs    fish           external
   usr time   14.19 secs    0.00 micros   14.19 secs
   sys time    0.74 secs  482.00 micros    0.74 secs

any ideas immediately come to mind why bun's compilation time is much slower than the zig compiler despite less code?

andrewrk · 2024-07-09T07:10:36Z

Note that my 12.8s data point above is with the x86_64 backend. You can try enabling it, but it's not the default yet due to machine code quality, debug info correctness, and behavior test coverage.

zig/build.zig, lines 218 to 220 in 854e86c:

    const use_llvm = b.option(bool, "use-llvm", "Use the llvm backend");
    exe.use_llvm = use_llvm;
    exe.use_lld = use_llvm;

If you're on new Apple hardware you'll have to wait for the aarch64 backend to get a similar speedup.

jacobly0 requested review from Snektron and kprotty as code owners July 7, 2024 13:11

jacobly0 force-pushed the tsip branch from 9da461d to d263a87 Compare July 7, 2024 14:15

jacobly0 added 14 commits July 7, 2024 22:59

Zcu: introduce PerThread and pass to all the functions

525f341

Zcu: pass PerThread to intern pool string functions

ca02266

InternPool: implement thread-safe hash map

cda716e

InternPool: use thread-safe hash map for strings

c8b9364

InternPool: replace garbage with an arena

3e1b190

This was just a badly implemented arena anyway.

InternPool: implement and use thread-safe list for strings

8293ff9

InternPool: implement and use thread-safe list for items

92ddb95

InternPool: temporarily disable multi-threaded behavior

383cffb

This reduces the cost of the new data structure until the multi-threaded behavior is actually used.

InternPool: remove usage of data with simple indices

49b2547

This allows them to be atomically replaced.

InternPool: implement and use thread-safe list for extra and limbs

bdae01a

bootstrap: fix build

166402c

InternPool: start documenting new thread-safe fields

1abc904

InternPool: fix dumping of simple types

1419201

InternPool: fix multi-thread build

c36e2bb

jacobly0 force-pushed the tsip branch from 259ecdc to 6f12072 Compare July 8, 2024 14:07

Compilation: put supported codegen backends on a separate thread

65ced4a

(There are no supported backends.)

jacobly0 force-pushed the tsip branch from 6f12072 to 65ced4a Compare July 8, 2024 15:51

andrewrk merged commit ab4eeb7 into ziglang:master Jul 8, 2024

andrewrk added the release notes This PR should be mentioned in the release notes. label Jul 8, 2024

jacobly0 deleted the tsip branch July 8, 2024 21:36

mlugg mentioned this pull request Jul 15, 2024

Runtime safety (panic interface, slices) #19764

Closed

40 tasks

andrewrk merged 15 commits into ziglang:master from jacobly0:tsip
