cache system file deduplication #19388

Merged
andrewrk merged 6 commits into master from cache-dedup on Mar 22, 2024

andrewrk commented Mar 22, 2024

closes #16149

Contains some bonus enhancements to array hash maps.

Performance

Good news, this appears to make cache hits significantly faster.

Data point: cache hit building hello world with static musl libc

Benchmark 1 (61 runs): master/zig build-exe hello.c -target native-native-musl -lc
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.4ms ± 1.76ms    77.7ms … 87.1ms          1 ( 2%)        0%
  peak_rss           64.6MB ± 77.7KB    64.4MB … 64.7MB          0 ( 0%)        0%
  cpu_cycles         97.2M  ± 1.04M     95.1M  …  101M           1 ( 2%)        0%
  instructions        153M  ± 11.1K      152M  …  153M           0 ( 0%)        0%
  cache_references   2.21M  ± 97.1K     2.05M  … 2.54M           2 ( 3%)        0%
  cache_misses        529K  ± 24.4K      486K  …  600K           4 ( 7%)        0%
  branch_misses       409K  ± 6.45K      397K  …  437K           1 ( 2%)        0%
Benchmark 2 (189 runs): cache-dedup/zig build-exe hello.c -target native-native-musl -lc
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          25.8ms ± 1.26ms    23.9ms … 30.7ms         11 ( 6%)        ⚡- 68.4% ±  0.5%
  peak_rss           65.2MB ± 61.8KB    65.1MB … 65.4MB          2 ( 1%)        💩+  1.0% ±  0.0%
  cpu_cycles         41.2M  ±  608K     40.1M  … 46.3M           4 ( 2%)        ⚡- 57.6% ±  0.2%
  instructions       64.3M  ± 12.6K     64.3M  … 64.4M           2 ( 1%)        ⚡- 57.8% ±  0.0%
  cache_references   1.28M  ± 34.5K     1.21M  … 1.35M           0 ( 0%)        ⚡- 41.9% ±  0.7%
  cache_misses        348K  ± 18.6K      297K  …  396K           0 ( 0%)        ⚡- 34.2% ±  1.1%
  branch_misses       199K  ± 1.34K      197K  …  206K           6 ( 3%)        ⚡- 51.2% ±  0.2%
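
As a rough cross-check of the delta column, the sketch below recomputes the relative change from the rounded means printed above. It assumes the delta is the percentage change of the second benchmark's mean relative to the first; the helper name `relative_delta` is made up for illustration. Because the inputs are already rounded, the results can differ from the printed column in the last digit.

```python
def relative_delta(baseline: float, candidate: float) -> float:
    """Percentage change of `candidate` relative to `baseline` (negative means faster/smaller)."""
    return (candidate - baseline) / baseline * 100.0


# Rounded means copied from the two benchmark tables above.
wall_time_delta = relative_delta(81.4, 25.8)   # milliseconds
cpu_cycles_delta = relative_delta(97.2, 41.2)  # millions of cycles

print(f"wall_time:  {wall_time_delta:.1f}%")
print(f"cpu_cycles: {cpu_cycles_delta:.1f}%")
```

Both values land within about a tenth of a percentage point of the reported -68.4% and -57.6%.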

Commits

- more readable in markdown
- remove confusing stuff
- linkification
- rewording
- move parameter documentation to parameter documentation

The Zig way is to let the compiler provide errors, rather than trying to
implement the compiler in the standard library.

I played around with this and found the compile errors to be easier to
comprehend without this logic.

Use an array hash map for the cache's file list rather than an ArrayList.
Provides deduplication.

Some users are hitting this limit. I think it's primarily due to not
deduplicating (solved in the previous commit) but this seems like a
better limit regardless.
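
The deduplication commit above replaces a plain list of files with an insertion-ordered hash map. The Python sketch below illustrates that idea only; it is not the Zig implementation. `PrefixedPath` mirrors the `prefix` and `sub_path` fields visible in the stack trace further down, while `add_file` and the integer prefix are assumptions made for the example.

```python
from typing import Dict, NamedTuple


class PrefixedPath(NamedTuple):
    """Hypothetical stand-in for the cache's file key: a prefix id plus a sub-path."""
    prefix: int
    sub_path: str


def add_file(files: Dict[PrefixedPath, int], path: PrefixedPath) -> int:
    """Insert `path` if it is absent and return its stable insertion index."""
    # setdefault leaves an existing entry untouched, so a repeated path
    # gets its original index back instead of a second entry.
    return files.setdefault(path, len(files))


files: Dict[PrefixedPath, int] = {}
first = add_file(files, PrefixedPath(0, "include/stdio.h"))
second = add_file(files, PrefixedPath(0, "include/stdlib.h"))
again = add_file(files, PrefixedPath(0, "include/stdio.h"))  # duplicate

print(first, second, again, len(files))
```

With an append-only list, the duplicate would have become a third entry; keying on the path keeps one entry per file while still preserving insertion order.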
andrewrk added the "release notes" label (This PR should be mentioned in the release notes.) Mar 22, 2024
andrewrk merged commit a2651cb into master Mar 22, 2024
andrewrk deleted the cache-dedup branch March 22, 2024 08:13
squeek502 commented Mar 23, 2024

Hitting a segfault on Windows after this. I'll investigate further and file an issue (EDIT: issue filed: #19408), but here's the reproduction:

> zig init

> zig build

Then modify anything in build.zig (delete a comment, etc.) and run:

> zig build
thread 19376 panic: Segmentation fault at address 0xffffffffffffffff
C:\Users\Ryan\Programming\Zig\zig\lib\std\Build\Cache.zig:65:46: 0xeeb63c in hash (zig.exe.obj)
        return @truncate(std.hash.Wyhash.hash(pp.prefix, pp.sub_path));
                                             ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\Build\Cache.zig:323:43: 0xa0117b in hash (zig.exe.obj)
            return file.prefixed_path.hash();
                                          ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\array_hash_map.zig:1849:28: 0x6f66ee in checkedHash__anon_79003 (zig.exe.obj)
            return ctx.hash(key);
                           ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\array_hash_map.zig:1634:87: 0x1391d4b in getSlotByIndex__anon_115743 (zig.exe.obj)
            const h = if (store_hash) slice.items(.hash)[entry_index] else checkedHash(ctx, slice.items(.key)[entry_index]);
                                                                                      ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\array_hash_map.zig:1599:45: 0xeebc8f in removeFromIndexByIndexGeneric__anon_105092 (zig.exe.obj)
            const slot = self.getSlotByIndex(entry_index, ctx, header, I, indexes);
                                            ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\array_hash_map.zig:1593:58: 0xa017c1 in removeFromIndexByIndex (zig.exe.obj)
                .u8 => self.removeFromIndexByIndexGeneric(entry_index, ctx, header, u8, header.indexes(u8)),
                                                         ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\array_hash_map.zig:1379:48: 0x6f8707 in shrinkRetainingCapacityContext (zig.exe.obj)
                    self.removeFromIndexByIndex(i, if (store_hash) {} else ctx, header);
                                               ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\array_hash_map.zig:1367:55: 0x428a2f in shrinkRetainingCapacity (zig.exe.obj)
            return self.shrinkRetainingCapacityContext(new_len, undefined);
                                                      ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\Build\Cache.zig:639:43: 0x2511c2 in unhit (zig.exe.obj)
        self.files.shrinkRetainingCapacity(input_file_count);
                                          ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\Build\Cache.zig:605:27: 0x24f935 in hit (zig.exe.obj)
                self.unhit(bin_digest, input_file_count);
                          ^
C:\Users\Ryan\Programming\Zig\zig\src\Compilation.zig:2003:35: 0x2b09ec in update (zig.exe.obj)
            const is_hit = man.hit() catch |err| {
                                  ^
C:\Users\Ryan\Programming\Zig\zig\src\main.zig:4507:24: 0x2e3212 in updateModule (zig.exe.obj)
        try comp.update(main_progress_node);
                       ^
C:\Users\Ryan\Programming\Zig\zig\src\main.zig:5250:25: 0x36b428 in cmdBuild (zig.exe.obj)
            updateModule(comp, color) catch |err| switch (err) {
                        ^
C:\Users\Ryan\Programming\Zig\zig\src\main.zig:276:24: 0x18433b in mainArgs (zig.exe.obj)
        return cmdBuild(gpa, arena, cmd_args);
                       ^
C:\Users\Ryan\Programming\Zig\zig\src\main.zig:206:20: 0x18151e in main (zig.exe.obj)
    return mainArgs(gpa, arena, args);
                   ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\start.zig:484:5: 0x18124a in main (zig.exe.obj)
    return callMainWithArgs(@as(usize, @intCast(c_argc)), @as([*][*:0]u8, @ptrCast(c_argv)), envp);
    ^
C:\Users\Ryan\Programming\Zig\zig\lib\libc\mingw\crt\crtexe.c:267:0: 0x294b7c4 in __tmainCRTStartup (crt2.obj)
    mainret = _tmain (argc, argv, envp);

C:\Users\Ryan\Programming\Zig\zig\lib\libc\mingw\crt\crtexe.c:188:0: 0x294b81b in mainCRTStartup (crt2.obj)
  ret = __tmainCRTStartup ();

???:?:?: 0x7ffe569f7343 in ??? (KERNEL32.DLL)
???:?:?: 0x7ffe571426b0 in ??? (ntdll.dll)

%errorlevel% -2147483645

Development

Successfully merging this pull request may close these issues.

error: StreamTooLong when recompiling; duplicate source files in cache manifest
