Description
Zig Version
0.10.0-dev.3685+dae7aeb33
Steps to Reproduce
Put the following code in test.zig:
const std = @import("std");
const Test = struct {
    bigArray: [1000000]u32 = undefined,
    pub fn passByValueTest(self: Test) u32 {
        return self.bigArray[0];
    }
};
pub fn main() void {
    var v = Test{};
    var t1 = std.time.nanoTimestamp();
    _ = v.passByValueTest();
    var t2 = std.time.nanoTimestamp();
    std.log.info("{}", .{t2 -% t1});
}
Run it with zig run test.zig -Drelease-fast=true or zig run -fstage1 test.zig -Drelease-fast=true
Edit: -Drelease-fast=true doesn't do anything for zig run. The performance issue only happens in debug.
Expected Behavior
The documentation says:
"Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster."
I expect both stage1 and stage2 to pass the struct by reference, because at this size (4 MB) a copy is clearly slower.
Actual Behavior
stage2 is significantly slower:
$ zig run -fstage1 test.zig -Drelease-fast=true
info: 300
$ zig run test.zig -Drelease-fast=true
info: 2245980
Godbolt further reveals that stage2 performs a memcpy, while stage1 does not.
So the performance regression is definitely caused by a pass by value.
This might be related to #12518, but I do not have the time to check that (it's a lot of code and there is no minimal reproducible example).
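For comparison, here is a minimal sketch of one possible workaround, not a fix for the compiler behavior itself: taking the receiver as an explicit const pointer, so the struct cannot be copied at the call site. The name passByPointerTest and the global v are hypothetical and only for illustration. The sketch assumes the 0.10-era standard library used above (std.time.nanoTimestamp, std.log.info).

```zig
const std = @import("std");

const Test = struct {
    bigArray: [1000000]u32 = undefined,

    // Form from the report: the compiler chooses how `self` is passed.
    pub fn passByValueTest(self: Test) u32 {
        return self.bigArray[0];
    }

    // Hypothetical workaround: an explicit const pointer receiver,
    // so only an address is passed regardless of compiler heuristics.
    pub fn passByPointerTest(self: *const Test) u32 {
        return self.bigArray[0];
    }
};

// Kept at file scope (an assumption of this sketch) so the 4 MB struct
// does not live on main's stack.
var v = Test{};

pub fn main() void {
    v.bigArray[0] = 42; // give the read a defined value
    const t1 = std.time.nanoTimestamp();
    const r = v.passByPointerTest(); // method syntax takes &v implicitly
    const t2 = std.time.nanoTimestamp();
    std.log.info("result={} elapsed_ns={}", .{ r, t2 -% t1 });
}
```

If the pointer variant is fast in a debug build while passByValueTest stays slow, that would be consistent with the memcpy seen on godbolt being the cause.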
Bonus
The slowdown does not occur when the parameter is just a big array rather than a struct:
const std = @import("std");
pub fn passByValueTest(bigArray: [1000000]u32) u32 {
    return bigArray[0];
}
pub fn main() void {
    var v: [1000000]u32 = undefined;
    var t1 = std.time.nanoTimestamp();
    _ = passByValueTest(v);
    var t2 = std.time.nanoTimestamp();
    std.log.info("{}", .{t2 -% t1});
}
Output:
$ zig run -fstage1 test.zig -Drelease-fast=true
info: 340
$ zig run test.zig -Drelease-fast=true
info: 300