Skip stack_start_aligned for immediate-abort #153936
rust-bors[bot] merged 1 commit into rust-lang:main
rustbot has assigned @Mark-Simulacrum.
    // Avoid stack_start_aligned, which makes slow syscalls to read /proc/self/maps
    if cfg!(panic = "immediate-abort") {
        return None;
    }
This should also be done for other targets where the OS already provides guard pages for the stack, right?
Added to install_main_guard_bsds
The FreeBSD code can also be skipped.
Added to install_main_guard_freebsd
It would have been nice if we could lazily get the stack bounds of the main thread inside the SIGSEGV handler, but it looks like
Force-pushed from 974aa59 to 5d49b34.
This improves startup performance by 16%, shown by an optimized hello-world program. glibc's `pthread_getattr_np` performs expensive syscalls when reading `/proc/self/maps`. That is all wasted with `panic = immediate-abort` active because `init()` immediately discards the return value from `install_main_guard()`. A similar improvement can be seen in environments that don't have `/proc`. This change is safe because the immediately succeeding comment says that we rely on Linux's "own stack-guard mechanism".
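The early return described above can be sketched in isolation. This is a minimal model, not the standard library's code: the `immediate_abort` parameter stands in for `cfg!(panic = "immediate-abort")`, the `calls` counter and the fixed address `0x7000_0000` are hypothetical, and the stand-in `stack_start_aligned` only counts how often it is called instead of querying the OS.

```rust
use std::cell::Cell;
use std::ops::Range;

/// Hypothetical stand-in for `stack_start_aligned`. The real function asks
/// the OS for the main thread's stack bounds; this one only records that it
/// was called and returns a made-up, page-aligned address.
fn stack_start_aligned(page_size: usize, calls: &Cell<u32>) -> Option<usize> {
    calls.set(calls.get() + 1);
    Some(0x7000_0000usize / page_size * page_size)
}

/// Sketch of `install_main_guard`. `immediate_abort` models
/// `cfg!(panic = "immediate-abort")` as a plain parameter (an assumption
/// made so the sketch builds on any toolchain).
fn install_main_guard(
    immediate_abort: bool,
    page_size: usize,
    calls: &Cell<u32>,
) -> Option<Range<usize>> {
    // The caller discards the result in this mode, so skip the lookup.
    if immediate_abort {
        return None;
    }
    let start = stack_start_aligned(page_size, calls)?;
    // Report the page just below the stack start as the guard range.
    Some(start - page_size..start)
}
```

Calling `install_main_guard(true, 4096, &calls)` returns `None` without touching the counter, while passing `false` performs exactly one lookup. That is the whole point of the change: the expensive call is never made when its result would be thrown away.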
Force-pushed from 5d49b34 to 577dba9.
    if cfg!(panic = "immediate-abort") {
        return None;
    }
    // FreeBSD's stack autogrows, and optionally includes a guard page
@asomers do you have context on what this "optionally" here is referring to? Not really related to this PR but since we intend to guarantee stack guards, I'm wondering if we should be branching on something and defining our own if FreeBSD didn't?
…dPthreadGetattrNp, r=Mark-Simulacrum

Skip stack_start_aligned for immediate-abort

This improves startup performance by 16%, shown by an optimized hello-world program. glibc's `pthread_getattr_np` performs expensive syscalls when reading `/proc/self/maps`. That is all wasted with `panic = immediate-abort` active because `init()` immediately discards the return value from `install_main_guard()`. A similar improvement can be seen in environments that don't have `/proc`. This change is safe because the immediately succeeding comment says that we rely on Linux's "own stack-guard mechanism".

Tracking issue: rust-lang#147286

# Benchmark

Set it up with `cargo new hello-world2`, and replace these files:

```toml
# Cargo.toml
cargo-features = ["panic-immediate-abort"]

[package]
name = "hello-world"
version = "0.1.0"
edition = "2024"

[profile.release]
lto = true
panic = "immediate-abort"
codegen-units = 1
opt-level = "z"
strip = true

# .cargo/config.toml
[unstable]
build-std = ["std"]
```

## Before

```console
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
  Time (mean ± σ):     524.8 µs ±  65.1 µs    [User: 276.1 µs, System: 187.0 µs]
  Range (min … max):   446.4 µs … 975.5 µs    3996 runs

home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
  Time (mean ± σ):     519.4 µs ±  65.8 µs    [User: 282.1 µs, System: 177.7 µs]
  Range (min … max):   443.2 µs … 830.5 µs    3612 runs

home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
  Time (mean ± σ):     520.0 µs ±  64.3 µs    [User: 277.1 µs, System: 182.1 µs]
  Range (min … max):   447.1 µs … 1001.3 µs    3804 runs
```

For a visualization of the problem, run `cargo +stage1 build --release && perf record --call-graph dwarf -F max ./target/release/hello-world2 && perf script | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg`:

<img width="3832" height="1216" alt="flamegraph with 17.41% __pthread_getattr_np" src="https://github.com/user-attachments/assets/acc2286e-1582-4772-9e3b-68b5c35e3e70" />

## After

```console
home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
  Time (mean ± σ):     444.7 µs ±  57.3 µs    [User: 257.4 µs, System: 130.2 µs]
  Range (min … max):   379.4 µs … 1289.3 µs    3893 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
  Time (mean ± σ):     452.3 µs ±  60.7 µs    [User: 261.5 µs, System: 133.5 µs]
  Range (min … max):   374.9 µs … 1512.4 µs    4177 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

home@daniel-desktop3:~/CLionProjects/hello-world2$ hyperfine -N target/release/hello-world2
Benchmark 1: target/release/hello-world2
  Time (mean ± σ):     441.2 µs ±  56.1 µs    [User: 256.2 µs, System: 128.8 µs]
  Range (min … max):   375.0 µs … 760.4 µs    4032 runs
```
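To relate the hyperfine means above to the quoted percentage, the arithmetic can be written out. This is only a sketch: the constants are the six mean wall times copied from the runs above, the helper names are made up for illustration, and which runs the author actually compared to arrive at 16% is not stated.

```rust
/// Mean wall times in µs from the three "Before" hyperfine runs.
const BEFORE_US: [f64; 3] = [524.8, 519.4, 520.0];
/// Mean wall times in µs from the three "After" hyperfine runs.
const AFTER_US: [f64; 3] = [444.7, 452.3, 441.2];

/// Arithmetic mean of a set of samples.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Percentage by which the wall time shrinks, relative to `before`.
fn time_reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Percentage by which the run rate grows, relative to `before`.
fn speedup_percent(before: f64, after: f64) -> f64 {
    (before / after - 1.0) * 100.0
}
```

Averaging each group gives about 521.4 µs before and about 446.1 µs after. Expressed as a reduction in wall time that is roughly 14%, and expressed as a speedup it is roughly 17%, so the headline figure depends on which of the two definitions is used.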
…uwer Rollup of 22 pull requests

Successful merges:

- #122668 (Add APIs for dealing with titlecase)
- #152543 (privacy: Fix type privacy holes in RPITITs)
- #153107 (Optimize BTreeMap::append() using CursorMut)
- #153312 (Packages as namespaces part 1)
- #153534 (Remove a flaky `got_timeout` assert from two channel tests)
- #153718 (Fix environ on FreeBSD with cdylib targets that use -Wl,--no-undefined .)
- #153857 (Rename `target.abi` to `target.cfg_abi` and enum-ify llvm_abiname)
- #153880 (Lifted intersperse and intersperse_with Fused transformation and updated documentation + tests)
- #153931 (remove usages of to-be-deprecated numeric constants)
- #150630 (Unknown -> Unsupported compression algorithm)
- #153491 (Move `freeze_*` methods to `OpenOptionsExt2`)
- #153582 (Simplify find_attr! for HirId usage)
- #153623 (std: move `sys::pal::os` to `sys::paths`)
- #153647 (docs(fs): Clarify That File::lock Coordinates Across Processes)
- #153936 (Skip stack_start_aligned for immediate-abort)
- #154011 (implement `BinaryHeap::as_mut_slice`)
- #154167 (ui/lto: move and rename two tests from issues/)
- #154174 (allow `incomplete_features` in most UI tests)
- #154175 (Add new alias for Guillaume Gomez email address)
- #154182 (diagnostics: avoid ICE for undeclared generic parameter in impl)
- #154188 (Update the tracking issue for #[diagnostic::on_move])
- #154201 (Use enums to clarify `DepNodeColorMap` color marking)
…uwer Rollup of 21 pull requests

Successful merges:

- #152543 (privacy: Fix type privacy holes in RPITITs)
- #153107 (Optimize BTreeMap::append() using CursorMut)
- #153312 (Packages as namespaces part 1)
- #153534 (Remove a flaky `got_timeout` assert from two channel tests)
- #153718 (Fix environ on FreeBSD with cdylib targets that use -Wl,--no-undefined .)
- #153857 (Rename `target.abi` to `target.cfg_abi` and enum-ify llvm_abiname)
- #153880 (Lifted intersperse and intersperse_with Fused transformation and updated documentation + tests)
- #153931 (remove usages of to-be-deprecated numeric constants)
- #150630 (Unknown -> Unsupported compression algorithm)
- #153491 (Move `freeze_*` methods to `OpenOptionsExt2`)
- #153582 (Simplify find_attr! for HirId usage)
- #153623 (std: move `sys::pal::os` to `sys::paths`)
- #153647 (docs(fs): Clarify That File::lock Coordinates Across Processes)
- #153936 (Skip stack_start_aligned for immediate-abort)
- #154011 (implement `BinaryHeap::as_mut_slice`)
- #154167 (ui/lto: move and rename two tests from issues/)
- #154174 (allow `incomplete_features` in most UI tests)
- #154175 (Add new alias for Guillaume Gomez email address)
- #154182 (diagnostics: avoid ICE for undeclared generic parameter in impl)
- #154188 (Update the tracking issue for #[diagnostic::on_move])
- #154201 (Use enums to clarify `DepNodeColorMap` color marking)