Starlark: reuse positional array in native calls where possible by stepancheg · Pull Request #12782 · bazelbuild/bazel

stepancheg · 2021-01-06T23:12:19Z

For certain calls like type(x) or str(x) or l.append(x) we
can reuse positional array argument of fastcall to pass it to
the MethodDescriptor.call and skip all runtime checks in
BuiltinFunction.

This optimization cannot be applied if

a function has extra positional or named arguments
has arguments guarded by flag
accepts extra StarlarkThread parameter
has unfilled arguments with default values

So this optimation won't be used for calls list(x) or bool().

For type(1) called in loop, the result is 15% speedup:

A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)

For bool() (no argument) called in loop, it results in 1% slowdown:

A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`. This optimization cannot be applied if * a function has extra positional or named arguments * has arguments guarded by flag * accepts extra `StarlarkThread` parameter * has unfilled arguments with default values So this optimation won't be used for calls `list(x)` or `bool()`. For `type(1)` called in loop, the result is 15% speedup: ``` A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827 B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946 B/A: 0.852 0.820..0.884 (95% conf) ``` For `bool()` (no argument) called in loop, it results in 1% slowdown: ``` A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400 B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448 B/A: 1.010 0.976..1.045 (95% conf) ```

Summary: Avoid array allocation/complex argument handling for simple builtin functions like `type` or `str`. The check is cheap, and the win is visible. Upstream PR: bazelbuild/bazel#12782 Reviewed By: mzlee fbshipit-source-id: 4f6bcaab88a51c33828da707c80634c8083b5b1e

tetromino

Thank you for the patch! It's clever and improves efficiency in the common case.

I do want to push another commit here to improve documentation and naming a bit (the important idea to emphasize is "use parameters as the java method invocation arg vector" rather than "without checks"), but the actual logic is sound.

tetromino · 2021-12-21T20:19:56Z

+  }
+
+  private static boolean paramCanBeUsedAsPositionalWithoutChecks(ParamDescriptor param) {
+    return param.isPositional() && param.disabledByFlag() == null && param.getAllowedClasses().contains(Object.class);


Actually, this check seems to be too conservative: we can still apply the optimization if non-positional parameters exist, as long as they are not used during the call.

…ment vector' for clarity. And reformat following Google style. No change in actual logic.

keertk · 2022-12-07T14:10:29Z

Hi there! Thank you for contributing to the Bazel repository. We appreciate your time and effort. We're doing a clean up of old PRs and will be closing this one since it seems to have stalled. Please feel free to reopen if you’re still interested in pursuing this or if you'd like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so.

This is a resubmission of bazelbuild#12782, with this description: ==== For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`. This optimization cannot be applied if * a function has extra positional or named arguments * has arguments guarded by flag * accepts extra `StarlarkThread` parameter * has unfilled arguments with default values So this optimation won't be used for calls `list(x)` or `bool()`. For `type(1)` called in loop, the result is 15% speedup: ``` A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827 B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946 B/A: 0.852 0.820..0.884 (95% conf) ``` For `bool()` (no argument) called in loop, it results in 1% slowdown: ``` A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400 B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448 B/A: 1.010 0.976..1.045 (95% conf) ``` ==== Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes. Benchmark results with JDK 21 and `--seconds 5`: ``` before: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 159ns 158ns 5 143B bench_call_dict_get_none 33554431 180ns 179ns 6 143B bench_call_list_append 33554431 170ns 169ns 5 207B bench_call_type 67108863 139ns 138ns 4 111B after: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 155ns 154ns 5 143B bench_call_dict_get_none 33554431 174ns 174ns 6 143B bench_call_list_append 67108863 141ns 141ns 5 183B bench_call_type 67108863 115ns 114ns 4 87B ``` Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>

fmeum · 2023-10-03T06:22:09Z

@stepancheg I resubmitted this PR as #19708 with a few minor improvements on top, preserving your authorship, as it still provides a very relevant performance benefit. Let me know if you would rather revive this old PR.

This is a resubmission of bazelbuild#12782, with this description: ==== For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`. This optimization cannot be applied if * a function has extra positional or named arguments * has arguments guarded by flag * accepts extra `StarlarkThread` parameter * has unfilled arguments with default values So this optimation won't be used for calls `list(x)` or `bool()`. For `type(1)` called in loop, the result is 15% speedup: ``` A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827 B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946 B/A: 0.852 0.820..0.884 (95% conf) ``` For `bool()` (no argument) called in loop, it results in 1% slowdown: ``` A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400 B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448 B/A: 1.010 0.976..1.045 (95% conf) ``` ==== Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes. Benchmark results with JDK 21 and `--seconds 5`: ``` before: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 159ns 158ns 5 143B bench_call_dict_get_none 33554431 180ns 179ns 6 143B bench_call_list_append 33554431 170ns 169ns 5 207B bench_call_type 67108863 139ns 138ns 4 111B after: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 155ns 154ns 5 143B bench_call_dict_get_none 33554431 174ns 174ns 6 143B bench_call_list_append 67108863 141ns 141ns 5 183B bench_call_type 67108863 115ns 114ns 4 87B ``` Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>

@stepancheg

This is a resubmission of @stepancheg's #12782, which became stale, with this original description: ==== For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`. This optimization cannot be applied if * a function has extra positional or named arguments * has arguments guarded by flag * accepts extra `StarlarkThread` parameter * has unfilled arguments with default values So this optimation won't be used for calls `list(x)` or `bool()`. For `type(1)` called in loop, the result is 15% speedup: ``` A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827 B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946 B/A: 0.852 0.820..0.884 (95% conf) ``` For `bool()` (no argument) called in loop, it results in 1% slowdown: ``` A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400 B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448 B/A: 1.010 0.976..1.045 (95% conf) ``` ==== Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes. Benchmark results with JDK 21 and `--seconds 5`: ``` before: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 159ns 158ns 5 143B bench_call_dict_get_none 33554431 180ns 179ns 6 143B bench_call_list_append 33554431 170ns 169ns 5 207B bench_call_type 67108863 139ns 138ns 4 111B after: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 155ns 154ns 5 143B bench_call_dict_get_none 33554431 174ns 174ns 6 143B bench_call_list_append 67108863 141ns 141ns 5 183B bench_call_type 67108863 115ns 114ns 4 87B ``` Closes #19708. PiperOrigin-RevId: 602821608 Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec

@stepancheg

This is a resubmission of @stepancheg's bazelbuild#12782, which became stale, with this original description: ==== For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`. This optimization cannot be applied if * a function has extra positional or named arguments * has arguments guarded by flag * accepts extra `StarlarkThread` parameter * has unfilled arguments with default values So this optimation won't be used for calls `list(x)` or `bool()`. For `type(1)` called in loop, the result is 15% speedup: ``` A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827 B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946 B/A: 0.852 0.820..0.884 (95% conf) ``` For `bool()` (no argument) called in loop, it results in 1% slowdown: ``` A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400 B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448 B/A: 1.010 0.976..1.045 (95% conf) ``` ==== Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes. Benchmark results with JDK 21 and `--seconds 5`: ``` before: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 159ns 158ns 5 143B bench_call_dict_get_none 33554431 180ns 179ns 6 143B bench_call_list_append 33554431 170ns 169ns 5 207B bench_call_type 67108863 139ns 138ns 4 111B after: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 155ns 154ns 5 143B bench_call_dict_get_none 33554431 174ns 174ns 6 143B bench_call_list_append 67108863 141ns 141ns 5 183B bench_call_type 67108863 115ns 114ns 4 87B ``` Closes bazelbuild#19708. PiperOrigin-RevId: 602821608 Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec

@stepancheg

…le (#21144) This is a resubmission of @stepancheg's #12782, which became stale, with this original description: ==== For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`. This optimization cannot be applied if * a function has extra positional or named arguments * has arguments guarded by flag * accepts extra `StarlarkThread` parameter * has unfilled arguments with default values So this optimation won't be used for calls `list(x)` or `bool()`. For `type(1)` called in loop, the result is 15% speedup: ``` A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827 B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946 B/A: 0.852 0.820..0.884 (95% conf) ``` For `bool()` (no argument) called in loop, it results in 1% slowdown: ``` A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400 B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448 B/A: 1.010 0.976..1.045 (95% conf) ``` ==== Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes. Benchmark results with JDK 21 and `--seconds 5`: ``` before: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 159ns 158ns 5 143B bench_call_dict_get_none 33554431 180ns 179ns 6 143B bench_call_list_append 33554431 170ns 169ns 5 207B bench_call_type 67108863 139ns 138ns 4 111B after: File src/test/java/net/starlark/java/eval/testdata/bench_call.star: benchmark ops cpu/op wall/op steps/op alloc/op bench_call_bool 67108863 100ns 100ns 3 87B bench_call_dict_get 33554431 155ns 154ns 5 143B bench_call_dict_get_none 33554431 174ns 174ns 6 143B bench_call_list_append 67108863 141ns 141ns 5 183B bench_call_type 67108863 115ns 114ns 4 87B ``` Closes #19708. Commit aded06b PiperOrigin-RevId: 602821608 Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>

stepancheg requested review from alandonovan and brandjon as code owners January 6, 2021 23:12

google-cla Bot added the cla: yes label Jan 6, 2021

jin added the team-Starlark label Jan 15, 2021

jin assigned alandonovan Jan 15, 2021

brandjon assigned brandjon and unassigned alandonovan Feb 20, 2021

brandjon added team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel and removed team-Starlark labels Feb 20, 2021

stepancheg requested a review from tetromino as a code owner December 2, 2021 17:58

philwo force-pushed the master branch from 26cb401 to 168b89b Compare December 2, 2021 18:07

tetromino approved these changes Dec 21, 2021

View reviewed changes

tetromino requested changes Dec 21, 2021

View reviewed changes

Rephrase 'reuse positional without checks' as 'positional can be argu…

5882d70

…ment vector' for clarity. And reformat following Google style. No change in actual logic.

sgowroji added the awaiting-user-response Awaiting a response from the author label Oct 19, 2022

keertk closed this Dec 7, 2022

fmeum mentioned this pull request Oct 3, 2023

Starlark: reuse positional array in native calls where possible #19708

Closed

bazel-io mentioned this pull request Jan 30, 2024

[7.1.0] Starlark: reuse positional array in native calls where possible #21144

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Starlark: reuse positional array in native calls where possible#12782

Starlark: reuse positional array in native calls where possible#12782
stepancheg wants to merge 2 commits intobazelbuild:masterfrom
stepancheg:reuse

stepancheg commented Jan 6, 2021

Uh oh!

tetromino left a comment

Uh oh!

tetromino Dec 21, 2021

Uh oh!

keertk commented Dec 7, 2022

Uh oh!

fmeum commented Oct 3, 2023 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Conversation

stepancheg commented Jan 6, 2021

Uh oh!

tetromino left a comment

Choose a reason for hiding this comment

Uh oh!

tetromino Dec 21, 2021

Choose a reason for hiding this comment

Uh oh!

keertk commented Dec 7, 2022

Uh oh!

fmeum commented Oct 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

fmeum commented Oct 3, 2023 •

edited

Loading