Skip to content

Starlark: reuse positional array in native calls where possible#12782

Closed
stepancheg wants to merge 2 commits intobazelbuild:masterfrom
stepancheg:reuse
Closed

Starlark: reuse positional array in native calls where possible#12782
stepancheg wants to merge 2 commits intobazelbuild:masterfrom
stepancheg:reuse

Conversation

@stepancheg
Copy link
Copy Markdown
Contributor

For certain calls like type(x) or str(x) or l.append(x) we
can reuse positional array argument of fastcall to pass it to
the MethodDescriptor.call and skip all runtime checks in
BuiltinFunction.

This optimization cannot be applied if

  • a function has extra positional or named arguments
  • has arguments guarded by flag
  • accepts extra StarlarkThread parameter
  • has unfilled arguments with default values

So this optimation won't be used for calls list(x) or bool().

For type(1) called in loop, the result is 15% speedup:

A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)

For bool() (no argument) called in loop, it results in 1% slowdown:

A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we
can reuse `positional` array argument of `fastcall` to pass it to
the `MethodDescriptor.call` and skip all runtime checks in
`BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```
@google-cla google-cla Bot added the cla: yes label Jan 6, 2021
@brandjon brandjon assigned brandjon and unassigned alandonovan Feb 20, 2021
@brandjon brandjon added team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel and removed team-Starlark labels Feb 20, 2021
facebook-github-bot pushed a commit to facebook/buck that referenced this pull request Mar 10, 2021
Summary:
Avoid array allocation/complex argument handling for simple builtin functions like `type` or `str`.

The check is cheap, and the win is visible.

Upstream PR: bazelbuild/bazel#12782

Reviewed By: mzlee

fbshipit-source-id: 4f6bcaab88a51c33828da707c80634c8083b5b1e
@stepancheg stepancheg requested a review from tetromino as a code owner December 2, 2021 17:58
Copy link
Copy Markdown
Contributor

@tetromino tetromino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patch! It's clever and improves efficiency in the common case.

I do want to push another commit here to improve documentation and naming a bit (the important idea to emphasize is "use parameters as the java method invocation arg vector" rather than "without checks"), but the actual logic is sound.

}

private static boolean paramCanBeUsedAsPositionalWithoutChecks(ParamDescriptor param) {
return param.isPositional() && param.disabledByFlag() == null && param.getAllowedClasses().contains(Object.class);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this check seems to be too conservative: we can still apply the optimization if non-positional parameters exist, as long as they are not used during the call.

…ment vector' for clarity.

And reformat following Google style. No change in actual logic.
@sgowroji sgowroji added the awaiting-user-response Awaiting a response from the author label Oct 19, 2022
@keertk
Copy link
Copy Markdown
Member

keertk commented Dec 7, 2022

Hi there! Thank you for contributing to the Bazel repository. We appreciate your time and effort. We're doing a clean up of old PRs and will be closing this one since it seems to have stalled. Please feel free to reopen if you’re still interested in pursuing this or if you'd like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so.

@keertk keertk closed this Dec 7, 2022
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 3, 2023
This is a resubmission of bazelbuild#12782, with this description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we
can reuse `positional` array argument of `fastcall` to pass it to
the `MethodDescriptor.call` and skip all runtime checks in
`BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing
a review comment and small adaptions to the implementation to make it
work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
@fmeum
Copy link
Copy Markdown
Collaborator

fmeum commented Oct 3, 2023

@stepancheg I resubmitted this PR as #19708 with a few minor improvements on top, preserving your authorship, as it still provides a very relevant performance benefit. Let me know if you would rather revive this old PR.

fmeum added a commit to fmeum/bazel that referenced this pull request Jan 24, 2024
This is a resubmission of bazelbuild#12782, with this description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we
can reuse `positional` array argument of `fastcall` to pass it to
the `MethodDescriptor.call` and skip all runtime checks in
`BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing
a review comment and small adaptions to the implementation to make it
work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
copybara-service Bot pushed a commit that referenced this pull request Jan 30, 2024
This is a resubmission of @stepancheg's #12782, which became stale, with this original description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Closes #19708.

PiperOrigin-RevId: 602821608
Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Jan 30, 2024
This is a resubmission of @stepancheg's bazelbuild#12782, which became stale, with this original description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Closes bazelbuild#19708.

PiperOrigin-RevId: 602821608
Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec
github-merge-queue Bot pushed a commit that referenced this pull request Jan 30, 2024
…le (#21144)

This is a resubmission of @stepancheg's #12782, which became stale, with
this original description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can
reuse `positional` array argument of `fastcall` to pass it to the
`MethodDescriptor.call` and skip all runtime checks in
`BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing
a review comment and small adaptions to the implementation to make it
work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Closes #19708.

Commit
aded06b

PiperOrigin-RevId: 602821608
Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting-user-response Awaiting a response from the author cla: yes team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants