@lu-pinto
Member

This is a cherry-pick from the performance branch that we used to test the changes in the latest interop.

original PR: #8814


@lu-pinto lu-pinto force-pushed the operand-stack-growth-strategy branch from 86da553 to da60750 Compare June 29, 2025 08:01
}

private static int newLength(final int oldCapacity, final int minGrowth, final int prefGrowth) {
return oldCapacity + Math.max(minGrowth, prefGrowth);
Contributor

Is there a danger of int overflow?

Member Author

@lu-pinto lu-pinto Jun 30, 2025

Good catch @macfarla!
There was also an existing problem when maxSize is set to Integer.MAX_VALUE, which is not a realistic value since the JVM cannot allocate that many entries. You get something like this:

Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.base/java.lang.reflect.Array.newArray(Native Method)
	at java.base/java.lang.reflect.Array.newInstance(Array.java:78)
	at FlexStack.expandEntries(FlexStack.java:124)
	at FlexStack.push(FlexStack.java:143)
	at Test.main(Test.java:6)

Member Author

Fixed now. I couldn't write tests because that would require increasing the heap size to 16G for such a case, so I checked it separately offline.
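For readers following the overflow discussion, here is a minimal sketch of one way to make the growth computation overflow-safe. It is not the code merged in this PR: the extra `maxSize` parameter, the class name `GrowthSketch`, and the choice of exception are assumptions made for illustration. The idea is to do the addition in `long` so it cannot wrap around, then clamp the result to the configured maximum.

```java
class GrowthSketch {
  // Hypothetical helper, NOT the implementation from this PR.
  // Returns the new capacity, clamped to maxSize, without int overflow.
  static int newLength(
      final int currentCapacity, final int minGrowth, final int prefGrowth, final int maxSize) {
    // The smallest acceptable capacity must still fit within maxSize.
    if ((long) currentCapacity + minGrowth > maxSize) {
      throw new IllegalStateException("stack would exceed maxSize " + maxSize);
    }
    // Widen to long before adding so the sum cannot wrap to a negative int.
    final long candidate = (long) currentCapacity + Math.max(minGrowth, prefGrowth);
    return (int) Math.min(candidate, maxSize);
  }

  public static void main(final String[] args) {
    // Ordinary growth: 91 plus half of 91.
    System.out.println(newLength(91, 1, 91 >> 1, 1024));
    // Growth that would pass the limit is clamped to it.
    System.out.println(newLength(688, 1, 688 >> 1, 1024));
    // A sum that would overflow as an int stays positive here.
    System.out.println(newLength(Integer.MAX_VALUE - 8, 1, 1 << 30, Integer.MAX_VALUE));
  }
}
```

Note that clamping only avoids the arithmetic overflow. Allocating an array close to `Integer.MAX_VALUE` entries can still fail with the `OutOfMemoryError` shown above.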

top = nextTop;
}

private int newLength(final int oldCapacity, final int minGrowth, final int prefGrowth) {
Contributor

Suggested change
- private int newLength(final int oldCapacity, final int minGrowth, final int prefGrowth) {
+ private int newLength(final int currentCapacity, final int minGrowth, final int prefGrowth) {

Contributor

To make it consistent with the other changes.

@lu-pinto lu-pinto force-pushed the operand-stack-growth-strategy branch from 4ffbfe5 to 8104be6 Compare July 3, 2025 13:36
@lu-pinto
Member Author

lu-pinto commented Jul 3, 2025

After speaking with @ahamlat today, it's best to assess performance on mainnet first and do some benchmarking for CALL* before pulling this in

@lu-pinto lu-pinto force-pushed the operand-stack-growth-strategy branch 2 times, most recently from 396ed88 to 92ed95e Compare July 8, 2025 18:23
@macfarla
Contributor

moving to draft since we need to do performance testing on this before merging

@macfarla macfarla marked this pull request as draft July 22, 2025 23:17
@lu-pinto lu-pinto force-pushed the operand-stack-growth-strategy branch 3 times, most recently from fbd2f92 to 3ecd27a Compare August 6, 2025 16:45
@lu-pinto
Member Author

lu-pinto commented Aug 6, 2025

I think the added benchmark might unblock this PR, waiting on feedback.
I got the following results running on a r6a.2xlarge:

Without this PR:

Benchmark                     (stackDepth)  Mode  Cnt     Score    Error  Units
OperandStackBenchmark.fillUp             6  avgt   15    12.804 ±  0.021  ns/op
OperandStackBenchmark.fillUp            15  avgt   15    26.348 ±  0.028  ns/op
OperandStackBenchmark.fillUp            34  avgt   15   145.041 ±  7.118  ns/op
OperandStackBenchmark.fillUp           100  avgt   15   335.065 ±  0.573  ns/op
OperandStackBenchmark.fillUp           234  avgt   15   843.487 ± 16.104  ns/op
OperandStackBenchmark.fillUp           500  avgt   15  2102.937 ±  9.191  ns/op
OperandStackBenchmark.fillUp           800  avgt   15  4502.884 ± 17.090  ns/op
OperandStackBenchmark.fillUp          1024  avgt   15  6943.397 ± 18.922  ns/op

With this PR:

Benchmark                     (stackDepth)  Mode  Cnt     Score    Error  Units
OperandStackBenchmark.fillUp             6  avgt   15    22.623 ±  0.913  ns/op
OperandStackBenchmark.fillUp            15  avgt   15    30.296 ±  0.149  ns/op
OperandStackBenchmark.fillUp            34  avgt   15    58.666 ±  0.090  ns/op
OperandStackBenchmark.fillUp           100  avgt   15   277.010 ±  0.520  ns/op
OperandStackBenchmark.fillUp           234  avgt   15   700.040 ±  1.873  ns/op
OperandStackBenchmark.fillUp           500  avgt   15  1610.791 ±  1.557  ns/op
OperandStackBenchmark.fillUp           800  avgt   15  2632.757 ± 33.295  ns/op
OperandStackBenchmark.fillUp          1024  avgt   15  3150.211 ± 13.124  ns/op

The gains at higher stack heights are noticeable, as expected (roughly 2x at the highest depths). On the lower end, up to a depth of 32, there is no resize, so there is a slight regression. However, I would like to note that the average stack height in use stands at around 75, as the chart below shows:
[image]
The chart shows the maximum stack size expansion in each CALL: the green line is from a node closer to genesis and the yellow line from a node closer to HEAD.
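To make the comparison between the two tables easier to read, the per-depth ratio can be computed directly from the posted scores. The sketch below only does arithmetic on the numbers quoted above. The class and method names are made up for illustration and are not part of the benchmark in this PR. A ratio above 1 means the version with this PR is faster.

```java
class SpeedupSketch {
  // Scores in ns/op copied from the two benchmark tables in this comment.
  static final int[] DEPTHS = {6, 15, 34, 100, 234, 500, 800, 1024};
  static final double[] WITHOUT_PR = {
    12.804, 26.348, 145.041, 335.065, 843.487, 2102.937, 4502.884, 6943.397
  };
  static final double[] WITH_PR = {
    22.623, 30.296, 58.666, 277.010, 700.040, 1610.791, 2632.757, 3150.211
  };

  // Ratio of old time to new time for the i-th stack depth.
  static double speedup(final int i) {
    return WITHOUT_PR[i] / WITH_PR[i];
  }

  public static void main(final String[] args) {
    for (int i = 0; i < DEPTHS.length; i++) {
      System.out.printf("stackDepth %4d: %.2fx%n", DEPTHS[i], speedup(i));
    }
  }
}
```

On these numbers the ratio is below 1 for depths 6 and 15 and about 2.2 at depth 1024.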

lu-pinto and others added 7 commits August 7, 2025 12:07
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
@lu-pinto lu-pinto force-pushed the operand-stack-growth-strategy branch from 3ecd27a to ac0e781 Compare August 7, 2025 11:07
@lu-pinto lu-pinto marked this pull request as ready for review August 7, 2025 11:07
Contributor

@siladu siladu left a comment

LGTM.

Regression tested on mainnet and no regressions found.
Also no obvious difference in performance detected.

@siladu
Contributor

siladu commented Aug 12, 2025

@lu-pinto maybe warrants a changelog entry

@lu-pinto lu-pinto enabled auto-merge (squash) August 18, 2025 11:10
@lu-pinto lu-pinto merged commit f21630e into hyperledger:main Aug 18, 2025
46 checks passed
KimH4nKyul pushed a commit to KimH4nKyul/besu that referenced this pull request Aug 21, 2025
This PR changes the growth strategy of FlexStack so that it grows by 50% on each resize instead of by a constant 32 slots, as it does currently. The former is the approach adopted by the JDK for ArrayList.

With a 50% growth rate and a max stack size of 1024, setting the initial size to 91 means the worst case (1024) is reached in just 6 resizes, compared to the 32 required with the current approach.
Also, current usage on mainnet shows that CALLs never go above a stack depth of 150 during execution, but this can change in the future.
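The resize counts above can be checked with a short simulation. This is only a sketch, not FlexStack code: the class and method names are hypothetical, and the starting capacity of 32 for the constant-growth case is an assumption, since the description does not state it.

```java
class ResizeCountSketch {
  // Counts resizes when capacity grows by 50% each time, capped at max.
  static int resizesWithHalfGrowth(final int initial, final int max) {
    int capacity = initial;
    int resizes = 0;
    while (capacity < max) {
      // Grow by half the current capacity, and by at least one slot.
      capacity = Math.min(max, capacity + Math.max(1, capacity >> 1));
      resizes++;
    }
    return resizes;
  }

  // Counts resizes when capacity grows by a constant step, capped at max.
  static int resizesWithFixedGrowth(final int initial, final int step, final int max) {
    int capacity = initial;
    int resizes = 0;
    while (capacity < max) {
      capacity = Math.min(max, capacity + step);
      resizes++;
    }
    return resizes;
  }

  public static void main(final String[] args) {
    // 91 -> 136 -> 204 -> 306 -> 459 -> 688 -> 1024 (capped)
    System.out.println(resizesWithHalfGrowth(91, 1024));
    // Assumed starting capacity of 32, growing by 32 slots per resize.
    System.out.println(resizesWithFixedGrowth(32, 32, 1024));
  }
}
```

With these inputs the 50% strategy needs 6 resizes. The constant strategy needs 31 when starting from an assumed capacity of 32, or 32 when starting from 0, which is in line with the figure quoted above.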

Without this PR:
Benchmark                     (stackDepth)  Mode  Cnt     Score    Error  Units
OperandStackBenchmark.fillUp             6  avgt   15    12.804 ±  0.021  ns/op
OperandStackBenchmark.fillUp            15  avgt   15    26.348 ±  0.028  ns/op
OperandStackBenchmark.fillUp            34  avgt   15   145.041 ±  7.118  ns/op
OperandStackBenchmark.fillUp           100  avgt   15   335.065 ±  0.573  ns/op
OperandStackBenchmark.fillUp           234  avgt   15   843.487 ± 16.104  ns/op
OperandStackBenchmark.fillUp           500  avgt   15  2102.937 ±  9.191  ns/op
OperandStackBenchmark.fillUp           800  avgt   15  4502.884 ± 17.090  ns/op
OperandStackBenchmark.fillUp          1024  avgt   15  6943.397 ± 18.922  ns/op

With this PR:
Benchmark                     (stackDepth)  Mode  Cnt     Score    Error  Units
OperandStackBenchmark.fillUp             6  avgt   15    22.623 ±  0.913  ns/op
OperandStackBenchmark.fillUp            15  avgt   15    30.296 ±  0.149  ns/op
OperandStackBenchmark.fillUp            34  avgt   15    58.666 ±  0.090  ns/op
OperandStackBenchmark.fillUp           100  avgt   15   277.010 ±  0.520  ns/op
OperandStackBenchmark.fillUp           234  avgt   15   700.040 ±  1.873  ns/op
OperandStackBenchmark.fillUp           500  avgt   15  1610.791 ±  1.557  ns/op
OperandStackBenchmark.fillUp           800  avgt   15  2632.757 ± 33.295  ns/op
OperandStackBenchmark.fillUp          1024  avgt   15  3150.211 ± 13.124  ns/op
jflo pushed a commit to jflo/besu that referenced this pull request Sep 8, 2025
Signed-off-by: jflo <justin+github@florentine.us>