@ahamlat ahamlat commented Sep 4, 2025

PR description

Improves the worst-case performance of the CallDataCopy opcode, which was spotted by the EEST memory tests.
Surprisingly, the benchmarks showed that CallDataCopy performs worst when there is no actual data to copy. This is caused by the overhead of decoding the 3 values popped from the stack.

This PR adds a fast path for this case (data size = 0): when the number of bytes to copy is 0, the source offset is not decoded and the operation returns its result directly.
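The fast path described above can be sketched with a small self-contained model. This is not the Besu implementation: `CallDataCopySketch`, `word`, `decode`, and `callDataCopy` are hypothetical names, the stack is modelled as a `Deque` of 32-byte words, and gas accounting and memory expansion are left out. The sketch assumes the destination range fits in the supplied memory array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model only; none of these names come from the Besu code base.
class CallDataCopySketch {

  // Counts decode calls so the saving on the zero-length path is observable.
  static int decodeCalls = 0;

  // Builds a 32-byte big-endian stack word from a non-negative long.
  static byte[] word(long value) {
    byte[] w = new byte[32];
    for (int i = 0; i < 8; i++) {
      w[31 - i] = (byte) (value >>> (8 * i));
    }
    return w;
  }

  // Decodes a 32-byte word to a long, clamping values that do not fit to Long.MAX_VALUE.
  static long decode(byte[] w) {
    decodeCalls++;
    long value = 0;
    for (int i = 0; i < 32; i++) {
      if (i < 24) {
        if (w[i] != 0) {
          return Long.MAX_VALUE; // a high-order byte is set, so the value exceeds 64 bits
        }
      } else {
        value = (value << 8) | (w[i] & 0xFF);
      }
    }
    return value < 0 ? Long.MAX_VALUE : value;
  }

  // Pops destination offset, source offset, and length (in that order) and copies.
  // Assumption: destOffset + length fits inside the memory array.
  static void callDataCopy(Deque<byte[]> stack, byte[] callData, byte[] memory) {
    long destOffset = decode(stack.pop());
    byte[] srcWord = stack.pop(); // popped, but decoding is deferred
    long length = decode(stack.pop());

    if (length == 0) {
      return; // fast path: nothing to copy, so the source offset is never decoded
    }

    long srcOffset = decode(srcWord);
    for (long i = 0; i < length; i++) {
      long s = srcOffset + i;
      boolean inRange = s >= 0 && s < callData.length;
      // Reads past the end of the call data yield zero bytes.
      memory[(int) (destOffset + i)] = inRange ? callData[(int) s] : 0;
    }
  }

  public static void main(String[] args) {
    byte[] memory = new byte[8];
    Deque<byte[]> stack = new ArrayDeque<>();
    stack.push(word(0)); // length
    stack.push(word(5)); // source offset
    stack.push(word(0)); // destination offset (top of stack)
    callDataCopy(stack, new byte[] {1, 2, 3}, memory);
    System.out.println("decode calls with length 0: " + decodeCalls);
  }
}
```

In this model the zero-length call decodes two words instead of three, which is the kind of saving the benchmarks in this PR measure.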

Before this PR

Benchmark                                              (dataSize)  (fixedSrcDst)  (nonZeroData)  Mode  Cnt       Score      Error  Units
CallDataCopyOperationBenchmark.executeOperation                 0          false          false  avgt   15      97.077 ±    1.487  ns/op
CallDataCopyOperationBenchmark.executeOperation                 0          false           true  avgt   15      96.577 ±    1.966  ns/op
CallDataCopyOperationBenchmark.executeOperation                 0           true          false  avgt   15      88.163 ±    1.086  ns/op
CallDataCopyOperationBenchmark.executeOperation                 0           true           true  avgt   15      88.822 ±    1.711  ns/op
CallDataCopyOperationBenchmark.executeOperation               100          false          false  avgt   15     182.074 ±    8.460  ns/op
CallDataCopyOperationBenchmark.executeOperation               100          false           true  avgt   15     183.084 ±    4.897  ns/op
CallDataCopyOperationBenchmark.executeOperation               100           true          false  avgt   15     119.650 ±    9.987  ns/op
CallDataCopyOperationBenchmark.executeOperation               100           true           true  avgt   15     113.201 ±    0.695  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240          false          false  avgt   15    1117.707 ±   15.820  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240          false           true  avgt   15    1117.801 ±    7.763  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240           true          false  avgt   15     312.852 ±   39.833  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240           true           true  avgt   15     304.982 ±   35.716  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576          false          false  avgt   15  133887.226 ± 2171.843  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576          false           true  avgt   15  134614.670 ± 1066.297  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576           true          false  avgt   15   24759.562 ± 4016.988  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576           true           true  avgt   15   23758.696 ± 2273.540  ns/op

With this PR

Benchmark                                              (dataSize)  (fixedSrcDst)  (nonZeroData)  Mode  Cnt       Score      Error  Units
CallDataCopyOperationBenchmark.executeOperation                 0          false          false  avgt   15      71.941 ±    1.762  ns/op
CallDataCopyOperationBenchmark.executeOperation                 0          false           true  avgt   15      78.568 ±   12.322  ns/op
CallDataCopyOperationBenchmark.executeOperation                 0           true          false  avgt   15      66.725 ±    4.050  ns/op
CallDataCopyOperationBenchmark.executeOperation                 0           true           true  avgt   15      64.491 ±    2.025  ns/op
CallDataCopyOperationBenchmark.executeOperation               100          false          false  avgt   15     184.060 ±    4.993  ns/op
CallDataCopyOperationBenchmark.executeOperation               100          false           true  avgt   15     180.937 ±    9.094  ns/op
CallDataCopyOperationBenchmark.executeOperation               100           true          false  avgt   15     112.054 ±    2.260  ns/op
CallDataCopyOperationBenchmark.executeOperation               100           true           true  avgt   15     114.233 ±    1.935  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240          false          false  avgt   15    1186.422 ±  187.566  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240          false           true  avgt   15    1124.052 ±   43.443  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240           true          false  avgt   15     304.100 ±   24.339  ns/op
CallDataCopyOperationBenchmark.executeOperation             10240           true           true  avgt   15     279.107 ±    2.298  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576          false          false  avgt   15  135896.704 ± 1630.046  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576          false           true  avgt   15  133703.610 ± 1657.089  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576           true          false  avgt   15   22804.807 ±  567.671  ns/op
CallDataCopyOperationBenchmark.executeOperation           1048576           true           true  avgt   15   22540.610 ±  571.624  ns/op

This is an improvement of between 18% and 27% for the zero-size case (dataSize = 0).

Configuration                         Before (ns/op)  After (ns/op)  Improvement  Percentage
fixedSrcDst=false, nonZeroData=false          97.077         71.941    25.136 ns       25.9%
fixedSrcDst=false, nonZeroData=true           96.577         78.568    18.009 ns       18.6%
fixedSrcDst=true,  nonZeroData=false          88.163         66.725    21.438 ns       24.3%
fixedSrcDst=true,  nonZeroData=true           88.822         64.491    24.331 ns       27.4%
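For reference, the Improvement and Percentage columns follow from the dataSize = 0 scores in the two benchmark runs: the improvement is before minus after, and the percentage is that difference relative to the before score. A minimal sketch of the arithmetic, with the hypothetical class name `ImprovementTable`:

```java
import java.util.Locale;

// Recomputes the summary columns from the dataSize = 0 JMH scores quoted in this PR.
class ImprovementTable {

  // Absolute saving in ns/op.
  static double improvementNs(double before, double after) {
    return before - after;
  }

  // Saving relative to the "before" score, in percent.
  static double improvementPercent(double before, double after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    // {before, after} pairs in the same row order as the summary table.
    double[][] rows = {
      {97.077, 71.941},
      {96.577, 78.568},
      {88.163, 66.725},
      {88.822, 64.491},
    };
    for (double[] r : rows) {
      System.out.println(
          String.format(
              Locale.ROOT,
              "%.3f ns  %.1f%%",
              improvementNs(r[0], r[1]),
              improvementPercent(r[0], r[1])));
    }
  }
}
```

These figures use the mean scores only and do not account for the reported error margins.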

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
@ahamlat ahamlat merged commit 20a6797 into hyperledger:main Sep 7, 2025
46 checks passed
jflo pushed a commit to jflo/besu that referenced this pull request Sep 8, 2025
* Improve CallDataCopyOperation worst case

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
jflo pushed a commit to jflo/besu that referenced this pull request Sep 8, 2025
* Improve CallDataCopyOperation worst case

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: jflo <justin+github@florentine.us>
georgereuben pushed a commit to georgereuben/besu that referenced this pull request Sep 16, 2025
* Improve CallDataCopyOperation worst case

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: georgereuben <reubengeorge101@gmail.com>
jflo pushed a commit to jflo/besu that referenced this pull request Oct 13, 2025
* Improve CallDataCopyOperation worst case

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: jflo <justin+github@florentine.us>