@ahamlat ahamlat commented Nov 24, 2025

PR description

This PR reimplements four bitwise opcodes using the new UInt256 introduced in #9188.
It updates AND, OR, XOR and NOT. The changes deliver the following improvements:

Opcode   Baseline (ns/op)   Optimized (ns/op)   Improvement (%)
AND      92.406             73.915              20.01%
OR       94.804             72.537              23.49%
XOR      92.872             70.806              23.76%
NOT      55.527             42.931              22.68%
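With the 256-bit value stored as eight 32-bit limbs, each bitwise opcode reduces to one machine operation per limb. A minimal sketch of the idea (UInt256Sketch is a hypothetical stand-in, not the actual UInt256 class from #9188, which unrolls the loop):

```java
// Illustrative sketch of a limb-wise bitwise AND over a 256-bit value stored
// as eight 32-bit int limbs. UInt256Sketch, its field and its method names
// are assumptions, not the real UInt256 API.
public final class UInt256Sketch {
    final int[] limbs; // limbs[0] holds the most significant 32 bits

    UInt256Sketch(final int[] limbs) {
        this.limbs = limbs;
    }

    UInt256Sketch and(final UInt256Sketch other) {
        final int[] result = new int[8];
        for (int i = 0; i < 8; i++) {
            // one 32-bit AND per limb; the PR unrolls this loop
            result[i] = this.limbs[i] & other.limbs[i];
        }
        return new UInt256Sketch(result);
    }
}
```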

You can find the detailed benchmark results below.

AND Opcode

Benchmark                                        Mode  Cnt   Score   Error  Units
AndOperationBenchmark.executeOperation           avgt   15  92.406 ± 0.816  ns/op
AndOperationOptimizedBenchmark.executeOperation  avgt   15  73.915 ± 0.754  ns/op

OR Opcode

Benchmark                                       Mode  Cnt   Score   Error  Units
OrOperationBenchmark.executeOperation           avgt   15  94.804 ± 1.068  ns/op
OrOperationOptimizedBenchmark.executeOperation  avgt   15  72.537 ± 0.305  ns/op

XOR Opcode

Benchmark                                        Mode  Cnt   Score   Error  Units
XorOperationBenchmark.executeOperation           avgt   15  92.872 ± 3.150  ns/op
XorOperationOptimizedBenchmark.executeOperation  avgt   15  70.806 ± 0.277  ns/op

NOT Opcode

Benchmark                                        Mode  Cnt   Score   Error  Units
NotOperationBenchmark.executeOperation           avgt   15  55.527 ± 0.206  ns/op
NotOperationOptimizedBenchmark.executeOperation  avgt   15  42.931 ± 0.168  ns/op

This PR also adds JMH benchmarks for each of these opcodes to validate the performance improvements.
To run a benchmark for a specific opcode, use the following command (example for AND):

./gradlew clean :ethereum:core:jmh -Pf=5 -Pwi=10 -Pi=10 -Pincludes=AndOperation

It also adds property-based tests for each opcode to ensure that the new implementations behave as expected.
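As an illustration of the cross-check idea behind such tests, a byte-wise result can be validated against BigInteger. This is a hypothetical sketch; the helper names are illustrative and not the PR's actual test code:

```java
import java.math.BigInteger;

// Hypothetical sketch of a BigInteger cross-check for the AND opcode: a
// byte-wise AND of two 32-byte operands must match BigInteger.and on the
// same operands interpreted as unsigned big-endian magnitudes.
public final class AndCrossCheck {
    static byte[] and(final byte[] a, final byte[] b) {
        final byte[] out = new byte[32];
        for (int i = 0; i < 32; i++) {
            out[i] = (byte) (a[i] & b[i]);
        }
        return out;
    }

    static boolean matchesBigInteger(final byte[] a, final byte[] b) {
        // signum 1 treats each array as an unsigned big-endian magnitude
        final BigInteger expected = new BigInteger(1, a).and(new BigInteger(1, b));
        return new BigInteger(1, and(a, b)).equals(expected);
    }
}
```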

The implementation also includes an optimization to the fromBytesBE method, which accounts for roughly half of the overall improvement. The table below shows the improvement we get without changing fromBytesBE:

Opcode   Baseline (ns/op)   Optimized (ns/op)   Improvement (%)
AND      92.692             85.100              8.19%
XOR      94.557             84.303              10.84%
NOT      55.576             51.272              7.74%
OR       94.460             83.484              11.62%
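For context, the getIntBE helper signature shown later in this thread suggests a plain-shift big-endian read instead of going through ByteBuffer. A minimal sketch under that assumption (the exact body in the PR may differ):

```java
// Sketch of reading four bytes as a big-endian int with plain shifts instead
// of ByteBuffer. Only the getIntBE signature appears in the PR; this body and
// the BytesBE class name are assumptions about what such a helper looks like.
public final class BytesBE {
    static int getIntBE(final byte[] bytes, final int offset) {
        return ((bytes[offset] & 0xFF) << 24)
            | ((bytes[offset + 1] & 0xFF) << 16)
            | ((bytes[offset + 2] & 0xFF) << 8)
            | (bytes[offset + 3] & 0xFF);
    }
}
```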

This new implementation was also tested on MulMod and showed a significant improvement in cases where the fromBytesBE method accounts for a large share of execution time. You can find the numbers here.

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests
  • hive tests: Engine or other RPCs modified?

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>

@thomas-quadratic thomas-quadratic left a comment


Thanks @ahamlat this is nice. Improvement on fromBytesBE is good. Very happy about the fast paths.

Did you try ByteBuffer.getInt instead of your helper getIntBE? It is the same code, but I was under the impression that ByteBuffer.getInt has some hardware acceleration. But I am not sure.

Doing bitwise ops sequentially for all limbs seems like the right approach to me right now.

Benchmarks show the average over all sizes. I think a possible improvement would be to parametrize sizes like for the mod ops, but that would probably only be interesting for fromBytesBE; bitwise ops are done on all limbs.

result[5] = this.limbs[5] & other.limbs[5];
result[6] = this.limbs[6] & other.limbs[6];
result[7] = this.limbs[7] & other.limbs[7];
int resultLength = nSetLimbs(result);
Contributor

In the current implementation, you don't necessarily have to do this operation; you could just set the length to N_LIMBS (or Math.min(this.length, other.length)) if that leads to a performance improvement.
But this is what we discussed the other time: do we want to optimise nSetLimbs with Arrays.mismatch and use it at little cost all the time, or do we keep this length interpretation?
Similarly for other bitwise ops.
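The Arrays.mismatch variant mentioned here could count significant limbs by comparing against a shared all-zero array. A sketch, assuming limbs are stored most-significant first and borrowing the nSetLimbs/N_LIMBS names from the discussion; the real code may differ:

```java
import java.util.Arrays;

// Sketch of counting significant limbs with Arrays.mismatch: locate the
// first non-zero limb against a shared all-zero array. Assumes limbs[0] is
// the most significant limb; nSetLimbs and N_LIMBS are names from the
// discussion, not confirmed API.
public final class LimbCount {
    static final int N_LIMBS = 8;
    private static final int[] ZERO = new int[N_LIMBS];

    static int nSetLimbs(final int[] limbs) {
        // Arrays.mismatch returns the first differing index, or -1 if equal
        final int first = Arrays.mismatch(limbs, ZERO);
        return first < 0 ? 0 : N_LIMBS - first;
    }
}
```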

Contributor Author

IIUC, with the existing implementation, I can just replace it with N_LIMBS.
I can make that change, but as for Arrays.mismatch, I think that was a proposal from @lu-pinto, so I will let him address it in another PR.

Contributor Author

}

// Helper method to read 4 bytes as big-endian int
private static int getIntBE(final byte[] bytes, final int offset) {
Contributor

I am not sure, but I think ByteBuffer.getInt in Java is the same code. However, the compiler can use intrinsics for it, whereas I am not sure it can with your code.
I can test it if you like.

Contributor Author

The implementation that is executing is from HeapByteBuffer, and it is quite different from the newly suggested one. I completely removed the use of ByteBuffer in fromBytesBE.

Contributor Author

This is the implementation that was executing before this PR:

public int getInt() {
    return SCOPED_MEMORY_ACCESS.getIntUnaligned(session(), hb, byteOffset(nextGetIndex(4)), bigEndian);
}


ahamlat commented Nov 25, 2025

Did you try with ByteBuffer getInt instead of your helper getIntBE ?

As I removed the ByteBuffer, I can't use that method anymore, and the new one showed better performance.

I think a possible improvement would be to parametrize sizes like for mod ops, but that would be probably only be interesting for fromBytesBE; bitwise ops are done on all limbs.

Bitwise opcodes are very simple and don't have complex execution paths. I think we should keep the benchmarks simple so we can evaluate performance quickly.
I executed the Mod benchmarks with the new implementation from this PR and you can find the results here.


Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>

ahamlat commented Nov 26, 2025

It does indeed perform better when setting the length of the UInt256 result to 8 limbs:

Benchmark                                        Mode  Cnt   Score   Error  Units
AndOperationBenchmark.executeOperation           avgt   15  94.534 ± 5.494  ns/op
AndOperationOptimizedBenchmark.executeOperation  avgt   15  69.519 ± 0.172  ns/op
Benchmark                                       Mode  Cnt   Score   Error  Units
OrOperationBenchmark.executeOperation           avgt   15  94.715 ± 1.157  ns/op
OrOperationOptimizedBenchmark.executeOperation  avgt   15  70.134 ± 1.206  ns/op
Benchmark                                        Mode  Cnt   Score   Error  Units
XorOperationBenchmark.executeOperation           avgt   15  94.613 ± 1.007  ns/op
XorOperationOptimizedBenchmark.executeOperation  avgt   15  69.575 ± 0.173  ns/op


ahamlat commented Dec 1, 2025

@thomas-quadratic @lu-pinto I addressed all the comments, could you take another look?

// Assert - compare with Bytes.and() (existing implementation)
final Bytes bytesA = Bytes32.leftPad(Bytes.wrap(a));
final Bytes bytesB = Bytes32.leftPad(Bytes.wrap(b));
final byte[] expected = bytesA.and(bytesB).toArrayUnsafe();

@lu-pinto lu-pinto Dec 1, 2025


I would be more at ease if you would compare it with BigInteger instead of tuweni

Member

Oh I see there's one with BigInteger next. Why compare with both then? Is it not overkill?

Contributor Author

I was trying to cover as much as possible, so I don't think it is overkill, but I can remove it if you think we should only compare with the existing implementation.

final byte[] expected = bytesA.not().toArrayUnsafe();
assertThat(resultBytes).containsExactly(expected);

System.out.println("✓ Test PASSED - matches Bytes.not()");
Member

why the printouts in tests? AI generated? That looks strange...

Contributor Author

Yes, that was for testing purposes and I forgot to remove it. Let me remove it.

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>

@lu-pinto lu-pinto left a comment


LGTM

@ahamlat ahamlat merged commit c844ea1 into hyperledger:main Dec 2, 2025
46 checks passed
AliZDev-v0 pushed a commit to AliZDev-v0/besu that referenced this pull request Dec 10, 2025
* Optimize AND, OR, XOR and NOT opcodes using new UInt256 implementation

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Co-authored-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Ali Zhagparov <alijakparov.kz@gmail.com>
pinges pushed a commit to pinges/besu that referenced this pull request Dec 15, 2025
* Optimize AND, OR, XOR and NOT opcodes using new UInt256 implementation

Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Co-authored-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>