-
Notifications
You must be signed in to change notification settings - Fork 995
Optimize performances of AND, OR, XOR and NOT opcodes #9489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
thomas-quadratic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ahamlat this is nice. Improvement on fromBytesBE is good. Very happy about the fast paths.
Did you try with ByteBuffer getInt instead of your helper getIntBE ? It is the same code, but I was under the impression that ByteBuffer.getInt has some hardware acceleration. But I am not sure.
Doing bitwise ops sequentially for all limbs seem the right approach to me right now.
Benchmarks show the average over all sizes. I think a possible improvement would be to parametrize sizes like for mod ops, but that would be probably only be interesting for fromBytesBE; bitwise ops are done on all limbs.
| result[5] = this.limbs[5] & other.limbs[5]; | ||
| result[6] = this.limbs[6] & other.limbs[6]; | ||
| result[7] = this.limbs[7] & other.limbs[7]; | ||
| int resultLength = nSetLimbs(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current implementation, you don't necessarily have to do this operation, you could just set the length to N_LIMBS (or Math.min(this.length, other.length)). If that can lead to performance improvements.
But this is what we discussed the other time, do we want to optimise nSetLimbs with Arrays.mismatch and use it with little cost all the time ? Or do we keep this length interpretation ?
Similarly for other bitwise ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, with existing implementation, I can just replace with N_LIMBS.
I can do that change, but related to Arrays.mismatch, I think that was a proposal from @lu-pinto so will let him address it in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| // Helper method to read 4 bytes as big-endian int | ||
| private static int getIntBE(final byte[] bytes, final int offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, but I think that ByteBuffer.getInt in Java is the same code. However the compiler can use some intrinsics for it, where I am not sure it can with your code.
I can test it if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the implementation that is executing before this PR
public int getInt() {
return SCOPED_MEMORY_ACCESS.getIntUnaligned(session(), hb, byteOffset(nextGetIndex(4)), bigEndian);
}
As I removed the bytebuffer, I can't use that method anymore and the new one showed better performances.
Bitwise opcode are very simple and don't have a complex path execution. I think we should keep the benchmarks simple to be able to evaluate the performances very quickly.
|
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
|
It has indeed better performances when setting the length of the UInt256 result to 8 limbs |
|
@thomas-quadratic @lu-pinto I addressed all the comments, could you take another look ? |
| // Assert - compare with Bytes.and() (existing implementation) | ||
| final Bytes bytesA = Bytes32.leftPad(Bytes.wrap(a)); | ||
| final Bytes bytesB = Bytes32.leftPad(Bytes.wrap(b)); | ||
| final byte[] expected = bytesA.and(bytesB).toArrayUnsafe(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more at ease if you would compare it with BigInteger instead of tuweni
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see there's one with BigInteger next. Why compare with both then? Is it not overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to as much as possible, so I don't think it is overkill, but I can remove it if you think we should only compare with the existing implementation.
| final byte[] expected = bytesA.not().toArrayUnsafe(); | ||
| assertThat(resultBytes).containsExactly(expected); | ||
|
|
||
| System.out.println("✓ Test PASSED - matches Bytes.not()"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the printouts in tests? AI generated? That looks strange...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was for testing purposes and forgot to remove it. Let me remove it.
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
lu-pinto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Optimize AND, OR, XOR and NOT opcodes using new UInt256 implementation Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net> Co-authored-by: Luis Pinto <luis.pinto@consensys.net> Signed-off-by: Ali Zhagparov <alijakparov.kz@gmail.com>
* Optimize AND, OR, XOR and NOT opcodes using new UInt256 implementation Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net> Co-authored-by: Luis Pinto <luis.pinto@consensys.net> Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>


PR description
This PR reimplements four arithmetic opcodes using the new UInt256 introduced in #9188.
It updates: AND, OR, XOR and NOT. The changes deliver the following improvements:
You can find below the details of the benchmarks.
AND Opcode
OR Opcode
XOR Opcode
NOT Opcode
This PR adds JMH benchmarks for each arithmetic opcode to validate performance improvements.
To run a benchmark for a specific opcode, use the following command (example for AND):
It also adds property-based tests for each opcode to ensure that the new implementations behave as expected.
The implementation also includes an optimization to the
fromBytesBEmethod, which accounts for roughly half of the overall improvement. You can find below the improvement we get without changinfromBytesBEThis new implementation was tested on MulMod and showed significant improvement for cases where
fromBytesBEmethod takes a big share in execution time. You can find the numbers here.Fixed Issue(s)
Thanks for sending a pull request! Have you done the following?
doc-change-requiredlabel to this PR if updates are required.Locally, you can run these tests to catch failures early:
./gradlew spotlessApply./gradlew build./gradlew acceptanceTest./gradlew integrationTest./gradlew ethereum:referenceTests:referenceTests