Add SSE2/AVX2/WASM SIMD support#86
Merged
james7132 merged 13 commits intopetgraph:masterfrom Mar 19, 2024
Merged
Conversation
Collaborator
Author
|
Note: I think this currently breaks serde support. |
This reverts commit 5282a4d.
Collaborator
Author
|
Reran benchmarks, this seems to be a significant gain on multiple fronts when the right CPU features are enabled during compilation: The Mask changes might need to be reverted though. The regressions in |
SkiFire13
reviewed
Mar 19, 2024
Comment on lines
+547
to
+555
| // SAFETY: This is using the exact same allocation pattern, size, and capacity | ||
| // making this reconstruction of the Vec safe. | ||
| let mut data = unsafe { | ||
| let mut data = ManuallyDrop::new(self.data); | ||
| let ptr = data.as_mut_ptr().cast(); | ||
| let len = data.len() * SimdBlock::USIZE_COUNT; | ||
| let capacity = data.capacity() * SimdBlock::USIZE_COUNT; | ||
| Vec::from_raw_parts(ptr, len, capacity) | ||
| }; |
Contributor
There was a problem hiding this comment.
I don't think this is safe. The Vec is initially allocated with SimdBlocks, which can have a different alignment than usizes. For example the one in avx2 has an alignment of 32 and the one in sse2 has an alignment of 16, while usize only has an alignment of 8.
Closed
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #73. Changes
Blockfrom a alias to a platform/target-specific newtype aroundusize,__m128i,__m256i, orv128.This supports all SIMD intrinsics that have been stabilized into the standard library.
SSE2 is universally available on all x86_64 machines, so this should see a 4x speedup relative to the u32-based approach originally used before #74.
AVX2 is only on~89% of consumer machines, so it may not be fully reliable, but should show another 2x speedup over SSE2. Those who are using this in a cloud or server environment will likely benefit from using
--target-cpu=native, which should enable it on the target machine.NOTE: This adds a lot of unsafe code, simply by nature of using SIMD intrinsics. There's a good chunk of
core::mem::transmutegoing around too, though I try to keep it to a minimum.Performance
Using a ported version of the benchmarks to Criterion (see #84), on set or batch operations, like
insert_range,intersection_with, etc. these SIMD accelerated versions are expectedly 2-4 times faster than when usingusizeas the block, which should extend the performance gains of #74 even further.I tested optionally using runtime feature detection via
ix_x86_feature_detectedon these operations, and unfortunately that causes serious regressions.