Currently, this repo uses x86 intrinsics, which causes compilation to fail on ARM. Would it be possible to add support for ARM/NEON intrinsics?
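
To illustrate the kind of change I have in mind, here is a rough sketch of a per-architecture guard with a scalar fallback. Everything in it is hypothetical: `add4` and the `EXAMPLE_USE_*` macros are made-up names and not taken from this repo. I'm also assuming the compiler defines the usual feature macros (`__ARM_NEON` on ARM, `__SSE__` or `_M_X64` on x86).

```c
#include <assert.h>

/* Pick a backend from compiler-defined feature macros.
   EXAMPLE_USE_* are hypothetical names used only in this sketch. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
  #define EXAMPLE_USE_NEON 1
#elif defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>
  #define EXAMPLE_USE_SSE 1
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
static void add4(const float *a, const float *b, float *out) {
    assert(a && b && out);
#if defined(EXAMPLE_USE_NEON)
    /* NEON path: load, add, and store four floats at once. */
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#elif defined(EXAMPLE_USE_SSE)
    /* SSE path: unaligned loads and store, so no alignment requirement. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    /* Portable fallback for targets with neither extension. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

With guards like this, the x86 headers would only be included when targeting x86, so an ARM build would no longer fail on them, and any other target would still compile via the scalar path.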