On ARM, LLVM generates this error:
vector_extract index must be a constant multiple of the result type's known minimum vector length.
For IR that looks like this:
%514 = call <16 x i8> @llvm.vector.extract.v16i8.nxv16i8(<vscale x 16 x i8> %512, i64 0)
%515 = call <16 x i8> @llvm.aarch64.neon.sabd.v16i8(<16 x i8> %513, <16 x i8> %514)
%516 = call <5 x i8> @llvm.vector.extract.v5i8.nxv21i8(<vscale x 21 x i8> %505, i64 16)
%517 = call <vscale x 16 x i8> @llvm.vector.insert.nxv16i8.v5i8(<vscale x 16 x i8> poison, <5 x i8> %516, i64 0)
%518 = call <5 x i8> @llvm.vector.extract.v5i8.nxv21i8(<vscale x 21 x i8> %508, i64 16)
%519 = call <vscale x 16 x i8> @llvm.vector.insert.nxv16i8.v5i8(<vscale x 16 x i8> poison, <5 x i8> %518, i64 0)
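The offending calls are the two <5 x i8> extracts at index 16: the result type's known minimum length is 5, and 16 is not a multiple of 5. Looking through Halide, we never generate this intrinsic with any index other than 0, so the 16 that appears in the IR is probably introduced by an LLVM-internal optimization.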
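As a quick way to scan a larger IR dump for other offenders, here is a minimal Python sketch that applies the rule as worded in the error message to each line. It is only an approximation for triage, not LLVM's actual check; the regex and the helper name `extract_index_ok` are made up for illustration and only handle calls shaped like the ones above.

```python
import re

# Matches calls shaped like:
#   call <5 x i8> @llvm.vector.extract.v5i8.nxv21i8(<vscale x 21 x i8> %505, i64 16)
# Group 1 is the result type's known minimum length, group 2 is the index.
EXTRACT_RE = re.compile(
    r"call <(?:vscale x )?(\d+) x \w+> "
    r"@llvm\.vector\.extract\.[\w.]+\(.*, i64 (\d+)\)"
)

def extract_index_ok(line):
    """Return True/False for an llvm.vector.extract call, None for other lines.

    Hypothetical helper: models only the rule quoted in the error message
    (index must be a multiple of the result's known minimum vector length).
    """
    m = EXTRACT_RE.search(line)
    if m is None:
        return None
    min_len, index = int(m.group(1)), int(m.group(2))
    return index % min_len == 0

if __name__ == "__main__":
    import sys
    # Print every extract call on stdin that breaks the rule.
    for ir_line in sys.stdin:
        if extract_index_ok(ir_line) is False:
            print(ir_line.rstrip())
```

Piping the dumped module through this script should print only the extracts whose index is not a multiple of the result length, such as %516 and %518 above.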
To reproduce, run correctness/fuzz_extract_lanes on ARM (the test is currently in #8629) with seed 11290674455725750672.