This was brought up at the last CG meeting and wasn't originally evaluated for this proposal. The question is whether 128-bit shift and rotate operators should be added (IIRC, please correct me if I'm wrong). These would perhaps be i64.{shl,shr_s,shr_u,rotl,rotr}128, for example.
Performance and generated code for these operations should be evaluated as they are compiled today, in comparison with what native platforms do. Ideally a benchmark or microbenchmark could be created to compare before/after performance of the hypothetical operations.
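As a starting point for that comparison, here is a rough sketch of what a 128-bit shift-left and rotate-left look like when written by hand over two 64-bit halves, which is roughly the shape of code that has to be expressed with 64-bit operations today. The function names `shl128` and `rotl128`, the `(lo, hi)` operand order, and the choice to wrap the shift amount modulo 128 are all assumptions made for illustration; none of this is specified by the proposal.

```rust
/// Hypothetical helper: shift a 128-bit value, passed as (lo, hi) 64-bit
/// halves, left by `n` bits. The count is wrapped modulo 128, which is an
/// assumption modeled on how the existing 64-bit shifts wrap their count.
fn shl128(lo: u64, hi: u64, n: u32) -> (u64, u64) {
    let n = n & 127;
    if n == 0 {
        // Handled separately so that `64 - n` below never equals 64.
        (lo, hi)
    } else if n < 64 {
        // Bits shifted out of the low half move into the high half.
        (lo << n, (hi << n) | (lo >> (64 - n)))
    } else {
        // The low half becomes zero; the high half comes from the old low half.
        (0, lo << (n - 64))
    }
}

/// Hypothetical helper: rotate a 128-bit value, passed as (lo, hi) halves,
/// left by `n` bits, with the count wrapped modulo 128.
fn rotl128(lo: u64, hi: u64, n: u32) -> (u64, u64) {
    let n = n & 127;
    // Rotating by 64 swaps the halves, leaving a rotation of fewer than 64 bits.
    let (lo, hi, n) = if n >= 64 { (hi, lo, n - 64) } else { (lo, hi, n) };
    if n == 0 {
        (lo, hi)
    } else {
        ((lo << n) | (hi >> (64 - n)), (hi << n) | (lo >> (64 - n)))
    }
}
```

Compiling something like this to wasm and comparing the generated machine code against a native build that uses `u128` directly could be one way to see how much a dedicated instruction would actually save.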