The current implementation of mul_wide uses schoolbook multiplication, which has O(n*m) complexity, where n and m are the numbers of limbs in the operands.
Perhaps we should switch to an asymptotically faster algorithm such as Karatsuba multiplication, which needs roughly O(n^1.585) limb multiplications for two n-limb operands. If so, I would be happy to work on this.
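To make the proposal concrete, here is a rough sketch of what I have in mind. It is not based on the actual mul_wide signature: the function names, the `Vec<u64>` little-endian limb representation, and the threshold value are all assumptions made for illustration. It also only handles operands of equal length, so the unequal n, m case would still need padding or chunking.

```rust
/// Hypothetical cutoff below which schoolbook is used; the real value would need benchmarking.
const KARATSUBA_THRESHOLD: usize = 32;

/// Schoolbook product of little-endian limb slices; the result has a.len() + b.len() limbs.
fn mul_schoolbook(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out = vec![0u64; a.len() + b.len()];
    for (i, &ai) in a.iter().enumerate() {
        let mut carry = 0u128;
        for (j, &bj) in b.iter().enumerate() {
            // ai * bj + out + carry always fits in 128 bits.
            let t = (ai as u128) * (bj as u128) + (out[i + j] as u128) + carry;
            out[i + j] = t as u64;
            carry = t >> 64;
        }
        out[i + b.len()] = carry as u64;
    }
    out
}

/// acc += x, propagating the carry through acc. Assumes x.len() <= acc.len().
fn add_assign(acc: &mut [u64], x: &[u64]) {
    let mut carry = 0u64;
    for (i, limb) in acc.iter_mut().enumerate() {
        let xi = x.get(i).copied().unwrap_or(0);
        let (s1, c1) = limb.overflowing_add(xi);
        let (s2, c2) = s1.overflowing_add(carry);
        *limb = s2;
        carry = (c1 | c2) as u64;
    }
    debug_assert_eq!(carry, 0);
}

/// acc -= x, propagating the borrow through acc. Assumes acc >= x as integers.
fn sub_assign(acc: &mut [u64], x: &[u64]) {
    let mut borrow = 0u64;
    for (i, limb) in acc.iter_mut().enumerate() {
        let xi = x.get(i).copied().unwrap_or(0);
        let (d1, b1) = limb.overflowing_sub(xi);
        let (d2, b2) = d1.overflowing_sub(borrow);
        *limb = d2;
        borrow = (b1 | b2) as u64;
    }
    debug_assert_eq!(borrow, 0);
}

/// Karatsuba product of two equal-length little-endian limb slices (2n result limbs).
fn mul_karatsuba(a: &[u64], b: &[u64]) -> Vec<u64> {
    assert_eq!(a.len(), b.len());
    let n = a.len();
    if n < KARATSUBA_THRESHOLD {
        return mul_schoolbook(a, b);
    }
    let h = n / 2;
    let (a0, a1) = a.split_at(h); // a = a0 + a1 * B^h
    let (b0, b1) = b.split_at(h);

    let z0 = mul_karatsuba(a0, b0); // low product, 2h limbs
    let z2 = mul_karatsuba(a1, b1); // high product, 2(n - h) limbs

    // a0 + a1 and b0 + b1, with one spare limb for the carry.
    let mut sa = vec![0u64; n - h + 1];
    sa[..a1.len()].copy_from_slice(a1);
    add_assign(&mut sa, a0);
    let mut sb = vec![0u64; n - h + 1];
    sb[..b1.len()].copy_from_slice(b1);
    add_assign(&mut sb, b0);

    // z1 = (a0 + a1)(b0 + b1) - z0 - z2 = a0*b1 + a1*b0
    let mut z1 = mul_karatsuba(&sa, &sb);
    sub_assign(&mut z1, &z0);
    sub_assign(&mut z1, &z2);

    // result = z0 + z1 * B^h + z2 * B^(2h)
    let mut out = vec![0u64; 2 * n];
    out[..2 * h].copy_from_slice(&z0);
    out[2 * h..].copy_from_slice(&z2);
    add_assign(&mut out[h..], &z1);
    out
}
```

Each level replaces four half-size multiplications with three, at the cost of a few extra additions and subtractions, which is why a threshold is needed: below some size the schoolbook loop is likely to stay faster. The sketch allocates temporaries on every call, so a real implementation would probably want a scratch buffer instead.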