An interesting optimization for no_std platforms would be to support in-place decoders/encoders.
The entire crate is already written to operate over byte slices as inputs/outputs, which should make much of the existing implementation reusable in this context.
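
To make the idea concrete, here is a minimal sketch of what an in-place decoder could look like. It is not this crate's API: hex is used purely as a stand-in encoding, and `decode_in_place`, `DecodeError`, and `nibble` are hypothetical names chosen for illustration. It relies only on `core` features (no allocation), so it should be usable under `no_std`.

```rust
/// Hypothetical error type for this sketch (not part of any existing crate).
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    OddLength,
    InvalidDigit,
}

/// Convert one ASCII hex digit to its 4-bit value.
fn nibble(c: u8) -> Result<u8, DecodeError> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        b'A'..=b'F' => Ok(c - b'A' + 10),
        _ => Err(DecodeError::InvalidDigit),
    }
}

/// Decode hex text in place, reusing the input buffer for the output.
///
/// Output byte `i` is built from input bytes `2 * i` and `2 * i + 1`, so the
/// write position never passes the read position and no unread input is
/// overwritten. On error, the buffer may be left partially decoded.
pub fn decode_in_place(buf: &mut [u8]) -> Result<&[u8], DecodeError> {
    if buf.len() % 2 != 0 {
        return Err(DecodeError::OddLength);
    }
    let out_len = buf.len() / 2;
    for i in 0..out_len {
        let hi = nibble(buf[2 * i])?;
        let lo = nibble(buf[2 * i + 1])?;
        buf[i] = (hi << 4) | lo;
    }
    // Only the first `out_len` bytes hold decoded data.
    Ok(&buf[..out_len])
}
```

Decoding is the easy direction because the output is never longer than the input. An in-place encoder for an expanding format would presumably need the caller to provide a buffer with spare capacity and would have to work from the end toward the start so that unread input is not overwritten.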