#5479 uncovered an issue: the reference RGB -> YUV 420 test appears to have reported excessive loads because a shift was signed and tracing is non-pure. The extra code generated for the signed shift was therefore executed, so more loads were reported than necessary, and this inflated reference count is what caused the test to pass.
After fixing this, the compute_with version does not actually have fewer loads than the reference.
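
To illustrate the mechanism described above, here is a minimal standalone sketch. It is not Halide code and not the actual test: `TracedBuffer`, `shift_left_unsigned`, `shift_left_signed` and `check_load_counts` are hypothetical names, and the signed shift is modelled (as an assumption) as an eager select over both shift directions. Because each traced load has a side effect, the two identical loads cannot be merged, so the signed variant reports twice as many loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a traced buffer: every load bumps a counter,
// so a load is a side effect and two identical loads cannot be merged.
struct TracedBuffer {
    std::vector<int32_t> data;
    int loads = 0;

    explicit TracedBuffer(std::vector<int32_t> d) : data(std::move(d)) {}

    int32_t load(std::size_t i) {
        ++loads;
        return data[i];
    }
};

// Shift by an unsigned amount: the direction is fixed, so one load suffices.
int32_t shift_left_unsigned(TracedBuffer &buf, std::size_t i, uint32_t amount) {
    return buf.load(i) << amount;
}

// Shift by a signed amount, modelled (assumption) as an eager select over
// both directions. Both operands are evaluated before one is chosen, so the
// traced load runs twice even though only one result is used.
int32_t shift_left_signed(TracedBuffer &buf, std::size_t i, int32_t amount) {
    const int32_t left = buf.load(i) << (amount > 0 ? amount : 0);
    const int32_t right = buf.load(i) >> (amount < 0 ? -amount : 0);
    return amount >= 0 ? left : right;
}

// Same input, same result, different reported load counts.
void check_load_counts() {
    TracedBuffer a({3, 5, 7});
    TracedBuffer b({3, 5, 7});
    const int32_t from_unsigned = shift_left_unsigned(a, 1, 2);
    const int32_t from_signed = shift_left_signed(b, 1, 2);
    assert(from_unsigned == from_signed);
    assert(a.loads == 1);
    assert(b.loads == 2);
}
```

In this model the computed values agree, and only the reported load count differs. That is the kind of inflation that could make a reference look more expensive than it really is.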