It seems that `top_mask` and `mask` can contain -1 values during `Backward_cpu` when multiple solvers run in parallel on the same weights (#1148). The main loop (line 132) appears to miss some indexes in that case, and I'm not sure how to change it so that it is resilient to the weights changing concurrently.
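
One possible stopgap is to have the backward scatter skip any index that is out of range instead of writing through it. Below is a minimal standalone sketch of that idea, not the actual Caffe code: `ScatterMaxPoolGrad` is a hypothetical helper, plain `std::vector`s stand in for the blobs, and I am assuming the -1 entries are slots that never received a valid argmax index. It only avoids the out-of-bounds write; it does not address the underlying race.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Caffe: route each top gradient back to
// the bottom position recorded in `mask`, skipping entries whose index is
// negative (e.g. -1) or past the end of `bottom_diff`.
// Returns the number of skipped entries so the caller can log or count them.
std::size_t ScatterMaxPoolGrad(const std::vector<float>& top_diff,
                               const std::vector<int>& mask,
                               std::vector<float>& bottom_diff) {
  std::size_t skipped = 0;
  // Only walk the range that both inputs cover.
  const std::size_t n =
      top_diff.size() < mask.size() ? top_diff.size() : mask.size();
  for (std::size_t i = 0; i < n; ++i) {
    const int idx = mask[i];
    if (idx < 0 || static_cast<std::size_t>(idx) >= bottom_diff.size()) {
      ++skipped;  // invalid index: drop this gradient rather than write
      continue;
    }
    bottom_diff[idx] += top_diff[i];
  }
  return skipped;
}
```

Dropping a gradient silently changes the update, so returning the skip count at least makes it possible to see how often this happens in a parallel run.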