Since commit cafb0e0, the naive CPU-based matrix multiplication appears to take roughly twice as long. According to NaiveMatrixTest, multiplying two 1000x1000 matrices used to take between 6 and 7 seconds on a MacBook Air (Early 2015); it now takes between 11 and 13 seconds.
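To help confirm the regression outside the test suite, a minimal stand-alone timing harness might look like the sketch below. It is hypothetical: the class name NaiveMatMulBench, the double[][] representation, the fixed seed and the JIT warm-up step are all assumptions, not the project's code. The multiply() method is a textbook triple loop standing in for the project's implementation; swapping in a call to the real routine and running the harness at the parent of cafb0e0 and at cafb0e0 itself on the same machine would make the before/after ratio directly comparable.

```java
import java.util.Random;

/**
 * Hypothetical timing harness (not the project's NaiveMatrixTest).
 * Multiplies two random n x n matrices and reports wall-clock time.
 */
public class NaiveMatMulBench {

    /** Stand-in textbook O(n^3) product: c[i][j] = sum over k of a[i][k] * b[k][j]. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        if (a[0].length != m) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                double sum = 0.0;
                for (int k = 0; k < m; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    /** Fills an n x n matrix with uniform values in [0, 1). */
    static double[][] random(int n, Random rng) {
        double[][] x = new double[n][n];
        for (double[] row : x) {
            for (int j = 0; j < n; j++) {
                row[j] = rng.nextDouble();
            }
        }
        return x;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        Random rng = new Random(42); // fixed seed: a and b are identical on every run
        double[][] a = random(n, rng);
        double[][] b = random(n, rng);

        // Warm-up on smaller inputs so the JIT compiles multiply() before the timed run.
        for (int w = 0; w < 3; w++) {
            multiply(random(200, rng), random(200, rng));
        }

        long start = System.nanoTime();
        double[][] c = multiply(a, b);
        long elapsed = System.nanoTime() - start;

        // Printing one entry keeps the JIT from discarding the result as dead code.
        System.out.printf("n=%d  time=%.2f s  c[0][0]=%.4f%n", n, elapsed / 1e9, c[0][0]);
    }
}
```

Absolute times from this stand-in loop are not expected to match the NaiveMatrixTest figures above; only the ratio between two checkouts measured with the same harness is meaningful.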