Currently, TiledFastMatrix requires a suitable tile size to be passed to its constructor. Otherwise it falls back to a default tile size, which may not be valid for the matrix sizes involved.
The optimal tile size depends on the dimensions of the input matrices and on the amount of local memory available on the GPU. TiledFastMatrix should be able to determine a valid tile size automatically.
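One possible approach is sketched below. It is a heuristic under stated assumptions, not how TiledFastMatrix currently works. It assumes A is rows x inner and B is inner x cols, that a tile must evenly divide every dimension, that a tiled multiply keeps one tile of A and one tile of B in local memory at once (2 * T * T elements), and that a T x T tile maps to one work-group of T * T work-items. The function name `choose_tile_size` and its parameters are hypothetical. If the backend is OpenCL, the two device limits could be queried with `clGetDeviceInfo` using `CL_DEVICE_LOCAL_MEM_SIZE` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```python
def choose_tile_size(rows, inner, cols, local_mem_bytes, max_work_group_size,
                     element_size=4, candidates=(32, 16, 8, 4, 2, 1)):
    """Hypothetical heuristic: return the largest candidate tile size T that
    divides all matrix dimensions and fits the device limits.

    Assumptions (not taken from TiledFastMatrix):
      - A is rows x inner, B is inner x cols.
      - One T x T tile of A and one of B live in local memory together.
      - Each tile is processed by a work-group of T * T work-items.
    """
    for t in candidates:  # candidates are tried from largest to smallest
        # The tile must evenly divide every dimension it is laid over.
        if rows % t or inner % t or cols % t:
            continue
        # Two tiles (one from A, one from B) must fit in local memory.
        if 2 * t * t * element_size > local_mem_bytes:
            continue
        # A T x T work-group must not exceed the device limit.
        if t * t > max_work_group_size:
            continue
        return t
    raise ValueError("no candidate tile size satisfies the constraints")


# Usage: 1024 x 1024 float matrices, 48 KiB local memory, 1024 work-items.
print(choose_tile_size(1024, 1024, 1024, local_mem_bytes=49152,
                       max_work_group_size=1024))  # prints 32
```

Trying candidates from largest to smallest favours fewer, larger tiles, which usually means more data reuse per local-memory load. A real implementation might instead benchmark several valid sizes, or pad the matrices so that sizes that do not divide the dimensions become usable.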