At the end of every era, the chain stalls for a substantial period of time. The likely cause is that Phragmen takes too long to execute, combined with a timeout during block authoring that discards any block that takes too long to evaluate under wasmi (the slow interpreter).
We need to profile the wasmi execution of Phragmen using real data from the chain, to confirm that it takes longer than a few seconds to execute and to see which part of the algorithm is most problematic (trie I/O, general CPU, memory allocations, crypto). We should also check whether there is a hidden O(N**2) complexity that we missed before.
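One way to check for hidden quadratic behaviour is to measure the cost at several input sizes and fit the exponent k in cost ≈ c * N^k on a log-log scale. The following is a minimal sketch of that idea, not the actual profiling harness: `estimate_exponent` and `pairwise_steps` are hypothetical names, and `pairwise_steps` is a stand-in workload that counts steps rather than timing real wasmi execution.

```rust
/// Estimate k in `cost ≈ c * n^k` with a least-squares fit of ln(cost) against ln(n).
/// `samples` holds (input size, measured cost) pairs. Returns None if there are
/// fewer than two samples, if any value is not positive, or if all sizes are equal.
fn estimate_exponent(samples: &[(f64, f64)]) -> Option<f64> {
    if samples.len() < 2 || samples.iter().any(|&(x, y)| x <= 0.0 || y <= 0.0) {
        return None;
    }
    let n = samples.len() as f64;
    // Move to log-log space, where a power law becomes a straight line.
    let logs: Vec<(f64, f64)> = samples.iter().map(|&(x, y)| (x.ln(), y.ln())).collect();
    let mean_x = logs.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = logs.iter().map(|p| p.1).sum::<f64>() / n;
    let cov: f64 = logs.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    let var: f64 = logs.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    // The slope of the fitted line is the estimated exponent.
    if var == 0.0 { None } else { Some(cov / var) }
}

/// Hypothetical stand-in workload (assumption, not the real algorithm):
/// visits every ordered pair of n items, so its step count grows as n^2.
fn pairwise_steps(n: u64) -> u64 {
    let mut steps = 0u64;
    for _i in 0..n {
        for _j in 0..n {
            steps += 1;
        }
    }
    steps
}
```

In a real investigation the cost values would be wall-clock times of the wasmi execution at different numbers of validators and nominators. An estimated exponent close to 2 would point to quadratic behaviour, while a value close to 1 would suggest that the cost is dominated by something that scales linearly.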