It is becoming popular to initialize weight matrices (especially projection matrices and MLP matrices) with the identity. This is recommended e.g. by the Maluuba guys in "A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data".
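A minimal NumPy sketch of what identity initialization could look like for a plain linear layer. The helper name `identity_init`, the `(n_in, n_out)` weight layout and the zero bias are assumptions made for illustration, not something taken from the paper:

```python
import numpy as np

def identity_init(n_in, n_out):
    """Hypothetical helper: (n_in, n_out) weights with ones on the main diagonal."""
    return np.eye(n_in, n_out, dtype=np.float32)

# Square case: the layer starts out as an exact pass-through.
W = identity_init(4, 4)
b = np.zeros(4, dtype=np.float32)  # zero bias is an assumption
x = np.array([0.5, -1.0, 2.0, 0.0], dtype=np.float32)
y = x @ W + b

# Non-square case: the input is copied into the first n_in outputs,
# the remaining outputs start at zero.
W_rect = identity_init(4, 6)
y_rect = x @ W_rect
```

For a non-square projection there is no true identity, so the rectangular "eye" matrix is just one plausible choice.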
Another part of this is taking a more serious look at relu again as the activation (transfer) function. With tanh() the identity will of course be a bit skewed, since tanh() squashes its input, even though repeated tanh() near zero doesn't have a big effect.
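To make the relu vs. tanh() point concrete, here is a rough sketch that pushes a vector through a stack of identity-initialized layers. The depth of 5, the input values and the helper `run_stack` are all arbitrary choices for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def run_stack(x, activation, depth):
    """Apply `depth` layers that share an identity weight matrix (no bias)."""
    W = np.eye(x.shape[0])
    h = x
    for _ in range(depth):
        h = activation(h @ W)
    return h

x_small = np.array([0.01, 0.02, 0.05])  # close to zero
x_large = np.array([0.5, 1.0, 2.0])     # well outside tanh's linear range

relu_large = run_stack(x_large, relu, depth=5)     # unchanged for non-negative inputs
tanh_small = run_stack(x_small, np.tanh, depth=5)  # nearly unchanged
tanh_large = run_stack(x_large, np.tanh, depth=5)  # noticeably squashed
```

Note that relu is only an exact pass-through for non-negative inputs; negative components are clipped to zero, so the "identity" holds on the positive half only.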