We should come back to attn1511 and try out a bilinear form for attention, rather than the dot product or elementwise weighted sum. This is basically an analog of our projection layer; it is what MemNNs use for memory-level attention, and what Danqi Chen, Jason Bolton and Christopher D. Manning report as quite helpful for the CNN/Daily Mail Reading Comprehension Task.
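
A minimal numpy sketch of what the bilinear scoring would look like, assuming a single query vector q attended over a set of memory/context vectors K; the names q, K, W and the dimensions are illustrative, not taken from attn1511 itself:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
d_q, d_k, n_mem = 4, 6, 5               # query dim, memory dim, number of memories
q = rng.standard_normal(d_q)            # query representation (e.g. the question)
K = rng.standard_normal((n_mem, d_k))   # memory / context token representations

# Plain dot-product attention would need d_q == d_k; the bilinear form inserts
# a learned matrix W (d_q x d_k), which doubles as a projection between the two
# spaces -- hence the analogy to our projection layer.
W = rng.standard_normal((d_q, d_k)) * 0.1

scores = K @ W.T @ q        # s_i = q^T W k_i  (bilinear score per memory)
alpha = softmax(scores)     # attention weights over the memories
attended = alpha @ K        # attention-weighted summary of the memories
print(alpha, attended)
```

In a trained model W would of course be a learned parameter rather than random; the point of the sketch is just that the only change from dot-product attention is the extra matrix in the score.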