README: add graphic for matrix multiplication #6881
JohannesGaessler merged 1 commit into ggml-org:master
ggerganov
left a comment
The main reason for the current layout is that I wanted matrix multiplications to be expressed as dot products of rows whose elements are stored sequentially in memory. Normally, the result C_ij is defined as the dot product of the i-th row of A and the j-th column of B. But accessing a column in a row-major array is not cache-friendly, so I figured it would be better to have the matrix B transposed in order to perform the dot products in a cache-friendly manner: multiply row by row. The result is also stored in transposed form, since this fits nicely with the transformer architecture: the result of a matrix multiplication is often used afterwards as the "B" for the next matrix multiplication:
B_1 = A_0 x B_0
B_2 = A_1 x B_1
...
Here the A's are the weights and the B's are the activations.
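The row-by-row scheme described above can be sketched in a few lines of plain Python. This is only an illustration, not the ggml implementation: the helper names `dot` and `mul_mat` and the small example matrices are assumptions made up for this sketch. Both inputs are given as lists of rows, B is passed already transposed, and the result comes back transposed as well, so it can be fed straight into the next multiplication.

```python
def dot(u, v):
    """Dot product of two equally long rows."""
    return sum(a * b for a, b in zip(u, v))


def mul_mat(a_rows, bt_rows):
    """Hypothetical helper: returns C^T, where C = A x B.

    a_rows  holds the rows of A.
    bt_rows holds the rows of B^T (i.e. the columns of B).
    Every entry is a dot product of two rows, so both operands
    are read sequentially: C^T[j][i] = dot(A[i], B^T[j]).
    """
    return [[dot(a_row, bt_row) for a_row in a_rows] for bt_row in bt_rows]


# Weights (the A's), stored as rows.
a0 = [[1, 2, 3],
      [4, 5, 6]]
a1 = [[1, 0],
      [1, 1]]

# Activations B_0 (3 x 2), stored transposed: each row is one column of B_0.
b0_t = [[1, 0, 1],
        [0, 1, 0]]

b1_t = mul_mat(a0, b0_t)  # B_1 = A_0 x B_0, kept in transposed form
b2_t = mul_mat(a1, b1_t)  # B_2 = A_1 x B_1, no extra transpose needed

print(b1_t)  # [[4, 10], [2, 5]]
print(b2_t)  # [[4, 14], [2, 7]]
```

Because the output of `mul_mat` has the same layout as its second argument, the chain of multiplications never needs an explicit transpose between steps.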
I guess instead of saying "transposed", we can also say "stored in column-major order" as you have noted. And probably this makes more sense.
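To make the equivalence between "transposed" and "stored in column-major order" concrete, here is a small sketch in plain Python. The helper names and the example matrix are assumptions chosen for illustration only. It checks that flattening B in column-major order yields the same buffer as flattening B^T in row-major order.

```python
def flatten_row_major(rows):
    """Lay the matrix out row after row."""
    return [x for row in rows for x in row]


def flatten_col_major(rows):
    """Lay the matrix out column after column."""
    n_rows, n_cols = len(rows), len(rows[0])
    return [rows[i][j] for j in range(n_cols) for i in range(n_rows)]


def transpose(rows):
    """Return the transposed matrix as a list of rows."""
    return [list(col) for col in zip(*rows)]


# A 3 x 2 example matrix B.
b = [[1, 2],
     [3, 4],
     [5, 6]]

col_major_b = flatten_col_major(b)
row_major_bt = flatten_row_major(transpose(b))

print(col_major_b)   # [1, 3, 5, 2, 4, 6]
print(row_major_bt)  # [1, 3, 5, 2, 4, 6]
```

In both cases the element B[i][j] sits at offset `j * n_rows + i`, so the two descriptions refer to the same bytes in memory.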
It's a nice graphic to have. Though when I draw the arrays on paper I always draw them in the way they are stored in memory, so for me the B^T rows going vertically in the picture are confusing. But I understand it.
There is also this description, which I'm not sure if it helps or not: https://github.com/ggerganov/ggml/tree/master/examples/simple
Thanks for the high-effort reply.
While looking at the README regarding matrix memory layout I felt confused by the statement
zT = x @ yT
because the output tensor is transposed. @ggerganov what mental image do you have of the memory layout? Do you imagine basically all tensors in llama.cpp to be transposed, and therefore to actually be column-major? To make sure there are no misunderstandings I adapted a graphic I made before to visualize my mental image (which I suppose would also make sense to add for documentation). I imagine the memory layout on the left whenever I'm thinking about matrix multiplications.