README: add graphic for matrix multiplication #6881

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:readme-matrix-graphic
Apr 24, 2024

@JohannesGaessler
Contributor

While looking at the README's description of matrix memory layout, I was confused by the statement zT = x @ yT because the output tensor is transposed. @ggerganov what mental image do you have of the memory layout? Do you imagine basically all tensors in llama.cpp to be transposed, and therefore actually column-major? To make sure there are no misunderstandings, I adapted a graphic I made before to visualize my mental image (which I suppose would also make sense to add to the documentation).

I imagine the memory layout on the left whenever I'm thinking about matrix multiplications.

Member

@ggerganov ggerganov left a comment

The main reason for the current layout is that I wanted matrix multiplications to be expressed as dot products of rows whose elements are ordered sequentially in memory. Normally, the result C_ij is defined as the dot product of the i-th row of A with the j-th column of B. But accessing a column in a row-major array is not cache-friendly, so I figured it would be better to have the matrix B transposed in order to perform the dot products in a cache-friendly manner: multiply row by row. The result is also stored in transposed form, since this fits nicely with the transformer architecture: the result of a matrix multiplication is often used afterwards as the "B" for the next matrix multiplication:

B_1 = A_0 x B_0 
B_2 = A_1 x B_1
...

Here the A's are the weights and the B's are the activations.
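
The row-by-row scheme described above can be sketched in a few lines of plain Python. This is only an illustration, not ggml's actual implementation: the helper names `dot` and `mul_mat_t` and the small example matrices are made up for this sketch. Each matrix is a list of rows, where a row stands in for a contiguous run of memory. B is passed in transposed form, so every dot product reads two contiguous rows, and the result comes out transposed, ready to be used as the "B" of the next multiplication.

```python
def dot(u, v):
    """Dot product of two equally long rows."""
    return sum(a * b for a, b in zip(u, v))


def mul_mat_t(A, BT):
    """Return (A x B)^T, given A (m x k) and B^T (n x k) as lists of rows.

    Hypothetical helper for illustration. Entry [j][i] of the result is the
    dot product of row i of A with row j of B^T, so no column is ever read.
    """
    return [[dot(a_row, bt_row) for a_row in A] for bt_row in BT]


# Weights A_0 (3 x 2) and A_1 (2 x 3), stored row by row.
A0 = [[1, 2], [3, 4], [5, 6]]
A1 = [[1, 0, 1], [0, 1, 0]]

# Activations B_0 = [[1, 2], [0, 1]], stored transposed: each row is a column of B_0.
B0T = [[1, 0], [2, 1]]

# B_1 = A_0 x B_0, obtained directly in transposed form.
B1T = mul_mat_t(A0, B0T)

# The transposed result feeds the next multiplication without any re-layout.
B2T = mul_mat_t(A1, B1T)

print(B1T)
print(B2T)
```

Because the output has the same layout as the activation input, the chain B_1, B_2, ... never needs an explicit transpose between steps.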

I guess instead of saying "transposed", we can also say "stored in column-major order" as you have noted. And probably this makes more sense.
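
The equivalence between "transposed" and "stored in column-major order" can be checked on a small example. The following is a minimal sketch in plain Python; the helper names are made up for illustration and are not part of ggml. It flattens a matrix into the order its elements would occupy in memory and compares the two descriptions.

```python
def flatten_row_major(M):
    """Elements of M in row-major order: one full row after another."""
    return [x for row in M for x in row]


def flatten_col_major(M):
    """Elements of M in column-major order: one full column after another."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def transpose(M):
    """Transpose of M as a new list of rows."""
    return [list(col) for col in zip(*M)]


B = [[1, 2, 3],
     [4, 5, 6]]

# Storing B^T row-major and storing B column-major give the same flat buffer.
buf_transposed = flatten_row_major(transpose(B))
buf_col_major = flatten_col_major(B)

print(buf_transposed)
print(buf_col_major)
```

Both buffers hold the elements in the same order, which is why the two ways of describing the layout are interchangeable.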

It's a nice graphic to have. Though when I draw the arrays on paper I always draw them the way they are stored in memory, so for me the rows of B^T running vertically in the picture are confusing. But I understand it.

There is also this description, which I'm not sure if it helps or not: https://github.com/ggerganov/ggml/tree/master/examples/simple

@JohannesGaessler JohannesGaessler merged commit 784e11d into ggml-org:master Apr 24, 2024
@JohannesGaessler
Contributor Author

Thanks for the high-effort reply.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026