README: add graphic for matrix multiplication by JohannesGaessler · Pull Request #6881 · ggml-org/llama.cpp

JohannesGaessler · 2024-04-24T18:07:14Z

While looking at the README regarding matrix memory layout I felt confused regarding the statement zT = x @ yT because the output tensor is transposed. @ggerganov what mental image do you have of the memory layout? Do you imagine basically all tensors in llama.cpp to be transposed, and therefore to be actually column-major? To make sure there are no misunderstandings I adapted a graphic I made before to visualize my mental image (which I suppose would also make sense to add for documentation).

I imagine the memory layout on the left whenever I'm thinking about matrix multiplications.
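
To make the "transposed versus column-major" question concrete, here is a minimal sketch in plain Python. It is not ggml code, and the helper names `at_row_major` and `at_col_major` are hypothetical. It only illustrates that one flat buffer can be read either as a row-major matrix or as the column-major storage of that matrix's transpose.

```python
def at_row_major(buf, n_cols, i, j):
    """Element (i, j) of a matrix stored row by row in a flat buffer."""
    return buf[i * n_cols + j]


def at_col_major(buf, n_rows, i, j):
    """Element (i, j) of a matrix stored column by column in a flat buffer."""
    return buf[j * n_rows + i]


# One flat buffer holding six values.
buf = [1, 2, 3, 4, 5, 6]

# Interpretation 1: a 2x3 matrix M in row-major order.
M = [[at_row_major(buf, 3, i, j) for j in range(3)] for i in range(2)]

# Interpretation 2: a 3x2 matrix N in column-major order.
N = [[at_col_major(buf, 3, i, j) for j in range(2)] for i in range(3)]

# N is exactly the transpose of M: same bytes, different label.
N_is_transpose = all(N[j][i] == M[i][j] for i in range(2) for j in range(3))

print(M)
print(N)
print(N_is_transpose)
```

Under this reading, "all tensors are transposed" and "all tensors are column-major" describe the same memory contents; only the convention used when drawing them differs.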

ggerganov

The main reason for the current layout is that I wanted matrix multiplications to be expressed as dot products of rows whose elements are stored sequentially in memory. Normally, the result C_ij is defined as the dot product of the i-th row of A with the j-th column of B. But accessing a column in a row-major array is not cache-friendly, so I figured it would be better to store the matrix B transposed in order to perform the dot products in a cache-friendly manner: multiply row by row. The result is also stored in transposed form, since this fits nicely in the transformer architecture: the result of a matrix multiplication is often used afterwards as the "B" for the next matrix multiplication:

B_1 = A_0 x B_0 
B_2 = A_1 x B_1
...

Here the A's are the weights and the B's are the activations.
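
The scheme described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not ggml's implementation: `dot` and `mul_mat_t` are hypothetical helpers, matrices are flat row-major lists, and the activations are kept in transposed form so that every dot product runs over two contiguous rows.

```python
def dot(a, a_off, b, b_off, k):
    """Dot product of two contiguous runs of length k in flat buffers."""
    return sum(a[a_off + t] * b[b_off + t] for t in range(k))


def mul_mat_t(A, m, BT, n, k):
    """Compute (A @ B)^T from A (m x k) and BT = B^T (n x k).

    Both inputs are flat row-major lists. The result is C^T, an n x m
    flat row-major list, so it can be passed as BT to the next call.
    """
    CT = [0] * (n * m)
    for j in range(n):          # row j of B^T (column j of B)
        for i in range(m):      # row i of A
            CT[j * m + i] = dot(A, i * k, BT, j * k, k)
    return CT


# Weights (hypothetical values): A_0 is 2x3, A_1 is 3x2.
A0 = [1, 2, 3,
      4, 5, 6]
A1 = [1, 0,
      0, 1,
      1, 1]

# Activations B_0 (3x2), stored transposed as a 2x3 row-major buffer.
B0T = [1, 0, 1,
       0, 1, 1]

# B_1 = A_0 x B_0, stored transposed (2x2).
B1T = mul_mat_t(A0, 2, B0T, 2, 3)

# B_2 = A_1 x B_1, consuming B1T directly with no extra transpose (2x3).
B2T = mul_mat_t(A1, 3, B1T, 2, 2)

print(B1T)
print(B2T)
```

Because each result comes out already transposed, it has the layout the next multiplication expects for its "B" operand, which is the chaining property the comment describes.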

I guess instead of saying "transposed", we can also say "stored in column-major order" as you have noted. And probably this makes more sense.

It's a nice graphic to have. Though when I draw the arrays on paper I always draw them the way they are stored in memory, so for me the B^T rows going vertically in the picture are confusing. But I understand it.

There is also this description, which I'm not sure if it helps or not: https://github.com/ggerganov/ggml/tree/master/examples/simple

JohannesGaessler · 2024-04-24T19:29:24Z

Thanks for the high-effort reply.

README: add graphic for matrix multiplication

2d5341d

ggerganov approved these changes Apr 24, 2024

JohannesGaessler merged commit 784e11d into ggml-org:master Apr 24, 2024

compilade mentioned this pull request Apr 27, 2024

GGUF writer reverses array (tensor) dimensions #6040

Closed

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

README: add graphic for matrix multiplication (ggml-org#6881)

be83989

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

README: add graphic for matrix multiplication (ggml-org#6881)

4adf11a

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:readme-matrix-graphic
