ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) #5951

Merged
ggerganov merged 1 commit into master from gg/fix-mmla-q4_1-q8_1 on Mar 9, 2024

Conversation

@ggerganov
Member

ref #4966

The struct block_q8_1 on the CPU uses float instead of ggml_fp16_t:

#define QK8_1 32
typedef struct {
    float d;               // delta
    float s;               // d * sum(qs[i])
    int8_t  qs[QK8_1];     // quants
} block_q8_1;
static_assert(sizeof(block_q8_1) == 2*sizeof(float) + QK8_1, "wrong q8_1 block size/padding");
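
To see what the redundant cast costs, here is a minimal standalone sketch (illustrative only, not the actual mmla kernel): the fp16 helpers below stand in for ggml's GGML_FP32_TO_FP16 / GGML_FP16_TO_FP32 macros, and _Float16 assumes a compiler/target that supports it, e.g. AArch64 GCC/Clang. Since d is already a float, the f32 -> f16 -> f32 round trip only loses precision and adds conversions on the hot path.

#include <stdio.h>
#include <stdint.h>

#define QK8_1 32
typedef struct {
    float   d;             // delta
    float   s;             // d * sum(qs[i])
    int8_t  qs[QK8_1];     // quants
} block_q8_1;

// Stand-ins for ggml's fp16 conversion macros (assumes native _Float16 support)
typedef _Float16 ggml_fp16_t;
static inline ggml_fp16_t fp32_to_fp16(float x)       { return (ggml_fp16_t) x; }
static inline float       fp16_to_fp32(ggml_fp16_t x) { return (float) x; }

int main(void) {
    block_q8_1 y = { .d = 0.1234567f };

    // Redundant: d is already f32, so the f32 -> f16 -> f32 round trip
    // buys nothing and truncates the value to fp16 precision
    float d_roundtrip = fp16_to_fp32(fp32_to_fp16(y.d));

    // Fixed: read the float field directly
    float d_direct = y.d;

    printf("round-trip: %.9f\ndirect:     %.9f\n", d_roundtrip, d_direct);
    return 0;
}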

@ggerganov
Member Author

@snadampal I haven't tested this change - please give it a try just in case

@snadampal
Contributor

Hi @ggerganov, LGTM. I have tested it on AWS Graviton3-based c7g instances.

ggerganov merged commit 8380ecf into master on Mar 9, 2024
ggerganov deleted the gg/fix-mmla-q4_1-q8_1 branch on Mar 9, 2024 at 15:36
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026