Add Q4_3 quantization (ARM NEON) #1082

Merged: ggerganov merged 1 commit into master from q4_3 on Apr 20, 2023
ggerganov (Member) commented Apr 20, 2023

The initial Q4_3 implementation runs at ~82 ms / token on M1.
We need to see if we can optimize that somehow.

For comparison, Q4_1 runs at ~55 ms / token, so there is probably a lot of room for improvement.

#define QK4_3 16
typedef struct {
    ggml_fp16_t d;         // delta
    ggml_fp16_t m;         // min
    uint8_t qs[QK4_3 / 2]; // nibbles / quants
} block_q4_3;

Merging this, although the speed is not satisfactory. We have to try to get it as fast as Q4_1.
We might have to change the block_q4_3 layout to achieve this.
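
To make the block layout quoted above concrete, here is a minimal scalar sketch of how one such block could be turned back into floats. It is not the NEON code from this PR. It assumes the same reconstruction rule as Q4_1 (x = d * q + m), assumes the low nibble of each byte holds the even element, and uses plain `float` in place of `ggml_fp16_t` so that it compiles without ggml. The names `block_q4_3_sketch`, `dequantize_block_q4_3_sketch` and `q4_3_bits_per_weight` are hypothetical.

```c
#include <stdint.h>

#define QK4_3 16

// Assumption: float stands in for ggml_fp16_t so the sketch is self-contained.
typedef struct {
    float   d;              // delta (scale)
    float   m;              // min (offset)
    uint8_t qs[QK4_3 / 2];  // two 4-bit quants per byte
} block_q4_3_sketch;

// Hypothetical helper: reconstruct QK4_3 floats as x = d * q + m.
// Assumed nibble order: low nibble -> even element, high nibble -> odd element.
static void dequantize_block_q4_3_sketch(const block_q4_3_sketch *b, float *out) {
    for (int i = 0; i < QK4_3 / 2; ++i) {
        const uint8_t byte = b->qs[i];
        out[2 * i + 0] = b->d * (float)(byte & 0x0F) + b->m;
        out[2 * i + 1] = b->d * (float)(byte >> 4) + b->m;
    }
}

// Storage cost of the layout in the PR description:
// 2 bytes (fp16 d) + 2 bytes (fp16 m) + QK4_3 / 2 bytes of nibbles per 16 weights.
static double q4_3_bits_per_weight(void) {
    const int bytes_per_block = 2 + 2 + QK4_3 / 2;
    return 8.0 * bytes_per_block / QK4_3;
}
```

With the original fp16 fields, each block takes 12 bytes for 16 weights, which is 6 bits per weight. Any change to the block layout in pursuit of Q4_1 speed would change that figure as well.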

ggerganov force-pushed the q4_3 branch 2 times, most recently from eed22ae to dff03c0 (April 20, 2023 16:51)
ggerganov marked this pull request as ready for review (April 20, 2023 17:18)
ggerganov merged commit e0305ea into master (Apr 20, 2023)
ggerganov deleted the q4_3 branch (April 20, 2023 17:35)
prusnak (Contributor) left a comment:

M1 16 GB benchmark:

7B q4_3 4 threads: 180 ms/token
7B q4_3 8 threads: 280 ms/token

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
* Webui: fix message scroll back due to setPending

smooth scroll

remove throttle

increase scroll margin

# Conflicts:
#	examples/server/public/index.html.gz
#	examples/server/webui/dist/index.html
#	examples/server/webui/src/utils/app.context.tsx

* webui: don't scroll to bottom when conversation changes or edit message

# Conflicts:
#	examples/server/public/index.html.gz
#	examples/server/webui/dist/index.html

* Webui: fix save config error

# Conflicts:
#	examples/server/public/index.html.gz
#	examples/server/webui/dist/index.html

* Webui: add api key to request model name

# Conflicts:
#	examples/server/public/index.html.gz
#	examples/server/webui/dist/index.html

* Update

* webui: fix loading dots display issue

# Conflicts:
#	examples/server/public/index.html.gz
#	examples/server/webui/dist/index.html
#	examples/server/webui/src/components/ChatMessage.tsx

* Webui: cancel scroll when user moves up

---------

Co-authored-by: firecoperana <firecoperana>