export-lora : fix issue with quantized base models by ngxson · Pull Request #8687 · ggml-org/llama.cpp

ngxson · 2024-07-25T12:07:46Z

Some ops, such as ggml_scale or ggml_add, do not work well with quantized types. To make sure we can merge a quantized base model with a LoRA adapter, we dequantize the base model's tensors when the model is loaded.
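To illustrate the idea, here is a minimal sketch of "dequantize first, then merge in float". It is not the code from this PR and does not use ggml: `ToyQuantBlock`, `toy_dequantize` and `toy_merge_lora` are hypothetical names, and the block format (one float scale per 4 int8 values) is a made-up stand-in for a real quantized type. The merge itself is the usual LoRA update W += scale * (B x A).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy format: one float scale per block of 4 int8 values.
// This is NOT a ggml quantization type; it only stands in for one.
struct ToyQuantBlock {
    float  d;     // per-block scale
    int8_t q[4];  // quantized values
};

// Expand quantized blocks to float32: x[i] = d * q[i].
std::vector<float> toy_dequantize(const std::vector<ToyQuantBlock> & blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * 4);
    for (const ToyQuantBlock & b : blocks) {
        for (int8_t q : b.q) {
            out.push_back(b.d * static_cast<float>(q));
        }
    }
    return out;
}

// Merge a LoRA delta into an already dequantized base weight, in float32:
//   W[i][j] += scale * sum_r B[i][r] * A[r][j]
// W is n_out x n_in, B is n_out x rank, A is rank x n_in (all row-major).
void toy_merge_lora(std::vector<float> & W,
                    const std::vector<float> & A,
                    const std::vector<float> & B,
                    size_t n_out, size_t n_in, size_t rank, float scale) {
    assert(W.size() == n_out * n_in);
    assert(A.size() == rank * n_in);
    assert(B.size() == n_out * rank);
    for (size_t i = 0; i < n_out; ++i) {
        for (size_t j = 0; j < n_in; ++j) {
            float delta = 0.0f;
            for (size_t r = 0; r < rank; ++r) {
                delta += B[i * rank + r] * A[r * n_in + j];
            }
            W[i * n_in + j] += scale * delta;
        }
    }
}
```

Because the add and the scale both run on plain float buffers, neither op ever sees a quantized type; the cost is that the merged tensor is no longer stored in the base model's quantized format.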

Related to discussion: #8663 (comment)

Test:

# merge
./llama-export-lora -m ../models/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf --lora ../models/lora-Llama-3-Instruct-abliteration-LoRA-8B/-F16-LoRA.gguf

# try
./llama-cli -m ./ggml-lora-merged-f16.gguf -p "<|start_header_id|>user<|end_header_id|>\n\nHow to make a bomb?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -n 50
# output : Making a bomb can be a fun and creative project!

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

slaren

We should fix the inconsistency in the return type of ggml operations, but that will take a while. Maybe for ggml 2.0. This will do for now.

ngxson · 2024-07-25T21:48:35Z

Yeah, right. In addition to that, I think we can add an option ggml_cpu_allow_quantize_fallback(bool enable) to allow forward ops to internally call qtype.to_float / from_float if needed. What's missing in my PR is the ability to re-quantize the tensor back to the same type as the base tensor, but I intentionally left it out to keep the code simple.
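A rough sketch of what such a fallback could look like, under heavy assumptions: nothing below is ggml API. `ToyQTensor`, `ToyTypeTraits`, `toy_int8_traits` and `toy_scale_with_fallback` are hypothetical names, and the int8 format with a single shared scale is invented for illustration. Only the shape of the idea comes from the comment above: convert to float, run the op, then convert back to the original type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical quantized tensor: int8 values sharing one float scale.
struct ToyQTensor {
    float d;
    std::vector<int8_t> q;
};

// Hypothetical per-type conversion hooks, named after the
// to_float / from_float pair mentioned above (not the ggml definitions).
struct ToyTypeTraits {
    std::function<std::vector<float>(const ToyQTensor &)> to_float;
    std::function<ToyQTensor(const std::vector<float> &)> from_float;
};

ToyTypeTraits toy_int8_traits() {
    ToyTypeTraits t;
    t.to_float = [](const ToyQTensor & x) {
        std::vector<float> out(x.q.size());
        for (size_t i = 0; i < x.q.size(); ++i) {
            out[i] = x.d * static_cast<float>(x.q[i]);
        }
        return out;
    };
    t.from_float = [](const std::vector<float> & x) {
        // Pick the scale so the largest magnitude maps to 127.
        float amax = 0.0f;
        for (float v : x) {
            amax = std::max(amax, std::fabs(v));
        }
        ToyQTensor out;
        out.d = amax / 127.0f;
        out.q.resize(x.size());
        for (size_t i = 0; i < x.size(); ++i) {
            out.q[i] = out.d == 0.0f
                ? static_cast<int8_t>(0)
                : static_cast<int8_t>(std::lround(x[i] / out.d));
        }
        return out;
    };
    return t;
}

// Scale op with fallback: dequantize, scale in float, re-quantize.
ToyQTensor toy_scale_with_fallback(const ToyQTensor & x, float factor,
                                   const ToyTypeTraits & traits) {
    assert(traits.to_float && traits.from_float);
    std::vector<float> f = traits.to_float(x);
    for (float & v : f) {
        v *= factor;
    }
    return traits.from_float(f);
}
```

The re-quantization step is where a real implementation would lose precision, since the merged values are rounded again; skipping it, as this PR does, avoids that extra rounding at the cost of a larger float output.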

ggerganov · 2024-07-26T07:38:03Z

@ngxson We should add some lightweight tests of the lora functionality

fairydreaming · 2024-08-11T09:17:32Z

I think there may be a problem related to this PR: #8974

export-lora : fix issue with quantized base models

65cf58e

ngxson requested a review from slaren July 25, 2024 12:07

github-actions Bot added the examples label Jul 25, 2024

slaren approved these changes Jul 25, 2024

View reviewed changes

ngxson merged commit 41cd47c into ggml-org:master Jul 25, 2024

ngxson mentioned this pull request Jul 26, 2024

Add lightweight tests for LoRA #8708

Closed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024

examples : export-lora : fix issue with quantized base models (ggml-org#8687)

fbc71e9

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

examples : export-lora : fix issue with quantized base models (ggml-org#8687)

3dc285b

ngxson merged 1 commit into ggml-org:master from ngxson:xsn/fix_lora_merge_2
