Conversation
-ins and --instruct were removed in #7675; I have adjusted the README accordingly. There was no trace of --chatml in the README.
Seems this broke some imatrix options: trying to run imatrix with -o or --output-file for an output file yields "error: unknown argument: -o" / "error: unknown argument: --output-file". Though based on the changes you made, my guess is that it was never proper, and invalid use of gpt_params_parse made it go through.
@ggerganov will --instruct be introduced again in the future? That was the feature I found most useful in the main CLI binary. Or are there other options that can achieve the same result?
@ericcurtin look at
@Green-Sky -i, -if and -r don't seem to work if you're looking to create a basic CLI Ollama/ChatGPT-like assistant... --instruct works great for this...
I too am quite bothered by the removal of -ins. Previously it was really interactive because you had a shell-like prompt ">" inviting you to type, and you could distinguish user inputs from outputs in copy-pastes of the output. Now I cannot find an equivalent. I've tried --interactive-first (which is really much less convenient to type than -ins, BTW), but there's no invite anymore. For me this is a significant functional regression which will force me to stick to tag b3086 for a while. I really don't understand why features are removed. I can understand unintended breakage of course, it happens to all of us, but if the breakage is intentional I don't understand it.
In addition, the commit is huge (3,800 lines of patch); it's impossible to analyze. Too bad it was merged as one huge commit and not in small pieces :-(
I do kinda have the feature forked here: https://github.com/ericcurtin/podman-llm — you can run it like: podman-llm run granite. But preferably the feature would be upstream here in llama.cpp...
@ericcurtin and conversation mode does not meet your needs? (Personally I'm not using it.)
Thank you @Green-Sky, it does the job like before indeed. So in the end it's a breaking change caused by a command-line argument name change. At the very least -ins and -cml should be supported transparently, or emit a deprecation warning saying "no longer supported, use -cnv". The help is so long now that you have little chance of stumbling over the new syntax when looking for the old one (which is what I did before coming here).
@Green-Sky it seems to work, sort of; I get stranger output with it:
$ podman --root /home/curtine/.local/share/podman-llm/storage run --rm -it --security-opt=label=disable -v/home/curtine:/home/curtine -v/tmp:/tmp -v/home/curtine/.cache/huggingface/:/root/.cache/huggingface/ granite llama-main -m /root/granite-3b-code-instruct.Q4_K_M.gguf --log-disable -cnv
$ podman --root /home/curtine/.local/share/podman-llm/storage run --rm -it --security-opt=label=disable -v/home/curtine:/home/curtine -v/tmp:/tmp -v/home/curtine/.cache/huggingface/:/root/.cache/huggingface/ granite llama-main -m /root/granite-3b-code-instruct.Q4_K_M.gguf --log-disable --instruct
You can always use: to debug what you get.
@Green-Sky in commit d94c6e0 --conversation seems to be completely broken: and even when I fixed it, it was no good. I had written a tool that was daemonless/serverless and used the main binary like so: There seems to be no way to match this behaviour now, which is quite frustrating.
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters ggml-ci
* common : add missing header ggml-ci
* common : remote --random-prompt usages ggml-ci
* examples : migrate to gpt_params ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params ggml-ci
* common : passkey use gpt_params




TODO
- params.instruct
- params.chatml
- params.escape = true by default
- params.n_ctx = 0 by default
- server params in gpt_params
- retrieval params in gpt_params
- passkey params in gpt_params