
Per token attributes#7685

Merged
jaime-m-p merged 10 commits into ggml-org:master from jaime-m-p:per-token-attribs
Jun 4, 2024

Conversation

@jaime-m-p
Collaborator

Token attributes may differ from model to model.
This affects how special tokens are split.

This PR implements a general way to manage per-token attributes.
For now only the attributes lstrip and rstrip are implemented, necessary for the phi-3 and jina-v2-es/de models.

phi-3/tokenizer_config.json:

    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": true,  <--
      "single_word": false,
      "special": false
    },

dolphin-2.8-mistral-7b-v02/tokenizer_config.json:

    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,  <--
      "single_word": false,
      "special": true
    },
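The two snippets above show the same token id (`2`, `</s>`) carrying different attribute flags in different models, which is why the attributes must be read per token rather than assumed globally. As a minimal illustrative sketch (the field names follow the JSON above; the function name and return shape are hypothetical, not part of this PR), the flags could be collected like this:

```python
def read_token_attribs(added_tokens: dict) -> dict:
    """Map token id -> set of attribute names that are true.

    `added_tokens` is a mapping shaped like the tokenizer_config.json
    snippets above: {"2": {"content": "</s>", "lstrip": false, ...}, ...}
    """
    attribs = {}
    for tok_id, entry in added_tokens.items():
        flags = {
            name
            for name in ("lstrip", "rstrip", "normalized", "single_word", "special")
            if entry.get(name)
        }
        attribs[int(tok_id)] = flags
    return attribs

# phi-3 style entry: only rstrip is set
added = {
    "2": {"content": "</s>", "lstrip": False, "normalized": False,
          "rstrip": True, "single_word": False, "special": False},
}
print(read_token_attribs(added))  # {2: {'rstrip'}}
```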

This information is currently not stored in the GGUF (WIP #7379).
I'm using hardcoded attributes and tokenizer/model names until we have a better option.
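To make the lstrip/rstrip semantics concrete: when a special token carries rstrip, it absorbs whitespace to its right during special-token splitting (and symmetrically for lstrip). The following is a minimal sketch of that behavior only; the real implementation is the C++ tokenizer in llama.cpp, and the function name here is illustrative:

```python
def split_on_special(text: str, token: str, rstrip: bool, lstrip: bool):
    """Split `text` around `token`, letting the token absorb adjacent spaces."""
    parts = []
    while True:
        i = text.find(token)
        if i < 0:
            parts.append(text)
            return parts
        left, rest = text[:i], text[i + len(token):]
        if lstrip:
            left = left.rstrip(" ")  # token absorbs whitespace on its left
        if rstrip:
            rest = rest.lstrip(" ")  # token absorbs whitespace on its right
        parts.append(left)
        parts.append(token)
        text = rest

# With rstrip (phi-3 style), the spaces after "</s>" vanish into the token:
print(split_on_special("a </s>  b", "</s>", rstrip=True, lstrip=False))
# ['a ', '</s>', 'b']
```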

@github-actions github-actions Bot added the testing (Everything test related) and python (python script changes) labels Jun 1, 2024
@jaime-m-p jaime-m-p force-pushed the per-token-attribs branch from cc55198 to 01c9229 Compare June 1, 2024 23:47
@mofosyne mofosyne added the Review Complexity : Medium (Generally require more time to grok but manageable by beginner to medium expertise level) label Jun 3, 2024
@github-actions
Contributor

github-actions Bot commented Jun 3, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 530 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8815.15ms p(95)=21280.5ms fails=, finish reason: stop=471 truncated=59
  • Prompt processing (pp): avg=96.72tk/s p(95)=373.24tk/s
  • Token generation (tg): avg=45.88tk/s p(95)=49.54tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=per-token-attribs commit=ac40ff0e5049eb7f1674e44f571a791612d3735a

[Benchmark charts (Mermaid xychart data omitted): llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 530 iterations — panels: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing]

@jaime-m-p jaime-m-p merged commit 3b38d48 into ggml-org:master Jun 4, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
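The commit list above mentions replacing `llama_token_type` with `llama_token_attribs`, i.e. moving from a single token type to a combinable attribute bitmask. As a sketch of that idea in Python (the flag names and values here are illustrative; the actual C enum in llama.h may differ), the attributes could be modeled as:

```python
from enum import IntFlag

class TokenAttrib(IntFlag):
    """Illustrative per-token attribute bitmask, in the spirit of
    the llama_token_attribs change; not the actual llama.h enum."""
    NONE        = 0
    NORMALIZED  = 1 << 0
    LSTRIP      = 1 << 1
    RSTRIP      = 1 << 2
    SINGLE_WORD = 1 << 3
    SPECIAL     = 1 << 4

# phi-3's "</s>" would carry RSTRIP; dolphin-2.8's carries SPECIAL instead.
phi3_eos = TokenAttrib.RSTRIP
dolphin_eos = TokenAttrib.SPECIAL
print(TokenAttrib.RSTRIP in phi3_eos)  # True
```

The advantage over a single token type is that flags combine freely: a token can be both SPECIAL and RSTRIP without needing a new enum value for every combination.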
5 participants