
llama : optimize long word tokenization with WPM #8034

Merged
ggerganov merged 1 commit into master from gg/max-token-length on Jun 21, 2024

Conversation

ggerganov (Member) commented Jun 20, 2024

fix #8029

  • more efficient "longest token" search for very long words, utilizing vocab.max_token_len (see the sketch below)
  • reuse llm_tokenizer_wpm instance in loop
  • reserve array in unicode_cpts_from_utf8
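
For illustration, here is a minimal C++ sketch (not the actual llama.cpp code) of the general idea behind the first bullet: a greedy longest-match search over a word where the candidate length is capped by the longest token in the vocabulary. The `wpm_greedy_tokenize` helper, the plain `std::unordered_set` vocabulary, and the `##` continuation prefix are assumptions made for this example; they stand in for `vocab.max_token_len` and the WPM tokenizer referenced above.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

// Greedy longest-match tokenization of a single word. The candidate length
// is capped by max_token_len, so a long unknown word costs roughly
// O(N * max_token_len) substring lookups instead of O(N^2).
static std::vector<std::string> wpm_greedy_tokenize(
        const std::string & word,
        const std::unordered_set<std::string> & vocab,  // hypothetical vocab set
        size_t max_token_len) {
    std::vector<std::string> tokens;
    size_t pos = 0;
    while (pos < word.size()) {
        // no candidate can be longer than the longest token in the vocab
        size_t len = std::min(word.size() - pos, max_token_len);
        bool found = false;
        for (; len > 0; --len) {
            // "##" marks a word-continuation piece, as in WordPiece vocabularies
            std::string candidate = (pos == 0 ? "" : "##") + word.substr(pos, len);
            if (vocab.count(candidate)) {
                tokens.push_back(candidate);
                pos += len;
                found = true;
                break;
            }
        }
        if (!found) {
            // no sub-token matched: emit an unknown marker and stop
            tokens.push_back("[UNK]");
            break;
        }
    }
    return tokens;
}

int main() {
    std::unordered_set<std::string> vocab = {"token", "##ize", "##r"};
    for (const auto & t : wpm_greedy_tokenize("tokenizer", vocab, 7)) {
        printf("%s\n", t.c_str());  // prints: token, ##ize, ##r
    }
}
```

Bounding the inner search by the longest existing token keeps the per-position work constant, which avoids the pathological slowdown on very long unknown words reported in #8029.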

ggerganov force-pushed the gg/max-token-length branch from fb29bda to 677bf2e on June 20, 2024 at 11:50
ggerganov merged commit a927b0f into master on Jun 21, 2024
ggerganov deleted the gg/max-token-length branch on June 21, 2024 at 05:51


Development

Successfully merging this pull request may close these issues.

Bug: Embedding endpoint takes exponential time to process a long unknown token
