
llama : optimize long word tokenization with WPM #8034

Merged
ggerganov merged 1 commit into master from gg/max-token-length on Jun 21, 2024

Conversation

ggerganov (Member) commented Jun 20, 2024

fix #8029

  • more efficient "longest token" search for very long words, utilizing vocab.max_token_len (see the sketch below)
  • reuse llm_tokenizer_wpm instance in loop
  • reserve array in unicode_cpts_from_utf8
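
For illustration, here is a minimal C++ sketch (not the actual llama.cpp code) of the general idea behind the first bullet: a greedy longest-match search over a word where the candidate length is capped by the longest token in the vocabulary. The `wpm_greedy_tokenize` helper, the plain `std::unordered_set` vocabulary, and the `##` continuation prefix are assumptions made for this example; they stand in for `vocab.max_token_len` and the WPM tokenizer referenced above.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

// Greedy longest-match tokenization of a single word. The candidate length
// is capped by max_token_len, so a long unknown word costs roughly
// O(N * max_token_len) substring lookups instead of O(N^2).
static std::vector<std::string> wpm_greedy_tokenize(
        const std::string & word,
        const std::unordered_set<std::string> & vocab,  // hypothetical vocab set
        size_t max_token_len) {
    std::vector<std::string> tokens;
    size_t pos = 0;
    while (pos < word.size()) {
        // no candidate can be longer than the longest token in the vocab
        size_t len = std::min(word.size() - pos, max_token_len);
        bool found = false;
        for (; len > 0; --len) {
            // "##" marks a word-continuation piece, as in WordPiece vocabularies
            std::string candidate = (pos == 0 ? "" : "##") + word.substr(pos, len);
            if (vocab.count(candidate)) {
                tokens.push_back(candidate);
                pos += len;
                found = true;
                break;
            }
        }
        if (!found) {
            // no sub-token matched: emit an unknown marker and stop
            tokens.push_back("[UNK]");
            break;
        }
    }
    return tokens;
}

int main() {
    std::unordered_set<std::string> vocab = {"token", "##ize", "##r"};
    for (const auto & t : wpm_greedy_tokenize("tokenizer", vocab, 7)) {
        printf("%s\n", t.c_str());  // prints: token, ##ize, ##r
    }
}
```

Bounding the inner search by the longest existing token keeps the per-position work constant, which avoids the pathological slowdown on very long unknown words reported in #8029.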

ggerganov force-pushed the gg/max-token-length branch from fb29bda to 677bf2e on June 20, 2024 at 11:50
ggerganov merged commit a927b0f into master on Jun 21, 2024
ggerganov deleted the gg/max-token-length branch on June 21, 2024 at 05:51


Development

Successfully merging this pull request may close these issues.

Bug: Embedding endpoint takes exponential time to process a long unknown token
