What happened?
When tokenizing text with LlamaSharp, a C# library that directly uses llama.cpp, a space is added to the first non-special token.
Unfortunately, this alters the rendered prompt templates and even breaks some of my use cases (e.g. stripping the last tokens and then relying on anti-prompts in LlamaSharp).
In particular, it would be good if the following invariant always applied:
detokenize(tokenize("text", addBos:false)) == "text"
But this is not the case:
llama-tokenize.exe --no-bos -m "gemma-1.1-2b-it.Q6_K.gguf" -p "<pad>text<eos>"
0 -> '<pad>'
2793 -> ' text'
1 -> '<eos>'
Token 2793 -> ' text' should have been 1082 -> 'text'.
Is there any way to fix this behavior, i.e. to tell llama.cpp not to insert any space when tokenizing and to process the raw text as-is?
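For reference, this is roughly how I check the invariant directly against the C API. It is only a minimal sketch: I am assuming the llama_tokenize / llama_detokenize signatures of this build (including the add_special / parse_special and remove_special / unparse_special flags), so the exact parameters may need adjusting against llama.h.

```cpp
// Sketch of a round-trip check: tokenize without BOS, detokenize, compare.
// Signatures assumed from the llama.h of this build; adjust if they differ.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.vocab_only = true; // only the tokenizer is needed, skip tensor data

    llama_model * model = llama_load_model_from_file("gemma-1.1-2b-it.Q6_K.gguf", mparams);
    if (!model) return 1;

    const std::string text = "<pad>text<eos>";

    // Tokenize without BOS (add_special = false) but parse special tokens.
    std::vector<llama_token> tokens(text.size() + 8);
    int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                           tokens.data(), (int) tokens.size(),
                           /*add_special*/ false, /*parse_special*/ true);
    if (n < 0) return 1;
    tokens.resize(n);

    // Detokenize and compare with the original input.
    std::vector<char> buf(text.size() * 4 + 16);
    int m = llama_detokenize(model, tokens.data(), (int) tokens.size(),
                             buf.data(), (int) buf.size(),
                             /*remove_special*/ false, /*unparse_special*/ true);
    std::string round_trip(buf.data(), m > 0 ? m : 0);

    // Expected: round_trip == text; observed: a space is inserted before "text".
    printf("original  : '%s'\nround trip: '%s'\n", text.c_str(), round_trip.c_str());

    llama_free_model(model);
    llama_backend_free();
    return round_trip == text ? 0 : 1;
}
```

With the Gemma model above, this prints a round trip of "<pad> text<eos>" instead of the original "<pad>text<eos>".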
Name and Version
version: 3412 (3807c3d)
built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
Windows 10
Relevant log output
$ llama-tokenize.exe --no-bos -m "gemma-1.1-2b-it.Q6_K.gguf" -p "<pad>text<eos>"
llama_model_loader: loaded meta data with 24 key-value pairs and 164 tensors from gemma-1.1-2b-it.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma
llama_model_loader: - kv 1: general.name str = gemma-1.1-2b-it
llama_model_loader: - kv 2: gemma.context_length u32 = 8192
llama_model_loader: - kv 3: gemma.embedding_length u32 = 2048
llama_model_loader: - kv 4: gemma.block_count u32 = 18
llama_model_loader: - kv 5: gemma.feed_forward_length u32 = 16384
llama_model_loader: - kv 6: gemma.attention.head_count u32 = 8
llama_model_loader: - kv 7: gemma.attention.head_count_kv u32 = 1
llama_model_loader: - kv 8: gemma.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 9: gemma.attention.key_length u32 = 256
llama_model_loader: - kv 10: gemma.attention.value_length u32 = 256
llama_model_loader: - kv 11: general.file_type u32 = 18
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,256000] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,256000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,256000] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% if messages[0]['rol...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 37 tensors
llama_model_loader: - type q6_K: 127 tensors
llm_load_vocab: special tokens cache size = 4
llm_load_vocab: token to piece cache size = 1.6014 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = gemma
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 256000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 2.51 B
llm_load_print_meta: model size = 1.91 GiB (6.56 BPW)
llm_load_print_meta: general.name = gemma-1.1-2b-it
llm_load_print_meta: BOS token = 2 '<bos>'
llm_load_print_meta: EOS token = 1 '<eos>'
llm_load_print_meta: UNK token = 3 '<unk>'
llm_load_print_meta: PAD token = 0 '<pad>'
llm_load_print_meta: LF token = 227 '<0x0A>'
llm_load_print_meta: EOT token = 107 '<end_of_turn>'
llm_load_print_meta: max token length = 93
llama_model_load: vocab only - skipping tensors
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 0.0
llama_new_context_with_model: freq_scale = 1
0 -> '<pad>'
2793 -> ' text'
1 -> '<eos>'