feat: integrate fastokens BPE tokenizer backend #7387

Merged
biswapanda merged 18 commits into main from bis/fast-tokens-dynamo
Mar 15, 2026

@biswapanda biswapanda commented Mar 14, 2026

Overview:

Add the fastokens crate (v0.1.0 from github.com/Atero-ai/fastokens) as an always-on workspace dependency for high-performance BPE encoding.

Related PR: #7388

Details:

Core integration:

  • lib/llm/src/tokenizers/fast.rs: hybrid FastTokenizer that encodes with fastokens and decodes with HuggingFace, with 4 unit tests
  • lib/llm/src/model_card.rs: tokenizer() checks DYN_TOKENIZER=fastokens env var, falls back to HuggingFace on load failure

Frontend CLI:

  • --tokenizer flag / DYN_TOKENIZER env var with values "default" (HuggingFace) or "fastokens"
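
The selection-with-fallback behavior described for `tokenizer()` can be illustrated with a minimal sketch. The real logic lives in Rust in `lib/llm/src/model_card.rs`; the Python below is only an illustration, and `select_tokenizer`, `load_fast`, and `load_hf` are hypothetical names, not part of Dynamo or fastokens.

```python
import os
from typing import Callable, Mapping


def select_tokenizer(
    path: str,
    load_fast: Callable[[str], object],  # hypothetical fastokens-backed loader
    load_hf: Callable[[str], object],    # hypothetical HuggingFace loader
    env: Mapping[str, str] = os.environ,
) -> object:
    """Return the fast tokenizer when requested and loadable, else HuggingFace."""
    if env.get("DYN_TOKENIZER") == "fastokens":
        try:
            return load_fast(path)
        except Exception as err:
            # A load failure is not fatal: report it and fall through
            # to the HuggingFace loader below.
            print(f"fastokens load failed ({err}); falling back to HuggingFace")
    return load_hf(path)
```

Passing the environment as a parameter keeps the sketch testable without mutating the process environment.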

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Added --dyn-tokenizer-backend command-line option to select different tokenizer backends.
    • New high-performance tokenizer implementation now available.
  • Tests

    • Added tokenizer test suite and validation data.

Add the fastokens crate (v0.1.0 from github.com/Atero-ai/fastokens) as
an always-on workspace dependency for high-performance BPE encoding.

Core integration:
- lib/llm/src/tokenizers/fast.rs: hybrid FastTokenizer that encodes with
  fastokens and decodes with HuggingFace, with 4 unit tests
- lib/llm/src/model_card.rs: tokenizer() checks DYN_TOKENIZER_BACKEND=fasttokens
  env var, falls back to HuggingFace on load failure

Frontend CLI:
- --dyn-tokenizer-backend flag / DYN_TOKENIZER_BACKEND env var with
  values "default" (HuggingFace) or "fasttokens"
@biswapanda biswapanda self-assigned this Mar 14, 2026
@biswapanda biswapanda requested a review from a team as a code owner March 14, 2026 21:52
@biswapanda biswapanda requested a review from a team March 14, 2026 21:52
@biswapanda biswapanda requested a review from a team as a code owner March 14, 2026 21:52
@github-actions github-actions Bot added the feat and frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`) labels Mar 14, 2026
@biswapanda biswapanda enabled auto-merge (squash) March 14, 2026 21:57
coderabbitai Bot commented Mar 14, 2026

Walkthrough

This pull request introduces support for a new "fasttokens" tokenizer backend. Changes include: adding the fastokens workspace dependency from a Git repository, exposing CLI configuration for backend selection, implementing a hybrid FastTokenizer struct that uses fastokens for encoding and HuggingFaceTokenizer for decoding, and providing test data for validation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workspace Dependencies: Cargo.toml, lib/llm/Cargo.toml | Added fastokens workspace dependency from GitHub repository for high-performance tokenization. |
| CLI Configuration: components/src/dynamo/frontend/frontend_args.py | Added --dyn-tokenizer-backend CLI argument and tokenizer_backend field to FrontendConfig with environment variable support. |
| Environment Setup: components/src/dynamo/frontend/main.py | Added environment variable propagation for DYN_TOKENIZER_BACKEND when backend is set to "fasttokens". |
| Tokenizer Abstraction: lib/llm/src/tokenizers.rs | Exposed new fast module and FastTokenizer type in public API. |
| Tokenizer Implementation: lib/llm/src/tokenizers/fast.rs | Implemented hybrid FastTokenizer using fastokens for encoding with batch processing via rayon and HuggingFaceTokenizer for decoding, including comprehensive unit tests. |
| Dynamic Backend Selection: lib/llm/src/model_card.rs | Added environment-driven tokenizer selection logic, attempting FastTokenizer when DYN_TOKENIZER_BACKEND=fasttokens before falling back to HuggingFace. |
| Test Data: lib/llm/tests/data/sample-models/minimal-bpe/tokenizer.json | Added minimal BPE tokenizer configuration file for testing round-trip encode/decode workflows. |
| Project Metadata: lib/bindings/python/pyproject.toml | Commented out explicit license and license-files metadata declarations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, hop, the tokens fly so fast,
With fastokens here at last!
Encoding swift with fuzzy cheer,
New backends blooming far and near,
The code now hops with speed so bright! 🚀✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: integrate fastokens BPE tokenizer backend' accurately summarizes the main change: integrating a new high-performance BPE tokenizer backend throughout the codebase. |
| Description check | ✅ Passed | The PR description covers the required sections (Overview, Details, Related Issues) and provides clear context for the changes, though the related issues reference uses a placeholder format. |


@biswapanda biswapanda changed the title from "feat: integrate fasttokens high-performance BPE tokenizer backend" to "feat: integrate fasttokens BPE tokenizer backend" Mar 14, 2026
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
lib/llm/src/tokenizers/fast.rs (1)

125-143: Make the decode-stream test assert emitted text, not just absence of errors.

Right now this only proves step() doesn't fail. Empty or duplicated chunks would still pass. Please accumulate the returned chunks and compare them with a reference decode for the continuation.
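
The accumulate-and-compare pattern requested here can be sketched as follows. The actual test is Rust and uses `TokenizerWrapper::decode_stream`; the Python below is a language-neutral illustration only, and `VOCAB`, `ToyDecodeStream`, and `collect_stream` are hypothetical stand-ins, not Dynamo APIs.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy vocabulary standing in for a real tokenizer.
VOCAB: Dict[int, str] = {0: "Hello", 1: ",", 2: " world", 3: "!"}


def decode(ids: Iterable[int]) -> str:
    """Reference (non-streaming) decode."""
    return "".join(VOCAB[i] for i in ids)


class ToyDecodeStream:
    """Hypothetical incremental decoder seeded with the prompt ids."""

    def __init__(self, prompt_ids: Iterable[int]) -> None:
        self._ids: List[int] = list(prompt_ids)

    def step(self, token_id: int) -> Optional[str]:
        # Emit only the text that the new token adds to the decoded output.
        before = decode(self._ids)
        self._ids.append(token_id)
        chunk = decode(self._ids)[len(before):]
        return chunk or None


def collect_stream(stream: ToyDecodeStream, continuation_ids: Iterable[int]) -> str:
    """Concatenate every non-empty chunk the stream emits."""
    chunks = []
    for token_id in continuation_ids:
        chunk = stream.step(token_id)
        if chunk:
            chunks.append(chunk)
    return "".join(chunks)
```

A test built this way fails on empty or duplicated chunks, because the accumulated text must equal `decode(continuation_ids)` exactly.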

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/tokenizers/fast.rs` around lines 125 - 143, The test
test_fast_with_decode_stream currently only checks that stream.step() doesn't
error; change it to accumulate the emitted chunks from
wrapper.decode_stream(&prompt_ids, true) by collecting each step(...) return
(concatenate non-empty chunks) into a single string, then obtain the expected
text by decoding the continuation (e.g. via wrapper.decode(cont_ids) or
wrapper.decode(continuation)) and assert equality (or assert that the
accumulated string contains the expected continuation); update references in the
test around FastTokenizer::from_file, TokenizerWrapper::from,
wrapper.decode_stream, and stream.step to perform this accumulation and
comparison instead of a no-op loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Cargo.toml`:
- Line 49: The fastokens Git dependency is floating and needs an immutable ref;
update the Cargo.toml dependency line for fastokens (the entry fastokens = { git
= "https://github.com/Atero-ai/fastokens", version = "0.1.0" }) to include a rev
(or tag) field with the specific commit hash or tag you want pinned (e.g., rev =
"COMMIT_SHA") so the manifest is reproducible; ensure you use the exact commit
SHA or tag from the fastokens repo and run cargo update -p fastokens if needed
to regenerate the lockfile.

In `@components/src/dynamo/frontend/frontend_args.py`:
- Around line 429-440: The CLI currently allows any string for the
--dyn-tokenizer-backend argument; update the add_argument call that defines
flag_name="--dyn-tokenizer-backend" (dest="tokenizer_backend") to restrict
values to the supported set by adding an explicit choices constraint (e.g.,
["default","fasttokens"]) so parsing fails fast for invalid CLI/env values and
documents the accepted options in the help text.

In `@components/src/dynamo/frontend/main.py`:
- Around line 168-169: The current logic only sets
os.environ["DYN_TOKENIZER_BACKEND"] = "fasttokens" when config.tokenizer_backend
== "fasttokens" and never clears it; update the branch around
config.tokenizer_backend in main.py so that when config.tokenizer_backend ==
"default" you remove/unset DYN_TOKENIZER_BACKEND (e.g., pop or del from
os.environ if present), keep setting it for "fasttokens" as before, and ensure
no stale environment value remains when the default backend is chosen.

In `@lib/bindings/python/pyproject.toml`:
- Around line 25-26: Uncomment and replace the outdated dict license entries in
pyproject.toml by adding explicit SPDX and license-files fields: remove the
commented lines and add license = "Apache-2.0" and license-files = ["LICENSE"]
so the project uses the PEP 639-compatible string expression and explicit
license file declaration (update the existing commented/old keys related to
license and license-files).

In `@lib/llm/src/model_card.rs`:
- Around line 384-386: The current logic silently treats any non-"fasttokens"
DYN_TOKENIZER_BACKEND as the default behavior; change the handling to explicitly
match allowed values: if the env var is "fasttokens" set use_fast = true, if it
is "default" (or an explicit "slow"/"rust" token value you support) set use_fast
= false, and for any other value emit a clear warning or return an error so
misconfiguration is visible; update the code that reads DYN_TOKENIZER_BACKEND
(the use_fast assignment) to perform a match on the string and log/propagate an
error on unsupported values rather than silently falling back.
- Around line 394-399: When attempting the fast tokenizer, don't call
p.to_str().ok_or_else(...) which returns early on non-UTF-8 paths and prevents
the HuggingFace fallback; instead check p.to_str() with an if let/ match and
only call crate::tokenizers::FastTokenizer::from_file(path_str) when to_str()
returns Some. If to_str() is None, skip the fast-tokenizer attempt (do not
return an error) so the existing HF loader/fallback logic can run; also when
FastTokenizer::from_file fails, allow the code to continue to the HuggingFace
fallback rather than short-circuiting.

---

Nitpick comments:
In `@lib/llm/src/tokenizers/fast.rs`:
- Around line 125-143: The test test_fast_with_decode_stream currently only
checks that stream.step() doesn't error; change it to accumulate the emitted
chunks from wrapper.decode_stream(&prompt_ids, true) by collecting each
step(...) return (concatenate non-empty chunks) into a single string, then
obtain the expected text by decoding the continuation (e.g. via
wrapper.decode(cont_ids) or wrapper.decode(continuation)) and assert equality
(or assert that the accumulated string contains the expected continuation);
update references in the test around FastTokenizer::from_file,
TokenizerWrapper::from, wrapper.decode_stream, and stream.step to perform this
accumulation and comparison instead of a no-op loop.
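
The two frontend findings above (restricting the flag to supported values, and clearing a stale environment variable when the default backend is chosen) can be sketched together. This is a minimal standalone illustration using plain `argparse`, not the project's own `add_argument` helper; `parse_backend` and `sync_backend_env` are hypothetical names, and the flag and value spellings follow the revision under review.

```python
import argparse
import os
from typing import Mapping, MutableMapping, Optional, Sequence

ENV_VAR = "DYN_TOKENIZER_BACKEND"
BACKENDS = ("default", "fasttokens")


def parse_backend(
    argv: Optional[Sequence[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Parse the backend from CLI args, using the env var as the default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dyn-tokenizer-backend",
        dest="tokenizer_backend",
        choices=BACKENDS,
        default=env.get(ENV_VAR, "default"),
        help=f"Tokenizer backend, one of {BACKENDS}.",
    )
    backend = parser.parse_args(argv).tokenizer_backend
    # argparse does not check defaults against `choices`, so a bad value
    # coming from the environment has to be rejected explicitly.
    if backend not in BACKENDS:
        parser.error(f"invalid {ENV_VAR} value: {backend!r}")
    return backend


def sync_backend_env(
    backend: str,
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Propagate the chosen backend, removing any stale value for 'default'."""
    if backend == "fasttokens":
        env[ENV_VAR] = "fasttokens"
    else:
        env.pop(ENV_VAR, None)
```

Calling `sync_backend_env(parse_backend())` at startup would then leave the environment consistent with the parsed configuration in both directions.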

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 14f36a69-69af-4531-bb2b-0dfeeb99d059

📥 Commits

Reviewing files that changed from the base of the PR and between 0b66515 and 13eccdd.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Cargo.toml
  • components/src/dynamo/frontend/frontend_args.py
  • components/src/dynamo/frontend/main.py
  • lib/bindings/python/pyproject.toml
  • lib/llm/Cargo.toml
  • lib/llm/src/model_card.rs
  • lib/llm/src/tokenizers.rs
  • lib/llm/src/tokenizers/fast.rs
  • lib/llm/tests/data/sample-models/minimal-bpe/tokenizer.json

Comment thread Cargo.toml Outdated
Comment thread components/src/dynamo/frontend/frontend_args.py
Comment thread components/src/dynamo/frontend/main.py Outdated
Comment thread lib/bindings/python/pyproject.toml Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Comment thread components/src/dynamo/frontend/frontend_args.py Outdated
@biswapanda (Contributor, Author) commented

CI is failing on cargo-deny because of upstream transitive dependencies. A fix for fastokens is under review by the Crusoe team: crusoecloud/fastokens#5


Issue: The fastokens crate (v0.1.0) declares hf-hub = "0.4.3" with default features, which pulls in native-tls and openssl-sys. Dynamo's deny.toml explicitly bans both crates (lines 63-64), causing cargo-deny to fail in CI.

The dependency chain is:

fastokens -> hf-hub (default features) -> ureq -> native-tls -> openssl-sys

Comment thread Cargo.toml
Comment thread .cargo/config.toml
@biswapanda biswapanda merged commit da810a2 into main Mar 15, 2026
150 checks passed
@biswapanda biswapanda deleted the bis/fast-tokens-dynamo branch March 15, 2026 23:49
ShounakRay pushed a commit to ShounakRay/fuzzy-dynamo that referenced this pull request Mar 20, 2026
yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026
Labels

feat, frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`), size/L


5 participants