build(vllm-tensorizer): Bump vllm to v0.19.1, pin transformers 5.5.4 + nixl 1.0.0 #153
Merged
alexeldeib merged 3 commits into main on Apr 23, 2026
Conversation
vLLM bump
Upstream released v0.19.1 (2026-04-22, commit b1388b1fb), cherry-picking
the transformers-v5 refactor (vllm#30566) on top of the v0.19.0 tree.
Its common.txt declares `transformers >= 4.56.0, != 5.5.0`, and its
test.txt pins the tested combination at transformers 5.5.3 / hf-hub
1.10.2 / tokenizers 0.22.2 / hf-xet 1.4.3. compressed-tensors is bumped
from 0.14.0.1 to 0.15.0.1.
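As a sanity check on what the pin below is allowed to be, the common.txt range can be evaluated against candidate versions. A stdlib-only sketch (not vLLM code; pip's real resolver uses the `packaging` library, but the logic for this simple range is the same):

```python
# Sketch: the common.txt specifier `transformers >= 4.56.0, != 5.5.0`
# expressed as a plain version check over dotted release numbers.
def parse(v: str) -> tuple:
    """Turn '5.5.4' into (5, 5, 4) for lexicographic comparison."""
    return tuple(int(part) for part in v.split("."))

def satisfies(v: str) -> bool:
    # >= 4.56.0 and != 5.5.0, mirroring the declared range.
    return parse(v) >= (4, 56, 0) and parse(v) != (5, 5, 0)

for v in ["4.55.0", "4.56.0", "5.5.0", "5.5.3", "5.5.4"]:
    print(v, satisfies(v))
```

Both the tested 5.5.3 and the 5.5.4 pinned here fall inside the range; only the excluded 5.5.0 and anything below 4.56.0 are rejected.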
Transformers pin
Replace the old `transformers >= 5.5.0` floor with an exact pin to
5.5.4. 5.5.4 is the first release containing huggingface/transformers
PR #45359, which removes `kimi_k25` from
MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS and restores the custom
TikTokenTokenizer path with sparse added_tokens_decoder IDs. Earlier
5.x versions dense-pack those IDs and scramble Kimi K2.5 tool-call
output (`<|tool_call_begin|>` is encoded as 163595 instead of 163597,
shifting every tool-call marker by -2 or -3). 5.5.4 also satisfies
the `> 5.5.0` requirement for Gemma 4 mentioned in the previous comment.
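The dense-packing failure mode generalizes: if a loader discards the stored sparse IDs and re-numbers added tokens contiguously, every token after a gap slides down by the number of gaps before it. A hypothetical sketch (the token names and the gap layout are invented for illustration; only the 163597 → 163595 shift matches the report above):

```python
# Hypothetical vocab fragment: special tokens registered at sparse IDs,
# with deliberate gaps (here 163595 and 163596 are unassigned).
sparse_added_tokens = {
    163594: "<|reserved_a|>",
    163597: "<|tool_call_begin|>",
    163598: "<|tool_call_end|>",
}

# A buggy loader that ignores the stored IDs and dense-packs the tokens,
# re-numbering them contiguously from the lowest ID:
base = min(sparse_added_tokens)
dense_added_tokens = {
    base + i: tok for i, tok in enumerate(sparse_added_tokens.values())
}

# "<|tool_call_begin|>" slides from 163597 to 163595: a -2 shift, like
# the scrambled Kimi K2.5 tool-call markers described above.
print(dense_added_tokens)
```

Any consumer matching on the original marker IDs then sees garbage, which is why an exact pin to the first fixed release is the safe move.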
Nixl pin
The nixl-cu12 1.0.1 wheel (released after the previous ml-containers
build) ships a different bundled libucs.so than 1.0.0. Its
ucs_topo_release_devices() has a static-destructor ordering bug that
SIGSEGVs during Python interpreter shutdown (atexit). vLLM's
model-registry subprocess calls check_returncode() before reading
its pickled output file, so a shutdown-time SIGSEGV surfaces as
`pydantic ValidationError: Model architectures [...] failed to be
inspected`, even though the inspection work itself completed. Pinning
nixl-cu12 and nixl to 1.0.0 swaps in the known-good bundled libucs.so.
Drop this pin once nixl ships a fixed release. Evidence:
full gdb backtrace at projects/auto-debug-v0191-segv/logs/research-02/
gdb-backtrace-v2.txt in the k25-bug workspace.
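The masking effect is worth seeing in isolation: a child process can finish its work, write its result, and still exit with SIGSEGV during shutdown, so a parent that checks the exit status before reading the output misreports completed work as a failure. A minimal reproduction (not vLLM's actual registry code; the payload dict is invented):

```python
# A child that writes a pickled result, flushes, then dies with SIGSEGV
# at the very end -- simulating the UCX atexit crash described above.
import pickle
import signal
import subprocess
import sys

CHILD = (
    "import os, pickle, signal, sys;"
    "sys.stdout.buffer.write(pickle.dumps({'architectures': ['Demo']}));"
    "sys.stdout.flush();"
    "os.kill(os.getpid(), signal.SIGSEGV)"
)

proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)

# Fragile ordering: check_returncode() raises here (returncode is the
# negative signal number), even though the result is already in stdout.
try:
    proc.check_returncode()
except subprocess.CalledProcessError:
    print("inspection reported as failed")

# The output was produced before the crash and is fully readable:
print(pickle.loads(proc.stdout))
```

Reading and validating the pickled output first, and only then treating a bad exit status as fatal, would make the parent robust to shutdown-time crashes; pinning nixl to 1.0.0 removes the crash itself.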
Validation
2x2 factorial tested on GB200 in the ace-inference namespace (dev-us-e-01a):
- R1A (v0.19.1, no nixl pin) -> CrashLoop, 6+ restarts, UCX atexit SIGSEGV
- R1B (v0.19.1, nixl 1.0.0) -> 1/1 Running, 15/15 functional tests PASS
- R2C (v0.19.0, no nixl pin) -> 1/1 Running, 15/15 PASS (v0.19.0 base was built before nixl 1.0.1 released)
- R2D (v0.19.0, nixl 1.0.0) -> 1/1 Running, 15/15 PASS (pin is a no-op on v0.19.0 base)

See projects/k25-bug/result.md for the full test matrix.
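The isolation logic of the factorial reads cleanly as data: exactly one cell fails, and it is the one combining the v0.19.1 base with an unpinned nixl, implicating the nixl 1.0.1 wheel rather than the vLLM bump itself. A small sketch with labels and outcomes copied from the runs above:

```python
# The 2x2 validation matrix as data; outcomes are the observed results.
from itertools import product

vllm_versions = ["v0.19.0", "v0.19.1"]
nixl_pins = ["no pin", "nixl 1.0.0"]

observed = {
    ("v0.19.1", "no pin"): "CrashLoop, UCX atexit SIGSEGV",
    ("v0.19.1", "nixl 1.0.0"): "Running, 15/15 PASS",
    ("v0.19.0", "no pin"): "Running, 15/15 PASS",
    ("v0.19.0", "nixl 1.0.0"): "Running, 15/15 PASS",
}

for cell in product(vllm_versions, nixl_pins):
    print(cell, "->", observed[cell])

# Only one cell crashes, and flipping either factor alone fixes it only
# when the flipped factor is the nixl pin -- so nixl 1.0.1 is the culprit.
failing = [cell for cell, result in observed.items() if "CrashLoop" in result]
print(failing)
```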
@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800069137
@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800546098
@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800582037
alexeldeib approved these changes, Apr 23, 2026
Summary
- vllm: commit to v0.19.1
- transformers==5.5.4: contains huggingface/transformers#45359 (restores the Kimi K2.5 slow tokenizer); still > 5.5.0 for Gemma 4
- nixl==1.0.0, nixl-cu12==1.0.0: 1.0.1 ships a libucs.so with a ucs_topo_release_devices() destructor bug

Context
nixl-cu12 1.0.1 SIGSEGVs during Python interpreter shutdown on GB200. vLLM's model-registry subprocess checks returncode before reading its pickled output, so the shutdown crash surfaces as `pydantic ValidationError: Model architectures ['KimiK25ForConditionalGeneration'] failed to be inspected`, even though the inspection work itself completed. Drop the nixl pins once a fixed nixl release lands.
Earlier 5.x transformers releases (5.4–5.5.3) dense-pack Kimi's added_tokens_decoder IDs, shifting every tool-call marker by -2/-3 and scrambling tool-call output. 5.5.4 is the first version with the upstream fix.
Test plan
- … reasoning, legacy completions)
- v0.19.1 no-pin -> CrashLoop; v0.19.1 + pin -> Works

Ref: INF-353