
build(vllm-tensorizer): Bump vllm to v0.19.1, pin transformers 5.5.4 + nixl 1.0.0 #153

Merged
alexeldeib merged 3 commits into main from cwang/vllm-v0191
Apr 23, 2026

Conversation

@c2w-sea (Contributor) commented Apr 22, 2026

Summary

  • Bump vllm-commit to v0.19.1
  • Pin transformers==5.5.4 — contains huggingface/transformers#45359 (restores the Kimi K2.5 slow
    tokenizer); still >5.5.0 for Gemma 4
  • Pin nixl==1.0.0 and nixl-cu12==1.0.0 — 1.0.1 ships a libucs.so with a
    ucs_topo_release_devices() destructor bug (see the pin-verification sketch below)
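
A minimal pin-verification sketch, runnable inside the built image (illustrative only; the distribution names below are assumptions, and vllm's exact version string may carry a local suffix):

```python
# Illustrative smoke check, not part of this PR: assert the image carries the
# exact pins listed above.
from importlib.metadata import version

assert version("transformers") == "5.5.4"
assert version("nixl") == "1.0.0"
assert version("nixl-cu12") == "1.0.0"
print("pins OK; vllm =", version("vllm"))  # expected to report 0.19.1
```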

Context

nixl-cu12 1.0.1 SIGSEGVs during Python interpreter shutdown on GB200. vLLM's model-registry subprocess checks returncode before reading its pickled output, so
the shutdown crash surfaces as pydantic ValidationError: Model architectures ['KimiK25ForConditionalGeneration'] failed to be inspected — even though the
inspection work itself completed. Drop the nixl pins once a fixed nixl release lands.
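
A minimal sketch of the failure mode (this is not vLLM's actual registry code, only the returncode-before-read ordering described above): the child finishes its work and writes the pickle, then crashes in a shutdown destructor, and the parent raises on the exit code without ever reading the result.

```python
import pickle
import subprocess
import sys
import tempfile

# Child: completes its work (writes the pickle), then segfaults at shutdown --
# a stand-in for the nixl-cu12 1.0.1 libucs.so destructor bug.
CHILD = r"""
import ctypes, pickle, sys
with open(sys.argv[1], "wb") as f:
    pickle.dump({"architectures": ["KimiK25ForConditionalGeneration"]}, f)
ctypes.string_at(0)  # NULL dereference -> SIGSEGV after the useful work is done
"""

with tempfile.NamedTemporaryFile(suffix=".pkl") as out:
    proc = subprocess.run([sys.executable, "-c", CHILD, out.name])
    proc.check_returncode()   # raises CalledProcessError (returncode -11) here,
    result = pickle.load(out) # so the completed result is never read
```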

Earlier 5.x transformers releases (5.4–5.5.3) dense-pack Kimi's added_tokens_decoder IDs, shifting every tool-call marker by −2 or −3 and scrambling tool-call
output. 5.5.4 is the first release with the upstream fix.
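
A quick tokenizer check for the regression (hedged: the hub repo id below is a placeholder, and the two IDs are the values quoted in this PR):

```python
# Illustrative only. "moonshotai/Kimi-K2.5" is a placeholder repo id;
# 163597 / 163595 are the marker IDs quoted above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2.5", trust_remote_code=True)
tid = tok.convert_tokens_to_ids("<|tool_call_begin|>")
# transformers 5.5.4 (sparse IDs): 163597.  5.4-5.5.3 (dense-packed): 163595.
assert tid == 163597, f"dense-packed added_tokens_decoder detected (got {tid})"
```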

Test plan

  • GB200 dev (TP=4): pod 1/1 Running; 15/15 Kimi K2.5 functional tests PASS (tokenizer sparse IDs, tool-call streaming/parallel/multi-turn/forced-choice,
    reasoning, legacy completions)
  • 2×2 factorial: v0.19.1 no-pin → CrashLoop; v0.19.1 +pin → Works

Ref: INF-353

…+ nixl 1.0.0

vLLM bump
  Upstream released v0.19.1 (2026-04-22, commit b1388b1fb) cherry-picking
  the transformers-v5 refactor (vllm#30566) on top of the v0.19.0 tree.
  Its common.txt declares `transformers >= 4.56.0, != 5.5.0` and its
  test.txt pins the tested combo at transformers 5.5.3 / hf-hub 1.10.2 /
  tokenizers 0.22.2 / hf-xet 1.4.3. compressed-tensors bumps 0.14.0.1
  -> 0.15.0.1.

Transformers pin
  Replace the old `transformers >= 5.5.0` floor with an exact pin to
  5.5.4. 5.5.4 is the first release containing huggingface/transformers
  PR #45359, which removes `kimi_k25` from
  MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS and restores the custom
  TikTokenTokenizer path with sparse added_tokens_decoder IDs. Earlier
  5.x versions dense-pack those IDs and scramble Kimi K2.5 tool-call
  output (`<|tool_call_begin|>` encoded as 163595 instead of 163597,
  shifting every tool-call marker by -2 or -3). 5.5.4 also satisfies
  the Gemma 4 > 5.5.0 requirement mentioned in the previous comment.

Nixl pin
  nixl-cu12 1.0.1 wheel (released after the previous ml-containers
  build) ships a different bundled libucs.so than 1.0.0. Its
  ucs_topo_release_devices() has a static-destructor ordering bug that
  SIGSEGVs during Python interpreter shutdown (atexit). vLLM's
  model-registry subprocess calls check_returncode() before reading
  its pickled output file, so a shutdown-time SIGSEGV surfaces as
  `pydantic ValidationError: Model architectures [...] failed to be
  inspected`, even though the inspection work itself completed. Pinning
  nixl-cu12 and nixl to 1.0.0 swaps in the known-good bundled libucs.so.
  Drop this pin once nixl ships a fixed release. Evidence:
  full gdb backtrace at projects/auto-debug-v0191-segv/logs/research-02/
  gdb-backtrace-v2.txt in the k25-bug workspace.

Validation
  2x2 factorial tested on GB200 in the ace-inference namespace (dev-us-e-01a):
  R1A (v0.19.1 no nixl pin)  -> CrashLoop, 6+ restarts, UCX atexit SIGSEGV
  R1B (v0.19.1 nixl 1.0.0)   -> 1/1 Running, 15/15 functional tests PASS
  R2C (v0.19.0 no nixl pin)  -> 1/1 Running, 15/15 PASS (v0.19.0 base
                                was built before nixl 1.0.1 released)
  R2D (v0.19.0 nixl 1.0.0)   -> 1/1 Running, 15/15 PASS (pin is no-op
                                on v0.19.0 base)
  See projects/k25-bug/result.md for the full test matrix.
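
As a follow-up to the Nixl pin rationale above, a hedged triage sketch (the importlib.metadata approach is an assumption, not how the bug was diagnosed) that lists which bundled libucs.so files an installed nixl wheel ships:

```python
# Hypothetical helper: enumerate bundled libucs files in the installed nixl
# wheels, to confirm which UCX build the 1.0.0 pin swapped in.
from importlib.metadata import PackageNotFoundError, files

for dist in ("nixl", "nixl-cu12"):
    try:
        ucs = [str(p) for p in (files(dist) or []) if "libucs" in str(p)]
    except PackageNotFoundError:
        ucs = []
    print(dist, "->", ucs or "not installed / no bundled libucs")
```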
@github-actions

@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800069137
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:cwang-vllm-v0191-e9b854a-v0.19.1

@github-actions

@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800546098
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:cwang-vllm-v0191-24b0ba0-v0.19.1

@github-actions

@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800582037
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:cwang-vllm-v0191-b4d9159-v0.19.1

c2w-sea marked this pull request as ready for review April 22, 2026 20:26
c2w-sea requested a review from a team as a code owner April 22, 2026 20:26
c2w-sea requested a review from alexeldeib April 22, 2026 20:39
alexeldeib merged commit c85d8a6 into main Apr 23, 2026
5 checks passed
alexeldeib deleted the cwang/vllm-v0191 branch April 23, 2026 14:40