
build(vllm-tensorizer): Bump vllm to v0.19.1, pin transformers 5.5.4 + nixl 1.0.0 #153

Merged
alexeldeib merged 3 commits into main from cwang/vllm-v0191
Apr 23, 2026

Conversation

@c2w-sea (Contributor) commented Apr 22, 2026

Summary

  • Bump vllm-commit to v0.19.1
  • Pin transformers==5.5.4 — contains huggingface/transformers#45359 (restores the Kimi K2.5 slow
    tokenizer); still >5.5.0 for Gemma 4
  • Pin nixl==1.0.0 and nixl-cu12==1.0.0 — 1.0.1 ships a libucs.so with a
    ucs_topo_release_devices() destructor bug (see the pin-verification sketch below)
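
A minimal pin-verification sketch, runnable inside the built image (illustrative only; the distribution names below are assumptions, and vllm's exact version string may carry a local suffix):

```python
# Illustrative smoke check, not part of this PR: assert the image carries the
# exact pins listed above.
from importlib.metadata import version

assert version("transformers") == "5.5.4"
assert version("nixl") == "1.0.0"
assert version("nixl-cu12") == "1.0.0"
print("pins OK; vllm =", version("vllm"))  # expected to report 0.19.1
```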

Context

nixl-cu12 1.0.1 SIGSEGVs during Python interpreter shutdown on GB200. vLLM's model-registry subprocess checks returncode before reading its pickled output, so
the shutdown crash surfaces as pydantic ValidationError: Model architectures ['KimiK25ForConditionalGeneration'] failed to be inspected — even though the
inspection work itself completed. Drop the nixl pins once a fixed nixl release lands.
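
A minimal sketch of the failure mode (this is not vLLM's actual registry code, only the returncode-before-read ordering described above): the child finishes its work and writes the pickle, then crashes in a shutdown destructor, and the parent raises on the exit code without ever reading the result.

```python
import pickle
import subprocess
import sys
import tempfile

# Child: completes its work (writes the pickle), then segfaults at shutdown --
# a stand-in for the nixl-cu12 1.0.1 libucs.so destructor bug.
CHILD = r"""
import ctypes, pickle, sys
with open(sys.argv[1], "wb") as f:
    pickle.dump({"architectures": ["KimiK25ForConditionalGeneration"]}, f)
ctypes.string_at(0)  # NULL dereference -> SIGSEGV after the useful work is done
"""

with tempfile.NamedTemporaryFile(suffix=".pkl") as out:
    proc = subprocess.run([sys.executable, "-c", CHILD, out.name])
    proc.check_returncode()   # raises CalledProcessError (returncode -11) here,
    result = pickle.load(out) # so the completed result is never read
```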

Earlier 5.x transformers releases (5.4–5.5.3) dense-pack Kimi's added_tokens_decoder IDs, shifting every tool-call marker by −2 or −3 and scrambling tool-call
output. 5.5.4 is the first release with the upstream fix.
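
A quick tokenizer check for the regression (hedged: the hub repo id below is a placeholder, and the two IDs are the values quoted in this PR):

```python
# Illustrative only. "moonshotai/Kimi-K2.5" is a placeholder repo id;
# 163597 / 163595 are the marker IDs quoted above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2.5", trust_remote_code=True)
tid = tok.convert_tokens_to_ids("<|tool_call_begin|>")
# transformers 5.5.4 (sparse IDs): 163597.  5.4-5.5.3 (dense-packed): 163595.
assert tid == 163597, f"dense-packed added_tokens_decoder detected (got {tid})"
```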

Test plan

  • GB200 dev (TP=4): pod 1/1 Running; 15/15 Kimi K2.5 functional tests PASS (tokenizer sparse IDs, tool-call streaming/parallel/multi-turn/forced-choice,
    reasoning, legacy completions)
  • 2×2 factorial: v0.19.1 no-pin → CrashLoop; v0.19.1 +pin → Works

Ref: INF-353

…+ nixl 1.0.0

vLLM bump
  Upstream released v0.19.1 (2026-04-22, commit b1388b1fb) cherry-picking
  the transformers-v5 refactor (vllm#30566) on top of the v0.19.0 tree.
  Its common.txt declares `transformers >= 4.56.0, != 5.5.0` and its
  test.txt pins the tested combo at transformers 5.5.3 / hf-hub 1.10.2 /
  tokenizers 0.22.2 / hf-xet 1.4.3. compressed-tensors bumps 0.14.0.1
  -> 0.15.0.1.

Transformers pin
  Replace the old `transformers >= 5.5.0` floor with an exact pin to
  5.5.4. 5.5.4 is the first release containing huggingface/transformers
  PR #45359, which removes `kimi_k25` from
  MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS and restores the custom
  TikTokenTokenizer path with sparse added_tokens_decoder IDs. Earlier
  5.x versions dense-pack those IDs and scramble Kimi K2.5 tool-call
  output (`<|tool_call_begin|>` encoded as 163595 instead of 163597,
  shifting every tool-call marker by -2 or -3). 5.5.4 also satisfies
  the Gemma 4 > 5.5.0 requirement mentioned in the previous comment.

Nixl pin
  nixl-cu12 1.0.1 wheel (released after the previous ml-containers
  build) ships a different bundled libucs.so than 1.0.0. Its
  ucs_topo_release_devices() has a static-destructor ordering bug that
  SIGSEGVs during Python interpreter shutdown (atexit). vLLM's
  model-registry subprocess calls check_returncode() before reading
  its pickled output file, so a shutdown-time SIGSEGV surfaces as
  `pydantic ValidationError: Model architectures [...] failed to be
  inspected`, even though the inspection work itself completed. Pinning
  nixl-cu12 and nixl to 1.0.0 swaps in the known-good bundled libucs.so.
  Drop this pin once nixl ships a fixed release. Evidence:
  full gdb backtrace at projects/auto-debug-v0191-segv/logs/research-02/
  gdb-backtrace-v2.txt in the k25-bug workspace.

Validation
  2x2 factorial tested on GB200 in the ace-inference namespace (dev-us-e-01a):
  R1A (v0.19.1 no nixl pin)  -> CrashLoop, 6+ restarts, UCX atexit SIGSEGV
  R1B (v0.19.1 nixl 1.0.0)   -> 1/1 Running, 15/15 functional tests PASS
  R2C (v0.19.0 no nixl pin)  -> 1/1 Running, 15/15 PASS (v0.19.0 base
                                was built before nixl 1.0.1 released)
  R2D (v0.19.0 nixl 1.0.0)   -> 1/1 Running, 15/15 PASS (pin is no-op
                                on v0.19.0 base)
  See projects/k25-bug/result.md for the full test matrix.
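
As a follow-up to the Nixl pin rationale above, a hedged triage sketch (the importlib.metadata approach is an assumption, not how the bug was diagnosed) that lists which bundled libucs.so files an installed nixl wheel ships:

```python
# Hypothetical helper: enumerate bundled libucs files in the installed nixl
# wheels, to confirm which UCX build the 1.0.0 pin swapped in.
from importlib.metadata import PackageNotFoundError, files

for dist in ("nixl", "nixl-cu12"):
    try:
        ucs = [str(p) for p in (files(dist) or []) if "libucs" in str(p)]
    except PackageNotFoundError:
        ucs = []
    print(dist, "->", ucs or "not installed / no bundled libucs")
```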
@github-actions

@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800069137
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:cwang-vllm-v0191-e9b854a-v0.19.1

@github-actions

@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800546098
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:cwang-vllm-v0191-24b0ba0-v0.19.1

@github-actions

@c2w-sea Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/24800582037
Image: ghcr.io/coreweave/ml-containers/vllm-tensorizer:cwang-vllm-v0191-b4d9159-v0.19.1

c2w-sea marked this pull request as ready for review April 22, 2026 20:26
c2w-sea requested a review from a team as a code owner April 22, 2026 20:26
c2w-sea requested a review from alexeldeib April 22, 2026 20:39
alexeldeib merged commit c85d8a6 into main Apr 23, 2026
5 checks passed
alexeldeib deleted the cwang/vllm-v0191 branch April 23, 2026 14:40