fix(sglang): stop re-encoding routed_experts from sglang 0.5.11+ #9657

Open
KrishnanPrash wants to merge 3 commits into main from kprashanth/dyn-3046


@KrishnanPrash KrishnanPrash commented May 16, 2026

What

--enable-return-routed-experts crashes on the first decoded token against any non-DSv4 MoE model.

docker run ... --enable-return-routed-experts ...
curl localhost:8000/v1/chat/completions ...

Before:

File ".../decode_handler.py", line 649, in _process_token_stream
    routed_experts.numpy().tobytes()
AttributeError: 'str' object has no attribute 'numpy'

After:

HTTP 200, `nvext.routed_experts` is a base64 UTF-8 string. Recover ids with `np.frombuffer(b64decode(routed_experts), dtype=np.int32)`.
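For clients without NumPy, the same recovery can be sketched with the standard library alone. This is a minimal sketch and assumes the payload is a flat buffer of little-endian int32 values, which is what `np.frombuffer(..., dtype=np.int32)` reads on a little-endian host. `decode_routed_experts` and `encode_routed_experts` are illustrative helper names, not part of Dynamo or SGLang.

```python
import base64
import struct


def decode_routed_experts(payload: str) -> list[int]:
    """Decode a base64 string of packed int32 expert ids into a flat list."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    # "<" = little-endian, "i" = 4-byte signed int (assumed wire layout).
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


def encode_routed_experts(ids: list[int]) -> str:
    """Inverse helper, used here only to build a sample payload."""
    raw = struct.pack(f"<{len(ids)}i", *ids)
    return base64.b64encode(raw).decode("utf-8")


# Round trip on made-up ids.
sample = encode_routed_experts([3, 17, 42, 5])
recovered = decode_routed_experts(sample)
```

The result is a flat list. Any reshaping (for example per token or per layer) is left to the caller, since the array shape is not stated in this PR.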

Why

sgl-project/sglang#21634 (in v0.5.11) moved the b64encode(t.numpy().tobytes()) of routed_experts into tokenizer_manager. The decode handler still ran that same encode on a value that is now already a str, so the .numpy() call raises AttributeError. Both emit sites had the same bug.
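The failure mode can be reproduced in isolation. The sketch below is not the handler code: `old_emit` and `new_emit` are hypothetical stand-ins for the pre-fix and post-fix emit logic, and the standard-library `base64` module is used in place of `pybase64`.

```python
import base64


def old_emit(routed_experts):
    """Pre-fix behavior: assumes a tensor and base64-encodes it."""
    return base64.b64encode(routed_experts.numpy().tobytes()).decode("utf-8")


def new_emit(routed_experts):
    """Post-fix behavior: the value is already a base64 str, so forward it."""
    return routed_experts


# What the engine is described as returning from 0.5.11 on: an encoded str.
already_encoded = base64.b64encode(b"\x01\x00\x00\x00").decode("utf-8")

try:
    old_emit(already_encoded)
    old_error = None
except AttributeError as exc:
    # A str has no .numpy(), which is the crash shown in the traceback.
    old_error = str(exc)

forwarded = new_emit(already_encoded)
```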

DSv4 unaffected: _resolve_routed_experts_kwargs keeps the code path dormant on forks that lack return_routed_experts on async_generate.
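One way such a guard could work is to inspect the signature of `async_generate` and only pass the keyword when the engine accepts it. The sketch below is an assumption about the approach, not the actual body of `_resolve_routed_experts_kwargs`; the function name, the `enabled` parameter, and the two stand-in coroutines are hypothetical.

```python
import inspect


def resolve_routed_experts_kwargs(async_generate, enabled: bool) -> dict:
    """Return the kwarg only when async_generate declares the parameter."""
    if not enabled:
        return {}
    params = inspect.signature(async_generate).parameters
    if "return_routed_experts" in params:
        return {"return_routed_experts": True}
    # Parameter missing: leave the routed_experts path dormant.
    return {}


async def upstream_generate(prompt, return_routed_experts=False):
    """Stand-in for an engine that supports the flag."""


async def fork_generate(prompt):
    """Stand-in for a fork whose async_generate lacks the flag."""


upstream_kwargs = resolve_routed_experts_kwargs(upstream_generate, enabled=True)
fork_kwargs = resolve_routed_experts_kwargs(fork_generate, enabled=True)
```

This sketch only checks explicitly named parameters; a callable that accepts the flag through `**kwargs` would be treated as not supporting it.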

What changed

  • decode_handler.py: pass the string through at both emit sites, drop the now-unused pybase64 import.
  • _compat.py: short note in the module docstring on the wire-format contract.
  • New test_routed_experts_passthrough.py: two cases pinning the pass-through.

Test

  • Test red against pre-fix code: traceback at line 747, matches ticket.
  • After fix: 2 passed.
  • pytest test_sglang_unit.py: 27 passed, no regressions.
  • Not tested end-to-end on H100. Bug is pure Python serialization with no model-state dependency.

Resolves DYN-3046

@KrishnanPrash KrishnanPrash requested review from a team as code owners May 16, 2026 09:33
@github-actions github-actions Bot added fix backend::sglang Relates to the sglang backend labels May 16, 2026

coderabbitai Bot commented May 16, 2026

Walkthrough

Updated routed_experts wire-format handling to accept pre-encoded base64 strings directly from SGLang >= 0.5.11. Removed redundant base64-encoding in the decode handler for both token and text streaming modes, removed the pybase64 dependency, updated the compatibility contract documentation, and added regression tests validating passthrough behavior.

Changes

Routed Experts Passthrough for SGLang 0.5.11+

| Layer / File(s) | Summary |
| --- | --- |
| Compatibility contract documentation: components/src/dynamo/sglang/_compat.py | Module docstring extended with release-specific guidance that SGLang >= 0.5.11 provides routed_experts as a pre-encoded base64 UTF-8 string, requiring passthrough without re-encoding. |
| Decode handler passthrough implementation: components/src/dynamo/sglang/request_handlers/llm/decode_handler.py | Removed pybase64 import and replaced base64-encoding logic in _process_token_stream and _process_text_stream to pass routed_experts strings from meta_info directly into disaggregated_params and nvext payloads. |
| Routed experts passthrough regression tests: components/src/dynamo/sglang/tests/test_routed_experts_passthrough.py | New test module with token-stream and text-stream tests verifying that pre-encoded routed_experts strings are forwarded verbatim to downstream fields, plus a test asserting that disaggregated_params is absent when routed_experts is not provided. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing re-encoding of routed_experts from sglang 0.5.11+, which is the core issue addressed across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description covers all required template sections: Overview (What), Details (Why and What changed), and Related Issues (Resolves DYN-3046). It provides clear problem statement, root cause analysis, and solution summary. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/src/dynamo/sglang/request_handlers/llm/decode_handler.py`:
- Around line 644-650: Normalize routed_experts before assigning to
out["disaggregated_params"]: if routed_experts is a str, pass it through
unchanged; if it appears tensor-like (has attributes/methods .detach, .cpu, and
.numpy), convert it to a base64 string by detaching, moving to CPU, calling
.numpy(), getting its raw bytes and base64-encoding that result, then assign
{"routed_experts": <base64_str>}; for any other type set {"routed_experts":
None} to avoid leaking non-serializable objects; add/adjust unit tests to cover
string pass-through and tensor-like normalization (use a small mock object or
actual tensor) and the fallback-to-None case.

In `@components/src/dynamo/sglang/tests/test_routed_experts_passthrough.py`:
- Around line 4-15: Replace internal Linear ticket identifiers in the added
docstring and any test IDs (e.g., "DYN-3046" and "dyn-3046-*") with a public
GitHub-style issue reference (for example "GH-9657"); update the module
docstring top comment and any test function or test name strings that include
those identifiers so they no longer contain "DYN-3046" or "dyn-3046-*" but
instead use the chosen GH-#### token, ensuring references in the docstring and
test identifiers (search for the literal "DYN-3046" and "dyn-3046-") are all
replaced consistently.
- Around line 26-32: The pytest module-level pytestmark list is missing a
required component marker and should be made immutable; update the pytestmark
definition used in this file (pytestmark) to include exactly one component
marker from {multimodal, router, kvbm, core} (e.g., pytest.mark.router) and
convert pytestmark from a mutable list to an immutable tuple so tests adhere to
marker policy.
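
The normalization proposed in the first inline comment (for decode_handler.py) could look roughly like the sketch below. It illustrates the reviewer's suggestion and is not what the PR implements, since the PR passes the string through. `normalize_routed_experts`, `FakeTensor`, and `FakeArray` are hypothetical names, and the fake classes stand in for a real tensor so the example runs without torch.

```python
import base64


def normalize_routed_experts(value):
    """Return a base64 str for str or tensor-like input, else None."""
    if isinstance(value, str):
        # Already encoded upstream: pass through unchanged.
        return value
    if all(hasattr(value, name) for name in ("detach", "cpu", "numpy")):
        raw = value.detach().cpu().numpy().tobytes()
        return base64.b64encode(raw).decode("utf-8")
    # Unknown type: avoid leaking a non-serializable object downstream.
    return None


class FakeArray:
    """Minimal stand-in for a NumPy array exposing tobytes()."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def tobytes(self) -> bytes:
        return self._raw


class FakeTensor:
    """Minimal stand-in for a tensor exposing detach/cpu/numpy."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def detach(self):
        return self

    def cpu(self):
        return self

    def numpy(self):
        return FakeArray(self._raw)


from_str = normalize_routed_experts("AQAAAA==")
from_tensor = normalize_routed_experts(FakeTensor(b"\x01\x00\x00\x00"))
from_other = normalize_routed_experts(12345)
```

The three calls at the end correspond to the cases the comment asks the unit tests to cover: string pass-through, tensor-like normalization, and the fallback to None.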

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0f2ffdeb-f61b-4279-b2b5-85941b2fd36a

📥 Commits

Reviewing files that changed from the base of the PR and between d6240e6 and 4de3bff.

📒 Files selected for processing (3)
  • components/src/dynamo/sglang/_compat.py
  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
  • components/src/dynamo/sglang/tests/test_routed_experts_passthrough.py


Labels

backend::sglang Relates to the sglang backend fix size/L


2 participants