fix(sglang): stop re-encoding routed_experts from sglang 0.5.11+#9657
fix(sglang): stop re-encoding routed_experts from sglang 0.5.11+#9657KrishnanPrash wants to merge 3 commits into
Conversation
WalkthroughUpdated routed_experts wire-format handling to accept pre-encoded base64 strings directly from SGLang >= 0.5.11. Removed redundant base64-encoding in the decode handler for both token and text streaming modes, removed the pybase64 dependency, updated the compatibility contract documentation, and added regression tests validating passthrough behavior. ChangesRouted Experts Passthrough for SGLang 0.5.11+
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/src/dynamo/sglang/request_handlers/llm/decode_handler.py`:
- Around line 644-650: Normalize routed_experts before assigning to
out["disaggregated_params"]: if routed_experts is a str, pass it through
unchanged; if it appears tensor-like (has attributes/methods .detach, .cpu, and
.numpy), convert it to a base64 string by detaching, moving to CPU, calling
.numpy(), getting its raw bytes and base64-encoding that result, then assign
{"routed_experts": <base64_str>}; for any other type set {"routed_experts":
None} to avoid leaking non-serializable objects; add/adjust unit tests to cover
string pass-through and tensor-like normalization (use a small mock object or
actual tensor) and the fallback-to-None case.
In `@components/src/dynamo/sglang/tests/test_routed_experts_passthrough.py`:
- Around line 4-15: Replace internal Linear ticket identifiers in the added
docstring and any test IDs (e.g., "DYN-3046" and "dyn-3046-*") with a public
GitHub-style issue reference (for example "GH-9657"); update the module
docstring top comment and any test function or test name strings that include
those identifiers so they no longer contain "DYN-3046" or "dyn-3046-*" but
instead use the chosen GH-#### token, ensuring references in the docstring and
test identifiers (search for the literal "DYN-3046" and "dyn-3046-") are all
replaced consistently.
- Around line 26-32: The pytest module-level pytestmark list is missing a
required component marker and should be made immutable; update the pytestmark
definition used in this file (pytestmark) to include exactly one component
marker from {multimodal, router, kvbm, core} (e.g., pytest.mark.router) and
convert pytestmark from a mutable list to an immutable tuple so tests adhere to
marker policy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0f2ffdeb-f61b-4279-b2b5-85941b2fd36a
📒 Files selected for processing (3)
components/src/dynamo/sglang/_compat.pycomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.pycomponents/src/dynamo/sglang/tests/test_routed_experts_passthrough.py
What
--enable-return-routed-expertscrashes on the first decoded token against any non-DSv4 MoE model.Before:
After:
Why
sgl-project/sglang#21634 (in v0.5.11) moved the
b64encode(t.numpy().tobytes())ofrouted_expertsintotokenizer_manager. The decode handler still ran that same encode on what is now already astr. Two emit sites, same bug.DSv4 unaffected:
_resolve_routed_experts_kwargskeeps the code path dormant on forks that lackreturn_routed_expertsonasync_generate.What changed
decode_handler.py: pass the string through at both emit sites, drop the now-unusedpybase64import._compat.py: short note in the module docstring on the wire-format contract.test_routed_experts_passthrough.py: two cases pinning the pass-through.Test
pytest test_sglang_unit.py: 27 passed, no regressions.Resolves DYN-3046