
Conversation

@mqhc2020 (Contributor) commented Dec 23, 2025

Motivation

SGLang needs a fake (meta) implementation of fused_qk_rope_cat_and_cache_mla so that the operator can pass through torch.compile.

Technical Details

This commit requires a corresponding SGLang commit to be merged simultaneously, because the API changes.

Test Plan

Test Result

Submission Checklist


Copilot AI left a comment


Pull request overview

This PR adds torch.compile support for the fused_qk_rope_cat_and_cache_mla function by introducing a fake tensor function that simulates tensor shapes and dtypes without actual computation. This is required for SGLang's torch.compile integration.

  • Adds fused_qk_rope_cat_and_cache_mla_fake_tensor function to generate fake tensors for torch.compile
  • Updates return type to always return 5 tensors (including q_nope_zeros_out) for consistency
  • Adds type hints and improves type annotations for better code clarity
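
For context, here is a minimal, self-contained sketch of the fake-tensor registration pattern described above, using torch.library.custom_op and register_fake. The simplified op name ("mylib::qk_rope_demo"), its two-tensor signature, and its two-tensor return are illustrative assumptions only; the real fused_qk_rope_cat_and_cache_mla takes more inputs and, per this PR, always returns five tensors (including q_nope_zeros_out).

```python
# Minimal sketch of the fake (meta) registration pattern this PR applies.
# "mylib::qk_rope_demo" and its two-tensor signature are placeholders; the real
# aiter op has more arguments and always returns five tensors.
from typing import Tuple

import torch


@torch.library.custom_op("mylib::qk_rope_demo", mutates_args=())
def qk_rope_demo(q: torch.Tensor, k_pe: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    # Stand-in for the fused kernel; the real op applies RoPE, concatenates,
    # and writes into the MLA KV cache on the device.
    return q.clone(), k_pe.clone()


@qk_rope_demo.register_fake
def _(q: torch.Tensor, k_pe: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    # Fake implementation: produce outputs with the right shapes/dtypes/devices
    # but run no computation, which is all torch.compile needs for tracing.
    return torch.empty_like(q), torch.empty_like(k_pe)


@torch.compile(fullgraph=True)
def forward(q: torch.Tensor, k_pe: torch.Tensor):
    # With the fake registered, torch.compile treats the op as an opaque call
    # instead of graph-breaking on it.
    return qk_rope_demo(q, k_pe)
```

With a fake registered, the compiler never needs to see inside the kernel: it only propagates output shapes and dtypes through the graph.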


@mqhc2020 requested a review from azaidy Dec 23, 2025 13:16
@azaidy requested a review from k50112113 Dec 23, 2025 15:01
@mqhc2020 changed the title from "add gen_fake for MLA RoPE operator" to "add fake for MLA RoPE operator" Dec 24, 2025
@k50112113 (Contributor) left a comment


Looks good! Thanks for the addition. I think we are not going to let torch.compile see inside this function in any case, so this is a pretty decent change.
