Signed-off-by: Kai Xu <kaix@nvidia.com>
Codecov Report ❌
Coverage Diff
             main     #1216    +/-
Coverage   75.56%    75.45%   -0.11%
Files         353       362      +9
Lines       40430     40745    +315
Hits        30551     30746    +195
Misses       9879      9999    +120
What does this PR do?
Type of change: New feature
Adds TriAttention KV cache sparsity as a new calibration-only mode under modelopt.torch.sparsity.kv_cache. TriAttention scores cached KV entries using a trigonometric model derived from pre-RoPE Q/K concentration (arXiv:2604.04921), enabling KV cache compression at inference time with calibration only.
This PR includes:
- [...] (triattention), Pydantic config (TriAttentionConfig), convert/restore entrypoints
- sparsify() and calibrate() entry API under modelopt.torch.sparsity.kv_cache
- [...] frequency statistics
- examples/llm_sparsity/kv_cache_sparsity/hf_triattention.py
Usage
Testing
Comparison to the original implementation:
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information
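As a rough illustration of the scoring idea summarized in the description, the sketch below shows how a calibrated, pre-RoPE query statistic can rank cached keys by a trigonometric function of their distance to the current position. It is plain Python, not the modelopt API, and not necessarily the formulation used in the paper: it relies only on the standard RoPE identity that the logit between a query at position m and a key at position n equals Re(sum_j q_j * conj(k_j) * exp(i * (m - n) * theta_j)) when each channel pair is treated as a complex number. All function names (rope_frequencies, mean_query, trig_score, keep_top_k) are hypothetical, and using a simple mean query as the calibration statistic is an assumption made for the example.

```python
import cmath


def rope_frequencies(num_pairs, base=10000.0):
    """Standard RoPE angular frequencies, one per 2-D channel pair."""
    return [base ** (-j / num_pairs) for j in range(num_pairs)]


def rotate(vec, position, freqs):
    """Apply RoPE to a vector stored as complex channel pairs."""
    return [v * cmath.exp(1j * position * f) for v, f in zip(vec, freqs)]


def mean_query(calib_queries):
    """Hypothetical calibration step: average pre-RoPE queries per channel pair."""
    n = len(calib_queries)
    return [sum(column) / n for column in zip(*calib_queries)]


def trig_score(q_mean, key, distance, freqs):
    """Estimated logit for a pre-RoPE key that sits `distance` tokens behind the query."""
    return sum(
        (q * k.conjugate() * cmath.exp(1j * distance * f)).real
        for q, k, f in zip(q_mean, key, freqs)
    )


def keep_top_k(keys, query_pos, q_mean, freqs, k):
    """Return the sorted cache positions of the k highest-scoring keys."""
    scored = [
        (trig_score(q_mean, key, query_pos - pos, freqs), pos)
        for pos, key in enumerate(keys)
    ]
    best = sorted(scored, reverse=True)[:k]
    return sorted(pos for _, pos in best)
```

When the calibration statistic is replaced by an actual query, trig_score reproduces the exact post-RoPE dot product, so the quality of the ranking in this sketch depends entirely on how well the pre-RoPE queries concentrate around their mean.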