@Hzfengsy Hzfengsy commented Aug 9, 2024

Introduce a KV cache interface for the Relax NNModule to support paged attention. Note that the implementation is migrated from MLC-LLM.
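
For readers unfamiliar with the idea, here is a rough, framework-free sketch of what "paged" KV caching means: each sequence's keys and values live in fixed-size pages taken from a shared pool, and a per-sequence page table records which pages it owns. This is only an illustration under my own assumptions. `ToyPagedKVCache` and all of its methods are hypothetical names, not the interface added in this PR and not the MLC-LLM implementation; real caches store tensors on device rather than Python objects.

```python
from typing import Any, Dict, List, Tuple


class ToyPagedKVCache:
    """Hypothetical toy model of a paged KV cache (not the TVM or MLC-LLM API)."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size
        # Shared pool: each page holds up to `page_size` (key, value) entries.
        self.pages: List[List[Tuple[Any, Any]]] = [[] for _ in range(num_pages)]
        # Stack of free page ids; reversed so the lowest id is handed out first.
        self.free_pages: List[int] = list(range(num_pages - 1, -1, -1))
        # Page table: sequence id -> ordered list of page ids it owns.
        self.page_table: Dict[int, List[int]] = {}

    def add_sequence(self, seq_id: int) -> None:
        if seq_id in self.page_table:
            raise ValueError(f"sequence {seq_id} already exists")
        self.page_table[seq_id] = []

    def append(self, seq_id: int, key: Any, value: Any) -> None:
        """Store one token's key/value, allocating a new page when needed."""
        table = self.page_table[seq_id]
        if not table or len(self.pages[table[-1]]) == self.page_size:
            if not self.free_pages:
                raise MemoryError("no free pages left in the pool")
            table.append(self.free_pages.pop())
        self.pages[table[-1]].append((key, value))

    def gather(self, seq_id: int) -> List[Tuple[Any, Any]]:
        """Return the sequence's entries in token order by walking its pages."""
        return [kv for page in self.page_table[seq_id] for kv in self.pages[page]]

    def remove_sequence(self, seq_id: int) -> None:
        """Release all pages of a finished sequence back to the pool."""
        for page in self.page_table.pop(seq_id):
            self.pages[page].clear()
            self.free_pages.append(page)


cache = ToyPagedKVCache(num_pages=4, page_size=2)
cache.add_sequence(0)
for i in range(3):
    cache.append(0, f"k{i}", f"v{i}")
print(cache.page_table[0])  # three tokens with page_size=2 occupy two pages
```

The point of the indirection is that sequences grow one page at a time and return their pages when they finish, so memory does not have to be reserved up front for the maximum sequence length.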

Co-authored-by: Bohan Hou <bohanhou@andrew.cmu.edu>
Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
Co-authored-by: Hongyi Jin <hongyij@andrew.cmu.edu>
Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
@Hzfengsy Hzfengsy force-pushed the kv_cache_interface branch from 5ad3b52 to a1639a5 Compare August 9, 2024 07:59
@tqchen tqchen merged commit b40a02c into apache:main Aug 9, 2024
@Hzfengsy Hzfengsy deleted the kv_cache_interface branch August 27, 2024 06:13

2 participants