@MasterJH5574
Contributor

This PR reorganizes the attention kernel invocation logic in the PagedKVCache so that, when a sequence is forked, one ragged-prefill kernel and one decode kernel can effectively be merged into a single decode kernel.
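
The description does not show the kernels themselves. As a rough illustration only, the pure-Python sketch below shows the general numerical fact that makes this kind of merge possible: attention for one query token computed in two passes (one over a shared prefix, one over the sequence's own entries) and then combined via log-sum-exp gives the same result as one pass over the concatenated keys and values. This is not TVM's implementation; `attend`, `merge_states`, and the toy data are hypothetical names and values chosen for the example, and mapping the two passes onto the ragged-prefill and decode kernels is an assumption.

```python
import math

def attend(q, keys, values):
    """Single-query softmax attention over one KV segment.

    Returns the output vector and the log-sum-exp of the scaled scores,
    which is what is needed to merge this result with another segment.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def merge_states(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention results using their log-sum-exp values."""
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    out = [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]
    return out, m + math.log(wa + wb)

# Hypothetical toy data: a forked sequence sees a shared prefix plus its own KV.
q = [0.5, -1.0]
prefix_k = [[1.0, 0.0], [0.0, 1.0]]
prefix_v = [[1.0, 2.0], [3.0, 4.0]]
own_k = [[0.5, 0.5]]
own_v = [[5.0, 6.0]]

# Two passes plus a merge step.
two_out, two_lse = merge_states(*attend(q, prefix_k, prefix_v),
                                *attend(q, own_k, own_v))

# One pass over the concatenated KV.
one_out, one_lse = attend(q, prefix_k + own_k, prefix_v + own_v)
```

Both paths agree up to floating-point error, so fusing the passes changes how many kernels are launched, not the attention result.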

@MasterJH5574 MasterJH5574 marked this pull request as draft August 2, 2024 21:00
@MasterJH5574
Contributor Author

MasterJH5574 commented Aug 2, 2024

Depends on #17236.

@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2024-08-02-kvcache-invocation-reorg branch from eac154a to 4351d36 Compare August 3, 2024 15:10
@MasterJH5574 MasterJH5574 marked this pull request as ready for review August 3, 2024 15:10
@tqchen tqchen merged commit cd09ab6 into apache:main Aug 4, 2024