[kernel] add kernels for llama inference #4462
tiandiao123 wants to merge 32 commits into hpcaitech:main from
Conversation
kurisusnowdeng left a comment
Is there any difference from the original code? For example, I quickly checked the pos_encoding & layernorm kernels, but they seem exactly the same as the code in vLLM's repository.
If this is just integrating some external code into our project, I recommend adding the external tool as a dependency and importing the necessary APIs for our needs.
There are some minor changes. I only integrated the useful parts, which will be incorporated into our llama-inference attention here.
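For context on what the two kernels mentioned above compute: a minimal numpy sketch of the reference semantics of llama's RMSNorm (the "layernorm" kernel) and rotary position embedding (the "pos_encoding" kernel). This is an illustrative reconstruction of the math, not the CUDA code in this PR; the function names and the half-split (NeoX-style) rotation layout are assumptions for the sketch.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm, as used by llama: unlike LayerNorm it skips mean
    # subtraction and bias: y = x / sqrt(mean(x^2) + eps) * weight
    variance = np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True)
    return (x / np.sqrt(variance + eps)) * weight

def rotary_embedding(q, positions, base=10000.0):
    # Rotary positional encoding applied to a (seq_len, head_dim)
    # query or key slice; head_dim must be even. Each pair of
    # features is rotated by an angle depending on the token
    # position and the pair's frequency. This uses the half-split
    # (NeoX-style) pairing; an interleaved variant also exists.
    seq_len, head_dim = q.shape
    half = head_dim // 2
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = positions[:, None] * inv_freq[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    q1, q2 = q[:, :half], q[:, half:]
    return np.concatenate([q1 * cos - q2 * sin,
                           q1 * sin + q2 * cos], axis=-1)
```

A fused CUDA kernel computes the same thing per token in a single pass; the point of a reference like this is only to pin down the expected numerics (e.g. rotating at position 0 is the identity).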
The code coverage for the changed files is %.
Adding a llama shardformer demo; trying to directly import useful third-party ops. @kurisusnowdeng
Use this PR instead: #4485 @kurisusnowdeng
Temporarily closed this.
📌 Checklist before creating the PR
[doc/gemini/tensor/...]: A concise description
🚨 Issue number
📝 What does this PR do?
This PR adds useful CUDA kernels for llama inference.
💥 Checklist before requesting a review
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.