feat(qwen3): add config and quantization files for 0.6B model #482
Merged
chenghuaWang merged 5 commits into UbiquitousLearning:v2 from Oct 16, 2025
chenghuaWang (Collaborator) commented on Oct 16, 2025
- Add new kai_sme.cpp and kai_sme.hpp files with proper copyright headers
- Implement ARM-specific linear kernel using SME instructions
- Include necessary header guards and license information
- Remove empty KernelSelector files that were not being used
Added new source files Nn.cc and Compile.cc to the MllmFFIExtension library in CMakeLists.txt to extend the FFI interface.

feat(build): format MLIR installation script

Reformatted the cmake command in install_mlir.sh to a single line for better readability and consistency in the build script.
- Add `config_0.6B_w4a8_i8mm_kai.json` with model architecture settings
- Add `quant_cfg_0.6B_w4a8_i8mm_kai.json` with layer-wise quantization hints
- Configure KaiLinear implementation types for various modules

perf(cpu): add label support for KaiLinear implementations
- Insert labels for kai linear implementations to enable goto jumps
- Optimize forward path by switching implementations based on input shape

refactor(mllm): comment out memory cleanup temporarily
- Comment out `clearAll()` call in `shutdownContext()`
- Mark as FIXME for CUDA compatibility

style(qwen3): reformat function signature for readability
- Reformat `makeRotaryPosEmbedding` function declaration to fit within line limits
- Improve code style consistency

fix(qwen3): remove redundant finish token callback
- Remove unnecessary finish token callback in Qwen3Session
- Clean up post-processing logic for radix tree insertion
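To illustrate the "switching implementations based on input shape" item in the KaiLinear commit above, here is a minimal sketch of shape-based dispatch. It is not the code from this PR: `LinearImpl`, `InputShape`, `selectLinearImpl`, and `implName` are hypothetical names, the rule "one row selects a matrix-vector path, more rows select a matrix-matrix path" is an assumption about how such a switch might look, and a plain function stands in for the labels and goto jumps the commit mentions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical implementation tags. The real KaiLinear implementation
// type names are not shown in this PR text.
enum class LinearImpl { kGemv, kGemm };

// Hypothetical descriptor for a 2-D activation of shape [rows, cols].
struct InputShape {
  std::int64_t rows;
  std::int64_t cols;
};

// Pick an implementation from the input shape alone.
// Assumption: a single row (e.g. one token per step) is best served by a
// matrix-vector kernel, and anything larger by a matrix-matrix kernel.
inline LinearImpl selectLinearImpl(const InputShape& shape) {
  assert(shape.rows > 0 && shape.cols > 0);  // reject empty inputs
  return shape.rows == 1 ? LinearImpl::kGemv : LinearImpl::kGemm;
}

// Human-readable name for logging or debugging.
inline std::string implName(LinearImpl impl) {
  switch (impl) {
    case LinearImpl::kGemv: return "gemv";
    case LinearImpl::kGemm: return "gemm";
  }
  return "unknown";
}
```

A forward pass would call `selectLinearImpl` once per invocation and branch to the matching kernel, which keeps the shape check out of the kernels themselves.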
Adds a new devcontainer.json file for the cu128 environment, with a comprehensive VS Code extension setup covering Python, C++, debugging, and formatting tools.