Conversation
feat(cli): add mllm-llm-benchmark tool for performance testing
- Define QNN_QUANT_SCALE_NAME constant for quant scale key
- Replace all occurrences of "quant_scale" string literal
- Improve code maintainability and reduce typo risks
- Ensure consistent usage of quant scale identifier
- Simplify future modifications to quant scale key name
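The idea of the change can be sketched as below; the constant's value and the attribute-lookup helper are illustrative, not the actual mllm declarations:

```cpp
#include <string>
#include <unordered_map>

// Shared constant replacing the repeated "quant_scale" string literal.
constexpr const char* QNN_QUANT_SCALE_NAME = "quant_scale";

// Hypothetical helper: look up the scale in an op's attribute map via the
// constant, so renaming the key later touches exactly one line.
float getQuantScale(const std::unordered_map<std::string, float>& attrs) {
  auto it = attrs.find(QNN_QUANT_SCALE_NAME);
  return it == attrs.end() ? 1.0f : it->second;
}
```

Every call site then refers to `QNN_QUANT_SCALE_NAME` instead of a bare string, so a typo becomes a compile error rather than a silent lookup miss.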
- Add QNNOpNamingPass to assign unique names to unnamed operations
- Traverse subgraphs and name ops using module_name.op_type.index pattern
- Handle CallGraphOp and SubGraphOp during IR traversal
- Ensure all QNN operations have unique identifiers for graph construction
- Add pass factory function and integrate with existing pass infrastructure
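A minimal sketch of the `module_name.op_type.index` naming scheme; the real pass walks CallGraphOp/SubGraphOp IR nodes, whereas this standalone counter class only shows the naming logic:

```cpp
#include <string>
#include <unordered_map>

// Illustrative namer: assigns "module.op_type.index" with a per-key
// monotonically increasing index, guaranteeing unique op names.
class OpNamer {
  std::unordered_map<std::string, int> counters_;

 public:
  std::string name(const std::string& module, const std::string& opType) {
    int idx = counters_[module + "." + opType]++;  // starts at 0 per key
    return module + "." + opType + "." + std::to_string(idx);
  }
};
```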
Added new source files Nn.cc and Compile.cc to the MllmFFIExtension library in CMakeLists.txt to extend the FFI interface.

feat(build): format MLIR installation script
Reformatted the cmake command in install_mlir.sh to a single line for better readability and consistency in the build script.
- Add new kai_sme.cpp and kai_sme.hpp files with proper copyright headers
- Implement ARM-specific linear kernel using SME instructions
- Include necessary header guards and license information
- Remove empty KernelSelector files that were not being used
- Add QNNCastTypeOp to handle type casting with quantization
- Support both quantize and dequantize operations
- Integrate with QNN backend for graph node creation
- Handle scale propagation for int8 and int16 types
- Add pattern matching for CastType operations in IR
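For reference, symmetric per-tensor int8 quantize/dequantize looks like the sketch below. This is a host-side scalar illustration of the math only; the actual QNNCastTypeOp emits QNN graph nodes rather than computing values itself:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize: divide by scale, round, clamp into the int8 range.
int8_t quantizeI8(float v, float scale) {
  float q = std::round(v / scale);
  return static_cast<int8_t>(std::clamp(q, -128.0f, 127.0f));
}

// Dequantize: multiply the integer code back by the scale.
float dequantizeI8(int8_t q, float scale) { return q * scale; }
```

The int16 path is the same shape with a wider clamp range, which is why the op only needs to propagate the scale per destination type.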
- Add `config_0.6B_w4a8_i8mm_kai.json` with model architecture settings
- Add `quant_cfg_0.6B_w4a8_i8mm_kai.json` with layer-wise quantization hints
- Configure KaiLinear implementation types for various modules

perf(cpu): add label support for KaiLinear implementations
- Insert labels for kai linear implementations to enable goto jumps
- Optimize forward path by switching implementations based on input shape

refactor(mllm): comment out memory cleanup temporarily
- Comment out `clearAll()` call in `shutdownContext()`
- Mark as FIXME for CUDA compatibility

style(qwen3): reformat function signature for readability
- Reformat `makeRotaryPosEmbedding` function declaration to fit within line limits
- Improve code style consistency

fix(qwen3): remove redundant finish token callback
- Remove unnecessary finish token callback in Qwen3Session
- Clean up post-processing logic for radix tree insertion
Adds a new devcontainer.json file for cu128 environment with comprehensive VS Code extension setup including Python, C++, debugging, and formatting tools.
feat(qwen3): add config and quantization files for 0.6B model
- Added `rmsnorm_fp32_inplace` and `rmsnorm_fp16_inplace` functions in ARM kernels
- Updated RMSNormOp to support inplace operations using the new kernel functions
- Modified LinearOp and related classes to support tensor redirection
- Enhanced FlashAttention2Op with updated kernel includes and input handling
- Added new test cases for FlashAttention2 with improved accuracy checks
- Fixed contiguous tensor assertions in RMSNorm and RoPE operations
- Extended Layer macros to support redirect attribute for ops
- Updated StaticCache with new methods for KV cache management
- Improved FA2 kernel tests with radix attention support and better validation
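A scalar reference for what an in-place RMSNorm over one row computes; the signature and epsilon default here are assumptions for illustration, and the actual ARM kernels use NEON/SVE intrinsics plus an fp16 variant:

```cpp
#include <cmath>
#include <cstddef>

// In-place RMSNorm: x[i] <- x[i] / rms(x) * weight[i], overwriting the
// input row so no scratch output tensor is needed.
void rmsnorm_fp32_inplace(float* x, const float* weight, size_t n,
                          float eps = 1e-6f) {
  float sumsq = 0.0f;
  for (size_t i = 0; i < n; ++i) sumsq += x[i] * x[i];
  float inv_rms = 1.0f / std::sqrt(sumsq / n + eps);
  for (size_t i = 0; i < n; ++i) x[i] = x[i] * inv_rms * weight[i];
}
```

Writing back into `x` is what makes the op attractive for the redirection machinery mentioned above: the output tensor can simply alias the input.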
feat(cpu): add inplace rmsnorm implementations for fp32 and fp16
- Skip tensor data printing when trace mode is enabled
- Add QNNMulOp class with reshape implementation for broadcasting
- Implement QNNMulPattern to add ElementWiseMultiply nodes to QNN graph
- Update QNNAddPattern to use standard ElementWiseAdd operator
- Add tensor shape compatibility checks for Mul operations
- Include proper error handling for tensor operations and backend access
- Add factory class for QNNMulOp creation
- Switch implementation from Conv2d to FullyConnected operator
- Reshape weights to 2D [out_channels, in_channels] format
- Convert bias to int32 type for proper quantization handling
- Remove unused biasInt32_ tensor member
- Update reshape logic to flatten input for FullyConnected
- Add keep_dims parameter for HTP support
- Remove stride and pad parameters for Conv2d
- Simplify bias conversion logic for quantized operations
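The int32 bias conversion usually follows the standard quantized-FC convention of expressing the bias in units of `input_scale * weight_scale`, so it can be added directly to the int32 accumulator. A sketch of that convention, not the exact backend code:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Convert a float bias vector to int32 in accumulator units. With
// int8 inputs and weights, the matmul accumulates in int32 at scale
// inputScale * weightScale, so the bias must be requantized to match.
std::vector<int32_t> biasToInt32(const std::vector<float>& bias,
                                 float inputScale, float weightScale) {
  std::vector<int32_t> out;
  out.reserve(bias.size());
  float s = inputScale * weightScale;
  for (float b : bias) out.push_back(static_cast<int32_t>(std::round(b / s)));
  return out;
}
```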
- Implemented hash() method combining tensor uuid and attached views uuids
- Updated tensor IR caching to use hash instead of uuid
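One common way to combine a tensor's uuid with its view uuids is a boost-style hash_combine fold; the mixing constant and function name below are illustrative, not the actual hash() implementation:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Fold the view uuids into the tensor uuid's hash so that a tensor whose
// set of attached views changed produces a different cache key.
uint64_t combinedHash(uint64_t tensorUuid,
                      const std::vector<uint64_t>& viewUuids) {
  uint64_t h = std::hash<uint64_t>{}(tensorUuid);
  for (uint64_t v : viewUuids)
    h ^= std::hash<uint64_t>{}(v) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  return h;
}
```

Keying the IR cache on this hash instead of the bare uuid means a stale cached entry is not reused after views are attached or detached.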
- Add QNNX2XOp to handle data transfer between CPU and QNN shared buffer
- Implement forward method to perform memory copy using std::memcpy
- Create QNNX2XOpFactory for op creation in QNN backend
- Add QNNX2XPattern as a placeholder that should not appear in QNN graph
- Include OpTypes header in QNNDispatcher
- Execute X2X op setup and forward in QNN dispatcher for kX2X operations
feat: Implement Qwen NPU Decoding Support with Memory Management Fixes
- ensure CausalMask layer is materialized on CPU before running kernel tests
- add deterministic Prefill/Decode/Append regressions based on runScenario helper
- exercise new coverage under build-tests/bin/Mllm-Test-CPUKernel --gtest_filter=CausalMaskOpTest.*
…aths
- Update Hexagon SDK requirement from 5.x to 6.x in documentation
- Adjust Makefile execution logic in HexagonMakeTask to use updated paths
- Update library names from 'libQnnMllmPackage' to 'libQnnLLaMAPackage'
- Modify build configuration files to reflect new package location
- Ensure proper renaming of CPU and HTP libraries after build
QNN Op Package Migrate to v2
feat: add DeepSeek-OCR support, C++ API updates, and dual-model loadi…
test: fix CausalMaskOp CPU coverage
- Add detailed documentation for mllm's operator plugin system
- Document in-tree and out-of-tree operator registration methods
- Include examples for implementing custom operators and factories
- Add plugin descriptor and build configuration guidelines
- Update model configuration examples with GGUF quantization hints
- Document supported quantization types in mllm-quantizer
- Add guidance on selecting appropriate quantization methods
- Remove outdated backend addition guide from quick start index
update docs
Added a note about model version compatibility and recommendations.
feat(build): update threading options for Apple GCD support in build configurations
fix(docs): update links for Qwen2 and Qwen2.5 models in README
feat(docs): add mllm-params-inspector tool usage instructions to README
Add build status entries for OrangePi AI Pro (310B) and OrangePi AI Studio (310P) with Ubuntu 22.04 in the compatibility matrix.
docs(readme): add OrangePi AI Pro and Studio build status
Caution: review failed; the pull request is closed.

Walkthrough
Comprehensive v2 repository restructure introducing modernized build infrastructure (CMake with extensive feature flags), new lazy visual-language model algorithms with dynamic KV caching, extensive C++ SDK preparation, multi-platform Docker support, refined development tooling (clang-format, clang-tidy, devcontainers), GitHub CI/CD workflows, and extensive API documentation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Git as Git Workflow
    participant Build as CMake Build
    participant CI as GitHub Actions
    participant Img as Container Image
    participant Test as Testing
    Dev->>Git: Push to v2 branch
    Git->>CI: Trigger workflow (PR/push)
    par CI Jobs
        CI->>Img: build-android (Ubuntu)
        Img->>Build: cmake + Ninja (NDK r28b)
        Build->>Test: Execute Android build
        CI->>Img: build-macos (macOS)
        Img->>Build: cmake + Clang (Apple Silicon)
        Build->>Test: Execute macOS build
        CI->>Img: docs-deploy (Ubuntu)
        Img->>Build: Sphinx + Doxygen
        Build->>Test: Build docs → Deploy Pages
        CI->>Img: pymllm-nightly (macOS)
        Img->>Build: Build wheel (bump version)
        Build->>Test: Upload PyPI artifact
    end
    Dev->>Build: Local: cmake (C++20, feature flags)
    Build->>Build: Link MllmRT, MllmCPUBackend, etc.
    Build->>Test: Run benchmarks (ARM BLAS, IntrusivePtr)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes