V2 Release by oreomaker · Pull Request #545 · UbiquitousLearning/mllm

oreomaker · 2025-11-23T08:27:43Z

Summary by CodeRabbit

New Features
- Added lazy visual-language model implementations with optimized KV cache management for Qwen2.5VL and Qwen2VL models
- Added multi-platform development container support for ARM, CUDA 12.4, CUDA 12.8, and QNN environments
- Added support for modern build infrastructure and CI/CD workflows for Android NDK, macOS Apple Silicon, and documentation deployment

Infrastructure & Documentation
- Restructured project configuration with expanded feature flags and build options
- Modernized development tooling configuration (.clang-format, .clang-tidy, .editorconfig)
- Added comprehensive API, architecture, and backend-specific documentation
- Enhanced GitHub workflows with new issue templates and contribution guidelines

Build & Deployment
- Updated CMake configuration with standardized build options and packaging support
- Added Docker support for multiple platforms and development environments
- Reorganized submodule structure for improved dependency management

feat(cli): add mllm-llm-benchmark tool for performance testing

- Define QNN_QUANT_SCALE_NAME constant for quant scale key
- Replace all occurrences of "quant_scale" string literal
- Improve code maintainability and reduce typo risks
- Ensure consistent usage of quant scale identifier
- Simplify future modifications to quant scale key name
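
The refactor listed above can be pictured with a minimal sketch. Only the constant name and the `"quant_scale"` key come from the commit message; the `AttrMap` type and the `setQuantScale` / `getQuantScale` helpers are hypothetical and exist only for illustration, and the real declaration in mllm may be a macro or live in a different header.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Name and value taken from the commit message; the actual declaration in
// mllm may differ (for example, it could be a preprocessor macro).
constexpr const char* QNN_QUANT_SCALE_NAME = "quant_scale";

// Hypothetical attribute store, used only for this illustration.
using AttrMap = std::unordered_map<std::string, float>;

// Writers go through the constant, so a misspelled key becomes a compile
// error instead of a silent lookup miss at runtime.
inline void setQuantScale(AttrMap& attrs, float scale) {
  assert(scale > 0.0f && "quantization scale must be positive");
  attrs[QNN_QUANT_SCALE_NAME] = scale;
}

// Returns the stored scale, or `fallback` when no scale was attached.
inline float getQuantScale(const AttrMap& attrs, float fallback = 1.0f) {
  auto it = attrs.find(QNN_QUANT_SCALE_NAME);
  return it == attrs.end() ? fallback : it->second;
}
```

Renaming the key later then means editing one line rather than searching for every literal.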

- Add QNNOpNamingPass to assign unique names to unnamed operations
- Traverse subgraphs and name ops using module_name.op_type.index pattern
- Handle CallGraphOp and SubGraphOp during IR traversal
- Ensure all QNN operations have unique identifiers for graph construction
- Add pass factory function and integrate with existing pass infrastructure
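
The `module_name.op_type.index` naming pattern can be sketched as follows. This is not the pass itself: the `Op` struct and `nameUnnamedOps` function are hypothetical stand-ins for mllm's IR types, and the sketch assumes the index is counted per (module, op type) pair, which the commit message does not spell out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR operation; not mllm's real IR node type.
struct Op {
  std::string module_name;
  std::string op_type;
  std::string name;  // empty means "unnamed"
};

// Assigns `module_name.op_type.index` to every unnamed op. The index is
// counted per (module, type) pair (an assumption), so generated names do not
// collide with each other. Returns the number of ops that were renamed.
inline int nameUnnamedOps(std::vector<Op>& ops) {
  std::map<std::pair<std::string, std::string>, int> next_index;
  int renamed = 0;
  for (Op& op : ops) {
    assert(!op.op_type.empty() && "every op needs a type to build its name");
    if (!op.name.empty()) continue;  // keep names that were already assigned
    const int index = next_index[{op.module_name, op.op_type}]++;
    op.name = op.module_name + "." + op.op_type + "." + std::to_string(index);
    ++renamed;
  }
  return renamed;
}
```

A real pass would also need to recurse into nested graphs (the CallGraphOp and SubGraphOp cases named above), which this flat loop leaves out.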

Added new source files Nn.cc and Compile.cc to the MllmFFIExtension library in CMakeLists.txt to extend the FFI interface.

feat(build): format MLIR installation script

Reformatted the cmake command in install_mlir.sh to a single line for better readability and consistency in the build script.

- Add new kai_sme.cpp and kai_sme.hpp files with proper copyright headers
- Implement ARM-specific linear kernel using SME instructions
- Include necessary header guards and license information
- Remove empty KernelSelector files that were not being used

- Add QNNCastTypeOp to handle type casting with quantization
- Support both quantize and dequantize operations
- Integrate with QNN backend for graph node creation
- Handle scale propagation for int8 and int16 types
- Add pattern matching for CastType operations in IR
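
For readers unfamiliar with scale-based casting, the quantize and dequantize directions can be sketched as below. This assumes symmetric per-tensor quantization with no zero point, which the commit message does not confirm; `quantizeInt8` and `dequantizeInt8` are illustrative names, not functions from the QNN backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Symmetric quantization (assumed scheme):
//   q = clamp(round(x / scale), -128, 127)
inline std::int8_t quantizeInt8(float x, float scale) {
  assert(scale > 0.0f && "quantization scale must be positive");
  float q = std::round(x / scale);
  q = std::min(127.0f, std::max(-128.0f, q));  // saturate to the int8 range
  return static_cast<std::int8_t>(q);
}

// Inverse mapping: x is approximately q * scale.
inline float dequantizeInt8(std::int8_t q, float scale) {
  return static_cast<float>(q) * scale;
}
```

The int16 case follows the same shape with the clamp bounds widened to -32768 and 32767.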

- Add `config_0.6B_w4a8_i8mm_kai.json` with model architecture settings
- Add `quant_cfg_0.6B_w4a8_i8mm_kai.json` with layer-wise quantization hints
- Configure KaiLinear implementation types for various modules

perf(cpu): add label support for KaiLinear implementations

- Insert labels for kai linear implementations to enable goto jumps
- Optimize forward path by switching implementations based on input shape

refactor(mllm): comment out memory cleanup temporarily

- Comment out `clearAll()` call in `shutdownContext()`
- Mark as FIXME for CUDA compatibility

style(qwen3): reformat function signature for readability

- Reformat `makeRotaryPosEmbedding` function declaration to fit within line limits
- Improve code style consistency

fix(qwen3): remove redundant finish token callback

- Remove unnecessary finish token callback in Qwen3Session
- Clean up post-processing logic for radix tree insertion

Adds a new devcontainer.json file for cu128 environment with comprehensive VS Code extension setup including Python, C++, debugging, and formatting tools.

feat(qwen3): add config and quantization files for 0.6B model

- Added `rmsnorm_fp32_inplace` and `rmsnorm_fp16_inplace` functions in ARM kernels
- Updated RMSNormOp to support inplace operations using the new kernel functions
- Modified LinearOp and related classes to support tensor redirection
- Enhanced FlashAttention2Op with updated kernel includes and input handling
- Added new test cases for FlashAttention2 with improved accuracy checks
- Fixed contiguous tensor assertions in RMSNorm and RoPE operations
- Extended Layer macros to support redirect attribute for ops
- Updated StaticCache with new methods for KV cache management
- Improved FA2 kernel tests with radix attention support and better validation
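
As a reference for what an in-place RMSNorm computes, here is a scalar sketch over a single row. It is not the ARM kernel from this commit: `rmsnormInplaceRef` is a hypothetical name, the signature is an assumption, and the standard formula (no weight offset) is assumed because the commit message does not give one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar reference for in-place RMSNorm over one row of length n:
//   x[i] <- x[i] / sqrt(mean(x^2) + eps) * weight[i]
// The input buffer is overwritten, so no output allocation is needed.
inline void rmsnormInplaceRef(float* x, const float* weight, std::size_t n, float eps) {
  assert(x != nullptr && weight != nullptr && n > 0);
  float sum_sq = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum_sq += x[i] * x[i];
  const float inv_rms = 1.0f / std::sqrt(sum_sq / static_cast<float>(n) + eps);
  // The first pass only reads x, so overwriting it in the second pass is safe.
  for (std::size_t i = 0; i < n; ++i) x[i] = x[i] * inv_rms * weight[i];
}
```

A scalar version like this is mainly useful as a correctness baseline when testing a vectorized fp32 or fp16 kernel.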

feat(cpu): add inplace rmsnorm implementations for fp32 and fp16

- Skip tensor data printing when trace mode is enabled

- Add QNNMulOp class with reshape implementation for broadcasting
- Implement QNNMulPattern to add ElementWiseMultiply nodes to QNN graph
- Update QNNAddPattern to use standard ElementWiseAdd operator
- Add tensor shape compatibility checks for Mul operations
- Include proper error handling for tensor operations and backend access
- Add factory class for QNNMulOp creation

- Switch implementation from Conv2d to FullyConnected operator
- Reshape weights to 2D [out_channels, in_channels] format
- Convert bias to int32 type for proper quantization handling
- Remove unused biasInt32_ tensor member
- Update reshape logic to flatten input for FullyConnected
- Add keep_dims parameter for HTP support
- Remove stride and pad parameters for Conv2d
- Simplify bias conversion logic for quantized operations

- Implemented hash() method combining tensor uuid and attached views uuids
- Updated tensor IR caching to use hash instead of uuid
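
One common way to fold several ids into a single cache key is a Boost-style hash-combine step, sketched below. Whether mllm's `hash()` uses this mixing is not stated in the commit message; `hashCombine` and `tensorHash` are hypothetical names, and 64-bit integer uuids are an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Boost-style hash_combine step: mixes `value` into an existing seed.
inline std::size_t hashCombine(std::size_t seed, std::uint64_t value) {
  const std::size_t h = std::hash<std::uint64_t>{}(value);
  return seed ^ (h + static_cast<std::size_t>(0x9e3779b97f4a7c15ULL) + (seed << 6) + (seed >> 2));
}

// Folds the tensor's own uuid and the uuids of its attached views into one
// key, so two tensors with the same storage but different views hash apart.
inline std::size_t tensorHash(std::uint64_t tensor_uuid,
                              const std::vector<std::uint64_t>& view_uuids) {
  std::size_t h = std::hash<std::uint64_t>{}(tensor_uuid);
  for (std::uint64_t v : view_uuids) h = hashCombine(h, v);
  return h;
}
```

Because the fold is order-sensitive, the view list must be visited in a stable order for the key to be reproducible.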

- Add QNNX2XOp to handle data transfer between CPU and QNN shared buffer
- Implement forward method to perform memory copy using std::memcpy
- Create QNNX2XOpFactory for op creation in QNN backend
- Add QNNX2XPattern as a placeholder that should not appear in QNN graph
- Include OpTypes header in QNNDispatcher
- Execute X2X op setup and forward in QNN dispatcher for kX2X operations

…ding

feat: Implement Qwen NPU Decoding Support with Memory Management Fixes

- ensure CausalMask layer is materialized on CPU before running kernel tests
- add deterministic Prefill/Decode/Append regressions based on runScenario helper
- exercise new coverage under build-tests/bin/Mllm-Test-CPUKernel --gtest_filter=CausalMaskOpTest.*

…aths

- Update Hexagon SDK requirement from 5.x to 6.x in documentation
- Adjust Makefile execution logic in HexagonMakeTask to use updated paths
- Update library names from 'libQnnMllmPackage' to 'libQnnLLaMAPackage'
- Modify build configuration files to reflect new package location
- Ensure proper renaming of CPU and HTP libraries after build

QNN Op Package Migrate to v2

feat: add DeepSeek-OCR support, C++ API updates, and dual-model loadi…

test: fix CausalMaskOp CPU coverage

…configurations

- Add detailed documentation for mllm's operator plugin system
- Document in-tree and out-of-tree operator registration methods
- Include examples for implementing custom operators and factories
- Add plugin descriptor and build configuration guidelines
- Update model configuration examples with GGUF quantization hints
- Document supported quantization types in mllm-quantizer
- Add guidance on selecting appropriate quantization methods
- Remove outdated backend addition guide from quick start index

update docs

Added a note about model version compatibility and recommendations.

feat(build): update threading options for Apple GCD support in build configurations

fix(docs): update links for Qwen2 and Qwen2.5 models in README

feat(docs): add mllm-params-inspector tool usage instructions to README

Add build status entries for OrangePi AI Pro (310B) and OrangePi AI Studio (310P) with Ubuntu 22.04 in the compatibility matrix.

docs(readme): add OrangePi AI Pro and Studio build status

coderabbitai · 2025-11-23T08:27:57Z

Caution

Review failed

The pull request is closed.

Walkthrough

Comprehensive v2 repository restructure introducing modernized build infrastructure (CMake with extensive feature flags), new lazy visual-language model algorithms with dynamic KV caching, extensive C++ SDK preparation, multi-platform Docker support, refined development tooling (clang-format, clang-tidy, devcontainers), GitHub CI/CD workflows, and extensive API documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build & CMake Infrastructure: `CMakeLists.txt`, `cmake/CPM.cmake`, `cmake/mllmConfig.cmake.in`, `benchmarks/CMakeLists.txt`, `benchmarks/cpu/CMakeLists.txt`, `benchmarks/ext_stl/CMakeLists.txt`, `algorithms/*/CMakeLists.txt` | Modernized CMake v3.21+ configuration with C++20, extensive feature flags (MLLM_ENABLE_TEST, MLLM_BUILD_*_BACKEND, MLLM_EXT_ENABLE, etc.), CPM package manager integration, Git commit hash detection, Tracy profiling, and platform-specific thread vendors (OpenMP, Apple GCD). |
| Lazy VLM Algorithm Implementation: `algorithms/lazy_vlm/HKVCache.{hpp,cpp}`, `algorithms/lazy_vlm/HKVCacheFast.{hpp,cpp}`, `algorithms/lazy_vlm/models/qwen2_5vl/*.{hpp,cpp}`, `algorithms/lazy_vlm/models/qwen2vl/*.{hpp,cpp}`, `algorithms/lazy_vlm/LazyVLMQwen2.cpp`, `algorithms/lazy_vlm/run.py` | Complete lazy visual-language model system with hierarchical KV caching (HKVCache, HKVCacheFast), Qwen2.5VL and Qwen2VL model implementations, attention-based pruning, dynamic token selection, and Python build/deployment scripts. |
| Benchmark Implementations: `benchmarks/cpu/arm_mllm_blas_sgemm.cpp`, `benchmarks/ext_stl/intrusive_ptr.cpp` | ARM CPU BLAS benchmarking (GEMV, batched GEMV with NEON optimization) and intrusive pointer performance comparison against std::shared_ptr. |
| Code Quality & Formatting: `.clang-format`, `.clang-tidy`, `.clang-tidy.ignore`, `.pre-commit-config.yaml`, `.clangd` | Updated clang-format with C++20 focus, C++ column limit 128, 2-space indentation; clang-tidy enabled with warnings-as-errors, broader checks (google-*, modernize-*, performance-*); pre-commit hook for clang-format; clangd C++20 configuration. |
| Development Environment: `.devcontainer/*/devcontainer.json`, `docker/Dockerfile.*`, `docker/README.md`, `.editorconfig`, `.vscode/*` | Multi-platform devcontainers (ARM, CUDA 12.4, CUDA 12.8, QNN) with VSCode extensions; Dockerfiles for ARM NDK, CUDA variants, and QNN SDK; editor config for project-wide formatting. |
| GitHub Workflows & Templates: `.github/workflows/*.yml`, `.github/ISSUE_TEMPLATE/*.yml`, `.github/pull_request_template.md`, `.github/copilot-instructions.md` | CI pipelines for Android NDK, macOS Apple Silicon, documentation deployment, and pymllm nightly builds; structured issue templates (bugs, features, model support, performance, research); PR template and Copilot guidelines. |
| Project Metadata & Configuration: `.gitignore`, `.gitmodules`, `CODEOWNERS`, `LICENSE`, `AUTHORS`, `README.md`, `algorithms/.gitignore` | Restructured .gitignore to granular file patterns, replaced submodules (removed android/pybind11, added fmt/benchmark/kleidiai/cccl/cutlass/llvm-project/tokenizers), CODEOWNERS for code ownership, updated LICENSE year to 2025, regenerated AUTHORS, comprehensive README redesign. |
| Documentation Structure: `docs/`, `docs/conf.py`, `docs/index.rst`, `docs/api/`, `docs/arch/`, `docs/cache/`, `docs/compile/`, `docs/cpu_backend/`, `docs/qnn_backend/`, `docs/contribute/`, `docs/qa/` | Complete Sphinx documentation suite with C++ API reference (Tensor, Module, Layer, NN, Functional, ARGeneration), architecture guides (module/layer/dispatcher, IR levels, op plugin system, tensor layout), backend-specific docs (CPU threads/FA2/ARM/X86, QNN design), contribution guidelines, and FAQ. |
| Fancy Algorithm Skeleton: `algorithms/fancy_algorithm/{.gitignore,CMakeLists.txt,README.md,main.cpp,run.py}` | Minimal custom algorithm development template with CMake configuration, empty main entry point, and Android build/deployment script. |

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant Git as Git Workflow
    participant Build as CMake Build
    participant CI as GitHub Actions
    participant Img as Container Image
    participant Test as Testing

    Dev->>Git: Push to v2 branch
    Git->>CI: Trigger workflow (PR/push)
    
    par CI Jobs
        CI->>Img: build-android (Ubuntu)
        Img->>Build: cmake + Ninja (NDK r28b)
        Build->>Test: Execute Android build
        
        CI->>Img: build-macos (macOS)
        Img->>Build: cmake + Clang (Apple Silicon)
        Build->>Test: Execute macOS build
        
        CI->>Img: docs-deploy (Ubuntu)
        Img->>Build: Sphinx + Doxygen
        Build->>Test: Build docs → Deploy Pages
        
        CI->>Img: pymllm-nightly (macOS)
        Img->>Build: Build wheel (bump version)
        Build->>Test: Upload PyPI artifact
    end
    
    Dev->>Build: Local: cmake (C++20, feature flags)
    Build->>Build: Link MllmRT, MllmCPUBackend, etc.
    Build->>Test: Run benchmarks (ARM BLAS, IntrusivePtr)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Key areas requiring attention:

- Lazy VLM Algorithm Implementation (algorithms/lazy_vlm/HKVCache*.{hpp,cpp}, modeling_qwen2*.hpp): Dense multi-layer cache management, attention-based pruning logic, dynamic tensor slicing, and per-layer state tracking across prefill/decode phases. Critical for correctness of visual token selection and KV cache consistency.
- CMakeLists.txt Overhaul: Extensive new CMake options (30+ flags), intricate platform-specific logic (OpenMP vs. Apple GCD), CPM package integration, and multi-backend build targets (ARM, CUDA, QNN, Ascend). Requires validation of conditional dependencies and build artifact organization.
- Submodule Restructuring (.gitmodules): Significant replacement of tracked dependencies; impacts reproducibility and vendored library integration across multiple backend paths.
- CI/CD Workflow Changes: Removal of main.yml and introduction of multiple specialized workflows; risk of CI coverage gaps if branch/path patterns are misconfigured.
- Configuration Consistency: Clang-format, clang-tidy, and clangd changes must align with project standards and not conflict with existing IDE/linter expectations.

Possibly related PRs

- feat(cli): add mllm-llm-benchmark tool for performance testing #481: Shares Git commit hash propagation logic in CMakeLists.txt; overlapping version/metadata handling.
- feat: Implement Qwen NPU Decoding Support with Memory Management Fixes #537: Modifies KV-cache interfaces (HKVCache, cache update rules, reordering); directly related to lazy VLM cache implementation.
- feat(thread-pool): implement HpcThreadPool for efficient CPU task management and update build configurations #531: Touches threading infrastructure (OpenMP/GCD CMake options, thread-pool initialization); overlaps with MLLM_KERNEL_THREADS_VENDOR_* flags.

Suggested reviewers

yirongjie
chenghuaWang
liang1232018

Poem

🐰 A bunny hops through v2's grand maze,
With caches bright and algorithms ablaze!
From clang-format to docs so fine,
And workflows that build in parallel lines.
Lazy tokens pruned with bunny care,
MLLM now blooms—a framework beyond compare! 🌟

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db227d7 and 0ff1f20.

⛔ Files ignored due to path filters (18)

assets/australia.jpg is excluded by !**/*.jpg
assets/bird_audio.wav is excluded by !**/*.wav
assets/bird_image.jpg is excluded by !**/*.jpg
assets/bus.png is excluded by !**/*.png
assets/car_audio.wav is excluded by !**/*.wav
assets/car_image.jpg is excluded by !**/*.jpg
assets/cat.jpg is excluded by !**/*.jpg
assets/chat_record_demo.png is excluded by !**/*.png
assets/dog_audio.wav is excluded by !**/*.wav
assets/dog_image.jpg is excluded by !**/*.jpg
assets/shadow_execution.png is excluded by !**/*.png
assets/two_cats.jpg is excluded by !**/*.jpg
assets/uidemo.jpg is excluded by !**/*.jpg
assets/uidemo2.png is excluded by !**/*.png
docs/_static/img/arch.png is excluded by !**/*.png
docs/_static/img/qnn-trace-execute-seq.png is excluded by !**/*.png
docs/_static/img/tensor-storage.png is excluded by !**/*.png
mllm-cli/go.sum is excluded by !**/*.sum

📒 Files selected for processing (107)

.clang-format (1 hunks)
.clang-tidy (1 hunks)
.clang-tidy.ignore (1 hunks)
.clangd (1 hunks)
.devcontainer/arm/devcontainer.json (1 hunks)
.devcontainer/cu124/devcontainer.json (1 hunks)
.devcontainer/cu128/devcontainer.json (1 hunks)
.devcontainer/qnn/devcontainer.json (1 hunks)
.editorconfig (1 hunks)
.github/ISSUE_TEMPLATE/01-bugs-report.yml (1 hunks)
.github/ISSUE_TEMPLATE/02-feature_request.yml (1 hunks)
.github/ISSUE_TEMPLATE/03-model-support-request.yml (1 hunks)
.github/ISSUE_TEMPLATE/04-performance.yml (1 hunks)
.github/ISSUE_TEMPLATE/05-research-experiment.yml (1 hunks)
.github/copilot-instructions.md (1 hunks)
.github/pull_request_template.md (1 hunks)
.github/workflows/build-android.yml (1 hunks)
.github/workflows/build-osx.yml (1 hunks)
.github/workflows/docs-deploy.yml (1 hunks)
.github/workflows/main.yml (0 hunks)
.github/workflows/pymllm-macos-nightly.yml (1 hunks)
.gitignore (1 hunks)
.gitmodules (1 hunks)
.pre-commit-config.yaml (1 hunks)
.vscode/extensions.json (1 hunks)
.vscode/settings_recommended.json (1 hunks)
AUTHORS (1 hunks)
CMakeLists.txt (1 hunks)
CODEOWNERS (1 hunks)
LICENSE (2 hunks)
README.md (4 hunks)
algorithms/.gitignore (1 hunks)
algorithms/fancy_algorithm/.gitignore (1 hunks)
algorithms/fancy_algorithm/CMakeLists.txt (1 hunks)
algorithms/fancy_algorithm/README.md (1 hunks)
algorithms/fancy_algorithm/main.cpp (1 hunks)
algorithms/fancy_algorithm/run.py (1 hunks)
algorithms/lazy_vlm/.gitignore (1 hunks)
algorithms/lazy_vlm/CMakeLists.txt (1 hunks)
algorithms/lazy_vlm/HKVCache.cpp (1 hunks)
algorithms/lazy_vlm/HKVCache.hpp (1 hunks)
algorithms/lazy_vlm/HKVCacheFast.cpp (1 hunks)
algorithms/lazy_vlm/HKVCacheFast.hpp (1 hunks)
algorithms/lazy_vlm/LazyVLMQwen2VL.cpp (1 hunks)
algorithms/lazy_vlm/LazyVLMQwen2VLFast.cpp (1 hunks)
algorithms/lazy_vlm/LazyVLMQwen2_5VL.cpp (1 hunks)
algorithms/lazy_vlm/LazyVLMQwen2_5VLFast.cpp (1 hunks)
algorithms/lazy_vlm/models/qwen2_5vl/lazy_vlm_cfg.hpp (1 hunks)
algorithms/lazy_vlm/models/qwen2_5vl/lazy_vlm_cfg_fast.hpp (1 hunks)
algorithms/lazy_vlm/models/qwen2_5vl/modeling_qwen2_5vl.hpp (1 hunks)
algorithms/lazy_vlm/models/qwen2_5vl/modeling_qwen2_5vl_fast.hpp (1 hunks)
algorithms/lazy_vlm/models/qwen2vl/lazy_vlm_cfg.hpp (1 hunks)
algorithms/lazy_vlm/models/qwen2vl/modeling_qwen2vl.hpp (1 hunks)
algorithms/lazy_vlm/run.py (1 hunks)
algorithms/lazy_vlm/run_remote_android.py (1 hunks)
android (0 hunks)
benchmarks/CMakeLists.txt (1 hunks)
benchmarks/cpu/CMakeLists.txt (1 hunks)
benchmarks/cpu/arm_mllm_blas_sgemm.cpp (1 hunks)
benchmarks/ext_stl/CMakeLists.txt (1 hunks)
benchmarks/ext_stl/intrusive_ptr.cpp (1 hunks)
cmake/CPM.cmake (1 hunks)
cmake/mllmConfig.cmake.in (1 hunks)
docker/Dockerfile.arm (1 hunks)
docker/Dockerfile.cu124 (1 hunks)
docker/Dockerfile.cu128 (1 hunks)
docker/Dockerfile.qnn (1 hunks)
docker/README.md (1 hunks)
docs/.gitignore (1 hunks)
docs/Doxyfile (1 hunks)
docs/Makefile (1 hunks)
docs/algorithms/index.rst (1 hunks)
docs/algorithms/pruning.rst (1 hunks)
docs/api/argeneration.rst (1 hunks)
docs/api/functional.rst (1 hunks)
docs/api/index.rst (1 hunks)
docs/api/layer.rst (1 hunks)
docs/api/mllm.rst (1 hunks)
docs/api/module.rst (1 hunks)
docs/api/nn.rst (1 hunks)
docs/api/tensor.rst (1 hunks)
docs/arch/arch.rst (1 hunks)
docs/arch/index.rst (1 hunks)
docs/arch/op_plugin_system.rst (1 hunks)
docs/arch/support_ops.rst (1 hunks)
docs/arch/tensor.rst (1 hunks)
docs/cache/index.rst (1 hunks)
docs/compile/index.rst (1 hunks)
docs/compile/ir.rst (1 hunks)
docs/conf.py (1 hunks)
docs/contribute/guidelines.rst (1 hunks)
docs/contribute/index.rst (1 hunks)
docs/contribute/model_supports.rst (1 hunks)
docs/contribute/roadmap.rst (1 hunks)
docs/cpu_backend/arm/index.rst (1 hunks)
docs/cpu_backend/arm/mllm_blas.rst (1 hunks)
docs/cpu_backend/arm/multithread_behaviors.rst (1 hunks)
docs/cpu_backend/fa2_radix_paged.rst (1 hunks)
docs/cpu_backend/index.rst (1 hunks)
docs/cpu_backend/threads.rst (1 hunks)
docs/cpu_backend/x86/index.rst (1 hunks)
docs/index.rst (1 hunks)
docs/make.bat (1 hunks)
docs/qa/index.rst (1 hunks)
docs/qnn_backend/core_design.rst (1 hunks)
docs/qnn_backend/index.rst (1 hunks)
docs/qnn_backend/qnn_model_convert.rst (1 hunks)

⛔ Files not processed due to max files limit (17)

docs/qnn_backend/setup_env.rst
docs/quantization/data_types.rst
docs/quantization/how_to_add_new_dtype.rst
docs/quantization/index.rst
docs/quick_start/how_to_add_backend.rst
docs/quick_start/how_to_add_op.rst
docs/quick_start/how_to_async.rst
docs/quick_start/how_to_model.rst
docs/quick_start/how_to_perf.rst
docs/quick_start/index.rst
docs/requirements.txt
docs/service/index.rst
docs/service/mllm_cli.rst
docs/talks/index.rst
examples/CMakeLists.txt
examples/deepseek_ocr/CMakeLists.txt
examples/deepseek_ocr/main.cpp

chenghuaWang and others added 30 commits October 15, 2025 22:02

Merge pull request #481 from chenghuaWang/v2

99446d3

feat(cli): add mllm-llm-benchmark tool for performance testing

feat(linalg): add name attribute printing for AOp in dump method

21f5173

feat(qnn): refactor QNNLinearOp reshape and quantization handling

7f7f581

feat(qnn): implement QNNRMSNormOp for RMS normalization

9ce5427

Merge branch 'UbiquitousLearning:v2' into v2

d08c31a

fix(qnn): remove redundant dtype checks in cast type operations

fcf96e0

feat(devcontainer): add cu128 development container configuration

ea8594b

Adds a new devcontainer.json file for cu128 environment with comprehensive VS Code extension setup including Python, C++, debugging, and formatting tools.

Merge pull request #482 from chenghuaWang/v2

72f292e

feat(qwen3): add config and quantization files for 0.6B model

feat(qnn): implement QNNAddOp for element-wise addition

22bf73d

Merge branch 'UbiquitousLearning:v2' into v2

68f516e

Merge pull request #483 from chenghuaWang/v2

3401c62

feat(cpu): add inplace rmsnorm implementations for fp32 and fp16

feat(mllm): add trace mode support for tensor formatting

5f2fe95

- Skip tensor data printing when trace mode is enabled

feat(nn): rename module impl to unique name during trace

71830cd

feat(cache): add clearCache functionality to KVCache and StaticCache

0708487

feat(cpu): add clearCache method to KVCacheOp

4896582

feat(qnn): implement QNNSiLUOp for SiLU activation support

015c5ea

feat(qnn): implement QNN transpose and view operations

3186a7f

feat(qnn): add QNNParamOp implementation

f1be02f

feat(tensor): add hash computation based on tensor and views uuids

85f7649

- Implemented hash() method combining tensor uuid and attached views uuids
- Updated tensor IR caching to use hash instead of uuid

feat(qnn): add quantization helper functions for QNN tensors

749d0cc

jialilve and others added 28 commits November 20, 2025 06:24

Merge remote-tracking branch 'upstream/v2' into feature/qwen-npu-deco…

e5ac2d4

…ding

chore: bump kleidiai submodule to 84796ec

2f6077b

Merge branch 'v2' into feat/deepseek-ocr-support

2b99c1d

Merge pull request #537 from jialilve/feature/qwen-npu-decoding

58da27e

feat: Implement Qwen NPU Decoding Support with Memory Management Fixes

feat: add Optimized DeepSeek-CLI support

3add611

add tests/cpu/CausalMaskOpTest.hpp

9addc5b

Merge branch 'UbiquitousLearning:v2' into v2

7172364

Merge pull request #539 from oreomaker/v2

091ee54

QNN Op Package Migrate to v2

Merge pull request #534 from yuerqiqi/feat/deepseek-ocr-support

9d081db

feat: add DeepSeek-OCR support, C++ API updates, and dual-model loadi…

update mllm/backends/cpu/kernels/common/paged_attn/fwd_bshd.hpp

8fa9644

Merge pull request #538 from jialilve/feature/qwen-npu-decoding

93378ac

test: fix CausalMaskOp CPU coverage

feat(build): update threading options for Apple GCD support in build …

ab2935f

…configurations

Merge branch 'UbiquitousLearning:v2' into v2

12d32ce

Merge branch 'UbiquitousLearning:v2' into v2

c85d2e0

Merge pull request #541 from oreomaker/v2

9737443

update docs

Update MLLM V1 support details in README

a29e1d5

Add note on model version compatibility

94e7e31

Added a note about model version compatibility and recommendations.

Merge pull request #540 from chenghuaWang/v2

44018f1

feat(build): update threading options for Apple GCD support in build configurations

fix(docs): update links for Qwen2 and Qwen2.5 models in README

f1c775c

Merge pull request #542 from chenghuaWang/v2

e76fffa

fix(docs): update links for Qwen2 and Qwen2.5 models in README

feat(docs): add mllm-params-inspector tool usage instructions to README

0fb4ba5

Merge branch 'UbiquitousLearning:v2' into v2

73725e8

Merge pull request #543 from chenghuaWang/v2

612b220

feat(docs): add mllm-params-inspector tool usage instructions to README

docs(readme): add OrangePi AI Pro and Studio build status

77630f2

Add build status entries for OrangePi AI Pro (310B) and OrangePi AI Studio (310P) with Ubuntu 22.04 in the compatibility matrix.

Merge pull request #544 from chenghuaWang/v2

0ff1f20

docs(readme): add OrangePi AI Pro and Studio build status

oreomaker closed this Nov 23, 2025

oreomaker wants to merge 812 commits into main from v2

7 participants
