feat(cpu, smollm3-tokenizer): add KAI SGEMM NEON implementation for ARM by chenghuaWang · Pull Request #503 · UbiquitousLearning/mllm

chenghuaWang · 2025-11-03T13:21:13Z

feat(cpu): add KAI SGEMM NEON implementation for ARM

Introduce KaiLinear_fp32_fp32_fp32p_mxk_kxn kernel for fp32 GEMM on ARM NEON
Add new linear implementation types: kMllmBlas_KAI_SGEMM_NT_NT_NEON and kMllmBlas_KAI_SGEMM_NT_T_SME
Update CMake options with Android performance hints and profiling components
Enhance ParameterFile loading with optional mmap support
Refactor matmul tests to include manual reference computation
Add Android performance hint headers for future optimizations

This commit enables high-performance fp32 linear operations on ARM CPUs using KAI kernels,
provides better control over memory mapping during model loading, and improves test coverage
for BLAS-like operations.

feat(examples): add smollm3 example with tokenizer and chat template

Add CMakeLists.txt for smollm3 example executable
Implement main.cpp with SmolLM3Tokenizer usage
Include tokenization logic with thinking/non-thinking templates
Support dynamic date insertion in chat templates
Enable BPE-based encoding/decoding workflows
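
The dynamic date insertion mentioned above can be pictured as plain string substitution on a template. The following is only a minimal sketch under assumptions: the placeholder names `{{today_date}}` and `{{prompt}}`, the date format, and the helpers `formatDate` and `renderTemplate` are hypothetical and not taken from `tokenization_smollm3.hpp`; only the existence of a `replaceAll` helper is suggested by the static analysis output for that file.

```cpp
#include <array>
#include <cstddef>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
  if (from.empty()) { return text; }
  std::size_t pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // skip the inserted text so it is never re-scanned
  }
  return text;
}

// Format a calendar date as "03 November 2025" without relying on the locale.
// The format itself is an assumption made for this sketch.
std::string formatDate(const std::tm& date) {
  static const std::array<const char*, 12> kMonths = {
      "January", "February", "March",     "April",   "May",      "June",
      "July",    "August",   "September", "October", "November", "December"};
  std::ostringstream oss;
  oss << std::setfill('0') << std::setw(2) << date.tm_mday << ' '
      << kMonths.at(static_cast<std::size_t>(date.tm_mon)) << ' ' << (1900 + date.tm_year);
  return oss.str();
}

// Fill the two hypothetical placeholders of a chat template.
// The date is substituted first, so a prompt that happens to contain
// "{{today_date}}" is inserted verbatim.
std::string renderTemplate(const std::string& tpl, const std::string& prompt, const std::tm& date) {
  return replaceAll(replaceAll(tpl, "{{today_date}}", formatDate(date)), "{{prompt}}", prompt);
}
```

To use the current day, a caller could pass the `std::tm` obtained from `std::time` and `std::localtime`; passing the date in explicitly keeps the function deterministic for tests.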

Summary by CodeRabbit

Release Notes

New Features
- Added SmolLM3 3B tokenizer with chat template and byte-pair encoding support
- Added example demonstrating SmolLM3 model inference
Performance
- Introduced memory-mapped parameter file loading for improved memory efficiency
- Added Arm NEON-optimized fp32 linear layer kernel for faster computation
Configuration
- Added Android burst performance hints build option


coderabbitai · 2025-11-03T13:21:24Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds SmolLM3 model support with a dedicated tokenizer implementation, introduces fp32 linear kernel implementations for ARM64 via Kai, enables memory-mapped (mmap) parameter file loading, integrates new Kai SGEMM NEON/SME backend implementations, and adds Android burst performance hints configuration across the build system and CPU backend infrastructure.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CMake Configuration: `CMakeLists.txt`, `examples/CMakeLists.txt` | Adds `MLLM_ANDROID_BURST_PERFORMANCE_HINTS` cache option with default OFF; adds smollm3 example subdirectory reference. |
| SmolLM3 Example Project: `examples/smollm3/CMakeLists.txt`, `examples/smollm3/main.cpp` | Introduces new executable `mllm-smollm3-runner` linked to MllmRT and MllmCPUBackend, with example program demonstrating chat-template encoding and decoding via SmolLM3Tokenizer. |
| ARM Kai fp32 Kernels: `mllm/backends/cpu/kernels/arm/linear/kai.hpp`, `mllm/backends/cpu/kernels/arm/linear/kai.cpp` | Adds new `KaiLinear_fp32_fp32_fp32p_mxk_kxn` kernel structure with methods for workspace sizing, RHS packing, and tiled matmul using Neon-optimized Kai ukernels; includes fp32-specific headers for matmul and RHS packing. |
| CPU Ops Integration: `mllm/backends/cpu/ops/LinearOp.cpp` | Extends load path to recognize Kai SGEMM NEON/SME impl types; adds BLAS selection logic, RHS pre-transposition, and Kai packing in post-load phase; adds dedicated Kai NEON forward path with kernel invocation. |
| Op Enumeration: `mllm/core/aops/LinearOp.hpp` | Adds two new `LinearImplTypes` enumerators: `kMllmBlas_KAI_SGEMM_NT_NT_NEON` and `kMllmBlas_KAI_SGEMM_NT_T_SME`. |
| MatMul Op: `mllm/backends/cpu/ops/MatMulOp.cpp` | Adds informational comment noting kGGUF is buggy in auto matmul-type inference branch. |
| Parameter File mmap Support: `mllm/core/ParameterFile.hpp`, `mllm/core/ParameterFile.cpp` | Extends read() signatures with optional `bool mmap = true` parameter; implements dual code paths for mmap-based and traditional binary file loading for V1 and V2 CPU model files, with descriptor parsing and tensor mapping. |
| Public API: `mllm/mllm.hpp`, `mllm/mllm.cpp` | Adds optional `bool mmap = true` parameter to `load()` function; propagates mmap flag to underlying ParameterFileIOImpl calls. |
| SmolLM3 Tokenizer: `mllm/models/smollm3_3B/tokenization_smollm3.hpp` | Introduces `SmolLM3Tokenizer` class extending `AutoTokenizerUTF8` with BPE-based encode/decode, chat-template support, special-token trie management, and byte-level pre-tokenization; includes `SmolLM3Message` struct with template strings for thinking/non-thinking modes. |
| Android Hints: `mllm/engine/hints/Android.hpp` | Adds new header file with standard guards and includes for Android performance hints infrastructure. |
| Tests: `tests/cpu/MllmBlasArmSgemmKernelTest.hpp`, `tests/cpu/MllmBlasArmSgemvKernelTest.hpp` | Replaces BLAS matmul calls with explicit nested-loop CPU computation in Sgemm tests; updates Sgemv test with uniform random tensor initialization and shape adjustments from {1, D} to {1, S}. |

Sequence Diagram(s)

sequenceDiagram
    participant User as Application
    participant Loader as ParameterFile::load()
    participant Reader as ParameterFileIOImpl::read()
    participant File as Disk/Memory
    participant Tensor as TensorStorage

    User->>Loader: load(file_name, version, device, mmap=true)
    Loader->>Reader: read(file_path, mmap=true)
    
    alt mmap enabled
        Reader->>File: mmap(file_path)
        Reader->>File: validate header
        Reader->>File: parse descriptors from mapped region
        Reader->>Tensor: create tensor with MMAP memory type
        Tensor-->>Reader: tensor view into mapped data
    else mmap disabled
        Reader->>File: open binary file
        Reader->>File: read header
        Reader->>File: read/allocate descriptors
        Reader->>File: seek to data offset
        Reader->>Tensor: allocate storage
        Reader->>File: read data into storage
    end
    
    Reader-->>Loader: ParameterFile with tensors
    Loader-->>User: ParameterFile::ptr_t

sequenceDiagram
    participant App as Application
    participant LinearOp as LinearOp
    participant Loader as Load Phase
    participant Forward as Forward Phase
    participant Kai as KaiLinear_fp32

    App->>LinearOp: load(impl_type=KAI_SGEMM_NT_NT_NEON)
    LinearOp->>Loader: identify impl_type
    Loader->>Loader: select BLAS or default backend
    Loader->>Loader: pretranspose weight if needed
    Loader->>Kai: quant_pack_rhs_offline(weight, bias)
    Kai-->>Loader: packed_weight
    Loader->>LinearOp: store packed_weight
    
    App->>LinearOp: forward(lhs, weight, bias)
    LinearOp->>Forward: route to KAI_SGEMM_NT_NT_NEON case
    Forward->>Kai: matmul(dst, lhs, packed_weight, workspace, M, K, N, threads)
    Kai->>Kai: tile matmul with M/N steps
    Kai->>Kai: invoke ukernel for each tile
    Kai-->>Forward: compute result into dst
    Forward-->>LinearOp: dst
    LinearOp-->>App: output
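
The load-time packing and forward-time tiling in the diagram above can be illustrated with a scalar sketch. Everything here is an assumption made for illustration: the panel layout (one bias row followed by K weight rows per block of columns), the panel width `kPanelWidth`, and the names `packRhs` and `matmulPacked` are hypothetical and do not reflect the actual KleidiAI packed format or the `KaiLinear_fp32_fp32_fp32p_mxk_kxn` interface. It also shows the null-bias case being treated as an all-zero bias.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kPanelWidth = 4;  // hypothetical number of columns per packed panel

// Pack a row-major K x N matrix into column panels. Each panel stores one
// bias row followed by K weight rows, all padded to kPanelWidth columns.
// A null bias is packed as zeros.
std::vector<float> packRhs(const float* rhs, const float* bias, std::size_t K, std::size_t N) {
  const std::size_t panels = (N + kPanelWidth - 1) / kPanelWidth;
  std::vector<float> packed(panels * (K + 1) * kPanelWidth, 0.0f);
  for (std::size_t p = 0; p < panels; ++p) {
    float* panel = packed.data() + p * (K + 1) * kPanelWidth;
    const std::size_t n0 = p * kPanelWidth;
    const std::size_t width = std::min(kPanelWidth, N - n0);
    for (std::size_t j = 0; j < width; ++j) { panel[j] = (bias != nullptr) ? bias[n0 + j] : 0.0f; }
    for (std::size_t k = 0; k < K; ++k) {
      for (std::size_t j = 0; j < width; ++j) { panel[(k + 1) * kPanelWidth + j] = rhs[k * N + n0 + j]; }
    }
  }
  return packed;
}

// dst (M x N) = lhs (M x K) * rhs + bias, walking the output in tiles of
// m_step rows by kPanelWidth columns. The innermost loops stand in for the
// micro-kernel that a real backend would call once per tile.
void matmulPacked(float* dst, const float* lhs, const std::vector<float>& packed,
                  std::size_t M, std::size_t K, std::size_t N, std::size_t m_step = 2) {
  for (std::size_t m0 = 0; m0 < M; m0 += m_step) {
    const std::size_t m_end = std::min(M, m0 + m_step);  // clamp the last row tile
    for (std::size_t n0 = 0; n0 < N; n0 += kPanelWidth) {
      const float* panel = packed.data() + (n0 / kPanelWidth) * (K + 1) * kPanelWidth;
      const std::size_t width = std::min(kPanelWidth, N - n0);  // clamp the last column tile
      for (std::size_t i = m0; i < m_end; ++i) {
        for (std::size_t j = 0; j < width; ++j) {
          float acc = panel[j];  // start from the packed bias
          for (std::size_t k = 0; k < K; ++k) { acc += lhs[i * K + k] * panel[(k + 1) * kPanelWidth + j]; }
          dst[i * N + n0 + j] = acc;
        }
      }
    }
  }
}
```

Because tiles write disjoint regions of `dst`, the outer tile loops are the natural place to distribute work across threads; the clamping of `m_end` and `width` is where boundary conditions for shapes that are not multiples of the tile size are handled.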

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

ParameterFile mmap implementation (ParameterFile.cpp/hpp): High-density I/O logic with dual code paths for V1/V2, file mapping, descriptor parsing, and memory management requiring careful validation.
Kai fp32 kernel implementations (kai.cpp/hpp): Dense kernel logic with ukernel configuration, tiled matmul scheduling, and multi-threaded execution; requires understanding of Kai/Neon architecture.
LinearOp integration (LinearOp.cpp): Complex control flow additions with new backend routing, pre/post-processing phases, and forward-path dispatching; affects existing code paths.
SmolLM3Tokenizer (tokenization_smollm3.hpp): Substantial tokenizer implementation with BPE vocab management, special-token trie, chat templates, and UTF-8 handling.
Heterogeneous scope: Changes span CMake, kernels, ops, parameter I/O, tokenization, and tests across multiple domains, each requiring separate reasoning.

Areas requiring extra attention:

Mmap memory lifetime and file handle management in ParameterFile.cpp
Thread safety and boundary conditions in Kai tiled matmul loop
Correct weight pre-transposition and packing integration in LinearOp load phase
Special token trie correctness and chat template substitution in tokenizer

Possibly related PRs

PR feat: add kai&qnn-vl&opencl #489: Extends CPU ARM/Kai backend with kernel structures and CMake integration; directly overlaps with this PR's Kai kernel additions and backend wiring.
PR feat(qnn): Basic QNN Prefill on v2 #485: Adds CustomizedOp plugin interface extensions; related through potential operator customization patterns that may interact with new backend implementations.

Suggested reviewers

yirongjie
oreomaker
liang1232018

Poem

🐰 A tokenizer hop and Kai kernels bloom,
Memory-mapped files chase away the gloom,
SmolLM3 speaks in fp32 grace,
While hints burst Android's race,
NEON optimizations light the room! ✨

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'feat(cpu, smollm3-tokenizer): add KAI SGEMM NEON implementation for ARM' accurately captures the main objectives of the changeset. It highlights the primary contribution: adding a KAI SGEMM NEON implementation for ARM processors, while also referencing the smollm3-tokenizer aspect. The title is specific, concise, and clearly conveys the key technical change without being vague or misleading. |


Change memory type assignment from kGlobal to kParamsNormal in ParameterFile.cpp to correctly handle parameter allocation.

chenghuaWang · 2025-11-03T13:22:49Z

@coderabbitai review

coderabbitai · 2025-11-03T13:23:00Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (6)

tests/cpu/MllmBlasArmSgemmKernelTest.hpp (3)
27-44: Optional: address static analysis style hints.

The static analyzer flags several style issues in this block:

Variable naming convention: DST, M, K, N (expected snake_case or camelCase per project style)

Variable name length: M, K, N are flagged as too short

Float literal suffix: 0.0f should be 0.0F (uppercase)

These are not functional issues but could improve consistency if the project enforces these rules.

Example diff (if adopting lowercase naming and uppercase suffix):
-    auto DST = mllm::Tensor::emptyLike(RefDST).alloc();
+    auto dst = mllm::Tensor::emptyLike(RefDST).alloc();

     // Calculate DST.
     {
-      auto dst_ptr = DST.ptr<float>();
+      auto dst_ptr = dst.ptr<float>();
       auto a_ptr = A.ptr<float>();
       auto b_ptr = B.ptr<float>();
-      const int M = S_Q;
-      const int K = S_KV;
-      const int N = D;
-      for (int i = 0; i < M; ++i) {
-        for (int j = 0; j < N; ++j) {
-          float sum = 0.0f;
-          for (int k = 0; k < K; ++k) { sum += a_ptr[i * K + k] * b_ptr[k * N + j]; }
-          dst_ptr[i * N + j] = sum;
+      const int num_rows = S_Q;
+      const int inner_dim = S_KV;
+      const int num_cols = D;
+      for (int i = 0; i < num_rows; ++i) {
+        for (int j = 0; j < num_cols; ++j) {
+          float sum = 0.0F;
+          for (int k = 0; k < inner_dim; ++k) { sum += a_ptr[i * inner_dim + k] * b_ptr[k * num_cols + j]; }
+          dst_ptr[i * num_cols + j] = sum;
         }
       }
     }

-    auto result = mllm::test::allClose(DST, RefDST);
+    auto result = mllm::test::allClose(dst, RefDST);
71-88: Optional: address static analysis style hints.

Similar to the first test, the static analyzer flags style issues:

Variable naming: DST, M, K, N

Variable name length: M, K, N

Float literal suffix: 0.0f vs 0.0F

These are the same concerns as in the previous function. If you decide to address them, apply consistent changes across both test functions.

29-44: Consider extracting common matmul logic to reduce duplication.

Both test functions contain nearly identical nested-loop matrix multiplication logic, differing only in the inner loop's B matrix access pattern (transposed vs non-transposed). Consider extracting a helper function to reduce duplication:

Example helper approach:
// Helper to compute C = A * B (or A * B^T if transpose_b is true)
static void compute_matmul_cpu(
    float* c_ptr, const float* a_ptr, const float* b_ptr,
    int M, int K, int N, bool transpose_b) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float sum = 0.0f;
      for (int k = 0; k < K; ++k) {
        const float b_val = transpose_b ? b_ptr[j * K + k] : b_ptr[k * N + j];
        sum += a_ptr[i * K + k] * b_val;
      }
      c_ptr[i * N + j] = sum;
    }
  }
}
Then call it from both test functions, reducing maintenance burden and improving readability.

Also applies to: 73-88
CMakeLists.txt (1)
45-46: Consider wrapping the long option description for better readability.

The new MLLM_ANDROID_BURST_PERFORMANCE_HINTS option is a good addition for Android performance optimization. However, the description string on Line 46 is quite long (over 120 characters). Consider wrapping it across multiple lines for better maintainability.

Apply this diff to improve readability:
-option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS "If MLLM need use APerformanceHintManager to tell android we need best performance" OFF)
+option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS 
+       "If MLLM need use APerformanceHintManager to tell android we need best performance" 
+       OFF)
Alternatively, shorten the description:
-option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS "If MLLM need use APerformanceHintManager to tell android we need best performance" OFF)
+option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS 
+       "Enable Android APerformanceHintManager for burst performance hints" 
+       OFF)
mllm/backends/cpu/ops/MatMulOp.cpp (1)

53-54: TODO comment indicates known kGGUF bug.

The comment on Line 53 flags that the kGGUF matmul type is still buggy, yet Line 54 still selects it under certain conditions. This could lead to incorrect results in production.

Do you want me to:

Generate a verification script to check if there's an existing issue tracking this bug?

Open a new issue to track this bug with details about when kGGUF is selected and what the expected fix timeline is?

Suggest adding a runtime warning when kGGUF is selected to alert users of potential issues?
examples/smollm3/main.cpp (1)
13-16: Consider adding error handling for invalid tokenizer path.

The example creates a SmolLM3Tokenizer without verifying that the provided path is valid. If the tokenizer file is missing or corrupted, this will likely throw an exception or crash. For a user-facing example, consider adding error handling to provide a helpful message.
   {
+    try {
       auto tokenizer = mllm::models::smollm3::SmolLM3Tokenizer(tokenizer_path.get());
       auto ids = tokenizer.encode(tokenizer.applyChatTemplate("Bonjour 😈", false));
       mllm::print(ids);
       mllm::print(tokenizer.decode(ids));
+    } catch (const std::exception& e) {
+      fmt::print(stderr, "Error loading tokenizer from '{}': {}\n", tokenizer_path.get(), e.what());
+      return 1;
+    }
   }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b301ca and 8494a34.

📒 Files selected for processing (17)

CMakeLists.txt (1 hunks)
examples/CMakeLists.txt (1 hunks)
examples/smollm3/CMakeLists.txt (1 hunks)
examples/smollm3/main.cpp (1 hunks)
mllm/backends/cpu/kernels/arm/linear/kai.cpp (2 hunks)
mllm/backends/cpu/kernels/arm/linear/kai.hpp (2 hunks)
mllm/backends/cpu/ops/LinearOp.cpp (4 hunks)
mllm/backends/cpu/ops/MatMulOp.cpp (1 hunks)
mllm/core/ParameterFile.cpp (2 hunks)
mllm/core/ParameterFile.hpp (1 hunks)
mllm/core/aops/LinearOp.hpp (1 hunks)
mllm/engine/hints/Android.hpp (1 hunks)
mllm/mllm.cpp (1 hunks)
mllm/mllm.hpp (1 hunks)
mllm/models/smollm3_3B/tokenization_smollm3.hpp (1 hunks)
tests/cpu/MllmBlasArmSgemmKernelTest.hpp (2 hunks)
tests/cpu/MllmBlasArmSgemvKernelTest.hpp (2 hunks)

🧰 Additional context used

🪛 Clang (14.0.6)

tests/cpu/MllmBlasArmSgemvKernelTest.hpp

[error] 22-22: variable name 'A' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 22-22: invalid case style for variable 'A'

(readability-identifier-naming,-warnings-as-errors)

[error] 23-23: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 23-23: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)

[error] 24-24: variable name 'C' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 24-24: invalid case style for variable 'C'

(readability-identifier-naming,-warnings-as-errors)

[error] 25-25: invalid case style for variable 'DST'

(readability-identifier-naming,-warnings-as-errors)

[error] 35-35: invalid case style for variable 'DSTP'

(readability-identifier-naming,-warnings-as-errors)

mllm/engine/hints/Android.hpp

[error] 5-5: 'android/performance_hint.h' file not found

(clang-diagnostic-error)

tests/cpu/MllmBlasArmSgemmKernelTest.hpp

[error] 27-27: invalid case style for variable 'DST'

(readability-identifier-naming,-warnings-as-errors)

[error] 34-34: variable name 'M' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 34-34: invalid case style for variable 'M'

(readability-identifier-naming,-warnings-as-errors)

[error] 35-35: variable name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 35-35: invalid case style for variable 'K'

(readability-identifier-naming,-warnings-as-errors)

[error] 36-36: variable name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 36-36: invalid case style for variable 'N'

(readability-identifier-naming,-warnings-as-errors)

[error] 39-39: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 71-71: invalid case style for variable 'DST'

(readability-identifier-naming,-warnings-as-errors)

[error] 78-78: variable name 'M' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 78-78: invalid case style for variable 'M'

(readability-identifier-naming,-warnings-as-errors)

[error] 79-79: variable name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 79-79: invalid case style for variable 'K'

(readability-identifier-naming,-warnings-as-errors)

[error] 80-80: variable name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 80-80: invalid case style for variable 'N'

(readability-identifier-naming,-warnings-as-errors)

[error] 83-83: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/linear/kai.hpp

[error] 56-56: invalid case style for class 'KaiLinear_fp32_fp32_fp32p_mxk_kxn'

(readability-identifier-naming,-warnings-as-errors)

[error] 57-57: method 'need_pack_lhs' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

[error] 59-59: method 'need_pack_rhs' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

[error] 66-66: parameter name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 66-66: parameter name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 69-69: parameter name 'M' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 69-69: parameter name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 69-69: parameter name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 72-72: variable 'ukernel_' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)

[error] 72-72: invalid case style for variable 'ukernel_'

(readability-identifier-naming,-warnings-as-errors)

mllm/models/smollm3_3B/tokenization_smollm3.hpp

[error] 5-5: 'string' file not found

(clang-diagnostic-error)

[error] 21-21: do not declare C-style arrays, use std::array<> instead

(cppcoreguidelines-avoid-c-arrays,-warnings-as-errors)

[error] 25-25: constructor does not initialize these fields: prompt

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 29-29: variable 'no_think_template_str' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)

[error] 35-35: variable 'think_template_str' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)

[error] 53-53: constructor does not initialize these fields: bpe_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 69-69: method 'replaceAll' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

[error] 69-69: parameter name 's' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 69-69: 2 adjacent parameters of 'replaceAll' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 69-69: parameter name 'to' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 71-71: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 79-79: variable 't' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 79-79: variable name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 80-80: variable 'tm_' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 80-80: invalid case style for variable 'tm_'

(readability-identifier-naming,-warnings-as-errors)

[error] 82-82: variable 'oss' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 84-84: variable 'date_in_number' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 86-86: do not declare C-style arrays, use std::array<> instead

(cppcoreguidelines-avoid-c-arrays,-warnings-as-errors)

[error] 88-88: variable 'month' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 89-89: variable 'year' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 91-91: variable 'tpl' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 144-144: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

examples/smollm3/main.cpp

[error] 1-1: 'fmt/core.h' file not found

(clang-diagnostic-error)

[error] 8-8: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)

[error] 8-8: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/mllm.cpp

[error] 95-95: 3 adjacent parameters of 'load' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 96-96: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/linear/kai.cpp

[error] 4-4: 'limits' file not found

(clang-diagnostic-error)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-android
GitHub Check: build-macos

🔇 Additional comments (7)

tests/cpu/MllmBlasArmSgemvKernelTest.hpp (2)

22-25: LGTM! Improved test coverage and corrected output shape.

The changes improve the test in two ways:

Random tensor initialization with range [-1, 1] provides better coverage than fixed values

The DST shape correction from {1, D} to {1, S} properly reflects the matrix multiplication result: {1, D} @ {D, S} = {1, S}

35-35: LGTM! Consistent shape correction.

The DSTP shape now correctly matches DST's {1, S} shape, ensuring both the baseline and optimized implementations are tested with the proper output dimensions.

tests/cpu/MllmBlasArmSgemmKernelTest.hpp (2)

27-44: Good approach: manual CPU reference for BLAS testing.

Replacing the BLAS matmul call with an explicit nested-loop implementation provides a clear, simple reference to validate the BLAS kernels against. The logic correctly computes the matrix product for row-major layout.

71-88: Correctly implements transposed matrix multiply.

The manual computation properly handles the transposed B matrix by accessing b_ptr[j * K + k] instead of b_ptr[k * N + j], which correctly represents B^T[k,j] in row-major layout.
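
The index identity behind this comment can be checked directly: when B is stored transposed as an N x K row-major buffer, reading `b[j * K + k]` yields the same value as `b[k * N + j]` on the untransposed K x N buffer. The sketch below is a standalone illustration under assumptions; `referenceMatmul`, `transpose`, and `allCloseSketch` are hypothetical helpers, and the tolerance rule (`|x - y| <= atol + rtol * |y|`) is a common convention assumed here, not necessarily what `mllm::test::allClose` uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// C (m x n) = A (m x k) * B, where B is either k x n (transpose_b == false)
// or stored transposed as n x k (transpose_b == true). All buffers are row-major.
std::vector<float> referenceMatmul(const std::vector<float>& a, const std::vector<float>& b,
                                   std::size_t m, std::size_t k, std::size_t n, bool transpose_b) {
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float sum = 0.0f;
      for (std::size_t kk = 0; kk < k; ++kk) {
        const float b_val = transpose_b ? b[j * k + kk] : b[kk * n + j];
        sum += a[i * k + kk] * b_val;
      }
      c[i * n + j] = sum;
    }
  }
  return c;
}

// Return the transpose of a rows x cols row-major matrix.
std::vector<float> transpose(const std::vector<float>& src, std::size_t rows, std::size_t cols) {
  std::vector<float> dst(src.size());
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) { dst[c * rows + r] = src[r * cols + c]; }
  }
  return dst;
}

// Element-wise closeness check with an assumed tolerance rule.
bool allCloseSketch(const std::vector<float>& x, const std::vector<float>& y,
                    float rtol = 1e-5f, float atol = 1e-8f) {
  if (x.size() != y.size()) { return false; }
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (std::fabs(x[i] - y[i]) > atol + rtol * std::fabs(y[i])) { return false; }
  }
  return true;
}
```

Computing the product once from B and once from its stored transpose should give matching results, which is a cheap sanity check on the reference itself before it is used to judge an optimized kernel.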

mllm/models/smollm3_3B/tokenization_smollm3.hpp (1)

127-128: Clarify the comment about regex processing.

Line 128 states "No need to Regex:" and then creates a single-element initializer list. This suggests regex processing was intentionally skipped or is a placeholder. Consider clarifying why regex processing is not needed here, or if this is temporary code that should be implemented.

Based on the GPT2_EXPR defined at Line 21 and the byteLevelPreTokenizer using unicode_regex_split at Line 164, it appears regex processing should occur. Please verify if this is intentional or if regex-based splitting should be applied here before byte-level pre-tokenization.

examples/CMakeLists.txt (1)

10-10: LGTM!

The addition of the smollm3 subdirectory properly integrates the new SmolLM3 example into the build system.

mllm/mllm.cpp (1)

95-100: LGTM! The mmap parameter is properly propagated.

The addition of the mmap parameter to the load function signature and its propagation to the underlying ParameterFileIOImpl::read calls is clean and consistent. This enables optional memory-mapped file loading for better performance.

Note: The static analysis warning about "easily swappable parameters" is a minor concern. If you want to make the API more robust against parameter swapping, consider using an options struct or strong types in the future, but this is not critical for the current change.
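
One way to picture the options-struct idea from the note above is shown below. This is a hedged sketch, not the mllm API: `LoadOptions`, `ModelFileVersion`, `DeviceKind`, and `describeLoad` are hypothetical names, and the function only returns a description string so the example stays self-contained.

```cpp
#include <string>

// Hypothetical stand-ins for the real version and device types.
enum class ModelFileVersion { kV1, kV2 };
enum class DeviceKind { kCPU };

// Named fields replace a run of adjacent positional parameters, so call
// sites cannot silently swap them.
struct LoadOptions {
  ModelFileVersion version = ModelFileVersion::kV1;
  DeviceKind device = DeviceKind::kCPU;
  bool mmap = true;  // mirrors the default described in this PR
};

// Placeholder for a loader: reports which path would be taken.
std::string describeLoad(const std::string& file_name, const LoadOptions& options = LoadOptions{}) {
  const std::string version = (options.version == ModelFileVersion::kV1) ? "v1" : "v2";
  const std::string path = options.mmap ? "mmap" : "buffered read";
  return file_name + ": " + version + ", " + path;
}
```

A caller that wants the non-mapped path would write `LoadOptions opts; opts.mmap = false;` and pass `opts`, which reads unambiguously at the call site. The trade-off is a slightly more verbose call for the common default case.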

When bias is null, allocate and explicitly zero-initialize the bias array to ensure correct behavior during offline packing.

fix(cpu): use transposed weight dimensions for packing calculations

Corrected the dimension parameters passed to quant_pack_rhs_size and quant_pack_rhs_offline to use transposed weight tensor sizes.

feat(core): add new MllmBlas KAI SGEMM implementation types

Registered new linear implementation types for KAI-based SGEMM with NEON and SME backends in both NT/NT and NT/T configurations.

feat(engine): conditionally include Android performance hints

Wrapped Android-specific headers in preprocessor guards to avoid build errors on non-Android platforms.

refactor(models): mark template strings as const in SmolLM3 tokenizer

Changed static inline string variables to be explicitly const to enforce immutability and improve code clarity.
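
The conditional include described in the feat(engine) commit message above might look roughly like the following. This is a sketch under assumptions: the exact guard used in `mllm/engine/hints/Android.hpp` is not shown in this PR page, and the macro `MLLM_SKETCH_HAS_PERF_HINT` and the function `hasAndroidPerformanceHints` are hypothetical. `__ANDROID__` is the macro predefined by Android toolchains, and `android/performance_hint.h` is the NDK header the static analyzer reported as missing on a non-Android host.

```cpp
// Include the NDK header only when building for Android and the header is
// actually available; otherwise fall back to a stub so other platforms build.
#if defined(__ANDROID__) && defined(__has_include)
#if __has_include(<android/performance_hint.h>)
#include <android/performance_hint.h>
#define MLLM_SKETCH_HAS_PERF_HINT 1  // hypothetical macro for this sketch
#endif
#endif

#ifndef MLLM_SKETCH_HAS_PERF_HINT
#define MLLM_SKETCH_HAS_PERF_HINT 0
#endif

// Compile-time query that callers can branch on instead of repeating the guard.
constexpr bool hasAndroidPerformanceHints() { return MLLM_SKETCH_HAS_PERF_HINT != 0; }
```

Keeping the platform check in one header means the rest of the code can test a single constant, and a build option such as `MLLM_ANDROID_BURST_PERFORMANCE_HINTS` could be combined with it to make the feature opt-in.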

chenghuaWang and others added 3 commits November 3, 2025 17:24

Merge branch 'UbiquitousLearning:v2' into v2

16ee063

chenghuaWang requested review from oreomaker and yirongjie as code owners November 3, 2025 13:21

fix(core): update memory type assignment in ParameterFileIOImpl

8494a34

Change memory type assignment from kGlobal to kParamsNormal in ParameterFile.cpp to correctly handle parameter allocation.

coderabbitai Bot reviewed Nov 3, 2025

View reviewed changes

chenghuaWang merged commit a1b2032 into UbiquitousLearning:v2 Nov 3, 2025
3 checks passed

coderabbitai Bot mentioned this pull request Nov 20, 2025

feat(cpu-backend): add support for SME2 and SVE2 in ARM backend configurations #533

Merged

coderabbitai Bot mentioned this pull request Dec 13, 2025

feat(qwen3, cpu): add support for Qwen3 model on x86 architecture #561

Merged

chenghuaWang merged 5 commits into UbiquitousLearning:v2 from chenghuaWang:v2
