
feat(cpu, smollm3-tokenizer): add KAI SGEMM NEON implementation for ARM #503

Merged: chenghuaWang merged 5 commits into UbiquitousLearning:v2 from chenghuaWang:v2 on Nov 3, 2025

Conversation

@chenghuaWang
Collaborator

@chenghuaWang chenghuaWang commented Nov 3, 2025

feat(cpu): add KAI SGEMM NEON implementation for ARM

  • Introduce KaiLinear_fp32_fp32_fp32p_mxk_kxn kernel for fp32 GEMM on ARM NEON
  • Add new linear implementation types: kMllmBlas_KAI_SGEMM_NT_NT_NEON and kMllmBlas_KAI_SGEMM_NT_T_SME
  • Update CMake options with Android performance hints and profiling components
  • Enhance ParameterFile loading with optional mmap support
  • Refactor matmul tests to include manual reference computation
  • Add Android performance hint headers for future optimizations

This commit enables high-performance fp32 linear operations on ARM CPUs using KAI kernels,
provides better control over memory mapping during model loading, and improves test coverage
for BLAS-like operations.

feat(examples): add smollm3 example with tokenizer and chat template

  • Add CMakeLists.txt for smollm3 example executable
  • Implement main.cpp with SmolLM3Tokenizer usage
  • Include tokenization logic with thinking/non-thinking templates
  • Support dynamic date insertion in chat templates
  • Enable BPE-based encoding/decoding workflows
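
The dynamic date insertion listed above could look roughly like the following. This is a minimal sketch, assuming the chat template carries a literal placeholder that is substituted at render time. The `{{today_date}}` placeholder, the template text, and the names `formatDate` and `renderSystemPrompt` are illustrative assumptions and are not taken from `tokenization_smollm3.hpp`; only the idea of a `replaceAll` helper is mentioned elsewhere in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
inline std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
  assert(!from.empty());
  std::size_t pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // skip the inserted text so it is never rescanned
  }
  return text;
}

// Format a calendar date as "DD Month YYYY" (month name follows the stream locale).
inline std::string formatDate(const std::tm& date) {
  std::ostringstream oss;
  oss << std::put_time(&date, "%d %B %Y");
  return oss.str();
}

// Hypothetical template and placeholder; the real SmolLM3 template strings differ.
inline std::string renderSystemPrompt(const std::tm& date) {
  const std::string tpl = "Today Date: {{today_date}}\nReasoning Mode: {{mode}}";
  return replaceAll(tpl, "{{today_date}}", formatDate(date));
}
```

A caller would typically obtain the `std::tm` from `std::time` and `std::localtime` and pass it in, which keeps the formatting code deterministic and easy to test.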

Summary by CodeRabbit

Release Notes

  • New Features

    • Added SmolLM3 3B tokenizer with chat template and byte-pair encoding support
    • Added example demonstrating SmolLM3 model inference
  • Performance

    • Introduced memory-mapped parameter file loading for improved memory efficiency
    • Added Arm NEON-optimized fp32 linear layer kernel for faster computation
  • Configuration

    • Added Android burst performance hints build option

chenghuaWang and others added 3 commits November 3, 2025 17:24
@coderabbitai
Contributor

coderabbitai Bot commented Nov 3, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds SmolLM3 model support with a dedicated tokenizer implementation, introduces fp32 linear kernel implementations for ARM64 via Kai, enables memory-mapped (mmap) parameter file loading, integrates new Kai SGEMM NEON/SME backend implementations, and adds Android burst performance hints configuration across the build system and CPU backend infrastructure.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CMake Configuration: CMakeLists.txt, examples/CMakeLists.txt | Adds MLLM_ANDROID_BURST_PERFORMANCE_HINTS cache option with default OFF; adds smollm3 example subdirectory reference. |
| SmolLM3 Example Project: examples/smollm3/CMakeLists.txt, examples/smollm3/main.cpp | Introduces new executable mllm-smollm3-runner linked to MllmRT and MllmCPUBackend, with example program demonstrating chat-template encoding and decoding via SmolLM3Tokenizer. |
| ARM Kai fp32 Kernels: mllm/backends/cpu/kernels/arm/linear/kai.hpp, mllm/backends/cpu/kernels/arm/linear/kai.cpp | Adds new KaiLinear_fp32_fp32_fp32p_mxk_kxn kernel structure with methods for workspace sizing, RHS packing, and tiled matmul using Neon-optimized Kai ukernels; includes fp32-specific headers for matmul and RHS packing. |
| CPU Ops Integration: mllm/backends/cpu/ops/LinearOp.cpp | Extends load path to recognize Kai SGEMM NEON/SME impl types; adds BLAS selection logic, RHS pre-transposition, and Kai packing in post-load phase; adds dedicated Kai NEON forward path with kernel invocation. |
| Op Enumeration: mllm/core/aops/LinearOp.hpp | Adds two new LinearImplTypes enumerators: kMllmBlas_KAI_SGEMM_NT_NT_NEON and kMllmBlas_KAI_SGEMM_NT_T_SME. |
| MatMul Op: mllm/backends/cpu/ops/MatMulOp.cpp | Adds informational comment noting kGGUF is buggy in auto matmul-type inference branch. |
| Parameter File mmap Support: mllm/core/ParameterFile.hpp, mllm/core/ParameterFile.cpp | Extends read() signatures with optional bool mmap = true parameter; implements dual code paths for mmap-based and traditional binary file loading for V1 and V2 CPU model files, with descriptor parsing and tensor mapping. |
| Public API: mllm/mllm.hpp, mllm/mllm.cpp | Adds optional bool mmap = true parameter to load() function; propagates mmap flag to underlying ParameterFileIOImpl calls. |
| SmolLM3 Tokenizer: mllm/models/smollm3_3B/tokenization_smollm3.hpp | Introduces SmolLM3Tokenizer class extending AutoTokenizerUTF8 with BPE-based encode/decode, chat-template support, special-token trie management, and byte-level pre-tokenization; includes SmolLM3Message struct with template strings for thinking/non-thinking modes. |
| Android Hints: mllm/engine/hints/Android.hpp | Adds new header file with standard guards and includes for Android performance hints infrastructure. |
| Tests: tests/cpu/MllmBlasArmSgemmKernelTest.hpp, tests/cpu/MllmBlasArmSgemvKernelTest.hpp | Replaces BLAS matmul calls with explicit nested-loop CPU computation in Sgemm tests; updates Sgemv test with uniform random tensor initialization and shape adjustments from {1, D} to {1, S}. |

Sequence Diagram(s)

sequenceDiagram
    participant User as Application
    participant Loader as ParameterFile::load()
    participant Reader as ParameterFileIOImpl::read()
    participant File as Disk/Memory
    participant Tensor as TensorStorage

    User->>Loader: load(file_name, version, device, mmap=true)
    Loader->>Reader: read(file_path, mmap=true)
    
    alt mmap enabled
        Reader->>File: mmap(file_path)
        Reader->>File: validate header
        Reader->>File: parse descriptors from mapped region
        Reader->>Tensor: create tensor with MMAP memory type
        Tensor-->>Reader: tensor view into mapped data
    else mmap disabled
        Reader->>File: open binary file
        Reader->>File: read header
        Reader->>File: read/allocate descriptors
        Reader->>File: seek to data offset
        Reader->>Tensor: allocate storage
        Reader->>File: read data into storage
    end
    
    Reader-->>Loader: ParameterFile with tensors
    Loader-->>User: ParameterFile::ptr_t
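
The mmap branch in the diagram above can be illustrated with a small POSIX wrapper. This is a minimal sketch under stated assumptions, not the code in `ParameterFile.cpp`: the class name `MappedFile` and its methods are hypothetical, it maps the whole file read-only, and it is limited to POSIX platforms. It shows the lifetime rule that matters for the tensor views: pointers into the mapping stay valid only while the mapping object is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only memory mapping of a whole file (POSIX only, hypothetical helper).
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) { throw std::runtime_error("cannot open " + path); }
    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
      ::close(fd);
      throw std::runtime_error("cannot stat " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    void* addr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) { throw std::runtime_error("mmap failed for " + path); }
    data_ = static_cast<const unsigned char*>(addr);
  }

  ~MappedFile() {
    if (data_ != nullptr) { ::munmap(const_cast<unsigned char*>(data_), size_); }
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  // Borrowed pointer into the mapping; valid only while this object is alive.
  const unsigned char* at(std::size_t offset) const {
    assert(offset < size_);
    return data_ + offset;
  }

  std::size_t size() const { return size_; }

 private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

In a loader built this way, each tensor would keep a shared reference to the mapping so that the region is unmapped only after the last tensor view is released.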
sequenceDiagram
    participant App as Application
    participant LinearOp as LinearOp
    participant Loader as Load Phase
    participant Forward as Forward Phase
    participant Kai as KaiLinear_fp32

    App->>LinearOp: load(impl_type=KAI_SGEMM_NT_NT_NEON)
    LinearOp->>Loader: identify impl_type
    Loader->>Loader: select BLAS or default backend
    Loader->>Loader: pretranspose weight if needed
    Loader->>Kai: quant_pack_rhs_offline(weight, bias)
    Kai-->>Loader: packed_weight
    Loader->>LinearOp: store packed_weight
    
    App->>LinearOp: forward(lhs, weight, bias)
    LinearOp->>Forward: route to KAI_SGEMM_NT_NT_NEON case
    Forward->>Kai: matmul(dst, lhs, packed_weight, workspace, M, K, N, threads)
    Kai->>Kai: tile matmul with M/N steps
    Kai->>Kai: invoke ukernel for each tile
    Kai-->>Forward: compute result into dst
    Forward-->>LinearOp: dst
    LinearOp-->>App: output
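
The pack-once, multiply-many flow in the diagram above can be sketched in portable C++. This is an illustration under stated assumptions and does not use the KleidiAI API: the panel width, the packed layout (bias first, then K rows per panel), and the names `packRhs` and `matmulPacked` are all hypothetical. A real NEON ukernel would replace the innermost loops with vector instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kPanelWidth = 4;  // hypothetical N-tile width

// Pack a row-major K x N weight matrix into panels of kPanelWidth columns.
// Each panel stores its bias values first, then K rows of kPanelWidth weights.
// Columns past N are zero-padded; a null bias is treated as all zeros.
std::vector<float> packRhs(const float* weight, const float* bias, std::size_t K, std::size_t N) {
  const std::size_t panels = (N + kPanelWidth - 1) / kPanelWidth;
  std::vector<float> packed(panels * (K + 1) * kPanelWidth, 0.0F);
  for (std::size_t p = 0; p < panels; ++p) {
    float* panel = packed.data() + p * (K + 1) * kPanelWidth;
    for (std::size_t c = 0; c < kPanelWidth; ++c) {
      const std::size_t col = p * kPanelWidth + c;
      if (col >= N) { continue; }
      panel[c] = (bias != nullptr) ? bias[col] : 0.0F;
      for (std::size_t k = 0; k < K; ++k) { panel[(k + 1) * kPanelWidth + c] = weight[k * N + col]; }
    }
  }
  return packed;
}

// dst[M x N] = lhs[M x K] * weight + bias, reading the packed layout above.
void matmulPacked(float* dst, const float* lhs, const std::vector<float>& packed,
                  std::size_t M, std::size_t K, std::size_t N) {
  const std::size_t panels = (N + kPanelWidth - 1) / kPanelWidth;
  assert(packed.size() == panels * (K + 1) * kPanelWidth);
  for (std::size_t p = 0; p < panels; ++p) {
    const float* panel = packed.data() + p * (K + 1) * kPanelWidth;
    const std::size_t cols = std::min(kPanelWidth, N - p * kPanelWidth);
    for (std::size_t i = 0; i < M; ++i) {
      float acc[kPanelWidth];
      for (std::size_t c = 0; c < kPanelWidth; ++c) { acc[c] = panel[c]; }  // start from bias
      for (std::size_t k = 0; k < K; ++k) {
        const float a = lhs[i * K + k];
        for (std::size_t c = 0; c < kPanelWidth; ++c) { acc[c] += a * panel[(k + 1) * kPanelWidth + c]; }
      }
      // Write back only the columns that exist in the unpadded output.
      for (std::size_t c = 0; c < cols; ++c) { dst[i * N + p * kPanelWidth + c] = acc[c]; }
    }
  }
}
```

Packing happens once at load time, so the per-call cost is only the tiled multiply; the panels are independent, which is what makes the N dimension a natural axis for splitting work across threads.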

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • ParameterFile mmap implementation (ParameterFile.cpp/hpp): High-density I/O logic with dual code paths for V1/V2, file mapping, descriptor parsing, and memory management requiring careful validation.
  • Kai fp32 kernel implementations (kai.cpp/hpp): Dense kernel logic with ukernel configuration, tiled matmul scheduling, and multi-threaded execution; requires understanding of Kai/Neon architecture.
  • LinearOp integration (LinearOp.cpp): Complex control flow additions with new backend routing, pre/post-processing phases, and forward-path dispatching; affects existing code paths.
  • SmolLM3Tokenizer (tokenization_smollm3.hpp): Substantial tokenizer implementation with BPE vocab management, special-token trie, chat templates, and UTF-8 handling.
  • Heterogeneous scope: Changes span CMake, kernels, ops, parameter I/O, tokenization, and tests across multiple domains, each requiring separate reasoning.

Areas requiring extra attention:

  • Mmap memory lifetime and file handle management in ParameterFile.cpp
  • Thread safety and boundary conditions in Kai tiled matmul loop
  • Correct weight pre-transposition and packing integration in LinearOp load phase
  • Special token trie correctness and chat template substitution in tokenizer

Possibly related PRs

  • PR feat: add kai&qnn-vl&opencl #489: Extends CPU ARM/Kai backend with kernel structures and CMake integration; directly overlaps with this PR's Kai kernel additions and backend wiring.
  • PR feat(qnn): Basic QNN Prefill on v2 #485: Adds CustomizedOp plugin interface extensions; related through potential operator customization patterns that may interact with new backend implementations.

Suggested reviewers

  • yirongjie
  • oreomaker
  • liang1232018

Poem

🐰 A tokenizer hop and Kai kernels bloom,
Memory-mapped files chase away the gloom,
SmolLM3 speaks in fp32 grace,
While hints burst Android's race,
NEON optimizations light the room! ✨

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'feat(cpu, smollm3-tokenizer): add KAI SGEMM NEON implementation for ARM' accurately captures the main objectives of the changeset. It highlights the primary contribution: adding a KAI SGEMM NEON implementation for ARM processors, while also referencing the smollm3-tokenizer aspect. The title is specific, concise, and clearly conveys the key technical change without being vague or misleading. |


Change memory type assignment from kGlobal to kParamsNormal in
ParameterFile.cpp to correctly handle parameter allocation.
@chenghuaWang
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Nov 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (6)
tests/cpu/MllmBlasArmSgemmKernelTest.hpp (3)

27-44: Optional: address static analysis style hints.

The static analyzer flags several style issues in this block:

  • Variable naming convention: DST, M, K, N (expected snake_case or camelCase per project style)
  • Variable name length: M, K, N are flagged as too short
  • Float literal suffix: 0.0f should be 0.0F (uppercase)

These are not functional issues but could improve consistency if the project enforces these rules.

Example diff (if adopting lowercase naming and uppercase suffix):

-    auto DST = mllm::Tensor::emptyLike(RefDST).alloc();
+    auto dst = mllm::Tensor::emptyLike(RefDST).alloc();

     // Calculate DST.
     {
-      auto dst_ptr = DST.ptr<float>();
+      auto dst_ptr = dst.ptr<float>();
       auto a_ptr = A.ptr<float>();
       auto b_ptr = B.ptr<float>();
-      const int M = S_Q;
-      const int K = S_KV;
-      const int N = D;
-      for (int i = 0; i < M; ++i) {
-        for (int j = 0; j < N; ++j) {
-          float sum = 0.0f;
-          for (int k = 0; k < K; ++k) { sum += a_ptr[i * K + k] * b_ptr[k * N + j]; }
-          dst_ptr[i * N + j] = sum;
+      const int num_rows = S_Q;
+      const int inner_dim = S_KV;
+      const int num_cols = D;
+      for (int i = 0; i < num_rows; ++i) {
+        for (int j = 0; j < num_cols; ++j) {
+          float sum = 0.0F;
+          for (int k = 0; k < inner_dim; ++k) { sum += a_ptr[i * inner_dim + k] * b_ptr[k * num_cols + j]; }
+          dst_ptr[i * num_cols + j] = sum;
         }
       }
     }

-    auto result = mllm::test::allClose(DST, RefDST);
+    auto result = mllm::test::allClose(dst, RefDST);

71-88: Optional: address static analysis style hints.

Similar to the first test, the static analyzer flags style issues:

  • Variable naming: DST, M, K, N
  • Variable name length: M, K, N
  • Float literal suffix: 0.0f vs 0.0F

These are the same concerns as in the previous function. If you decide to address them, apply consistent changes across both test functions.


29-44: Consider extracting common matmul logic to reduce duplication.

Both test functions contain nearly identical nested-loop matrix multiplication logic, differing only in the inner loop's B matrix access pattern (transposed vs non-transposed). Consider extracting a helper function to reduce duplication:

Example helper approach:

// Helper to compute C = A * B (or A * B^T if transpose_b is true)
static void compute_matmul_cpu(
    float* c_ptr, const float* a_ptr, const float* b_ptr,
    int M, int K, int N, bool transpose_b) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float sum = 0.0f;
      for (int k = 0; k < K; ++k) {
        const float b_val = transpose_b ? b_ptr[j * K + k] : b_ptr[k * N + j];
        sum += a_ptr[i * K + k] * b_val;
      }
      c_ptr[i * N + j] = sum;
    }
  }
}

Then call it from both test functions, reducing maintenance burden and improving readability.

Also applies to: 73-88

CMakeLists.txt (1)

45-46: Consider wrapping the long option description for better readability.

The new MLLM_ANDROID_BURST_PERFORMANCE_HINTS option is a good addition for Android performance optimization. However, the description string on Line 46 is quite long (over 120 characters). Consider wrapping it across multiple lines for better maintainability.

Apply this diff to improve readability:

-option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS "If MLLM need use APerformanceHintManager to tell android we need best performance" OFF)
+option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS 
+       "If MLLM need use APerformanceHintManager to tell android we need best performance" 
+       OFF)

Alternatively, shorten the description:

-option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS "If MLLM need use APerformanceHintManager to tell android we need best performance" OFF)
+option(MLLM_ANDROID_BURST_PERFORMANCE_HINTS 
+       "Enable Android APerformanceHintManager for burst performance hints" 
+       OFF)
mllm/backends/cpu/ops/MatMulOp.cpp (1)

53-54: TODO comment indicates known kGGUF bug.

The comment on Line 53 flags that the kGGUF matmul type is still buggy, yet Line 54 still selects it under certain conditions. This could lead to incorrect results in production.

Do you want me to:

  1. Generate a verification script to check if there's an existing issue tracking this bug?
  2. Open a new issue to track this bug with details about when kGGUF is selected and what the expected fix timeline is?
  3. Suggest adding a runtime warning when kGGUF is selected to alert users of potential issues?
examples/smollm3/main.cpp (1)

13-16: Consider adding error handling for invalid tokenizer path.

The example creates a SmolLM3Tokenizer without verifying that the provided path is valid. If the tokenizer file is missing or corrupted, this will likely throw an exception or crash. For a user-facing example, consider adding error handling to provide a helpful message.

   {
+    try {
       auto tokenizer = mllm::models::smollm3::SmolLM3Tokenizer(tokenizer_path.get());
       auto ids = tokenizer.encode(tokenizer.applyChatTemplate("Bonjour 😈", false));
       mllm::print(ids);
       mllm::print(tokenizer.decode(ids));
+    } catch (const std::exception& e) {
+      fmt::print(stderr, "Error loading tokenizer from '{}': {}\n", tokenizer_path.get(), e.what());
+      return 1;
+    }
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b301ca and 8494a34.

📒 Files selected for processing (17)
  • CMakeLists.txt (1 hunks)
  • examples/CMakeLists.txt (1 hunks)
  • examples/smollm3/CMakeLists.txt (1 hunks)
  • examples/smollm3/main.cpp (1 hunks)
  • mllm/backends/cpu/kernels/arm/linear/kai.cpp (2 hunks)
  • mllm/backends/cpu/kernels/arm/linear/kai.hpp (2 hunks)
  • mllm/backends/cpu/ops/LinearOp.cpp (4 hunks)
  • mllm/backends/cpu/ops/MatMulOp.cpp (1 hunks)
  • mllm/core/ParameterFile.cpp (2 hunks)
  • mllm/core/ParameterFile.hpp (1 hunks)
  • mllm/core/aops/LinearOp.hpp (1 hunks)
  • mllm/engine/hints/Android.hpp (1 hunks)
  • mllm/mllm.cpp (1 hunks)
  • mllm/mllm.hpp (1 hunks)
  • mllm/models/smollm3_3B/tokenization_smollm3.hpp (1 hunks)
  • tests/cpu/MllmBlasArmSgemmKernelTest.hpp (2 hunks)
  • tests/cpu/MllmBlasArmSgemvKernelTest.hpp (2 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
tests/cpu/MllmBlasArmSgemvKernelTest.hpp

[error] 22-22: variable name 'A' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 22-22: invalid case style for variable 'A'

(readability-identifier-naming,-warnings-as-errors)


[error] 23-23: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 23-23: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 24-24: variable name 'C' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 24-24: invalid case style for variable 'C'

(readability-identifier-naming,-warnings-as-errors)


[error] 25-25: invalid case style for variable 'DST'

(readability-identifier-naming,-warnings-as-errors)


[error] 35-35: invalid case style for variable 'DSTP'

(readability-identifier-naming,-warnings-as-errors)

mllm/engine/hints/Android.hpp

[error] 5-5: 'android/performance_hint.h' file not found

(clang-diagnostic-error)

tests/cpu/MllmBlasArmSgemmKernelTest.hpp

[error] 27-27: invalid case style for variable 'DST'

(readability-identifier-naming,-warnings-as-errors)


[error] 34-34: variable name 'M' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 34-34: invalid case style for variable 'M'

(readability-identifier-naming,-warnings-as-errors)


[error] 35-35: variable name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 35-35: invalid case style for variable 'K'

(readability-identifier-naming,-warnings-as-errors)


[error] 36-36: variable name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 36-36: invalid case style for variable 'N'

(readability-identifier-naming,-warnings-as-errors)


[error] 39-39: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 71-71: invalid case style for variable 'DST'

(readability-identifier-naming,-warnings-as-errors)


[error] 78-78: variable name 'M' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 78-78: invalid case style for variable 'M'

(readability-identifier-naming,-warnings-as-errors)


[error] 79-79: variable name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 79-79: invalid case style for variable 'K'

(readability-identifier-naming,-warnings-as-errors)


[error] 80-80: variable name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 80-80: invalid case style for variable 'N'

(readability-identifier-naming,-warnings-as-errors)


[error] 83-83: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/linear/kai.hpp

[error] 56-56: invalid case style for class 'KaiLinear_fp32_fp32_fp32p_mxk_kxn'

(readability-identifier-naming,-warnings-as-errors)


[error] 57-57: method 'need_pack_lhs' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 59-59: method 'need_pack_rhs' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 66-66: parameter name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 66-66: parameter name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 69-69: parameter name 'M' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 69-69: parameter name 'K' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 69-69: parameter name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 72-72: variable 'ukernel_' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 72-72: invalid case style for variable 'ukernel_'

(readability-identifier-naming,-warnings-as-errors)

mllm/models/smollm3_3B/tokenization_smollm3.hpp

[error] 5-5: 'string' file not found

(clang-diagnostic-error)


[error] 21-21: do not declare C-style arrays, use std::array<> instead

(cppcoreguidelines-avoid-c-arrays,-warnings-as-errors)


[error] 25-25: constructor does not initialize these fields: prompt

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 29-29: variable 'no_think_template_str' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 35-35: variable 'think_template_str' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 53-53: constructor does not initialize these fields: bpe_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 69-69: method 'replaceAll' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 69-69: parameter name 's' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 69-69: 2 adjacent parameters of 'replaceAll' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 69-69: parameter name 'to' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 71-71: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 79-79: variable 't' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 79-79: variable name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 80-80: variable 'tm_' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 80-80: invalid case style for variable 'tm_'

(readability-identifier-naming,-warnings-as-errors)


[error] 82-82: variable 'oss' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 84-84: variable 'date_in_number' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 86-86: do not declare C-style arrays, use std::array<> instead

(cppcoreguidelines-avoid-c-arrays,-warnings-as-errors)


[error] 88-88: variable 'month' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 89-89: variable 'year' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 91-91: variable 'tpl' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 144-144: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

examples/smollm3/main.cpp

[error] 1-1: 'fmt/core.h' file not found

(clang-diagnostic-error)


[error] 8-8: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 8-8: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/mllm.cpp

[error] 95-95: 3 adjacent parameters of 'load' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 96-96: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/linear/kai.cpp

[error] 4-4: 'limits' file not found

(clang-diagnostic-error)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (7)
tests/cpu/MllmBlasArmSgemvKernelTest.hpp (2)

22-25: LGTM! Improved test coverage and corrected output shape.

The changes improve the test in two ways:

  • Random tensor initialization with range [-1, 1] provides better coverage than fixed values
  • The DST shape correction from {1, D} to {1, S} properly reflects the matrix multiplication result: {1, D} @ {D, S} = {1, S}

35-35: LGTM! Consistent shape correction.

The DSTP shape now correctly matches DST's {1, S} shape, ensuring both the baseline and optimized implementations are tested with the proper output dimensions.

tests/cpu/MllmBlasArmSgemmKernelTest.hpp (2)

27-44: Good approach: manual CPU reference for BLAS testing.

Replacing the BLAS matmul call with an explicit nested-loop implementation provides a clear, simple reference to validate the BLAS kernels against. The logic correctly computes the matrix product for row-major layout.
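
A reference of this kind is usually paired with a tolerance-based comparison, because an optimized kernel accumulates in a different order and will rarely match bit for bit. The following is a minimal sketch of that pairing. The names `referenceMatmul` and `allCloseSketch` and the default tolerances are assumptions for illustration; `mllm::test::allClose` may use a different criterion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference C = A * B for row-major A (M x K) and B (K x N).
std::vector<float> referenceMatmul(const std::vector<float>& a, const std::vector<float>& b,
                                   std::size_t M, std::size_t K, std::size_t N) {
  assert(a.size() == M * K && b.size() == K * N);
  std::vector<float> c(M * N, 0.0F);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float sum = 0.0F;
      for (std::size_t k = 0; k < K; ++k) { sum += a[i * K + k] * b[k * N + j]; }
      c[i * N + j] = sum;
    }
  }
  return c;
}

// Element-wise check: |actual - expected| <= atol + rtol * |expected|.
// The negated comparison makes NaN values fail the check.
bool allCloseSketch(const std::vector<float>& actual, const std::vector<float>& expected,
                    float rtol = 1e-5F, float atol = 1e-6F) {
  if (actual.size() != expected.size()) { return false; }
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const float diff = std::fabs(actual[i] - expected[i]);
    if (!(diff <= atol + rtol * std::fabs(expected[i]))) { return false; }
  }
  return true;
}
```

In a kernel test, the output of the optimized path would be passed as `actual` and the nested-loop result as `expected`.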


71-88: Correctly implements transposed matrix multiply.

The manual computation properly handles the transposed B matrix by accessing b_ptr[j * K + k] instead of b_ptr[k * N + j], which correctly represents B^T[k,j] in row-major layout.

mllm/models/smollm3_3B/tokenization_smollm3.hpp (1)

127-128: Clarify the comment about regex processing.

Line 128 states "No need to Regex:" and then creates a single-element initializer list. This suggests regex processing was intentionally skipped or is a placeholder. Consider clarifying why regex processing is not needed here, or if this is temporary code that should be implemented.

Based on the GPT2_EXPR defined at Line 21 and the byteLevelPreTokenizer using unicode_regex_split at Line 164, it appears regex processing should occur. Please verify if this is intentional or if regex-based splitting should be applied here before byte-level pre-tokenization.

examples/CMakeLists.txt (1)

10-10: LGTM!

The addition of the smollm3 subdirectory properly integrates the new SmolLM3 example into the build system.

mllm/mllm.cpp (1)

95-100: LGTM! The mmap parameter is properly propagated.

The addition of the mmap parameter to the load function signature and its propagation to the underlying ParameterFileIOImpl::read calls is clean and consistent. This enables optional memory-mapped file loading for better performance.

Note: The static analysis warning about "easily swappable parameters" is a minor concern. If you want to make the API more robust against parameter swapping, consider using an options struct or strong types in the future, but this is not critical for the current change.
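
The options-struct idea mentioned in the note could be sketched as follows. Every name here (`LoadOptions`, `LoadRequest`, `makeLoadRequest`, the enum types and their defaults) is hypothetical and chosen only to show the pattern; none of it is part of the mllm API. Grouping the trailing parameters in a struct means callers set fields by name, so adjacent arguments of similar type cannot be swapped silently.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real version and device types.
enum class ModelFileVersion { kV1, kV2 };
enum class DeviceKind { kCPU };

// Named, defaulted fields replace a run of positional parameters.
struct LoadOptions {
  ModelFileVersion version = ModelFileVersion::kV1;  // assumed default
  DeviceKind device = DeviceKind::kCPU;
  bool use_mmap = true;
};

struct LoadRequest {
  std::string file_name;
  LoadOptions options;
};

// Hypothetical entry point: validates the input and bundles it for a loader.
inline LoadRequest makeLoadRequest(std::string file_name, LoadOptions options = {}) {
  assert(!file_name.empty());
  return LoadRequest{std::move(file_name), options};
}
```

A call site then reads as `LoadOptions opts; opts.use_mmap = false; makeLoadRequest("model.mllm", opts);`, which documents the intent without relying on argument order.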

Comment thread mllm/backends/cpu/kernels/arm/linear/kai.cpp
Comment thread mllm/backends/cpu/ops/LinearOp.cpp
Comment thread mllm/core/aops/LinearOp.hpp
Comment thread mllm/core/ParameterFile.cpp
Comment thread mllm/core/ParameterFile.cpp
Comment thread mllm/core/ParameterFile.cpp
Comment thread mllm/engine/hints/Android.hpp
Comment thread mllm/models/smollm3_3B/tokenization_smollm3.hpp Outdated
Comment thread mllm/models/smollm3_3B/tokenization_smollm3.hpp
When bias is null, allocate and explicitly zero-initialize the bias array
to ensure correct behavior during offline packing.

fix(cpu): use transposed weight dimensions for packing calculations

Corrected the dimension parameters passed to quant_pack_rhs_size and
quant_pack_rhs_offline to use transposed weight tensor sizes.

feat(core): add new MllmBlas KAI SGEMM implementation types

Registered new linear implementation types for KAI-based SGEMM with
NEON and SME backends in both NT/NT and NT/T configurations.

feat(engine): conditionally include Android performance hints

Wrapped Android-specific headers in preprocessor guards to avoid
build errors on non-Android platforms.
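
One way to write such a guard is sketched below. It relies on the `__ANDROID__` macro that the NDK toolchain predefines and on `__has_include`, so the header is pulled in only when it actually exists. The macro `SKETCH_HAS_PERFORMANCE_HINT` and the function `performanceHintHeaderAvailable` are hypothetical names; the guard in `Android.hpp` may be written differently.

```cpp
#include <cassert>

// Include the NDK header only on Android, and only if the toolchain ships it.
#if defined(__ANDROID__) && defined(__has_include)
#if __has_include(<android/performance_hint.h>)
#include <android/performance_hint.h>
#define SKETCH_HAS_PERFORMANCE_HINT 1
#endif
#endif

// Fall back to 0 on every other platform so callers can test one macro.
#ifndef SKETCH_HAS_PERFORMANCE_HINT
#define SKETCH_HAS_PERFORMANCE_HINT 0
#endif

// True only when the performance hint header was found at compile time.
constexpr bool performanceHintHeaderAvailable() { return SKETCH_HAS_PERFORMANCE_HINT != 0; }
```

Code that calls into the performance hint API would then sit behind `#if SKETCH_HAS_PERFORMANCE_HINT`, leaving desktop and macOS builds untouched.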

refactor(models): mark template strings as const in SmolLM3 tokenizer

Changed static inline string variables to be explicitly const to
enforce immutability and improve code clarity.
@chenghuaWang chenghuaWang merged commit a1b2032 into UbiquitousLearning:v2 Nov 3, 2025
3 checks passed