Add SVE implementation for Mamba Sequential Scan Algorithm#38185

Open
vineelabhinav wants to merge 3 commits into huggingface:main from vineelabhinav:mamba-seq_scan-sve

Conversation


@vineelabhinav vineelabhinav commented May 17, 2025

What does this PR do?

This PR adds SVE kernel support specific to the Mamba model on ARM architecture.
This PR's contributions are:

  • The sequential scan algorithm of the Mamba model is implemented using SVE intrinsics and OpenMP pragmas, replacing the original naive sequential scan.
  • All data movement is kept within registers, with no intermediate round trips to main memory.
  • The kernel is written in C++, with some supporting modifications in Python.
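For context, the recurrence that the naive sequential scan computes (and that the SVE kernel accelerates) can be sketched as a minimal NumPy reference. Shapes and names here are illustrative, not the PR's actual signatures; the time loop is inherently sequential, which is why the kernel vectorizes and parallelizes over the other dimensions instead:

```python
import numpy as np

def naive_sequential_scan(deltaA, deltaB_x, C):
    """Naive reference scan (illustrative shapes, not the PR's real API).

    deltaA:   (T, D, N) discretized state-decay factors per step
    deltaB_x: (T, D, N) input contribution per step
    C:        (T, N)    output projection per step
    Returns y: (T, D)
    """
    T, D, N = deltaA.shape
    h = np.zeros((D, N))
    y = np.empty((T, D))
    for t in range(T):                   # sequential in time: h_t depends on h_{t-1}
        h = deltaA[t] * h + deltaB_x[t]  # element-wise state update
        y[t] = h @ C[t]                  # contract over the state dimension N
    return y
```

The element-wise update and the contraction over `N` are the parts that map naturally onto SVE vector lanes, while independent batch/channel work can be distributed across threads with OpenMP.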

Advantages:

  • Speeds up the model by leveraging parallelism from SVE intrinsics and OpenMP.
  • Reduces memory overhead and redundant data movement.

Main Files:

  • src/transformers/kernels/mamba/sve_kernels/helper.cpp: contains the core C++ implementation of the sequential scan algorithm
  • src/transformers/models/mamba/modeling_mamba.py: contains the Mamba model definition in Python

Bindings:

  • Used Cython to bind the C++ code to Python.
  • Shared library (.so) generation is performed only once; the resulting file is stored in the 'TORCH_EXTENSIONS_DIR' directory.
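The build-once-then-cache pattern described above can be sketched as follows. This is an illustrative helper, not the PR's actual code: `get_or_build_extension` and `build_fn` are hypothetical names, and `build_fn` stands in for the Cython/C++ compile step:

```python
import os

def get_or_build_extension(name, build_fn, cache_dir=None):
    """Return the path to a cached shared library, building it at most once.

    Hypothetical sketch: mirrors generating the .so a single time and
    storing it under TORCH_EXTENSIONS_DIR; build_fn stands in for the
    actual Cython/C++ compilation step.
    """
    cache_dir = cache_dir or os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        os.path.expanduser("~/.cache/torch_extensions"),
    )
    os.makedirs(cache_dir, exist_ok=True)
    so_path = os.path.join(cache_dir, name + ".so")
    if not os.path.exists(so_path):  # compile only on first use
        build_fn(so_path)
    return so_path
```

On subsequent runs the cached .so is found on disk and the compile step is skipped entirely, so model startup pays the build cost only once.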

Results

  • Task 1: prompt length: 32 tokens, generated tokens: 1

| Batch Size | Original Sequential Scan (sec) | SVE Sequential Scan (sec) |
| ---------- | ------------------------------ | ------------------------- |
| 32         | 32.37                          | 14.07                     |
| 64         | 71.00                          | 25.59                     |
| 128        | 138.45                         | 45.68                     |
| 256        | 273.17                         | 86.34                     |
| 512        | 540.85                         | 167.49                    |
| 1024       | OOM Error                      | 329.26                    |
  • Task 2: prompt length: 1 token, generated tokens: 100

| Batch Size | Original Sequential Scan (sec) | SVE Sequential Scan (sec) |
| ---------- | ------------------------------ | ------------------------- |
| 1          | 29.857                         | 44.21                     |
| 2          | 33.955                         | 70.80                     |
| 4          | 43.714                         | 58.50                     |
| 8          | 45.379                         | 93.72                     |
| 16         | 373.812                        | 346.89                    |
| 32         | 405.519                        | 336.03                    |
| 64         | 581.049                        | 360.83                    |
| 128        | 701.731                        | 414.27                    |
| 256        | 1190.674                       | 672.87                    |
| 512        | 2249.906                       | 1034.64                   |

Accuracy

Model accuracy is unchanged by this PR.

Contributors

  • Sumit Suthar ( @SSgit2008 )
  • Vineel Abhinav Gottala

cc: @NishantPrabhuFujitsu

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@MekkCyber
Contributor

Hi @vineelabhinav, thank you for the PR 🤗 — it looks great!

That said, we're currently moving away from maintaining custom kernels directly inside transformers, as it's become increasingly difficult to manage. Instead, we're shifting towards a more modular approach using the kernels and kernel-builder libraries.

The idea is to build and publish kernels using kernel-builder, and then load them in transformers via kernels from the Hugging Face Hub. You can check out this PR as an example of how that integration works.

Currently, kernel-builder doesn’t support ARM yet — but once it does, we’d love to support your contribution as a built kernel in huggingface.co/kernels-community!

@vineelabhinav
Author

Hi @MekkCyber,
Thank you for the comment!
I understand the current integration procedure, and I have two follow-up questions:

  1. Is there an estimated timeline for when kernel-builder will support ARM? Knowing this would help us better plan and accelerate the integration process.
  2. There's a module mamba.py (https://github.com/alxndrTL/mamba.py) available as a pip-installable package, and it's used within the Mamba transformers module. Could we explore a similar integration approach for our case?

@vineelabhinav
Author

Hi @MekkCyber,
Any suggestions/updates regarding the two points I have discussed earlier?

@vineelabhinav
Author

vineelabhinav commented May 31, 2025

Hi @MekkCyber @ArthurZucker ,
Following up on my previous comment. Please let me know your thoughts on the two points above.

@MekkCyber
Copy link
Contributor

Hi @vineelabhinav! Sorry, I was off for a bit!

@SSgit2008

@MekkCyber,
Thank you for the reply. We will explore adding an installable module.
