
hexagon: add f32 ssm_conv op#20122

Merged
max-krasnyansky merged 6 commits into ggml-org:master from qualcomm:tb/htp-ssm-conv
Mar 6, 2026
Conversation

@tboinovski1
Contributor

Make sure to read the contributing guidelines before submitting a PR

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Mar 5, 2026
Member

@max-krasnyansky max-krasnyansky left a comment


Nice!
We can improve DMA pipelining and precompute more per-thread state in a local context. Ok to do that in a followup though.

test-backend-ops is passing on S25 and Gen5, but failing on S24 (gen3, hex-arch v75).

[SSM_CONV] ERR = 0.045690326 > 0.000000100   SSM_CONV(type=f32,ne_a=[3,1024,1,1],ne_b=[3,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.027886839 > 0.000000100   SSM_CONV(type=f32,ne_a=[6,1024,1,1],ne_b=[3,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.069031246 > 0.000000100   SSM_CONV(type=f32,ne_a=[3,1024,4,1],ne_b=[3,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.011857102 > 0.000000100   SSM_CONV(type=f32,ne_a=[3,1536,1,1],ne_b=[3,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.021944322 > 0.000000100   SSM_CONV(type=f32,ne_a=[6,1536,1,1],ne_b=[3,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.046434721 > 0.000000100   SSM_CONV(type=f32,ne_a=[3,1536,4,1],ne_b=[3,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.009548537 > 0.000000100   SSM_CONV(type=f32,ne_a=[3,2048,1,1],ne_b=[3,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.023424253 > 0.000000100   SSM_CONV(type=f32,ne_a=[6,2048,1,1],ne_b=[3,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.022690631 > 0.000000100   SSM_CONV(type=f32,ne_a=[3,2048,4,1],ne_b=[3,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.020098070 > 0.000000100   SSM_CONV(type=f32,ne_a=[4,1024,1,1],ne_b=[4,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.033635444 > 0.000000100   SSM_CONV(type=f32,ne_a=[8,1024,1,1],ne_b=[4,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.053863172 > 0.000000100   SSM_CONV(type=f32,ne_a=[4,1024,4,1],ne_b=[4,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.046260286 > 0.000000100   SSM_CONV(type=f32,ne_a=[4,1536,1,1],ne_b=[4,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.019073288 > 0.000000100   SSM_CONV(type=f32,ne_a=[8,1536,1,1],ne_b=[4,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.019367290 > 0.000000100   SSM_CONV(type=f32,ne_a=[4,1536,4,1],ne_b=[4,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.003745381 > 0.000000100   SSM_CONV(type=f32,ne_a=[4,2048,1,1],ne_b=[4,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.017238832 > 0.000000100   SSM_CONV(type=f32,ne_a=[8,2048,1,1],ne_b=[4,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.015438665 > 0.000000100   SSM_CONV(type=f32,ne_a=[4,2048,4,1],ne_b=[4,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.026768994 > 0.000000100   SSM_CONV(type=f32,ne_a=[9,1024,1,1],ne_b=[9,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.013978035 > 0.000000100   SSM_CONV(type=f32,ne_a=[18,1024,1,1],ne_b=[9,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.020436464 > 0.000000100   SSM_CONV(type=f32,ne_a=[9,1024,4,1],ne_b=[9,1024,1,1]): FAIL
[SSM_CONV] ERR = 0.003860245 > 0.000000100   SSM_CONV(type=f32,ne_a=[9,1536,1,1],ne_b=[9,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.006388827 > 0.000000100   SSM_CONV(type=f32,ne_a=[18,1536,1,1],ne_b=[9,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.021140220 > 0.000000100   SSM_CONV(type=f32,ne_a=[9,1536,4,1],ne_b=[9,1536,1,1]): FAIL
[SSM_CONV] ERR = 0.005109369 > 0.000000100   SSM_CONV(type=f32,ne_a=[9,2048,1,1],ne_b=[9,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.006309449 > 0.000000100   SSM_CONV(type=f32,ne_a=[18,2048,1,1],ne_b=[9,2048,1,1]): FAIL
[SSM_CONV] ERR = 0.011532749 > 0.000000100   SSM_CONV(type=f32,ne_a=[9,2048,4,1],ne_b=[9,2048,1,1]): FAIL

I'll dig some more tomorrow to see what's up with that.
The error is quite large. LFM2 output seems OK but it would be good to fix those errors.
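For readers unfamiliar with the op being tested: the failing cases above all have the shape pattern ne_a=[d_conv-1+n_t, d_inner, n_s, 1], ne_b=[d_conv, d_inner, 1, 1]. Below is a minimal NumPy sketch of what I understand the f32 SSM_CONV reference computation to be (a per-channel causal sliding dot product, as in Mamba-style models); this is my reading of the op's semantics, not code taken from this PR, and the function name and array layouts are illustrative.

```python
import numpy as np

def ssm_conv_ref(a, b):
    """Sketch of f32 SSM_CONV: per-channel sliding dot product.

    a: (n_s, d_inner, d_conv - 1 + n_t)  conv state + new tokens
    b: (d_inner, d_conv)                 per-channel conv weights
    returns: (n_s, n_t, d_inner)
    """
    n_s, d_inner, width = a.shape
    d_conv = b.shape[1]
    n_t = width - d_conv + 1
    out = np.empty((n_s, n_t, d_inner), dtype=np.float32)
    for s in range(n_s):
        for t in range(n_t):
            # window of d_conv samples per channel, dotted with that
            # channel's weights
            out[s, t] = np.sum(a[s, :, t:t + d_conv] * b, axis=1)
    return out

# shapes matching the failing case ne_a=[6,1024,1,1], ne_b=[3,1024,1,1]
rng = np.random.default_rng(0)
a = rng.standard_normal((1, 1024, 6)).astype(np.float32)
b = rng.standard_normal((1024, 3)).astype(np.float32)
y = ssm_conv_ref(a, b)
print(y.shape)  # (1, 4, 1024)
```

With d_conv=3 and width 6, four output tokens are produced per sequence, which matches the ne_a=[6,...]/ne_b=[3,...] rows in the log.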

@max-krasnyansky
Member

Latest updates fixed all test-backend-ops failures on Snapdragon Gen3, Gen4, Gen5, and X-Elite.
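For context on the ERR values in the earlier log: to my understanding, test-backend-ops reports a normalized mean squared error between the backend output and the CPU reference, checked against the 1e-7 threshold shown. A hedged sketch of that metric (my reading, not code from this PR):

```python
import numpy as np

def nmse(f, g):
    """Normalized MSE: squared error between test output f and reference g,
    normalized by the reference signal's energy."""
    f = np.asarray(f, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    return float(np.sum((f - g) ** 2) / np.sum(g ** 2))

ref = np.array([1.0, 2.0, 3.0])
print(nmse(ref, ref))                 # 0.0 -- identical outputs pass
print(nmse(ref + 1e-3, ref) > 1e-7)   # True -- even small drift exceeds 1e-7
```

Because the metric is relative to the reference's magnitude, errors in the 1e-2 range like those in the log indicate a real accumulation or indexing problem rather than ordinary f32 rounding noise.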

@max-krasnyansky max-krasnyansky merged commit 34df42f into ggml-org:master Mar 6, 2026
78 checks passed
@max-krasnyansky max-krasnyansky deleted the tb/htp-ssm-conv branch March 6, 2026 18:48
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
* hexagon: add ssm_conv op

* hexagon: hvx kernel is functional

* hexagon: improvements to ssm-conv hvx kernel

* hexagon: added dma to ssm-conv hvx kernel

* hexagon: ssm-conv dynamically compute gather scratchpad

* hex-ssm-conv: add local context and fix various issues (spad indexing, etc)

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
(same commit list as above)
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
(same commit list as above)
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
(same commit list as above)

2 participants