
hexagon: various Op fixes#17135

Merged
max-krasnyansky merged 4 commits into ggml-org:master from qualcomm:hexagon-op-fixes
Nov 11, 2025
Conversation

@max-krasnyansky
Member

Introduce fastdiv and fix test-backend-ops failures for ADD/SUB/MUL
Thanks @chraac!
Subsumes #17042

Fixed inference with Qwen3-VL models that generate graphs with ne[1] == 0

@chraac
Contributor

chraac commented Nov 10, 2025

Great work! I've tested the fix on my device and it works well.

One small thing: init_fastdiv_values is called for each binary operation during each calculation.
However, this should have minimal performance impact since it only contains constant shifts and mpys, so I think it's acceptable.

Looking ahead, have you considered maintaining a similar structure to store these values on the CPU? That could be optimal since the weight dimensions remain fixed throughout all stages.
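For context on what `init_fastdiv_values` precomputes, here is a minimal, self-contained sketch of the magic-multiply division technique the comment refers to (constant shifts and multiplies in place of a hardware divide). The names `fastdiv_vals`, `init_fastdiv_vals`, and `fastdiv` are illustrative, not the actual hexagon backend code:

```cpp
#include <cassert>
#include <cstdint>

// Precomputed parameters for dividing by a fixed divisor d:
// a 32-bit magic multiplier mp and a shift l = ceil(log2(d)).
struct fastdiv_vals {
    uint32_t mp;
    uint32_t l;
};

static fastdiv_vals init_fastdiv_vals(uint32_t d) {
    uint32_t l = 0;
    while ((uint64_t{1} << l) < d) {
        ++l;              // l = ceil(log2(d))
    }
    fastdiv_vals v;
    v.l  = l;
    v.mp = (uint32_t)(((uint64_t{1} << 32) * ((uint64_t{1} << l) - d)) / d + 1);
    return v;
}

static uint32_t fastdiv(uint32_t n, fastdiv_vals v) {
    // n / d computed as a multiply-high, an add, and a shift.
    uint32_t hi = (uint32_t)(((uint64_t)n * v.mp) >> 32);
    return (hi + n) >> v.l;
}
```

The precomputation is cheap (a short loop plus one 64-bit divide per divisor), which is why calling it once per binary op is tolerable, while the per-element hot path needs no divide at all.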

Contributor

@chraac chraac left a comment


lgtm!

max-krasnyansky and others added 4 commits November 10, 2025 15:03
llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow, other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.
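The guard described above reduces to a shape check: a tensor with a 0 anywhere in its `ne` dimensions has zero rows/elements, so the op is a NOP. A minimal sketch, with illustrative names rather than the actual backend code:

```cpp
#include <cassert>
#include <cstdint>

// A tensor shape is empty (zero nrows/elements) if any dimension is 0,
// e.g. the ne[1] == 0 tensors produced by build_inp_out_ids().
constexpr int MAX_DIMS = 4;

struct tensor_shape {
    int64_t ne[MAX_DIMS];
};

static bool shape_is_empty(const tensor_shape & t) {
    for (int i = 0; i < MAX_DIMS; ++i) {
        if (t.ne[i] == 0) {
            return true;  // no elements: the op can be skipped
        }
    }
    return false;
}
```

In the actual fix this kind of check is what `ggml_is_empty` / `ggml_op_is_empty` provide, per the commit messages below.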
@max-krasnyansky max-krasnyansky marked this pull request as ready for review November 10, 2025 23:45
@max-krasnyansky
Member Author

> Great work! I've tested the fix on my device and it works well.
>
> One small thing: init_fastdiv_values is called for each binary operation during each calculation. However, this should have minimal performance impact since it only contains constant shifts and mpys, so I think it's acceptable.
>
> Looking ahead, have you considered maintaining a similar structure to store these values on the CPU? That could be optimal since the weight dimensions remain fixed throughout all stages.

Yeah, I did lots of profiling runs and the overall perf is the same as before, but all test-backend-ops for ADD/SUB/MUL/ADD_ID are passing now. Definitely acceptable :)

And yes, let's think about caching those. I was thinking host at first as we discussed in the other PR. But maybe it makes sense to allocate a little bit of vtcm and cache them there.
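The caching idea floated here (host-side at first, possibly VTCM later) could be sketched as a memoization table keyed by divisor: since weight dimensions stay fixed across graph evaluations, each divisor's parameters need computing only once. This is a hypothetical illustration of the design being discussed, not backend code:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Illustrative fastdiv parameters: magic multiplier + shift.
struct fastdiv_params {
    uint32_t mp;
    uint32_t shift;
};

static fastdiv_params compute_fastdiv_params(uint32_t d) {
    uint32_t l = 0;
    while ((uint64_t{1} << l) < d) {
        ++l;
    }
    return { (uint32_t)(((uint64_t{1} << 32) * ((uint64_t{1} << l) - d)) / d + 1), l };
}

// Host-side cache: compute once per divisor, reuse on every evaluation.
struct fastdiv_cache {
    std::unordered_map<uint32_t, fastdiv_params> by_divisor;

    const fastdiv_params & get(uint32_t d) {
        auto it = by_divisor.find(d);
        if (it == by_divisor.end()) {
            it = by_divisor.emplace(d, compute_fastdiv_params(d)).first;
        }
        return it->second;
    }
};
```

A VTCM-resident variant would trade a little of that scarce scratch memory for keeping the parameters next to the DSP compute, as the comment suggests.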

@max-krasnyansky
Member Author

@lhez please take a look

Contributor

@lhez lhez left a comment


Looks good!

@max-krasnyansky max-krasnyansky merged commit c273d75 into ggml-org:master Nov 11, 2025
70 of 71 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* hexagon: explicitly check for ops with zero nrows

llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow, other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.

* hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL

Co-authored-by: chraac <chraac@gmail.com>

* hexagon: use fastdiv in ADD_ID

* hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs

---------

Co-authored-by: chraac <chraac@gmail.com>
@max-krasnyansky max-krasnyansky deleted the hexagon-op-fixes branch January 25, 2026 23:01
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Labels

ggml changes relating to the ggml tensor library for machine learning


3 participants