
Qualcomm AI Engine Direct - Merge the two pybind libraries into a single library #15999

Merged
cccclai merged 2 commits into pytorch:main from CodeLinaro:dev1/hutton/combine_pybind_libraries
Dec 15, 2025


@shewu-quic
Collaborator

@shewu-quic shewu-quic commented Nov 27, 2025

Summary:

  • Prevent dynamic_cast failures caused by separate typeinfo in each library with clang.

cc: @haowhsu-quic
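
The summary attributes the failures to each library carrying its own typeinfo. As a loose Python analogy only (this is not the C++ mechanism, and the class name `OpWrapper` is hypothetical), two separately created classes with the same name are still distinct types, so an identity-based type check across them fails:

```python
def make_wrapper_class():
    """Stand-in for one library defining its own copy of a type (hypothetical)."""

    class OpWrapper:
        pass

    return OpWrapper


# Each "library" ends up with its own, separately created copy of the class.
LibA_OpWrapper = make_wrapper_class()
LibB_OpWrapper = make_wrapper_class()

obj = LibA_OpWrapper()

# The names match, but the type objects are different.
same_name = LibA_OpWrapper.__name__ == LibB_OpWrapper.__name__
passes_own_check = isinstance(obj, LibA_OpWrapper)
passes_other_check = isinstance(obj, LibB_OpWrapper)

print(same_name, passes_own_check, passes_other_check)
```

Keeping a single definition of the type, which is what merging the two pybind libraries aims for, removes the second copy altogether.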

@pytorch-bot

pytorch-bot bot commented Nov 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15999

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit fffb232 with merge base ee236cb (image):

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 27, 2025
@shewu-quic shewu-quic marked this pull request as ready for review November 27, 2025 06:16
@shewu-quic
Collaborator Author

shewu-quic commented Nov 27, 2025

Hi @cccclai,
This PR addresses the dynamic_cast failure encountered with clang in issue #15734.
I will split "Qualcomm AI Engine Direct - Support PARSeq in floating point precision" into a few PRs.

Could you take a look at this when you have a moment?

Thanks

@shewu-quic
Collaborator Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Nov 27, 2025
@cccclai
Contributor

cccclai commented Dec 1, 2025

This seems like a pretty big change, and I just recently disabled the QNN pybind test in #15949. Let's make sure it passes the internal tests and the QNN-related CI.

@shewu-quic shewu-quic force-pushed the dev1/hutton/combine_pybind_libraries branch 2 times, most recently from 9b4866c to 1ab4cd2 Compare December 3, 2025 01:34
…gle library

Summary:
- Prevent dynamic_cast failures caused by separate typeinfo in each library.
@shewu-quic shewu-quic force-pushed the dev1/hutton/combine_pybind_libraries branch from 1ab4cd2 to fffb232 Compare December 11, 2025 01:37
@meta-codesync
Contributor

meta-codesync bot commented Dec 11, 2025

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D88963737.

@cccclai cccclai merged commit 68ddd80 into pytorch:main Dec 15, 2025
276 of 279 checks passed
@DamonFool
Contributor

Hi @cccclai, did you test executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py?

I got the following error:

Traceback (most recent call last):
  File "/home/jiefu/executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py", line 16, in <module>
    from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype
  File "/home/jiefu/executorch/backends/qualcomm/quantizer/quantizer.py", line 12, in <module>
    from executorch.backends.qualcomm._passes.qnn_pass_manager import QnnPassManager
  File "/home/jiefu/executorch/backends/qualcomm/_passes/__init__.py", line 7, in <module>
    from .annotate_adaptive_avg_pool1d import AnnotateAdaptiveAvgPool1D
  File "/home/jiefu/executorch/backends/qualcomm/_passes/annotate_adaptive_avg_pool1d.py", line 7, in <module>
    from executorch.backends.qualcomm.builders.node_visitor import q_ops
  File "/home/jiefu/executorch/backends/qualcomm/builders/__init__.py", line 7, in <module>
    from . import (
  File "/home/jiefu/executorch/backends/qualcomm/builders/node_visitor.py", line 51, in <module>
    torch.int8: PyQnnManager.Qnn_DataType_t.QNN_DATATYPE_SFIXED_POINT_8,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'executorch.backends.qualcomm.python.PyQnnManagerAdaptor' has no attribute 'Qnn_DataType_t'

@shewu-quic
Collaborator Author

shewu-quic commented Dec 16, 2025

executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py

Hi @DamonFool ,

Could you rebuild PyQnnManagerAdaptor.so?

./backends/qualcomm/scripts/build.sh
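
One way to confirm whether an installed extension module is stale is to check it for the attributes the Python code expects. The sketch below is a minimal helper, not part of ExecuTorch; the function name `missing_attributes` is hypothetical, while the module path and attribute name in the usage note are taken from the traceback above:

```python
import importlib


def missing_attributes(module_name, expected):
    """Return the names in `expected` that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in expected if not hasattr(module, name)]
```

Calling `missing_attributes("executorch.backends.qualcomm.python.PyQnnManagerAdaptor", ["Qnn_DataType_t"])` should return an empty list once the rebuilt library is picked up; a non-empty list suggests an older build is still being imported.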

@DamonFool
Contributor

./backends/qualcomm/scripts/build.sh

It works for me.
Thanks @cccclai.

But now I get the following error:

[QNN Partitioner Op Support]: aten.unsqueeze_copy.default | True
[QNN Partitioner Op Support]: aten.unsqueeze_copy.default | True
[QNN Partitioner Op Support]: aten.where.self | True
[ERROR] [Qnn ExecuTorch]: Number of input elements 1 does not match number of output elements 128.

[ERROR] [Qnn ExecuTorch]: Op specific validation failed.

[ERROR] [Qnn ExecuTorch]:  <E> validateNativeOps master op validator aten_copy_default:qti.aisw:Reshape failed 3110

[ERROR] [Qnn ExecuTorch]:  <E> QnnBackend_validateOpConfig failed 3110

[ERROR] [Qnn ExecuTorch]:  <E> Failed to validate op aten_copy_default with error 0xc26

[WARNING] [Qnn ExecuTorch]: Qnn Backend op validation failed with error: 3110
[QNN Partitioner Op Support]: aten.copy.default | False
Traceback (most recent call last):
  File "/home/jiefu/executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py", line 259, in <module>
    main(args)
  File "/home/jiefu/executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py", line 192, in main
    compile(args)
  File "/home/jiefu/executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py", line 80, in compile
    manager.to_edge_transform_and_lower_to_qnn(
  File "/home/jiefu/executorch/examples/qualcomm/oss_scripts/llm_utils/qnn_decoder_model_manager.py", line 292, in to_edge_transform_and_lower_to_qnn
    self.edge_prog_mgr = to_edge_transform_and_lower_to_qnn(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/backends/qualcomm/utils/utils.py", line 448, in to_edge_transform_and_lower_to_qnn
    return to_edge_transform_and_lower(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/program/_program.py", line 115, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/program/_program.py", line 1378, in to_edge_transform_and_lower
    edge_manager = edge_manager.to_backend(method_to_partitioner)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/program/_program.py", line 115, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/program/_program.py", line 1680, in to_backend
    new_edge_programs = to_backend(method_to_programs_and_partitioners)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/backend/backend_api.py", line 721, in _
    partitioner_result = partitioner_instance(fake_edge_program)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/backend/partitioner.py", line 66, in __call__
    return self.partition(exported_program)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 199, in partition
    partitions = self.generate_partitions(edge_program)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 164, in generate_partitions
    return generate_partitions_from_list_of_nodes(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/exir/backend/canonical_partitioners/pattern_op_partitioner.py", line 54, in generate_partitions_from_list_of_nodes
    partition_list = capability_partitioner.propose_partitions()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/.python/executorch/lib/python3.11/site-packages/torch/fx/passes/infra/partitioner.py", line 226, in propose_partitions
    if self._is_node_supported(node) and node not in assignment:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/.python/executorch/lib/python3.11/site-packages/torch/fx/passes/infra/partitioner.py", line 87, in _is_node_supported
    return self.operator_support.is_node_supported(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiefu/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 100, in is_node_supported
    op_wrapper = self.node_visitors[node.target.__name__].define_node(
                 ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'dim_order_ops._empty_dim_order.default'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jiefu/executorch/examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py", line 265, in <module>
    raise Exception(e)
Exception: 'dim_order_ops._empty_dim_order.default'

I tested it with transformers==4.53.1.

So may I ask: is the qwen2_5.py script also broken with older transformers versions?

It is broken with transformers==4.56.1, right?
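
The `KeyError` in the traceback above is raised by a plain dictionary lookup keyed on `node.target.__name__`, so any op without a registered visitor aborts partitioning. The sketch below only illustrates that failure mode with a guarded lookup; it is not the project's fix, and `check_op_support` and the sample `visitors` table are hypothetical stand-ins:

```python
def check_op_support(node_visitors, op_name):
    """Return (supported, reason) without raising for unregistered ops."""
    visitor = node_visitors.get(op_name)
    if visitor is None:
        return False, f"no visitor registered for {op_name}"
    return True, "visitor found"


# Hypothetical visitor table; the real one maps op names to visitor objects.
visitors = {"aten.where.self": object()}

print(check_op_support(visitors, "aten.where.self"))
print(check_op_support(visitors, "dim_order_ops._empty_dim_order.default"))
```

With a lookup like this, an unregistered op would be reported as unsupported and left out of the partition rather than stopping the whole lowering step.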

@DamonFool
Contributor

My test command is

python3 examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py \
    -m SM8650  \
    -s xxx \
    --prompt "My favourite condiment is "  \
    -b build-android \
    --decoder_model qwen2.5_0.5B \
    --calibration_tasks wikitext \
    --calibration_limit 1 \
    --ptq 16a8w

@shewu-quic
Collaborator Author

My test command is

python3 examples/qualcomm/oss_scripts/qwen2_5/qwen2_5.py \
    -m SM8650  \
    -s xxx \
    --prompt "My favourite condiment is "  \
    -b build-android \
    --decoder_model qwen2.5_0.5B \
    --calibration_tasks wikitext \
    --calibration_limit 1 \
    --ptq

It seems to be misaligned with transformers==4.56.1. Will look into it. Thanks for reporting.

BTW, you can also use this script to run Qwen and get better performance.

https://github.com/pytorch/executorch/tree/main/examples/qualcomm/oss_scripts/llama

python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --temperature 0 --model_mode hybrid --max_seq_len 1024 --prefill_ar_len 128 --decoder_model qwen2_5-0_5b --prompt "I would like to learn python, could you teach me with a simple example?" --tasks wikitext --limit 1

@DamonFool
Contributor

python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --temperature 0 --model_mode hybrid --max_seq_len 1024 --prefill_ar_len 128 --decoder_model qwen2_5-0_5b --prompt "I would like to learn python, could you teach me with a simple example?" --tasks wikitext --limit 1

Thanks @shewu-quic .
Will try it later.

xingguo01 pushed a commit to xingguo01/executorch that referenced this pull request Dec 18, 2025
…gle library (pytorch#15999)

Summary:

- Prevent dynamic_cast failures caused by separate typeinfo in each
library with clang.

cc: @haowhsu-quic
jirioc pushed a commit to nxp-upstream/executorch that referenced this pull request Dec 19, 2025
…gle library (pytorch#15999)

Summary:

- Prevent dynamic_cast failures caused by separate typeinfo in each
library with clang.

cc: @haowhsu-quic

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: qualcomm Changes to the Qualcomm backend delegate


3 participants