Qualcomm AI Engine Direct - GLM1.5B #15691

Merged
cccclai merged 2 commits into pytorch:main from CodeLinaro:dev1/winskuo/glm_1.5b
Nov 25, 2025

Conversation

@winskuo-quic (Collaborator) commented Nov 10, 2025

Summary

GLM Enablement
`python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s $DEVICE -m SM8750 --temperature 0 --model_mode kv --max_seq_len 128 --decoder_model glm-1_5b --prompt "Could you tell me about Facebook?"`

Test plan

`python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_glm1_5b --model SM8750 --build_folder build-android/ --executorch_root . -s $DEVICE --artifact ./glm1_5b`


pytorch-bot bot commented Nov 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15691

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 10, 2025
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@cccclai (Contributor) left a comment:

Looks good, can you fix the lint?

attention_kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)
# Hybrid models can have layer types different from attention
layer_types: Optional[list] = None
model_architecture: Optional[str] = None
Contributor:

Can you add comments to explain this variable?

winskuo-quic (Collaborator, Author):

Added. Thanks.

"use_hf_rope": true,
"attention_qkv_bias": false,
"use_qk_norm": false,
"model_architecture": "GlmForCausalLM"
Contributor:

Do we have any existing variable that can be used for this?

winskuo-quic (Collaborator, Author):
Thanks for the suggestion.
I was actually thinking of reusing base_model_name_or_path. However, that variable is used in optimum for another purpose (referring to the actual model path), so I created a new variable to avoid future conflicts.
Another reason for creating this config is that, as we enable more models, we have noticed minor differences among them. For example, GLM's FeedForward differs from other models' FeedForward, so we need a variable to distinguish GLM from other LLM models.
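The rationale above can be sketched as a minimal example of why an architecture field is useful for dispatching to model-specific submodules. `DecoderConfig`, `select_feed_forward`, and the returned names are hypothetical illustrations, not the actual ExecuTorch code:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class DecoderConfig:
    # Extra kwargs forwarded to the attention module.
    attention_kwargs: Dict[str, Any] = field(default_factory=dict)
    # Hybrid models can have layer types different from attention.
    layer_types: Optional[List[str]] = None
    # HF-style architecture class name (e.g. "GlmForCausalLM"); lets the
    # model builder pick architecture-specific pieces such as GLM's
    # FeedForward variant.
    model_architecture: Optional[str] = None

def select_feed_forward(config: DecoderConfig) -> str:
    # Hypothetical dispatch on the architecture string: GLM uses a
    # FeedForward that differs from the default LLM FeedForward.
    if config.model_architecture == "GlmForCausalLM":
        return "GlmFeedForward"
    return "DefaultFeedForward"
```

With this shape, `select_feed_forward(DecoderConfig(model_architecture="GlmForCausalLM"))` picks the GLM-specific block, while configs that omit the field fall back to the default.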


meta-codesync bot commented Nov 24, 2025

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87811592.

@cccclai cccclai merged commit f84d45b into pytorch:main Nov 25, 2025
139 of 141 checks passed
jirioc pushed a commit to nxp-upstream/executorch that referenced this pull request Dec 19, 2025