Qualcomm AI Engine Direct - GLM1.5B#15691
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15691
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
cccclai
left a comment
Looks good, can you fix the lint?
examples/models/llama/model_args.py
Outdated
```python
attention_kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)
# Hybrid models can have layer types different from attention
layer_types: Optional[list] = None
model_architecture: Optional[str] = None
```
Can you add comments to explain this variable?
Added. Thanks
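A minimal sketch of how those fields might end up documented in the dataclass (field names and defaults come from the diff above; the class name and comment wording are assumptions, not the PR's exact code):

```python
import dataclasses
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class ModelArgs:
    # Extra kwargs forwarded to the attention module (e.g. bias flags).
    attention_kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)
    # Hybrid models can have layer types different from attention;
    # None means every layer uses the default attention type.
    layer_types: Optional[List[str]] = None
    # HuggingFace-style architecture name (e.g. "GlmForCausalLM"), used to
    # select model-specific submodules where architectures diverge.
    model_architecture: Optional[str] = None
```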
```json
"use_hf_rope": true,
"attention_qkv_bias": false,
"use_qk_norm": false,
"model_architecture": "GlmForCausalLM"
```
Do we have any existing variable that can be used for this?
Thanks for the suggestion.
I was actually considering reusing base_model_name_or_path. However, that variable is used in optimum for a different purpose (referring to the actual model path), so I created a new variable to avoid conflicts in the future.
Another reason for creating this config is that, as we enable more models, we notice minor differences among them. For example, GLM's FeedForward is different from other models' FeedForward, so we need a variable to differentiate GLM from the other LLM models.
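The kind of dispatch described here could look roughly like the sketch below. Only the architecture string "GlmForCausalLM" comes from the PR; the class names and factory function are hypothetical, illustrating why a dedicated model_architecture field is useful:

```python
# Hypothetical sketch: pick a FeedForward variant based on the
# model_architecture config value. Names below are illustrative.

class FeedForward:
    """Default FFN shared by most decoder models."""
    kind = "default"


class GlmFeedForward(FeedForward):
    """GLM's FFN differs from the default (per the review discussion)."""
    kind = "glm"


def make_feed_forward(model_architecture):
    # model_architecture is the HuggingFace-style name from the config,
    # e.g. "GlmForCausalLM"; None falls back to the shared default.
    if model_architecture == "GlmForCausalLM":
        return GlmFeedForward()
    return FeedForward()
```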
Force-pushed from 306d471 to 555714b
Summary
GLM Enablement
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s $DEVICE -m SM8750 --temperature 0 --model_mode kv --max_seq_len 128 --decoder_model glm-1_5b --prompt "Could you tell me about Facebook?"
Test plan
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_glm1_5b --model SM8750 --build_folder build-android/ --executorch_root . -s $DEVICE --artifact ./glm1_5b