[Model] support other model #3786
Thanks for your contribution!
Pull Request Overview
This PR adds support for a new multimodal model called "QF VL" (QFVLForConditionalGeneration) to the FastDeploy framework. The implementation includes comprehensive multimodal processing capabilities for handling both images and videos alongside text inputs.
Key changes include:
- Addition of the QF VL model architecture with SigLIP vision transformer backbone
- Implementation of multimodal input processing pipeline for text, images, and videos
- Integration of the new model type into the existing model registry and preprocessing workflow
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/worker/gpu_model_runner.py | Adds QF model-specific RoPE embedding handling and vision feature extraction |
| fastdeploy/multimodal/utils.py | Removes unused import and adds formatting cleanup |
| fastdeploy/multimodal/registry.py | Registers the new QFVLForConditionalGeneration model type |
| fastdeploy/model_executor/models/qf_vl/siglip.py | Implements SigLIP vision transformer with rotary embeddings and attention mechanisms |
| fastdeploy/model_executor/models/qf_vl/qf_vl.py | Defines the main QF VL model architecture and weight management |
| fastdeploy/model_executor/models/qf_vl/projector.py | Implements vision-to-text feature projection layer |
| fastdeploy/model_executor/models/qf_vl/config.py | Defines configuration classes for the QF VL model |
| fastdeploy/model_executor/models/qf_vl/__init__.py | Module initialization file |
| fastdeploy/input/qf_vl_processor/qf_vl_processor.py | Main processor for handling QF VL multimodal inputs |
| fastdeploy/input/qf_vl_processor/process.py | Core data processing logic for tokenization and multimodal handling |
| fastdeploy/input/qf_vl_processor/image_processor.py | Image and video preprocessing implementation |
| fastdeploy/input/qf_vl_processor/__init__.py | Module initialization for QF VL processor |
| fastdeploy/input/preprocess.py | Integrates QF VL processor into the preprocessing pipeline |
Comments suppressed due to low confidence (1)
fastdeploy/input/qf_vl_processor/process.py:1
- The dtype parameter should be passed to np.concatenate as a separate argument, not as a keyword argument. The correct syntax is np.concatenate([...]).astype(np.int64).
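For reference, the two spellings can be compared side by side. This is a minimal sketch with made-up arrays (not taken from `process.py`), and it assumes NumPy 1.20 or later, where `np.concatenate` gained a `dtype` keyword argument:

```python
import numpy as np

# Made-up token-id chunks standing in for the arrays being joined.
chunks = [np.array([1, 2], dtype=np.int32), np.array([3], dtype=np.int32)]

# Form recommended in the comment: concatenate first, then cast.
via_astype = np.concatenate(chunks).astype(np.int64)

# Keyword form: available in NumPy >= 1.20, not in older releases.
via_keyword = np.concatenate(chunks, dtype=np.int64)
```

Both results are `int64` on a recent NumPy; the `.astype` form is the one that also works on older releases.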
"""
    )
    else:
Copilot (AI), Sep 2, 2025
The conditional logic for QF model handling should include the original code inside the else block. The current structure suggests the else block is empty, which may break existing functionality for non-QF models.
    patch_embeds = self.patch_embedding(pixel_values.to(dtype=target_dtype))  # shape = [*, width, grid, grid]
    embeddings = patch_embeds.flatten(-2).squeeze(-1)
    embeddings = rearrange(embeddings, "(b l) d -> b l d", b=batch_size, l=squence_len)
    # todo: not dubug
Copilot (AI), Sep 2, 2025
The comment contains a typo. 'dubug' should be 'debug'.
Suggested change:

    - # todo: not dubug
    + # todo: not debug
    **({extra.value: True} if extra else {}),
    }

    if "lm_head.weight" or "" in weight_name:
Copilot (AI), Sep 2, 2025
This condition will always evaluate to True because 'lm_head.weight' is a non-empty string. The intended logic appears to be checking if weight_name contains 'lm_head.weight' or is empty.
Suggested change:

    - if "lm_head.weight" or "" in weight_name:
    + if "lm_head.weight" in weight_name or weight_name == "":
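The precedence issue described in this comment can be reproduced in isolation. The following is a minimal sketch: the helpers `buggy_match` and `fixed_match` and the sample weight names are hypothetical and do not come from the PR.

```python
def buggy_match(weight_name: str) -> bool:
    # Parsed as: "lm_head.weight" or ("" in weight_name).
    # The left operand is a non-empty string, so it is truthy and `or`
    # short-circuits; the membership test is never evaluated.
    return bool("lm_head.weight" or "" in weight_name)


def fixed_match(weight_name: str) -> bool:
    # Each side of `or` is now a complete boolean test.
    return "lm_head.weight" in weight_name or weight_name == ""


# Illustrative names only (assumed, not from the PR).
names = ["lm_head.weight", "model.layers.0.mlp.up_proj.weight", ""]
buggy = [buggy_match(n) for n in names]
fixed = [fixed_match(n) for n in names]
```

Here `buggy` is true for every name, while `fixed` is false for the unrelated layer weight. Note that `"" in weight_name` is itself true for any string, so reordering the operands of the original condition would not have fixed it.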
    tool_parser_obj=None,
    ):
    """
    Initialize QwenVLProcessor instance.
Copilot (AI), Sep 2, 2025
The docstring incorrectly refers to 'QwenVLProcessor' instead of 'QFVLProcessor'.
Suggested change:

    - Initialize QwenVLProcessor instance.
    + Initialize QFVLProcessor instance.
    @@ -0,0 +1,107 @@
    """
    # Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
2025
4ddac23 to 65c3b3f
In #4396
No description provided.