Adding VLM pipeline #234
Conversation
Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
```diff
-SEQ_LEN = 32
-CTX_LEN = 32
+SEQ_LEN = 1024
+CTX_LEN = 1280
```
Changing this here will affect the causal_lm models; specify another set of constants for VLM models.
Since the above seq_len/ctx_len values are not generalizable to other VLMs, I think it is better to keep them inside the model-specific function.
resolved in new patch
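For reference, a minimal sketch of the separation being asked for; the `VLM_*` names are hypothetical, not from this PR:

```python
# QEfficient/utils/constants.py (sketch; VLM_* names are assumptions)

# Existing CausalLM defaults stay untouched:
SEQ_LEN = 32
CTX_LEN = 32

# Separate defaults for the VLM path; Phi-3-vision prompts carry image
# tokens, so they need a longer sequence and context window:
VLM_SEQ_LEN = 1024
VLM_CTX_LEN = 1280
```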
```diff
 ONNX_EXPORT_EXAMPLE_BATCH_SIZE = 1
-ONNX_EXPORT_EXAMPLE_SEQ_LEN = 32
+ONNX_EXPORT_EXAMPLE_SEQ_LEN = 1024
```
Same here; please verify and make sure the existing CausalLM pipeline is not broken.
resolved in new patch
Keep name test_image_text_to_text_models
resolved in new patch
```python
# raise TypeError("missing required argument: 'full_batch_size'")

# if kv_cache_batch_size and not full_batch_size:
#     raise ValueError(
```
Avoid any commented-out lines in code.
resolved in new patch
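If the validation itself is still wanted, one resolution is to make it live code rather than comments; a sketch under that assumption (the helper name, argument names, and messages are illustrative):

```python
def validate_batching_args(full_batch_size, kv_cache_batch_size, continuous_batching):
    # Hypothetical helper mirroring the commented-out checks above.
    if continuous_batching and full_batch_size is None:
        raise TypeError("missing required argument: 'full_batch_size'")
    if kv_cache_batch_size and not full_batch_size:
        raise ValueError(
            "kv_cache_batch_size is only supported with continuous batching; "
            "please also pass full_batch_size."
        )
```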
Please add documentation and an example script.
quic-amitraj left a comment
I observed that code cleaning is yet to be done. Please address all the comments and remove all the commented-out and unnecessary lines. Also update the docstrings accordingly.
```python
from QEfficient import QEFFAutoModelForImageTextToText
from transformers import AutoTokenizer

model_name = "llava"
```
```python
# warnings.warn(
#     "full_batch_size argument is deprecated. Use continuous_batching=True instead.", DeprecationWarning, 2
# )
# breakpoint()
```
Remove commented-out code.
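If the deprecation path is meant to stay, the warning should be live code and the `breakpoint()` dropped entirely; a sketch assuming `full_batch_size` is the deprecated keyword:

```python
import warnings

def warn_if_full_batch_size(full_batch_size):
    # Hypothetical helper: emit the warning instead of leaving it commented out.
    if full_batch_size is not None:
        warnings.warn(
            "full_batch_size argument is deprecated. Use continuous_batching=True instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```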
```python
from transformers import AutoTokenizer

# Initialize the model using from_pretrained similar to transformers.AutoModelForCausalLM
model_name = "gpt2"
```
Verify the PR on
```python
from QEfficient.transformers.quantizers.quant_transforms import AwqToMatmulNbitsTransform, GPTQToMatmulNbitsTransform
from QEfficient.utils import constants, get_padding_shape_from_config

# from QEfficient.transformers.models.phi3_vision.modeling_phi3_vision import Phi3VModelWrapper
```
Please remove unused imports
```python
class QEFFAutoModelForImageTextToText(QEFFTransformersBase):
    """
    The QEFF class is designed for manipulating any causal language model from the HuggingFace hub.
```
Please update the docstring; it is currently referring to a causal language model.
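A possible wording for the updated docstring (a sketch, not the merged text):

```python
class QEFFAutoModelForImageTextToText(QEFFTransformersBase):  # base class as imported in the PR
    """
    The QEFF class is designed for manipulating any image-text-to-text
    (vision-language) model from the HuggingFace hub, e.g. Phi-3-vision,
    analogous to ``transformers.AutoModelForImageTextToText``.
    """
```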
```python
from transformers import AutoTokenizer

model_name = "llava"
model = QEFFAutoModelForCausalLM.from_pretrained(model_name, num_hidden_layers=2)
```
Why is it using QEFFAutoModelForCausalLM? Can you update it with the right script?
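A sketch of the corrected example, assuming `QEFFAutoModelForImageTextToText.from_pretrained` mirrors the CausalLM signature; the `"llava"` name is kept from the diff and likely needs a full hub id:

```python
from QEfficient import QEFFAutoModelForImageTextToText

model_name = "llava"  # kept from the diff; likely needs a full hub id
model = QEFFAutoModelForImageTextToText.from_pretrained(model_name, num_hidden_layers=2)
```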
```python
# warnings.warn(
#     "full_batch_size argument is deprecated. Use continuous_batching=True instead.", DeprecationWarning, 2
# )
# breakpoint()
```
Please remove the debugging/commented lines
```python
self.continuous_batching = continuous_batching
self.is_tlm = is_tlm
self.pad_token_id = model.config.pad_token_id
self.ctx_len = 1280
```
Why is ctx_len hardcoded?
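One way to resolve this: take `ctx_len` as a constructor argument and fall back to a shared constant instead of a magic number (a sketch; `VLM_CTX_LEN` is the hypothetical constant from the earlier thread):

```python
from QEfficient.utils import constants

class QEFFAutoModelForImageTextToText(QEFFTransformersBase):  # base class as imported in the PR
    def __init__(self, model, continuous_batching=False, is_tlm=False, ctx_len=None, **kwargs):
        self.continuous_batching = continuous_batching
        self.is_tlm = is_tlm
        self.pad_token_id = model.config.pad_token_id
        # Fall back to a shared default instead of hardcoding 1280:
        self.ctx_len = ctx_len if ctx_len is not None else constants.VLM_CTX_LEN
```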
```python
:ctx_len (int): Maximum context length to compile the model.
:n_layers (int): Number of layers for the Model.
"""
# replace_transformers_quantizers()
```
Please remove unwanted commented lines
```python
streamer = TextStreamer(processor)
# Testing for Phi-3.5 only atm
inputs = _generate_inputs(model_hf, processor)
breakpoint()
```
```python
num_hidden_layers=n_layer,
_attn_implementation="eager",
trust_remote_code=True,
# Check if this works
```
Please remove the unwanted commented lines (and the `breakpoint()`) from this method.
```python
model_hf, _ = load_vlm_model(model_config)
# Load processor instead
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
# config = model_hf.config
```
remove unwanted commented line
```python
qeff_model.generate(inputs, streamer, device_ids=[0], runtime_ai100=False)
# cloud_ai_100_tokens = exec_info[0]  # Because we always run for single input and single batch size
# gen_len = ort_tokens.shape[-1]
# assert (
```
Why are all the asserts commented out? The objective of this method is to test output correctness across the native PyTorch model output, the transformed PyTorch output, the ORT output, and the AI 100 output. Please add all the tests back.
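A hedged reconstruction of the comparison chain the reviewer is asking for, modeled on the existing CausalLM tests (the variable names and generate return value are assumptions):

```python
exec_info = qeff_model.generate(inputs, streamer, device_ids=[0])
cloud_ai_100_tokens = exec_info[0]  # single input, single batch size
gen_len = ort_tokens.shape[-1]

assert (pytorch_hf_tokens == pytorch_kv_tokens).all(), \
    "Tokens don't match between HF PyTorch output and KV-transformed output."
assert (pytorch_kv_tokens == ort_tokens).all(), \
    "Tokens don't match between KV-transformed output and ONNXRT output."
assert (ort_tokens == cloud_ai_100_tokens[:, :gen_len]).all(), \
    "Tokens don't match between ONNXRT output and Cloud AI 100 output."
```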
Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
```python
from QEfficient.utils import hf_download


def load_vlm_model(model_config):
```
Change name to image_text_to_text
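The rename being requested, with the body left unchanged (sketch):

```python
def load_image_text_to_text_model(model_config):
    # Same body as the former load_vlm_model; only the name changes to
    # match the image_text_to_text naming used elsewhere in the PR.
    ...
```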
```python
)
inputs["attention_mask"] = torch.nn.functional.pad(
    inputs["attention_mask"], (0, 1024 - input_ids_size), "constant", 0
)
```
The hardcoded 1024 values have to be replaced with constants.SEQ_LEN.
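A sketch of the requested change, assuming the constant lives in `QEfficient.utils.constants`; whether the shared `SEQ_LEN` or a VLM-specific value is right is the same open question as in the constants thread above:

```python
import torch
from QEfficient.utils import constants

input_ids_size = inputs["input_ids"].shape[1]  # as in the surrounding test code (assumed)
pad_len = constants.SEQ_LEN - input_ids_size
inputs["attention_mask"] = torch.nn.functional.pad(
    inputs["attention_mask"], (0, pad_len), "constant", 0
)
```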
Already addressed in #267
We were able to create an AutoModel class for VLMs, specifically for Phi-3-vision.

Supported features are:

TODO:
The transformers CLIP model needs to have its attention implementation set to eager manually for test_vlm_model to run.