Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
OwlViT and OwlV2 need an upgrade to their ONNX config (opset 11 → 12 for the einsum operator); I will push it once the onnxruntime tests finish (to see if there's anything left).
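For readers unfamiliar with optimum's exporter configs, a minimal sketch of what such an opset bump looks like; the base class and attribute names are assumptions based on optimum's usual pattern, not the actual diff:

```python
# Hypothetical sketch, not the actual diff: optimum's exporter configs declare
# a default ONNX opset per architecture, and torch.einsum is only exportable
# from opset 12 onwards.
from optimum.exporters.onnx.model_configs import CLIPOnnxConfig  # assumed base class

class OwlViTOnnxConfig(CLIPOnnxConfig):
    # einsum requires opset >= 12
    DEFAULT_ONNX_OPSET = 12
```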
echarlaix left a comment:
Thanks a lot for taking care of this @IlyasMoutawwakil
should be all fixed now
echarlaix left a comment:
Looks great, thanks a lot @IlyasMoutawwakil
```python
if use_torch is True:
    cache_position = cache_position.to(self.device)

return use_cache_branch_tensor, past_key_values, cache_position
```
this is a breaking change, so we should be careful; not sure this method is used by anyone, though
the method is only used by the forward pass; I don't think any sub-packages use it
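To make the compatibility concern concrete, here is a hedged sketch of a call site; the method name `prepare_inputs_for_merged` and its arguments are inferred from the snippet above, so treat them as assumptions:

```python
# Before this PR (hypothetical external caller): the helper returned two values.
# use_cache_branch_tensor, past_key_values = self.prepare_inputs_for_merged(
#     input_ids, past_key_values, use_torch
# )

# After this PR it also returns cache_position, so a two-value unpack would
# raise "ValueError: too many values to unpack" and callers must be updated:
use_cache_branch_tensor, past_key_values, cache_position = self.prepare_inputs_for_merged(
    input_ids, past_key_values, use_torch
)
```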
…gface/optimum into support-transformers-4.43
@IlyasMoutawwakil Is this PR ready to support transformers version 4.43.3?

@sreenivasulureddysura yep, it's ready
* fix bt bark test
* setup
* patch clip models for sd
* infer ort model dtype property from inputs dtypes
* patch all clip variants
* device setter
* bigger model for now
* fix device attribution
* onnx opset for owlvit and owlv2
* model dtype
* revert
* use model part dtype instead
* no need for dtype with diffusion pipelines
* revert
* fix clip text model with projection not outputting hidden states
* whisper generation
* fix whisper, support cache_position, and using transformers whisper generation loop
* style
* create cache position for merged decoder and fix test for non whisper speech to text
* typo
* conditioned cache position argument
* update whisper min transformers version
* compare whisper ort generation with transformers
* fix generation length for speech to text model type
* cache position in whisper only with dynamic axis decoder_sequence_length
* use minimal prepare_inputs_for_generation in ORTModelForSpeechSeq2Seq
* remove version restrictions on whisper
* comment
* fix
* simpler

---------

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
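One commit above, "infer ort model dtype property from inputs dtypes", relates to the `pipeline`/`model.dtype` issue described in the PR body below. A minimal sketch of the idea, assuming an `onnxruntime.InferenceSession`; the class and mapping here are illustrative, not optimum's actual code:

```python
import torch

# Map onnxruntime input type strings to torch floating-point dtypes.
ORT_TO_TORCH_FLOAT_DTYPE = {
    "tensor(float)": torch.float32,
    "tensor(float16)": torch.float16,
    "tensor(bfloat16)": torch.bfloat16,
}

class ORTModelSketch:
    def __init__(self, session):
        self.session = session  # an onnxruntime.InferenceSession

    @property
    def dtype(self):
        # Report the dtype of the first floating-point input so that
        # transformers' pipeline can cast inputs to the expected precision.
        for inp in self.session.get_inputs():
            if inp.type in ORT_TO_TORCH_FLOAT_DTYPE:
                return ORT_TO_TORCH_FLOAT_DTYPE[inp.type]
        return None
```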
Support the `cache_position` input that was added to Hugging Face Whisper models as part of a revision of how they handle KV-caching. This is like `position_ids`, but there is no batch dimension. See huggingface/optimum#1971 and huggingface/transformers#31166.
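A small illustration of that shape difference in plain PyTorch (no optimum-specific API involved):

```python
import torch

batch_size, past_length, new_tokens = 2, 8, 1

# position_ids has a batch dimension: one row of positions per sequence.
position_ids = torch.arange(past_length, past_length + new_tokens).expand(batch_size, -1)
print(position_ids.shape)   # torch.Size([2, 1])

# cache_position is 1-D: it indexes slots in the KV cache and is shared
# across the batch, hence no batch dimension.
cache_position = torch.arange(past_length, past_length + new_tokens)
print(cache_position.shape) # torch.Size([1])
```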
What does this PR do?
This PR adds support for transformers 4.43, with which the following issues emerge:
* `clip` models using `sdpa` attention.
* `pipeline` calling `model.dtype` to convert inputs to the correct floating point precision.
* `bark` models can't be saved due to shared tensors (tracked in "`BarkModel` can't be saved anymore" transformers#32224).
* Whisper models now expecting a `cache_position` input.

Before submitting
Who can review?