Add gpu optimizations to base model #14

Merged
jlamypoirier merged 79 commits into main from fast_inference_base on Mar 2, 2023
Conversation

jlamypoirier (Collaborator) commented Feb 28, 2023

This allows KV cache pre-allocation and key-length padding to happen outside of the inference runner. With this, the inference runner becomes exclusively a CPU optimization (aside from small GPU gains from CUDA graphs).
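To make the idea concrete, here is a minimal sketch of what pre-allocating a KV cache and padding the key length can look like. The function names, tensor shapes, and padding multiple below are illustrative assumptions, not this PR's actual implementation:

```python
# Minimal sketch of KV cache pre-allocation and key-length padding, assuming a
# PyTorch-style decoder. All names and shapes here are illustrative, not this
# repo's API.
import torch

def preallocate_kv_cache(batch_size: int, num_heads: int, max_seq_len: int,
                         head_dim: int, device: str = "cpu",
                         dtype: torch.dtype = torch.float16):
    # Allocate the full cache once, up front, so each generation step only
    # writes into existing memory instead of reallocating and copying.
    shape = (batch_size, num_heads, max_seq_len, head_dim)
    key_cache = torch.empty(shape, device=device, dtype=dtype)
    value_cache = torch.empty(shape, device=device, dtype=dtype)
    return key_cache, value_cache

def pad_key_length(key_length: int, multiple: int = 128) -> int:
    # Round the attended key length up to a fixed multiple so attention
    # kernels see a small set of static shapes across decoding steps.
    return ((key_length + multiple - 1) // multiple) * multiple

# Usage: each decoding step writes its new key/value into a cache slice and
# attends over a padded prefix of stable length.
keys, values = preallocate_kv_cache(batch_size=1, num_heads=16,
                                    max_seq_len=2048, head_dim=64)
print(pad_key_length(300))  # -> 384
```

Keeping shapes static in this way is also what makes CUDA graph capture practical, since a captured graph replays with fixed tensor sizes and addresses.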

Separate PR for now because the inference runner needs to be adapted.

bigximik and others added 30 commits on August 31, 2022 04:41
* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx config

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring
* Add ESMFold code sample

* sorry sylvain

* make fixup

* sorry sylvain again
Base automatically changed from fast_inference to main on March 2, 2023 19:26
jlamypoirier marked this pull request as ready for review on March 2, 2023 19:28
jlamypoirier merged commit 9c3c548 into main on Mar 2, 2023
jlamypoirier deleted the fast_inference_base branch on March 2, 2023 19:28