PR Summary
This PR adds OpenVINO and TensorRT optimizations to improve model inference performance across different hardware platforms.
- Added OpenVINO support in /libs/infinity_emb/Docker.template.yaml with INFINITY_ENGINE="optimum" for CPU builds
- Updated TensorRT support with CUDA 12.3.2 and TensorRT 10.3.0 in /libs/infinity_emb/Docker.template.yaml
- Added provider-specific optimizations in /libs/infinity_emb/infinity_emb/transformer/utils_optimum.py for OpenVINO and TensorRT
- Added quantized model support for OpenVINO in /libs/infinity_emb/infinity_emb/transformer/embedder/optimum.py
- Updated dependencies in pyproject.toml to use OpenVINO 2024.4.0 and TensorRT 10.6.0
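To illustrate what "provider-specific optimizations" could look like, here is a minimal sketch that maps an ONNX Runtime execution provider name to a dictionary of provider options. It is not the code from utils_optimum.py: the helper name `provider_options`, the `cache_dir` parameter, and the chosen option values are assumptions made for illustration. The option keys follow the ONNX Runtime TensorRT and OpenVINO execution provider documentation, but accepted values can differ between releases.

```python
# Hypothetical helper, NOT taken from the PR: shows one way provider-specific
# settings might be selected. Option values are illustrative assumptions.
def provider_options(provider: str, cache_dir: str = "./trt_cache") -> dict:
    """Return a provider-options dict for an ONNX Runtime execution provider."""
    if provider == "TensorrtExecutionProvider":
        return {
            # Allow FP16 kernels where TensorRT supports them.
            "trt_fp16_enable": True,
            # Cache built engines so later starts can skip the engine build.
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": cache_dir,
        }
    if provider == "OpenVINOExecutionProvider":
        # Target the CPU device; the accepted value may vary by version.
        return {"device_type": "CPU"}
    # Any other provider (e.g. CPUExecutionProvider) gets no extra options.
    return {}


options = provider_options("TensorrtExecutionProvider")
```

A dictionary of this kind is typically passed alongside the provider name when the inference session or model is created; whether the PR does it this way is not stated in the summary.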
11 file(s) reviewed, 7 comment(s)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
- Coverage   79.23%   73.02%   -6.21%
==========================================
  Files          42       42
  Lines        3380     3392      +12
==========================================
- Hits         2678     2477     -201
- Misses        702      915     +213
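The percentages in the coverage diff follow directly from the hit and line counts. As a quick check, the sketch below recomputes them; the function name `coverage_pct` is made up for this example and is not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of tracked lines, rounded to two decimals."""
    return round(100 * hits / lines, 2)


# Counts copied from the coverage diff (main vs. #460).
base = coverage_pct(2678, 3380)
head = coverage_pct(2477, 3392)

# Change in coverage, computed from the unrounded percentages.
delta = round(100 * 2477 / 3392 - 100 * 2678 / 3380, 2)

print(base, head, delta)
```

Only 12 lines were added, but 201 previously covered lines are no longer hit, which accounts for most of the drop.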
OpenVINO is not working; need to use #454.
openvino:
tensorrt: