Description
I’m trying to deploy wtpsplit (SaT models) using ONNX Runtime inside the NVIDIA Triton Python backend.
The same code works in:
- Google Colab
- A normal Python process inside a Docker container
- Triton itself, when running without ONNX Runtime
However, when the same SaT constructor is executed inside Triton’s Python backend, the process exits silently during model initialization, causing Triton to mark the model as UNHEALTHY.
There is no Python exception, no traceback, and no stderr output — the Python backend process simply terminates.
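For reference, here is a minimal sketch of the Triton Python backend `model.py` I am using. The tensor names (`TEXT`, `SEGMENTS`), the `sat-3l-sm` checkpoint, and the exact `ort_providers` kwarg are placeholders standing in for my real config, not an exact copy of it. The silent exit happens inside `initialize()`, on the `SaT(...)` call:

```python
import json

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Import here so the heavy ONNX/HF machinery loads inside the
        # backend stub process, which is where the silent exit happens.
        from wtpsplit import SaT

        # The process dies somewhere inside this constructor: no traceback,
        # no stderr, Triton just marks the model UNHEALTHY.
        self.sat = SaT(
            "sat-3l-sm",  # placeholder checkpoint name
            ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # "TEXT": 1-D array of UTF-8 strings (placeholder tensor name)
            texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            texts = [t.decode("utf-8") for t in texts.reshape(-1)]

            segments = [json.dumps(self.sat.split(t)) for t in texts]
            out = pb_utils.Tensor("SEGMENTS", np.array(segments, dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        self.sat = None
```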
I searched online and tried the following (not sure all of these were warranted, but I tried them anyway; see the snippet after this list for how they were applied):
- TOKENIZERS_PARALLELISM=false
- OMP_NUM_THREADS=1, MKL_NUM_THREADS=1
- Explicit HF cache paths (HF_HOME, TRANSFORMERS_CACHE)
- Disabling TensorRTExecutionProvider
- Restricting ORT providers to CUDA + CPU
- Running Triton Python backend in spawn mode instead of fork
- Verifying ORT providers include CUDA (['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'])
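Concretely, the environment variables and provider restriction were applied roughly like this, before any of the heavy imports. The cache paths and checkpoint name are placeholders:

```python
import os

# Must be set before importing onnxruntime / transformers / wtpsplit to take effect.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["HF_HOME"] = "/models/hf_cache"             # placeholder path
os.environ["TRANSFORMERS_CACHE"] = "/models/hf_cache"  # placeholder path

import onnxruntime as ort
# In my build this prints Tensorrt/CUDA/CPU execution providers.
print(ort.get_available_providers())

from wtpsplit import SaT

# Restrict ORT to CUDA + CPU, skipping TensorRT.
sat = SaT("sat-3l-sm", ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
```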
I checked the other issues but didn't find anything similar. It would be great if someone could help debug this.