Description
I’m trying to deploy wtpsplit (SaT models) using ONNX Runtime inside the NVIDIA Triton Python backend.
The same code works in:
- Google Colab
- A normal Python process inside a Docker container
- Triton itself, when running without ONNX Runtime
However, when the same SaT constructor is executed inside Triton’s Python backend, the process exits silently during model initialization, causing Triton to mark the model as UNHEALTHY.
There is no Python exception, no traceback, and no stderr output — the Python backend process simply terminates.
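For reference, here is a minimal sketch of the Triton Python backend `model.py` I am using. The tensor names (`TEXT`, `SEGMENTS`), the `sat-3l-sm` checkpoint, and the exact `ort_providers` kwarg are placeholders standing in for my real config, not an exact copy of it. The silent exit happens inside `initialize()`, on the `SaT(...)` call:

```python
import json

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Import here so the heavy ONNX/HF machinery loads inside the
        # backend stub process, which is where the silent exit happens.
        from wtpsplit import SaT

        # The process dies somewhere inside this constructor: no traceback,
        # no stderr, Triton just marks the model UNHEALTHY.
        self.sat = SaT(
            "sat-3l-sm",  # placeholder checkpoint name
            ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # "TEXT": 1-D array of UTF-8 strings (placeholder tensor name)
            texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            texts = [t.decode("utf-8") for t in texts.reshape(-1)]

            segments = [json.dumps(self.sat.split(t)) for t in texts]
            out = pb_utils.Tensor("SEGMENTS", np.array(segments, dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        self.sat = None
```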
I searched online and tried the following (not sure all of these were warranted, but I tried them anyway; see the snippet after this list for how they were applied):
- TOKENIZERS_PARALLELISM=false
- OMP_NUM_THREADS=1, MKL_NUM_THREADS=1
- Explicit HF cache paths (HF_HOME, TRANSFORMERS_CACHE)
- Disabling TensorRTExecutionProvider
- Restricting ORT providers to CUDA + CPU
- Running Triton Python backend in spawn mode instead of fork
- Verifying ORT providers include CUDA (['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'])
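Concretely, the environment variables and provider restriction were applied roughly like this, before any of the heavy imports. The cache paths and checkpoint name are placeholders:

```python
import os

# Must be set before importing onnxruntime / transformers / wtpsplit to take effect.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["HF_HOME"] = "/models/hf_cache"             # placeholder path
os.environ["TRANSFORMERS_CACHE"] = "/models/hf_cache"  # placeholder path

import onnxruntime as ort
# In my build this prints Tensorrt/CUDA/CPU execution providers.
print(ort.get_available_providers())

from wtpsplit import SaT

# Restrict ORT to CUDA + CPU, skipping TensorRT.
sat = SaT("sat-3l-sm", ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
```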
I checked the other issues but didn't find anything similar. It would be great if someone could help debug this.