
SaT ONNX constructor crashes silently when run inside NVIDIA Triton Python backend #166

@samyak112

Description

I’m trying to deploy wtpsplit (SaT models) using ONNX Runtime inside the NVIDIA Triton Python backend.
The same code works in:

  1. Google Colab
  2. A normal Python process inside a Docker container
  3. Triton itself, when I ran the model without ONNX Runtime

However, when the same SaT constructor is executed inside Triton’s Python backend, the process exits silently during model initialization, causing Triton to mark the model as UNHEALTHY.

There is no Python exception, no traceback, and no stderr output — the Python backend process simply terminates.
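For context, my Python backend looks roughly like this (a trimmed, hypothetical version of my actual model.py; the model name `sat-3l-sm` and the tensor names `TEXT`/`SENTENCES` are placeholders):

```python
# model.py — minimal sketch of the Triton Python backend (placeholder names)
import numpy as np
import triton_python_backend_utils as pb_utils
from wtpsplit import SaT

class TritonPythonModel:
    def initialize(self, args):
        # The process exits silently inside this constructor under Triton;
        # the identical call succeeds in a plain Python process.
        self.sat = SaT(
            "sat-3l-sm",  # placeholder: any SaT ONNX (-sm) model
            ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            text = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()[0]
            sentences = self.sat.split(text.decode("utf-8"))
            out = pb_utils.Tensor("SENTENCES", np.array(sentences, dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```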

I searched online and tried the following (not sure all of these were needed, but I tried them anyway; a sketch of how I applied them follows the list):

  1. TOKENIZERS_PARALLELISM=false
  2. OMP_NUM_THREADS=1, MKL_NUM_THREADS=1
  3. Explicit HF cache paths (HF_HOME, TRANSFORMERS_CACHE)
  4. Disabling TensorRTExecutionProvider
  5. Restricting ORT providers to CUDA + CPU
  6. Running Triton Python backend in spawn mode instead of fork
  7. Verifying that the available ORT providers include CUDA (['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'])
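Roughly how I applied items 1–5 and 7 (the cache path is just an example; spawn vs. fork is set in Triton's configuration, not in code):

```python
# Set before any heavy imports, at the top of model.py
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["HF_HOME"] = "/opt/hf_cache"             # example path
os.environ["TRANSFORMERS_CACHE"] = "/opt/hf_cache"  # example path

import onnxruntime as ort
# Confirms CUDA is visible: prints ['TensorrtExecutionProvider',
# 'CUDAExecutionProvider', 'CPUExecutionProvider'] in my setup
print(ort.get_available_providers())

from wtpsplit import SaT

# Restricting ORT to CUDA + CPU (i.e. dropping TensorrtExecutionProvider)
sat = SaT(
    "sat-3l-sm",  # placeholder model name
    ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```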

I checked the other issues but didn't find anything similar. It would be great if someone could help debug this.
