Idiomatic multi-gpu usage #37

@s1ddok

Description

Describe the question.

I'm trying to use this library in a multi-GPU PyTorch setting, and so far I'm having no luck: I get frequent "core dumped" errors that don't tell me much.

My current approach is this (LLM-generated):

    import cupy as cp
    import torch as _torch
    from nvidia import nvimgcodec

    # Pick the device PyTorch is currently using, fall back to GPU 0
    dev_idx = _torch.cuda.current_device() if _torch.cuda.is_available() else 0
    with cp.cuda.Device(dev_idx):
        encoder = nvimgcodec.Encoder()
        output_path = frame_dir / f"{to_stem(name)}.tiff"
        # Round-trip through host memory, then back to a CuPy array on dev_idx
        gpu_arr = cp.asarray(result.cpu())
        encoder.write(str(output_path), gpu_arr)
        # Release GPU memory retained by CuPy's pool to avoid OOM across concurrent saves
        try:
            cp.get_default_memory_pool().free_all_blocks()
        except Exception:
            pass

From nvidia-smi I see that most of the memory is being allocated on GPU 0, and it feels like the nvimgcodec.Encoder is not explicitly aware of which GPU it is expected to run on. Can you provide an example of how to use it while keeping tensors on the GPU?
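For reference, here is a sketch of the device-pinned variant I would expect to work, assuming `nvimgcodec.Encoder` accepts a `device_id` constructor argument (as the nvImageCodec Python docs suggest) and that `result` already lives on the current device, so it can be handed to CuPy via DLPack instead of round-tripping through host memory. The `tiff_path` helper and `save_tensor_as_tiff` names are mine, not from any library:

```python
from pathlib import Path


def tiff_path(frame_dir, name):
    # Build the output path for a frame (stands in for the to_stem() logic above).
    return Path(frame_dir) / f"{name}.tiff"


def save_tensor_as_tiff(result, frame_dir, name):
    # Heavy imports kept inside the function so the pure-path helper above
    # stays importable on machines without CUDA; cupy, torch, and nvimgcodec
    # are required at call time.
    import cupy as cp
    import torch
    from nvidia import nvimgcodec

    dev_idx = torch.cuda.current_device() if torch.cuda.is_available() else 0
    with cp.cuda.Device(dev_idx):
        # Assumption: device_id pins the encoder's work to dev_idx.
        encoder = nvimgcodec.Encoder(device_id=dev_idx)
        # Zero-copy handoff via DLPack; assumes result is on dev_idx.
        gpu_arr = cp.from_dlpack(result.detach())
        encoder.write(str(tiff_path(frame_dir, name)), gpu_arr)
```

This avoids the `.cpu()` round-trip entirely, which is where I suspect the GPU 0 allocations are coming from, but I haven't been able to verify the `device_id` behavior.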

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report

Labels: question (Further information is requested)
