Revert "Remove nvenc/dec for xenna 0.1.6" by ayushdg · Pull Request #1374 · NVIDIA-NeMo/Curator

ayushdg · 2026-01-14T20:51:56Z

Reverts #1202 because of issues we see with Xenna's GPU usage versus the resources allocated via Ray.

greptile-apps · 2026-01-14T21:01:26Z

Greptile Summary

This PR reverts changes from #1202 to address GPU allocation issues observed with Xenna's GPU usage vs Ray-allocated resources.

Key changes:

Downgrades cosmos-xenna from 0.1.8 to 0.1.2
Re-introduces nvdec/nvenc resource tracking in the Resources dataclass
Updates ClipTranscodingStage to partition GPU resources based on nvenc counts instead of fractional GPU shares
Updates import paths in Xenna adapter from pipelines.private.resources to ray_utils.resources
Adds explicit error handling in Ray Data adapter when nvdecs/nvencs are requested without GPU support

Impact:

Video transcoding stages now request nvenc/nvdec units directly rather than fractional GPU allocations
Ray Data backend explicitly rejects nvdec/nvenc resource requests (with a TODO for future support)
Xenna backend now passes nvdecs, nvencs, and entire_gpu parameters to resource allocation
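
The Ray Data behavior described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the simplified `Resources` dataclass, the `to_ray_data_kwargs` helper, and the choice of `ValueError` are all assumptions made for the example. Only the field names (`cpus`, `gpus`, `nvdecs`, `nvencs`, `entire_gpu`) come from the summary.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Simplified, hypothetical stand-in for the Resources dataclass."""

    cpus: float = 1.0
    gpus: float = 0.0
    nvdecs: int = 0
    nvencs: int = 0
    entire_gpu: bool = False


def to_ray_data_kwargs(resources: Resources) -> dict:
    """Hypothetical helper: translate Resources into Ray resource kwargs.

    Hardware codec units are rejected up front instead of being silently
    dropped, since this backend has no way to schedule them.
    """
    if resources.nvdecs or resources.nvencs:
        msg = "nvdecs/nvencs are not supported by the Ray Data backend yet"
        raise ValueError(msg)
    return {"num_cpus": resources.cpus, "num_gpus": resources.gpus}
```

Failing fast here means a stage that asks for encoder units is not scheduled as if it had asked for nothing.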

Confidence Score: 4/5

This PR is safe to merge: it is a clean revert that addresses known GPU allocation issues with Xenna.
The changes are straightforward and align with the linked cosmos-curate implementation. Minor concerns remain around edge cases (an empty GPU list, integer division that rounds a per-stream share down to zero), but these arise only in scenarios where the stage would not function anyway.
Files needing attention: nemo_curator/stages/video/clipping/clip_extraction_stages.py (GPU info access), tests/stages/video/clipping/test_clip_transcoding_stage.py (unused mock class)

Important Files Changed

Filename	Overview
nemo_curator/stages/video/clipping/clip_extraction_stages.py	Reverts GPU resource allocation to use nvdec/nvenc units instead of fractional GPU shares, but accessing `[0]` on GPU info list without validation could raise IndexError if no GPUs are detected.
nemo_curator/stages/resources.py	Adds nvdecs/nvencs fields to Resources dataclass and removes automatic `gpus=1.0` assignment when `entire_gpu=True`. This is intentional as Xenna handles entire_gpu allocation differently.
nemo_curator/backends/xenna/adapter.py	Updates import paths from `pipelines.private.resources` to `ray_utils.resources` to match cosmos-xenna 0.1.2 API, and passes nvdecs/nvencs/entire_gpu to XennaResources.
nemo_curator/backends/experimental/ray_data/adapter.py	Adds explicit error when nvdecs/nvencs are requested without GPU allocation in Ray Data backend, which does not yet support these resources.
pyproject.toml	Downgrades cosmos-xenna from 0.1.8 to 0.1.2 to address GPU usage vs allocated resources issues via Ray.
tests/stages/video/clipping/test_clip_transcoding_stage.py	Removes tests for old GPU allocation behavior but adds unused MockGpuResources class. Existing tests need updating to cover new nvenc resource logic.

Sequence Diagram

sequenceDiagram
    participant User
    participant ClipTranscodingStage
    participant cosmos_xenna
    participant Resources
    participant XennaAdapter
    participant XennaResources

    User->>ClipTranscodingStage: __post_init__(encoder="h264_nvenc")
    ClipTranscodingStage->>cosmos_xenna: _get_local_gpu_info()
    cosmos_xenna-->>ClipTranscodingStage: [GpuInfo(name="...")]
    ClipTranscodingStage->>cosmos_xenna: _make_gpu_resources_from_gpu_name(name)
    cosmos_xenna-->>ClipTranscodingStage: GpuResources(num_nvencs=N)
    ClipTranscodingStage->>Resources: Resources(nvencs=N/streams, gpu_memory_gb=M/streams)
    Resources-->>ClipTranscodingStage: resources

    Note over XennaAdapter: During pipeline execution
    XennaAdapter->>Resources: processing_stage.resources
    Resources-->>XennaAdapter: {cpus, gpus, nvdecs, nvencs, entire_gpu}
    XennaAdapter->>XennaResources: XennaResources(cpus, gpus, nvdecs, nvencs, entire_gpu)
    XennaResources-->>XennaAdapter: resource allocation for Ray

greptile-apps

Additional Comments (1)

tests/stages/video/clipping/test_clip_transcoding_stage.py, lines 31-42

style: Both MockGpuInfo and MockGpuResources classes are defined but never used in any tests. Consider removing them or adding tests that utilize these mocks.

greptile-apps · 2026-01-16T15:53:37Z

+                gpu_info = _get_local_gpu_info()[0]
+                nvencs = _make_gpu_resources_from_gpu_name(gpu_info.name).num_nvencs


logic: Accessing [0] assumes at least one GPU is available. This will raise an IndexError if no GPUs are detected. Should this handle the case where no GPUs are available or add validation?
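
One way to address this is to validate the list before indexing, so the failure carries an actionable message instead of a bare IndexError. The sketch below is illustrative only: `GpuInfo` is a hypothetical stand-in for whatever `_get_local_gpu_info()` returns, and `first_gpu_or_raise` and the choice of `RuntimeError` are assumptions, not part of the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuInfo:
    """Hypothetical stand-in for the objects returned by _get_local_gpu_info()."""

    name: str


def first_gpu_or_raise(gpus: List[GpuInfo]) -> GpuInfo:
    """Return the first detected GPU, or fail with an explicit message."""
    if not gpus:
        msg = "No GPUs detected; an NVENC encoder such as h264_nvenc needs at least one GPU."
        raise RuntimeError(msg)
    return gpus[0]
```

In the stage, the call would then read `gpu_info = first_gpu_or_raise(_get_local_gpu_info())`, assuming that function returns a plain list.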

greptile-apps · 2026-01-16T15:53:37Z

+                    nvencs=nvencs // self.nb_streams_per_gpu, gpu_memory_gb=gpu_memory_gb // self.nb_streams_per_gpu
+                )


logic: Integer division by nb_streams_per_gpu could result in zero resources if the divisor is larger than the dividend. What should happen when nb_streams_per_gpu exceeds available nvencs or gpu_memory_gb?
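
To make the concern concrete: with 3 encoder units and `nb_streams_per_gpu = 4`, `3 // 4` evaluates to 0, so the stage would request no nvencs at all. One possible policy, sketched below, is to reject such configurations early. The `per_stream_share` helper is hypothetical, and whether raising is the right behavior (rather than, say, clamping to 1) is a design decision the PR does not settle.

```python
def per_stream_share(total: int, nb_streams_per_gpu: int, what: str) -> int:
    """Split an integer GPU resource evenly across streams.

    Raises ValueError when the stream count is not positive or when the
    floor division would leave each stream with zero units.
    """
    if nb_streams_per_gpu <= 0:
        msg = f"nb_streams_per_gpu must be positive, got {nb_streams_per_gpu}"
        raise ValueError(msg)
    share = total // nb_streams_per_gpu
    if share == 0:
        msg = f"nb_streams_per_gpu={nb_streams_per_gpu} exceeds available {what} ({total})"
        raise ValueError(msg)
    return share
```

The same helper could be applied to both `nvencs` and `gpu_memory_gb`, since the diff divides both by `nb_streams_per_gpu`.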

greptile-apps · 2026-01-16T20:17:42Z

+# Mock GPU resources class to simulate GPU resources
+class MockGpuResources:
+    def __init__(self, num_nvencs: int = 3, num_nvdecs: int = 3):
+        self.num_nvencs = num_nvencs
+        self.num_nvdecs = num_nvdecs
+


style: MockGpuResources class is defined but never used in any test. Either add tests for the new nvenc-based resource allocation in ClipTranscodingStage.__post_init__ that use this mock, or remove the dead code.
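
A test that actually exercises the mock could look roughly like the sketch below. It is self-contained so it can run without the repository: `gpu_lookup` is a hypothetical stand-in for the module exposing the two lookup helpers, `nvencs_per_stream` mirrors the lookup and integer division shown in the diff rather than calling `ClipTranscodingStage.__post_init__`, and the `MockGpuInfo` signature is assumed. In the real test file, the patch targets would be the names imported in clip_extraction_stages.py.

```python
from types import SimpleNamespace
from unittest import mock


class MockGpuInfo:
    """Assumed shape: only the .name attribute is needed here."""

    def __init__(self, name: str = "mock-gpu"):
        self.name = name


class MockGpuResources:
    def __init__(self, num_nvencs: int = 3, num_nvdecs: int = 3):
        self.num_nvencs = num_nvencs
        self.num_nvdecs = num_nvdecs


# Hypothetical stand-in for the module that provides the GPU lookup helpers.
gpu_lookup = SimpleNamespace(
    _get_local_gpu_info=lambda: [],
    _make_gpu_resources_from_gpu_name=lambda name: None,
)


def nvencs_per_stream(nb_streams_per_gpu: int) -> int:
    """Mirror of the lookup and integer division from the diff."""
    gpu_info = gpu_lookup._get_local_gpu_info()[0]
    nvencs = gpu_lookup._make_gpu_resources_from_gpu_name(gpu_info.name).num_nvencs
    return nvencs // nb_streams_per_gpu


def test_nvencs_split_across_streams() -> None:
    with mock.patch.object(gpu_lookup, "_get_local_gpu_info", return_value=[MockGpuInfo()]):
        with mock.patch.object(
            gpu_lookup,
            "_make_gpu_resources_from_gpu_name",
            return_value=MockGpuResources(num_nvencs=3),
        ) as make_resources:
            assert nvencs_per_stream(3) == 1
            # The GPU name from the first lookup feeds the second one.
            make_resources.assert_called_once_with("mock-gpu")
```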
