fix: Update transformers dependency to upper bound 4.55.2 and upgrade…#1478
… pyasn1 and tokenizers versions

- Changed transformers version constraint from >=4.55.2 to <=4.55.2 in pyproject.toml and uv.lock
- Upgraded pyasn1 from 0.6.1 to 0.6.2
- Downgraded tokenizers from 0.22.1 to 0.21.4

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
```diff
  "protobuf>=5.29.5", # Override nemo-toolkits constraint of ~=5.29.5
  "setuptools>=80.10.1", # Override setuptools range in other dependencies to address CVE GHSA-58pv-8j8x-9vj2
- "transformers>=4.55.2",
+ "transformers<=4.55.2",
```
The constraint `<=4.55.2` prevents any version above 4.55.2 but doesn't specify a lower bound, so a version as old as 4.0.0 would satisfy it. If pinning to exactly 4.55.2 is needed, use `==4.55.2` instead. If allowing a range while preventing breaking changes, use `>=4.55.0,<=4.55.2` or similar.
```diff
- "transformers<=4.55.2",
+ "transformers==4.55.2",
```
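The difference between the two specifiers can be checked with the `packaging` library (assuming it is available in the environment; the version numbers below mirror the ones discussed in the comment):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# "<=4.55.2" alone has no lower bound: arbitrarily old releases satisfy it.
loose = SpecifierSet("<=4.55.2")
assert Version("4.0.0") in loose       # an old release slips through
assert Version("4.56.0") not in loose  # newer releases are excluded

# "==4.55.2" pins to exactly one release.
pinned = SpecifierSet("==4.55.2")
assert Version("4.55.2") in pinned
assert Version("4.55.1") not in pinned
```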
…entries

- Removed several outdated benchmark entries related to domain classification, embedding generation, and deduplication processes from the nightly benchmark configuration file.
- This cleanup helps streamline the benchmarking process and focuses on relevant metrics and scripts.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
- Changed the path for the video model weights dataset from 'video_model_weights' to 'video_model_weights_123' to reflect the new directory structure.
- This update ensures that the nightly benchmarks reference the correct model weights location.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
- Updated the path for the video model weights dataset from 'video_model_weights_123' back to 'video_model_weights' to align with the current directory structure.
- Removed obsolete model directory references from benchmark entries to streamline the configuration.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
```python
from typing import Any

from loguru import logger
from utils import setup_executor, write_benchmark_results
```
Broken local import path

`from utils import setup_executor, write_benchmark_results` will fail when this script is executed from outside `benchmarking/scripts` (e.g., via the nightly benchmark runner), because `utils.py` lives at `benchmarking/scripts/utils.py` and this file only adds the tutorials directory to `sys.path`. As a result, the benchmark entry in `benchmarking/nightly-benchmark.yaml` will raise `ModuleNotFoundError: No module named 'utils'` at startup.
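A minimal, self-contained sketch of one possible fix: inserting the directory that actually contains `utils.py` into `sys.path` makes the sibling import resolve regardless of the working directory. The temporary directory, file layout, and module body below are illustrative stand-ins, not the repo's real files:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for benchmarking/scripts/ with a sibling utils.py.
    scripts_dir = Path(tmp) / "benchmarking" / "scripts"
    scripts_dir.mkdir(parents=True)
    (scripts_dir / "utils.py").write_text(
        "def setup_executor():\n    return 'executor'\n"
    )

    # The fix: put the script's own directory on sys.path before importing.
    sys.path.insert(0, str(scripts_dir))
    try:
        utils = importlib.import_module("utils")
        result = utils.setup_executor()
    finally:
        sys.path.remove(str(scripts_dir))

# result == 'executor'
```

In the real script this would be `sys.path.insert(0, str(Path(__file__).resolve().parent))` at the top of the file, before the `from utils import ...` line.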
- Commented out existing video benchmark entries for embedding, transcoding, and captioning to prevent execution during nightly runs.
- Updated the dataset name for video model weights to a blank space for clarity.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
…ng generation

- Introduced new benchmark configurations for domain classification using both ray_data and xenna executors, including metrics for throughput and domain predictions.
- Added embedding generation benchmarks for both executors, focusing on throughput metrics.
- Included exact and fuzzy deduplication identification benchmarks with specific requirements for metrics.
- Updated the nightly benchmark configuration to enhance testing capabilities and ensure comprehensive performance tracking.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
```yaml
# appropriate path at runtime.
results_path: /path/where/results/are/stored
datasets_path: /path/to/datasets
model_weights_path: /path/to/model_weights
```
Unresolved model_weights_path placeholder

`model_weights_path` is introduced and used in dataset paths (e.g. `{model_weights_path}/fasttext/lid.176.bin`), but the benchmarking runner's PathResolver only knows `results_path` and `datasets_path` (`benchmarking/runner/path_resolver.py`). As a result, `{model_weights_path}` is left unsubstituted in commands (see `Entry.substitute_paths_in_cmd()`), which will make any entry consuming `{dataset:fasttext_model,bin}` pass an invalid literal path containing braces.
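The failure mode can be sketched as follows. The resolver here is a hypothetical stand-in for the repo's PathResolver, not its actual code; it only knows the two documented keys, so the third placeholder survives verbatim:

```python
# Hypothetical resolver: substitutes only the placeholders it knows about.
known = {"results_path": "/r", "datasets_path": "/d"}

def substitute(cmd: str, mapping: dict[str, str]) -> str:
    # Unknown {placeholders} are simply left in the string untouched.
    for key, value in mapping.items():
        cmd = cmd.replace("{" + key + "}", value)
    return cmd

cmd = "run --model {model_weights_path}/fasttext/lid.176.bin --out {results_path}"
resolved = substitute(cmd, known)
# resolved still contains the literal "{model_weights_path}" with braces,
# which is what the downstream command would receive as a path.
```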
```yaml
# The size of the object store used by Ray which can be either a value in bytes (int), or
# a fraction of total system memory (float), or the value "default" (string) which allows
# for "ray start" to determine object store size.
object_store_size: 536870912000 # 500GB
```
object_store_size silently ignored

The new top-level `object_store_size:` key won't take effect: `Session.create_from_dict()` filters YAML keys to dataclass field names, and the field is named `default_object_store_size_bytes` (not `object_store_size`). This means the configured 536870912000 value is dropped and the session will keep using its default (50% of host RAM).
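The filtering behavior described above can be reproduced with a toy dataclass. The class and field name are taken from the comment, but the implementation below is a sketch of the pattern, not the repo's actual code:

```python
from dataclasses import dataclass, fields
from typing import Union

@dataclass
class Session:
    default_object_store_size_bytes: Union[int, float, str] = "default"

    @classmethod
    def create_from_dict(cls, cfg: dict) -> "Session":
        # Keep only keys that match declared field names; everything
        # else -- including a mis-named key -- is silently dropped.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in cfg.items() if k in names})

# The YAML key doesn't match the field name, so the value is ignored:
session = Session.create_from_dict({"object_store_size": 536870912000})
# session.default_object_store_size_bytes == "default"
```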
```python
" - Aesthetic models: For filtering (--aesthetic-threshold)\n"
"Default: ./models\n"
"Example: --model-dir /path/to/models or --model-dir ./models"
)
),
)
```
CLI flag rename breaks docs/users

This PR renames the required CLI flag from `--output-clip-path` to `--output-path`, but the repository docs still reference the old flag (e.g. `tutorials/video/getting-started/README.md` and multiple `docs/*` pages). Running the documented commands will now fail with "unrecognized arguments: --output-clip-path".
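One possible mitigation, if backwards compatibility is wanted during the transition, is to register the old flag as an alias of the new one; argparse accepts multiple option strings per argument. The flag names follow the comment above, but the parser itself is illustrative, not the project's actual CLI code:

```python
import argparse

parser = argparse.ArgumentParser()
# Both spellings map to the same destination; --output-clip-path keeps
# working for users following the old docs.
parser.add_argument(
    "--output-path", "--output-clip-path",
    dest="output_path",
    required=True,
    help="Output path (--output-clip-path is a deprecated alias)",
)

# A command written against the old docs still parses:
args = parser.parse_args(["--output-clip-path", "/tmp/clips"])
# args.output_path == "/tmp/clips"
```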
```python
import argparse
import sys
import time
import traceback
from pathlib import Path
```
Broken import when executed

`from utils import setup_executor, write_benchmark_results` relies on the current working directory / `sys.path` containing `benchmarking/scripts`. When the benchmark runner executes this script via an absolute path (`python benchmarking/scripts/video_pipeline_benchmark.py ...`), `utils.py` won't be importable as a top-level module and this will raise `ModuleNotFoundError`. Prefer an explicit package import (e.g. `from benchmarking.scripts.utils import ...`) or adjust `sys.path` accordingly.
@abhinavg4 okay to close this in favor of #1471 ?
… pyasn1 and tokenizers versions

Description

Usage

# Add snippet demonstrating usage

Checklist