Skip to content

fix: Update transformers dependency to upper bound 4.55.2 and upgrade…#1478

Closed
abhinavg4 wants to merge 7 commits intor1.1.0from
video_transformers_r110
Closed

fix: Update transformers dependency to upper bound 4.55.2 and upgrade…#1478
abhinavg4 wants to merge 7 commits intor1.1.0from
video_transformers_r110

Conversation

@abhinavg4
Copy link
Copy Markdown
Contributor

… pyasn1 and tokenizers versions

  • Changed transformers version constraint from >=4.55.2 to <=4.55.2 in pyproject.toml and uv.lock
  • Upgraded pyasn1 from 0.6.1 to 0.6.2
  • Downgraded tokenizers from 0.22.1 to 0.21.4

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

… pyasn1 and tokenizers versions

- Changed transformers version constraint from >=4.55.2 to <=4.55.2 in pyproject.toml and uv.lock
- Upgraded pyasn1 from 0.6.1 to 0.6.2
- Downgraded tokenizers from 0.22.1 to 0.21.4

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment thread pyproject.toml
"protobuf>=5.29.5", # Override nemo-toolkits constraint of ~=5.29.5
"setuptools>=80.10.1", # Override setuptools range in other dependencies to address CVE GHSA-58pv-8j8x-9vj2
"transformers>=4.55.2",
"transformers<=4.55.2",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constraint <=4.55.2 prevents ANY version above 4.55.2 but doesn't specify a lower bound. This means version 4.0.0 would satisfy this constraint. If pinning to exactly 4.55.2 is needed, use ==4.55.2 instead. If allowing a range while preventing breaking changes, use >=4.55.0,<=4.55.2 or similar.

Suggested change
"transformers<=4.55.2",
"transformers==4.55.2",

…entries

- Removed several outdated benchmark entries related to domain classification, embedding generation, and deduplication processes from the nightly benchmark configuration file.
- This cleanup helps streamline the benchmarking process and focuses on relevant metrics and scripts.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
- Changed the path for the video model weights dataset from 'video_model_weights' to 'video_model_weights_123' to reflect the new directory structure.
- This update ensures that the nightly benchmarks reference the correct model weights location.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
- Updated the path for the video model weights dataset from 'video_model_weights_123' back to 'video_model_weights' to align with the current directory structure.
- Removed obsolete model directory references from benchmark entries to streamline the configuration.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

from typing import Any

from loguru import logger
from utils import setup_executor, write_benchmark_results
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken local import path

from utils import setup_executor, write_benchmark_results will fail when this script is executed from outside benchmarking/scripts (e.g., via the nightly benchmark runner), because utils.py lives at benchmarking/scripts/utils.py and this file only adds the tutorials directory to sys.path. As a result, the benchmark entry in benchmarking/nightly-benchmark.yaml will raise ModuleNotFoundError: No module named 'utils' at startup.

- Commented out existing video benchmark entries for embedding, transcoding, and captioning to prevent execution during nightly runs.
- Updated the dataset name for video model weights to a blank space for clarity.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
…ng generation

- Introduced new benchmark configurations for domain classification using both ray_data and xenna executors, including metrics for throughput and domain predictions.
- Added embedding generation benchmarks for both executors, focusing on throughput metrics.
- Included exact and fuzzy deduplication identification benchmarks with specific requirements for metrics.
- Updated the nightly benchmark configuration to enhance testing capabilities and ensure comprehensive performance tracking.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 8 to +11
# appropriate path at runtime.
results_path: /path/where/results/are/stored
datasets_path: /path/to/datasets
model_weights_path: /path/to/model_weights
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved model_weights_path placeholder

model_weights_path is introduced and used in dataset paths (e.g. {model_weights_path}/fasttext/lid.176.bin), but the benchmarking runner’s PathResolver only knows results_path and datasets_path (benchmarking/runner/path_resolver.py). As a result, {model_weights_path} is left unsubstituted in commands (see Entry.substitute_paths_in_cmd()), which will make any entry consuming {dataset:fasttext_model,bin} pass an invalid literal path containing braces.

Comment on lines +83 to +87
# The size of the object store used by Ray which can be either a value in bytes (int), or
# a fraction of total system memory (float), or the value "default" (string) which allows
# for "ray start" to determine object store size.
object_store_size: 536870912000 # 500GB

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_store_size silently ignored

The new top-level object_store_size: key won’t take effect: Session.create_from_dict() filters YAML keys to dataclass field names, and the field is named default_object_store_size_bytes (not object_store_size). This means the configured 536870912000 value is dropped and the session will keep using its default (50% of host RAM).

Comment on lines 275 to 279
" - Aesthetic models: For filtering (--aesthetic-threshold)\n"
"Default: ./models\n"
"Example: --model-dir /path/to/models or --model-dir ./models"
)
),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI flag rename breaks docs/users

This PR renames the required CLI flag from --output-clip-path to --output-path, but the repository docs still reference the old flag (e.g. tutorials/video/getting-started/README.md and multiple docs/* pages). Running the documented commands will now fail with “unrecognized arguments: --output-clip-path”.

Comment on lines +22 to +26
import argparse
import sys
import time
import traceback
from pathlib import Path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken import when executed

from utils import setup_executor, write_benchmark_results relies on the current working directory / sys.path containing benchmarking/scripts. When the benchmark runner executes this script via an absolute path (python benchmarking/scripts/video_pipeline_benchmark.py ...), utils.py won’t be importable as a top-level module and this will raise ModuleNotFoundError. Prefer an explicit package import (e.g. from benchmarking.scripts.utils import ...) or adjust sys.path accordingly.

@ayushdg
Copy link
Copy Markdown
Contributor

ayushdg commented Feb 11, 2026

@abhinavg4 okay to close this in favor of #1471 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants