fix: Update transformers dependency to upper bound 4.55.2 and upgrade…#1478
… pyasn1 and tokenizers versions

- Changed transformers version constraint from >=4.55.2 to <=4.55.2 in pyproject.toml and uv.lock
- Upgraded pyasn1 from 0.6.1 to 0.6.2
- Downgraded tokenizers from 0.22.1 to 0.21.4

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
```diff
  "protobuf>=5.29.5", # Override nemo-toolkits constraint of ~=5.29.5
  "setuptools>=80.10.1", # Override setuptools range in other dependencies to address CVE GHSA-58pv-8j8x-9vj2
- "transformers>=4.55.2",
+ "transformers<=4.55.2",
```
The constraint `<=4.55.2` prevents any version above 4.55.2 but doesn't specify a lower bound, so a version as old as 4.0.0 would satisfy it. If pinning to exactly 4.55.2 is needed, use `==4.55.2` instead. If allowing a range while preventing breaking changes, use `>=4.55.0,<=4.55.2` or similar.
```diff
- "transformers<=4.55.2",
+ "transformers==4.55.2",
```
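The difference between the two specifiers can be checked with the `packaging` library (assuming it is available in the environment; the version numbers below mirror the ones discussed in the comment):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# "<=4.55.2" alone has no lower bound: arbitrarily old releases satisfy it.
loose = SpecifierSet("<=4.55.2")
assert Version("4.0.0") in loose       # an old release slips through
assert Version("4.56.0") not in loose  # newer releases are excluded

# "==4.55.2" pins to exactly one release.
pinned = SpecifierSet("==4.55.2")
assert Version("4.55.2") in pinned
assert Version("4.55.1") not in pinned
```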
…entries

- Removed several outdated benchmark entries related to domain classification, embedding generation, and deduplication processes from the nightly benchmark configuration file.
- This cleanup helps streamline the benchmarking process and focuses on relevant metrics and scripts.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
- Changed the path for the video model weights dataset from 'video_model_weights' to 'video_model_weights_123' to reflect the new directory structure.
- This update ensures that the nightly benchmarks reference the correct model weights location.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
- Updated the path for the video model weights dataset from 'video_model_weights_123' back to 'video_model_weights' to align with the current directory structure.
- Removed obsolete model directory references from benchmark entries to streamline the configuration.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
```python
from typing import Any

from loguru import logger
from utils import setup_executor, write_benchmark_results
```
Broken local import path

`from utils import setup_executor, write_benchmark_results` will fail when this script is executed from outside `benchmarking/scripts` (e.g., via the nightly benchmark runner), because `utils.py` lives at `benchmarking/scripts/utils.py` and this file only adds the tutorials directory to `sys.path`. As a result, the benchmark entry in `benchmarking/nightly-benchmark.yaml` will raise `ModuleNotFoundError: No module named 'utils'` at startup.
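A minimal, self-contained sketch of one possible fix: inserting the directory that actually contains `utils.py` into `sys.path` makes the sibling import resolve regardless of the working directory. The temporary directory, file layout, and module body below are illustrative stand-ins, not the repo's real files:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for benchmarking/scripts/ with a sibling utils.py.
    scripts_dir = Path(tmp) / "benchmarking" / "scripts"
    scripts_dir.mkdir(parents=True)
    (scripts_dir / "utils.py").write_text(
        "def setup_executor():\n    return 'executor'\n"
    )

    # The fix: put the script's own directory on sys.path before importing.
    sys.path.insert(0, str(scripts_dir))
    try:
        utils = importlib.import_module("utils")
        result = utils.setup_executor()
    finally:
        sys.path.remove(str(scripts_dir))

# result == 'executor'
```

In the real script this would be `sys.path.insert(0, str(Path(__file__).resolve().parent))` at the top of the file, before the `from utils import ...` line.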
- Commented out existing video benchmark entries for embedding, transcoding, and captioning to prevent execution during nightly runs.
- Updated the dataset name for video model weights to a blank space for clarity.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
…ng generation

- Introduced new benchmark configurations for domain classification using both ray_data and xenna executors, including metrics for throughput and domain predictions.
- Added embedding generation benchmarks for both executors, focusing on throughput metrics.
- Included exact and fuzzy deduplication identification benchmarks with specific requirements for metrics.
- Updated the nightly benchmark configuration to enhance testing capabilities and ensure comprehensive performance tracking.

Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
```yaml
# appropriate path at runtime.
results_path: /path/where/results/are/stored
datasets_path: /path/to/datasets
model_weights_path: /path/to/model_weights
```
Unresolved model_weights_path placeholder

`model_weights_path` is introduced and used in dataset paths (e.g. `{model_weights_path}/fasttext/lid.176.bin`), but the benchmarking runner's PathResolver only knows `results_path` and `datasets_path` (`benchmarking/runner/path_resolver.py`). As a result, `{model_weights_path}` is left unsubstituted in commands (see `Entry.substitute_paths_in_cmd()`), which will make any entry consuming `{dataset:fasttext_model,bin}` pass an invalid literal path containing braces.
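The failure mode can be sketched as follows. The resolver here is a hypothetical stand-in for the repo's PathResolver, not its actual code; it only knows the two documented keys, so the third placeholder survives verbatim:

```python
# Hypothetical resolver: substitutes only the placeholders it knows about.
known = {"results_path": "/r", "datasets_path": "/d"}

def substitute(cmd: str, mapping: dict[str, str]) -> str:
    # Unknown {placeholders} are simply left in the string untouched.
    for key, value in mapping.items():
        cmd = cmd.replace("{" + key + "}", value)
    return cmd

cmd = "run --model {model_weights_path}/fasttext/lid.176.bin --out {results_path}"
resolved = substitute(cmd, known)
# resolved still contains the literal "{model_weights_path}" with braces,
# which is what the downstream command would receive as a path.
```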
```yaml
# The size of the object store used by Ray which can be either a value in bytes (int), or
# a fraction of total system memory (float), or the value "default" (string) which allows
# for "ray start" to determine object store size.
object_store_size: 536870912000 # 500GB
```
object_store_size silently ignored

The new top-level `object_store_size:` key won't take effect: `Session.create_from_dict()` filters YAML keys to dataclass field names, and the field is named `default_object_store_size_bytes` (not `object_store_size`). This means the configured 536870912000 value is dropped and the session will keep using its default (50% of host RAM).
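The filtering behavior described above can be reproduced with a toy dataclass. The class and field name are taken from the comment, but the implementation below is a sketch of the pattern, not the repo's actual code:

```python
from dataclasses import dataclass, fields
from typing import Union

@dataclass
class Session:
    default_object_store_size_bytes: Union[int, float, str] = "default"

    @classmethod
    def create_from_dict(cls, cfg: dict) -> "Session":
        # Keep only keys that match declared field names; everything
        # else -- including a mis-named key -- is silently dropped.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in cfg.items() if k in names})

# The YAML key doesn't match the field name, so the value is ignored:
session = Session.create_from_dict({"object_store_size": 536870912000})
# session.default_object_store_size_bytes == "default"
```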
```python
" - Aesthetic models: For filtering (--aesthetic-threshold)\n"
"Default: ./models\n"
"Example: --model-dir /path/to/models or --model-dir ./models"
)
),
)
```
CLI flag rename breaks docs/users

This PR renames the required CLI flag from `--output-clip-path` to `--output-path`, but the repository docs still reference the old flag (e.g. `tutorials/video/getting-started/README.md` and multiple `docs/*` pages). Running the documented commands will now fail with "unrecognized arguments: --output-clip-path".
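One possible mitigation, if backwards compatibility is wanted during the transition, is to register the old flag as an alias of the new one; argparse accepts multiple option strings per argument. The flag names follow the comment above, but the parser itself is illustrative, not the project's actual CLI code:

```python
import argparse

parser = argparse.ArgumentParser()
# Both spellings map to the same destination; --output-clip-path keeps
# working for users following the old docs.
parser.add_argument(
    "--output-path", "--output-clip-path",
    dest="output_path",
    required=True,
    help="Output path (--output-clip-path is a deprecated alias)",
)

# A command written against the old docs still parses:
args = parser.parse_args(["--output-clip-path", "/tmp/clips"])
# args.output_path == "/tmp/clips"
```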
```python
import argparse
import sys
import time
import traceback
from pathlib import Path
```
Broken import when executed

`from utils import setup_executor, write_benchmark_results` relies on the current working directory / `sys.path` containing `benchmarking/scripts`. When the benchmark runner executes this script via an absolute path (`python benchmarking/scripts/video_pipeline_benchmark.py ...`), `utils.py` won't be importable as a top-level module and this will raise `ModuleNotFoundError`. Prefer an explicit package import (e.g. `from benchmarking.scripts.utils import ...`) or adjust `sys.path` accordingly.
@abhinavg4 okay to close this in favor of #1471 ?
… pyasn1 and tokenizers versions

Description

Usage

# Add snippet demonstrating usage

Checklist