[benchmarking] Adds image curation benchmark to nightly#1341
praateekmahajan merged 27 commits into NVIDIA-NeMo:main
…images with :latest by default, adds session name to slack report. Signed-off-by: rlratzel <rratzel@nvidia.com>
…a_updates
…atzel/curator into 2602_benchmark_infra_updates
…g script to allow for more flexibility.
…n-readable output is needed, updates paths to benchmark output dir.
…sults
…laceholders were silently ignored, comment cleanup.
Greptile Summary
This PR adds an image curation benchmark to the nightly benchmark suite, along with refactoring the placeholder substitution logic in the benchmark runner to support the new …
Key Changes:
Critical Issue:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Runner as Benchmark Runner
    participant Session as Session
    participant Entry as Entry
    participant PathRes as PathResolver
    participant DataRes as DatasetResolver
    participant Ray as Ray Cluster
    participant Script as Image Curation Script
    Runner->>Runner: Load YAML config
    Runner->>Session: create_from_dict(config)
    Session->>PathRes: Create PathResolver
    Session->>DataRes: Create DatasetResolver
    Session->>Session: Create Entry objects
    Runner->>Entry: get_command_to_run()
    Entry->>Entry: substitute_reserved_placeholders()<br/>{curator_repo_dir}, {session_entry_dir}, {dataset:...}
    Entry->>PathRes: substitute_container_or_host_paths()<br/>resolve paths for container/host mapping
    Entry-->>Runner: Return resolved command
    Runner->>Ray: setup_ray_cluster_and_env()<br/>with num_gpus from entry.ray config
    Note over Ray: Defaults to 0 GPUs if not specified
    Runner->>Script: Execute python command
    Script->>Script: create_image_curation_pipeline()
    Note over Script: Pipeline stages use num_gpus_per_worker<br/>ImageReaderStage: 0.25<br/>ImageEmbeddingStage: 0.25<br/>ImageAestheticFilterStage: 0.25<br/>ImageNSFWFilterStage: 0.25
    Script-->>Runner: Return exit code
    Runner->>Runner: get_entry_script_persisted_data()<br/>Read metrics.json, params.json, tasks.pkl
    Runner->>Runner: check_requirements_update_results()
    Runner->>Ray: teardown_ray_cluster_and_env()
    Runner->>Runner: Write results.json
```
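The diagram shows `Entry` substituting reserved placeholders ({curator_repo_dir}, {session_entry_dir}, {dataset:...}) before container/host path mapping, and one commit note mentions placeholders that "were silently ignored". A minimal sketch of that kind of substitution, assuming a flat mapping for reserved names and a (name, format) lookup for `{dataset:NAME,FORMAT}`; `resolve_placeholders` and all example paths are hypothetical, not the runner's actual code:

```python
import re

# Hypothetical sketch: names, signatures, and paths are illustrative only,
# not the benchmark runner's actual API.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def resolve_placeholders(
    text: str,
    reserved: dict[str, str],
    datasets: dict[tuple[str, str], str],
) -> str:
    """Replace each {...} placeholder in `text`, raising on anything unrecognized."""

    def _resolve(match: re.Match) -> str:
        key = match.group(1)
        if key.startswith("dataset:"):
            # "{dataset:mscoco,wds}" is looked up by (dataset name, format).
            name, _, fmt = key[len("dataset:"):].partition(",")
            if (name, fmt) not in datasets:
                raise ValueError(f"unknown dataset placeholder: {{{key}}}")
            return datasets[(name, fmt)]
        if key in reserved:
            return reserved[key]
        # Fail loudly rather than leaving an unresolved placeholder in the command.
        raise ValueError(f"unknown placeholder: {{{key}}}")

    return PLACEHOLDER.sub(_resolve, text)


if __name__ == "__main__":
    cmd = resolve_placeholders(
        "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
        " --input-wds-dataset-dir {dataset:mscoco,wds}"
        " --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco",
        reserved={
            "curator_repo_dir": "/opt/curator",
            "session_entry_dir": "/results/session/image_curation",
        },
        datasets={("mscoco", "wds"): "/datasets/mscoco/wds"},
    )
    print(cmd)
```

Raising on an unknown name turns a typo in the YAML into an immediate error instead of a literal `{...}` string reaching the script's command line.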
```yaml
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
```
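The `args: >-` value is a YAML folded block scalar, so the flag lines are joined into one space-separated string with no trailing newline. A simplified sketch of what that string looks like once placeholders are resolved, and how it tokenizes, assuming the runner ultimately splits it the way a POSIX shell would (not confirmed by this PR); `fold_block` and the paths are hypothetical:

```python
import shlex


# Simplified model of YAML ">-" folding for same-indentation lines: lines are
# joined with single spaces and the trailing newline is dropped. Real YAML folding
# turns blank lines into newlines and keeps more-indented lines; this sketch
# simply drops blank lines and ignores indentation.
def fold_block(block: str) -> str:
    return " ".join(line.strip() for line in block.splitlines() if line.strip())


# A shortened args block with placeholders already resolved (hypothetical paths).
ARGS_BLOCK = """
    --input-wds-dataset-dir /datasets/mscoco/wds
    --output-dataset-dir /results/session/image_curation/results_truncated_100K_mscoco
    --batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --skip-download
    --verbose
"""

folded = fold_block(ARGS_BLOCK)
argv = shlex.split(folded)  # POSIX-style tokenization, as a shell would split it

if __name__ == "__main__":
    print(folded)
    print(argv[:4])
```

One consequence of this layout: a resolved path containing spaces would be split into separate tokens unless it is quoted.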
Missing `ray:` configuration block to allocate GPUs.
The script requires GPUs for multiple stages (ImageReaderStage uses 0.25 GPUs; ImageEmbeddingStage, ImageAestheticFilterStage, and ImageNSFWFilterStage each default to 0.25 GPUs per worker).
Other GPU benchmarks such as domain_classification_raydata (lines 75-78) include:
```yaml
ray:
  num_cpus: 64
  num_gpus: 4
  enable_object_spilling: false
```
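The GPU numbers in this comment can be checked with a little arithmetic. Ray supports fractional GPU requests, and each GPU stage worker here asks for 0.25 GPU, so one worker of each of the four stages needs 1 GPU in total, `num_gpus: 4` leaves room for at most 16 such workers, and a cluster started with 0 GPUs leaves room for none. A minimal sketch (the helper name is hypothetical, and real scheduling also depends on CPU and memory requests):

```python
from fractions import Fraction

# Per-worker GPU requests quoted in the review comments (0.25 GPU each).
STAGE_GPU_PER_WORKER = {
    "ImageReaderStage": Fraction(1, 4),
    "ImageEmbeddingStage": Fraction(1, 4),
    "ImageAestheticFilterStage": Fraction(1, 4),
    "ImageNSFWFilterStage": Fraction(1, 4),
}


def max_concurrent_workers(cluster_gpus: int, gpu_per_worker: Fraction) -> int:
    """Upper bound on same-sized GPU workers that fit at once, ignoring CPUs and memory."""
    # Because 1/4 divides 1 evenly, per-GPU packing matches this total-based count.
    return int(cluster_gpus // gpu_per_worker)


# GPUs needed to run one worker of every GPU stage at the same time.
one_of_each = sum(STAGE_GPU_PER_WORKER.values())

if __name__ == "__main__":
    print(one_of_each)                                # 1
    print(max_concurrent_workers(4, Fraction(1, 4)))  # 16 (num_gpus: 4)
    print(max_concurrent_workers(0, Fraction(1, 4)))  # 0 (default of 0 GPUs)
```

`Fraction` is used instead of floats so the per-stage requests sum to exactly 1.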
logic: Missing `ray:` configuration block for GPU allocation.
The image curation script requires GPUs for multiple stages (ImageReaderStage, ImageEmbeddingStage, ImageAestheticFilterStage, and ImageNSFWFilterStage each use 0.25 GPUs per worker by default).
Add a configuration like the other GPU benchmarks:
```yaml
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
  ray:
    num_cpus: 64
    num_gpus: 4
    enable_object_spilling: false
```
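A config mistake like this could be caught before the nightly run starts. A minimal sketch, assuming the YAML has already been parsed into dicts (for example with `yaml.safe_load`) and that the set of GPU-dependent entry names is supplied by hand; `entries_missing_gpus`, `effective_num_gpus`, and `gpu_entry_names` are hypothetical, not part of the runner, and the 0-GPU fallback mirrors the default the review comments attribute to benchmarking/run.py:

```python
# Hypothetical pre-flight check over benchmark entries parsed from the nightly YAML.
DEFAULT_NUM_GPUS = 0  # assumed fallback when an entry has no ray.num_gpus


def effective_num_gpus(entry: dict) -> float:
    """GPUs requested for an entry, falling back to the default when `ray` is absent."""
    ray_cfg = entry.get("ray") or {}
    return ray_cfg.get("num_gpus", DEFAULT_NUM_GPUS)


def entries_missing_gpus(entries: list[dict], gpu_entry_names: set[str]) -> list[str]:
    """Names of enabled entries that need GPUs but would be started with none."""
    return [
        entry["name"]
        for entry in entries
        # Assumption: entries count as enabled unless they say otherwise.
        if entry.get("enabled", True)
        and entry["name"] in gpu_entry_names
        and effective_num_gpus(entry) <= 0
    ]


# script/args omitted for brevity; only the keys the check reads are shown.
before = {"name": "image_curation", "enabled": True}
after = {
    "name": "image_curation",
    "enabled": True,
    "ray": {"num_cpus": 64, "num_gpus": 4, "enable_object_spilling": False},
}

if __name__ == "__main__":
    print(entries_missing_gpus([before], {"image_curation"}))  # ['image_curation']
    print(entries_missing_gpus([after], {"image_curation"}))   # []
```

Keeping the GPU-dependent names as an explicit input avoids guessing GPU needs from the script path, at the cost of one more list to maintain.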
logic: Missing `ray:` configuration block for GPU allocation.
The image curation pipeline requires GPUs: four stages (ImageReaderStage, ImageEmbeddingStage, ImageAestheticFilterStage, and ImageNSFWFilterStage) use 0.25 GPUs per worker by default.
Without this config, the benchmark will use 0 GPUs (the default in benchmarking/run.py:161) and will likely fail or run very slowly.
Add a GPU config like the other benchmarks (e.g., lines 75-78):
```yaml
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
  ray:
    num_cpus: 64
    num_gpus: 4
    enable_object_spilling: false
```
logic: Missing `ray:` configuration to allocate GPUs.
The image curation pipeline uses GPUs in four stages: ImageReaderStage (0.25), ImageEmbeddingStage (0.25), ImageAestheticFilterStage (0.25), and ImageNSFWFilterStage (0.25); see tutorials/image/getting-started/image_curation_example.py:50,56,65,74.
Without this config, benchmarking/run.py:161 defaults to 0 GPUs, causing the pipeline to fail or run incorrectly.
Add GPU allocation like the other GPU benchmarks (e.g., lines 75-78):
```yaml
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
  ray:
    num_cpus: 64
    num_gpus: 4
    enable_object_spilling: false
```
Adds image curation benchmark to nightly run. This uses the image curation "getting started" tutorial.