cp: Update tutorial to be more explicit about num_gpus (#1492) by thomasdhc · Pull Request #1499 · NVIDIA-NeMo/Curator

thomasdhc · 2026-02-12T16:12:26Z

Update tutorial to be more explicit about num_gpus
fix false positive secret scan
Hopefully fix the secrets

Description

Usage

# Add snippet demonstrating usage

Checklist

I am familiar with the Contributing Guide.
New or Existing tests cover these changes.
The documentation is up to date with these changes.

* Update tutorial to be more explicit about num_gpus
  Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* fix false positive secret scan
  Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* Hopefully fix the secrets
  Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
---------
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>

copy-pr-bot · 2026-02-12T16:12:32Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-02-12T16:14:33Z

Greptile Overview

Greptile Summary

Added GPU availability validation to three deduplication tutorial notebooks. Each tutorial now imports torch and checks torch.cuda.device_count() against the required GPU count before initializing RayClient, raising a helpful error message if insufficient GPUs are available.

Key changes:

Introduced NUM_GPUS constant (2 for fuzzy deduplication, 4 for semantic deduplication)
Added pre-flight GPU check to prevent runtime failures
Improved user experience with clear error messaging
Minor notebook metadata updates (Python version, kernel display name)

The implementation correctly prevents users from running tutorials on machines without sufficient GPU resources, though the error message has a minor grammar issue ("are lesser" should be "is less").
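
The pre-flight check summarized above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cell: the helper name `check_gpu_count`, its `available` parameter, and the choice of `RuntimeError` are assumptions made here for clarity, and the GPU counts interpolated into the message are an addition. Only `NUM_GPUS`, `torch.cuda.device_count()`, and the wording of the message come from the review.

```python
from typing import Optional

NUM_GPUS = 2  # 2 for the fuzzy tutorial, 4 for the semantic tutorials (per the summary)


def check_gpu_count(required: int, available: Optional[int] = None) -> int:
    """Return the visible GPU count, raising if it is below `required`.

    Hypothetical helper: `available` exists only so the logic can be
    exercised on a machine without GPUs or without torch installed.
    """
    if available is None:
        # Imported lazily so the helper itself does not require torch.
        import torch

        available = torch.cuda.device_count()
    if available < required:
        # Exception type is an assumption; the message follows the suggested wording.
        error_msg = (
            f"The number of GPUs on this machine ({available}) is less than the "
            f"default this tutorial was tested with ({required}), please update "
            "`num_gpus` passed into `RayClient`"
        )
        raise RuntimeError(error_msg)
    return available
```

In a notebook cell, one would presumably call `check_gpu_count(NUM_GPUS)` immediately before `client = RayClient(num_cpus=64, num_gpus=NUM_GPUS)`, so that a machine with too few GPUs fails with a readable message before the Ray cluster is started.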

Confidence Score: 4/5

This PR is safe to merge with minimal risk
The changes add helpful validation to prevent runtime failures in the tutorials. The implementation is straightforward and correctly uses torch.cuda.device_count() to check GPU availability. The only minor style issue is the grammatical error in the error messages.
No files require special attention

Important Files Changed

Filename	Overview
tutorials/text/deduplication/fuzzy/fuzzy_e2e.ipynb	Added GPU availability check with torch before initializing RayClient with 2 GPUs; minor grammar issue in error message
tutorials/text/deduplication/semantic/semantic_e2e.ipynb	Added GPU availability check with torch before initializing RayClient with 4 GPUs; minor grammar issue in error message
tutorials/text/deduplication/semantic/semantic_step_by_step.ipynb	Added GPU availability check with torch before initializing RayClient with 4 GPUs; minor grammar issue in error message

3 files reviewed, 3 comments

greptile-apps · 2026-02-12T16:14:37Z

    "\n",
-    "client = RayClient(num_cpus=64, num_gpus=2)  # change as needed\n",
+    "NUM_GPUS = 2\n",
+    "\n",


Grammar: "are lesser" should be "is less" or "are fewer"

Suggested change

"\n",

error_msg = "The number of GPUs on this machine is less than the default this tutorial was tested with, please update `num_gpus` passed into `RayClient`"

greptile-apps · 2026-02-12T16:14:38Z

    "# Number of GPUs should be roughly 2x the memory of the embeddings\n",
-    "client = RayClient(num_cpus=64, num_gpus=4)\n",
+    "NUM_GPUS = 4\n",
+    "\n",


Grammar: "are lesser" should be "is less" or "are fewer"

Suggested change

"\n",

error_msg = "The number of GPUs on this machine is less than the default this tutorial was tested with, please update `num_gpus` passed into `RayClient`"

greptile-apps · 2026-02-12T16:14:39Z

    "\n",
    "# Number of GPUs should be roughly 2x the memory of the embeddings\n",
-    "client = RayClient(num_cpus=64, num_gpus=4)\n",
+    "NUM_GPUS = 4\n",


Grammar: "are lesser" should be "is less" or "are fewer"

Suggested change

"NUM_GPUS = 4\n",

error_msg = "The number of GPUs on this machine is less than the default this tutorial was tested with, please update `num_gpus` passed into `RayClient`"

…NVIDIA-NeMo#1499)
* Update tutorial to be more explicit about num_gpus
* fix false positive secret scan
* Hopefully fix the secrets
---------
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Bump version to 1.1.0 (#1364) (#1365) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* feat: FFmpeg to 8.0.1 (#1362) (#1363) Signed-off-by: Ao Tang <aot@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Ao Tang <aot@nvidia.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Fix bug in SDG example (#1370) (#1371) Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Fix bug in Gliner tutorial (#1372) (#1378) * Fix bug in Gliner tutorial * update readmes --------- Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Address aiohttp and urllib3 cve (#1379) (#1383) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Adding one worker per partition to FilePartioningStage and URLGeneratorStage (#1350) (#1366) Signed-off-by: Abhinav Garg <abhinavg@stanford.edu> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Abhinav Garg <abhinavg@stanford.edu> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Update instructions for AWS credentials in ArXiv download and extract tutorial (#1380) (#1402) * Update instructions for AWS credentials in ArXiv download and extract tutorial * ruff --------- Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* cp: Revert "Remove nvenc/dec for xenna 0.1.6 (#1202)" (#1374) (#1403) This reverts commit c4805ae. Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Pin sklearn to < 1.8.0 for cuml 25.10 for r.1.1.0 #1405 Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Clarify instructions for downloading the Llama Nemotron Post-Training Dataset (#1416) (#1423) Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* CP: Fix vllm API compatibility with Video Pipeline + Upgrade vLLM to 0.14 (#1429) * vllm API compatibility fixed Signed-off-by: Ao Tang <aot@nvidia.com> * upgrade vllm to 0.14.0 Signed-off-by: Ao Tang <aot@nvidia.com> * refactor Signed-off-by: Ao Tang <aot@nvidia.com> * pyproject update Signed-off-by: Ao Tang <aot@nvidia.com> * add protobuf in constraint-dependencies Signed-off-by: Ao Tang <aot@nvidia.com> * comment improve Signed-off-by: Ao Tang <aot@nvidia.com> * resolve pyproject Signed-off-by: Ao Tang <aot@nvidia.com> --------- Signed-off-by: Ao Tang <aot@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Address setuptools CVE (#1438) (#1439) * Address CVE fixes * Remove cache of aiohttp from ray * Update uv lock * Update cache path --------- Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Optimize docker layer and uv with no cache (#1444) (#1446) * Optimize docker layer and uv with no cache * Add missing slash * Add comments to dockerfile --------- Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Purge InternVideo2 (#1451) (#1462) * Remove Internvideo2 * more to remove * fix writer * Enhance Clip class to include cosmos_embed1_frames and cosmos_embed1_embedding in total size calculation * remove iv2 --------- Signed-off-by: Ao Tang <aot@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Ao Tang <aot@nvidia.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Update cve for python-multipart (#1450) (#1455) * Update cve for python-multipart * Update uv lock --------- Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* cherry pick commit, no benchmarking needed (#1461) Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Update vllm to 0.14.1 and override conflict (#1467) (#1468) * Update vllm to 0.14.1 and override conflict * Upperbound numpy for Numba compatibility * Update vllm to 0.15.1 --------- Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Remove thirdparty aiohttp file from ray (#1469) (#1475) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Fix: fasttext predict call for numpy>2 (#1482) (#1486) Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Update transformers dependency to exact version 4.55.2 in pyproject.t… (#1471) (#1488) * Update transformers dependency to exact version 4.55.2 in pyproject.toml and uv.lock to prevent import failures in Cosmos Embed. Downgrade tokenizers version to 0.21.4 for compatibility. * Update transformers dependency in pyproject.toml and uv.lock to allow versions up to 4.55.2, ensuring compatibility with Cosmos Embed imports. --------- Signed-off-by: Abhinav Garg <abhinavg@stanford.edu> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Abhinav Garg <abhinavg@stanford.edu> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Cherry pick `tutorials` changes from #1477 (#1491) * Update tutorial README Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> * Update quickstart Updated sample sentences to provide more detailed feedback. Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> --------- Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Update tutorial to be more explicit about num_gpus (#1492) (#1499) * Update tutorial to be more explicit about num_gpus * fix false positive secret scan * Hopefully fix the secrets --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Add relevant 26.02 docs to r1.1.0 (#1493) * add release notes Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * add more pages Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * add more pages Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * add new sdg docs Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * update remaining files from sdg docs Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * continue adding more changes Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * more video docs Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> * add remaining updates Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> --------- Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Address new CVEs from rc4 (#1497) (#1500) * Scrub thirdparty aiohttp file from ray * Address new rc4 CVE * Apt get for consistency --------- Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Add feedback to tutorials (#1476) (#1501) * Add feedback to tutorials * clarify install instructions for classifier tutorials * byo classifiers * add descriptions --------- Signed-off-by: Sarah Yurick <sarahyurick@gmail.com> Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Update pyasn1 in uv lock (#1505) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* Refactor video frame extraction to improve PyNvCodec availability check (#1511) (#1513) * Refactor video frame extraction to improve PyNvCodec availability check - Removed the try-except block for importing PyNvcFrameExtractor, simplifying the import logic. - Updated the condition for initializing the PyNvcFrameExtractor in the VideoFrameExtractionStage to rely solely on the _PYNVC_AVAILABLE flag. - Adjusted the handling of pixel format conversion in NvVideoDecoder to prepare for future updates to cvcuda. * Refactor NvVideoDecoder to replace deprecated nvcv_image with cvcuda tensor - Updated NvVideoDecoder to remove the use of nvcv_image, which is deprecated, and replaced it with cvcuda tensor. - Adjusted related tensor operations and tests to ensure compatibility with the new cvcuda implementation. * Update import statements in test_nvcodec_utils.py to include ruff linting rule - Modified import statements in the test file to include the RUF100 linting rule, ensuring better adherence to coding standards. - This change enhances the clarity of the import handling tests. * Update tests/utils/test_nvcodec_utils.py * Update tests/utils/test_nvcodec_utils.py --------- Signed-off-by: Abhinav Garg <abhinavg@stanford.edu> Signed-off-by: [Your Name] <your.email@example.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: Abhinav Garg <abhinavg@stanford.edu> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* docs: isolate release notes and changelog (#1529) * docs: isolate release notes and changelog Signed-off-by: Lawrence Lane <llane@nvidia.com> * abhinav's feedback Signed-off-by: Lawrence Lane <llane@nvidia.com> * feedback Signed-off-by: Lawrence Lane <llane@nvidia.com> --------- Signed-off-by: Lawrence Lane <llane@nvidia.com>
* ci: Update final release version (#1540) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
* docs: release note updates Signed-off-by: Lawrence Lane <llane@nvidia.com>
---------
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: [Your Name] <your.email@example.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Co-authored-by: Abhinav Garg <abhinavg@stanford.edu>
Co-authored-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
Co-authored-by: Huy Vu <86480512+huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

thomasdhc requested review from ayushdg, lbliii and sarahyurick February 12, 2026 16:12

greptile-apps Bot reviewed Feb 12, 2026

View reviewed changes

ayushdg approved these changes Feb 12, 2026

View reviewed changes

thomasdhc merged commit 56ae537 into r1.1.0 Feb 12, 2026
4 of 5 checks passed

thomasdhc merged 1 commit into r1.1.0 from cp-1492-r1.1.0

2 participants
