Fix bug in SDG example #1370

Merged
sarahyurick merged 1 commit into NVIDIA-NeMo:main from sarahyurick:sdg_example_fix
Jan 14, 2026

Conversation

@sarahyurick
Contributor

No description provided.

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
@ayushdg ayushdg requested a review from huvunvidia January 14, 2026 18:37
@greptile-apps
Contributor

greptile-apps Bot commented Jan 14, 2026

Greptile Summary

Fixed a bug in BeginsWithLanguageFilter.__init__ where self.name was assigned instead of self._name. The parent class DocumentFilter exposes name as a read-only property that returns self._name, so assigning to self.name raises an AttributeError. The change sets the underlying private attribute self._name directly.

  • Fixed property assignment error in BeginsWithLanguageFilter constructor
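The failure mode described above can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual NeMo Curator source: the `DocumentFilter` body, the `BrokenFilter` class, and the prefix check inside `score_document` are assumptions made for illustration. Only the names `DocumentFilter`, `BeginsWithLanguageFilter`, `name`, `_name`, `languages`, and `score_document` come from the summary and diagram on this page.

```python
class DocumentFilter:
    """Hypothetical stand-in for the parent class (not the real implementation)."""

    @property
    def name(self) -> str:
        # Read-only property: a getter is defined, but no setter.
        return self._name


class BrokenFilter(DocumentFilter):
    """Hypothetical class reproducing the pre-fix behavior."""

    def __init__(self, languages: list[str]) -> None:
        # Assigning to a property that has no setter raises AttributeError.
        self.name = "begins_with_language_filter"
        self.languages = languages


class BeginsWithLanguageFilter(DocumentFilter):
    """Post-fix behavior: write to the private attribute the property reads."""

    def __init__(self, languages: list[str]) -> None:
        self._name = "begins_with_language_filter"
        self.languages = languages

    def score_document(self, text: str) -> float:
        # Assumed scoring rule: 1.0 if the text starts with any listed prefix.
        return 1.0 if text.startswith(tuple(self.languages)) else 0.0


if __name__ == "__main__":
    try:
        BrokenFilter(["en"])
    except AttributeError as err:
        print(f"pre-fix constructor fails: {err}")

    fixed = BeginsWithLanguageFilter(["en", "fr"])
    print(fixed.name, fixed.score_document("en: hello"))
```

Writing to `_name` works because the property getter simply reads that attribute. Adding a setter on the parent would be an alternative, but it would change the parent's public interface, which this PR does not do.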

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The change is a simple, correct bug fix that resolves a property assignment error. The parent class has a read-only property name that accesses _name, so the child class must set _name directly. This is a one-line fix with no side effects or behavioral changes.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| tutorials/synthetic/synthetic_data_generation_example.py | Fixed property assignment bug by changing self.name to self._name to properly set the private attribute |

Sequence Diagram

sequenceDiagram
    participant Client as Client Code
    participant BeginsWithLanguageFilter as BeginsWithLanguageFilter
    participant DocumentFilter as DocumentFilter (Parent)
    
    Client->>BeginsWithLanguageFilter: __init__(languages)
    BeginsWithLanguageFilter->>BeginsWithLanguageFilter: self._name = "begins_with_language_filter"
    BeginsWithLanguageFilter->>BeginsWithLanguageFilter: self.languages = languages
    Note over BeginsWithLanguageFilter: Fixed: Now sets _name directly<br/>instead of trying to set read-only<br/>property 'name'
    
    Client->>BeginsWithLanguageFilter: score_document(text)
    BeginsWithLanguageFilter->>BeginsWithLanguageFilter: Check if text starts with language prefix
    BeginsWithLanguageFilter-->>Client: Return 1.0 or 0.0
    
    Client->>BeginsWithLanguageFilter: name (property access)
    BeginsWithLanguageFilter->>DocumentFilter: @property name
    DocumentFilter-->>Client: Return self._name

@sarahyurick sarahyurick merged commit a086dd4 into NVIDIA-NeMo:main Jan 14, 2026
18 checks passed
@sarahyurick sarahyurick added the r1.1.0 label (Pick this label for auto cherry-picking into r1.1.0) Jan 14, 2026
pablo-garay pushed a commit that referenced this pull request Jan 14, 2026
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
sarahyurick added a commit that referenced this pull request Jan 14, 2026
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
@sarahyurick sarahyurick deleted the sdg_example_fix branch February 9, 2026 18:13
copy-pr-bot Bot pushed a commit that referenced this pull request Feb 19, 2026
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
lbliii pushed a commit to lbliii/NeMo-Curator that referenced this pull request Mar 16, 2026
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
lbliii added a commit that referenced this pull request Mar 23, 2026
* ci: Bump version to 1.1.0 (#1364) (#1365)

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* feat: FFmpeg to 8.0.1 (#1362) (#1363)

Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Fix bug in SDG example (#1370) (#1371)

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Fix bug in Gliner tutorial (#1372) (#1378)

* Fix bug in Gliner tutorial

* update readmes

---------

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Address aiohttp and urllib3 cve (#1379) (#1383)

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Adding one worker per partition to FilePartioningStage and URLGeneratorStage (#1350) (#1366)

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Abhinav Garg <abhinavg@stanford.edu>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Update instructions for AWS credentials in ArXiv download and extract tutorial (#1380) (#1402)

* Update instructions for AWS credentials in ArXiv download and extract tutorial

* ruff

---------

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* cp: Revert "Remove nvenc/dec for xenna 0.1.6 (#1202)" (#1374) (#1403)

This reverts commit c4805ae.

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Pin sklearn to < 1.8.0 for cuml 25.10 for r.1.1.0 #1405

Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Clarify instructions for downloading the Llama Nemotron Post-Training Dataset (#1416) (#1423)

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* CP: Fix vllm API compatibility with Video Pipeline + Upgrade vLLM to 0.14 (#1429)

* vllm API compatibility fixed

Signed-off-by: Ao Tang <aot@nvidia.com>

* upgrade vllm to 0.14.0

Signed-off-by: Ao Tang <aot@nvidia.com>

* refactor

Signed-off-by: Ao Tang <aot@nvidia.com>

* pyproject update

Signed-off-by: Ao Tang <aot@nvidia.com>

* add protobuf in constraint-dependencies

Signed-off-by: Ao Tang <aot@nvidia.com>

* comment improve

Signed-off-by: Ao Tang <aot@nvidia.com>

* resolve pyproject

Signed-off-by: Ao Tang <aot@nvidia.com>

---------

Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Address setuptools CVE (#1438) (#1439)

* Address CVE fixes

* Remove cache of aiohttp from ray

* Update uv lock

* Update cache path

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Optimize docker layer and uv with no cache (#1444) (#1446)

* Optimize docker layer and uv with no cache

* Add missing slash

* Add comments to dockerfile

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Purge InternVideo2 (#1451) (#1462)

* Remove Internvideo2

* more to remove

* fix writer

* Enhance Clip class to include cosmos_embed1_frames and cosmos_embed1_embedding in total size calculation

* remove iv2

---------

Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Update cve for python-multipart (#1450) (#1455)

* Update cve for python-multipart

* Update uv lock

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* cherry pick commit, no benchmarking needed (#1461)

Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Update vllm to 0.14.1 and override conflict (#1467) (#1468)

* Update vllm to 0.14.1 and override conflict

* Upperbound numpy for Numba compatibility

* Update vllm to 0.15.1

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Remove thirdparty aiohttp file from ray (#1469) (#1475)

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Fix: fasttext predict call for numpy>2 (#1482) (#1486)

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Update transformers dependency to exact version 4.55.2 in pyproject.t… (#1471) (#1488)

* Update transformers dependency to exact version 4.55.2 in pyproject.toml and uv.lock to prevent import failures in Cosmos Embed. Downgrade tokenizers version to 0.21.4 for compatibility.

* Update transformers dependency in pyproject.toml and uv.lock to allow versions up to 4.55.2, ensuring compatibility with Cosmos Embed imports.

---------

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Abhinav Garg <abhinavg@stanford.edu>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Cherry pick `tutorials` changes from #1477 (#1491)

* Update tutorial README

Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>

* Update quickstart

Updated sample sentences to provide more detailed feedback.

Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>

---------

Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Update tutorial to be more explicit about num_gpus (#1492) (#1499)

* Update tutorial to be more explicit about num_gpus

* fix false positive secret scan

* Hopefully fix the secrets

---------

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Add relevant 26.02 docs to r1.1.0 (#1493)

* add release notes

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* add more pages

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* add more pages

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* add new sdg docs

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* update remaining files from sdg docs

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* continue adding more changes

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* more video docs

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* add remaining updates

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

---------

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Address new CVEs from rc4 (#1497) (#1500)

* Scrub thirdparty aiohttp file from ray

* Address new rc4 CVE

* Apt get for consistency

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Add feedback to tutorials (#1476) (#1501)

* Add feedback to tutorials

* clarify install instructions for classifier tutorials

* byo classifiers

* add descriptions

---------

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Update pyasn1 in uv lock (#1505)

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Refactor video frame extraction to improve PyNvCodec availability check (#1511) (#1513)

* Refactor video frame extraction to improve PyNvCodec availability check

- Removed the try-except block for importing PyNvcFrameExtractor, simplifying the import logic.
- Updated the condition for initializing the PyNvcFrameExtractor in the VideoFrameExtractionStage to rely solely on the _PYNVC_AVAILABLE flag.
- Adjusted the handling of pixel format conversion in NvVideoDecoder to prepare for future updates to cvcuda.

* Refactor NvVideoDecoder to replace deprecated nvcv_image with cvcuda tensor

- Updated NvVideoDecoder to remove the use of nvcv_image, which is deprecated, and replaced it with cvcuda tensor.
- Adjusted related tensor operations and tests to ensure compatibility with the new cvcuda implementation.

* Update import statements in test_nvcodec_utils.py to include ruff linting rule

- Modified import statements in the test file to include the RUF100 linting rule, ensuring better adherence to coding standards.
- This change enhances the clarity of the import handling tests.

* Update tests/utils/test_nvcodec_utils.py

* Update tests/utils/test_nvcodec_utils.py

---------

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Signed-off-by: [Your Name] <your.email@example.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Abhinav Garg <abhinavg@stanford.edu>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: isolate release notes and changelog (#1529)

* docs: isolate release notes and changelog

Signed-off-by: Lawrence Lane <llane@nvidia.com>

* abhinav's feedback

Signed-off-by: Lawrence Lane <llane@nvidia.com>

* feedback

Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>

* ci: Update final release version (#1540)

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: release note updates

Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: [Your Name] <your.email@example.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Co-authored-by: Abhinav Garg <abhinavg@stanford.edu>
Co-authored-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
Co-authored-by: Huy Vu <86480512+huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Labels

r1.1.0 Pick this label for auto cherry-picking into r1.1.0

2 participants