Support configurable extra fields for LazyNeMoTarredIterator#9548
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
try:
    if isinstance(getattr(cfg, attr), fdl.Config):
        _artifact_transform(getattr(cfg, attr), output_path=output_path)
except ValueError:
Check notice
Code scanning / CodeQL
Empty except
    track_io(SentencePieceTokenizer, artifacts=[FileArtifact("model_path")])
    __all__.append("SentencePieceTokenizer")
except ImportError:
Check notice
Code scanning / CodeQL
Empty except
        ],
    )
    __all__.append("AutoTokenizer")
except ImportError:
Check notice
Code scanning / CodeQL
Empty except
@@ -1,4 +1,4 @@
-from typing import Callable, List, Optional
+from typing import Any, Callable, List, Mapping, Optional
Check notice
Code scanning / CodeQL
Unused import
b00c28c to 43559ef
seed = resolve_seed(self.shard_seed)
random.Random(seed).shuffle(shard_ids)

# Propagate the random seed
just for reproducibility or is there any other reason?
Reproducibility/consistent RNG behavior across all dataloading modules.
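To illustrate the reproducibility point, here is a minimal sketch of a seeded shard shuffle in plain Python. The helper name `shuffled_shards` is hypothetical, and the seed is passed as a plain integer rather than going through `resolve_seed`:

```python
import random


def shuffled_shards(shard_ids, seed):
    """Return a shuffled copy of shard_ids using a private, seeded RNG."""
    ids = list(shard_ids)
    # A dedicated Random instance keeps the shuffle independent of the
    # global RNG state, so the same seed always yields the same order.
    random.Random(seed).shuffle(ids)
    return ids


first = shuffled_shards(range(8), seed=0)
second = shuffled_shards(range(8), seed=0)
```

With the same seed, both calls produce the same shard order, which is the behavior the reply describes.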
>>> cuts = lhotse.CutSet(LazyNeMoIterator(
...     "nemo_manifests/train.json",
...     extra_fields=[{"type": "text_iter", "name": "question", "path": "questions.txt"}],
Is there value in allowing the path to be a list of questions as well?
I'm not sure I understood your comment. Do you mean "path": ["questions1.txt", "questions2.txt"]? What would be the expected behavior?
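For readers unfamiliar with the idea, here is a rough, library-free sketch of what a `text_iter`-style extra field could do: attach one line of a text file to each manifest entry under the configured name. This is not NeMo's implementation. The helper `attach_text_iter`, the manifest dictionaries, and the choice to cycle through the file when it has fewer lines than examples are all assumptions made for illustration:

```python
import itertools
import tempfile
from pathlib import Path


def attach_text_iter(examples, name, path):
    """Yield copies of each example with one line of `path` stored under `name`."""
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    # Assumption: restart from the first line once the file is exhausted.
    for example, line in zip(examples, itertools.cycle(lines)):
        yield {**example, name: line}


with tempfile.TemporaryDirectory() as tmp:
    questions_path = Path(tmp) / "questions.txt"
    questions_path.write_text("What was said?\nWho is speaking?\n")
    # Hypothetical manifest entries standing in for lines of a NeMo manifest.
    manifest = [{"audio_filepath": f"utt{i}.wav", "duration": 1.0} for i in range(3)]
    enriched = list(attach_text_iter(manifest, "question", questions_path))
```

Under the same assumptions, a list of paths could be handled by chaining the files before cycling, but the expected behavior for that case is exactly what the thread leaves open.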
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
* Support configurable extra fields for LazyNeMoTarredIterator
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add tests and fixes
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Documentation, more tests
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com>