Added ConvertToTarredAudioDataset by ssh-meister · Pull Request #110 · NVIDIA/NeMo-speech-data-processor

ssh-meister · 2025-04-21T15:41:06Z

This processor converts an ASR dataset into a tarred format compatible with TarredAudioToTextDataLayer.

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
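Here is a minimal sketch of what converting an ASR manifest into tarred shards can look like. It is not this PR's implementation. write_tarred_shards is a hypothetical helper. The output names (audio_{i}.tar, tarred_audio_manifest.json) and the shard_id field are assumptions modeled on NeMo's tarred-dataset convention, in which audio_filepath refers to the member name inside a tar shard.

```python
import json
import os
import tarfile


def write_tarred_shards(entries, out_dir, num_shards):
    """Hypothetical helper: pack manifest entries into num_shards tar files.

    Assumes each entry is a dict with at least "audio_filepath" and that audio
    basenames are unique across the dataset (collisions are not handled here).
    """
    os.makedirs(out_dir, exist_ok=True)
    # Round-robin assignment keeps shard sizes within one entry of each other.
    shards = [entries[i::num_shards] for i in range(num_shards)]
    manifest_path = os.path.join(out_dir, "tarred_audio_manifest.json")
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for shard_id, shard in enumerate(shards):
            tar_path = os.path.join(out_dir, f"audio_{shard_id}.tar")
            with tarfile.open(tar_path, "w") as tar:
                for entry in shard:
                    src = entry["audio_filepath"]
                    name = os.path.basename(src)
                    tar.add(src, arcname=name)
                    # In the tarred manifest, audio is referenced by its member
                    # name, and shard_id records which tar file holds it.
                    record = dict(entry, audio_filepath=name, shard_id=shard_id)
                    manifest.write(json.dumps(record) + "\n")
    return manifest_path
```

Round-robin is the simplest split. A real converter would likely also shuffle entries, filter by duration, and guard against duplicate basenames. This sketch assumes all of those away.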

lilithgrigoryan

Thanks for the PR. Overall it looks good to me. Let's add a unit test covering two values each of num_workers and num_shards, and please address the minor inline comments. Also, please update the branch with main.
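The requested test could take roughly the following shape. This is a stdlib-only sketch: build_shards is a hypothetical stand-in for running the processor. A real test would convert a small fixture dataset and inspect the written tar files and manifest. The grid covers two values each of num_workers and num_shards, and the checks confirm that every entry lands in exactly one shard.

```python
import unittest
from concurrent.futures import ThreadPoolExecutor


def build_shards(entries, num_shards, num_workers):
    """Hypothetical stand-in for the processor: build shards in parallel."""

    def make_shard(shard_id):
        # Round-robin split; each task materialises one shard.
        return entries[shard_id::num_shards]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves shard order regardless of which worker finishes first.
        return list(pool.map(make_shard, range(num_shards)))


class TestBuildShards(unittest.TestCase):
    def test_num_workers_and_num_shards(self):
        entries = [{"audio_filepath": f"utt{i}.wav", "duration": 1.0} for i in range(10)]
        for num_workers in (1, 2):
            for num_shards in (1, 4):
                with self.subTest(num_workers=num_workers, num_shards=num_shards):
                    shards = build_shards(entries, num_shards, num_workers)
                    self.assertEqual(len(shards), num_shards)
                    # No entry is lost or duplicated across shards.
                    flat = [e["audio_filepath"] for shard in shards for e in shard]
                    self.assertCountEqual(flat, [e["audio_filepath"] for e in entries])
```

In a pytest suite, the same grid maps onto two stacked @pytest.mark.parametrize decorators, which produce one test case per (num_workers, num_shards) combination.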

lilithgrigoryan · 2025-04-29T15:45:14Z

sdp/utils/convert_to_tarred_audio_dataset.py

+        entries, total_duration, filtered_entries, filtered_duration = self._read_manifest(manifest_path, config)
+
+        if len(filtered_entries) > 0:
+            print(f"Filtered {len(filtered_entries)} files which amounts to {filtered_duration} seconds of audio.")


Please avoid print() calls and use a logger instead.
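A sketch of the suggested change, assuming a module-level logger. report_filtered is a hypothetical wrapper around the filtering message from the hunk above. Passing values as %-style arguments defers string formatting until a handler actually emits the record.

```python
import logging

# One logger per module; the application configures handlers and levels.
logger = logging.getLogger(__name__)


def report_filtered(filtered_entries, filtered_duration):
    """Hypothetical helper: report filtered files through logging, not print()."""
    if filtered_entries:
        # Arguments are interpolated only if the INFO record is emitted.
        logger.info(
            "Filtered %d files amounting to %.2f seconds of audio.",
            len(filtered_entries),
            filtered_duration,
        )
```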

lilithgrigoryan · 2025-04-29T15:46:22Z

sdp/utils/convert_to_tarred_audio_dataset.py

+
+        cuts = CutSet(LazyNeMoIterator(manifest_path, metadata_only=True))
+        bins = estimate_duration_buckets(cuts, num_buckets=num_buckets)
+        print(


Please replace print() calls with logging everywhere.
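For the bucket-estimation output, one option is to log a summary at INFO level and the full boundary list at DEBUG. With that split, default runs stay quiet. The sketch assumes the bins returned by estimate_duration_buckets are a list of floats. log_duration_buckets is a hypothetical helper, and the exact message wording is illustrative.

```python
import logging

logger = logging.getLogger(__name__)


def log_duration_buckets(bins):
    """Hypothetical helper: report estimated bucket boundaries via logging."""
    # Skip building the joined string entirely when DEBUG output is disabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Duration bucket boundaries: %s", ", ".join(f"{b:.2f}" for b in bins))
    logger.info("Estimated %d duration bucket boundaries.", len(bins))
```

Because the caller controls the logger level, the same code serves both quiet batch runs and verbose debugging without edits to the processor.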

Added ConvertToTarredAudioDataset

ac43ec2

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

ssh-meister requested review from karpnv and lilithgrigoryan April 21, 2025 15:41

Documentation added

11de666

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

lilithgrigoryan requested changes Apr 29, 2025


ssh-meister mentioned this pull request Jul 21, 2025

ConvertToTarredAudioDataset processor implementation #145

Merged

ssh-meister closed this Jul 21, 2025

ssh-meister wants to merge 2 commits into main from tarred


2 participants
