Closed
Collaborator
Good day, could you please sign the commits?
4ac9463 to aa2b67c
Signed-off-by: Nithin Rao Koluguri <nithinrao.koluguri@gmail.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Collaborator
Docs fixed. The error came from a non-existent link, so that's OK.
Jorjeous
requested changes
Jun 5, 2025
Collaborator
Jorjeous
left a comment
LGTM, waiting for env vars to merge
Collaborator
The env vars are missing (in the test config file).
Jorjeous
reviewed
Jun 7, 2025
Collaborator
Jorjeous
left a comment
Let's cover the changes with tests.
Collaborator
Author
Canceling this. Opened PR #130 in the origin branch to avoid test failures from cert issues.
Add Earnings21/22 Dataset Processing Pipeline with Forced Alignment
Overview
This PR introduces a 7-step processing pipeline that converts the Earnings21 and Earnings22 datasets to NeMo format with word-level forced alignment. The pipeline supports both full-dataset processing and evaluation subsets, with optional speaker segmentation.
High-Level Changelog
New Features
Core Pipeline Processors:
CreateInitialAudioAndManifest: Initial audio manifest creation with automatic audio conversion (MP3 → WAV, multi-channel → mono, any sample rate → 16 kHz)
CreateFullAudioManifestEarnings21: Ground truth text reconstruction from NLP token files with punctuation/capitalization preservation
NeMoForcedAligner: Word-level forced alignment using NeMo ASR models with CTC heads
CreateSentenceSegmentedManifest: Sentence-level segmentation based on CTM files with punctuation-aware splitting
SpeakerSegmentedManifest: Speaker-change detection and segmentation with optional metadata mapping
Dataset Support:
Audio Processing:
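The conversion described for CreateInitialAudioAndManifest (MP3 → WAV, multi-channel → mono, any sample rate → 16 kHz) can be sketched as a single ffmpeg call. This is only an illustration, not the PR's implementation: the use of ffmpeg is an assumption, and the helper names `build_ffmpeg_command` and `convert_audio` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes src as mono 16-bit PCM WAV.

    Hypothetical helper; assumes ffmpeg is installed and on PATH.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file, e.g. an MP3
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the target rate
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        dst,
    ]


def convert_audio(src: str, dst: str, sample_rate: int = 16000) -> None:
    """Run the conversion, creating the output directory if needed."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

Keeping command construction separate from execution makes the conversion settings easy to inspect and test without ffmpeg being present.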
Pipeline Configuration
7-Step Processing Workflow:
Key Configuration Options:
dataset_type: "earnings21" | "earnings22"
subset: "full" | "eval10" (earnings21 only)
forced_alignment_model: Configurable NeMo ASR model
preserve_punctuation/preserve_capitalization: Text processing options
include_speaker_info/include_tags: Optional metadata inclusion
Output Formats
Sentence-Level Segments (Primary Output):
{
  "audio_filepath": "/path/to/audio.wav",
  "duration": 15.2,
  "offset": 45.3,
  "text": "This is a complete sentence with proper punctuation.",
  "alignment": [
    {"word": "This", "start": 45.3, "end": 45.6},
    {"word": "is", "start": 45.6, "end": 45.8}
  ]
}
Speaker-Level Segments (Optional):
{
  "audio_filepath": "/path/to/audio.wav",
  "duration": 0,
  "text": "Speaker segment text...",
  "speaker": "speaker_1",
  "segment_id": 0
}
Usage Examples
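As a rough illustration of the punctuation-aware splitting attributed to CreateSentenceSegmentedManifest, the sketch below groups word-level alignments into sentence-level entries shaped like the primary output format. It is not the PR's code: the function names are hypothetical, and it assumes a sentence ends at a word whose text ends in ".", "?" or "!".

```python
import json

# Assumption: these characters mark the end of a sentence.
SENTENCE_END = (".", "?", "!")


def _make_entry(words, audio_filepath):
    """Build one sentence-level entry from a run of aligned words."""
    start, end = words[0]["start"], words[-1]["end"]
    return {
        "audio_filepath": audio_filepath,
        "duration": round(end - start, 3),
        "offset": start,
        "text": " ".join(w["word"] for w in words),
        "alignment": list(words),
    }


def split_into_sentences(words, audio_filepath):
    """Group word alignments into sentence-level manifest entries."""
    segments, current = [], []
    for w in words:
        current.append(w)
        if w["word"].endswith(SENTENCE_END):
            segments.append(_make_entry(current, audio_filepath))
            current = []
    if current:  # trailing words with no closing punctuation
        segments.append(_make_entry(current, audio_filepath))
    return segments


words = [
    {"word": "This", "start": 45.3, "end": 45.6},
    {"word": "is", "start": 45.6, "end": 45.8},
    {"word": "fine.", "start": 45.8, "end": 46.2},
    {"word": "Next", "start": 46.5, "end": 46.9},
]
for entry in split_into_sentences(words, "/path/to/audio.wav"):
    print(json.dumps(entry))  # one JSON object per line, manifest style
```

Each entry keeps its own word alignments, so the offset and duration can always be re-derived from the first and last word of the segment.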