German Youtube with new processors #48

Merged
ssh-meister merged 6 commits into NVIDIA:meister/youtube from ssh-meister:yt
Mar 18, 2024

@ssh-meister ssh-meister commented Mar 17, 2024

Pipeline for the German-language YouTube subset, consisting of the following steps:

  1. CreateInitialManifest - creates initial manifests from pairs of .opus audio files and their .srt transcripts (with ground-truth timestamps)
  2. AggregateSegments - aggregates ground-truth segments into longer ones based on a duration threshold
  3. Identification of the text language and the audio language
  4. ASRInferenceParallel - a new processor for parallel ASR inference (followed by MergeManifests)
  5. Preprocessing for audio-based text normalization (TN)
  6. Running audio-based TN
  7. Postprocessing of audio-based TN
  8. Filtering by metric thresholds
  9. Saving the data
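
For illustration, step 2 (aggregating ground-truth segments up to a duration threshold) could look roughly like the sketch below. This is not the code from this PR: the `Segment` dataclass, the `aggregate_segments` function, and the `max_duration` parameter are hypothetical names chosen for the example, and the greedy merge strategy is an assumption about how such aggregation might work.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Segment:
    """Hypothetical container for one subtitle segment (not from the PR)."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str


def aggregate_segments(segments: Iterable[Segment], max_duration: float) -> List[Segment]:
    """Greedily merge consecutive segments while the merged span
    (from the first start to the latest end) stays within max_duration."""
    merged: List[Segment] = []
    current: Optional[Segment] = None
    for seg in segments:
        if current is None:
            # Start a new aggregated segment.
            current = Segment(seg.start, seg.end, seg.text)
        elif seg.end - current.start <= max_duration:
            # Still under the threshold: extend the current segment.
            current.end = seg.end
            current.text = f"{current.text} {seg.text}"
        else:
            # Threshold exceeded: flush and start over from this segment.
            merged.append(current)
            current = Segment(seg.start, seg.end, seg.text)
    if current is not None:
        merged.append(current)
    return merged


# Made-up example data, assumed to be sorted by start time.
segments = [
    Segment(0.0, 4.0, "Guten Morgen."),
    Segment(4.5, 9.0, "Wie geht es Ihnen?"),
    Segment(9.5, 21.0, "Heute sprechen wir über Sprachverarbeitung."),
    Segment(21.5, 25.0, "Vielen Dank."),
]
result = aggregate_segments(segments, max_duration=20.0)
for seg in result:
    print(f"{seg.start:.1f}-{seg.end:.1f}: {seg.text}")
```

A single segment longer than the threshold is passed through unchanged in this sketch; a real processor might split or drop it instead.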

@ssh-meister ssh-meister changed the base branch from karpnv/memory_chunk to karpnv/cc March 17, 2024 20:32
@ssh-meister ssh-meister marked this pull request as ready for review March 18, 2024 11:20
class CreateInitialManifest(BaseParallelProcessor):
    def __init__(
        self,
        data_dir: str,

Let's give this a more specific name.

karpnv commented Mar 18, 2024

Thank you! There are just a few minor issues to fix.

karpnv commented Mar 18, 2024

Each processor should have a docstring. Please see the other processors on the main branch.

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
@ssh-meister ssh-meister changed the base branch from karpnv/cc to meister/youtube March 18, 2024 21:07
@ssh-meister ssh-meister merged commit 6e08018 into NVIDIA:meister/youtube Mar 18, 2024