23 changes: 12 additions & 11 deletions tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb
@@ -99,7 +99,7 @@
 "- whether the word should be capitalized\n",
 "\n",
 "\n",
-"In some cases lexical only model can't predict punctutation correctly without audio. It is especially hard for conversational speech.\n",
+"In some cases lexical only model can't predict punctuation correctly without audio. It is especially hard for conversational speech.\n",
 "\n",
 "For example:\n",
 "\n",
@@ -119,7 +119,7 @@
 "## Architecture\n",
 "Punctuation and capitaalization lexical audio model is based on [Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech](https://arxiv.org/pdf/2008.00702.pdf). Model consists of lexical encoder (BERT-like model), acoustic encoder (i.e. Conformer's audio encoder), fusion of lexical and audio features (attention based fusion) and prediction layers.\n",
 "\n",
-"Fusion is needed because encoded text and audio might have different length therfore can't be alligned one-to-one. As model predicts punctuation and capitalization per text token we use cross-attention between encoded lexical and encoded audio input."
+"Fusion is needed because encoded text and audio might have different length therefore can't be aligned one-to-one. As model predicts punctuation and capitalization per text token we use cross-attention between encoded lexical and encoded audio input."
 ]
},
{
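The attention-based fusion described in the hunk above can be sketched in PyTorch. This is a minimal illustrative sketch, not NeMo's actual implementation: the class name `LexicalAudioFusion`, the dimensions, and the direct use of `nn.MultiheadAttention` are all assumptions; the only property taken from the text is that text tokens act as queries over audio frames, so the fused output keeps one vector per text token even when the two sequences differ in length.

```python
import torch
import torch.nn as nn

class LexicalAudioFusion(nn.Module):
    """Illustrative cross-attention fusion: each encoded text token attends
    over all encoded audio frames, so the output stays text-aligned."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_enc: torch.Tensor, audio_enc: torch.Tensor) -> torch.Tensor:
        # Queries come from the lexical encoder; keys/values from the audio
        # encoder. Sequence lengths may differ, so no one-to-one alignment
        # is required.
        fused, _ = self.cross_attn(query=text_enc, key=audio_enc, value=audio_enc)
        # Residual connection keeps the lexical features intact.
        return self.norm(text_enc + fused)

fusion = LexicalAudioFusion()
text = torch.randn(2, 20, 256)    # batch of 2, 20 subword tokens
audio = torch.randn(2, 150, 256)  # same batch, 150 audio frames
out = fusion(text, audio)
print(out.shape)  # torch.Size([2, 20, 256]) -- one vector per text token
```

Because the output is aligned with the text tokens, per-token punctuation and capitalization heads can be applied directly on top of it.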
@@ -279,22 +279,23 @@
  ]
 },
 {
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+  "pycharm": {
+   "name": "#%% md\n"
+  }
+ },
+ "outputs": [],
  "source": [
-  "## download get_tatoeba_data.py script to download and preprocess the Tatoeba data\n",
+  "## download get_libritts_data.py script to download and preprocess the LibriTTS data\n",
+  "os.makedirs(WORK_DIR, exist_ok=True)\n",
+  "if not os.path.exists(WORK_DIR + '/get_libritts_data.py'):\n",
+  "    print('Downloading get_libritts_data.py...')\n",
+  "    wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/token_classification/data/get_libritts_data.py', WORK_DIR)\n",
+  "else:\n",
+  "    print ('get_libritts_data.py already exists')"
- ],
- "metadata": {
-  "collapsed": false,
-  "pycharm": {
-   "name": "#%% md\n"
-  }
- }
+ ]
 },
 {
  "cell_type": "code",