From 743a9a5adc6cfb20d3c40d2fd05941a3f7e3eb7f Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Wed, 7 Dec 2022 20:35:46 -0700 Subject: [PATCH 001/154] Merge r1.13.0 main (#5570) * update branch Signed-off-by: ericharper * Rename Speech Dataset Processor to Speech Data Processor (#5378) Signed-off-by: Elena Rastorgueva Signed-off-by: Elena Rastorgueva * Megatron Export Update (#5343) * export update for Megatron + change ORT optimization Signed-off-by: David Mosallanezhad * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated export_utils to use autocast instead of manually casting >:/ Signed-off-by: David Mosallanezhad * removed dtype from LayerNorm Signed-off-by: David Mosallanezhad * added comment Signed-off-by: David Mosallanezhad * reverting changes on FloatCast Signed-off-by: David Mosallanezhad * Cherry-picked changes from megatron-norm Signed-off-by: Boris Fomitchev * updated asr_model import to cast_utils Signed-off-by: David Mosallanezhad * updated del onnx_model place Signed-off-by: David Mosallanezhad * changed ort optimization to basic -> temp fix Signed-off-by: David Mosallanezhad Signed-off-by: David Mosallanezhad Signed-off-by: Boris Fomitchev Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev * Disable sync_batch_comm in validation_step for GPT (#5397) * disable sync_batch_comm in validation_step Signed-off-by: ericharper * Read sync_batch_comm from config or default to False Signed-off-by: Markel Sanz Ausin * Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error Signed-off-by: Markel Sanz Ausin * Empty Signed-off-by: MaximumEntropy * Comment out test Signed-off-by: MaximumEntropy Signed-off-by: ericharper Signed-off-by: Markel Sanz Ausin Signed-off-by: MaximumEntropy Signed-off-by: Oleksii Kuchaiev Co-authored-by: Oleksii Kuchaiev Co-authored-by: Markel Sanz Ausin Co-authored-by: Sandeep Subramanian Co-authored-by: Oleksii Kuchaiev * Radtts 1.13 (#5451) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358) * [TTS] add CI test for RADTTS training recipe. 
Signed-off-by: Boris Fomitchev Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) (#5478) * Initial refactor Signed-off-by: MaximumEntropy * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy * Fixes for eval Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy * Refactor Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy * Remove comments Signed-off-by: MaximumEntropy * Minor Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy * Remove old comment Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * export_utils bugfix (#5480) * updated export_utils Signed-off-by: David Mosallanezhad * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Mosallanezhad Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Export fixes for Riva (#5496) * Export fixes for Riva Signed-off-by: Boris Fomitchev * Cleaning up training_utils Signed-off-by: Boris Fomitchev Signed-off-by: Boris Fomitchev * added set_start_method + function param bugfix (#5539) * added set_start_method + function param bugfix Signed-off-by: David Mosallanezhad * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upper bound torchmetrics Signed-off-by: ericharper Signed-off-by: David Mosallanezhad Signed-off-by: ericharper Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper * remove notebook (#5548) Signed-off-by: ericharper Signed-off-by: ericharper * update readme Signed-off-by: ericharper * update branch Signed-off-by: ericharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Elena Rastorgueva Signed-off-by: David Mosallanezhad Signed-off-by: Boris Fomitchev Signed-off-by: Markel Sanz Ausin Signed-off-by: MaximumEntropy Signed-off-by: Oleksii Kuchaiev Signed-off-by: Xuesong Yang 
<1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: David Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev Co-authored-by: Oleksii Kuchaiev Co-authored-by: Markel Sanz Ausin Co-authored-by: Sandeep Subramanian Co-authored-by: Oleksii Kuchaiev Co-authored-by: Boris Fomitchev Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- README.rst | 2 +- .../conf/megatron_gpt_config.yaml | 1 - .../megatron_gpt_prompt_learning.py | 1 + .../megatron_t5_prompt_learning.py | 1 + tests/collections/tts/test_tts_exportables.py | 4 +- .../Non_English_Downstream_Tasks_(NER).ipynb | 899 ------------------ 6 files changed, 4 insertions(+), 904 deletions(-) delete mode 100644 tutorials/nlp/Non_English_Downstream_Tasks_(NER).ipynb diff --git a/README.rst b/README.rst index 06ea863600e7..c2334f5e2336 100644 --- a/README.rst +++ b/README.rst @@ -224,7 +224,7 @@ Install it manually if not using the NVIDIA PyTorch container. git clone https://github.com/ericharper/apex.git cd apex - git checkout nm_v1.11.0 + git checkout nm_v1.13.0 pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./ Transformer Engine diff --git a/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml b/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml index a40e5a2ae35f..0604c0287d05 100755 --- a/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml +++ b/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml @@ -106,7 +106,6 @@ model: apex_transformer_log_level: 30 # Python logging level displays logs with severity greater than or equal to this gradient_as_bucket_view: True # PyTorch DDP argument. Allocate gradients in a contiguous bucket to save memory (less fragmentation and buffer memory) sync_batch_comm: False # Enable stream synchronization after each p2p communication between pipeline stages - use_unified_checkpoint: True # Use model parallel independent checkpointing ## Activation Checkpointing # NeMo Megatron supports 'selective' activation checkpointing where only the memory intensive part of attention is checkpointed. diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py index da620dd434ec..554178e7e95d 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py @@ -13,6 +13,7 @@ # limitations under the License. import torch.multiprocessing as mp +from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.callbacks.timer import Timer diff --git a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py index e35c023d7d26..efcf01288a7c 100644 --- a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py @@ -13,6 +13,7 @@ # limitations under the License. 
import torch.multiprocessing as mp +from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.callbacks.timer import Timer diff --git a/tests/collections/tts/test_tts_exportables.py b/tests/collections/tts/test_tts_exportables.py index d7684de732e5..c7083b45f6d7 100644 --- a/tests/collections/tts/test_tts_exportables.py +++ b/tests/collections/tts/test_tts_exportables.py @@ -15,7 +15,6 @@ import tempfile import pytest -import torch from omegaconf import OmegaConf from nemo.collections.tts.models import FastPitchModel, HifiGanModel, RadTTSModel @@ -81,5 +80,4 @@ def test_RadTTSModel_export_to_torchscript(self, radtts_model): model = radtts_model.cuda() with tempfile.TemporaryDirectory() as tmpdir: filename = os.path.join(tmpdir, 'rad.ts') - with torch.cuda.amp.autocast(enabled=True): - model.export(output=filename, verbose=True, check_trace=True) + model.export(output=filename, verbose=True, check_trace=True) diff --git a/tutorials/nlp/Non_English_Downstream_Tasks_(NER).ipynb b/tutorials/nlp/Non_English_Downstream_Tasks_(NER).ipynb deleted file mode 100644 index bfa56e5a2567..000000000000 --- a/tutorials/nlp/Non_English_Downstream_Tasks_(NER).ipynb +++ /dev/null @@ -1,899 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "OETcTQlcguCm" - }, - "outputs": [], - "source": [ - "BRANCH = 'main'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "o_0K1lsW1dj9" - }, - "outputs": [], - "source": [ - "\"\"\"\n", - "You can run either this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n", - "\n", - "Instructions for setting up Colab are as follows:\n", - "1. Open a new Python 3 notebook.\n", - "2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n", - "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n", - "4. Run this cell to set up dependencies.\n", - "\"\"\"\n", - "# If you're using Google Colab and not running locally, run this cell\n", - "\n", - "# install NeMo\n", - "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[nlp]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pC0slAc0h9zN", - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# If you're not using Colab, you might need to upgrade jupyter notebook to avoid the following error:\n", - "# 'ImportError: IProgress not found. Please update jupyter and ipywidgets.'\n", - "\n", - "! pip install ipywidgets\n", - "! 
jupyter nbextension enable --py widgetsnbextension\n", - "\n", - "# Please restart the kernel after running this cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "dzqD2WDFOIN-" - }, - "outputs": [], - "source": [ - "from nemo.collections import nlp as nemo_nlp\n", - "from nemo.utils.exp_manager import exp_manager\n", - "\n", - "import os\n", - "import wget \n", - "import torch\n", - "import pytorch_lightning as pl\n", - "from omegaconf import OmegaConf\n", - "\n", - "import zipfile\n", - "import random\n", - "from glob import glob" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "daYw_Xll2ZR9" - }, - "source": [ - "# Tutorial Overview\n", - "In this tutorial, we will show how to use a pre-trained BERT language model on a non-English downstream task. Here we are going to use Persian language and Named entity recognition (NER) task as an example. Note, most of the rest downstream tasks supported in NeMo should work similarly for other languages. \n", - "\n", - "# Task Description\n", - "NER is the task of detecting and classifying key information (entities) in text.\n", - "For example, in a sentence: `Mary lives in Santa Clara and works at NVIDIA`, we should detect that `Mary` is a person, `Santa Clara` is a location and `NVIDIA` is a company.\n", - "\n", - "In this tutorial we will be using [BERT language model](https://arxiv.org/abs/1810.04805).\n", - "\n", - "To read more about other topics and downstream task that can be done in NeMo, you can see the [NeMo's tutorial page](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZnuziSwJ1yEB" - }, - "source": [ - "# Dataset\n", - "\n", - "In this tutorial we are going to use [Persian Arman dataset for our NER task](https://github.com/HaniehP/PersianNER).\n", - "\n", - "Arman is a hand annotated Persian corpus for NER task with 250,015 tokens and 7,682 sentences. Using [IOB encoding](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)), tokens are labeled with either one of the following name entities or labeled with O. \n", - "\n", - "* event = event\n", - "* fac = facility\n", - "* loc = location\n", - "* org = organization\n", - "* pers = person\n", - "* pro = product\n", - "\n", - "Each of these has a label staring with **B** that indicates it is the first token of the name entity and with **I** for others. \n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qzcZ3nb_-SVT" - }, - "source": [ - "# NeMo Token Classification Data Format\n", - "\n", - "[TokenClassification Model](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/models/token_classification/token_classification_model.py) in NeMo supports NER and other token level classification tasks, as long as the data follows the format specified below. \n", - "\n", - "Token Classification Model requires the data to be split into 2 files: \n", - "* text.txt \n", - "* labels.txt. 
\n", - "\n", - "Each line of the **text.txt** file contains text sequences, where words are separated with spaces, i.e.: \n", - "[WORD] [SPACE] [WORD] [SPACE] [WORD].\n", - "\n", - "The **labels.txt** file contains corresponding labels for each word in text.txt, the labels are separated with spaces, i.e.:\n", - "[LABEL] [SPACE] [LABEL] [SPACE] [LABEL].\n", - "\n", - "Example of a text.txt file:\n", - "```\n", - "دبیر شورای عالی انقلاب فرهنگی از گنجانده شدن 5 زبان خارجی جدید در برنامه درسی مدارس خبر داد.\n", - "```\n", - "Corresponding labels.txt file:\n", - "```\n", - "O B_ORG I_ORG I_ORG I_ORG O O O O O O O O O O O O O O \n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SL58EWkd2ZVb" - }, - "source": [ - "## Download and preprocess the data¶" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_z2tCEIXZa90" - }, - "source": [ - "You can download the Arman dataset by cloning to the following github repository: https://github.com/HaniehP/PersianNER.\n", - "\n", - "After downloading the data, you will see a few files and folders inside a directory named PersianNER. Take ArmanPersoNERCorpus.zip and upload it to `DATA_DIR` (if running in a docker or locally) or use **files** from Google colab to upload the files.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "n8HZrDmr12_-" - }, - "outputs": [], - "source": [ - "# path to the folder with ArmanPersoNERCorpus.zip file (if running locally on in a docker)\n", - "DATA_DIR = \"PATH_TO_FOLDER_WITH_ZIP.ZIP_FILE\"\n", - "WORK_DIR = \"WORK_DIR\"\n", - "\n", - "# adding an empty subfolder for data (otherwise it can interact with existing folders in DATA_DIR)\n", - "subfolder = f\"{DATA_DIR}/non_eng_NER\"\n", - "\n", - "os.makedirs(WORK_DIR, exist_ok=True)\n", - "os.makedirs(DATA_DIR, exist_ok=True)\n", - "os.makedirs(subfolder, exist_ok=True)\n", - "\n", - "! cp $DATA_DIR/ArmanPersoNERCorpus.zip $subfolder/.\n", - "DATA_DIR = f\"{DATA_DIR}/non_eng_NER\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "k1TmF5rrdPMj" - }, - "outputs": [], - "source": [ - "if 'google.colab' in str(get_ipython):\n", - " from google.colab import files\n", - " uploaded = files.upload() " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "HTUKJOownkrF" - }, - "outputs": [], - "source": [ - "if 'google.colab' in str(get_ipython):\n", - " ! mv ArmanPersoNERCorpus.zip $DATA_DIR/." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NhUzIeF0Yg0l" - }, - "source": [ - "Let's extract files from the zip file. It will generate three test and train files which have overlaps and are intended to be used in turn as train and test sets. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Y01BdjPRW-7B" - }, - "outputs": [], - "source": [ - "! cd $DATA_DIR && unzip \"ArmanPersoNERCorpus.zip\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qaDgL-sQaX2e" - }, - "source": [ - "Next, we will be putting all data into a single file and removing any repeated sentences. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "B0T4CzJvbBJ4" - }, - "outputs": [], - "source": [ - "file_all = os.path.join(DATA_DIR, \"all_data.txt\")\n", - "with open(file_all, \"w\") as f1:\n", - " for filename in glob(f\"{DATA_DIR}/test_fold*.txt\") + glob(f\"{DATA_DIR}/train_fold*.txt\"):\n", - " with open(filename, \"r\", encoding = \"ISO-8859-1\") as f2:\n", - " for line in f2:\n", - " f1.write(line)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "VzVuET8HESFB" - }, - "source": [ - "Now, you need to convert this data into NeMo compatible format before starting the training process. For this purpose, you can run [examples/nlp/token_classification/data/import_from_iob_format.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/token_classification/data/import_from_iob_format.py) on your train and dev files, as follows:\n", - "\n", - "\n", - "\n", - "\n", - "```\n", - "python examples/nlp/token_classification/data/import_from_iob_format.py --data_file PATH_TO_IOB_FORMAT_DATAFILE, e.g., \"DATA_DIR/all_data.txt\"\n", - "```\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ord_6KlkeNl8" - }, - "outputs": [], - "source": [ - "!wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/nlp/token_classification/data/import_from_iob_format.py" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IfSUkxffeSpL" - }, - "outputs": [], - "source": [ - "!python import_from_iob_format.py --data_file $DATA_DIR/all_data.txt" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Aj0rXbYXbivW" - }, - "source": [ - "Now we process the data to remove potentially any repeated sentences and then split them into train and dev sets. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "CgvnTlqzbq5-" - }, - "outputs": [], - "source": [ - "sent_dict = dict()\n", - "line_removed = dict()\n", - "line_counter = 0\n", - "with open(DATA_DIR + \"/text_all_not_repeated.txt\", \"w\") as f1:\n", - " with open(DATA_DIR + \"/text_all_data.txt\", \"r\") as f2:\n", - " for line in f2:\n", - " line_counter += 1\n", - " if (not line in sent_dict):\n", - " sent_dict[line] = 1\n", - " f1.write(line)\n", - " else:\n", - " line_removed[line_counter] = 1\n", - "#labels:\n", - "line_counter = 0\n", - "with open(DATA_DIR + \"/labels_all_not_repeated.txt\", \"w\") as f1:\n", - " with open(DATA_DIR + \"/labels_all_data.txt\", \"r\") as f2:\n", - " for line in f2:\n", - " line_counter += 1\n", - " if(not line_counter in line_removed):\n", - " f1.write(line)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0cO3crs_gXjt" - }, - "source": [ - "After preprocessing the data and removing repeated sentences, there will be 7668 total valid sentences. We will be using 85% of that as train and 15% as dev. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "7oHQYsMMbugP" - }, - "outputs": [], - "source": [ - "total_data = 7668\n", - "train_share = 0.85\n", - "used_lines_train = dict()\n", - "flag = 1\n", - "count = 0\n", - "while flag:\n", - " idx = random.randint(1, total_data)\n", - " if (not idx in used_lines_train):\n", - " used_lines_train[idx] = 1\n", - " count += 1\n", - " if (count/total_data > train_share):\n", - " flag = 0\n", - "\n", - "line_counter = 0\n", - "with open(DATA_DIR+ \"/text_train.txt\", \"w\") as f1:\n", - " with open(DATA_DIR + \"/text_dev.txt\", \"w\") as f2:\n", - " with open(DATA_DIR + \"/text_all_not_repeated.txt\", \"r\") as f3:\n", - " for line in f3:\n", - " line_counter += 1\n", - " if (line_counter in used_lines_train):\n", - " f1.write(line)\n", - " else:\n", - " f2.write(line)\n", - "\n", - "line_counter = 0\n", - "with open(DATA_DIR + \"/labels_train.txt\", \"w\") as f1:\n", - " with open(DATA_DIR + \"/labels_dev.txt\", \"w\") as f2:\n", - " with open(DATA_DIR + \"/labels_all_not_repeated.txt\", \"r\") as f3:\n", - " for line in f3:\n", - " line_counter += 1\n", - " if (line_counter in used_lines_train):\n", - " f1.write(line)\n", - " else:\n", - " f2.write(line)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1Q-GWNwDbzKl" - }, - "source": [ - "Finally, we remove files that are not needed anymore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "II20ustub5BF" - }, - "outputs": [], - "source": [ - "print(\"Removed files:\")\n", - "for filename in os.listdir(DATA_DIR):\n", - " if (filename == \"text_dev.txt\" or filename == \"text_train.txt\" or filename == \"labels_dev.txt\" or filename == \"labels_train.txt\"):\n", - " continue\n", - " print(filename)\n", - " os.remove(DATA_DIR + \"/\" + filename)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U8Ty5_S7Ye8h" - }, - "source": [ - "Now, the data folder should contain these 4 files:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "L8vsyh3JZH26" - }, - "source": [ - "\n", - "\n", - "* labels_dev.txt\n", - "* labels_train.txt\n", - "* text_dev.txt\n", - "* text_train.txt\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "qB0oLE4R9EhJ" - }, - "outputs": [], - "source": [ - "! ls -l $DATA_DIR" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6UDPgadLN6SG" - }, - "outputs": [], - "source": [ - "# let's take a look at the data \n", - "print('Text:')\n", - "! head -n 5 {DATA_DIR}/text_train.txt\n", - "\n", - "print('\\nLabels:')\n", - "! head -n 5 {DATA_DIR}/labels_train.txt" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_whKCxfTMo6Y" - }, - "source": [ - "# Model configuration\n", - "\n", - "Our Named Entity Recognition model is comprised of the pretrained [BERT](https://arxiv.org/pdf/1810.04805.pdf) model followed by a Token Classification layer.\n", - "\n", - "The model is defined in a config file which declares multiple important sections. 
They are:\n", - "- **model**: All arguments that are related to the Model - language model, token classifier, optimizer and schedulers, datasets and any other related information\n", - "\n", - "- **trainer**: Any argument to be passed to PyTorch Lightning" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "T1gA8PsJ13MJ" - }, - "outputs": [], - "source": [ - "MODEL_CONFIG = \"token_classification_config.yaml\"\n", - "# download the model's configuration file \n", - "config_dir = WORK_DIR + '/configs/'\n", - "os.makedirs(config_dir, exist_ok=True)\n", - "if not os.path.exists(config_dir + MODEL_CONFIG):\n", - " print('Downloading config file...')\n", - " wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/token_classification/conf/' + MODEL_CONFIG, config_dir)\n", - "else:\n", - " print ('config file is already exists')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mX3KmWMvSUQw" - }, - "outputs": [], - "source": [ - "# this line will print the entire config of the model\n", - "config_path = f'{WORK_DIR}/configs/{MODEL_CONFIG}'\n", - "print(config_path)\n", - "config = OmegaConf.load(config_path)\n", - "print(OmegaConf.to_yaml(config))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZCgWzNBkaQLZ" - }, - "source": [ - "# Fine-tuning the model using Arman dataset\n", - "\n", - "Let's select a [`bert-base-multilingual-uncased`](https://huggingface.co/bert-base-multilingual-uncased) BERT model and fine-tune it on the Arman dataset.\n", - "\n", - "## Setting up Data within the config\n", - "\n", - "Among other things, the config file contains dictionaries called dataset, train_ds and validation_ds. These are configurations used to setup the Dataset and DataLoaders of the corresponding config.\n", - "\n", - "We assume that both training and evaluation files are in the same directory and use the default names mentioned during the data download step. 
\n", - "So, to start model training, we simply need to specify `model.dataset.data_dir`, like we are going to do below.\n", - "\n", - "Also notice that some config lines, including `model.dataset.data_dir`, have `???` in place of paths, this means that values for these fields are required to be specified by the user.\n", - "\n", - "Let us now add the data directory path to the config.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "LQHCJN-ZaoLp" - }, - "outputs": [], - "source": [ - "# in this tutorial train and dev datasets are located in the same folder, so it is enought to add the path of the data directory to the config\n", - "config.model.dataset.data_dir = DATA_DIR\n", - "\n", - "# if you want to use the full dataset, set NUM_SAMPLES to -1\n", - "NUM_SAMPLES = 1000\n", - "config.model.train_ds.num_samples = NUM_SAMPLES\n", - "config.model.validation_ds.num_samples = NUM_SAMPLES\n", - "\n", - "# for demonstartion purposes we're running only a single epoch\n", - "config.trainer.max_epochs = 5\n", - "print(OmegaConf.to_yaml(config.model))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nB96-3sTc3yk" - }, - "source": [ - "## Building the PyTorch Lightning Trainer\n", - "\n", - "NeMo models are primarily PyTorch Lightning modules - and therefore are entirely compatible with the PyTorch Lightning ecosystem.\n", - "\n", - "Let's first instantiate a Trainer object" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1tG4FzZ4Ui60" - }, - "outputs": [], - "source": [ - "print(\"Trainer config - \\n\")\n", - "print(OmegaConf.to_yaml(config.trainer))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "knF6QeQQdMrH" - }, - "outputs": [], - "source": [ - "# lets modify some trainer configs\n", - "# checks if we have GPU available and uses it\n", - "accelerator = 'gpu' if torch.cuda.is_available() else 'cpu'\n", - "config.trainer.devices = 1\n", - "config.trainer.accelerator = accelerator\n", - "\n", - "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", - "\n", - "# for mixed precision training, uncomment the line below (precision should be set to 16 and amp_level to O1):\n", - "# config.trainer.amp_level = O1\n", - "\n", - "# remove distributed training flags\n", - "config.trainer.strategy = None\n", - "\n", - "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", - "config.trainer.max_steps = 32\n", - "\n", - "config.exp_manager.exp_dir = WORK_DIR\n", - "trainer = pl.Trainer(**config.trainer)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8IlEMdVxdr6p" - }, - "source": [ - "## Setting up a NeMo Experiment¶\n", - "\n", - "NeMo has an experiment manager that handles logging and checkpointing for us, so let's use it:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "exp_manager(trainer, config.get(\"exp_manager\", None))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8uztqGAmdrYt" - }, - "outputs": [], - "source": [ - "exp_dir = config.exp_manager.exp_dir\n", - "\n", - "# the exp_dir provides a path to the current experiment for easy access\n", - "exp_dir = str(exp_dir)\n", - "exp_dir" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8tjLhUvL_o7_" - }, - "source": [ - "Before initializing the model, we might want to modify some of the model configs. 
For example, we might want to modify the pretrained BERT model:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Xeuc2i7Y_nP5" - }, - "outputs": [], - "source": [ - "# get the list of supported BERT-like models, for the complete list of HugginFace models, see https://huggingface.co/models\n", - "print(nemo_nlp.modules.get_pretrained_lm_models_list(include_external=False))\n", - "\n", - "# specify BERT-like model, you want to use\n", - "PRETRAINED_BERT_MODEL = \"bert-base-multilingual-uncased\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "fzNZNAVRjDD-" - }, - "source": [ - "Now, we are ready to initialize our model. During the model initialization call, the dataset and data loaders we'll be prepared for training and evaluation.\n", - "Also, the pretrained BERT model will be downloaded, note it can take up to a few minutes depending on the size of the chosen BERT model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NgsGLydWo-6-" - }, - "outputs": [], - "source": [ - "model = nemo_nlp.models.TokenClassificationModel(cfg=config.model, trainer=trainer)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "kQ592Tx4pzyB" - }, - "source": [ - "## Monitoring training progress\n", - "Optionally, you can create a Tensorboard visualization to monitor training progress." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mTJr16_pp0aS" - }, - "outputs": [], - "source": [ - "try:\n", - " from google import colab\n", - " COLAB_ENV = True\n", - "except (ImportError, ModuleNotFoundError):\n", - " COLAB_ENV = False\n", - "\n", - "# Load the TensorBoard notebook extension\n", - "if COLAB_ENV:\n", - " %load_ext tensorboard\n", - " %tensorboard --logdir {exp_dir}\n", - "else:\n", - " print(\"To use tensorboard, please use this notebook in a Google Colab environment.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Fj1pdEdD0Vm3" - }, - "source": [ - "See how it performs before fine-tuning" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "wo1oVGIT0aBZ" - }, - "outputs": [], - "source": [ - "# define the list of queries for inference\n", - "queries = [\n", - " 'حمید طاهایی افزود : برای اجرای این طرحها 0 میلیارد و 0 میلیون ریال اعتبار هزینه شده است . ',\n", - " 'دکتر اصغری دبیر چهارمین همایش انجمن زمین‌شناسی ایران در این زمینه گفت : از مجموع چهار صد مقاله رسیده به دبیرخانه همایش ، يك صد و هشتاد مقاله ظرف مدت دو روز در هشت سالن همایش برگزار شد . 
'\n", - "]\n", - "results = model.add_predictions(queries)\n", - "\n", - "for query, result in zip(queries, results):\n", - " print()\n", - " print(f'Query : {query}')\n", - " print(f'Result: {result.strip()}\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "kyElt0Es-aSk" - }, - "outputs": [], - "source": [ - "print(\"Trainer config - \\n\")\n", - "print(OmegaConf.to_yaml(config.trainer))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "hUvnSpyjp0Dh" - }, - "outputs": [], - "source": [ - "# start model training\n", - "trainer.fit(model)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MOrR0PeJqa0j" - }, - "source": [ - "After the training is complete, `.nemo` file that contains model's checkpoints and all associated artifacts could be found under `nemo_experiments/token_classification_model/DATE_TIME`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-lFo27PJ0o3W" - }, - "source": [ - "See how it gets better after:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9fNcBnz80rLO" - }, - "outputs": [], - "source": [ - "results = model.add_predictions(queries)\n", - "\n", - "for query, result in zip(queries, results):\n", - " print()\n", - " print(f'Query : {query}')\n", - " print(f'Result: {result.strip()}\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JxBiIKMlH8yv" - }, - "source": [ - "After training for 100 epochs, with the default config and NUM_SAMPLES = -1 (i.e. all data is used), your model performance should look similar to this: \n", - "```\n", - " label precision recall f1 support\n", - " O (label_id: 0) 99.09 99.19 99.14 32867\n", - " B-event (label_id: 1) 67.74 70.00 68.85 90\n", - " B-fac (label_id: 2) 70.89 73.68 72.26 76\n", - " B-loc (label_id: 3) 87.45 82.70 85.01 497\n", - " B-org (label_id: 4) 81.88 87.06 84.39 649\n", - " B-pers (label_id: 5) 94.93 93.36 94.14 542\n", - " B-pro (label_id: 6) 79.31 70.41 74.59 98\n", - " I-event (label_id: 7) 87.38 74.72 80.55 352\n", - " I-fac (label_id: 8) 83.08 77.14 80.00 140\n", - " I-loc (label_id: 9) 77.78 73.39 75.52 124\n", - " I-org (label_id: 10) 86.51 89.93 88.18 834\n", - " I-pers (label_id: 11) 95.30 94.35 94.82 301\n", - " I-pro (label_id: 12) 82.86 86.57 84.67 67\n", - " -------------------\n", - " micro avg 97.78 97.78 97.78 36637\n", - " macro avg 84.17 82.50 83.24 36637\n", - " weighted avg 97.78 97.78 97.77 36637\n", - "```\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "VZp9STMHQAp1" - }, - "source": [ - "**References**\n", - "\n", - "1. Devlin, Jacob, et al. \"BERT: Pre-training of deep bidirectional transformers for language understanding.\" arXiv preprint arXiv:1810.04805 (2018).\n", - "\n", - "2. Hanieh Poostchi, Ehsan Zare Borzeshi, Mohammad Abdous, and Massimo Piccardi, \"PersoNER: Persian Named-Entity Recognition,\" The 26th International Conference on Computational Linguistics (COLING 2016), pages 3381–3389, Osaka, Japan, 2016.\n", - "\n", - "3. Hanieh Poostchi, Ehsan Zare Borzeshi, and Massimo Piccardi, \"BiLSTM-CRF for Persian Named-Entity Recognition; ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset,\" The 11th Edition of the Language Resources and Evaluation Conference (LREC), Miyazaki, Japan, 7-12 May 2018, ISLRN 399-379-640-828-6, ISLRN 921-509-141-609-6." 
- ] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": { - "collapsed_sections": [], - "name": "Non_English_Downstream_Tasks_(NER).ipynb", - "private_outputs": true, - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.13" - }, - "pycharm": { - "stem_cell": { - "cell_type": "raw", - "metadata": { - "collapsed": false - }, - "source": [] - } - } - }, - "nbformat": 4, - "nbformat_minor": 1 -} From e7c82f577b04abd0412b6f60c48652f57ec54b14 Mon Sep 17 00:00:00 2001 From: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri, 9 Dec 2022 00:04:00 +0400 Subject: [PATCH 002/154] Optimized loop and bugfix in SDE (#5573) - Fixed bug with loading custom data attributes from JSON in Speech Data Explorer Signed-off-by: George Zelenfroynd Signed-off-by: Elena Rastorgueva --- tools/speech_data_explorer/data_explorer.py | 22 +++++++++------------ 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/tools/speech_data_explorer/data_explorer.py b/tools/speech_data_explorer/data_explorer.py index 281228ad7378..3602c369b135 100755 --- a/tools/speech_data_explorer/data_explorer.py +++ b/tools/speech_data_explorer/data_explorer.py @@ -298,19 +298,15 @@ def append_data( data[-1]['I'] = measures['insertions'] data[-1]['D'] = measures['deletions'] data[-1]['D-I'] = measures['deletions'] - measures['insertions'] - else: - for k in item: - if k not in data[-1]: - data[-1][k] = item[k] - if estimate_audio: - filepath = absolute_audio_filepath(item['audio_filepath'], data_filename) - signal, sr = librosa.load(path=filepath, sr=None) - bw = eval_bandwidth(signal, sr) - item['freq_bandwidth'] = int(bw) - item['level_db'] = 20 * np.log10(np.max(np.abs(signal))) - for k in item: - if k not in data[-1]: - data[-1][k] = item[k] + if estimate_audio: + filepath = absolute_audio_filepath(item['audio_filepath'], data_filename) + signal, sr = librosa.load(path=filepath, sr=None) + bw = eval_bandwidth(signal, sr) + item['freq_bandwidth'] = int(bw) + item['level_db'] = 20 * np.log10(np.max(np.abs(signal))) + for k in item: + if k not in data[-1]: + data[-1][k] = item[k] vocabulary_data = [{'word': word, 'count': vocabulary[word]} for word in vocabulary] return ( From 0ce4428b26614e4e9c32b6aa29440a2e79ec3de8 Mon Sep 17 00:00:00 2001 From: Nithin Rao Date: Fri, 9 Dec 2022 03:26:19 +0530 Subject: [PATCH 003/154] Update torchmetrics (#5566) * add task arg Signed-off-by: nithinraok * update state Signed-off-by: nithinraok Signed-off-by: nithinraok Co-authored-by: Taejin Park Signed-off-by: Elena Rastorgueva --- nemo/collections/asr/models/label_models.py | 4 ++-- requirements/requirements_lightning.txt | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/nemo/collections/asr/models/label_models.py b/nemo/collections/asr/models/label_models.py index 94048e5ab0c4..62e4dd5e456f 100644 --- a/nemo/collections/asr/models/label_models.py +++ b/nemo/collections/asr/models/label_models.py @@ -165,7 +165,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None): self.encoder = EncDecSpeakerLabelModel.from_config_dict(cfg.encoder) self.decoder = EncDecSpeakerLabelModel.from_config_dict(cfg.decoder) - self._macro_accuracy = Accuracy(num_classes=num_classes, average='macro') + 
self._macro_accuracy = Accuracy(num_classes=num_classes, average='macro', task='multiclass') self.labels = None if hasattr(self._cfg, 'spec_augment') and self._cfg.spec_augment is not None: @@ -365,7 +365,7 @@ def evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = acc_top_k = self._accuracy(logits=logits, labels=labels) correct_counts, total_counts = self._accuracy.correct_counts_k, self._accuracy.total_counts_k self._macro_accuracy.update(preds=logits, target=labels) - stats = self._macro_accuracy._get_final_stats() + stats = self._macro_accuracy._final_state() return { f'{tag}_loss': loss_value, diff --git a/requirements/requirements_lightning.txt b/requirements/requirements_lightning.txt index 9876002b806d..faa23c758746 100644 --- a/requirements/requirements_lightning.txt +++ b/requirements/requirements_lightning.txt @@ -2,7 +2,7 @@ hydra-core>=1.2.0,<1.3 omegaconf>=2.2,<2.3 pytorch-lightning>=1.8.3 pyyaml<6 # Pinned until omegaconf works with pyyaml>=6 -torchmetrics>=0.4.1rc0,<=0.10.3 +torchmetrics transformers>=4.0.1,<=4.21.2 wandb webdataset>=0.1.48,<=0.1.62 From b0fe80cd58a96c5469a2ab9feb75ce20b9d3fd5f Mon Sep 17 00:00:00 2001 From: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Date: Fri, 9 Dec 2022 10:04:18 -0800 Subject: [PATCH 004/154] remove useless files. (#5580) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- nemo/collections/common/data/vocabs.py | 383 ------------------------- 1 file changed, 383 deletions(-) delete mode 100644 nemo/collections/common/data/vocabs.py diff --git a/nemo/collections/common/data/vocabs.py b/nemo/collections/common/data/vocabs.py deleted file mode 100644 index 2a4123ff2c69..000000000000 --- a/nemo/collections/common/data/vocabs.py +++ /dev/null @@ -1,383 +0,0 @@ -# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import itertools -import re -import string -import time -import unicodedata -from abc import ABC, abstractmethod -from builtins import str as unicode -from contextlib import contextmanager -from typing import List - -import nltk -import torch - -from nemo.collections.common.parts.preprocessing import parsers -from nemo.utils import logging -from nemo.utils.get_rank import is_global_rank_zero - -_words_re = re.compile("([a-z\-]+'[a-z\-]+|[a-z\-]+)|([^a-z{}]+)") - - -def _text_preprocessing(text): - text = unicode(text) - text = ''.join(char for char in unicodedata.normalize('NFD', text) if unicodedata.category(char) != 'Mn') - text = text.lower() - text = re.sub("[^ a-z'\".,?!()\[\]:;\-]", "", text) - return text - - -def _word_tokenize(text): - words = _words_re.findall(text) - words = [re.sub(r'\s(\d)', r'\1', word[1].upper()) if word[0] == '' else word[0] for word in words] - return words - - -def download_corpora(): - # Download NLTK datasets if this class is to be instantiated - try: - nltk.data.find('taggers/averaged_perceptron_tagger.zip') - except LookupError: - nltk.download('averaged_perceptron_tagger', quiet=True) - try: - nltk.data.find('corpora/cmudict.zip') - except LookupError: - nltk.download('cmudict', quiet=True) - - -class G2p: - def __init__( - self, - g2p_library, - phoneme_dict_path=None, - use_seq2seq_for_oov=False, - ignore_ambiguous_words=True, - text_preprocessing_func=_text_preprocessing, - word_tokenize_func=_word_tokenize, - ): - self._g2p = g2p_library - self.homograph2features = self._g2p.homograph2features - self.g2p_dict = self._construct_grapheme2phoneme_dict(phoneme_dict_path) - self.use_seq2seq_for_oov = use_seq2seq_for_oov - self.ignore_ambiguous_words = ignore_ambiguous_words - - self.text_preprocessing_func = text_preprocessing_func - self.word_tokenize_func = word_tokenize_func - - @staticmethod - def _construct_grapheme2phoneme_dict(phoneme_dict_path=None, encoding='latin-1'): - if phoneme_dict_path is None: - from nltk.corpus import cmudict - - return cmudict.dict() - - _alt_re = re.compile(r'\([0-9]+\)') - g2p_dict = {} - with open(phoneme_dict_path, encoding=encoding) as file: - for line in file: - if len(line) > 0 and ('A' <= line[0] <= 'Z' or line[0] == "'"): - parts = line.split(' ') - word = re.sub(_alt_re, '', parts[0]) - word = word.lower() - - pronunciation = parts[1].strip().split(" ") - if word in g2p_dict: - g2p_dict[word].append(pronunciation) - else: - g2p_dict[word] = [pronunciation] - return g2p_dict - - def handle_ambiguous(self, word): - if not self.ignore_ambiguous_words or len(self.g2p_dict[word]) == 1: - return True - return False - - def __call__(self, text): - text = self.text_preprocessing_func(text) - words = self.word_tokenize_func(text) - words_and_pos_tags = nltk.pos_tag(words) - - prons = [] - for word, pos in words_and_pos_tags: - word_by_hyphen = word.split("-") - - # punctuation - if re.search("[a-zA-Z]", word) is None: - pron = list(word) - # homograph - elif word in self.homograph2features: - pron1, pron2, pos1 = self.homograph2features[word] - if pos.startswith(pos1): - pron = pron1 - else: - pron = pron2 - # `'s` suffix - elif ( - len(word) > 2 - and word.endswith("'s") - and (word not in self.g2p_dict) - and (word[:-2] in self.g2p_dict) - and self.handle_ambiguous(word[:-2]) - ): - pron = self.g2p_dict[word[:-2]][0] + ["Z"] - # `s` suffix - elif ( - len(word) > 1 - and word.endswith("s") - and (word not in self.g2p_dict) - and (word[:-1] in self.g2p_dict) - and self.handle_ambiguous(word[:-1]) - ): - pron = 
self.g2p_dict[word[:-1]][0] + ["Z"] - # g2p dict - elif word in self.g2p_dict and self.handle_ambiguous(word): - pron = self.g2p_dict[word][0] - # word with hyphens - elif len(word_by_hyphen) > 1 and all( - [sub_word in self.g2p_dict and self.handle_ambiguous(sub_word) for sub_word in word_by_hyphen] - ): - pron = [] - for sub_word in word_by_hyphen: - pron.extend(self.g2p_dict[sub_word][0]) - pron.extend(["-"]) - pron.pop() - else: - if self.use_seq2seq_for_oov: - # run gru-based seq2seq model from _g2p - pron = self._g2p.predict(word) - else: - pron = word - - prons.extend(pron) - - return prons - - -class Base(ABC): - """Vocabulary for turning str text to list of int tokens.""" - - # fmt: off - PUNCT = ( # Derived from LJSpeech and "/" additionally - ',', '.', '!', '?', '-', - ':', ';', '/', '"', '(', - ')', '[', ']', '{', '}', - ) - # fmt: on - PAD, BLANK, OOV = '', '', '' - - def __init__(self, labels, *, pad=PAD, blank=BLANK, oov=OOV, sep='', add_blank_at="last_but_one"): - super().__init__() - - labels = list(labels) - self.pad, labels = len(labels), labels + [pad] # Padding - - if add_blank_at is not None: - self.blank, labels = len(labels), labels + [blank] # Reserved for blank from QN - else: - # use add_blank_at=None only for ASR where blank is added automatically - self.blank = -1 - - self.oov, labels = len(labels), labels + [oov] # Out Of Vocabulary - - if add_blank_at == "last": - labels[-1], labels[-2] = labels[-2], labels[-1] - self.oov, self.blank = self.blank, self.oov - - self.labels = labels - self.sep = sep - - self._util_ids = {self.pad, self.blank, self.oov} - self._label2id = {l: i for i, l in enumerate(labels)} - self._id2label = labels - - def __call__(self, text: str) -> List[int]: - return self.encode(text) - - @abstractmethod - def encode(self, text: str) -> List[int]: - """Turns str text into int tokens.""" - - def decode(self, tokens: List[int]) -> str: - """Turns ints tokens into str text.""" - return self.sep.join(self._id2label[t] for t in tokens if t not in self._util_ids) - - -class Chars(Base): - """Chars vocabulary.""" - - def __init__( - self, punct=True, spaces=False, apostrophe=True, add_blank_at="last_but_one", - ): - labels = [] - self.space, labels = len(labels), labels + [' '] # Space - labels.extend(string.ascii_lowercase) - if apostrophe: - labels.append("'") # Apostrophe for saving "don't" and "Joe's" - - if punct: - labels.extend(self.PUNCT) - - super().__init__(labels, add_blank_at=add_blank_at) - - self.punct = punct - self.spaces = spaces - - self._parser = parsers.ENCharParser(labels) - - def encode(self, text): - """See base class.""" - text = self._parser._normalize(text) # noqa - - if self.spaces: - for p in set(text) & set(self.PUNCT): - text = text.replace(p, f' {p} ') - text = text.strip().replace(' ', ' ') - - return self._parser._tokenize(text) # noqa - - -class Phonemes(Base): - """Phonemes vocabulary.""" - - # fmt: off - VOWELS = ( - 'AA', 'AE', 'AH', 'AO', 'AW', - 'AY', 'EH', 'ER', 'EY', 'IH', - 'IY', 'OW', 'OY', 'UH', 'UW', - ) - CONSONANTS = ( - 'B', 'CH', 'D', 'DH', 'F', 'G', - 'HH', 'JH', 'K', 'L', 'M', 'N', - 'NG', 'P', 'R', 'S', 'SH', 'T', - 'TH', 'V', 'W', 'Y', 'Z', 'ZH', - ) - # fmt: on - - def __init__( - self, - punct=True, - stresses=False, - spaces=True, - chars=False, - *, - space=' ', - silence=None, - apostrophe=True, - oov=Base.OOV, - sep='|', # To be able to distinguish between 2/3 letters codes. 
- add_blank_at="last_but_one", - pad_with_space=False, - improved_version_g2p=False, - phoneme_dict_path=None, - ): - labels = [] - self.space, labels = len(labels), labels + [space] # Space - - if silence is not None: - self.silence, labels = len(labels), labels + [silence] # Silence - - labels.extend(self.CONSONANTS) - vowels = list(self.VOWELS) - - if stresses: - vowels = [f'{p}{s}' for p, s in itertools.product(vowels, (0, 1, 2))] - labels.extend(vowels) - - if chars: - labels.extend(string.ascii_lowercase) - - if apostrophe: - labels.append("'") # Apostrophe - - if punct: - labels.extend(self.PUNCT) - - super().__init__(labels, oov=oov, sep=sep, add_blank_at=add_blank_at) - - self.punct = punct - self.stresses = stresses - self.spaces = spaces - self.pad_with_space = pad_with_space - - # g2p_en tries to run download_corpora() on import but it is not rank zero guarded - # Try to check if torch distributed is available, if not get global rank zero to download corpora and make - # all other ranks sleep for a minute - if torch.distributed.is_available() and torch.distributed.is_initialized(): - group = torch.distributed.group.WORLD - if is_global_rank_zero(): - download_corpora() - torch.distributed.barrier(group=group) - elif is_global_rank_zero(): - logging.error( - f"Torch distributed needs to be initialized before you initialized {self}. This class is prone to " - "data access race conditions. Now downloading corpora from global rank 0. If other ranks pass this " - "before rank 0, errors might result." - ) - download_corpora() - else: - logging.error( - f"Torch distributed needs to be initialized before you initialized {self}. This class is prone to " - "data access race conditions. This process is not rank 0, and now going to sleep for 1 min. If this " - "rank wakes from sleep prior to rank 0 finishing downloading, errors might result." 
- ) - time.sleep(60) - - import g2p_en # noqa pylint: disable=import-outside-toplevel - - _g2p = g2p_en.G2p() - _g2p.variables = None - - if improved_version_g2p: - self.g2p = G2p(_g2p, phoneme_dict_path) - else: - self.g2p = _g2p - - def encode(self, text): - """See base class.""" - ps, space, labels = [], self.labels[self.space], set(self.labels) - - for p in self.g2p(text): # noqa - # Remove stress - if p.isalnum() and len(p) == 3 and not self.stresses: - p = p[:2] - - # Add space if last one isn't one - if p == space and len(ps) > 0 and ps[-1] != space: - ps.append(p) - - # Add next phoneme - if (p.isalnum() or p == "'") and p in labels: - ps.append(p) - - # Add punct and remove space if needed - if (p in self.PUNCT) and self.punct: - if not self.spaces and len(ps) > 0 and ps[-1] == space: - ps.pop() - ps.append(p) - - # Remove trailing spaces - while ps[-1] == space: - ps.pop() - - if self.pad_with_space: - ps = [space] + ps + [space] - - return [self._label2id[p] for p in ps] - - @contextmanager - def set_phone_prob(self, prob=None): - # Add do nothing since this class doesn't support mixed g2p - yield From b76dc7e326c1d69eff39320dccf5075fddb6a5c0 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Wed, 7 Dec 2022 16:57:20 -0800 Subject: [PATCH 005/154] add initial NFA code Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 48 ++++ tools/nemo_forced_aligner/align.py | 161 ++++++++++++ tools/nemo_forced_aligner/requirements.txt | 1 + tools/nemo_forced_aligner/utils.py | 292 +++++++++++++++++++++ 4 files changed, 502 insertions(+) create mode 100644 tools/nemo_forced_aligner/README.md create mode 100644 tools/nemo_forced_aligner/align.py create mode 100644 tools/nemo_forced_aligner/requirements.txt create mode 100644 tools/nemo_forced_aligner/utils.py diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md new file mode 100644 index 000000000000..44dce0edef8e --- /dev/null +++ b/tools/nemo_forced_aligner/README.md @@ -0,0 +1,48 @@ +# NeMo Forced Aligner (NFA) + +A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based models. + +## How do I use NeMo Forced Aligner? +To use NFA, all you need to provide is a correct NeMo manifest (with 'audio_filepath' and 'text' fields). + +Call the align function in align.py, specifying the parameters as follows: + +* `model_name`: Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). You can provide a pretrained model name or a path to a saved '.nemo' file. + +> Note: If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely OOM. + +* `model_downsample_factor`: should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet. + +* `manifest_filepath`: The path to the manifest of the data you want to align. + +* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called /.ctm. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `utt_id_extractor_func`. + +* `grouping_for_ctm`: A string, either 'word' or 'basetoken'. 'basetoken' can mean either a token or character, depending on the model used for alignment. If you select 'basetoken' - the code will output a CTM with alignments at the token/character level. 
If you select 'word' - the code will group the tokens/characters into words. + +* [OPTIONAL] `utt_id_extractor_func`: The function mapping the audio_filepath to the utt_id that will be used to save CTM files. + +* `audio_sr`: The sample rate of your audio + +* `device`: The device where you want your ASR Model to be loaded into. + +* `batch_size`: The batch_size you wish to use for transcription and alignment. + + +# Example script + +```python +from align import align + +if __name__ == "__main__": + align( + model_name="stt_en_citrinet_1024_gamma_0_25", + model_downsample_factor=8, + manifest_filepath=, + output_ctm_folder=, + grouping_for_ctm="word", # either "word" or "basetoken" + audio_sr=16000, # the sampling rate of your audio files + device="cuda:0", # the device that the ASR model will be loaded into + batch_size=1, + ) + +``` \ No newline at end of file diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py new file mode 100644 index 000000000000..2e5883cf540a --- /dev/null +++ b/tools/nemo_forced_aligner/align.py @@ -0,0 +1,161 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from pathlib import Path + +import torch +from nemo.collections.asr.models import ASRModel + +from utils import ( + get_manifest_lines, + get_log_probs_y_T_U, + make_basetoken_ctm, + make_word_ctm, +) + +V_NEG_NUM = -1e30 + + +def viterbi_decoding(log_probs, y, T, U): + """ + Does Viterbi decoding. + Returns: + alignments: list of lists containing locations for the tokens we align to at each timestep + looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] 
+ v_matrix: + tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) + containing viterbi probabilities + This is returned in case you want to examine it in a parent function + log_probs_reordered: + log_probs reordered to match the order of ground truth tokens + tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) + This is returned in case you want to examine it in a parent function + """ + B, T_max, V = log_probs.shape + U_max = y.shape[1] + + padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1)) + log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) + log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y.unsqueeze(1).repeat(1, T_max, 1)) + log_probs_reordered = log_probs_reordered.cpu() # TODO: do alignment on GPU if available + + v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) + backpointers = -999 * torch.ones_like(v_matrix) + v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2] + + y_shifted_left = torch.roll(y, shifts=2, dims=1) + letter_repetition_mask = y - y_shifted_left + letter_repetition_mask[:, :2] = 1 # make sure dont apply mask to first 2 tokens + letter_repetition_mask = letter_repetition_mask == 0 + + for t in range(1, T_max): + + e_current = log_probs_reordered[:, t, :] + + v_prev = v_matrix[:, t - 1, :] + v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1) + v_prev_shifted[:, 0] = V_NEG_NUM + + v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1) + v_prev_shifted2[:, :2] = V_NEG_NUM + v_prev_shifted2 = v_prev_shifted2.masked_fill(letter_repetition_mask, V_NEG_NUM) # TODO: do in-place instead? + + v_prev_dup = torch.cat( + (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2, + ) + + candidates_v_current = v_prev_dup + e_current.unsqueeze(2) + v_current, bp_relative = torch.max(candidates_v_current, dim=2) + + bp_absolute = torch.arange(U_max).unsqueeze(0).repeat(B, 1) - bp_relative + + v_matrix[:, t, :] = v_current + backpointers[:, t, :] = bp_absolute + + # trace backpointers TODO: parallelize over batch_size + alignments = [] + for b in range(B): + T_b = int(T[b]) + U_b = int(U[b]) + + final_state = int(torch.argmax(v_matrix[b, T_b - 1, U_b - 2 : U_b])) + U_b - 2 + alignment_b = [final_state] + for t in range(T_b - 1, 0, -1): + alignment_b.insert(0, int(backpointers[b, t, alignment_b[0]])) + alignments.append(alignment_b) + + return alignments, v_matrix, log_probs_reordered + + +def align_batch(data, model): + log_probs, y, T, U = get_log_probs_y_T_U(data, model) + alignments, v_matrix, log_probs_reordered = viterbi_decoding(log_probs, y, T, U) + + return alignments, v_matrix, log_probs_reordered + + +def align( + manifest_filepath, + model_name, + model_downsample_factor, + output_ctm_folder, + grouping_for_ctm, + utt_id_extractor_func=lambda fp : Path(fp).resolve().stem, + audio_sr=16000, # TODO: get audio SR automatically + device="cuda:0", + batch_size=1, +): + """ + Function that does alignment of utterances in `manifest_filepath`. + Results are saved in ctm files in `output_ctm_folder`. + Returns the most recent alignment, v_matrix, log_probs_reordered, in case you want to inspect them + in a parent function. 
+ + TODO: add check that Model is CTC/CTCBPE model + """ + + # load model + device = torch.device(device) + if ".nemo" in model_name: + model = ASRModel.restore_from(model_name, map_location=device) + else: + model = ASRModel.from_pretrained(model_name, map_location=device) + + # define start and end line IDs of batches + num_lines_in_manifest = sum(1 for _ in open(manifest_filepath)) + starts = [x for x in range(0, num_lines_in_manifest, batch_size)] + ends = [x - 1 for x in starts] + ends.pop(0) + ends.append(num_lines_in_manifest) + + # get alignment and save in CTM batch-by-batch + for start, end in zip(starts, ends): + data = get_manifest_lines(manifest_filepath, start, end) + + alignments, v_matrix, log_probs_reordered = align_batch(data, model) + + if grouping_for_ctm == "basetoken": + make_basetoken_ctm( + data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, + ) + + elif grouping_for_ctm == "word": + make_word_ctm( + data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, + ) + + else: + raise ValueError(f"Unexpected value for grouping_for_ctm: {grouping_for_ctm}") + + return alignments, v_matrix, log_probs_reordered diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt new file mode 100644 index 000000000000..84e0b7dbae6d --- /dev/null +++ b/tools/nemo_forced_aligner/requirements.txt @@ -0,0 +1 @@ +nemo diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py new file mode 100644 index 000000000000..52360d5a6593 --- /dev/null +++ b/tools/nemo_forced_aligner/utils.py @@ -0,0 +1,292 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import json +import os +import torch + +V_NEG_NUM = -1e30 + + +def get_manifest_lines(manifest_filepath, start, end): + data = [] + with open(manifest_filepath, "r") as f: + for line_i, line in enumerate(f): + if line_i == start and line_i == end: + data.append(json.loads(line)) + break + + if line_i == end: + break + if line_i >= start: + data.append(json.loads(line)) + return data + + +def tokenize_text(model, text): + """ Works for token-based and character-based models """ + if hasattr(model, "tokenizer"): + return model.tokenizer.text_to_ids(text) + + elif hasattr(model.decoder, "vocabulary"): + # i.e. assume it is character-based model (in theory could be TalkNet - that will not work) + tokens = [] + for character in text: + if character in model.decoder.vocabulary: + tokens.append(model.decoder.vocabulary.index(character)) + else: + tokens.append(len(model.decoder.vocabulary)) # return unk token (same as blank token) + + return tokens + + else: + raise RuntimeError("Can't tokenize this model atm") + + +def get_log_probs_y_T_U(data, model): + """ + Preparing some tensors to be used during Viterbi decoding. + Returns: + log_probs, y, T, U_dash (ie. 
U with every other token being a blank) + """ + + audio_filepaths = [line["audio_filepath"] for line in data] + B = len(audio_filepaths) + + with torch.no_grad(): + hypotheses = model.transcribe(audio_filepaths, return_hypotheses=True, batch_size=B) + + log_probs_list = [] + T_list = [] + for hypothesis in hypotheses: + log_probs_list.append(hypothesis.y_sequence) + T_list.append(hypothesis.y_sequence.shape[0]) + + y_list = [] + U_list = [] + for line in data: + y_line = tokenize_text(model, line["text"]) + y_list.append(y_line) + U_list.append(len(y_line)) + + T_max = max(T_list) + U_max = max(U_list) + blank_index = len(model.decoder.vocabulary) + V = len(model.decoder.vocabulary) + 1 + T = torch.tensor(T_list) + + # make log_probs tensor of shape (B x T_max x V) + log_probs = V_NEG_NUM * torch.ones((B, T_max, V)) + for b, log_probs_b in enumerate(log_probs_list): + t = log_probs_b.shape[0] + log_probs[b, :t, :] = log_probs_b + + # make y tensor of shape (B x U_dash_max) + U_dash_max = 2 * U_max + 1 + y = V * torch.ones((B, U_dash_max), dtype=torch.int64) + U_dash_list = [] + for b, y_b in enumerate(y_list): + y_dash_b = [blank_index] + for y_b_element in y_b: + y_dash_b.append(y_b_element) + y_dash_b.append(blank_index) + U_dash = len(y_dash_b) + y[b, :U_dash] = torch.tensor(y_dash_b) + U_dash_list.append(U_dash) + + U_dash = torch.tensor(U_dash_list) + + return log_probs, y, T, U_dash + +def make_basetoken_ctm( + data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, +): + """ + Note: assume order of utts in data matches order of utts in alignments + """ + assert len(data) == len(alignments) + + os.makedirs(output_ctm_folder, exist_ok=True) + + blank_index = len(model.decoder.vocabulary) + spectrogram_hop_length = model.preprocessor.featurizer.hop_length + timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor + + for manifest_line, alignment in zip(data, alignments): + + # alignments currently list into loc in sentence. + # we want to get token_ids for utterance + token_ids = tokenize_text(model, manifest_line["text"]) + u_list = [blank_index] + for token_id in token_ids: + u_list.extend([token_id, blank_index]) + + # make 'basetokens_info' - a list of dictionaries: + # e.g. [ + # {"basetoken": "", "u": 0, "t_start": 0, "t_end", 1 }, + # {"basetoken": "_sh", "u": 0, "t_start": 2, "t_end", 4 }, + # ... 
+ # ] + basetokens_info = [] + u = 0 + for t, u in enumerate(alignment): + alignment_token_in_vocab = u_list[u] + if alignment_token_in_vocab == blank_index: + alignment_token_as_string = "" + else: + alignment_token_as_string = model.decoder.vocabulary[alignment_token_in_vocab] + if alignment_token_as_string == " ": + alignment_token_as_string = "" + + if t == 0: + basetokens_info.append( + {"basetoken": alignment_token_as_string, "u": u, "t_start": t,} + ) + + else: + if u == basetokens_info[-1]["u"]: + pass + else: + basetokens_info[-1]["t_end"] = t - 1 + basetokens_info.append( + {"basetoken": alignment_token_as_string, "u": u, "t_start": t,} + ) + + basetokens_info[-1]["t_end"] = len(alignment) - 1 + + utt_id = utt_id_extractor_func(manifest_line["audio_filepath"]) + + with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: + for basetoken_info in basetokens_info: + basetoken = basetoken_info["basetoken"] + start_sample = basetoken_info["t_start"] * timestep_to_sample_ratio + end_sample = (basetoken_info["t_end"] + 1) * timestep_to_sample_ratio - 1 + + start_time = start_sample / audio_sr + end_time = end_sample / audio_sr + + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {basetoken}\n") + + return None + + +def make_word_ctm( + data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, +): + """ + Note: assume order of utts in data matches order of utts in alignments + """ + assert len(data) == len(alignments) + + os.makedirs(output_ctm_folder, exist_ok=True) + + spectrogram_hop_length = model.preprocessor.featurizer.hop_length + timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor + + for manifest_line, alignment in zip(data, alignments): + # make 'words_info' - a list of dictionaries: + # e.g. [ + # {"word": "h#", "u_start": 0, "u_end": 0, "t_start": 0, "t_end", 0 }, + # {"word": "she", "u_start": 1, "u_end": 5, "t_start": 1, "t_end", 5 }, + # {"word": "had", "u_start": 7, "u_end": 10, "t_start": 6, "t_end", 8 } + # ... 
+ # ] + words_info = [{"word": "h#", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] + u_counter = 1 + if " " not in model.decoder.vocabulary: + for word in manifest_line["text"].split(" "): + word_info = { + "word": word, + "u_start": u_counter, + "u_end": None, + "t_start": None, + "t_end": None, + } + word_tokens = tokenize_text(model, word) + word_info["u_end"] = word_info["u_start"] + 2 * len(word_tokens) - 2 + u_counter += 2 * len(word_tokens) + + words_info.append(word_info) + + else: + + for word_i, word in enumerate(manifest_line["text"].split(" ")): + word_info = { + "word": word, + "u_start": u_counter, + "u_end": None, + "t_start": None, + "t_end": None, + } + word_tokens = tokenize_text(model, word) + + word_info["u_end"] = word_info["u_start"] + 2 * len(word_tokens) - 2 + u_counter += 2 * len(word_tokens) + + words_info.append(word_info) + + if word_i < len(manifest_line["text"].split(" ")) - 1: + # add the space after every word except the final word + word_info = { + "word": "", + "u_start": u_counter, + "u_end": None, + "t_start": None, + "t_end": None, + } + word_tokens = tokenize_text(model, " ") + + word_info["u_end"] = word_info["u_start"] + 2 * len(word_tokens) - 1 + u_counter += 2 * len(word_tokens) + + words_info.append(word_info) + + words_info_pointer = 0 + for t, u_at_t in enumerate(alignment): + if words_info_pointer < len(words_info): + if u_at_t == words_info[words_info_pointer]["u_start"]: + if words_info[words_info_pointer]["t_start"] is None: + words_info[words_info_pointer]["t_start"] = t + + if u_at_t > words_info[words_info_pointer]["u_end"]: + if words_info[words_info_pointer]["t_end"] is None: + words_info[words_info_pointer]["t_end"] = t - 1 + + words_info_pointer += 1 + + if words_info_pointer < len(words_info): + if u_at_t == words_info[words_info_pointer]["u_start"]: + if words_info[words_info_pointer]["t_start"] is None: + words_info[words_info_pointer]["t_start"] = t + + # don't forget the final t_end as our for-loop might miss it + if alignment[-1] == words_info[-1]["u_end"]: + words_info[-1]["t_end"] = len(alignment) - 1 + + utt_id = utt_id_extractor_func(manifest_line["audio_filepath"]) + + with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: + for word_info in words_info: + if not (word_info["word"] == "initial_silence" and word_info["t_end"] == None): + word = word_info["word"] + # print('word_info', word_info) + start_sample = word_info["t_start"] * timestep_to_sample_ratio + end_sample = (word_info["t_end"] + 1) * timestep_to_sample_ratio - 1 + + start_time = start_sample / audio_sr + end_time = end_sample / audio_sr + + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {word}\n") + + return None From d397c22ffceb9ac7a1c827dbea2e7d5ae6dc3297 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu, 8 Dec 2022 01:05:17 +0000 Subject: [PATCH 006/154] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 13 ++++--------- tools/nemo_forced_aligner/utils.py | 6 ++++-- 2 files changed, 8 insertions(+), 11 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 2e5883cf540a..501fda0b0e06 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -15,14 +15,9 @@ from pathlib import Path import torch -from nemo.collections.asr.models import 
ASRModel +from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm -from utils import ( - get_manifest_lines, - get_log_probs_y_T_U, - make_basetoken_ctm, - make_word_ctm, -) +from nemo.collections.asr.models import ASRModel V_NEG_NUM = -1e30 @@ -48,7 +43,7 @@ def viterbi_decoding(log_probs, y, T, U): padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1)) log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y.unsqueeze(1).repeat(1, T_max, 1)) - log_probs_reordered = log_probs_reordered.cpu() # TODO: do alignment on GPU if available + log_probs_reordered = log_probs_reordered.cpu() # TODO: do alignment on GPU if available v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) backpointers = -999 * torch.ones_like(v_matrix) @@ -111,7 +106,7 @@ def align( model_downsample_factor, output_ctm_folder, grouping_for_ctm, - utt_id_extractor_func=lambda fp : Path(fp).resolve().stem, + utt_id_extractor_func=lambda fp: Path(fp).resolve().stem, audio_sr=16000, # TODO: get audio SR automatically device="cuda:0", batch_size=1, diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index 52360d5a6593..349dade11458 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -14,6 +14,7 @@ import json import os + import torch V_NEG_NUM = -1e30 @@ -109,6 +110,7 @@ def get_log_probs_y_T_U(data, model): return log_probs, y, T, U_dash + def make_basetoken_ctm( data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, ): @@ -223,7 +225,7 @@ def make_word_ctm( for word_i, word in enumerate(manifest_line["text"].split(" ")): word_info = { - "word": word, + "word": word, "u_start": u_counter, "u_end": None, "t_start": None, @@ -239,7 +241,7 @@ def make_word_ctm( if word_i < len(manifest_line["text"].split(" ")) - 1: # add the space after every word except the final word word_info = { - "word": "", + "word": "", "u_start": u_counter, "u_end": None, "t_start": None, From c39bd56688dc8207234699f4af2dc1e1dfecbb28 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 8 Dec 2022 16:47:53 -0800 Subject: [PATCH 007/154] Make use of the specified device during viterbi decoding Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 12 +++++++----- tools/nemo_forced_aligner/utils.py | 6 ++++++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 501fda0b0e06..95a554ce1a39 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -16,7 +16,6 @@ import torch from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm - from nemo.collections.asr.models import ASRModel V_NEG_NUM = -1e30 @@ -40,10 +39,11 @@ def viterbi_decoding(log_probs, y, T, U): B, T_max, V = log_probs.shape U_max = y.shape[1] - padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1)) + device = log_probs.device + + padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=device) log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y.unsqueeze(1).repeat(1, T_max, 1)) - log_probs_reordered = log_probs_reordered.cpu() # TODO: do alignment on GPU if available v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) backpointers = -999 * 
torch.ones_like(v_matrix) @@ -54,6 +54,8 @@ def viterbi_decoding(log_probs, y, T, U): letter_repetition_mask[:, :2] = 1 # make sure dont apply mask to first 2 tokens letter_repetition_mask = letter_repetition_mask == 0 + bp_absolute_template = torch.arange(U_max, device=device).unsqueeze(0).repeat(B, 1) + for t in range(1, T_max): e_current = log_probs_reordered[:, t, :] @@ -64,7 +66,7 @@ def viterbi_decoding(log_probs, y, T, U): v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1) v_prev_shifted2[:, :2] = V_NEG_NUM - v_prev_shifted2 = v_prev_shifted2.masked_fill(letter_repetition_mask, V_NEG_NUM) # TODO: do in-place instead? + v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEG_NUM) v_prev_dup = torch.cat( (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2, @@ -73,7 +75,7 @@ def viterbi_decoding(log_probs, y, T, U): candidates_v_current = v_prev_dup + e_current.unsqueeze(2) v_current, bp_relative = torch.max(candidates_v_current, dim=2) - bp_absolute = torch.arange(U_max).unsqueeze(0).repeat(B, 1) - bp_relative + bp_absolute = bp_absolute_template - bp_relative v_matrix[:, t, :] = v_current backpointers[:, t, :] = bp_absolute diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index 349dade11458..7ede57f401f1 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -108,6 +108,12 @@ def get_log_probs_y_T_U(data, model): U_dash = torch.tensor(U_dash_list) + # transfer all tensors to device + log_probs = log_probs.to(model.device) + y = y.to(model.device) + T = T.to(model.device) + U_dash = U_dash.to(model.device) + return log_probs, y, T, U_dash From e4708a74161e2a1302a00695c64c905b3e6baaf9 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 9 Dec 2022 00:49:26 +0000 Subject: [PATCH 008/154] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 95a554ce1a39..c7caddeea39f 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -16,6 +16,7 @@ import torch from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm + from nemo.collections.asr.models import ASRModel V_NEG_NUM = -1e30 From b5a200c198caa19e8e093e5f2a5fedbf265fa9fe Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 11:08:31 -0800 Subject: [PATCH 009/154] Fix CodeQL notes Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index 7ede57f401f1..254f6b1ce53d 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -147,7 +147,6 @@ def make_basetoken_ctm( # ... 
# ] basetokens_info = [] - u = 0 for t, u in enumerate(alignment): alignment_token_in_vocab = u_list[u] if alignment_token_in_vocab == blank_index: @@ -286,7 +285,7 @@ def make_word_ctm( with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: for word_info in words_info: - if not (word_info["word"] == "initial_silence" and word_info["t_end"] == None): + if not (word_info["word"] == "initial_silence" and word_info["t_end"] is None): word = word_info["word"] # print('word_info', word_info) start_sample = word_info["t_start"] * timestep_to_sample_ratio From 0ec10938e8961584ae432b8a70753b3bb3829b5a Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 11:29:16 -0800 Subject: [PATCH 010/154] Fix CodeQL warning Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index c7caddeea39f..61ca1ef1d1ec 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -131,7 +131,9 @@ def align( model = ASRModel.from_pretrained(model_name, map_location=device) # define start and end line IDs of batches - num_lines_in_manifest = sum(1 for _ in open(manifest_filepath)) + with open(manifest_filepath, 'r') as f: + num_lines_in_manifest = sum(1 for _ in f) + starts = [x for x in range(0, num_lines_in_manifest, batch_size)] ends = [x - 1 for x in starts] ends.pop(0) From 53b74670ddec16e54b4b913154462b48dd348a98 Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Fri, 9 Dec 2022 13:14:12 -0800 Subject: [PATCH 011/154] Add an option to defer data setup from ``__init__`` to ``setup`` (#5569) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Add an option to defer dataloader setup from __init__ to setup Signed-off-by: Ante Jukić * Updated doc Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- docs/source/asr/configs.rst | 14 +++++ .../asr/speaker_recognition/configs.rst | 6 +-- nemo/core/classes/modelPT.py | 54 +++++++++++++++++-- 3 files changed, 68 insertions(+), 6 deletions(-) diff --git a/docs/source/asr/configs.rst b/docs/source/asr/configs.rst index cfc5e975d26d..21d101a0a7c7 100644 --- a/docs/source/asr/configs.rst +++ b/docs/source/asr/configs.rst @@ -61,6 +61,20 @@ An example ASR train and validation configuration should look similar to the fol num_workers: 8 pin_memory: true +By default, dataloaders are set up when the model is instantiated. However, dataloader setup can be deferred to +model's `setup()` method by setting ``defer_setup`` in the configuration. + +For example, training data setup can be deferred as follows: + +.. code-block:: yaml + + model: + train_ds: + # Configure training data as usual + ... + # Defer train dataloader setup from `__init__` to `setup` + defer_setup: true + Preprocessor Configuration -------------------------- diff --git a/docs/source/asr/speaker_recognition/configs.rst b/docs/source/asr/speaker_recognition/configs.rst index d31cae4ac589..a6fcf6f7582d 100644 --- a/docs/source/asr/speaker_recognition/configs.rst +++ b/docs/source/asr/speaker_recognition/configs.rst @@ -1,11 +1,11 @@ -NeMo ASR Configuration Files -============================ +NeMo Speaker Recognition Configuration Files +============================================ This page covers NeMo configuration file setup that is specific to speaker recognition models. 
For general information about how to set up and run experiments that is common to all NeMo models (e.g. experiment manager and PyTorch Lightning trainer parameters), see the :doc:`../../core/core` page. -The model section of NeMo ASR configuration files will generally require information about the dataset(s) being +The model section of NeMo speaker recognition configuration files will generally require information about the dataset(s) being used, the preprocessor for audio files, parameters for any augmentation being performed, as well as the model architecture specification. The sections on this page cover each of these in more detail. diff --git a/nemo/core/classes/modelPT.py b/nemo/core/classes/modelPT.py index 1cb19e6d17de..9a1444bc1137 100644 --- a/nemo/core/classes/modelPT.py +++ b/nemo/core/classes/modelPT.py @@ -128,13 +128,27 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None): app_state.device_id = torch.cuda.current_device() if self._cfg is not None and not self._is_model_being_restored(): - if 'train_ds' in self._cfg and self._cfg.train_ds is not None: + # Setup data loaders now (default) or defer setup to `self.setup()` + # if `defer_setup` is set in the config of the corresponding dataloader. + if ( + 'train_ds' in self._cfg + and self._cfg.train_ds is not None + and not self._cfg.train_ds.get('defer_setup', False) + ): self.setup_training_data(self._cfg.train_ds) - if 'validation_ds' in self._cfg and self._cfg.validation_ds is not None: + if ( + 'validation_ds' in self._cfg + and self._cfg.validation_ds is not None + and not self._cfg.validation_ds.get('defer_setup', False) + ): self.setup_multiple_validation_data(val_data_config=cfg.validation_ds) - if 'test_ds' in self._cfg and self._cfg.test_ds is not None: + if ( + 'test_ds' in self._cfg + and self._cfg.test_ds is not None + and not self._cfg.test_ds.get('defer_setup', False) + ): self.setup_multiple_test_data(test_data_config=cfg.test_ds) else: @@ -684,6 +698,40 @@ def configure_optimizers(self): else: return [self._optimizer], [self._scheduler] + def setup(self, stage: Optional[str] = None): + """Called at the beginning of fit, validate, test, or predict. + This is called on every process when using DDP. 
+ + Args: + stage: fit, validate, test or predict + """ + if stage == 'fit': + train_deferred_setup = ( + 'train_ds' in self._cfg + and self._cfg.train_ds is not None + and self._cfg.train_ds.get('defer_setup', False) + ) + if self.train_dataloader() is None and train_deferred_setup: + self.setup_training_data(self._cfg.train_ds) + + if stage in ('fit', 'validate'): + val_deferred_setup = ( + 'validation_ds' in self._cfg + and self._cfg.validation_ds is not None + and self._cfg.validation_ds.get('defer_setup', False) + ) + if self.val_dataloader() is None and val_deferred_setup: + self.setup_validation_data(self._cfg.validation_ds) + + if stage == 'test': + test_deferred_setup = ( + 'test_ds' in self._cfg + and self._cfg.test_ds is not None + and self._cfg.test_ds.get('defer_setup', False) + ) + if self.test_dataloader() is None and test_deferred_setup: + self.setup_test_data(self._cfg.test_ds) + def train_dataloader(self): if self._train_dl is not None: return self._train_dl From 303d7f530659e46d9e6133d7984b44860c1dd9a3 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 13:47:56 -0800 Subject: [PATCH 012/154] Make utt_id specified by number of parts of audio_filepath user wishes to use Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 6 +++--- tools/nemo_forced_aligner/utils.py | 17 +++++++++++++---- 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 61ca1ef1d1ec..398c4791bab3 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -109,7 +109,7 @@ def align( model_downsample_factor, output_ctm_folder, grouping_for_ctm, - utt_id_extractor_func=lambda fp: Path(fp).resolve().stem, + n_parts_for_ctm_id=1, audio_sr=16000, # TODO: get audio SR automatically device="cuda:0", batch_size=1, @@ -147,12 +147,12 @@ def align( if grouping_for_ctm == "basetoken": make_basetoken_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, + data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, ) elif grouping_for_ctm == "word": make_word_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, + data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, ) else: diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index 254f6b1ce53d..65337c5a3c23 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -14,6 +14,7 @@ import json import os +from pathlib import Path import torch @@ -117,8 +118,14 @@ def get_log_probs_y_T_U(data, model): return log_probs, y, T, U_dash +def _get_utt_id(audio_filepath, n_parts_for_ctm_id): + fp_parts = Path(audio_filepath).parts[-n_parts_for_ctm_id:] + utt_id = Path("_".join(fp_parts)).stem + return utt_id + + def make_basetoken_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, + data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, ): """ Note: assume order of utts in data matches order of utts in alignments @@ -172,7 +179,8 @@ def make_basetoken_ctm( basetokens_info[-1]["t_end"] = len(alignment) - 1 - utt_id = utt_id_extractor_func(manifest_line["audio_filepath"]) + # get utt_id that will be used for saving CTM file as .ctm + utt_id = 
_get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: for basetoken_info in basetokens_info: @@ -189,7 +197,7 @@ def make_basetoken_ctm( def make_word_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, utt_id_extractor_func, audio_sr, + data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, ): """ Note: assume order of utts in data matches order of utts in alignments @@ -281,7 +289,8 @@ def make_word_ctm( if alignment[-1] == words_info[-1]["u_end"]: words_info[-1]["t_end"] = len(alignment) - 1 - utt_id = utt_id_extractor_func(manifest_line["audio_filepath"]) + # get utt_id that will be used for saving CTM file as .ctm + utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: for word_info in words_info: From b8398f5bd2919c92850542d668ae754f6a50b66a Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 13:49:05 -0800 Subject: [PATCH 013/154] remove audio_sr TODO - reduce risk of silent bugs Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 398c4791bab3..620f7551662b 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -110,7 +110,7 @@ def align( output_ctm_folder, grouping_for_ctm, n_parts_for_ctm_id=1, - audio_sr=16000, # TODO: get audio SR automatically + audio_sr=16000, device="cuda:0", batch_size=1, ): From db77d8f3b7cd3d251d7907eff7d578453b2b2aaa Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 14:01:20 -0800 Subject: [PATCH 014/154] Add check that model is CTC Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 620f7551662b..8bddb574a507 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -18,6 +18,7 @@ from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm from nemo.collections.asr.models import ASRModel +from nemo.collections.asr.models.ctc_models import EncDecCTCModel V_NEG_NUM = -1e30 @@ -119,8 +120,6 @@ def align( Results are saved in ctm files in `output_ctm_folder`. Returns the most recent alignment, v_matrix, log_probs_reordered, in case you want to inspect them in a parent function. - - TODO: add check that Model is CTC/CTCBPE model """ # load model @@ -130,6 +129,12 @@ def align( else: model = ASRModel.from_pretrained(model_name, map_location=device) + if not isinstance(model, EncDecCTCModel): + raise NotImplementedError( + f"Model {model_name} is not an instance of NeMo EncDecCTCModel." 
+ " Currently only instances of EncDecCTCModels are supported" + ) + # define start and end line IDs of batches with open(manifest_filepath, 'r') as f: num_lines_in_manifest = sum(1 for _ in f) From 19cdbe4390475c141924eefa18e4cd0d9132ccd5 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 14:31:06 -0800 Subject: [PATCH 015/154] Remove unused import Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 8bddb574a507..f5bf9ef4b8a8 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -12,8 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from pathlib import Path - import torch from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm From 5b8aaccbcfc39b30caca64d45b8dcd0a6496d466 Mon Sep 17 00:00:00 2001 From: Yi Dong <43824965+yidong72@users.noreply.github.com> Date: Fri, 9 Dec 2022 17:36:32 -0500 Subject: [PATCH 016/154] Text generation improvement (UI client, data parallel support) (#5437) * Squashed commit of the following: commit a5e124f34be31bd6eafe5e5fdf5bedcd0d50915c Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Oct 13 15:07:42 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 35b424044fe80c3081e7756ab21244f701716f7e Author: Yi Dong Date: Thu Oct 13 08:04:49 2022 -0700 get rid of base Signed-off-by: Yi Dong commit 2955210e2311791543538cfbb5ad26b79414c954 Merge: d52edef8c eaf6757ca Author: Yi Dong Date: Thu Oct 13 13:17:02 2022 +0000 Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt commit d52edef8cd7b36593838fb270047e80f8ccb652e Author: Yi Dong Date: Thu Oct 13 13:16:24 2022 +0000 align with main Signed-off-by: Yi Dong commit eaf6757ca5be8e099492f57c81d984429b0ad49c Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Oct 13 13:12:11 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit c4b86d97626ea0721bf8fb4c0a45dec5becc94c9 Author: Yi Dong Date: Thu Oct 13 13:10:58 2022 +0000 same as main Signed-off-by: Yi Dong commit e335de51bcc0d681c58b568c3d8c238bc5687c3b Merge: c231086e0 4463a9fe9 Author: Yi Dong Date: Thu Oct 13 13:08:09 2022 +0000 Merge branch 'main' into universal_prompt commit c231086e057f1efaa915f691d84664cb3d5aad85 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Oct 12 19:59:12 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 6a821a4b49a23dd3408a706a2a3dd393149b0bb1 Author: Yi Dong Date: Wed Oct 12 19:56:17 2022 +0000 default to pad Signed-off-by: Yi Dong commit 9d908e39fef1beed9ba2da4d1a6806161eb7ef25 Author: Yi Dong Date: Wed Oct 12 19:55:44 2022 +0000 add the option to pad the tokens Signed-off-by: Yi Dong commit 876dc395b43fdeeaa2bcbbe13c76523633764c33 Merge: fbb0f4035 fe3c77ee9 Author: Yi Dong Date: Wed Oct 12 19:20:47 2022 +0000 Merge branch 'fix_global_init' into universal_prompt commit fe3c77ee93ab6cf3ea152db68cb6beefcac2a392 Author: Yi Dong Date: Wed Oct 12 18:59:49 2022 +0000 fix import again Signed-off-by: Yi Dong commit fbb0f4035c6cd6bfefed50a20605503de8c1dccb Author: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Oct 12 16:00:24 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 372ca8c0d7988f2339b15888dc72aa21f4fb6937 Author: Yi Dong Date: Wed Oct 12 15:58:32 2022 +0000 enable server Signed-off-by: Yi Dong commit cbe05d9fbc978f812cfbb671f45f147f300713c4 Author: Yi Dong Date: Wed Oct 12 13:07:28 2022 +0000 fix comment error Signed-off-by: Yi Dong commit 1948048922e726ec6131e44b1a745389f18d4ef2 Merge: 232c2cce3 984f5c09a Author: Yi Dong Date: Wed Oct 12 13:05:30 2022 +0000 Merge branch 'fix_global_init' into universal_prompt commit 232c2cce34d7a8b902da406706f3dd9b39475091 Merge: 34c8a68df 658243fb6 Author: Yi Dong Date: Wed Oct 12 12:50:00 2022 +0000 Merge branch 'fix_global_init' into universal_prompt commit 984f5c09a6dbf1d1fb5aa30ed9b0df188e66a50f Merge: 658243fb6 3fda5de46 Author: Yi Dong <43824965+yidong72@users.noreply.github.com> Date: Wed Oct 12 08:42:11 2022 -0400 Merge branch 'main' into fix_global_init commit 658243fb6580191b5d60edd30cde16dcc23cbb85 Author: Yi Dong Date: Wed Oct 12 12:40:57 2022 +0000 fix import error Signed-off-by: Yi Dong commit 8e0fe1cad05ec288ec122b3cd0e139a96872e08c Author: Yi Dong Date: Tue Oct 11 22:44:12 2022 +0000 update the fused kernel Signed-off-by: Yi Dong commit 536cf6bef9447b75843fad630729c47a2fba35f3 Author: Yi Dong Date: Tue Oct 11 14:44:52 2022 -0700 add the missing file Signed-off-by: Yi Dong commit 1b437ec41dc5e354453ce0a089bca0171cbcb6c2 Author: Yi Dong Date: Tue Oct 11 14:43:14 2022 -0700 fix fused softmax Signed-off-by: Yi Dong commit 7813f60e05f9783af61f8c14ec1cb0c6c4f1f263 Author: Yi Dong Date: Tue Oct 11 14:16:48 2022 -0700 move global step to base Signed-off-by: Yi Dong commit 34c8a68df084b18d377e84415d9f07b2cd6673dd Author: Yi Dong Date: Thu Oct 6 13:50:11 2022 +0000 fix pipeline for eval Signed-off-by: Yi Dong commit eee5d38218f26660c3ffebe9f615c850c80a1f0d Author: Yi Dong Date: Thu Oct 6 13:48:22 2022 +0000 fix for pipleline parallel Signed-off-by: Yi Dong commit 323bca73e7ef6099ee79c0a2fffac7b709ed6c5d Merge: 125e49947 e3b4c4d1f Author: Yi Dong Date: Wed Oct 5 19:29:13 2022 +0000 Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt commit 125e4994760448ff75dd9328395813eda1c87547 Author: Yi Dong Date: Wed Oct 5 19:29:04 2022 +0000 add share option Signed-off-by: Yi Dong commit e3b4c4d1f7346c9fa596f3cca6d4df0a9e05c368 Author: Yi Dong Date: Wed Oct 5 11:43:48 2022 -0700 make sure consolidation works Signed-off-by: Yi Dong commit a5c833964ecf05dc460ca1da69275c4019742150 Merge: 2a07ab52d abcb74be2 Author: Yi Dong Date: Wed Oct 5 18:40:29 2022 +0000 Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt commit 2a07ab52d95f15ba666823028c69e23825666c05 Author: Yi Dong Date: Wed Oct 5 18:40:23 2022 +0000 added requirement Signed-off-by: Yi Dong commit 3abecd9dd1611993a87c537636abe7f7e6a9b04c Author: Yi Dong Date: Wed Oct 5 18:39:42 2022 +0000 added a simple web server Signed-off-by: Yi Dong commit abcb74be2caf1cdec40eb9ba2be4dde4d45a3b4b Author: Yi Dong Date: Wed Oct 5 06:54:12 2022 -0700 fix empty val loss Signed-off-by: Yi Dong commit b8eb92ac4a0d665570af75e34c9ba3c2e2420c26 Author: Yi Dong Date: Tue Oct 4 19:25:30 2022 -0700 text gen working Signed-off-by: Yi Dong commit d59f3e3f3a6fd19736d1c5706fed65a3dd4049ba Author: Yi Dong Date: Tue Oct 4 16:08:40 2022 -0700 first change Signed-off-by: Yi Dong commit 59d077585e6962a669b824af58f64e8a0bea6547 Author: Yi Dong Date: 
Tue Oct 4 15:00:40 2022 -0700 revert Signed-off-by: Yi Dong commit 12a0f3902d99e9179403644bd951c045df716ca7 Author: Yi Dong Date: Tue Oct 4 21:26:23 2022 +0000 init imp Signed-off-by: Yi Dong commit 62a15dfd943cc48be495ac61b9f2f00995775c5f Merge: 82c90d2cd e0cc6b767 Author: Yi Dong Date: Tue Oct 4 11:58:26 2022 -0700 Merge branch 'main' into universal_prompt commit 82c90d2cd0fd156f16a4b899f8c741d598f33990 Author: Yi Dong Date: Tue Oct 4 11:17:13 2022 -0700 add sync Signed-off-by: Yi Dong commit 9819b703eef877d90cd1257bf3610c69de9b4d7e Author: Yi Dong Date: Sun Oct 2 17:52:34 2022 -0700 fix save model Signed-off-by: root commit e4937e2fc5fb7d70754c97668416e4a69c3079fe Author: Yi Dong Date: Sat Oct 1 18:56:09 2022 +0000 working Signed-off-by: Yi Dong commit b73b06d1c7cf5417a6d87cb33d8ed83a57e38b7b Author: Yi Dong Date: Sat Oct 1 17:34:03 2022 +0000 calcuate the mask Signed-off-by: Yi Dong commit 9db3bc13eb65a94a475b837603351da68e3745bc Author: Yi Dong Date: Fri Sep 30 23:26:32 2022 +0000 fix bug in datasets Signed-off-by: Yi Dong commit f289900375d4412f53f8110be00fec6587627550 Author: Yi Dong Date: Fri Sep 30 22:29:40 2022 +0000 update the code Signed-off-by: Yi Dong commit 8e28a1f208aabaab72dbe769e72756baada04d99 Author: Yi Dong Date: Fri Sep 30 21:52:52 2022 +0000 added new ds Signed-off-by: Yi Dong commit 8d41315bab7ce90e200a8a7d1023c34f8e046897 Author: Yi Dong Date: Fri Sep 30 18:57:09 2022 +0000 added new files Signed-off-by: Yi Dong commit 984e0e94e15e16323c1ba1ca2efeabd84f69463f Merge: cbe8b7ab1 fa6cd8588 Author: Yi Dong Date: Thu Sep 29 21:43:29 2022 +0000 Merge branch 'llm-prompt-learning-improvements' into universal_prompt commit fa6cd858839277939446afe7275976078d54c512 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Sep 29 16:47:30 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 78ba46e5d6fde1be53c08e1e30a54cce59824be0 Merge: 7d6d46742 8d670bc77 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Thu Sep 29 09:43:27 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 7d6d46742170a66758287a207d67e1b1bfd15613 Author: Virginia Adams Date: Thu Sep 29 16:42:43 2022 +0000 Removed inference step and added sentence peice check to predict step Signed-off-by: Virginia Adams commit 20fd265acd6f7f9912cf52155fe66ccfa6b201a2 Author: Virginia Adams Date: Thu Sep 29 15:26:32 2022 +0000 fixed first stage check for pipeline parallel T5 pt Signed-off-by: Virginia Adams commit 3637be2b258c8d9028856f9971edb7da4a8121f0 Merge: a3ea722fd 986a76612 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Wed Sep 28 10:23:30 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit a3ea722fdc12fbcc5989b76ef5643a574b763bc4 Merge: 770967a52 971485ce7 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Mon Sep 26 13:35:52 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 770967a5251a474b6dcc2d44bf9a2076adbcb604 Merge: d23bf6c30 e3ac280a8 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Mon Sep 26 10:17:03 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit d23bf6c30acc0e3f6af9b4e24547669866a34d62 Merge: de6a31651 333d2b749 Author: Virginia Adams Date: Mon Sep 26 10:05:16 2022 -0700 Merge branch 'llm-prompt-learning-improvements' of https://github.com/NVIDIA/NeMo into llm-prompt-learning-improvements commit 
de6a31651e63d88a42b971794d93f18ff5a3cdff Author: Virginia Adams Date: Mon Sep 26 17:00:53 2022 +0000 Updated PP check to be on first stage pipeline only Signed-off-by: Virginia Adams commit 333d2b7498e6742ce66436f733c980a74616900c Merge: 592c0986a a39fc925a Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Fri Sep 23 16:11:21 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 592c0986a476a91b57b8605d7b70830d7acfa021 Author: Virginia Adams Date: Fri Sep 23 23:08:41 2022 +0000 Fixed unused import and CI test bug Signed-off-by: Virginia Adams commit ea9cd82d85638bc60ae4ad7ef105db931c8e3455 Merge: ce4b72c8c b566c2d0e Author: Virginia Adams Date: Fri Sep 23 18:57:25 2022 +0000 Merge branch 'llm-prompt-learning-improvements' of https://github.com/NVIDIA/NeMo into llm-prompt-learning-improvements commit ce4b72c8c52f32be336e323dd78a38089edc3e7c Author: Virginia Adams Date: Fri Sep 23 18:57:16 2022 +0000 Switch to import from base class Signed-off-by: Virginia Adams commit b566c2d0e35a068f758fd1310bc620a47be4590b Merge: 6621f2854 e872061ac Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Fri Sep 23 10:09:03 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 6621f28543828a48484a5637f6c9f3ccb23a5b02 Author: Virginia Adams Date: Wed Sep 14 20:47:35 2022 +0000 python format fix Signed-off-by: Virginia Adams commit 8deafc8987b6af5f7b99a250310f57a40198c37f Author: Virginia Adams Date: Wed Sep 14 20:28:02 2022 +0000 Save .nemo on new best val score Signed-off-by: Virginia Adams commit 761bd36969cb465d6a129e9eee6ce1f883d3cf41 Author: Virginia Adams Date: Wed Sep 14 18:03:19 2022 +0000 Added automatic checkpoint to nemo file method Signed-off-by: Virginia Adams commit 3be4ed57b6cd3ddfe4876d78650dfe8fe794598b Author: Virginia Adams Date: Wed Sep 14 02:11:56 2022 +0000 Make GPT use base prompt learning model class: Signed-off-by: Virginia Adams Signed-off-by: Yi Dong * fix LGTM Signed-off-by: Yi Dong * fix validation Signed-off-by: Yi Dong * change for the lm eval Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make text generation work in data parallel environment Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * implement the service with rest service Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * surpress log Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy * Fixes Signed-off-by: MaximumEntropy * Update config Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore function needed for NMT Signed-off-by: MaximumEntropy * handles no answer only Signed-off-by: Yi Dong * Fix config Signed-off-by: MaximumEntropy * added knn to web Signed-off-by: Yi Dong * fix lgtm.com comments Signed-off-by: Yi Dong * output the retrieved context Signed-off-by: Yi Dong * allow no neighbor query Signed-off-by: Yi Dong * remove the imports Signed-off-by: Yi Dong * warn only once Signed-off-by: Yi Dong * Change output file format from JSON to JSONL Signed-off-by: MaximumEntropy * new t0 dataset Signed-off-by: Yi Dong * Add T0 data preproc scripts Signed-off-by: 
MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Merge and multiprocessing Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for is_correct Signed-off-by: MaximumEntropy * fix epoch > 2 Signed-off-by: Yi Dong * handles multiple dataloader Signed-off-by: Yi Dong * remove template Signed-off-by: Yi Dong * Refactor T0 dataset Signed-off-by: MaximumEntropy * Add script to merge train folder into individual training files to minimize number of blends Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added on the fly service Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add combo instance Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added combo service Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * send weights back to server Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix index store Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor changes Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add reset button Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add add eos Signed-off-by: Yi Dong * use a seperate bert service Signed-off-by: Yi Dong * no loss of accuracy Signed-off-by: Yi Dong * pin the gradio version Signed-off-by: Yi Dong * Remove bin compat Signed-off-by: MaximumEntropy * Fix header lines Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * evaluate based on text generation Signed-off-by: Yi Dong * exact match result aggregation Signed-off-by: Yi Dong * working SP and SA Signed-off-by: Yi Dong * sync Signed-off-by: Yi Dong * fix checkpoint Signed-off-by: Yi Dong * fix eval Signed-off-by: Yi Dong * backup states Signed-off-by: Yi Dong * backup states reset Signed-off-by: Yi Dong * fix the bug Signed-off-by: Yi Dong * fix evaluation for sentence piece Signed-off-by: Yi Dong * fix a bug Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * potential fix in the future Signed-off-by: Yi Dong * remove the universal codes Signed-off-by: Yi Dong * remove universal strategy Signed-off-by: Yi Dong * address reviewer comment Signed-off-by: Yi Dong Signed-off-by: Yi Dong Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: MaximumEntropy Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva --- .../conf/megatron_gpt_inference.yaml | 7 +- .../conf/megatron_retro_inference.yaml | 29 +- .../language_modeling/megatron_gpt_eval.py | 5 + .../language_modeling/megatron_retro_eval.py | 78 ++- .../megatron_retrieval_model.py | 8 +- 
.../nlp/modules/common/megatron/mup/layer.py | 5 +- .../common/megatron/retrieval_service.py | 495 ++++++++++++++++-- .../common/megatron/retrieval_transformer.py | 4 + .../modules/common/megatron/transformer.py | 2 +- .../nlp/modules/common/megatron_web_server.py | 199 +++++++ .../modules/common/text_generation_server.py | 38 +- .../common/text_generation_strategy.py | 133 ++++- .../modules/common/text_generation_utils.py | 32 +- requirements/requirements_nlp.txt | 1 + .../build_retrieval_index.py | 16 +- 15 files changed, 968 insertions(+), 84 deletions(-) create mode 100644 nemo/collections/nlp/modules/common/megatron_web_server.py diff --git a/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml b/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml index f7c1a5c977a0..661dd7191496 100644 --- a/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml +++ b/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml @@ -28,6 +28,9 @@ hparams_file: null # model configuration file, only used for PTL checkpoint load prompts: # prompts for GPT inference - "Q: How are you?" - "Q: How big is the universe?" -server: False # whether launch the inference server +server: False # whether launch the API server port: 5555 # the port number for the inference server - \ No newline at end of file +web_server: False # whether launch the web inference server +share: False # whether create a public URL +username: test # user name for web client +password: test2 # password for web client \ No newline at end of file diff --git a/examples/nlp/language_modeling/conf/megatron_retro_inference.yaml b/examples/nlp/language_modeling/conf/megatron_retro_inference.yaml index 8c2f31915ed6..e3d8b08ba3c0 100644 --- a/examples/nlp/language_modeling/conf/megatron_retro_inference.yaml +++ b/examples/nlp/language_modeling/conf/megatron_retro_inference.yaml @@ -34,11 +34,28 @@ prompts: # prompts for RETRO model inference ########### Faiss service parameters ######## retrieval_service: - faiss_devices: '0,1,2' - faiss_index: null # the faiss index file that is used to find KNN - nprobe: 100 - retrieval_index: null - sentence_bert: 'all-mpnet-base-v2' - sentence_bert_batch: 4 neighbors: 4 frequent_query: False # for the current token generation, frequently update the retrieval context. 
If false, update it every 64 tokens + pad_tokens: True # pad the tokens at the beginning to make it minimum of 64 tokens for retrieving at least once + store_retrieved: False # whether store the retrieved documents, so it can be checked + weights: [0.5, 0.5] # weight for different retrieval services + sentence_bert: + devices: '0,1,2' + sentence_bert: 'all-mpnet-base-v2' + sentence_bert_batch: 4 + services: + - type: FaissRetrievalService + faiss_devices: '0,1,2' + faiss_index: null # the faiss index file that is used to find KNN + nprobe: 100 + retrieval_index: null + - type: DynamicFaissRetrievalService + faiss_devices: '0,1,2' + chunk_size: 64 + stride: 32 +server: False # whether launch the API server +port: 5555 # the port number for the inference server +web_server: False # whether launch the web inference server +share: False # whether create a public URL +username: test # user name for web client +password: test2 # password for web client \ No newline at end of file diff --git a/examples/nlp/language_modeling/megatron_gpt_eval.py b/examples/nlp/language_modeling/megatron_gpt_eval.py index fb30622f7219..865fbc174a2d 100644 --- a/examples/nlp/language_modeling/megatron_gpt_eval.py +++ b/examples/nlp/language_modeling/megatron_gpt_eval.py @@ -13,6 +13,7 @@ # limitations under the License. import os +import threading import torch from omegaconf import OmegaConf, open_dict @@ -21,6 +22,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel +from nemo.collections.nlp.modules.common.megatron_web_server import get_demo from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer from nemo.collections.nlp.modules.common.text_generation_utils import generate from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam @@ -253,6 +255,9 @@ def main(cfg) -> None: # Third method of running text generation, use inference server if cfg.server: if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0: + if cfg.web_server: + thread = threading.Thread(target=get_demo, daemon=True, args=(cfg.share, cfg.username, cfg.password)) + thread.start() server = MegatronServer(model.cuda()) server.run("0.0.0.0", port=cfg.port) diff --git a/examples/nlp/language_modeling/megatron_retro_eval.py b/examples/nlp/language_modeling/megatron_retro_eval.py index 21c389bb1015..788a5bfffb1b 100644 --- a/examples/nlp/language_modeling/megatron_retro_eval.py +++ b/examples/nlp/language_modeling/megatron_retro_eval.py @@ -13,17 +13,28 @@ # limitations under the License. 
import os +import threading +import torch from examples.nlp.language_modeling.megatron_gpt_eval import RequestDataSet from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from torch.utils.data import DataLoader from nemo.collections.nlp.models.language_modeling.megatron_retrieval_model import MegatronRetrievalModel +from nemo.collections.nlp.modules.common.megatron_web_server import get_retro_demo +from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer +from nemo.collections.nlp.modules.common.text_generation_utils import generate from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector from nemo.core.config import hydra_runner +try: + from apex.transformer import parallel_state + + HAVE_APEX = True +except (ImportError, ModuleNotFoundError): + HAVE_APEX = False """ This is the script to run RETRO Model text generation. @@ -86,26 +97,55 @@ def main(cfg) -> None: "compute_logprob": cfg.inference.compute_logprob, } - if not cfg.use_predict_method: - # First method of running text generation, call model.generate method - response = model.generate( - inputs=OmegaConf.to_container(cfg.prompts), - length_params=length_params, - sampling_params=sampling_params, - **cfg.retrieval_service, - ) + # check whether the DDP is initialized + if parallel_state.is_unitialized(): + + def dummy(): + return + + if model.trainer.strategy.launcher is not None: + model.trainer.strategy.launcher.launch(dummy, trainer=model.trainer) + model.trainer.strategy.setup_environment() + + config = OmegaConf.to_container(cfg.inference) + retrieval_service = OmegaConf.to_container(cfg.retrieval_service) + model.set_inference_config(config, retrieval_service) + + # running text generation, use inference server + if cfg.server: + if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0: + if cfg.web_server: + thread = threading.Thread( + target=get_retro_demo, daemon=True, args=(cfg.share, cfg.username, cfg.password) + ) + thread.start() + server = MegatronServer(model.cuda(), inference_strategy=model.inference_strategy) + server.run("0.0.0.0", port=cfg.port) + + while True: + choice = torch.cuda.LongTensor(1) + torch.distributed.broadcast(choice, 0) + if choice[0].item() == 0: + generate(model.cuda(), strategy=model.inference_strategy) else: - # Second method of running text generation, call trainer.predict - ds = RequestDataSet(OmegaConf.to_container(cfg.prompts)) - request_dl = DataLoader(dataset=ds, batch_size=cfg.inference_batch_size) - config = OmegaConf.to_container(cfg.inference) - retrieval_service = OmegaConf.to_container(cfg.retrieval_service) - model.set_inference_config(config, retrieval_service) - response = trainer.predict(model, request_dl) - - print("***************************") - print(response) - print("***************************") + + if not cfg.use_predict_method: + # First method of running text generation, call model.generate method + response = model.generate( + inputs=OmegaConf.to_container(cfg.prompts), + length_params=length_params, + sampling_params=sampling_params, + strategy=model.inference_strategy, + ) + else: + # Second method of running text generation, call trainer.predict + ds = RequestDataSet(OmegaConf.to_container(cfg.prompts)) + request_dl = DataLoader(dataset=ds, batch_size=cfg.inference_batch_size) + response = trainer.predict(model, 
request_dl) + + print("***************************") + print(response) + print("***************************") if __name__ == '__main__': diff --git a/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py b/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py index 66e43458d20e..653c66b48669 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py @@ -340,6 +340,8 @@ def validation_step(self, batch, batch_idx): return reduced_loss def validation_epoch_end(self, outputs): + if len(outputs) == 0: + return averaged_loss = torch.stack(outputs).mean() self.log('val_loss', averaged_loss, prog_bar=True) # formula to compute the perplexity @@ -457,7 +459,7 @@ def setup(self, stage=None): def set_inference_config(self, inference_config, retrieval_config): self._inference_config = inference_config - self._inference_strategy = model_inference_strategy_dispatcher(self, **retrieval_config) + self.inference_strategy = model_inference_strategy_dispatcher(self, **retrieval_config) def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int] = None) -> Any: inference_config = self._inference_config @@ -474,13 +476,13 @@ def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int] inference_config['all_probs'] = True inference_config["add_BOS"] = False inference_config['greedy'] = True - response = generate(self, **inference_config, strategy=self._inference_strategy) + response = generate(self, **inference_config, strategy=self.inference_strategy) compute_prob_response = get_computeprob_response(self.tokenizer, response, batch) return compute_prob_response else: del inference_config['compute_logprob'] inference_config['inputs'] = batch - return generate(self, **inference_config, strategy=self._inference_strategy) + return generate(self, **inference_config, strategy=self.inference_strategy) def generate( self, diff --git a/nemo/collections/nlp/modules/common/megatron/mup/layer.py b/nemo/collections/nlp/modules/common/megatron/mup/layer.py index 8683134b997b..ce969e455d6e 100644 --- a/nemo/collections/nlp/modules/common/megatron/mup/layer.py +++ b/nemo/collections/nlp/modules/common/megatron/mup/layer.py @@ -71,13 +71,16 @@ def __init__(self, mpu_vocab_size, parallel_output): self.bias.partition_dim = 0 self.bias.stride = 1 self.parallel_output = parallel_output + self.warn_once = False def forward(self, hidden_states, word_embeddings_weight): if hasattr(word_embeddings_weight, 'infshape'): width_mult = word_embeddings_weight.infshape.width_mult() else: width_mult = 1.0 - logging.warning("need to set_shape before use mu-Transfer readout layer") + if not self.warn_once: + logging.warning("need to set_shape before use mu-Transfer readout layer") + self.warn_once = True async_tensor_model_parallel_allreduce = parallel_state.get_tensor_model_parallel_world_size() > 1 output = parallel_lm_logits( hidden_states / width_mult, diff --git a/nemo/collections/nlp/modules/common/megatron/retrieval_service.py b/nemo/collections/nlp/modules/common/megatron/retrieval_service.py index 0d72439b60ca..c599cb1b2bff 100644 --- a/nemo/collections/nlp/modules/common/megatron/retrieval_service.py +++ b/nemo/collections/nlp/modules/common/megatron/retrieval_service.py @@ -13,64 +13,198 @@ # limitations under the License. 
import abc +import base64 +import json +import logging +import pickle +import threading import time from typing import List, Union import faiss import numpy as np +import requests import torch +from flask import Flask, jsonify, request +from flask_restful import Api, Resource from sentence_transformers import SentenceTransformer from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec from nemo.collections.nlp.data.language_modeling.megatron.indexed_retrieval_dataset import MMapRetrievalIndexedDataset -from nemo.utils import logging + +log = logging.getLogger('werkzeug') +log.setLevel(logging.ERROR) + +lock = threading.Lock() +headers = {"Content-Type": "application/json"} + +PORT_NUM = 17179 +PORT_NUM_DYN = 17180 +PORT_NUM_BERT = 17181 + + +def request_data(data, port=PORT_NUM): + resp = requests.put('http://localhost:{}/knn'.format(port), data=json.dumps(data), headers=headers) + return resp.json() class RetrievalService: + """ + Abstract class for Retrieval Service. + """ + @abc.abstractmethod - def get_knn(self, query: Union[List[str], str, torch.Tensor]): + def get_knn(self, query: Union[List[str], str, torch.Tensor], neighbors: int): pass + @abc.abstractmethod + def add_docs_to_index(self, docs: List[str], add_eos: bool = True): + """ + Add documents to the Faiss index + Args: + docs: List[str], list of documents that is going to be added to the index + add_eos: bool, whether add the eos in the end + """ + raise NotImplementedError() + + +class ChunkStore: + """ + ChunkStore maps chunk id to tokens. It is used as an in memory storage for dynamic retrieval DB. + """ + + def __init__(self, chunk_size, pad_id): + self.store = {} + self._count = 0 + self.no_retrieval = np.ones(2 * chunk_size, dtype=np.int64) * pad_id + self.store[-1] = self.no_retrieval + + def add(self, chunk): + self.store[self._count] = chunk + self._count += 1 + + def get_chunk(self, neighbor_id): + return self.store[neighbor_id] + + def reset(self): + self._count = 0 + self.store = {} + self.store[-1] = self.no_retrieval + + +class SentenceBertResource(Resource): + """ + SentenceBERT Flask resource. + The PUT method is to get token/str embedding. 
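As a rough client-side sketch (not part of the patch), this is how a caller decodes the response of the SentenceBERT service above: the embeddings come back as a base64-encoded pickled numpy array, matching the `request_data` helper defined earlier in this file. The port and sentences are placeholders.

```python
# Hypothetical client of the SentenceBERT embedding service defined above.
import base64
import json
import pickle

import requests

headers = {"Content-Type": "application/json"}
sentences = ["retrieval augmented language models", "nearest neighbor search"]
# 17181 is the PORT_NUM_BERT constant defined earlier in this file
resp = requests.put("http://localhost:17181/knn", data=json.dumps(sentences), headers=headers)
emb = pickle.loads(base64.b64decode(resp.json().encode()))
print(emb.shape)  # (2, embedding_dim)
```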
+ """ + + def __init__( + self, bert_model, tokenizer, pool, sentence_bert_batch, + ): + # server + self.bert_model = bert_model + self.tokenizer = tokenizer + self.pool = pool + self.sentence_bert_batch = sentence_bert_batch + self.embedding_dim = self.bert_model.get_sentence_embedding_dimension() + + def put(self): + data = request.get_json() + if isinstance(data, dict): + return jsonify({'dim': self.embedding_dim}) + sentences = data + emb = self.get_emb(sentences) + str_emb = base64.b64encode(pickle.dumps(emb)) + return str_emb.decode('ascii') + + def get_emb(self, query: Union[List[str], str, torch.Tensor]): + if isinstance(query, str): + query = [query] + elif isinstance(query, torch.Tensor): + sentence_list = [] + for q in query: + text = self.tokenizer.ids_to_text(q) + sentence_list.append(text) + query = sentence_list + emb = self.bert_model.encode_multi_process( + sentences=query, pool=self.pool, batch_size=self.sentence_bert_batch + ) + return emb + + +class SentenceBertServer(object): + """ + Flask SentenceBERT server, which helps to calculate str/token embeddings + """ -class FaissRetrievalService(RetrievalService): def __init__( self, - faiss_index: str, - faiss_devices: str, - nprobe: int, - retrieval_index: str, + devices: str, tokenizer: TokenizerSpec, sentence_bert: str = 'all-mpnet-base-v2', sentence_bert_batch: int = 4, - neighbors: int = 4, ): - self.neighbors = neighbors - has_gpu = torch.cuda.is_available() and hasattr(faiss, "index_gpu_to_cpu") + self.app = Flask(__name__, static_url_path='') - if faiss_devices is None or not torch.cuda.is_available(): + if devices is None or not torch.cuda.is_available(): device_list = None else: - device_list = ['cuda:' + str(device) for device in faiss_devices.split(',')] - - self.index = faiss.read_index(faiss_index) - if has_gpu and device_list is not None: - beg = time.time() - co = faiss.GpuMultipleClonerOptions() - co.useFloat16 = True - co.usePrecomputed = False - co.shard = True - self.index = faiss.index_cpu_to_all_gpus(self.index, co, ngpu=len(device_list)) - end = time.time() - logging.info('convert Faiss db to GPU takes', end - beg) - self.index.nprobe = nprobe + device_list = ['cuda:' + str(device) for device in devices.split(',')] self.bert_model = SentenceTransformer(sentence_bert) self.tokenizer = tokenizer - self.ds = MMapRetrievalIndexedDataset(retrieval_index) self.pool = self.bert_model.start_multi_process_pool(device_list) self.sentence_bert_batch = sentence_bert_batch + api = Api(self.app) + api.add_resource( + SentenceBertResource, + '/knn', + resource_class_args=[self.bert_model, self.tokenizer, self.pool, self.sentence_bert_batch,], + ) + + def run(self, url, port=PORT_NUM_BERT): + threading.Thread(target=lambda: self.app.run(host=url, threaded=True, port=port)).start() + + +def start_sentence_bert_server( + devices: str, tokenizer: TokenizerSpec, sentence_bert: str = 'all-mpnet-base-v2', sentence_bert_batch: int = 4 +): + """ + Start the sentence bert server method. + It only starts the server at rank 0 worker. + Doesn't support multiple nodes yet. + """ + + if torch.distributed.get_rank() == 0: + server = SentenceBertServer(devices, tokenizer, sentence_bert, sentence_bert_batch,) + server.run("0.0.0.0") + torch.distributed.barrier() + + +class FaissRetrievalResource(Resource): + """ + Static Faiss Retrieval Flask resource. + The PUT method is to get KNN tokens. 
+ """ + + def __init__( + self, index, tokenizer, ds, + ): + # server + self.index = index + self.tokenizer = tokenizer + self.ds = ds - def get_knn(self, query: Union[List[str], str, torch.Tensor]): + def put(self): + data = request.get_json() + sentences = data['sentences'] + num_neighbors = data['neighbors'] + with lock: # Need to get lock to keep multiple threads from hitting code + neighbors = self.get_knn(sentences, num_neighbors) + return jsonify(neighbors.tolist()) + # check keys + + def get_knn(self, query: Union[List[str], str, torch.Tensor], neighbors: int): single_sentence = False if isinstance(query, str): single_sentence = True @@ -81,10 +215,10 @@ def get_knn(self, query: Union[List[str], str, torch.Tensor]): text = self.tokenizer.ids_to_text(q) sentence_list.append(text) query = sentence_list - emb = self.bert_model.encode_multi_process( - sentences=query, pool=self.pool, batch_size=self.sentence_bert_batch - ) - D, knn = self.index.search(emb, self.neighbors) + emb = request_data(query, PORT_NUM_BERT) + emb_data = base64.b64decode(emb.encode()) + emb = pickle.loads(emb_data) + D, knn = self.index.search(emb, neighbors) results = [] for sentence_neighbors in knn: chunks = [] @@ -97,3 +231,304 @@ def get_knn(self, query: Union[List[str], str, torch.Tensor]): # unpack the single sentence input return results[0] return np.stack(results, axis=0).astype(np.int64) + + +class RetrievalServer(object): + """ + Flask Retrieval server, which helps to get the KNN tokens given the query chunk + """ + + def __init__( + self, faiss_index: str, faiss_devices: str, nprobe: int, retrieval_index: str, tokenizer: TokenizerSpec, + ): + self.app = Flask(__name__, static_url_path='') + # server + has_gpu = torch.cuda.is_available() and hasattr(faiss, "index_gpu_to_cpu") + + if faiss_devices is None or not torch.cuda.is_available(): + device_list = None + else: + device_list = ['cuda:' + str(device) for device in faiss_devices.split(',')] + + self.index = faiss.read_index(faiss_index) + if has_gpu and device_list is not None: + beg = time.time() + co = faiss.GpuMultipleClonerOptions() + co.useFloat16 = True + co.usePrecomputed = False + co.shard = True + self.index = faiss.index_cpu_to_all_gpus(self.index, co, ngpu=len(device_list)) + end = time.time() + logging.info('convert Faiss db to GPU takes', end - beg) + self.index.nprobe = nprobe + self.tokenizer = tokenizer + self.ds = MMapRetrievalIndexedDataset(retrieval_index) + api = Api(self.app) + api.add_resource( + FaissRetrievalResource, '/knn', resource_class_args=[self.index, self.tokenizer, self.ds,], + ) + + def run(self, url, port=PORT_NUM): + threading.Thread(target=lambda: self.app.run(host=url, threaded=True, port=port)).start() + + +class DynamicRetrievalResource(FaissRetrievalResource): + """ + Dynamic Faiss Retrieval Flask resource. + The PUT method is to get KNN tokens, add new chunks, reset index. 
+ """ + + def __init__( + self, index, tokenizer, chunk_size, stride, store, + ): + self.index = index + self.tokenizer = tokenizer + self.chunk_size = chunk_size + self.stride = stride + self.pad_id = self.tokenizer.pad_id + self.ds = store + + def put(self): + data = request.get_json() + if 'neighbors' in data: + sentences = data['sentences'] + # do knn query + num_neighbors = data['neighbors'] + with lock: # Need to get lock to keep multiple threads from hitting code + neighbors = self.get_knn(sentences, num_neighbors) + return jsonify(neighbors.tolist()) + elif 'reset' in data: + with lock: # Need to get lock to keep multiple threads from hitting code + self.reset() + return "success" + else: + sentences = data['sentences'] + add_eos = data['add_eos'] + # update the index + with lock: # Need to get lock to keep multiple threads from hitting code + self.add_docs_to_index(sentences, add_eos) + return "success" + + def reset(self): + self.index.reset() + self.ds.reset() + + def add_docs_to_index(self, docs: List[str], add_eos: bool = True): + """ + Add documents to the Faiss index + Args: + docs: List[str], list of documents that is going to be added to the index + add_eos: bool, whether add the eos in the end + """ + for doc in docs: + token_ids = self.tokenizer.text_to_ids(doc) + # append eos in the end + if add_eos: + token_ids.append(self.tokenizer.eos_id) + np_array = np.array(token_ids, dtype=np.int32) + padded_size = self.chunk_size - (len(np_array) % self.chunk_size) + # for retrieval database, added one more chunk in the end as padding + padded_size += self.chunk_size + np_array = np.pad(np_array, (0, padded_size), 'constant', constant_values=self.pad_id) + chunk_texts = [] + for i in range(0, len(np_array), self.stride): + if i + 2 * self.chunk_size <= len(np_array): + chunk = np_array[i : i + 2 * self.chunk_size] + self.ds.add(chunk) + chunk_texts.append(self.tokenizer.ids_to_text(chunk)) + emb = request_data(chunk_texts, PORT_NUM_BERT) + emb_data = base64.b64decode(emb.encode()) + emb = pickle.loads(emb_data) + self.index.add(emb) # add vectors to the index + + +class DynamicRetrievalServer(object): + """ + Flask Dynamic Retrieval server, which helps to build dynamic retrieval index. 
+ """ + + def __init__( + self, faiss_devices: str, tokenizer: TokenizerSpec, chunk_size: int = 64, stride: int = 32, + ): + self.app = Flask(__name__, static_url_path='') + has_gpu = torch.cuda.is_available() and hasattr(faiss, "index_gpu_to_cpu") + embedding_dim = request_data({}, PORT_NUM_BERT)['dim'] + self.index = faiss.IndexFlatL2(embedding_dim) # build the index + self.pad_id = tokenizer.pad_id + self.chunk_size = chunk_size + self.stride = stride + self.store = ChunkStore(chunk_size, self.pad_id) + + if faiss_devices is None or not torch.cuda.is_available(): + device_list = None + else: + device_list = ['cuda:' + str(device) for device in faiss_devices.split(',')] + + if has_gpu and device_list is not None: + beg = time.time() + co = faiss.GpuMultipleClonerOptions() + co.useFloat16 = True + co.usePrecomputed = False + co.shard = True + self.index = faiss.index_cpu_to_all_gpus(self.index, co, ngpu=len(device_list)) + end = time.time() + logging.info('convert Faiss db to GPU takes', end - beg) + + self.tokenizer = tokenizer + + api = Api(self.app) + api.add_resource( + DynamicRetrievalResource, + '/knn', + resource_class_args=[self.index, self.tokenizer, self.chunk_size, self.stride, self.store,], + ) + + def run(self, url, port=PORT_NUM_DYN): + threading.Thread(target=lambda: self.app.run(host=url, threaded=True, port=port)).start() + + +class FaissRetrievalService(RetrievalService): + """ + Top level static retrieval service class. + It starts the server at rank 0 worker, currently doesn't support multiple nodes yet. + It implements the retrieval services interface, has a simple client to do KNN queries. + """ + + def __init__( + self, faiss_index: str, faiss_devices: str, nprobe: int, retrieval_index: str, tokenizer: TokenizerSpec, + ): + self.updatable = False + self.tokenizer = tokenizer + ds = MMapRetrievalIndexedDataset(retrieval_index) + self.chunk_size = ds.chunk_size + pad_id = self.tokenizer.pad_id + # batch, neighbors, 2*chunk_size + self.no_retrieval = np.ones((1, 1, 2 * self.chunk_size), dtype=ds._index.dtype) * pad_id + if torch.distributed.get_rank() == 0: + server = RetrievalServer(faiss_index, faiss_devices, nprobe, retrieval_index, tokenizer) + server.run("0.0.0.0") + torch.distributed.barrier() + + def get_knn(self, query: Union[List[str], str, torch.Tensor], neighbors): + if isinstance(query, torch.Tensor): + sentence_list = [] + for q in query: + text = self.tokenizer.ids_to_text(q) + sentence_list.append(text) + query = sentence_list + if neighbors == 0: + # use padding + return np.repeat(self.no_retrieval, len(query), 0).astype(np.int64) + data = {'sentences': query} + data['neighbors'] = neighbors + result = request_data(data, PORT_NUM) + result = np.array(result) + return result + + +class DynamicFaissRetrievalService(RetrievalService): + """ + Top level dynamic retrieval service class. + It starts the server at rank 0 worker, currently doesn't support multiple nodes yet. + It implements the retrieval services interface, has a simple client to add, reset and query + the dynamic retrieval index. 
+ """ + + def __init__( + self, faiss_devices: str, tokenizer: TokenizerSpec, chunk_size: int, stride: int, + ): + self.updatable = True + self.tokenizer = tokenizer + self.chunk_size = chunk_size + pad_id = self.tokenizer.pad_id + # batch, neighbors, 2*chunk_size + self.no_retrieval = np.ones((1, 1, 2 * self.chunk_size), dtype=np.int64) * pad_id + if torch.distributed.get_rank() == 0: + server = DynamicRetrievalServer(faiss_devices, tokenizer, chunk_size, stride) + server.run("0.0.0.0") + torch.distributed.barrier() + + def get_knn(self, query: Union[List[str], str, torch.Tensor], neighbors): + if isinstance(query, torch.Tensor): + sentence_list = [] + for q in query: + text = self.tokenizer.ids_to_text(q) + sentence_list.append(text) + query = sentence_list + if neighbors == 0: + # use padding + return np.repeat(self.no_retrieval, len(query), 0).astype(np.int64) + data = {'sentences': query} + data['neighbors'] = neighbors + result = request_data(data, PORT_NUM_DYN) + result = np.array(result) + return result + + def add_docs_to_index(self, query: List[str], add_eos: bool = True): + """ + Add documents to the Faiss index + Args: + docs: List[str], list of documents that is going to be added to the index + add_eos: bool, whether add the eos in the end + """ + if isinstance(query, torch.Tensor): + sentence_list = [] + for q in query: + text = self.tokenizer.ids_to_text(q) + sentence_list.append(text) + query = sentence_list + data = {'sentences': query, 'add_eos': add_eos} + return request_data(data, PORT_NUM_DYN) + + +class ComboRetrievalService(RetrievalService): + """ + Top level retrieval service class. + It combines other retrieval services as a combo retrieval service. + It uses `weights` to determine the number of neighbors for each of the retrieval service members. 
+ """ + + def __init__(self, retrieval_services, weights): + self.retrieval_services = retrieval_services + self.updatable = any([service.updatable for service in retrieval_services]) + weights = np.array(weights) + # normalize the weights + self.weights = weights / weights.sum() + self.chunk_size = self.retrieval_services[0].chunk_size + + def update_weights(self, weights): + weights = np.array(weights) + # normalize the weights + self.weights = weights / weights.sum() + + def get_knn(self, query: Union[List[str], str, torch.Tensor], neighbors): + if neighbors == 0: + return self.retrieval_services[0].get_knn(query, 0) + total_neighbors = 0 + results = [] + for i, service in enumerate(self.retrieval_services): + k = int(neighbors * self.weights[i]) + if i == len(self.retrieval_services) - 1: + k = neighbors - total_neighbors + total_neighbors += k + if k == 0: + # empty, skip it + continue + result = service.get_knn(query, k) + results.append(result) + return np.concatenate(results, axis=1) + + def add_docs_to_index(self, query: List[str], add_eos: bool = True): + """ + Add documents to the Faiss index + Args: + docs: List[str], list of documents that is going to be added to the index + add_eos: bool, whether add the eos in the end + """ + output = 'success' + if not self.updatable: + return output + for i, service in enumerate(self.retrieval_services): + if service.updatable: + service.add_docs_to_index(query, add_eos) + return output diff --git a/nemo/collections/nlp/modules/common/megatron/retrieval_transformer.py b/nemo/collections/nlp/modules/common/megatron/retrieval_transformer.py index abe0ebf6b03d..a22a84e41267 100644 --- a/nemo/collections/nlp/modules/common/megatron/retrieval_transformer.py +++ b/nemo/collections/nlp/modules/common/megatron/retrieval_transformer.py @@ -495,8 +495,12 @@ def forward( ), f'sequence requires {num_seq_chunks} retrieved chunks, but only {k} passed in' # need to add extra chunk size, since it will be shifted if retrieved_emb is not None: + # -63, -62, ... 63 will be cut into -> [0, ... 63] in the chunk cross attention layer cross_attn_q_pos_emb = self.rotary_pos_emb(self.chunk_size * 2 - 1, offset=-self.chunk_size + 1) cross_attn_k_pos_emb = self.rotary_pos_emb(rn, offset=0) + # TODO, the first 64 tokens in retrieved is from the last chunk, align the continuation part with the query tokens + # use the following in the future. 
[-63, -62, ..., 63, 64] + # cross_attn_k_pos_emb = self.rotary_pos_emb(rn, offset=-self.chunk_size + 1) attn_pos_emb = (self_attn_emb, cross_attn_q_pos_emb, cross_attn_k_pos_emb) else: attn_pos_emb = (self_attn_emb, None, None) diff --git a/nemo/collections/nlp/modules/common/megatron/transformer.py b/nemo/collections/nlp/modules/common/megatron/transformer.py index 5881e10ab689..11d9a7848cd3 100644 --- a/nemo/collections/nlp/modules/common/megatron/transformer.py +++ b/nemo/collections/nlp/modules/common/megatron/transformer.py @@ -2259,7 +2259,7 @@ def get_num_layers(self, num_layers): num_layers = num_layers // num_ranks_in_encoder else: num_layers = num_layers // num_ranks_in_decoder - else: + elif self.model_type == ModelType.encoder_or_decoder: assert ( num_layers % parallel_state.get_pipeline_model_parallel_world_size() == 0 ), 'num_layers must be divisible by pipeline_model_parallel_size' diff --git a/nemo/collections/nlp/modules/common/megatron_web_server.py b/nemo/collections/nlp/modules/common/megatron_web_server.py new file mode 100644 index 000000000000..d969c532231f --- /dev/null +++ b/nemo/collections/nlp/modules/common/megatron_web_server.py @@ -0,0 +1,199 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import json + +import gradio as gr +import requests + +from nemo.collections.nlp.modules.common.megatron.retrieval_service import PORT_NUM_DYN + +PORT_NUM = 5555 +headers = {"Content-Type": "application/json"} + + +def request_data(data, port_num=PORT_NUM): + resp = requests.put('http://localhost:{}/generate'.format(port_num), data=json.dumps(data), headers=headers) + output_json = resp.json() + return output_json + + +def update_index(data, port_num=PORT_NUM_DYN): + resp = requests.put('http://localhost:{}/knn'.format(port_num), data=json.dumps(data), headers=headers) + output_json = resp.json() + return output_json + + +def get_generation(prompt, greedy, add_BOS, token_to_gen, min_tokens, temp, top_p, top_k, repetition): + data = { + "sentences": [prompt], + "tokens_to_generate": int(token_to_gen), + "temperature": temp, + "add_BOS": add_BOS, + "top_k": top_k, + "top_p": top_p, + "greedy": greedy, + "all_probs": False, + "repetition_penalty": repetition, + "min_tokens_to_generate": int(min_tokens), + } + sentences = request_data(data)['sentences'] + return sentences[0] + + +def convert_retrieved_to_md(retrieved): + output_str = '' + for item in retrieved: + output_str += f'' + for i, neighbor in enumerate(item['neighbors']): + if i == 0: + output_str += f"" + else: + output_str += f"" + output_str += '
</table>
' + return output_str + + +def get_retro_generation( + prompt, greedy, add_BOS, token_to_gen, min_tokens, temp, top_p, top_k, repetition, neighbors, weights +): + data = { + "sentences": [prompt], + "tokens_to_generate": int(token_to_gen), + "temperature": temp, + "add_BOS": add_BOS, + "top_k": top_k, + "top_p": top_p, + "greedy": greedy, + "all_probs": False, + "repetition_penalty": repetition, + "min_tokens_to_generate": int(min_tokens), + "neighbors": int(neighbors), + "weights": weights, + } + output_json = request_data(data) + sentences = output_json['sentences'] + retrieved = output_json['retrieved'] + return sentences[0], convert_retrieved_to_md(retrieved) + + +def add_doc(doc, add_eos): + data = { + "sentences": [doc], + "add_eos": add_eos, + } + return update_index(data) + + +def reset_index(): + data = {"reset": True} + resp = requests.put('http://localhost:{}/knn'.format(PORT_NUM_DYN), data=json.dumps(data), headers=headers) + output_json = resp.json() + return output_json + + +def get_demo(share, username, password): + with gr.Blocks() as demo: + with gr.Row(): + with gr.Column(scale=2, width=200): + greedy_flag = gr.Checkbox(label="Greedy") + add_BOS = gr.Checkbox(label="Add BOS token", value=False) + token_to_gen = gr.Number(label='Number of Tokens to generate', value=300, type=int) + min_token_to_gen = gr.Number(label='Min number of Tokens to generate', value=1, type=int) + temperature = gr.Slider(minimum=0.0, maximum=10.0, value=1.0, label='Temperature', step=0.1) + top_p = gr.Slider(minimum=0.0, maximum=1.0, step=0.02, value=0.9, label='Top P') + top_k = gr.Slider(minimum=0, maximum=10000, step=2, value=0, label='Top K') + repetition_penality = gr.Slider( + minimum=0.0, maximum=5.0, step=0.02, value=1.2, label='Repetition penalty' + ) + with gr.Column(scale=1, min_width=800): + input_prompt = gr.Textbox( + label="Input", + value="Ariel was playing basketball. 1 of her shots went in the hoop. 2 of her shots did not go in the hoop. 
How many shots were there in total?", + lines=5, + ) + output_box = gr.Textbox(value="", label="Output") + btn = gr.Button(value="Submit") + btn.click( + get_generation, + inputs=[ + input_prompt, + greedy_flag, + add_BOS, + token_to_gen, + min_token_to_gen, + temperature, + top_p, + top_k, + repetition_penality, + ], + outputs=[output_box], + ) + demo.launch(share=share, server_port=13570, server_name='0.0.0.0', auth=(username, password)) + + +def get_retro_demo(share, username, password): + with gr.Blocks(css="table, th, td { border: 1px solid blue; table-layout: fixed; width: 100%; }") as demo: + with gr.Row(): + with gr.Column(scale=2, width=200): + greedy_flag = gr.Checkbox(label="Greedy") + add_BOS = gr.Checkbox(label="Add BOS token", value=False) + token_to_gen = gr.Number(label='Number of Tokens to generate', value=300, type=int) + min_token_to_gen = gr.Number(label='Min number of Tokens to generate', value=1, type=int) + temperature = gr.Slider(minimum=0.0, maximum=10.0, value=1.0, label='Temperature', step=0.1) + top_p = gr.Slider(minimum=0.0, maximum=1.0, step=0.02, value=0.9, label='Top P') + top_k = gr.Slider(minimum=0, maximum=10000, step=2, value=0, label='Top K') + repetition_penality = gr.Slider( + minimum=0.0, maximum=5.0, step=0.02, value=1.2, label='Repetition penalty' + ) + k_neighbors = gr.Slider(minimum=0, maximum=50, step=1, value=2, label='Retrieved Documents') + weights = gr.Slider( + minimum=0.0, maximum=1.0, value=0.5, label='Weight for the first Retrieval', step=0.02 + ) + add_retrival_doc = gr.Textbox(label="Add New Retrieval Doc", value="", lines=5,) + add_EOS = gr.Checkbox(label="Add EOS token to Retrieval Doc", value=True) + with gr.Row(): + add_btn = gr.Button(value="Add") + reset_btn = gr.Button(value="Reset Index") + output_status = gr.Label(value='') + add_btn.click(add_doc, inputs=[add_retrival_doc, add_EOS], outputs=[output_status]) + reset_btn.click(reset_index, inputs=[], outputs=[output_status]) + + with gr.Column(scale=1, min_width=800): + input_prompt = gr.Textbox( + label="Input", + value="Ariel was playing basketball. 1 of her shots went in the hoop. 2 of her shots did not go in the hoop. 
How many shots were there in total?", + lines=5, + ) + output_box = gr.Textbox(value="", label="Output") + output_retrieval = gr.HTML() + btn = gr.Button(value="Submit") + btn.click( + get_retro_generation, + inputs=[ + input_prompt, + greedy_flag, + add_BOS, + token_to_gen, + min_token_to_gen, + temperature, + top_p, + top_k, + repetition_penality, + k_neighbors, + weights, + ], + outputs=[output_box, output_retrieval], + ) + demo.launch(share=share, server_port=13570, server_name='0.0.0.0', auth=(username, password)) diff --git a/nemo/collections/nlp/modules/common/text_generation_server.py b/nemo/collections/nlp/modules/common/text_generation_server.py index 48dac1022899..b2b98b2361d8 100644 --- a/nemo/collections/nlp/modules/common/text_generation_server.py +++ b/nemo/collections/nlp/modules/common/text_generation_server.py @@ -20,6 +20,7 @@ from flask import Flask, jsonify, request from flask_restful import Api, Resource +from nemo.collections.nlp.modules.common.text_generation_strategy import RetroModelTextGenerationStrategy from nemo.collections.nlp.modules.common.text_generation_utils import generate from nemo.utils import logging @@ -44,8 +45,9 @@ class MegatronGenerate(Resource): - def __init__(self, model): + def __init__(self, model, inference_strategy=None): self.model = model + self.inference_strategy = inference_strategy @staticmethod def send_do_generate(): @@ -142,11 +144,36 @@ def put(self): if min_tokens_to_generate < 0: return "min_tokens_to_generate must be an integer no less than 0" + neighbors = None + if "neighbors" in request.get_json(): + neighbors = request.get_json()["neighbors"] + if not isinstance(neighbors, int): + return "num of neighbors must be an integer no less than 0" + if neighbors < 0: + return "num of neighbors must be an integer no less than 0" + + weights = None + if "weights" in request.get_json(): + weights = request.get_json()["weights"] + if not (type(weights) == int or type(weights) == float): + return "weights must be a positive number less than or equal to 1.0" + if not (0.0 <= weights <= 1.0): + return "weights must be a positive number less than or equal to 1.0" + with lock: # Need to get lock to keep multiple threads from hitting code MegatronGenerate.send_do_generate() # Tell other ranks we're doing generate extra = {} if task_ids is not None: extra['task_ids'] = task_ids + if self.inference_strategy is not None: + extra['strategy'] = self.inference_strategy + # RETRO specific arguments + if isinstance(self.inference_strategy, RetroModelTextGenerationStrategy): + if neighbors is not None: + self.inference_strategy.update_neighbors(neighbors) + if weights is not None: + self.inference_strategy.update_weights([weights, 1 - weights]) + output = generate( self.model, sentences, @@ -166,14 +193,19 @@ def put(self): output[k] = output[k].tolist() if not all_probs: del output['full_logprob'] + + if self.inference_strategy is not None: + if isinstance(self.inference_strategy, RetroModelTextGenerationStrategy): + retrieved_doc = self.inference_strategy.retrieved_text + output['retrieved'] = retrieved_doc return jsonify(output) class MegatronServer(object): - def __init__(self, model): + def __init__(self, model, inference_strategy=None): self.app = Flask(__name__, static_url_path='') api = Api(self.app) - api.add_resource(MegatronGenerate, '/generate', resource_class_args=[model]) + api.add_resource(MegatronGenerate, '/generate', resource_class_args=[model, inference_strategy]) def run(self, url, port=5000): self.app.run(url, threaded=True, 
port=port, debug=False) diff --git a/nemo/collections/nlp/modules/common/text_generation_strategy.py b/nemo/collections/nlp/modules/common/text_generation_strategy.py index 1bb9c3e41014..061e5b56316b 100644 --- a/nemo/collections/nlp/modules/common/text_generation_strategy.py +++ b/nemo/collections/nlp/modules/common/text_generation_strategy.py @@ -17,7 +17,12 @@ import torch -from nemo.collections.nlp.modules.common.megatron.retrieval_service import FaissRetrievalService +from nemo.collections.nlp.modules.common.megatron.retrieval_service import ( + ComboRetrievalService, + DynamicFaissRetrievalService, + FaissRetrievalService, + start_sentence_bert_server, +) from nemo.collections.nlp.modules.common.megatron.utils import get_ltor_masks_and_position_ids try: @@ -87,7 +92,7 @@ def tokenize_batch(self, sentences, max_len, add_BOS): """ tokenizer = self.model.tokenizer if add_BOS: - context_tokens = [[tokenizer.eos_id] + tokenizer.text_to_ids(s) for s in sentences] + context_tokens = [[tokenizer.bos_id] + tokenizer.text_to_ids(s) for s in sentences] else: context_tokens = [tokenizer.text_to_ids(s) for s in sentences] context_tokens, context_lengths = pad_batch(context_tokens, tokenizer.eos_id, max_len) @@ -282,9 +287,96 @@ def __init__(self, model, **args): super().__init__(model) self.forward_model = self.model.model self.frequent_query = args['frequent_query'] - del args['frequent_query'] + self.pad_token_for_retrieval = args['pad_tokens'] self.neighbors = args['neighbors'] - self.service = FaissRetrievalService(tokenizer=self.model.tokenizer, **args) + self.store_retrieved = args['store_retrieved'] + weights = args['weights'] + # start the sentence bert server + start_sentence_bert_server(tokenizer=self.model.tokenizer, **args['sentence_bert']) + services = [] + for service_conf in args['services']: + if service_conf['type'] == 'FaissRetrievalService': + del service_conf['type'] + service = FaissRetrievalService(tokenizer=self.model.tokenizer, **service_conf) + services.append(service) + elif service_conf['type'] == 'DynamicFaissRetrievalService': + del service_conf['type'] + service = DynamicFaissRetrievalService(tokenizer=self.model.tokenizer, **service_conf) + services.append(service) + else: + raise ValueError(f'no such service {service_conf["type"]} implemented') + self.service = ComboRetrievalService(retrieval_services=services, weights=weights) + self.retrieved = [] + self.retrieved_text = [] + self.chunk_size = self.service.chunk_size + + def update_neighbors(self, neighbors): + # dynamically change the number of neighbors during the query + self.neighbors = neighbors + + def update_weights(self, weights): + # dynamically change the weights between different retrieval services + self.service.update_weights(weights) + + def tokenize_batch(self, sentences, max_len, add_BOS): + """ + convert the sentences into lists of tokens, pad them to the same length, add bos tokens if it is needed + Args: + sentences (List[str]): list of input sentences in str format. + max_len (int): max number of tokens to generate. + add_BOS (bool): whether to add the BOS token at the beginning + Returns: + Tuple[torch.Tensor], the tokenized and padded torch tensor and the token context length tensor. 
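An illustrative sketch (toy token ids, assumed chunk_size 64 and pad id 0) of the retrieval-specific padding described in the docstring above: prompts shorter than one chunk are left-padded with the pad token so that a full chunk is available for the first retrieval query, before the usual batch padding is applied.

```python
# Toy illustration of the pad_token_for_retrieval branch implemented below:
# short prompts are left-padded up to one chunk before the usual batch padding.
chunk_size, pad_id = 64, 0
context_tokens = [[11, 12, 13], list(range(1, 71))]  # a 3-token and a 70-token prompt

padded = []
for line in context_tokens:
    if len(line) < chunk_size:
        padded.append([pad_id] * (chunk_size - len(line)) + line)  # left padding
    else:
        padded.append(line)

print([len(p) for p in padded])  # [64, 70]
```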
+ """ + tokenizer = self.model.tokenizer + if add_BOS: + context_tokens = [[tokenizer.bos_id] + tokenizer.text_to_ids(s) for s in sentences] + else: + context_tokens = [tokenizer.text_to_ids(s) for s in sentences] + if self.pad_token_for_retrieval: + padded = [] + for line in context_tokens: + if len(line) < self.chunk_size: + pad_len = self.chunk_size - len(line) + padded.append([tokenizer.pad_id] * pad_len + line) + else: + padded.append(line) + context_tokens = padded + context_tokens, context_lengths = pad_batch(context_tokens, tokenizer.eos_id, max_len) + context_tokens_tensor = torch.cuda.LongTensor(context_tokens) + context_length_tensor = torch.cuda.LongTensor(context_lengths) + return context_tokens_tensor, context_length_tensor + + def tokenize_batch_with_context_and_completion(self, sentences, max_len, add_BOS): + """ + convert the sentences into lists of tokens, pad them to the same length, add bos tokens if it is needed + Args: + sentences (List[str]): list of input sentences in str format. + max_len (int): max number of tokens to generate. + add_BOS (bool): whether to add the BOS token at the beginning + Returns: + Tuple[torch.Tensor], the tokenized and padded torch tensor and the token context length tensor. + """ + tokenizer = self.model.tokenizer + if add_BOS: + context_tokens = [ + [[tokenizer.bos_id] + tokenizer.text_to_ids(s[0]), tokenizer.text_to_ids(s[1])] for s in sentences + ] + else: + context_tokens = [[tokenizer.text_to_ids(s[0]), tokenizer.text_to_ids(s[1])] for s in sentences] + if self.pad_token_for_retrieval: + padded = [] + for line in context_tokens: + if len(line[0]) < self.chunk_size: + pad_len = self.chunk_size - len(line[0]) + padded.append([tokenizer.pad_id] * pad_len + line[0] + line[1]) + else: + padded.append(line[0] + line[1]) + context_tokens = padded + context_tokens, context_lengths = pad_batch(context_tokens, tokenizer.eos_id, max_len) + context_tokens_tensor = torch.cuda.LongTensor(context_tokens) + context_length_tensor = torch.cuda.LongTensor(context_lengths) + return context_tokens_tensor, context_length_tensor def clip_max_len(self, maxlen: int) -> int: """ clip the max len based on the LM model max sequence length""" @@ -292,8 +384,21 @@ def clip_max_len(self, maxlen: int) -> int: maxlen = self.model.cfg.encoder_seq_length + 1 return maxlen + def _store_retrieved(self, tokens, neighbors): + tokenizer = self.model.tokenizer + for batch_id in range(len(tokens)): + item = {} + query_text = tokenizer.ids_to_text(tokens[batch_id]) + item['query'] = query_text + item['neighbors'] = [] + for context_id in range(len(neighbors[batch_id])): + neighbor_text = tokenizer.ids_to_text(neighbors[batch_id][context_id]) + item['neighbors'].append(neighbor_text) + self.retrieved_text.append(item) + def init_batch(self, context_tokens: torch.Tensor, context_length: int): self.retrieved = [] + self.retrieved_text = [] """initialize the batch data before the inference steps.""" # Move to GPU. 
tokenizer = self.model.tokenizer @@ -302,7 +407,9 @@ def init_batch(self, context_tokens: torch.Tensor, context_length: int): for i in range(0, context_length, 64): if i > 0: tokens = context_tokens[:, i - 64 : i] - chunks = self.service.get_knn(tokens) + chunks = self.service.get_knn(tokens, self.neighbors) + if self.store_retrieved: + self._store_retrieved(tokens, chunks) self.retrieved.append(chunks) def prepare_batch_at_step( @@ -313,11 +420,15 @@ def prepare_batch_at_step( if context_length % 64 == 0: # added a new retrieval context token_context = tokens[:, context_length - 64 : context_length] - chunks = self.service.get_knn(token_context) + chunks = self.service.get_knn(token_context, self.neighbors) + if self.store_retrieved: + self._store_retrieved(token_context, chunks) self.retrieved.append(chunks) elif self.frequent_query and len(self.retrieved) > 0: token_context = tokens[:, context_length - 64 : context_length] - chunks = self.service.get_knn(token_context) + chunks = self.service.get_knn(token_context, self.neighbors) + if self.store_retrieved: + self._store_retrieved(token_context, chunks) self.retrieved[-1] = chunks # types2use = None @@ -349,7 +460,11 @@ def prepare_batch_at_step( [set_inference_key_value_memory] * micro_batch_size, device=torch.cuda.current_device() ) len_array = torch.tensor([maxlen] * micro_batch_size, device=torch.cuda.current_device()) - neighbors_array = torch.tensor([self.neighbors] * micro_batch_size, device=torch.cuda.current_device()) + if self.neighbors == 0: + # no retrieval, use 1 padding + neighbors_array = torch.tensor([1] * micro_batch_size, device=torch.cuda.current_device()) + else: + neighbors_array = torch.tensor([self.neighbors] * micro_batch_size, device=torch.cuda.current_device()) batch = [ tokens2use, @@ -373,7 +488,7 @@ def model_inference_strategy_dispatcher(model, **args): if isinstance(model, MegatronGPTPromptLearningModel): return PromptLearningModelTextGenerationStrategy(model, **args) - if isinstance(model, MegatronGPTModel): + elif isinstance(model, MegatronGPTModel): return GPTModelTextGenerationStrategy(model) elif isinstance(model, MegatronRetrievalModel): return RetroModelTextGenerationStrategy(model, **args) diff --git a/nemo/collections/nlp/modules/common/text_generation_utils.py b/nemo/collections/nlp/modules/common/text_generation_utils.py index aeeedd24148f..983203c51965 100644 --- a/nemo/collections/nlp/modules/common/text_generation_utils.py +++ b/nemo/collections/nlp/modules/common/text_generation_utils.py @@ -16,6 +16,7 @@ from collections.abc import Iterable +import numpy as np import torch import torch.nn.functional as F @@ -213,6 +214,19 @@ def repetition_penalty(logits, repetition_penalty, used_tokens): return logits +def get_model_parallel_src_rank(): + """Calculate the global rank corresponding to the first local rank + in the model parallel group.""" + world_size = torch.distributed.get_world_size() + all_ranks = np.arange(world_size) + tp_size = parallel_state.get_tensor_model_parallel_world_size() + pp_size = parallel_state.get_pipeline_model_parallel_world_size() + # [pipeline dim, data parallel, tensor dim] + all_ranks = all_ranks.reshape(pp_size, -1, tp_size) + dp_rank = parallel_state.get_data_parallel_rank() + return all_ranks[:, dp_rank, :].min() + + def send_generate_info( context_tokens_tensor, context_length_tensor, @@ -228,6 +242,8 @@ def send_generate_info( """ Needs to be synced up with receive_generate_info """ + model_parallel_group = parallel_state.get_model_parallel_group() + src = 
get_model_parallel_src_rank() # Send the sizes of the tensors input_info = [ context_tokens_tensor.size(0), # batch_size @@ -242,19 +258,21 @@ def send_generate_info( min_tokens_to_generate, ] input_info_tensor = torch.cuda.FloatTensor(input_info) - torch.distributed.broadcast(input_info_tensor, 0) + torch.distributed.broadcast(input_info_tensor, src, model_parallel_group) # Send variables to all ranks - torch.distributed.broadcast(context_length_tensor, 0) - torch.distributed.broadcast(context_tokens_tensor, 0) + torch.distributed.broadcast(context_length_tensor, src, model_parallel_group) + torch.distributed.broadcast(context_tokens_tensor, src, model_parallel_group) def receive_generate_info(): """ Needs to be synced up with send_generate_info """ + model_parallel_group = parallel_state.get_model_parallel_group() + src = get_model_parallel_src_rank() input_info_tensor = torch.empty(10, dtype=torch.float32, device=torch.cuda.current_device()) - torch.distributed.broadcast(input_info_tensor, 0) + torch.distributed.broadcast(input_info_tensor, src, model_parallel_group) batch_size = int(input_info_tensor[0].item()) seq_len = int(input_info_tensor[1].item()) tokens_to_generate = int(input_info_tensor[2].item()) @@ -269,8 +287,8 @@ def receive_generate_info(): context_length_tensor = torch.empty(batch_size, dtype=torch.int64, device=torch.cuda.current_device()) context_tokens_tensor = torch.empty(batch_size, seq_len, dtype=torch.int64, device=torch.cuda.current_device()) # Send variables to all ranks - torch.distributed.broadcast(context_length_tensor, 0) - torch.distributed.broadcast(context_tokens_tensor, 0) + torch.distributed.broadcast(context_length_tensor, src, model_parallel_group) + torch.distributed.broadcast(context_tokens_tensor, src, model_parallel_group) return ( context_length_tensor, @@ -408,7 +426,7 @@ def generate( else: inference_strategy = model_inference_strategy_dispatcher(model, **strategy_args) tokenizer = model.tokenizer - if torch.distributed.get_rank() == 0: + if torch.distributed.get_rank() == get_model_parallel_src_rank(): if isinstance(inputs, tuple): context_tokens_tensor, context_length_tensor = inputs else: diff --git a/requirements/requirements_nlp.txt b/requirements/requirements_nlp.txt index b9e8d6a9d726..fa7f232c1a41 100644 --- a/requirements/requirements_nlp.txt +++ b/requirements/requirements_nlp.txt @@ -5,6 +5,7 @@ fasttext flask_restful ftfy gdown +gradio==3.4.0 h5py ijson inflect diff --git a/scripts/nlp_language_modeling/build_retrieval_index.py b/scripts/nlp_language_modeling/build_retrieval_index.py index e48e08d72fa9..30dfacf77a08 100644 --- a/scripts/nlp_language_modeling/build_retrieval_index.py +++ b/scripts/nlp_language_modeling/build_retrieval_index.py @@ -149,7 +149,7 @@ def process_sentence_chunks( logging.info(f"Use {use_num_docs} out of {num_docs} docs to build index") total_chunks = ds._index._chunk_id_start[min(use_num_docs, num_docs - 1)] logging.info(f"{total_chunks} chunks are used to build the index") - + start = 0 if stage is None or stage == 0: beg = time.time() # only prepare the warmup batch for stage None and stage 0 @@ -307,8 +307,18 @@ def get_emb(): faiss.write_index(index, args.output_file) logging.info(f'Write to {args.output_file}, Size of Index : {index.ntotal}') # consolidate it as one index - out_index = faiss.read_index(args.output_file) - faiss.write_index(out_index, args.output_file) + if args.devices is None or not torch.cuda.is_available(): + device_list = None + else: + device_list = ['cuda:' + str(device) for 
device in args.devices.split(',')] + index = faiss.read_index(args.output_file) + co = faiss.GpuMultipleClonerOptions() + co.useFloat16 = True + co.usePrecomputed = False + co.shard = True + index = faiss.index_cpu_to_all_gpus(index, co, ngpu=len(device_list)) + index = faiss.index_gpu_to_cpu(index) + faiss.write_index(index, args.output_file) sys.exit(0) model = SentenceTransformer(args.sentence_transformer_model) From f02c88f637c8c0f771f8993fcc1a0df6511d4540 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 15:55:03 -0800 Subject: [PATCH 017/154] Add align function docstrings and make most args optional Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 56 +++++++++++++------------ tools/nemo_forced_aligner/align.py | 63 ++++++++++++++++++++++------- 2 files changed, 76 insertions(+), 43 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 44dce0edef8e..fa1743182c44 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -2,47 +2,45 @@ A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based models. -## How do I use NeMo Forced Aligner? -To use NFA, all you need to provide is a correct NeMo manifest (with 'audio_filepath' and 'text' fields). - -Call the align function in align.py, specifying the parameters as follows: +# Example script -* `model_name`: Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). You can provide a pretrained model name or a path to a saved '.nemo' file. +```python +from align import align -> Note: If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely OOM. +if __name__ == "__main__": + align( + manifest_filepath=, + output_ctm_folder=, + ) -* `model_downsample_factor`: should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet. +``` -* `manifest_filepath`: The path to the manifest of the data you want to align. +## How do I use NeMo Forced Aligner? +To use NFA, all you need to provide is a correct NeMo manifest (with 'audio_filepath' and 'text' fields). -* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called /.ctm. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `utt_id_extractor_func`. +Call the align function in align.py, specifying the parameters as follows: -* `grouping_for_ctm`: A string, either 'word' or 'basetoken'. 'basetoken' can mean either a token or character, depending on the model used for alignment. If you select 'basetoken' - the code will output a CTM with alignments at the token/character level. If you select 'word' - the code will group the tokens/characters into words. +* `manifest_filepath`: The path to the manifest of the data you want to align. -* [OPTIONAL] `utt_id_extractor_func`: The function mapping the audio_filepath to the utt_id that will be used to save CTM files. +* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. 
-* `audio_sr`: The sample rate of your audio +* [OPTIONAL] `model_name`: Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). You can provide a pretrained model name or a path to a saved '.nemo' file. (Default: "stt_en_citrinet_1024_gamma_0_25") +> Note: If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. -* `device`: The device where you want your ASR Model to be loaded into. +* [OPTIONAL] `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet (Default: 8, to match with the default model_name "stt_en_citrinet_1024_gamma_0_25"). -* `batch_size`: The batch_size you wish to use for transcription and alignment. +* [OPTIONAL] `grouping_for_ctm`: A string, either 'word' or 'basetoken'. 'basetoken' can mean either a token or character, depending on the model used for alignment. If you select 'basetoken' - the code will output a CTM with alignments at the token/character level. If you select 'word' - the code will group the tokens/characters into words (Default: "word"). +* [OPTIONAL] `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath) -# Example script +* [OPTIONAL] `audio_sr`: The sample rate (in Hz) of your audio. (Default: 16000) -```python -from align import align +* [OPTIONAL] `device`: The device that will be used for generating log-probs and doing Viterbi decoding. (Default: 'cpu'). -if __name__ == "__main__": - align( - model_name="stt_en_citrinet_1024_gamma_0_25", - model_downsample_factor=8, - manifest_filepath=, - output_ctm_folder=, - grouping_for_ctm="word", # either "word" or "basetoken" - audio_sr=16000, # the sampling rate of your audio files - device="cuda:0", # the device that the ASR model will be loaded into - batch_size=1, - ) +* [OPTIONAL] `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). -``` \ No newline at end of file +# CTM file format +For each utterance specified in a line of `manifest_filepath`, one CTM file will be generated at the location of `/.ctm`. +Each CTM file will contain lines of the format: +` 1 `. +Note the second item in the line (the 'channel ID', which is required by the CTM file format) is always 1, as NFA operates on single channel audio. diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index f5bf9ef4b8a8..a157b6000db7 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -103,26 +103,61 @@ def align_batch(data, model): def align( - manifest_filepath, - model_name, - model_downsample_factor, - output_ctm_folder, - grouping_for_ctm, - n_parts_for_ctm_id=1, - audio_sr=16000, - device="cuda:0", - batch_size=1, + manifest_filepath: str, + output_ctm_folder: str, + model_name: str = "stt_en_citrinet_1024_gamma_0_25", + model_downsample_factor: int = 8, + grouping_for_ctm: str = "word", + n_parts_for_ctm_id: int = 1, + audio_sr: int = 16000, + device: str = "cpu", + batch_size: int = 1, ): """ - Function that does alignment of utterances in `manifest_filepath`. - Results are saved in ctm files in `output_ctm_folder`. 
- Returns the most recent alignment, v_matrix, log_probs_reordered, in case you want to inspect them - in a parent function. + Function that does alignment of utterances in manifest_filepath. + Results are saved in ctm files in output_ctm_folder. + + Args: + manifest_filepath: filepath to the manifest of the data you want to align. + output_ctm_folder: the folder where output CTM files will be saved. + model_name: string specifying a NeMo ASR model to use for generating the log-probs + which we will use to do alignment. + If the string ends with '.nemo', the code will treat `model_name` as a local filepath + from which it will attempt to restore a NeMo model. + If the string does not end with '.nemo', the code will attempt to download a model + with name `model_name` from NGC. + model_downsample_factor: an int indicating the downsample factor of the ASR model, ie the ratio of input + timesteps to output timesteps. + If the ASR model is a QuartzNet model, its downsample factor is 2. + If the ASR model is a Conformer CTC model, its downsample factor is 4. + If the ASR model is a Citirnet model, its downsample factor is 8. + grouping_for_ctm: a string, either 'word' or 'basetoken'. + If 'basetoken', the CTM files will contain timestamps for either the tokens or + characters in the ground truth (depending on whether it is a token-based or + character-based model). + If 'word', the basetokens will be grouped into words, and the CTM files + will contain word timestamps instead of basetoken timestamps. + n_parts_for_ctm_id: int specifying how many how many of the 'parts' of the audio_filepath + we will use (starting from the final part of the audio_filepath) to determine the + utt_id that will be used in the CTM files. + e.g. if audio_filepath is "/a/b/c/d/e.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e" + e.g. if audio_filepath is "/a/b/c/d/e.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e" + e.g. if audio_filepath is "/a/b/c/d/e.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e" + audio_sr: int specifying the sample rate of your audio files. + device: string specifying the device that will be used for generating log-probs and doing + Viterbi decoding. The string needs to be in a format recognized by torch.device() + batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. 
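Not part of the aligner itself, but a rough sketch of how `model_downsample_factor` relates model output frames to audio time, assuming the 10 ms feature hop used by the QuartzNet/Conformer/Citrinet configs mentioned above; this can help sanity-check CTM timestamps.

```python
# Rough, assumption-laden sketch: converting an ASR output frame index to seconds,
# assuming a 10 ms feature hop together with the documented model_downsample_factor.
def output_frame_to_seconds(frame_idx: int, model_downsample_factor: int, feature_hop_s: float = 0.01) -> float:
    return frame_idx * model_downsample_factor * feature_hop_s

# Citrinet (downsample factor 8): output frame 125 corresponds to roughly 10 s into the audio.
print(output_frame_to_seconds(125, 8))  # 10.0
```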
+ + + Returns: + alignment: + v_matrix: + log_probs_reordered: """ # load model device = torch.device(device) - if ".nemo" in model_name: + if model_name.endswith('.nemo'): model = ASRModel.restore_from(model_name, map_location=device) else: model = ASRModel.from_pretrained(model_name, map_location=device) From 698305b658474c9a8ef159e3392c512b782ba18c Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 16:11:39 -0800 Subject: [PATCH 018/154] Remove redundant returns of viterbi and log probs matrices Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index a157b6000db7..2ea8832bd3e9 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -36,7 +36,7 @@ def viterbi_decoding(log_probs, y, T, U): tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) This is returned in case you want to examine it in a parent function """ - B, T_max, V = log_probs.shape + B, T_max, _ = log_probs.shape U_max = y.shape[1] device = log_probs.device @@ -92,14 +92,14 @@ def viterbi_decoding(log_probs, y, T, U): alignment_b.insert(0, int(backpointers[b, t, alignment_b[0]])) alignments.append(alignment_b) - return alignments, v_matrix, log_probs_reordered + return alignments def align_batch(data, model): log_probs, y, T, U = get_log_probs_y_T_U(data, model) - alignments, v_matrix, log_probs_reordered = viterbi_decoding(log_probs, y, T, U) + alignments = viterbi_decoding(log_probs, y, T, U) - return alignments, v_matrix, log_probs_reordered + return alignments def align( @@ -148,11 +148,6 @@ def align( Viterbi decoding. The string needs to be in a format recognized by torch.device() batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. - - Returns: - alignment: - v_matrix: - log_probs_reordered: """ # load model @@ -181,7 +176,7 @@ def align( for start, end in zip(starts, ends): data = get_manifest_lines(manifest_filepath, start, end) - alignments, v_matrix, log_probs_reordered = align_batch(data, model) + alignments = align_batch(data, model) if grouping_for_ctm == "basetoken": make_basetoken_ctm( @@ -196,4 +191,4 @@ def align( else: raise ValueError(f"Unexpected value for grouping_for_ctm: {grouping_for_ctm}") - return alignments, v_matrix, log_probs_reordered + return None From 4fa2fd128e23c8567f966a383cb8ef20fa764b9c Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 16:19:50 -0800 Subject: [PATCH 019/154] Rename h# to Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index 65337c5a3c23..198041eb8e1c 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -212,12 +212,12 @@ def make_word_ctm( for manifest_line, alignment in zip(data, alignments): # make 'words_info' - a list of dictionaries: # e.g. [ - # {"word": "h#", "u_start": 0, "u_end": 0, "t_start": 0, "t_end", 0 }, + # {"word": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end", 0 }, # {"word": "she", "u_start": 1, "u_end": 5, "t_start": 1, "t_end", 5 }, # {"word": "had", "u_start": 7, "u_end": 10, "t_start": 6, "t_end", 8 } # ... 
# ] - words_info = [{"word": "h#", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] + words_info = [{"word": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] u_counter = 1 if " " not in model.decoder.vocabulary: for word in manifest_line["text"].split(" "): From 7aad56023582513622f5d9cef99388b741e155eb Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 19:12:50 -0800 Subject: [PATCH 020/154] Update manifest format description in README Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 13 +++++++++++-- tools/nemo_forced_aligner/align.py | 3 ++- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index fa1743182c44..8f88f0010033 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -20,7 +20,7 @@ To use NFA, all you need to provide is a correct NeMo manifest (with 'audio_file Call the align function in align.py, specifying the parameters as follows: -* `manifest_filepath`: The path to the manifest of the data you want to align. +* `manifest_filepath`: The path to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. @@ -39,7 +39,16 @@ Call the align function in align.py, specifying the parameters as follows: * [OPTIONAL] `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). -# CTM file format + +# Input manifest file format +NFA needs to be provided with a 'manifest' file where each line specifies the "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: +```json +{"audio_filepath": "/path/to/audio.wav", "text": "the transcription of the utterance"} +``` +> Note: NFA does not require "duration" fields, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth "text" is. + + +# Output CTM file format For each utterance specified in a line of `manifest_filepath`, one CTM file will be generated at the location of `/.ctm`. Each CTM file will contain lines of the format: ` 1 `. diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 2ea8832bd3e9..e672e4928e58 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -118,7 +118,8 @@ def align( Results are saved in ctm files in output_ctm_folder. Args: - manifest_filepath: filepath to the manifest of the data you want to align. + manifest_filepath: filepath to the manifest of the data you want to align, + containing 'audio_filepath' and 'text' fields. output_ctm_folder: the folder where output CTM files will be saved. model_name: string specifying a NeMo ASR model to use for generating the log-probs which we will use to do alignment. 
From e42d121c6a7948a20746ce8af927a70d0b9d6880 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 9 Dec 2022 19:21:55 -0800 Subject: [PATCH 021/154] always remove any spaces from utt_id Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- tools/nemo_forced_aligner/align.py | 12 +++++++----- tools/nemo_forced_aligner/utils.py | 1 + 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 8f88f0010033..fc29a7633f78 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -31,7 +31,7 @@ Call the align function in align.py, specifying the parameters as follows: * [OPTIONAL] `grouping_for_ctm`: A string, either 'word' or 'basetoken'. 'basetoken' can mean either a token or character, depending on the model used for alignment. If you select 'basetoken' - the code will output a CTM with alignments at the token/character level. If you select 'word' - the code will group the tokens/characters into words (Default: "word"). -* [OPTIONAL] `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath) +* [OPTIONAL] `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be stripped away from the utt_id, so as not to change the number of space-separated elements in the CTM files. * [OPTIONAL] `audio_sr`: The sample rate (in Hz) of your audio. (Default: 16000) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index e672e4928e58..96b75a7a5ecf 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -138,12 +138,14 @@ def align( character-based model). If 'word', the basetokens will be grouped into words, and the CTM files will contain word timestamps instead of basetoken timestamps. - n_parts_for_ctm_id: int specifying how many how many of the 'parts' of the audio_filepath + n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the - utt_id that will be used in the CTM files. - e.g. if audio_filepath is "/a/b/c/d/e.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e" - e.g. if audio_filepath is "/a/b/c/d/e.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e" - e.g. if audio_filepath is "/a/b/c/d/e.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e" + utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath + will be stripped away from the utt_id, so as not to change the number of space-separated elements in the + CTM files. + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" audio_sr: int specifying the sample rate of your audio files. 
device: string specifying the device that will be used for generating log-probs and doing Viterbi decoding. The string needs to be in a format recognized by torch.device() diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index 198041eb8e1c..f3174c9bb963 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -121,6 +121,7 @@ def get_log_probs_y_T_U(data, model): def _get_utt_id(audio_filepath, n_parts_for_ctm_id): fp_parts = Path(audio_filepath).parts[-n_parts_for_ctm_id:] utt_id = Path("_".join(fp_parts)).stem + utt_id = "".join(utt_id.split()) # remove any spaces that may have been in the filepath return utt_id From 7d7e529d09e533579c752cc1d7b3c586a6c7c3f1 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 9 Dec 2022 22:47:01 -0800 Subject: [PATCH 022/154] Patch the hanging of threads on very large stderr (#5589) (#5590) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva --- nemo/core/utils/process_launcher/launcher.py | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/nemo/core/utils/process_launcher/launcher.py b/nemo/core/utils/process_launcher/launcher.py index 54d0cbd211ef..7be6c52801ad 100644 --- a/nemo/core/utils/process_launcher/launcher.py +++ b/nemo/core/utils/process_launcher/launcher.py @@ -240,25 +240,6 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in # Poll for samples in batch to finish. if len(subprocess_list) > 0: - finished_processes = [0] * len(subprocess_list) - - # Check if all processes are completed or not - # TODO: This is busy waiting, need to check if its really needed or we can do one time communicate() - while sum(finished_processes) < len(subprocess_list): - # Check all processes to make sure they have a retcode (doesnt matter yet if 0 or not) - for proc_idx, proc in enumerate(subprocess_list): - # poll() is cheaper op than communicate() - retcode = proc.poll() - - if retcode is not None: - # Log that the process with some ID has finished - if finished_processes[proc_idx] == 0: - logging.info(f"Processed job : {len(ret) + proc_idx}") - - finished_processes[proc_idx] = 1 - - time.sleep(1.0) - # Process all the subprocess results for proc, res in zip(subprocess_list, results): # Wait until completion of process From d4aae2a512931ba9d47d9a8987f1be5cfcbdbc57 Mon Sep 17 00:00:00 2001 From: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Date: Fri, 9 Dec 2022 22:58:54 -0800 Subject: [PATCH 023/154] O2 style amp for gpt3 ptuning (#5246) * enable amp o2 plugin Signed-off-by: Jimmy Zhang * only create master param if param requires gradient Signed-off-by: Jimmy Zhang * remove pytorch autocast Signed-off-by: Jimmy Zhang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jimmy Zhang * Update optimizer_with_main_params.py Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> * create master grad only if param group requires grad Signed-off-by: Jimmy Zhang * fix grad scaler for pp > 1 Signed-off-by: Jimmy Zhang Signed-off-by: Jimmy Zhang Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena 
Rastorgueva --- .../megatron_gpt_prompt_learning.py | 28 ++++++++++++------- .../megatron_gpt_prompt_learning_model.py | 22 +++++++-------- nemo/core/optim/optimizer_with_main_params.py | 7 +++-- 3 files changed, 32 insertions(+), 25 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py index 554178e7e95d..308fde5f519e 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py @@ -23,6 +23,7 @@ ) from nemo.collections.nlp.parts.nlp_overrides import ( GradScaler, + MegatronHalfPrecisionPlugin, NLPDDPStrategy, NLPSaveRestoreConnector, PipelineMixedPrecisionPlugin, @@ -47,18 +48,25 @@ def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") logging.info(f'\n{OmegaConf.to_yaml(cfg)}') + megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False) + plugins = [] strategy = NLPDDPStrategy(no_ddp_communication_hook=True, find_unused_parameters=False,) - if cfg.trainer.precision == 16: - scaler = GradScaler( - init_scale=cfg.model.get('native_amp_init_scale', 2 ** 32), - growth_interval=cfg.model.get('native_amp_growth_interval', 1000), - hysteresis=cfg.model.get('hysteresis', 2), - enabled=False - if cfg.model.pipeline_model_parallel_size > 1 - else True, # turn off the grad scale for pipeline parallel LM model - ) - plugins.append(PipelineMixedPrecisionPlugin(precision=cfg.trainer.precision, device='cuda', scaler=scaler)) + if cfg.trainer.precision in [16, 'bf16']: + scaler = None + if cfg.trainer.precision == 16: + scaler = GradScaler( + init_scale=cfg.model.get('native_amp_init_scale', 2 ** 32), + growth_interval=cfg.model.get('native_amp_growth_interval', 1000), + hysteresis=cfg.model.get('hysteresis', 2), + enabled=False + if cfg.model.pipeline_model_parallel_size > 1 + else True, # turn off the grad scale for pipeline parallel LM model + ) + if megatron_amp_o2: + plugins.append(MegatronHalfPrecisionPlugin(precision=cfg.trainer.precision, device='cuda', scaler=scaler)) + else: + plugins.append(PipelineMixedPrecisionPlugin(precision=cfg.trainer.precision, device='cuda', scaler=scaler)) if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py index febec3fe0da5..96f0649061dc 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py @@ -131,8 +131,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): override_config_path=frozen_model_cfg, ).to(dtype=self.autocast_dtype) - # TODO: Enable amp_o2 training - self.megatron_amp_o2 = False + self.megatron_amp_o2 = self.cfg.get('megatron_amp_O2', False) self.pipeline_parallel = self.cfg.get('pipeline_model_parallel_size', 1) > 1 self.tokenizer = self.frozen_model.tokenizer self.hidden_size = self.frozen_model.cfg.hidden_size @@ -455,16 +454,15 @@ def forward( inference_max_sequence_len=inference_max_sequence_len, ) else: - with torch.autocast(device_type="cuda", dtype=self.autocast_dtype): - output = self.frozen_model.model( - input_ids=None, - position_ids=None, - encoder_input=encoder_input, - attention_mask=attention_mask, - labels=labels, - 
set_inference_key_value_memory=set_inference_key_value_memory, - inference_max_sequence_len=inference_max_sequence_len, - ) + output = self.frozen_model.model( + input_ids=None, + position_ids=None, + encoder_input=encoder_input, + attention_mask=attention_mask, + labels=labels, + set_inference_key_value_memory=set_inference_key_value_memory, + inference_max_sequence_len=inference_max_sequence_len, + ) return output diff --git a/nemo/core/optim/optimizer_with_main_params.py b/nemo/core/optim/optimizer_with_main_params.py index d4721b3a71ce..fe7ed0e5bf1f 100644 --- a/nemo/core/optim/optimizer_with_main_params.py +++ b/nemo/core/optim/optimizer_with_main_params.py @@ -217,7 +217,8 @@ def __init__( num_elements[i] = num_elements.get(i, 0) + param.data.nelement() # Allocate gradient memory buffers for each data type - self._main_grad_buffers[i] = GradBucket(num_elements[i], self._grad_allreduce_chunk_size_mb) + if any(param.requires_grad for param in param_group['params']): + self._main_grad_buffers[i] = GradBucket(num_elements[i], self._grad_allreduce_chunk_size_mb) # Three groups of parameters: self.float16_groups = [] # original float16 parameters @@ -278,8 +279,8 @@ def __init__( ) # Add gradient accumulation hook for fp32 grad accumulation - if self._fp32_grad_accum: - # Expand so we get access to grad_fn. + if self._fp32_grad_accum and param.requires_grad: + # Expand so we get access to grad_fn param_tmp = param.expand_as(param) # Get the gradient accumulator function. grad_acc = param_tmp.grad_fn.next_functions[0][0] From 634cd2d941abee40810f28723e3247dc347c838a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 9 Dec 2022 23:13:56 -0800 Subject: [PATCH 024/154] Better patch hydra (#5591) (#5592) * Readd buffereing and thread drain to Hydra Launcher Signed-off-by: smajumdar * Readd buffereing and thread drain to Hydra Launcher Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- nemo/core/utils/process_launcher/launcher.py | 52 ++++++++++++++++++-- 1 file changed, 48 insertions(+), 4 deletions(-) diff --git a/nemo/core/utils/process_launcher/launcher.py b/nemo/core/utils/process_launcher/launcher.py index 7be6c52801ad..2fb7faa27a4d 100644 --- a/nemo/core/utils/process_launcher/launcher.py +++ b/nemo/core/utils/process_launcher/launcher.py @@ -17,6 +17,7 @@ import os import subprocess import sys +import threading import time from dataclasses import dataclass from pathlib import Path @@ -144,6 +145,13 @@ def execute_job( # https://stackoverflow.com/questions/39607172/python-subprocess-popen-poll-seems-to-hang-but-communicate-works proc = subprocess.Popen(cmd, stderr=subprocess.PIPE) + # Setup data thread for stderr + std_error_buffer = [] + # Trivial thread just reads lines from stdout into the list + drainerthread = threading.Thread(target=std_error_buffer.extend, args=(proc.stderr,)) + drainerthread.daemon = True + drainerthread.start() + # Construct JobReturn object for Hydra res = JobReturn() res.cfg = task_cfg @@ -152,7 +160,7 @@ def execute_job( res.working_dir = os.getcwd() res.return_value = None - return proc, res + return proc, res, 
(std_error_buffer, drainerthread) def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: int,) -> Sequence[JobReturn]: @@ -213,6 +221,10 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in subprocess_list = [] # Buffer of subprocess results = [] # Buffer of JobResult + # STD ERROR cache + std_error_buffers = [] # type: List[List[str]] + std_error_threads = [] # type: threading.Thread + # Run over all job combinations while job_idx < num_overrides: # Fill up subprocess buffer while its size is smaller than multiplex batch size @@ -222,7 +234,7 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in break # Submit a job as a new process - process, res = execute_job( + process, res, error_tup = execute_job( initial_job_idx + job_idx, overrides[job_idx], launcher.hydra_context, @@ -235,13 +247,45 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in subprocess_list.append(process) results.append(res) + # Manage stderror thread data + std_error_buffers.append(error_tup[0]) + std_error_threads.append(error_tup[1]) + job_idx += 1 gpu_idx += 1 # Poll for samples in batch to finish. if len(subprocess_list) > 0: + finished_processes = [0] * len(subprocess_list) + + # Check if all processes are completed or not + # This is busy waiting, this is actually quite necessary + # Turns out that when you do proc.communicate(), you block all other threads immediately. + # IE they may fill up their buffers entirely, and hang while they wait for the first thread + # who called communicate() to finish its work or crash. + # Effectively it entirely stops multiprocessing jobs or multiplexed runs. + # Must poll and busy wait to keep threads alive, along with drain the pipes with thread buffers. 
+ while sum(finished_processes) < len(subprocess_list): + # Check all processes to make sure they have a retcode (doesnt matter yet if 0 or not) + for proc_idx, proc in enumerate(subprocess_list): + # poll() is cheaper op than communicate() + retcode = proc.poll() + + if retcode is not None: + # Log that the process with some ID has finished + if finished_processes[proc_idx] == 0: + logging.info(f"Processed job : {len(ret) + proc_idx}") + + finished_processes[proc_idx] = 1 + + # Join this thread and merge its stderror buffer + std_error_threads[proc_idx].join() + std_error_buffers[proc_idx] = b''.join(std_error_buffers[proc_idx]) + + time.sleep(1.0) + # Process all the subprocess results - for proc, res in zip(subprocess_list, results): + for proc_idx, (proc, res) in enumerate(zip(subprocess_list, results)): # Wait until completion of process output, error = proc.communicate() @@ -260,7 +304,7 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in f"\nHyperparameter Arguments : {proc.args}\n" f"Process Return code : {proc.returncode}\n" f"Error Trace :\n" - f"{str(error, encoding='utf-8').encode('utf-8').decode('utf-8')}" + f"{str(std_error_buffers[proc_idx], encoding='utf-8').encode('utf-8').decode('utf-8')}" ) res.return_value = Exception(error_msg) res.status = JobStatus.FAILED From c32fdaae030d87b54c695d90ff37d669a056568d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 9 Dec 2022 23:42:53 -0800 Subject: [PATCH 025/154] Yet another fix with hydra multirun (#5594) (#5595) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva --- nemo/core/utils/process_launcher/launcher.py | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/nemo/core/utils/process_launcher/launcher.py b/nemo/core/utils/process_launcher/launcher.py index 2fb7faa27a4d..5d7031cfe8a1 100644 --- a/nemo/core/utils/process_launcher/launcher.py +++ b/nemo/core/utils/process_launcher/launcher.py @@ -279,8 +279,15 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in finished_processes[proc_idx] = 1 # Join this thread and merge its stderror buffer + proc.wait() std_error_threads[proc_idx].join() - std_error_buffers[proc_idx] = b''.join(std_error_buffers[proc_idx]) + error_data = std_error_buffers[proc_idx] + error_data = [ + str(data, encoding='utf-8').encode('utf-8').decode('utf-8').encode('utf-8') + for data in error_data + ] + + std_error_buffers[proc_idx] = error_data time.sleep(1.0) From 9adfbf81a25de908759c8d2180ea7e98c37f81cc Mon Sep 17 00:00:00 2001 From: Yi Dong <43824965+yidong72@users.noreply.github.com> Date: Mon, 12 Dec 2022 12:25:14 -0500 Subject: [PATCH 026/154] Add RETRO model documentation (#5578) * added retro doc Signed-off-by: Yi Dong * finish data part Signed-off-by: Yi Dong * added the data format Signed-off-by: Yi Dong * added training script Signed-off-by: Yi Dong * added training and evaluation steps Signed-off-by: Yi Dong * edit the text Signed-off-by: Yi Dong * added the images Signed-off-by: Yi Dong * fix beginning Signed-off-by: Yi Dong * fix the grammar Signed-off-by: Yi Dong * trim it down Signed-off-by: Yi Dong * add wandb option Signed-off-by: Yi Dong * add reference Signed-off-by: Yi Dong * fix path Signed-off-by: Yi Dong * added the parameters table Signed-off-by: Yi Dong * fix section Signed-off-by: Yi Dong Signed-off-by: Yi Dong Co-authored-by: Eric Harper 
Signed-off-by: Elena Rastorgueva --- docs/source/nlp/nemo_megatron/intro.rst | 2 + .../nlp/nemo_megatron/retro/images/arch.png | Bin 0 -> 78188 bytes .../nlp/nemo_megatron/retro/retro_model.rst | 470 ++++++++++++++++++ docs/source/nlp/nlp_all.bib | 36 ++ 4 files changed, 508 insertions(+) create mode 100644 docs/source/nlp/nemo_megatron/retro/images/arch.png create mode 100644 docs/source/nlp/nemo_megatron/retro/retro_model.rst diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst index b83d89d94ce8..bcc335fa6ee8 100644 --- a/docs/source/nlp/nemo_megatron/intro.rst +++ b/docs/source/nlp/nemo_megatron/intro.rst @@ -7,6 +7,7 @@ team at NVIDIA. NeMo Megatron supports several types of models: * GPT-style models (decoder only) * T5/BART/UL2-style models (encoder-decoder) * BERT-style models (encoder only) +* RETRO model (decoder only) @@ -22,6 +23,7 @@ team at NVIDIA. NeMo Megatron supports several types of models: batching parallelisms prompt_learning + retro/retro_model References diff --git a/docs/source/nlp/nemo_megatron/retro/images/arch.png b/docs/source/nlp/nemo_megatron/retro/images/arch.png new file mode 100644 index 0000000000000000000000000000000000000000..eca506a12ceaa0a26c317824412987cc22a9f299 GIT binary patch literal 78188 zcmeFZcT`i`*Df4X^hlATg7l&ulp>J^iLJ3t6EJ#xk0qH%2AT@LZ zg#c2e1PE1-79g|`NFaPWcs%F#-uL^)_{JUgj&c9EYYcJkoxRsud#<_GeC9Lf+AsBV zG?^K=7(gHp^PSr_?}I=mOhKTdjei^mu0*v2Uj_c7^SrOA210f7ECMIT>{Yc@L7kPd^-^6Am`3aRYPB^M(kHawrpnrz^=L8gc zM9t{mv)gp%MgKj^4*9>$0n+&2B22x~%U_AV1|=laP|hh7PY2pi9eh`UIsuA08~-D{*SffVg*~ersJ@teVXD+D5@Z~h~%6Y!=N zsERtLqz;0lQWBym7Wp}9X`G7hyEQP0XPtmMzASHB=O@7tqF_peSes8YC{&hC^*e33q#!Z1A(%2frSJM2%lCM?*Gg` z>1|FN`01T2&bqv(I3{&Dd~|c#wS)Co{hzB``?jfSA0k(>7LGqpQik$Vgrvhw8&CPz zzVbm%5nHmL2L|~q3jE;~HQlD$`d;W7mr~oLWunrbP9;OlQ9SZ<+o7$UPx6|8~L@4dz(u= zLXIC%vDNr{0SpN>zIVRxhz!3DkwJ?7`ik00yB_258>9Ti_a$!3-XHdTTm@A=B~|Z} z63BWg5OvB2db0uSdzD*lrA3c1Rk!YwwAp8bEi;1@uh!G4A9`oQA5iOlL^;;slX;!w z!NEK!6!VO8D`TOy#U5)UgLx0Td@%;86m!Hu8y6u-W1fRa&6=^|!88hzw)tv1F^*(8 z^8;g+Nndw{Q{`54F_`2)655D47K-KF$%nVUX3Y|x+{}BY<-?>S(xC_abl6;{<)7zHN-}8gM_=B2}xLN!I zJ6|@@ke5tpnUJr-v9+QDh9|i=y)}%kO|Uz$hu6YxHe5BbVduFV{w$cW!X{m)4{_f% z%5lxpmRGmGOr&n(sTI;QQhFP{BCg)`yp(~*fVt*Ev7FgpZrYMb8o#(5@l1;GHQ#8d zt7ka%G$}#Htj~%u5hwZSL*r%;i%$b%x&17E6#IMSOJ(0T!03NMcvN2-fJc~Rw|&|f z(PNBAvzqO>mlu1bibSEDKPHvt&FBWwgcnl%PR_155E+$3WZbZuT{hxa(hj{UbGjiqOJZ&=MyEcQpm zP1c|m#{psjK`IF!jo9q4VgK`Y#3iL1?5#2<<0Ca>cSL@ zG&mzs3LI38vx?wA*K)*7-6r!jUxtbweY^Y0V=jM2sk$S6+07VkqPbmo6lr^^L>WLwz=n(qNO=J4$0llHu;kz4A>ZtA4X2!BKty8QqBatxj!c zUB;#P8oeJ~86P*w3>SB+EfSiFgiN!4V?whPxp9i`ZEnfy_Ww9eFM6Wly+h`VU=4L@ z{S1BS8AYp1kyfahSR!6~*4kCHsidITu3HNl0KW)S7$_b(z0){8q|og0b4;(_3*CM{ zJ~z)XW>fet3qob}xuQ}zAzwG6D!{1K{<~c35z~hry~duQ7i}M&bg?yEs?4moZa|F2 z-Oc>m&uGMOE5fA730GX}qFx8c(=$q1%<~hyb(&a{-gQ~]@=e1mK9?KMhClS_SC zc#?>N2Bg|d*YK(14SZOH@iJ4tpYXK-F67HlGhT&Jr7dmf8B7+!H1O6e<;gm)Lq2(L z`3UHHfd#l5aEh|?en$}#48Cu$$T)AZ@aH9ex3%&^FY=bbZI+W2(HXftF9NJljYmNt%{PCe zIf1N>ap}pQo>Pg8O@B&qL9BY1D!|Xpt~o;<6*hmH$T$ZTGcJm+y*A*=Tro9(Hambk zK&bV##Y*qgIAo?2i`eNhSJ()OgFuV_07TY;se#7CcMe7U_w+;9UCic8w41Cee>yq8 ziCZSkEbAEzo3T0BmVeP7h6m)gARHi4%Yzw}LIZ{)2;mB)@XSdTALs(CZ#CI%FymbI zVi@E+-Z9E5uzOUGu|aq8EPzFSFU26%IB!?clICLHP=mJPSc9m`J2h>>4q^CD-QKu} zKXjh*R7X2BI1nn|6nQckJq=o_d?_@r^+S7ixu6vGFkjf28HFVWPrF($RHd;>SBe_dDgp=s5@+q%|gC{s&R#&GbE5dEGc0|L)rdLCTp>~j#<2|sq zxFOB8y~deJz=AUVp22!ISwrXxx%j7`ox)UNQ&Af>S`(Ky>3iz@E7sl~d39=(gMqDg8p`r!*=z`(3&FLCDB3bQ4j;mKP}qwfQY(web2kI0SMP7FeZJ=m^r#V-c3W#m%0#r6jM4$hBNkgJOpluiKc 
zSuxXy-nu5$ZM+SqO?_wJr6+4w$~gPykv^P(6Wu0Z-n;aL1)YJ#rTTa=gQgq=N>Po+ zf{%jhxx(^l@!9Lw@0|g;#?lXm9CaD+=Xbnai<0t1)J;1EYfG7fl*ZiuNR85Q$5TU8 zbF=C0Q~^F?Os+zjn#4mcFi znH+kmL|7{^g&Z;K?L3nXyE>VIRw{z6eIf10vI5P6FCcSf3I|-AMDmTA9?ghlkAJg9 zf+~~@TCcs`w93Dg4~28TuANm+bm_>37<-0OffZqW_z`|4s^b0N@IBuoW?c-j#2I$= zzj!(1E_q2VA*Ztr_Ir`#9$se`*A7q!QJM1$vtmRc$(hLM`RB4#s6snq`~<@C4&V5i zJE$T|iGwm^c_2U5^mU=3pF?N~JwxGMA||$v-o=(yMey!<5#q_Ofy^~hT9mw1Lw_Wu zl|k29hvNoSNL_&@+BQ3c+GJzRYfpIgyit?1!w1n(#^fj*IEc^wv3bCzj7DVQquaT{ zEBan5%b7nJwrt_MCVx>z{fgz~#0vr%8Cr>ZYPMLbOct&-Nz8z>b@tgHi`XH z&>>pyzew{G`qky|71H~6l4Zs?&-W5l(!z#E$S&DT`pitN>rIAkk)DIo^s-&e>2aA} zBcpW*GJS~R<@!Nn6aot2=}+=jR(W$*-#|bJQg-G|nc&4_2<*TzB6y4lTk6a339q`j zvl#QfBMeGTrIk*XOg5Sidvwo<(HXN5q~kw`9qM5bINnq%Ta1!2MQH!G2-iQ^s;1`a zBEw0l(=KITzyz=ZI2pqI(#ze)IZCtWw$@y0n#o~Sa=Hu-pz|TwZC&ZRu+9gmo!7LR zp-U*`q2bLvc(HGN^*d0G4AkDiAeFXu;qVlII50sMzx!c(I@T{*gpSfR@5Jo%w>% z3S+nW%%}HGZrFu~L&FL~NUTdtRza90>j)G=N zyTH-%6tHO_ZDt476EKNwR}6IrN00-bOgVaO`RS3Q6TZlbi+al#)9*aF6h|MMSsU_P z)KS4J?b2Wb4EY~X+Zji~>X;)s$GjYQ_X?0rq9yTtv^g*m} z;=c_Z$APUFC#AJfB zbo|>dv*v|6RXb?-f|{dj2G9)r$C%>-4j$sse#_0iYED{gPzU?8j4F1|ND;zA$yY>70U|h5!F(&P z`E{;{DdII+^qZybED9!!nYG#hp9J$Z%JQ08Gx#953A0*u}*SqF-)z z{pml=@o5<*QRQN9foc($ie_fJJxm6C-pka}c%|kRzwv3vl+R~)I6C{vIs}4Tj68QP10wHR zgg~~GYqTS^Ua2e6y4Sf!dn3NFbICqgV7|>bxO=sT^6%f@QJRY=6*AfEXneuadMkr{ z+g&ASw6;awQV#X0pnowBE>PiY*Ly<|ec9&0W0UWmgNetgbp>urx(!?qn5<4)x|HNY zIAyfr%zSMRbFSyrdX}Rrd$h2^mwc0++X8jX)?JxV$u>PSr*cTXygV5f*R=?F7#F6az`tvX7i$2zyba*j|^G3db z6$$C@CX^QnO)Ez-n+tdoO#Ie+qNQFe1z`3rN4+PM7v)!LrrpR5zmz*}WZ*?OvZktZ z+3KX|XmiW{D2F1%Fs4&GX2WInD=LbO*3CT4DcCJ5rcG_n5EX@vv_wLxP6GKy^d6xn4 z{JHu~%zP1TXt`!-YDe2;`RjX07k)$_UCFLbd!H#%kA9QGVtv%gUO=RQC z=+_*gO0r2CSm#paNO&c(UfahO`l3wF$#miaLA3}JUioz$h#Y?N?F^@6J4e32gDVPc zhPm{5*Y)g`5-Rs77T(lqljwB>m*fjcNT0v~g!(-|ZEbHp&7&0|pimdCkD*bt=sp6> zLJz@_YuDgfEZRqx*Zi5S*4430Wxai3S2W|{rGE#X(o9i&x2sQOU8X62yLENkZZEtd zdux&R3?ce@LQk_xcv8Da@C^vOq_YephS{Fy0!z+!+eHm-UO}9EM)FiSYoBr3c7uZ> zumm_X3$TSPm6*BqAja=>b%Rra-0m!uxDE

^S6@V$)yJ$rTn$a>EoN;n!U`ALU~j zluM_(HyTo6A@+AUpgh%J7u(v_gpOsCp>tu+aM%jQ7|VPh(Ff9(rU}sy$@`Lo60`VX z*R%&YBN5jlgnN-mc4sNCH)0n(;$`-#3!gl{C^gU3+c!^7bM(NZ5F(Kzr!6Dp?{AT_ z14F$>fiLO%0miNIX7u>`LU^V_@APr3-=u$L*|L~y!`AU)t?-wfhIALqIm|w|V>1^f z!|B_x2=7?6V>n;-Y@;nkS!HefD`88b)S52U37%KdOC=&(vw-;WCy>3=&|kaAp*IT7 zVH7YG6c~DEaLv)gMF}8Kj!g>&d1d&&Efz83t}#t~qP96Uh>vu;;*FSF_V(a~`HI%2 zi?G;BM0^Q|%7J|gntK>NPwaN`N_yjjq7?RYU5C*1n+qGckYy6)1 zs7$CK&409@CCqH)1RID{R5kB1+Lxo8M$KLrbId zsEaN0&4J!2IE#i2ro*9jW@VgHiRmWYd$9`!0TUbGwn6KwiSXa3!((wbM=HczXV3y2 zhCuQ;pd8p^@sn2F4$xcvY!O4}{@Ne^hb^-U(Hg99DS1|B|3#(b;8G$V?YeAnmIxY6 zX9FLd6_00vw8dyxwKX>@i%mG~)k?6#HNMu zt1v{LAGKYJ6{yBRMy&r5i2p%G^0+`BPwO|PVl!@%e6Jv&CiGvc2J=JAN{kvPmY~>= z?PeU%DYOeFYjY$8q|I%4IIaeae3wgp_;v(oXlORKqna@eAHLVtpZNQjurmg6rz5&b zpb(r@ui7?!y(^M7?w{VcU7X=m(wJx9xn**8K2puz?nt;Zd1_53Nua3M(4Fh%*w%R! zl`-cEgShBKYKsg`3lUwu)5@N6HB~>cxg3>YD=A}ydapwYNQeCKcHLEja6FP}#L{Z7 zp96AIW06Ez4;Vzd#=|)QyHw(_QhF1iPY~!{wvTVU-_}RVS^jv^;0WmHg|0jomSb;Q zBDNT|Q|V;AmaKhw>yXrMH@qYm+#q^|zZHipI!~yzO_e4%4K}{-6=>cvjtp1z1$hCj zxTi$p?{1et`}JoL{f?Y%$%TDC7-p^a$f{@($%czMQ6B;rN4k6K$G%Z4$%D7^(ld)! zGT40H89$3WqaEA-7?kE9L>EVFF|I)B{Z&#HG)u;0_ECf$dGm<|cJ|V;wha$3EJPdd zqs~dXc8dC_(WPGv4{YrwOdDjZ@uA%c|(myyX;LyS;6`+>P9x`A`d_3s+Q3_q_kU>BazX1BfZ)x z!(|NIIn^Bu5mC6>O*ig41M>dz=k_#f8yo(1Gm^^`EFA(H;Xo{o%9KlN7yY|U64%qU?;w_D23;q!%ZZYO6HhxvXWuk(p@M}7-a3;Oy|urzcj5eQK@pE`8@2)+Q~9*`CPEb&%RnL&Z;wS!U` zh+%86AyfD?Pc=%4Q$Vy4ad`c%WsB?)aN9MgVqz_Y1c;`*UKRs-Z+u#^q{NJEAN|yq zDz63C%JRKLPD`LN-)tklcw0qN1Eptcs<%+_MnQbmA3Tt*U^kvw{IbjBumX4!e_2c5 zUEbx4(6OK%?_Xq3Y|s(nJ5p5d*UsAc{VCh?V5ktFKViD2`U2Am zxhgt3{_@U(-|8Gqf$r423PwU~C63cM_h(8&od#6~Y%O7;@Og#5oJ`2)^=yM^3Gk(f zyOGSKyxr$dB57=wnoafF`v1-8F0c=+*r}l-#Us>yEf-AOs{qx&&JM$r@mml zh>?TT(sIcC&}?cmFECm|r+`8C#ewcZ*#ZX*FNQJJ@4T_U-gM65n#DE6o#GU$8F_d< z$sjoYov*-fsOAeZ2HbIh{V5Zoob}5v8O2!wcq7mXU;C@WLyi@j`d*K=NN2({{cQ34U^nT*LpnO*Sf#uMNzvXj{ru)PZWSB6}k6L%oqMA z!pUXP586k5zXReP;oFVS>u=sq_Jvm@w1&g%WB<_SUPZf&d3VwWi?i*LnOaGVBBKMi zW4~SnNvgozq&ymVD^pq7hYs)dE$+e9aBDbGSBG}gRd?2+giZf)j?J@Mc1u>6w08_u z{dj5;hRFr(3(3ih*t8a$mB9>IUD)tu_L@ET_2hoPg}npG(FT$vQ66UF=cuf_9n}Pc zsvK8FRG4n9xxCr^@w91YgwrOoiS_INqFc21;h4-b5zJo{$tCr>M(KBm-^&sFUX#_V zClG0t>w{$*_#P`H-u?n$yCp*rPu`zbp1dnn9!99LqKd3|488)yaoLh;^kDNHpWf@* zsd@Y15Nl3eR^mIkzJ+#`E6^)r$CeMzPE}ZtTSW0uBUN8~GG&UGinoN&b^(+g5tPHv zqVredF@kKEYBMGXZlz zgU(?Qox$WzuLTgWlnh-wgZ z+c#ykzl5!>V7bLrN4)etzeP5b$qf1C_T{qlpi|fzkhLO;xA)I6yu6SF&Tq zwCeYn=6})K^n{t^#`+o=0YSyDR+Ab3rq-(191O`zeOnD4_3S$Kz4{>d#<@mcbeYip zZseyTB-JMg#gWD{Y?P_|td}02fIv`w&scj&(|l_5ec|%uxdxT+KlHW$VccoQSq{D@=#4=4<7z&s4fKx9pk{DD!YKc*!`CtNLEKJUxINZ=`~ad?S%HhlKA^N?R2PUkhoCu~t|I)D+3!<}8&{JcAocys5ghc)Q&T{VXJgVB;N zA$6!TecQ23sLOAam$0}3L%r8-_yT|cLW+C&kh00F=s@Idm-g2=66x^NCxe7Gl8()gJZGz8{VFwuM?SR)k+Rg2KZ9L6n=`118rmNO^Q&b@ zhtst4OMG^A>t_A&=sCD}Y4a+2+3{L;|Mc{Cwvw5*6Kd-b(;wLEw>S-{cXw z?mQ)!$&t9l^#dg6rfs)K^;q-kQ5c@k&$vDLUUEgKh$<8tM6>)Vb}?kqujGI5QQS-& z`31Py4Xe=OC!#h~1&0!f8UV!EJ+V(70Gx=iWkqO6491EfwCT-4bI%w6MngYyiw)rD zWB(v1doGib^c)Oz=){v(*;c_kI}ql z%`MNA2t{iz#+8)pn(T4un(LT`L5MB(moPS2Au?UiR;0DOMl3QIdj+#0bQ%DlXFL;L z3)Ls&>%m;bL=gYU0&;`U%`+7+hQ-v&VRh#S&7sy>W&s*bu6%UE<29BO9Pvf6S-_p<+ItgP8_QNsm^g&$J zN1x%%3u6reKiPQ@lAP|*K^e3zjSPtE=j5iGVYsCqNOo-6F0^qL|Kna;?!^vo%fdgHMxIf`s zhD}oA`7~7CDvetX5uR7Hj!dck_}wtEOZe0`rw7GOfaq0|A^_RnUtN4Nuu1EFW@>?@ zjAn*;*)H+f9omfD=PRvP*~Sw;3Pb~OTtA{Uf`CUcSZRdyFRzT7veAPS8Ur-}>KAsK zJRGp!(c#bCP9nL&%eO0uzg;1Yj1 zjn=feM45u6M=Fv&nGMc3MmSIk8YcLta?^Pr`z!!j_*Qr?3+T*(+Pf_!AsydkrO_@5 zC~C1HJBm1f&#(hB7?4w+cnS{AWvH56c!TSKvrI^n8YeL`YeiWyKGOW!tkQ}4rir)0Jv5&^0hQH@( zy#R{paGxoA0(*wmTY$}HBYYXZKL}!9GGsJ`oJCIfb$%9I56Mnv`#;dakSTo6><%LH 
z=qVRvrrqD_)Rj#r3`} z%g^p#Gzbf}A^hq3O~Ei4&r(lV9D)`fthMKaAi!T`${quOGH5&26Mn@-q7ycI@THIo z?)Q=eS$9|baK#Y?$Nzw3J#(x78jj)!lmAn!E}R}F)Q3DnTDN4;u2*Qkl*mMs?TV&G*%Be*qk@n!@* zHh}(j?Zo>NcPx!qs)Up7swL%~C*RY9WgnGJgE*t~h`w0vRIn2101$Yes3WX2)i2fxxMAxSop zvi?1<6w$4szLLiXliwb64{VH@MiR2wkv2L-07NUvert1@#yv2xr{~pOO`6_Soi&_%QoVEhq9*;|JwxO z`<@c`r?bQzyljZ;b%7ai(vadvDSw0azCi+k0JewQ7PMMX)u@3TChSphb!I`%7pr|Y zYVGrvAHUhO9o|t!kbOlzvXuy7ZPZI&grkKZ^KkOZ&=wH^xjG?Xp~?zBw$P=G0*|=k zoF(C+zrUyFgxY&0zIG(^#QL=>Pml(cmuPFjnW`LSJSimD%>#)drIgQ!kntV2C2>X< z7VcOpm}@skiEb&Ed1@PwFhi;;(I_ngu-UtnRAUAHS~Y@X2#^!w1sp}c-!M1^35+1| zxNJbo;X29q3hpwOkDa4*;lj4({><;4J5*q46(T3a*6&NDgmob=WN(Blxv^+FL(ww6 z2c9@e8$gWXknFj>K6FCCCQqW`v&z2h(nl&>+-T3u`Y$_iWlv9`UUz<}Fn(8oVa^fn zqCGA0R-e@kIb-F`I$C1?d3b!=_d_SmsU9tV*DOzHRKR?AtvXvTtf;XP_`tU@A>n0$ ztm~DTy&1`kA1-RdU(Y@;Wj!e%p4%>Jy5VI&Fgl;sr8imaA^H8=Ms@4W0T<83jk%@` zM1bm(si~=FZk)6}gx_P;&eCq%wl^q<;dS`J-q_1E`Y z>OW7u&5$qMsXUCO!^^Tj^B&Gobv|kA82QRnZ<{1qBHfmQ1@|=S(eQBm`8d&dgLqr{ zR*D6vr#J&s-d32+J3rnPHQ03Y>^o88Le&jbHuAK4&2WC9!x8y9dQ=d6L19%}7~UY# zLETP8MddK3O4J!m9;dOE)Y(aML4e|_=gm4i*uz>mDeAd;C)FVy=&kzkb>R}pn$5Fy zQU*BPk+Iga5Da0=J|ky-qEiQ_Yh!5GgRjfIBdFFKd>(en5D&uGHzmE4f@KSMK1APb znDqQR92c?RIRoca-FZ}6S<{G^+niadLw0Mlkh^YDvdMaqzPX@Ui2KmCg=X5tOn&kvuM+n{Iy% z!c`Pu*y@^B1-Y~|=7+C6U?HVth8v#58p#s)&)}Bz2=5~{BnU?%C>F%92EHSs+&aP~ z6YVySbjAk20`TVJNdPteNkKrGr^_bptN=o4}HSk+D}Lkg1bO&FF_f zV$V7@#ELtV?^Vo z%KmH>@-5P%VmxTQ>{}=nNkIm7&C4uI9;tY?;ZuZ0GL~}7YbRDlJV(D;nE&pA#>J%= zUkM~}E>awuo*x;NaIt#Snbo-;nUPGQdN}8-#s~3kQmO{3l&#h~d$>GMXzBX}@mVL2 zmghzCtq<_L2~?@uwMHQf1mYG zoJI`4L(YwH9`kNBBD~|BC*=jr?_(|yewLfM^pCdGyDzm_S7Jhb1*Rq|mp#A*V5Xy5 zK6xrDJ*B_oWcBgJ_#X2@^X@~oHzHRkRp{pW)1%zo-F{3THDsB@$Dyj$B-hKMI%vq( zp&=gSVdvNHxxdZI;yfK=`c{+ml%}{&n`xVEn@gJ%`L454IY-gTWjjJ-`dSQK&ev#9 z2uPHK&r@Q2ssoaDqJnOtZTKZuWf~J=T)X1GuYZ>eDk~I8vn5wx=y_YYJjbZ2bk?VD zDBEQ=LoIkkaRq3@$(ILIn(yK&f(+Kx3p6iZy_EDFTUZk|A3PuOa)q)u4yb6|!|5zH zmo?MZZ4Zb6az8=K?3DDnbc0_naMFP9Sn5{+61RL9aMq^OqFWgP@7CU7h%jUricHR? zjLH%MtVW)6yWRGR3W&mR8HiuXj_zJ3y3b0IW#4{4lCBscz}UCueQaum0J6DzBSFh?oPY+mQP9JMw1 zY&Oc*_1<^3@%L#Z`3q9s}a@?1x!?}P+D_PQ|ke)A}dN{ugmT^xg;^> zVmL{3$x$bFwQyg9T99jjdhGIjDG}^o{DeA1Ha?2n#oe>@8fWY)$aT4x zBzE{hoXNK8*wanIsKlRcYMtaS`21rZUjJu6)RHbvC-?N}V{t}v9pWV;-(QR<5R*oA zf=L0NaO=U4K;qGMsmvq#cbqvsv53x^Jf)AWoQ*x&6f~lse`mlkTc6S;_PbD~OZ`+? zWs{h3pYv?XJ;V?GbOTGoS0h)5$KwmsDXYZC#k)8>L%HE`WTPZtoc_6?)m_h)*i0;&3AUWk7O-G2;**^7JbcPbxnMaqe>#bC z&(|;wt{KZ;jQ~OM^V(PvpVsQ$kY648dZr5~v^%jvX6))))lJdCAxN6ksl9KY( zMmg^H7Qr9<=}RtR8WWC{cpb1etmB>*H^}Cl$Tm{^Wn4IONtWO`6`fMZ=(SL{v5&V| z&e~z%QKQKj1HvIWFLRewZA{0~bil!1d#)-6I#Nzetzo($Iv-p1sJY9(Ves;((HR3_ z+Z#G;mNQOuwMo-hz4qCm@$pmZl!m`%-AN7`+_FWxUPqTAR%X7x!bo24kVT>lpm-nr nb0<3`bbpJP&;R!uwZCpqwNJdam=f$IDbw`_ :cite:`nlp-retro-borgeaud2021improving` by Deepmind. +The NeMo RETRO Model is an open-source implementation of the paper, and it has the following differences/features compared to Deepmind's proposed implementation: + +1. The NeMo RETRO Model is built on top of NeMo Megatron code, allowing for efficient training of large language models in a cluster environment. +2. The NeMo RETRO Model uses `Faiss `_ :cite:`nlp-retro-jegou2022faiss` as the K$N search library, which can be accelerated by GPUs. +3. The NeMo RETRO uses `RoPe relative positional encoding `_ :cite:`nlp-retro-su2021roformer`. +4. The NeMo RETRO uses `SentenceTransformers `_ :cite:`nlp-retro-reimers2019sentence` as the retriever encoder. +5. The NeMo RETRO supports `mu-Transfer `_ :cite:`nlp-retro-yang2022tensor`, allowing for scalable training of the RETRO model via Zero-Shot Hyperparameter Transfer. 
+ +Quick start +************ +Steps below demonstrate training and evaluating a NeMo RETRO model + +Data pre-processing +------------------- + +Step 1: Collect training data +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The RETRO model uses two types of data: training data, which typically consists of 64-token chunks, and retrieval data, which typically consists of 128-token chunks. +The training data is used to train the model, while the retrieval data is used to supplement the language model. +It's possible to use the same data for both training and retrieval, as long as duplicates are removed properly, as described below. +Both types of data are stored in a loose JSON format, with each line containing a single text sample. For example: + +.. code-block:: json + + {"src": "www.nvidia.com", "text": "The quick brown fox", "type": "Eng", "id": "0", "title": "First Part"} + {"src": "The Internet", "text": "jumps over the lazy dog", "type": "Eng", "id": "42", "title": "Second Part"} + +The name of the text field of the json can be changed by using the ``--json-key`` flag in ``preprocess_data_for_megatron.py``. The other metadata are optional and are not used in training. + +Step 2: Convert training data into memory map format +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The loose json is then processed into a binary format for training and retrieval. To convert the json into mmap, cached index file. +Set the ``--dataset-impl`` flag to `retmmap`, which is the memory map format dedicated for RETRO model. + +An example script to prepare data for RETRO training is: + +.. code-block:: bash + + python scripts/nlp_language_modeling/preprocess_data_for_megatron.py \ + --input=/dataset/pubmed_train.jsonl \ + --json-keys=text \ + --tokenizer-library=megatron \ + --apply-ftfy \ + --dataset-impl=retmmap \ + --merge-file=/dataset/gpt2-merges.txt \ + --vocab-file=/dataset/gpt2-vocab.json \ + --tokenizer-type=GPT2BPETokenizer \ + --output-prefix=/result/pubmed_train \ + --need-pad-id \ + --append-eod \ + --retrieval-db \ + --chunk_size=64 \ + --workers=48 + +The RETRO model processes chunked documents using 64 tokens as the default chunk size. The RETRO memory map dataset will add padding +tokens to the end of each document to make it a multiple of 64. The ``--need-pad-id`` argument adds a padding token to the tokenizer +if it doesn't already have one. The ``--append-eod`` argument controls whether to add ``end-of-document`` tokens to the preprocessed +data, and the ``--retrieval-db`` argument indicates whether to create a retrieval database for the preprocessed data. If ``--retrieval-db`` +is used, it will add an additional 64 padding tokens at the end of the document. The ``--chunk_size`` and ``--workers`` arguments +control the size of the data chunks to be processed and the number of worker processes to use, respectively. + +Following is the retro memory map index data format: + +.. 
list-table:: + :widths: 25 25 25 25 25 25 + + * - 'MMIDRET\x00\x00' (header 9 bytes) + - 1 (version 8 byte) + - dtype code :sup:`1` (1 byte) + - sentence count (8 byte) + - chunk size (8 byte) + - chunk count (8 byte) + * - retrieved db :sup:`2` (1 byte) + - number of tokens for each of sentences ( int32 array) + - start of sentence address in byte (int64 array) + - start of chunk id (int64 array) + - chunk id address in byte (int64 array) + - + +:sup:`1` 1: np.uint8, 2: np.int8, 3: np.int16, 4: np.int32, 5: np.int64, 6: np.float, 7: np.double, 8: np.uint16 + +:sup:`2` When building the indexed dataset, we pad each sentence to be a multiple of ``chunk_size`` with ``pad_id`` from the tokenizer. +The number of tokens for each sentence includes the padded token ids. For retrieval data, there is an extra ``chunk_size`` padding at +the end of each sentence, and the ``retrieved_db`` flag is set to True. However, the number of tokens for each sentence excludes this extra ``chunk_size`` padding. + +Following is the retro memory map binary data format: + +.. list-table:: + :widths: 65 + + * - token id array for sentence 0,1, 2 ... (dtype :sup:`3` array) + +:sup:`3` np.uint16 vocab_size < 65500 else np.int32 + +Step 3: Create Faiss index for retrieval data +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +After creating the memory map retrieval data binary file and index files, we can build a Faiss index that can quickly find the K-nearest neighbors of a given +chunk ID based on a query embedding vector. Because the retrieval data is typically very large, we break this process down into three steps. + +Step 3.1: Train the Faiss index structure +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In this step, it uses a subset of the retrieval data to train a empty Faiss index. An example script is: + +.. code-block:: bash + + python scripts/nlp_language_modeling/build_retrieval_index.py \ + --input_file=/result/pubmed_train_text_document \ + --tokenizer-library=megatron \ + --tokenizer-type=GPT2BPETokenizer \ + --merge-file=/dataset/gpt2-merges.txt \ + --vocab-file=/dataset/gpt2-vocab.json \ + --percent=1.0 \ + --sentence_transformer_model=all-mpnet-base-v2 \ + --batch_size=1024 \ + --train_index_size=2000000 \ + --workers=2 \ + --devices=0,1,2,3,4,5,6,7 \ + --stage=0 \ + --output_file=/result/pubmed_faiss_learn.index + +This command is used to build an empty Faiss index using the 2000000 training data in ``pubmed_train_text_document``. +The ``all-mpnet-base-v2`` sentence transformer model is used to encode the chunk tokens into an embedding vector. +The index will be saved in the result directory as ``pubmed_faiss_learn.index``. This command specifies using 8 GPUs to train the Faiss index. + +Step 3.2: Add retrieval data into sharding index +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This step adds all the retrieval data to the empty Faiss index created in the previous step. An example script is: + +.. 
code-block:: bash + + python scripts/nlp_language_modeling/build_retrieval_index.py \ + --input_file=/result/pubmed_train_text_document \ + --tokenizer-library=megatron \ + --tokenizer-type=GPT2BPETokenizer \ + --merge-file=/dataset/gpt2-merges.txt \ + --vocab-file=/dataset/gpt2-vocab.json \ + --percent=1.0 \ + --sentence_transformer_model=all-mpnet-base-v2 \ + --batch_size=1024 \ + --shard_id=0 \ + --total_shards=10 \ + --workers=2 \ + --devices=0,1,2,3,4,5,6,7 \ + --stage=1 \ + --learned_index=/result/pubmed_faiss_learn.index \ + --output_file=/result/pubmed_faiss_shard0.save + +This command breaks the retrieval data into ``total_shards`` shards and adds the data in the shard specified by ``shard_id``. +The result is saved to a file specified by ``output_file``. In the example above, 10 sharding indexes are created. + +Step 3.3: Merge the sharding indexes into final Faiss index +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This step merges all the sharding indexes created in the previous step into the final Faiss index. An example script is: + +.. code-block:: bash + + python scripts/nlp_language_modeling/build_retrieval_index.py \ + --stage=2 \ + --devices=0,1,2,3,4,5,6,7 \ + --learned_index=/result/pubmed_faiss_learn.index \ + --shard_index_input=/result/pubmed_faiss_shard \ + --output_file=/result/pubmed_faiss_final.index + +Step 4: Build KNN index +^^^^^^^^^^^^^^^^^^^^^^^ + +During training, it is inefficient to run a query to find the K-nearest neighbor chunk IDs for each training data point. +This can be pre-calculated by building a KNN index before training. The KNN index maps the training data chunk IDs to the K-nearest neighbor chunk IDs +in the retrieval data. As with building the Faiss index, this process is divided into two steps. + +Following is the KNN index data format: + +.. list-table:: + :widths: 25 25 25 25 45 + + * - 'KNNRETM\x00\x00' (header 9 bytes) + - 1 (version 8 byte) + - K number of neighbors (8 byte) + - Number chunks (8 byte) + - Map to K retrieval data chunk IDs, shape (number_chunks, K) ( int64 array) + +Step 4.1: Build KNN sharding index +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The KNN index is built using the memory-mapped training data created by the ``preprocess_data_for_megatron.py`` script and the Faiss index +file for the retrieval data built by the ``build_retrieval_index.py`` script. + +An example script is: + +.. code-block:: bash + + python scripts/nlp_language_modeling/build_knn_map_index.py \ + --input_file=/result/pubmed_eval_text_document \ + --tokenizer-library=megatron \ + --tokenizer-type=GPT2BPETokenizer \ + --merge-file=/dataset/gpt2-merges.txt \ + --vocab-file=/dataset/gpt2-vocab.json \ + --process_chunk_size=10000 \ + --sentence_transformer_model=all-mpnet-base-v2 \ + --batch_size=1024 \ + --K_neighbors=50 \ + --workers=2 \ + --devices=0,1,2,3,4,5,6,7 \ + --remove_duplicate \ + --dedup_margin=70 \ + --nprobe=100 \ + --shard_id=0 \ + --total_shards=10 \ + --stage=1 \ + --output_file=/dataset/pubmed_knn_shard0.save \ + --faiss_index=/result/pubmed_faiss_final.index + +In this example, the training data is broken into ``total_shards`` shards, and the KNN index is calculated for the shard specified by ``shard_id``. +The result is saved to a file specified by ``output_file``. In the example above, 10 KNN sharding indexes are created. + +Use the ``remove_duplicate`` flag if the training data and retrieval data are the same to remove neighbors from the same document. 
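+
+Before merging the shards in the next step, it can be useful to sanity-check a shard (or, later, the merged file) against the KNN index format described above. The snippet below is only an illustrative sketch, not one of the NeMo scripts; it assumes the version, K, and chunk-count header fields are little-endian int64 values and uses the shard file name from the example above:
+
+.. code-block:: python
+
+    import struct
+
+    import numpy as np
+
+    path = '/dataset/pubmed_knn_shard0.save'
+    with open(path, 'rb') as f:
+        assert f.read(9) == b'KNNRETM\x00\x00'  # magic header, 9 bytes
+        version, k, num_chunks = struct.unpack('<qqq', f.read(24))
+
+    # the (num_chunks, K) int64 neighbor map starts right after the 33-byte header
+    knn_map = np.memmap(path, dtype=np.int64, mode='r', offset=33, shape=(num_chunks, k))
+    print(version, knn_map.shape)
+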
+ +Step 4.2: Merge KNN sharding index into final KNN index +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +An example script is: + +.. code-block:: bash + + python scripts/nlp_language_modeling/build_knn_map_index.py \ + --stage=2 \ + --output_file=pubmed_knn_final.save \ + --shard_index_input=pubmed_knn_shard + +Train NeMo RETRO Model +----------------------- + +Once the training data, retrieval data, KNN index, and Faiss index are prepared, we are ready to train the RETRO model. In the NeMo implementation, +the RETRO model can be pre-trained with or without the `mu-Transfer `_ :cite:`nlp-retro-yang2022tensor` feature. We will introduce both ways. + + +The table below lists some of the common parameters that can be configured for model pre-training. + ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| **Parameter** | **Default** | **Description** | ++==================================+=============+========================================================================================+ +| model.micro_batch_size | 4 | the micro batch size used for training | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.tensor_model_parallel_size | 1 | tensor model parallel size | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.encoder_seq_length | 2048 | token sequence length | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.chunk_size | 64 | the chunk size used to retrieve | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.enc_num_layers | 4 | total number of encoder layers | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.dec_num_layers | 6 | total number of decoder layers | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.enc_cross_attention | [3] | layer numbers for cross attention in encoder | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.dec_cross_attention | [3,4,5] | layer numbers for chunked cross attention in decoder | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.add_position_embedding | FALSE | whether to add the absolute position encoding | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.hidden_size | 768 | model hidden size | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.ffn_hidden_size | 3072 | model FFN hidden size. 
Usually 4 * hidden_size | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.num_attention_heads | 12 | number of attention heads | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.init_method_std | 0.02 | standard deviation of the zero mean normal distribution used for weight initialization | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.hidden_dropout | 0.1 | dropout probability for hidden state transformer | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.attention_dropout | 0.1 | dropout probability in the attention layer | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ +| model.ffn_dropout | 0 | dropout probability in the feed-forward layer | ++----------------------------------+-------------+----------------------------------------------------------------------------------------+ + + +Option 1: Train the NeMo RETRO model *without* mu-Transfer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +An example RETRO pre-training script is: + +.. code-block:: bash + + python examples/nlp/language_modeling/megatron_retro_pretraining.py \ + trainer.devices=8 \ + trainer.num_nodes=2 \ + trainer.accelerator=gpu \ + trainer.max_steps=800000 \ + trainer.precision=16 \ + exp_manager.exp_dir=/result/retro_model \ + model.apply_query_key_layer_scaling=False \ + model.tensor_model_parallel_size=8 \ + model.optim.name=adamw \ + model.enc_num_layers=2 \ + model.dec_num_layers=32 \ + model.enc_cross_attention=[0] \ + model.dec_cross_attention=[8,11,14,17,20,23,26,29,31] \ + model.hidden_size=4096 \ + model.ffn_hidden_size=16384 \ + model.num_attention_heads=32 \ + model.tokenizer.merge_file=/dataset/gpt2-merges.txt \ + model.tokenizer.vocab_file=/dataset/gpt2-vocab.json \ + model.data.data_prefix=[/result/pubmed_eval_text_document] \ + model.data.knn_index=[dataset/pubmed_knn_final.save] \ + model.data.retrieval_prefix=/result/pubmed_eval_text_document \ + model.micro_batch_size=8 + + +During the training, launch Tensorboard to monitor training like so: + +.. code-block:: bash + + tensorboard --logdir /result/retro_model --bind_all + +.. note:: Weights and Biases (WandB) is supported too. Add ``exp_manager.create_wandb_logger=True`` to the model training arguments to enable it. + +After the training, the model nemo file can be found at the result checkpoint directory. + +Option 2: Train the NeMo RETRO model *with* mu-Transfer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`mu-Transfer `_ :cite:`nlp-retro-yang2022tensor` paper proposed a method to zero-shot transfer hyperparameter to train a larger model. +This can be done in 3 steps in NeMo RETRO implementation. + + +Step 1. find optimal hyper parameter for a small base model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Use the pre-training code in Option 1, either manually or automatically ind a set of optimal hyperparameter for a small base RETRO +model. This is can be done cheaply ans fast due to the small model size. + +Step 2. 
calculate the shape file that can be used to run mu-Transfer +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The shape file determines which hyperparameters will be scaled up, allowing the model to adjust the learning rate, weight scaling factor, etc. + +Here is an example shape file calculation script: + + +.. code-block:: bash + + python examples/nlp/language_modeling/megatron_retro_cal_shape.py \ + trainer.devices=8 \ + trainer.num_nodes=1 \ + trainer.accelerator=gpu \ + exp_manager.exp_dir=/result/retro_model \ + base_model.enc_num_layers=2 \ + delta_model.enc_num_layers=2 \ + base_model.dec_num_layers=32 \ + delta_model.dec_num_layers=32 \ + base_model.tensor_model_parallel_size=8 \ + delta_model.tensor_model_parallel_size=8 \ + base_model.dec_cross_attention=[8,11,14,17,20,23,26,29,31] \ + delta_model.dec_cross_attention=[8,11,14,17,20,23,26,29,31] \ + base_model.enc_cross_attention=[0] \ + delta_model.enc_cross_attention=[0] \ + base_model.hidden_size=768 \ + base_model.ffn_hidden_size=3072 \ + delta_model.hidden_size=96 \ + delta_model.ffn_hidden_size=384 \ + base_model.num_attention_heads=16 \ + delta_model.num_attention_heads=16 \ + model.shape_file=tp8_32depth_o1_rel_shape_info.yaml + +In this example, the ``base_model`` refers to the small base model for which an optimal set of hyperparameters has been determined. +The ``delta_model`` refers to a model with certain hyperparameters that have been scaled up or down. In this case, +the ``hidden_size`` and ``ffn_hidden_size`` have been changed in the ``delta_model``, allowing these two parameters to be scaled freely later. + +Step 3. Pretrain mu-Transfer RETRO model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Once the shape file is created, we can start training a RETRO model. The model training can be scale up freely using the hyperparameters +specified by the delta model and the shape file. + +An example mu-Transfer pre-training script is: + +.. code-block:: bash + + python examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py \ + trainer.devices=8 \ + trainer.num_nodes=2 \ + trainer.accelerator=gpu \ + trainer.max_steps=500000 \ + trainer.precision=16 \ + exp_manager.exp_dir=/result/retro_model \ + model.apply_query_key_layer_scaling=False \ + model.tensor_model_parallel_size=8 \ + model.optim.name=muadamw \ + model.enc_num_layers=2 \ + model.dec_num_layers=32 \ + model.enc_cross_attention=[0] \ + model.dec_cross_attention=[8,11,14,17,20,23,26,29,31] \ + model.hidden_size=4096 \ + model.ffn_hidden_size=16384 \ + model.num_attention_heads=32 \ + model.tokenizer.merge_file=/dataset/gpt2-merges.txt \ + model.tokenizer.vocab_file=/dataset/gpt2-vocab.json \ + model.data.data_prefix=[/result/pubmed_eval_text_document] \ + model.data.knn_index=[dataset/pubmed_knn_final.save] \ + model.data.retrieval_prefix=/result/pubmed_eval_text_document \ + model.micro_batch_size=8 \ + model.shape_file=tp8_32depth_o1_rel_shape_info.yaml + +.. note:: We have chosen to use ``muadamw`` as the optimizer for use with the mu-transfer method. Currently, only ``muadam`` and ``muadamw`` are supported. + +Similarly to the pre-training in Option 1, the model nemo file can be found at the result checkpoint directory after training is complete. + +Run NeMo RETRO Model Inference +------------------------------- + +Once the NeMo RETRO model has been trained, we can put it into inference mode and experiment with it. +During inference, we are not limited to the static Faiss index that we built earlier for KNN queries. 
+We can feed any external data to the model as retrieval context. NeMo RETRO implementation supports dynamic retrieval service, +allowing users to add, reset, and query new documents on the fly. + +We have built a simple web client that makes it easy for users to play around with the model. Here is an example script to launch the server: + +.. code-block:: bash + + python examples/nlp/language_modeling/megatron_retro_eval.py \ + trainer.devices=8 \ + trainer.num_nodes=1 \ + trainer.accelerator=gpu \ + trainer.precision=16 \ + retro_model_file=megatron_retro.nemo \ + tensor_model_parallel_size=8 \ + pipeline_model_parallel_size=1 \ + retrieval_service.sentence_bert.devices=\'0,1,2,3,4,5,6,7\' \ + retrieval_service.services.0.faiss_devices=\'0,1,2,3,4,5,6,7\' \ + retrieval_service.services.1.faiss_devices=\'0,1,2,3,4,5,6,7\' \ + retrieval_service.services.0.faiss_index=/result/pubmed_faiss_final.index \ + retrieval_service.services.0.retrieval_index=/result/pubmed_eval_text_document \ + retrieval_service.neighbors=2 \ + retrieval_service.pad_tokens=True \ + retrieval_service.store_retrieved=True \ + server=True \ + web_server=True \ + share=True \ + username=test \ + password=test123 + +Set the retro_model_file to use the nemo file generated in the pre-training step. After launching the server, copy-paste the URL from +the terminal into your browser. Use the specified username and password to log in and have fun experimenting with the RETRO model. + +References +************ + +.. bibliography:: ../../nlp_all.bib + :style: plain + :labelprefix: nlp-retro + :keyprefix: nlp-retro- + diff --git a/docs/source/nlp/nlp_all.bib b/docs/source/nlp/nlp_all.bib index 290c30ca630c..fd0f15f6d1da 100644 --- a/docs/source/nlp/nlp_all.bib +++ b/docs/source/nlp/nlp_all.bib @@ -180,3 +180,39 @@ @article{chen2019bert journal={arXiv preprint arXiv:1902.10909}, year={2019} } + +@article{borgeaud2021improving, + title={Improving language models by retrieving from trillions of tokens}, + author={Borgeaud, Sebastian and Mensch, Arthur and Hoffmann, Jordan and Cai, Trevor and Rutherford, Eliza and Millican, Katie and Driessche, George van den and Lespiau, Jean-Baptiste and Damoc, Bogdan and Clark, Aidan and others}, + journal={arXiv preprint arXiv:2112.04426}, + year={2021} +} + +@article{su2021roformer, + title={Roformer: Enhanced transformer with rotary position embedding}, + author={Su, Jianlin and Lu, Yu and Pan, Shengfeng and Wen, Bo and Liu, Yunfeng}, + journal={arXiv preprint arXiv:2104.09864}, + year={2021} +} + +@article{reimers2019sentence, + title={Sentence-bert: Sentence embeddings using siamese bert-networks}, + author={Reimers, Nils and Gurevych, Iryna}, + journal={arXiv preprint arXiv:1908.10084}, + year={2019} +} + +@article{yang2022tensor, + title={Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer}, + author={Yang, Greg and Hu, Edward J and Babuschkin, Igor and Sidor, Szymon and Liu, Xiaodong and Farhi, David and Ryder, Nick and Pachocki, Jakub and Chen, Weizhu and Gao, Jianfeng}, + journal={arXiv preprint arXiv:2203.03466}, + year={2022} +} + +@article{jegou2022faiss, + title={Faiss: Similarity search and clustering of dense vectors library}, + author={J{\'e}gou, Herv{\'e} and Douze, Matthijs and Johnson, Jeff and Hosseini, Lucas and Deng, Chengqi}, + journal={Astrophysics Source Code Library}, + pages={ascl--2210}, + year={2022} +} From 8ae60f5bbefa0e12c33245664d5e516768805ca1 Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> 
Date: Mon, 12 Dec 2022 10:23:45 -0800 Subject: [PATCH 027/154] Fix: setup_multiple validation/test data (#5585) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix: setup_multiple validation/test data (#5585) Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- nemo/core/classes/modelPT.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/nemo/core/classes/modelPT.py b/nemo/core/classes/modelPT.py index 9a1444bc1137..5dc30ed2a3fd 100644 --- a/nemo/core/classes/modelPT.py +++ b/nemo/core/classes/modelPT.py @@ -721,7 +721,7 @@ def setup(self, stage: Optional[str] = None): and self._cfg.validation_ds.get('defer_setup', False) ) if self.val_dataloader() is None and val_deferred_setup: - self.setup_validation_data(self._cfg.validation_ds) + self.setup_multiple_validation_data(val_data_config=self._cfg.validation_ds) if stage == 'test': test_deferred_setup = ( @@ -730,7 +730,7 @@ def setup(self, stage: Optional[str] = None): and self._cfg.test_ds.get('defer_setup', False) ) if self.test_dataloader() is None and test_deferred_setup: - self.setup_test_data(self._cfg.test_ds) + self.setup_multiple_test_data(test_data_config=self._cfg.test_ds) def train_dataloader(self): if self._train_dl is not None: From 99dc03bcdf3b9d1260f1d441d578e3ba1aa3700b Mon Sep 17 00:00:00 2001 From: Sean Naren Date: Mon, 12 Dec 2022 19:18:45 +0000 Subject: [PATCH 028/154] Move to optimizer based EMA implementation (#5169) * Move to optimizer Signed-off-by: SeanNaren * Fix replacing weights Signed-off-by: SeanNaren * Allow swapping of weights be optional Signed-off-by: SeanNaren * Save 2 models Signed-off-by: SeanNaren * Use different hook Signed-off-by: SeanNaren * Expose cpu device Signed-off-by: SeanNaren * Add clause to see if this fixes issue with O2 optimizer Signed-off-by: SeanNaren * Try to get O2 working Signed-off-by: SeanNaren * WIP Signed-off-by: SeanNaren * Fixes Signed-off-by: SeanNaren * Fixes to tests Signed-off-by: SeanNaren * Add guard Signed-off-by: SeanNaren * Remove import Signed-off-by: SeanNaren * Add guard Signed-off-by: SeanNaren * Add comment Signed-off-by: SeanNaren * Remove overwrite Signed-off-by: SeanNaren * Add BatchNorm, currently tests fail Signed-off-by: SeanNaren * Fix tests/functionality for batch norm Signed-off-by: SeanNaren * Get rid of NLP changes Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva --- nemo/collections/common/callbacks/ema.py | 395 ++++++++++++++++------- nemo/utils/exp_manager.py | 33 +- tests/collections/common/test_ema.py | 219 +++++-------- 3 files changed, 383 insertions(+), 264 deletions(-) diff --git a/nemo/collections/common/callbacks/ema.py b/nemo/collections/common/callbacks/ema.py index 58d6a1668ab2..56db703aab28 100644 --- a/nemo/collections/common/callbacks/ema.py +++ b/nemo/collections/common/callbacks/ema.py @@ -1,4 +1,4 @@ -# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -11,25 +11,17 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
-import os.path -import warnings -from typing import Any, Dict, List, Optional +import contextlib +import copy +import logging +import os +import threading +from typing import Any, Dict, Iterable import pytorch_lightning as pl import torch from pytorch_lightning import Callback -from pytorch_lightning.utilities import rank_zero_warn from pytorch_lightning.utilities.exceptions import MisconfigurationException -from pytorch_lightning.utilities.types import STEP_OUTPUT - -from nemo.utils import logging - -try: - import amp_C - - apex_available = True -except Exception: - apex_available = False class EMA(Callback): @@ -42,88 +34,83 @@ class EMA(Callback): Args: decay: The exponential decay used when calculating the moving average. Has to be between 0-1. - apply_ema_every_n_steps: Apply EMA every n global steps. - start_step: Start applying EMA from ``start_step`` global step onwards. - evaluate_ema_weights_instead: Validate the EMA weights instead of the original weights. - Note this means that when saving the model, the validation metrics are calculated with the EMA weights. - save_ema_weights_in_callback_state: Enable saving ema weights in callback state. - This is not required when using NeMo as the experiment manager handles saving weights. + validate_original_weights: Validate the original weights, as apposed to the EMA weights. + every_n_steps: Apply EMA every N steps. + cpu_offload: Offload weights to CPU. """ def __init__( - self, - decay: float, - apply_ema_every_n_steps: int = 1, - start_step: int = 0, - save_ema_weights_in_callback_state: bool = False, - evaluate_ema_weights_instead: bool = False, + self, decay: float, validate_original_weights: bool = False, every_n_steps: int = 1, cpu_offload: bool = False, ): - if not apex_available: - rank_zero_warn( - "EMA has better performance when Apex is installed: https://github.com/NVIDIA/apex#installation." - ) if not (0 <= decay <= 1): raise MisconfigurationException("EMA decay value must be between 0 and 1") - self._ema_model_weights: Optional[List[torch.Tensor]] = None - self._overflow_buf: Optional[torch.Tensor] = None - self._cur_step: Optional[int] = None - self._weights_buffer: Optional[List[torch.Tensor]] = None - self.apply_ema_every_n_steps = apply_ema_every_n_steps - self.start_step = start_step - self.save_ema_weights_in_callback_state = save_ema_weights_in_callback_state - self.evaluate_ema_weights_instead = evaluate_ema_weights_instead self.decay = decay + self.validate_original_weights = validate_original_weights + self.every_n_steps = every_n_steps + self.cpu_offload = cpu_offload - def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - logging.info('Creating EMA weights copy.') - if self._ema_model_weights is None: - self._ema_model_weights = [p.detach().clone() for p in pl_module.state_dict().values()] - # ensure that all the weights are on the correct device - self._ema_model_weights = [p.to(pl_module.device) for p in self._ema_model_weights] - self._overflow_buf = torch.IntTensor([0]).to(pl_module.device) - - def ema(self, pl_module: "pl.LightningModule") -> None: - if apex_available and pl_module.device.type == "cuda": - return self.apply_multi_tensor_ema(pl_module) - return self.apply_ema(pl_module) - - def apply_multi_tensor_ema(self, pl_module: "pl.LightningModule") -> None: - model_weights = list(pl_module.state_dict().values()) - amp_C.multi_tensor_axpby( - 65536, # todo (sean): chunk size, should we expose? 
- self._overflow_buf, - [self._ema_model_weights, model_weights, self._ema_model_weights], - self.decay, - 1 - self.decay, - -1, - ) - - def apply_ema(self, pl_module: "pl.LightningModule") -> None: - for orig_weight, ema_weight in zip(list(pl_module.state_dict().values()), self._ema_model_weights): - diff = ema_weight.data - orig_weight.data - diff.mul_(1.0 - self.decay) - ema_weight.sub_(diff) - - def should_apply_ema(self, step: int) -> bool: - return step != self._cur_step and step >= self.start_step and step % self.apply_ema_every_n_steps == 0 - - def on_train_batch_end( - self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", outputs: STEP_OUTPUT, batch: Any, batch_idx: int - ) -> None: - if self.should_apply_ema(trainer.global_step): - self._cur_step = trainer.global_step - self.ema(pl_module) + def on_fit_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: + device = pl_module.device if not self.cpu_offload else torch.device('cpu') + trainer.optimizers = [ + EMAOptimizer( + optim, + device=device, + decay=self.decay, + every_n_steps=self.every_n_steps, + current_step=trainer.global_step, + ) + for optim in trainer.optimizers + if not isinstance(optim, EMAOptimizer) + ] - def state_dict(self) -> Dict[str, Any]: - if self.save_ema_weights_in_callback_state: - return dict(cur_step=self._cur_step, ema_weights=self._ema_model_weights) - return dict(cur_step=self._cur_step) + def on_validation_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: + if self._should_validate_ema_weights(trainer): + self.swap_model_weights(trainer) - def load_state_dict(self, state_dict: Dict[str, Any]) -> None: - self._cur_step = state_dict['cur_step'] - # when loading using NeMo, ema weights will be loaded by the experiment manager separately. - if self._ema_model_weights is None: - self._ema_model_weights = state_dict.get('ema_weights') + def on_validation_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: + if self._should_validate_ema_weights(trainer): + self.swap_model_weights(trainer) + + def on_test_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: + if self._should_validate_ema_weights(trainer): + self.swap_model_weights(trainer) + + def on_test_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: + if self._should_validate_ema_weights(trainer): + self.swap_model_weights(trainer) + + def _should_validate_ema_weights(self, trainer: "pl.Trainer") -> bool: + return not self.validate_original_weights and self._ema_initialized(trainer) + + def _ema_initialized(self, trainer: "pl.Trainer") -> bool: + return any(isinstance(optimizer, EMAOptimizer) for optimizer in trainer.optimizers) + + def swap_model_weights(self, trainer: "pl.Trainer", saving_ema_model: bool = False): + for optimizer in trainer.optimizers: + assert isinstance(optimizer, EMAOptimizer) + optimizer.switch_main_parameter_weights(saving_ema_model) + + @contextlib.contextmanager + def save_ema_model(self, trainer: "pl.Trainer"): + """ + Saves an EMA copy of the model + EMA optimizer states for resume. 
+ """ + self.swap_model_weights(trainer, saving_ema_model=True) + try: + yield + finally: + self.swap_model_weights(trainer, saving_ema_model=False) + + @contextlib.contextmanager + def save_original_optimizer_state(self, trainer: "pl.Trainer"): + for optimizer in trainer.optimizers: + assert isinstance(optimizer, EMAOptimizer) + optimizer.save_original_optimizer_state = True + try: + yield + finally: + for optimizer in trainer.optimizers: + optimizer.save_original_optimizer_state = False def on_load_checkpoint( self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", checkpoint: Dict[str, Any] @@ -142,43 +129,219 @@ def on_load_checkpoint( ema_path = trainer.ckpt_path.replace(ext, f'-EMA{ext}') if os.path.exists(ema_path): ema_state_dict = torch.load(ema_path, map_location=torch.device('cpu')) - self._ema_model_weights = ema_state_dict['state_dict'].values() + + # this is wrong, basically when we save the EMA weights, optimizer_states actually contains the model parameters + # as we swapped the model parameters with the state dict parameters. + # we could enforce that if you trained with EMA and want to continue training + checkpoint['optimizer_states'] = ema_state_dict['optimizer_states'] del ema_state_dict logging.info("EMA weights have been loaded successfully. Continuing training with saved EMA weights.") else: - warnings.warn( - "we were unable to find the associated EMA weights when re-loading, " - "training will start with new EMA weights.", - UserWarning, + raise MisconfigurationException( + "Unable to find the associated EMA weights when re-loading, " + f"training will start with new EMA weights. Expected them to be at: {ema_path}", ) - def replace_model_weights(self, pl_module: "pl.LightningModule") -> None: - self._weights_buffer = [p.detach().clone().to('cpu') for p in pl_module.state_dict().values()] - new_state_dict = {k: v for k, v in zip(pl_module.state_dict().keys(), self._ema_model_weights)} - pl_module.load_state_dict(new_state_dict) - def restore_original_weights(self, pl_module: "pl.LightningModule") -> None: - state_dict = pl_module.state_dict() - new_state_dict = {k: v for k, v in zip(state_dict.keys(), self._weights_buffer)} - pl_module.load_state_dict(new_state_dict) - del self._weights_buffer +@torch.no_grad() +def ema_update(ema_model_tuple, current_model_tuple, decay): + torch._foreach_mul_(ema_model_tuple, decay) + torch._foreach_add_( + ema_model_tuple, current_model_tuple, alpha=(1.0 - decay), + ) - @property - def ema_initialized(self) -> bool: - return self._ema_model_weights is not None - def on_validation_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - if self.ema_initialized and self.evaluate_ema_weights_instead: - self.replace_model_weights(pl_module) +def run_ema_update_cpu(ema_model_tuple, current_model_tuple, decay, pre_sync_stream=None): + if pre_sync_stream is not None: + pre_sync_stream.synchronize() - def on_validation_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - if self.ema_initialized and self.evaluate_ema_weights_instead: - self.restore_original_weights(pl_module) + ema_update(ema_model_tuple, current_model_tuple, decay) - def on_test_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - if self.ema_initialized and self.evaluate_ema_weights_instead: - self.replace_model_weights(pl_module) - def on_test_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - if self.ema_initialized and self.evaluate_ema_weights_instead: - 
self.restore_original_weights(pl_module) +class EMAOptimizer(torch.optim.Optimizer): + r""" + EMAOptimizer is a wrapper for torch.optim.Optimizer that computes + Exponential Moving Average of parameters registered in the optimizer. + + EMA parameters are automatically updated after every step of the optimizer + with the following formula: + + ema_weight = decay * ema_weight + (1 - decay) * training_weight + + To access EMA parameters, use ``swap_ema_weights()`` context manager to + perform a temporary in-place swap of regular parameters with EMA + parameters. + + Notes: + - EMAOptimizer is not compatible with APEX AMP O2. + + Args: + optimizer (torch.optim.Optimizer): optimizer to wrap + device (torch.device): device for EMA parameters + decay (float): decay factor + + Returns: + returns an instance of torch.optim.Optimizer that computes EMA of + parameters + + Example: + model = Model().to(device) + opt = torch.optim.Adam(model.parameters()) + + opt = EMAOptimizer(opt, device, 0.9999) + + for epoch in range(epochs): + training_loop(model, opt) + + regular_eval_accuracy = evaluate(model) + + with opt.swap_ema_weights(): + ema_eval_accuracy = evaluate(model) + """ + + def __init__( + self, + optimizer: torch.optim.Optimizer, + device: torch.device, + decay: float = 0.9999, + every_n_steps: int = 1, + current_step: int = 0, + ): + self.optimizer = optimizer + self.decay = decay + self.device = device + self.current_step = current_step + self.every_n_steps = every_n_steps + self.save_original_optimizer_state = False + + self.first_iteration = True + self.rebuild_ema_params = True + self.stream = None + self.thread = None + + self.ema_params = () + self.in_saving_ema_model_context = False + + def all_parameters(self) -> Iterable[torch.Tensor]: + return (param for group in self.param_groups for param in group['params']) + + def step(self, closure=None, **kwargs): + self.join() + + if self.first_iteration: + if any(p.is_cuda for p in self.all_parameters()): + self.stream = torch.cuda.Stream() + + self.first_iteration = False + + if self.rebuild_ema_params: + opt_params = list(self.all_parameters()) + + self.ema_params += tuple( + copy.deepcopy(param.data.detach()).to(self.device) for param in opt_params[len(self.ema_params) :] + ) + self.rebuild_ema_params = False + + loss = self.optimizer.step(closure) + + if self._should_update_at_step(): + self.update() + self.current_step += 1 + return loss + + def _should_update_at_step(self) -> bool: + return self.current_step % self.every_n_steps == 0 + + @torch.no_grad() + def update(self): + if self.stream is not None: + self.stream.wait_stream(torch.cuda.current_stream()) + + with torch.cuda.stream(self.stream): + current_model_state = tuple( + param.data.to(self.device, non_blocking=True) for param in self.all_parameters() + ) + + if self.device.type == 'cuda': + ema_update(self.ema_params, current_model_state, self.decay) + + if self.device.type == 'cpu': + self.thread = threading.Thread( + target=run_ema_update_cpu, args=(self.ema_params, current_model_state, self.decay, self.stream,), + ) + self.thread.start() + + def swap_tensors(self, tensor1, tensor2): + tmp = torch.empty_like(tensor1) + tmp.copy_(tensor1) + tensor1.copy_(tensor2) + tensor2.copy_(tmp) + + def switch_main_parameter_weights(self, saving_ema_model: bool = False): + self.join() + self.in_saving_ema_model_context = saving_ema_model + for param, ema_param in zip(self.all_parameters(), self.ema_params): + self.swap_tensors(param.data, ema_param) + + @contextlib.contextmanager + def 
swap_ema_weights(self, enabled: bool = True): + r""" + A context manager to in-place swap regular parameters with EMA + parameters. + It swaps back to the original regular parameters on context manager + exit. + + Args: + enabled (bool): whether the swap should be performed + """ + + if enabled: + self.switch_main_parameter_weights() + try: + yield + finally: + if enabled: + self.switch_main_parameter_weights() + + def __getattr__(self, name): + return getattr(self.optimizer, name) + + def join(self): + if self.stream is not None: + self.stream.synchronize() + + if self.thread is not None: + self.thread.join() + + def state_dict(self): + self.join() + + if self.save_original_optimizer_state: + return self.optimizer.state_dict() + + # if we are in the context of saving an EMA model, the EMA weights are in the modules' actual weights + ema_params = self.ema_params if not self.in_saving_ema_model_context else list(self.all_parameters()) + state_dict = { + 'opt': self.optimizer.state_dict(), + 'ema': ema_params, + 'current_step': self.current_step, + 'decay': self.decay, + 'every_n_steps': self.every_n_steps, + 'device': self.device, + } + return state_dict + + def load_state_dict(self, state_dict): + self.join() + + self.optimizer.load_state_dict(state_dict['opt']) + self.ema_params = tuple(param.to(self.device) for param in copy.deepcopy(state_dict['ema'])) + self.current_step = state_dict['current_step'] + self.decay = state_dict['decay'] + self.device = state_dict['device'] + self.every_n_steps = state_dict['every_n_steps'] + self.rebuild_ema_params = False + + def add_param_group(self, param_group): + self.optimizer.add_param_group(param_group) + self.rebuild_ema_params = True diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 2e467ef594e8..2406d343e0f5 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -114,10 +114,10 @@ class StepTimingParams: @dataclass class EMAParams: enable: Optional[bool] = False - evaluate_ema_weights_instead: Optional[bool] = False decay: Optional[float] = 0.999 - apply_ema_every_n_steps: Optional[int] = 1 - start_step: Optional[int] = 0 + cpu_offload: Optional[bool] = False + validate_original_weights: Optional[bool] = False + every_n_steps: int = 1 @dataclass @@ -387,9 +387,9 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo if cfg.ema.enable: ema_callback = EMA( decay=cfg.ema.decay, - apply_ema_every_n_steps=cfg.ema.apply_ema_every_n_steps, - start_step=cfg.ema.start_step, - evaluate_ema_weights_instead=cfg.ema.evaluate_ema_weights_instead, + validate_original_weights=cfg.ema.validate_original_weights, + cpu_offload=cfg.ema.cpu_offload, + every_n_steps=cfg.ema.every_n_steps, ) trainer.callbacks.append(ema_callback) @@ -914,24 +914,27 @@ def _del_model_without_trainer(self, filepath: str) -> None: except: logging.info(f"Tried to remove checkpoint: {filepath} but failed.") - def _get_ema_callback(self, trainer) -> Optional[EMA]: + def _ema_callback(self, trainer: 'pytorch_lightning.Trainer') -> Optional[EMA]: ema_callback = None for callback in trainer.callbacks: if isinstance(callback, EMA): ema_callback = callback return ema_callback - def _save_checkpoint(self, trainer, filepath: str) -> None: - super()._save_checkpoint(trainer, filepath) - ema_callback = self._get_ema_callback(trainer) + def _save_checkpoint(self, trainer: 'pytorch_lightning.Trainer', filepath: str) -> None: + ema_callback = self._ema_callback(trainer) if ema_callback is not None: + with 
ema_callback.save_original_optimizer_state(trainer): + super()._save_checkpoint(trainer, filepath) + # save EMA copy of the model as well. - ema_callback.replace_model_weights(trainer.lightning_module) - filepath = self._ema_format_filepath(filepath) - if self.verbose: - rank_zero_info(f"Saving EMA weights to separate checkpoint {filepath}") + with ema_callback.save_ema_model(trainer): + filepath = self._ema_format_filepath(filepath) + if self.verbose: + rank_zero_info(f"Saving EMA weights to separate checkpoint {filepath}") + super()._save_checkpoint(trainer, filepath) + else: super()._save_checkpoint(trainer, filepath) - ema_callback.restore_original_weights(trainer.lightning_module) def _ema_format_filepath(self, filepath: str) -> str: return filepath.replace(self.FILE_EXTENSION, f'-EMA{self.FILE_EXTENSION}') diff --git a/tests/collections/common/test_ema.py b/tests/collections/common/test_ema.py index ae4f40d52d51..16de01e0fbe4 100644 --- a/tests/collections/common/test_ema.py +++ b/tests/collections/common/test_ema.py @@ -13,9 +13,7 @@ # limitations under the License. import os.path -from copy import deepcopy from typing import Any, Dict, Union -from unittest import mock import pytest import pytorch_lightning as pl @@ -26,40 +24,53 @@ from pytorch_lightning.utilities.types import STEP_OUTPUT from nemo.collections.common.callbacks import EMA +from nemo.collections.common.callbacks.ema import EMAOptimizer from nemo.core import ModelPT from nemo.utils.exp_manager import exp_manager from tests.collections.nlp.test_gpt_model import DEVICE_CAPABILITY -class OnesDataset(torch.utils.data.Dataset): - def __init__(self, dataset_len): - super().__init__() - self.__dataset_len = dataset_len +def extract_ema_weights(pl_module, trainer): + ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] + ema_callback.swap_model_weights(trainer) + weights = extract_weights(pl_module) + ema_callback.swap_model_weights(trainer) + return weights - def __getitem__(self, *args): - return torch.ones(2) + +def extract_weights(pl_module): + return [w.detach().clone() for w in pl_module.parameters()] + + +class RandomDataset(torch.utils.data.Dataset): + def __init__(self, size, length): + self.len = length + self.data = torch.randn(length, size) + + def __getitem__(self, index): + return self.data[index] def __len__(self): - return self.__dataset_len + return self.len class ExampleModel(ModelPT): def __init__(self, *args, **kwargs): cfg = OmegaConf.structured({}) super().__init__(cfg) - self.l1 = torch.nn.modules.Linear(in_features=2, out_features=1) + self.l1 = torch.nn.modules.Linear(in_features=32, out_features=32) + self.bn = torch.nn.BatchNorm1d(32) def train_dataloader(self): - dataset = OnesDataset(16) + dataset = RandomDataset(32, 16) return torch.utils.data.DataLoader(dataset, batch_size=2) def val_dataloader(self): - dataset = OnesDataset(10) + dataset = RandomDataset(32, 16) return torch.utils.data.DataLoader(dataset, batch_size=2) def forward(self, batch): - output = self.l1(batch) - return torch.nn.functional.l1_loss(output, torch.zeros(output.size()).to(output.device)) + return self.l1(self.bn(batch)).sum() def validation_step(self, batch, batch_idx): return self(batch) @@ -68,7 +79,7 @@ def training_step(self, batch, batch_idx): return self(batch) def configure_optimizers(self): - return torch.optim.Adam(self.parameters(), lr=0.1) + return torch.optim.SGD(self.parameters(), lr=1e-3) def list_available_models(self): pass @@ -89,11 +100,6 @@ def test_ema_value(self): with 
pytest.raises(MisconfigurationException, match="between 0 and 1"): EMA(decay=2) - @mock.patch('nemo.collections.common.callbacks.ema.apex_available', False) - def test_ema_apex_unavailable(self): - with pytest.warns(UserWarning, match="EMA has better performance when Apex is installed"): - EMA(decay=0.999) - @pytest.mark.unit @pytest.mark.run_only_on('GPU') def test_ema_saved_state(self, tmpdir, caplog): @@ -102,9 +108,8 @@ def test_ema_saved_state(self, tmpdir, caplog): class TerminateCallback(Callback): def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - self.saved_ema_weights = ema_callback._ema_model_weights - self.pl_module_weights = list(pl_module.state_dict().values()) + self.saved_ema_weights = extract_ema_weights(pl_module, trainer) + self.pl_module_weights = extract_weights(pl_module) raise SystemExit model = ExampleModel() @@ -124,7 +129,7 @@ def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModu exp_manager( trainer, { - "ema": {"enable": True, "evaluate_ema_weights_instead": True}, + "ema": {"enable": True}, "explicit_log_dir": str(temp_path), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"}, }, @@ -137,13 +142,16 @@ def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModu class CheckStateCallback(Callback): def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - weights = list(pl_module.state_dict().values()) + weights = extract_weights(pl_module) for x, y in zip(weights, terminate_callback.pl_module_weights): assert torch.allclose(x.cpu(), y.cpu()) - for x, y in zip(ema_callback._ema_model_weights, terminate_callback.saved_ema_weights): + current_ema_weights = extract_ema_weights(pl_module, trainer) + for x, y in zip(current_ema_weights, terminate_callback.saved_ema_weights): assert torch.allclose(x.cpu(), y.cpu()) - assert ema_callback._cur_step == 8 + + for optimizer in trainer.optimizers: + assert isinstance(optimizer, EMAOptimizer) + assert optimizer.current_step == 8 trainer = Trainer( max_epochs=2, @@ -157,7 +165,7 @@ def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") exp_manager( trainer, { - "ema": {"enable": True, "evaluate_ema_weights_instead": True}, + "ema": {"enable": True}, "explicit_log_dir": str(temp_path), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"}, }, @@ -181,7 +189,7 @@ def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") exp_manager( trainer, { - "ema": {"enable": True, "evaluate_ema_weights_instead": True}, + "ema": {"enable": True}, "explicit_log_dir": str(temp_path), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"}, }, @@ -203,12 +211,14 @@ def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") exp_manager( trainer, { - "ema": {"enable": True, "evaluate_ema_weights_instead": True}, + "ema": {"enable": True, "validate_original_weights": True}, "explicit_log_dir": str(temp_path), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"}, }, ) - with pytest.warns(UserWarning, match="we were unable to find the associated EMA weights when re-loading"): + with pytest.raises( + MisconfigurationException, match="Unable to find the associated EMA weights when re-loading" + ): trainer.fit(model, ckpt_path=resume_path) @pytest.mark.unit @@ 
-221,70 +231,23 @@ def test_exp_manager_ema_weights(self, tmpdir): exp_manager( trainer, { - "ema": {"enable": True, "evaluate_ema_weights_instead": True}, + "ema": {"enable": True, "validate_original_weights": True}, "explicit_log_dir": str(tmp_path), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"}, }, ) assert any(isinstance(callback, EMA) for callback in trainer.callbacks) trainer.fit(model) + ema_weights = extract_ema_weights(model, trainer) assert os.path.exists(tmp_path / "checkpoints/epoch=0-step=8.ckpt") ema_path = tmp_path / "checkpoints/epoch=0-step=8-EMA.ckpt" assert os.path.exists(ema_path) duplicate_model = ExampleModel.load_from_checkpoint(str(ema_path)) - ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - for saved_weight, ema_weight in zip(duplicate_model.state_dict().values(), ema_callback._ema_model_weights): + for saved_weight, ema_weight in zip(duplicate_model.state_dict().values(), ema_weights): assert torch.allclose(saved_weight.cpu(), ema_weight.cpu()) - @pytest.mark.unit - @pytest.mark.run_only_on('GPU') - def test_ema_save_in_callback(self, tmpdir): - """Test to ensure when `save_ema_weights_in_callback_state` is enabled, we save to the callback state.""" - temp_path = os.path.join(tmpdir, 'saved_state') - - model = ExampleModel() - - trainer = Trainer( - max_epochs=2, - limit_val_batches=1, - limit_train_batches=16, - logger=False, - val_check_interval=0.5, - enable_checkpointing=False, - accelerator='gpu', - devices=1, - callbacks=[EMA(decay=0.999, save_ema_weights_in_callback_state=True, evaluate_ema_weights_instead=True)], - ) - exp_manager( - trainer, - {"explicit_log_dir": str(temp_path), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"},}, - ) - trainer.fit(model=model) - - resume_path = os.path.join(temp_path, "checkpoints/epoch=0-step=8.ckpt") - callback = EMA(decay=0.999, save_ema_weights_in_callback_state=True) - - class AssertCallback(Callback): - def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - assert callback._ema_model_weights is not None - - model = ExampleModel() - - trainer = Trainer( - max_epochs=2, - limit_val_batches=1, - limit_train_batches=16, - logger=False, - val_check_interval=0.5, - enable_checkpointing=False, - accelerator='gpu', - devices=1, - callbacks=[callback, AssertCallback()], - ) - trainer.fit(model, ckpt_path=resume_path) - class TestEMATrain: @pytest.mark.unit @@ -303,41 +266,32 @@ class TestEMATrain: ], ) @pytest.mark.parametrize("accumulate_grad_batches", [1, 2]) - @pytest.mark.parametrize("evaluate_ema_weights_instead", [True, False]) - @pytest.mark.parametrize("apex_available_mock", [True, False]) + @pytest.mark.parametrize("validate_original_weights", [True, False]) @pytest.mark.run_only_on('GPU') def test_ema_run_cuda( - self, - test_data_dir, - precision, - accumulate_grad_batches, - evaluate_ema_weights_instead, - apex_available_mock, - tmpdir, + self, test_data_dir, precision, accumulate_grad_batches, validate_original_weights, tmpdir, ): - with mock.patch('nemo.collections.common.callbacks.ema.apex_available', apex_available_mock): - self.run_training_test( - accumulate_grad_batches=accumulate_grad_batches, - evaluate_ema_weights_instead=evaluate_ema_weights_instead, - accelerator='gpu', - precision=precision, - tmpdir=tmpdir, - ) + self.run_training_test( + accumulate_grad_batches=accumulate_grad_batches, + validate_original_weights=validate_original_weights, + accelerator='gpu', + precision=precision, + tmpdir=tmpdir, 
+ ) @pytest.mark.unit @pytest.mark.parametrize("accumulate_grad_batches", [1, 2]) - @pytest.mark.parametrize("evaluate_ema_weights_instead", [True, False]) - @pytest.mark.run_only_on('GPU') - def test_ema_run_cpu(self, test_data_dir, accumulate_grad_batches, evaluate_ema_weights_instead, tmpdir): + @pytest.mark.parametrize("validate_original_weights", [True, False]) + def test_ema_run_cpu(self, test_data_dir, accumulate_grad_batches, validate_original_weights, tmpdir): self.run_training_test( accumulate_grad_batches=accumulate_grad_batches, - evaluate_ema_weights_instead=evaluate_ema_weights_instead, + validate_original_weights=validate_original_weights, accelerator='cpu', precision=32, tmpdir=tmpdir, ) - def run_training_test(self, accumulate_grad_batches, evaluate_ema_weights_instead, accelerator, precision, tmpdir): + def run_training_test(self, accumulate_grad_batches, validate_original_weights, accelerator, precision, tmpdir): pl.seed_everything(123) model = ExampleModel() trainer = Trainer( @@ -356,33 +310,24 @@ def run_training_test(self, accumulate_grad_batches, evaluate_ema_weights_instea exp_manager( trainer, { - "ema": {"enable": True, "evaluate_ema_weights_instead": evaluate_ema_weights_instead}, + "ema": {"enable": True, "validate_original_weights": validate_original_weights, "decay": 0.999}, "explicit_log_dir": str(tmpdir), "checkpoint_callback_params": {"filename": f"{{epoch}}-{{step}}"}, }, ) # add the check callback after the exp manager has made modifications. trainer.callbacks.append(EMAAssertCallback()) + trainer.callbacks.insert(0, EMAValidationAssertCallback()) trainer.fit(model=model, val_dataloaders=model.train_dataloader()) class EMAAssertCallback(Callback): - def __init__(self): - self._before_calc_ema_weights = None - def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: - model_weights = list(pl_module.state_dict().values()) - ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - for x, y in zip(model_weights, ema_callback._ema_model_weights): + model_weights = extract_weights(pl_module) + self.ema_weights = extract_ema_weights(pl_module, trainer) + for x, y in zip(model_weights, self.ema_weights): assert torch.allclose(x, y) - def on_train_batch_start( - self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", batch: Any, batch_idx: int - ) -> None: - ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - # saved for manual calculation of ema to compare against implementation - self._before_calc_ema_weights = deepcopy(ema_callback._ema_model_weights) - def on_train_batch_end( self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", outputs: STEP_OUTPUT, batch: Any, batch_idx: int ) -> None: @@ -392,28 +337,36 @@ def on_train_batch_end( ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] decay = ema_callback.decay expected_ema_weights = [] - for orig_weight, ema_weight in zip(list(pl_module.state_dict().values()), self._before_calc_ema_weights): - expected_ema_weight = orig_weight * (1 - decay) + ema_weight * decay - expected_ema_weights.append(expected_ema_weight) - for actual_ema_weight, expected_ema_weight in zip(ema_callback._ema_model_weights, expected_ema_weights): + new_weights = extract_weights(pl_module) + + for ema_weight, new_weight in zip(self.ema_weights, new_weights): + expected_ema_weight = ema_weight * decay + expected_ema_weight += new_weight * (1 - decay) + expected_ema_weights.append(expected_ema_weight) + ema_weights = 
extract_ema_weights(pl_module, trainer) + for actual_ema_weight, expected_ema_weight in zip(ema_weights, expected_ema_weights): assert torch.allclose(actual_ema_weight, expected_ema_weight) + self.ema_weights = expected_ema_weights + +class EMAValidationAssertCallback(Callback): def on_validation_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - if ema_callback.evaluate_ema_weights_instead: - # todo (sean): shouldn't use the weights buffer to check original weights - self._original_weights = list(x.detach().clone() for x in ema_callback._weights_buffer) - if ema_callback.ema_initialized: - for ema_weights, module_weights in zip( - ema_callback._ema_model_weights, pl_module.state_dict().values() - ): + self._original_weights = extract_weights(pl_module) + self._ema_weights = extract_ema_weights(pl_module, trainer) + # call original EMA function + super().on_validation_start(trainer, pl_module) + if not ema_callback.validate_original_weights: + if ema_callback._ema_initialized: + # check model weights are now EMA weights + for ema_weights, module_weights in zip(self._ema_weights, extract_weights(pl_module)): torch.allclose(ema_weights, module_weights) def on_validation_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: ema_callback = [x for x in trainer.callbacks if isinstance(x, EMA)][0] - if ema_callback.evaluate_ema_weights_instead: - model_weights = list(pl_module.state_dict().values()) - if ema_callback.ema_initialized: + if not ema_callback.validate_original_weights: + model_weights = extract_weights(pl_module) + if ema_callback._ema_initialized: for orig_weights, module_weights in zip(self._original_weights, model_weights): - torch.allclose(orig_weights, module_weights.cpu()) + torch.allclose(orig_weights.cpu(), module_weights.cpu()) From 1250c36253147999a180e110b2ce74143a110b06 Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Mon, 12 Dec 2022 11:27:42 -0800 Subject: [PATCH 029/154] AIStore for ASR datasets (#5462) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit AIStore for ASR datasets Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- Dockerfile | 5 + nemo/collections/asr/data/audio_to_text.py | 143 ++++++++ .../asr/parts/utils/manifest_utils.py | 5 +- .../common/parts/preprocessing/manifest.py | 26 +- nemo/constants.py | 2 + nemo/core/classes/common.py | 5 +- nemo/utils/data_utils.py | 312 ++++++++++++++++++ nemo/utils/model_utils.py | 32 +- .../tokenizers/process_asr_text_tokenizer.py | 3 +- tests/collections/asr/test_asr_datasets.py | 95 +++++- tests/utils/test_utils.py | 104 ++++++ 11 files changed, 696 insertions(+), 36 deletions(-) create mode 100644 nemo/utils/data_utils.py create mode 100644 tests/utils/test_utils.py diff --git a/Dockerfile b/Dockerfile index 4106915e257e..c02876fee81a 100644 --- a/Dockerfile +++ b/Dockerfile @@ -95,3 +95,8 @@ COPY tutorials /workspace/nemo/tutorials RUN printf "#!/bin/bash\njupyter lab --no-browser --allow-root --ip=0.0.0.0" >> start-jupyter.sh && \ chmod +x start-jupyter.sh + +# Prepare AIS CLI +ARG AIS_VERSION=v1.3.15 +ARG AIS_BIN=https://github.com/NVIDIA/aistore/releases/download/${AIS_VERSION}/ais-linux-amd64.tar.gz +RUN curl -LO ${AIS_BIN} && tar -xzvf ais-linux-amd64.tar.gz && mv ./ais /usr/local/bin/. 
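A minimal usage sketch for the data store utilities this commit introduces (DataStoreObject, cache_datastore_manifests and datastore_path_to_webdataset_url in the hunks below); the AIS endpoint, bucket names and cache locations are hypothetical, and a reachable AIStore deployment plus the `ais` CLI installed by the Dockerfile change above are assumed:

    import os

    from nemo.collections.asr.data.audio_to_text import cache_datastore_manifests
    from nemo.utils.data_utils import DataStoreObject, datastore_path_to_webdataset_url

    # The `ais` CLI resolves objects via this endpoint (hypothetical host/port).
    os.environ['AIS_ENDPOINT'] = 'http://ais-gateway:51080'
    # Optional overrides for where and how the local cache is kept.
    os.environ['NEMO_DATA_STORE_CACHE_DIR'] = '/raid/nemo_ais_cache'
    os.environ['NEMO_DATA_STORE_CACHE_SHARED'] = '1'  # cache visible to all nodes

    # A remote manifest is downloaded into the local cache on first access.
    manifest = DataStoreObject('ais://asr-data/train/manifest.json')
    local_manifest_path = manifest.get()

    # Manifests (and optionally the audio they reference) can be pre-cached in bulk.
    cache_datastore_manifests('ais://asr-data/train/manifest.json', cache_audio=True)

    # Tarred shards are streamed through WebDataset via an `ais get` pipe URL.
    url = datastore_path_to_webdataset_url('ais://asr-data/train/audio_0.tar')

Cached copies land under the NeMo cache directory (or NEMO_DATA_STORE_CACHE_DIR), keyed by endpoint and bucket, so repeated runs on the same node reuse the downloaded objects rather than fetching them again.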
diff --git a/nemo/collections/asr/data/audio_to_text.py b/nemo/collections/asr/data/audio_to_text.py index 28e4da6246ca..4d0ad776841b 100644 --- a/nemo/collections/asr/data/audio_to_text.py +++ b/nemo/collections/asr/data/audio_to_text.py @@ -12,7 +12,9 @@ # See the License for the specific language governing permissions and # limitations under the License. import io +import json import math +import multiprocessing import os from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union @@ -21,6 +23,7 @@ import torch import webdataset as wd from torch.utils.data import ChainDataset +from tqdm import tqdm from nemo.collections.asr.parts.preprocessing.features import WaveformFeaturizer from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType @@ -29,6 +32,14 @@ from nemo.core.classes import Dataset, IterableDataset from nemo.core.neural_types import * from nemo.utils import logging +from nemo.utils.data_utils import ( + DataStoreObject, + datastore_object_get, + datastore_path_to_webdataset_url, + is_datastore_cache_shared, + is_datastore_path, +) +from nemo.utils.get_rank import is_global_rank_zero __all__ = [ 'AudioToCharDataset', @@ -182,6 +193,11 @@ def expand_audio_filepaths(audio_tar_filepaths, shard_strategy: str, world_size: # Brace expand audio_tar_filepaths = list(braceexpand.braceexpand(audio_tar_filepaths)) + # Expand store paths into WebDataset URLs + audio_tar_filepaths = [ + datastore_path_to_webdataset_url(p) if is_datastore_path(p) else p for p in audio_tar_filepaths + ] + # Check for distributed and partition shards accordingly if world_size > 1: if shard_strategy == 'scatter': @@ -208,6 +224,127 @@ def expand_audio_filepaths(audio_tar_filepaths, shard_strategy: str, world_size: return audio_tar_filepaths +def cache_datastore_manifests( + manifest_filepaths: Union[str, List[str]], + cache_audio: bool = False, + shared_cache: Optional[bool] = None, + num_workers: Optional[int] = None, + max_num_workers: int = 20, +): + """Cache manifests and audio from an object store. + It is assumed that remote manifests are using relative paths. + + Args: + manifest_filepaths: list of paths to manifest files (list of strings or a string with `,` as separator) + cache_audio: If True, audio from manifest will also be cached + shared_cache: Optional, True if cache is shared across all nodes + num_workers: Optional, number of workers to be used for download + max_num_workers: max number of workers to be used for download, used when setting num_workers automatically + """ + if isinstance(manifest_filepaths, str): + manifest_filepaths = manifest_filepaths.split(',') + + num_datastore_manifests = sum([is_datastore_path(f) for f in manifest_filepaths]) + + if num_datastore_manifests > 0: + # Local utility function + def cache_data(manifest_filepaths, cache_audio, num_workers, max_num_workers): + """Cache manifests and audio data from object store. + """ + # Determine the number of workers to use + if num_workers is None: + num_workers = os.cpu_count() - 1 + num_workers = min(num_workers, max_num_workers) + + # Process each manifest file + for manifest_file in manifest_filepaths: + # If manifest is on a data store, then cache it. + # Otherwise, nothing to do. + if is_datastore_path(manifest_file): + logging.info('Cache manifest file: %s', manifest_file) + cached_manifest_file = DataStoreObject(manifest_file).get() + logging.info('Cached at: %s', str(cached_manifest_file)) + + if cache_audio: + # Each audio file from manifest will be cached. 
+ logging.info('Cache audio from manifest file: %s', manifest_file) + # Assumes that manifest is using relative paths + manifest_dir = os.path.dirname(manifest_file) + # Prepare all store objects + audio_objects = [] + with open(cached_manifest_file, 'r') as f: + for line in f: + item = json.loads(line) + store_path = os.path.join(manifest_dir, item['audio_filepath']) + audio_objects.append(DataStoreObject(store_path=store_path)) + + if num_workers is not None and num_workers > 1: + logging.debug('Using multiprocessing with num_workers: %d.', num_workers) + with multiprocessing.Pool(processes=num_workers) as p: + result = list( + tqdm(p.imap(datastore_object_get, audio_objects), total=len(audio_objects)) + ) + else: + logging.debug('Using a single process.') + result = [] + for audio_object in tqdm(audio_objects): + result.append(audio_object.get() is not None) + + if not all(result): + raise RuntimeError('Some files not downloaded successfully') + logging.info('Caching complete') + + else: + # Nothing to do here + logging.debug('Manifest is not on a data store: %s', manifest_file) + + if torch.distributed.is_available() and torch.distributed.is_initialized(): + logging.debug('Distributed environment is available and initialized.') + + # Handle distributed environment + if shared_cache is None: + shared_cache = is_datastore_cache_shared() + + if shared_cache: + logging.debug('Cache is shared among nodes, cache data on global rank zero.') + is_rank_zero = is_global_rank_zero() + else: + logging.debug('Cache is not shared among nodes, cache data on local rank zero.') + local_rank = int(os.environ.get("LOCAL_RANK", 0)) + is_rank_zero = local_rank == 0 + + if is_rank_zero: + logging.info('Cache data from %s rank 0', 'global' if shared_cache else 'local') + cache_data( + manifest_filepaths=manifest_filepaths, + cache_audio=cache_audio, + num_workers=num_workers, + max_num_workers=max_num_workers, + ) + logging.debug('Reached barrier') + torch.distributed.barrier() + + elif is_global_rank_zero(): + # Handle non-distributed environment, e.g., if running on a single GPU + logging.warning( + 'Torch distributed is not initialized and caching may be prone to data race conditions. ' + 'Now caching data from global rank 0. If there are other ranks and they pass this ' + 'before rank 0, errors might result.' + ) + cache_data( + manifest_filepaths=manifest_filepaths, + cache_audio=cache_audio, + num_workers=num_workers, + max_num_workers=max_num_workers, + ) + else: + raise RuntimeError( + 'Torch distributed is not initialized and caching on nodes other than global rank zero is disabled ' + 'to avoid race condition between different ranks. To ensure distributed environment is ' + 'initialized, please update data config to use `defer_setup = True`.' + ) + + class _AudioTextDataset(Dataset): """ Dataset that loads tensors via a json file containing paths to audio files, transcripts, and durations (in seconds). 
@@ -266,6 +403,9 @@ def __init__( if type(manifest_filepath) == str: manifest_filepath = manifest_filepath.split(",") + # If necessary, cache manifests and audio from object store + cache_datastore_manifests(manifest_filepaths=manifest_filepath, cache_audio=True) + self.manifest_processor = ASRManifestProcessor( manifest_filepath=manifest_filepath, parser=parser, @@ -633,6 +773,9 @@ def __init__( world_size: int = 0, return_sample_id: bool = False, ): + # If necessary, cache manifests from object store + cache_datastore_manifests(manifest_filepaths=manifest_filepath) + self.manifest_processor = ASRManifestProcessor( manifest_filepath=manifest_filepath, parser=parser, diff --git a/nemo/collections/asr/parts/utils/manifest_utils.py b/nemo/collections/asr/parts/utils/manifest_utils.py index 579b7ff47b3d..e4f4281ce21a 100644 --- a/nemo/collections/asr/parts/utils/manifest_utils.py +++ b/nemo/collections/asr/parts/utils/manifest_utils.py @@ -29,6 +29,7 @@ segments_manifest_to_subsegments_manifest, write_rttm2manifest, ) +from nemo.utils.data_utils import DataStoreObject def rreplace(s: str, old: str, new: str) -> str: @@ -368,9 +369,11 @@ def read_manifest(manifest: str) -> List[dict]: Returns: data (list): List of JSON items """ + manifest = DataStoreObject(manifest) + data = [] try: - f = open(manifest, 'r', encoding='utf-8') + f = open(manifest.get(), 'r', encoding='utf-8') except: raise Exception(f"Manifest file could not be opened: {manifest}") for line in f: diff --git a/nemo/collections/common/parts/preprocessing/manifest.py b/nemo/collections/common/parts/preprocessing/manifest.py index fc7768d26d29..f6d8d4a6015b 100644 --- a/nemo/collections/common/parts/preprocessing/manifest.py +++ b/nemo/collections/common/parts/preprocessing/manifest.py @@ -13,10 +13,14 @@ # limitations under the License. import json +import os from os.path import expanduser from pathlib import Path from typing import Any, Callable, Dict, Iterator, List, Optional, Union +from nemo.utils import logging +from nemo.utils.data_utils import DataStoreObject, datastore_path_to_local_path, is_datastore_path + class ManifestBase: def __init__(self, *args, **kwargs): @@ -66,8 +70,12 @@ def item_iter( parse_func = __parse_item k = -1 + logging.debug('Manifest files: %s', str(manifests_files)) for manifest_file in manifests_files: - with open(expanduser(manifest_file), 'r') as f: + logging.debug('Using manifest file: %s', str(manifest_file)) + cached_manifest_file = DataStoreObject(manifest_file).get() + logging.debug('Cached at: %s', str(cached_manifest_file)) + with open(expanduser(cached_manifest_file), 'r') as f: for line in f: k += 1 item = parse_func(line, manifest_file) @@ -142,11 +150,23 @@ def get_full_path(audio_file: str, manifest_file: str, audio_file_len_limit: int Full path to audio_file. 
""" audio_file = Path(audio_file) - manifest_dir = Path(manifest_file).parent + + if is_datastore_path(manifest_file): + # WORKAROUND: pathlib does not support URIs, so use os.path + manifest_dir = os.path.dirname(manifest_file) + else: + manifest_dir = Path(manifest_file).parent.as_posix() if (len(str(audio_file)) < audio_file_len_limit) and not audio_file.is_file() and not audio_file.is_absolute(): # assume audio_file path is relative to manifest_dir - audio_file_path = manifest_dir / audio_file + audio_file_path = os.path.join(manifest_dir, audio_file.as_posix()) + + if is_datastore_path(audio_file_path): + # If audio was originally on an object store, use locally-cached path + audio_file_path = datastore_path_to_local_path(audio_file_path) + + audio_file_path = Path(audio_file_path) + if audio_file_path.is_file(): audio_file = str(audio_file_path.absolute()) else: diff --git a/nemo/constants.py b/nemo/constants.py index 0df46fc152d8..b2afbf882277 100644 --- a/nemo/constants.py +++ b/nemo/constants.py @@ -17,3 +17,5 @@ NEMO_ENV_VARNAME_TESTING = "NEMO_TESTING" # Set to True to enable nemo.util.logging's debug mode NEMO_ENV_VARNAME_VERSION = "NEMO_EXPM_VERSION" # Used for nemo.utils.exp_manager versioning NEMO_ENV_CACHE_DIR = "NEMO_CACHE_DIR" # Used to change default nemo cache directory +NEMO_ENV_DATA_STORE_CACHE_DIR = "NEMO_DATA_STORE_CACHE_DIR" # Used to change default nemo data store cache directory +NEMO_ENV_DATA_STORE_CACHE_SHARED = "NEMO_DATA_STORE_CACHE_SHARED" # Shared among nodes (1) or not shared (0) diff --git a/nemo/core/classes/common.py b/nemo/core/classes/common.py index a4b9bb1829ea..e6b260e0d5b6 100644 --- a/nemo/core/classes/common.py +++ b/nemo/core/classes/common.py @@ -35,8 +35,9 @@ import nemo from nemo.core.connectors.save_restore_connector import SaveRestoreConnector from nemo.core.neural_types import NeuralType, NeuralTypeComparisonResult -from nemo.utils import logging, model_utils +from nemo.utils import logging from nemo.utils.cloud import maybe_download_from_cloud +from nemo.utils.data_utils import resolve_cache_dir from nemo.utils.model_utils import import_class_by_path, maybe_update_config_version __all__ = ['Typing', 'FileIO', 'Model', 'Serialization', 'typecheck', 'PretrainedModelInfo'] @@ -901,7 +902,7 @@ def _get_ngc_pretrained_model_info(cls, model_name: str, refresh_cache: bool = F ) filename = location_in_the_cloud.split("/")[-1] url = location_in_the_cloud.replace(filename, "") - cache_dir = Path.joinpath(model_utils.resolve_cache_dir(), f'{filename[:-5]}') + cache_dir = Path.joinpath(resolve_cache_dir(), f'{filename[:-5]}') # If either description and location in the cloud changes, this will force re-download cache_subfolder = hashlib.md5((location_in_the_cloud + description).encode('utf-8')).hexdigest() # if file exists on cache_folder/subfolder, it will be re-used, unless refresh_cache is True diff --git a/nemo/utils/data_utils.py b/nemo/utils/data_utils.py new file mode 100644 index 000000000000..09da7ba93512 --- /dev/null +++ b/nemo/utils/data_utils.py @@ -0,0 +1,312 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import pathlib +import shutil +import subprocess +from typing import Tuple + +from nemo import __version__ as NEMO_VERSION +from nemo import constants +from nemo.utils import logging + + +def resolve_cache_dir() -> pathlib.Path: + """ + Utility method to resolve a cache directory for NeMo that can be overriden by an environment variable. + + Example: + NEMO_CACHE_DIR="~/nemo_cache_dir/" python nemo_example_script.py + + Returns: + A Path object, resolved to the absolute path of the cache directory. If no override is provided, + uses an inbuilt default which adapts to nemo versions strings. + """ + override_dir = os.environ.get(constants.NEMO_ENV_CACHE_DIR, "") + if override_dir == "": + path = pathlib.Path.joinpath(pathlib.Path.home(), f'.cache/torch/NeMo/NeMo_{NEMO_VERSION}') + else: + path = pathlib.Path(override_dir).resolve() + return path + + +def is_datastore_path(path) -> bool: + """Check if a path is from a data object store. + Currently, only AIStore is supported. + """ + return path.startswith('ais://') + + +def is_datastore_cache_shared() -> bool: + """Check if store cache is shared. + """ + # Assume cache is shared by default, e.g., as in resolve_cache_dir (~/.cache) + cache_shared = int(os.environ.get(constants.NEMO_ENV_DATA_STORE_CACHE_SHARED, 1)) + + if cache_shared == 0: + return False + elif cache_shared == 1: + return True + else: + raise ValueError(f'Unexpected value of env {constants.NEMO_ENV_DATA_STORE_CACHE_SHARED}') + + +def ais_cache_base() -> str: + """Return path to local cache for AIS. + """ + override_dir = os.environ.get(constants.NEMO_ENV_DATA_STORE_CACHE_DIR, "") + if override_dir == "": + cache_dir = resolve_cache_dir().as_posix() + else: + cache_dir = pathlib.Path(override_dir).resolve().as_posix() + + if cache_dir.endswith(NEMO_VERSION): + # Prevent re-caching dataset after upgrading NeMo + cache_dir = os.path.dirname(cache_dir) + return os.path.join(cache_dir, 'ais') + + +def ais_endpoint() -> str: + """Get configured AIS endpoint. + """ + return os.getenv('AIS_ENDPOINT') + + +def bucket_and_object_from_uri(uri: str) -> Tuple[str, str]: + """Parse a path to determine bucket and object path. + + Args: + uri: Full path to an object on an object store + + Returns: + Tuple of strings (bucket_name, object_path) + """ + if not is_datastore_path(uri): + raise ValueError(f'Provided URI is not a valid store path: {uri}') + uri_parts = pathlib.PurePath(uri).parts + bucket = uri_parts[1] + object_path = pathlib.PurePath(*uri_parts[2:]) + + return str(bucket), str(object_path) + + +def ais_endpoint_to_dir(endpoint: str) -> str: + """Convert AIS endpoint to a valid dir name. + Used to build cache location. + + Args: + endpoint: AIStore endpoint in format https://host:port + + Returns: + Directory formed as `host/port`. + """ + if not endpoint.startswith('http://'): + raise ValueError(f'Unexpected format for ais endpoint: {endpoint}') + + endpoint = endpoint.replace('http://', '') + host, port = endpoint.split(':') + return os.path.join(host, port) + + +def ais_binary() -> str: + """Return location of `ais` binary. 
+ """ + path = shutil.which('ais') + + if path is not None: + logging.debug('Found AIS binary at %s', path) + return path + + logging.warning('AIS binary not found with `which ais`.') + + # Double-check if it exists at the default path + default_path = '/usr/local/bin/ais' + if os.path.isfile(default_path): + logging.info('ais available at the default path: %s', default_path) + return default_path + else: + raise RuntimeError(f'AIS binary not found.') + + +def datastore_path_to_local_path(store_path: str) -> str: + """Convert a data store path to a path in a local cache. + + Args: + store_path: a path to an object on an object store + + Returns: + Path to the same object in local cache. + """ + if store_path.startswith('ais://'): + endpoint = ais_endpoint() + if endpoint is None: + raise RuntimeError(f'AIS endpoint not set, cannot resolve {store_path}') + + local_ais_cache = os.path.join(ais_cache_base(), ais_endpoint_to_dir(endpoint)) + store_bucket, store_object = bucket_and_object_from_uri(store_path) + local_path = os.path.join(local_ais_cache, store_bucket, store_object) + else: + raise ValueError(f'Unexpected store path format: {store_path}') + + return local_path + + +def get_datastore_object(path: str, force: bool = False, num_retries: int = 5) -> str: + """Download an object from a store path and return the local path. + If the input `path` is a local path, then nothing will be done, and + the original path will be returned. + + Args: + path: path to an object + force: force download, even if a local file exists + num_retries: number of retries if the get command fails + + Returns: + Local path of the object. + """ + if path.startswith('ais://'): + endpoint = ais_endpoint() + if endpoint is None: + raise RuntimeError(f'AIS endpoint not set, cannot resolve {path}') + + local_path = datastore_path_to_local_path(store_path=path) + + if not os.path.isfile(local_path) or force: + # Either we don't have the file in cache or we force download it + # Enhancement: if local file is present, check some tag and compare against remote + local_dir = os.path.dirname(local_path) + if not os.path.isdir(local_dir): + os.makedirs(local_dir, exist_ok=True) + + cmd = [ais_binary(), 'get', path, local_path] + + # for now info, later debug + logging.debug('Downloading from AIS') + logging.debug('\tendpoint %s', endpoint) + logging.debug('\tpath: %s', path) + logging.debug('\tlocal path: %s', local_path) + logging.debug('\tcmd: %s', subprocess.list2cmdline(cmd)) + + done = False + for n in range(num_retries): + if not done: + try: + # Use stdout=subprocess.DEVNULL to prevent showing AIS command on each line + subprocess.check_call(cmd, stdout=subprocess.DEVNULL) + done = True + except subprocess.CalledProcessError as err: + logging.warning('Attempt %d of %d failed with: %s', n + 1, num_retries, str(err)) + + if not done: + raise RuntimeError('Download failed: %s', subprocess.list2cmdline(cmd)) + + return local_path + + else: + # Assume the file is local + return path + + +class DataStoreObject: + """A simple class for handling objects in a data store. + Currently, this class supports objects on AIStore. 
+ + Args: + store_path: path to a store object + local_path: path to a local object, may be used to upload local object to store + get: get the object from a store + """ + + def __init__(self, store_path: str, local_path: str = None, get: bool = False): + if local_path is not None: + raise NotImplementedError('Specifying a local path is currently not supported.') + + self._store_path = store_path + self._local_path = local_path + + if get: + self.get() + + @property + def store_path(self) -> str: + """Return store path of the object. + """ + return self._store_path + + @property + def local_path(self) -> str: + """Return local path of the object. + """ + return self._local_path + + def get(self, force: bool = False) -> str: + """Get an object from the store to local cache and return the local path. + + Args: + force: force download, even if a local file exists + + Returns: + Path to a local object. + """ + if not self.local_path: + # Assume the object needs to be downloaded + self._local_path = get_datastore_object(self.store_path, force=force) + return self.local_path + + def put(self, force: bool = False) -> str: + """Push to remote and return the store path + + Args: + force: force download, even if a local file exists + + Returns: + Path to a (remote) object object on the object store. + """ + raise NotImplementedError() + + def __str__(self): + """Return a human-readable description of the object. + """ + description = f'{type(self)}: store_path={self.store_path}, local_path={self.local_path}' + return description + + +def datastore_path_to_webdataset_url(store_path: str): + """Convert store_path to a WebDataset URL. + + Args: + store_path: path to buckets on store + + Returns: + URL which can be directly used with WebDataset. + """ + if store_path.startswith('ais://'): + url = f'pipe:ais get {store_path} - || true' + else: + raise ValueError(f'Unknown store path format: {store_path}') + + return url + + +def datastore_object_get(store_object: DataStoreObject) -> bool: + """A convenience wrapper for multiprocessing.imap. + + Args: + store_object: An instance of DataStoreObject + + Returns: + True if get() returned a path. + """ + return store_object.get() is not None diff --git a/nemo/utils/model_utils.py b/nemo/utils/model_utils.py index ca215c76f89c..cef6b75b9dec 100644 --- a/nemo/utils/model_utils.py +++ b/nemo/utils/model_utils.py @@ -21,9 +21,9 @@ import wrapt -import nemo -from nemo import constants from nemo.utils import AppState, logging +from nemo.utils.data_utils import resolve_cache_dir # imported for compatibility: model_utils.resolve_cache_dir() +from nemo.utils.data_utils import is_datastore_path # TODO @blisc: Perhaps refactor instead of import guarding @@ -133,8 +133,7 @@ def resolve_dataset_name_from_cfg(cfg: 'DictConfig') -> Optional[str]: values_are_paths = 0 for val_i in value: val_i = str(val_i) - - if os.path.exists(val_i) or os.path.isdir(val_i): + if os.path.exists(val_i) or os.path.isdir(val_i) or is_datastore_path(val_i): values_are_paths += 1 else: # reset counter and break inner loop @@ -144,7 +143,7 @@ def resolve_dataset_name_from_cfg(cfg: 'DictConfig') -> Optional[str]: return key else: - if os.path.exists(str(value)) or os.path.isdir(str(value)): + if os.path.exists(str(value)) or os.path.isdir(str(value)) or is_datastore_path(str(value)): return key return None @@ -161,7 +160,7 @@ def parse_dataset_as_name(name: str) -> str: Returns: str prefix used to identify uniquely this data/manifest file. 
""" - if os.path.exists(str(name)) or os.path.isdir(str(name)): + if os.path.exists(str(name)) or os.path.isdir(str(name)) or is_datastore_path(str(name)): name = Path(name).stem else: name = str(name) @@ -276,14 +275,12 @@ def resolve_validation_dataloaders(model: 'ModelPT'): model._validation_dl = dataloaders model._validation_names = [parse_dataset_as_name(ds) for ds in ds_values] - unique_names_check(name_list=model._validation_names) return else: model.setup_validation_data(cfg.validation_ds) model._validation_names = [parse_dataset_as_name(ds_values)] - unique_names_check(name_list=model._validation_names) @@ -565,25 +562,6 @@ def check_lib_version(lib_name: str, checked_version: str, operator) -> Tuple[Op return None, msg -def resolve_cache_dir() -> Path: - """ - Utility method to resolve a cache directory for NeMo that can be overriden by an environment variable. - - Example: - NEMO_CACHE_DIR="~/nemo_cache_dir/" python nemo_example_script.py - - Returns: - A Path object, resolved to the absolute path of the cache directory. If no override is provided, - uses an inbuilt default which adapts to nemo versions strings. - """ - override_dir = os.environ.get(constants.NEMO_ENV_CACHE_DIR, "") - if override_dir == "": - path = Path.joinpath(Path.home(), f'.cache/torch/NeMo/NeMo_{nemo.__version__}') - else: - path = Path(override_dir).resolve() - return path - - def uninject_model_parallel_rank(filepath): filepath = str(filepath) if 'mp_rank' in filepath or 'tp_rank' in filepath: diff --git a/scripts/tokenizers/process_asr_text_tokenizer.py b/scripts/tokenizers/process_asr_text_tokenizer.py index 6fc47fb5d630..c31b50815c13 100644 --- a/scripts/tokenizers/process_asr_text_tokenizer.py +++ b/scripts/tokenizers/process_asr_text_tokenizer.py @@ -89,6 +89,7 @@ import tokenizers from nemo.collections.common.tokenizers.sentencepiece_tokenizer import create_spt_model +from nemo.utils.data_utils import DataStoreObject parser = argparse.ArgumentParser(description='Create tokenizer') group = parser.add_mutually_exclusive_group(required=True) @@ -161,7 +162,7 @@ def __build_document_from_manifests( num_lines = 0 with open(document_path, 'w') as out_writer: for manifest in manifests: - with open(manifest, 'r') as in_reader: + with open(DataStoreObject(manifest).get(), 'r') as in_reader: for line in in_reader: item = json.loads(line) text = item['text'] diff --git a/tests/collections/asr/test_asr_datasets.py b/tests/collections/asr/test_asr_datasets.py index 88b0ab4127f1..d64b8ad98f5f 100644 --- a/tests/collections/asr/test_asr_datasets.py +++ b/tests/collections/asr/test_asr_datasets.py @@ -11,11 +11,13 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
- import copy +import filecmp import json import os +import shutil import tempfile +from unittest import mock import numpy as np import pytest @@ -32,7 +34,12 @@ AudioToTargetWithReferenceDataset, _audio_collate_fn, ) -from nemo.collections.asr.data.audio_to_text import TarredAudioToBPEDataset, TarredAudioToCharDataset +from nemo.collections.asr.data.audio_to_text import ( + DataStoreObject, + TarredAudioToBPEDataset, + TarredAudioToCharDataset, + cache_datastore_manifests, +) from nemo.collections.asr.data.audio_to_text_dali import ( __DALI_MINIMUM_VERSION__, AudioToBPEDALIDataset, @@ -1535,3 +1542,87 @@ def test_audio_to_target_with_embedding_dataset(self): assert np.isclose( item_factory_signal, golden_signal, atol=atol ).all(), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + + +class TestUtilityFunctions: + @pytest.mark.unit + @pytest.mark.parametrize('cache_audio', [False, True]) + def test_cache_datastore_manifests(self, cache_audio: bool): + """Test caching of manifest and audio files. + """ + # Data setup + random_seed = 42 + sample_rate = 16000 + num_examples = 10 + num_manifests = 2 + data_duration = 1.0 + + # Generate random signals + _rng = np.random.default_rng(seed=random_seed) + + # Input and target signals have the same duration + data_duration_samples = int(data_duration * sample_rate) + + with tempfile.TemporaryDirectory() as test_dir: + test_store_dir = os.path.join(test_dir, 'store') + os.mkdir(test_store_dir) + + # Prepare metadata and audio files + manifest_filepaths = [] + audio_files = [] + for m in range(num_manifests): + manifest_dir = os.path.join(test_store_dir, f'manifest_{m}') + os.mkdir(manifest_dir) + manifest_filepath = os.path.join(manifest_dir, 'manifest.json') + + metadata = [] + data = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples, num_examples)) + for n in range(num_examples): + audio_filepath = f'manifest_{m}_audio_{n:02d}.wav' + audio_file = os.path.join(manifest_dir, audio_filepath) + # Write audio file + sf.write(audio_file, data[:, n], sample_rate, 'float') + # Update metadata + metadata.append( + { + 'audio_filepath': audio_filepath, + 'duration': data_duration, + 'text': f'text for example {n:02d}', + } + ) + # Update audio files + audio_files.append(audio_file) + + # Save manifest + write_manifest(manifest_filepath, metadata) + manifest_filepaths.append(manifest_filepath) + + # Cache location + test_cache_dir = os.path.join(test_dir, 'cache') + + # Instead of using AIS, copy object from store dir to cache dir + def fake_get(self): + # Object path relative to store path + object_path = os.path.relpath(self.store_path, start=test_store_dir) + # Copy to fake local path + self._local_path = os.path.join(test_cache_dir, object_path) + os.makedirs(os.path.dirname(self.local_path), exist_ok=True) + shutil.copy(self.store_path, self.local_path) + # Return path as in the original get + return self.local_path + + with mock.patch( + 'nemo.collections.asr.data.audio_to_text.is_datastore_path', lambda x: True + ), mock.patch.object(DataStoreObject, 'get', fake_get): + cache_datastore_manifests(manifest_filepaths, cache_audio=cache_audio) + + # Manifests need to be compared + store_files_to_compare = manifest_filepaths + if cache_audio: + # Audio needs to be compared + store_files_to_compare += audio_files + + # Compare files + for f_store in store_files_to_compare: + f_cache = os.path.join(test_cache_dir, os.path.relpath(f_store, test_store_dir)) + assert filecmp.cmp(f_store, f_cache, shallow=False), 
f'Files {f_store} and {f_cache} do not match.' diff --git a/tests/utils/test_utils.py b/tests/utils/test_utils.py new file mode 100644 index 000000000000..0d70de3bacee --- /dev/null +++ b/tests/utils/test_utils.py @@ -0,0 +1,104 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from unittest import mock + +import pytest + +from nemo import __version__ as NEMO_VERSION +from nemo.utils.data_utils import ( + ais_binary, + ais_endpoint_to_dir, + bucket_and_object_from_uri, + datastore_path_to_webdataset_url, + is_datastore_path, + resolve_cache_dir, +) + + +class TestDataUtils: + @pytest.mark.unit + def test_resolve_cache_dir(self): + """Test cache dir path. + """ + TEST_NEMO_ENV_CACHE_DIR = 'TEST_NEMO_ENV_CACHE_DIR' + with mock.patch('nemo.constants.NEMO_ENV_CACHE_DIR', TEST_NEMO_ENV_CACHE_DIR): + + envar_to_resolved_path = { + '/path/to/cache': '/path/to/cache', + 'relative/path': os.path.join(os.getcwd(), 'relative/path'), + '': os.path.expanduser(f'~/.cache/torch/NeMo/NeMo_{NEMO_VERSION}'), + } + + for envar, expected_path in envar_to_resolved_path.items(): + # Set envar + os.environ[TEST_NEMO_ENV_CACHE_DIR] = envar + # Check path + uut_path = resolve_cache_dir().as_posix() + assert uut_path == expected_path, f'Expected: {expected_path}, got {uut_path}' + + @pytest.mark.unit + def test_is_datastore_path(self): + """Test checking for datastore path. + """ + # Positive examples + assert is_datastore_path('ais://positive/example') + # Negative examples + assert not is_datastore_path('ais/negative/example') + assert not is_datastore_path('/negative/example') + assert not is_datastore_path('negative/example') + + @pytest.mark.unit + def test_bucket_and_object_from_uri(self): + """Test getting bucket and object from URI. + """ + # Positive examples + assert bucket_and_object_from_uri('ais://bucket/object') == ('bucket', 'object') + assert bucket_and_object_from_uri('ais://bucket_2/object/is/here') == ('bucket_2', 'object/is/here') + + # Negative examples: invalid URI + with pytest.raises(ValueError): + bucket_and_object_from_uri('/local/file') + + with pytest.raises(ValueError): + bucket_and_object_from_uri('local/file') + + @pytest.mark.unit + def test_ais_endpoint_to_dir(self): + """Test converting an AIS endpoint to dir. + """ + assert ais_endpoint_to_dir('http://local:123') == os.path.join('local', '123') + assert ais_endpoint_to_dir('http://1.2.3.4:567') == os.path.join('1.2.3.4', '567') + + with pytest.raises(ValueError): + ais_endpoint_to_dir('local:123') + + @pytest.mark.unit + def test_ais_binary(self): + """Test cache dir path. 
+ """ + with mock.patch('shutil.which', lambda x: '/test/path/ais'): + assert ais_binary() == '/test/path/ais' + + # Negative example: AIS binary cannot be found + with mock.patch('shutil.which', lambda x: None), mock.patch('os.path.isfile', lambda x: None): + with pytest.raises(RuntimeError): + ais_binary() + + @pytest.mark.unit + def test_datastore_path_to_webdataset_url(self): + """Test conversion of data store path to an URL for WebDataset. + """ + assert datastore_path_to_webdataset_url('ais://test/path') == 'pipe:ais get ais://test/path - || true' From d4f3cca2e4fc845ca0c4373e825af4e2cce62ea1 Mon Sep 17 00:00:00 2001 From: Somshubra Majumdar Date: Mon, 12 Dec 2022 15:51:58 -0800 Subject: [PATCH 030/154] Add support for MHA adapters to ASR (#5396) * Convert AbstractAdapterModule to AbstractAdapterMixin Signed-off-by: smajumdar * Temporary fixes to new signature of mixin Signed-off-by: smajumdar * Add adapter util for constants, add all mha adapters. Signed-off-by: smajumdar * Update name of function Signed-off-by: smajumdar * Roll back changes to convASR Signed-off-by: smajumdar * Convert AbstractAdapterModule to AbstractAdapterMixin Signed-off-by: smajumdar * First draft of Conformer support for MHA attention Signed-off-by: smajumdar * Add some preliminary tests Signed-off-by: smajumdar * Add support for projection of the hidden dimension for attention Signed-off-by: smajumdar * Add support for squeezeformer Signed-off-by: smajumdar * Update train adapter config Signed-off-by: smajumdar * Add tests for squeezeformer and unit tests for new modules Signed-off-by: smajumdar * Update config for hp search,set limits on modules for conformer and squeezeformer, update adapter mixin, add cache to import_from_class_path Signed-off-by: smajumdar * Update location of adapters Signed-off-by: smajumdar * Add pre_norm for proper attention learning, Fix the issue with nan/inf in pos_bias_u and pos_bias_v Signed-off-by: smajumdar * Update expmanager to clean up checkpoints Signed-off-by: smajumdar * Fix style Signed-off-by: smajumdar * Add docstrings and update tests Signed-off-by: smajumdar * Add docstrings and update tests Signed-off-by: smajumdar * Add docstrings and update tests Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update training scripts Signed-off-by: smajumdar * Update config and docs Signed-off-by: smajumdar * Expose nemo delete function Signed-off-by: smajumdar * Correct adapter partial state saving Signed-off-by: smajumdar * Correct a bug with state management of adapter tokens Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Pull down EMA test Signed-off-by: smajumdar * Correct name of adapter module utility class Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 55 ++- docs/source/asr/api.rst | 42 ++ docs/source/core/adapters/api.rst | 8 + .../asr/asr_adapters/train_asr_adapter.py | 36 +- .../asr/conf/asr_adapters/asr_adaptation.yaml | 50 ++- .../conf/asr_adapters/asr_adaptation_hp.yaml | 68 +++- .../asr/modules/conformer_encoder.py | 8 + nemo/collections/asr/modules/conv_asr.py | 5 + .../asr/modules/squeezeformer_encoder.py | 15 + .../asr/parts/submodules/adapters/__init__.py 
| 26 ++ .../multi_head_attention_adapter_module.py | 382 ++++++++++++++++++ .../asr/parts/submodules/conformer_modules.py | 77 +++- .../parts/submodules/squeezeformer_modules.py | 77 +++- .../asr/parts/utils/adapter_utils.py | 41 +- .../common/parts/adapter_modules.py | 12 +- .../megatron_gpt_adapter_model.py | 4 +- .../megatron_t5_adapter_model.py | 8 +- .../megatron/adapters/parallel_adapters.py | 6 +- .../modules/common/megatron/transformer.py | 10 +- nemo/core/classes/mixins/__init__.py | 2 + .../mixins/adapter_mixin_strategies.py | 154 ++++++- nemo/core/classes/mixins/adapter_mixins.py | 139 +++++-- nemo/core/utils/process_launcher/launcher.py | 2 +- nemo/utils/exp_manager.py | 27 ++ nemo/utils/model_utils.py | 2 + .../mixins/adapters/test_asr_adapter_mixin.py | 210 +++++++++- .../adapters/test_asr_adapter_modules.py | 227 +++++++++++ .../common/mixins/test_adapter_modules.py | 6 +- .../adapters/test_adapter_model_mixin.py | 77 ++++ 29 files changed, 1619 insertions(+), 157 deletions(-) create mode 100644 nemo/collections/asr/parts/submodules/adapters/__init__.py create mode 100644 nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py create mode 100644 tests/collections/asr/mixins/adapters/test_asr_adapter_modules.py diff --git a/Jenkinsfile b/Jenkinsfile index 3da4efea155b..1e416b7df757 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -210,20 +210,6 @@ pipeline { } } - stage('Speech to Text EMA') { - steps { - sh 'python examples/asr/asr_ctc/speech_to_text_ctc.py \ - model.train_ds.manifest_filepath=/home/TestData/an4_dataset/an4_train.json \ - model.validation_ds.manifest_filepath=/home/TestData/an4_dataset/an4_val.json \ - trainer.devices=2 \ - trainer.accelerator="gpu" \ - +trainer.fast_dev_run=True \ - +exp_manager.ema.enable=True \ - exp_manager.exp_dir=examples/asr/speech_to_text_results' - sh 'rm -rf examples/asr/speech_to_text_results' - } - } - stage('L2: Speech to Text WPE - CitriNet') { steps { sh 'python examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \ @@ -318,6 +304,26 @@ pipeline { } } + stage('L2: Speech to Text EMA') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + steps { + sh 'python examples/asr/asr_ctc/speech_to_text_ctc.py \ + model.train_ds.manifest_filepath=/home/TestData/an4_dataset/an4_train.json \ + model.validation_ds.manifest_filepath=/home/TestData/an4_dataset/an4_val.json \ + trainer.devices=2 \ + trainer.accelerator="gpu" \ + +trainer.fast_dev_run=True \ + +exp_manager.ema.enable=True \ + exp_manager.exp_dir=examples/asr/speech_to_text_results' + sh 'rm -rf examples/asr/speech_to_text_results' + } + + } stage('L2: Speaker dev run') { when { @@ -658,12 +664,12 @@ pipeline { } failFast true parallel { - stage('ASR Adapters') { + stage('Linear Adapters') { steps { sh 'python examples/asr/asr_adapters/train_asr_adapter.py \ model.pretrained_model="stt_en_conformer_ctc_small" \ model.adapter.adapter_name="an4" \ - model.adapter.in_features=176 \ + model.adapter.linear.in_features=176 \ model.train_ds.manifest_filepath=/home/TestData/an4_dataset/an4_train.json \ model.validation_ds.manifest_filepath=/home/TestData/an4_dataset/an4_val.json \ trainer.max_steps=5 \ @@ -674,6 +680,23 @@ pipeline { sh 'rm -rf examples/asr/speech_to_text_adapters_results' } } + stage('RelPos MHA Adapters') { + steps { + sh 'python examples/asr/asr_adapters/train_asr_adapter.py \ + model.pretrained_model="stt_en_conformer_ctc_small" \ + model.adapter.adapter_name="encoder:an4" \ + model.adapter.adapter_type="tiny_attn" \ + 
model.adapter.tiny_attn.n_feat=176 \ + model.train_ds.manifest_filepath=/home/TestData/an4_dataset/an4_train.json \ + model.validation_ds.manifest_filepath=/home/TestData/an4_dataset/an4_val.json \ + trainer.max_steps=5 \ + trainer.devices=[0] \ + trainer.accelerator="gpu" \ + +trainer.fast_dev_run=True \ + exp_manager.exp_dir=examples/asr/speech_to_text_adapters_mha_results' + sh 'rm -rf examples/asr/speech_to_text_adapters_mha_results' + } + } } } diff --git a/docs/source/asr/api.rst b/docs/source/asr/api.rst index b9749e86ac86..590ca6d6b7f3 100644 --- a/docs/source/asr/api.rst +++ b/docs/source/asr/api.rst @@ -232,3 +232,45 @@ Hypotheses .. autoclass:: nemo.collections.asr.parts.utils.rnnt_utils.NBestHypotheses :show-inheritance: :no-members: + +Adapter Networks +~~~~~~~~~~~~~~~~ + +.. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MultiHeadAttentionAdapter + :show-inheritance: + :members: + :member-order: bysource + +----- + +.. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionMultiHeadAttentionAdapter + :show-inheritance: + :members: + :member-order: bysource + +----- + +.. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.PositionalEncodingAdapter + :show-inheritance: + :members: + :member-order: bysource + +----- + +.. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionalEncodingAdapter + :show-inheritance: + :members: + :member-order: bysource + + +Adapter Strategies +~~~~~~~~~~~~~~~~~~ + +.. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MHAResidualAddAdapterStrategy + :show-inheritance: + :members: + :member-order: bysource + :undoc-members: adapter_module_names + +----- + diff --git a/docs/source/core/adapters/api.rst b/docs/source/core/adapters/api.rst index 5ad8ea4164a9..8eea09f57e78 100644 --- a/docs/source/core/adapters/api.rst +++ b/docs/source/core/adapters/api.rst @@ -50,6 +50,14 @@ Adapter Strategies ----- +.. autoclass:: nemo.core.classes.mixins.adapter_mixin_strategies.ReturnResultAdapterStrategy + :show-inheritance: + :members: + :member-order: bysource + :undoc-members: adapter_module_names + +----- + .. 
autoclass:: nemo.core.classes.mixins.adapter_mixin_strategies.ResidualAddAdapterStrategy :show-inheritance: :members: diff --git a/examples/asr/asr_adapters/train_asr_adapter.py b/examples/asr/asr_adapters/train_asr_adapter.py index a71643760044..5a94e2bb332d 100644 --- a/examples/asr/asr_adapters/train_asr_adapter.py +++ b/examples/asr/asr_adapters/train_asr_adapter.py @@ -21,10 +21,11 @@ model.pretrained_model=null \ model.nemo_model=null \ model.adapter.adapter_name= \ + model.adapter.adapter_type="" \ model.adapter.adapter_module_name= \ - model.adapter.in_features= \ - model.adapter.dim=32 \ - model.adapter.dropout=0.0 \ + model.adapter.linear.in_features= \ + model.adapter.linear.dim=32 \ + model.adapter.linear.dropout=0.0 \ model.train_ds.manifest_filepath= \ model.train_ds.batch_size=16 \ model.validation_ds.manifest_filepath= \ @@ -46,8 +47,9 @@ model.pretrained_model=null \ model.nemo_model=null \ model.adapter.adapter_name= \ + model.adapter.adapter_type="" \ model.adapter.adapter_module_name= \ - model.adapter.in_features= \ + model.adapter.linear.in_features= \ model.train_ds.manifest_filepath= \ model.train_ds.batch_size=16 \ model.validation_ds.manifest_filepath= \ @@ -79,7 +81,6 @@ https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/results.html """ -import glob import os from dataclasses import is_dataclass @@ -90,7 +91,7 @@ from nemo.core import adapter_mixins from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import exp_manager +from nemo.utils.exp_manager import clean_exp_ckpt, exp_manager def update_model_config_to_support_adapter(model_cfg, current_cfg): @@ -188,10 +189,21 @@ def main(cfg): with open_dict(cfg.model.adapter): # Extract the name of the adapter (must be give for training) adapter_name = cfg.model.adapter.pop("adapter_name") + adapter_type = cfg.model.adapter.pop("adapter_type") adapter_module_name = cfg.model.adapter.pop("adapter_module_name", None) adapter_state_dict_name = cfg.model.adapter.pop("adapter_state_dict_name", None) - # augment adapter name with module name, if not provided by user + # Resolve the config of the specified `adapter_type` + if adapter_type not in cfg.model.adapter.keys(): + raise ValueError( + f"Adapter type ({adapter_type}) config could not be found. Adapter setup config - \n" + f"{OmegaConf.to_yaml(cfg.model.adapter)}" + ) + + adapter_type_cfg = cfg.model.adapter[adapter_type] + print(f"Found `{adapter_type}` config :\n" f"{OmegaConf.to_yaml(adapter_type_cfg)}") + + # Augment adapter name with module name, if not provided by user if adapter_module_name is not None and ':' not in adapter_name: adapter_name = f'{adapter_module_name}:{adapter_name}' @@ -200,7 +212,7 @@ def main(cfg): if adapter_global_cfg is not None: add_global_adapter_cfg(model, adapter_global_cfg) - model.add_adapter(adapter_name, cfg=cfg.model.adapter) + model.add_adapter(adapter_name, cfg=adapter_type_cfg) assert model.is_adapter_available() # Disable all other adapters, enable just the current adapter. 
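The adapter_type resolution added to train_asr_adapter.py above amounts to popping the selector keys and indexing into the matching sub-config. A minimal standalone sketch of that pattern follows; the OmegaConf dict and its values are illustrative, not the real training config:

from omegaconf import OmegaConf, open_dict

# Illustrative adapter config mirroring examples/asr/conf/asr_adapters/asr_adaptation.yaml
cfg = OmegaConf.create(
    {
        "adapter_name": "encoder:an4",   # optionally prefixed with the adapter module name
        "adapter_type": "tiny_attn",     # selects one of the sub-configs below
        "linear": {"in_features": 176, "dim": 32},
        "tiny_attn": {"n_feat": 176, "n_head": 1, "proj_dim": -1},
    }
)

with open_dict(cfg):
    adapter_name = cfg.pop("adapter_name")
    adapter_type = cfg.pop("adapter_type")

if adapter_type not in cfg:
    raise ValueError(f"Adapter type ({adapter_type}) config could not be found.")

adapter_type_cfg = cfg[adapter_type]
# This resolved sub-config is what ends up in model.add_adapter(adapter_name, cfg=adapter_type_cfg)
print(OmegaConf.to_yaml(adapter_type_cfg))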
@@ -234,12 +246,8 @@ def main(cfg): if 'delete_ckpt_after_train' in cfg: delete_ckpt_after_train = cfg.delete_ckpt_after_train if delete_ckpt_after_train: - logging.info("Deleting *.ckpt files after training to preserve storage space...") - - ckpt_files = glob.glob(os.path.join(exp_log_dir, "checkpoints", "*.ckpt")) - for filepath in ckpt_files: - logging.info(f"Deleting file : {filepath}") - os.remove(filepath) + # Remove PTL ckpt file, and potentially also remove .nemo file to conserve storage space. + clean_exp_ckpt(exp_log_dir, remove_ckpt=True, remove_nemo=False) if __name__ == '__main__': diff --git a/examples/asr/conf/asr_adapters/asr_adaptation.yaml b/examples/asr/conf/asr_adapters/asr_adaptation.yaml index 59df7ee41ca7..05a2d1ca0d92 100644 --- a/examples/asr/conf/asr_adapters/asr_adaptation.yaml +++ b/examples/asr/conf/asr_adapters/asr_adaptation.yaml @@ -60,24 +60,46 @@ model: log_prediction: false # enables logging sample predictions in the output during training adapter: - # Config of the adapter training/eval script. + ### Config of the adapter training/eval script ### adapter_name: ??? # Name of the adapter, used by the script + adapter_type: "linear" # Type of the adapter. Corresponds to the subconfigs below. adapter_module_name: null # Name of the adapter module. Combine multiple modules with '+' between module names. adapter_state_dict_name: "adapters.pt" # If the individual adapters must be saved, a file name can be provided here. null disables this. - # Config of the adapter module itself - _target_: nemo.collections.common.parts.adapter_modules.LinearAdapter - in_features: ??? # User must provide the output dimension of the layers of the model, which is the input dimension of this adapter. - dim: 32 # The hidden dimension of the adapter, as chosen by user, but small values are preferred to reduce param count. - activation: swish - norm_position: 'pre' # Can be `pre` or `post` - dropout: 0.0 # float, dropout for the adapter - - # Adapter strategy config - adapter_strategy: - _target_: nemo.core.classes.mixins.adapter_mixin_strategies.ResidualAddAdapterStrategy - stochastic_depth: 0.0 # float, setting to > 0 will enable stochastic depth for each adapter block. - l2_lambda: 0.0 # float, setting to > 0 will enable l2 norm auxiliary loss for each adapter's output. + ### Adapter Configs ### + # Linear / Houlsby Adapter (https://arxiv.org/abs/1902.00751) + linear: + # Config of the adapter module itself + _target_: nemo.collections.common.parts.adapter_modules.LinearAdapter + in_features: ??? # User must provide the output dimension of the layers of the model, which is the input dimension of this adapter. + dim: 32 # The hidden dimension of the adapter, as chosen by user, but small values are preferred to reduce param count. + activation: swish + norm_position: 'pre' # Can be `pre` or `post` + dropout: 0.0 # float, dropout for the adapter + + # Adapter strategy config + adapter_strategy: + _target_: nemo.core.classes.mixins.adapter_mixin_strategies.ResidualAddAdapterStrategy + stochastic_depth: 0.0 # float, setting to > 0 will enable stochastic depth for each adapter block. + l2_lambda: 0.0 # float, setting to > 0 will enable l2 norm auxiliary loss for each adapter's output. + + # Tiny-Attention Adapter (https://arxiv.org/abs/2211.01979) + # NOTE: Only supported for Attention based encoders. 
Make sure to pass `adapter_module_name` as "encoder" + tiny_attn: + # Config of the adapter module itself + # Defaults to Relative Positional Encoding MHA + # _target_ can instead be .MultiHeadAttentionAdapter if Conformer was originally using Absolute Positional Encoding. + _target_: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionMultiHeadAttentionAdapter + n_feat: ??? # User must provide the output dimension of the layers of the model, which is the input dimension of this adapter. + n_head: 1 # Number of heads for attention. + proj_dim: -1 # Can be `null` - to avoid projection, > 0 for explicit dim, or -1 to default to `n_head` + dropout_rate: 0.0 # float, dropout for the adapter + + # Adapter strategy config + adapter_strategy: + _target_: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MHAResidualAddAdapterStrategy + stochastic_depth: 0.0 # float, setting to > 0 will enable stochastic depth for each adapter block. + l2_lambda: 0.0 # float, setting to > 0 will enable l2 norm auxiliary loss for each adapter's output. # Optional global config available to all adapters at a global level. # A global config is shared across every layer of the adapters, defining global properties rather diff --git a/examples/asr/conf/asr_adapters/asr_adaptation_hp.yaml b/examples/asr/conf/asr_adapters/asr_adaptation_hp.yaml index 3762e8ae71d8..5d30901192a3 100644 --- a/examples/asr/conf/asr_adapters/asr_adaptation_hp.yaml +++ b/examples/asr/conf/asr_adapters/asr_adaptation_hp.yaml @@ -60,24 +60,46 @@ model: log_prediction: false # enables logging sample predictions in the output during training adapter: - # Config of the adapter training/eval script. + ### Config of the adapter training/eval script ### adapter_name: ??? # Name of the adapter, used by the script + adapter_type: "linear" # Type of the adapter. Corresponds to the subconfigs below. adapter_module_name: null # Name of the adapter module. Combine multiple modules with '+' between module names. adapter_state_dict_name: "adapters.pt" # If the individual adapters must be saved, a file name can be provided here. null disables this. - # Config of the adapter module itself - _target_: nemo.collections.common.parts.adapter_modules.LinearAdapter - in_features: ??? # User must provide the output dimension of the layers of the model, which is the input dimension of this adapter. - dim: 32 # The hidden dimension of the adapter, as chosen by user, but small values are preferred to reduce param count. - activation: swish - norm_position: 'pre' # Can be `pre` or `post` - dropout: 0.0 # float, dropout for the adapter + ### Adapter Configs ### + # Linear / Houlsby Adapter (https://arxiv.org/abs/1902.00751) + linear: + # Config of the adapter module itself + _target_: nemo.collections.common.parts.adapter_modules.LinearAdapter + in_features: ??? # User must provide the output dimension of the layers of the model, which is the input dimension of this adapter. + dim: 32 # The hidden dimension of the adapter, as chosen by user, but small values are preferred to reduce param count. + activation: swish + norm_position: 'pre' # Can be `pre` or `post` + dropout: 0.0 # float, dropout for the adapter - # Adapter strategy config - adapter_strategy: - _target_: nemo.core.classes.mixins.adapter_mixin_strategies.ResidualAddAdapterStrategy - stochastic_depth: 0.0 # float, setting to > 0 will enable stochastic depth for each adapter block. 
- l2_lambda: 0.0 # float, setting to > 0 will enable l2 norm auxiliary loss for each adapter's output. + # Adapter strategy config + adapter_strategy: + _target_: nemo.core.classes.mixins.adapter_mixin_strategies.ResidualAddAdapterStrategy + stochastic_depth: 0.0 # float, setting to > 0 will enable stochastic depth for each adapter block. + l2_lambda: 0.0 # float, setting to > 0 will enable l2 norm auxiliary loss for each adapter's output. + + # Tiny-Attention Adapter (https://arxiv.org/abs/2211.01979) + # NOTE: Only supported for Attention based encoders. Make sure to pass `adapter_module_name` as "encoder" + tiny_attn: + # Config of the adapter module itself + # Defaults to Relative Positional Encoding MHA + # _target_ can instead be .MultiHeadAttentionAdapter if Conformer was originally using Absolute Positional Encoding. + _target_: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionMultiHeadAttentionAdapter + n_feat: ??? # User must provide the output dimension of the layers of the model, which is the input dimension of this adapter. + n_head: 1 # Number of heads for attention. + proj_dim: -1 # Can be `null` - to avoid projection, > 0 for explicit dim, or -1 to default to `n_head` + dropout_rate: 0.0 # float, dropout for the adapter + + # Adapter strategy config + adapter_strategy: + _target_: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MHAResidualAddAdapterStrategy + stochastic_depth: 0.0 # float, setting to > 0 will enable stochastic depth for each adapter block. + l2_lambda: 0.0 # float, setting to > 0 will enable l2 norm auxiliary loss for each adapter's output. # Optional global config available to all adapters at a global level. # A global config is shared across every layer of the adapters, defining global properties rather @@ -179,7 +201,12 @@ exp_manager: # Add a unique name for all hyperparameter arguments to allow continued training. # NOTE: It is necessary to add all hyperparameter arguments to the name ! # This ensures successful restoration of model runs in case HP search crashes. - name: ${name}-lr-${model.optim.lr}-adim-${model.adapter.dim}-sd-${model.adapter.adapter_strategy.stochastic_depth} + + ### `linear` adapter experiment name ### + name: ${name}-lr-${model.optim.lr}-adim-${model.adapter.linear.dim}-sd-${model.adapter.linear.adapter_strategy.stochastic_depth} + + ### `tiny_attn` adapter experiment name ### + # name: ${name}-lr-${model.optim.lr}-pdim-${model.adapter.tiny_attn.proj_dim}-nhead-${model.adapter.tiny_attn.n_head}-sd-${model.adapter.tiny_attn.adapter_strategy.stochastic_depth} create_tensorboard_logger: true create_checkpoint_callback: true @@ -209,16 +236,25 @@ defaults: - override hydra/launcher: nemo_launcher # Hydra arguments necessary for hyperparameter optimization +# NOTE: This is for the `linear` adapter type ! Please change sweep.params for other adapter types ! hydra: sweep: dir: "." subdir: "."
sweeper: + ### `linear` adapter configuration ### params: # place all the parameters you wish to search over here (corresponding to the rest of the config) model.optim.lr: 0.001,0.0001 - model.adapter.dim: 32,64,96,128 - model.adapter.adapter_strategy.stochastic_depth: 0.0,0.5,0.6,0.7,0.8,0.9 + model.adapter.linear.dim: 32,64,96,128 + model.adapter.linear.adapter_strategy.stochastic_depth: 0.0,0.5,0.6,0.7,0.8,0.9 + + ### `tiny_attn` adapter configuration ### +# params: +# model.optim.lr: 0.001,0.0001 +# model.adapter.tiny_attn.n_head: 1 # Note if you use > 1 heads, the *minimum* proj_dim below should match your *max* n_head ! +# model.adapter.tiny_attn.proj_dim: 1,8,16,32,64 # 1,4,8,16,32,64 +# model.adapter.tiny_attn.adapter_strategy.stochastic_depth: 0.0,0.5,0.6,0.7,0.8,0.9 # Arguments to the hyperparameter runner launcher: diff --git a/nemo/collections/asr/modules/conformer_encoder.py b/nemo/collections/asr/modules/conformer_encoder.py index 64850d53587f..d042c09a60e7 100644 --- a/nemo/collections/asr/modules/conformer_encoder.py +++ b/nemo/collections/asr/modules/conformer_encoder.py @@ -615,6 +615,14 @@ class ConformerEncoderAdapter(ConformerEncoder, adapter_mixins.AdapterModuleMixi # Higher level forwarding def add_adapter(self, name: str, cfg: dict): + self.set_accepted_adapter_types( + [ + adapter_utils.LINEAR_ADAPTER_CLASSPATH, + adapter_utils.MHA_ADAPTER_CLASSPATH, + adapter_utils.RELMHA_ADAPTER_CLASSPATH, + ] + ) + cfg = self._update_adapter_cfg_input_dim(cfg) for conformer_layer in self.layers: # type: adapter_mixins.AdapterModuleMixin conformer_layer.add_adapter(name, cfg) diff --git a/nemo/collections/asr/modules/conv_asr.py b/nemo/collections/asr/modules/conv_asr.py index 6631de153036..e930723d0172 100644 --- a/nemo/collections/asr/modules/conv_asr.py +++ b/nemo/collections/asr/modules/conv_asr.py @@ -441,6 +441,9 @@ def __init__(self, feat_in, num_classes, init_mode="xavier_uniform", vocabulary= ) self.apply(lambda x: init_weights(x, mode=init_mode)) + accepted_adapters = [adapter_utils.LINEAR_ADAPTER_CLASSPATH] + self.set_accepted_adapter_types(accepted_adapters) + @typecheck() def forward(self, encoder_output): # Adapter module forward step @@ -885,6 +888,8 @@ class ConvASREncoderAdapter(ConvASREncoder, adapter_mixins.AdapterModuleMixin): def add_adapter(self, name: str, cfg: dict): for jasper_block in self.encoder: # type: adapter_mixins.AdapterModuleMixin cfg = self._update_adapter_cfg_input_dim(jasper_block, cfg) + + jasper_block.set_accepted_adapter_types([adapter_utils.LINEAR_ADAPTER_CLASSPATH]) jasper_block.add_adapter(name, cfg) def is_adapter_available(self) -> bool: diff --git a/nemo/collections/asr/modules/squeezeformer_encoder.py b/nemo/collections/asr/modules/squeezeformer_encoder.py index e8532f4d1bab..7aabea642877 100644 --- a/nemo/collections/asr/modules/squeezeformer_encoder.py +++ b/nemo/collections/asr/modules/squeezeformer_encoder.py @@ -19,10 +19,12 @@ import torch import torch.distributed import torch.nn as nn +from omegaconf import DictConfig from nemo.collections.asr.parts.submodules.multi_head_attention import PositionalEncoding, RelPositionalEncoding from nemo.collections.asr.parts.submodules.squeezeformer_modules import SqueezeformerLayer from nemo.collections.asr.parts.submodules.subsampling import ConvSubsampling, StackingSubsampling, TimeReductionModule +from nemo.collections.asr.parts.utils import adapter_utils from nemo.core.classes.common import typecheck from nemo.core.classes.exportable import Exportable from nemo.core.classes.mixins 
import adapter_mixins @@ -393,6 +395,15 @@ class SqueezeformerEncoderAdapter(SqueezeformerEncoder, adapter_mixins.AdapterMo # Higher level forwarding def add_adapter(self, name: str, cfg: dict): + self.set_accepted_adapter_types( + [ + adapter_utils.LINEAR_ADAPTER_CLASSPATH, + adapter_utils.MHA_ADAPTER_CLASSPATH, + adapter_utils.RELMHA_ADAPTER_CLASSPATH, + ] + ) + + cfg = self._update_adapter_cfg_input_dim(cfg) for conformer_layer in self.layers: # type: adapter_mixins.AdapterModuleMixin conformer_layer.add_adapter(name, cfg) @@ -411,6 +422,10 @@ def get_enabled_adapters(self) -> List[str]: names = sorted(list(names)) return names + def _update_adapter_cfg_input_dim(self, cfg: DictConfig): + cfg = adapter_utils.update_adapter_cfg_input_dim(self, cfg, module_dim=self.d_model) + return cfg + """ Register any additional information diff --git a/nemo/collections/asr/parts/submodules/adapters/__init__.py b/nemo/collections/asr/parts/submodules/adapters/__init__.py new file mode 100644 index 000000000000..6aa05d07dea1 --- /dev/null +++ b/nemo/collections/asr/parts/submodules/adapters/__init__.py @@ -0,0 +1,26 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module import ( + MHAResidualAddAdapterStrategy, + MHAResidualAddAdapterStrategyConfig, + MultiHeadAttentionAdapter, + MultiHeadAttentionAdapterConfig, + PositionalEncodingAdapter, + PositionalEncodingAdapterConfig, + RelPositionalEncodingAdapter, + RelPositionalEncodingAdapterConfig, + RelPositionMultiHeadAttentionAdapter, + RelPositionMultiHeadAttentionAdapterConfig, +) diff --git a/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py b/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py new file mode 100644 index 000000000000..169dde48602f --- /dev/null +++ b/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py @@ -0,0 +1,382 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
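Since the module added below is the core of the new MHA adapter support, a hypothetical end-to-end usage sketch may help. The pretrained model name matches the CI test above, while the adapter name "encoder:tiny_mha" and the hyperparameter values are illustrative:

from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module import (
    RelPositionMultiHeadAttentionAdapterConfig,
)

model = ASRModel.from_pretrained("stt_en_conformer_ctc_small")

# n_feat is reconciled with the encoder's d_model by update_adapter_cfg_input_dim during add_adapter
adapter_cfg = RelPositionMultiHeadAttentionAdapterConfig(n_head=1, n_feat=176, proj_dim=-1, dropout_rate=0.0)

model.add_adapter(name="encoder:tiny_mha", cfg=adapter_cfg)
assert model.is_adapter_available()

model.set_enabled_adapters(enabled=False)                          # disable any previously added adapters
model.set_enabled_adapters(name="encoder:tiny_mha", enabled=True)  # enable only the new MHA adapter

model.freeze()                      # freeze the pretrained weights
model.unfreeze_enabled_adapters()   # leave only the adapter parameters trainable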
+ +import math +from dataclasses import dataclass +from typing import Any, Optional + +import torch +from torch import nn as nn + +from nemo.collections.asr.parts.submodules import multi_head_attention as mha +from nemo.collections.common.parts import adapter_modules +from nemo.core.classes.mixins import adapter_mixin_strategies + + +class MHAResidualAddAdapterStrategy(adapter_mixin_strategies.ResidualAddAdapterStrategy): + """ + An implementation of residual addition of an adapter module with its input for the MHA Adapters. + """ + + def forward(self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin'): + """ + A basic strategy, comprising of a residual connection over the input, after forward pass by + the underlying adapter. Additional work is done to pack and unpack the dictionary of inputs and outputs. + + Note: The `value` tensor is added to the output of the attention adapter as the residual connection. + + Args: + input: A dictionary of multiple input arguments for the adapter module. + `query`, `key`, `value`: Original output tensor of the module, or the output of the + previous adapter (if more than one adapters are enabled). + `mask`: Attention mask. + `pos_emb`: Optional positional embedding for relative encoding. + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. + + Returns: + The result tensor, after one of the active adapters has finished its forward passes. + """ + out = self.compute_output(input, adapter, module=module) + + # If not in training mode, or probability of stochastic depth is 0, skip step. + p = self.stochastic_depth + if not module.training or p == 0.0: + pass + else: + out = self.apply_stochastic_depth(out, input['value'], adapter, module=module) + + # Return the residual connection output = input + adapter(input) + result = input['value'] + out + + # If l2_lambda is activated, register the loss value + self.compute_auxiliary_losses(result, input['value'], adapter, module=module) + + return result + + def compute_output( + self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin' + ) -> torch.Tensor: + """ + Compute the output of a single adapter to some input. + + Args: + input: Original output tensor of the module, or the output of the previous adapter (if more than + one adapters are enabled). + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. + + Returns: + The result tensor, after one of the active adapters has finished its forward passes. + """ + if isinstance(input, (list, tuple)): + out = adapter(*input) + elif isinstance(input, dict): + out = adapter(**input) + else: + out = adapter(input) + return out + + +@dataclass +class MHAResidualAddAdapterStrategyConfig(adapter_mixin_strategies.ResidualAddAdapterStrategyConfig): + _target_: str = "{0}.{1}".format( + MHAResidualAddAdapterStrategy.__module__, MHAResidualAddAdapterStrategy.__name__ + ) # mandatory field + + +class MultiHeadAttentionAdapter(mha.MultiHeadAttention, adapter_modules.AdapterModuleUtil): + """Multi-Head Attention layer of Transformer. 
+ Args: + n_head (int): number of heads + n_feat (int): size of the features + dropout_rate (float): dropout rate + proj_dim (int, optional): Optional integer value for projection before computing attention. + If None, then there is no projection (equivalent to proj_dim = n_feat). + If > 0, then will project the n_feat to proj_dim before calculating attention. + If <0, then will equal n_head, so that each head has a projected dimension of 1. + adapter_strategy: By default, MHAResidualAddAdapterStrategyConfig. An adapter composition function object. + """ + + def __init__( + self, + n_head: int, + n_feat: int, + dropout_rate: float, + proj_dim: Optional[int] = None, + adapter_strategy: MHAResidualAddAdapterStrategy = None, + ): + super().__init__(n_head=n_head, n_feat=n_feat, dropout_rate=dropout_rate, max_cache_len=0) + + self.pre_norm = nn.LayerNorm(n_feat) + + # Set the projection dim to number of heads automatically + if proj_dim is not None and proj_dim < 1: + proj_dim = n_head + + self.proj_dim = proj_dim + + # Recompute weights for projection dim + if self.proj_dim is not None: + if self.proj_dim % n_head != 0: + raise ValueError(f"proj_dim ({proj_dim}) is not divisible by n_head ({n_head})") + + self.d_k = self.proj_dim // n_head + self.s_d_k = math.sqrt(self.d_k) + self.linear_q = nn.Linear(n_feat, self.proj_dim) + self.linear_k = nn.Linear(n_feat, self.proj_dim) + self.linear_v = nn.Linear(n_feat, self.proj_dim) + self.linear_out = nn.Linear(self.proj_dim, n_feat) + + # Setup adapter strategy + self.setup_adapter_strategy(adapter_strategy) + + # reset parameters for Q to be identity operation + self.reset_parameters() + + def forward(self, query, key, value, mask, pos_emb=None, cache=None, cache_next=None): + """Compute 'Scaled Dot Product Attention'. + Args: + query (torch.Tensor): (batch, time1, size) + key (torch.Tensor): (batch, time2, size) + value(torch.Tensor): (batch, time2, size) + mask (torch.Tensor): (batch, time1, time2) + cache (torch.Tensor) : (cache_nums, batch, time_cache, size) + cache_next (torch.Tensor) : (cache_nums, batch, time_cache_next, size) + + returns: + output (torch.Tensor): transformed `value` (batch, time1, d_model) weighted by the query dot key attention + """ + # Need to perform duplicate computations as at this point the tensors have been + # separated by the adapter forward + query = self.pre_norm(query) + key = self.pre_norm(key) + value = self.pre_norm(value) + + return super().forward(query, key, value, mask, pos_emb, cache=cache, cache_next=cache_next) + + def reset_parameters(self): + with torch.no_grad(): + nn.init.zeros_(self.linear_out.weight) + nn.init.zeros_(self.linear_out.bias) + + def get_default_strategy_config(self) -> 'dataclass': + return MHAResidualAddAdapterStrategyConfig() + + +@dataclass +class MultiHeadAttentionAdapterConfig: + n_head: int + n_feat: int + dropout_rate: float = 0.0 + proj_dim: Optional[int] = None + adapter_strategy: Optional[Any] = MHAResidualAddAdapterStrategyConfig() + _target_: str = "{0}.{1}".format(MultiHeadAttentionAdapter.__module__, MultiHeadAttentionAdapter.__name__) + + +class RelPositionMultiHeadAttentionAdapter(mha.RelPositionMultiHeadAttention, adapter_modules.AdapterModuleUtil): + """Multi-Head Attention layer of Transformer-XL with support of relative positional encoding. 
+ Paper: https://arxiv.org/abs/1901.02860 + Args: + n_head (int): number of heads + n_feat (int): size of the features + dropout_rate (float): dropout rate + proj_dim (int, optional): Optional integer value for projection before computing attention. + If None, then there is no projection (equivalent to proj_dim = n_feat). + If > 0, then will project the n_feat to proj_dim before calculating attention. + If <0, then will equal n_head, so that each head has a projected dimension of 1. + adapter_strategy: By default, MHAResidualAddAdapterStrategyConfig. An adapter composition function object. + """ + + def __init__( + self, + n_head: int, + n_feat: int, + dropout_rate: float, + proj_dim: Optional[int] = None, + adapter_strategy: MHAResidualAddAdapterStrategyConfig = None, + ): + super().__init__( + n_head=n_head, n_feat=n_feat, dropout_rate=dropout_rate, pos_bias_u=None, pos_bias_v=None, max_cache_len=0 + ) + + self.pre_norm = nn.LayerNorm(n_feat) + + # Set the projection dim to number of heads automatically + if proj_dim is not None and proj_dim < 1: + proj_dim = n_head + + self.proj_dim = proj_dim + + # Recompute weights for projection dim + if self.proj_dim is not None: + if self.proj_dim % n_head != 0: + raise ValueError(f"proj_dim ({proj_dim}) is not divisible by n_head ({n_head})") + + self.d_k = self.proj_dim // n_head + self.s_d_k = math.sqrt(self.d_k) + self.linear_q = nn.Linear(n_feat, self.proj_dim) + self.linear_k = nn.Linear(n_feat, self.proj_dim) + self.linear_v = nn.Linear(n_feat, self.proj_dim) + self.linear_out = nn.Linear(self.proj_dim, n_feat) + self.linear_pos = nn.Linear(n_feat, self.proj_dim, bias=False) + self.pos_bias_u = nn.Parameter(torch.FloatTensor(self.h, self.d_k)) + self.pos_bias_v = nn.Parameter(torch.FloatTensor(self.h, self.d_k)) + + # Setup adapter strategy + self.setup_adapter_strategy(adapter_strategy) + + # reset parameters for Q to be identity operation + self.reset_parameters() + + def forward(self, query, key, value, mask, pos_emb, cache=None, cache_next=None): + """Compute 'Scaled Dot Product Attention' with rel. positional encoding. + Args: + query (torch.Tensor): (batch, time1, size) + key (torch.Tensor): (batch, time2, size) + value(torch.Tensor): (batch, time2, size) + mask (torch.Tensor): (batch, time1, time2) + pos_emb (torch.Tensor) : (batch, time1, size) + cache (torch.Tensor) : (cache_nums, batch, time_cache, size) + cache_next (torch.Tensor) : (cache_nums, batch, time_cache_next, size) + Returns: + output (torch.Tensor): transformed `value` (batch, time1, d_model) weighted by the query dot key attention + """ + # Need to perform duplicate computations as at this point the tensors have been + # separated by the adapter forward + query = self.pre_norm(query) + key = self.pre_norm(key) + value = self.pre_norm(value) + + return super().forward(query, key, value, mask, pos_emb, cache=cache, cache_next=cache_next) + + def reset_parameters(self): + with torch.no_grad(): + nn.init.zeros_(self.linear_out.weight) + nn.init.zeros_(self.linear_out.bias) + + # NOTE: This exact procedure apparently highly important. + # Above operation is safe to do as self.linear_out.weight *= 0.0 (similar for bias) + # However: + # DO NOT REPLACE BELOW WITH self.pos_bias_u *= 0.0 OR self.pos_bias_v *= 0.0 + # For some reason at init sometimes it will cause the value of the tensor to become NaN + # All operations to compute matrix_ac and matrix_bd will then fail. 
+ nn.init.zeros_(self.pos_bias_u) + nn.init.zeros_(self.pos_bias_v) + + def get_default_strategy_config(self) -> 'dataclass': + return MHAResidualAddAdapterStrategyConfig() + + +@dataclass +class RelPositionMultiHeadAttentionAdapterConfig: + n_head: int + n_feat: int + dropout_rate: float = 0.0 + proj_dim: Optional[int] = None + adapter_strategy: Optional[Any] = MHAResidualAddAdapterStrategyConfig() + _target_: str = "{0}.{1}".format( + RelPositionMultiHeadAttentionAdapter.__module__, RelPositionMultiHeadAttentionAdapter.__name__ + ) + + +class PositionalEncodingAdapter(mha.PositionalEncoding, adapter_modules.AdapterModuleUtil): + + """ + Absolute positional embedding adapter. + + .. note:: + + Absolute positional embedding value is added to the input tensor *without residual connection* ! + Therefore, the input is changed, if you only require the positional embedding, drop the returned `x` ! + + Args: + d_model (int): The input dimension of x. + max_len (int): The max sequence length. + xscale (float): The input scaling factor. Defaults to 1.0. + adapter_strategy (AbstractAdapterStrategy): By default, ReturnResultAdapterStrategyConfig. + An adapter composition function object. + NOTE: Since this is a positional encoding, it will not add a residual ! + """ + + def __init__( + self, + d_model: int, + max_len: int = 5000, + xscale=1.0, + adapter_strategy: adapter_mixin_strategies.ReturnResultAdapterStrategyConfig = None, + ): + + super().__init__( + d_model=d_model, dropout_rate=0.0, max_len=max_len, xscale=xscale, dropout_rate_emb=0.0, + ) + + # Setup adapter strategy + self.setup_adapter_strategy(adapter_strategy) + + def get_default_strategy_config(self) -> 'dataclass': + return adapter_mixin_strategies.ReturnResultAdapterStrategyConfig() + + +@dataclass +class PositionalEncodingAdapterConfig: + d_model: int + max_len: int = 5000 + xscale: float = 1.0 + adapter_strategy: Optional[Any] = adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + _target_: str = "{0}.{1}".format(PositionalEncodingAdapter.__module__, PositionalEncodingAdapter.__name__) + + +class RelPositionalEncodingAdapter(mha.RelPositionalEncoding, adapter_modules.AdapterModuleUtil): + """ + Relative positional encoding for TransformerXL's layers + See : Appendix B in https://arxiv.org/abs/1901.02860 + + .. note:: + + Relative positional embedding value is **not** added to the input tensor ! + Therefore, the input should be updated changed, if you only require the positional embedding, drop the returned `x` ! + + Args: + d_model (int): embedding dim + max_len (int): maximum input length + xscale (bool): whether to scale the input by sqrt(d_model) + adapter_strategy: By default, ReturnResultAdapterStrategyConfig. An adapter composition function object. 
+ """ + + def __init__( + self, + d_model: int, + max_len: int = 5000, + xscale=1.0, + adapter_strategy: adapter_mixin_strategies.ReturnResultAdapterStrategyConfig = None, + ): + super().__init__(d_model=d_model, dropout_rate=0.0, max_len=max_len, xscale=xscale, dropout_rate_emb=0.0) + + # Setup adapter strategy + self.setup_adapter_strategy(adapter_strategy) + + def get_default_strategy_config(self) -> 'dataclass': + return adapter_mixin_strategies.ReturnResultAdapterStrategyConfig() + + +@dataclass +class RelPositionalEncodingAdapterConfig: + d_model: int + max_len: int = 5000 + xscale: float = 1.0 + adapter_strategy: Optional[Any] = adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + _target_: str = "{0}.{1}".format(RelPositionalEncodingAdapter.__module__, RelPositionalEncodingAdapter.__name__) diff --git a/nemo/collections/asr/parts/submodules/conformer_modules.py b/nemo/collections/asr/parts/submodules/conformer_modules.py index c8e9b94e64ef..df525db8dfbc 100644 --- a/nemo/collections/asr/parts/submodules/conformer_modules.py +++ b/nemo/collections/asr/parts/submodules/conformer_modules.py @@ -23,6 +23,7 @@ RelPositionMultiHeadAttention, ) from nemo.collections.asr.parts.utils.activations import Swish +from nemo.collections.common.parts import adapter_modules from nemo.collections.common.parts.utils import activation_registry from nemo.core.classes.mixins import AccessMixin from nemo.core.classes.mixins.adapter_mixins import AdapterModuleMixin @@ -154,6 +155,17 @@ def forward( x = None residual = residual + self.dropout(x) + if self.is_adapter_available(): + # Call the MHA adapters + pack_ip = { + 'x': residual, + 'loc': 'mha', + 'att_mask': att_mask, + 'pos_emb': pos_emb, + } + pack_ip = self.forward_enabled_adapters(pack_ip) + residual = pack_ip['x'] + x = self.norm_conv(residual) x = self.conv(x, pad_mask=pad_mask, cache=cache_last_time, cache_next=cache_last_time_next) residual = residual + self.dropout(x) @@ -166,13 +178,76 @@ def forward( if self.is_adapter_available(): # Call the adapters - x = self.forward_enabled_adapters(x) + pack_ip = { + 'x': x, + 'loc': 'post', + } + pack_ip = self.forward_enabled_adapters(pack_ip) + x = pack_ip['x'] if self.is_access_enabled() and self.access_cfg.get('save_encoder_tensors', False): self.register_accessible_tensor(name='encoder', tensor=x) return x + def forward_single_enabled_adapter_( + self, + input: dict, + adapter_module: torch.nn.Module, + *, + adapter_name: str, + adapter_strategy: 'nemo.core.classes.mixins.adapter_mixin_strategies.AbstractAdapterStrategy', + ): + """ + Perform the forward step of a single adapter module on some input data. + + **Note**: Subclasses can override this method to accommodate more complicate adapter forward steps. + + Args: + input: Dictionary of packed tensors. The dict should contain at least + `x`: output tensor + `loc`: Semantic location in module where this adapter was called + `att_mask`: Optional, Attention mask + `pos_emb`: Optional, Positional Embedding for Relative Positional Encoding. + The output tensor of the calling module is the input to the first adapter, whose output + is then chained to the next adapter until all adapters are consumed. + adapter_module: The adapter module that is currently required to perform the forward pass. + adapter_name: The resolved name of the adapter that is undergoing the current forward pass. 
+ adapter_strategy: A subclass of `AbstractAdapterStrategy`, that determines how the + output of the adapter should be merged with the input, or if it should be merged at all. + + Returns: + The result tensor, after the current active adapter has finished its forward pass. + """ + # (input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin') + x = input['x'] + loc = input['loc'] + att_mask = input.get('att_mask', None) + pos_emb = input.get('pos_emb', None) + + if isinstance(adapter_module, adapter_modules.LinearAdapter) and loc == 'post': + output = adapter_strategy(x, adapter_module, module=self) + + elif isinstance(adapter_module, MultiHeadAttention) and loc == 'mha': + if self.self_attention_model == 'rel_pos': + x = dict(query=x, key=x, value=x, mask=att_mask, pos_emb=pos_emb) + output = adapter_strategy(x, adapter_module, module=self) + + elif self.self_attention_model == 'abs_pos': + x = dict(query=x, key=x, value=x, mask=att_mask) + output = adapter_strategy(x, adapter_module, module=self) + + else: + raise ValueError(f"Unsupported value of self_attention_model , provided {self.self_attention_model}!") + + else: + # No adapter compatible, skip + output = x + + input['x'] = output + + return input + class ConformerConvolution(nn.Module): """The convolution module for the Conformer model. diff --git a/nemo/collections/asr/parts/submodules/squeezeformer_modules.py b/nemo/collections/asr/parts/submodules/squeezeformer_modules.py index 76ff89cd7442..56c462396523 100644 --- a/nemo/collections/asr/parts/submodules/squeezeformer_modules.py +++ b/nemo/collections/asr/parts/submodules/squeezeformer_modules.py @@ -21,6 +21,7 @@ MultiHeadAttention, RelPositionMultiHeadAttention, ) +from nemo.collections.common.parts import adapter_modules from nemo.core.classes.mixins import AccessMixin from nemo.core.classes.mixins.adapter_mixins import AdapterModuleMixin @@ -152,6 +153,17 @@ def forward(self, x, att_mask=None, pos_emb=None, pad_mask=None): x = self.norm_self_att(x) residual = x + if self.is_adapter_available(): + # Call the MHA adapters + pack_ip = { + 'x': residual, + 'loc': 'mha', + 'att_mask': att_mask, + 'pos_emb': pos_emb, + } + pack_ip = self.forward_enabled_adapters(pack_ip) + x = pack_ip['x'] + x = self.feed_forward1_scale(x) x = self.feed_forward1(x) x = residual + self.dropout(x) * self.fc_factor @@ -171,13 +183,76 @@ def forward(self, x, att_mask=None, pos_emb=None, pad_mask=None): if self.is_adapter_available(): # Call the adapters - x = self.forward_enabled_adapters(x) + pack_ip = { + 'x': x, + 'loc': 'post', + } + pack_ip = self.forward_enabled_adapters(pack_ip) + x = pack_ip['x'] if self.is_access_enabled() and self.access_cfg.get('save_encoder_tensors', False): self.register_accessible_tensor(name='encoder', tensor=x) return x + def forward_single_enabled_adapter_( + self, + input: dict, + adapter_module: torch.nn.Module, + *, + adapter_name: str, + adapter_strategy: 'nemo.core.classes.mixins.adapter_mixin_strategies.AbstractAdapterStrategy', + ): + """ + Perform the forward step of a single adapter module on some input data. + + **Note**: Subclasses can override this method to accommodate more complicate adapter forward steps. + + Args: + input: Dictionary of packed tensors. The dict should contain at least + `x`: output tensor + `loc`: Semantic location in module where this adapter was called + `att_mask`: Optional, Attention mask + `pos_emb`: Optional, Positional Embedding for Relative Positional Encoding. 
+ The output tensor of the calling module is the input to the first adapter, whose output + is then chained to the next adapter until all adapters are consumed. + adapter_module: The adapter module that is currently required to perform the forward pass. + adapter_name: The resolved name of the adapter that is undergoing the current forward pass. + adapter_strategy: A subclass of `AbstractAdapterStrategy`, that determines how the + output of the adapter should be merged with the input, or if it should be merged at all. + + Returns: + The result tensor, after the current active adapter has finished its forward pass. + """ + # (input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin') + x = input['x'] + loc = input['loc'] + att_mask = input.get('att_mask', None) + pos_emb = input.get('pos_emb', None) + + if isinstance(adapter_module, adapter_modules.LinearAdapter) and loc == 'post': + output = adapter_strategy(x, adapter_module, module=self) + + elif isinstance(adapter_module, MultiHeadAttention) and loc == 'mha': + if self.self_attention_model == 'rel_pos': + x = dict(query=x, key=x, value=x, mask=att_mask, pos_emb=pos_emb) + output = adapter_strategy(x, adapter_module, module=self) + + elif self.self_attention_model == 'abs_pos': + x = dict(query=x, key=x, value=x, mask=att_mask) + output = adapter_strategy(x, adapter_module, module=self) + + else: + raise ValueError(f"Unsupported value of self_attention_model , provided {self.self_attention_model}!") + + else: + # No adapter compatible, skip + output = x + + input['x'] = output + + return input + def reset_parameters(self): # Used for Squeezeformer initialization only self.feed_forward1.reset_parameters_ff() diff --git a/nemo/collections/asr/parts/utils/adapter_utils.py b/nemo/collections/asr/parts/utils/adapter_utils.py index a489a1c8a040..5b74a296419a 100644 --- a/nemo/collections/asr/parts/utils/adapter_utils.py +++ b/nemo/collections/asr/parts/utils/adapter_utils.py @@ -19,6 +19,19 @@ from nemo.utils import logging +# Constants +LINEAR_ADAPTER_CLASSPATH = "nemo.collections.common.parts.adapter_modules.LinearAdapter" +MHA_ADAPTER_CLASSPATH = ( + "nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MultiHeadAttentionAdapter" +) +RELMHA_ADAPTER_CLASSPATH = "nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionMultiHeadAttentionAdapter" +POS_ENCODING_ADAPTER_CLASSPATH = ( + "nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.PositionalEncodingAdapter" +) +REL_POS_ENCODING_ADAPTER_CLASSPATH = ( + "nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionalEncodingAdapter" +) + def convert_adapter_cfg_to_dict_config(cfg: DictConfig): # Convert to DictConfig from dict or Dataclass @@ -45,16 +58,26 @@ def update_adapter_cfg_input_dim(module: torch.nn.Module, cfg: DictConfig, *, mo """ cfg = convert_adapter_cfg_to_dict_config(cfg) - if 'in_features' in cfg: - in_planes = cfg['in_features'] + input_dim_valid_keys = ['in_features', 'n_feat'] + input_key = None - if in_planes != module_dim: - logging.info(f"Updating {module.__class__.__name__} Adapter input dim from {in_planes} to {module_dim}") - in_planes = module_dim + for key in input_dim_valid_keys: + if key in cfg: + input_key = key + break - cfg['in_features'] = in_planes - return cfg - else: + if input_key is None: raise ValueError( - f"Failed to infer the input dimension of the Adapter cfg. 
Provided config : \n" f"{OmegaConf.to_yaml(cfg)}" + f"Failed to infer the input dimension of the Adapter cfg. \nExpected one of : {input_dim_valid_keys}.\n" + f"Provided config : \n" + f"{OmegaConf.to_yaml(cfg)}" ) + + input_dim = cfg[input_key] + + if input_dim != module_dim: + logging.info(f"Updating {module.__class__.__name__} Adapter input dim from {input_dim} to {module_dim}") + input_dim = module_dim + + cfg[input_key] = input_dim + return cfg diff --git a/nemo/collections/common/parts/adapter_modules.py b/nemo/collections/common/parts/adapter_modules.py index ad80b84c3f02..8cd4e8941ca5 100644 --- a/nemo/collections/common/parts/adapter_modules.py +++ b/nemo/collections/common/parts/adapter_modules.py @@ -23,7 +23,7 @@ from nemo.core.classes.mixins import access_mixins, adapter_mixin_strategies -class AbstractAdapterModule(nn.Module, access_mixins.AccessMixin): +class AdapterModuleUtil(access_mixins.AccessMixin): """ Base class of Adapter Modules, providing common functionality to all Adapter Modules. """ @@ -40,7 +40,7 @@ def setup_adapter_strategy(self, adapter_strategy: Optional[adapter_mixin_strate """ # set default adapter strategy if adapter_strategy is None: - adapter_strategy = adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + adapter_strategy = self.get_default_strategy_config() if is_dataclass(adapter_strategy): adapter_strategy = OmegaConf.structured(adapter_strategy) @@ -55,8 +55,14 @@ def setup_adapter_strategy(self, adapter_strategy: Optional[adapter_mixin_strate else: raise AttributeError(f'`adapter_strategy` provided is invalid : {adapter_strategy}') + def get_default_strategy_config(self) -> 'dataclass': + """ + Returns a default adapter module strategy. + """ + return adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + -class LinearAdapter(AbstractAdapterModule): +class LinearAdapter(nn.Module, AdapterModuleUtil): """ Simple Linear Feedforward Adapter module with LayerNorm and singe hidden layer with activation function. 
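The AdapterModuleUtil refactor above means any torch module can become an adapter by mixing in the utility class and declaring its default strategy. A hypothetical custom adapter following that pattern is sketched below; the class name and layer sizes are made up, and only the mixin hooks come from this diff:

import torch
from torch import nn

from nemo.collections.common.parts import adapter_modules
from nemo.core.classes.mixins import adapter_mixin_strategies


class TinyBottleneckAdapter(nn.Module, adapter_modules.AdapterModuleUtil):
    def __init__(self, in_features: int, dim: int = 8, adapter_strategy=None):
        super().__init__()
        self.module = nn.Sequential(
            nn.LayerNorm(in_features), nn.Linear(in_features, dim), nn.ReLU(), nn.Linear(dim, in_features),
        )
        # Zero-init the final projection so the adapter starts as an identity under a residual strategy
        nn.init.zeros_(self.module[-1].weight)
        nn.init.zeros_(self.module[-1].bias)
        # Falls back to get_default_strategy_config() when adapter_strategy is None
        self.setup_adapter_strategy(adapter_strategy)

    def get_default_strategy_config(self) -> 'dataclass':
        return adapter_mixin_strategies.ResidualAddAdapterStrategyConfig()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.module(x)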
diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py index b0f90e22ba4c..2ef1d2995367 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py @@ -138,7 +138,7 @@ def state_dict(self, destination=None, prefix=None, keep_vars=False): for name, module in self.frozen_model.named_modules(): if isinstance(module, adapter_mixins.AdapterModuleMixin) and module.is_adapter_available(): for adapter_key in self.adapter_name_keys: - adapter_module = module.get_from_adapter_layer(adapter_key) + adapter_module = module.get_adapter_module(adapter_key) if adapter_module: state_adapter_key = ':'.join([name, adapter_key]) state_dict_[state_adapter_key] = adapter_module.state_dict() @@ -154,7 +154,7 @@ def load_state_dict(self, state_dict, strict: bool = True): for name, module in self.frozen_model.named_modules(): if isinstance(module, adapter_mixins.AdapterModuleMixin) and module.is_adapter_available(): for adapter_key in self.adapter_name_keys: - adapter_module = module.get_from_adapter_layer(adapter_key) + adapter_module = module.get_adapter_module(adapter_key) if adapter_module: state_adapter_key = ':'.join([name, adapter_key]) adapter_module.load_state_dict(state_dict[state_adapter_key], strict) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py index ab707eb1d3ba..f0ba7a8b69bd 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py @@ -237,7 +237,7 @@ def state_dict(self, destination=None, prefix=None, keep_vars=False): for name, module in self.frozen_model.named_modules(): if isinstance(module, adapter_mixins.AdapterModuleMixin) and module.is_adapter_available(): for adapter_key in self.adapter_name_keys: - adapter_module = module.get_from_adapter_layer(adapter_key) + adapter_module = module.get_adapter_module(adapter_key) if adapter_module: state_adapter_key = ':'.join([name, adapter_key]) state_dict_[state_adapter_key] = adapter_module.state_dict() @@ -252,7 +252,7 @@ def load_state_dict(self, state_dict, strict: bool = True): for name, module in self.frozen_model.named_modules(): if isinstance(module, adapter_mixins.AdapterModuleMixin) and module.is_adapter_available(): for adapter_key in self.adapter_name_keys: - adapter_module = module.get_from_adapter_layer(adapter_key) + adapter_module = module.get_adapter_module(adapter_key) if adapter_module: state_adapter_key = ':'.join([name, adapter_key]) adapter_module.load_state_dict(state_dict[state_adapter_key], strict) @@ -481,7 +481,7 @@ def _component_state_dict(self, component_name, component, adapter_name_keys): for name, module in component.named_modules(): if isinstance(module, adapter_mixins.AdapterModuleMixin) and module.is_adapter_available(): for adapter_key in adapter_name_keys: - adapter_module = module.get_from_adapter_layer(adapter_key) + adapter_module = module.get_adapter_module(adapter_key) if adapter_module: state_adapter_key = ':'.join([component_name, name, adapter_key]) state_dict_[state_adapter_key] = adapter_module.state_dict() @@ -494,7 +494,7 @@ def _load_component_state_dict( for name, module in component.named_modules(): if isinstance(module, adapter_mixins.AdapterModuleMixin) 
and module.is_adapter_available(): for adapter_key in adapter_name_keys: - adapter_module = module.get_from_adapter_layer(adapter_key) + adapter_module = module.get_adapter_module(adapter_key) if adapter_module: state_adapter_key = ':'.join([component_name, name, adapter_key]) adapter_module.load_state_dict(state_dict[state_adapter_key], strict) diff --git a/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py b/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py index b8cbb3ef0c24..e8e9fa24d008 100644 --- a/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py +++ b/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py @@ -22,7 +22,7 @@ import torch import torch.nn as nn -from nemo.collections.common.parts.adapter_modules import AbstractAdapterModule +from nemo.collections.common.parts.adapter_modules import AdapterModuleUtil from nemo.collections.common.parts.utils import activation_registry from nemo.collections.nlp.modules.common.megatron.utils import init_method_const, init_method_normal from nemo.core.classes.mixins import adapter_mixin_strategies @@ -50,7 +50,7 @@ class AdapterName(str, enum.Enum): POST_ATTN_ADAPTER = 'adapter_2' -class InfusedAdapter(AbstractAdapterModule): +class InfusedAdapter(nn.Module, AdapterModuleUtil): def __init__( self, in_features: int, adapter_strategy: adapter_mixin_strategies.ResidualAddAdapterStrategyConfig = None, ) -> None: @@ -85,7 +85,7 @@ class MLPInfusedAdapterConfig(InfusedAdapterConfig): _target_: str = "{0}.{1}".format(MLPInfusedAdapter.__module__, MLPInfusedAdapter.__name__) -class ParallelLinearAdapter(AbstractAdapterModule): +class ParallelLinearAdapter(nn.Module, AdapterModuleUtil): def __init__( self, in_features: int, diff --git a/nemo/collections/nlp/modules/common/megatron/transformer.py b/nemo/collections/nlp/modules/common/megatron/transformer.py index 11d9a7848cd3..8c36d2d504d7 100644 --- a/nemo/collections/nlp/modules/common/megatron/transformer.py +++ b/nemo/collections/nlp/modules/common/megatron/transformer.py @@ -273,7 +273,7 @@ def forward(self, hidden_states): if self.dropout > 0: intermediate_parallel = F.dropout(intermediate_parallel, p=self.dropout, training=self.training) - infused_adapter = self.get_from_adapter_layer(AdapterName.MLP_INFUSED) + infused_adapter = self.get_adapter_module(AdapterName.MLP_INFUSED) if infused_adapter: intermediate_parallel = infused_adapter(intermediate_parallel) @@ -954,8 +954,8 @@ def forward( query_layer = query_layer.view(*new_tensor_shape) if self.is_adapter_available(): - key_infused_adapter = self.get_from_adapter_layer(AdapterName.KEY_INFUSED) - value_infused_adapter = self.get_from_adapter_layer(AdapterName.VALUE_INFUSED) + key_infused_adapter = self.get_adapter_module(AdapterName.KEY_INFUSED) + value_infused_adapter = self.get_adapter_module(AdapterName.VALUE_INFUSED) if key_infused_adapter: assert value_infused_adapter is not None, "Expected value_infused_adapter not found!" 
kls = key_layer.shape @@ -1608,7 +1608,7 @@ def forward( # print(f"Layer: {self.layer_number} Attention checksum {layernorm_input.sum()}") if self.is_adapter_available(): - adapter_1 = self.get_from_adapter_layer(AdapterName.PRE_ATTN_ADAPTER) + adapter_1 = self.get_adapter_module(AdapterName.PRE_ATTN_ADAPTER) if adapter_1: strategy = adapter_1.adapter_strategy layernorm_input = self.forward_single_enabled_adapter_( @@ -1703,7 +1703,7 @@ def forward( if ( self.is_adapter_available() ): # TODO: (@adithyre) was able to move adapter_2 back to the end of the transformer after ptl 1.7 update. - adapter_2 = self.get_from_adapter_layer(AdapterName.POST_ATTN_ADAPTER) + adapter_2 = self.get_adapter_module(AdapterName.POST_ATTN_ADAPTER) if adapter_2: strategy = adapter_2.adapter_strategy output = self.forward_single_enabled_adapter_( diff --git a/nemo/core/classes/mixins/__init__.py b/nemo/core/classes/mixins/__init__.py index 17974fab08e3..f29d48a41f31 100644 --- a/nemo/core/classes/mixins/__init__.py +++ b/nemo/core/classes/mixins/__init__.py @@ -16,6 +16,8 @@ from nemo.core.classes.mixins.adapter_mixin_strategies import ( ResidualAddAdapterStrategy, ResidualAddAdapterStrategyConfig, + ReturnResultAdapterStrategy, + ReturnResultAdapterStrategyConfig, ) from nemo.core.classes.mixins.adapter_mixins import ( AdapterModelPTMixin, diff --git a/nemo/core/classes/mixins/adapter_mixin_strategies.py b/nemo/core/classes/mixins/adapter_mixin_strategies.py index 24069cd1098f..d552bdbd6c5f 100644 --- a/nemo/core/classes/mixins/adapter_mixin_strategies.py +++ b/nemo/core/classes/mixins/adapter_mixin_strategies.py @@ -14,6 +14,7 @@ from abc import ABC from dataclasses import dataclass +from typing import Any, Dict, List, Tuple, Union import torch @@ -51,6 +52,66 @@ def __call__(self, *args, **kwargs): return self.forward(*args, **kwargs) +class ReturnResultAdapterStrategy(AbstractAdapterStrategy): + """ + An implementation of an adapter strategy that simply returns the result of the adapter. + Supports stochastic + """ + + def forward(self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin'): + """ + A basic strategy, which simply returns the result of the adapter's calculation as the output. + + Args: + input: Original output tensor of the module, or the output of the previous adapter (if more than + one adapters are enabled). + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. + + Returns: + The result tensor, after one of the active adapters has finished its forward passes. + """ + result = self.compute_output(input, adapter, module=module) + + return result + + def compute_output( + self, + input: Union[torch.Tensor, List[torch.Tensor], Tuple[torch.Tensor], Dict[str, Any]], + adapter: torch.nn.Module, + *, + module: 'AdapterModuleMixin', + ) -> torch.Tensor: + """ + Compute the output of a single adapter to some input. + + Args: + input: Original output tensor of the module, or the output of the previous adapter (if more than + one adapters are enabled). + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. 
+ + Returns: + The result tensor, after one of the active adapters has finished its forward passes. + """ + if isinstance(input, (list, tuple)): + out = adapter(*input) + elif isinstance(input, dict): + out = adapter(**input) + else: + out = adapter(input) + return out + + +@dataclass +class ReturnResultAdapterStrategyConfig: + _target_: str = "{0}.{1}".format( + ReturnResultAdapterStrategy.__module__, ReturnResultAdapterStrategy.__name__ + ) # mandatory field + + class ResidualAddAdapterStrategy(AbstractAdapterStrategy): """ An implementation of residual addition of an adapter module with its input. @@ -87,31 +148,90 @@ def forward(self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'Ada Returns: The result tensor, after one of the active adapters has finished its forward passes. """ - out = adapter(input) - - # Perform stochastic depth if needed. - p = self.stochastic_depth - if p < 0.0 or p > 1.0: - raise ValueError(f"Stochastic depth probability has to be between 0 and 1, but got {p}") + out = self.compute_output(input, adapter, module=module) # If not in training mode, or probability of stochastic depth is 0, skip step. + p = self.stochastic_depth if not module.training or p == 0.0: pass else: - # Apply stochastic depth to the output of adapter. - keep_prob = 1.0 - p - shape = [1] * out.ndim - noise = torch.empty(shape, dtype=input.dtype, device=input.device) - noise = noise.bernoulli_(keep_prob) - if keep_prob > 0.0: # Done to normalize activation for inference mode - noise.div_(keep_prob) - - out = noise * out + out = self.apply_stochastic_depth(out, input, adapter, module=module) # Return the residual connection output = input + adapter(input) result = input + out # If l2_lambda is activated, register the loss value + self.compute_auxiliary_losses(result, input, adapter, module=module) + + return result + + def compute_output( + self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin' + ) -> torch.Tensor: + """ + Compute the output of a single adapter to some input. + + Args: + input: Original output tensor of the module, or the output of the previous adapter (if more than + one adapters are enabled). + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. + + Returns: + The result tensor, after one of the active adapters has finished its forward passes. + """ + out = adapter(input) + return out + + def apply_stochastic_depth( + self, output: torch.Tensor, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin' + ): + """ + Compute and apply stochastic depth if probability is greater than 0. + + Args: + output: The result tensor, after one of the active adapters has finished its forward passes. + input: Original output tensor of the module, or the output of the previous adapter (if more than + one adapters are enabled). + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. + + Returns: + The result tensor, after stochastic depth has been potentially applied to it. + """ + # Perform stochastic depth if needed. 
+ p = self.stochastic_depth + if p < 0.0 or p > 1.0: + raise ValueError(f"Stochastic depth probability has to be between 0 and 1, but got {p}") + + # Apply stochastic depth to the output of adapter. + keep_prob = 1.0 - p + shape = [1] * output.ndim + noise = torch.empty(shape, dtype=output.dtype, device=output.device) + noise = noise.bernoulli_(keep_prob) + if keep_prob > 0.0: # Done to normalize activation for inference mode + noise.div_(keep_prob) + + output = noise * output + + return output + + def compute_auxiliary_losses( + self, output: torch.Tensor, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin' + ): + """ + Compute any auxiliary losses and preserve it in the tensor registry. + + Args: + output: The result tensor, after one of the active adapters has finished its forward passes. + input: Original output tensor of the module, or the output of the previous adapter (if more than + one adapters are enabled). + adapter: The adapter module that is currently required to perform the forward pass. + module: The calling module, in its entirety. It is a module that implements `AdapterModuleMixin`, + therefore the strategy can access all other adapters in this module via `module.adapter_layer`. + """ if module.training and self.l2_lambda > 0.0: if not isinstance(adapter, AccessMixin): raise ValueError(f"Module {adapter.__class__.__name__} does not implement AccessMixin !") @@ -125,11 +245,9 @@ def forward(self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'Ada # if l2 lambda is enabled, also enable AccessMixin adapter.set_access_enabled(access_enabled=True) - l2_loss = self.l2_lambda * (input - result).square().reshape(input.size(0), -1).sum(dim=-1).mean() + l2_loss = self.l2_lambda * (input - output).square().reshape(input.size(0), -1).sum(dim=-1).mean() adapter.register_accessible_tensor(name='adapter_loss', tensor=l2_loss) - return result - @dataclass class ResidualAddAdapterStrategyConfig: diff --git a/nemo/core/classes/mixins/adapter_mixins.py b/nemo/core/classes/mixins/adapter_mixins.py index 7680872eb5ac..e6006e1c6c5a 100644 --- a/nemo/core/classes/mixins/adapter_mixins.py +++ b/nemo/core/classes/mixins/adapter_mixins.py @@ -14,7 +14,7 @@ from abc import ABC from dataclasses import dataclass, is_dataclass -from typing import List, Optional, Tuple, Union +from typing import List, Optional, Set, Tuple, Union import torch import torch.nn as nn @@ -150,29 +150,6 @@ class AdapterModuleMixin(ABC): adapter_global_cfg_key = "global_cfg" adapter_metadata_cfg_key = "adapter_meta_cfg" - def set_accepted_adapter_types(self, adapter_types: List[str]) -> None: - """ - The module with this mixin can define a list of adapter names that it will accept. - This method should be called in the modules init method and set the adapter names the module will expect to be added. - """ - if hasattr(self, "_accepted_adapter_types"): - raise RuntimeError("accepted adapter types can only be set once.") - self._accepted_adapter_types = [model_utils.import_class_by_path(s) for s in adapter_types] - - def get_accepted_adapter_types(self,) -> List[str]: - """ - Returns the list of accepted adapter types. 
- """ - if hasattr(self, '_accepted_adapter_types'): - return self._accepted_adapter_types - else: - return [] - - def get_from_adapter_layer(self, name: str): - if hasattr(self, "adapter_layer"): - return self.adapter_layer[name] if name in self.adapter_layer else None - return None - def add_adapter(self, name: str, cfg: DictConfig): """ Add an Adapter module to this module. @@ -182,18 +159,21 @@ def add_adapter(self, name: str, cfg: DictConfig): cfg: A DictConfig or Dataclass that contains at the bare minimum `__target__` to instantiate a new Adapter module. """ - _types = self.get_accepted_adapter_types() + adapter_types = self.get_accepted_adapter_types() _pass_types = False - if len(_types) > 0: + if len(adapter_types) > 0: test = model_utils.import_class_by_path(cfg._target_) - for _type in _types: + for _type in adapter_types: # TODO: (@adithyare) should revisit if subclass is the best check... if issubclass(test, _type): _pass_types = True break if not _pass_types: raise ValueError( - f"Config {cfg} creates adapter class {test} is not in the list of accepted adapter types." + f"Config: \n{OmegaConf.to_yaml(cfg)}\n" + f"It creates adapter class {test} \n" + f"that is not in the list of accepted adapter types.\n" + f"Accepted adapters: {[t for t in adapter_types]}" ) # Convert to DictConfig from dict or Dataclass @@ -306,6 +286,9 @@ def get_enabled_adapters(self) -> List[str]: if hasattr(self, 'adapter_layer'): available_module_names.update(list(self.adapter_layer.keys())) + # populate list of allowed adapter classes + adapter_types = self.get_accepted_adapter_types() + enabled_adapters = [] for name, config in self.adapter_cfg.items(): # Skip the global adapter config @@ -314,11 +297,62 @@ def get_enabled_adapters(self) -> List[str]: # If name is in the current available modules, and it is enabled in the config if name in available_module_names and self.adapter_cfg[name]['enabled']: - enabled_adapters.append(name) + # Check if type is supported (if available) and is an enabled adapter + if len(adapter_types) > 0: + module = self.get_adapter_module(name) + + for adapter_type in adapter_types: + if isinstance(module, adapter_type): + enabled_adapters.append(name) + break + + else: + # Ignore type checking and fall back to adding all enabled adapters + enabled_adapters.append(name) return enabled_adapters - # Inherited methods that dont need to be overridden + # Inherited methods that don't need to be overridden + + def get_adapter_module(self, name: str): + """ + Gets an adapter module by name if possible, otherwise returns None. + + Args: + name: A str name (resolved or not) corresponding to an Adapter. + + Returns: + An nn.Module if the name could be resolved and matched, otherwise None/ + """ + _, name = self.resolve_adapter_module_name_(name) + + if hasattr(self, "adapter_layer"): + return self.adapter_layer[name] if name in self.adapter_layer else None + return None + + def set_accepted_adapter_types(self, adapter_types: List[str]) -> None: + """ + The module with this mixin can define a list of adapter names that it will accept. + This method should be called in the modules init method and set the adapter names the module will expect to be added. + + Args: + adapter_types: A list of str paths that correspond to classes. The class paths will be instantiated to + ensure that the class path is correct. + """ + # Let user update and set accepted adapter types. 
+ self._accepted_adapter_types = set([model_utils.import_class_by_path(s) for s in adapter_types]) + + def get_accepted_adapter_types(self,) -> Set[type]: + """ + Utility function to get the set of all classes that are accepted by the module. + + Returns: + Returns the set of accepted adapter types as classes, otherwise an empty set. + """ + if hasattr(self, '_accepted_adapter_types'): + return self._accepted_adapter_types + else: + return set([]) def unfreeze_enabled_adapters(self, freeze_batchnorm: bool = True) -> None: """ @@ -755,17 +789,25 @@ def save_adapters(self, filepath: str, name: str = None): # It must have the attribute `adapter_name`. # It must match the adapter name provided by the user. for module in self.modules(): - if ( - isinstance(module, AdapterModuleMixin) - and hasattr(module, 'adapter_name') - and module.adapter_name == adapter_name - ): + if isinstance(module, AdapterModuleMixin): # If all match, extract the state dict into a list of state dicts. # This is because one name can be shared within one model by multiple adapters bearing # a common name. This can occur when the adapter is common to a module which has multiple # layers and blocks, all of which require an adapter. - state_dict = module.state_dict() - output_dict[key].append(state_dict) + adapter_module = module.get_adapter_module(adapter_name) + if adapter_module is not None: + # If the module was found, then extract the entire adapter ModuleDict state_dict(), + # Then select only the parts of the state dict that correspond to the current adapter_name. + # This is done so that it preserves the relation ship of the module name : parameters + # inside of the state dict. + # It will be normalized in the corresponding `load_adapters()` call. + adapter_state_dict = module.adapter_layer.state_dict() + state_dict = {} + for k, v in adapter_state_dict.items(): + if adapter_name in k: + state_dict[k] = v + + output_dict[key].append(state_dict) # Preserve the binary OmegaConf dictionary of the model's adapter config output_dict['__cfg__'] = self.cfg.adapters @@ -856,12 +898,10 @@ def load_adapters(self, filepath: str, name: str = None, map_location: str = Non # between state dict and the adapters parameters will be allowed. modules_to_load = [] # type: List[torch.nn.Module] for module in self.modules(): - if ( - isinstance(module, AdapterModuleMixin) - and hasattr(module, 'adapter_name') - and module.adapter_name == adapter_name - ): - modules_to_load.append(module) + if isinstance(module, AdapterModuleMixin): + adapter_module = module.get_adapter_module(adapter_name) + if adapter_module is not None: + modules_to_load.append(adapter_module) # Assert that the number of states in the state dict matches the newly created adapter if len(adapter_state) != len(modules_to_load): @@ -873,7 +913,18 @@ def load_adapters(self, filepath: str, name: str = None, map_location: str = Non # For the pair of (adapter_state_in_checkpoint, adapter_in_model), restore the weights for state, module in zip(adapter_state, modules_to_load): - module.load_state_dict(state, strict=strict) + # Note that state is a list of multiple state dicts for 1:1 Module mapping. + # However, the state_dict keys are of the form `adapter_name.`. + # We therefore strip the `adapter_name.` part of the state dict + # And then directly load each module with its 1:1 state dict. 
+ sub_dict = {} + for k, v in state.items(): + if adapter_name in k: + k_ = k.replace(f"{adapter_name}.", "") + sub_dict[k_] = v + + module.load_state_dict(sub_dict, strict=strict) + del sub_dict # delete the dictionaries to preserve memory for next adapter del adapter_state, modules_to_load diff --git a/nemo/core/utils/process_launcher/launcher.py b/nemo/core/utils/process_launcher/launcher.py index 5d7031cfe8a1..f0d769b24e78 100644 --- a/nemo/core/utils/process_launcher/launcher.py +++ b/nemo/core/utils/process_launcher/launcher.py @@ -274,7 +274,7 @@ def launch(launcher, job_overrides: Sequence[Sequence[str]], initial_job_idx: in if retcode is not None: # Log that the process with some ID has finished if finished_processes[proc_idx] == 0: - logging.info(f"Processed job : {len(ret) + proc_idx}") + logging.info(f"Processed job : {len(ret) + proc_idx} :: Ret code = {retcode}") finished_processes[proc_idx] = 1 diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 2406d343e0f5..f5c997e4842e 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +import glob import os import re import subprocess @@ -1042,3 +1043,29 @@ def _should_check_val_fx(self) -> bool: if self.restarting and self.global_step % self.trainer.val_check_batch == 0: return False return super()._should_check_val_fx() + + +def clean_exp_ckpt(exp_log_dir: Union[str, Path], remove_ckpt: bool = True, remove_nemo: bool = False): + """ + Helper method that removes Pytorch Lightning .ckpt files or NeMo .nemo files from the checkpoint directory + + Args: + exp_log_dir: str path to the root directory of the current experiment. + remove_ckpt: bool, whether to remove all *.ckpt files in the checkpoints directory. + remove_nemo: bool, whether to remove all *.nemo files in the checkpoints directory. + """ + exp_log_dir = str(exp_log_dir) + + if remove_ckpt: + logging.info("Deleting *.ckpt files ...") + ckpt_files = glob.glob(os.path.join(exp_log_dir, "checkpoints", "*.ckpt")) + for filepath in ckpt_files: + os.remove(filepath) + logging.info(f"Deleted file : {filepath}") + + if remove_nemo: + logging.info("Deleting *.nemo files ...") + nemo_files = glob.glob(os.path.join(exp_log_dir, "checkpoints", "*.nemo")) + for filepath in nemo_files: + os.remove(filepath) + logging.info(f"Deleted file : {filepath}") diff --git a/nemo/utils/model_utils.py b/nemo/utils/model_utils.py index cef6b75b9dec..297132ed9c65 100644 --- a/nemo/utils/model_utils.py +++ b/nemo/utils/model_utils.py @@ -16,6 +16,7 @@ import os from dataclasses import dataclass, is_dataclass from enum import Enum +from functools import lru_cache from pathlib import Path from typing import List, Optional, Tuple, Union @@ -458,6 +459,7 @@ def maybe_update_config_version(cfg: 'DictConfig'): return cfg +@lru_cache(maxsize=1024) def import_class_by_path(path: str): """ Recursive import of class by path string. 
diff --git a/tests/collections/asr/mixins/adapters/test_asr_adapter_mixin.py b/tests/collections/asr/mixins/adapters/test_asr_adapter_mixin.py index e2c4d70ffa30..300dfb3f96e6 100644 --- a/tests/collections/asr/mixins/adapters/test_asr_adapter_mixin.py +++ b/tests/collections/asr/mixins/adapters/test_asr_adapter_mixin.py @@ -17,12 +17,14 @@ from omegaconf import DictConfig, ListConfig, OmegaConf from nemo.collections.asr.models import ASRModel, EncDecCTCModel, EncDecRNNTModel +from nemo.collections.asr.parts.submodules.adapters import multi_head_attention_adapter_module +from nemo.collections.asr.parts.utils import adapter_utils from nemo.collections.common.parts import adapter_modules from nemo.core.classes.mixins.access_mixins import AccessMixin from nemo.core.classes.mixins.adapter_mixins import AdapterModuleMixin, get_registered_adapter from nemo.core.utils import numba_utils from nemo.core.utils.numba_utils import __NUMBA_MINIMUM_VERSION__ -from nemo.utils import config_utils +from nemo.utils import config_utils, model_utils NUMBA_RNNT_LOSS_AVAILABLE = numba_utils.numba_cpu_is_supported( __NUMBA_MINIMUM_VERSION__ @@ -96,6 +98,125 @@ def model(): return model_instance +@pytest.fixture() +def conformer_ctc_adapter(): + preprocessor = {'_target_': 'nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor'} + encoder = { + '_target_': 'nemo.collections.asr.modules.ConformerEncoderAdapter', + 'feat_in': 64, + 'feat_out': -1, + 'n_layers': 2, + 'd_model': 128, + 'subsampling': 'striding', + 'subsampling_factor': 4, + 'self_attention_model': 'rel_pos', + 'n_heads': 4, + 'conv_kernel_size': 31, + } + + decoder = { + '_target_': 'nemo.collections.asr.modules.ConvASRDecoder', + 'feat_in': 128, + 'num_classes': 28, + 'vocabulary': [ + ' ', + 'a', + 'b', + 'c', + 'd', + 'e', + 'f', + 'g', + 'h', + 'i', + 'j', + 'k', + 'l', + 'm', + 'n', + 'o', + 'p', + 'q', + 'r', + 's', + 't', + 'u', + 'v', + 'w', + 'x', + 'y', + 'z', + "'", + ], + } + modelConfig = DictConfig( + {'preprocessor': DictConfig(preprocessor), 'encoder': DictConfig(encoder), 'decoder': DictConfig(decoder)} + ) + + model_instance = EncDecCTCModel(cfg=modelConfig) + return model_instance + + +@pytest.fixture() +def squeezeformer_ctc_adapter(): + preprocessor = {'_target_': 'nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor'} + encoder = { + '_target_': 'nemo.collections.asr.modules.SqueezeformerEncoderAdapter', + 'feat_in': 64, + 'feat_out': -1, + 'n_layers': 2, + 'd_model': 128, + 'time_reduce_idx': 1, + 'subsampling': 'dw_striding', + 'subsampling_factor': 4, + 'self_attention_model': 'rel_pos', + 'n_heads': 4, + 'conv_kernel_size': 31, + } + + decoder = { + '_target_': 'nemo.collections.asr.modules.ConvASRDecoder', + 'feat_in': 128, + 'num_classes': 28, + 'vocabulary': [ + ' ', + 'a', + 'b', + 'c', + 'd', + 'e', + 'f', + 'g', + 'h', + 'i', + 'j', + 'k', + 'l', + 'm', + 'n', + 'o', + 'p', + 'q', + 'r', + 's', + 't', + 'u', + 'v', + 'w', + 'x', + 'y', + 'z', + "'", + ], + } + modelConfig = DictConfig( + {'preprocessor': DictConfig(preprocessor), 'encoder': DictConfig(encoder), 'decoder': DictConfig(decoder)} + ) + + model_instance = EncDecCTCModel(cfg=modelConfig) + return model_instance + + @pytest.fixture() def rnnt_model(): preprocessor = {'cls': 'nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor', 'params': dict({})} @@ -165,13 +286,43 @@ def rnnt_model(): return model_instance -def get_adapter_cfg(in_features=50, dim=100, norm_pos='pre'): - cfg = 
adapter_modules.LinearAdapterConfig(in_features=in_features, dim=dim, norm_position=norm_pos) +def get_adapter_cfg(in_features=50, dim=100, norm_pos='pre', atype='linear', **kwargs): + valid_types = ['linear', 'mha', 'relmha'] + if atype not in valid_types: + raise ValueError(f"Invalid type. Valid types = {valid_types}") + + if atype == 'linear': + cfg = adapter_modules.LinearAdapterConfig(in_features=in_features, dim=dim, norm_position=norm_pos) + elif atype == 'mha': + cfg = multi_head_attention_adapter_module.MultiHeadAttentionAdapterConfig( + n_head=kwargs.get('n_head', 1), n_feat=in_features + ) + elif atype == 'relmha': + cfg = multi_head_attention_adapter_module.RelPositionMultiHeadAttentionAdapterConfig( + n_head=kwargs.get('n_head', 1), n_feat=in_features + ) + + print(cfg._target_) + cfg = OmegaConf.structured(cfg) return cfg class TestASRAdapterMixin: + @pytest.mark.unit + def test_class_paths_are_correct(self): + # Resolve all object names in module + obj_keys = list(dir(adapter_utils)) + for key in obj_keys: + if 'CLASSPATH' in key: + classpath = getattr(adapter_utils, key) + # This will raise import error if it fails + _ = model_utils.import_class_by_path(classpath) + + # Try getting the config of the class + config_path = classpath + "Config" + _ = model_utils.import_class_by_path(config_path) + @pytest.mark.unit def test_asr_model_constructor(self, model): original_num_params = model.num_weights @@ -180,6 +331,27 @@ def test_asr_model_constructor(self, model): new_num_params = model.num_weights assert new_num_params > original_num_params + @pytest.mark.unit + def test_asr_model_constructor_mha_adapter(self, model): + with pytest.raises(ValueError): + model.add_adapter(name='adapter_0', cfg=get_adapter_cfg(atype='mha')) + + @pytest.mark.unit + def test_conformer_constructor_mha_adapter(self, conformer_ctc_adapter): + original_num_params = conformer_ctc_adapter.num_weights + + conformer_ctc_adapter.add_adapter(name='adapter_0', cfg=get_adapter_cfg(atype='relmha')) + new_num_params = conformer_ctc_adapter.num_weights + assert new_num_params > original_num_params + + @pytest.mark.unit + def test_squeezeformer_constructor_mha_adapter(self, squeezeformer_ctc_adapter): + original_num_params = squeezeformer_ctc_adapter.num_weights + + squeezeformer_ctc_adapter.add_adapter(name='adapter_0', cfg=get_adapter_cfg(atype='relmha')) + new_num_params = squeezeformer_ctc_adapter.num_weights + assert new_num_params > original_num_params + @pytest.mark.unit def test_asr_model_constructor_encoder_module(self, model): original_num_params = model.num_weights @@ -263,6 +435,38 @@ def test_asr_forward_linear_post(self, model, name): assert torch.mean(torch.abs(origial_output - new_output)) < 1e-5 + @pytest.mark.unit + @pytest.mark.parametrize('name', ['adapter_0', 'encoder:adapter_0']) + def test_conformer_forward_mha(self, conformer_ctc_adapter, name): + conformer_ctc_adapter.eval() + torch.random.manual_seed(0) + input_signal = torch.randn(2, 512) + input_signal_length = torch.tensor([512, 512], dtype=torch.int32) + + origial_output = conformer_ctc_adapter(input_signal=input_signal, input_signal_length=input_signal_length)[0] + + conformer_ctc_adapter.add_adapter(name=name, cfg=get_adapter_cfg(in_features=128, atype='mha')) + new_output = conformer_ctc_adapter(input_signal=input_signal, input_signal_length=input_signal_length)[0] + + assert torch.mean(torch.abs(origial_output - new_output)) < 1e-5 + + @pytest.mark.unit + @pytest.mark.parametrize('name', 
['adapter_0', 'encoder:adapter_0']) + def test_squeezeformer_forward_mha(self, squeezeformer_ctc_adapter, name): + squeezeformer_ctc_adapter.eval() + torch.random.manual_seed(0) + input_signal = torch.randn(2, 512) + input_signal_length = torch.tensor([512, 512], dtype=torch.int32) + + origial_output = squeezeformer_ctc_adapter(input_signal=input_signal, input_signal_length=input_signal_length)[ + 0 + ] + + squeezeformer_ctc_adapter.add_adapter(name=name, cfg=get_adapter_cfg(in_features=128, atype='mha')) + new_output = squeezeformer_ctc_adapter(input_signal=input_signal, input_signal_length=input_signal_length)[0] + + assert torch.mean(torch.abs(origial_output - new_output)) < 1e-5 + @pytest.mark.unit @pytest.mark.parametrize('name1', ['adapter_0', 'encoder:adapter_0', 'decoder:adapter_0']) @pytest.mark.parametrize('name2', ['adapter_1', 'encoder:adapter_1', 'decoder:adapter_1']) diff --git a/tests/collections/asr/mixins/adapters/test_asr_adapter_modules.py b/tests/collections/asr/mixins/adapters/test_asr_adapter_modules.py new file mode 100644 index 000000000000..2637e33ebd2a --- /dev/null +++ b/tests/collections/asr/mixins/adapters/test_asr_adapter_modules.py @@ -0,0 +1,227 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import pytest +import torch + +from nemo.collections.asr.parts.submodules import adapters as adapter_modules +from nemo.core.classes.mixins import adapter_mixin_strategies +from nemo.utils import config_utils + + +def _create_masks(att_mask, max_audio_length, padding_length): + # pad_mask is the masking to be used to ignore paddings + pad_mask = torch.arange(0, max_audio_length).expand(padding_length.size(0), -1) < padding_length.unsqueeze(-1) + + # pad_mask_for_att_mask is the mask which helps to ignore paddings + pad_mask_for_att_mask = pad_mask.unsqueeze(1).repeat([1, max_audio_length, 1]) + pad_mask_for_att_mask = torch.logical_and(pad_mask_for_att_mask, pad_mask_for_att_mask.transpose(1, 2)) + # att_mask is the masking to be used by the MHA layers to ignore the tokens not supposed to be visible + att_mask = att_mask[:, :max_audio_length, :max_audio_length] + # paddings should also get ignored, so pad_mask_for_att_mask is used to ignore their corresponding scores + att_mask = torch.logical_and(pad_mask_for_att_mask, att_mask.to(pad_mask_for_att_mask.device)) + + pad_mask = ~pad_mask + att_mask = ~att_mask + + return pad_mask, att_mask + + +def get_mask(lengths: torch.Tensor): + max_seq_len = lengths.max() + att_mask = torch.ones(1, max_seq_len, max_seq_len, dtype=torch.bool) + + pad_mask, att_mask = _create_masks(att_mask, max_seq_len, lengths) + return pad_mask, att_mask + + +class TestASRAdapterModules: + @pytest.mark.unit + def test_mha_adapter_config(self): + IGNORED_ARGS = ['_target_'] + + result = config_utils.assert_dataclass_signature_match( + adapter_modules.MultiHeadAttentionAdapter, + adapter_modules.MultiHeadAttentionAdapterConfig, + ignore_args=IGNORED_ARGS, + ) + + signatures_match, cls_subset, dataclass_subset = result + + assert signatures_match + assert cls_subset is None + assert dataclass_subset is None + + @pytest.mark.unit + def test_relpos_mha_adapter_config(self): + IGNORED_ARGS = ['_target_'] + + result = config_utils.assert_dataclass_signature_match( + adapter_modules.RelPositionMultiHeadAttentionAdapter, + adapter_modules.RelPositionMultiHeadAttentionAdapterConfig, + ignore_args=IGNORED_ARGS, + ) + + signatures_match, cls_subset, dataclass_subset = result + + assert signatures_match + assert cls_subset is None + assert dataclass_subset is None + + @pytest.mark.unit + def test_abs_pos_encoding_adapter_config(self): + IGNORED_ARGS = ['_target_'] + + result = config_utils.assert_dataclass_signature_match( + adapter_modules.PositionalEncodingAdapter, + adapter_modules.PositionalEncodingAdapterConfig, + ignore_args=IGNORED_ARGS, + ) + + signatures_match, cls_subset, dataclass_subset = result + + assert signatures_match + assert cls_subset is None + assert dataclass_subset is None + + @pytest.mark.unit + def test_rel_pos_encoding_adapter_config(self): + IGNORED_ARGS = ['_target_'] + + result = config_utils.assert_dataclass_signature_match( + adapter_modules.RelPositionalEncodingAdapter, + adapter_modules.RelPositionalEncodingAdapterConfig, + ignore_args=IGNORED_ARGS, + ) + + signatures_match, cls_subset, dataclass_subset = result + + assert signatures_match + assert cls_subset is None + assert dataclass_subset is None + + @pytest.mark.unit + @pytest.mark.parametrize('n_head', [1, 2, 10]) + @pytest.mark.parametrize('proj_dim', [None, -1]) + def test_mha_adapter_init(self, n_head, proj_dim): + torch.random.manual_seed(0) + x = torch.randn(2, 32, 50) + lengths = torch.randint(1, x.size(1), size=(x.size(0),)) + lengths[torch.randint(0, x.size(0), size=(1,))[0]] = 
x.size(1) + + adapter = adapter_modules.MultiHeadAttentionAdapter( + n_head=n_head, n_feat=50, dropout_rate=0.0, proj_dim=proj_dim + ) + + pad_mask, att_mask = get_mask(lengths) + + with torch.no_grad(): + assert adapter.linear_out.weight.sum() == 0 + if hasattr(adapter.linear_out, 'bias') and adapter.linear_out.bias is not None: + assert adapter.linear_out.bias.sum() == 0 + + out = adapter(x, x, x, att_mask) + assert out.sum().abs() <= 1e-8 + assert out.shape == x.shape + + @pytest.mark.unit + @pytest.mark.parametrize('n_head', [1, 2, 10]) + @pytest.mark.parametrize('proj_dim', [None, -1]) + def test_relmha_adapter_init(self, n_head, proj_dim): + torch.random.manual_seed(0) + x = torch.randn(2, 32, 50) + lengths = torch.randint(1, x.size(1), size=(x.size(0),)) + lengths[torch.randint(0, x.size(0), size=(1,))[0]] = x.size(1) + + adapter = adapter_modules.RelPositionMultiHeadAttentionAdapter( + n_head=n_head, n_feat=50, dropout_rate=0.0, proj_dim=proj_dim + ) + relpos_enc = adapter_modules.RelPositionalEncodingAdapter(d_model=50) + + pad_mask, att_mask = get_mask(lengths) + relpos_enc.extend_pe(lengths.max(), device='cpu') + + with torch.no_grad(): + assert adapter.linear_out.weight.sum() == 0 + if hasattr(adapter.linear_out, 'bias') and adapter.linear_out.bias is not None: + assert adapter.linear_out.bias.sum() == 0 + + _, pos_emb = relpos_enc(x) + out = adapter(x, x, x, att_mask, pos_emb) + assert out.sum().abs() <= 1e-8 + assert out.shape == x.shape + + @pytest.mark.unit + def test_abspos_encoding_init(self): + torch.random.manual_seed(0) + x = torch.randn(2, 32, 50) + lengths = torch.randint(1, x.size(1), size=(x.size(0),)) + lengths[torch.randint(0, x.size(0), size=(1,))[0]] = x.size(1) + + relpos_enc = adapter_modules.PositionalEncodingAdapter(d_model=50) + + relpos_enc.extend_pe(lengths.max(), device='cpu') + + with torch.no_grad(): + out, pos_emb = relpos_enc(x) + assert (out - pos_emb - x).sum().abs() <= 1e-5 + assert out.shape == x.shape + + @pytest.mark.unit + def test_relpos_encoding_init(self): + torch.random.manual_seed(0) + x = torch.randn(2, 32, 50) + lengths = torch.randint(1, x.size(1), size=(x.size(0),)) + lengths[torch.randint(0, x.size(0), size=(1,))[0]] = x.size(1) + + relpos_enc = adapter_modules.RelPositionalEncodingAdapter(d_model=50) + + relpos_enc.extend_pe(lengths.max(), device='cpu') + + with torch.no_grad(): + out, pos_emb = relpos_enc(x) + assert (out - x).sum().abs() <= 1e-8 + assert out.shape == x.shape + + @pytest.mark.unit + def test_mha_adapter_strategy(self): + adapter = adapter_modules.MultiHeadAttentionAdapter(n_head=1, n_feat=50, dropout_rate=0.0) + assert hasattr(adapter, 'adapter_strategy') + assert adapter.adapter_strategy is not None + # assert default strategy is set + assert isinstance(adapter.adapter_strategy, adapter_modules.MHAResidualAddAdapterStrategy) + + @pytest.mark.unit + def test_relpos_mha_adapter_strategy(self): + adapter = adapter_modules.RelPositionMultiHeadAttentionAdapter(n_head=1, n_feat=50, dropout_rate=0.0) + assert hasattr(adapter, 'adapter_strategy') + assert adapter.adapter_strategy is not None + # assert default strategy is set + assert isinstance(adapter.adapter_strategy, adapter_modules.MHAResidualAddAdapterStrategy) + + @pytest.mark.unit + def test_abspos_encoding_adapter_strategy(self): + adapter = adapter_modules.PositionalEncodingAdapter(d_model=50) + assert hasattr(adapter, 'adapter_strategy') + assert adapter.adapter_strategy is not None + # assert default strategy is set + assert 
isinstance(adapter.adapter_strategy, adapter_mixin_strategies.ReturnResultAdapterStrategy) + + @pytest.mark.unit + def test_relpos_encoding_adapter_strategy(self): + adapter = adapter_modules.RelPositionalEncodingAdapter(d_model=50) + assert hasattr(adapter, 'adapter_strategy') + assert adapter.adapter_strategy is not None + # assert default strategy is set + assert isinstance(adapter.adapter_strategy, adapter_mixin_strategies.ReturnResultAdapterStrategy) diff --git a/tests/collections/common/mixins/test_adapter_modules.py b/tests/collections/common/mixins/test_adapter_modules.py index 48ebf03f4ff4..ea6eede8f842 100644 --- a/tests/collections/common/mixins/test_adapter_modules.py +++ b/tests/collections/common/mixins/test_adapter_modules.py @@ -48,7 +48,7 @@ def test_linear_adapter_init(self): assert adapter.module[-1].bias.sum() == 0 out = adapter(x) - assert out.sum() <= 1e-8 + assert out.sum().abs() <= 1e-8 @pytest.mark.unit def test_linear_adapter_dropout(self): @@ -63,7 +63,7 @@ def test_linear_adapter_dropout(self): assert adapter.module[-1].bias.sum() == 0 out = adapter(x) - assert out.sum() <= 1e-8 + assert out.sum().abs() <= 1e-8 @pytest.mark.unit @pytest.mark.parametrize('norm_position', ['pre', 'post']) @@ -79,7 +79,7 @@ def test_linear_adapter_norm_position(self, norm_position): assert adapter.module[-1].bias.sum() == 0 out = adapter(x) - assert out.sum() <= 1e-8 + assert out.sum().abs() <= 1e-8 @pytest.mark.unit def test_linear_adapter_strategy(self): diff --git a/tests/core/mixins/adapters/test_adapter_model_mixin.py b/tests/core/mixins/adapters/test_adapter_model_mixin.py index 320c0ce4487b..87c6b4e4cfb3 100644 --- a/tests/core/mixins/adapters/test_adapter_model_mixin.py +++ b/tests/core/mixins/adapters/test_adapter_model_mixin.py @@ -933,6 +933,83 @@ def test_multiple_decoder_save_load_adapter_only_exact_name(self): for ogkey, newkey in zip(original_state_dict.keys(), restored_state_dict.keys()): assert (original_state_dict[ogkey] - restored_state_dict[newkey]).abs().mean() < 1e-6 + @pytest.mark.unit + @pytest.mark.parametrize("decoder", ["adapter_0",]) # "decoder:adapter_0" + @pytest.mark.parametrize("encoder", ["adapter_1",]) # "encoder:adapter_1" + def test_multiple_save_load_adapter_with_multiple_load(self, decoder, encoder): + # create a model config, but do not add global_cfg to it + # we want to test just module level adapter + cfg = get_model_config(in_features=50) + + model = DefaultAdapterModel(cfg) + original_num_params = model.num_weights + + model.add_adapter(name=decoder, cfg=get_adapter_cfg(dim=5)) + model.add_adapter(name=encoder, cfg=get_adapter_cfg(dim=5)) + new_num_params = model.num_weights + assert new_num_params > original_num_params + + assert len(model.get_enabled_adapters()) == 2 + + # save restore test + with tempfile.TemporaryDirectory() as outer_tmpdir: + with tempfile.TemporaryDirectory() as tmpdir: + adapter_path = os.path.join(tmpdir, 'temp.pt') + adapter_path_2 = os.path.join(tmpdir, 'temp-2.pt') + print("saving adapter ", decoder) + model.save_adapters(adapter_path, name=decoder) + print("Saving adapter ", encoder) + model.save_adapters(adapter_path_2, name=encoder) + + model_path = os.path.join('temp.nemo') + model.save_to(model_path) + + shutil.move(adapter_path, outer_tmpdir) + shutil.move(adapter_path_2, outer_tmpdir) + shutil.move(model_path, outer_tmpdir) + + outer_adapter_path = os.path.join(outer_tmpdir, 'temp.pt') + outer_adapter_path_2 = os.path.join(outer_tmpdir, 'temp-2.pt') + outer_model_path = os.path.join(outer_tmpdir, 
'temp.nemo') + + # Assert size of this params + adapter_filesize = os.path.getsize(outer_adapter_path) + adapter_2_filesize = os.path.getsize(outer_adapter_path_2) + model_filesize = os.path.getsize(outer_model_path) + + assert model_filesize > adapter_filesize + assert model_filesize > adapter_2_filesize + + # restore adapter to new model (without any decoder adapter) + new_model = DefaultAdapterModel(cfg) + new_model.load_adapters(outer_adapter_path, name=decoder) + # Seperately load another adapter after the first one + new_model.load_adapters(outer_adapter_path_2, name=encoder) + + assert isinstance(new_model, AdapterModelPTMixin) + assert len(new_model.get_enabled_adapters()) > 0 + assert model.num_weights == new_model.num_weights # the new model has only one adapter not both + assert new_model.encoder.is_adapter_available() is True # encoder adaper is available in new model + + adapter_cfg = new_model.cfg.adapters + meta_cfg = adapter_cfg[model.adapter_global_cfg_key][model.adapter_metadata_cfg_key] + modules_cfg = meta_cfg['modules'] + + assert modules_cfg is not None + enabled_adapters = new_model.get_enabled_adapters() + assert len(enabled_adapters) == 2 + + if "decoder:" in decoder: + original_state_dict = model.decoder.adapter_layer.state_dict() + restored_state_dict = new_model.decoder.adapter_layer.state_dict() + else: + # Default adapter position is on encoder + original_state_dict = model.encoder.adapter_layer.state_dict() + restored_state_dict = new_model.encoder.adapter_layer.state_dict() + + for ogkey, newkey in zip(original_state_dict.keys(), restored_state_dict.keys()): + assert (original_state_dict[ogkey] - restored_state_dict[newkey]).abs().mean() < 1e-6 + @pytest.mark.unit def test_multiple_decoder_save_load_adapter_dual_name(self): # create a model config, but do not add global_cfg to it From 199ff16cdc6ae60370075909e5dd9cd599e6cc00 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 12 Dec 2022 16:07:35 -0800 Subject: [PATCH 031/154] Remove unused TTS eval functions w/ pesq and pystoi dependencies (#5605) (#5606) Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Elena Rastorgueva --- nemo/collections/tts/helpers/helpers.py | 34 ------------------------- requirements/requirements_tts.txt | 2 -- 2 files changed, 36 deletions(-) diff --git a/nemo/collections/tts/helpers/helpers.py b/nemo/collections/tts/helpers/helpers.py index 23941f6215e8..c7235780bb45 100644 --- a/nemo/collections/tts/helpers/helpers.py +++ b/nemo/collections/tts/helpers/helpers.py @@ -51,8 +51,6 @@ import torch from numba import jit, prange from numpy import ndarray -from pesq import pesq -from pystoi import stoi from nemo.collections.tts.torch.tts_data_types import DATA_STR2DATA_CLASS, MAIN_DATA_TYPES, WithLens from nemo.utils import logging @@ -480,38 +478,6 @@ def remove(conv_list): return new_conv_list -def eval_tts_scores( - y_clean: ndarray, y_est: ndarray, T_ys: Sequence[int] = (0,), sampling_rate=22050 -) -> Dict[str, float]: - """ - calculate metric using EvalModule. y can be a batch. - Args: - y_clean: real audio - y_est: estimated audio - T_ys: length of the non-zero parts of the histograms - sampling_rate: The used Sampling rate. - - Returns: - A dictionary mapping scoring systems (string) to numerical scores. - 1st entry: 'STOI' - 2nd entry: 'PESQ' - """ - - if y_clean.ndim == 1: - y_clean = y_clean[np.newaxis, ...] 
- y_est = y_est[np.newaxis, ...] - if T_ys == (0,): - T_ys = (y_clean.shape[1],) * y_clean.shape[0] - - clean = y_clean[0, : T_ys[0]] - estimated = y_est[0, : T_ys[0]] - stoi_score = stoi(clean, estimated, sampling_rate, extended=False) - pesq_score = pesq(16000, np.asarray(clean), estimated, 'wb') - ## fs was set 16,000, as pesq lib doesnt currently support felxible fs. - - return {'STOI': stoi_score, 'PESQ': pesq_score} - - def regulate_len(durations, enc_out, pace=1.0, mel_max_len=None): """A function that takes predicted durations per encoded token, and repeats enc_out according to the duration. NOTE: durations.shape[1] == enc_out.shape[1] diff --git a/requirements/requirements_tts.txt b/requirements/requirements_tts.txt index 4b2a4afe3e0f..5a611fb34380 100644 --- a/requirements/requirements_tts.txt +++ b/requirements/requirements_tts.txt @@ -5,6 +5,4 @@ librosa matplotlib nltk pandas -pesq pypinyin -pystoi From 675ec6b732ed4858d7bb33f53e3fa6db7fbbdeac Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Mon, 12 Dec 2022 17:25:16 -0800 Subject: [PATCH 032/154] Create separator parameter Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- tools/nemo_forced_aligner/align.py | 42 ++++++++++++++--------- tools/nemo_forced_aligner/utils.py | 53 +++++++++++++++++++++-------- 3 files changed, 65 insertions(+), 32 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index fc29a7633f78..b8cbcdf2fea1 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -29,7 +29,7 @@ Call the align function in align.py, specifying the parameters as follows: * [OPTIONAL] `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet (Default: 8, to match with the default model_name "stt_en_citrinet_1024_gamma_0_25"). -* [OPTIONAL] `grouping_for_ctm`: A string, either 'word' or 'basetoken'. 'basetoken' can mean either a token or character, depending on the model used for alignment. If you select 'basetoken' - the code will output a CTM with alignments at the token/character level. If you select 'word' - the code will group the tokens/characters into words (Default: "word"). +* [OPTIONAL] `separator`: the string used to separate CTM segments. If the separator is `“”` (empty string), the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) * [OPTIONAL] `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be stripped away from the utt_id, so as not to change the number of space-separated elements in the CTM files. 
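The separator behaviour described above can be illustrated with a short, self-contained sketch. This is an editorial example only, not part of the patch: `split_into_ctm_segments` is a hypothetical helper name, and the logic simply mirrors what the README states (an empty separator delegates segmentation to the ASR model's tokens, while any other separator is treated like a space, with surrounding whitespace amalgamated).

```python
# Minimal sketch of the separator semantics described in the README above.
# `split_into_ctm_segments` is a hypothetical name, not part of this patch.
def split_into_ctm_segments(text: str, separator: str = " "):
    if separator == "":
        # Token-level CTM segments: splitting is left to the ASR model's tokenizer.
        return None
    # Split on the separator and strip stray spaces, so that
    # "abc|def", "abc |def", "abc| def" and "abc | def" all give the same segments.
    return [seg.strip() for seg in text.split(separator)]


assert split_into_ctm_segments("abc | def", separator="|") == ["abc", "def"]
assert split_into_ctm_segments("abc|def", separator="|") == ["abc", "def"]
```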
diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 96b75a7a5ecf..19b5030783e5 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -95,8 +95,8 @@ def viterbi_decoding(log_probs, y, T, U): return alignments -def align_batch(data, model): - log_probs, y, T, U = get_log_probs_y_T_U(data, model) +def align_batch(data, model, separator): + log_probs, y, T, U = get_log_probs_y_T_U(data, model, separator) alignments = viterbi_decoding(log_probs, y, T, U) return alignments @@ -107,7 +107,7 @@ def align( output_ctm_folder: str, model_name: str = "stt_en_citrinet_1024_gamma_0_25", model_downsample_factor: int = 8, - grouping_for_ctm: str = "word", + separator: str = " ", n_parts_for_ctm_id: int = 1, audio_sr: int = 16000, device: str = "cpu", @@ -132,12 +132,18 @@ def align( If the ASR model is a QuartzNet model, its downsample factor is 2. If the ASR model is a Conformer CTC model, its downsample factor is 4. If the ASR model is a Citirnet model, its downsample factor is 8. - grouping_for_ctm: a string, either 'word' or 'basetoken'. - If 'basetoken', the CTM files will contain timestamps for either the tokens or - characters in the ground truth (depending on whether it is a token-based or - character-based model). - If 'word', the basetokens will be grouped into words, and the CTM files - will contain word timestamps instead of basetoken timestamps. + separator: the string used to separate CTM segments. + If the separator is “”, the CTM segments will be the tokens used by the ASR model. + If the separator is anything else, e.g. “ “, “|” or “”, the segments will be the blocks of + text separated by that separator. + Default: “ “, ie for languages such as English, the CTM segments will be words. + Note: if the separator is not “” or “ “, it will be removed from the CTM, ie it is treated as a marker + which is not part of the ground truth. It will essentially be treated as a space, and any additional spaces + around it will be amalgamated into one, i.e. the following texts will be treated as equivalent: + “abc|def” + “abc |def” + “abc| def” + “abc | def” n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. 
Note also that any spaces that are present in the audio_filepath @@ -179,19 +185,23 @@ def align( for start, end in zip(starts, ends): data = get_manifest_lines(manifest_filepath, start, end) - alignments = align_batch(data, model) + alignments = align_batch(data, model, separator) - if grouping_for_ctm == "basetoken": + if separator == "": make_basetoken_ctm( data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, ) - elif grouping_for_ctm == "word": + else: make_word_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, + data, + alignments, + model, + model_downsample_factor, + output_ctm_folder, + n_parts_for_ctm_id, + audio_sr, + separator, ) - else: - raise ValueError(f"Unexpected value for grouping_for_ctm: {grouping_for_ctm}") - return None diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils.py index f3174c9bb963..daf70059a84e 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils.py @@ -36,6 +36,26 @@ def get_manifest_lines(manifest_filepath, start, end): return data +def get_processed_text_and_segments(text, separator): + """ + If the separator is not empty string, ie CTM segments will not just be tokens, + this function replaces the separator with a space, amalgamates any spaces around + it into one, and returns the new processed text as well as a list of strings + containing the segments to be returned in the CTM. + """ + if separator == "": + return text, None + + segments = text.split(separator) + + # remove any spaces at start and end of segments + segments = [seg.strip() for seg in segments] + + processed_text = " ".join(segments) + + return processed_text, segments + + def tokenize_text(model, text): """ Works for token-based and character-based models """ if hasattr(model, "tokenizer"): @@ -56,7 +76,7 @@ def tokenize_text(model, text): raise RuntimeError("Can't tokenize this model atm") -def get_log_probs_y_T_U(data, model): +def get_log_probs_y_T_U(data, model, separator): """ Preparing some tensors to be used during Viterbi decoding. 
Returns: @@ -78,7 +98,8 @@ def get_log_probs_y_T_U(data, model): y_list = [] U_list = [] for line in data: - y_line = tokenize_text(model, line["text"]) + processed_text, _ = get_processed_text_and_segments(line['text'], separator) + y_line = tokenize_text(model, processed_text) y_list.append(y_line) U_list.append(len(y_line)) @@ -198,7 +219,7 @@ def make_basetoken_ctm( def make_word_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, + data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, separator ): """ Note: assume order of utts in data matches order of utts in alignments @@ -220,26 +241,29 @@ def make_word_ctm( # ] words_info = [{"word": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] u_counter = 1 - if " " not in model.decoder.vocabulary: - for word in manifest_line["text"].split(" "): + _, segments = get_processed_text_and_segments(manifest_line['text'], separator) + if separator not in model.decoder.vocabulary: + for word in segments: word_info = { - "word": word, + "word": word.replace(" ", ""), "u_start": u_counter, "u_end": None, "t_start": None, "t_end": None, } word_tokens = tokenize_text(model, word) - word_info["u_end"] = word_info["u_start"] + 2 * len(word_tokens) - 2 + word_info["u_end"] = word_info["u_start"] + 2 * (len(word_tokens) - 1) + words_info.append(word_info) + u_counter += 2 * len(word_tokens) - words_info.append(word_info) + if " " in model.decoder.vocabulary: + u_counter += 2 else: - - for word_i, word in enumerate(manifest_line["text"].split(" ")): + for word_i, word in enumerate(segments): word_info = { - "word": word, + "word": word.replace(" ", ""), "u_start": u_counter, "u_end": None, "t_start": None, @@ -247,15 +271,15 @@ def make_word_ctm( } word_tokens = tokenize_text(model, word) - word_info["u_end"] = word_info["u_start"] + 2 * len(word_tokens) - 2 + word_info["u_end"] = word_info["u_start"] + 2 * (len(word_tokens) - 1) u_counter += 2 * len(word_tokens) words_info.append(word_info) - if word_i < len(manifest_line["text"].split(" ")) - 1: + if word_i < len(manifest_line["text"].split(separator)) - 1: # add the space after every word except the final word word_info = { - "word": "", + "word": separator.replace(" ", ""), "u_start": u_counter, "u_end": None, "t_start": None, @@ -297,7 +321,6 @@ def make_word_ctm( for word_info in words_info: if not (word_info["word"] == "initial_silence" and word_info["t_end"] is None): word = word_info["word"] - # print('word_info', word_info) start_sample = word_info["t_start"] * timestep_to_sample_ratio end_sample = (word_info["t_end"] + 1) * timestep_to_sample_ratio - 1 From ad4fd5cfef711027faad6525e020af1a0af66320 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Mon, 12 Dec 2022 17:59:43 -0800 Subject: [PATCH 033/154] Call align function with hydra config Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 171 +++++++++++++++++------------ 1 file changed, 100 insertions(+), 71 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 19b5030783e5..440c8da62aab 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -12,14 +12,80 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+from dataclasses import dataclass +from typing import Optional + import torch from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm from nemo.collections.asr.models import ASRModel from nemo.collections.asr.models.ctc_models import EncDecCTCModel +from nemo.core.config import hydra_runner V_NEG_NUM = -1e30 +""" +Align the utterances in manifest_filepath. +Results are saved in ctm files in output_ctm_folder. + +Arguments: + manifest_filepath: filepath to the manifest of the data you want to align, + containing 'audio_filepath' and 'text' fields. + output_ctm_folder: the folder where output CTM files will be saved. + model_name: string specifying a NeMo ASR model to use for generating the log-probs + which we will use to do alignment. + If the string ends with '.nemo', the code will treat `model_name` as a local filepath + from which it will attempt to restore a NeMo model. + If the string does not end with '.nemo', the code will attempt to download a model + with name `model_name` from NGC. + model_downsample_factor: an int indicating the downsample factor of the ASR model, ie the ratio of input + timesteps to output timesteps. + If the ASR model is a QuartzNet model, its downsample factor is 2. + If the ASR model is a Conformer CTC model, its downsample factor is 4. + If the ASR model is a Citirnet model, its downsample factor is 8. + separator: the string used to separate CTM segments. + If the separator is “”, the CTM segments will be the tokens used by the ASR model. + If the separator is anything else, e.g. “ “, “|” or “”, the segments will be the blocks of + text separated by that separator. + Default: “ “, ie for languages such as English, the CTM segments will be words. + Note: if the separator is not “” or “ “, it will be removed from the CTM, ie it is treated as a marker + which is not part of the ground truth. It will essentially be treated as a space, and any additional spaces + around it will be amalgamated into one, i.e. the following texts will be treated as equivalent: + “abc|def” + “abc |def” + “abc| def” + “abc | def” + n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath + we will use (starting from the final part of the audio_filepath) to determine the + utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath + will be stripped away from the utt_id, so as not to change the number of space-separated elements in the + CTM files. + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" + audio_sr: int specifying the sample rate of your audio files. + device: string specifying the device that will be used for generating log-probs and doing + Viterbi decoding. The string needs to be in a format recognized by torch.device() + batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. 
+ +""" + + +@dataclass +class AlignmentConfig: + # Required configs + manifest_filepath: Optional[str] = None + output_ctm_folder: Optional[str] = None + + # General configs + model_name: str = "stt_en_citrinet_1024_gamma_0_25" + model_downsample_factor: int = 8 + separator: str = " " + n_parts_for_ctm_id: int = 1 + audio_sr: int = 16000 + device: str = "cpu" + batch_size: int = 1 + def viterbi_decoding(log_probs, y, T, U): """ @@ -102,94 +168,53 @@ def align_batch(data, model, separator): return alignments -def align( - manifest_filepath: str, - output_ctm_folder: str, - model_name: str = "stt_en_citrinet_1024_gamma_0_25", - model_downsample_factor: int = 8, - separator: str = " ", - n_parts_for_ctm_id: int = 1, - audio_sr: int = 16000, - device: str = "cpu", - batch_size: int = 1, -): - """ - Function that does alignment of utterances in manifest_filepath. - Results are saved in ctm files in output_ctm_folder. - - Args: - manifest_filepath: filepath to the manifest of the data you want to align, - containing 'audio_filepath' and 'text' fields. - output_ctm_folder: the folder where output CTM files will be saved. - model_name: string specifying a NeMo ASR model to use for generating the log-probs - which we will use to do alignment. - If the string ends with '.nemo', the code will treat `model_name` as a local filepath - from which it will attempt to restore a NeMo model. - If the string does not end with '.nemo', the code will attempt to download a model - with name `model_name` from NGC. - model_downsample_factor: an int indicating the downsample factor of the ASR model, ie the ratio of input - timesteps to output timesteps. - If the ASR model is a QuartzNet model, its downsample factor is 2. - If the ASR model is a Conformer CTC model, its downsample factor is 4. - If the ASR model is a Citirnet model, its downsample factor is 8. - separator: the string used to separate CTM segments. - If the separator is “”, the CTM segments will be the tokens used by the ASR model. - If the separator is anything else, e.g. “ “, “|” or “”, the segments will be the blocks of - text separated by that separator. - Default: “ “, ie for languages such as English, the CTM segments will be words. - Note: if the separator is not “” or “ “, it will be removed from the CTM, ie it is treated as a marker - which is not part of the ground truth. It will essentially be treated as a space, and any additional spaces - around it will be amalgamated into one, i.e. the following texts will be treated as equivalent: - “abc|def” - “abc |def” - “abc| def” - “abc | def” - n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath - we will use (starting from the final part of the audio_filepath) to determine the - utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath - will be stripped away from the utt_id, so as not to change the number of space-separated elements in the - CTM files. - e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" - e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" - e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" - audio_sr: int specifying the sample rate of your audio files. - device: string specifying the device that will be used for generating log-probs and doing - Viterbi decoding. 
The string needs to be in a format recognized by torch.device() - batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. +@hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) +def main(cfg: AlignmentConfig): - """ + # Validate config + if cfg.manifest_filepath is None: + raise ValueError("cfg.manifest_filepath needs to be specified") + + if cfg.output_ctm_folder is None: + raise ValueError("cfg.output_ctm_folder needs to be specified") # load model - device = torch.device(device) - if model_name.endswith('.nemo'): - model = ASRModel.restore_from(model_name, map_location=device) + device = torch.device(cfg.device) + if cfg.model_name.endswith('.nemo'): + model = ASRModel.restore_from(cfg.model_name, map_location=device) else: - model = ASRModel.from_pretrained(model_name, map_location=device) + model = ASRModel.from_pretrained(cfg.model_name, map_location=device) if not isinstance(model, EncDecCTCModel): raise NotImplementedError( - f"Model {model_name} is not an instance of NeMo EncDecCTCModel." + f"Model {cfg.model_name} is not an instance of NeMo EncDecCTCModel." " Currently only instances of EncDecCTCModels are supported" ) # define start and end line IDs of batches - with open(manifest_filepath, 'r') as f: + with open(cfg.manifest_filepath, 'r') as f: num_lines_in_manifest = sum(1 for _ in f) - starts = [x for x in range(0, num_lines_in_manifest, batch_size)] + starts = [x for x in range(0, num_lines_in_manifest, cfg.batch_size)] ends = [x - 1 for x in starts] ends.pop(0) ends.append(num_lines_in_manifest) # get alignment and save in CTM batch-by-batch for start, end in zip(starts, ends): - data = get_manifest_lines(manifest_filepath, start, end) + data = get_manifest_lines(cfg.manifest_filepath, start, end) - alignments = align_batch(data, model, separator) + alignments = align_batch(data, model, cfg.separator) - if separator == "": + if cfg.separator == "": make_basetoken_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, + data, + alignments, + model, + cfg.model_downsample_factor, + cfg.output_ctm_folder, + cfg.n_parts_for_ctm_id, + cfg.audio_sr, ) else: @@ -197,11 +222,15 @@ def align( data, alignments, model, - model_downsample_factor, - output_ctm_folder, - n_parts_for_ctm_id, - audio_sr, - separator, + cfg.model_downsample_factor, + cfg.output_ctm_folder, + cfg.n_parts_for_ctm_id, + cfg.audio_sr, + cfg.separator, ) return None + + +if __name__ == "__main__": + main() From 191695aa51181aaab66dad6f4059a46888cebbd0 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Mon, 12 Dec 2022 18:12:21 -0800 Subject: [PATCH 034/154] update usage example Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index b8cbcdf2fea1..e09a5542acdd 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -2,23 +2,18 @@ A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based models. -# Example script - -```python -from align import align - -if __name__ == "__main__": - align( - manifest_filepath=, - output_ctm_folder=, - ) +## Usage example ``` +python /NeMo/tools/nemo_forced_aligner/align.py \ + manifest_filepath= \ + output_ctm_folder= +``` ## How do I use NeMo Forced Aligner? 
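NFA's input is a standard NeMo manifest. As a quick illustration (the audio paths and transcripts below are made up), each line of such a manifest is a JSON object with the two required fields:

```
{"audio_filepath": "/data/my_corpus/audio/utt1.wav", "text": "hello world"}
{"audio_filepath": "/data/my_corpus/audio/utt2.wav", "text": "another example utterance"}
```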
To use NFA, all you need to provide is a correct NeMo manifest (with 'audio_filepath' and 'text' fields). -Call the align function in align.py, specifying the parameters as follows: +Call the align.py script, specifying the parameters as follows: * `manifest_filepath`: The path to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. From 9e84910f1f167b921565492095295f376c2d51b4 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 13 Dec 2022 10:40:41 -0800 Subject: [PATCH 035/154] Update Dockerfile (#5614) (#5616) Pinned to use `numba==0.53.1` to avoid crashing in training with `num_workers > 0`. This is just a temporary workaround, still need to fix it in the future. Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- Dockerfile | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Dockerfile b/Dockerfile index c02876fee81a..8044a99771f8 100644 --- a/Dockerfile +++ b/Dockerfile @@ -85,6 +85,10 @@ RUN python -c "import nemo.collections.nlp as nemo_nlp" && \ # install pinned numba version # RUN conda install -c conda-forge numba==0.54.1 +# Pinned to numba==0.53.1 to avoid bug in training with num_workers > 0 +# The bug still exists with PTL 1.8.4, this is just a temporary workaround. +RUN pip install numba==0.53.1 + # copy scripts/examples/tests into container for end user WORKDIR /workspace/nemo COPY scripts /workspace/nemo/scripts From 3a6adb5ffd994b20690160a0a793013fbb18979a Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 13 Dec 2022 11:30:28 -0800 Subject: [PATCH 036/154] Make separate pretrained_name and model_path parameters Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 11 +++++++---- tools/nemo_forced_aligner/align.py | 26 +++++++++++++++----------- 2 files changed, 22 insertions(+), 15 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index e09a5542acdd..3933a198b71f 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -11,16 +11,19 @@ python /NeMo/tools/nemo_forced_aligner/align.py \ ``` ## How do I use NeMo Forced Aligner? -To use NFA, all you need to provide is a correct NeMo manifest (with 'audio_filepath' and 'text' fields). +To use NFA, all you need to provide is a correct NeMo manifest (with `"audio_filepath"` and `"text"` fields). -Call the align.py script, specifying the parameters as follows: +Call the `align.py` script, specifying the parameters as follows: * `manifest_filepath`: The path to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. -* [OPTIONAL] `model_name`: Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). 
You can provide a pretrained model name or a path to a saved '.nemo' file. (Default: "stt_en_citrinet_1024_gamma_0_25") -> Note: If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. +* [OPTIONAL] `pretrained_name`: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). (Default: "stt_en_citrinet_1024_gamma_0_25"). +>Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. + +* [OPTIONAL] `model_path`: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the log-probs which we will use to do alignment. +>Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. * [OPTIONAL] `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet (Default: 8, to match with the default model_name "stt_en_citrinet_1024_gamma_0_25"). diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 440c8da62aab..2de719c54b14 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -20,6 +20,7 @@ from nemo.collections.asr.models import ASRModel from nemo.collections.asr.models.ctc_models import EncDecCTCModel +from nemo.collections.asr.parts.utils.transcribe_utils import setup_model from nemo.core.config import hydra_runner V_NEG_NUM = -1e30 @@ -32,12 +33,13 @@ manifest_filepath: filepath to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. output_ctm_folder: the folder where output CTM files will be saved. - model_name: string specifying a NeMo ASR model to use for generating the log-probs - which we will use to do alignment. - If the string ends with '.nemo', the code will treat `model_name` as a local filepath - from which it will attempt to restore a NeMo model. - If the string does not end with '.nemo', the code will attempt to download a model - with name `model_name` from NGC. + pretrained_name: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded + from NGC and used for generating the log-probs which we will use to do alignment. + Note: NFA can only use CTC models (not Transducer models) at the moment. + model_path: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the + log-probs which we will use to do alignment. + Note: NFA can only use CTC models (not Transducer models) at the moment. + Note: if a model_path is provided, it will override the pretrained_name. model_downsample_factor: an int indicating the downsample factor of the ASR model, ie the ratio of input timesteps to output timesteps. If the ASR model is a QuartzNet model, its downsample factor is 2. 
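To make the relationship between the two new config fields concrete, the sketch below shows the kind of selection logic involved. It is not the actual `setup_model` helper that the patch delegates to (that lives in `nemo.collections.asr.parts.utils.transcribe_utils`), and the function name `load_asr_model` is only for illustration; the point is that a local `.nemo` file, when given, takes precedence over a pretrained model name.

```python
import torch

from nemo.collections.asr.models import ASRModel


def load_asr_model(cfg, device: torch.device) -> ASRModel:
    # model_path (a local .nemo checkpoint) overrides pretrained_name (an NGC model name)
    if cfg.model_path is not None:
        return ASRModel.restore_from(cfg.model_path, map_location=device)
    # otherwise, download the named model from NGC
    return ASRModel.from_pretrained(cfg.pretrained_name, map_location=device)
```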
@@ -78,7 +80,8 @@ class AlignmentConfig: output_ctm_folder: Optional[str] = None # General configs - model_name: str = "stt_en_citrinet_1024_gamma_0_25" + pretrained_name: Optional[str] = "stt_en_citrinet_1024_gamma_0_25" + model_path: Optional[str] = None model_downsample_factor: int = 8 separator: str = " " n_parts_for_ctm_id: int = 1 @@ -178,12 +181,13 @@ def main(cfg: AlignmentConfig): if cfg.output_ctm_folder is None: raise ValueError("cfg.output_ctm_folder needs to be specified") + if cfg.model_path is None and cfg.pretrained_name is None: + raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None") + # load model device = torch.device(cfg.device) - if cfg.model_name.endswith('.nemo'): - model = ASRModel.restore_from(cfg.model_name, map_location=device) - else: - model = ASRModel.from_pretrained(cfg.model_name, map_location=device) + + model, _ = setup_model(cfg, device) if not isinstance(model, EncDecCTCModel): raise NotImplementedError( From d9792344db1e044cb60d0f1d13152ebb00b9d2f3 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Date: Tue, 13 Dec 2022 11:33:53 -0800 Subject: [PATCH 037/154] make "optional" tags bold in markdown Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 3933a198b71f..20ad033813f1 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -19,23 +19,23 @@ Call the `align.py` script, specifying the parameters as follows: * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. -* [OPTIONAL] `pretrained_name`: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). (Default: "stt_en_citrinet_1024_gamma_0_25"). +* **[OPTIONAL]** `pretrained_name`: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). (Default: "stt_en_citrinet_1024_gamma_0_25"). >Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. -* [OPTIONAL] `model_path`: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the log-probs which we will use to do alignment. +* **[OPTIONAL]** `model_path`: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the log-probs which we will use to do alignment. >Note: NFA can only use CTC models (not Transducer models) at the moment. 
If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. -* [OPTIONAL] `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet (Default: 8, to match with the default model_name "stt_en_citrinet_1024_gamma_0_25"). +* **[OPTIONAL]** `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet (Default: 8, to match with the default model_name "stt_en_citrinet_1024_gamma_0_25"). -* [OPTIONAL] `separator`: the string used to separate CTM segments. If the separator is `“”` (empty string), the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) +* **[OPTIONAL]** `separator`: the string used to separate CTM segments. If the separator is `“”` (empty string), the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) -* [OPTIONAL] `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be stripped away from the utt_id, so as not to change the number of space-separated elements in the CTM files. +* **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be stripped away from the utt_id, so as not to change the number of space-separated elements in the CTM files. -* [OPTIONAL] `audio_sr`: The sample rate (in Hz) of your audio. (Default: 16000) +* **[OPTIONAL]** `audio_sr`: The sample rate (in Hz) of your audio. (Default: 16000) -* [OPTIONAL] `device`: The device that will be used for generating log-probs and doing Viterbi decoding. (Default: 'cpu'). +* **[OPTIONAL]** `device`: The device that will be used for generating log-probs and doing Viterbi decoding. (Default: 'cpu'). -* [OPTIONAL] `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). +* **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). 
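Putting several of the optional parameters above together, a command might look like the following (the paths and parameter values are purely illustrative; the model name is just one example of a CTC model available on NGC, paired with the downsample factor of 4 that Conformer CTC models use):

```
python /NeMo/tools/nemo_forced_aligner/align.py \
    manifest_filepath=/data/my_corpus/manifest.json \
    output_ctm_folder=/data/my_corpus/nfa_ctm \
    pretrained_name="stt_en_conformer_ctc_large" \
    model_downsample_factor=4 \
    separator="|" \
    device="cuda:0" \
    batch_size=8
```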
# Input manifest file format From 9c6c6db4339512f489e423287e52b2cd3b32a9f3 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 13 Dec 2022 11:51:27 -0800 Subject: [PATCH 038/154] Move non-main functions to utils dir Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 89 +----------- tools/nemo_forced_aligner/utils/data_prep.py | 137 ++++++++++++++++++ .../{utils.py => utils/make_ctm.py} | 123 +--------------- .../utils/viterbi_decoding.py | 91 ++++++++++++ 4 files changed, 234 insertions(+), 206 deletions(-) create mode 100644 tools/nemo_forced_aligner/utils/data_prep.py rename tools/nemo_forced_aligner/{utils.py => utils/make_ctm.py} (69%) create mode 100644 tools/nemo_forced_aligner/utils/viterbi_decoding.py diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 2de719c54b14..b1d1aec9c1ec 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -16,14 +16,15 @@ from typing import Optional import torch -from utils import get_log_probs_y_T_U, get_manifest_lines, make_basetoken_ctm, make_word_ctm +from utils.data_prep import get_log_probs_y_T_U, get_manifest_lines +from utils.make_ctm import make_basetoken_ctm, make_word_ctm +from utils.viterbi_decoding import viterbi_decoding from nemo.collections.asr.models import ASRModel from nemo.collections.asr.models.ctc_models import EncDecCTCModel from nemo.collections.asr.parts.utils.transcribe_utils import setup_model from nemo.core.config import hydra_runner -V_NEG_NUM = -1e30 """ Align the utterances in manifest_filepath. @@ -90,87 +91,6 @@ class AlignmentConfig: batch_size: int = 1 -def viterbi_decoding(log_probs, y, T, U): - """ - Does Viterbi decoding. - Returns: - alignments: list of lists containing locations for the tokens we align to at each timestep - looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] 
- v_matrix: - tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) - containing viterbi probabilities - This is returned in case you want to examine it in a parent function - log_probs_reordered: - log_probs reordered to match the order of ground truth tokens - tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) - This is returned in case you want to examine it in a parent function - """ - B, T_max, _ = log_probs.shape - U_max = y.shape[1] - - device = log_probs.device - - padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=device) - log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) - log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y.unsqueeze(1).repeat(1, T_max, 1)) - - v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) - backpointers = -999 * torch.ones_like(v_matrix) - v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2] - - y_shifted_left = torch.roll(y, shifts=2, dims=1) - letter_repetition_mask = y - y_shifted_left - letter_repetition_mask[:, :2] = 1 # make sure dont apply mask to first 2 tokens - letter_repetition_mask = letter_repetition_mask == 0 - - bp_absolute_template = torch.arange(U_max, device=device).unsqueeze(0).repeat(B, 1) - - for t in range(1, T_max): - - e_current = log_probs_reordered[:, t, :] - - v_prev = v_matrix[:, t - 1, :] - v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1) - v_prev_shifted[:, 0] = V_NEG_NUM - - v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1) - v_prev_shifted2[:, :2] = V_NEG_NUM - v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEG_NUM) - - v_prev_dup = torch.cat( - (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2, - ) - - candidates_v_current = v_prev_dup + e_current.unsqueeze(2) - v_current, bp_relative = torch.max(candidates_v_current, dim=2) - - bp_absolute = bp_absolute_template - bp_relative - - v_matrix[:, t, :] = v_current - backpointers[:, t, :] = bp_absolute - - # trace backpointers TODO: parallelize over batch_size - alignments = [] - for b in range(B): - T_b = int(T[b]) - U_b = int(U[b]) - - final_state = int(torch.argmax(v_matrix[b, T_b - 1, U_b - 2 : U_b])) + U_b - 2 - alignment_b = [final_state] - for t in range(T_b - 1, 0, -1): - alignment_b.insert(0, int(backpointers[b, t, alignment_b[0]])) - alignments.append(alignment_b) - - return alignments - - -def align_batch(data, model, separator): - log_probs, y, T, U = get_log_probs_y_T_U(data, model, separator) - alignments = viterbi_decoding(log_probs, y, T, U) - - return alignments - - @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) def main(cfg: AlignmentConfig): @@ -208,7 +128,8 @@ def main(cfg: AlignmentConfig): for start, end in zip(starts, ends): data = get_manifest_lines(cfg.manifest_filepath, start, end) - alignments = align_batch(data, model, cfg.separator) + log_probs, y, T, U = get_log_probs_y_T_U(data, model, cfg.separator) + alignments = viterbi_decoding(log_probs, y, T, U) if cfg.separator == "": make_basetoken_ctm( diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py new file mode 100644 index 000000000000..3fb2c58f192f --- /dev/null +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -0,0 +1,137 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import json + +import torch + +V_NEG_NUM = -1e30 + + +def get_manifest_lines(manifest_filepath, start, end): + data = [] + with open(manifest_filepath, "r") as f: + for line_i, line in enumerate(f): + if line_i == start and line_i == end: + data.append(json.loads(line)) + break + + if line_i == end: + break + if line_i >= start: + data.append(json.loads(line)) + return data + + +def get_processed_text_and_segments(text, separator): + """ + If the separator is not empty string, ie CTM segments will not just be tokens, + this function replaces the separator with a space, amalgamates any spaces around + it into one, and returns the new processed text as well as a list of strings + containing the segments to be returned in the CTM. + """ + if separator == "": + return text, None + + segments = text.split(separator) + + # remove any spaces at start and end of segments + segments = [seg.strip() for seg in segments] + + processed_text = " ".join(segments) + + return processed_text, segments + + +def tokenize_text(model, text): + """ Works for token-based and character-based models """ + if hasattr(model, "tokenizer"): + return model.tokenizer.text_to_ids(text) + + elif hasattr(model.decoder, "vocabulary"): + # i.e. assume it is character-based model (in theory could be TalkNet - that will not work) + tokens = [] + for character in text: + if character in model.decoder.vocabulary: + tokens.append(model.decoder.vocabulary.index(character)) + else: + tokens.append(len(model.decoder.vocabulary)) # return unk token (same as blank token) + + return tokens + + else: + raise RuntimeError("Can't tokenize this model atm") + + +def get_log_probs_y_T_U(data, model, separator): + """ + Preparing some tensors to be used during Viterbi decoding. + Returns: + log_probs, y, T, U_dash (ie. 
U with every other token being a blank) + """ + + audio_filepaths = [line["audio_filepath"] for line in data] + B = len(audio_filepaths) + + with torch.no_grad(): + hypotheses = model.transcribe(audio_filepaths, return_hypotheses=True, batch_size=B) + + log_probs_list = [] + T_list = [] + for hypothesis in hypotheses: + log_probs_list.append(hypothesis.y_sequence) + T_list.append(hypothesis.y_sequence.shape[0]) + + y_list = [] + U_list = [] + for line in data: + processed_text, _ = get_processed_text_and_segments(line['text'], separator) + y_line = tokenize_text(model, processed_text) + y_list.append(y_line) + U_list.append(len(y_line)) + + T_max = max(T_list) + U_max = max(U_list) + blank_index = len(model.decoder.vocabulary) + V = len(model.decoder.vocabulary) + 1 + T = torch.tensor(T_list) + + # make log_probs tensor of shape (B x T_max x V) + log_probs = V_NEG_NUM * torch.ones((B, T_max, V)) + for b, log_probs_b in enumerate(log_probs_list): + t = log_probs_b.shape[0] + log_probs[b, :t, :] = log_probs_b + + # make y tensor of shape (B x U_dash_max) + U_dash_max = 2 * U_max + 1 + y = V * torch.ones((B, U_dash_max), dtype=torch.int64) + U_dash_list = [] + for b, y_b in enumerate(y_list): + y_dash_b = [blank_index] + for y_b_element in y_b: + y_dash_b.append(y_b_element) + y_dash_b.append(blank_index) + U_dash = len(y_dash_b) + y[b, :U_dash] = torch.tensor(y_dash_b) + U_dash_list.append(U_dash) + + U_dash = torch.tensor(U_dash_list) + + # transfer all tensors to device + log_probs = log_probs.to(model.device) + y = y.to(model.device) + T = T.to(model.device) + U_dash = U_dash.to(model.device) + + return log_probs, y, T, U_dash diff --git a/tools/nemo_forced_aligner/utils.py b/tools/nemo_forced_aligner/utils/make_ctm.py similarity index 69% rename from tools/nemo_forced_aligner/utils.py rename to tools/nemo_forced_aligner/utils/make_ctm.py index daf70059a84e..147d0e7d2a89 100644 --- a/tools/nemo_forced_aligner/utils.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -12,131 +12,10 @@ # See the License for the specific language governing permissions and # limitations under the License. -import json import os from pathlib import Path -import torch - -V_NEG_NUM = -1e30 - - -def get_manifest_lines(manifest_filepath, start, end): - data = [] - with open(manifest_filepath, "r") as f: - for line_i, line in enumerate(f): - if line_i == start and line_i == end: - data.append(json.loads(line)) - break - - if line_i == end: - break - if line_i >= start: - data.append(json.loads(line)) - return data - - -def get_processed_text_and_segments(text, separator): - """ - If the separator is not empty string, ie CTM segments will not just be tokens, - this function replaces the separator with a space, amalgamates any spaces around - it into one, and returns the new processed text as well as a list of strings - containing the segments to be returned in the CTM. - """ - if separator == "": - return text, None - - segments = text.split(separator) - - # remove any spaces at start and end of segments - segments = [seg.strip() for seg in segments] - - processed_text = " ".join(segments) - - return processed_text, segments - - -def tokenize_text(model, text): - """ Works for token-based and character-based models """ - if hasattr(model, "tokenizer"): - return model.tokenizer.text_to_ids(text) - - elif hasattr(model.decoder, "vocabulary"): - # i.e. 
assume it is character-based model (in theory could be TalkNet - that will not work) - tokens = [] - for character in text: - if character in model.decoder.vocabulary: - tokens.append(model.decoder.vocabulary.index(character)) - else: - tokens.append(len(model.decoder.vocabulary)) # return unk token (same as blank token) - - return tokens - - else: - raise RuntimeError("Can't tokenize this model atm") - - -def get_log_probs_y_T_U(data, model, separator): - """ - Preparing some tensors to be used during Viterbi decoding. - Returns: - log_probs, y, T, U_dash (ie. U with every other token being a blank) - """ - - audio_filepaths = [line["audio_filepath"] for line in data] - B = len(audio_filepaths) - - with torch.no_grad(): - hypotheses = model.transcribe(audio_filepaths, return_hypotheses=True, batch_size=B) - - log_probs_list = [] - T_list = [] - for hypothesis in hypotheses: - log_probs_list.append(hypothesis.y_sequence) - T_list.append(hypothesis.y_sequence.shape[0]) - - y_list = [] - U_list = [] - for line in data: - processed_text, _ = get_processed_text_and_segments(line['text'], separator) - y_line = tokenize_text(model, processed_text) - y_list.append(y_line) - U_list.append(len(y_line)) - - T_max = max(T_list) - U_max = max(U_list) - blank_index = len(model.decoder.vocabulary) - V = len(model.decoder.vocabulary) + 1 - T = torch.tensor(T_list) - - # make log_probs tensor of shape (B x T_max x V) - log_probs = V_NEG_NUM * torch.ones((B, T_max, V)) - for b, log_probs_b in enumerate(log_probs_list): - t = log_probs_b.shape[0] - log_probs[b, :t, :] = log_probs_b - - # make y tensor of shape (B x U_dash_max) - U_dash_max = 2 * U_max + 1 - y = V * torch.ones((B, U_dash_max), dtype=torch.int64) - U_dash_list = [] - for b, y_b in enumerate(y_list): - y_dash_b = [blank_index] - for y_b_element in y_b: - y_dash_b.append(y_b_element) - y_dash_b.append(blank_index) - U_dash = len(y_dash_b) - y[b, :U_dash] = torch.tensor(y_dash_b) - U_dash_list.append(U_dash) - - U_dash = torch.tensor(U_dash_list) - - # transfer all tensors to device - log_probs = log_probs.to(model.device) - y = y.to(model.device) - T = T.to(model.device) - U_dash = U_dash.to(model.device) - - return log_probs, y, T, U_dash +from .data_prep import get_processed_text_and_segments, tokenize_text def _get_utt_id(audio_filepath, n_parts_for_ctm_id): diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py new file mode 100644 index 000000000000..da28ba4cf35e --- /dev/null +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -0,0 +1,91 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +V_NEG_NUM = -1e30 + + +def viterbi_decoding(log_probs, y, T, U): + """ + Does Viterbi decoding. + Returns: + alignments: list of lists containing locations for the tokens we align to at each timestep + looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] 
+ v_matrix: + tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) + containing viterbi probabilities + This is returned in case you want to examine it in a parent function + log_probs_reordered: + log_probs reordered to match the order of ground truth tokens + tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) + This is returned in case you want to examine it in a parent function + """ + B, T_max, _ = log_probs.shape + U_max = y.shape[1] + + device = log_probs.device + + padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=device) + log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) + log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y.unsqueeze(1).repeat(1, T_max, 1)) + + v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) + backpointers = -999 * torch.ones_like(v_matrix) + v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2] + + y_shifted_left = torch.roll(y, shifts=2, dims=1) + letter_repetition_mask = y - y_shifted_left + letter_repetition_mask[:, :2] = 1 # make sure dont apply mask to first 2 tokens + letter_repetition_mask = letter_repetition_mask == 0 + + bp_absolute_template = torch.arange(U_max, device=device).unsqueeze(0).repeat(B, 1) + + for t in range(1, T_max): + + e_current = log_probs_reordered[:, t, :] + + v_prev = v_matrix[:, t - 1, :] + v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1) + v_prev_shifted[:, 0] = V_NEG_NUM + + v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1) + v_prev_shifted2[:, :2] = V_NEG_NUM + v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEG_NUM) + + v_prev_dup = torch.cat( + (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2, + ) + + candidates_v_current = v_prev_dup + e_current.unsqueeze(2) + v_current, bp_relative = torch.max(candidates_v_current, dim=2) + + bp_absolute = bp_absolute_template - bp_relative + + v_matrix[:, t, :] = v_current + backpointers[:, t, :] = bp_absolute + + # trace backpointers TODO: parallelize over batch_size + alignments = [] + for b in range(B): + T_b = int(T[b]) + U_b = int(U[b]) + + final_state = int(torch.argmax(v_matrix[b, T_b - 1, U_b - 2 : U_b])) + U_b - 2 + alignment_b = [final_state] + for t in range(T_b - 1, 0, -1): + alignment_b.insert(0, int(backpointers[b, t, alignment_b[0]])) + alignments.append(alignment_b) + + return alignments From 67093df30b744e14031058fc67ea299ec350127e Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Tue, 13 Dec 2022 13:46:45 -0800 Subject: [PATCH 039/154] Temp workaround: Disable test with cache_audio=True since it is failing in CI (#5607) (#5615) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ante Jukić Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- tests/collections/asr/test_asr_datasets.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/collections/asr/test_asr_datasets.py b/tests/collections/asr/test_asr_datasets.py index d64b8ad98f5f..18ac747a845b 100644 --- a/tests/collections/asr/test_asr_datasets.py +++ b/tests/collections/asr/test_asr_datasets.py @@ -1546,7 +1546,8 @@ def test_audio_to_target_with_embedding_dataset(self): class TestUtilityFunctions: @pytest.mark.unit - @pytest.mark.parametrize('cache_audio', [False, True]) + # TEMP WORKAROUND: Skip `True`, multiprocessing failing in CI (#5607) + 
@pytest.mark.parametrize('cache_audio', [False]) def test_cache_datastore_manifests(self, cache_audio: bool): """Test caching of manifest and audio files. """ From f4cfa471cd6b34f547fad7223293ff016b2b1444 Mon Sep 17 00:00:00 2001 From: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Date: Tue, 13 Dec 2022 19:43:47 -0800 Subject: [PATCH 040/154] [TTS] fix ranges of char set for accented letters. (#5607) * [TTS] fix ranges of char set for accented letters. * remove digits pattern and added unit tests for math operators. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- nemo_text_processing/g2p/data/data_utils.py | 16 ++++++++--- .../g2p/data/test_data_utils.py | 27 ++++++++++++++++--- 2 files changed, 36 insertions(+), 7 deletions(-) diff --git a/nemo_text_processing/g2p/data/data_utils.py b/nemo_text_processing/g2p/data/data_utils.py index b91f533de858..b90ee394ae57 100644 --- a/nemo_text_processing/g2p/data/data_utils.py +++ b/nemo_text_processing/g2p/data/data_utils.py @@ -37,9 +37,17 @@ # 3rd group -- punctuation marks. # Text (first line) and mask of groups for every char (second line). # config file must contain |EY1 EY1|, B, C, D, E, F, and G. -# 111111311113111131111111322222222233133133133133133111313 -_WORDS_RE = re.compile(r"([a-zA-Z]+(?:[a-zA-Z-']*[a-zA-Z]+)*)|(\|[^|]*\|)|([^a-zA-Z|]+)") -_WORDS_RE_IPA = re.compile(r"([a-zA-ZÀ-ÿ\d]+(?:[a-zA-ZÀ-ÿ\d\-']*[a-zA-ZÀ-ÿ\d]+)*)|(\|[^|]*\|)|([^a-zA-ZÀ-ÿ\d|]+)") + +# define char set based on https://en.wikipedia.org/wiki/List_of_Unicode_characters +LATIN_ALPHABET_BASIC = "A-Za-z" +ACCENTED_CHARS = "À-ÖØ-öø-ÿ" +LATIN_CHARS_ALL = f"{LATIN_ALPHABET_BASIC}{ACCENTED_CHARS}" +_WORDS_RE = re.compile( + fr"([{LATIN_ALPHABET_BASIC}]+(?:[{LATIN_ALPHABET_BASIC}\-']*[{LATIN_ALPHABET_BASIC}]+)*)|(\|[^|]*\|)|([^{LATIN_ALPHABET_BASIC}|]+)" +) +_WORDS_RE_IPA = re.compile( + fr"([{LATIN_CHARS_ALL}]+(?:[{LATIN_CHARS_ALL}\-']*[{LATIN_CHARS_ALL}]+)*)|(\|[^|]*\|)|([^{LATIN_CHARS_ALL}|]+)" +) # - @@ -77,7 +85,7 @@ def read_wordids(wordid_map: str): def get_wordid_to_nemo(wordid_to_nemo_cmu_file: str = "../../../scripts/tts_dataset_files/wordid_to_nemo_cmu.tsv"): """ - WikiHomograph and NeMo use slightly differene phoneme sets, this funtion reads WikiHomograph word_ids to NeMo + WikiHomograph and NeMo use slightly different phoneme sets, this function reads WikiHomograph word_ids to NeMo IPA heteronyms mapping """ wordid_to_nemo_cmu = {} diff --git a/tests/nemo_text_processing/g2p/data/test_data_utils.py b/tests/nemo_text_processing/g2p/data/test_data_utils.py index ccf9c7da5b61..28edca13fa58 100644 --- a/tests/nemo_text_processing/g2p/data/test_data_utils.py +++ b/tests/nemo_text_processing/g2p/data/test_data_utils.py @@ -90,9 +90,30 @@ def test_ipa_word_tokenize_with_accents(self): @pytest.mark.run_only_on('CPU') @pytest.mark.unit - def test_ipa_word_tokenize_with_numbers(self): - input_text = "The 3D movie on 4-1-2022" - expected_output = self._create_expected_output(["the", " ", "3d", " ", "movie", " ", "on", " ", "4-1-2022"]) + def test_ipa_word_tokenize_with_non_latin_chars(self): + input_text = "Three times× four^teen ÷divided by [movies] on \slash." 
+ expected_output = self._create_expected_output( + [ + "three", + " ", + "times", + "× ", + "four", + "^", + "teen", + " ÷", + "divided", + " ", + "by", + " [", + "movies", + "] ", + "on", + " \\", + "slash", + ".", + ] + ) output = ipa_word_tokenize(input_text) assert output == expected_output From 1d385209cab3b301033482d9eb514f25e145f56b Mon Sep 17 00:00:00 2001 From: Sean Naren Date: Wed, 14 Dec 2022 17:29:18 +0000 Subject: [PATCH 041/154] Change success message to reduce confusion (#5621) Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva --- nemo/collections/common/callbacks/ema.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/common/callbacks/ema.py b/nemo/collections/common/callbacks/ema.py index 56db703aab28..0737f4262b62 100644 --- a/nemo/collections/common/callbacks/ema.py +++ b/nemo/collections/common/callbacks/ema.py @@ -135,7 +135,7 @@ def on_load_checkpoint( # we could enforce that if you trained with EMA and want to continue training checkpoint['optimizer_states'] = ema_state_dict['optimizer_states'] del ema_state_dict - logging.info("EMA weights have been loaded successfully. Continuing training with saved EMA weights.") + logging.info("EMA state has been restored.") else: raise MisconfigurationException( "Unable to find the associated EMA weights when re-loading, " From 67dffe4e87229abdd7cdccb6875d29f1c283d763 Mon Sep 17 00:00:00 2001 From: Somshubra Majumdar Date: Wed, 14 Dec 2022 12:09:40 -0800 Subject: [PATCH 042/154] Update documentation and tutorials for Adapters (#5610) * Improve docs for adapter and tests Signed-off-by: smajumdar * Improve docs for adapter and tests Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename test file Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../asr/modules/conformer_encoder.py | 24 +- nemo/collections/asr/modules/conv_asr.py | 14 +- .../asr/modules/squeezeformer_encoder.py | 24 +- nemo/core/classes/mixins/adapter_mixins.py | 18 +- .../mixins/test_adapter_common_model_mixin.py | 172 ++++++++++ tutorials/02_NeMo_Adapters.ipynb | 323 +++++++++++++++++- .../asr/asr_adapters/ASR_with_Adapters.ipynb | 143 +++++++- 7 files changed, 680 insertions(+), 38 deletions(-) create mode 100644 tests/collections/common/mixins/test_adapter_common_model_mixin.py diff --git a/nemo/collections/asr/modules/conformer_encoder.py b/nemo/collections/asr/modules/conformer_encoder.py index d042c09a60e7..a3b747e138a4 100644 --- a/nemo/collections/asr/modules/conformer_encoder.py +++ b/nemo/collections/asr/modules/conformer_encoder.py @@ -14,7 +14,7 @@ import math from collections import OrderedDict -from typing import List, Optional +from typing import List, Optional, Set import torch import torch.distributed @@ -615,14 +615,6 @@ class ConformerEncoderAdapter(ConformerEncoder, adapter_mixins.AdapterModuleMixi # Higher level forwarding def add_adapter(self, name: str, cfg: dict): - self.set_accepted_adapter_types( - [ - adapter_utils.LINEAR_ADAPTER_CLASSPATH, - adapter_utils.MHA_ADAPTER_CLASSPATH, - adapter_utils.RELMHA_ADAPTER_CLASSPATH, - ] - ) - cfg = self._update_adapter_cfg_input_dim(cfg) for conformer_layer in self.layers: # type: 
adapter_mixins.AdapterModuleMixin conformer_layer.add_adapter(name, cfg) @@ -646,6 +638,20 @@ def _update_adapter_cfg_input_dim(self, cfg: DictConfig): cfg = adapter_utils.update_adapter_cfg_input_dim(self, cfg, module_dim=self.d_model) return cfg + def get_accepted_adapter_types(self,) -> Set[type]: + types = super().get_accepted_adapter_types() + + if len(types) == 0: + self.set_accepted_adapter_types( + [ + adapter_utils.LINEAR_ADAPTER_CLASSPATH, + adapter_utils.MHA_ADAPTER_CLASSPATH, + adapter_utils.RELMHA_ADAPTER_CLASSPATH, + ] + ) + types = self.get_accepted_adapter_types() + return types + """ Register any additional information diff --git a/nemo/collections/asr/modules/conv_asr.py b/nemo/collections/asr/modules/conv_asr.py index e930723d0172..fc59ce17f076 100644 --- a/nemo/collections/asr/modules/conv_asr.py +++ b/nemo/collections/asr/modules/conv_asr.py @@ -13,7 +13,7 @@ # limitations under the License. from collections import OrderedDict from dataclasses import dataclass, field -from typing import List, Optional, Union +from typing import List, Optional, Set, Union import torch import torch.distributed @@ -889,7 +889,7 @@ def add_adapter(self, name: str, cfg: dict): for jasper_block in self.encoder: # type: adapter_mixins.AdapterModuleMixin cfg = self._update_adapter_cfg_input_dim(jasper_block, cfg) - jasper_block.set_accepted_adapter_types([adapter_utils.LINEAR_ADAPTER_CLASSPATH]) + jasper_block.set_accepted_adapter_types(self.get_accepted_adapter_types()) jasper_block.add_adapter(name, cfg) def is_adapter_available(self) -> bool: @@ -911,6 +911,16 @@ def _update_adapter_cfg_input_dim(self, block: JasperBlock, cfg): cfg = adapter_utils.update_adapter_cfg_input_dim(self, cfg, module_dim=block.planes) return cfg + def get_accepted_adapter_types(self,) -> Set[type]: + types = super().get_accepted_adapter_types() + + if len(types) == 0: + self.set_accepted_adapter_types( + [adapter_utils.LINEAR_ADAPTER_CLASSPATH,] + ) + types = self.get_accepted_adapter_types() + return types + @dataclass class JasperEncoderConfig: diff --git a/nemo/collections/asr/modules/squeezeformer_encoder.py b/nemo/collections/asr/modules/squeezeformer_encoder.py index 7aabea642877..125f7f5df0e6 100644 --- a/nemo/collections/asr/modules/squeezeformer_encoder.py +++ b/nemo/collections/asr/modules/squeezeformer_encoder.py @@ -14,7 +14,7 @@ import math from collections import OrderedDict -from typing import List, Optional +from typing import List, Optional, Set import torch import torch.distributed @@ -395,14 +395,6 @@ class SqueezeformerEncoderAdapter(SqueezeformerEncoder, adapter_mixins.AdapterMo # Higher level forwarding def add_adapter(self, name: str, cfg: dict): - self.set_accepted_adapter_types( - [ - adapter_utils.LINEAR_ADAPTER_CLASSPATH, - adapter_utils.MHA_ADAPTER_CLASSPATH, - adapter_utils.RELMHA_ADAPTER_CLASSPATH, - ] - ) - cfg = self._update_adapter_cfg_input_dim(cfg) for conformer_layer in self.layers: # type: adapter_mixins.AdapterModuleMixin conformer_layer.add_adapter(name, cfg) @@ -426,6 +418,20 @@ def _update_adapter_cfg_input_dim(self, cfg: DictConfig): cfg = adapter_utils.update_adapter_cfg_input_dim(self, cfg, module_dim=self.d_model) return cfg + def get_accepted_adapter_types(self,) -> Set[type]: + types = super().get_accepted_adapter_types() + + if len(types) == 0: + self.set_accepted_adapter_types( + [ + adapter_utils.LINEAR_ADAPTER_CLASSPATH, + adapter_utils.MHA_ADAPTER_CLASSPATH, + adapter_utils.RELMHA_ADAPTER_CLASSPATH, + ] + ) + types = self.get_accepted_adapter_types() + return 
types + """ Register any additional information diff --git a/nemo/core/classes/mixins/adapter_mixins.py b/nemo/core/classes/mixins/adapter_mixins.py index e6006e1c6c5a..3d789be7dc61 100644 --- a/nemo/core/classes/mixins/adapter_mixins.py +++ b/nemo/core/classes/mixins/adapter_mixins.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +import inspect from abc import ABC from dataclasses import dataclass, is_dataclass from typing import List, Optional, Set, Tuple, Union @@ -159,6 +160,9 @@ def add_adapter(self, name: str, cfg: DictConfig): cfg: A DictConfig or Dataclass that contains at the bare minimum `__target__` to instantiate a new Adapter module. """ + if not isinstance(cfg, DictConfig): + cfg = DictConfig(cfg) + adapter_types = self.get_accepted_adapter_types() _pass_types = False if len(adapter_types) > 0: @@ -330,7 +334,7 @@ def get_adapter_module(self, name: str): return self.adapter_layer[name] if name in self.adapter_layer else None return None - def set_accepted_adapter_types(self, adapter_types: List[str]) -> None: + def set_accepted_adapter_types(self, adapter_types: List[Union[type, str]]) -> None: """ The module with this mixin can define a list of adapter names that it will accept. This method should be called in the modules init method and set the adapter names the module will expect to be added. @@ -340,7 +344,17 @@ def set_accepted_adapter_types(self, adapter_types: List[str]) -> None: ensure that the class path is correct. """ # Let user update and set accepted adapter types. - self._accepted_adapter_types = set([model_utils.import_class_by_path(s) for s in adapter_types]) + types = [] + for s in adapter_types: + if inspect.isclass(s): + if not issubclass(s, nn.Module): + raise ValueError(f"Attempted to add class ({s}) but is not a subclass of torch.nn.Module") + + types.append(s) + else: + types.append(model_utils.import_class_by_path(s)) + + self._accepted_adapter_types = set(types) def get_accepted_adapter_types(self,) -> Set[type]: """ diff --git a/tests/collections/common/mixins/test_adapter_common_model_mixin.py b/tests/collections/common/mixins/test_adapter_common_model_mixin.py new file mode 100644 index 000000000000..22cd3fdb31bd --- /dev/null +++ b/tests/collections/common/mixins/test_adapter_common_model_mixin.py @@ -0,0 +1,172 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import os +import shutil +import tempfile +from typing import Tuple + +import pytest +import torch +from hydra.utils import instantiate +from omegaconf import DictConfig, OmegaConf + +from nemo.collections.common.parts.adapter_modules import LinearAdapter +from nemo.core import ModelPT, NeuralModule +from nemo.core.classes.mixins import adapter_mixin_strategies, adapter_mixins +from nemo.core.classes.mixins.adapter_mixins import AdapterModelPTMixin, AdapterModuleMixin +from nemo.utils import logging, logging_mode + + +class MockLinearAdapter1(LinearAdapter): + pass + + +class MockLinearAdapter2(LinearAdapter): + pass + + +class CommonModule(NeuralModule): + """ Define a default neural module (without adapter support)""" + + def __init__(self): + super().__init__() + + self.fc = torch.nn.Linear(50, 50) + self.bn = torch.nn.BatchNorm1d(50) + + def forward(self, x): + x = self.fc(x) + x = self.bn(x) + out = x + return out + + def num_params(self): + num: int = 0 + for p in self.parameters(): + if p.requires_grad: + num += p.numel() + return num + + +class CommonModuleAdapter(CommonModule, AdapterModuleMixin): + """ Subclass the DefaultModule, adding adapter module support""" + + def forward(self, x): + x = super().forward(x) + + if self.is_adapter_available(): + # For testing purposes, cache the adapter names + self._adapter_names = self.get_enabled_adapters() + # call forward over model adapters, summing them up + x = self.forward_enabled_adapters(x) + + return x + + def get_accepted_adapter_types(self,) -> 'Set[type]': + types = super().get_accepted_adapter_types() + + if len(types) == 0: + self.set_accepted_adapter_types(['nemo.collections.common.parts.adapter_modules.LinearAdapter']) + types = self.get_accepted_adapter_types() + return types + + +def get_adapter_cfg(in_features=50, dim=100, norm_pos='pre'): + cfg = { + '_target_': 'nemo.collections.common.parts.adapter_modules.LinearAdapter', + 'in_features': in_features, + 'dim': dim, + 'norm_position': norm_pos, + } + return cfg + + +def get_classpath(cls): + return f'{cls.__module__}.{cls.__name__}' + + +if adapter_mixins.get_registered_adapter(CommonModule) is None: + adapter_mixins.register_adapter(CommonModule, CommonModuleAdapter) + + +class TestCommonAdapterModuleMixin: + @pytest.mark.unit + def test_get_accepted_adapter_types(self): + + model = CommonModuleAdapter() + original_num_params = model.num_weights + + assert not hasattr(model, '_accepted_adapter_types') + + model.add_adapter(name='adapter_0', cfg=get_adapter_cfg()) + new_num_params = model.num_weights + assert new_num_params > original_num_params + + # Adding adapter will implicitly try to get accepted adapters, initializing the set + assert hasattr(model, '_accepted_adapter_types') + + types = model.get_accepted_adapter_types() + types = list(types) + assert len(types) == 1 + assert types[0].__name__ == 'LinearAdapter' + + @pytest.mark.unit + def test_set_accepted_adapter_types_reset_types(self): + model = CommonModuleAdapter() + original_num_params = model.num_weights + + assert not hasattr(model, '_accepted_adapter_types') + + # Implicitly sets some types + model.get_accepted_adapter_types() + + # Adding adapter will implicitly try to get accepted adapters, initializing the set + assert hasattr(model, '_accepted_adapter_types') + + types = model.get_accepted_adapter_types() + types = list(types) + assert len(types) == 1 + assert types[0].__name__ == 'LinearAdapter' + + # Reset type now + model.set_accepted_adapter_types([]) + + assert hasattr(model, 
'_accepted_adapter_types') + types = model._accepted_adapter_types + assert len(types) == 0 + + # Since types are empty, get_types will set the default types + model.add_adapter(name='adapter_0', cfg=get_adapter_cfg()) + new_num_params = model.num_weights + assert new_num_params > original_num_params + + @pytest.mark.unit + def test_set_accepted_adapter_types_invalid_class(self): + model = CommonModuleAdapter() + original_num_params = model.num_weights + + assert not hasattr(model, '_accepted_adapter_types') + + # Explicitly set the accepted types to be the subclasses + model.set_accepted_adapter_types( + [ + get_classpath(MockLinearAdapter1), # Pass string class path + MockLinearAdapter2, # Pass actual class itself + ] + ) + + # Should throw error because the base class is now no longer in accepted list + # and the get_types method does not fill in the default + with pytest.raises(ValueError): + model.add_adapter(name='adapter_0', cfg=get_adapter_cfg()) diff --git a/tutorials/02_NeMo_Adapters.ipynb b/tutorials/02_NeMo_Adapters.ipynb index 75942c6bf4af..2f921c9859a7 100644 --- a/tutorials/02_NeMo_Adapters.ipynb +++ b/tutorials/02_NeMo_Adapters.ipynb @@ -420,13 +420,321 @@ }, { "cell_type": "markdown", + "source": [ + "-----\n", + "You will often not directly with this module, instead of passing the Config Dataclass to the `AdapterModuleMixin` methods. We see an example below - " + ], + "metadata": { + "id": "0E2877IlIVoM" + } + }, + { + "cell_type": "markdown", + "source": [ + "## [Optional] Constructing Adapter Components\n", + "\n", + "Linear Adapter Modules are not the only type of adapters you can create ! In PyTorch, any torch.nn.Module can be made into an Adapter component.\n", + "\n", + "For example, you can potentially convert a pre-existing pytorch module into an adapter component. The below section is **optional**, but is recommended if you wish to create your own adapters.\n" + ], + "metadata": { + "id": "8eH6mW792lkY" + } + }, + { + "cell_type": "markdown", + "source": [ + "------\n", + "First, let us start with a simple PyTorch module." + ], + "metadata": { + "id": "lgsyaQHI3w5X" + } + }, + { + "cell_type": "code", + "source": [ + "class SimpleModule(torch.nn.Module):\n", + " def __init__(self, size: int):\n", + " super().__init__()\n", + " self.size = size\n", + " self.model = torch.nn.Sequential(\n", + " torch.nn.Linear(size, size, bias=False),\n", + " torch.nn.Identity(),\n", + " )\n", + " \n", + " def forward(self, x):\n", + " return self.model(x)" + ], + "metadata": { + "id": "wAfA3r0b3fpi" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Adapter Strategy\n", + "\n", + "Adapter modules are, at the end of the day, simply PyTorch modules. Just as PyTorch modules, they take some input tensors, perform some operation and then return some result.\n", + "\n", + "There are many ways to integrate adapters - add them as a residual, multiply pointwise, concatenate with the input (at the end or the beginning). The Adapter Strategy class determines how an adapter integrates with its input." 
+ ], + "metadata": { + "id": "hMKoxc0e5c14" + } + }, + { + "cell_type": "code", + "source": [ + "# The earlier LinearAdapter has a simple ResidualAddStrategy\n", + "# Uncomment below to see the ResidualAddAdapterStrategy definition\n", + "# help(linear_adapter.adapter_strategy)" + ], "metadata": { - "id": "aptqy8l4qilp" + "id": "1DVeRqH65IN8" }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", "source": [ + "### Creating a custom Adapter Strategy\n", + "\n", + "Residual Add strategy can be considered the simple operation $f(x) = x + adapter(x)$ such that $adapter$'s initial outputs without training should be 0. \n", + "\n", + "In doing so, the output of the adapter augmented model is originally just $f(x) = x$, and so the model retains the exact performance of the original model (without any adapters).\n", + "\n", "-----\n", - "You will often not directly with this module, instead of passing the Config Dataclass to the `AdapterModuleMixin` methods. We see an example below - " - ] + "\n", + "Below, we will create a Multiplication adapter strategy simply as a demonstration." + ], + "metadata": { + "id": "5mWowiS269h8" + } + }, + { + "cell_type": "code", + "source": [ + "from nemo.core.classes.mixins import adapter_mixin_strategies" + ], + "metadata": { + "id": "teiTVBMq687x" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "We will implement a special `forward` method of adapters as follows" + ], + "metadata": { + "id": "BS2W1a919KDr" + } + }, + { + "cell_type": "code", + "source": [ + "# Uncomment to see the definition of the AbstractAdapterStrategy\n", + "# help(adapter_mixin_strategies.AbstractAdapterStrategy)" + ], + "metadata": { + "id": "G9_bq05P9Izh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "class MultiplicationAdapterStrategy(adapter_mixin_strategies.AbstractAdapterStrategy):\n", + "\n", + " def __init__(self, scaling_factor: float = 1.0):\n", + " super().__init__()\n", + " self.scale = scaling_factor\n", + "\n", + " def forward(self, input: torch.Tensor, adapter: torch.nn.Module, *, module: 'AdapterModuleMixin'):\n", + " # This is the forward method that takes in the previous input (here, its a tensor, but it can be a dictionary, a tuple, a class, anything really).\n", + " # The second argument is the adapter that is currently being applied to this input\n", + " # The final argument is the entire nn.Module that supports adapters.\n", + " # In this case, the final argument would be the entire `MLP` module\n", + " \n", + " # Equivalent to f(x) = x * adapter(x)\n", + " adapter_out = adapter(input) # compute the adapter output from the input(s)\n", + " result = input * adapter_out\n", + "\n", + " # Apply scaling factor. Equivalent to f(x) = scale * (x * adapter(x))\n", + " result = self.scale * result\n", + " return result\n" + ], + "metadata": { + "id": "T3agPtjA3fst" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Design a corresponding dataclass for the Adapter Strategy\n", + "\n", + "In order to make usage of this class easier, you should create a Dataclass that can be used to create the strategy easily. 
We show an example below:" + ], + "metadata": { + "id": "ZOS9Z3KVAGH2" + } + }, + { + "cell_type": "code", + "source": [ + "from dataclasses import dataclass\n", + "\n", + "@dataclass\n", + "class MultiplicationAdapterStrategyConfig:\n", + " scaling_factor: float = 1.0\n", + "\n", + " # mandatory field\n", + " _target_: str = \"{0}.{1}\".format(\n", + " MultiplicationAdapterStrategy.__module__, MultiplicationAdapterStrategy.__name__\n", + " ) " + ], + "metadata": { + "id": "MI4oYRYDAeqb" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Creating a Custom Adapter Component\n", + "\n", + "Now that we have both the basic PyTorch module (`SimpleModule`) as well as the Adapter Strategy (`MultiplicationAdapterStrategy`), we can now construct a new adapter component.\n", + "\n", + "The prime difference between a basic PyTorch module and an adapter componet is the `adapter_strategy` - it defines how an adapter integrates with the original input. " + ], + "metadata": { + "id": "mZ22m_ZK_ifY" + } + }, + { + "cell_type": "code", + "source": [ + "class SimpleModuleAdapter(SimpleModule, adapter_modules.AdapterModuleUtil):\n", + "\n", + " def __init__(self, size: int, adapter_strategy: MultiplicationAdapterStrategy = None):\n", + " \"\"\"\n", + " The input arguments should match the original module so you can pass the inputs to the module.\n", + " It should also accept an adapter strategy.\n", + "\n", + " We will then use the method `setup_adapter_strategy()` to prepare the component to be used as an adapter.\n", + " Note: Passing None to the strategy will let it pick a default strategy provided by the method\n", + " `get_default_strategy_config()`.\n", + " \"\"\"\n", + " super().__init__(size=size)\n", + "\n", + " # Prepare the adapter strategy\n", + " self.setup_adapter_strategy(adapter_strategy)\n", + "\n", + " # Initialize the weights to be 0 at init\n", + " self.reset_parameters()\n", + "\n", + " # Note: In this case, because we didnt add new modules, nor change how the original forward works\n", + " # We dont need to subclass and override forward() !\n", + " \n", + " def reset_parameters(self):\n", + " # We normally want an adapter at initialization to have no effect on the output\n", + " # Therefore we replace the random uniform with a simple identity matrix, which will cause\n", + " # the output of the adapter to match the input\n", + " with torch.no_grad():\n", + " self.model[0].weight = torch.nn.Parameter(torch.eye(self.size))\n", + " \n", + "\n", + " def get_default_strategy_config(self) -> 'dataclass':\n", + " \"\"\"\n", + " Make the default adapter strategy of this component be the `MultiplicationAdapterStrategy()` \n", + " \"\"\"\n", + " return MultiplicationAdapterStrategyConfig()" + ], + "metadata": { + "id": "wCcdXIix__Yq" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "-----\n", + "Let's quickly test whether the adapter behaves as expected" + ], + "metadata": { + "id": "lYfjhWBtEBYj" + } + }, + { + "cell_type": "code", + "source": [ + "simple_adapter = SimpleModuleAdapter(size=5)\n", + "multiplication_strategy = simple_adapter.adapter_strategy\n", + "x = torch.randn(1, 5)\n", + "adapter_x = simple_adapter(x)\n", + "output = multiplication_strategy(input=x, adapter=simple_adapter, module=None) # Normally you would pass the module here, but in this example can be skipped.\n", + "print(\"Original input :\", x)\n", + "print(\"Adapter output :\", adapter_x)\n", + "print(\"Strategy 
output:\", output)" + ], + "metadata": { + "id": "tVkikcmCEJun" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "We see that the original input passes through the adapter, which results in the original values being returned successfully, and then the adapter strategy multiplies the two values (effectively computing the square of the input).\n", + "\n", + "This is a sufficient demonstration of creating custom adapters, and we would normally not perform elementwise multiplication as an adapter strategy. Normally we would prefer the output of the strategy to be equal to the original init, at least at initialization." + ], + "metadata": { + "id": "mi7WpFgxHmGQ" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Design a corresponding dataclass for the Adapter Component\n", + "\n", + "In order to make usage of this Adapter component easier, you should create a Dataclass that can be used to create the component easily. We show an example below:" + ], + "metadata": { + "id": "OrmCQCX6ImKs" + } + }, + { + "cell_type": "code", + "source": [ + "from typing import Optional\n", + "\n", + "@dataclass\n", + "class SimpleModuleAdapterConfig:\n", + " size: int\n", + " adapter_strategy: Optional[MultiplicationAdapterStrategyConfig] = None\n", + "\n", + " # mandatory field\n", + " _target_: str = \"{0}.{1}\".format(\n", + " SimpleModuleAdapter.__module__, SimpleModuleAdapter.__name__\n", + " ) " + ], + "metadata": { + "id": "Ml17OoOVJOwR" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1657,15 +1965,14 @@ "id": "iz2wF3cd-6MF" }, "outputs": [], - "source": [ - "" - ] + "source": [] } ], "metadata": { "colab": { - "collapsed_sections": [], - "name": "NeMo_Adapter_Primer.ipynb", + "collapsed_sections": [ + "8eH6mW792lkY" + ], "provenance": [] }, "kernelspec": { diff --git a/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb b/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb index 468c602a8765..e57e71584aff 100644 --- a/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb +++ b/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb @@ -3,11 +3,9 @@ "nbformat_minor": 0, "metadata": { "colab": { - "name": "ASR_with_Adapters.ipynb", "provenance": [], "collapsed_sections": [ - "UKTiAPV_sdFI", - "x-0fzrfPshJj", + "dm-qqTdZDUlZ", "GGKgsW5gvAuf", "0CqpJGR6ecYW" ], @@ -387,7 +385,11 @@ "\n", "Next, we attach a `Trainer` to the model to be appropriately initialized.\n", "\n", - "Note that we select just **300 update steps**, which is approximately just ten epochs over this dataset at batch sizes of 32. You can experiment with different steps to see the effect of overfitting or underfitting." + "Note that we select just **300 update steps**, which is approximately just ten epochs over this dataset at batch sizes of 32. You can experiment with different steps to see the effect of overfitting or underfitting.\n", + "\n", + "**Recommendation**:\n", + "\n", + "You should normally start with 1-5 epochs of adaptation over your entire new domain, and then increase or decrease your number of training steps to trade off a balance in accuracy on general speech." ], "metadata": { "id": "x0C2r7388cRd" @@ -619,7 +621,11 @@ "\n", "Now that we have a baseline result, let us set up the data loaders of this model to prepare for training on this dataset.\n", "\n", - "You may note: this step is nearly identical to from scratch training / fine-tuning and skips the tokenizer construction/change vocabulary steps." 
+ "You may note: this step is nearly identical to from scratch training / fine-tuning and skips the tokenizer construction/change vocabulary steps.\n", + "\n", + "**Note**: Each model may have special parameters in their data loader. Please refer to the configs of the pre-trained models to determine what additional changes are necessary). Below recommendations are primarily for Conformer CTC and may differ from model to model.\n", + "\n", + "You can parse the model config via - `print(OmegaConf.to_yaml(model.cfg))`" ], "metadata": { "id": "b-L3prIzs3CW" @@ -657,6 +663,37 @@ "execution_count": null, "outputs": [] }, + { + "cell_type": "markdown", + "source": [ + "## Setup Spectrogram Augmentation\n", + "\n", + "For this experiment we will continue to use the original spec augmentation config in the base model, however you may find better results by modifying the strength of this augmentation.\n", + "\n", + "**Note**: The script inside ASR examples **disables spec augment entirely**. This is done in order to provide a stable default to measure the best possible adaptation case, but may severely degrade the performance on general speech. Please be careful when copying the hyper parameters from the tutorial to the script for large scale experimentatin." + ], + "metadata": { + "id": "T3VuqcGTNuIJ" + } + }, + { + "cell_type": "code", + "source": [ + "with open_dict(model.cfg):\n", + " # Spec Augment\n", + " model.cfg.spec_augment.freq_masks = model.cfg.spec_augment.freq_masks # Can be changed\n", + " model.cfg.spec_augment.freq_width = model.cfg.spec_augment.freq_width # Can be changed\n", + " model.cfg.spec_augment.time_masks = model.cfg.spec_augment.time_masks # Can be changed\n", + " model.cfg.spec_augment.time_width = model.cfg.spec_augment.time_width # Can be changed\n", + "\n", + "model.spec_augmentation = model.from_config_dict(model.cfg.spec_augment)" + ], + "metadata": { + "id": "T-XFuaA3OlOB" + }, + "execution_count": null, + "outputs": [] + }, { "cell_type": "markdown", "source": [ @@ -666,12 +703,26 @@ "\n", "For this reason, we have chosen hyperparameter settings that are significantly different from other tutorials - a small learning rate multiplier of 0.1 (for NoamScheduler) and a small warmup phase of just 100 steps (remember, trainer.max_steps is just 300!).\n", "\n", - "Feel free to modify these values to see the effect on adapters' convergence." + "Feel free to modify these values to see the effect on adapters' convergence.\n", + "\n", + "**Note**: The hyper parameters below correspond to the base model and may not match those applied in the ASR examples! Please note that the script the examples defaults to an **AdamW** optimizer with a **CosineAnnealing** scheduler, where as the config of Conformers is geneally a **AdamW** optimizer with a **NoamAnnealing** scheduler. The *learning rate*, *weight decay* and other hyper parameters may not be exactly the same between the tutorial and the example scripts, so please be careful when transferring the hyper parameters for large scale experiments." 
], "metadata": { "id": "xGpdUWl_tGuA" } }, + { + "cell_type": "code", + "source": [ + "if 'optim' in model.cfg:\n", + " print(OmegaConf.to_yaml(model.cfg.optim))" + ], + "metadata": { + "id": "UDEIfMTcP6j6" + }, + "execution_count": null, + "outputs": [] + }, { "cell_type": "code", "source": [ @@ -688,6 +739,82 @@ "execution_count": null, "outputs": [] }, + { + "cell_type": "markdown", + "source": [ + "# Adapters: Supported Components\n", + "\n", + "A NeMo model may have multiple types of adapters that are supported in each of their components. Let us see at a glance what are some of the adapter types supported by the Conformer ASR model.\n", + "\n", + "**Note**: Every domain may support their own types of adapters, and use them in different ways. Please refer to the documentation of each domain for information on the adapter support." + ], + "metadata": { + "id": "AGrThAt9Qh0D" + } + }, + { + "cell_type": "markdown", + "source": [ + "-----\n", + "Let's start with the modules in which the model will support adapters. We can select these adapters with a special syntax to construct \"Module adapters\".\n", + "\n", + "**Note**: `''` refers to the \"default\" adapter - usually the `encoder` but it is model dependent. It may also be that no specific modules are provided, in which case only `default` adapters will be available." + ], + "metadata": { + "id": "Wq1JLbNvROcL" + } + }, + { + "cell_type": "code", + "source": [ + "if hasattr(model, 'adapter_module_names'):\n", + " print(model.adapter_module_names)" + ], + "metadata": { + "id": "fRIDhU8RVBwi" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "-----\n", + "Next, we can try to obtain the accepted types of each of the child modules in the Model." + ], + "metadata": { + "id": "u5BOWWBjfQwN" + } + }, + { + "cell_type": "code", + "source": [ + "for module in model.children():\n", + " if hasattr(module, 'get_accepted_adapter_types'):\n", + " types = module.get_accepted_adapter_types()\n", + " print(\"Module : \", module.__class__.__name__)\n", + "\n", + " for tp in types:\n", + " print(tp)\n", + " print()" + ], + "metadata": { + "id": "iNnSp_azQ2u8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "-----\n", + "\n", + "As you can see, a single component of the model may support one or more adapter types (or none at all)! Below, we will experiment with the simple Linear Adapters, but as an excercise, you might try to use other adapter types present here." 
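Once the accepted adapter types have been inspected, a module-scoped ("module adapter") Linear Adapter could be attached as sketched below. This continues from the `model` object loaded earlier in the tutorial; the `encoder:` name prefix and the exact `LinearAdapterConfig` fields are written from memory and should be treated as assumptions to double-check against the NeMo adapter documentation.

.. code-block:: python

    from nemo.collections.common.parts import adapter_modules

    # Assumed fields of the built-in Linear Adapter config (verify against the docs).
    adapter_cfg = adapter_modules.LinearAdapterConfig(
        in_features=model.cfg.encoder.d_model,  # input size of the module being adapted
        dim=32,                                 # bottleneck dimension
        activation='swish',
        norm_position='pre',
    )

    # Scope the adapter to one of the modules listed by `model.adapter_module_names`;
    # an unscoped name would be attached to the default module instead.
    model.add_adapter(name='encoder:an4_adapter', cfg=adapter_cfg)
    model.set_enabled_adapters(name='encoder:an4_adapter', enabled=True)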
+ ], + "metadata": { + "id": "YXTC4LiSnB2O" + } + }, { "cell_type": "markdown", "source": [ @@ -724,7 +851,7 @@ "adapter_name = \"AN4\" #@param {type:\"string\"}\n", "adapter_dim = 32 #@param {type:\"integer\"}\n", "adapter_activation = \"swish\" #@param {type:\"string\"}\n", - "adapter_norm_position = \"post\" #@param [\"pre\", \"post\"]" + "adapter_norm_position = \"pre\" #@param [\"pre\", \"post\"]" ], "metadata": { "id": "dlj0Yud4MxOi" @@ -1186,4 +1313,4 @@ } } ] -} +} \ No newline at end of file From 00d8ec835aac425fc7730bf9d1abf9cdcf540d66 Mon Sep 17 00:00:00 2001 From: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Date: Wed, 14 Dec 2022 13:25:00 -0800 Subject: [PATCH 043/154] [TTS] add type hints and change varialbe names for tokenizers and g2p (#5602) * [TTS] add type hints and change variable names for tokenizers and g2p Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../text_to_speech/tts_tokenizers.py | 4 +- nemo_text_processing/g2p/data/data_utils.py | 93 ++++++++++++++----- nemo_text_processing/g2p/modules.py | 15 +-- .../text_normalization/normalize.py | 2 +- .../g2p/data/test_data_utils.py | 18 ++-- 5 files changed, 88 insertions(+), 44 deletions(-) diff --git a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py index 9bd473b3c7c2..08c7bb09cac3 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py +++ b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py @@ -20,10 +20,10 @@ from typing import List, Optional from nemo_text_processing.g2p.data.data_utils import ( + any_locale_text_preprocessing, chinese_text_preprocessing, english_text_preprocessing, german_text_preprocessing, - ipa_text_preprocessing, spanish_text_preprocessing, ) @@ -583,7 +583,7 @@ def __init__( if locale == "en-US": self.text_preprocessing_func = lambda text: english_text_preprocessing(text, lower=False) else: - self.text_preprocessing_func = ipa_text_preprocessing + self.text_preprocessing_func = any_locale_text_preprocessing def encode(self, text): """See base class for more information.""" diff --git a/nemo_text_processing/g2p/data/data_utils.py b/nemo_text_processing/g2p/data/data_utils.py index b90ee394ae57..15522a796bf5 100644 --- a/nemo_text_processing/g2p/data/data_utils.py +++ b/nemo_text_processing/g2p/data/data_utils.py @@ -18,9 +18,18 @@ import string import unicodedata from builtins import str as unicode -from typing import List - -__all__ = ['read_wordids'] +from typing import List, Tuple + +__all__ = [ + "read_wordids", + "chinese_text_preprocessing", + "english_text_preprocessing", + "german_text_preprocessing", + "any_locale_text_preprocessing", + "spanish_text_preprocessing", + "any_locale_word_tokenize", + "english_word_tokenize", +] # + # Derived from LJSpeech @@ -30,11 +39,11 @@ } SYNOGLYPH2ASCII = {g: asc for asc, glyphs in _synoglyphs.items() for g in glyphs} -# Example of parsing by groups via _WORDS_RE. -# Groups: -# 1st group -- valid english words, -# 2nd group -- any substring starts from | to | (mustn't be nested), useful when you want to leave sequence unchanged, -# 3rd group -- punctuation marks. +# Example of parsing by groups via _WORDS_RE_EN. 
+# Regular expression pattern groups: +# 1st group -- valid english words, +# 2nd group -- any substring starts from | to | (mustn't be nested), useful when you want to leave sequence unchanged, +# 3rd group -- punctuation marks or whitespaces. # Text (first line) and mask of groups for every char (second line). # config file must contain |EY1 EY1|, B, C, D, E, F, and G. @@ -42,10 +51,10 @@ LATIN_ALPHABET_BASIC = "A-Za-z" ACCENTED_CHARS = "À-ÖØ-öø-ÿ" LATIN_CHARS_ALL = f"{LATIN_ALPHABET_BASIC}{ACCENTED_CHARS}" -_WORDS_RE = re.compile( +_WORDS_RE_EN = re.compile( fr"([{LATIN_ALPHABET_BASIC}]+(?:[{LATIN_ALPHABET_BASIC}\-']*[{LATIN_ALPHABET_BASIC}]+)*)|(\|[^|]*\|)|([^{LATIN_ALPHABET_BASIC}|]+)" ) -_WORDS_RE_IPA = re.compile( +_WORDS_RE_ANY_LOCALE = re.compile( fr"([{LATIN_CHARS_ALL}]+(?:[{LATIN_CHARS_ALL}\-']*[{LATIN_CHARS_ALL}]+)*)|(\|[^|]*\|)|([^{LATIN_CHARS_ALL}|]+)" ) @@ -121,41 +130,75 @@ def english_text_preprocessing(text, lower=True): return text -def _word_tokenize(words): +def _word_tokenize(words: List[Tuple[str, str, str]]) -> List[Tuple[List[str], bool]]: """ - Convert text (str) to List[Tuple[Union[str, List[str]], bool]] where every tuple denotes word representation and - flag whether to leave unchanged or not. - Word can be one of: valid english word, any substring starts from | to | (unchangeable word) or punctuation marks. - This function expects that unchangeable word is carefully divided by spaces (e.g. HH AH L OW). - Unchangeable word will be splitted by space and represented as List[str], other cases are represented as str. + Process a list of words and attach indicators showing if each word is unchangeable or not. Each word representation + can be one of valid word, any substring starting from | to | (unchangeable word), or punctuation marks including + whitespaces. This function will split unchanged strings by whitespaces and return them as `List[str]`. For example, + + .. code-block:: python + [ + ('Hello', '', ''), # valid word + ('', '', ' '), # punctuation mark + ('World', '', ''), # valid word + ('', '', ' '), # punctuation mark + ('', '|NVIDIA unchanged|', ''), # unchangeable word + ('', '', '!') # punctuation mark + ] + + will be converted into, + + .. code-block:: python + [ + (["hello"], False), + ([" "], False), + (["world"], False), + ([" "], False), + (["nvidia", "unchanged"], True), + (["!"], False) + ] + + Args: + words (List[str]): a list of tuples like `(maybe_word, maybe_without_changes, maybe_punct)` where each element + corresponds to a non-overlapping match of either `_WORDS_RE_EN` or `_WORDS_RE_ANY_LOCALE`. + + Returns: List[Tuple[List[str], bool]], a list of tuples like `(a list of words, is_unchanged)`. + """ result = [] for word in words: maybe_word, maybe_without_changes, maybe_punct = word + without_changes = False if maybe_word != '': - without_changes = False - result.append((maybe_word.lower(), without_changes)) + token = [maybe_word.lower()] elif maybe_punct != '': - without_changes = False - result.append((maybe_punct, without_changes)) + token = [maybe_punct] elif maybe_without_changes != '': without_changes = True - result.append((maybe_without_changes[1:-1].split(" "), without_changes)) + token = maybe_without_changes[1:-1].split(" ") + else: + raise ValueError( + f"This is not expected. Found empty string: <{word}>. " + f"Please validate your regular expression pattern '_WORDS_RE_EN' or '_WORDS_RE_ANY_LOCALE'." 
+ ) + + result.append((token, without_changes)) + return result def english_word_tokenize(text): - words = _WORDS_RE.findall(text) + words = _WORDS_RE_EN.findall(text) return _word_tokenize(words) -def ipa_word_tokenize(text): - words = _WORDS_RE_IPA.findall(text) +def any_locale_word_tokenize(text): + words = _WORDS_RE_ANY_LOCALE.findall(text) return _word_tokenize(words) -def ipa_text_preprocessing(text): +def any_locale_text_preprocessing(text): return text.lower() diff --git a/nemo_text_processing/g2p/modules.py b/nemo_text_processing/g2p/modules.py index b9e6c5b51514..7dac0741d289 100644 --- a/nemo_text_processing/g2p/modules.py +++ b/nemo_text_processing/g2p/modules.py @@ -23,7 +23,7 @@ import nltk import torch -from nemo_text_processing.g2p.data.data_utils import english_word_tokenize, ipa_word_tokenize +from nemo_text_processing.g2p.data.data_utils import any_locale_word_tokenize, english_word_tokenize from nemo.collections.common.tokenizers.text_to_speech.ipa_lexicon import validate_locale from nemo.utils import logging @@ -235,9 +235,9 @@ def __call__(self, text): prons.extend(word) continue - word_by_hyphen = word.split("-") - - pron, is_handled = self.parse_one_word(word) + word_str = word[0] + word_by_hyphen = word_str.split("-") + pron, is_handled = self.parse_one_word(word_str) if not is_handled and len(word_by_hyphen) > 1: pron = [] @@ -354,7 +354,7 @@ def __init__( if locale == "en-US": word_tokenize_func = english_word_tokenize else: - word_tokenize_func = ipa_word_tokenize + word_tokenize_func = any_locale_word_tokenize super().__init__( phoneme_dict=self.phoneme_dict, @@ -497,9 +497,10 @@ def __call__(self, text: str) -> List[str]: prons.extend(word) continue - pron, is_handled = self.parse_one_word(word) + word_str = word[0] + word_by_hyphen = word_str.split("-") + pron, is_handled = self.parse_one_word(word_str) - word_by_hyphen = word.split("-") if not is_handled and len(word_by_hyphen) > 1: pron = [] for sub_word in word_by_hyphen: diff --git a/nemo_text_processing/text_normalization/normalize.py b/nemo_text_processing/text_normalization/normalize.py index 2e9105b15bf0..1511a9fbfea9 100644 --- a/nemo_text_processing/text_normalization/normalize.py +++ b/nemo_text_processing/text_normalization/normalize.py @@ -55,7 +55,7 @@ class Normalizer: cache_dir: path to a dir with .far grammar file. Set to None to avoid using cache. overwrite_cache: set to True to overwrite .far files whitelist: path to a file with whitelist replacements - post_process: WFST-based post processing, e.g. to remove extra spaces added during TN. + post_process: WFST-based post-processing, e.g. to remove extra spaces added during TN. Note: punct_post_process flag in normalize() supports all languages. """ diff --git a/tests/nemo_text_processing/g2p/data/test_data_utils.py b/tests/nemo_text_processing/g2p/data/test_data_utils.py index 28edca13fa58..d7c829cf361f 100644 --- a/tests/nemo_text_processing/g2p/data/test_data_utils.py +++ b/tests/nemo_text_processing/g2p/data/test_data_utils.py @@ -13,13 +13,13 @@ # limitations under the License. 
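For reference, a brief usage sketch of the renamed tokenizers follows; the printed results mirror the expected values asserted in the updated unit tests below. It assumes `nemo_text_processing` is importable in the current environment.

.. code-block:: python

    from nemo_text_processing.g2p.data.data_utils import any_locale_word_tokenize, english_word_tokenize

    print(english_word_tokenize("Leave |this part UNCHANGED|."))
    # [(['leave'], False), ([' '], False), (['this', 'part', 'UNCHANGED'], True), (['.'], False)]

    print(any_locale_word_tokenize("The naïve piñata at the café..."))
    # [(['the'], False), ([' '], False), (['naïve'], False), ([' '], False), (['piñata'], False),
    #  ([' '], False), (['at'], False), ([' '], False), (['the'], False), ([' '], False),
    #  (['café'], False), (['...'], False)]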
import pytest -from nemo_text_processing.g2p.data.data_utils import english_word_tokenize, ipa_word_tokenize +from nemo_text_processing.g2p.data.data_utils import any_locale_word_tokenize, english_word_tokenize class TestDataUtils: @staticmethod def _create_expected_output(words): - return [(word, False) for word in words] + return [([word], False) for word in words] @pytest.mark.run_only_on('CPU') @pytest.mark.unit @@ -63,34 +63,34 @@ def test_english_word_tokenize_with_compound_words(self): @pytest.mark.unit def test_english_word_tokenize_with_escaped(self): input_text = "Leave |this part UNCHANGED|." - expected_output = [("leave", False), (" ", False), (["this", "part", "UNCHANGED"], True), (".", False)] + expected_output = [(["leave"], False), ([" "], False), (["this", "part", "UNCHANGED"], True), (["."], False)] output = english_word_tokenize(input_text) assert output == expected_output @pytest.mark.run_only_on('CPU') @pytest.mark.unit - def test_ipa_word_tokenize(self): + def test_any_locale_word_tokenize(self): input_text = "apple banana pear" expected_output = self._create_expected_output(["apple", " ", "banana", " ", "pear"]) - output = ipa_word_tokenize(input_text) + output = any_locale_word_tokenize(input_text) assert output == expected_output @pytest.mark.run_only_on('CPU') @pytest.mark.unit - def test_ipa_word_tokenize_with_accents(self): + def test_any_locale_word_tokenize_with_accents(self): input_text = "The naïve piñata at the café..." expected_output = self._create_expected_output( ["the", " ", "naïve", " ", "piñata", " ", "at", " ", "the", " ", "café", "..."] ) - output = ipa_word_tokenize(input_text) + output = any_locale_word_tokenize(input_text) assert output == expected_output @pytest.mark.run_only_on('CPU') @pytest.mark.unit - def test_ipa_word_tokenize_with_non_latin_chars(self): + def test_any_locale_word_tokenize_with_numbers(self): input_text = "Three times× four^teen ÷divided by [movies] on \slash." expected_output = self._create_expected_output( [ @@ -115,5 +115,5 @@ def test_ipa_word_tokenize_with_non_latin_chars(self): ] ) - output = ipa_word_tokenize(input_text) + output = any_locale_word_tokenize(input_text) assert output == expected_output From 676a5ea0a61c9a21605eb6ab2581af97efa8417c Mon Sep 17 00:00:00 2001 From: Micha Livne Date: Thu, 15 Dec 2022 00:27:50 +0200 Subject: [PATCH 044/154] 1. Added missing import for gather_objects. (#5627) Signed-off-by: Micha Livne Signed-off-by: Micha Livne Co-authored-by: Micha Livne Signed-off-by: Elena Rastorgueva --- nemo/utils/distributed.py | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/nemo/utils/distributed.py b/nemo/utils/distributed.py index 6b9866fbe252..709d4180426e 100644 --- a/nemo/utils/distributed.py +++ b/nemo/utils/distributed.py @@ -18,6 +18,13 @@ from nemo.utils import logging +try: + from apex.transformer import parallel_state + + HAVE_APEX = True +except (ImportError, ModuleNotFoundError): + HAVE_APEX = False + def initialize_distributed(args, backend='nccl'): """Initialize torch.distributed.""" From 78488a7cfe2d326db1df6ac6c7afd3a253296305 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 14 Dec 2022 17:31:58 -0800 Subject: [PATCH 045/154] [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. 
(#5596) (#5625) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- docs/source/tts/data/datasets.csv | 3 +- docs/source/tts/data/ngc_models_aligner.csv | 2 +- docs/source/tts/data/ngc_models_am.csv | 3 +- docs/source/tts/data/ngc_models_vocoder.csv | 3 +- docs/source/tts/datasets.rst | 33 +++++++++++++++++---- nemo/collections/tts/models/fastpitch.py | 9 ++++++ nemo/collections/tts/models/hifigan.py | 11 +++++++ 7 files changed, 55 insertions(+), 9 deletions(-) diff --git a/docs/source/tts/data/datasets.csv b/docs/source/tts/data/datasets.csv index 417594392eea..4c8d553ec3c1 100644 --- a/docs/source/tts/data/datasets.csv +++ b/docs/source/tts/data/datasets.csv @@ -9,4 +9,5 @@ Spanish,es-CL,Crowdsourced high-quality Chilean Spanish,31,13,18,7.15,2.84,4.31, Spanish,es-CO,Crowdsourced high-quality Colombian Spanish,33,16,17,7.58,3.74,3.84,"48,000Hz",https://www.openslr.org/72/ Spanish,es-PE,Crowdsourced high-quality Peruvian Spanish,38,18,20,9.22,4.35,4.87,"48,000Hz",https://www.openslr.org/73/ Spanish,es-PR,Crowdsourced high-quality Puerto Rico Spanish,5,5,0,1.00,1.00,0.00,"48,000Hz",https://www.openslr.org/74/ -Spanish,es-VE,Crowdsourced high-quality Venezuelan Spanish,23,11,12,4.81,2.41,2.40,"48,000Hz",https://www.openslr.org/75/ \ No newline at end of file +Spanish,es-VE,Crowdsourced high-quality Venezuelan Spanish,23,11,12,4.81,2.41,2.40,"48,000Hz",https://www.openslr.org/75/ +Chinese,zh-CN,SFSpeech Chinese/English Bilingual Speech,1,1,0,4.50,4.50,0.00,"22,050Hz",https://catalog.ngc.nvidia.com/orgs/nvidia/resources/sf_bilingual_speech_zh_en diff --git a/docs/source/tts/data/ngc_models_aligner.csv b/docs/source/tts/data/ngc_models_aligner.csv index 6860af0e55cf..039ddf9a272a 100644 --- a/docs/source/tts/data/ngc_models_aligner.csv +++ b/docs/source/tts/data/ngc_models_aligner.csv @@ -1,3 +1,3 @@ Locale,Model Name,Dataset,Sampling Rate,#Spk,Phoneme Unit,Model Class,Overview,Checkpoint en-US,tts_en_radtts_aligner,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.aligner.AlignerModel,`tts_en_radtts_aligner `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_radtts_aligner/versions/ARPABET_1.11.0/files/Aligner.nemo`` -en-US,tts_en_radtts_aligner_ipa,LJSpeech,22050Hz,1,IPA,nemo.collections.tts.models.aligner.AlignerModel,`tts_en_radtts_aligner `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_radtts_aligner/versions/IPA_1.13.0/files/Aligner.nemo`` \ No newline at end of file +en-US,tts_en_radtts_aligner_ipa,LJSpeech,22050Hz,1,IPA,nemo.collections.tts.models.aligner.AlignerModel,`tts_en_radtts_aligner `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_radtts_aligner/versions/IPA_1.13.0/files/Aligner.nemo`` diff --git a/docs/source/tts/data/ngc_models_am.csv b/docs/source/tts/data/ngc_models_am.csv index 602e572fac70..c31922ca907f 100644 --- a/docs/source/tts/data/ngc_models_am.csv +++ b/docs/source/tts/data/ngc_models_am.csv @@ -8,4 +8,5 @@ en-US,RAD-TTS,TBD,TBD,TBD,ARPABET,nemo.collections.tts.models.radtts.RadTTSModel en-US,tts_en_tacotron2,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.tacotron2.Tacotron2Model,`tts_en_tacotron2 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.0.0/files/tts_en_tacotron2.nemo`` de-DE,tts_de_fastpitch_multispeaker_5,HUI Audio Corpus German,44100Hz,5,ARPABET,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_de_fastpitch_multispeaker_5 
`_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_fastpitch_multispeaker_5.nemo`` de-DE,tts_de_fastpitch_singlespeaker,Thorsten Müller (German Neutral-TTS dataset),22050Hz,1,ARPABET,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_de_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.10.0/files/tts_de_fastpitch_align.nemo`` -es,tts_es_fastpitch_multispeaker,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,grapheme,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_fastpitch_multispeaker.nemo`` \ No newline at end of file +es,tts_es_fastpitch_multispeaker,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,grapheme,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_fastpitch_multispeaker.nemo`` +zh-CN ,tts_zh_fastpitch_sfspeech,SFSpeech Chinese/English Bilingual Speech,22050Hz,1,pinyin,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_zh_fastpitch_hifigan_sfspeech `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.14.0/files/tts_zh_fastpitch_sfspeech.nemo`` diff --git a/docs/source/tts/data/ngc_models_vocoder.csv b/docs/source/tts/data/ngc_models_vocoder.csv index fdbd7a22dc0e..b6f23b3425f9 100644 --- a/docs/source/tts/data/ngc_models_vocoder.csv +++ b/docs/source/tts/data/ngc_models_vocoder.csv @@ -9,4 +9,5 @@ en-US,tts_waveglow_88m,librosa.filters.mel,LJSpeech,22050Hz,1,nemo.collections.t en-US,tts_waveglow_268m,librosa.filters.mel,LJSpeech,22050Hz,1,nemo.collections.tts.models.waveglow.WaveGlowModel,`tts_waveglow_268m `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_waveglow_268m/versions/1.0.0rc1/files/tts_waveglow_268m.nemo`` de-DE,tts_de_hui_hifigan_ft_fastpitch_multispeaker_5,FastPitch,HUI Audio Corpus German,44100Hz,5,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_de_fastpitch_multispeaker_5 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_hui_hifigan_ft_fastpitch_multispeaker_5.nemo`` de-DE,tts_de_slr_hifigan_ft_fastpitch_singlespeaker,FastPitch,Thorsten Müller (German Neutral-TTS dataset),22050Hz,1,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_de_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.10.0/files/tts_de_hifigan.nemo`` -es,tts_es_hifigan_ft_fastpitch_multispeaker,FastPitch,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo`` \ No newline at end of file +es,tts_es_hifigan_ft_fastpitch_multispeaker,FastPitch,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo`` +zh-CN ,tts_zh_hifigan_sfspeech,FastPitch,SFSpeech Chinese/English 
Bilingual Speech,22050Hz,1,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_zh_fastpitch_hifigan_sfspeech `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.14.0/files/tts_zh_hifigan_sfspeech.nemo`` diff --git a/docs/source/tts/datasets.rst b/docs/source/tts/datasets.rst index c9241a5d40a9..77e349134a9d 100644 --- a/docs/source/tts/datasets.rst +++ b/docs/source/tts/datasets.rst @@ -1,7 +1,7 @@ Data Preprocessing ================== -NeMo TTS recipes support most of public TTS datasets that consist of multiple languages, multiple emotions, and multiple speakers. Current recipes covered English (en-US), German (de-DE), Spanish (es-ES), and Mandarin Chinese (work in progress), while the support for many other languages is under planning. NeMo provides corpus-specific data preprocessing scripts, as shown in the directory of `scripts/data_processing/tts/ `_, to convert common public TTS datasets into the format expected by the dataloaders as defined in `nemo/collections/tts/torch/data.py `_. The ``nemo_tts`` collection expects each dataset to consist of a set of utterances in individual audio files plus a ``JSON`` manifest that describes the dataset, with information about one utterance per line. The audio files can be of any format supported by `Pydub `_, though we recommend ``WAV`` files as they are the default and have been most thoroughly tested. +NeMo TTS recipes support most of public TTS datasets that consist of multiple languages, multiple emotions, and multiple speakers. Current recipes covered English (en-US), German (de-DE), Spanish (es-ES), and Mandarin Chinese (zh-CN), while the support for many other languages is under planning. NeMo provides corpus-specific data preprocessing scripts, as shown in the directory of `scripts/data_processing/tts/ `_, to convert common public TTS datasets into the format expected by the dataloaders as defined in `nemo/collections/tts/torch/data.py `_. The ``nemo_tts`` collection expects each dataset to consist of a set of utterances in individual audio files plus a ``JSON`` manifest that describes the dataset, with information about one utterance per line. The audio files can be of any format supported by `Pydub `_, though we recommend ``WAV`` files as they are the default and have been most thoroughly tested. There should be one ``JSON`` manifest file per dataset that will be passed in, therefore, if the user wants separate training and validation datasets, they should also have separate manifests. Otherwise, they will be loading validation data with their training data and vice versa. Each line of the manifest should be in the following format: @@ -66,7 +66,9 @@ LibriTTS $ python scripts/dataset_processing/tts/libritts/get_data.py \ --data-root \ - --data-sets dev_clean + --manifests-path \ + --val-size 0.01 \ + --test-size 0.01 $ python scripts/dataset_processing/tts/extract_sup_data.py \ --config-path ljspeech/ds_conf \ @@ -88,19 +90,19 @@ The texts of this dataset has been normalized already. So there is no extra need Thorsten Müller (German Neutral-TTS dataset) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Dataset URL: https://www.openslr.org/resources/95/ -* Dataset Processing Script: https://github.com/NVIDIA/NeMo/tree/stable/scripts/dataset_processing/tts/openslr/get_data.py +* Dataset Processing Script: https://github.com/NVIDIA/NeMo/tree/stable/scripts/dataset_processing/tts/openslr_95/get_data.py * Command Line Instruction: .. 
code-block:: bash - $ python scripts/dataset_processing/tts/openslr/get_data.py \ + $ python scripts/dataset_processing/tts/openslr_95/get_data.py \ --data-root \ --val-size 0.1 \ --test-size 0.2 \ --seed-for-ds-split 100 $ python scripts/dataset_processing/tts/extract_sup_data.py \ - --config-path openslr/ds_conf \ + --config-path openslr_95/ds_conf \ --config-name ds_for_fastpitch_align.yaml \ manifest_filepath= \ sup_data_path= @@ -130,4 +132,25 @@ HUI Audio Corpus German --config-path hui_acg/ds_conf \ --config-name ds_for_fastpitch_align.yaml \ manifest_filepath= \ + sup_data_path= + + +SFSpeech Chinese/English Bilingual Speech +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* Dataset URL: https://catalog.ngc.nvidia.com/orgs/nvidia/resources/sf_bilingual_speech_zh_en +* Dataset Processing Script: https://github.com/NVIDIA/NeMo/tree/stable/scripts/dataset_processing/tts/sfbilingual/get_data.py +* Command Line Instruction: + +.. code-block:: bash + + $ python scripts/dataset_processing/tts/sfbilingual/get_data.py \ + --data-root \ + --val-size 0.1 \ + --test-size 0.2 \ + --seed-for-ds-split 100 + + $ python scripts/dataset_processing/tts/extract_sup_data.py \ + --config-path sfbilingual/ds_conf \ + --config-name ds_for_fastpitch_align.yaml \ + manifest_filepath= \ sup_data_path= \ No newline at end of file diff --git a/nemo/collections/tts/models/fastpitch.py b/nemo/collections/tts/models/fastpitch.py index fd3a4412889a..95745140db45 100644 --- a/nemo/collections/tts/models/fastpitch.py +++ b/nemo/collections/tts/models/fastpitch.py @@ -610,6 +610,15 @@ def list_available_models(cls) -> 'List[PretrainedModelInfo]': ) list_of_models.append(model) + # zh, single speaker, 22050Hz, SFSpeech Bilingual Chinese/English dataset + model = PretrainedModelInfo( + pretrained_model_name="tts_zh_fastpitch_sfspeech", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.14.0/files/tts_zh_fastpitch_sfspeech.nemo", + description="This model is trained on a single female speaker in SFSpeech Bilingual Chinese/English dataset sampled at 22050Hz and can be used to generate female Mandarin Chinese voices.", + class_=cls, + ) + list_of_models.append(model) + return list_of_models # Methods for model exportability diff --git a/nemo/collections/tts/models/hifigan.py b/nemo/collections/tts/models/hifigan.py index 533b37160d6a..b98ede98e2bd 100644 --- a/nemo/collections/tts/models/hifigan.py +++ b/nemo/collections/tts/models/hifigan.py @@ -402,6 +402,7 @@ def list_available_models(cls) -> 'Optional[Dict[str, str]]': ) list_of_models.append(model) + # Spanish, multi-speaker, 44100 Hz, Latin American Spanish OpenSLR model = PretrainedModelInfo( pretrained_model_name="tts_es_hifigan_ft_fastpitch_multispeaker", location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo", @@ -413,6 +414,16 @@ def list_available_models(cls) -> 'Optional[Dict[str, str]]': ) list_of_models.append(model) + # zh, single female speaker, 22050 Hz, SFSpeech Chinese/English Bilingual Dataset. 
+ model = PretrainedModelInfo( + pretrained_model_name="tts_zh_hifigan_sfspeech", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.14.0/files/tts_zh_hifigan_sfspeech.nemo", + description="This model is finetuned from the HiFiGAN pretrained checkpoint `tts_en_hifitts_hifigan_ft_fastpitch` " + "by the mel-spectrograms generated from the FastPitch checkpoint `tts_zh_fastpitch_sfspeech`. This model " + "has been tested on generating female Mandarin Chinese voices.", + class_=cls, + ) + list_of_models.append(model) return list_of_models def load_state_dict(self, state_dict, strict=True): From 9b59d16ecde8a42d578658d9afa3011ccd606037 Mon Sep 17 00:00:00 2001 From: Boris Fomitchev Date: Wed, 14 Dec 2022 18:13:11 -0800 Subject: [PATCH 046/154] Fixed RadTTS unit test (#5572) Signed-off-by: Boris Fomitchev Signed-off-by: Boris Fomitchev Signed-off-by: Elena Rastorgueva --- nemo/collections/tts/modules/radtts.py | 2 +- tests/collections/tts/test_tts_exportables.py | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/nemo/collections/tts/modules/radtts.py b/nemo/collections/tts/modules/radtts.py index 9f360a4e5a33..73f73e3d17e5 100644 --- a/nemo/collections/tts/modules/radtts.py +++ b/nemo/collections/tts/modules/radtts.py @@ -771,7 +771,7 @@ def input_example(self, max_batch=1, max_dim=256): par = next(self.parameters()) sz = (max_batch, max_dim) inp = torch.randint(16, 32, sz, device=par.device, dtype=torch.int64) - lens = torch.randint(max_dim // 4, max_dim // 2, (max_batch,), device=par.device, dtype=torch.int) + lens = torch.randint(max_dim // 2, max_dim, (max_batch,), device=par.device, dtype=torch.int) speaker = torch.randint(0, 1, (max_batch,), device=par.device, dtype=torch.int64) inputs = { 'text': inp, diff --git a/tests/collections/tts/test_tts_exportables.py b/tests/collections/tts/test_tts_exportables.py index c7083b45f6d7..bf2c0842eb91 100644 --- a/tests/collections/tts/test_tts_exportables.py +++ b/tests/collections/tts/test_tts_exportables.py @@ -15,6 +15,7 @@ import tempfile import pytest +import torch from omegaconf import OmegaConf from nemo.collections.tts.models import FastPitchModel, HifiGanModel, RadTTSModel @@ -73,11 +74,11 @@ def test_HifiGanModel_export_to_onnx(self, hifigan_model): filename = os.path.join(tmpdir, 'hfg.pt') model.export(output=filename, verbose=True, check_trace=True) - @pytest.mark.pleasefixme @pytest.mark.run_only_on('GPU') @pytest.mark.unit def test_RadTTSModel_export_to_torchscript(self, radtts_model): model = radtts_model.cuda() with tempfile.TemporaryDirectory() as tmpdir: filename = os.path.join(tmpdir, 'rad.ts') - model.export(output=filename, verbose=True, check_trace=True) + with torch.cuda.amp.autocast(enabled=True): + model.export(output=filename, verbose=True, check_trace=True) From 9c50507dce164889fe8a211eed879319e2e4bc15 Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Thu, 15 Dec 2022 00:13:08 -0700 Subject: [PATCH 047/154] remove tests (#5633) Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 325 ++++++++++++++++++++++++++-------------------------- 1 file changed, 165 insertions(+), 160 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 1e416b7df757..4b7ed2ce833f 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -3449,106 +3449,110 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' trainer.num_nodes=1" } } - stage('L2: Megatron GPT Prompt Tuning') { - when { - anyOf { - branch 'main' - 
changeRequest target: 'main' - } - } - failFast true - parallel{ - stage('GPT Prompt Learning TP=1 PP=1') { - steps { - sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ - --config-name=megatron_gpt_prompt_learning_config \ - name='/home/TestData/nlp/prompt_learning/prompt_tuning_test' \ - trainer.devices=1 \ - trainer.max_steps=6 \ - trainer.max_epochs=null \ - model.tensor_model_parallel_size=1 \ - model.virtual_prompt_style='prompt-tuning' \ - model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp1.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['rte'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.global_batch_size=4" - sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test" - sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test.nemo" - } - } - } - } - stage('L2: Megatron GPT Prompt Learning') { - when { - anyOf { - branch 'main' - changeRequest target: 'main' - } - } - failFast true - parallel{ - stage('GPT Prompt Learning TP=2 PP=1') { - steps { - sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ - --config-name=megatron_gpt_prompt_learning_config \ - name='/home/TestData/nlp/prompt_learning/p_tuning_test_tp' \ - trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.max_epochs=null \ - model.tensor_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['rte'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.global_batch_size=4" - sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp" - sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ - virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo' \ - gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ - inference.greedy=True \ - inference.add_BOS=False \ - trainer.devices=2 \ - tensor_model_parallel_size=2 \ - pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt \ - data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']" - sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo" - sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt" - } - } - stage('GPT Prompt Learning TP=1 PP=2') { - steps { - sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ - --config-name=megatron_gpt_prompt_learning_config \ - name='/home/TestData/nlp/prompt_learning/p_tuning_test_pp' \ - trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.max_epochs=null \ - model.pipeline_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['boolq'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ - model.global_batch_size=4" - sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp" - sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ - virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo' \ - 
gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ - inference.greedy=True \ - inference.add_BOS=False \ - trainer.devices=2 \ - pipeline_model_parallel_size=2 \ - pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt \ - data_paths=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl']" - sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo" - sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt" - } - } - } - } + + // TODO: fix these tests! They are taking ~10 minutes to complete + // stage('L2: Megatron GPT Prompt Tuning') { + // when { + // anyOf { + // branch 'main' + // changeRequest target: 'main' + // } + // } + // failFast true + // parallel{ + // stage('GPT Prompt Learning TP=1 PP=1') { + // steps { + // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ + // --config-name=megatron_gpt_prompt_learning_config \ + // name='/home/TestData/nlp/prompt_learning/prompt_tuning_test' \ + // trainer.devices=1 \ + // trainer.max_steps=6 \ + // trainer.max_epochs=null \ + // model.tensor_model_parallel_size=1 \ + // model.virtual_prompt_style='prompt-tuning' \ + // model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp1.nemo' \ + // model.existing_tasks=[] \ + // model.new_tasks=['rte'] \ + // model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + // model.global_batch_size=4" + // sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test" + // sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test.nemo" + // } + // } + // } + // } + + // TODO: fix these tests! They are taking ~10 minutes to complete + // stage('L2: Megatron GPT Prompt Learning') { + // when { + // anyOf { + // branch 'main' + // changeRequest target: 'main' + // } + // } + // failFast true + // parallel{ + // stage('GPT Prompt Learning TP=2 PP=1') { + // steps { + // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ + // --config-name=megatron_gpt_prompt_learning_config \ + // name='/home/TestData/nlp/prompt_learning/p_tuning_test_tp' \ + // trainer.devices=2 \ + // trainer.max_steps=6 \ + // trainer.max_epochs=null \ + // model.tensor_model_parallel_size=2 \ + // model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ + // model.existing_tasks=[] \ + // model.new_tasks=['rte'] \ + // model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + // model.global_batch_size=4" + // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp" + // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ + // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo' \ + // gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ + // inference.greedy=True \ + // inference.add_BOS=False \ + // trainer.devices=2 \ + // tensor_model_parallel_size=2 \ + // pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt \ + // data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']" + // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo" + // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt" + // } + // } + // stage('GPT Prompt 
Learning TP=1 PP=2') { + // steps { + // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ + // --config-name=megatron_gpt_prompt_learning_config \ + // name='/home/TestData/nlp/prompt_learning/p_tuning_test_pp' \ + // trainer.devices=2 \ + // trainer.max_steps=6 \ + // trainer.max_epochs=null \ + // model.pipeline_model_parallel_size=2 \ + // model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ + // model.existing_tasks=[] \ + // model.new_tasks=['boolq'] \ + // model.data.train_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ + // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ + // model.global_batch_size=4" + // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp" + // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ + // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo' \ + // gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ + // inference.greedy=True \ + // inference.add_BOS=False \ + // trainer.devices=2 \ + // pipeline_model_parallel_size=2 \ + // pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt \ + // data_paths=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl']" + // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo" + // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt" + // } + // } + // } + // } @@ -3866,66 +3870,67 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt" } } - stage('T5 Prompt Learning TP=2 PP=1') { - steps { - sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - --config-name=megatron_t5_prompt_learning \ - name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2' \ - trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.max_epochs=null \ - model.tensor_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['squad'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - model.global_batch_size=8 \ - model.micro_batch_size=8" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2" - sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \ - language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \ - tensor_model_parallel_size=2 \ - trainer.devices=2 \ - data.global_batch_size=8 \ - data.micro_batch_size=8" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt" - } - } - stage('T5 Prompt Learning TP=1 PP=2') { - steps { - sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - --config-name=megatron_t5_prompt_learning \ - name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2' \ - trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.max_epochs=null \ - 
model.pipeline_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['squad'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - model.global_batch_size=8 \ - model.micro_batch_size=8" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2" - sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo' \ - language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt' \ - tensor_model_parallel_size=2 \ - trainer.devices=2 \ - data.global_batch_size=8 \ - data.micro_batch_size=8" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt" - } - } + // TODO: Fix these tests, they are taking ~10 minutes to complete + // stage('T5 Prompt Learning TP=2 PP=1') { + // steps { + // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ + // --config-name=megatron_t5_prompt_learning \ + // name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2' \ + // trainer.devices=2 \ + // trainer.max_steps=6 \ + // trainer.max_epochs=null \ + // model.tensor_model_parallel_size=2 \ + // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ + // model.existing_tasks=[] \ + // model.new_tasks=['squad'] \ + // model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + // model.global_batch_size=8 \ + // model.micro_batch_size=8" + // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2" + // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ + // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \ + // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ + // data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + // pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \ + // tensor_model_parallel_size=2 \ + // trainer.devices=2 \ + // data.global_batch_size=8 \ + // data.micro_batch_size=8" + // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo" + // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt" + // } + // } + // stage('T5 Prompt Learning TP=1 PP=2') { + // steps { + // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ + // --config-name=megatron_t5_prompt_learning \ + // name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2' \ + // trainer.devices=2 \ + // trainer.max_steps=6 \ + // trainer.max_epochs=null \ + // model.pipeline_model_parallel_size=2 \ + // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ + // model.existing_tasks=[] \ + // model.new_tasks=['squad'] \ + // model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + // 
model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + // model.global_batch_size=8 \ + // model.micro_batch_size=8" + // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2" + // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ + // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo' \ + // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ + // data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + // pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt' \ + // tensor_model_parallel_size=2 \ + // trainer.devices=2 \ + // data.global_batch_size=8 \ + // data.micro_batch_size=8" + // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo" + // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt" + // } + // } } } stage('L2: Megatron UL2 Pretraining and Resume Training TP=2') { From 82155693c564c7f18be3f2d0d43458e77ef5a77c Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 15 Dec 2022 00:24:28 -0800 Subject: [PATCH 048/154] [TTS][DOC] add notes about automatic conversion to target sampling rates. (#5624) (#5634) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- docs/source/tts/datasets.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/source/tts/datasets.rst b/docs/source/tts/datasets.rst index 77e349134a9d..abdfee1af7f8 100644 --- a/docs/source/tts/datasets.rst +++ b/docs/source/tts/datasets.rst @@ -1,7 +1,7 @@ Data Preprocessing ================== -NeMo TTS recipes support most of public TTS datasets that consist of multiple languages, multiple emotions, and multiple speakers. Current recipes covered English (en-US), German (de-DE), Spanish (es-ES), and Mandarin Chinese (zh-CN), while the support for many other languages is under planning. NeMo provides corpus-specific data preprocessing scripts, as shown in the directory of `scripts/data_processing/tts/ `_, to convert common public TTS datasets into the format expected by the dataloaders as defined in `nemo/collections/tts/torch/data.py `_. The ``nemo_tts`` collection expects each dataset to consist of a set of utterances in individual audio files plus a ``JSON`` manifest that describes the dataset, with information about one utterance per line. The audio files can be of any format supported by `Pydub `_, though we recommend ``WAV`` files as they are the default and have been most thoroughly tested. +NeMo TTS recipes support most of public TTS datasets that consist of multiple languages, multiple emotions, and multiple speakers. Current recipes covered English (en-US), German (de-DE), Spanish (es-ES), and Mandarin Chinese (zh-CN), while the support for many other languages is under planning. NeMo provides corpus-specific data preprocessing scripts, as shown in the directory of `scripts/data_processing/tts/ `_, to convert common public TTS datasets into the format expected by the dataloaders as defined in `nemo/collections/tts/torch/data.py `_. The ``nemo_tts`` collection expects each dataset to consist of a set of utterances in individual audio files plus a ``JSON`` manifest that describes the dataset, with information about one utterance per line. 
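For illustration only, a single manifest line typically carries at least the audio file path, its duration in seconds, and the corresponding transcript; the exact set of fields depends on the recipe and is spelled out in the format description later in this section. A representative entry might look like:

.. code-block:: json

    {"audio_filepath": "wavs/utterance_0001.wav", "duration": 3.14, "text": "the quick brown fox jumped over the lazy dog"}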
The audio files can be of any format supported by `Pydub `_, though we recommend ``WAV`` files as they are the default and have been most thoroughly tested. NeMo supports any original sampling rates of audios, although our scripts of extracting supplementary data and model training all specify the common target sampling rates as either 44100 Hz or 22050 Hz. If the original sampling rate mismatches the target sampling rate, the `feature preprocess `_ can automatically resample the original sampling rate into the target one. There should be one ``JSON`` manifest file per dataset that will be passed in, therefore, if the user wants separate training and validation datasets, they should also have separate manifests. Otherwise, they will be loading validation data with their training data and vice versa. Each line of the manifest should be in the following format: @@ -76,6 +76,8 @@ LibriTTS manifest_filepath= \ sup_data_path= +.. note:: + LibriTTS original sampling rate is **24000 Hz**, we re-use LJSpeech's config to down-sample it to **22050 Hz**. HiFiTTS From f306269c0f4038a9645703b15f9974ff4bc6ae31 Mon Sep 17 00:00:00 2001 From: Samuel Kriman Date: Thu, 15 Dec 2022 03:16:03 -0800 Subject: [PATCH 049/154] Conformer local attention (#5525) * local attn and merge Signed-off-by: sam1373 * optional Signed-off-by: sam1373 * override Signed-off-by: sam1373 * incorporate comments Signed-off-by: sam1373 * update Signed-off-by: sam1373 * fix Signed-off-by: sam1373 * comment Signed-off-by: sam1373 * changes, test Signed-off-by: sam1373 * changes Signed-off-by: sam1373 * check att context Signed-off-by: sam1373 * readme link Signed-off-by: sam1373 * utils Signed-off-by: sam1373 * update Signed-off-by: sam1373 Signed-off-by: sam1373 Signed-off-by: Samuel Kriman Co-authored-by: Vahid Noroozi Signed-off-by: Elena Rastorgueva --- README.rst | 1 + docs/source/asr/results.rst | 32 ++ examples/asr/transcribe_speech.py | 19 +- .../asr/modules/conformer_encoder.py | 228 +++++++++-- nemo/collections/asr/parts/mixins/mixins.py | 34 ++ .../asr/parts/submodules/conformer_modules.py | 23 +- .../parts/submodules/multi_head_attention.py | 359 ++++++++++++++++++ .../asr/parts/utils/transcribe_utils.py | 9 +- tests/collections/asr/test_asr_local_attn.py | 88 +++++ 9 files changed, 758 insertions(+), 35 deletions(-) create mode 100644 tests/collections/asr/test_asr_local_attn.py diff --git a/README.rst b/README.rst index c2334f5e2336..3b67330615dc 100644 --- a/README.rst +++ b/README.rst @@ -68,6 +68,7 @@ Key Features * Beam Search decoding * `Language Modelling for ASR `_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer * Streaming and Buffered ASR (CTC/Transducer) - `Chunked Inference Examples `_ + * `Support of long audios for Conformer with memory efficient local attention `_ * `Speech Classification and Speech Command Recognition `_: MatchboxNet (Command Recognition) * `Voice activity Detection (VAD) `_: MarbleNet * `Speaker Recognition `_: TitaNet, ECAPA_TDNN, SpeakerNet diff --git a/docs/source/asr/results.rst b/docs/source/asr/results.rst index 9d611e8477be..97b2aeb9550e 100644 --- a/docs/source/asr/results.rst +++ b/docs/source/asr/results.rst @@ -71,6 +71,38 @@ To perform inference and transcribe a sample of speech after loading the model, Setting the argument ``logprobs`` to ``True`` returns the log probabilities instead of transcriptions. For more information, see `nemo.collections.asr.modules <./api.html#modules>`__. The audio files should be 16KHz mono-channel wav files. 
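As a minimal sketch of the call described above (the model name and audio path are placeholders, and the ``logprobs`` argument applies to CTC-based models):

.. code-block:: python

    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_small")

    # Text transcriptions for 16 kHz mono-channel wav files
    transcripts = asr_model.transcribe(paths2audio_files=["sample.wav"], batch_size=1)

    # For CTC models, per-frame log probabilities can be returned instead of text
    log_probs = asr_model.transcribe(paths2audio_files=["sample.wav"], batch_size=1, logprobs=True)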
+Inference on long audio +^^^^^^^^^^^^^^^^^^^^^^ + +In some cases the audio is too long for standard inference, especially if you're using a model such as Conformer, where the time and memory costs of the attention layers scale quadratically with the duration. + +There are two main ways of performing inference on long audio files in NeMo: + +The first way is to use buffered inference, where the audio is divided into chunks to run on, and the output is merged afterwards. +The relevant scripts for this are contained in `this folder `_. + +The second way, specifically for models with the Conformer encoder, is to convert to local attention, which changes the costs to be linear. +This can be done even for models trained with full attention, though may result in lower WER in some cases. You can switch to local attention when running the +`transcribe `_ or `evaluation `_ +scripts in the following way: + +.. code-block:: python + + python speech_to_text_eval.py \ + (...other parameters...) \ + ++model_change.conformer.self_attention_model="rel_pos_local_attn" \ + ++model_change.conformer.att_context_size=[64, 64] + +Alternatively, you can change the attention model after loading a checkpoint: + +.. code-block:: python + + asr_model = ASRModel.from_pretrained('stt_en_conformer_ctc_large') + asr_model.change_attention_model( + self_attention_model="rel_pos_local_attn", + att_context_size=[64, 64] + ) + Fine-tuning on Different Datasets ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/examples/asr/transcribe_speech.py b/examples/asr/transcribe_speech.py index d1da1f6d4dfd..6c9044ef1149 100644 --- a/examples/asr/transcribe_speech.py +++ b/examples/asr/transcribe_speech.py @@ -24,6 +24,7 @@ from nemo.collections.asr.metrics.rnnt_wer import RNNTDecodingConfig from nemo.collections.asr.metrics.wer import CTCDecodingConfig from nemo.collections.asr.models.ctc_models import EncDecCTCModel +from nemo.collections.asr.modules.conformer_encoder import ConformerChangeConfig from nemo.collections.asr.parts.utils.transcribe_utils import ( compute_output_filename, prepare_audio_data, @@ -44,7 +45,7 @@ pretrained_name: name of pretrained ASR model (from NGC registry) audio_dir: path to directory with audio files dataset_manifest: path to dataset JSON manifest file (in NeMo format) - + compute_timestamps: Bool to request greedy time stamp information (if the model supports it) compute_langs: Bool to request language ID information (if the model supports it) @@ -52,6 +53,10 @@ ctc_decoding.ctc_timestamp_type="all" # (default all, can be [all, char, word]) rnnt_decoding.rnnt_timestamp_type="all" # (default all, can be [all, char, word]) + (Optionally: You can limit the type of timestamp computations using below overrides) + ctc_decoding.ctc_timestamp_type="all" # (default all, can be [all, char, word]) + rnnt_decoding.rnnt_timestamp_type="all" # (default all, can be [all, char, word]) + output_filename: Output filename where the transcriptions will be written batch_size: batch size during inference @@ -59,7 +64,7 @@ amp: Bool to decide if Automatic Mixed Precision should be used during inference audio_type: Str filetype of the audio. Supported = wav, flac, mp3 - overwrite_transcripts: Bool which when set allowes repeated transcriptions to overwrite previous results. + overwrite_transcripts: Bool which when set allows repeated transcriptions to overwrite previous results. rnnt_decoding: Decoding sub-config for RNNT. Refer to documentation for specific values. 
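# A hypothetical invocation of this script using the model_change group added below; the
# checkpoint and manifest paths are placeholders, and the context size is only an example value:
#
#   python examples/asr/transcribe_speech.py \
#       model_path=/path/to/conformer_ctc.nemo \
#       dataset_manifest=/path/to/manifest.json \
#       output_filename=/path/to/transcripts.json \
#       batch_size=16 \
#       ++model_change.conformer.self_attention_model="rel_pos_local_attn" \
#       ++model_change.conformer.att_context_size=[64,64]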
@@ -86,6 +91,13 @@ """ +@dataclass +class ModelChangeConfig: + + # Sub-config for changes specific to the Conformer Encoder + conformer: ConformerChangeConfig = ConformerChangeConfig() + + @dataclass class TranscriptionConfig: # Required configs @@ -128,6 +140,9 @@ class TranscriptionConfig: # decoder type: ctc or rnnt, can be used to switch between CTC and RNNT decoder for Joint RNNT/CTC models decoder_type: Optional[str] = None + # Use this for model-specific changes before transcription + model_change: ModelChangeConfig = ModelChangeConfig() + @hydra_runner(config_name="TranscriptionConfig", schema=TranscriptionConfig) def main(cfg: TranscriptionConfig) -> TranscriptionConfig: diff --git a/nemo/collections/asr/modules/conformer_encoder.py b/nemo/collections/asr/modules/conformer_encoder.py index a3b747e138a4..04ad67f4d65f 100644 --- a/nemo/collections/asr/modules/conformer_encoder.py +++ b/nemo/collections/asr/modules/conformer_encoder.py @@ -14,6 +14,7 @@ import math from collections import OrderedDict +from dataclasses import dataclass from typing import List, Optional, Set import torch @@ -26,9 +27,12 @@ from nemo.collections.asr.parts.submodules.causal_convs import CausalConv1D from nemo.collections.asr.parts.submodules.conformer_modules import ConformerLayer from nemo.collections.asr.parts.submodules.multi_head_attention import ( + LocalAttRelPositionalEncoding, MultiHeadAttention, PositionalEncoding, RelPositionalEncoding, + RelPositionMultiHeadAttention, + RelPositionMultiHeadAttentionLongformer, ) from nemo.collections.asr.parts.submodules.subsampling import ( ConvSubsampling, @@ -74,12 +78,17 @@ class ConformerEncoder(NeuralModule, StreamingEncoder, Exportable): Defaults to 4. self_attention_model (str): type of the attention layer and positional encoding 'rel_pos': relative positional embedding and Transformer-XL + 'rel_pos_local_attn': relative positional embedding and Transformer-XL with local attention using + overlapping chunks. Attention context is determined by att_context_size parameter. 'abs_pos': absolute positional embedding and Transformer - default is rel_pos. + Default is rel_pos. pos_emb_max_len (int): the maximum length of positional embeddings - Defaulst to 5000 + Defaults to 5000 n_heads (int): number of heads in multi-headed attention layers Defaults to 4. + att_context_size (List[int]): List of 2 ints corresponding to left and right attention context sizes, + or None for full context. + Defaults to None. xscaling (bool): enables scaling the inputs to the multi-headed attention layers by sqrt(d_model) Defaults to True. 
untie_biases (bool): whether to not share (untie) the bias weights between layers of Transformer-XL @@ -192,6 +201,7 @@ def __init__( self.scale = math.sqrt(self.d_model) self.att_context_style = att_context_style self.subsampling_factor = subsampling_factor + self.self_attention_model = self_attention_model if att_context_size: self.att_context_size = list(att_context_size) @@ -263,7 +273,7 @@ def __init__( feat_in=feat_in, feat_out=d_model, conv_channels=subsampling_conv_channels, - activation=nn.ReLU(), + activation=nn.ReLU(True), is_causal=causal_downsampling, ) else: @@ -293,6 +303,7 @@ def __init__( pos_bias_v = None self.pos_emb_max_len = pos_emb_max_len + self.att_mask = None if self_attention_model == "rel_pos": self.pos_enc = RelPositionalEncoding( d_model=d_model, @@ -301,6 +312,17 @@ def __init__( xscale=self.xscale, dropout_rate_emb=dropout_emb, ) + elif self_attention_model == 'rel_pos_local_attn': + if max(att_context_size) <= 0: + raise ValueError("When using local attention, context size must be set > 0") + self.pos_enc = LocalAttRelPositionalEncoding( + att_context_size=att_context_size, + d_model=d_model, + dropout_rate=dropout, + max_len=pos_emb_max_len, + xscale=self.xscale, + dropout_rate_emb=dropout_emb, + ) elif self_attention_model == "abs_pos": pos_bias_u = None pos_bias_v = None @@ -362,25 +384,28 @@ def set_max_audio_length(self, max_audio_length): device = next(self.parameters()).device self.pos_enc.extend_pe(max_audio_length, device) - att_mask = torch.ones(1, max_audio_length, max_audio_length, dtype=torch.bool, device=device) - if self.chunk_size is None: - if self.att_context_size[0] >= 0: - att_mask = att_mask.triu(diagonal=-self.att_context_size[0]) - if self.att_context_size[1] >= 0: - att_mask = att_mask.tril(diagonal=self.att_context_size[1]) - else: - chunk_idx = torch.arange(0, max_audio_length, dtype=torch.int, device=att_mask.device) - chunk_idx = torch.div(chunk_idx, self.chunk_size, rounding_mode="trunc") - diff_chunks = chunk_idx.unsqueeze(1) - chunk_idx.unsqueeze(0) - chunked_limited_mask = torch.logical_and( - torch.le(diff_chunks, self.left_chunks_num), torch.ge(diff_chunks, 0) - ) - att_mask = torch.logical_and(att_mask, chunked_limited_mask.unsqueeze(0)) + if self.self_attention_model != "rel_pos_local_attn": + att_mask = torch.ones(1, max_audio_length, max_audio_length, dtype=torch.bool, device=device) + if self.chunk_size is None: + if self.att_context_size[0] >= 0: + att_mask = att_mask.triu(diagonal=-self.att_context_size[0]) + if self.att_context_size[1] >= 0: + att_mask = att_mask.tril(diagonal=self.att_context_size[1]) + else: + chunk_idx = torch.arange(0, max_audio_length, dtype=torch.int, device=att_mask.device) + chunk_idx = torch.div(chunk_idx, self.chunk_size, rounding_mode="trunc") + diff_chunks = chunk_idx.unsqueeze(1) - chunk_idx.unsqueeze(0) + chunked_limited_mask = torch.logical_and( + torch.le(diff_chunks, self.left_chunks_num), torch.ge(diff_chunks, 0) + ) + att_mask = torch.logical_and(att_mask, chunked_limited_mask.unsqueeze(0)) - if hasattr(self, 'att_mask'): - self.att_mask = att_mask + if hasattr(self, 'att_mask'): + self.att_mask = att_mask + else: + self.register_buffer('att_mask', att_mask, persistent=False) else: - self.register_buffer('att_mask', att_mask, persistent=False) + self.att_mask = None @typecheck() def forward(self, audio_signal, length, cache_last_channel=None, cache_last_time=None): @@ -429,14 +454,18 @@ def forward(self, audio_signal, length, cache_last_channel=None, cache_last_time 
cache_last_channel_next = None cache_len = 0 - audio_signal, pos_emb = self.pos_enc(x=audio_signal, cache_len=cache_len) + if self.self_attention_model == 'abs_pos': + audio_signal, pos_emb = self.pos_enc(x=audio_signal) + else: + audio_signal, pos_emb = self.pos_enc(x=audio_signal, cache_len=cache_len) # Create the self-attention and padding masks pad_mask, att_mask = self._create_masks(max_audio_length, padding_length, audio_signal.device) if cache_last_channel is not None: pad_mask = pad_mask[:, cache_len:] - att_mask = att_mask[:, cache_len:] + if self.att_mask is not None: + att_mask = att_mask[:, cache_len:] for lth, layer in enumerate(self.layers): audio_signal = layer( @@ -478,16 +507,20 @@ def _create_masks(self, max_audio_length, padding_length, device): padding_length.size(0), -1 ) < padding_length.unsqueeze(-1) - # pad_mask_for_att_mask is the mask which helps to ignore paddings - pad_mask_for_att_mask = pad_mask.unsqueeze(1).repeat([1, max_audio_length, 1]) - pad_mask_for_att_mask = torch.logical_and(pad_mask_for_att_mask, pad_mask_for_att_mask.transpose(1, 2)) - # att_mask is the masking to be used by the MHA layers to ignore the tokens not supposed to be visible - att_mask = self.att_mask[:, :max_audio_length, :max_audio_length] - # paddings should also get ignored, so pad_mask_for_att_mask is used to ignore their corresponding scores - att_mask = torch.logical_and(pad_mask_for_att_mask, att_mask.to(pad_mask_for_att_mask.device)) + if self.att_mask is not None: + # pad_mask_for_att_mask is the mask which helps to ignore paddings + pad_mask_for_att_mask = pad_mask.unsqueeze(1).repeat([1, max_audio_length, 1]) + pad_mask_for_att_mask = torch.logical_and(pad_mask_for_att_mask, pad_mask_for_att_mask.transpose(1, 2)) + # att_mask is the masking to be used by the MHA layers to ignore the tokens not supposed to be visible + att_mask = self.att_mask[:, :max_audio_length, :max_audio_length] + # paddings should also get ignored, so pad_mask_for_att_mask is used to ignore their corresponding scores + att_mask = torch.logical_and(pad_mask_for_att_mask, att_mask.to(pad_mask_for_att_mask.device)) + + att_mask = ~att_mask + else: + att_mask = None pad_mask = ~pad_mask - att_mask = ~att_mask return pad_mask, att_mask @@ -610,6 +643,124 @@ def get_initial_cache_state(self, batch_size=1, dtype=torch.float32, device=None return cache_last_channel, cache_last_time + def change_attention_model( + self, + self_attention_model: str = None, + att_context_size: List[int] = None, + update_config: bool = True, + device: torch.device = None, + ): + + """ + Update the self_attention_model which changes the positional encoding and attention layers. + + Args: + self_attention_model (str): type of the attention layer and positional encoding + 'rel_pos': relative positional embedding and Transformer-XL + 'rel_pos_local_attn': relative positional embedding and Transformer-XL with local attention using + overlapping windows. Attention context is determined by att_context_size parameter. + 'abs_pos': absolute positional embedding and Transformer + If None is provided, the self_attention_model isn't changed. Defauts to None. + att_context_size (List[int]): List of 2 ints corresponding to left and right attention context sizes, + or None to keep as it is. Defauts to None. + update_config (bool): Whether to update the config or not with the new attention model. + Defaults to True. + device (torch.device): If provided, new layers will be moved to the device. + Defaults to None. 
+ """ + + if att_context_size: + att_context_size = list(att_context_size) + else: + att_context_size = self._cfg.att_context_size + + if self_attention_model is None: + self_attention_model = self._cfg.self_attention_model + + if self_attention_model == 'rel_pos_local_attn' and max(att_context_size) <= 0: + raise ValueError("When using local attention, context size must be set > 0") + + if self_attention_model == "rel_pos": + self.att_mask = None + new_pos_enc = RelPositionalEncoding( + d_model=self._cfg.d_model, + dropout_rate=self._cfg.dropout, + max_len=self._cfg.pos_emb_max_len, + xscale=self.xscale, + dropout_rate_emb=self._cfg.dropout_emb, + ) + elif self_attention_model == 'rel_pos_local_attn': + new_pos_enc = LocalAttRelPositionalEncoding( + att_context_size=att_context_size, + d_model=self._cfg.d_model, + dropout_rate=self._cfg.dropout, + max_len=self._cfg.pos_emb_max_len, + xscale=self.xscale, + dropout_rate_emb=self._cfg.dropout_emb, + ) + elif self_attention_model == "abs_pos": + new_pos_enc = PositionalEncoding( + d_model=self._cfg.d_model, + dropout_rate=self._cfg.dropout, + max_len=self._cfg.pos_emb_max_len, + xscale=self.xscale, + ) + else: + raise ValueError(f"Not valid self_attention_model: '{self_attention_model}'!") + + if device is not None: + new_pos_enc = new_pos_enc.to(device=device) + del self.pos_enc + self.pos_enc = new_pos_enc + self.self_attention_model = self_attention_model + self.att_context_size = att_context_size + self.set_max_audio_length(self.pos_emb_max_len) + + for name, m in self.named_modules(): + if type(m) == ConformerLayer: + + if self_attention_model == 'rel_pos': + new_attn = RelPositionMultiHeadAttention( + n_head=self._cfg.n_heads, + n_feat=self._cfg.d_model, + dropout_rate=self._cfg.dropout_att, + max_cache_len=att_context_size[0], + pos_bias_u=None, + pos_bias_v=None, + ) + elif self_attention_model == 'rel_pos_local_attn': + new_attn = RelPositionMultiHeadAttentionLongformer( + n_head=self._cfg.n_heads, + n_feat=self._cfg.d_model, + dropout_rate=self._cfg.dropout_att, + max_cache_len=att_context_size[0], + att_context_size=att_context_size, + pos_bias_u=None, + pos_bias_v=None, + ) + elif self_attention_model == 'abs_pos': + new_attn = MultiHeadAttention( + n_head=self._cfg.n_heads, + n_feat=self._cfg.d_model, + dropout_rate=self._cfg.dropout_att, + max_cache_len=att_context_size[0], + ) + else: + raise ValueError( + f"'{self_attention_model}' is not not a valid value for 'self_attention_model', " + f"valid values can be from ['rel_pos', 'rel_pos_local_attn', 'abs_pos']" + ) + if device is not None: + new_attn = new_attn.to(device=device) + new_attn.load_state_dict(m.self_attn.state_dict(), strict=False) + del m.self_attn + m.self_attn = new_attn + m.self_attention_model = self_attention_model + + if update_config: + self._cfg.self_attention_model = self_attention_model + self._cfg.att_context_size = att_context_size + class ConformerEncoderAdapter(ConformerEncoder, adapter_mixins.AdapterModuleMixin): @@ -658,3 +809,20 @@ def get_accepted_adapter_types(self,) -> Set[type]: """ if adapter_mixins.get_registered_adapter(ConformerEncoder) is None: adapter_mixins.register_adapter(base_class=ConformerEncoder, adapter_class=ConformerEncoderAdapter) + + +@dataclass +class ConformerChangeConfig: + # Change self_attention_model for Conformer + # Options: + # 'rel_pos': relative positional embedding and Transformer-XL + # 'rel_pos_local_attn': relative positional embedding and Transformer-XL with local attention using + # overlapping chunks. 
Attention context is determined by att_context_size parameter. + # 'abs_pos': absolute positional embedding and Transformer + # If None is provided, self_attention_model is not changed. + self_attention_model: Optional[str] = None + + # Change the attention context size by providing 2 integers, + # corresponding to left and right context, or -1 for full context. + # If None is provided, the attention context size isn't changed. + att_context_size: Optional[List[int]] = None diff --git a/nemo/collections/asr/parts/mixins/mixins.py b/nemo/collections/asr/parts/mixins/mixins.py index 3e32e82b442d..1da54ed062ae 100644 --- a/nemo/collections/asr/parts/mixins/mixins.py +++ b/nemo/collections/asr/parts/mixins/mixins.py @@ -394,6 +394,40 @@ def change_conv_asr_se_context_window(self, context_window: int, update_config: self, context_window=context_window, update_config=update_config ) + def change_attention_model( + self, self_attention_model: str = None, att_context_size: List[int] = None, update_config: bool = True + ): + """ + Update the self_attention_model if function is available in encoder. + + Args: + self_attention_model (str): type of the attention layer and positional encoding + 'rel_pos': relative positional embedding and Transformer-XL + 'rel_pos_local_attn': relative positional embedding and Transformer-XL with local attention using + overlapping windows. Attention context is determined by att_context_size parameter. + 'abs_pos': absolute positional embedding and Transformer + If None is provided, the self_attention_model isn't changed. Defauts to None. + att_context_size (List[int]): List of 2 ints corresponding to left and right attention context sizes, + or None to keep as it is. Defauts to None. + update_config (bool): Whether to update the config or not with the new attention model. + Defaults to True. + """ + if not hasattr(self, 'encoder'): + logging.info( + "Could not change the self_attention_model in encoder " + "since the model provided does not contain an `encoder` module in its config." 
+ ) + return + + if not hasattr(self.encoder, "change_attention_model"): + logging.info("Model encoder doesn't have a change_attention_model method ") + return + + self.encoder.change_attention_model(self_attention_model, att_context_size, update_config, self.device) + if update_config: + self.cfg.encoder.self_attention_model = self_attention_model + self.cfg.encoder.att_context_size = att_context_size + def conformer_stream_step( self, processed_signal: torch.Tensor, diff --git a/nemo/collections/asr/parts/submodules/conformer_modules.py b/nemo/collections/asr/parts/submodules/conformer_modules.py index df525db8dfbc..f1198d07a3a2 100644 --- a/nemo/collections/asr/parts/submodules/conformer_modules.py +++ b/nemo/collections/asr/parts/submodules/conformer_modules.py @@ -21,6 +21,7 @@ from nemo.collections.asr.parts.submodules.multi_head_attention import ( MultiHeadAttention, RelPositionMultiHeadAttention, + RelPositionMultiHeadAttentionLongformer, ) from nemo.collections.asr.parts.utils.activations import Swish from nemo.collections.common.parts import adapter_modules @@ -90,6 +91,16 @@ def __init__( pos_bias_v=pos_bias_v, max_cache_len=MHA_max_cache_len, ) + elif self_attention_model == 'rel_pos_local_attn': + self.self_attn = RelPositionMultiHeadAttentionLongformer( + n_head=n_heads, + n_feat=d_model, + dropout_rate=dropout_att, + pos_bias_u=pos_bias_u, + pos_bias_v=pos_bias_v, + max_cache_len=MHA_max_cache_len, + att_context_size=att_context_size, + ) elif self_attention_model == 'abs_pos': self.self_attn = MultiHeadAttention( n_head=n_heads, n_feat=d_model, dropout_rate=dropout_att, max_cache_len=MHA_max_cache_len @@ -97,7 +108,7 @@ def __init__( else: raise ValueError( f"'{self_attention_model}' is not not a valid value for 'self_attention_model', " - f"valid values can be from ['rel_pos', 'abs_pos']" + f"valid values can be from ['rel_pos', 'rel_pos_local_attn', 'abs_pos']" ) # second feed forward module @@ -147,6 +158,16 @@ def forward( cache=cache_last_channel, cache_next=cache_last_channel_next, ) + elif self.self_attention_model == 'rel_pos_local_attn': + x = self.self_attn( + query=x, + key=x, + value=x, + pad_mask=pad_mask, + pos_emb=pos_emb, + cache=cache_last_channel, + cache_next=cache_last_channel_next, + ) elif self.self_attention_model == 'abs_pos': x = self.self_attn( query=x, key=x, value=x, mask=att_mask, cache=cache_last_channel, cache_next=cache_last_channel_next diff --git a/nemo/collections/asr/parts/submodules/multi_head_attention.py b/nemo/collections/asr/parts/submodules/multi_head_attention.py index 62206fb7d3da..f83c23387dda 100644 --- a/nemo/collections/asr/parts/submodules/multi_head_attention.py +++ b/nemo/collections/asr/parts/submodules/multi_head_attention.py @@ -33,9 +33,12 @@ """ import math +from functools import lru_cache +from typing import List import torch import torch.nn as nn +import torch.nn.functional as F from nemo.utils import avoid_float16_autocast_context @@ -251,11 +254,320 @@ def forward(self, query, key, value, mask, pos_emb, cache=None, cache_next=None) matrix_bd = matrix_bd[:, :, :, : matrix_ac.size(-1)] scores = (matrix_ac + matrix_bd) / self.s_d_k # (batch, head, time1, time2) + out = self.forward_attention(v, scores, mask) return out +class RelPositionMultiHeadAttentionLongformer(RelPositionMultiHeadAttention): + """Multi-Head Attention layer of Transformer-XL with sliding window local attention from Longformer. 
+ Paper: https://arxiv.org/abs/1901.02860 (Transformer-XL), + https://arxiv.org/abs/2004.05150 (Longformer) + Args: + n_head (int): number of heads + n_feat (int): size of the features + dropout_rate (float): dropout rate + pos_bias_u (Tensor): the positional bias matrix U + pos_bias_v (Tensor): the positional bias matrix V + att_context_size (List[int]): List of 2 ints corresponding to left and right attention context sizes. + max_cache_len (int): the maximum size of cache + """ + + def __init__(self, n_head, n_feat, dropout_rate, pos_bias_u, pos_bias_v, att_context_size, max_cache_len=0): + """Construct an RelPositionMultiHeadedAttention object.""" + super().__init__( + n_head=n_head, + n_feat=n_feat, + dropout_rate=dropout_rate, + pos_bias_u=pos_bias_u, + pos_bias_v=pos_bias_v, + max_cache_len=max_cache_len, + ) + self.att_context_size = att_context_size + + def forward(self, query, key, value, pad_mask, pos_emb, cache=None, cache_next=None): + """Compute Scaled Dot Product Local Attention with rel. positional encoding. using overlapping chunks + Args: + query (torch.Tensor): (batch, time, size) + key (torch.Tensor): (batch, time, size) + value(torch.Tensor): (batch, time, size) + pad_mask (torch.Tensor): (batch, time) + pos_emb (torch.Tensor) : (batch, 2w + 1, size) + cache (torch.Tensor) : (cache_nums, batch, time_cache, size) + cache_next (torch.Tensor) : (cache_nums, batch, time_cache_next, size) + Returns: + output (torch.Tensor): transformed `value` (batch, time1, d_model) weighted by the query dot key attention + """ + + key, value, query = self.update_cache(key=key, value=value, query=query, cache=cache, cache_next=cache_next) + + if torch.is_autocast_enabled(): + query, key, value = query.to(torch.float32), key.to(torch.float32), value.to(torch.float32) + + # temporary until we solve this more gracefully + with avoid_float16_autocast_context(): + q, k, v = self.forward_qkv(query, key, value) + n_batch, _, T, _ = q.size() + + w = max(self.att_context_size[0], self.att_context_size[1]) + if w <= 0: + raise ValueError("When using local attention, context size must be set > 0") + pad_len = (2 * w - T % (2 * w)) % (2 * w) # pad time to 2w + q = F.pad(q, (0, 0, 0, pad_len)) # (batch, head, time, size) + k = F.pad(k, (0, 0, 0, pad_len)) # (batch, head, time, size) + v = F.pad(v, (0, 0, 0, pad_len)) # (batch, head, time, size) + mask = F.pad(pad_mask, (0, pad_len), value=1.0) + + q_with_bias_u = q + self.pos_bias_u.unsqueeze(1) # (batch, head, time, size) + q_with_bias_v = q + self.pos_bias_v.unsqueeze(1) # (batch, head, time, size) + + diagonal_matrix_ac = self.sliding_chunks_matmul_qk( + q_with_bias_u, k, w, padding_value=0.0 + ) # (batch, head, time, 2w + 1) + + # add relative positional embedding + + n_batch_pos = pos_emb.size(0) + p = self.linear_pos(pos_emb).view(n_batch_pos, -1, self.h, self.d_k).transpose(1, 2) + # (batch, head, 2w, size) + diagonal_matrix_bd = torch.matmul(q_with_bias_v, p.transpose(-2, -1)) + # (batch, head, time, 2w + 1) + + start_pos = w - self.att_context_size[0] + end_pos = w + self.att_context_size[1] + + diagonal_matrix_ac[:, :, :, : self.att_context_size[0]] += diagonal_matrix_bd[ + :, :, :, : self.att_context_size[0] + ] + diagonal_matrix_ac[:, :, :, -(self.att_context_size[1] + 1) :] += diagonal_matrix_bd[ + :, :, :, self.att_context_size[0] : + ] + scores = diagonal_matrix_ac / self.s_d_k + # (batch, head, time, 2w + 1) + + # mask invalid positions + scores[:, :, :, :start_pos] = -10000.0 + scores[:, :, :, end_pos + 1 :] = -10000.0 + + # This 
implementation is fast and takes very little memory because num_heads x hidden_size = 1 + # from (bsz x seq_len) to (bsz x num_heads x seqlen x hidden_size) + mask = mask.unsqueeze(dim=1).unsqueeze(dim=-1) + # cast to float/half then replace 1's with -inf + float_mask = mask.type_as(scores).masked_fill(mask, -10000.0) + ones = float_mask.new_ones(size=float_mask.size()) # tensor of ones + # diagonal mask with zeros everywhere and -inf inplace of padding + d_mask = self.sliding_chunks_matmul_qk(ones, float_mask, w, padding_value=0.0) + # (batch, head, time, 2w + 1) + + scores += d_mask + + attn = torch.softmax(scores, dim=-1).masked_fill(mask, 0.0) + p_attn = self.dropout(attn) + # (batch, head, time, 2w + 1) + + x = self.sliding_chunks_matmul_pv(p_attn, v, w).reshape(n_batch, -1, self.h * self.d_k)[:, :T] + # (batch, time, size) + + return self.linear_out(x) + + # Longformer implementation for overlap case adapted for arbitrary left and right chunk size + # https://github.com/allenai/longformer/blob/master/longformer/sliding_chunks.py + def _skew(self, x: torch.Tensor, direction: List[int], padding_value: float) -> torch.Tensor: + """Convert diagonals into columns (or columns into diagonals depending on `direction` + + Args: + x (torch.Tensor): (batch x head, chunk_count, 2w, 2w) + direction (List[int]): padding directions + padding_value (float): value to pad with + + Returns: + output (torch.Tensor): (batch x head, chunk_count, 2w, 2w + 1) + + """ + x_padded = F.pad(x, direction, value=padding_value) + x_padded = x_padded.view(*x_padded.size()[:-2], x_padded.size(-1), x_padded.size(-2)) + return x_padded + + def _skew2(self, x: torch.Tensor, padding_value: float) -> torch.Tensor: + """Shift every row 1 step to right converting columns into diagonals + + Args: + x (torch.Tensor): (batch x head, chunks_count + 1, w, 2w + 1) + padding_value (float): value to pad with + + Returns: + output (torch.Tensor): (batch x head, chunks_count + 1, w, 3w) + """ + # X = B x C x M x L + B, C, M, L = x.size() + x = F.pad(x, (0, M + 1), value=padding_value) # B x C x M x (L+M+1) + x = x.view(B, C, -1) # B x C x ML+MM+M + x = x[:, :, :-M] # B x C x ML+MM + x = x.view(B, C, M, M + L) # B x C, M x L+M + x = x[:, :, :, :-1] + return x + + def _chunk_overlap(self, x: torch.Tensor, w: int) -> torch.Tensor: + """Convert into overlapping chunks. 
+ + Args: + x (torch.Tensor): # (batch x head, time, size) + w (int): Chunk overlap size + + Returns: + output (torch.Tensor): # (batch x head, chunk_count, 2w, size) + """ + + # non-overlapping chunks of size = 2w + x = x.view(x.size(0), x.size(1) // (w * 2), w * 2, x.size(2)) + + # use `as_strided` to make the chunks overlap with an overlap size = w + chunk_size = list(x.size()) + chunk_size[1] = chunk_size[1] * 2 - 1 + + chunk_stride = list(x.stride()) + chunk_stride[1] = chunk_stride[1] // 2 + return x.as_strided(size=chunk_size, stride=chunk_stride) + + @lru_cache() + def _get_invalid_locations_mask(self, w: int, device: str): + + diagonals_list = [] + for j in range(-w, 1): + diagonal_mask = torch.zeros(w, device='cpu', dtype=torch.uint8) + diagonal_mask[:-j] = 1 + diagonals_list.append(diagonal_mask) + + mask = torch.stack(diagonals_list, dim=-1) + mask = mask[None, None, :, :] + + ending_mask = mask.flip(dims=(2, 3)).bool().to(device) + return mask.bool().to(device), ending_mask + + def mask_invalid_locations( + self, input_tensor: torch.Tensor, w: int, + ): + """ + Mask locations invalid for the sliding window attention + + Args: + input_tensor (torch.Tensor): # (batch x head, time, size) + w (int): Chunk overlap size + """ + beginning_mask, ending_mask = self._get_invalid_locations_mask(w, input_tensor.device) + seq_len = input_tensor.size(2) + beginning_input = input_tensor[:, :, :w, : w + 1] + beginning_mask = beginning_mask[:, :, :seq_len].expand(beginning_input.size()) + beginning_input.masked_fill_(beginning_mask, -float('inf')) + + ending_input = input_tensor[:, :, -w:, -(w + 1) :] + ending_mask = ending_mask[:, :, -seq_len:].expand(ending_input.size()) + ending_input.masked_fill_(ending_mask, -float('inf')) + + def sliding_chunks_matmul_qk(self, q: torch.Tensor, k: torch.Tensor, w: int, padding_value: float) -> torch.Tensor: + """Matrix multiplication of query x key tensors using with a sliding window attention pattern. + This implementation splits the input into overlapping chunks of size 2w + with an overlap of size w + + Args: + q (torch.Tensor): (batch, head, time, size) + k (torch.Tensor): (batch, head, time, size) + w (int): Chunk overlap size + padding_value (float): Value to pad with + + Returns: + output (torch.Tensor): (batch, head, time, 2w + 1) + """ + bsz, num_heads, seqlen, head_dim = q.size() + assert seqlen % (w * 2) == 0 + assert q.size() == k.size() + + chunks_count = seqlen // w - 1 + + # group bsz and num_heads dimensions into one, then chunk seqlen into chunks of size w * 2 + q = q.reshape(bsz * num_heads, seqlen, head_dim) + k = k.reshape(bsz * num_heads, seqlen, head_dim) + + chunk_q = self._chunk_overlap(q, w) # (batch x head, chunk_count, 2w, size) + chunk_k = self._chunk_overlap(k, w) # (batch x head, chunk_count, 2w, size) + + # matrix multipication + # bcxd: bsz*num_heads x chunks x 2w x head_dim + # bcyd: bsz*num_heads x chunks x 2w x head_dim + # bcxy: bsz*num_heads x chunks x 2w x 2w + chunk_attn = torch.einsum('bcxd,bcyd->bcxy', (chunk_q, chunk_k)) # multiply + # (batch x head, chunk_count, 2w, 2w) + + # convert diagonals into columns + diagonal_chunk_attn = self._skew(chunk_attn, direction=(0, 0, 0, 1), padding_value=padding_value) + # (batch x head, chunk_count, 2w, 2w + 1) + + # allocate space for the overall attention matrix where the chunks are combined. The last dimension + # has (w * 2 + 1) columns. The first (w) columns are the w lower triangles (attention from a word to + # w previous words). 
The following column is attention score from each word to itself, then + # followed by w columns for the upper triangle. + + diagonal_attn = diagonal_chunk_attn.new_empty((bsz * num_heads, chunks_count + 1, w, w * 2 + 1)) + # (batch x head, chunk_count + 1, w, 2w + 1) + + # copy parts from diagonal_chunk_attn into the compined matrix of attentions + # - copying the main diagonal and the upper triangle + diagonal_attn[:, :-1, :, w:] = diagonal_chunk_attn[:, :, :w, : w + 1] + diagonal_attn[:, -1, :, w:] = diagonal_chunk_attn[:, -1, w:, : w + 1] + # - copying the lower triangle + diagonal_attn[:, 1:, :, :w] = diagonal_chunk_attn[:, :, -(w + 1) : -1, w + 1 :] + diagonal_attn[:, 0, 1:w, 1:w] = diagonal_chunk_attn[:, 0, : w - 1, 1 - w :] + + # separate bsz and num_heads dimensions again + diagonal_attn = diagonal_attn.view(bsz, num_heads, seqlen, 2 * w + 1) + # (batch, head, time, 2w + 1) + + self.mask_invalid_locations(diagonal_attn, w) + + return diagonal_attn + + def sliding_chunks_matmul_pv(self, prob: torch.Tensor, v: torch.Tensor, w: int): + """Same as sliding_chunks_matmul_qk but for prob and value tensors. + + Args: + prob (torch.Tensor): (batch, head, time, size) + v (torch.Tensor): (batch, head, time, size) + w (int): Chunk overlap size + + Returns: + output (torch.Tensor): (batch, time, head, size) + """ + bsz, num_heads, seqlen, head_dim = v.size() + chunks_count = seqlen // w - 1 + # group bsz and num_heads dimensions into one, then chunk seqlen into chunks of size 2w + chunk_prob = prob.reshape(bsz * num_heads, seqlen // w, w, 2 * w + 1) + # (batch x head, chunks_count + 1, w, 2w + 1) + + # group bsz and num_heads dimensions into one + v = v.reshape(bsz * num_heads, seqlen, head_dim) + # (batch x head, time, size) + + # pad seqlen with w at the beginning of the sequence and another w at the end + padded_v = F.pad(v, (0, 0, w, w), value=-1) + # (batch x head, time + 2w, size) + + # chunk padded_v into chunks of size 3w and an overlap of size w + chunk_v_size = (bsz * num_heads, chunks_count + 1, 3 * w, head_dim) + chunk_v_stride = padded_v.stride() + chunk_v_stride = chunk_v_stride[0], w * chunk_v_stride[1], chunk_v_stride[1], chunk_v_stride[2] + chunk_v = padded_v.as_strided(size=chunk_v_size, stride=chunk_v_stride) + # (batch x head, chunks_count + 1, 3w, size) + + skewed_prob = self._skew2(chunk_prob, padding_value=0) + # (batch x head, chunks_count + 1, w, 3w) + + context = torch.einsum('bcwd,bcdh->bcwh', (skewed_prob, chunk_v)) + # (batch x head, chunks_count + 1, w, size) + + return context.view(bsz, num_heads, seqlen, head_dim).transpose(1, 2) + + class PositionalEncoding(torch.nn.Module): """Fixed sinusoidal positional encoding. Args: @@ -362,3 +674,50 @@ def forward(self, x, cache_len=0): if self.dropout_emb: pos_emb = self.dropout_emb(pos_emb) return self.dropout(x), pos_emb + + +class LocalAttRelPositionalEncoding(PositionalEncoding): + """Relative positional encoding for sliding window attention or chunked attention. 
+ See above for relative positional encoding based on Transformer-XL paper + Args: + left_chunk_size (int): number of frames to in past chunks + chunk size (int): number of frames (max frames if using multimode) in current chunk + d_model (int): embedding dim + dropout_rate (float): dropout rate + max_len (int): maximum input length + xscale (bool): whether to scale the input by sqrt(d_model) + dropout_rate_emb (float): dropout rate for the positional embeddings + """ + + def __init__(self, att_context_size, **kwargs): + super(LocalAttRelPositionalEncoding, self).__init__(**kwargs) + self.left_context = att_context_size[0] + self.right_context = att_context_size[1] + + def extend_pe(self, length, device): + """Reset and extend the positional encodings only at the beginning""" + if hasattr(self, 'pe'): + return + + positions = torch.arange( + self.left_context, -self.right_context - 1, -1, dtype=torch.float32, device=device + ).unsqueeze(1) + self.create_pe(positions=positions) + + def forward(self, x, cache_len=0): + """Compute positional encoding. + Args: + x (torch.Tensor): Input. Its shape is (batch, time, feature_size) + Returns: + x (torch.Tensor): Its shape is (batch, time, feature_size) + pos_emb (torch.Tensor): Its shape is (1, time, feature_size) + """ + + if self.xscale: + x = x * self.xscale + + end_pos = self.left_context + self.right_context + 1 + pos_emb = self.pe[:, :end_pos] + if self.dropout_emb: + pos_emb = self.dropout_emb(pos_emb) + return self.dropout(x), pos_emb diff --git a/nemo/collections/asr/parts/utils/transcribe_utils.py b/nemo/collections/asr/parts/utils/transcribe_utils.py index ddfcd2ce9ce1..ac400690ec0d 100644 --- a/nemo/collections/asr/parts/utils/transcribe_utils.py +++ b/nemo/collections/asr/parts/utils/transcribe_utils.py @@ -184,13 +184,18 @@ def setup_model(cfg: DictConfig, map_location: torch.device) -> Tuple[ASRModel, imported_class = model_utils.import_class_by_path(classpath) # type: ASRModel logging.info(f"Restoring model : {imported_class.__name__}") asr_model = imported_class.restore_from( - restore_path=cfg.model_path, map_location=map_location + restore_path=cfg.model_path, map_location=map_location, ) # type: ASRModel + if hasattr(cfg, "model_change"): + asr_model.change_attention_model( + self_attention_model=cfg.model_change.conformer.get("self_attention_model", None), + att_context_size=cfg.model_change.conformer.get("att_context_size", None), + ) model_name = os.path.splitext(os.path.basename(cfg.model_path))[0] else: # restore model by name asr_model = ASRModel.from_pretrained( - model_name=cfg.pretrained_name, map_location=map_location + model_name=cfg.pretrained_name, map_location=map_location, ) # type: ASRModel model_name = cfg.pretrained_name diff --git a/tests/collections/asr/test_asr_local_attn.py b/tests/collections/asr/test_asr_local_attn.py new file mode 100644 index 000000000000..713048ae9457 --- /dev/null +++ b/tests/collections/asr/test_asr_local_attn.py @@ -0,0 +1,88 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +import shutil +import tempfile + +import pytest +import torch + +from nemo.collections.asr.models import ASRModel + + +def getattr2(object, attr): + if not '.' in attr: + return getattr(object, attr) + else: + arr = attr.split('.') + return getattr2(getattr(object, arr[0]), '.'.join(arr[1:])) + + +class TestASRLocalAttention: + @pytest.mark.with_downloads() + @pytest.mark.unit + def test_forward(self): + asr_model = ASRModel.from_pretrained("stt_en_conformer_ctc_small") + asr_model = asr_model.eval() + + len = 16000 * 60 * 30 # 30 minutes, OOM without local attention + input_signal_long = torch.randn(size=(1, len), device=asr_model.device) + length_long = torch.tensor([len], device=asr_model.device) + + # switch to local attn + asr_model.change_attention_model(self_attention_model="rel_pos_local_attn", att_context_size=(64, 64)) + with torch.no_grad(): + asr_model.forward(input_signal=input_signal_long, input_signal_length=length_long) + + # switch context size only (keep local) + asr_model.change_attention_model(att_context_size=(192, 192)) + with torch.no_grad(): + asr_model.forward(input_signal=input_signal_long, input_signal_length=length_long) + + @pytest.mark.with_downloads() + @pytest.mark.unit + def test_change_save_restore(self): + + model = ASRModel.from_pretrained("stt_en_conformer_ctc_small") + model.change_attention_model(self_attention_model="rel_pos_local_attn", att_context_size=(64, 64)) + attr_for_eq_check = ["encoder.self_attention_model", "encoder.att_context_size"] + + with tempfile.TemporaryDirectory() as restore_folder: + with tempfile.TemporaryDirectory() as save_folder: + save_folder_path = save_folder + # Where model will be saved + model_save_path = os.path.join(save_folder, f"{model.__class__.__name__}.nemo") + model.save_to(save_path=model_save_path) + # Where model will be restored from + model_restore_path = os.path.join(restore_folder, f"{model.__class__.__name__}.nemo") + shutil.copy(model_save_path, model_restore_path) + # at this point save_folder should not exist + assert save_folder_path is not None and not os.path.exists(save_folder_path) + assert not os.path.exists(model_save_path) + assert os.path.exists(model_restore_path) + # attempt to restore + model_copy = model.__class__.restore_from( + restore_path=model_restore_path, + map_location=None, + strict=True, + return_config=False, + override_config_path=None, + ) + + assert model.num_weights == model_copy.num_weights + if attr_for_eq_check is not None and len(attr_for_eq_check) > 0: + for attr in attr_for_eq_check: + assert getattr2(model, attr) == getattr2(model_copy, attr) From d7e84cbb5c52a406c229240014bef62f116daeb4 Mon Sep 17 00:00:00 2001 From: Taejin Park Date: Thu, 15 Dec 2022 09:46:23 -0800 Subject: [PATCH 050/154] Add core classes and functions for online clustering diarizer part 1 (#5526) * Add core classes and functions for online clustering diarizer Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add audio to labels code Signed-off-by: Taejin Park * resolve type errors Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added unit=tests for very short audio Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Filled all missing 
docstrings Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved conflict and added missing docstrings Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed unit-test errors Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the wrongly added file - megatron_gpt_model.py Signed-off-by: Taejin Park * Fix wrongly included file - megatron_gpt_model.py Signed-off-by: Taejin Park * resolve code quality issue Signed-off-by: Taejin Park * Fixed unit-test errors and bugs Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changed total_sec for offline_clustering toy_data in unit-tests Signed-off-by: Taejin Park * fixed merging index offset bug Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only including part 1 files Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused function Signed-off-by: Taejin Park * fixed unused imports Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * divided nmesc_clustering.py into two and reflected first-pass comments Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding offline/online_clustering.py Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code QL autocomment Signed-off-by: Taejin Park * Removed unused imports Signed-off-by: Taejin Park * Update nemo/collections/asr/parts/utils/online_clustering.py Co-authored-by: Sean Naren Signed-off-by: Taejin Park * Reflected comments Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved code scanning issue Signed-off-by: Taejin Park * Update nemo/collections/asr/parts/utils/offline_clustering.py Co-authored-by: Sean Naren Signed-off-by: Taejin Park Signed-off-by: Taejin Park Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Co-authored-by: Sean Naren Signed-off-by: Elena Rastorgueva --- .../asr/data/audio_to_diar_label.py | 2 +- nemo/collections/asr/data/audio_to_label.py | 21 +- ...sc_clustering.py => offline_clustering.py} | 134 +- .../asr/parts/utils/online_clustering.py | 1096 +++++++++++++++++ .../asr/parts/utils/speaker_utils.py | 2 +- tests/collections/asr/test_diar_utils.py | 545 ++++++-- 6 files changed, 1654 insertions(+), 146 deletions(-) rename nemo/collections/asr/parts/utils/{nmesc_clustering.py => offline_clustering.py} (91%) create mode 100644 nemo/collections/asr/parts/utils/online_clustering.py diff --git a/nemo/collections/asr/data/audio_to_diar_label.py b/nemo/collections/asr/data/audio_to_diar_label.py index 75ab2482cee9..a1cb6d0f1bdc 100644 --- a/nemo/collections/asr/data/audio_to_diar_label.py +++ b/nemo/collections/asr/data/audio_to_diar_label.py @@ -19,7 +19,7 @@ import torch -from nemo.collections.asr.parts.utils.nmesc_clustering import get_argmin_mat +from 
nemo.collections.asr.parts.utils.offline_clustering import get_argmin_mat from nemo.collections.asr.parts.utils.speaker_utils import convert_rttm_line, prepare_split_data from nemo.collections.common.parts.preprocessing.collections import DiarizationSpeechLabel from nemo.core.classes import Dataset diff --git a/nemo/collections/asr/data/audio_to_label.py b/nemo/collections/asr/data/audio_to_label.py index a738f5bc5cb5..cc41fb5e0bd5 100644 --- a/nemo/collections/asr/data/audio_to_label.py +++ b/nemo/collections/asr/data/audio_to_label.py @@ -30,20 +30,25 @@ VALID_FILE_FORMATS = ';'.join(['wav', 'mp3', 'flac'] + [fmt.lower() for fmt in valid_sf_formats.keys()]) -def repeat_signal(signal, sig_len, required_length): +def repeat_signal(signal: torch.Tensor, sig_len: int, required_length: int) -> torch.Tensor: """repeat signal to make short signal to have required_length Args: - signal (FloatTensor): input signal - sig_len (LongTensor): length of input signal - required_length(float) : length of generated signal + signal (Tensor): input signal + sig_len (int): length of input signal + required_length (int): length of generated signal Returns: - signal (FloatTensor): generated signal of required_length by repeating itself. + signal (Tensor): generated signal of required_length by repeating itself. """ + sub: torch.Tensor = torch.tensor([]) repeat = int(required_length // sig_len) rem = int(required_length % sig_len) - sub = signal[-rem:] if rem > 0 else torch.tensor([]) - rep_sig = torch.cat(repeat * [signal]) - signal = torch.cat((rep_sig, sub)) + sub: torch.Tensor = torch.tensor([]) + rep_sig: torch.Tensor = torch.cat(repeat * [signal]) + if rem > 0: + sub = signal[-rem:] + signal = torch.cat((rep_sig, sub)) + else: + signal = rep_sig return signal diff --git a/nemo/collections/asr/parts/utils/nmesc_clustering.py b/nemo/collections/asr/parts/utils/offline_clustering.py similarity index 91% rename from nemo/collections/asr/parts/utils/nmesc_clustering.py rename to nemo/collections/asr/parts/utils/offline_clustering.py index 9c18b43f64e9..8ce10f692b4a 100644 --- a/nemo/collections/asr/parts/utils/nmesc_clustering.py +++ b/nemo/collections/asr/parts/utils/offline_clustering.py @@ -38,7 +38,7 @@ @torch.jit.script -def cos_similarity(a: torch.Tensor, b: torch.Tensor, eps=torch.tensor(3.5e-4)) -> torch.Tensor: +def cos_similarity(emb_a: torch.Tensor, emb_b: torch.Tensor, eps=torch.tensor(3.5e-4)) -> torch.Tensor: """ Calculate cosine similarities of the given two set of tensors. The output is an N by N matrix where N is the number of feature vectors. @@ -53,8 +53,11 @@ def cos_similarity(a: torch.Tensor, b: torch.Tensor, eps=torch.tensor(3.5e-4)) - res (Tensor): N by N matrix containing the cosine similarities of the values. """ - a_norm = a / (torch.norm(a, dim=1).unsqueeze(1) + eps) - b_norm = b / (torch.norm(a, dim=1).unsqueeze(1) + eps) + # If number of embedding count is 1, it creates nan values + if emb_a.shape[0] == 1 or emb_b.shape[0] == 1: + raise ValueError(f"Number of feature vectors should be greater than 1 but got {emb_a.shape} and {emb_b.shape}") + a_norm = emb_a / (torch.norm(emb_a, dim=1).unsqueeze(1) + eps) + b_norm = emb_b / (torch.norm(emb_b, dim=1).unsqueeze(1) + eps) res = torch.mm(a_norm, b_norm.transpose(0, 1)) res.fill_diagonal_(1) return res @@ -280,7 +283,6 @@ def getTheLargestComponent(affinity_mat: torch.Tensor, seg_index: int, device: t Returns: connected_nodes (Tensor): A tensor containing booleans that indicate whether the node is connected. 
- """ num_of_segments = affinity_mat.shape[0] @@ -288,6 +290,7 @@ def getTheLargestComponent(affinity_mat: torch.Tensor, seg_index: int, device: t nodes_to_explore = torch.zeros(num_of_segments, dtype=torch.bool).to(device) nodes_to_explore[seg_index] = True + nodes_to_explore = nodes_to_explore.to(device) for k in range(num_of_segments): last_num_component = connected_nodes.sum() torch.logical_or(connected_nodes, nodes_to_explore, out=connected_nodes) @@ -298,7 +301,7 @@ def getTheLargestComponent(affinity_mat: torch.Tensor, seg_index: int, device: t if len(indices.size()) == 0: indices = indices.unsqueeze(0) for i in indices: - neighbors = affinity_mat[i] + neighbors = affinity_mat[i].to(device) torch.logical_or(nodes_to_explore, neighbors.squeeze(0), out=nodes_to_explore) return connected_nodes @@ -331,7 +334,7 @@ def getAffinityGraphMat(affinity_mat_raw: torch.Tensor, p_value: int) -> torch.T Calculate a binarized graph matrix and symmetrize the binarized graph matrix. """ - X = getKneighborsConnections(affinity_mat_raw, p_value) + X = affinity_mat_raw if p_value <= 0 else getKneighborsConnections(affinity_mat_raw, p_value) symm_affinity_mat = 0.5 * (X + X.T) return symm_affinity_mat @@ -362,9 +365,9 @@ def getRepeatedList(mapping_argmat: torch.Tensor, score_mat_size: torch.Tensor) repeated indices that will be used for creating a repeated affinity matrix. This repeated matrix is then used for fusing multiple affinity values. """ - repeat_list = torch.zeros(score_mat_size, dtype=torch.int32) + repeat_list = torch.zeros(score_mat_size, dtype=torch.int32).to(mapping_argmat.device) idxs, counts = torch.unique(mapping_argmat, return_counts=True) - repeat_list[idxs] = counts.int() + repeat_list[idxs] = counts.int().to(mapping_argmat.device) return repeat_list @@ -418,12 +421,64 @@ def getCosAffinityMatrix(emb: torch.Tensor) -> torch.Tensor: Matrix containing cosine similarity values among the given embedding vectors. dimension: (Number of embedding vectors) x (Number of embedding vectors) """ - emb = emb.float() - sim_d = cos_similarity(emb, emb) - sim_d = ScalerMinMax(sim_d) + if emb.shape[0] == 1: + sim_d = torch.tensor([[1]]).to(emb.device) + else: + emb = emb.float() + sim_d = cos_similarity(emb, emb) + sim_d = ScalerMinMax(sim_d) return sim_d +@torch.jit.script +def get_scale_interpolated_embs( + multiscale_weights: torch.Tensor, + embeddings_in_scales: List[torch.Tensor], + timestamps_in_scales: List[torch.Tensor], + device: torch.device = torch.device('cpu'), +) -> Tuple[torch.Tensor, List[torch.Tensor]]: + """ + Generate a scale-interpolated single embedding vector by calculating the weighted sum + of the multiple embedding vectors from different scales. The output is a set of embedding + vectors corresponding to the base-scale segments. + + Args: + multiscale_weights (Tensor): + Tensor containing Multiscale weights + Dimensions: (Number of scales) x 1 + embeddings_in_scales (list): + List containing split embedding tensors by each scale + timestamps_in_scales (list): + List containing split timestamps tensors by each scale + device (torch.device): + Torch device variable + + Returns: + context_emb (torch.tensor): + A set of scale-interpolated embedding vectors. + Dimensions: (Number of base-scale segments) x (Dimensions of embedding vector) + session_scale_mapping_list (list): + List containing argmin arrays indexed by scale index. 
+ """ + rep_mat_list = [] + multiscale_weights = multiscale_weights.to(device) + session_scale_mapping_list = get_argmin_mat(timestamps_in_scales) + scale_list = list(range(len(timestamps_in_scales))) + for scale_idx in scale_list: + mapping_argmat = session_scale_mapping_list[scale_idx] + emb_t = embeddings_in_scales[scale_idx].to(device) + mapping_argmat = mapping_argmat.to(device) + repeat_list = getRepeatedList(mapping_argmat, torch.tensor(emb_t.shape[0])).to(device) + rep_emb_t = torch.repeat_interleave(emb_t, repeats=repeat_list, dim=0) + rep_mat_list.append(rep_emb_t) + stacked_scale_embs = torch.stack(rep_mat_list) + context_emb = torch.matmul(stacked_scale_embs.permute(2, 1, 0), multiscale_weights.t()).squeeze().t() + if len(context_emb.shape) < 2: + context_emb = context_emb.unsqueeze(0) + context_emb = context_emb.to(device) + return context_emb, session_scale_mapping_list + + @torch.jit.script def getMultiScaleCosAffinityMatrix( multiscale_weights: torch.Tensor, @@ -436,10 +491,15 @@ def getMultiScaleCosAffinityMatrix( apply multiscale weights to calculate the fused similarity matrix. Args: - uniq_embs_and_timestamps (dict): - The dictionary containing embeddings, timestamps and multiscale weights. - If uniq_embs_and_timestamps contains only one scale, single scale diarization - is performed. + multiscale_weights (Tensor): + Tensor containing Multiscale weights + Dimensions: (Number of scales) x 1 + embeddings_in_scales (list): + List containing split embedding tensors by each scale + timestamps_in_scales (list): + List containing split timestamps tensors by each scale + device (torch.device): + Torch device variable Returns: fused_sim_d (Tensor): @@ -539,10 +599,11 @@ def addAnchorEmb(emb: torch.Tensor, anchor_sample_n: int, anchor_spk_n: int, sig """ emb_dim = emb.shape[1] std_org = torch.std(emb, dim=0) + sigma = torch.tensor(sigma).to(emb.device) new_emb_list = [] for _ in range(anchor_spk_n): - emb_m = torch.tile(torch.randn(1, emb_dim), (anchor_sample_n, 1)) - emb_noise = torch.randn(anchor_sample_n, emb_dim).T + emb_m = torch.tile(torch.randn(1, emb_dim), (anchor_sample_n, 1)).to(emb.device) + emb_noise = torch.randn(anchor_sample_n, emb_dim).T.to(emb.device) emb_noise = torch.matmul( torch.diag(std_org), emb_noise / torch.max(torch.abs(emb_noise), dim=0)[0].unsqueeze(0) ).T @@ -602,14 +663,14 @@ def getEnhancedSpeakerCount( max_num_speakers=emb.shape[0], max_rp_threshold=0.15, sparse_search=True, - sparse_search_volume=50, + sparse_search_volume=10, fixed_thres=-1.0, nme_mat_size=300, cuda=cuda, ) est_num_of_spk, _ = nmesc.forward() est_num_of_spk_list.append(est_num_of_spk.item()) - comp_est_num_of_spk = torch.mode(torch.tensor(est_num_of_spk_list))[0] - anchor_spk_n + comp_est_num_of_spk = torch.tensor(max(torch.mode(torch.tensor(est_num_of_spk_list))[0].item() - anchor_spk_n, 1)) return comp_est_num_of_spk @@ -625,9 +686,9 @@ def split_input_data( Args: embeddings_in_scales (Tensor): - Concatenated Torch Tensor containing embeddings in multiple scales + Concatenated Torch tensor containing embeddings in multiple scales timestamps_in_scales (Tensor): - Concatenated Torch Tensor containing timestamps in multiple scales + Concatenated Torch tensor containing timestamps in multiple scales multiscale_segment_counts (LongTensor): Concatenated Torch LongTensor containing number of segments per each scale @@ -860,6 +921,8 @@ def __init__( Number of p_values we search during NME analysis. Default is 30. The lower the value, the faster NME-analysis becomes. 
However, a value lower than 20 might cause a poor parameter estimation. + nme_mat_size (int): + Targeted size of matrix for NME analysis. use_subsampling_for_nme (bool): Use subsampling to reduce the calculational complexity. Default is True. @@ -875,8 +938,9 @@ def __init__( If True, turn on parallelism based on torch.jit.script library. cuda (bool): Use cuda for Eigen decomposition if cuda=True. - nme_mat_size (int): - Targeted size of matrix for NME analysis. + device (torch.device): + Torch device variable + """ self.max_num_speakers: int = max_num_speakers self.max_rp_threshold: float = max_rp_threshold @@ -886,18 +950,24 @@ def __init__( self.sparse_search_volume: int = sparse_search_volume self.min_p_value = torch.tensor(2) self.fixed_thres: float = fixed_thres - self.cuda: bool = cuda self.eps = 1e-10 self.max_N = torch.tensor(0) self.mat: torch.Tensor = mat self.p_value_list: torch.Tensor = self.min_p_value.unsqueeze(0) - self.device = device + self.cuda: bool = cuda + self.device: torch.device = device self.maj_vote_spk_count: bool = maj_vote_spk_count self.parallelism: bool = parallelism def forward(self) -> Tuple[torch.Tensor, torch.Tensor]: """ Subsample the input matrix to reduce the computational load. + + Returns: + est_num_of_spk (Tensor): + Estimated number of speakers from NMESC approach + p_hat_value (Tensor): + Estimated p-value (determines how many neighboring values to be selected) """ if self.use_subsampling_for_nme: subsample_ratio = self.subsampleAffinityMat(self.nme_mat_size) @@ -1025,7 +1095,7 @@ def getPvalueList(self) -> torch.Tensor: self.max_N = torch.max( torch.floor(torch.tensor(self.mat.shape[0] * self.fixed_thres)).type(torch.int), self.min_p_value ) - p_value_list = torch.tensor(self.max_N).unsqueeze(0).int() + p_value_list = self.max_N.unsqueeze(0).int() else: self.max_N = torch.max( torch.floor(torch.tensor(self.mat.shape[0] * self.max_rp_threshold)).type(torch.int), self.min_p_value @@ -1088,7 +1158,7 @@ def __init__( def forward(self, param_dict: Dict[str, torch.Tensor]) -> torch.LongTensor: """ A function wrapper designed for inference in exported script format. - + Note: Dict is used to allow easy inference of the exported jit model in Triton server using easy to understand naming convention. @@ -1138,7 +1208,7 @@ def forward_infer( oracle_num_speakers: int = -1, max_rp_threshold: float = 0.15, max_num_speakers: int = 8, - enhanced_count_thres: int = 80, + enhanced_count_thres: int = 40, sparse_search_volume: int = 30, fixed_thres: float = -1.0, ) -> torch.LongTensor: @@ -1180,7 +1250,7 @@ def forward_infer( Limits the range of parameter search. Clustering performance can vary depending on this range. Default is 0.15. - enhanced_count_thres: (int) + enhanced_count_thres (int): For the short audio recordings, clustering algorithm cannot accumulate enough amount of speaker profile for each cluster. 
Thus, function `getEnhancedSpeakerCount` employs anchor embeddings @@ -1203,7 +1273,8 @@ def forward_infer( embeddings_in_scales, timestamps_in_scales, multiscale_segment_counts ) - emb = self.embeddings_in_scales[multiscale_segment_counts.shape[0] - 1] + # Last slot is the base scale embeddings + emb = self.embeddings_in_scales[-1] # Cases for extreamly short sessions if emb.shape[0] == 1: @@ -1239,7 +1310,8 @@ def forward_infer( est_num_of_spk, p_hat_value = nmesc.forward() affinity_mat = getAffinityGraphMat(mat, p_hat_value) else: - est_num_of_spk = torch.tensor(1) + nmesc.fixed_thres = max_rp_threshold + est_num_of_spk, p_hat_value = nmesc.forward() affinity_mat = mat # n_clusters is number of speakers estimated from spectral clustering. diff --git a/nemo/collections/asr/parts/utils/online_clustering.py b/nemo/collections/asr/parts/utils/online_clustering.py new file mode 100644 index 000000000000..a01a3982c0a1 --- /dev/null +++ b/nemo/collections/asr/parts/utils/online_clustering.py @@ -0,0 +1,1096 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Copyright (c) 2007-2020 The scikit-learn developers. + +# BSD 3-Clause License + +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# NME-SC clustering is based on the implementation from the paper +# https://arxiv.org/pdf/2003.02405.pdf and the implementation from +# https://github.com/tango4j/Auto-Tuning-Spectral-Clustering. + +from typing import List, Tuple + +import numpy as np +import torch +from scipy.optimize import linear_sum_assignment +from sklearn.metrics.pairwise import linear_kernel +from sklearn.preprocessing import OneHotEncoder + +from nemo.collections.asr.parts.utils.offline_clustering import ( + NMESC, + SpectralClustering, + getAffinityGraphMat, + getCosAffinityMatrix, +) + + +def hungarian_algorithm( + spk_count: int, U_set: List[int], cmm_P: torch.Tensor, cmm_Q: torch.Tensor, PmQ: List[int], QmP: List[int] +) -> np.array: + """ + Find a mapping that minimizes the matching cost between the label P and Q. + One-hot encodding is employed to represent sequence and calculate the cost. 
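+
+    Example (illustrative):
+        If the old labels are P = [0, 0, 1, 1] and the new labels are Q = [1, 1, 0, 0]
+        for the same segments, the returned mapping is [1, 0]; applying it to the new
+        labels recovers [0, 0, 1, 1], which matches the old sequence.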
+
+    Args:
+        spk_count (int):
+            Estimated speaker count
+        U_set (list):
+            Whole set of the estimated speakers
+        cmm_P (Tensor):
+            Length-matched old sequence
+        cmm_Q (Tensor):
+            Length-matched new sequence
+        PmQ (list):
+            Set P - Q (Difference of sets)
+        QmP (list):
+            Set Q - P (Difference of sets)
+
+    Returns:
+        mapping_array (np.array):
+            A label mapping that minimizes the matching cost
+    """
+    enc = OneHotEncoder(handle_unknown='ignore')
+    all_spks_labels = [[x] for x in range(len(U_set))]
+    enc.fit(all_spks_labels)
+    enc_P = enc.transform(cmm_P.reshape(-1, 1)).toarray()
+    enc_Q = enc.transform(cmm_Q.reshape(-1, 1)).toarray()
+    stacked = np.hstack((enc_P, enc_Q))
+    cost = -1 * linear_kernel(stacked.T)[spk_count:, :spk_count]
+    row_ind, col_ind = linear_sum_assignment(cost)
+
+    # Labels that appear in only one of the two sets keep their original index
+    mapping_array = np.arange(len(U_set)).astype(int)
+    for x in range(col_ind.shape[0]):
+        if x in (set(PmQ) | set(QmP)):
+            mapping_array[x] = x
+        else:
+            mapping_array[x] = col_ind[x]
+    return mapping_array
+
+
+def get_minimal_indices(Y_new: torch.LongTensor) -> torch.LongTensor:
+    """
+    Force the unique indices of the labels to use the lowest numbers.
+
+    Example:
+        >>> Y_new = [3, 3, 3, 4, 4, 5]
+        >>> get_minimal_indices(Y_new)
+    Return:
+        [0, 0, 0, 1, 1, 2]
+
+    Args:
+        Y_new (Tensor):
+            Tensor containing cluster labels
+
+    Returns:
+        (Tensor): Newly mapped cluster labels that have minimized indices
+    """
+    device = Y_new.device
+    Y_new_enlisted = torch.unique(Y_new).sort()[0].to(torch.long).to(device)
+    sequence = torch.arange(torch.max(Y_new_enlisted) + 1).to(device)
+    sequence[Y_new_enlisted] = torch.arange(len(Y_new_enlisted)).to(device)
+    return sequence[Y_new]
+
+
+def stitch_cluster_labels(Y_old: torch.Tensor, Y_new: torch.Tensor, with_history=True):
+    """
+    Run the Hungarian algorithm (linear sum assignment) to find the best permutation mapping between
+    the cumulated labels in history and the new clustering output labels.
+
+    Args:
+        Y_old (Tensor):
+            Cumulated diarization labels. This will be concatenated with the history embedding speaker
+            labels and then compared with the predicted labels Y_new.
+        Y_new (Tensor):
+            Contains the predicted labels for the reduced history embeddings concatenated with the
+            new predicted labels. The permutation is not yet matched to Y_old.
+
+    Returns:
+        matched_output (Tensor):
+            An output tensor where the input Y_new is remapped with mapping_array so that its
+            permutation matches Y_old.
+    """
+    Y_new = get_minimal_indices(Y_new)
+
+    # TODO: This function needs to be converted to a fully torch.jit.script-able function.
+    Y_old = Y_old.cpu().numpy()
+    Y_new = Y_new.cpu().numpy()
+
+    if len(Y_old) == 0:
+        matched_output = Y_new
+    else:
+        spk_count = max(len(set(Y_old)), len(set(Y_new)))
+        P_raw, Q_raw = Y_old.astype(int), Y_new.astype(int)
+        U_set = set(P_raw) | set(Q_raw)
+        min_len = min(P_raw.shape[0], Q_raw.shape[0])
+        P, Q = P_raw[:min_len], Q_raw[:min_len]
+        PmQ, QmP = set(P) - set(Q), set(Q) - set(P)
+
+        if len(U_set) == 1:
+            # When only one speaker label exists in both sets, no encoding is needed.
+            mapping_array = np.array([0, 0])
+        else:
+            # Run the Hungarian algorithm if there is more than one speaker in the universal set U.
+ mapping_array = hungarian_algorithm(spk_count, U_set, P, Q, PmQ, QmP) + matched_output = mapping_array[Y_new] + matched_output = torch.tensor(matched_output) + matched_output = get_minimal_indices(matched_output) + return matched_output + + +@torch.jit.script +def calculate_removable_counts(removable_counts_mat: torch.Tensor, remain_count: int, num_clus: int) -> torch.Tensor: + """ + Calculate removable counts based on the arguments and calculate how many counts should be + removed from the each cluster. This function has `O(N)` (N = num_clus) time complexity to + return the desired `removable_counts_mat`. + + Example: + + The original input to `get_merge_quantity` function: + >>> pre_clus_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2] + >>> num_to_be_removed = 3 + >>> min_count_per_cluster = 2 + + Histogram: (`min_count_per_cluster`=2 is removed) + 0 |***** + 1 |*** + 2 |* + + Inputs: + >>> removable_counts_mat = [5, 3, 1] + >>> remain_count = 6 + >>> num_clus = 3 + + Interim results: + >>> diff_counts + [1, 2, 2] + >>> gradual_counts + [3, 4, 2] + >>> cumsum_counts + [3, 7, 9] + + Return: + >>> removable_counts_mat + [2, 1, 0] + + Args: + removable_counts_mat (Tensor): + Tensor containing how many vectors could be removed from each cluster + remain_count (int): + Integer value that indicates the number of vectors removed from the total set + num_clus (int): + Number of clusters in the given label sequence (cardinality of a label set) + + Returns: + removable_counts_mat (Tensor): + Tensor containing the number of vectors should be removed from each cluster + """ + device = removable_counts_mat.device + zero_padded_counts = torch.cat( + [torch.tensor([0]).to(device), removable_counts_mat.sort()[0], torch.tensor([0]).to(device)], dim=0 + ) + removable_count_args = removable_counts_mat.sort(descending=True)[1] + + # Calculate the size difference between clusters + diff_counts = (zero_padded_counts[1:] - zero_padded_counts[:-1])[:num_clus] + gradual_counts = torch.arange(num_clus, 0, -1).to(device) * diff_counts + cumsum_counts = torch.cumsum(gradual_counts, dim=0) + count, remain_count_rem = 0, remain_count + + # Find how many remaining counts we can use + ind: int = 0 + for ind, num in enumerate(cumsum_counts): + if remain_count < num: + break + + # Subtract the common values step by step + if ind > 0: + for knd in range(ind): + removable_counts_mat[removable_count_args[: num_clus - knd]] -= diff_counts[knd] + remain_count_rem -= int(diff_counts[knd].item()) * (num_clus - knd) + assert remain_count >= 0, "remain_count should never be negative." + + # Add remaining values + num_labels = remain_count_rem // (num_clus - ind) + rem_labels = remain_count_rem % (num_clus - ind) + removable_counts_mat[removable_count_args[: (num_clus - ind)]] -= num_labels + removable_counts_mat[removable_count_args[:rem_labels]] -= 1 + return removable_counts_mat + + +@torch.jit.script +def get_merge_quantity( + num_to_be_removed: int, pre_clus_labels: torch.Tensor, min_count_per_cluster: int, +) -> torch.Tensor: + """ + Determine which embeddings we need to reduce or merge in history buffer. + We want to merge or remove the embedding in the bigger cluster first. + At the same time, we keep the minimum number of embedding per cluster + with the variable named min_count_per_cluster. + + Constraint: + - Each cluster should keep the number of vectors over `min_count_per_cluster`. + - In total, `num_to_be_removed` of vectors should be removed from the total buffer. 
+ - While merging embeddings, minimize the gap between quantities between clusters. + + Example: + >>> num_to_be_removed = 3 + >>> pre_clus_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2] + >>> min_count_per_cluster = 2 + >>> get_merge_quantity(num_to_be_removed, pre_clus_labels, min_count_per_cluster) + Return: + torch.tensor([2, 1, 0]) + >>> # Sum should be equal to `num_to_be_removed` which is 3 + + Args: + num_to_be_removed: (int) + the quantity of the newly obtained embedding from the new stream of input. + pre_clus_labels: (Tensor) + the speaker labels of (the history_embedding_buffer_emb) + (the new embeddings to be added) + min_count_per_cluster: (int) + Minimum vector quantity for each cluster + + Returns: + removable_counts_mat: (Tensor) + Tensor containing the number of vectors should be removed from each cluster + """ + if num_to_be_removed > pre_clus_labels.shape[0] - 1: + raise ValueError( + f"num_to_be_removed: {num_to_be_removed} should be less than pre_clus_labels length - 1 {pre_clus_labels.shape[0]-1}" + ) + remain_count = pre_clus_labels.shape[0] - num_to_be_removed + spk_freq_count = torch.bincount(pre_clus_labels) + num_clus = len(torch.unique(pre_clus_labels)) + if remain_count < min_count_per_cluster * num_clus: + raise ValueError(f"The remaining embedding vectors should be more than { min_count_per_cluster * num_clus }") + + # Minimum vector counts should be excluded from the removable amount + min_seg_count = torch.tensor([min_count_per_cluster] * len(spk_freq_count)).to(pre_clus_labels.device) + min_seg_count_mat = torch.stack((min_seg_count, spk_freq_count)).min(0)[0] + + # Exclude minimum quantities from the removable count matrix + remain_count -= int(torch.sum(min_seg_count_mat)) + removable_counts_mat = spk_freq_count - min_seg_count_mat + + # Calculate removable counts from `remain_count` variable + removable_counts_mat = calculate_removable_counts(removable_counts_mat, remain_count, num_clus) + if int(removable_counts_mat.sum()) != num_to_be_removed: + raise ValueError("Sum of `removable_counts_mat` is not equal to `num_to_be_removed` variable.") + if not torch.all(removable_counts_mat >= 0) or not torch.all(spk_freq_count - min_seg_count_mat >= 0): + raise ValueError( + "Every value in `removable_counts_mat` should be always non-negative value but got {removable_counts_mat}" + ) + return removable_counts_mat + + +@torch.jit.script +def merge_vectors(selected_inds: torch.Tensor, emb_ndx: torch.Tensor, pre_cluster_labels: torch.Tensor): + """ + Merge feature (embedding) vectors estimated to be the same cluster label. 
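+
+    Example (illustrative):
+        With 5 input vectors and `selected_inds` picking 3 of them, the output contains
+        3 vectors: the 2 bypassed vectors followed by the average of the 3 selected ones,
+        and the label of the averaged vector is the label of the selected group.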
+ + Args: + selected_inds (Tensor): + Selected indices for merging + emb_ndx (Tensor): + Feature (embedding) vectors + Dimension: (original vector counts) x (feature dimension) + pre_cluster_labels (Tensor): + Original cluster labels before merging + + Returns: + merged_vecs (Tensor): + Merged feature vectors that are concatenated + Dimension: (merged vector counts) x (feature dimension) + merged_clus_labels (Tensor): + Cluster labels for the merged feature vectors + Dimension: (merged vector counts) + index_mapping (Tuple): + index_mapping[0] contains bypassed vector labels + index_mapping[1] contains merged vector labels + """ + if emb_ndx.shape[0] != pre_cluster_labels.shape[0]: + raise ValueError("pre_cluster_labels and emb_ndx have mismatch in dimension") + avg_emb = torch.mean(emb_ndx[selected_inds, :], dim=0) + merged_clus_labels = pre_cluster_labels[selected_inds] + selected_inds_list: List[int] = selected_inds.tolist() + bypass_inds_list: List[int] = [] + for k in range(emb_ndx.shape[0]): + if k not in selected_inds_list: + bypass_inds_list.append(k) + bypass_inds = torch.tensor(bypass_inds_list) + selected_inds = torch.tensor(selected_inds_list) + merged_vecs = torch.vstack((emb_ndx[bypass_inds], avg_emb)) + merged_clus_labels = torch.hstack((pre_cluster_labels[bypass_inds], merged_clus_labels[0])) + index_mapping: Tuple[torch.Tensor, torch.Tensor] = (bypass_inds, selected_inds) + return merged_vecs, merged_clus_labels + + +@torch.jit.script +def get_closest_embeddings(affinity_mat: torch.Tensor, n_closest: int) -> Tuple[torch.Tensor, torch.Tensor]: + """ + Get the indices of the embedding vectors we want to merge. + + Example: + >>> n_closest = 2 + >>> affinity_mat = [[1.0, 0.2, 0.8], + [0.2, 1.0, 0.4], + [0.8, 0.4, 1.0]] + >>> affinity_mat.sum(0) + [2.0, 1.6, 2.2] + + # The closest two embedding vectors are at index 0 and 2. + + Args: + affinity_mat: (Tensor) + Symmetric affinity matrix of the given embedding vector set. + target_emb_index: (Tensor) + Targeted speaker index + n_closest (int): + The amount of vector counts that are expected to be removed from the set + Example: + Input: 10 vectors in a set + n_closest = 5 + (5+1) vectors are merged into 1 vector + Output: 5 vectors in a set + + Returns: + idx_aff_sum (torch.Tensor): + Indices of the closest `n_closest` embedding vectors + rest_inds (torch.Tensor): + Indices of the complementary set of the indices in `idx_aff_sum` + """ + comb_limit = int(affinity_mat.shape[0] - 1) + if n_closest > comb_limit: + raise ValueError(f"Got n_closest of {n_closest}: {n_closest} is bigger than comb_limit {comb_limit}") + + # Take summed values over one axis + sum_cmat = affinity_mat.sum(0) + + # `n_closest + 1` will become 1 embedding vector after merging + idx_aff_sum = torch.argsort(sum_cmat, descending=True)[: (n_closest + 1)] + rest_inds = torch.argsort(sum_cmat, descending=True)[(n_closest + 1) :] + return idx_aff_sum, rest_inds + + +@torch.jit.script +def run_reducer( + pre_embs: torch.Tensor, target_spk_idx: int, merge_quantity: int, pre_clus_labels: torch.Tensor, +): + """ + Reduce the number of embedding vectors by merging the closest embedding vectors. + - This merging algorithm is based on the assumption that the closest embeddings are the most redundant + embedding vectors. + - The closest embedding vectors are chosen by selecting the highest top-N sum of each column in a given + affinity matrix. + - If merge_quantity is N, we choose (N+1) vectors into 1 embedding vector. 
Thus, we reduce N embeddings + in the original embedding vector set. + + Example: + >>> merge_quantity = 1 # We merge 1+1 = 2 embedding vectors + >>> affinity_mat = [[1.0, 0.2, 0.8], + [0.2, 1.0, 0.4], + [0.8, 0.4, 1.0]] + >>> affinity_mat.sum(0) + [2.0, 1.6, 2.2] + + The first and the third embedding vectors are merged into one embedding vector. + >>> index_mapping # (bypassed indices, merged indices) + ([1], [0, 2]) + + Args: + pre_embs (Tensor): + Potential Embedding vectors to be merged + affinity_mat (Tensor): + The affinity matrix of the `pre_embs` + target_spk_idx (int): + The targeted speaker index for merging + merge_quantity (int): + The count of embeddings to be reduced + pre_clus_labels (list) + The original cluster (speaker) index + + Returns: + result_emb (Tensor): + Set of merged embedding vectors + merged_clus_labels (list): + Cluster (speaker) labels for the merged embedding vectors + """ + if pre_embs.shape[0] != pre_clus_labels.shape[0]: + raise ValueError("Dimension mismatch between `pre_embs` and `pre_clus_labels`.") + + target_emb_index = torch.where(pre_clus_labels == target_spk_idx)[0] + org_size = target_emb_index.shape[0] + if merge_quantity > 0: + if merge_quantity > (target_emb_index.shape[0] - 1): + raise ValueError( + f"merge_quantity {merge_quantity} is larger than the half of targeted speaker's labels {target_emb_index.shape[0]-1}" + ) + total_affinity_mat = getCosAffinityMatrix(pre_embs) + # Get the lower triangle of the affinity_mat array + affinity_mat = total_affinity_mat[:, target_emb_index][target_emb_index, :] + if affinity_mat.shape[0] != target_emb_index.shape[0]: + raise ValueError( + "Dimension mismatch between targeted speaker affinity `affinity_mat` and targeted speaker index `target_emb_index`." + ) + # Get the indices of the closest embedding vectors + selected_inds, rest_inds = get_closest_embeddings(affinity_mat, merge_quantity) + spk_cluster_labels, selected_embs = pre_clus_labels[target_emb_index], pre_embs[target_emb_index] + + # Note that we need to return the indices of speaker-specific indices from `target_emb_index`. + index_mapping = (target_emb_index[rest_inds.sort()[0]], target_emb_index[selected_inds]) + + # Merge the embeddings targeted by the 2-dim indices `index_2d` + merged_embs, merged_clus_labels = merge_vectors(selected_inds, selected_embs, spk_cluster_labels) + + if (org_size - merge_quantity) != merged_embs.shape[0]: + raise ValueError( + f"Reducer output {merged_embs.shape[0]} is not matched to the target quantity {org_size - merge_quantity}." + ) + + else: + merged_embs = pre_embs[target_emb_index] + merged_clus_labels = pre_clus_labels[target_emb_index] + index_mapping = (target_emb_index, torch.arange(0)) + return merged_embs, merged_clus_labels, index_mapping + + +@torch.jit.script +def get_first_arg_index(mat: torch.Tensor, label: int) -> int: + """ + Get the index of the first element are specified by `index` variable. + + Args: + mat (Tensor): + Source matrix filled with indices + label (int): + Label which we want to find the first occuring index + + Returns: + (int) The first index of the given label + """ + return int(torch.where(mat == label)[0][0]) + + +class OnlineSpeakerClustering: + """ + Online clustering method for speaker diarization based on cosine similarity. + + Regular Clustering Attributes: + + max_num_speakers (int): + The upper bound for the number of speakers in each session + max_rp_threshold (float): + Limits the range of parameter search. 
+ Clustering performance can vary depending on this range. + Default is 0.15. + enhanced_count_thres (int): + For the short audio recordings, clustering algorithm cannot + accumulate enough amount of speaker profile for each cluster. + Thus, function `getEnhancedSpeakerCount` employs anchor embeddings + (dummy representations) to mitigate the effect of cluster sparsity. + enhanced_count_thres = 40 is recommended. + sparse_search_volume (int): + Number of p_values we search during NME analysis. + Default is 30. The lower the value, the faster NME-analysis becomes. + Lower than 20 might cause a poor parameter estimation. + fixed_thres (float): + A fixed threshold for finding p-closest neighbors in affinity matrix for clustering. + If fixed_thres value is provided, NME-analysis process will be skipped. + This value should be optimized on a development set to obtain a quality result. + Default is None and performs NME-analysis to estimate the threshold. + min_samples_for_nmesc (int): + The minimum number of samples required for NME clustering. This avoids + zero p_neighbour_lists. If the input has fewer segments than min_samples, + it is directed to the enhanced speaker counting mode. + sparse_search (bool): + Toggle sparse search mode. If True, limit the size of p_value_list to sparse_search_volume. + cuda (bool): + Use cuda for Eigen decomposition if cuda=True. + + Online Processing Attributes: + + history_buffer_size (int): + - This is a buffer where diarization history is saved in the form of averaged speaker embedding vector. + - The values in [50, 200] range is recommended while the system requires bigger buffer size for + sessions with larger number of speakers. + current_buffer_size (int): + - This is a buffer which process the most recent speaker embedding vector inputs. + current-buffer is first-in-first-out (FIFO) queue where the embeddings accepted earlier + get to merged and saved to history buffer. + - In general, [50, 200] range is recommended and the performance can be sensitive on this buffer size. + min_spk_counting_buffer_size (int): + Integer number for speaker counting buffer. Number of speakers are estimated through a small buffer + and the number is obtained by taking majority vote. + min_frame_per_spk (int): + Below this number, the system considers the whole input segments as a single speaker. + p_update_freq (int): + Frequency (interval) of updating p_value for NMESC algorithm. 
+ p_value_skip_frame_thres (int): + After `frame_index` passes this number, `p_value` estimation is skipped for inference speed + p_value_queue_size (int): + `p_value` buffer for major voting + use_temporal_label_major_vote (bool): + Boolean that determines whether to use temporal majorvoting for the final speaker labels + temporal_label_major_vote_buffer_size (int): + Buffer size for major-voting the + """ + + def __init__( + self, + max_num_speakers: int, + max_rp_threshold: float = 0.15, + enhanced_count_thres: float = 40, + fixed_thres: float = -1.0, + sparse_search_volume: int = 15, + history_buffer_size: int = 150, + current_buffer_size: int = 150, + min_spk_counting_buffer_size=3, + min_frame_per_spk: int = 15, + p_update_freq: int = 5, + p_value_skip_frame_thres: int = 50, + p_value_queue_size: int = 3, + use_temporal_label_major_vote: bool = False, + temporal_label_major_vote_buffer_size: int = 11, + ): + self.max_num_speakers = max_num_speakers + self.max_rp_threshold = max_rp_threshold + self.enhanced_count_thres = enhanced_count_thres + self.sparse_search_volume = sparse_search_volume + self.fixed_thres = fixed_thres + self.history_n = history_buffer_size + self.current_n = current_buffer_size + self.min_spk_counting_buffer_size = min_spk_counting_buffer_size + self.min_frame_per_spk = min_frame_per_spk + self.p_update_freq = p_update_freq + self.p_value_skip_frame_thres = p_value_skip_frame_thres + self.p_value_queue_size = p_value_queue_size + self.use_temporal_label_major_vote = use_temporal_label_major_vote + self.temporal_label_major_vote_buffer_size = temporal_label_major_vote_buffer_size + self.num_spk_stat = [] + self.p_value_hist = [] + + self._init_memory_buffer_variables() + self._init_memory_embeddings() + + def _init_memory_buffer_variables(self): + """ + Initialize memory buffer related variables. + + Attributes: + max_embed_count (int): + The maximum number of segments the streaming system has ever seen + memory_margin (int): + The margin that is added to keep the segmentation data in the streaming system + _minimum_segments_per_buffer (int): + Maximum number of embedding vectors kept in history buffer per speaker. + Example: + history_buffer_size (history_n) = 100 + max_num_speakers = 4 + _minimum_segments_per_buffer = 25 + history_buffer_seg_end (int): + Index that indicates the boundary between history embedding sets and current processing buffer + when history embedding vectors and current input embedding vectors are concatenated into a + single matrix. + """ + self.max_embed_count = 0 + self.memory_margin = 0 + self._minimum_segments_per_buffer = int(self.history_n / self.max_num_speakers) + self.history_buffer_seg_end = 0 + + def _init_memory_embeddings(self): + """ + Initialize history buffer related variables. + + Attributes: + is_online (bool): + If self.is_online is False: + FIFO queue does not push out any speaker embedding vector + If self.is_online is True: + FIFO queue starts push out speaker embedding vectors and saving them into + history buffer. 
+ history_embedding_buffer_emb (Tensor) + Tensor containing speaker embedding vectors for saving the history of the previous + speaker profile in the given audio session + history_embedding_buffer_label (Tensor) + Speaker label (cluster label) for embedding vectors saved in the history buffer + Y_fullhist (Tensor) + Tensor containing the speaker label hypothesis from start to current frame + """ + self.is_online = False + self.history_embedding_buffer_emb = torch.tensor([]) + self.history_embedding_buffer_label = torch.tensor([]) + self.Y_fullhist = torch.tensor([]) + + def onlineNMEanalysis(self, nmesc: NMESC, frame_index: int) -> Tuple[torch.LongTensor, torch.Tensor]: + """ + To save the running time, the p-value is only estimated in the beginning of the session. + After switching to online mode, the system uses the most common estimated p-value. + Estimating p-value requires a plenty of computational resource. The less frequent estimation of + p-value can speed up the clustering algorithm by a huge margin. + + Args: + nmesc: (NMESC) + nmesc instance. + frame_index (int): + Unique index for each segment and embedding vector + + Returns: + est_num_of_spk: (int) + The estimated number of speakers. + p_hat_value: (int) + The estimated p-value from NMESC method. + """ + if len(self.p_value_hist) == 0 or ( + frame_index < self.p_value_skip_frame_thres and frame_index % self.p_update_freq == 0 + ): + est_num_of_spk, p_hat_value = nmesc.forward() + self.p_value_hist.append(p_hat_value) + if len(self.p_value_hist) > self.p_value_queue_size: + self.p_value_hist.pop(0) + p_hat_value = max(self.p_value_hist, key=self.p_value_hist.count) + g_p, est_num_of_spk = nmesc.getEigRatio(p_hat_value) + return est_num_of_spk, p_hat_value + + def speaker_counter_buffer(self, est_num_of_spk: int) -> int: + """ + Use a queue to avoid unstable speaker counting results. + + Args: + est_num_of_spk (int): + Estimated number of speakers + """ + if type(est_num_of_spk.item()) != int: + est_num_of_spk = int(est_num_of_spk.item()) + + self.num_spk_stat.append(est_num_of_spk) + if len(self.num_spk_stat) > self.min_spk_counting_buffer_size: + self.num_spk_stat.pop(0) + num_spks_bincount = torch.bincount(torch.tensor(self.num_spk_stat)) + est_num_of_spk = torch.argmax(num_spks_bincount) + return est_num_of_spk + + def limit_frames_per_speaker(self, frame_index: int, est_num_of_spk: int) -> int: + """ + Limit the estimated number of speakers in proportion to the number of speakers. + + Args: + frame_index (int): + Unique index for each segment and embedding vector + est_num_of_spk (int): + Estimated number of speakers + Returns: + (int) Estimated number of speakers capped by `self.min_frame_per_spk` + """ + return min(est_num_of_spk, int(1 + frame_index // self.min_frame_per_spk)) + + def online_spk_num_estimation(self, mat_in: torch.Tensor, nmesc, frame_index: int) -> Tuple[int, torch.Tensor]: + """ + Online version of speaker estimation involves speaker counting buffer and application of per-speaker + frame count limit. 
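+
+        Example (illustrative):
+            With `min_frame_per_spk=15` and `frame_index=20`, the estimate is capped at
+            1 + 20 // 15 = 2 speakers, even if the raw NME-SC estimate is larger.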
+ + Args: + mat_in (Tensor): + Raw affinity matrix containing similarity values of each pair of segments + nmesc (NMESC): + NMESC class instance + frame_index (int) + Unique frame index of online processing pipeline + + Returns: + est_num_of_spk (int): + Estimated number of speakers + nmesc (NMESC): + NMESC class instance + frame_index (int): + Unique frame index of online processing pipeline + """ + est_num_of_spk, p_hat_value = self.onlineNMEanalysis(nmesc, frame_index) + affinity_mat = getAffinityGraphMat(mat_in, p_hat_value) + raw_est_num_of_spk = self.speaker_counter_buffer(est_num_of_spk) + est_num_of_spk = self.limit_frames_per_speaker(frame_index, raw_est_num_of_spk) + return est_num_of_spk, affinity_mat + + def prepare_embedding_update( + self, emb_in: torch.Tensor, base_segment_indexes: List[int] + ) -> Tuple[bool, int, torch.Tensor]: + """ + This function performs the following tasks: + 1. Decide whether to extract more embeddings or not (by setting `update_speaker_register`) + (If we need update): + 2. Calculate how many embeddings should be updated (set `new_emb_n` variable) + 3. Update history embedding vectors and save it to `pre_embs`. + + We only save the index and clustering label of each embedding. + + - Case-1: The very first step + This else statement is for the very first diarization loop. + This is the very first reduction frame. + + - Case-2: Number of embedding vectors is increased, therefore we need to update. + Since there are new embeddings, we push the same amount (new_emb_n) + of old embeddings to the history buffer. + We should also update self.history_buffer_seg_end which is a pointer. + update to history emb: emb_in[emb_idx_stt:emb_idx_end] + update to history label: self.Y_fullhist[label_stt:_end] + + - Case-3: Number of embedding vectors is decreased + If the number of embeddings is decreased compared to the last trial, + then skip embedding merging. + + Args: + emb_in (Tensor): + Tensor containing embedding vectors + Dimensions: (number of embedding vectors) x (embedding dimension) + base_segment_indexes (list): + List containing unique segment (embedding vector) index + + Returns: + update_speaker_register (bool): + Boolean indicates whether to update speaker embedding vectors. + new_emb_n (int): + The amount of embedding vectors that are exceeding FIFO queue size + pre_embs (Tensor): + Embedding vector matrix (# of embs x emb dim) before merging + """ + _segment_indexes_mat = torch.tensor(base_segment_indexes) + self.total_segments_processed_count = int(_segment_indexes_mat[-1] + 1) + hist_curr_boundary = int(self.total_segments_processed_count - self.current_n) + new_emb_n, pre_embs = None, None + update_speaker_register = True + + # Case-1: The very first step + if len(self.history_embedding_buffer_emb) == 0: + new_emb_n = self.total_segments_processed_count - (self.current_n + self.history_n) + hist_curr_boundary_emb_idx = get_first_arg_index(_segment_indexes_mat, hist_curr_boundary) + pre_embs = emb_in[:hist_curr_boundary_emb_idx] + self.pre_clus_labels = self.Y_fullhist[:hist_curr_boundary] + + # Case-2: Number of embedding vectors is increased, need to update history and its label + elif self.total_segments_processed_count > self.max_embed_count: + + # Calculate the number of new embedding vectors + label_stt, label_end = self.history_buffer_seg_end, hist_curr_boundary + new_emb_n = label_end - label_stt + assert new_emb_n > 0, "new_emb_n should be a positve integer number." 
+ + # Add embedding vectors to `pre_embs` so that we can merge it with reducer function. + emb_idx_stt = int(get_first_arg_index(_segment_indexes_mat, label_stt)) + emb_idx_end = int(get_first_arg_index(_segment_indexes_mat, label_end)) + pre_embs = torch.vstack((self.history_embedding_buffer_emb, emb_in[emb_idx_stt:emb_idx_end])) + self.pre_clus_labels = torch.hstack( + (self.history_embedding_buffer_label, self.Y_fullhist[label_stt:label_end]) + ) + + # Case-3: Number of embedding vectors is decreased + # There will be no embedding update, so new_emb_n, pre_embs should be None + else: + update_speaker_register = False + + # Update the history buffer index + self.history_buffer_seg_end = hist_curr_boundary + if new_emb_n is not None and new_emb_n < 0: + raise ValueError(f"new_emb_n should not be negative but got new_emb_n: {new_emb_n}") + return update_speaker_register, new_emb_n, pre_embs + + def make_constant_length_emb(self, emb_in: torch.Tensor, base_segment_indexes: torch.Tensor) -> torch.Tensor: + """ + This function deals with edge cases when the number of segments decreases and the number of embedding falls + short for the labels. + + - ASR decoder occasionally returns less number of words compared to the previous frame. + - In this case, we obtain fewer embedding vectors for the short period of time. To match the pre-defined + length, the last embedding vector is repeated to fill the voidness. + - The repeated embedding will be soon replaced by the actual embeddings once the system takes new frames. + + Args: + emb_in (Tensor): + If self.is_online is False: + `emb` contains only current speaker embedding inputs, which is FIFO queue + If self.is_online is True: + `emb` contains history buffer and FIFO queue + base_segment_indexes (Tensor): + Tensor containing unique segment (embedding vector) index + + Returns: + emb_curr (Tensor): + Length preserved speaker embedding vectors + """ + segment_indexes_mat = torch.tensor(base_segment_indexes) + curr_clustered_segments = torch.where(segment_indexes_mat >= self.history_buffer_seg_end)[0] + + # Check if the current buffer result is falling short compared to `self.current_n`. + if emb_in[curr_clustered_segments].shape[0] < self.current_n: + delta_count = self.current_n - emb_in[curr_clustered_segments].shape[0] + fill_in_emb = torch.tile(emb_in[curr_clustered_segments][-1], (delta_count, 1)) + emb_curr = torch.vstack((emb_in[curr_clustered_segments], fill_in_emb)) + else: + emb_curr = emb_in[curr_clustered_segments] + return emb_curr + + def reduce_embedding_sets( + self, emb_in: torch.Tensor, base_segment_indexes: torch.Tensor + ) -> Tuple[torch.Tensor, bool]: + """ + Merge the given embedding vectors based on the calculate affinity matrix. + + Args: + emb_in (Tensor): + If self.is_online is False: + `emb` contains only current speaker embedding inputs, which is FIFO queue + If self.is_online is True: + `emb` contains history buffer and FIFO queue + base_segment_indexes (Tensor): + Tensor containing unique segment (embedding vector) index + + Returns: + history_embedding_buffer_emb (Tensor): + Matrix containing merged embedding vectors of the previous frames. + This matrix is referred to as "history buffer" in this class. 
+ update_speaker_register (bool): + Boolean indicates whether to update speaker + + Example: + + at the frame index where `is_online` turns to True: + + |---hist-buffer---|-----FIFO-queue-----| + + self.history_n = 10 + self.current_n = 20 + + Step (1) + |-----------------|ABCDEF--------------| + + If we get two more segments, "NN" as in the description: + history buffer = 10 + current buffer = 22 + + Step (2) + |-----------------|ABCDEF--------------XY| + + The newly accepted embeddings go through a FIFO queue (first embedding, first merged) + history buffer = 12 + current buffer = 20 + + Step (3) + |-----------------AB|CDEF--------------XY| + + After merging (reducing) the embedding set: + history buffer = 10 + current buffer = 20 + + Step (4) + |================|CDEF--------------XY| + + After clustering: + + |0000011111|11110000110010010011| + + This label is self.Y_fullhist (shape is (history_n + current_n) ) + + self.history_buffer_seg_end (int): + The total number of segments that have been merged from the beginning of the session. + (=hist_curr_boundary) + + """ + update_speaker_register, new_emb_n, pre_embs = self.prepare_embedding_update(emb_in, base_segment_indexes) + + # Update the history/current_buffer boundary cursor + total_emb, total_cluster_labels = [], [] + + if update_speaker_register: + + # Calculate how many embedding vectors should be reduced per speaker + class_target_vol = get_merge_quantity( + num_to_be_removed=new_emb_n, + pre_clus_labels=self.pre_clus_labels, + min_count_per_cluster=self._minimum_segments_per_buffer, + ) + + # Merge the segments in the history buffer + for spk_idx, target_num in enumerate(list(class_target_vol)): + merged_embs, merged_clus_labels, _ = run_reducer( + pre_embs=pre_embs, + target_spk_idx=spk_idx, + merge_quantity=target_num, + pre_clus_labels=self.pre_clus_labels, + ) + total_emb.append(merged_embs) + total_cluster_labels.append(merged_clus_labels) + + self.history_embedding_buffer_emb = torch.vstack(total_emb) + self.history_embedding_buffer_label = torch.hstack(total_cluster_labels) + if self.history_embedding_buffer_emb.shape[0] != self.history_n: + raise ValueError("History embedding size is not maintained correctly.") + + else: + total_emb.append(self.history_embedding_buffer_emb) + total_cluster_labels.append(self.history_embedding_buffer_label) + + # `emb_curr` is the incumbent set of embeddings which is the the latest. + emb_curr = self.make_constant_length_emb(emb_in, base_segment_indexes) + total_emb.append(emb_curr) + + # Before perform clustering, we attach the current_n number of estimated speaker labels + # from the previous clustering result. + total_cluster_labels.append(self.Y_fullhist[-self.current_n :]) + + history_and_current_emb = torch.vstack(total_emb) + history_and_current_labels = torch.hstack(total_cluster_labels) + if history_and_current_emb.shape[0] != len(history_and_current_labels): + raise ValueError("history_and_current_emb has a mismatch with history_and_current_labels.") + + self.max_embed_count = max(self.total_segments_processed_count, self.max_embed_count) + return history_and_current_emb, update_speaker_register + + def get_reduced_mat(self, emb_in, base_segment_indexes) -> Tuple[torch.Tensor, bool]: + """ + Choose whether we want to add embeddings to the memory or not. + The processing buffer has size of (self.current_n + self.history_n). + + 1. If margin_seg_n > 0, this means we have more embedding vectors than we can hold in the processing buffer. 
+ - `is_online` should be `True` + - reduce the number of embedding vectors by merging the closest ones. + call `self.reduce_embedding_sets` function + + 2. If margin_seg_n <= 0, this means that we can accept more embedding vectors and yet to fill the processing buffer. + - `is_online` should be `False` + - We replace `merged_emb` variable with the raw input `emb_in`. + - `add_new` is `True`, since we are adding more embedding vectors to `merged_emb` variable. + + Args: + emb_in (Tensor): + If self.is_online is False: + `emb` contains only current speaker embedding inputs + base_segment_indexes (Tensor): + Tensor containing unique segment (embedding vector) index + + Returns: + merged_emb (Tensor): + Matrix containing merged embedding vectors of the previous frames. + This matrix is referred to as "history buffer" in this class. + add_new (bool): + Boolean that indicates whether there is a new set of segments. Depending on the VAD timestamps, + the number of subsegments can be ocassionally decreased. If `add_new=True`, then it adds the newly + acquired cluster labels. + + """ + margin_seg_n = emb_in.shape[0] - (self.current_n + self.history_n) + if margin_seg_n > 0: + self.is_online = True + merged_emb, add_new = self.reduce_embedding_sets(emb_in, base_segment_indexes) + else: + self.is_online = False + merged_emb = emb_in + add_new = True + return merged_emb, add_new + + def match_labels(self, Y_new: torch.Tensor, add_new: bool) -> torch.Tensor: + """ + self.history_buffer_seg_end is a timestamp that tells to which point is history embedding contains from self.Y_fullhist. + If embedding reducing is done correctly, we should discard (0, self.history_n) amount and take + (self.history_n, len(Y_new) ) from the new clustering output Y_new. + + Args: + Y_new (Tensor): + The newly generated clustering label sequence that may have different permutations with the existing + speaker labels in the history buffer. + add_new (bool): + This variable indicates whether there is a new set of segments. Depending on the VAD timestamps, + the number of subsegments can be ocassionally decreased. If `add_new=True`, then it adds the newly + acquired cluster labels. + + Returns: + Y_out (Tensor): + Permutation-matched speaker labels based on history buffer + """ + if self.is_online: + # Online clustering mode with history buffer + Y_old = torch.hstack((self.history_embedding_buffer_label, self.Y_fullhist[self.history_buffer_seg_end :])) + + # Stitch the old history and new cluster labels + Y_matched = stitch_cluster_labels(Y_old=Y_old, Y_new=Y_new, with_history=True).to(Y_new.device) + + if add_new: + if Y_matched[self.history_n :].shape[0] != self.current_n: + raise ValueError("Update point sync is not correct.") + # Concatenate the newly generated speaker labels + Y_out = torch.hstack((self.Y_fullhist[: self.history_buffer_seg_end], Y_matched[self.history_n :])) + self.Y_fullhist = Y_out + else: + # Do not update cumulative labels since there are no new segments. + Y_out = self.Y_fullhist[: Y_new.shape[0]] + else: + # If no memory is used, offline clustering is applied. + Y_out = stitch_cluster_labels(Y_old=self.Y_fullhist, Y_new=Y_new, with_history=False).to(Y_new.device) + self.Y_fullhist = Y_out + return Y_out + + def forward_infer( + self, emb: torch.Tensor, frame_index: int, enhanced_count_thres: int = 40, cuda: bool = False, + ) -> torch.Tensor: + """ + Perform speaker clustering in online mode. Embedding vector set `emb` is expected to be containing + history embeddings to count the number of speakers. 
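+
+        Example (illustrative calling sequence; `emb_in`, `base_segment_indexes` and
+        `frame_index` are hypothetical inputs from a streaming pipeline):
+            >>> clus = OnlineSpeakerClustering(max_num_speakers=8)
+            >>> merged_emb, add_new = clus.get_reduced_mat(emb_in, base_segment_indexes)
+            >>> Y = clus.forward_infer(merged_emb, frame_index=frame_index)
+            >>> Y_out = clus.match_labels(Y, add_new)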
+ + Args: + emb (Tensor): + If self.is_online is False: + `emb` contains only current speaker embedding inputs + If self.is_online is True: + `emb` is a concatenated matrix with history embedding and current embedding inputs + frame_index (int): + Unique index for each segment (also each embedding vector) + cuda (bool): + Boolean that determines whether cuda is used or not + device (torch.device): + `torch.device` variable + + Returns: + Y (Tensor): + Speaker labels for history embeddings and current embedding inputs + """ + if emb.shape[0] == 1: + Y = torch.zeros((1,), dtype=torch.int32) + + else: + mat = getCosAffinityMatrix(emb) + nmesc = NMESC( + mat, + max_num_speakers=self.max_num_speakers, + max_rp_threshold=self.max_rp_threshold, + sparse_search=True, + maj_vote_spk_count=False, + sparse_search_volume=self.sparse_search_volume, + fixed_thres=self.fixed_thres, + nme_mat_size=256, + device=emb.device, + ) + est_num_of_spk, affinity_mat = self.online_spk_num_estimation(mat, nmesc, frame_index) + spectral_model = SpectralClustering(n_clusters=est_num_of_spk, cuda=cuda, device=emb.device) + Y = spectral_model.forward(affinity_mat).to(emb.device) + return Y diff --git a/nemo/collections/asr/parts/utils/speaker_utils.py b/nemo/collections/asr/parts/utils/speaker_utils.py index f5cb7bce60b7..396cef812ca9 100644 --- a/nemo/collections/asr/parts/utils/speaker_utils.py +++ b/nemo/collections/asr/parts/utils/speaker_utils.py @@ -27,7 +27,7 @@ from pyannote.core import Annotation, Segment from tqdm import tqdm -from nemo.collections.asr.parts.utils.nmesc_clustering import SpeakerClustering, get_argmin_mat, split_input_data +from nemo.collections.asr.parts.utils.offline_clustering import SpeakerClustering, get_argmin_mat, split_input_data from nemo.utils import logging diff --git a/tests/collections/asr/test_diar_utils.py b/tests/collections/asr/test_diar_utils.py index afc99bf87ab2..c3d4eca0a8ba 100644 --- a/tests/collections/asr/test_diar_utils.py +++ b/tests/collections/asr/test_diar_utils.py @@ -13,18 +13,29 @@ # limitations under the License. 
import os -from itertools import permutations import numpy as np import pytest import torch -from nemo.collections.asr.parts.utils.nmesc_clustering import SpeakerClustering -from nemo.collections.asr.parts.utils.speaker_utils import ( - combine_float_overlaps, - combine_int_overlaps, - get_subsegments, +from nemo.collections.asr.parts.utils.offline_clustering import ( + SpeakerClustering, + get_scale_interpolated_embs, + getCosAffinityMatrix, + split_input_data, ) +from nemo.collections.asr.parts.utils.online_clustering import ( + OnlineSpeakerClustering, + get_closest_embeddings, + get_merge_quantity, + get_minimal_indices, + merge_vectors, + run_reducer, + stitch_cluster_labels, +) +from nemo.collections.asr.parts.utils.speaker_utils import get_subsegments + +MAX_SEED_COUNT = 2 def check_range_values(target, source): @@ -35,130 +46,289 @@ def check_range_values(target, source): return all(bool_list) -def matrix(mat, torch=True): - if torch: - return torch.tensor(mat) +def check_labels(target, source): + bool_list = [] + for x, y in zip(target, source): + bool_list.append(abs(x - y) < 1e-6) + return all(bool_list) + + +def matrix(mat, use_tensor=True, dtype=torch.long): + if use_tensor: + mat = torch.Tensor(mat).to(dtype) else: - return np.array(mat) + mat = np.array(mat) + return mat -def generate_mock_emb(n_emb_per_spk, perturb_sigma, emb_dim): - """Generate a set of artificial embedding vectors from random numbers +def generate_orthogonal_embs(total_spks, perturb_sigma, emb_dim): + """Generate a set of artificial orthogonal embedding vectors from random numbers """ - return torch.rand(1, emb_dim).repeat(n_emb_per_spk, 1) + perturb_sigma * torch.rand(n_emb_per_spk, emb_dim) + gaus = torch.randn(emb_dim, emb_dim) + _svd = torch.linalg.svd(gaus) + orth = _svd[0] @ _svd[2] + orth_embs = orth[:total_spks] + # Assert orthogonality + assert torch.abs(getCosAffinityMatrix(orth_embs) - torch.diag(torch.ones(total_spks))).sum() < 1e-4 + return orth_embs -def generate_mock_data( +def generate_toy_data( n_spks=2, spk_dur=3, emb_dim=192, - perturb_sigma=0.01, + perturb_sigma=0.0, ms_window=[1.5, 1.0, 0.5], ms_shift=[0.75, 0.5, 0.25], torch_seed=0, ): - torch.manual_seed(torch_seed) - segment_lists = [] spk_timestamps = [(spk_dur * k, spk_dur) for k in range(n_spks)] emb_list, seg_list = [], [] multiscale_segment_counts = [0 for _ in range(len(ms_window))] + ground_truth = [] + random_orthogonal_embs = generate_orthogonal_embs(n_spks, perturb_sigma, emb_dim) for scale_idx, (window, shift) in enumerate(zip(ms_window, ms_shift)): for spk_idx, (offset, dur) in enumerate(spk_timestamps): - segments = get_subsegments(offset=offset, window=window, shift=shift, duration=dur) - emb = generate_mock_emb(n_emb_per_spk=len(segments), perturb_sigma=perturb_sigma, emb_dim=emb_dim,) + segments_stt_dur = get_subsegments(offset=offset, window=window, shift=shift, duration=dur) + segments = [[x[0], x[0] + x[1]] for x in segments_stt_dur] + emb_cent = random_orthogonal_embs[spk_idx, :] + emb = emb_cent.tile((len(segments), 1)) + 0.1 * torch.rand(len(segments), emb_dim) seg_list.extend(segments) emb_list.append(emb) multiscale_segment_counts[scale_idx] += emb.shape[0] + if scale_idx == len(multiscale_segment_counts) - 1: + ground_truth.extend([spk_idx] * emb.shape[0]) + emb_tensor = torch.concat(emb_list) multiscale_segment_counts = torch.tensor(multiscale_segment_counts) segm_tensor = torch.tensor(seg_list) multiscale_weights = torch.ones(len(ms_window)).unsqueeze(0) - return emb_tensor, segm_tensor, 
multiscale_segment_counts, multiscale_weights, spk_timestamps - - -@pytest.mark.run_only_on('GPU') -def test_speaker_counting(n_spks=3, total_dur_sec=30, num_speakers=-1, max_num_speakers=5, cuda=True): - speaker_clustering_python = SpeakerClustering(maj_vote_spk_count=False, cuda=cuda) - assert isinstance(speaker_clustering_python, SpeakerClustering) - each_spk_dur = float(total_dur_sec / n_spks) - em, ts, mc, mw, spk_ts = generate_mock_data(n_spks=n_spks, spk_dur=each_spk_dur) - Y = speaker_clustering_python.forward_infer( - embeddings_in_scales=em, - timestamps_in_scales=ts, - multiscale_segment_counts=mc, - multiscale_weights=mw, - oracle_num_speakers=num_speakers, - max_num_speakers=max_num_speakers, - ) - return len(set(Y.tolist())) + ground_truth = torch.tensor(ground_truth) + return emb_tensor, segm_tensor, multiscale_segment_counts, multiscale_weights, spk_timestamps, ground_truth class TestDiarizationUtilFunctions: + """Tests diarization and speaker-task related utils. """ - Tests diarization and speaker-task related utils. - Test functions include: - - Segment interval merging function - - Embedding merging - """ @pytest.mark.unit - def test_combine_float_overlaps(self): - intervals = [[0.25, 1.7], [1.5, 3.0], [2.8, 5.0], [5.5, 10.0]] - target = [[0.25, 5.0], [5.5, 10.0]] - merged = combine_float_overlaps(intervals) - assert check_range_values(target, merged) + @pytest.mark.parametrize("Y", [[3, 3, 3, 4, 4, 5], [100, 100, 100, 104, 104, 1005]]) + @pytest.mark.parametrize("target", [[0, 0, 0, 1, 1, 2]]) + @pytest.mark.parametrize("offset", [1, 10]) + def test_minimal_index_ex2(self, Y, target, offset): + Y = torch.tensor(Y) + target = torch.tensor(target) + min_Y = get_minimal_indices(Y) + assert check_labels(target, min_Y) + min_Y = get_minimal_indices(Y + offset) + assert check_labels(target, min_Y) + + @pytest.mark.parametrize("Y", [[4, 0, 0, 5, 4, 5], [14, 12, 12, 19, 14, 19]]) + @pytest.mark.parametrize("target", [[1, 0, 0, 2, 1, 2]]) + @pytest.mark.parametrize("offset", [1, 10]) + def test_minimal_index_ex2(self, Y, target, offset): + Y = torch.tensor(Y) + target = torch.tensor(target) + min_Y = get_minimal_indices(Y) + assert check_labels(target, min_Y) + min_Y = get_minimal_indices(Y + offset) + assert check_labels(target, min_Y) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 4, 16, 64]) + def test_minimal_index_same(self, N): + Y = matrix([0] * N + [1] * N + [2] * N) + min_Y = get_minimal_indices(Y) + target = matrix([0] * N + [1] * N + [2] * N) + assert check_labels(target, min_Y) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 4, 16, 64]) + def test_stitch_cluster_labels_label_switch(self, N): + Y_old = matrix([0] * N) + Y_new = matrix([0] * N) + 1 + target = matrix([0] * N) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 4, 16, 64]) + def test_stitch_cluster_labels_label_many_to_one(self, N): + Y_old = matrix(np.arange(N).tolist()) + Y_new = matrix([0] * N) + target = matrix([0] * N) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 4, 16, 64]) + def test_stitch_cluster_labels_label_one_to_many(self, N): + Y_old = matrix(np.arange(N).tolist()) + Y_new = matrix([k for k in range(N)]) + target = matrix([k for k in range(N)]) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 
4, 16, 64]) + def test_stitch_cluster_labels_one_label_replaced(self, N): + Y_old = matrix([0] * N + [1] * N + [2] * N) + Y_new = matrix([1] * N + [2] * N + [3] * N) + target = matrix([0] * N + [1] * N + [2] * N) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 4, 16, 64]) + def test_stitch_cluster_labels_confusion_error(self, N): + Y_old = matrix([0] * N + [1] * (N - 1) + [2] * (N + 1)) + Y_new = matrix([1] * N + [2] * N + [3] * N) + target = matrix([0] * N + [1] * N + [2] * N) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 256]) + def test_stitch_cluster_labels_speaker_more_speakers(self, N): + Y_old = matrix([0] * N + [1] * (N - 1) + [2] * (N + 1) + [0, 0, 0]) + Y_new = matrix([1] * N + [0] * N + [2] * N + [4, 5, 6]) + target = matrix([0] * N + [1] * N + [2] * N + [3, 4, 5]) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("N", [2, 256]) + def test_stitch_cluster_labels_speaker_longer_sequence(self, N): + Y_old = matrix([0] * N + [1] * N + [2] * N + [0, 0, 0] * N) + Y_new = matrix([1] * N + [2] * N + [0] * N + [1, 2, 3, 1, 2, 3] * N) + target = matrix([0] * N + [1] * N + [2] * N + [0, 1, 3, 0, 1, 3] * N) + result = stitch_cluster_labels(Y_old, Y_new) + assert check_labels(target, result) + + @pytest.mark.unit + @pytest.mark.parametrize("n_spks", [2, 3, 4, 5]) + @pytest.mark.parametrize("merge_quantity", [2, 3]) + def test_embedding_merger(self, n_spks, merge_quantity): + em, ts, mc, mw, spk_ts, gt = generate_toy_data(n_spks, spk_dur=5, perturb_sigma=10) + em_s, ts_s = split_input_data(em, ts, mc) + target_speaker_index = 0 + pre_clus_labels = gt + ndx = torch.where(pre_clus_labels == target_speaker_index)[0] + pre_embs = em_s[-1] + affinity_mat = getCosAffinityMatrix(pre_embs) + cmat = affinity_mat[:, ndx][ndx, :] + # Check the dimension of the selected affinity values + assert cmat.shape[0] == cmat.shape[1] == torch.sum(pre_clus_labels == target_speaker_index).item() + index_2d, rest_inds = get_closest_embeddings(cmat, merge_quantity) + # Check the most closest affinity value + assert torch.max(cmat.sum(0)) == cmat.sum(0)[index_2d[0]] + spk_cluster_labels, emb_ndx = pre_clus_labels[ndx], pre_embs[ndx] + merged_embs, merged_clus_labels = merge_vectors(index_2d, emb_ndx, spk_cluster_labels) + # Check the number of merged embeddings and labels + assert (torch.sum(gt == target_speaker_index).item() - merge_quantity) == merged_clus_labels.shape[0] + + @pytest.mark.unit + @pytest.mark.parametrize("n_spks", [1, 8]) + @pytest.mark.parametrize("spk_dur", [0.2, 0.25, 0.5, 1, 10]) + def test_cosine_affinity_calculation(self, n_spks, spk_dur): + em, ts, mc, mw, spk_ts, gt = generate_toy_data(n_spks=n_spks, spk_dur=spk_dur) + em_s, ts_s = split_input_data(em, ts, mc) + affinity_mat = getCosAffinityMatrix(em_s[-1]) + # affinity_mat should not contain any nan element + assert torch.any(torch.isnan(affinity_mat)) == False + + @pytest.mark.unit + @pytest.mark.parametrize("n_spks", [1, 8]) + @pytest.mark.parametrize("spk_dur", [0.2, 0.25, 0.5, 1, 10]) + def test_cosine_affinity_calculation_scale_interpol(self, n_spks, spk_dur): + em, ts, mc, mw, spk_ts, gt = generate_toy_data(n_spks=n_spks, spk_dur=spk_dur) + em_s, ts_s = split_input_data(em, ts, mc) + embs, _ = get_scale_interpolated_embs(mw, em_s, ts_s) + affinity_mat = 
getCosAffinityMatrix(embs) + # affinity_mat should not contain any nan element + assert torch.any(torch.isnan(affinity_mat)) == False + + @pytest.mark.unit + @pytest.mark.parametrize("n_spks", [4, 5, 6]) + @pytest.mark.parametrize("target_speaker_index", [0, 1, 2]) + @pytest.mark.parametrize("merge_quantity", [2, 3]) + def test_embedding_reducer(self, n_spks, target_speaker_index, merge_quantity): + em, ts, mc, mw, spk_ts, gt = generate_toy_data(n_spks=n_spks, spk_dur=10) + em_s, ts_s = split_input_data(em, ts, mc) + merged_embs, merged_clus_labels, _ = run_reducer( + pre_embs=em_s[-1], target_spk_idx=target_speaker_index, merge_quantity=merge_quantity, pre_clus_labels=gt, + ) + assert (torch.sum(gt == target_speaker_index).item() - merge_quantity) == merged_clus_labels.shape[0] + + @pytest.mark.unit + @pytest.mark.parametrize("ntbr", [3]) + @pytest.mark.parametrize("pcl", [torch.tensor([0] * 70 + [1] * 32)]) + @pytest.mark.parametrize("mspb", [25]) + def test_merge_scheduler_2clus(self, ntbr, pcl, mspb): + class_target_vol = get_merge_quantity(num_to_be_removed=ntbr, pre_clus_labels=pcl, min_count_per_cluster=mspb,) + assert all(class_target_vol == torch.tensor([3, 0])) @pytest.mark.unit - def test_combine_int_overlaps(self): - intervals = [[1, 3], [2, 6], [8, 10], [15, 18]] - target = [[1, 6], [8, 10], [15, 18]] - merged = combine_int_overlaps(intervals) - assert check_range_values(target, merged) + @pytest.mark.parametrize("ntbr", [3]) + @pytest.mark.parametrize("pcl", [torch.tensor([0] * 80 + [1] * 35 + [2] * 32)]) + @pytest.mark.parametrize("mspb", [0, 25]) + def test_merge_scheduler_3clus(self, ntbr, pcl, mspb): + class_target_vol = get_merge_quantity(num_to_be_removed=ntbr, pre_clus_labels=pcl, min_count_per_cluster=mspb,) + assert all(class_target_vol == torch.tensor([3, 0, 0])) @pytest.mark.unit - def test_combine_int_overlaps_edge(self): - intervals = [[1, 4], [4, 5]] - target = [[1, 5]] - merged = combine_int_overlaps(intervals) - assert check_range_values(target, merged) + @pytest.mark.parametrize("ntbr", [132 - 45]) + @pytest.mark.parametrize("pcl", [torch.tensor([2] * 70 + [0] * 32 + [1] * 27 + [3] * 3)]) + @pytest.mark.parametrize("mspb", [3, 10]) + def test_merge_scheduler_4clus_shuff(self, ntbr, pcl, mspb): + class_target_vol = get_merge_quantity(num_to_be_removed=ntbr, pre_clus_labels=pcl, min_count_per_cluster=mspb,) + assert all(class_target_vol == torch.tensor([18, 13, 56, 0])) - def test_embedding_merge(self): - # TODO - pass + @pytest.mark.unit + @pytest.mark.parametrize("ntbr", [3]) + @pytest.mark.parametrize("pcl", [torch.tensor([0] * 5 + [1] * 4 + [2] * 3)]) + @pytest.mark.parametrize("mspb", [0, 2]) + def test_merge_scheduler_3clus(self, ntbr, pcl, mspb): + class_target_vol = get_merge_quantity(num_to_be_removed=ntbr, pre_clus_labels=pcl, min_count_per_cluster=mspb,) + assert all(class_target_vol == torch.tensor([2, 1, 0])) + + @pytest.mark.unit + @pytest.mark.parametrize("ntbr", [2]) + @pytest.mark.parametrize("pcl", [torch.tensor([0] * 7 + [1] * 5 + [2] * 3 + [3] * 5)]) + @pytest.mark.parametrize("mspb", [2]) + def test_merge_scheduler_3clus_repeat(self, ntbr, pcl, mspb): + class_target_vol = get_merge_quantity(num_to_be_removed=ntbr, pre_clus_labels=pcl, min_count_per_cluster=mspb,) + assert all(class_target_vol == torch.tensor([2, 0, 0, 0])) class TestSpeakerClustering: """ Test speaker clustering module - Test functions include: - - script module export - - speaker counting feature """ @pytest.mark.run_only_on('GPU') @pytest.mark.unit - def 
test_clus_script_export(self): + @pytest.mark.parametrize("n_spks", [1, 2]) + def test_clus_script_export(self, n_spks, total_dur_sec=30): exported_filename = 'speaker_clustering_script.pt' speaker_clustering_python = SpeakerClustering(maj_vote_spk_count=False, cuda=True) speaker_clustering_scripted_source = torch.jit.script(speaker_clustering_python) torch.jit.save(speaker_clustering_scripted_source, exported_filename) - speaker_clustering_scripted = torch.load(exported_filename) + speaker_clustering_scripted = torch.jit.load(exported_filename) assert os.path.exists(exported_filename) os.remove(exported_filename) assert not os.path.exists(exported_filename) - total_dur_sec = 30 - n_spks = 3 - each_spk_dur = float(total_dur_sec / n_spks) - em, ts, mc, mw, spk_ts = generate_mock_data(n_spks=n_spks, spk_dur=each_spk_dur) + each_spk_dur = float(total_dur_sec / n_spks) + em, ts, mc, mw, _, _ = generate_toy_data(n_spks=n_spks, spk_dur=each_spk_dur) num_speakers = -1 max_num_speakers = 8 + enhanced_count_thres = 80 sparse_search_volume = 10 max_rp_threshold = 0.15 fixed_thres = -1.0 - enhanced_count_thres = 80 # Function call for NeMo python pipeline (unexported) in python Y_py = speaker_clustering_python.forward_infer( @@ -193,10 +363,10 @@ def test_clus_script_export(self): 'timestamps': ts, 'multiscale_segment_counts': mc, 'multiscale_weights': mw, - 'oracle_num_speakers': torch.LongTensor([num_speakers]), - 'max_num_speakers': torch.LongTensor([max_num_speakers]), - 'enhanced_count_thres': torch.LongTensor([enhanced_count_thres]), - 'sparse_search_volume': torch.LongTensor([sparse_search_volume]), + 'oracle_num_speakers': torch.Tensor([num_speakers]), + 'max_num_speakers': torch.Tensor([max_num_speakers]), + 'enhanced_count_thres': torch.Tensor([enhanced_count_thres]), + 'sparse_search_volume': torch.Tensor([sparse_search_volume]), 'max_rp_threshold': torch.tensor([max_rp_threshold]), 'fixed_thres': torch.tensor([fixed_thres]), } @@ -212,48 +382,213 @@ def test_clus_script_export(self): @pytest.mark.run_only_on('CPU') @pytest.mark.unit - def test_speaker_counting_1_cpu(self, n_spks=1): - est_n_spk = test_speaker_counting(n_spks=n_spks, total_dur_sec=10, cuda=False) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" - est_n_spk = test_speaker_counting(n_spks=n_spks, total_dur_sec=20, cuda=False) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" + @pytest.mark.parametrize("n_spks", [1, 2]) + @pytest.mark.parametrize("spk_dur", [20]) + @pytest.mark.parametrize("SSV", [10]) + @pytest.mark.parametrize("perturb_sigma", [0.1]) + def test_offline_speaker_clustering_cpu(self, n_spks, spk_dur, SSV, perturb_sigma): + em, ts, mc, mw, spk_ts, gt = generate_toy_data(n_spks=n_spks, spk_dur=spk_dur, perturb_sigma=perturb_sigma) + offline_speaker_clustering = SpeakerClustering(maj_vote_spk_count=False, cuda=False) + assert isinstance(offline_speaker_clustering, SpeakerClustering) + + Y_out = offline_speaker_clustering.forward_infer( + embeddings_in_scales=em, + timestamps_in_scales=ts, + multiscale_segment_counts=mc, + multiscale_weights=mw, + oracle_num_speakers=-1, + max_num_speakers=8, + enhanced_count_thres=40, + sparse_search_volume=SSV, + max_rp_threshold=0.15, + fixed_thres=-1.0, + ) + permuted_Y = stitch_cluster_labels(Y_old=gt, Y_new=Y_out) - @pytest.mark.run_only_on('GPU') - @pytest.mark.unit - def test_speaker_counting_1spk_gpu(self, n_spks=1): - est_n_spk = test_speaker_counting(n_spks=n_spks, total_dur_sec=10) - assert 
est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" - est_n_spk = test_speaker_counting(n_spks=n_spks, total_dur_sec=20) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" + # mc[-1] is the number of base scale segments + assert len(set(permuted_Y.tolist())) == n_spks + assert Y_out.shape[0] == mc[-1] + assert all(permuted_Y == gt) @pytest.mark.run_only_on('GPU') @pytest.mark.unit - def test_speaker_counting_2spk_gpu(self, n_spks=2): - est_n_spk = test_speaker_counting(n_spks=n_spks) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" + @pytest.mark.parametrize("n_spks", [1, 2, 3, 4, 5, 6, 7]) + @pytest.mark.parametrize("total_sec", [30]) + @pytest.mark.parametrize("SSV", [10]) + @pytest.mark.parametrize("perturb_sigma", [0.1]) + @pytest.mark.parametrize("seed", np.arange(MAX_SEED_COUNT).tolist()) + def test_offline_speaker_clustering_gpu(self, n_spks, total_sec, SSV, perturb_sigma, seed): + spk_dur = total_sec / n_spks + em, ts, mc, mw, spk_ts, gt = generate_toy_data( + n_spks=n_spks, spk_dur=spk_dur, perturb_sigma=perturb_sigma, torch_seed=seed + ) + offline_speaker_clustering = SpeakerClustering(maj_vote_spk_count=False, cuda=True) + assert isinstance(offline_speaker_clustering, SpeakerClustering) - @pytest.mark.run_only_on('GPU') + Y_out = offline_speaker_clustering.forward_infer( + embeddings_in_scales=em, + timestamps_in_scales=ts, + multiscale_segment_counts=mc, + multiscale_weights=mw, + oracle_num_speakers=-1, + max_num_speakers=8, + enhanced_count_thres=40, + sparse_search_volume=SSV, + max_rp_threshold=0.15, + fixed_thres=-1.0, + ) + permuted_Y = stitch_cluster_labels(Y_old=gt, Y_new=Y_out) + + # mc[-1] is the number of base scale segments + assert len(set(permuted_Y.tolist())) == n_spks + assert Y_out.shape[0] == mc[-1] + assert all(permuted_Y == gt) + + @pytest.mark.run_only_on('CPU') @pytest.mark.unit - def test_speaker_counting_3spk_gpu(self, n_spks=3): - est_n_spk = test_speaker_counting(n_spks=n_spks) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" + @pytest.mark.parametrize("n_spks", [1]) + @pytest.mark.parametrize("spk_dur", [0.25, 0.5, 0.75, 1, 1.5, 2]) + @pytest.mark.parametrize("SSV", [5]) + @pytest.mark.parametrize("enhanced_count_thres", [40]) + @pytest.mark.parametrize("min_samples_for_nmesc", [6]) + @pytest.mark.parametrize("seed", np.arange(MAX_SEED_COUNT).tolist()) + def test_offline_speaker_clustering_very_short_cpu( + self, n_spks, spk_dur, SSV, enhanced_count_thres, min_samples_for_nmesc, seed, + ): + em, ts, mc, mw, spk_ts, gt = generate_toy_data( + n_spks=n_spks, spk_dur=spk_dur, perturb_sigma=0.1, torch_seed=seed + ) + offline_speaker_clustering = SpeakerClustering(maj_vote_spk_count=False, min_samples_for_nmesc=0, cuda=False) + assert isinstance(offline_speaker_clustering, SpeakerClustering) + Y_out = offline_speaker_clustering.forward_infer( + embeddings_in_scales=em, + timestamps_in_scales=ts, + multiscale_segment_counts=mc, + multiscale_weights=mw, + oracle_num_speakers=-1, + max_num_speakers=8, + enhanced_count_thres=enhanced_count_thres, + sparse_search_volume=SSV, + max_rp_threshold=0.15, + fixed_thres=-1.0, + ) + permuted_Y = stitch_cluster_labels(Y_old=gt, Y_new=Y_out) + + # mc[-1] is the number of base scale segments + assert len(set(permuted_Y.tolist())) == n_spks + assert Y_out.shape[0] == mc[-1] + assert all(permuted_Y == gt) @pytest.mark.run_only_on('GPU') @pytest.mark.unit - def 
test_speaker_counting_4spk_gpu(self, n_spks=4): - est_n_spk = test_speaker_counting(n_spks=n_spks) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" + @pytest.mark.parametrize("n_spks", [1]) + @pytest.mark.parametrize("spk_dur", [0.25, 0.5, 0.75, 1, 2, 4]) + @pytest.mark.parametrize("SSV", [5]) + @pytest.mark.parametrize("enhanced_count_thres", [40]) + @pytest.mark.parametrize("min_samples_for_nmesc", [6]) + @pytest.mark.parametrize("seed", np.arange(MAX_SEED_COUNT).tolist()) + def test_offline_speaker_clustering_very_short_gpu( + self, n_spks, spk_dur, SSV, enhanced_count_thres, min_samples_for_nmesc, seed + ): + em, ts, mc, mw, spk_ts, gt = generate_toy_data( + n_spks=n_spks, spk_dur=spk_dur, perturb_sigma=0.1, torch_seed=seed + ) + offline_speaker_clustering = SpeakerClustering(maj_vote_spk_count=False, min_samples_for_nmesc=0, cuda=True) + assert isinstance(offline_speaker_clustering, SpeakerClustering) + Y_out = offline_speaker_clustering.forward_infer( + embeddings_in_scales=em, + timestamps_in_scales=ts, + multiscale_segment_counts=mc, + multiscale_weights=mw, + oracle_num_speakers=-1, + max_num_speakers=8, + enhanced_count_thres=enhanced_count_thres, + sparse_search_volume=SSV, + max_rp_threshold=0.15, + fixed_thres=-1.0, + ) + permuted_Y = stitch_cluster_labels(Y_old=gt, Y_new=Y_out) + + # mc[-1] is the number of base scale segments + assert Y_out.shape[0] == mc[-1] + assert all(permuted_Y == gt) @pytest.mark.run_only_on('GPU') @pytest.mark.unit - def test_speaker_counting_5spk_gpu(self, n_spks=5): - est_n_spk = test_speaker_counting(n_spks=n_spks) - assert est_n_spk == n_spks, f"Clustering test failed at n_spks={n_spks} speaker test" + @pytest.mark.parametrize("n_spks", [1, 2, 3, 4]) + @pytest.mark.parametrize("total_sec", [30]) + @pytest.mark.parametrize("buffer_size", [30]) + @pytest.mark.parametrize("sigma", [0.1]) + @pytest.mark.parametrize("seed", np.arange(MAX_SEED_COUNT).tolist()) + def test_online_speaker_clustering(self, n_spks, total_sec, buffer_size, sigma, seed): + step_per_frame = 2 + spk_dur = total_sec / n_spks + em, ts, mc, _, _, gt = generate_toy_data(n_spks, spk_dur=spk_dur, perturb_sigma=sigma, torch_seed=seed) + em_s, ts_s = split_input_data(em, ts, mc) + + emb_gen = em_s[-1] + segment_indexes = ts_s[-1] + if torch.cuda.is_available(): + emb_gen, segment_indexes = emb_gen.to("cuda"), segment_indexes.to("cuda") + cuda = True + else: + cuda = False + + history_buffer_size = buffer_size + current_buffer_size = buffer_size + + online_clus = OnlineSpeakerClustering( + max_num_speakers=8, + max_rp_threshold=0.15, + sparse_search_volume=30, + history_buffer_size=history_buffer_size, + current_buffer_size=current_buffer_size, + ) - def test_offline_clustering(self): - # TODO - pass + n_frames = int(emb_gen.shape[0] / step_per_frame) + evaluation_list = [] + + # Simulate online speaker clustering + for frame_index in range(n_frames): + curr_emb = emb_gen[0 : (frame_index + 1) * step_per_frame] + base_segment_indexes = np.arange(curr_emb.shape[0]) + + # Save history embeddings + concat_emb, add_new = online_clus.get_reduced_mat( + emb_in=curr_emb, base_segment_indexes=base_segment_indexes + ) + + # Check history_buffer_size and history labels + assert ( + online_clus.history_embedding_buffer_emb.shape[0] <= history_buffer_size + ), "History buffer size error" + assert ( + online_clus.history_embedding_buffer_emb.shape[0] + == online_clus.history_embedding_buffer_label.shape[0] + ) + + # Call clustering function + Y_concat = 
online_clus.forward_infer(emb=concat_emb, frame_index=frame_index, cuda=cuda) + + # Resolve permutations + merged_clus_labels = online_clus.match_labels(Y_concat, add_new=add_new) + assert len(merged_clus_labels) == (frame_index + 1) * step_per_frame + print(merged_clus_labels) + # Resolve permutation issue by using stitch_cluster_labels function + merged_clus_labels = stitch_cluster_labels(Y_old=gt[: len(merged_clus_labels)], Y_new=merged_clus_labels) + evaluation_list.extend(list(merged_clus_labels == gt[: len(merged_clus_labels)])) + + assert online_clus.is_online + assert add_new + cumul_label_acc = sum(evaluation_list) / len(evaluation_list) + assert cumul_label_acc > 0.9 - def test_online_clustering(self): - # TODO - pass + @pytest.mark.run_only_on('CPU') + @pytest.mark.unit + @pytest.mark.parametrize("n_spks", [4]) + @pytest.mark.parametrize("total_sec", [30]) + @pytest.mark.parametrize("buffer_size", [30]) + @pytest.mark.parametrize("sigma", [0.1]) + @pytest.mark.parametrize("seed", [0]) + def test_online_speaker_clustering_cpu(self, n_spks, total_sec, buffer_size, sigma, seed): + self.test_online_speaker_clustering(n_spks, total_sec, buffer_size, sigma, seed) From 9a5e5f8fd36798c47f8fcf93a7a8bfc5fbfb7407 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 15 Dec 2022 11:25:30 -0800 Subject: [PATCH 051/154] [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models (#5639) (#5641) * add stt_eo_conformer_ctc_large model * stt_eo_conformer_transducer_large Co-authored-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- nemo/collections/asr/models/ctc_bpe_models.py | 7 +++++++ nemo/collections/asr/models/rnnt_bpe_models.py | 7 +++++++ 2 files changed, 14 insertions(+) diff --git a/nemo/collections/asr/models/ctc_bpe_models.py b/nemo/collections/asr/models/ctc_bpe_models.py index cb0abdeb6b29..ea23fd05ab77 100644 --- a/nemo/collections/asr/models/ctc_bpe_models.py +++ b/nemo/collections/asr/models/ctc_bpe_models.py @@ -649,4 +649,11 @@ def list_available_models(cls) -> List[PretrainedModelInfo]: ) results.append(model) + model = PretrainedModelInfo( + pretrained_model_name="stt_eo_conformer_ctc_large", + description="For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_eo_conformer_ctc_large", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_eo_conformer_ctc_large/versions/1.14.0/files/stt_eo_conformer_ctc_large.nemo", + ) + results.append(model) + return results diff --git a/nemo/collections/asr/models/rnnt_bpe_models.py b/nemo/collections/asr/models/rnnt_bpe_models.py index 81f0a3b49da5..3037eecee2a6 100644 --- a/nemo/collections/asr/models/rnnt_bpe_models.py +++ b/nemo/collections/asr/models/rnnt_bpe_models.py @@ -240,6 +240,13 @@ def list_available_models(cls) -> List[PretrainedModelInfo]: ) results.append(model) + model = PretrainedModelInfo( + pretrained_model_name="stt_eo_conformer_transducer_large", + description="For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_eo_conformer_transducer_large", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_eo_conformer_transducer_large/versions/1.14.0/files/stt_eo_conformer_transducer_large.nemo", + ) + results.append(model) + return results def __init__(self, cfg: DictConfig, trainer: Trainer = None): From eea669ef7b22a7bbdaf144842c6280083aeef298 Mon Sep 17 00:00:00 2001 From: Elena 
Rastorgueva Date: Thu, 15 Dec 2022 13:06:19 -0800 Subject: [PATCH 052/154] Removed unused import Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index b1d1aec9c1ec..ac5bba36f2e2 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -20,7 +20,6 @@ from utils.make_ctm import make_basetoken_ctm, make_word_ctm from utils.viterbi_decoding import viterbi_decoding -from nemo.collections.asr.models import ASRModel from nemo.collections.asr.models.ctc_models import EncDecCTCModel from nemo.collections.asr.parts.utils.transcribe_utils import setup_model from nemo.core.config import hydra_runner From 9dd2b11b2833b9aa1262b81208f84a75671d42f8 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 13:07:56 -0800 Subject: [PATCH 053/154] Specify that filepaths need to be absolute Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 20ad033813f1..bc6ff49d28ea 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -15,7 +15,7 @@ To use NFA, all you need to provide is a correct NeMo manifest (with `"audio_fil Call the `align.py` script, specifying the parameters as follows: -* `manifest_filepath`: The path to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. +* `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. @@ -39,9 +39,9 @@ Call the `align.py` script, specifying the parameters as follows: # Input manifest file format -NFA needs to be provided with a 'manifest' file where each line specifies the "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: +NFA needs to be provided with a 'manifest' file where each line specifies the absolute "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: ```json -{"audio_filepath": "/path/to/audio.wav", "text": "the transcription of the utterance"} +{"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"} ``` > Note: NFA does not require "duration" fields, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth "text" is. 
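As a rough sketch of how a manifest like the one described in the README above might be prepared — assuming a hypothetical layout where every `.wav` file has a matching `.txt` transcript; the helper name and paths below are illustrative assumptions, not part of NFA itself:

```python
# Illustrative sketch (not part of NFA): build a NeMo-style manifest with absolute audio paths.
# Assumes a hypothetical layout where every <name>.wav has a matching <name>.txt transcript.
import json
from pathlib import Path


def build_manifest(audio_dir: str, manifest_filepath: str) -> None:
    audio_root = Path(audio_dir).resolve()  # resolve() yields absolute paths, as NFA requires
    with open(manifest_filepath, "w", encoding="utf-8") as f_out:
        for wav_path in sorted(audio_root.glob("*.wav")):
            text_path = wav_path.with_suffix(".txt")
            if not text_path.exists():
                continue  # skip audio files that have no transcript
            entry = {
                "audio_filepath": str(wav_path),  # absolute path to the audio
                "text": text_path.read_text(encoding="utf-8").strip(),
            }
            f_out.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    # Paths below are placeholders for illustration only.
    build_manifest("/data/my_corpus/wavs", "/data/my_corpus/manifest.json")
```

Each line written this way matches the one-JSON-object-per-line format shown in the README, with `audio_filepath` already absolute.
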
From 9952524a40f44d047b42e70bb1e01703abacc796 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 13:12:42 -0800 Subject: [PATCH 054/154] replaces any spaces in utt_id with dashes Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- tools/nemo_forced_aligner/align.py | 2 +- tools/nemo_forced_aligner/utils/make_ctm.py | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index bc6ff49d28ea..4816d0bdc34a 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -29,7 +29,7 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `separator`: the string used to separate CTM segments. If the separator is `“”` (empty string), the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) -* **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be stripped away from the utt_id, so as not to change the number of space-separated elements in the CTM files. +* **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. * **[OPTIONAL]** `audio_sr`: The sample rate (in Hz) of your audio. (Default: 16000) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index ac5bba36f2e2..f1140b89dccd 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -60,7 +60,7 @@ n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath - will be stripped away from the utt_id, so as not to change the number of space-separated elements in the + will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" e.g. 
if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index 147d0e7d2a89..ed12fcff7184 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -21,7 +21,7 @@ def _get_utt_id(audio_filepath, n_parts_for_ctm_id): fp_parts = Path(audio_filepath).parts[-n_parts_for_ctm_id:] utt_id = Path("_".join(fp_parts)).stem - utt_id = "".join(utt_id.split()) # remove any spaces that may have been in the filepath + utt_id = utt_id.replace(" ", "-") # replace any spaces in the filepath with dashes return utt_id From 7b5fcce3719910a980f7901d6a3b5673e915b331 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu, 15 Dec 2022 21:13:55 +0000 Subject: [PATCH 055/154] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/make_ctm.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index ed12fcff7184..0873559f513e 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -21,7 +21,7 @@ def _get_utt_id(audio_filepath, n_parts_for_ctm_id): fp_parts = Path(audio_filepath).parts[-n_parts_for_ctm_id:] utt_id = Path("_".join(fp_parts)).stem - utt_id = utt_id.replace(" ", "-") # replace any spaces in the filepath with dashes + utt_id = utt_id.replace(" ", "-") # replace any spaces in the filepath with dashes return utt_id From 8f4eceac41327ff67de2640c08e8b8f40f67ed2a Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 13:24:10 -0800 Subject: [PATCH 056/154] Make hydra script callable by another script Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index f1140b89dccd..22a6cf2835f9 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -12,10 +12,11 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from dataclasses import dataclass +from dataclasses import dataclass, is_dataclass from typing import Optional import torch +from omegaconf import OmegaConf from utils.data_prep import get_log_probs_y_T_U, get_manifest_lines from utils.make_ctm import make_basetoken_ctm, make_word_ctm from utils.viterbi_decoding import viterbi_decoding @@ -93,6 +94,9 @@ class AlignmentConfig: @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) def main(cfg: AlignmentConfig): + if is_dataclass(cfg): + cfg = OmegaConf.structured(cfg) + # Validate config if cfg.manifest_filepath is None: raise ValueError("cfg.manifest_filepath needs to be specified") From 27037341b21bd9f5db8edbcfdbde4dc7009e106a Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 13:44:36 -0800 Subject: [PATCH 057/154] do not specify default model or model_downsample_factor Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 18 ++++++++++-------- tools/nemo_forced_aligner/align.py | 28 +++++++++++++++++----------- 2 files changed, 27 insertions(+), 19 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 4816d0bdc34a..920dde2070b8 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -4,8 +4,10 @@ A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based model ## Usage example -``` +``` bash python /NeMo/tools/nemo_forced_aligner/align.py \ + pretrained_name="stt_en_citrinet_1024_gamma_0_25" \ + model_downsample_factor=8 \ manifest_filepath= \ output_ctm_folder= ``` @@ -15,17 +17,17 @@ To use NFA, all you need to provide is a correct NeMo manifest (with `"audio_fil Call the `align.py` script, specifying the parameters as follows: -* `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. - -* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. - -* **[OPTIONAL]** `pretrained_name`: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). (Default: "stt_en_citrinet_1024_gamma_0_25"). +* `pretrained_name`: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Any Quartznet, Citrinet, Conformer CTC model should work, in any language (only English has been tested so far). If `model_path` is specified, `pretrained_name` must not be specified. >Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. -* **[OPTIONAL]** `model_path`: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the log-probs which we will use to do alignment. 
+* `model_path`: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the log-probs which we will use to do alignment. If `pretrained_name` is specified, `model_path` must not be specified. >Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use Conformer CTC model as that will likely give Out Of Memory errors. -* **[OPTIONAL]** `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet (Default: 8, to match with the default model_name "stt_en_citrinet_1024_gamma_0_25"). +* `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet. + +* `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. + +* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. * **[OPTIONAL]** `separator`: the string used to separate CTM segments. If the separator is `“”` (empty string), the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 22a6cf2835f9..ab57b1ece985 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -31,9 +31,6 @@ Results are saved in ctm files in output_ctm_folder. Arguments: - manifest_filepath: filepath to the manifest of the data you want to align, - containing 'audio_filepath' and 'text' fields. - output_ctm_folder: the folder where output CTM files will be saved. pretrained_name: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Note: NFA can only use CTC models (not Transducer models) at the moment. @@ -46,6 +43,9 @@ If the ASR model is a QuartzNet model, its downsample factor is 2. If the ASR model is a Conformer CTC model, its downsample factor is 4. If the ASR model is a Citirnet model, its downsample factor is 8. + manifest_filepath: filepath to the manifest of the data you want to align, + containing 'audio_filepath' and 'text' fields. + output_ctm_folder: the folder where output CTM files will be saved. separator: the string used to separate CTM segments. If the separator is “”, the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. 
“ “, “|” or “”, the segments will be the blocks of @@ -77,13 +77,13 @@ @dataclass class AlignmentConfig: # Required configs + pretrained_name: Optional[str] = None + model_path: Optional[str] = None + model_downsample_factor: Optional[int] = None manifest_filepath: Optional[str] = None output_ctm_folder: Optional[str] = None # General configs - pretrained_name: Optional[str] = "stt_en_citrinet_1024_gamma_0_25" - model_path: Optional[str] = None - model_downsample_factor: int = 8 separator: str = " " n_parts_for_ctm_id: int = 1 audio_sr: int = 16000 @@ -98,14 +98,20 @@ def main(cfg: AlignmentConfig): cfg = OmegaConf.structured(cfg) # Validate config + if cfg.model_path is None and cfg.pretrained_name is None: + raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None") + + if cfg.model_path is not None and cfg.pretrained_name is not None: + raise ValueError("One of cfg.model_path and cfg.pretrained_name must be None") + + if cfg.model_downsample_factor is None: + raise ValueError("cfg.model_downsample_factor must be specified") + if cfg.manifest_filepath is None: - raise ValueError("cfg.manifest_filepath needs to be specified") + raise ValueError("cfg.manifest_filepath must be specified") if cfg.output_ctm_folder is None: - raise ValueError("cfg.output_ctm_folder needs to be specified") - - if cfg.model_path is None and cfg.pretrained_name is None: - raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None") + raise ValueError("cfg.output_ctm_folder must be specified") # load model device = torch.device(cfg.device) From 6dbc849ccfcae00b5d58db6dc01bff47e79e7d3b Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Thu, 15 Dec 2022 13:58:52 -0800 Subject: [PATCH 058/154] [Dockerfile] Remove AIS archive from docker image (#5629) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index 8044a99771f8..43fb42b12992 100644 --- a/Dockerfile +++ b/Dockerfile @@ -103,4 +103,4 @@ RUN printf "#!/bin/bash\njupyter lab --no-browser --allow-root --ip=0.0.0.0" >> # Prepare AIS CLI ARG AIS_VERSION=v1.3.15 ARG AIS_BIN=https://github.com/NVIDIA/aistore/releases/download/${AIS_VERSION}/ais-linux-amd64.tar.gz -RUN curl -LO ${AIS_BIN} && tar -xzvf ais-linux-amd64.tar.gz && mv ./ais /usr/local/bin/. +RUN curl -LO ${AIS_BIN} && tar -xzvf ais-linux-amd64.tar.gz && mv ./ais /usr/local/bin/. 
&& rm ais-linux-amd64.tar.gz From 941f85e40e8a75bf3c526b8b1051a107a8f4a04d Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 14:01:17 -0800 Subject: [PATCH 059/154] Measure audio_sr from audio instead of needing to specify Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 -- tools/nemo_forced_aligner/align.py | 16 +++++++++++----- tools/nemo_forced_aligner/utils/data_prep.py | 20 ++++++++++++++++++++ 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 920dde2070b8..6fda0290fa95 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -33,8 +33,6 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. -* **[OPTIONAL]** `audio_sr`: The sample rate (in Hz) of your audio. (Default: 16000) - * **[OPTIONAL]** `device`: The device that will be used for generating log-probs and doing Viterbi decoding. (Default: 'cpu'). * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index ab57b1ece985..e3093cfe8080 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -17,13 +17,14 @@ import torch from omegaconf import OmegaConf -from utils.data_prep import get_log_probs_y_T_U, get_manifest_lines +from utils.data_prep import get_audio_sr, get_log_probs_y_T_U, get_manifest_lines from utils.make_ctm import make_basetoken_ctm, make_word_ctm from utils.viterbi_decoding import viterbi_decoding from nemo.collections.asr.models.ctc_models import EncDecCTCModel from nemo.collections.asr.parts.utils.transcribe_utils import setup_model from nemo.core.config import hydra_runner +from nemo.utils import logging """ @@ -66,7 +67,6 @@ e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" - audio_sr: int specifying the sample rate of your audio files. device: string specifying the device that will be used for generating log-probs and doing Viterbi decoding. The string needs to be in a format recognized by torch.device() batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. @@ -86,7 +86,6 @@ class AlignmentConfig: # General configs separator: str = " " n_parts_for_ctm_id: int = 1 - audio_sr: int = 16000 device: str = "cpu" batch_size: int = 1 @@ -124,6 +123,13 @@ def main(cfg: AlignmentConfig): " Currently only instances of EncDecCTCModels are supported" ) + audio_sr = get_audio_sr(cfg.manifest_filepath) + logging.info( + f"Detected audio sampling rate {audio_sr}Hz in first audio in manifest at {cfg.manifest_filepath}. " + "Will assume all audios in manifest have this sampling rate. 
Sampling rate will be used to determine " + "timestamps in output CTM." + ) + # define start and end line IDs of batches with open(cfg.manifest_filepath, 'r') as f: num_lines_in_manifest = sum(1 for _ in f) @@ -148,7 +154,7 @@ def main(cfg: AlignmentConfig): cfg.model_downsample_factor, cfg.output_ctm_folder, cfg.n_parts_for_ctm_id, - cfg.audio_sr, + audio_sr, ) else: @@ -159,7 +165,7 @@ def main(cfg: AlignmentConfig): cfg.model_downsample_factor, cfg.output_ctm_folder, cfg.n_parts_for_ctm_id, - cfg.audio_sr, + audio_sr, cfg.separator, ) diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 3fb2c58f192f..713bd801205e 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -13,12 +13,32 @@ # limitations under the License. import json +import os +import soundfile as sf import torch V_NEG_NUM = -1e30 +def get_audio_sr(manifest_filepath): + """ + Measure the sampling rate of the audio file in the first line + of the manifest at manifest_filepath + """ + with open(manifest_filepath, "r") as f_manifest: + first_line = json.loads(f_manifest.readline()) + + audio_file = first_line["audio_filepath"] + if not os.path.exists(audio_file): + raise RuntimeError( + f"Did not find filepath {audio_file} which was specified in manifest {manifest_filepath}." + ) + + with sf.SoundFile(audio_file, "r") as f_audio: + return f_audio.samplerate + + def get_manifest_lines(manifest_filepath, start, end): data = [] with open(manifest_filepath, "r") as f: From 44a856db420fbf1d127f2f3eb9309ee5db5256f6 Mon Sep 17 00:00:00 2001 From: Yuekai Zhang Date: Fri, 16 Dec 2022 06:12:51 +0800 Subject: [PATCH 060/154] [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch (#5541) * Chinese TTS replaces default pypinyin dict * Add jieba word segmenter as an option Signed-off-by: Yuekai Zhang Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../tts/conf/zh/fastpitch_align_22050.yaml | 1 + nemo_text_processing/g2p/modules.py | 46 +++++++++++++++---- requirements/requirements_tts.txt | 2 + .../ds_conf/ds_for_fastpitch_align.yaml | 1 + 4 files changed, 42 insertions(+), 8 deletions(-) diff --git a/examples/tts/conf/zh/fastpitch_align_22050.yaml b/examples/tts/conf/zh/fastpitch_align_22050.yaml index 20b18d042014..b09540a75aad 100644 --- a/examples/tts/conf/zh/fastpitch_align_22050.yaml +++ b/examples/tts/conf/zh/fastpitch_align_22050.yaml @@ -74,6 +74,7 @@ model: g2p: _target_: nemo_text_processing.g2p.modules.ChineseG2p phoneme_dict: ${phoneme_dict_path} + word_segmenter: jieba # Only jieba is supported now. train_ds: dataset: diff --git a/nemo_text_processing/g2p/modules.py b/nemo_text_processing/g2p/modules.py index 7dac0741d289..264cbc4d3784 100644 --- a/nemo_text_processing/g2p/modules.py +++ b/nemo_text_processing/g2p/modules.py @@ -516,7 +516,12 @@ def __call__(self, text: str) -> List[str]: class ChineseG2p(BaseG2p): def __init__( - self, phoneme_dict=None, word_tokenize_func=None, apply_to_oov_word=None, mapping_file: Optional[str] = None, + self, + phoneme_dict=None, + word_tokenize_func=None, + apply_to_oov_word=None, + mapping_file: Optional[str] = None, + word_segmenter: Optional[str] = None, ): """Chinese G2P module. 
This module first converts Chinese characters into pinyin sequences using pypinyin, then pinyin sequences would be further converted into phoneme sequences using pinyin_dict_nv_22.10.txt dict file. For Chinese and English bilingual sentences, the English words @@ -528,8 +533,13 @@ def __init__( It is expected that unchangeable word representation will be represented as List[str], other cases are represented as str. It is useful to mark word as unchangeable which is already in phoneme representation. apply_to_oov_word: Function that will be applied to out of phoneme_dict word. + word_segmenter: method that will be applied to segment utterances into words for better polyphone disambiguation. """ assert phoneme_dict is not None, "Please set the phoneme_dict path." + assert word_segmenter in [ + None, + 'jieba', + ], f"{word_segmenter} is not supported now. Please choose correct word_segmenter." phoneme_dict = ( self._parse_as_pinyin_dict(phoneme_dict) if isinstance(phoneme_dict, str) or isinstance(phoneme_dict, pathlib.Path) @@ -551,7 +561,17 @@ def __init__( mapping_file=mapping_file, ) self.tones = {'1': '#1', '2': '#2', '3': '#3', '4': '#4', '5': '#5'} - from pypinyin import lazy_pinyin, Style + self.word_segmenter = word_segmenter + + try: + from pypinyin import lazy_pinyin, Style + from pypinyin_dict.pinyin_data import cc_cedict + import jieba + except ImportError as e: + logging.error(e) + + # replace pypinyin default dict with cc_cedict.txt for polyphone disambiguation + cc_cedict.load() self._lazy_pinyin = lazy_pinyin self._Style = Style @@ -580,12 +600,22 @@ def __call__(self, text): ' ', 'S', 't', 'o', 'r', 'e', ',', ' ', 'mai3', 'le5', 'yi2', 'ge4', 'i', 'P', 'h', 'o', 'n', 'e', '。'] """ - pinyin_seq = self._lazy_pinyin( - text, - style=self._Style.TONE3, - neutral_tone_with_five=True, - errors=lambda en_words: [letter for letter in en_words], - ) + pinyin_seq = [] + if self.word_segmenter == 'jieba': + # Cut sentences into words to improve polyphone disambiguation + words_list = list(jieba.cut(text)) + elif self.word_segmenter is None: + words_list = [text] + else: + raise NotImplementedError + + for word in words_list: + pinyin_seq += self._lazy_pinyin( + word, + style=self._Style.TONE3, + neutral_tone_with_five=True, + errors=lambda en_words: [letter for letter in en_words], + ) phoneme_seq = [] for pinyin in pinyin_seq: if pinyin[-1] in self.tones: diff --git a/requirements/requirements_tts.txt b/requirements/requirements_tts.txt index 5a611fb34380..6d77ba71bd6e 100644 --- a/requirements/requirements_tts.txt +++ b/requirements/requirements_tts.txt @@ -1,8 +1,10 @@ attrdict einops inflect +jieba librosa matplotlib nltk pandas pypinyin +pypinyin-dict diff --git a/scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml b/scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml index 9dd41f4fbd18..12ba58a93dc7 100755 --- a/scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml +++ b/scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml @@ -48,3 +48,4 @@ dataset: g2p: _target_: nemo_text_processing.g2p.modules.ChineseG2p phoneme_dict: ${phoneme_dict_path} + word_segmenter: jieba # Only jieba is supported now. 
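A minimal sketch of the segmentation idea introduced in the patch above — running jieba word segmentation before pypinyin conversion, with the CC-CEDICT pinyin data loaded. The example sentence and the printed comparison are illustrative assumptions only; NeMo's `ChineseG2p` additionally maps the resulting pinyin sequence to phonemes via its dictionary file.

```python
# Illustrative sketch (assumes jieba, pypinyin and pypinyin-dict are installed).
# Shows word-level segmentation before pinyin conversion, mirroring the patch above.
import jieba
from pypinyin import Style, lazy_pinyin
from pypinyin_dict.pinyin_data import cc_cedict

# Replace pypinyin's default dict with CC-CEDICT data for better polyphone
# disambiguation (the same call ChineseG2p makes in the patch above).
cc_cedict.load()

text = "他们在银行排队"  # example sentence; an assumption for illustration only

# Conversion without explicit word segmentation
pinyin_no_seg = lazy_pinyin(text, style=Style.TONE3, neutral_tone_with_five=True)

# Word-level conversion: segment with jieba first, then convert each word
pinyin_with_seg = []
for word in jieba.cut(text):
    pinyin_with_seg += lazy_pinyin(word, style=Style.TONE3, neutral_tone_with_five=True)

print("without segmentation:", pinyin_no_seg)
print("with jieba segmentation:", pinyin_with_seg)
```

Segmenting first lets pypinyin look up multi-character words as units, which can resolve polyphones differently than character-by-character conversion for some sentences.
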
From b635d4e44c026b9ef909bb27965848ed6755c3b7 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 14:50:04 -0800 Subject: [PATCH 061/154] Make separate parameters for device of transcription and viterbi steps Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 4 +++- tools/nemo_forced_aligner/align.py | 19 ++++++++++++------- tools/nemo_forced_aligner/utils/data_prep.py | 10 +--------- .../utils/viterbi_decoding.py | 8 ++++++-- 4 files changed, 22 insertions(+), 19 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 6fda0290fa95..884f17d4c8d0 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -33,7 +33,9 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. -* **[OPTIONAL]** `device`: The device that will be used for generating log-probs and doing Viterbi decoding. (Default: 'cpu'). +* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). + +* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. (Default: 'cpu'). * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index e3093cfe8080..d0a129e7f892 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -67,8 +67,10 @@ e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" - device: string specifying the device that will be used for generating log-probs and doing - Viterbi decoding. The string needs to be in a format recognized by torch.device() + transcribe_device: string specifying the device that will be used for generating log-probs (i.e. "transcribing"). + The string needs to be in a format recognized by torch.device(). + viterbi_device: string specifying the device that will be used for doing Viterbi decoding. + The string needs to be in a format recognized by torch.device(). batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. 
""" @@ -86,7 +88,8 @@ class AlignmentConfig: # General configs separator: str = " " n_parts_for_ctm_id: int = 1 - device: str = "cpu" + transcribe_device: str = "cpu" + viterbi_device: str = "cpu" batch_size: int = 1 @@ -112,10 +115,12 @@ def main(cfg: AlignmentConfig): if cfg.output_ctm_folder is None: raise ValueError("cfg.output_ctm_folder must be specified") - # load model - device = torch.device(cfg.device) + # init devices + transcribe_device = torch.device(cfg.transcribe_device) + viterbi_device = torch.device(cfg.viterbi_device) - model, _ = setup_model(cfg, device) + # load model + model, _ = setup_model(cfg, transcribe_device) if not isinstance(model, EncDecCTCModel): raise NotImplementedError( @@ -144,7 +149,7 @@ def main(cfg: AlignmentConfig): data = get_manifest_lines(cfg.manifest_filepath, start, end) log_probs, y, T, U = get_log_probs_y_T_U(data, model, cfg.separator) - alignments = viterbi_decoding(log_probs, y, T, U) + alignments = viterbi_decoding(log_probs, y, T, U, viterbi_device) if cfg.separator == "": make_basetoken_ctm( diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 713bd801205e..dfe6e373d19f 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -31,9 +31,7 @@ def get_audio_sr(manifest_filepath): audio_file = first_line["audio_filepath"] if not os.path.exists(audio_file): - raise RuntimeError( - f"Did not find filepath {audio_file} which was specified in manifest {manifest_filepath}." - ) + raise RuntimeError(f"Did not find filepath {audio_file} which was specified in manifest {manifest_filepath}.") with sf.SoundFile(audio_file, "r") as f_audio: return f_audio.samplerate @@ -148,10 +146,4 @@ def get_log_probs_y_T_U(data, model, separator): U_dash = torch.tensor(U_dash_list) - # transfer all tensors to device - log_probs = log_probs.to(model.device) - y = y.to(model.device) - T = T.to(model.device) - U_dash = U_dash.to(model.device) - return log_probs, y, T, U_dash diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py index da28ba4cf35e..301af6bd7a6b 100644 --- a/tools/nemo_forced_aligner/utils/viterbi_decoding.py +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -17,7 +17,7 @@ V_NEG_NUM = -1e30 -def viterbi_decoding(log_probs, y, T, U): +def viterbi_decoding(log_probs, y, T, U, device): """ Does Viterbi decoding. 
Returns: @@ -35,7 +35,11 @@ def viterbi_decoding(log_probs, y, T, U): B, T_max, _ = log_probs.shape U_max = y.shape[1] - device = log_probs.device + # transfer all tensors to device + log_probs = log_probs.to(device) + y = y.to(device) + T = T.to(device) + U = U.to(device) padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=device) log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) From 5d30692c1282bf32b290f39b30e875a3cc7801fc Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 15 Dec 2022 15:54:34 -0800 Subject: [PATCH 062/154] Add mention of gecko Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 884f17d4c8d0..367af39d2c70 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -53,3 +53,9 @@ For each utterance specified in a line of `manifest_filepath`, one CTM file will Each CTM file will contain lines of the format: ` 1 `. Note the second item in the line (the 'channel ID', which is required by the CTM file format) is always 1, as NFA operates on single channel audio. + + +# How do I evaluate the alignment accuracy? +Ideally you would have some 'true' CTM files to compare with your generated CTM files. + +Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. The Gecko tool requires you to upload an audio file and at least one CTM file. The Gecko tool can be accessed here: https://gong-io.github.io/gecko/. More information about the Gecko tool can be found on its Github page here: https://github.com/gong-io/gecko. From db480d12b45a8b75fd23c2df8f7f8aad9e803cce Mon Sep 17 00:00:00 2001 From: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Date: Thu, 15 Dec 2022 18:04:42 -0800 Subject: [PATCH 063/154] [workflow] add exclude labels option to ignore cherry-picks in release changelog. (#5645) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../workflows/config/changelog-config.json | 30 ++++++++++++------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/.github/workflows/config/changelog-config.json b/.github/workflows/config/changelog-config.json index 85de88af260e..fe18f8ac0681 100644 --- a/.github/workflows/config/changelog-config.json +++ b/.github/workflows/config/changelog-config.json @@ -2,39 +2,48 @@ "categories": [ { "title": "## ASR \n\n

<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["asr"] + "labels": ["asr"], + "exclude_labels": ["cherry-pick"] }, { "title": "## TTS \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["tts"] + "labels": ["tts"], + "exclude_labels": ["cherry-pick"] }, { "title": "## NLP / NMT \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["nlp", "nmt", "megatron"] + "labels": ["nlp", "nmt", "megatron"], + "exclude_labels": ["cherry-pick"] }, { "title": "## Text Normalization / Inverse Text Normalization \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["tn", "itn"] + "labels": ["tn", "itn"], + "exclude_labels": ["cherry-pick"] }, { "title": "## NeMo Tools \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["tools"] + "labels": ["tools"], + "exclude_labels": ["cherry-pick"] }, { "title": "## Export \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["export"] + "labels": ["export"], + "exclude_labels": ["cherry-pick"] }, { "title": "## Documentation \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["docs"] + "labels": ["docs"], + "exclude_labels": ["cherry-pick"] }, { "title": "## Bugfixes \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["bug"] + "labels": ["bug"], + "exclude_labels": ["cherry-pick"] }, { "title": "## Cherrypick \n\n
<details><summary>Changelog</summary>\n\n</details>
\n\n", - "labels": ["cherrypick"] + "labels": ["cherry-pick"], + "exclude_labels": ["cherry-pick"] } ], "ignore_labels": [ @@ -121,4 +130,5 @@ "tag_resolver": { "method": "semver" } -} \ No newline at end of file +} + From 189a44d46a32f714fa22f69df46f27ef35c181ed Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 15 Dec 2022 19:51:01 -0800 Subject: [PATCH 064/154] [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. (#5643) (#5647) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../tts/FastPitch_ChineseTTS_Training.ipynb | 317 +++++++++--------- 1 file changed, 158 insertions(+), 159 deletions(-) diff --git a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb index cf04498a5a35..4d85d95c49b8 100644 --- a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb @@ -73,7 +73,6 @@ "import torch\n", "import librosa\n", "import numpy as np\n", - "\n", "from pathlib import Path\n", "from tqdm.notebook import tqdm" ] @@ -86,13 +85,13 @@ "outputs": [], "source": [ "# let's download the files we need to run this tutorial\n", - "\n", "!mkdir -p NeMoChineseTTS\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/tts_dataset_files/zh/pinyin_dict_nv_22.10.txt\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/dataset_processing/tts/sfbilingual/get_data.py\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/examples/tts/fastpitch.py\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/examples/tts/hifigan_finetune.py\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/dataset_processing/tts/extract_sup_data.py\n", + "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/dataset_processing/tts/generate_mels.py\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/zh/fastpitch_align_22050.yaml\n", "!cd NeMoChineseTTS && wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/hifigan/hifigan.yaml\n", @@ -121,8 +120,8 @@ "\n", "For more information on training a basic FastPitch model, please refer to [FastPitch_MixerTTS_Training.ipynb](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_MixerTTS_Training.ipynb) tutorial.\n", "\n", - "### HiFiGAN\n", - "HiFiGAN is a generative adversarial network (GAN) model that generates audio from mel spectrograms. The generator uses transposed convolutions to upsample mel spectrograms to audio. For more details about the model, please refer to the original [paper](https://arxiv.org/abs/2010.05646). NeMo re-implementation of HiFi-GAN can be found [here](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/tts/models/hifigan.py)." + "### HiFi-GAN\n", + "HiFi-GAN is a generative adversarial network (GAN) model that generates audio from mel spectrograms. The generator uses transposed convolutions to upsample mel spectrograms to audio. For more details about the model, please refer to the original [paper](https://arxiv.org/abs/2010.05646). 
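A minimal sketch of that vocoder step (the checkpoint name is the one used later in this tutorial; `spectrogram` is assumed to come from FastPitch):

```python
from nemo.collections.tts.models import HifiGanModel

# mel spectrogram [batch, n_mels, T] -> waveform with a pretrained HiFi-GAN vocoder
vocoder = HifiGanModel.from_pretrained("tts_en_lj_hifigan_ft_mixerttsx").eval().cuda()
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
```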
NeMo re-implementation of HiFi-GAN can be found [here](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/tts/models/hifigan.py)." ] }, { @@ -138,30 +137,99 @@ "id": "e1cf7a8d", "metadata": {}, "source": [ - "We will show example of preprocessing and training using SF Bilingual Speech TTS dataset ([link](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/sf_bilingual_speech_zh_en)). The dataset contains about 2,740 bilingual audio samples of a single female speaker and their corresponding text transcripts, each of them is an audio of around 5-6 seconds and have a total length of approximately 4.5 hours.\n", + "We will show example of preprocessing and training using SF Bilingual Speech TTS Dataset ([link](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/sf_bilingual_speech_zh_en)). The dataset contains about 2,740 bilingual audio samples of a single female speaker and their corresponding text transcripts, each of them is an audio of around 5-6 seconds and have a total length of approximately 4.5 hours. The SF Bilingual Speech Dataset is published in NGC registry with CC BY-NC 4.0 license. Please review details from the above link.\n", "\n", "In this section, we will cover:\n", - "1. Downloading the dataset\n", - "2. Creating manifests\n", - "3. Normalizing text\n", - "4. Phonemization\n", - "5. Creating dataset config\n", - "6. Creating supplementary data" + "1. Installing NGC Registry CLI\n", + "2. Downloading the dataset\n", + "3. Creating manifests\n", + "4. Normalizing text\n", + "5. Phonemization\n", + "6. Creating dataset config\n", + "7. Creating supplementary data" + ] + }, + { + "cell_type": "markdown", + "id": "a5752a97", + "metadata": {}, + "source": [ + "## 1. Installing NGC Registry CLI\n", + "You will need to install the [NGC registry CLI](https://docs.nvidia.com/ngc/ngc-overview/index.html#installing-ngc-registry-cli) to download SF Bilingual Speech Dataset. In general, you will need to,\n", + "1. Log in to your enterprise account on the NGC website (https://ngc.nvidia.com).\n", + "2. In the top right corner, click your user account icon and select **Setup**, then click **Downloads** under **CLI** from the Setup page.\n", + "3. From the CLI Install page, click the **Windows**, **Linux**, or **macOS** tab, according to the platform from which you will be running NGC Registry CLI.\n", + "4. Follow the instructions to install the CLI.\n", + "5. Verify the installation by entering `ngc --version`. The output should be \"`NGC CLI x.y.z`\" where x.y.z indicates the version." ] }, { "cell_type": "markdown", - "id": "40e76a38", + "id": "417bb4f3-445a-4435-b4b8-cf795713a883", "metadata": {}, "source": [ - "## 1. 
Downloading the dataset" + "Below bash script demonstrate basic steps of NGC Registry CLI installation," ] }, { "cell_type": "code", "execution_count": null, - "id": "36b8a1d3", + "id": "23428b5d", + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%%bash\n", + "# https://ngc.nvidia.com/setup/installers/cli\n", + "wget --content-disposition \"https://ngc.nvidia.com/downloads/ngccli_linux.zip\" && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc\n", + "find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5\n", + "echo \"export PATH=\\\"\\$PATH:$(pwd)/ngc-cli\\\"\" >> ~/.bash_profile && source ~/.bash_profile" + ] + }, + { + "cell_type": "markdown", + "id": "858e7a80-0a81-4ccf-830e-12e41955ae00", + "metadata": {}, + "source": [ + "### Note: You must configure NGC CLI for your use by running `ngc config set` so that you can run the commands." + ] + }, + { + "cell_type": "markdown", + "id": "48e42312-3989-4a3a-927d-49fca586fc1d", "metadata": {}, + "source": [ + "Here is the example of configuring NGC CLI,\n", + "``` bash\n", + "$ ngc config set\n", + "Enter API key [no-apikey]. Choices: [, 'no-apikey']: \n", + "Enter CLI output format type [ascii]. Choices: [ascii, csv, json]:\n", + "Enter org [no-org]. Choices: ['']: or leave it empty.\n", + "Enter team [no-team]. Choices: ['no-team']:\n", + "Enter ace [no-ace]. Choices: ['no-ace']:\n", + "Successfully saved NGC configuration to /root/.ngc/config\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "f4ed2b3e-5a3d-4afb-9693-2bfb87b35fa4", + "metadata": {}, + "source": [ + "## 2. Downloading the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36b8a1d3", + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "!cd NeMoChineseTTS && mkdir DataChinese && \\\n", @@ -175,7 +243,9 @@ "cell_type": "code", "execution_count": null, "id": "6db032e5", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "# DataChineseTTS directory looks like\n", @@ -187,7 +257,7 @@ "id": "c4f33db9", "metadata": {}, "source": [ - "## 2. Creating manifests \n", + "## 3. Creating manifests \n", "\n", "We've created `scripts/dataset_processing/tts/sfbilingual/get_data.py` script that reads the `DataChineseTTS/SF_bilingual/text_SF.txt` provided with the dataset and generates the following fields per each datapoint:\n", "1. `audio_filepath`: location of the wav file\n", @@ -201,7 +271,9 @@ "cell_type": "code", "execution_count": null, "id": "eb063cf4", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "!(cd NeMoChineseTTS && \\\n", @@ -222,7 +294,9 @@ "cell_type": "code", "execution_count": null, "id": "5c7b9430", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "# NeMoChineseTTS directory looks like\n", @@ -234,7 +308,7 @@ "id": "2f6ea189", "metadata": {}, "source": [ - "## 3. Normalizing text\n", + "## 4. Normalizing text\n", "\n", "The script above, i.e. `scripts/dataset_processing/tts/sfbilingual/get_data.py`, also generates another field per each datapoint:\n", "- `normalized_text`: normalized text via NeMo's text normalizer:\n", @@ -250,7 +324,7 @@ "id": "6c7f253e", "metadata": {}, "source": [ - "## 4. Phonemization\n", + "## 5. Phonemization\n", "\n", "The pronunciation of a Chinese sentence can be represented as a string of phones. We would first convert a sentence into a pinyin sequences by using pypinyin library. 
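As a small illustration of that first step (the numbered-tone style below is an assumption for display only; the pinyin-to-phoneme dictionary downloaded earlier defines the actual mapping used by the tokenizer):

```python
from pypinyin import Style, lazy_pinyin

# Chinese characters -> numbered-tone pinyin; non-Chinese tokens pass through unchanged
print(lazy_pinyin("宵夜街", style=Style.TONE3))  # e.g. ['xiao1', 'ye4', 'jie1']
```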
Then we use a pre-defined pinyin-to-phoneme dict to convert them into phonemes. For English words in the sentences, we would directly use letters as input units." ] @@ -279,7 +353,7 @@ "id": "18578da2", "metadata": {}, "source": [ - "## 5. Creating dataset config\n", + "## 6. Creating dataset config\n", "\n", "Most of the configuration remains the same as described in [FastPitch and MixerTTS training tutorial](FastPitch_MixerTTS_Training.ipynb) except:\n", "1. The `text_tokenizer._target_` is set to `nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.ChinesePhonemesTokenizer` class defined here: `collections/common/tokenizers/text_to_speech/tts_tokenizers.py`.\n", @@ -349,7 +423,7 @@ "id": "b515c5b4", "metadata": {}, "source": [ - "## 6. Creating Supplementary Data\n", + "## 7. Creating Supplementary Data\n", "\n", "As mentioned in the [FastPitch and MixerTTS training tutorial](FastPitch_MixerTTS_Training.ipynb) - To accelerate and stabilize our training, we also need to extract pitch for every audio, estimate pitch statistics (mean and std) and pre-calculate alignment prior matrices for alignment framework. To do this, all we need to do is iterate over our data one time, via `extract_sup_data.py` script.\n", "\n", @@ -408,7 +482,9 @@ "cell_type": "code", "execution_count": null, "id": "e7f2373e", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "!cat NeMoChineseTTS/fastpitch_align_22050.yaml" @@ -444,7 +520,9 @@ "cell_type": "code", "execution_count": null, "id": "4ad43d30", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "!cd NeMoChineseTTS && CUDA_VISIBLE_DEVICES=0 python3 fastpitch.py --config-path . --config-name fastpitch_align_22050 \\\n", @@ -457,8 +535,8 @@ " exp_manager.exp_dir=resultChineseTTS \\\n", " trainer.max_epochs=1 \\\n", " trainer.check_val_every_n_epoch=1 \\\n", - " pitch_mean=221.4948272705078 #paste_pitch_mean_here \\\n", - " pitch_std=64.65289306640625 #paste_pitch_std_here \\\n", + " pitch_mean=221.4948272705078 \\ # Paste the PITCH_MEAN here.\n", + " pitch_std=64.65289306640625 \\ # Paste the PITCH_STD here.\n", " phoneme_dict_path=./pinyin_dict_nv_22.10.txt \\\n", " +exp_manager.create_wandb_logger=true \\\n", " +exp_manager.wandb_logger_kwargs.name=\"tutorial\" \\\n", @@ -507,13 +585,6 @@ "metadata": {}, "outputs": [], "source": [ - "hfg_ngc = \"tts_hifigan\" # NGC pretrained model name: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_hifigan \n", - "# from the results directory \n", - "fastpitch_model_path = \"\"\n", - "test = \"NTHU對面有一條宵夜街。\" # text input to the model\n", - "test_id = \"com_SF_ce1\" # identifier for the audio corresponding to the test text\n", - "test = \"微軟舉辦了RACETOMARKETCHALLENGE競賽\"\n", - "test_id = \"com_SF_ce10\" # identifier for the audio corresponding to the test text\n", "test = \"GOOGLE也計畫讓社交網路技術成為ANDROID未來版本的要項。\"\n", "test_id = \"com_SF_ce73\"\n", "data_path = \"NeMoChineseTTS/DataChinese/sf_bilingual_speech_zh_en_vv1/SF_bilingual/wavs/\" # path to dataset folder with wav files from original dataset\n", @@ -544,6 +615,17 @@ " return audio, spectrogram" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "a32096d6-b899-43e4-851f-b30e57ed369c", + "metadata": {}, + "outputs": [], + "source": [ + "# find the path of the FastPitch model checkpoint.\n", + "!ls NeMoChineseTTS/resultChineseTTS/FastPitch/*/checkpoints/FastPitch.nemo" + ] + }, { "cell_type": "code", "execution_count": null, @@ -553,7 +635,9 @@ }, "outputs": [], 
"source": [ - "# load models\n", + "# load fastpitch and hifigan models\n", + "hfg_ngc = \"tts_en_lj_hifigan_ft_mixerttsx\" # pretrained hifigan from https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_hifigan/versions/1.6.0/files/tts_en_lj_hifigan_ft_mixerttsx.nemo\n", + "fastpitch_model_path =\"\"\n", "vocoder_model = HifiGanModel.from_pretrained(hfg_ngc, strict=False).eval().cuda()\n", "if \".nemo\" in fastpitch_model_path:\n", " spec_gen_model = FastPitchModel.restore_from(fastpitch_model_path).eval().cuda()\n", @@ -589,7 +673,7 @@ "id": "e57e9baf", "metadata": {}, "source": [ - "We see that audio quality is not as good as we expect, even after training FastPitch for 1000 epochs. One of the ways mentioned in the [FastPitch_Finetuning.ipynb](FastPitch_Finetuning.ipynb) tutorial is to finetune HiFi-GAN. Lets try that out next!" + "You would hear that the above synthesized audio quality is pretty bad. It would be improved after continuing to train 1500 epochs, but again, the quality is still not acceptable. A straightforward solution is to finetune the HiFi-GAN model following the tutorial [FastPitch_Finetuning.ipynb](FastPitch_Finetuning.ipynb). Lets try that out next!" ] }, { @@ -599,7 +683,7 @@ "source": [ "# Finetuning HiFi-GAN\n", "\n", - "Improving speech quality by Finetuning HiFi-GAN on synthesized mel-spectrograms from FastPitch. " + "Improving speech quality by Finetuning HiFi-GAN on synthesized mel-spectrograms from FastPitch." ] }, { @@ -620,8 +704,7 @@ "outputs": [], "source": [ "test_audio_filepath = \"NeMoChineseTTS/DataChinese/sf_bilingual_speech_zh_en_vv1/SF_bilingual/wavs/com_SF_ce1.wav\"\n", - "test_audio_text = \"NTHU對面有一條宵夜街。\"\n", - "fastpitch_model_path = \"\"" + "test_audio_text = \"NTHU對面有一條宵夜街。\"" ] }, { @@ -661,7 +744,9 @@ "cell_type": "code", "execution_count": null, "id": "1679a9f9", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "spec_model = FastPitchModel.restore_from(fastpitch_model_path).eval().cuda()" @@ -775,108 +860,21 @@ "- Finetuning with #1 has artifacts from the original audio (noise) that get passed on as input to the vocoder resulting in artifacts in vocoder output in the form of noise.\n", "- On the other hand, #2.1 (i.e. `Mel spectrogram predicted from FastPitch with groundtruth alignment and duration`) gives the best results because it enables HiFi-GAN to learn mel spectrograms generated by FastPitch as well as duration distributions closer to the real world (i.e. ground truth) durations. \n", "\n", - "From implementation perspective - we follow the same process described in [Finetuning FastPitch for a new speaker](FastPitch_Finetuning.ipynb) - i.e. take the latest checkpoint from FastPitch training and predict spectrograms for each of the input records in `train_manifest.json`, `test_manifest.json` and `val_manifest.json`." + "From implementation perspective - we follow the same process described in [Finetuning FastPitch for a new speaker](FastPitch_Finetuning.ipynb) - i.e. take the latest checkpoint from FastPitch training and predict spectrograms for each of the input records in `train_manifest.json`, `test_manifest.json` and `val_manifest.json`. NeMo provides an efficient script, [scripts/dataset_processing/tts/generate_mels.py](https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/dataset_processing/tts/generate_mels.py), to generate Mel-spectrograms in the directory `NeMoChineseTTS/mels` and also create new JSON manifests with a suffix `_mel` by adding a new key `\"mel_filepath\"`. 
For example, `train_manifest.json` corresponds to `train_manifest_mel.json` saved in the same directory. You can run the following CLI to obtain the new JSON manifests." ] }, { "cell_type": "code", "execution_count": null, - "id": "4d2c5e5d", + "id": "d88c37d4-f157-475d-b277-76838722590f", "metadata": {}, "outputs": [], "source": [ - "import json\n", - "import numpy as np\n", - "import torch\n", - "import soundfile as sf\n", - "import librosa\n", - "\n", - "from pathlib import Path\n", - "\n", - "from nemo.collections.tts.torch.helpers import BetaBinomialInterpolator\n", - "\n", - "folder_name = \"synmels\"\n", - "fastpitch_model_path = \"\"\n", - "dataset_parts = [\"test_manifest\", \"val_manifest\", \"train_manifest\"]\n", - "dataset_base_path = \"NeMoChineseTTS/\"\n", - "\n", - "from nemo.collections.tts.models import FastPitchModel\n", - "if \".nemo\" in fastpitch_model_path:\n", - " spec_model = FastPitchModel.restore_from(fastpitch_model_path).eval().cuda()\n", - "else:\n", - " spec_model = FastPitchModel.load_from_checkpoint(checkpoint_path=fastpitch_model_path).eval().cuda()\n", - "\n", - "spec_model.eval().cuda()\n", - " \n", - "def load_wav(audio_file):\n", - " with sf.SoundFile(audio_file, 'r') as f:\n", - " samples = f.read(dtype='float32')\n", - " return samples.transpose()\n", - " \n", - "for dataset_part in dataset_parts:\n", - " # Get records from the manifest\n", - " manifest_path = f\"{dataset_base_path}{dataset_part}.json\"\n", - " records = []\n", - " with open(manifest_path, \"r\") as f:\n", - " for i, line in enumerate(f):\n", - " records.append(json.loads(line))\n", - "\n", - " beta_binomial_interpolator = BetaBinomialInterpolator()\n", - "\n", - " spec_model.eval()\n", - " device = spec_model.device\n", - "\n", - " save_dir = Path(f\"{dataset_base_path}{folder_name}/{dataset_part}\")\n", - "\n", - " save_dir.mkdir(exist_ok=True, parents=True)\n", - "\n", - " # Generate a spectrograms (we need to use ground truth alignment for correct matching between audio and mels)\n", - " for i, r in enumerate(records):\n", - " audio = load_wav(r[\"audio_filepath\"])\n", - "\n", - " audio = torch.from_numpy(audio).unsqueeze(0).to(device)\n", - " audio_len = torch.tensor(audio.shape[1], dtype=torch.long, device=device).unsqueeze(0)\n", - "\n", - " # Again, our finetuned FastPitch model doesn't use multiple speakers,\n", - " # but we keep the code to support it here for reference\n", - " if spec_model.fastpitch.speaker_emb is not None and \"speaker\" in r:\n", - " speaker = torch.tensor([r['speaker']]).to(device)\n", - " else:\n", - " speaker = None\n", - "\n", - " with torch.no_grad():\n", - " if \"normalized_text\" in r:\n", - " text = spec_model.parse(r[\"normalized_text\"], normalize=False)\n", - " else:\n", - " text = spec_model.parse(r['text'])\n", - "\n", - " text_len = torch.tensor(text.shape[-1], dtype=torch.long, device=device).unsqueeze(0)\n", - "\n", - " spect, spect_len = spec_model.preprocessor(input_signal=audio, length=audio_len)\n", - "\n", - " # Generate attention prior and spectrogram inputs for HiFi-GAN\n", - " attn_prior = torch.from_numpy(\n", - " beta_binomial_interpolator(spect_len.item(), text_len.item())\n", - " ).unsqueeze(0).to(text.device)\n", - "\n", - " spectrogram = spec_model.forward(\n", - " text=text, \n", - " input_lens=text_len, \n", - " spec=spect, \n", - " mel_lens=spect_len, \n", - " attn_prior=attn_prior,\n", - " speaker=speaker,\n", - " )[0]\n", - "\n", - " save_path = save_dir / f\"mel_{i}.npy\"\n", - " np.save(save_path, 
spectrogram[0].to('cpu').numpy())\n", - " r[\"mel_filepath\"] = str(save_path)\n", - "\n", - " hifigan_manifest_path = f\"{dataset_base_path}{folder_name}/hifigan_{dataset_part}_ft.json\"\n", - "\n", - " with open(hifigan_manifest_path, \"w\") as f:\n", - " for r in records:\n", - " f.write(json.dumps(r) + '\\n')" + "!python NeMoChineseTTS/generate_mels.py \\\n", + " --cpu \\\n", + " --fastpitch-model-ckpt \"\" \\\n", + " --input-json-manifests NeMoChineseTTS/train_manifest.json NeMoChineseTTS/val_manifest.json NeMoChineseTTS/test_manifest.json \\\n", + " --output-json-manifest-root NeMoChineseTTS" ] }, { @@ -916,21 +914,7 @@ "id": "565e07a8", "metadata": {}, "source": [ - "## Launch finetuning\n", - "\n", - "Download the pre-trained HiFi-GAN from NGC." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fa06c0b1", - "metadata": {}, - "outputs": [], - "source": [ - "!(cd NemoChineseTTS && \\\n", - " wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_hifigan/versions/1.0.0rc1/zip -O tts_hifigan_1.0.0rc1.zip && \\\n", - " unzip tts_hifigan_1.0.0rc1.zip)" + "## Launch finetuning" ] }, { @@ -948,22 +932,21 @@ "metadata": {}, "outputs": [], "source": [ - "!(python3 NeMoChineseTTS/hifigan_finetune.py --config-path . --config-name hifigan.yaml \\\n", - " model.max_steps=10 \\\n", + "!CUDA_VISIBLE_DEVICES=0 python3 NeMoChineseTTS/hifigan_finetune.py --config-path . --config-name hifigan.yaml \\\n", + " trainer.max_steps=10 \\\n", " model.optim.lr=0.00001 \\\n", " ~model.optim.sched \\\n", - " train_dataset=NeMoChineseTTS/synmels/hifigan_train_manifest_ft.json \\\n", - " validation_datasets=NeMoChineseTTS/synmels/hifigan_val_manifest_ft.json \\\n", - " exp_manager.exp_dir=resultChineseTTS \\\n", - " +init_from_nemo_model=NeMoChineseTTS/tts_hifigan.nemo \\\n", - " trainer.devices=-1 \\\n", + " train_dataset=NeMoChineseTTS/train_manifest_mel.json \\\n", + " validation_datasets=NeMoChineseTTS/val_manifest_mel.json \\\n", + " exp_manager.exp_dir=NeMoChineseTTS/resultChineseTTS \\\n", + " +init_from_pretrained_model=\"tts_en_lj_hifigan_ft_mixerttsx\" \\\n", " +trainer.val_check_interval=5 \\\n", " trainer.check_val_every_n_epoch=null \\\n", " model/train_ds=train_ds_finetune \\\n", " model/validation_ds=val_ds_finetune \\\n", " exp_manager.create_wandb_logger=true \\\n", " exp_manager.wandb_logger_kwargs.name=\"tutorial_2\" \\\n", - " exp_manager.wandb_logger_kwargs.project=\"ChineseTTS\")" + " exp_manager.wandb_logger_kwargs.project=\"ChineseTTS\"" ] }, { @@ -984,6 +967,16 @@ "Let's evaluate the quality of the FastPitch model generated so far using a HiFi-GAN model finetuned on predicted mels." 
] }, + { + "cell_type": "code", + "execution_count": null, + "id": "8ce21de1-b60a-4c4f-bf78-b56dce560803", + "metadata": {}, + "outputs": [], + "source": [ + "!ls NeMoChineseTTS/resultChineseTTS/HifiGan/*/checkpoints/HifiGan.nemo" + ] + }, { "cell_type": "code", "execution_count": null, @@ -991,8 +984,7 @@ "metadata": {}, "outputs": [], "source": [ - "hfg_path = \"\"\n", - "fastpitch_model_path = \"\"\n", + "hfg_path = \"\"\n", "if \".nemo\" in hfg_path:\n", " vocoder_model_pt = HifiGanModel.restore_from(hfg_path).eval().cuda()\n", "else:\n", @@ -1052,7 +1044,14 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.8.13" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } } }, "nbformat": 4, From 2733d546de76cdc110a40f8037c9082c38023def Mon Sep 17 00:00:00 2001 From: "He Huang (Steve)" <105218074+stevehuang52@users.noreply.github.com> Date: Fri, 16 Dec 2022 10:16:06 -0500 Subject: [PATCH 065/154] [Add] ASR+VAD Inference Pipeline (#5575) Added offline ASR+VAD inference pipeline that matches with what's in RIVA, along with some feature-based ASR and classification datasets. Signed-off-by: stevehuang52 Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- README.rst | 1 + examples/asr/asr_vad/README.md | 47 ++ .../asr/asr_vad/speech_to_text_with_vad.py | 595 ++++++++++++++++++ nemo/collections/asr/data/audio_to_label.py | 2 +- nemo/collections/asr/data/feature_to_label.py | 219 ++++++- .../asr/data/feature_to_label_dataset.py | 15 + nemo/collections/asr/data/feature_to_text.py | 451 +++++++++++++ .../asr/data/feature_to_text_dataset.py | 90 +++ .../asr/models/classification_models.py | 90 ++- .../asr/parts/preprocessing/feature_loader.py | 36 +- nemo/collections/asr/parts/utils/vad_utils.py | 110 +++- .../common/parts/preprocessing/collections.py | 270 +++++++- .../common/parts/preprocessing/manifest.py | 33 +- tests/collections/asr/test_asr_datasets.py | 167 +++++ tests/collections/asr/test_label_datasets.py | 32 +- 15 files changed, 2098 insertions(+), 60 deletions(-) create mode 100644 examples/asr/asr_vad/README.md create mode 100644 examples/asr/asr_vad/speech_to_text_with_vad.py create mode 100644 nemo/collections/asr/data/feature_to_text.py create mode 100644 nemo/collections/asr/data/feature_to_text_dataset.py diff --git a/README.rst b/README.rst index 3b67330615dc..acec586245ef 100644 --- a/README.rst +++ b/README.rst @@ -71,6 +71,7 @@ Key Features * `Support of long audios for Conformer with memory efficient local attention `_ * `Speech Classification and Speech Command Recognition `_: MatchboxNet (Command Recognition) * `Voice activity Detection (VAD) `_: MarbleNet + * ASR with VAD Inference - `Example `_ * `Speaker Recognition `_: TitaNet, ECAPA_TDNN, SpeakerNet * `Speaker Diarization `_ * Clustering Diarizer: TitaNet, ECAPA_TDNN, SpeakerNet diff --git a/examples/asr/asr_vad/README.md b/examples/asr/asr_vad/README.md new file mode 100644 index 000000000000..03c7efa146b9 --- /dev/null +++ b/examples/asr/asr_vad/README.md @@ -0,0 +1,47 @@ +# NeMo ASR+VAD Inference + +This example provides the ASR+VAD inference pipeline, with the option to perform only ASR or VAD alone. + +## Input + +There are two types of input +- A manifest passed to `manifest_filepath`, +- A directory containing audios passed to `audio_dir` and also specify `audio_type` (default to `wav`). 
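If you prefer to build the manifest yourself from a folder of audio (the script also does this automatically when `audio_dir` is given), a minimal sketch, assuming `soundfile` is installed and transcripts are unknown:

```python
import json
from pathlib import Path

import soundfile as sf

audio_dir = Path("/path/to/audio_dir")  # hypothetical folder of wav files
with open("manifest.json", "w", encoding="utf-8") as fout:
    for wav in sorted(audio_dir.glob("*.wav")):
        entry = {
            "audio_filepath": str(wav.absolute()),
            "offset": 0,
            "duration": sf.info(str(wav)).duration,  # duration in seconds
            "text": "-",  # placeholder transcript
        }
        fout.write(json.dumps(entry) + "\n")
```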
+ +The input manifest must be a manifest json file, where each line is a Python dictionary. The fields ["audio_filepath", "offset", "duration", "text"] are required. An example of a manifest file is: +```json +{"audio_filepath": "/path/to/audio_file1", "offset": 0, "duration": 10000, "text": "a b c d e"} +{"audio_filepath": "/path/to/audio_file2", "offset": 0, "duration": 10000, "text": "f g h i j"} +``` + +## Output +Output will be a folder storing the VAD predictions and/or a manifest containing the audio transcriptions. Some temporary data will also be stored. + + +## Usage + +To run the code with ASR+VAD default settings: + +```bash +python speech_to_text_with_vad.py \ + manifest_filepath=/PATH/TO/MANIFEST.json \ + vad_model=vad_multilingual_marblenet \ + asr_model=stt_en_conformer_ctc_large \ + vad_config=../conf/vad/vad_inference_postprocess.yaml +``` + +To use only ASR and disable VAD, set `vad_model=None` and `use_rttm=False`. + +To use only VAD, set `asr_model=None` and specify both `vad_model` and `vad_config`. + +To enable profiling, set `profiling=True`, but this will significantly slow down the program. + +To use or disable feature masking, set `use_rttm` to `True` or `False`. + +To normalize feature before masking, set `normalize=pre_norm`, +and set `normalize=post_norm` for masking before normalization. + +To use a specific value for feature masking, set `feat_mask_val` to the desired value. +Default is `feat_mask_val=None`, where -16.530 (zero log mel-spectrogram value) will be used for `post_norm` and 0 (same as SpecAugment) will be used for `pre_norm`. + +See more options in the `InferenceConfig` class. diff --git a/examples/asr/asr_vad/speech_to_text_with_vad.py b/examples/asr/asr_vad/speech_to_text_with_vad.py new file mode 100644 index 000000000000..b22ff709c344 --- /dev/null +++ b/examples/asr/asr_vad/speech_to_text_with_vad.py @@ -0,0 +1,595 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +This file provides the ASR+VAD inference pipeline, with the option to perform only ASR or VAD alone. + +There are two types of input, the first one is a manifest passed to `manifest_filepath`, +and the other one is to pass a directory containing audios to `audio_dir` and specify `audio_type`. + +The input manifest must be a manifest json file, where each line is a Python dictionary. The fields ["audio_filepath", "offset", "duration", "text"] are required. 
An example of a manifest file is: +``` +{"audio_filepath": "/path/to/audio_file1", "offset": 0, "duration": 10000, "text": "a b c d e"} +{"audio_filepath": "/path/to/audio_file2", "offset": 0, "duration": 10000, "text": "f g h i j"} +``` + +To run the code with ASR+VAD default settings: + +```bash +python speech_to_text_with_vad.py \ + manifest_filepath=/PATH/TO/MANIFEST.json \ + vad_model=vad_multilingual_marblenet \ + asr_model=stt_en_conformer_ctc_large \ + vad_config=../conf/vad/vad_inference_postprocess.yaml +``` + +To use only ASR and disable VAD, set `vad_model=None` and `use_rttm=False`. + +To use only VAD, set `asr_model=None` and specify both `vad_model` and `vad_config`. + +To enable profiling, set `profiling=True`, but this will significantly slow down the program. + +To use or disable feature masking, set `use_rttm` to `True` or `False`. + +To normalize feature before masking, set `normalize=pre_norm`, +and set `normalize=post_norm` for masking before normalization. + +To use a specific value for feature masking, set `feat_mask_val` to the desired value. +Default is `feat_mask_val=None`, where -16.530 will be used for `post_norm` and 0 will be used for `pre_norm`. + +See more options in the `InferenceConfig` class. +""" + + +import contextlib +import json +import os +import time +from dataclasses import dataclass, is_dataclass +from pathlib import Path +from typing import Callable, Optional + +import torch +import yaml +from omegaconf import DictConfig, OmegaConf +from torch.profiler import ProfilerActivity, profile, record_function +from tqdm import tqdm + +from nemo.collections.asr.data import feature_to_text_dataset +from nemo.collections.asr.metrics.rnnt_wer import RNNTDecodingConfig +from nemo.collections.asr.metrics.wer import CTCDecodingConfig, word_error_rate +from nemo.collections.asr.models import ASRModel, EncDecClassificationModel +from nemo.collections.asr.parts.utils.manifest_utils import read_manifest, write_manifest +from nemo.collections.asr.parts.utils.vad_utils import ( + extract_audio_features, + generate_overlap_vad_seq, + generate_vad_segment_table, + get_vad_stream_status, + init_vad_model, +) +from nemo.core.config import hydra_runner +from nemo.utils import logging + +try: + from torch.cuda.amp import autocast +except ImportError: + + @contextlib.contextmanager + def autocast(enabled=None): + yield + + +@dataclass +class InferenceConfig: + # Required configs + asr_model: Optional[str] = None # Path to a .nemo file or a pretrained NeMo model on NGC + vad_model: Optional[str] = None # Path to a .nemo file or a pretrained NeMo model on NGC + vad_config: Optional[str] = None # Path to a yaml file containing VAD post-processing configs + manifest_filepath: Optional[str] = None # Path to dataset's JSON manifest + audio_dir: Optional[str] = None + + use_rttm: bool = True # whether to use RTTM + feat_mask_val: Optional[float] = None # value used to mask features based on RTTM, set None to use defaults + normalize: Optional[ + str + ] = "post_norm" # whether and where to normalize feature, choices=[None, `pre_norm`, `post_norm`] + normalize_type: str = "per_feature" # how to determine mean and std used for normalization + use_pure_noise: bool = False # whether input is pure noise or not. + + profiling: bool = False # whether to enable pytorch profiling + + # General configs + batch_size: int = 1 # batch size for ASR. Feature extraction and VAD only support single sample per batch. 
+ num_workers: int = 8 + sample_rate: int = 16000 + frame_unit_time_secs: float = 0.01 # unit time per frame in seconds, equal to `window_stride` in ASR configs. + audio_type: str = "wav" + + # Output settings, no need to change + output_dir: Optional[str] = None # will be automatically set by the program + output_filename: Optional[str] = None # will be automatically set by the program + pred_name_postfix: Optional[str] = None # If you need to use another model name, rather than standard one. + + # Set to True to output language ID information + compute_langs: bool = False + + # Decoding strategy for CTC models + ctc_decoding: CTCDecodingConfig = CTCDecodingConfig() + + # Decoding strategy for RNNT models + rnnt_decoding: RNNTDecodingConfig = RNNTDecodingConfig(fused_batch_size=-1) + + +@hydra_runner(config_name="InferenceConfig", schema=InferenceConfig) +def main(cfg): + + if is_dataclass(cfg): + cfg = OmegaConf.structured(cfg) + + if cfg.output_dir is None: + cfg.output_dir = "./outputs" + output_dir = Path(cfg.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}') + + # setup profiling, note that profiling will significantly increast the total runtime + if cfg.profiling: + logging.info("Profiling enabled") + profile_fn = profile + record_fn = record_function + else: + logging.info("Profiling disabled") + + @contextlib.contextmanager + def profile_fn(*args, **kwargs): + yield + + @contextlib.contextmanager + def record_fn(*args, **kwargs): + yield + + input_manifest_file = prepare_inference_manifest(cfg) + + if cfg.manifest_filepath is None: + cfg.manifest_filepath = str(input_manifest_file) + + with profile_fn( + activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True, profile_memory=True + ) as prof: + + input_manifest_file = extract_audio_features(input_manifest_file, cfg, record_fn) + + if cfg.vad_model is not None: + logging.info(f"Running VAD with model: {cfg.vad_model}") + input_manifest_file = run_vad_inference(input_manifest_file, cfg, record_fn) + + if cfg.asr_model is not None: + logging.info(f"Running ASR with model: {cfg.asr_model}") + run_asr_inference(input_manifest_file, cfg, record_fn) + + if cfg.profiling: + print("--------------------------------------------------------------------\n") + print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15)) + print("--------------------------------------------------------------------\n") + logging.info("Done.") + + +def prepare_inference_manifest(cfg: DictConfig) -> str: + + if cfg.audio_dir is not None and cfg.manifest_filepath is None: + manifest_data = [] + for audio_file in Path(cfg.audio_dir).glob(f"**/*.{cfg.audio_type}"): + item = {"audio_filepath": str(audio_file.absolute()), "duration": 1000000, "offset": 0} + manifest_data.append(item) + parent_dir = Path(cfg.audio_dir) + else: + manifest_data = read_manifest(cfg.manifest_filepath) + parent_dir = Path(cfg.manifest_filepath).parent + + new_manifest_data = [] + + for item in manifest_data: + audio_file = Path(item["audio_filepath"]) + if len(str(audio_file)) < 255 and not audio_file.is_file() and not audio_file.is_absolute(): + new_audio_file = parent_dir / audio_file + if new_audio_file.is_file(): + item["audio_filepath"] = str(new_audio_file.absolute()) + else: + item["audio_filepath"] = os.path.expanduser(str(audio_file)) + else: + item["audio_filepath"] = os.path.expanduser(str(audio_file)) + item["label"] = "infer" + item["text"] = "-" + new_manifest_data.append(item) 
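+        # The "label" and "text" values above are placeholders expected by the
+        # downstream dataloaders; no ground-truth transcript is needed for inference.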
+ + new_manifest_filepath = str(Path(cfg.output_dir) / Path("temp_manifest_input.json")) + write_manifest(new_manifest_filepath, new_manifest_data) + return new_manifest_filepath + + +def extract_audio_features(manifest_filepath: str, cfg: DictConfig, record_fn: Callable) -> str: + file_list = [] + manifest_data = [] + out_dir = Path(cfg.output_dir) / Path("features") + new_manifest_filepath = str(Path(cfg.output_dir) / Path("temp_manifest_input_feature.json")) + + if Path(new_manifest_filepath).is_file(): + logging.info("Features already exist in output_dir, skipping feature extraction.") + return new_manifest_filepath + + has_feat = False + with open(manifest_filepath, 'r', encoding='utf-8') as fin: + for line in fin.readlines(): + item = json.loads(line.strip()) + manifest_data.append(item) + file_list.append(Path(item['audio_filepath']).stem) + if "feature_file" in item: + has_feat = True + if has_feat: + logging.info("Features already exist in manifest, skipping feature extraction.") + return manifest_filepath + + out_dir.mkdir(parents=True, exist_ok=True) + torch.set_grad_enabled(False) + vad_model = EncDecClassificationModel.from_pretrained("vad_multilingual_marblenet") + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + vad_model = vad_model.to(device) + vad_model.eval() + vad_model.setup_test_data( + test_data_config={ + 'batch_size': 1, + 'vad_stream': False, + 'sample_rate': cfg.sample_rate, + 'manifest_filepath': manifest_filepath, + 'labels': ['infer',], + 'num_workers': cfg.num_workers, + 'shuffle': False, + } + ) + + logging.info(f"Extracting features on {len(file_list)} audio files...") + with record_fn("feat_extract_loop"): + for i, test_batch in enumerate(tqdm(vad_model.test_dataloader(), total=len(vad_model.test_dataloader()))): + test_batch = [x.to(vad_model.device) for x in test_batch] + with autocast(): + with record_fn("feat_extract_infer"): + processed_signal, processed_signal_length = vad_model.preprocessor( + input_signal=test_batch[0], length=test_batch[1], + ) + with record_fn("feat_extract_other"): + processed_signal = processed_signal.squeeze(0)[:, :processed_signal_length] + processed_signal = processed_signal.cpu() + outpath = os.path.join(out_dir, file_list[i] + ".pt") + outpath = str(Path(outpath).absolute()) + torch.save(processed_signal, outpath) + manifest_data[i]["feature_file"] = outpath + del test_batch + + logging.info(f"Features saved at: {out_dir}") + write_manifest(new_manifest_filepath, manifest_data) + return new_manifest_filepath + + +def run_vad_inference(manifest_filepath: str, cfg: DictConfig, record_fn: Callable) -> str: + logging.info("Start VAD inference pipeline...") + vad_model = init_vad_model(cfg.vad_model) + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + vad_model = vad_model.to(device) + vad_model.eval() + + vad_yaml = Path(cfg.vad_config) + if not vad_yaml.is_file(): + raise ValueError(f"VAD config file not found: {cfg.vad_config}") + + with vad_yaml.open("r") as fp: + vad_cfg = yaml.safe_load(fp) + vad_cfg = DictConfig(vad_cfg) + + test_data_config = { + 'vad_stream': True, + 'manifest_filepath': manifest_filepath, + 'labels': ['infer',], + 'num_workers': cfg.num_workers, + 'shuffle': False, + 'window_length_in_sec': vad_cfg.vad.parameters.window_length_in_sec, + 'shift_length_in_sec': vad_cfg.vad.parameters.shift_length_in_sec, + } + vad_model.setup_test_data(test_data_config=test_data_config, use_feat=True) + + pred_dir = Path(cfg.output_dir) / Path("vad_frame_pred") + if 
pred_dir.is_dir(): + logging.info(f"VAD frame-level prediction already exists: {pred_dir}, skipped") + else: + logging.info("Generating VAD frame-level prediction") + pred_dir.mkdir(parents=True) + t0 = time.time() + pred_dir = generate_vad_frame_pred( + vad_model=vad_model, + window_length_in_sec=vad_cfg.vad.parameters.window_length_in_sec, + shift_length_in_sec=vad_cfg.vad.parameters.shift_length_in_sec, + manifest_vad_input=manifest_filepath, + out_dir=str(pred_dir), + use_feat=True, + record_fn=record_fn, + ) + t1 = time.time() + logging.info(f"Time elapsed: {t1 - t0: .2f} seconds") + logging.info( + f"Finished generating VAD frame level prediction with window_length_in_sec={vad_cfg.vad.parameters.window_length_in_sec} and shift_length_in_sec={vad_cfg.vad.parameters.shift_length_in_sec}" + ) + + frame_length_in_sec = vad_cfg.vad.parameters.shift_length_in_sec + # overlap smoothing filter + if vad_cfg.vad.parameters.smoothing: + # Generate predictions with overlapping input segments. Then a smoothing filter is applied to decide the label for a frame spanned by multiple segments. + # smoothing_method would be either in majority vote (median) or average (mean) + logging.info("Generating predictions with overlapping input segments") + t0 = time.time() + smoothing_pred_dir = generate_overlap_vad_seq( + frame_pred_dir=pred_dir, + smoothing_method=vad_cfg.vad.parameters.smoothing, + overlap=vad_cfg.vad.parameters.overlap, + window_length_in_sec=vad_cfg.vad.parameters.window_length_in_sec, + shift_length_in_sec=vad_cfg.vad.parameters.shift_length_in_sec, + num_workers=cfg.num_workers, + out_dir=vad_cfg.smoothing_out_dir, + ) + logging.info( + f"Finish generating predictions with overlapping input segments with smoothing_method={vad_cfg.vad.parameters.smoothing} and overlap={vad_cfg.vad.parameters.overlap}" + ) + t1 = time.time() + logging.info(f"Time elapsed: {t1 - t0: .2f} seconds") + pred_dir = smoothing_pred_dir + frame_length_in_sec = 0.01 + + # Turn frame-wise prediction into speech intervals + logging.info(f"Generating segment tables with postprocessing params: {vad_cfg.vad.parameters.postprocessing}") + segment_dir_name = "vad_rttm" + for key, val in vad_cfg.vad.parameters.postprocessing.items(): + if key == "use_rttm": + continue + segment_dir_name = segment_dir_name + "-" + str(key) + str(val) + + segment_dir = Path(cfg.output_dir) / Path(segment_dir_name) + if segment_dir.is_dir(): + logging.info(f"VAD speech segments already exists: {segment_dir}, skipped") + else: + segment_dir.mkdir(parents=True) + t0 = time.time() + vad_cfg.vad.parameters.postprocessing.use_rttm = True + segment_dir = generate_vad_segment_table( + vad_pred_dir=pred_dir, + postprocessing_params=vad_cfg.vad.parameters.postprocessing, + frame_length_in_sec=frame_length_in_sec, + num_workers=cfg.num_workers, + out_dir=segment_dir, + ) + t1 = time.time() + logging.info(f"Time elapsed: {t1 - t0: .2f} seconds") + logging.info("Finished generating RTTM files from VAD predictions.") + + rttm_map = {} + for filepath in Path(segment_dir).glob("*.rttm"): + rttm_map[filepath.stem] = str(filepath.absolute()) + + manifest_data = read_manifest(manifest_filepath) + for i in range(len(manifest_data)): + key = Path(manifest_data[i]["audio_filepath"]).stem + manifest_data[i]["rttm_file"] = rttm_map[key] + + new_manifest_filepath = str(Path(cfg.output_dir) / Path(f"temp_manifest_{segment_dir_name}.json")) + write_manifest(new_manifest_filepath, manifest_data) + return new_manifest_filepath + + +def generate_vad_frame_pred( + 
vad_model: EncDecClassificationModel, + window_length_in_sec: float, + shift_length_in_sec: float, + manifest_vad_input: str, + out_dir: str, + use_feat: bool = False, + record_fn: Callable = None, +) -> str: + """ + Generate VAD frame level prediction and write to out_dir + """ + time_unit = int(window_length_in_sec / shift_length_in_sec) + trunc = int(time_unit / 2) + trunc_l = time_unit - trunc + all_len = 0 + + data = [] + with open(manifest_vad_input, 'r', encoding='utf-8') as fin: + for line in fin.readlines(): + file = json.loads(line)['audio_filepath'].split("/")[-1] + data.append(file.split(".wav")[0]) + logging.info(f"Inference on {len(data)} audio files/json lines!") + + status = get_vad_stream_status(data) + + with record_fn("vad_infer_loop"): + for i, test_batch in enumerate(tqdm(vad_model.test_dataloader(), total=len(vad_model.test_dataloader()))): + test_batch = [x.to(vad_model.device) for x in test_batch] + with autocast(): + with record_fn("vad_infer_model"): + if use_feat: + log_probs = vad_model(processed_signal=test_batch[0], processed_signal_length=test_batch[1]) + else: + log_probs = vad_model(input_signal=test_batch[0], input_signal_length=test_batch[1]) + + with record_fn("vad_infer_other"): + probs = torch.softmax(log_probs, dim=-1) + pred = probs[:, 1] + + if status[i] == 'start': + to_save = pred[:-trunc] + elif status[i] == 'next': + to_save = pred[trunc:-trunc_l] + elif status[i] == 'end': + to_save = pred[trunc_l:] + else: + to_save = pred + + all_len += len(to_save) + outpath = os.path.join(out_dir, data[i] + ".frame") + with open(outpath, "a", encoding='utf-8') as fout: + for f in range(len(to_save)): + fout.write('{0:0.4f}\n'.format(to_save[f])) + + del test_batch + if status[i] == 'end' or status[i] == 'single': + all_len = 0 + return out_dir + + +def init_asr_model(model_path: str) -> ASRModel: + if model_path.endswith('.nemo'): + logging.info(f"Using local ASR model from {model_path}") + asr_model = ASRModel.restore_from(restore_path=model_path) + elif model_path.endswith('.ckpt'): + asr_model = ASRModel.load_from_checkpoint(checkpoint_path=model_path) + else: + logging.info(f"Using NGC ASR model {model_path}") + asr_model = ASRModel.from_pretrained(model_name=model_path) + return asr_model + + +def run_asr_inference(manifest_filepath, cfg, record_fn) -> str: + logging.info("Start ASR inference pipeline...") + asr_model = init_asr_model(cfg.asr_model) + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + asr_model = asr_model.to(device) + asr_model.eval() + + # Setup decoding strategy + decode_function = None + if hasattr(asr_model, 'change_decoding_strategy'): + # Check if ctc or rnnt model + if hasattr(asr_model, 'joint'): # RNNT model + cfg.rnnt_decoding.fused_batch_size = -1 + cfg.rnnt_decoding.compute_langs = cfg.compute_langs + asr_model.change_decoding_strategy(cfg.rnnt_decoding) + decode_function = asr_model.decoding.rnnt_decoder_predictions_tensor + else: + asr_model.change_decoding_strategy(cfg.ctc_decoding) + decode_function = asr_model.decoding.ctc_decoder_predictions_tensor + else: + raise ValueError(f"Only support CTC or RNNT models that have `change_decoding_strategy()` implemented.") + + # Compute output filename + if cfg.output_filename is None: + # create default output filename + if cfg.pred_name_postfix is not None: + cfg.output_filename = cfg.manifest_filepath.replace('.json', f'_{cfg.pred_name_postfix}.json') + else: + tag = f"{cfg.normalize}_{cfg.normalize_type}" + if cfg.use_rttm: + vad_tag = 
Path(manifest_filepath).stem + vad_tag = vad_tag[len("temp_manifest_vad_rttm_") :] + tag += f"-mask{cfg.feat_mask_val}-{vad_tag}" + cfg.output_filename = cfg.manifest_filepath.replace('.json', f'-{Path(cfg.asr_model).stem}-{tag}.json') + cfg.output_filename = Path(cfg.output_dir) / Path(cfg.output_filename).name + + logging.info("Setting up dataloader for ASR...") + data_config = { + "manifest_filepath": manifest_filepath, + "normalize": cfg.normalize, + "normalize_type": cfg.normalize_type, + "use_rttm": cfg.use_rttm, + "feat_mask_val": cfg.feat_mask_val, + "frame_unit_time_secs": cfg.frame_unit_time_secs, + } + logging.info(f"use_rttm = {cfg.use_rttm}") + if hasattr(asr_model, "tokenizer"): + dataset = feature_to_text_dataset.get_bpe_dataset(config=data_config, tokenizer=asr_model.tokenizer) + else: + data_config["labels"] = asr_model.decoder.vocabulary + dataset = feature_to_text_dataset.get_char_dataset(config=data_config) + + dataloader = torch.utils.data.DataLoader( + dataset=dataset, + batch_size=cfg.batch_size, + collate_fn=dataset._collate_fn, + drop_last=False, + shuffle=False, + num_workers=cfg.get('num_workers', 0), + pin_memory=cfg.get('pin_memory', False), + ) + + logging.info("Start transcribing...") + hypotheses = [] + all_hypotheses = [] + t0 = time.time() + with autocast(): + with torch.no_grad(): + with record_fn("asr_infer_loop"): + for test_batch in tqdm(dataloader, desc="Transcribing"): + with record_fn("asr_infer_model"): + outputs = asr_model.forward( + processed_signal=test_batch[0].to(device), + processed_signal_length=test_batch[1].to(device), + ) + with record_fn("asr_infer_other"): + logits, logits_len = outputs[0], outputs[1] + + current_hypotheses, all_hyp = decode_function(logits, logits_len, return_hypotheses=False,) + + hypotheses += current_hypotheses + if all_hyp is not None: + all_hypotheses += all_hyp + else: + all_hypotheses += current_hypotheses + + del logits + del test_batch + t1 = time.time() + logging.info(f"Time elapsed: {t1 - t0: .2f} seconds") + + logging.info("Finished transcribing.") + # Save output to manifest + input_manifest_data = read_manifest(manifest_filepath) + manifest_data = read_manifest(cfg.manifest_filepath) + groundtruth = [] + for i in range(len(manifest_data)): + groundtruth.append(manifest_data[i]["text"]) + manifest_data[i]["pred_text"] = hypotheses[i] + manifest_data[i]["feature_file"] = input_manifest_data[i]["feature_file"] + if "rttm_file" in input_manifest_data[i]: + manifest_data[i]["feature_file"] = input_manifest_data[i]["feature_file"] + + write_manifest(cfg.output_filename, manifest_data) + + if cfg.use_pure_noise: + hypotheses = " ".join(hypotheses) + words = hypotheses.split() + chars = "".join(words) + logging.info("-----------------------------------------") + logging.info(f"Number of hallucinated characters={len(chars)}") + logging.info(f"Number of hallucinated words={len(words)}") + logging.info(f"Concatenated predictions: {hypotheses}") + logging.info("-----------------------------------------") + else: + wer_score = word_error_rate(hypotheses=hypotheses, references=groundtruth) + logging.info("-----------------------------------------") + logging.info(f"WER={wer_score*100:.2f}") + logging.info("-----------------------------------------") + + logging.info(f"ASR output saved at {cfg.output_filename}") + return cfg.output_filename + + +if __name__ == "__main__": + main() diff --git a/nemo/collections/asr/data/audio_to_label.py b/nemo/collections/asr/data/audio_to_label.py index cc41fb5e0bd5..e8c4e2791e2e 
100644 --- a/nemo/collections/asr/data/audio_to_label.py +++ b/nemo/collections/asr/data/audio_to_label.py @@ -168,7 +168,7 @@ def _vad_frame_seq_collate_fn(self, batch): """ slice_length = int(self.featurizer.sample_rate * self.window_length_in_sec) _, audio_lengths, _, tokens_lengths = zip(*batch) - slice_length = min(slice_length, max(audio_lengths)) + slice_length = int(min(slice_length, max(audio_lengths))) shift = int(self.featurizer.sample_rate * self.shift_length_in_sec) has_audio = audio_lengths[0] is not None diff --git a/nemo/collections/asr/data/feature_to_label.py b/nemo/collections/asr/data/feature_to_label.py index dcba068d7a71..673f50374581 100644 --- a/nemo/collections/asr/data/feature_to_label.py +++ b/nemo/collections/asr/data/feature_to_label.py @@ -15,6 +15,7 @@ import torch +from nemo.collections.asr.parts.preprocessing.feature_loader import ExternalFeatureLoader from nemo.collections.common.parts.preprocessing import collections from nemo.core.classes import Dataset from nemo.core.neural_types import AcousticEncodedRepresentation, LabelsType, LengthsType, NeuralType @@ -22,24 +23,141 @@ def _feature_collate_fn(batch): - """collate batch of feat sig, feat len, tokens, tokens len + """collate batch of feat sig, feat len, labels, labels len, assuming all features have the same shape. Args: batch (FloatTensor, LongTensor, LongTensor, LongTensor): A tuple of tuples of feature, feature lengths, - encoded tokens, and encoded tokens length. + encoded labels, and encoded labels length. """ - _, feat_lengths, _, tokens_lengths = zip(*batch) - feat_signal, tokens = [], [] - for sig, sig_len, tokens_i, tokens_i_len in batch: - feat_signal.append(sig) - tokens.append(tokens_i) + packed_batch = list(zip(*batch)) + if len(packed_batch) == 5: + _, feat_lengths, _, labels_lengths, sample_ids = packed_batch + elif len(packed_batch) == 4: + sample_ids = None + _, feat_lengths, _, labels_lengths = packed_batch + else: + raise ValueError("Expects 4 or 5 tensors in the batch!") - feat_signal = torch.stack(feat_signal) + features, labels = [], [] + for b in batch: + feat_i, labels_i = b[0], b[2] + features.append(feat_i) + labels.append(labels_i) + + features = torch.stack(features) feat_lengths = torch.stack(feat_lengths) - tokens = torch.stack(tokens) - tokens_lengths = torch.stack(tokens_lengths) + labels = torch.stack(labels) + labels_lengths = torch.stack(labels_lengths) + + if sample_ids is None: + return features, feat_lengths, labels, labels_lengths + else: + sample_ids = torch.tensor(sample_ids, dtype=torch.int32) + return features, feat_lengths, labels, labels_lengths, sample_ids + + +def _audio_feature_collate_fn(batch, feat_pad_val, label_pad_id): + """collate batch of audio feature, audio len, labels, labels len + Args: + batch (Optional[FloatTensor], Optional[LongTensor], LongTensor, + LongTensor): A tuple of tuples of feature, feature lengths, + labels, and label lengths. This collate func assumes the + features are torch tensors of Log-Melspectrogram (i.e. [N_MEL, T]). 
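+            Shorter features are right-padded along time with `feat_pad_val`, and labels are
+            right-padded with `label_pad_id`, up to the longest item in the batch.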
+ """ + packed_batch = list(zip(*batch)) + if len(packed_batch) == 5: + _, feat_lengths, _, labels_lengths, sample_ids = packed_batch + elif len(packed_batch) == 4: + sample_ids = None + _, feat_lengths, _, labels_lengths = packed_batch + else: + raise ValueError("Expects 4 or 5 tensors in the batch!") + max_feat_len = 0 + has_feat = feat_lengths[0] is not None + if has_feat: + max_feat_len = max(feat_lengths).item() + max_labels_len = max(labels_lengths).item() + + features, labels = [], [] + for b in batch: + feat_i, feat_i_len, label_i, label_i_len = b[0], b[1], b[2], b[3] + + if has_feat: + feat_i_len = feat_i_len.item() + if feat_i_len < max_feat_len: + pad = (0, max_feat_len - feat_i_len) + feat_i = torch.nn.functional.pad(feat_i, pad, value=feat_pad_val) + features.append(feat_i) + + label_i_len = label_i_len.item() + if label_i_len < max_labels_len: + pad = (0, max_labels_len - label_i_len) + label_i = torch.nn.functional.pad(label_i, pad, value=label_pad_id) + labels.append(label_i) + + if has_feat: + features = torch.stack(features) + feature_lengths = torch.stack(feat_lengths) + else: + features, feat_lengths = None, None + labels = torch.stack(labels) + labels_lengths = torch.stack(labels_lengths) + + if sample_ids is None: + return features, feature_lengths, labels, labels_lengths + else: + sample_ids = torch.tensor(sample_ids, dtype=torch.int32) + return features, feature_lengths, labels, labels_lengths, sample_ids + + +def _vad_feature_segment_collate_fn(batch, window_length_in_sec, shift_length_in_sec, frame_unit_in_sec): + """collate batch of audio features, features len, tokens, tokens len + Args: + batch (Optional[FloatTensor], Optional[LongTensor], LongTensor, + LongTensor): A tuple of tuples of signal, signal lengths, + encoded tokens, and encoded tokens length. This collate func + assumes the signals are 1d torch tensors (i.e. mono audio). + batch size equals to 1. 
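+            Each feature is zero-padded by half a window on both ends, then sliced into windows of
+            window_length_in_sec with a hop of shift_length_in_sec (both converted to frame counts
+            via frame_unit_in_sec).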
+ """ + slice_length = int(window_length_in_sec / frame_unit_in_sec) + audio_features, feat_lengths, _, tokens_lengths = zip(*batch) + + slice_length = int(min(slice_length, max(feat_lengths))) + shift = int(shift_length_in_sec / frame_unit_in_sec) + has_audio = feat_lengths[0] is not None + + f_dim = audio_features[0].shape[0] + audio_features, num_slices, tokens, feat_lengths = [], [], [], [] + append_len_start = torch.div(slice_length, 2, rounding_mode='trunc') + append_len_end = slice_length - torch.div(slice_length, 2, rounding_mode='trunc') + for feat_i, feat_i_len, tokens_i, _ in batch: + start = torch.zeros(f_dim, append_len_start) + end = torch.zeros(f_dim, append_len_end) + feat_i = torch.cat((start, feat_i, end), dim=1) + feat_i_len += slice_length + + if has_audio: + slices = max(1, torch.div(feat_i_len - slice_length, shift, rounding_mode='trunc')) - return feat_signal, feat_lengths, tokens, tokens_lengths + for slice_id in range(slices): + start_idx = slice_id * shift + end_idx = start_idx + slice_length + feat_slice = feat_i[:, start_idx:end_idx] + audio_features.append(feat_slice) + + num_slices.append(slices) + tokens.extend([tokens_i] * slices) + feat_lengths.extend([slice_length] * slices) + + if has_audio: + audio_features = torch.stack(audio_features) + feat_lengths = torch.tensor(feat_lengths) + else: + audio_features, feat_lengths = None, None + + tokens = torch.stack(tokens) + tokens_lengths = torch.tensor(num_slices) + return audio_features, feat_lengths, tokens, tokens_lengths class _FeatureSeqSpeakerLabelDataset(Dataset): @@ -137,3 +255,82 @@ class FeatureToSeqSpeakerLabelDataset(_FeatureSeqSpeakerLabelDataset): def _collate_fn(self, batch): return _feature_collate_fn(batch) + + +class FeatureToLabelDataset(Dataset): + """ + Dataset that loads tensors via a json file containing paths to feature files and their labels. + Each new line is a different sample. Example below: + and their target labels. JSON files should be of the following format: + {"feature_filepath": "/path/to/audio_feature.pt", "label": "1"} \ + ... + {"feature_filepath": "/path/to/audio_feature.pt", "label": "0"} + Args: + manifest_filepath (str): Dataset parameter. Path to JSON containing data. + labels (Optional[list]): Dataset parameter. List of unique labels collected from all samples. + augmentor (Optional): feature augmentation + + """ + + ZERO_LEVEL_SPEC_DB_VAL = -16.635 # Log-Melspectrogram value for zero signal + FRAME_UNIT_TIME_SECS = 0.01 + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + """Returns definitions of module output ports. 
+ """ + output_types = { + 'audio_feat': NeuralType(('B', 'D', 'T'), AcousticEncodedRepresentation()), + 'feat_length': NeuralType(tuple('B'), LengthsType()), + 'labels': NeuralType(('B'), LabelsType()), + 'labels_length': NeuralType(tuple('B'), LengthsType()), + } + + return output_types + + def __init__( + self, + *, + manifest_filepath: str, + labels: List[str] = None, + augmentor: 'nemo.collections.asr.parts.perturb.AudioAugmentor' = None, + window_length_in_sec: float = 0.63, + shift_length_in_sec: float = 0.01, + ): + super().__init__() + self.window_length_in_sec = window_length_in_sec + self.shift_length_in_sec = shift_length_in_sec + self.collection = collections.ASRFeatureLabel(manifests_files=manifest_filepath.split(','),) + + self.feature_loader = ExternalFeatureLoader(augmentor=augmentor) + self.labels = labels if labels else self.collection.uniq_labels + + self.label2id, self.id2label = {}, {} + for label_id, label in enumerate(self.labels): + self.label2id[label] = label_id + self.id2label[label_id] = label + + for idx in range(len(self.labels[:5])): + logging.debug(" label id {} and its mapped label {}".format(idx, self.id2label[idx])) + + def __len__(self): + return len(self.collection) + + def __getitem__(self, index): + sample = self.collection[index] + + features = self.feature_loader.process(sample.feature_file) + f, fl = features, torch.tensor(features.shape[1]).long() + + t = torch.tensor(self.label2id[sample.label]) + tl = torch.tensor(1).long() + + return f, fl, t, tl + + def _collate_fn(self, batch): + return _audio_feature_collate_fn(batch, self.ZERO_LEVEL_SPEC_DB_VAL, 0) + + def _vad_segment_collate_fn(self, batch): + return _vad_feature_segment_collate_fn( + batch, self.window_length_in_sec, self.shift_length_in_sec, self.FRAME_UNIT_TIME_SECS + ) diff --git a/nemo/collections/asr/data/feature_to_label_dataset.py b/nemo/collections/asr/data/feature_to_label_dataset.py index b88cba52a729..dabe06aa62bb 100644 --- a/nemo/collections/asr/data/feature_to_label_dataset.py +++ b/nemo/collections/asr/data/feature_to_label_dataset.py @@ -11,6 +11,8 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. +from typing import Optional + from nemo.collections.asr.data import feature_to_label @@ -29,3 +31,16 @@ def get_feature_seq_speakerlabel_dataset( manifest_filepath=config['manifest_filepath'], labels=config['labels'], feature_loader=feature_loader, ) return dataset + + +def get_feature_label_dataset( + config: dict, augmentor: Optional['AudioAugmentor'] = None +) -> feature_to_label.FeatureToLabelDataset: + dataset = feature_to_label.FeatureToLabelDataset( + manifest_filepath=config['manifest_filepath'], + labels=config['labels'], + augmentor=augmentor, + window_length_in_sec=config.get("window_length_in_sec", 0.63), + shift_length_in_sec=config.get("shift_length_in_sec", 0.01), + ) + return dataset diff --git a/nemo/collections/asr/data/feature_to_text.py b/nemo/collections/asr/data/feature_to_text.py new file mode 100644 index 000000000000..eaec7b3afba5 --- /dev/null +++ b/nemo/collections/asr/data/feature_to_text.py @@ -0,0 +1,451 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Callable, Dict, List, Optional, Tuple, Union + +import torch + +from nemo.collections.asr.data.feature_to_label import _audio_feature_collate_fn +from nemo.collections.asr.parts.preprocessing.feature_loader import ExternalFeatureLoader +from nemo.collections.asr.parts.preprocessing.features import normalize_batch +from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType +from nemo.collections.asr.parts.utils.vad_utils import load_speech_segments_from_rttm +from nemo.collections.common import tokenizers +from nemo.collections.common.parts.preprocessing import collections, parsers +from nemo.core.classes import Dataset +from nemo.core.neural_types import AcousticEncodedRepresentation, LabelsType, LengthsType, NeuralType + + +class ASRFeatureManifestProcessor: + def __init__( + self, + manifest_filepath: str, + parser: Union[str, Callable], + max_duration: Optional[float] = None, + min_duration: Optional[float] = None, + max_utts: int = 0, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + index_by_file_id: bool = False, + ): + self.parser = parser + self.collection = collections.ASRFeatureText( + manifests_files=manifest_filepath, + parser=parser, + min_duration=min_duration, + max_duration=max_duration, + max_number=max_utts, + index_by_file_id=index_by_file_id, + ) + + self.eos_id = eos_id + self.bos_id = bos_id + self.pad_id = pad_id + + def process_text_by_id(self, index: int) -> Tuple[List[int], int]: + sample = self.collection[index] + return self.process_text_by_sample(sample) + + def process_text_by_file_id(self, file_id: str) -> Tuple[List[int], int]: + manifest_idx = self.collection.mapping[file_id][0] + sample = self.collection[manifest_idx] + return self.process_text_by_sample(sample) + + def process_text_by_sample(self, sample: collections.ASRAudioText.OUTPUT_TYPE) -> Tuple[List[int], int]: + t, tl = sample.text_tokens, len(sample.text_tokens) + + if self.bos_id is not None: + t = [self.bos_id] + t + tl += 1 + if self.eos_id is not None: + t = t + [self.eos_id] + tl += 1 + + return t, tl + + +class _FeatureTextDataset(Dataset): + """ + Dataset that loads tensors via a json file containing paths to audio feature files, transcripts, + durations (in seconds) and optional RTTM files. Each new line is a different sample. Example below: + {"feature_filepath": "/path/to/audio_feature.pt", "text_filepath": "/path/to/audio.txt", + "rttm_filepath": "/path/to/audio_rttm.rttm", "duration": 23.147} + ... + {"feature_filepath": "/path/to/audio_feature.pt", "text": "the transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + Args: + manifest_filepath: Path to manifest json as described above. Can be comma-separated paths. + parser: Str for a language specific preprocessor or a callable. 
+ normalize: whether and where to normalize feature, must be one of [None, "post_norm", "pre_norm"] + normalize_type (Union[str, dict]): how to normalize feature, see `nemo.collections.asr.parts.preprocessing.features.normalize_batch` + use_rttm: whether to use RTTM files if there is any, default to False + feat_mask_val (Optional[float]): value used to mask features with RTTM files, default to None to use zero mel-spectralgram + frame_unit_time_secs (float): time in seconds for each frame + sample_rate (int): Sample rate to resample loaded audio to + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + augmentor (nemo.collections.asr.parts.perturb.AudioAugmentor): An AudioAugmentor object used to augment loaded + audio + max_duration: If audio exceeds this length, do not include in dataset + min_duration: If audio is less than this length, do not include in dataset + max_utts: Limit number of utterances + trim: whether or not to trim silence. Defaults to False + bos_id: Id of beginning of sequence symbol to append if not None + eos_id: Id of end of sequence symbol to append if not None + pad_id: Id of pad symbol. Defaults to 0 + return_sample_id (bool): whether to return the sample_id as a part of each sample + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + """ + + ZERO_LEVEL_SPEC_DB_VAL = -16.635 # Log-Melspectrogram value for zero signal + NORM_MODES = ["pre_norm", "post_norm"] + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + """Returns definitions of module output ports. + """ + return { + 'features': NeuralType(('B', 'D', 'T'), AcousticEncodedRepresentation()), + 'feature_length': NeuralType(tuple('B'), LengthsType()), + 'transcripts': NeuralType(('B', 'T'), LabelsType()), + 'transcript_length': NeuralType(tuple('B'), LengthsType()), + 'sample_id': NeuralType(tuple('B'), LengthsType(), optional=True), + } + + def __init__( + self, + manifest_filepath: str, + parser: Union[str, Callable], + normalize: Optional[str] = "post_norm", + normalize_type: Union[str, dict] = "per_feature", + use_rttm: bool = False, + feat_mask_val: Optional[float] = None, + frame_unit_time_secs: float = 0.01, + sample_rate: Optional[int] = 16000, + augmentor: 'nemo.collections.asr.parts.perturb.FeatureAugmentor' = None, + max_duration: Optional[int] = None, + min_duration: Optional[int] = None, + max_utts: int = 0, + trim: bool = False, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + return_sample_id: bool = False, + channel_selector: Optional[ChannelSelectorType] = None, + ): + if type(manifest_filepath) == str: + manifest_filepath = manifest_filepath.split(",") + + self.sample_rate = sample_rate + self.normalize = normalize + self.normalize_type = normalize_type + self.use_rttm = use_rttm + if feat_mask_val is not None: + self.feat_mask_val = feat_mask_val + elif normalize == "pre_norm": + self.feat_mask_val = 0.0 # similar to SpectralAugmentation + else: + self.feat_mask_val = self.ZERO_LEVEL_SPEC_DB_VAL + + if normalize is not None and normalize not in self.NORM_MODES: + raise ValueError(f"`normalize` must be one of {self.NORM_MODES}, got `{normalize}` instead") + + self.frame_unit_time_secs = frame_unit_time_secs + + self.manifest_processor = ASRFeatureManifestProcessor( + manifest_filepath=manifest_filepath, 
+ parser=parser, + max_duration=max_duration, + min_duration=min_duration, + max_utts=max_utts, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + ) + self.featurizer = ExternalFeatureLoader(augmentor=augmentor) + self.trim = trim + self.return_sample_id = return_sample_id + self.channel_selector = channel_selector + + def get_manifest_sample(self, sample_id): + return self.manifest_processor.collection[sample_id] + + def __getitem__(self, index): + sample = self.manifest_processor.collection[index] + offset = sample.offset + + if offset is None: + offset = 0 + + features = self.featurizer.process(sample.feature_file) + + f, fl = features, torch.tensor(features.shape[1]).long() + + t, tl = self.manifest_processor.process_text_by_sample(sample=sample) + + # Feature normalization + if self.normalize is None: + if self.use_rttm and sample.rttm_file: + f = self.mask_features_from_rttm(f, offset, sample.rttm_file, self.feat_mask_val) + elif self.normalize == "post_norm": + # (Optional) Masking based on RTTM file + if self.use_rttm and sample.rttm_file: + f = self.mask_features_from_rttm(f, offset, sample.rttm_file, self.feat_mask_val) + f = self.normalize_feature(f) + else: # pre-norm + f = self.normalize_feature(f) + # (Optional) Masking based on RTTM file + if self.use_rttm and sample.rttm_file: + f = self.mask_features_from_rttm(f, offset, sample.rttm_file, self.feat_mask_val) + + if self.return_sample_id: + output = f, fl, torch.tensor(t).long(), torch.tensor(tl).long(), index + else: + output = f, fl, torch.tensor(t).long(), torch.tensor(tl).long() + + return output + + def mask_features_from_rttm(self, features, offset, rttm_file, mask_val): + segments = load_speech_segments_from_rttm(rttm_file) + sid = 0 + for i in range(features.size(1)): + t = offset + i * self.frame_unit_time_secs + while sid < len(segments) - 1 and segments[sid][1] < t: + sid += 1 + if segments[sid][1] == 0 or t < segments[sid][0] or t > segments[sid][1]: + features[:, i] = mask_val + + return features + + def __len__(self): + return len(self.manifest_processor.collection) + + def _collate_fn(self, batch): + return _audio_feature_collate_fn( + batch, feat_pad_val=self.feat_mask_val, label_pad_id=self.manifest_processor.pad_id + ) + + def normalize_feature(self, feat): + """ + Args: + feat: feature tensor of shape [M, T] + """ + feat = feat.unsqueeze(0) # add batch dim + feat, _, _ = normalize_batch(feat, torch.tensor([feat.size(-1)]), self.normalize_type) + return feat.squeeze(0) # delete batch dim + + +class FeatureToCharDataset(_FeatureTextDataset): + """ + Dataset that loads tensors via a json file containing paths to audio feature + files, transcripts, durations (in seconds) and optional RTTM files. Each new line is a + different sample. Example below: + {"feature_filepath": "/path/to/audio_feature.pt", "text_filepath": + "/path/to/audio.txt", "duration": 23.147, "rttm_filepath": "/path/to/audio_rttm.rttm",} + ... + {"feature_filepath": "/path/to/audio_feature.pt", "text": "the + transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + + Args: + manifest_filepath: Path to manifest json as described above. Can + be comma-separated paths. 
+ labels: String containing all the possible characters to map to + normalize: how to normalize feature, must be one of [None, "post_norm", "pre_norm"] + normalize_type (Union[str, dict]): how to normalize feature, see `nemo.collections.asr.parts.preprocessing.features.normalize_batch` + use_rttm: whether to use RTTM files if there is any, default to False + feat_mask_val (Optional[float]): value used to mask features with RTTM files, default to None to use zero mel-spectralgram + frame_unit_time_secs: time in seconds for each frame + sample_rate (int): Sample rate to resample loaded audio to + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + augmentor (nemo.collections.asr.parts.perturb.AudioAugmentor): An AudioAugmentor + object used to augment loaded audio + max_duration: If audio exceeds this length, do not include in dataset + min_duration: If audio is less than this length, do not include + in dataset + max_utts: Limit number of utterances + blank_index: blank character index, default = -1 + unk_index: unk_character index, default = -1 + bos_id: Id of beginning of sequence symbol to append if not None + eos_id: Id of end of sequence symbol to append if not None + return_sample_id (bool): whether to return the sample_id as a part of each sample + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + """ + + def __init__( + self, + manifest_filepath: str, + labels: Union[str, List[str]], + normalize: Optional[str] = "post_norm", + normalize_type: Union[str, dict] = "per_feature", + use_rttm: bool = False, + feat_mask_val: Optional[float] = None, + frame_unit_time_secs: float = 0.01, + sample_rate: Optional[int] = 16000, + augmentor: 'nemo.collections.asr.parts.perturb.FeatureAugmentor' = None, + max_duration: Optional[int] = None, + min_duration: Optional[int] = None, + max_utts: int = 0, + blank_index: int = -1, + unk_index: int = -1, + trim: bool = False, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + parser: Union[str, Callable] = 'en', + return_sample_id: bool = False, + channel_selector: Optional[ChannelSelectorType] = None, + ): + self.labels = labels + + parser = parsers.make_parser( + labels=labels, name=parser, unk_id=unk_index, blank_id=blank_index, do_normalize=normalize + ) + + super().__init__( + manifest_filepath=manifest_filepath, + parser=parser, + normalize=normalize, + normalize_type=normalize_type, + use_rttm=use_rttm, + feat_mask_val=feat_mask_val, + frame_unit_time_secs=frame_unit_time_secs, + sample_rate=sample_rate, + augmentor=augmentor, + max_duration=max_duration, + min_duration=min_duration, + max_utts=max_utts, + trim=trim, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + return_sample_id=return_sample_id, + channel_selector=channel_selector, + ) + + +class FeatureToBPEDataset(_FeatureTextDataset): + """ + Dataset that loads tensors via a json file containing paths to audio feature + files, transcripts, durations (in seconds) and optional RTTM files. Each new line is a different sample. + Example below: + {"audio_filepath": "/path/to/audio.wav", "text_filepath": + "/path/to/audio.txt", "duration": 23.147, "rttm_filepath": "/path/to/audio_rttm.rttm",} + ... 
+ {"audio_filepath": "/path/to/audio.wav", "text": "the + transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + + In practice, the dataset and manifest used for character encoding and byte pair encoding + are exactly the same. The only difference lies in how the dataset tokenizes the text in + the manifest. + + Args: + manifest_filepath: Path to manifest json as described above. Can + be comma-separated paths. + tokenizer: A subclass of the Tokenizer wrapper found in the common collection, + nemo.collections.common.tokenizers.TokenizerSpec. ASR Models support a subset of + all available tokenizers. + normalize: how to normalize feature, must be one of [None, "post_norm", "pre_norm"] + normalize_type (Union[str, dict]): how to normalize feature, see `nemo.collections.asr.parts.preprocessing.features.normalize_batch` + use_rttm: whether to use RTTM files if there is any, default to False + feat_mask_val (Optional[float]): value used to mask features with RTTM files, default to None to use zero mel-spectralgram + frame_unit_time_secs: time in seconds for each frame + sample_rate (int): Sample rate to resample loaded audio to + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + augmentor (nemo.collections.asr.parts.perturb.AudioAugmentor): An AudioAugmentor + object used to augment loaded audio + max_duration: If audio exceeds this length, do not include in dataset + min_duration: If audio is less than this length, do not include + in dataset + max_utts: Limit number of utterances + trim: Whether to trim silence segments + use_start_end_token: Boolean which dictates whether to add [BOS] and [EOS] + tokens to beginning and ending of speech respectively. + return_sample_id (bool): whether to return the sample_id as a part of each sample + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. 
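+
+        Example (an illustrative sketch; the manifest path is hypothetical and ``tokenizer``
+        is assumed to be any ``TokenizerSpec`` instance built elsewhere):
+            dataset = FeatureToBPEDataset(
+                manifest_filepath="/path/to/manifest.json", tokenizer=tokenizer, use_rttm=True,
+            )
+            feats, feat_len, tokens, token_len = dataset[0]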
+ """ + + def __init__( + self, + manifest_filepath: str, + tokenizer: 'nemo.collections.common.tokenizers.TokenizerSpec', + normalize: Optional[str] = "post_norm", + normalize_type: Union[str, dict] = "per_feature", + use_rttm: bool = False, + feat_mask_val: Optional[float] = None, + frame_unit_time_secs: float = 0.01, + sample_rate: Optional[int] = 16000, + augmentor: 'nemo.collections.asr.parts.perturb.FeatureAugmentor' = None, + max_duration: Optional[int] = None, + min_duration: Optional[int] = None, + max_utts: int = 0, + use_start_end_token: bool = True, + trim: bool = False, + return_sample_id: bool = False, + channel_selector: Optional[ChannelSelectorType] = None, + ): + if use_start_end_token and hasattr(tokenizer, "bos_id") and tokenizer.bos_id > 0: + bos_id = tokenizer.bos_id + else: + bos_id = None + + if use_start_end_token and hasattr(tokenizer, "eos_id") and tokenizer.eos_id > 0: + eos_id = tokenizer.eos_id + else: + eos_id = None + + if hasattr(tokenizer, "pad_id") and tokenizer.pad_id > 0: + pad_id = tokenizer.pad_id + else: + pad_id = 0 + + class TokenizerWrapper: + def __init__(self, tokenizer): + if isinstance(tokenizer, tokenizers.aggregate_tokenizer.AggregateTokenizer): + self.is_aggregate = True + else: + self.is_aggregate = False + self._tokenizer = tokenizer + + def __call__(self, *args): + if isinstance(args[0], List) and self.is_aggregate: + t = [] + for span in args[0]: + t.extend(self._tokenizer.text_to_ids(span['str'], span['lang'])) + return t + + t = self._tokenizer.text_to_ids(*args) + return t + + super().__init__( + manifest_filepath=manifest_filepath, + parser=TokenizerWrapper(tokenizer), + normalize=normalize, + normalize_type=normalize_type, + use_rttm=use_rttm, + feat_mask_val=feat_mask_val, + frame_unit_time_secs=frame_unit_time_secs, + sample_rate=sample_rate, + augmentor=augmentor, + max_duration=max_duration, + min_duration=min_duration, + max_utts=max_utts, + trim=trim, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + return_sample_id=return_sample_id, + channel_selector=channel_selector, + ) diff --git a/nemo/collections/asr/data/feature_to_text_dataset.py b/nemo/collections/asr/data/feature_to_text_dataset.py new file mode 100644 index 000000000000..7efd3be3cd24 --- /dev/null +++ b/nemo/collections/asr/data/feature_to_text_dataset.py @@ -0,0 +1,90 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Optional + +from nemo.collections.asr.data.feature_to_text import FeatureToBPEDataset, FeatureToCharDataset +from nemo.utils import logging + + +def get_char_dataset(config: dict, augmentor: Optional['FeatureAugmentor'] = None) -> FeatureToCharDataset: + """ + Instantiates a Character Encoding based FeatureToCharDataset. + + Args: + config: Config of the FeatureToCharDataset. + augmentor: Optional AudioAugmentor object for augmentations on audio data. + + Returns: + An instance of FeatureToCharDataset. 
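+
+    Example (an illustrative sketch; the manifest path is hypothetical and the config keys
+    mirror the defaults read below):
+        config = {
+            'manifest_filepath': '/path/to/manifest.json',
+            'labels': [' ', 'a', 'b', 'c'],
+            'use_rttm': True,
+        }
+        dataset = get_char_dataset(config)
+        feats, feat_len, tokens, token_len = dataset[0]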
+ """ + if 'labels' not in config: + logging.warning(f"dataset does not have explicitly defined labels") + + dataset = FeatureToCharDataset( + manifest_filepath=config['manifest_filepath'], + labels=config.get('labels', None), + normalize=config.get('normalize', 'post_norm'), + normalize_type=config.get('normalize_type', 'per_feature'), + use_rttm=config.get('use_rttm', False), + feat_mask_val=config.get('feat_mask_val', None), + frame_unit_time_secs=config.get('frame_unit_time_secs', 0.01), + sample_rate=config.get('sample_rate', 16000), + augmentor=augmentor, + max_duration=config.get('max_duration', None), + min_duration=config.get('min_duration', None), + max_utts=config.get('max_utts', 0), + blank_index=config.get('blank_index', -1), + unk_index=config.get('unk_index', -1), + trim=config.get('trim_silence', False), + parser=config.get('parser', 'en'), + return_sample_id=config.get('return_sample_id', False), + channel_selector=config.get('channel_selector', None), + ) + return dataset + + +def get_bpe_dataset( + config: dict, tokenizer: 'TokenizerSpec', augmentor: Optional['FeatureAugmentor'] = None +) -> FeatureToBPEDataset: + """ + Instantiates a Byte Pair Encoding / Word Piece Encoding based FeatureoToBPEDataset. + + Args: + config: Config of the FeatureToBPEDataset. + tokenizer: An instance of a TokenizerSpec object. + augmentor: Optional FeatureAugmentor object for augmentations on audio features. + + Returns: + An instance of FeatureToBPEDataset. + """ + dataset = FeatureToBPEDataset( + manifest_filepath=config['manifest_filepath'], + tokenizer=tokenizer, + normalize=config.get('normalize', 'post_norm'), + normalize_type=config.get('normalize_type', 'per_feature'), + use_rttm=config.get('use_rttm', False), + feat_mask_val=config.get('feat_mask_val', None), + frame_unit_time_secs=config.get('frame_unit_time_secs', 0.01), + sample_rate=config.get('sample_rate', 16000), + augmentor=augmentor, + max_duration=config.get('max_duration', None), + min_duration=config.get('min_duration', None), + max_utts=config.get('max_utts', 0), + trim=config.get('trim_silence', False), + use_start_end_token=config.get('use_start_end_token', True), + return_sample_id=config.get('return_sample_id', False), + channel_selector=config.get('channel_selector', None), + ) + return dataset diff --git a/nemo/collections/asr/models/classification_models.py b/nemo/collections/asr/models/classification_models.py index 03f7dea7cf83..818048cdcdaa 100644 --- a/nemo/collections/asr/models/classification_models.py +++ b/nemo/collections/asr/models/classification_models.py @@ -25,7 +25,7 @@ from pytorch_lightning import Trainer from torchmetrics.regression import MeanAbsoluteError, MeanSquaredError -from nemo.collections.asr.data import audio_to_label_dataset +from nemo.collections.asr.data import audio_to_label_dataset, feature_to_label_dataset from nemo.collections.asr.models.asr_model import ASRModel, ExportableEncDecModel from nemo.collections.asr.parts.preprocessing.features import WaveformFeaturizer from nemo.collections.asr.parts.preprocessing.perturb import process_augmentations @@ -124,8 +124,10 @@ def input_types(self) -> Optional[Dict[str, NeuralType]]: else: audio_eltype = AudioSignal() return { - "input_signal": NeuralType(('B', 'T'), audio_eltype), - "input_signal_length": NeuralType(tuple('B'), LengthsType()), + "input_signal": NeuralType(('B', 'T'), audio_eltype, optional=True), + "input_signal_length": NeuralType(tuple('B'), LengthsType(), optional=True), + "processed_signal": NeuralType(('B', 'D', 
'T'), SpectrogramType(), optional=True), + "processed_signal_length": NeuralType(tuple('B'), LengthsType(), optional=True), } @property @@ -133,19 +135,30 @@ def input_types(self) -> Optional[Dict[str, NeuralType]]: def output_types(self) -> Optional[Dict[str, NeuralType]]: pass - def forward(self, input_signal, input_signal_length): - processed_signal, processed_signal_len = self.preprocessor( - input_signal=input_signal, length=input_signal_length, - ) + def forward( + self, input_signal=None, input_signal_length=None, processed_signal=None, processed_signal_length=None + ): + has_input_signal = input_signal is not None and input_signal_length is not None + has_processed_signal = processed_signal is not None and processed_signal_length is not None + if (has_input_signal ^ has_processed_signal) == False: + raise ValueError( + f"{self} Arguments ``input_signal`` and ``input_signal_length`` are mutually exclusive " + " with ``processed_signal`` and ``processed_signal_length`` arguments." + ) + + if not has_processed_signal: + processed_signal, processed_signal_length = self.preprocessor( + input_signal=input_signal, length=input_signal_length, + ) # Crop or pad is always applied if self.crop_or_pad is not None: - processed_signal, processed_signal_len = self.crop_or_pad( - input_signal=processed_signal, length=processed_signal_len + processed_signal, processed_signal_length = self.crop_or_pad( + input_signal=processed_signal, length=processed_signal_length ) # Spec augment is not applied during evaluation/testing if self.spec_augmentation is not None and self.training: - processed_signal = self.spec_augmentation(input_spec=processed_signal, length=processed_signal_len) - encoded, encoded_len = self.encoder(audio_signal=processed_signal, length=processed_signal_len) + processed_signal = self.spec_augmentation(input_spec=processed_signal, length=processed_signal_length) + encoded, encoded_len = self.encoder(audio_signal=processed_signal, length=processed_signal_length) logits = self.decoder(encoder_output=encoded) return logits @@ -179,14 +192,17 @@ def setup_validation_data(self, val_data_config: Optional[Union[DictConfig, Dict self._validation_dl = self._setup_dataloader_from_config(config=DictConfig(val_data_config)) - def setup_test_data(self, test_data_config: Optional[Union[DictConfig, Dict]]): + def setup_test_data(self, test_data_config: Optional[Union[DictConfig, Dict]], use_feat: bool = False): if 'shuffle' not in test_data_config: test_data_config['shuffle'] = False # preserve config self._update_dataset_config(dataset_name='test', config=test_data_config) - self._test_dl = self._setup_dataloader_from_config(config=DictConfig(test_data_config)) + if use_feat: + self._test_dl = self._setup_feature_label_dataloader(config=DictConfig(test_data_config)) + else: + self._test_dl = self._setup_dataloader_from_config(config=DictConfig(test_data_config)) def test_dataloader(self): if self._test_dl is not None: @@ -266,6 +282,43 @@ def _setup_dataloader_from_config(self, config: DictConfig): pin_memory=config.get('pin_memory', False), ) + def _setup_feature_label_dataloader(self, config: DictConfig) -> torch.utils.data.DataLoader: + """ + setup dataloader for VAD inference with audio features as input + """ + + OmegaConf.set_struct(config, False) + config.is_regression_task = self.is_regression_task + OmegaConf.set_struct(config, True) + + if 'augmentor' in config: + augmentor = process_augmentations(config['augmentor']) + else: + augmentor = None + if 'manifest_filepath' in config and 
config['manifest_filepath'] is None: + logging.warning(f"Could not load dataset as `manifest_filepath` is None. Provided config : {config}") + return None + + dataset = feature_to_label_dataset.get_feature_label_dataset(config=config, augmentor=augmentor) + if 'vad_stream' in config and config['vad_stream']: + collate_func = dataset._vad_segment_collate_fn + batch_size = 1 + shuffle = False + else: + collate_func = dataset._collate_fn + batch_size = config['batch_size'] + shuffle = config['shuffle'] + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=batch_size, + collate_fn=collate_func, + drop_last=config.get('drop_last', False), + shuffle=shuffle, + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + ) + @torch.no_grad() def transcribe(self, paths2audio_files: List[str], batch_size: int = 4, logprobs=False) -> List[str]: """ @@ -562,8 +615,15 @@ def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0): return {'log': tensorboard_log} @typecheck() - def forward(self, input_signal, input_signal_length): - logits = super().forward(input_signal=input_signal, input_signal_length=input_signal_length) + def forward( + self, input_signal=None, input_signal_length=None, processed_signal=None, processed_signal_length=None + ): + logits = super().forward( + input_signal=input_signal, + input_signal_length=input_signal_length, + processed_signal=processed_signal, + processed_signal_length=processed_signal_length, + ) return logits def change_labels(self, new_labels: List[str]): diff --git a/nemo/collections/asr/parts/preprocessing/feature_loader.py b/nemo/collections/asr/parts/preprocessing/feature_loader.py index 18866b6e2cee..8c629cf4cfd4 100644 --- a/nemo/collections/asr/parts/preprocessing/feature_loader.py +++ b/nemo/collections/asr/parts/preprocessing/feature_loader.py @@ -11,6 +11,8 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. +from typing import Optional + import numpy as np import torch @@ -20,29 +22,30 @@ class ExternalFeatureLoader(object): Currently support pickle, npy and npz format. """ - def __init__(self, file_path, sample_rate=16000, augmentor=None): - """Load feature samples from file_path. - file_path (str) is the path of the file that stores feature/sample. + def __init__( + self, augmentor: Optional["nemo.collections.asr.parts.perturb.FeatureAugmentor"] = None, + ): + """ + Feature loader """ - self._file_path = file_path self.augmentor = augmentor - def load_feature_from_file(self, file_path): + def load_feature_from_file(self, file_path: str): """Load samples from file_path and convert it to be of type float32 file_path (str) is the path of the file that stores feature/sample. """ - # load pickle/npy/npz file - try: + + if file_path.endswith(".pt") or file_path.endswith(".pth"): + samples = torch.load(file_path, map_location="cpu").float().numpy() + return samples + else: + # load pickle/npy/npz file samples = np.load(file_path, allow_pickle=True) return self._convert_samples_to_float32(samples) - except: - raise TypeError( - "Error open feature files. Probably not supported the format yet. Support pkl, npz, and npy format now. 
" - ) - # TODO load other type of files such as kaldi io ark + # TODO load other type of files such as kaldi io ark @staticmethod - def _convert_samples_to_float32(samples): + def _convert_samples_to_float32(samples: np.ndarray) -> np.ndarray: """Convert sample type to float32. Integers will be scaled to [-1, 1] in float32. """ @@ -56,9 +59,10 @@ def _convert_samples_to_float32(samples): raise TypeError("Unsupported sample type: %s." % samples.dtype) return float32_samples - def process(self, file_path, offset=0, duration=0, trim=False, orig_sr=None): - feature = self.load_feature_from_file(file_path) - return self.process_segment(feature) + def process(self, file_path: str) -> torch.Tensor: + features = self.load_feature_from_file(file_path) + features = self.process_segment(features) + return features def process_segment(self, feature_segment): if self.augmentor: diff --git a/nemo/collections/asr/parts/utils/vad_utils.py b/nemo/collections/asr/parts/utils/vad_utils.py index f3e64472d21d..490dd4d9f000 100644 --- a/nemo/collections/asr/parts/utils/vad_utils.py +++ b/nemo/collections/asr/parts/utils/vad_utils.py @@ -19,7 +19,7 @@ import shutil from itertools import repeat from pathlib import Path -from typing import Dict, Tuple +from typing import Dict, List, Tuple import IPython.display as ipd import librosa @@ -287,7 +287,7 @@ def generate_overlap_vad_seq( ) else: - for frame_filepath in tqdm(frame_filepathlist, desc='generating preds', leave=True): + for frame_filepath in tqdm(frame_filepathlist, desc='generating preds', leave=False): generate_overlap_vad_seq_per_file(frame_filepath, per_args) return overlap_out_dir @@ -669,17 +669,23 @@ def generate_vad_segment_table_per_file(pred_filepath: str, per_args: dict) -> s out_dir, per_args_float = prepare_gen_segment_table(sequence, per_args) preds = generate_vad_segment_table_per_tensor(sequence, per_args_float) - save_name = name + ".txt" + ext = ".rttm" if per_args.get("use_rttm", False) else ".txt" + save_name = name + ext save_path = os.path.join(out_dir, save_name) - if preds.shape == torch.Size([0]): + if preds.shape[0] == 0: with open(save_path, "w", encoding='utf-8') as fp: - fp.write(f"0 0 speech\n") - + if per_args.get("use_rttm", False): + fp.write(f"SPEAKER 1 0 0 speech \n") + else: + fp.write(f"0 0 speech\n") else: with open(save_path, "w", encoding='utf-8') as fp: for i in preds: - fp.write(f"{i[0]:.4f} {i[2]:.4f} speech\n") + if per_args.get("use_rttm", False): + fp.write(f"SPEAKER {name} 1 {i[0]:.4f} {i[2]:.4f} speech \n") + else: + fp.write(f"{i[0]:.4f} {i[2]:.4f} speech\n") return save_path @@ -722,7 +728,7 @@ def generate_vad_segment_table( "out_dir": table_out_dir, } per_args = {**per_args, **postprocessing_params} - + num_workers = None if num_workers is not None and num_workers > 1: with multiprocessing.Pool(num_workers) as p: inputs = zip(vad_pred_filepath_list, repeat(per_args)) @@ -1048,7 +1054,12 @@ def extract_labels(path2ground_truth_label: str, time: list) -> list: def generate_vad_frame_pred( - vad_model, window_length_in_sec: float, shift_length_in_sec: float, manifest_vad_input: str, out_dir: str + vad_model, + window_length_in_sec: float, + shift_length_in_sec: float, + manifest_vad_input: str, + out_dir: str, + use_feat: bool = False, ) -> str: """ Generate VAD frame level prediction and write to out_dir @@ -1065,10 +1076,13 @@ def generate_vad_frame_pred( logging.info(f"Inference on {len(data)} audio files/json lines!") status = get_vad_stream_status(data) - for i, test_batch in 
enumerate(vad_model.test_dataloader()): + for i, test_batch in enumerate(tqdm(vad_model.test_dataloader(), total=len(vad_model.test_dataloader()))): test_batch = [x.to(vad_model.device) for x in test_batch] with autocast(): - log_probs = vad_model(input_signal=test_batch[0], input_signal_length=test_batch[1]) + if use_feat: + log_probs = vad_model(processed_signal=test_batch[0], processed_signal_length=test_batch[1]) + else: + log_probs = vad_model(input_signal=test_batch[0], input_signal_length=test_batch[1]) probs = torch.softmax(log_probs, dim=-1) pred = probs[:, 1] @@ -1217,3 +1231,77 @@ def construct_manifest_eval( fout.flush() return aligned_vad_asr_output_manifest + + +def extract_audio_features(vad_model: EncDecClassificationModel, manifest_vad_input: str, out_dir: str) -> str: + """ + Extract audio features and write to out_dir + """ + + file_list = [] + with open(manifest_vad_input, 'r', encoding='utf-8') as fin: + for line in fin.readlines(): + file_list.append(Path(json.loads(line)['audio_filepath']).stem) + + logging.info(f"Extracting features on {len(file_list)} audio files/json lines!") + + for i, test_batch in enumerate(tqdm(vad_model.test_dataloader(), total=len(vad_model.test_dataloader()))): + test_batch = [x.to(vad_model.device) for x in test_batch] + with autocast(): + processed_signal, processed_signal_length = vad_model.preprocessor( + input_signal=test_batch[0], length=test_batch[1], + ) + processed_signal = processed_signal.squeeze(0)[:, :processed_signal_length] + processed_signal = processed_signal.cpu() + outpath = os.path.join(out_dir, file_list[i] + ".pt") + torch.save(processed_signal, outpath) + del test_batch + return out_dir + + +def load_rttm_file(filepath: str) -> pd.DataFrame: + """ + Load rttm file and extract speech segments + """ + if not Path(filepath).exists(): + raise ValueError(f"File not found: {filepath}") + data = pd.read_csv(filepath, sep=" ", delimiter=None, header=None) + data = data.rename(columns={3: "start", 4: "dur", 7: "speaker"}) + + data['start'] = data['start'].astype(float) + data['dur'] = data['dur'].astype(float) + data['end'] = data['start'] + data['dur'] + + data = data.sort_values(by=['start']) + data['segment'] = list(zip(data['start'], data['end'])) + + return data + + +def merge_intervals(intervals: List[List[float]]) -> List[List[float]]: + """ + Merge speech segments into non-overlapping segments + """ + intervals.sort(key=lambda x: x[0]) + merged = [] + for interval in intervals: + # if the list of merged intervals is empty or if the current + # interval does not overlap with the previous, simply append it. + if not merged or merged[-1][1] < interval[0]: + merged.append(interval) + else: + # otherwise, there is overlap, so we merge the current and previous + # intervals. 
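+            # e.g. merged[-1] == [0.5, 1.5] and interval == [1.0, 2.0] collapse to [0.5, 2.0]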
+ merged[-1][1] = max(merged[-1][1], interval[1]) + return merged + + +def load_speech_segments_from_rttm(rttm_file: str) -> List[List[float]]: + """ + load speech segments from rttm file, where each segment is represented + as [start, end] interval + """ + speech_segments = list(load_rttm_file(rttm_file)['segment']) + speech_segments = [list(x) for x in speech_segments] + speech_segments = merge_intervals(speech_segments) + return speech_segments diff --git a/nemo/collections/common/parts/preprocessing/collections.py b/nemo/collections/common/parts/preprocessing/collections.py index c5d5d6eee157..597920b8307d 100644 --- a/nemo/collections/common/parts/preprocessing/collections.py +++ b/nemo/collections/common/parts/preprocessing/collections.py @@ -267,6 +267,7 @@ def __init__( self.mapping = {} output_type = self.OUTPUT_TYPE data, duration_filtered = [], 0.0 + total_duration = 0.0 for audio_file, duration, command, offset in zip(audio_files, durations, labels, offsets): # Duration filters. if min_duration is not None and duration < min_duration: @@ -278,6 +279,7 @@ def __init__( continue data.append(output_type(audio_file, duration, command, offset)) + total_duration += duration if index_by_file_id: file_id, _ = os.path.splitext(os.path.basename(audio_file)) @@ -296,6 +298,7 @@ def __init__( logging.info( "Filtered duration for loading collection is %f.", duration_filtered, ) + logging.info(f"Dataset loaded with {len(data)} items, total duration of {total_duration / 3600: .2f} hours.") self.uniq_labels = sorted(set(map(lambda x: x.label, data))) logging.info("# {} files loaded accounting to # {} labels".format(len(data), len(self.uniq_labels))) @@ -395,7 +398,7 @@ def __init__( Args: feature_files: List of feature files. - seq_labels: List of sequences of abels. + seq_labels: List of sequences of labels. max_number: Maximum number of samples to collect. index_by_file_id: If True, saves a mapping from filename base (ID) to index in data. """ @@ -962,3 +965,268 @@ def get_audio_file(item: Dict, manifest_key: Union[str, List[str]]): return dict( audio_files=item['audio_files'], duration=item['duration'], offset=item['offset'], text=item['text'] ) + + +class FeatureLabel(_Collection): + """List of feature sequence and their label correspondence with preprocessing.""" + + OUTPUT_TYPE = collections.namedtuple(typename='FeatureLabelEntity', field_names='feature_file label',) + + def __init__( + self, + feature_files: List[str], + labels: List[str], + max_number: Optional[int] = None, + index_by_file_id: bool = False, + ): + """Instantiates feature-SequenceLabel manifest with filters and preprocessing. + + Args: + feature_files: List of feature files. + labels: List of labels. + max_number: Maximum number of samples to collect. + index_by_file_id: If True, saves a mapping from filename base (ID) to index in data. + """ + + output_type = self.OUTPUT_TYPE + data = [] + + self.uniq_labels = set() + + if index_by_file_id: + self.mapping = {} + + for feature_file, label in zip(feature_files, labels): + + data.append(output_type(feature_file, label)) + self.uniq_labels |= set(label) + + if index_by_file_id: + file_id, _ = os.path.splitext(os.path.basename(feature_file)) + self.mapping[file_id] = len(data) - 1 + + # Max number of entities filter. 
+ if len(data) == max_number: + break + + logging.info("# {} files loaded including # {} unique labels".format(len(data), len(self.uniq_labels))) + super().__init__(data) + + +class ASRFeatureLabel(FeatureLabel): + """`FeatureLabel` collector from asr structured json files.""" + + def __init__( + self, manifests_files: Union[str, List[str]], max_number: Optional[int] = None, index_by_file_id: bool = False, + ): + + """Parse lists of feature files and sequences of labels. + + Args: + manifests_files: Either single string file or list of such - + manifests to yield items from. + max_number: Maximum number of samples to collect; pass to `FeatureSequenceLabel` constructor. + index_by_file_id: If True, saves a mapping from filename base (ID) to index in data; pass to `FeatureSequenceLabel` constructor. + """ + + feature_files, labels = [], [] + for item in manifest.item_iter(manifests_files, parse_func=self._parse_item): + feature_files.append(item['feature_file']) + labels.append(item['label']) + + super().__init__(feature_files, labels, max_number, index_by_file_id) + + def _parse_item(self, line: str, manifest_file: str) -> Dict[str, Any]: + item = json.loads(line) + + # Feature file + if 'feature_filename' in item: + item['feature_file'] = item.pop('feature_filename') + elif 'feature_filepath' in item: + item['feature_file'] = item.pop('feature_filepath') + elif 'feature_file' not in item: + raise ValueError( + f"Manifest file has invalid json line " f"structure: {line} without proper 'feature_file' key." + ) + item['feature_file'] = os.path.expanduser(item['feature_file']) + + # Label. + if 'label' in item: + item['label'] = item.pop('label') + else: + raise ValueError(f"Manifest file has invalid json line structure: {line} without proper 'label' key.") + + item = dict(feature_file=item['feature_file'], label=item['label'],) + + return item + + +class FeatureText(_Collection): + """List of audio-transcript text correspondence with preprocessing.""" + + OUTPUT_TYPE = collections.namedtuple( + typename='FeatureTextEntity', + field_names='id feature_file rttm_file duration text_tokens offset text_raw speaker orig_sr lang', + ) + + def __init__( + self, + ids: List[int], + feature_files: List[str], + rttm_files: List[str], + durations: List[float], + texts: List[str], + offsets: List[str], + speakers: List[Optional[int]], + orig_sampling_rates: List[Optional[int]], + token_labels: List[Optional[int]], + langs: List[Optional[str]], + parser: parsers.CharParser, + min_duration: Optional[float] = None, + max_duration: Optional[float] = None, + max_number: Optional[int] = None, + do_sort_by_duration: bool = False, + index_by_file_id: bool = False, + ): + """Instantiates feature-text manifest with filters and preprocessing. + + Args: + ids: List of examples positions. + feature_files: List of audio feature files. + rttm_files: List of audio rttm files. + durations: List of float durations. + texts: List of raw text transcripts. + offsets: List of duration offsets or None. + speakers: List of optional speakers ids. + orig_sampling_rates: List of original sampling rates of audio files. + langs: List of language ids, one for eadh sample, or None. + parser: Instance of `CharParser` to convert string to tokens. + min_duration: Minimum duration to keep entry with (default: None). + max_duration: Maximum duration to keep entry with (default: None). + max_number: Maximum number of samples to collect. + do_sort_by_duration: True if sort samples list by duration. Not compatible with index_by_file_id. 
+ index_by_file_id: If True, saves a mapping from filename base (ID) to index in data. + """ + + output_type = self.OUTPUT_TYPE + data, duration_filtered, num_filtered, total_duration = [], 0.0, 0, 0.0 + if index_by_file_id: + self.mapping = {} + + for id_, feat_file, rttm_file, duration, offset, text, speaker, orig_sr, token_labels, lang in zip( + ids, + feature_files, + rttm_files, + durations, + offsets, + texts, + speakers, + orig_sampling_rates, + token_labels, + langs, + ): + # Duration filters. + if min_duration is not None and duration < min_duration: + duration_filtered += duration + num_filtered += 1 + continue + + if max_duration is not None and duration > max_duration: + duration_filtered += duration + num_filtered += 1 + continue + + if token_labels is not None: + text_tokens = token_labels + else: + if text != '': + if hasattr(parser, "is_aggregate") and parser.is_aggregate and isinstance(text, str): + if lang is not None: + text_tokens = parser(text, lang) + else: + raise ValueError("lang required in manifest when using aggregate tokenizers") + else: + text_tokens = parser(text) + else: + text_tokens = [] + + if text_tokens is None: + duration_filtered += duration + num_filtered += 1 + continue + + total_duration += duration + + data.append( + output_type(id_, feat_file, rttm_file, duration, text_tokens, offset, text, speaker, orig_sr, lang) + ) + if index_by_file_id: + file_id, _ = os.path.splitext(os.path.basename(feat_file)) + if file_id not in self.mapping: + self.mapping[file_id] = [] + self.mapping[file_id].append(len(data) - 1) + + # Max number of entities filter. + if len(data) == max_number: + break + + if do_sort_by_duration: + if index_by_file_id: + logging.warning("Tried to sort dataset by duration, but cannot since index_by_file_id is set.") + else: + data.sort(key=lambda entity: entity.duration) + + logging.info("Dataset loaded with %d files totalling %.2f hours", len(data), total_duration / 3600) + logging.info("%d files were filtered totalling %.2f hours", num_filtered, duration_filtered / 3600) + + super().__init__(data) + + +class ASRFeatureText(FeatureText): + """`FeatureText` collector from asr structured json files.""" + + def __init__(self, manifests_files: Union[str, List[str]], *args, **kwargs): + """Parse lists of audio files, durations and transcripts texts. + + Args: + manifests_files: Either single string file or list of such - + manifests to yield items from. + *args: Args to pass to `AudioText` constructor. + **kwargs: Kwargs to pass to `AudioText` constructor. 
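+
+            A typical manifest line for this collection looks like the following (an
+            illustrative sketch; paths are hypothetical, and ``audio_filepath`` may be
+            left empty when only pre-computed features are used):
+            {"audio_filepath": "", "feature_filepath": "/path/to/feats.pt",
+             "rttm_filepath": "/path/to/audio.rttm", "duration": 12.3, "text": "the transcription"}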
+ """ + + ids, feature_files, rttm_files, durations, texts, offsets, = ( + [], + [], + [], + [], + [], + [], + ) + speakers, orig_srs, token_labels, langs = [], [], [], [] + for item in manifest.item_iter(manifests_files): + ids.append(item['id']) + feature_files.append(item['feature_file']) + rttm_files.append(item['rttm_file']) + durations.append(item['duration']) + texts.append(item['text']) + offsets.append(item['offset']) + speakers.append(item['speaker']) + orig_srs.append(item['orig_sr']) + token_labels.append(item['token_labels']) + langs.append(item['lang']) + + super().__init__( + ids, + feature_files, + rttm_files, + durations, + texts, + offsets, + speakers, + orig_srs, + token_labels, + langs, + *args, + **kwargs, + ) diff --git a/nemo/collections/common/parts/preprocessing/manifest.py b/nemo/collections/common/parts/preprocessing/manifest.py index f6d8d4a6015b..efe850fc0e8c 100644 --- a/nemo/collections/common/parts/preprocessing/manifest.py +++ b/nemo/collections/common/parts/preprocessing/manifest.py @@ -92,7 +92,7 @@ def __parse_item(line: str, manifest_file: str) -> Dict[str, Any]: item['audio_file'] = item.pop('audio_filename') elif 'audio_filepath' in item: item['audio_file'] = item.pop('audio_filepath') - else: + elif 'audio_file' not in item: raise ValueError( f"Manifest file {manifest_file} has invalid json line structure: {line} without proper audio file key." ) @@ -117,18 +117,45 @@ def __parse_item(line: str, manifest_file: str) -> Dict[str, Any]: item['text'] = f.read().replace('\n', '') elif 'normalized_text' in item: item['text'] = item['normalized_text'] + else: + item['text'] = "" + + # Optional RTTM file + if 'rttm_file' in item: + pass + elif 'rttm_filename' in item: + item['rttm_file'] = item.pop('rttm_filename') + elif 'rttm_filepath' in item: + item['rttm_file'] = item.pop('rttm_filepath') + else: + item['rttm_file'] = None + if item['rttm_file'] is not None: + item['rttm_file'] = get_full_path(audio_file=item['rttm_file'], manifest_file=manifest_file) + + # Optional audio feature file + if 'feature_file' in item: + pass + elif 'feature_filename' in item: + item['feature_file'] = item.pop('feature_filename') + elif 'feature_filepath' in item: + item['feature_file'] = item.pop('feature_filepath') + else: + item['feature_file'] = None + if item['feature_file'] is not None: + item['feature_file'] = get_full_path(audio_file=item['feature_file'], manifest_file=manifest_file) item = dict( audio_file=item['audio_file'], duration=item['duration'], - text=item.get('text', ""), + text=item['text'], + rttm_file=item['rttm_file'], + feature_file=item['feature_file'], offset=item.get('offset', None), speaker=item.get('speaker', None), orig_sr=item.get('orig_sample_rate', None), token_labels=item.get('token_labels', None), lang=item.get('lang', None), ) - return item diff --git a/tests/collections/asr/test_asr_datasets.py b/tests/collections/asr/test_asr_datasets.py index 18ac747a845b..e8946f88b7b4 100644 --- a/tests/collections/asr/test_asr_datasets.py +++ b/tests/collections/asr/test_asr_datasets.py @@ -47,6 +47,7 @@ is_dali_supported, ) from nemo.collections.asr.data.audio_to_text_dataset import inject_dataloader_value_from_model_config +from nemo.collections.asr.data.feature_to_text import FeatureToBPEDataset, FeatureToCharDataset from nemo.collections.asr.models.ctc_models import EncDecCTCModel from nemo.collections.asr.parts.utils.audio_utils import get_segment_start from nemo.collections.asr.parts.utils.manifest_utils import write_manifest @@ -592,6 +593,172 @@ 
def test_dali_tarred_char_vs_ref_dataset(self, test_data_dir): assert np.mean(err) < 0.0001 assert np.max(err) < 0.01 + @pytest.mark.unit + def test_feature_to_text_char_dataset(self): + num_samples = 5 + golden_feat_shape = (80, 5) + with tempfile.TemporaryDirectory() as tmpdir: + manifest_path = os.path.join(tmpdir, 'manifest_input.json') + with open(manifest_path, 'w', encoding='utf-8') as fp: + for i in range(num_samples): + feat_file = os.path.join(tmpdir, f"feat_{i}.pt") + torch.save(torch.randn(80, 5), feat_file) + entry = {'audio_filepath': "", 'feature_file': feat_file, 'duration': 100000, "text": "a b c"} + fp.write(json.dumps(entry) + '\n') + + dataset = FeatureToCharDataset(manifest_path, labels=self.labels) + cnt = 0 + for item in dataset: + cnt += 1 + feat = item[0] + token_len = item[3] + assert feat.shape == golden_feat_shape + assert torch.equal(token_len, torch.tensor(5)) + assert cnt == num_samples + + @pytest.mark.unit + def test_feature_to_text_bpe_dataset(self, test_data_dir): + num_samples = 5 + golden_feat_shape = (80, 5) + tokenizer_path = os.path.join(test_data_dir, "asr", "tokenizers", "an4_wpe_128", 'vocab.txt') + tokenizer = tokenizers.AutoTokenizer(pretrained_model_name='bert-base-cased', vocab_file=tokenizer_path) + with tempfile.TemporaryDirectory() as tmpdir: + manifest_path = os.path.join(tmpdir, 'manifest_input.json') + with open(manifest_path, 'w', encoding='utf-8') as fp: + for i in range(num_samples): + feat_file = os.path.join(tmpdir, f"feat_{i}.pt") + torch.save(torch.randn(80, 5), feat_file) + entry = {'audio_filepath': "", 'feature_file': feat_file, 'duration': 100000, "text": "a b c"} + fp.write(json.dumps(entry) + '\n') + + dataset = FeatureToBPEDataset(manifest_path, tokenizer=tokenizer) + cnt = 0 + for item in dataset: + cnt += 1 + feat = item[0] + token_len = item[3] + assert feat.shape == golden_feat_shape + assert torch.equal(token_len, torch.tensor(5)) + assert cnt == num_samples + + @pytest.mark.unit + def test_feature_with_rttm_to_text_char_dataset(self): + num_samples = 2 + golden_feat_shape = (80, 10) + sample = torch.ones(80, 10) + masked_sample = sample * FeatureToCharDataset.ZERO_LEVEL_SPEC_DB_VAL + with tempfile.TemporaryDirectory() as tmpdir: + manifest_path = os.path.join(tmpdir, 'manifest_input.json') + with open(manifest_path, 'w', encoding='utf-8') as fp: + feat_file = os.path.join(tmpdir, f"feat_0.pt") + torch.save(sample, feat_file) + + rttm_file = os.path.join(tmpdir, f"rttm_0.rttm") + with open(rttm_file, "w") as fout: + fout.write(f"SPEAKER 1 0 1 speech \n") + + entry = { + 'audio_filepath': "", + 'feature_file': feat_file, + 'rttm_file': rttm_file, + 'duration': 100000, + "text": "a b c", + } + fp.write(json.dumps(entry) + '\n') + + # second sample where all frames are not masked + feat_file = os.path.join(tmpdir, f"feat_1.pt") + torch.save(sample, feat_file) + + rttm_file = os.path.join(tmpdir, f"rttm_1.rttm") + with open(rttm_file, "w") as fout: + fout.write(f"SPEAKER 1 0 0 speech \n") + + entry = { + 'audio_filepath': "", + 'feature_file': feat_file, + 'rttm_file': rttm_file, + 'duration': 100000, + "text": "a b c", + } + fp.write(json.dumps(entry) + '\n') + + dataset = FeatureToCharDataset(manifest_path, labels=self.labels, normalize=None, use_rttm=True) + cnt = 0 + for item in dataset: + cnt += 1 + feat = item[0] + token_len = item[3] + assert feat.shape == golden_feat_shape + assert torch.equal(token_len, torch.tensor(5)) + + if cnt == 1: + assert torch.equal(feat, sample) + else: + assert torch.equal(feat, 
masked_sample) + + assert cnt == num_samples + + @pytest.mark.unit + def test_feature_with_rttm_to_text_bpe_dataset(self, test_data_dir): + tokenizer_path = os.path.join(test_data_dir, "asr", "tokenizers", "an4_wpe_128", 'vocab.txt') + tokenizer = tokenizers.AutoTokenizer(pretrained_model_name='bert-base-cased', vocab_file=tokenizer_path) + num_samples = 2 + golden_feat_shape = (80, 10) + sample = torch.ones(80, 10) + masked_sample = sample * FeatureToCharDataset.ZERO_LEVEL_SPEC_DB_VAL + with tempfile.TemporaryDirectory() as tmpdir: + manifest_path = os.path.join(tmpdir, 'manifest_input.json') + with open(manifest_path, 'w', encoding='utf-8') as fp: + feat_file = os.path.join(tmpdir, f"feat_0.pt") + torch.save(sample, feat_file) + + rttm_file = os.path.join(tmpdir, f"rttm_0.rttm") + with open(rttm_file, "w") as fout: + fout.write(f"SPEAKER 1 0 1 speech \n") + + entry = { + 'audio_filepath': "", + 'feature_file': feat_file, + 'rttm_file': rttm_file, + 'duration': 100000, + "text": "a b c", + } + fp.write(json.dumps(entry) + '\n') + + # second sample where all frames are not masked + feat_file = os.path.join(tmpdir, f"feat_1.pt") + torch.save(sample, feat_file) + + rttm_file = os.path.join(tmpdir, f"rttm_1.rttm") + with open(rttm_file, "w") as fout: + fout.write(f"SPEAKER 1 0 0 speech \n") + + entry = { + 'audio_filepath': "", + 'feature_file': feat_file, + 'rttm_file': rttm_file, + 'duration': 100000, + "text": "a b c", + } + fp.write(json.dumps(entry) + '\n') + + dataset = FeatureToBPEDataset(manifest_path, tokenizer=tokenizer, normalize=None, use_rttm=True) + cnt = 0 + for item in dataset: + cnt += 1 + feat = item[0] + token_len = item[3] + assert feat.shape == golden_feat_shape + assert torch.equal(token_len, torch.tensor(5)) + + if cnt == 1: + assert torch.equal(feat, sample) + else: + assert torch.equal(feat, masked_sample) + + assert cnt == num_samples + class TestAudioDatasets: @pytest.mark.unit diff --git a/tests/collections/asr/test_label_datasets.py b/tests/collections/asr/test_label_datasets.py index fbe902528445..0c2b54dc24e2 100644 --- a/tests/collections/asr/test_label_datasets.py +++ b/tests/collections/asr/test_label_datasets.py @@ -11,13 +11,15 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
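# Editor's sketch, not part of the patch: the manifest layout that the updated
# manifest parser and the feature-dataset tests above rely on. Paths and the
# "a b c" transcript are placeholder values; 'rttm_file' is optional and the
# parser stores None when it is absent.
import json

entry = {
    "audio_filepath": "",                # mapped to item['audio_file'] by the parser
    "feature_file": "/data/feat_0.pt",   # pre-computed features saved with torch.save
    "rttm_file": "/data/rttm_0.rttm",    # optional speech-activity annotations
    "duration": 100000,
    "text": "a b c",
}
print(json.dumps(entry))                 # one JSON object per line of the manifest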
+import json import os +import tempfile import pytest import torch from nemo.collections.asr.data.audio_to_label import TarredAudioToClassificationLabelDataset -from nemo.collections.asr.data.feature_to_label import FeatureToSeqSpeakerLabelDataset +from nemo.collections.asr.data.feature_to_label import FeatureToLabelDataset, FeatureToSeqSpeakerLabelDataset from nemo.collections.asr.parts.preprocessing.feature_loader import ExternalFeatureLoader from nemo.collections.asr.parts.preprocessing.features import WaveformFeaturizer @@ -85,7 +87,7 @@ def test_tarred_dataset_duplicate_name(self, test_data_dir): @pytest.mark.unit def test_feat_seqlabel_dataset(self, test_data_dir): manifest_path = os.path.abspath(os.path.join(test_data_dir, 'asr/feat/emb.json')) - feature_loader = ExternalFeatureLoader(file_path=manifest_path, sample_rate=16000, augmentor=None) + feature_loader = ExternalFeatureLoader(augmentor=None) ds_braceexpand = FeatureToSeqSpeakerLabelDataset( manifest_filepath=manifest_path, labels=self.unique_labels_in_seq, feature_loader=feature_loader ) @@ -104,3 +106,29 @@ def test_feat_seqlabel_dataset(self, test_data_dir): for _ in ds_braceexpand: count += 1 assert count == 2 + + @pytest.mark.unit + def test_feat_label_dataset(self): + + with tempfile.TemporaryDirectory() as tmpdir: + manifest_path = os.path.join(tmpdir, 'manifest_input.json') + with open(manifest_path, 'w', encoding='utf-8') as fp: + for i in range(2): + feat_file = os.path.join(tmpdir, f"feat_{i}.pt") + torch.save(torch.randn(80, 5), feat_file) + entry = {'feature_file': feat_file, 'duration': 100000, 'label': '0'} + fp.write(json.dumps(entry) + '\n') + + dataset = FeatureToLabelDataset(manifest_filepath=manifest_path, labels=self.unique_labels_in_seq) + + correct_label = torch.tensor(self.unique_labels_in_seq.index('0')) + correct_label_length = torch.tensor(1) + + assert dataset[0][0].shape == (80, 5) + assert torch.equal(dataset[0][2], correct_label) + assert torch.equal(dataset[0][3], correct_label_length) + + count = 0 + for _ in dataset: + count += 1 + assert count == 2 From 40323ae5fb15d8d2116719e8970fc6eeff53912f Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 11:41:41 -0800 Subject: [PATCH 066/154] rename separator to ctm_grouping_separator and refactor Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 3 +- tools/nemo_forced_aligner/align.py | 36 +++-- tools/nemo_forced_aligner/utils/data_prep.py | 3 + tools/nemo_forced_aligner/utils/make_ctm.py | 132 +++++++++---------- 4 files changed, 98 insertions(+), 76 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 367af39d2c70..e05211167d2f 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -29,7 +29,8 @@ Call the `align.py` script, specifying the parameters as follows: * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. -* **[OPTIONAL]** `separator`: the string used to separate CTM segments. If the separator is `“”` (empty string), the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. 
`“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) +* **[OPTIONAL]** `ctm_grouping_separator`: the string used to separate CTM segments. If the separator is `“”` (empty string) or `None`, the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) +> Note: if you pass in a hydra override `ctm_grouping_separator=" "`, hydra will remove the whitespace, thus converting that `" "` to `""`. If you want to pass in a space, make sure to use `ctm_grouping_separator="\ "`, or just do not pass in this override, as the default value is `" "` anyway. * **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index d0a129e7f892..4cf8ced37871 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -18,7 +18,7 @@ import torch from omegaconf import OmegaConf from utils.data_prep import get_audio_sr, get_log_probs_y_T_U, get_manifest_lines -from utils.make_ctm import make_basetoken_ctm, make_word_ctm +from utils.make_ctm import make_segment_ctm, make_token_ctm from utils.viterbi_decoding import viterbi_decoding from nemo.collections.asr.models.ctc_models import EncDecCTCModel @@ -47,8 +47,8 @@ manifest_filepath: filepath to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. output_ctm_folder: the folder where output CTM files will be saved. - separator: the string used to separate CTM segments. - If the separator is “”, the CTM segments will be the tokens used by the ASR model. + ctm_grouping_separator: the string used to separate CTM segments. + If the separator is “” or None, each line of the output CTM will be the tokens used by the ASR model. If the separator is anything else, e.g. “ “, “|” or “”, the segments will be the blocks of text separated by that separator. Default: “ “, ie for languages such as English, the CTM segments will be words. 
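To make the separator behaviour above concrete, here is a small editorial sketch (not part of NFA itself) of how a transcript is grouped into CTM segments for a few separator values; the example transcripts are arbitrary.

def segments_for(text, separator):
    # "" or None means token-level CTMs: the segments are the ASR model's own tokens
    if separator is None or separator == "":
        return None
    # otherwise split on the separator and trim whitespace, as the NFA data prep does
    return [seg.strip() for seg in text.strip().split(separator)]

print(segments_for("she had your dark suit", " "))  # ['she', 'had', 'your', 'dark', 'suit']
print(segments_for("abc |def", "|"))                # ['abc', 'def']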
@@ -86,7 +86,7 @@ class AlignmentConfig: output_ctm_folder: Optional[str] = None # General configs - separator: str = " " + ctm_grouping_separator: Optional[str] = " " n_parts_for_ctm_id: int = 1 transcribe_device: str = "cpu" viterbi_device: str = "cpu" @@ -96,6 +96,8 @@ class AlignmentConfig: @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) def main(cfg: AlignmentConfig): + logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}') + if is_dataclass(cfg): cfg = OmegaConf.structured(cfg) @@ -115,6 +117,22 @@ def main(cfg: AlignmentConfig): if cfg.output_ctm_folder is None: raise ValueError("cfg.output_ctm_folder must be specified") + # Log info about selected params + if cfg.ctm_grouping_separator == "" or cfg.ctm_grouping_separator is None: + logging.info( + f"ctm_grouping_separator is {cfg.ctm_grouping_separator} => " "each line of the output CTM will be a token" + ) + elif cfg.ctm_grouping_separator == " ": + logging.info( + f"ctm_grouping_separator is {cfg.ctm_grouping_separator} => " + "each line of the output CTM will be a space-separated word" + ) + else: + logging.info( + f"ctm_grouping_separator is {cfg.ctm_grouping_separator} => " + f"each line of the output CTM will be the text that is inbetween {cfg.ctm_grouping_separator}" + ) + # init devices transcribe_device = torch.device(cfg.transcribe_device) viterbi_device = torch.device(cfg.viterbi_device) @@ -148,11 +166,11 @@ def main(cfg: AlignmentConfig): for start, end in zip(starts, ends): data = get_manifest_lines(cfg.manifest_filepath, start, end) - log_probs, y, T, U = get_log_probs_y_T_U(data, model, cfg.separator) + log_probs, y, T, U = get_log_probs_y_T_U(data, model, cfg.ctm_grouping_separator) alignments = viterbi_decoding(log_probs, y, T, U, viterbi_device) - if cfg.separator == "": - make_basetoken_ctm( + if cfg.ctm_grouping_separator == "" or cfg.ctm_grouping_separator is None: + make_token_ctm( data, alignments, model, @@ -163,7 +181,7 @@ def main(cfg: AlignmentConfig): ) else: - make_word_ctm( + make_segment_ctm( data, alignments, model, @@ -171,7 +189,7 @@ def main(cfg: AlignmentConfig): cfg.output_ctm_folder, cfg.n_parts_for_ctm_id, audio_sr, - cfg.separator, + cfg.ctm_grouping_separator, ) return None diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index dfe6e373d19f..9a0bd425e1cb 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -62,6 +62,9 @@ def get_processed_text_and_segments(text, separator): if separator == "": return text, None + # get rid of any whitespace around text + text = text.strip() + segments = text.split(separator) # remove any spaces at start and end of segments diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index 0873559f513e..1a67aff4cad7 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -25,7 +25,7 @@ def _get_utt_id(audio_filepath, n_parts_for_ctm_id): return utt_id -def make_basetoken_ctm( +def make_token_ctm( data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, ): """ @@ -48,13 +48,13 @@ def make_basetoken_ctm( for token_id in token_ids: u_list.extend([token_id, blank_index]) - # make 'basetokens_info' - a list of dictionaries: + # make 'tokens_info' - a list of dictionaries: # e.g. 
[ - # {"basetoken": "", "u": 0, "t_start": 0, "t_end", 1 }, - # {"basetoken": "_sh", "u": 0, "t_start": 2, "t_end", 4 }, + # {"token": "", "u": 0, "t_start": 0, "t_end", 1 }, + # {"token": "_sh", "u": 0, "t_start": 2, "t_end", 4 }, # ... # ] - basetokens_info = [] + tokens_info = [] for t, u in enumerate(alignment): alignment_token_in_vocab = u_list[u] if alignment_token_in_vocab == blank_index: @@ -65,39 +65,39 @@ def make_basetoken_ctm( alignment_token_as_string = "" if t == 0: - basetokens_info.append( - {"basetoken": alignment_token_as_string, "u": u, "t_start": t,} + tokens_info.append( + {"token": alignment_token_as_string, "u": u, "t_start": t,} ) else: - if u == basetokens_info[-1]["u"]: + if u == tokens_info[-1]["u"]: pass else: - basetokens_info[-1]["t_end"] = t - 1 - basetokens_info.append( - {"basetoken": alignment_token_as_string, "u": u, "t_start": t,} + tokens_info[-1]["t_end"] = t - 1 + tokens_info.append( + {"token": alignment_token_as_string, "u": u, "t_start": t,} ) - basetokens_info[-1]["t_end"] = len(alignment) - 1 + tokens_info[-1]["t_end"] = len(alignment) - 1 # get utt_id that will be used for saving CTM file as .ctm utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: - for basetoken_info in basetokens_info: - basetoken = basetoken_info["basetoken"] - start_sample = basetoken_info["t_start"] * timestep_to_sample_ratio - end_sample = (basetoken_info["t_end"] + 1) * timestep_to_sample_ratio - 1 + for token_info in tokens_info: + token = token_info["token"] + start_sample = token_info["t_start"] * timestep_to_sample_ratio + end_sample = (token_info["t_end"] + 1) * timestep_to_sample_ratio - 1 start_time = start_sample / audio_sr end_time = end_sample / audio_sr - f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {basetoken}\n") + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {token}\n") return None -def make_word_ctm( +def make_segment_ctm( data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, separator ): """ @@ -111,101 +111,101 @@ def make_word_ctm( timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor for manifest_line, alignment in zip(data, alignments): - # make 'words_info' - a list of dictionaries: + # make 'segments_info' - a list of dictionaries: # e.g. [ - # {"word": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end", 0 }, - # {"word": "she", "u_start": 1, "u_end": 5, "t_start": 1, "t_end", 5 }, - # {"word": "had", "u_start": 7, "u_end": 10, "t_start": 6, "t_end", 8 } + # {"segment": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end", 0 }, + # {"segment": "she", "u_start": 1, "u_end": 5, "t_start": 1, "t_end", 5 }, + # {"segment": "had", "u_start": 7, "u_end": 10, "t_start": 6, "t_end", 8 } # ... 
# ] - words_info = [{"word": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] + segments_info = [{"segment": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] u_counter = 1 _, segments = get_processed_text_and_segments(manifest_line['text'], separator) if separator not in model.decoder.vocabulary: - for word in segments: - word_info = { - "word": word.replace(" ", ""), + for segment in segments: + segment_info = { + "segment": segment.replace(" ", ""), "u_start": u_counter, "u_end": None, "t_start": None, "t_end": None, } - word_tokens = tokenize_text(model, word) - word_info["u_end"] = word_info["u_start"] + 2 * (len(word_tokens) - 1) - words_info.append(word_info) + segment_tokens = tokenize_text(model, segment) + segment_info["u_end"] = segment_info["u_start"] + 2 * (len(segment_tokens) - 1) + segments_info.append(segment_info) - u_counter += 2 * len(word_tokens) + u_counter += 2 * len(segment_tokens) if " " in model.decoder.vocabulary: u_counter += 2 else: - for word_i, word in enumerate(segments): - word_info = { - "word": word.replace(" ", ""), + for segment_i, segment in enumerate(segments): + segment_info = { + "segment": segment.replace(" ", ""), "u_start": u_counter, "u_end": None, "t_start": None, "t_end": None, } - word_tokens = tokenize_text(model, word) + segment_tokens = tokenize_text(model, segment) - word_info["u_end"] = word_info["u_start"] + 2 * (len(word_tokens) - 1) - u_counter += 2 * len(word_tokens) + segment_info["u_end"] = segment_info["u_start"] + 2 * (len(segment_tokens) - 1) + u_counter += 2 * len(segment_tokens) - words_info.append(word_info) + segments_info.append(segment_info) - if word_i < len(manifest_line["text"].split(separator)) - 1: - # add the space after every word except the final word - word_info = { - "word": separator.replace(" ", ""), + if segment_i < len(manifest_line["text"].split(separator)) - 1: + # add the space after every segment except the final segment + segment_info = { + "segment": separator.replace(" ", ""), "u_start": u_counter, "u_end": None, "t_start": None, "t_end": None, } - word_tokens = tokenize_text(model, " ") + segment_tokens = tokenize_text(model, " ") - word_info["u_end"] = word_info["u_start"] + 2 * len(word_tokens) - 1 - u_counter += 2 * len(word_tokens) + segment_info["u_end"] = segment_info["u_start"] + 2 * len(segment_tokens) - 1 + u_counter += 2 * len(segment_tokens) - words_info.append(word_info) + segments_info.append(segment_info) - words_info_pointer = 0 + segments_info_pointer = 0 for t, u_at_t in enumerate(alignment): - if words_info_pointer < len(words_info): - if u_at_t == words_info[words_info_pointer]["u_start"]: - if words_info[words_info_pointer]["t_start"] is None: - words_info[words_info_pointer]["t_start"] = t + if segments_info_pointer < len(segments_info): + if u_at_t == segments_info[segments_info_pointer]["u_start"]: + if segments_info[segments_info_pointer]["t_start"] is None: + segments_info[segments_info_pointer]["t_start"] = t - if u_at_t > words_info[words_info_pointer]["u_end"]: - if words_info[words_info_pointer]["t_end"] is None: - words_info[words_info_pointer]["t_end"] = t - 1 + if u_at_t > segments_info[segments_info_pointer]["u_end"]: + if segments_info[segments_info_pointer]["t_end"] is None: + segments_info[segments_info_pointer]["t_end"] = t - 1 - words_info_pointer += 1 + segments_info_pointer += 1 - if words_info_pointer < len(words_info): - if u_at_t == words_info[words_info_pointer]["u_start"]: - if words_info[words_info_pointer]["t_start"] is None: - 
words_info[words_info_pointer]["t_start"] = t + if segments_info_pointer < len(segments_info): + if u_at_t == segments_info[segments_info_pointer]["u_start"]: + if segments_info[segments_info_pointer]["t_start"] is None: + segments_info[segments_info_pointer]["t_start"] = t # don't forget the final t_end as our for-loop might miss it - if alignment[-1] == words_info[-1]["u_end"]: - words_info[-1]["t_end"] = len(alignment) - 1 + if alignment[-1] == segments_info[-1]["u_end"]: + segments_info[-1]["t_end"] = len(alignment) - 1 # get utt_id that will be used for saving CTM file as .ctm utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: - for word_info in words_info: - if not (word_info["word"] == "initial_silence" and word_info["t_end"] is None): - word = word_info["word"] - start_sample = word_info["t_start"] * timestep_to_sample_ratio - end_sample = (word_info["t_end"] + 1) * timestep_to_sample_ratio - 1 + for segment_info in segments_info: + if not (segment_info["segment"] == "initial_silence" and segment_info["t_end"] is None): + segment = segment_info["segment"] + start_sample = segment_info["t_start"] * timestep_to_sample_ratio + end_sample = (segment_info["t_end"] + 1) * timestep_to_sample_ratio - 1 start_time = start_sample / audio_sr end_time = end_sample / audio_sr - f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {word}\n") + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {segment}\n") return None From 77300e93d21e2818b2e8335950fe02a2435da36b Mon Sep 17 00:00:00 2001 From: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Date: Fri, 16 Dec 2022 11:42:25 -0800 Subject: [PATCH 067/154] Bert interleaved (#5556) * Adding SP and SAR support Bert * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding SP and SAR support Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding SP and SAR support Bert * Adding SP and SAR support Bert * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding Sequence parallel support to Bert * Update bert_model.py Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Adding tests * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Addressing Eric's comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Addressing Eric's comments * Fix bug fix sequence parallel and Interleaved * Fix bug fix sequence parallel and Interleaved Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- .../conf/megatron_bert_config.yaml | 1 + .../language_modeling/megatron_bert_model.py | 262 +++++++++++++----- 2 files changed, 194 insertions(+), 69 deletions(-) diff --git a/examples/nlp/language_modeling/conf/megatron_bert_config.yaml b/examples/nlp/language_modeling/conf/megatron_bert_config.yaml index a1ba39e27ce4..cbc0562e2904 100644 --- a/examples/nlp/language_modeling/conf/megatron_bert_config.yaml +++ b/examples/nlp/language_modeling/conf/megatron_bert_config.yaml @@ -45,6 +45,7 @@ model: global_batch_size: 8 tensor_model_parallel_size: 1 pipeline_model_parallel_size: 1 + virtual_pipeline_model_parallel_size: null # model architecture encoder_seq_length: 512 diff --git a/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py b/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py index 95e616358edc..ef4fa18b10b4 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +import itertools from typing import Any, Dict, List, Optional, Union import torch @@ -19,7 +20,7 @@ from omegaconf.dictconfig import DictConfig from pytorch_lightning.trainer.trainer import Trainer -from nemo.collections.nlp.data.language_modeling.megatron.dataset_utils import build_train_valid_test_datasets +from nemo.collections.nlp.data.language_modeling.megatron import dataset_utils from nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers import ( MegatronPretrainingBatchSampler, MegatronPretrainingRandomBatchSampler, @@ -65,12 +66,15 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): "Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt." ) self.megatron_amp_o2 = cfg.get('megatron_amp_O2', False) + self.cfg = cfg + + if not self.megatron_amp_o2 and self.cfg.get('virtual_pipeline_model_parallel_size', None): + raise ValueError('Virtual pipeline model parallel is only supported when using megatron_amp_O2') + super().__init__(cfg, trainer=trainer, no_lm_init=False) self._validate_trainer() - self.cfg = cfg - if self.trainer.precision == 32: self.autocast_dtype = torch.float elif self.trainer.precision == 16: @@ -86,12 +90,34 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): self._reduced_sop_loss_buffer = [] # build_model returns a list of modules which are used for interleaved pipeline parallelism - self.model = build_model(model_provider_func=self.model_provider_func, wrap_with_ddp=False,)[0] + self.model = build_model( + model_provider_func=self.model_provider_func, + wrap_with_ddp=False, + virtual_pipeline_model_parallel_size=self.cfg.get('virtual_pipeline_model_parallel_size', None), + ) + + # if we're not using interleaved, then self.model is a module. 
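# Editor's sketch, not part of the patch: the "module or list of modules" pattern that
# this change applies throughout MegatronBertModel. With interleaved (virtual) pipeline
# parallelism, build_model returns one model chunk per virtual rank, so per-module work
# and parameters() must handle both shapes. Plain nn.Linear layers stand in for the
# Megatron model chunks here.
import itertools

import torch.nn as nn


def all_parameters(model):
    # same idea as the parameters() override added later in this diff
    if isinstance(model, list):
        return itertools.chain.from_iterable(m.parameters() for m in model)
    return model.parameters()


single = nn.Linear(4, 4)                          # no virtual pipeline parallelism
interleaved = [nn.Linear(4, 4), nn.Linear(4, 4)]  # one chunk per virtual pipeline rank

print(sum(p.nelement() for p in all_parameters(single)))       # 20
print(sum(p.nelement() for p in all_parameters(interleaved)))  # 40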
+ if self.cfg.get('virtual_pipeline_model_parallel_size', None) is None: + self.model = self.model[0] if self.megatron_amp_o2: + if not self.with_distributed_adam: - self.model.cuda(torch.cuda.current_device()) - self.model = Float16Module(module=self.model, precision=cfg.precision) + # Pre-allocate the model on GPU to have master parameters allocated on the same device with matching data type + if isinstance(self.model, list): + for module in self.model: + module.cuda(torch.cuda.current_device()) + else: + self.model.cuda(torch.cuda.current_device()) + + # Model wrapper to convert both model and inputs to half precision + if isinstance(self.model, list): + converted_model = [] + for module in self.model: + converted_model.append(Float16Module(module=module, precision=cfg.precision)) + self.model = converted_model + else: + self.model = Float16Module(module=self.model, precision=cfg.precision) def model_provider_func(self, pre_process, post_process): cfg = self.cfg @@ -145,7 +171,10 @@ def _validate_trainer(self): def _get_fwd_bwd_function(self): fwd_bwd_function = None if self.cfg.get('pipeline_model_parallel_size', 1) > 1: - fwd_bwd_function = forward_backward_pipelining_without_interleaving + if self.cfg.get('virtual_pipeline_model_parallel_size', None) is not None: + fwd_bwd_function = _forward_backward_pipelining_with_interleaving + else: + fwd_bwd_function = forward_backward_pipelining_without_interleaving else: fwd_bwd_function = forward_backward_no_pipelining return fwd_bwd_function @@ -175,12 +204,14 @@ def fwd_output_and_loss_func(batch, model, checkpoint_activations_all_layers=Non if not self.cfg.bert_binary_head: types = None + output_tensor = self.forward( tokens, padding_mask, types, lm_labels, checkpoint_activations_all_layers=checkpoint_activations_all_layers, + model=model, ) def loss_func(output_tensor): @@ -201,9 +232,17 @@ def loss_func(output_tensor): return fwd_output_and_loss_func def forward( - self, input_ids, attention_mask, token_type_ids, lm_labels=None, checkpoint_activations_all_layers=None + self, + input_ids, + attention_mask, + token_type_ids, + lm_labels=None, + checkpoint_activations_all_layers=None, + model=None, ): - output_tensor = self.model( + if model is None: + model = self.model + output_tensor = model( input_ids, attention_mask, token_type_ids=token_type_ids, @@ -330,7 +369,16 @@ def allreduce_first_last_embeddings(self): parallel_state.is_pipeline_first_stage(ignore_virtual=True) or parallel_state.is_pipeline_last_stage(ignore_virtual=True) ): - module = self.model + if parallel_state.is_pipeline_first_stage(ignore_virtual=True): + if isinstance(self.model, list): + module = self.model[0] # only the first virtual rank has the embeddings + else: + module = self.model + if parallel_state.is_pipeline_last_stage(ignore_virtual=True): + if isinstance(self.model, list): + module = self.model[-1] # only the last virtual rank has the embeddings + else: + module = self.model if module.share_token_embeddings: word_embeddings_weight = module.word_embeddings_weight() if self.megatron_amp_o2: @@ -423,7 +471,7 @@ def process_batch(self, batch): padding_mask = batch['padding_mask'].long() return [tokens, types, sentence_order, loss_mask, lm_labels, padding_mask] - def _build_train_valid_test_datasets(self): + def build_train_valid_test_datasets(self): logging.info('Building Bert datasets.') if self.trainer.limit_val_batches > 1.0 and isinstance(self.trainer.limit_val_batches, float): raise ValueError("limit_val_batches must be an integer or float less than or 
equal to 1.0.") @@ -444,7 +492,7 @@ def _build_train_valid_test_datasets(self): 1 ] = 1 # This is to make sure we only have one epoch on every validation iteration - self._train_ds, self._validation_ds, self._test_ds = build_train_valid_test_datasets( + self._train_ds, self._validation_ds, self._test_ds = dataset_utils.build_train_valid_test_datasets( cfg=self.cfg, trainer=self.trainer, data_prefix=self.cfg.data.data_prefix, @@ -496,6 +544,79 @@ def _append_sequence_parallel_module_grads(self, module, grads): grad = param.grad grads.append(grad.data) + def setup(self, stage=None): + """ PTL hook that is executed after DDP spawns. + We setup datasets here as megatron datasets require DDP to instantiate. + See https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#setup for more information. + Args: + stage (str, optional): Can be 'fit', 'validate', 'test' or 'predict'. Defaults to None. + """ + + # log number of parameters + if isinstance(self.model, list): + num_parameters_on_device = sum( + [sum([p.nelement() for p in model_module.parameters()]) for model_module in self.model] + ) + if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( + ignore_virtual=True + ): + # substract the embedding weights on the last virtual stage + num_word_embedding_parameters = sum([p.nelement() for p in self.model[-1].word_embeddings_weight()]) + num_parameters_on_device -= num_word_embedding_parameters + else: + num_parameters_on_device = sum([p.nelement() for p in self.model.parameters()]) + + if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( + ignore_virtual=True + ): + # substract the embedding weights on the last stage + num_word_embedding_parameters = sum([p.nelement() for p in self.model.word_embeddings_weight()]) + + num_parameters_on_device -= num_word_embedding_parameters + + # to be summed across data parallel group + total_num_parameters = torch.tensor(num_parameters_on_device).cuda() + + torch.distributed.all_reduce(total_num_parameters, group=parallel_state.get_model_parallel_group()) + + logging.info( + f'Pipeline model parallel rank: {parallel_state.get_pipeline_model_parallel_rank()}, ' + f'Tensor model parallel rank: {parallel_state.get_tensor_model_parallel_rank()}, ' + f'Number of model parameters on device: {num_parameters_on_device:.2e}. ' + f'Total number of model parameters: {total_num_parameters:.2e}.' + ) + + resume_checkpoint_path = self.trainer._checkpoint_connector.resume_from_checkpoint_fit_path + if resume_checkpoint_path: + init_consumed_samples = self._extract_consumed_samples_from_ckpt(resume_checkpoint_path) + else: + init_consumed_samples = 0 + self.init_consumed_samples = init_consumed_samples + self.init_global_step = self.trainer.global_step + + if stage == 'predict': + return + else: + # TODO: consider adding a ModelPT guard to check if model is being restored. 
+ # allowing restored models to optionally setup datasets + self.build_train_valid_test_datasets() + self.setup_training_data(self.cfg.data) + self.setup_validation_data(self.cfg.data) + self.setup_test_data(self.cfg.data) + + # when using pipeline model parallel the final stage need to initialize word embeddings + if parallel_state.get_pipeline_model_parallel_world_size() > 1: + if isinstance(self.model, list): + for i, module in enumerate(self.model): + parallel_state.set_virtual_pipeline_model_parallel_rank(i) + module.sync_initial_word_embeddings() + parallel_state.set_virtual_pipeline_model_parallel_rank(0) + else: + self.model.sync_initial_word_embeddings() + + if self.cfg.get('transformer_engine', False): + self.setup_transformer_engine_tp_groups() + def allreduce_sequence_parallel_gradients(self): """ All-reduce layernorm parameters across model parallel nodes when sequence parallelism is used. Modified from megatron-lm: @@ -503,7 +624,16 @@ def allreduce_sequence_parallel_gradients(self): """ grads = [] - self._append_sequence_parallel_module_grads(self.model, grads) + if isinstance(self.model, list): + for module in self.model: + self._append_sequence_parallel_module_grads(module, grads) + else: + self._append_sequence_parallel_module_grads(self.model, grads) + + coalesced = torch._utils._flatten_dense_tensors(grads) + torch.distributed.all_reduce(coalesced, group=parallel_state.get_tensor_model_parallel_group()) + for buf, synced in zip(grads, torch._utils._unflatten_dense_tensors(coalesced, grads)): + buf.copy_(synced) def build_pretraining_data_loader(self, dataset, consumed_samples): """Buld dataloader given an input dataset.""" @@ -542,51 +672,6 @@ def build_pretraining_data_loader(self, dataset, consumed_samples): dataset, batch_sampler=batch_sampler, num_workers=self.cfg.data.num_workers, pin_memory=True, ) - def setup(self, stage=None): - num_parameters_on_device = sum([p.nelement() for p in self.model.parameters()]) - if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( - ignore_virtual=True - ): - # substract the embedding weights on the last stage - num_word_embedding_parameters = sum([p.nelement() for p in self.model.word_embeddings_weight()]) - - num_parameters_on_device -= num_word_embedding_parameters - - # to be summed across data parallel group - total_num_parameters = torch.tensor(num_parameters_on_device).cuda() - - torch.distributed.all_reduce(total_num_parameters, group=parallel_state.get_model_parallel_group()) - - logging.info( - f'Pipeline model parallel rank: {parallel_state.get_pipeline_model_parallel_rank()}, ' - f'Tensor model parallel rank: {parallel_state.get_tensor_model_parallel_rank()}, ' - f'Number of model parameters on device: {num_parameters_on_device:.2e}. ' - f'Total number of model parameters: {total_num_parameters:.2e}.' - ) - - resume_checkpoint_path = self.trainer._checkpoint_connector.resume_from_checkpoint_fit_path - - if resume_checkpoint_path: - init_consumed_samples = self._extract_consumed_samples_from_ckpt(resume_checkpoint_path) - else: - init_consumed_samples = 0 - - self.init_consumed_samples = init_consumed_samples - self.init_global_step = self.trainer.global_step - - if stage == 'predict': - return - # TODO: consider adding a ModelPT guard to check if model is being restored. 
- # allowing restored models to optionally setup datasets - self._build_train_valid_test_datasets() - self.setup_training_data(self.cfg.data) - self.setup_validation_data(self.cfg.data) - self.setup_test_data(self.cfg.data) - - # when using pipeline model parallel the final stage need to initialize word embeddings - if parallel_state.get_pipeline_model_parallel_world_size() > 1: - self.model.sync_initial_word_embeddings() - def setup_training_data(self, cfg): if hasattr(self, '_train_ds'): consumed_samples = self.compute_consumed_samples(0) @@ -620,7 +705,10 @@ def transfer_batch_to_device(self, batch: Any, device: torch.device, dataloader_ return batch def parameters(self): - return self.model.parameters() + if isinstance(self.model, list): + return itertools.chain.from_iterable(module.parameters() for module in self.model) + else: + return self.model.parameters() @classmethod def list_available_models(cls) -> Optional[PretrainedModelInfo]: @@ -667,16 +755,32 @@ def configure_optimizers(self): # Disable overlapped grad sync for embedding grad when # pipeline parallelism is enabled - # See: allreduce_first_last_embeddings - if parallel_state.get_pipeline_model_parallel_world_size() > 1 and ( - parallel_state.is_pipeline_first_stage(ignore_virtual=True) - or parallel_state.is_pipeline_last_stage(ignore_virtual=True) - ): - module = self.model - if module.share_token_embeddings: - word_embeddings_weight = module.word_embeddings_weight() - word_embeddings_weight._disable_greedy_grad_copy = not self.megatron_amp_o2 - word_embeddings_weight._disable_overlap_grad_sync = True + if parallel_state.get_pipeline_model_parallel_world_size() > 1: + if parallel_state.is_pipeline_first_stage(ignore_virtual=True): + if isinstance(self.model, list): + module = self.model[0] # only the first virtual rank has the embeddings + else: + module = self.model + if module.share_token_embeddings: + param = module.word_embeddings_weight() + param._disable_greedy_grad_copy = not self.megatron_amp_o2 + param._disable_overlap_grad_sync = True + if parallel_state.is_pipeline_last_stage(ignore_virtual=True): + if isinstance(self.model, list): + module = self.model[-1] # only the last virtual rank has the embeddings + else: + module = self.model + if module.share_token_embeddings: + param = module.word_embeddings_weight() + param._disable_greedy_grad_copy = not self.megatron_amp_o2 + param._disable_overlap_grad_sync = True + + # Disable overlapped grad sync for layer norm grads when + # sequence parallelism is enabled + for param in self.parameters(): + if getattr(param, 'sequence_parallel_enabled', False): + param._disable_greedy_grad_copy = not self.megatron_amp_o2 + param._disable_overlap_grad_sync = True # sequence parallelism is enabled for param in self.parameters(): @@ -709,3 +813,23 @@ def input_example(self, max_batch=1, max_dim=256): attention_mask = torch.randint(low=0, high=1, size=sz, device=sample.device) input_dict = {"input_ids": input_ids, "attention_mask": attention_mask, "token_type_ids": token_type_ids} return tuple([input_dict]) + + def on_save_checkpoint(self, checkpoint) -> None: + """LightningModule hook: + https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-save-checkpoint + """ + if isinstance(self.model, list): + for i in range(len(self.model)): + parallel_state.set_virtual_pipeline_model_parallel_rank(i) + checkpoint[f'model{i}'] = self.model[i].module.state_dict_for_save_checkpoint() + parallel_state.set_virtual_pipeline_model_parallel_rank(0) + + def 
on_load_checkpoint(self, checkpoint) -> None: + """LightningModule hook: + https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-load-checkpoint + """ + if isinstance(self.model, list): + for i in range(len(self.model)): + parallel_state.set_virtual_pipeline_model_parallel_rank(i) + self.model[i].module.load_state_dict(checkpoint[f'model{i}'], strict=True) + parallel_state.set_virtual_pipeline_model_parallel_rank(0) From dbefd268675211650dc497625f3fa7c33cc2e90a Mon Sep 17 00:00:00 2001 From: kevjshih Date: Fri, 16 Dec 2022 15:43:20 -0500 Subject: [PATCH 068/154] Add duration padding support for RADTTS inference (#5650) * Added duration padding support for RADTTS inference * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Kevin Shih Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- nemo/collections/tts/helpers/helpers.py | 24 +++++++++++++++++--- nemo/collections/tts/modules/radtts.py | 29 ++++++++++++++++--------- 2 files changed, 40 insertions(+), 13 deletions(-) diff --git a/nemo/collections/tts/helpers/helpers.py b/nemo/collections/tts/helpers/helpers.py index c7235780bb45..eba32b1a00eb 100644 --- a/nemo/collections/tts/helpers/helpers.py +++ b/nemo/collections/tts/helpers/helpers.py @@ -478,7 +478,15 @@ def remove(conv_list): return new_conv_list -def regulate_len(durations, enc_out, pace=1.0, mel_max_len=None): +def regulate_len( + durations, + enc_out, + pace=1.0, + mel_max_len=None, + replicate_to_nearest_multiple=False, + group_size=2, + in_lens: torch.tensor = None, +): """A function that takes predicted durations per encoded token, and repeats enc_out according to the duration. NOTE: durations.shape[1] == enc_out.shape[1] @@ -486,9 +494,12 @@ def regulate_len(durations, enc_out, pace=1.0, mel_max_len=None): durations (torch.tensor): A tensor of shape (batch x enc_length) that represents how many times to repeat each token in enc_out. enc_out (torch.tensor): A tensor of shape (batch x enc_length x enc_hidden) that represents the encoded tokens. - pace (float): The pace of speaker. Higher values result in faster speaking pace. Defaults to 1.0. - max_mel_len (int): The maximum length above which the output will be removed. If sum(durations, dim=1) > + pace (float): The pace of speaker. Higher values result in faster speaking pace. Defaults to 1.0. max_mel_len (int): The maximum length above which the output will be removed. If sum(durations, dim=1) > max_mel_len, the values after max_mel_len will be removed. Defaults to None, which has no max length. 
+ replicate_to_nearest_multiple (bool): replicate the last element specified by durations[i, in_lens[i] - 1] until the + full length of the sequence is the next nearest multiple of group_size + group_size (int): factor used by replicate_to_nearest_multiple + in_lens (torch.tensor): input sequence length specifying valid values in the durations input tensor """ dtype = enc_out.dtype @@ -496,6 +507,13 @@ def regulate_len(durations, enc_out, pace=1.0, mel_max_len=None): reps = (reps + 0.5).floor().long() dec_lens = reps.sum(dim=1) + if replicate_to_nearest_multiple: + to_pad = group_size * (torch.div(dec_lens, group_size, rounding_mode='floor') + 1) - dec_lens + to_pad = to_pad.unsqueeze(-1).repeat(1, reps.shape[1]) + to_pad_expanded = torch.zeros_like(reps).scatter_(1, in_lens.unsqueeze(-1).long() - 1, to_pad) + reps = reps + to_pad_expanded + dec_lens = reps.sum(dim=1) + max_len = dec_lens.max() reps_cumsum = torch.cumsum(torch.nn.functional.pad(reps, (1, 0, 0, 0), value=0.0), dim=1)[:, None, :] reps_cumsum = reps_cumsum.to(dtype=dtype, device=enc_out.device) diff --git a/nemo/collections/tts/modules/radtts.py b/nemo/collections/tts/modules/radtts.py index 73f73e3d17e5..7bb3f2882334 100644 --- a/nemo/collections/tts/modules/radtts.py +++ b/nemo/collections/tts/modules/radtts.py @@ -107,17 +107,17 @@ def forward(self, z, context, inverse=False, seq_lens=None): class RadTTSModule(NeuralModule, Exportable): """ - Takes model parameters (modelConfig) from config file to intialize radtts module. - Specifiy the type of training in the include_modules parameter. "decatnvpred" for decoder training. and "decatnunvbiasdpmvpredapm" for feature training - n_speakers (int): Number of speakers + Takes model parameters (modelConfig) from config file to intialize radtts module. + Specifiy the type of training in the include_modules parameter. "decatnvpred" for decoder training. and "decatnunvbiasdpmvpredapm" for feature training + n_speakers (int): Number of speakers n_speaker_dim (int): number of speakers dimention n_text (int): Symbols embedding size n_text_dim (int): n_flows (int): - n_conv_layers_per_step (int): number of convultion layers per step - dummy_speaker_embedding (bool): - include_modules (string): A string that describes what to train. "decatnvpred" for decoder training. and "decatnunvbiasdpmvpredapm" for feature training. - scaling_fn (string): scaling function + n_conv_layers_per_step (int): number of convultion layers per step + dummy_speaker_embedding (bool): + include_modules (string): A string that describes what to train. "decatnvpred" for decoder training. and "decatnunvbiasdpmvpredapm" for feature training. 
+ scaling_fn (string): scaling function decoder_use_partial_padding (Bool): Set this to True to add partial padding learn_alignments (Bool): set this to true to learn alignments attn_use_CTC (Bool): set True to use CTC @@ -360,7 +360,7 @@ def preprocess_context(self, context, speaker_vecs, out_lens, f0, energy_avg): def fold(self, mel): """Inverse of the self.unfold(mel.unsqueeze(-1)) operation used for the grouping or "squeeze" operation on input - + Args: mel: B x C x T tensor of temporal data """ @@ -608,7 +608,14 @@ def infer( dur = dur[:, 0] dur = dur.clamp(0, token_duration_max) - txt_enc_time_expanded, out_lens = regulate_len(dur, txt_enc.transpose(1, 2), pace) + txt_enc_time_expanded, out_lens = regulate_len( + dur, + txt_enc.transpose(1, 2), + pace, + replicate_to_nearest_multiple=True, + group_size=self.n_group_size, + in_lens=in_lens, + ) n_groups = torch.div(out_lens, self.n_group_size, rounding_mode='floor') max_out_len = torch.max(out_lens) @@ -771,7 +778,9 @@ def input_example(self, max_batch=1, max_dim=256): par = next(self.parameters()) sz = (max_batch, max_dim) inp = torch.randint(16, 32, sz, device=par.device, dtype=torch.int64) - lens = torch.randint(max_dim // 2, max_dim, (max_batch,), device=par.device, dtype=torch.int) + lens = torch.randint(max_dim // 8, max_dim // 4, (max_batch,), device=par.device, dtype=torch.int) + zeros_src = torch.zeros_like(inp) + inp.scatter_(1, lens.long().unsqueeze(-1) - 1, zeros_src) # set the inp[i, lens[i]-1] elt to zero speaker = torch.randint(0, 1, (max_batch,), device=par.device, dtype=torch.int64) inputs = { 'text': inp, From c5ed085e9ab2cf3a374d0a3e3dc1feec819d2026 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 14:55:04 -0800 Subject: [PATCH 069/154] Add remove_blank_tokens_from_ctm parameter Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 ++ tools/nemo_forced_aligner/align.py | 11 +++++++++++ tools/nemo_forced_aligner/utils/make_ctm.py | 12 ++++++++++-- 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index e05211167d2f..6378669694c0 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -32,6 +32,8 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `ctm_grouping_separator`: the string used to separate CTM segments. If the separator is `“”` (empty string) or `None`, the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) > Note: if you pass in a hydra override `ctm_grouping_separator=" "`, hydra will remove the whitespace, thus converting that `" "` to `""`. If you want to pass in a space, make sure to use `ctm_grouping_separator="\ "`, or just do not pass in this override, as the default value is `" "` anyway. +* **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from output CTMs. (Default: False). Note: "" tokens can only be present if your CTMs are token-level. Therefore, NFA will throw an error if you set `remove_blank_tokens_from_ctm` to `True` if the `ctm_grouping_separator` is not `""` or `None`. 
+ * **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. * **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 4cf8ced37871..c2b18156f8d8 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -59,6 +59,9 @@ “abc |def” “abc| def” “abc | def” + remove_blank_tokens_from_ctm: a boolean denoting whether to remove tokens from output CTMs. + Note: "" tokens can only be present if your CTMs are token-level. Therefore, NFA will throw an error + if you set `remove_blank_tokens_from_ctm` to `True` if the `ctm_grouping_separator` is not `""` or `None`. n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath @@ -87,6 +90,7 @@ class AlignmentConfig: # General configs ctm_grouping_separator: Optional[str] = " " + remove_blank_tokens_from_ctm: bool = False n_parts_for_ctm_id: int = 1 transcribe_device: str = "cpu" viterbi_device: str = "cpu" @@ -117,6 +121,12 @@ def main(cfg: AlignmentConfig): if cfg.output_ctm_folder is None: raise ValueError("cfg.output_ctm_folder must be specified") + if (cfg.ctm_grouping_separator != "" or cfg.ctm_grouping_separator is None) and cfg.remove_blank_tokens_from_ctm: + raise ValueError( + "cfg.remove_blank_tokens_from_ctm can only be set to True if producing token-level CTMs, " + " (i.e. 
if cfg.ctm_grouping_separator is empty string or None" + ) + # Log info about selected params if cfg.ctm_grouping_separator == "" or cfg.ctm_grouping_separator is None: logging.info( @@ -178,6 +188,7 @@ def main(cfg: AlignmentConfig): cfg.output_ctm_folder, cfg.n_parts_for_ctm_id, audio_sr, + cfg.remove_blank_tokens_from_ctm, ) else: diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index 1a67aff4cad7..9a68c10c1935 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -26,7 +26,14 @@ def _get_utt_id(audio_filepath, n_parts_for_ctm_id): def make_token_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, + data, + alignments, + model, + model_downsample_factor, + output_ctm_folder, + n_parts_for_ctm_id, + audio_sr, + remove_blank_tokens_from_ctm, ): """ Note: assume order of utts in data matches order of utts in alignments @@ -92,7 +99,8 @@ def make_token_ctm( start_time = start_sample / audio_sr end_time = end_sample / audio_sr - f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {token}\n") + if not (remove_blank_tokens_from_ctm and token == ""): + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {token}\n") return None From c3899a0685cb347282e77a8a21509ce2ab537eaa Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 15:29:19 -0800 Subject: [PATCH 070/154] Dont save initial_silence line in CTM Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/make_ctm.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index 9a68c10c1935..36e9196cd258 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -206,7 +206,7 @@ def make_segment_ctm( with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: for segment_info in segments_info: - if not (segment_info["segment"] == "initial_silence" and segment_info["t_end"] is None): + if not segment_info["segment"] == "": segment = segment_info["segment"] start_sample = segment_info["t_start"] * timestep_to_sample_ratio end_sample = (segment_info["t_end"] + 1) * timestep_to_sample_ratio - 1 From 0087dad5c9a6136e005d849354b6930832e022c3 Mon Sep 17 00:00:00 2001 From: milesial Date: Sat, 17 Dec 2022 01:19:39 +0100 Subject: [PATCH 071/154] Add DLLogger support to exp_manager (#5658) * Add DLLogger support to exp_manager Signed-off-by: Alexandre Milesi * Move dllogger to separate file and check import Signed-off-by: Alexandre Milesi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Alexandre Milesi Signed-off-by: Alexandre Milesi Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- docs/source/core/exp_manager.rst | 2 +- nemo/utils/dllogger.py | 69 ++++++++++++++++++++++++++++++++ nemo/utils/exp_manager.py | 47 ++++++++++++++++++---- 3 files changed, 109 insertions(+), 9 deletions(-) create mode 100644 nemo/utils/dllogger.py diff --git a/docs/source/core/exp_manager.rst b/docs/source/core/exp_manager.rst index dc2689358925..854480ad1f65 100644 --- a/docs/source/core/exp_manager.rst +++ b/docs/source/core/exp_manager.rst @@ -4,7 +4,7 @@ Experiment Manager 
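For context while reading the diff below: the new logger is enabled through two fields added to the exp_manager config. The fragment below is an editorial sketch using the field names introduced in this patch (ExpManagerConfig.create_dllogger_logger and DLLoggerParams); it is not taken from the NeMo docs.

# Hypothetical exp_manager config fragment enabling the DLLogger backend.
exp_manager_overrides = {
    "create_dllogger_logger": True,
    "dllogger_logger_kwargs": {
        "verbose": False,                 # Verbosity.DEFAULT rather than VERBOSE
        "stdout": True,                   # also log to stdout
        "json_file": "./dllogger.json",   # JSON stream backend output path
    },
}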
================== -NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, TensorBoard Logging, Weights and Biases, and MLFlow logging. The +NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, TensorBoard Logging, Weights and Biases, DLLogger and MLFlow logging. The Experiment Manager is included by default in all NeMo example scripts. To use the experiment manager simply call :class:`~nemo.utils.exp_manager.exp_manager` and pass in the PyTorch Lightning ``Trainer``. diff --git a/nemo/utils/dllogger.py b/nemo/utils/dllogger.py new file mode 100644 index 000000000000..a9b7970a283d --- /dev/null +++ b/nemo/utils/dllogger.py @@ -0,0 +1,69 @@ +# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from pathlib import Path + +from pytorch_lightning.loggers import Logger + +from nemo.utils import logging + +try: + import dllogger + from dllogger import Verbosity + + HAVE_DLLOGGER = True +except (ImportError, ModuleNotFoundError): + HAVE_DLLOGGER = False + + +class DLLogger(Logger): + @property + def name(self): + return self.__class__.__name__ + + @property + def version(self): + return None + + def __init__(self, stdout: bool, verbose: bool, json_file: str): + if not HAVE_DLLOGGER: + raise ImportError( + "DLLogger was not found. Please see the README for installation instructions: " + "https://github.com/NVIDIA/dllogger" + ) + verbosity = Verbosity.VERBOSE if verbose else Verbosity.DEFAULT + backends = [] + if json_file: + Path(json_file).parent.mkdir(parents=True, exist_ok=True) + backends.append(dllogger.JSONStreamBackend(verbosity, json_file)) + if stdout: + backends.append(dllogger.StdOutBackend(verbosity)) + + if not backends: + logging.warning( + "Neither stdout nor json_file DLLogger parameters were specified." "DLLogger will not log anything." 
+ ) + dllogger.init(backends=backends) + + def log_hyperparams(self, params, *args, **kwargs): + dllogger.log(step="PARAMETER", data=params) + + def log_metrics(self, metrics, step=None): + if step is None: + step = tuple() + + dllogger.log(step=step, data=metrics) + + def save(self): + dllogger.flush() diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index f5c997e4842e..3364f4d11670 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -42,6 +42,7 @@ from nemo.constants import NEMO_ENV_VARNAME_TESTING, NEMO_ENV_VARNAME_VERSION from nemo.utils import logging, timers from nemo.utils.app_state import AppState +from nemo.utils.dllogger import DLLogger from nemo.utils.env_var_parsing import get_envbool from nemo.utils.exceptions import NeMoBaseException from nemo.utils.get_rank import is_global_rank_zero @@ -103,6 +104,13 @@ class MLFlowParams: run_id: Optional[str] = None +@dataclass +class DLLoggerParams: + verbose: Optional[bool] = False + stdout: Optional[bool] = False + json_file: Optional[str] = "./dllogger.json" + + @dataclass class StepTimingParams: reduction: Optional[str] = "mean" @@ -139,6 +147,8 @@ class ExpManagerConfig: wandb_logger_kwargs: Optional[Dict[Any, Any]] = None create_mlflow_logger: Optional[bool] = False mlflow_logger_kwargs: Optional[MLFlowParams] = MLFlowParams() + create_dllogger_logger: Optional[bool] = False + dllogger_logger_kwargs: Optional[DLLoggerParams] = DLLoggerParams() # Checkpointing parameters create_checkpoint_callback: Optional[bool] = True checkpoint_callback_params: Optional[CallbackParams] = CallbackParams() @@ -207,7 +217,7 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo directory. exp_manager also allows for explicit folder creation via explicit_log_dir. The version can be a datetime string or an integer. Datestime version can be disabled if use_datetime_version is set - to False. It optionally creates TensorBoardLogger, WandBLogger, MLFlowLogger, ModelCheckpoint objects from pytorch lightning. + to False. It optionally creates TensorBoardLogger, WandBLogger, DLLogger, MLFlowLogger, ModelCheckpoint objects from pytorch lightning. It copies sys.argv, and git information if available to the logging directory. It creates a log file for each process to log their output into. @@ -251,6 +261,9 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo - create_mlflow_logger (bool): Whether to create an MLFlow logger and attach it to the pytorch lightning training. Defaults to False - mlflow_logger_kwargs (dict): optional parameters for the MLFlow logger + - create_dllogger_logger (bool): Whether to create an DLLogger logger and attach it to the pytorch lightning + training. Defaults to False + - dllogger_logger_kwargs (dict): optional parameters for the DLLogger logger - create_checkpoint_callback (bool): Whether to create a ModelCheckpoint callback and attach it to the pytorch lightning trainer. The ModelCheckpoint saves the top 3 models with the best "val_loss", the most recent checkpoint under ``*last.ckpt``, and the final checkpoint after training completes under ``*end.ckpt``. @@ -366,7 +379,12 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo # For some reason, LearningRateLogger requires trainer to have a logger. Safer to create logger on all ranks # not just global rank 0. 
- if cfg.create_tensorboard_logger or cfg.create_wandb_logger or cfg.create_mlflow_logger: + if ( + cfg.create_tensorboard_logger + or cfg.create_wandb_logger + or cfg.create_mlflow_logger + or cfg.create_dllogger_logger + ): configure_loggers( trainer, exp_dir, @@ -378,6 +396,8 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo cfg.wandb_logger_kwargs, cfg.create_mlflow_logger, cfg.mlflow_logger_kwargs, + cfg.create_dllogger_logger, + cfg.dllogger_logger_kwargs, ) # add loggers timing callbacks @@ -433,7 +453,8 @@ def error_checks(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictC """ Checks that the passed trainer is compliant with NeMo and exp_manager's passed configuration. Checks that: - Throws error when hydra has changed the working directory. This causes issues with lightning's DDP - - Throws error when trainer has loggers defined but create_tensorboard_logger or create_WandB_logger or create_mlflow_logger is True + - Throws error when trainer has loggers defined but create_tensorboard_logger or create_wandB_logger + or create_mlflow_logger or create_dllogger_logger is True - Prints error messages when 1) run on multi-node and not Slurm, and 2) run on multi-gpu without DDP """ if HydraConfig.initialized() and get_original_cwd() != os.getcwd(): @@ -447,7 +468,8 @@ def error_checks(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictC raise LoggerMisconfigurationError( "The pytorch lightning trainer that was passed to exp_manager contained a logger, and either " f"create_tensorboard_logger: {cfg.create_tensorboard_logger} or create_wandb_logger: " - f"{cfg.create_wandb_logger} or create_mlflow_logger: {cfg.create_mlflow_logger} was set to True. " + f"{cfg.create_wandb_logger} or create_mlflow_logger: {cfg.create_mlflow_logger}" + f"or create_dllogger_logger: {cfg.create_mlflow_logger} was set to True. " "These can only be used if trainer does not already have a logger." ) if trainer.num_nodes > 1 and not check_slurm(trainer): @@ -700,11 +722,14 @@ def configure_loggers( wandb_kwargs: dict, create_mlflow_logger: bool, mlflow_kwargs: dict, + create_dllogger_logger: bool, + dllogger_kwargs: dict, ): - """ Creates TensorboardLogger and/or WandBLogger / MLFlowLogger and attach them to trainer. Raises ValueError if - summary_writer_kwargs or wandb_kwargs are misconfigured. """ - # Potentially create tensorboard logger and/or WandBLogger / MLFlowLogger + Creates TensorboardLogger and/or WandBLogger / MLFlowLogger / DLlogger and attach them to trainer. + Raises ValueError if summary_writer_kwargs or wandb_kwargs are misconfigured. 
+ """ + # Potentially create tensorboard logger and/or WandBLogger / MLFlowLogger / DLLogger logger_list = [] if create_tensorboard_logger: if summary_writer_kwargs is None: @@ -737,7 +762,13 @@ def configure_loggers( mlflow_logger = MLFlowLogger(run_name=version, **mlflow_kwargs) logger_list.append(mlflow_logger) - logging.info('MLFlowLogger has been set up') + logging.info("MLFlowLogger has been set up") + + if create_dllogger_logger: + dllogger_logger = DLLogger(**dllogger_kwargs) + + logger_list.append(dllogger_logger) + logging.info("DLLogger has been set up") trainer._logger_connector.configure_logger(logger_list) From c56e190aaeee12420a5eb77223a76cf776c279b9 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 16:35:53 -0800 Subject: [PATCH 072/154] add minimum_timestamp_duration parameter Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 3 ++ tools/nemo_forced_aligner/align.py | 11 ++++++- tools/nemo_forced_aligner/utils/make_ctm.py | 33 ++++++++++++++++++++- 3 files changed, 45 insertions(+), 2 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 6378669694c0..ce81411bd567 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -42,6 +42,7 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). +* **[OPTIONAL]** `minimum_timestamp_duration`: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio file. Note that this may cause timestamps to overlap. (Default: 0, i.e. no modifications to predicted duration). # Input manifest file format NFA needs to be provided with a 'manifest' file where each line specifies the absolute "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: @@ -62,3 +63,5 @@ Note the second item in the line (the 'channel ID', which is required by the CTM Ideally you would have some 'true' CTM files to compare with your generated CTM files. Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. The Gecko tool requires you to upload an audio file and at least one CTM file. The Gecko tool can be accessed here: https://gong-io.github.io/gecko/. More information about the Gecko tool can be found on its Github page here: https://github.com/gong-io/gecko. + +Note that Gecko may not display some tokens/words/segments properly if their timestamps are too short. To ameliorate this, you may try setting `minimum_timestamp_duration` to some suitable number. diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index c2b18156f8d8..7d7e5491fa6a 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -75,7 +75,10 @@ viterbi_device: string specifying the device that will be used for doing Viterbi decoding. The string needs to be in a format recognized by torch.device(). batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. 
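`minimum_timestamp_duration` enlarges any CTM entry shorter than the requested duration symmetrically around its midpoint, clamping to the start and end of the audio file. A minimal standalone sketch of that rule (an illustration of the arithmetic, not the NFA code itself):

```python
# Sketch of the midpoint-expansion rule described above; standalone helper,
# not the NFA implementation.
def expand_timestamp(start_time: float, end_time: float, min_dur: float, audio_dur: float):
    """Grow [start_time, end_time] to at least min_dur seconds, expanding from
    the middle outwards and clamping to [0, audio_dur]."""
    if min_dur > 0 and (end_time - start_time) < min_dur:
        mid = (start_time + end_time) / 2
        start_time = max(mid - min_dur / 2, 0)
        end_time = min(mid + min_dur / 2, audio_dur)
    return start_time, end_time


# A 0.02 s token centred at 1.01 s in a 10 s file, with a 0.1 s minimum:
print(expand_timestamp(1.00, 1.02, 0.1, 10.0))  # ~ (0.96, 1.06)
```

Because each entry is widened independently, neighbouring timestamps may end up overlapping, as the README warns.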
- + minimum_timestamp_duration: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any + line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the + middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio + file. Note that this may cause timestamps to overlap. """ @@ -95,6 +98,7 @@ class AlignmentConfig: transcribe_device: str = "cpu" viterbi_device: str = "cpu" batch_size: int = 1 + minimum_timestamp_duration: float = 0 @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) @@ -127,6 +131,9 @@ def main(cfg: AlignmentConfig): " (i.e. if cfg.ctm_grouping_separator is empty string or None" ) + if cfg.minimum_timestamp_duration < 0: + raise ValueError("cfg.minimum_timestamp_duration cannot be a negative number") + # Log info about selected params if cfg.ctm_grouping_separator == "" or cfg.ctm_grouping_separator is None: logging.info( @@ -189,6 +196,7 @@ def main(cfg: AlignmentConfig): cfg.n_parts_for_ctm_id, audio_sr, cfg.remove_blank_tokens_from_ctm, + cfg.minimum_timestamp_duration, ) else: @@ -201,6 +209,7 @@ def main(cfg: AlignmentConfig): cfg.n_parts_for_ctm_id, audio_sr, cfg.ctm_grouping_separator, + cfg.minimum_timestamp_duration, ) return None diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index 36e9196cd258..b7f5c03c81b9 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -15,6 +15,8 @@ import os from pathlib import Path +import soundfile as sf + from .data_prep import get_processed_text_and_segments, tokenize_text @@ -34,6 +36,7 @@ def make_token_ctm( n_parts_for_ctm_id, audio_sr, remove_blank_tokens_from_ctm, + minimum_timestamp_duration, ): """ Note: assume order of utts in data matches order of utts in alignments @@ -90,6 +93,11 @@ def make_token_ctm( # get utt_id that will be used for saving CTM file as .ctm utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) + # get audio file duration if we will need it later + if minimum_timestamp_duration > 0: + with sf.SoundFile(manifest_line["audio_filepath"]) as f: + audio_file_duration = f.frames / f.samplerate + with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: for token_info in tokens_info: token = token_info["token"] @@ -99,6 +107,11 @@ def make_token_ctm( start_time = start_sample / audio_sr end_time = end_sample / audio_sr + if minimum_timestamp_duration > 0 and minimum_timestamp_duration > end_time - start_time: + token_mid_point = (start_time + end_time) / 2 + start_time = max(token_mid_point - minimum_timestamp_duration / 2, 0) + end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) + if not (remove_blank_tokens_from_ctm and token == ""): f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {token}\n") @@ -106,7 +119,15 @@ def make_token_ctm( def make_segment_ctm( - data, alignments, model, model_downsample_factor, output_ctm_folder, n_parts_for_ctm_id, audio_sr, separator + data, + alignments, + model, + model_downsample_factor, + output_ctm_folder, + n_parts_for_ctm_id, + audio_sr, + separator, + minimum_timestamp_duration, ): """ Note: assume order of utts in data matches order of utts in alignments @@ -204,6 +225,11 @@ def make_segment_ctm( # get utt_id that will be used for saving CTM file as .ctm utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) + # get 
audio file duration if we will need it later + if minimum_timestamp_duration > 0: + with sf.SoundFile(manifest_line["audio_filepath"]) as f: + audio_file_duration = f.frames / f.samplerate + with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: for segment_info in segments_info: if not segment_info["segment"] == "": @@ -214,6 +240,11 @@ def make_segment_ctm( start_time = start_sample / audio_sr end_time = end_sample / audio_sr + if minimum_timestamp_duration > 0 and minimum_timestamp_duration > end_time - start_time: + token_mid_point = (start_time + end_time) / 2 + start_time = max(token_mid_point - minimum_timestamp_duration / 2, 0) + end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {segment}\n") return None From 312e4400c6f3947667f8e197ad541e94a7909308 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 16:38:07 -0800 Subject: [PATCH 073/154] add suggestion about removing blanks to README Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index ce81411bd567..00df749a198a 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -64,4 +64,4 @@ Ideally you would have some 'true' CTM files to compare with your generated CTM Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. The Gecko tool requires you to upload an audio file and at least one CTM file. The Gecko tool can be accessed here: https://gong-io.github.io/gecko/. More information about the Gecko tool can be found on its Github page here: https://github.com/gong-io/gecko. -Note that Gecko may not display some tokens/words/segments properly if their timestamps are too short. To ameliorate this, you may try setting `minimum_timestamp_duration` to some suitable number. +Note that Gecko may not display some tokens/words/segments properly if their timestamps are too short. To ameliorate this, you may try setting `minimum_timestamp_duration` to some suitable number. If you are producing token-level CTMs, it may also be helpful to set `remove_blank_tokens_from_ctm` to `True`, to make the Gecko visualization less cluttered. From 73f796cf283ef267571c0af519ac3f104d3a6968 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 17:38:52 -0800 Subject: [PATCH 074/154] reorder args Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 12 +++++------ tools/nemo_forced_aligner/align.py | 24 ++++++++++----------- tools/nemo_forced_aligner/utils/make_ctm.py | 8 +++---- 3 files changed, 22 insertions(+), 22 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 00df749a198a..2b2cf2411b09 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -29,6 +29,12 @@ Call the `align.py` script, specifying the parameters as follows: * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. 
This can be changed by overriding `n_parts_for_ctm_id`. +* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). + +* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. (Default: 'cpu'). + +* **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). + * **[OPTIONAL]** `ctm_grouping_separator`: the string used to separate CTM segments. If the separator is `“”` (empty string) or `None`, the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) > Note: if you pass in a hydra override `ctm_grouping_separator=" "`, hydra will remove the whitespace, thus converting that `" "` to `""`. If you want to pass in a space, make sure to use `ctm_grouping_separator="\ "`, or just do not pass in this override, as the default value is `" "` anyway. @@ -36,12 +42,6 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. -* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). - -* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. (Default: 'cpu'). - -* **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). - * **[OPTIONAL]** `minimum_timestamp_duration`: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio file. Note that this may cause timestamps to overlap. (Default: 0, i.e. no modifications to predicted duration). # Input manifest file format diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 7d7e5491fa6a..1e793ffc40c1 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -47,6 +47,11 @@ manifest_filepath: filepath to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. output_ctm_folder: the folder where output CTM files will be saved. + transcribe_device: string specifying the device that will be used for generating log-probs (i.e. "transcribing"). + The string needs to be in a format recognized by torch.device(). + viterbi_device: string specifying the device that will be used for doing Viterbi decoding. + The string needs to be in a format recognized by torch.device(). + batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. ctm_grouping_separator: the string used to separate CTM segments. 
If the separator is “” or None, each line of the output CTM will be the tokens used by the ASR model. If the separator is anything else, e.g. “ “, “|” or “”, the segments will be the blocks of @@ -70,11 +75,6 @@ e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" - transcribe_device: string specifying the device that will be used for generating log-probs (i.e. "transcribing"). - The string needs to be in a format recognized by torch.device(). - viterbi_device: string specifying the device that will be used for doing Viterbi decoding. - The string needs to be in a format recognized by torch.device(). - batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. minimum_timestamp_duration: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio @@ -92,13 +92,13 @@ class AlignmentConfig: output_ctm_folder: Optional[str] = None # General configs - ctm_grouping_separator: Optional[str] = " " - remove_blank_tokens_from_ctm: bool = False - n_parts_for_ctm_id: int = 1 transcribe_device: str = "cpu" viterbi_device: str = "cpu" batch_size: int = 1 + ctm_grouping_separator: Optional[str] = " " + remove_blank_tokens_from_ctm: bool = False minimum_timestamp_duration: float = 0 + n_parts_for_ctm_id: int = 1 @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) @@ -193,10 +193,10 @@ def main(cfg: AlignmentConfig): model, cfg.model_downsample_factor, cfg.output_ctm_folder, - cfg.n_parts_for_ctm_id, - audio_sr, cfg.remove_blank_tokens_from_ctm, cfg.minimum_timestamp_duration, + cfg.n_parts_for_ctm_id, + audio_sr, ) else: @@ -206,10 +206,10 @@ def main(cfg: AlignmentConfig): model, cfg.model_downsample_factor, cfg.output_ctm_folder, - cfg.n_parts_for_ctm_id, - audio_sr, cfg.ctm_grouping_separator, + cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, + audio_sr, ) return None diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index b7f5c03c81b9..db2b1c7c691e 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -33,10 +33,10 @@ def make_token_ctm( model, model_downsample_factor, output_ctm_folder, - n_parts_for_ctm_id, - audio_sr, remove_blank_tokens_from_ctm, minimum_timestamp_duration, + n_parts_for_ctm_id, + audio_sr, ): """ Note: assume order of utts in data matches order of utts in alignments @@ -124,10 +124,10 @@ def make_segment_ctm( model, model_downsample_factor, output_ctm_folder, - n_parts_for_ctm_id, - audio_sr, separator, + n_parts_for_ctm_id, minimum_timestamp_duration, + audio_sr, ): """ Note: assume order of utts in data matches order of utts in alignments From 612ac2e39aec127ef129c31800a04db5693405ef Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 17:47:09 -0800 Subject: [PATCH 075/154] clarify description of ctm_grouping_separator in README Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/README.md 
b/tools/nemo_forced_aligner/README.md index 2b2cf2411b09..b31e9ecf0d44 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -36,6 +36,8 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). * **[OPTIONAL]** `ctm_grouping_separator`: the string used to separate CTM segments. If the separator is `“”` (empty string) or `None`, the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) +> Note: if the separator is not `“”` or `“ “`, it will be removed from the ground truth text and the CTM, ie it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be amalgamated into one, i.e. the following texts will be treated equivalently: `“abc|def”`, `“abc |def”`, `“abc| def”`, `“abc | def"`. + > Note: if you pass in a hydra override `ctm_grouping_separator=" "`, hydra will remove the whitespace, thus converting that `" "` to `""`. If you want to pass in a space, make sure to use `ctm_grouping_separator="\ "`, or just do not pass in this override, as the default value is `" "` anyway. * **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from output CTMs. (Default: False). Note: "" tokens can only be present if your CTMs are token-level. Therefore, NFA will throw an error if you set `remove_blank_tokens_from_ctm` to `True` if the `ctm_grouping_separator` is not `""` or `None`. @@ -49,7 +51,7 @@ NFA needs to be provided with a 'manifest' file where each line specifies the ab ```json {"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"} ``` -> Note: NFA does not require "duration" fields, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth "text" is. +> Note: NFA does not require `"duration"` fields in the manifest, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth `"text"` is. # Output CTM file format From 460f82bc20c52ac1b057a22816d3d0c20a70a2c6 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 16 Dec 2022 17:49:46 -0800 Subject: [PATCH 076/154] update docstring Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/viterbi_decoding.py | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py index 301af6bd7a6b..a7b52272cec9 100644 --- a/tools/nemo_forced_aligner/utils/viterbi_decoding.py +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -21,16 +21,8 @@ def viterbi_decoding(log_probs, y, T, U, device): """ Does Viterbi decoding. 
Returns: - alignments: list of lists containing locations for the tokens we align to at each timestep - looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] - v_matrix: - tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) - containing viterbi probabilities - This is returned in case you want to examine it in a parent function - log_probs_reordered: - log_probs reordered to match the order of ground truth tokens - tensor of shape (Batch, Time, Length_of_U_including_interspersed_blanks) - This is returned in case you want to examine it in a parent function + alignments: list of lists containing locations for the tokens we align to at each timestep. + Looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] """ B, T_max, _ = log_probs.shape U_max = y.shape[1] From 2c6c4efbb0e188e29b2457770ffb5370fd4222b4 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 16 Dec 2022 23:37:03 -0800 Subject: [PATCH 077/154] [TTS][ZH] bugfix for ngc cli installation. (#5652) (#5664) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../tts/FastPitch_ChineseTTS_Training.ipynb | 24 ++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb index 4d85d95c49b8..49455b1947a3 100644 --- a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb @@ -185,9 +185,31 @@ "source": [ "%%bash\n", "# https://ngc.nvidia.com/setup/installers/cli\n", + "rm -rf ngc-cli\n", "wget --content-disposition \"https://ngc.nvidia.com/downloads/ngccli_linux.zip\" && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc\n", "find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5\n", - "echo \"export PATH=\\\"\\$PATH:$(pwd)/ngc-cli\\\"\" >> ~/.bash_profile && source ~/.bash_profile" + "rm ngccli_linux.zip ngc-cli.md5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f0ac1e1-d86c-412e-abcd-677a172a09ba", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ[\"PATH\"] = f\"{os.getcwd()}/ngc-cli:{os.getenv('PATH', '')}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da507572-012c-4040-ab33-ddacde97fadf", + "metadata": {}, + "outputs": [], + "source": [ + "!echo $PATH" ] }, { From b6da516a67d824e73304ee3b2400c11dff9b07e7 Mon Sep 17 00:00:00 2001 From: Sandeep Subramanian Date: Sat, 17 Dec 2022 09:20:52 -1000 Subject: [PATCH 078/154] Port stateless timer to exp manager (#5584) * Port stateless timer to exp manager Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes and remove from all megatron code Signed-off-by: MaximumEntropy * Fixes Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change message Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../megatron_bart_pretraining.py | 7 +----- .../megatron_bert_pretraining.py | 6 +---- 
.../megatron_gpt_pretraining.py | 6 +---- .../megatron_gpt_prompt_learning.py | 7 +----- .../megatron_retro_mutransfer_pretrain.py | 7 +----- .../megatron_retro_pretraining.py | 7 +----- .../megatron_t5_lm_adaptation_finetune.py | 7 +----- .../megatron_t5_pretraining.py | 7 +----- .../megatron_t5_prompt_learning.py | 8 +------ .../megatron_t5_seq2seq_eval.py | 8 +------ .../megatron_t5_seq2seq_finetune.py | 7 +----- .../tuning/megatron_gpt_adapter_tuning.py | 8 +------ .../tuning/megatron_gpt_ia3_tuning.py | 8 +------ .../tuning/megatron_t5_adapter_tuning.py | 8 +------ .../tuning/megatron_t5_ia3_tuning.py | 8 +------ .../megatron_nmt_training.py | 7 +----- nemo/utils/exp_manager.py | 22 +++++++++++++++++++ tests/core_ptl/test_ptl_stateless_timer.py | 2 +- 18 files changed, 39 insertions(+), 101 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_bart_pretraining.py b/examples/nlp/language_modeling/megatron_bart_pretraining.py index b08772c24348..e798dbfe4b78 100644 --- a/examples/nlp/language_modeling/megatron_bart_pretraining.py +++ b/examples/nlp/language_modeling/megatron_bart_pretraining.py @@ -17,7 +17,6 @@ from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.callbacks import ModelSummary -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_bart_model import MegatronBARTModel @@ -29,7 +28,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_bart_config") @@ -72,10 +71,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_bert_pretraining.py b/examples/nlp/language_modeling/megatron_bert_pretraining.py index e21a29a6f77a..cc59a866f6ed 100644 --- a/examples/nlp/language_modeling/megatron_bert_pretraining.py +++ b/examples/nlp/language_modeling/megatron_bert_pretraining.py @@ -27,7 +27,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_bert_config") @@ -70,10 +70,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_gpt_pretraining.py 
b/examples/nlp/language_modeling/megatron_gpt_pretraining.py index 0563cdf703b1..ff055c97e209 100644 --- a/examples/nlp/language_modeling/megatron_gpt_pretraining.py +++ b/examples/nlp/language_modeling/megatron_gpt_pretraining.py @@ -28,7 +28,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_gpt_config") @@ -74,10 +74,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py index 308fde5f519e..e3fa95013f72 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py @@ -30,7 +30,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -74,11 +74,6 @@ def main(cfg) -> None: trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) exp_manager(trainer, cfg.exp_manager) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): cfg.model.precision = cfg.trainer.precision diff --git a/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py b/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py index d755da52fe2f..03fb490bc16d 100644 --- a/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py +++ b/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py @@ -15,7 +15,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.plugins.precision.native_amp import NativeMixedPrecisionPlugin from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector @@ -26,7 +25,7 @@ from nemo.core.config.optimizers import AdamParams, AdamWParams from nemo.core.optim.optimizers import register_optimizer from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_retro_mutransfer") @@ -70,10 +69,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in 
enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_retro_pretraining.py b/examples/nlp/language_modeling/megatron_retro_pretraining.py index f9bde24ca1ba..947c86a0df7c 100644 --- a/examples/nlp/language_modeling/megatron_retro_pretraining.py +++ b/examples/nlp/language_modeling/megatron_retro_pretraining.py @@ -15,7 +15,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.plugins.precision.native_amp import NativeMixedPrecisionPlugin from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector @@ -23,7 +22,7 @@ from nemo.collections.nlp.parts.nlp_overrides import GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_retro_config") @@ -65,10 +64,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py b/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py index 3550d5e2918c..8b244e00b693 100644 --- a/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py +++ b/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py @@ -17,7 +17,6 @@ from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.callbacks import ModelSummary -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model @@ -30,7 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_t5_lm_adaptation_finetune") @@ -72,10 +71,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_t5_pretraining.py 
b/examples/nlp/language_modeling/megatron_t5_pretraining.py index 018cdeae4c24..07a86ec8956b 100644 --- a/examples/nlp/language_modeling/megatron_t5_pretraining.py +++ b/examples/nlp/language_modeling/megatron_t5_pretraining.py @@ -17,7 +17,6 @@ from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.callbacks import ModelSummary -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model @@ -29,7 +28,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="megatron_t5_config") @@ -72,10 +71,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py index efcf01288a7c..b2c5146e3ef6 100644 --- a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from nemo.collections.nlp.models.language_modeling.megatron_t5_prompt_learning_model import ( MegatronT5PromptLearningModel, @@ -29,7 +28,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -67,11 +66,6 @@ def main(cfg) -> None: trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) exp_manager(trainer, cfg.exp_manager) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): cfg.model.precision = cfg.trainer.precision diff --git a/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py b/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py index 008bdc90cdd6..60af013d38a0 100644 --- a/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py +++ b/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py @@ -16,7 +16,6 @@ from megatron_t5_seq2seq_finetune import load_from_checkpoint_dir, load_from_nemo, validate_checkpoint_loading_args from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.plugins.precision.native_amp import NativeMixedPrecisionPlugin from 
nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel @@ -25,7 +24,7 @@ from nemo.collections.nlp.parts.nlp_overrides import GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager def _modify_config(t5_cfg, cfg, add_cfg_to_tree=False): @@ -99,11 +98,6 @@ def main(cfg) -> None: exp_manager(trainer, cfg.exp_manager) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - if hasattr(cfg.model.data.validation_ds, 'task_name'): if cfg.model.restore_from_path: t5_cfg = MegatronT5GLUEModel.restore_from( diff --git a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py index 0f513e387478..ffdb6b20f780 100644 --- a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py +++ b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py @@ -18,7 +18,6 @@ import torch.multiprocessing as mp from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel @@ -34,7 +33,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import AppState, logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager from nemo.utils.model_utils import inject_model_parallel_rank mp.set_start_method("spawn", force=True) @@ -170,10 +169,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) if hasattr(cfg.model.data.train_ds, 'task_name'): if cfg.model.restore_from_path: diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py index aeabe18b3d9a..5d03d96d77f5 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from nemo.collections.nlp.models.language_modeling.megatron_gpt_adapter_model import MegatronGPTAdapterLearningModel from nemo.collections.nlp.parts.nlp_overrides import ( @@ -28,7 +27,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager """ @@ -87,11 +86,6 @@ def main(cfg) -> None: trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) exp_manager(trainer, cfg.exp_manager) - # Override timer callback 
to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): cfg.model.precision = cfg.trainer.precision diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py index 8103be100b10..d5474bcdde5b 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from nemo.collections.nlp.models.language_modeling.megatron_gpt_adapter_model import MegatronGPTInfusedAdapterModel from nemo.collections.nlp.parts.nlp_overrides import ( @@ -28,7 +27,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager """ @@ -87,11 +86,6 @@ def main(cfg) -> None: trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) exp_manager(trainer, cfg.exp_manager) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): cfg.model.precision = cfg.trainer.precision diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py index 50e126e0de52..15c253984148 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from nemo.collections.nlp.models.language_modeling.megatron_t5_adapter_model import MegatronT5AdapterLearningModel from nemo.collections.nlp.parts.nlp_overrides import ( @@ -28,7 +27,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager """ @@ -87,11 +86,6 @@ def main(cfg) -> None: trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) exp_manager(trainer, cfg.exp_manager) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): cfg.model.precision = cfg.trainer.precision diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py index 6230231638c7..27dfd41fed75 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py +++ 
b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from nemo.collections.nlp.models.language_modeling.megatron_t5_adapter_model import MegatronT5InfusedAdapterModel from nemo.collections.nlp.parts.nlp_overrides import ( @@ -28,7 +27,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager """ @@ -87,11 +86,6 @@ def main(cfg) -> None: trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) exp_manager(trainer, cfg.exp_manager) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): cfg.model.precision = cfg.trainer.precision diff --git a/examples/nlp/machine_translation/megatron_nmt_training.py b/examples/nlp/machine_translation/megatron_nmt_training.py index 9299996efc24..4526f41c702d 100644 --- a/examples/nlp/machine_translation/megatron_nmt_training.py +++ b/examples/nlp/machine_translation/megatron_nmt_training.py @@ -17,7 +17,6 @@ from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.callbacks import ModelSummary -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_bart_model import MegatronBARTModel @@ -32,7 +31,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging -from nemo.utils.exp_manager import StatelessTimer, exp_manager +from nemo.utils.exp_manager import exp_manager @hydra_runner(config_path="conf", config_name="aayn_base_megatron") @@ -75,10 +74,6 @@ def main(cfg) -> None: logging.info(f'Resuming training from checkpoint: {resume_from_checkpoint}') trainer._checkpoint_connector = CheckpointConnector(trainer, resume_from_checkpoint=resume_from_checkpoint) - # Override timer callback to a stateless one - for idx, callback in enumerate(trainer.callbacks): - if isinstance(callback, Timer): - trainer.callbacks[idx] = StatelessTimer(cfg.trainer.max_time,) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 3364f4d11670..fd62cd7c71f3 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -163,6 +163,8 @@ class ExpManagerConfig: # disable initial validation when resuming from a checkpoint saved during validation disable_validation_on_resume: Optional[bool] = True ema: Optional[EMAParams] = EMAParams() + # Wall clock time limit + max_time_per_run: Optional[str] = None class TimingCallback(Callback): @@ -274,6 +276,8 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo Set this to True if you are using DDP with many GPUs and do not want many log files in your exp dir. - log_global_rank_0_only (bool): Whether to only create log files for global rank 0. Defaults to False. 
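With this change, the wall-clock limit can be requested once through `exp_manager` instead of constructing a `StatelessTimer` in every training script. A hedged sketch of the new usage, loosely following the updated test further down; the log directory and time budget are illustrative:

```python
# Sketch only: letting exp_manager install the StatelessTimer via the new
# max_time_per_run option. Log dir and the DD:HH:MM:SS budget are made up.
from omegaconf import OmegaConf
from pytorch_lightning import Trainer

from nemo.utils.exp_manager import ExpManagerConfig, exp_manager

trainer = Trainer(max_epochs=1, logger=False, enable_checkpointing=False)

exp_cfg = ExpManagerConfig(
    explicit_log_dir="./example_exp",
    create_checkpoint_callback=True,
    resume_if_exists=True,
    max_time_per_run="00:03:30:00",  # stop (and checkpoint) after 3.5 h of wall time
)
exp_manager(trainer, cfg=OmegaConf.structured(exp_cfg))
```

Because the timer keeps no state across runs, a resumed job gets the full `max_time_per_run` budget again rather than the remainder from the previous run; if `trainer.max_time` is also set, the existing PTL `Timer` callback is simply replaced, as the warning in the hunk below explains.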
Set this to True if you are using DDP with many GPUs and do not want many log files in your exp dir. + - max_time (str): The maximum wall clock time *per run*. This is intended to be used on clusters where you want + a checkpoint to be saved after this specified time and be able to resume from that checkpoint. Defaults to None. returns: log_dir (Path): The final logging directory where logging files are saved. Usually the concatenation of @@ -423,6 +427,24 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo # extend training loop to skip initial validation when resuming from checkpoint configure_no_restart_validation_training_loop(trainer) + # Setup a stateless timer for use on clusters. + if cfg.max_time_per_run is not None: + found_ptl_timer = False + for idx, callback in enumerate(trainer.callbacks): + if isinstance(callback, Timer): + # NOTE: PTL does not expose a `trainer.max_time`. By the time we are in this function, PTL has already setup a timer if the user specifies `trainer.max_time` so best we can do is replace that. + # Working: If only `trainer.max_time` is set - it behaves as a normal PTL timer. If only `exp_manager.max_time_per_run` is set - it behaves as a StateLessTimer. If both are set, it also behaves as a StateLessTimer. + logging.warning( + f'Found a PTL Timer callback, replacing with a StatelessTimer callback. This will happen if you set trainer.max_time as well as exp_manager.max_time_per_run.' + ) + trainer.callbacks[idx] = StatelessTimer(cfg.max_time_per_run) + found_ptl_timer = True + break + + if not found_ptl_timer: + trainer.max_time = cfg.max_time_per_run + trainer.callbacks.append(StatelessTimer(cfg.max_time_per_run)) + if is_global_rank_zero(): # Move files_to_copy to folder and add git information if present if cfg.files_to_copy: diff --git a/tests/core_ptl/test_ptl_stateless_timer.py b/tests/core_ptl/test_ptl_stateless_timer.py index c20cac4fecf0..40b7faf7ac4d 100644 --- a/tests/core_ptl/test_ptl_stateless_timer.py +++ b/tests/core_ptl/test_ptl_stateless_timer.py @@ -96,7 +96,6 @@ def setup_model(self): strategy='ddp', logger=None, enable_checkpointing=False, - callbacks=[StatelessTimer('00:00:00:03')], ) exp_manager_cfg = ExpManagerConfig( explicit_log_dir='./ptl_stateless_timer_check/', @@ -106,6 +105,7 @@ def setup_model(self): create_checkpoint_callback=True, checkpoint_callback_params=callback_params, resume_if_exists=True, + max_time_per_run="00:00:00:03", ) exp_manager(trainer, cfg=OmegaConf.structured(exp_manager_cfg)) model = ExampleModel(trainer=trainer) From 0a5463bf78d701eb63ce471085ce8e22927be558 Mon Sep 17 00:00:00 2001 From: Sean Naren Date: Mon, 19 Dec 2022 17:34:17 +0000 Subject: [PATCH 079/154] Fix EMA restart by allowing device to be set by the class init (#5668) Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva --- nemo/collections/common/callbacks/ema.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/nemo/collections/common/callbacks/ema.py b/nemo/collections/common/callbacks/ema.py index 0737f4262b62..63222db68f52 100644 --- a/nemo/collections/common/callbacks/ema.py +++ b/nemo/collections/common/callbacks/ema.py @@ -327,7 +327,6 @@ def state_dict(self): 'current_step': self.current_step, 'decay': self.decay, 'every_n_steps': self.every_n_steps, - 'device': self.device, } return state_dict @@ -338,7 +337,6 @@ def load_state_dict(self, state_dict): self.ema_params = tuple(param.to(self.device) for param in copy.deepcopy(state_dict['ema'])) self.current_step = 
state_dict['current_step'] self.decay = state_dict['decay'] - self.device = state_dict['device'] self.every_n_steps = state_dict['every_n_steps'] self.rebuild_ema_params = False From c708b9d08ee7a6bcc16118354c73d4bcf0d659ab Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Date: Mon, 19 Dec 2022 10:04:44 -0800 Subject: [PATCH 080/154] Remove SDP (moved to separate repo) - merge to main (#5630) * Remove sdp files from tools folder Signed-off-by: Elena Rastorgueva * Add page to docs with new SDP location Signed-off-by: Elena Rastorgueva Signed-off-by: Elena Rastorgueva --- docs/source/tools/intro.rst | 6 + docs/source/tools/speech_data_processor.rst | 8 + tools/speech_data_processor/README.md | 69 --- tools/speech_data_processor/__init__.py | 13 - .../spanish/mls/config_mls_es.yaml | 295 --------- .../unique_processors/clean_roman_numerals.py | 119 ---- tools/speech_data_processor/main.py | 26 - tools/speech_data_processor/requirements.txt | 1 - tools/speech_data_processor/sdp/__init__.py | 13 - .../sdp/processors/__init__.py | 40 -- .../sdp/processors/asr_inference.py | 59 -- .../sdp/processors/base_processor.py | 163 ----- .../create_initial_manifest/__init__.py | 13 - .../create_initial_manifest_mls.py | 108 ---- .../processors/modify_manifest/__init__.py | 13 - .../modify_manifest/data_to_data.py | 296 --------- .../modify_manifest/data_to_dropbool.py | 563 ------------------ .../modify_manifest/modify_manifest.py | 95 --- .../sdp/processors/write_manifest.py | 45 -- .../sdp/run_processors.py | 67 --- .../sdp/utils/__init__.py | 13 - .../speech_data_processor/sdp/utils/common.py | 51 -- .../sdp/utils/edit_spaces.py | 41 -- .../sdp/utils/get_diff.py | 77 --- .../sdp/utils/metrics_computation.py | 63 -- tools/speech_data_processor/tests/__init__.py | 13 - .../prepare_test_data/prepare_mls_data.py | 52 -- .../tests/test_all_cfgs.py | 57 -- .../tests/test_data_to_data.py | 99 --- .../tests/test_data_to_dropbool.py | 236 -------- .../speech_data_processor/tests/test_utils.py | 26 - 31 files changed, 14 insertions(+), 2726 deletions(-) create mode 100644 docs/source/tools/speech_data_processor.rst delete mode 100644 tools/speech_data_processor/README.md delete mode 100644 tools/speech_data_processor/__init__.py delete mode 100644 tools/speech_data_processor/dataset_configs/spanish/mls/config_mls_es.yaml delete mode 100644 tools/speech_data_processor/dataset_configs/spanish/mls/unique_processors/clean_roman_numerals.py delete mode 100644 tools/speech_data_processor/main.py delete mode 100644 tools/speech_data_processor/requirements.txt delete mode 100644 tools/speech_data_processor/sdp/__init__.py delete mode 100644 tools/speech_data_processor/sdp/processors/__init__.py delete mode 100644 tools/speech_data_processor/sdp/processors/asr_inference.py delete mode 100644 tools/speech_data_processor/sdp/processors/base_processor.py delete mode 100644 tools/speech_data_processor/sdp/processors/create_initial_manifest/__init__.py delete mode 100644 tools/speech_data_processor/sdp/processors/create_initial_manifest/create_initial_manifest_mls.py delete mode 100644 tools/speech_data_processor/sdp/processors/modify_manifest/__init__.py delete mode 100644 tools/speech_data_processor/sdp/processors/modify_manifest/data_to_data.py delete mode 100644 tools/speech_data_processor/sdp/processors/modify_manifest/data_to_dropbool.py delete mode 100644 tools/speech_data_processor/sdp/processors/modify_manifest/modify_manifest.py delete mode 100644 
tools/speech_data_processor/sdp/processors/write_manifest.py delete mode 100644 tools/speech_data_processor/sdp/run_processors.py delete mode 100644 tools/speech_data_processor/sdp/utils/__init__.py delete mode 100644 tools/speech_data_processor/sdp/utils/common.py delete mode 100644 tools/speech_data_processor/sdp/utils/edit_spaces.py delete mode 100644 tools/speech_data_processor/sdp/utils/get_diff.py delete mode 100644 tools/speech_data_processor/sdp/utils/metrics_computation.py delete mode 100644 tools/speech_data_processor/tests/__init__.py delete mode 100644 tools/speech_data_processor/tests/prepare_test_data/prepare_mls_data.py delete mode 100644 tools/speech_data_processor/tests/test_all_cfgs.py delete mode 100644 tools/speech_data_processor/tests/test_data_to_data.py delete mode 100644 tools/speech_data_processor/tests/test_data_to_dropbool.py delete mode 100644 tools/speech_data_processor/tests/test_utils.py diff --git a/docs/source/tools/intro.rst b/docs/source/tools/intro.rst index 466c0fe3b666..37956e351b2a 100644 --- a/docs/source/tools/intro.rst +++ b/docs/source/tools/intro.rst @@ -12,3 +12,9 @@ NeMo provides a set of tools useful for developing Automatic Speech Recognitions comparison_tool +There are also additional NeMo-related tools hosted in separate github repositories: + +.. toctree:: + :maxdepth: 1 + + speech_data_processor diff --git a/docs/source/tools/speech_data_processor.rst b/docs/source/tools/speech_data_processor.rst new file mode 100644 index 000000000000..49c3d7a81117 --- /dev/null +++ b/docs/source/tools/speech_data_processor.rst @@ -0,0 +1,8 @@ +Speech Data Processor +======================== + +Speech Data Processor (SDP) is a toolkit to make it easy to: + 1. write code to process a new dataset, minimizing the amount of boilerplate code required. + 2. share the steps for processing a speech dataset. Sharing processing steps can be as easy as sharing a YAML file. + +SDP is hosted here: https://github.com/NVIDIA/NeMo-speech-data-processor. \ No newline at end of file diff --git a/tools/speech_data_processor/README.md b/tools/speech_data_processor/README.md deleted file mode 100644 index 58547c2e8317..000000000000 --- a/tools/speech_data_processor/README.md +++ /dev/null @@ -1,69 +0,0 @@ -# Speech Data Processor - -Speech Data Processor (SDP) is a toolkit to make it easy to: -1. write code to process a new dataset, minimizing the amount of boilerplate code required. -2. share the steps for processing a speech dataset. Sharing processing steps can be as easy as sharing a YAML file. - -SDP's philosophy is to represent processing operations as 'processor' classes. Many common processing operations are provided, and it is easy to add your own. In some cases, all you will need to do to process a new dataset is simply to write a YAML file containing the parameters needed to process your dataset. - -SDP is specifically intended for the use case when you have an existing dataset with the audio & text pairs already specified in some form, and you wish to create a JSON manifest suitable for use with NeMo. SDP allows for intermediate cleaning and filtering steps which involve amending the 'ground truth' `"text"` or dropping utterances which are deemed to be too inaccurate for training on. - -## Quick intro to Speech Data Processor - -* The steps to process a dataset are specified by a YAML config file. -* The YAML config file contains a list of processor classes & the args to pass into the constructor. 
-* Each processor class inputs an existing manifest (except for classes which create an 'initial' manifest from some external transcript file) & outputs a modified version of the manifest. It may change other files in the process, e.g. resample audio. -* To process a manifest, you need to list the chain of processors you wish to use. -* If a processor is not included, you can make your own. - -## YAML config file layout -A simplified version of an SDP file can be: - -```yaml -processors: - - # use existing classes for popular datasets or make your own class - - _target_: sdp.processors.CreateInitialManifestMLS - output_manifest_file: ... - download_dir: ... - ... - - # use existing classes for common operations or write your own - - _target_: sdp.processors.SubSubstringToSubstring - - substring_pairs: { - # specify the parameters needed for your usecase - " mr ": " mister ", - " misteak ": " mistake ", - ... - } - - - _target_: sdp.processors.DropNonAlphabet - alphabet: " abcdefghijklmnopqrstuvwxyz" - output_manifest_file: ... - ... -``` -## Existing processor classes -In addition to those mentioned in the example config file, many more classes are already included in Speech Data Processor, for example: -* `sdp.processors.ASRInference` will run inference on the manifest using a specified `pretrained_model`. -* `sdp.processors.DropHighWER` will compute WER between `text` and `pred_text` of each utterance and remove the utterance if WER is greater than the specified `wer_threshold`. -* `sdp.processors.DropHighLowCharrate` will compute the character rate in the utterance using `text` and `duration`, and drop the utterance if it is outside the bounds of the specified `high_charrate_threshold` and `low_charrate_threshold`. Carefully chosen thresholds will allow us to drop utterances with incorrect ground truth `text`. - -## Processor test cases -You can add test cases to verify you have specified your desired changes correctly and to help document why your are making these changes. - -For example: -```yaml -processors: - ... - - _target_: sdp.processors.DropIfRegexInAttribute - attribute_to_regex: - "text" : ["(\\D ){5,20}"] # looks for between 4 and 19 characters surrounded by spaces - - test_cases: - - {input: {text: "some s p a c e d out letters"}, output: null} - - {input: {text: "normal words only"}, output: {text: "normal words only"}} - - {input: {text: "three a b c spaced out letters"}, output: {text: "three a b c spaced out letters"}} - - {input: {text: "four a b c d spaced out letters"}, output: null} - ... -``` \ No newline at end of file diff --git a/tools/speech_data_processor/__init__.py b/tools/speech_data_processor/__init__.py deleted file mode 100644 index 2db92b257416..000000000000 --- a/tools/speech_data_processor/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
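As a reference for the processor `test_cases` mechanism described in the deleted README above, the sketch below shows how such cases could be checked against a processor's `_process_dataset_entry` method. It is only a sketch: the helper name `run_test_cases` is hypothetical, and it assumes the `DataEntry` / `_process_dataset_entry` interface shown in the deleted `sdp/processors` files later in this patch.

```python
# Hypothetical helper for exercising a processor's YAML `test_cases`.
# Assumes the DataEntry/_process_dataset_entry interface shown in the deleted
# sdp/processors files below; the helper name itself is illustrative only.
import copy


def run_test_cases(processor, test_cases):
    for case in test_cases:
        entries = processor._process_dataset_entry(copy.deepcopy(case["input"]))
        # Processors return a list of DataEntry objects; data=None means "drop the utterance".
        result = entries[0].data if entries else None
        expected = case["output"]  # YAML `null` becomes Python None
        assert result == expected, f"expected {expected!r}, got {result!r}"


# Usage, mirroring the DropNonAlphabet test cases from the README above (assumes
# the processor has been constructed with its required manifest arguments):
# run_test_cases(
#     drop_non_alphabet_processor,
#     [
#         {"input": {"text": "test"}, "output": {"text": "test"}},
#         {"input": {"text": "test тест 测试"}, "output": None},
#     ],
# )
```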
diff --git a/tools/speech_data_processor/dataset_configs/spanish/mls/config_mls_es.yaml b/tools/speech_data_processor/dataset_configs/spanish/mls/config_mls_es.yaml deleted file mode 100644 index 2499887385d7..000000000000 --- a/tools/speech_data_processor/dataset_configs/spanish/mls/config_mls_es.yaml +++ /dev/null @@ -1,295 +0,0 @@ -# user can specify which processors should be run -# can be either "all" to run all processors, -# or any Python "slice" object, e.g., -# ":3" (to select first 3 objects), -# ":-1" (to select all but last) -# "2:5" (to select 3rd to 5th) -# "0" (to select only the first processor) -processors_to_run: ??? -data_split: ??? -workspace_dir: ??? -final_manifest: ??? - -processors: - - _target_: sdp.processors.CreateInitialManifestMLS - output_manifest_file: "${workspace_dir}/mls_spanish_processed/${data_split}_manifest.json" - language: spanish - download_dir: ${workspace_dir} - resampled_audio_dir: "${workspace_dir}/mls_spanish_processed/${data_split}/audio/" - data_split: "${data_split}" - - - _target_: sdp.processors.ASRInference - input_manifest_file: "${workspace_dir}/mls_spanish_processed/${data_split}_manifest.json" - output_manifest_file: "${workspace_dir}/processed_manifests/stt_es_quartznet15x5_${data_split}.json" - pretrained_model: "stt_es_quartznet15x5" - - - _target_: sdp.processors.SubSubstringToSubstring - input_manifest_file: "${workspace_dir}/processed_manifests/stt_es_quartznet15x5_${data_split}.json" - - substring_pairs: { - - "'" : "", # so that e.g. "d'artagnan" becomes "dartagnan", not "d artagnan" - - ' sr ': ' señor ', - ' vd ': ' usted ', - ' vds ': ' ustedes ', - ' vdes ': ' ustedes ', - ' v md ': ' vuestra merced ', - ' mr ': ' mister ', - ' ud ': ' un ', # technically can be ustedes, but I have only seen it being 'un' - - ' guando ': ' cuando ', - ' magestad ': ' majestad ', - - ' á ': ' a ', - ' ó ': ' o ', - ' é ': ' e ', - ' ú ': ' u ', - ' fué ': ' fue ', - ' sólo ': ' solo ', - ' dió ': ' dio ', - ' hácia ': ' hacia ', - ' jóven ': ' joven ', - - # tried to make reasonable approximations: - ' dixo ' : ' dijo ', - ' dixe ' : ' dije ', - ' dixéramos ' : ' dijéramos ', - - ' dixeron ' : ' dijeron ', - ' dixéron ': ' dijéron ', - ' dixese ': ' dijese ', - ' dixesen ':' dijesen', ### bug - ' dixesemos ':' dijesemos', ### bug - ' diximos ':' dijimos ', - ' dixere ':' dijere ', - ' dixera ':' dijera ', - - ' algun ':' algún ', - ' alli ':' allí ', - ' aqui ':' aquí ', - ' asi ':' así ', - ' atencion ':' atención ', - ' capitan ':' capitán ', - ' corazon ':' corazón ', - ' debia ':' debía ', - ' decia ':' decía ', - ' decian ':' decían ', - ' demas ':' demás ', - ' despues ':' después ', - ' dia ':' día ', - ' dias ':' días ', - ' habeis ':' habéis ', - ' habia ':' había ', - ' habian ':' habían ', - ' habitacion ':' habitación ', - ' habria ':' habría ', - ' hacian ':' hacían ', - ' mio ':' mío ', - ' ningun ':' ningún ', - ' ocasion ':' ocasión ', - ' oir ':' oír ', - ' pais ':' país ', - ' parecia ':' parecía ', - ' podia ':' podía ', - ' podian ':' podían ', - ' podria ':' podría ', - ' queria ':' quería ', - ' razon ':' razón ', - ' segun ':' según ', - ' tambien ':' también ', - ' tenia ':' tenía ', - ' tenian ':' tenían ', - ' velazquez ':' velázquez ', - ' venian ':' venían ' - } - - test_cases: - - {input: {text: "á las dos"}, output: {text: "a las dos"}} - - - _target_: sdp.processors.SubSubstringToSpace - substrings: [ - '.', '-', '‐', '‑', '–', '—', '―', - '"', '$', '&', "'", ',', ',', - ':', "'", '=', '?', '_', '`', - '{', 
'|', '}', '~', '¨', 'ª', - '«', '°', '´', '·', '»', '¿', - '‘', '’', '“', '”', '„', '…', - '‧', '‹', '›', '≪', '≫', '!', - ':', ';', '`', 'ʻ', - 'ː', '‘', '’', '“', '→', - '"', '%', '‘', '”', '�', - 'ʽ', 'ʿ', - '́', #used for putting stress on Russian letters - '̇', - 'content from google book search google book search generated at ', - 'content from google book search generated at ', - 'content from ', - 'google book search generated at ', - 'search generated at ', - 'content from google ', - 'content from google book search ', - 'content from google book search generated at content from google book search generated at ', - 'book search generated at ', - 'content from google book ', - 'generated at ', - 'generated at content from google book search generated at ', - 'at content from google book search generated at ', - 'from google book search generated at ', - 'content from google book search content from google book search generated at ', - 'content from google book search generated at content from google book search generated at content from google book search generated at ', - ] - - test_cases: - - {input: {text: "abc, def."}, output: {text: "abc def"}} - - {input: {text: "abc! def."}, output: {text: "abc def"}} - - - _target_: sdp.processors.DropNonAlphabet - alphabet: " abcdefghijklmnopqrstuvwxyzáéíñóúü" - test_cases: - - {input: {text: "test тест 测试"}, output: null} - - {input: {text: "test"}, output: {text: "test"}} - - - _target_: dataset_configs.spanish.mls.unique_processors.clean_roman_numerals.CleanRomanNumerals - king_triggers: [ - "alfonso", - "benedicto", - "carlos", - "clemente", - "enrique", - "federico", - "felipe", - "fernando", - "filipo", - "gregorio", - "guillermo", - "jaime", - "jorge", - "león", - "luis", - "pie", - "tomo", - ] - queen_triggers: ["isabel"] - ordinal_masc_triggers: ["capítulo"] - ordinal_fem_triggers: [ - "parte", - "escena", - ] - cardinal_triggers: [ - "siglo", - "carta", - "libro", - "número", - ] - test_cases: - - {input: {text: "número i"}, output: {text: "número uno"}} - - - _target_: sdp.processors.DropIfRegexInAttribute - attribute_to_regex: - "text" : ["(\\D ){5,20}"] # looks for between 4 and 19 characters surrounded by spaces - - test_cases: - - {input: {text: "some s p a c e d out letters"}, output: null} - - {input: {text: "normal words only"}, output: {text: "normal words only"}} - - {input: {text: "three a b c spaced out letters"}, output: {text: "three a b c spaced out letters"}} - - {input: {text: "four a b c d spaced out letters"}, output: null} - - - _target_: sdp.processors.DropIfSubstringInAttribute - attribute_to_substring: - "audio_filepath" : [ - # books with lots of OCR errors etc. 
- "8882/10372", - "8882/11576", - "10246/11643", - "9972/11767", - "9972/12090", - "9972/12260", - "10246/12300", - "10246/12585", - "12689/12700", - "12341/12700", - "8882/12700", - "12953/12700", - "12428/12700", - "12921/12700", - "11797/12700", - "9972/12705", - "11797/13121", - "11797/13497", - "10246/13672", - "12367/14286", - "10246/14708", - "11048/9310", - "6447/9310", - "11040/9310", - "9063/9310", - "9972/9503", - "8060/9503", - ] - - test_cases: - - {input: {audio_filepath: "/path/10246/12585/abc.wav"}, output: null} - - {input: {audio_filepath: "/path/1/1/abc.wav"}, output: {audio_filepath: "/path/1/1/abc.wav"}} - - - _target_: sdp.processors.DropHighLowCharrate - high_charrate_threshold: 20 - low_charrate_threshold: 5 - test_cases: - - {input: {text: "buenos dias", duration: 0.1}, output: null} - - {input: {text: "buenos dias", duration: 30}, output: null} - - {input: {text: "buenos dias", duration: 1}, output: {text: "buenos dias", duration: 1}} - - - - _target_: sdp.processors.DropIfSubstringInInsertion - substrings_in_insertion: [ - "uno ", "dos ", "tres ", "cuatro ", "cinco ", - "seis ", "siete ", "ocho ", "nueve ", "diez ", - "once ", "doce ", "trece ", "catorce ", "quince ", - "veinte ", "treinta ", "cuarenta ", "cincuenta ", - "sesenta ", "setenta ", "ochenta ", "noventa ", - "cien ", "ciento", "cientos ", "mil " - ] - test_cases: - - {input: {text: "el de junio", pred_text: "el diez de junio"}, output: null} - - {input: {text: "el diez de junio", pred_text: "el diez de junio"}, output: {text: "el diez de junio", pred_text: "el diez de junio"}} - - - _target_: sdp.processors.DropIfSubstringInAttribute - attribute_to_substring: - "pred_text" : [ - 'librewox', 'librevox', 'librivox', 'libribox', 'libriebox', 'libriboux', - ' grabado por ', - ] - test_cases: - - {input: {pred_text: "librivox recording"}, output: null} - - {input: {pred_text: "abcdef"}, output: {pred_text: "abcdef"}} - - - _target_: sdp.processors.DropASRErrorBeginningEnd - beginning_error_char_threshold: 10 - end_error_char_threshold: 10 - test_cases: - - {input: {text: "sí hola", pred_text: "abcdefabcdef sí hola"}, output: null} - - {input: {text: "abcdefabcdef sí hola", pred_text: "sí hola"}, output: null} - - { input: {text: "abcdefabcdef sí hola", pred_text: "uvwxyzuvwxyz sí hola"}, - output: {text: "abcdefabcdef sí hola", pred_text: "uvwxyzuvwxyz sí hola"}} - - - _target_: sdp.processors.DropHighWER - wer_threshold: 90 - test_cases: - - {input: {text: "sí hola", pred_text: "abcdefgh abcdefgh"}, output: null} - - {input: {text: "sí hola", pred_text: "sí hola"}, output: {text: "sí hola", pred_text: "sí hola"}} - - - _target_: sdp.processors.DropHighCER - output_manifest_file: "${workspace_dir}/processed_manifests/processed_${data_split}.json" - cer_threshold: 90 - test_cases: - - {input: {text: "sí hola", pred_text: "abcdefgh abcdefgh"}, output: null} - - {input: {text: "sí hola", pred_text: "sí hola"}, output: {text: "sí hola", pred_text: "sí hola"}} - - - _target_: sdp.processors.WriteManifest - input_manifest_file: "${workspace_dir}/processed_manifests/processed_${data_split}.json" - output_manifest_file: ${final_manifest} - fields_to_save: - - "audio_filepath" - - "text" - - "duration" diff --git a/tools/speech_data_processor/dataset_configs/spanish/mls/unique_processors/clean_roman_numerals.py b/tools/speech_data_processor/dataset_configs/spanish/mls/unique_processors/clean_roman_numerals.py deleted file mode 100644 index 7bdaa0f97468..000000000000 --- 
a/tools/speech_data_processor/dataset_configs/spanish/mls/unique_processors/clean_roman_numerals.py +++ /dev/null @@ -1,119 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import collections -import os -import tempfile -import urllib.request -from typing import List - -import pandas as pd -from sdp.processors.base_processor import DataEntry -from sdp.processors.modify_manifest.modify_manifest import ModifyManifestTextProcessor - -from nemo.utils import logging - -NUMERAL_DATA_DOWNLOAD_PATH = ( - "https://github.com/NVIDIA/NeMo/releases/download/v1.0.0rc1/1-100_roman_numeral_table_spanish.csv" -) - - -def download_numeral_data(tgt_path): - urllib.request.urlretrieve(NUMERAL_DATA_DOWNLOAD_PATH, tgt_path) - - return None - - -class CleanRomanNumerals(ModifyManifestTextProcessor): - def __init__( - self, - king_triggers, - queen_triggers, - ordinal_masc_triggers, - ordinal_fem_triggers, - cardinal_triggers, - numerals_data_path=None, - save_changes_in=None, - **kwargs, - ): - super().__init__(**kwargs) - self.numerals_data_path = numerals_data_path - self.king_triggers = king_triggers - self.queen_triggers = queen_triggers - self.ordinal_masc_triggers = ordinal_masc_triggers - self.ordinal_fem_triggers = ordinal_fem_triggers - self.cardinal_triggers = cardinal_triggers - - # read csv - if self.numerals_data_path: - if not os.path.isfile(self.numerals_data_path): - download_numeral_data(self.numerals_data_path) - df = pd.read_csv(self.numerals_data_path, sep="\t", index_col=0) - else: - with tempfile.TemporaryDirectory() as temp_dir: - temp_file = os.path.join(temp_dir, "temporary_numeral_data_download.csv") - download_numeral_data(temp_file) - df = pd.read_csv(temp_file, sep="\t", index_col=0) - - self.roman_numeral_to_ordinal_masc = {} - for i, row in df.iterrows(): - self.roman_numeral_to_ordinal_masc[row["roman"]] = row["ordinal_masc"].strip() - - self.roman_numeral_to_ordinal_fem = {} - for i, row in df.iterrows(): - self.roman_numeral_to_ordinal_fem[row["roman"]] = row["ordinal_fem"].strip() - - self.roman_numeral_to_cardinal = {} - for i, row in df.iterrows(): - self.roman_numeral_to_cardinal[row["roman"]] = row["cardinal"].strip() - - self.roman_numeral_to_king = {} - for i, row in df.iterrows(): - self.roman_numeral_to_king[row["roman"]] = row["king"].strip() - - self.roman_numeral_to_queen = {} - for i, row in df.iterrows(): - self.roman_numeral_to_queen[row["roman"]] = row["queen"].strip() - - self.clean_roman_numerals_count = collections.defaultdict(int) - - def _process_dataset_entry(self, data_entry) -> List: - data_entry = self.clean_operation(data_entry, self.ordinal_masc_triggers, self.roman_numeral_to_ordinal_masc) - data_entry = self.clean_operation(data_entry, self.ordinal_fem_triggers, self.roman_numeral_to_ordinal_fem) - data_entry = self.clean_operation(data_entry, self.cardinal_triggers, self.roman_numeral_to_cardinal) - data_entry = self.clean_operation(data_entry, self.king_triggers, 
self.roman_numeral_to_king) - data_entry = self.clean_operation(data_entry, self.queen_triggers, self.roman_numeral_to_queen) - return [DataEntry(data=data_entry, metrics=self.clean_roman_numerals_count)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for word, count in counter.items(): - total_counter[word] += count - logging.info("Num of roman numeral substitutions") - total_counter_sorted = dict(sorted(total_counter.items(), key=lambda x: x[1], reverse=True,)) - for word, count in total_counter_sorted.items(): - logging.info(f"{word} {count}") - super().finalize(metrics) - - def clean_operation(self, data, triggers, roman_numeral_to_num_written): - for trigger in triggers: - if trigger in data["text"]: - for roman_numeral, num_written in roman_numeral_to_num_written.items(): - noun_roman = f" {trigger} {roman_numeral} " - if noun_roman in data["text"]: - noun_number = f" {trigger} {num_written} " - data["text"] = data["text"].replace(noun_roman, noun_number) - self.clean_roman_numerals_count[noun_roman] += 1 - return data diff --git a/tools/speech_data_processor/main.py b/tools/speech_data_processor/main.py deleted file mode 100644 index 7228063ff298..000000000000 --- a/tools/speech_data_processor/main.py +++ /dev/null @@ -1,26 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from sdp.run_processors import run_processors - -from nemo.core.config import hydra_runner - - -@hydra_runner() -def main(cfg): - run_processors(cfg) - - -if __name__ == "__main__": - main() diff --git a/tools/speech_data_processor/requirements.txt b/tools/speech_data_processor/requirements.txt deleted file mode 100644 index 63904d71d9c9..000000000000 --- a/tools/speech_data_processor/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -diff_match_patch diff --git a/tools/speech_data_processor/sdp/__init__.py b/tools/speech_data_processor/sdp/__init__.py deleted file mode 100644 index 2db92b257416..000000000000 --- a/tools/speech_data_processor/sdp/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
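The `clean_operation` method of `CleanRomanNumerals` above only rewrites a roman numeral when it immediately follows one of the configured trigger words, which is how the config's test case turns "número i" into "número uno". The standalone sketch below illustrates that trigger-plus-numeral pattern; the two mapping entries are illustrative stand-ins, not values taken from the CSV table that the processor downloads.

```python
# Minimal sketch of the trigger + roman-numeral substitution pattern used by
# CleanRomanNumerals.clean_operation. The mapping below holds assumed sample
# entries, not the contents of the downloaded CSV table.
roman_to_cardinal = {"i": "uno", "ii": "dos"}
cardinal_triggers = ["número", "siglo"]


def clean_operation(text, triggers, roman_to_written):
    # Pad with spaces so that " trigger numeral " only matches whole words.
    text = f" {text} "
    for trigger in triggers:
        for roman, written in roman_to_written.items():
            noun_roman = f" {trigger} {roman} "
            if noun_roman in text:
                text = text.replace(noun_roman, f" {trigger} {written} ")
    return text.strip()


print(clean_operation("número i", cardinal_triggers, roman_to_cardinal))  # número uno
```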
diff --git a/tools/speech_data_processor/sdp/processors/__init__.py b/tools/speech_data_processor/sdp/processors/__init__.py deleted file mode 100644 index 451bff62b6cd..000000000000 --- a/tools/speech_data_processor/sdp/processors/__init__.py +++ /dev/null @@ -1,40 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# let's import all supported processors here to simplify target specification -from sdp.processors.asr_inference import ASRInference -from sdp.processors.create_initial_manifest.create_initial_manifest_mls import CreateInitialManifestMLS -from sdp.processors.modify_manifest.data_to_data import ( - InsIfASRInsertion, - SubIfASRSubstitution, - SubMakeLowercase, - SubRegex, - SubSubstringToSpace, - SubSubstringToSubstring, -) -from sdp.processors.modify_manifest.data_to_dropbool import ( - DropASRErrorBeginningEnd, - DropHighCER, - DropHighLowCharrate, - DropHighLowDuration, - DropHighLowWordrate, - DropHighWER, - DropIfRegexInAttribute, - DropIfSubstringInAttribute, - DropIfSubstringInInsertion, - DropIfTextIsEmpty, - DropLowWordMatchRate, - DropNonAlphabet, -) -from sdp.processors.write_manifest import WriteManifest diff --git a/tools/speech_data_processor/sdp/processors/asr_inference.py b/tools/speech_data_processor/sdp/processors/asr_inference.py deleted file mode 100644 index 98bf43b45b90..000000000000 --- a/tools/speech_data_processor/sdp/processors/asr_inference.py +++ /dev/null @@ -1,59 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import subprocess -from pathlib import Path - -from sdp.processors.base_processor import BaseProcessor - - -class ASRInference(BaseProcessor): - """This processor performs ASR inference on the input manifest. - - Args: - output_manifest_file: the path to the output manifest. It will be the same as the input manifest, but will - also have "pred_text" entries for every utterance. - input_manifest_file: the path to the input manifest which will be transcribed. - pretrained_model: the name of the pretrained NeMo ASR model which will be used to do inference. - batch_size: the batch size to use for ASR inference. - - Note that it does not re-use the base parallel implementation, since the ASR - inference is already run in batches. - - TODO: actually, it might still be beneficial to have another level of - parallelization, but that needs to be tested.
- """ - - def __init__( - self, output_manifest_file: str, input_manifest_file: str, pretrained_model: str, batch_size: int = 32 - ): - self.output_manifest_file = output_manifest_file - self.input_manifest_file = input_manifest_file - self.script_path = Path(__file__).parents[4] / "examples" / "asr" / "transcribe_speech.py" - self.pretrained_model = pretrained_model - self.batch_size = batch_size - - def process(self): - """This will add "pred_text" key into the output manifest.""" - os.makedirs(os.path.dirname(self.output_manifest_file), exist_ok=True) - subprocess.run( - f"python {self.script_path} " - f"pretrained_name={self.pretrained_model} " - f"dataset_manifest={self.input_manifest_file} " - f"output_filename={self.output_manifest_file} " - f"batch_size={self.batch_size} ", - shell=True, - check=True, - ) diff --git a/tools/speech_data_processor/sdp/processors/base_processor.py b/tools/speech_data_processor/sdp/processors/base_processor.py deleted file mode 100644 index 2bbad5da6484..000000000000 --- a/tools/speech_data_processor/sdp/processors/base_processor.py +++ /dev/null @@ -1,163 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import itertools -import json -import multiprocessing -from abc import ABC, abstractmethod -from dataclasses import dataclass -from typing import Any, Dict, List, Optional - -from tqdm import tqdm -from tqdm.contrib.concurrent import process_map - -from nemo.utils import logging - - -@dataclass -class DataEntry: - """A wrapper for data entry + any additional metrics.""" - - data: Optional[Dict] # can be None to drop the entry - metrics: Any = None - - -class BaseProcessor(ABC): - """ - Abstract class for SDP processors. - - Args - output_manifest_file: path of where the output manifest file will be located. - input_manifest_file: path of where the input manifest file is located. This arg - is optional - some processors may not take in an input manifest because they - need to create an initial manifest from scratch (ie from some transcript file - that is in a format different to the NeMo manifest format). - """ - - def __init__(self, output_manifest_file, input_manifest_file=None): - self.output_manifest_file = output_manifest_file - self.input_manifest_file = input_manifest_file - - @abstractmethod - def process(self): - pass - - def test(self): - """This method can be used to perform "runtime" tests. - - This can be any kind of self-consistency tests, but are usually - in the form of checking that provided input test data entries match - provided output test data entries. - - There are not tests by default. - """ - - -class BaseParallelProcessor(BaseProcessor): - """ - Processor class which allows operations on each utterance to be parallelized. Parallelization - is done using tqdm.contrib.concurrent.process_map. - - Args: - max_workers: maximum number of workers that will be spawned during parallel processing. 
- chunksize: the size of the chunks that will be sent to worker processes. - """ - - def __init__(self, max_workers: int = -1, chunksize: int = 100, **kwargs): - super().__init__(**kwargs) - if max_workers == -1: - max_workers = multiprocessing.cpu_count() - self.max_workers = max_workers - self.chunksize = chunksize - self.number_of_entries = 0 - self.total_duration = -1 - - def read_manifest(self): - """ - This function should be overridden in the "initial" class that creates the - manifest by reading from the original source of data. - """ - if self.input_manifest_file is None: - raise NotImplementedError("Override this method if the processor creates initial manifest") - - # TODO: should we not assume that manifest can fully fit in memory? - with open(self.input_manifest_file, "rt", encoding="utf8") as fin: - dataset_entries = [json.loads(line) for line in fin.readlines()] - - return dataset_entries - - def prepare(self): - """Can be used in derived classes to prepare processing in any way. - - E.g., download data or compute some aggregates. Will be called before - starting processing the data. - """ - - def process(self): - """ - We are always going to serialize the output into a manifest file that contains - all information about the dataset. - - This method should also create a new manifest in the end according to - the `self.output_manifest_file` argument. - """ - self.prepare() - dataset_entries = self.read_manifest() - - # this will unroll all inner lists - data = itertools.chain( - *process_map( - self.process_dataset_entry, dataset_entries, max_workers=self.max_workers, chunksize=self.chunksize, - ) - ) - metrics = [] - with open(self.output_manifest_file, "wt", encoding="utf8") as fout: - for data_entry in tqdm(data): - metrics.append(data_entry.metrics) - if data_entry.data is None: - continue - json.dump(data_entry.data, fout) - self.number_of_entries += 1 - self.total_duration += data_entry.data.get("duration", 0) - fout.write("\n") - - self.finalize(metrics) - - def finalize(self, metrics): - """Can be used to output statistics about processed data. - - By default, outputs the new number of entries/hours. - """ - logging.info("Total number of entries after processing: %d", self.number_of_entries) - if self.total_duration != -1: - logging.info("Total audio duration (hours) after processing: %.2f", self.total_duration / 3600) - - @abstractmethod - def process_dataset_entry(self, data_entry) -> List[DataEntry]: - """Needs to be implemented in the derived classes. - - Note that this method should always return a list of objects - to allow one-to-many mapping (many-to-one is not - supported in this design). - - Each returned value should be a DataEntry object that will hold - a dictionary (or anything else that can be json-serialized) with - the actual data + any additional metrics required for statistics - reporting. Those metrics can be used in :meth:`finalize` to - prepare for final reporting. - - TODO: it would be more straightforward to use a generator here, but - it seems that it's not supported with multiprocessing. Is there a - way to make it work? - """ diff --git a/tools/speech_data_processor/sdp/processors/create_initial_manifest/__init__.py b/tools/speech_data_processor/sdp/processors/create_initial_manifest/__init__.py deleted file mode 100644 index 2db92b257416..000000000000 --- a/tools/speech_data_processor/sdp/processors/create_initial_manifest/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/tools/speech_data_processor/sdp/processors/create_initial_manifest/create_initial_manifest_mls.py b/tools/speech_data_processor/sdp/processors/create_initial_manifest/create_initial_manifest_mls.py deleted file mode 100644 index 97f224cb69de..000000000000 --- a/tools/speech_data_processor/sdp/processors/create_initial_manifest/create_initial_manifest_mls.py +++ /dev/null @@ -1,108 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -from pathlib import Path - -import sox -from sdp.processors.base_processor import BaseParallelProcessor, DataEntry -from sdp.utils.common import download_file, extract_archive -from sox import Transformer - -MLS_URL = "https://dl.fbaipublicfiles.com/mls/mls_{language}.tar.gz" -TEST_DATA_PATH = str(Path(__file__).parents[2] / "tests" / "test_data" / "mls_{language}" / "data.tar.gz") - - -class CreateInitialManifestMLS(BaseParallelProcessor): - """ - Downloads and unzips raw MLS data for the specified language, and creates an initial manifest using - the transcripts provided in the raw data. - - Args: - language: the language of the data you wish to be downloaded. This will be used to format the - URL from which we attempt to download the data. - download_dir: the directory where the downloaded data will be saved. - data_split: the data split for which the initial manifest will be created. - resampled_audio_dir: the directory where the resampled (16kHz) wav files will be stored. - use_test_data: if `True`, will use the test data manifest located at `TEST_DATA_PATH` to carry out tests. - """ - - def __init__( - self, - language: str, - download_dir: str, - resampled_audio_dir: str, - data_split: str, - use_test_data: bool = False, - **kwargs, - ): - super().__init__(**kwargs) - self.language = language - self.download_dir = Path(download_dir) - self.data_split = data_split - self.resampled_audio_dir = resampled_audio_dir - self.use_test_data = use_test_data - - # will be initialized in self.prepare method - self.audio_path_prefix = None - self.transcription_file = None - - def prepare(self): - """Downloading and extracting data (unless already done). - - If use_test_data is True, then will not download data, instead will - copy the included test data (mainly useful for quick development or - CI pipeline). 
- """ - if self.use_test_data: - test_data_path = TEST_DATA_PATH.format(language=self.language) - data_folder = extract_archive(str(test_data_path), str(self.download_dir)) - else: - url = MLS_URL.format(language=self.language) - download_file(url, str(self.download_dir)) - data_folder = extract_archive(str(self.download_dir / os.path.basename(url)), str(self.download_dir)) - self.audio_path_prefix = str(Path(data_folder) / self.data_split / "audio") - self.transcription_file = str(Path(data_folder) / self.data_split / "transcripts.txt") - - def read_manifest(self): - if self.transcription_file is None: - raise RuntimeError("self.process has to be called before processing the data.") - - with open(self.transcription_file, "rt", encoding="utf8") as fin: - dataset_entries = fin.readlines() - - return dataset_entries - - def process_dataset_entry(self, data_entry: str): - if len(data_entry.split("\t")) != 2: - raise RuntimeError(f"have more than one tab in line {data_entry}") - - utt_id, text = data_entry.split("\t") - transcript_text = text.strip() - - src_flac_path = os.path.join(self.audio_path_prefix, *utt_id.split("_")[:2], utt_id + ".flac") - tgt_wav_path = os.path.join(self.resampled_audio_dir, *utt_id.split("_")[:2], utt_id + ".wav") - - if not os.path.exists(os.path.dirname(tgt_wav_path)): - os.makedirs(os.path.dirname(tgt_wav_path), exist_ok=True) - if not os.path.exists(tgt_wav_path): - Transformer().build(src_flac_path, tgt_wav_path) - - data = { - "audio_filepath": tgt_wav_path, - "duration": float(sox.file_info.duration(tgt_wav_path)), - "text": transcript_text, - } - - return [DataEntry(data=data)] diff --git a/tools/speech_data_processor/sdp/processors/modify_manifest/__init__.py b/tools/speech_data_processor/sdp/processors/modify_manifest/__init__.py deleted file mode 100644 index 2db92b257416..000000000000 --- a/tools/speech_data_processor/sdp/processors/modify_manifest/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/tools/speech_data_processor/sdp/processors/modify_manifest/data_to_data.py b/tools/speech_data_processor/sdp/processors/modify_manifest/data_to_data.py deleted file mode 100644 index 78bf51974559..000000000000 --- a/tools/speech_data_processor/sdp/processors/modify_manifest/data_to_data.py +++ /dev/null @@ -1,296 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import collections -import re -from typing import Dict, List - -from sdp.processors.base_processor import DataEntry -from sdp.processors.modify_manifest.modify_manifest import ModifyManifestTextProcessor -from sdp.utils.edit_spaces import add_start_end_spaces -from sdp.utils.get_diff import get_diff_with_subs_grouped - -from nemo.utils import logging - -# TODO: some of the processors here can be replaced with generic regex processor -# by substituting different regex patterns - - -class SubSubstringToSpace(ModifyManifestTextProcessor): - """ - Class for processor that converts substrings in "text" to spaces. - - Args: - substrings: list of strings that will be replaced with spaces if - they are contained in data["text"]. - """ - - def __init__(self, substrings: List[str], **kwargs,) -> None: - super().__init__(**kwargs) - self.substrings = substrings - - def _process_dataset_entry(self, data_entry) -> List: - remove_substring_counter = collections.defaultdict(int) - for substring in self.substrings: - while substring in data_entry["text"]: - data_entry["text"] = data_entry["text"].replace(substring, " ") - remove_substring_counter[substring] += 1 - return [DataEntry(data=data_entry, metrics=remove_substring_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for substring, value in counter.items(): - total_counter[substring] += value - logging.info("Number of substrings that were converted to spaces") - for substring, count in total_counter.items(): - logging.info(f"{substring}, {count}, \t\t(substring.encode(): {substring.encode()})") - super().finalize(metrics) - - -class SubSubstringToSubstring(ModifyManifestTextProcessor): - """ - Class for processor that converts substrings in "text" to other substrings. - The before and after substring pairs are defined in 'substring_pairs'. - - Args: - substring_pairs: Dictionary where keys are the substrings you want to change and - their values are the substrings you want to convert them to. - """ - - def __init__( - self, substring_pairs: Dict, **kwargs, - ): - super().__init__(**kwargs) - self.substring_pairs = substring_pairs - - def _process_dataset_entry(self, data_entry) -> List: - replace_substring_counter = collections.defaultdict(int) - for original_word, new_word in self.substring_pairs.items(): - while original_word in data_entry["text"]: - data_entry["text"] = data_entry["text"].replace(original_word, new_word) - return [DataEntry(data=data_entry, metrics=replace_substring_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for substring, value in counter.items(): - total_counter[substring] += value - logging.info("Num of substrings that were substituted") - for substring, count in total_counter.items(): - logging.info(f"{substring} {count}") - super().finalize(metrics) - - -class InsIfASRInsertion(ModifyManifestTextProcessor): - """ - Class for processor that adds a substring to data["text"] if it is - present at that location in data["pred_text"]. - It is useful if words are systematically missing from ground truth - transcriptions. - - Args: - insert_words: list of strings that will be inserted into data['text'] if - there is an insertion (containing only that string) in data['pred_text']. - Note: because data_to_data looks for an exact match in the insertion, - we recommend including variations with different spaces in 'insert_words', - e.g. [' nemo', 'nemo ', ' nemo '].
- """ - - def __init__( - self, insert_words: List[str], **kwargs, - ): - super().__init__(**kwargs) - self.insert_words = insert_words - - def _process_dataset_entry(self, data_entry) -> List: - insert_word_counter = collections.defaultdict(int) - for insert_word in self.insert_words: - if not insert_word in data_entry["pred_text"]: - break - orig_words, pred_words = data_entry["text"], data_entry["pred_text"] - diff = get_diff_with_subs_grouped(orig_words, pred_words) - - if len(diff) > 0: # ie if there are differences between text and pred_text - - new_sent = "" - - for diff_entry in diff: - if diff_entry[0] == 0: # no change - new_sent += diff_entry[1] - - elif diff_entry[0] == -1: # deletion in original string - new_sent += diff_entry[1] - - elif diff_entry[0] == 1: # insertion in original string - if diff_entry[1] == insert_word: - new_sent += insert_word - insert_word_counter[insert_word] += 1 - - elif isinstance(diff_entry, tuple): # i.e. diff is a substitution - new_sent += diff_entry[0][1] - else: - raise ValueError(f"unexpected item in diff_entry: {diff_entry}") - - new_sent = " ".join(new_sent.split()) # remove any extra spaces - data_entry["text"] = new_sent - - return [DataEntry(data=data_entry, metrics=insert_word_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for word, count in counter.items(): - total_counter[word] += count - logging.info("Num of words that were inserted") - for word, count in total_counter.items(): - logging.info(f"{word} {count}") - super().finalize(metrics) - - -class SubIfASRSubstitution(ModifyManifestTextProcessor): - """ - Class for processor that converts a substring in data['text'] to a - substring in data['pred_text'] if both are located in the same place - (ie are part of a 'substitution' operation) and if the substrings - correspond to key-value pairs in 'sub_words'. - This is useful if words are systematically incorrect in ground truth - transcriptions. - - Args: - sub_words: dictionary where a key is a string that might be in data['text'] - and the value is the string that might be in data['pred_text']. If both - are located in the same place (ie are part of a 'substitution' operation) - then the key string will be converted to the value string in data['text']. - - .. note:: - data_to_data looks for exact string matches of substitutions, so - you may need to be careful with spaces in 'sub_words', e.g. - recommended to do sub_words = {"nmo ": "nemo "} instead of - sub_words = {"nmo" : "nemo"} - """ - - def __init__( - self, sub_words: Dict, **kwargs, - ): - super().__init__(**kwargs) - self.sub_words = sub_words - - def _process_dataset_entry(self, data_entry) -> List: - sub_word_counter = collections.defaultdict(int) - for original_word, new_word in self.sub_words.items(): - if not original_word in data_entry["text"]: - break - orig_words, pred_words = data_entry["text"], data_entry["pred_text"] - diff = get_diff_with_subs_grouped(orig_words, pred_words) - - if len(diff) > 0: # ie if there are differences between text and pred_text - - new_sent = "" - - for diff_entry in diff: - if diff_entry[0] == 0: # no change - new_sent += diff_entry[1] - - elif diff_entry[0] == -1: # deletion in original string - new_sent += diff_entry[1] - - elif diff_entry[0] == 1: # insertion in original string - # don't make changes - pass - - elif isinstance(diff_entry, tuple): # substitution - if diff_entry[0][1] == original_word and diff_entry[1][1] == new_word: - # ie. 
substitution is one we want to use to change the original text - new_sent += new_word - sub_word_counter[original_word] += 1 - - else: - # ie. substitution is one we want to ignore - new_sent += diff_entry[0][1] - else: - raise ValueError(f"unexpected item in diff_entry: {diff_entry}") - - new_sent = add_start_end_spaces(new_sent) - data_entry["text"] = new_sent - - return [DataEntry(data=data_entry, metrics=sub_word_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for word, count in counter.items(): - total_counter[word] += count - logging.info("Num of words that were substituted") - for word, count in total_counter.items(): - logging.info(f"{word} {count}") - super().finalize(metrics) - - -class SubMakeLowercase(ModifyManifestTextProcessor): - """ - Class to convert data['text'] to lowercase by calling '.lower()' on it. - """ - - def __init__( - self, **kwargs, - ): - super().__init__(**kwargs) - - def _process_dataset_entry(self, data_entry) -> List: - data_entry["text"] = data_entry["text"].lower() - return [DataEntry(data=data_entry)] - - def finalize(self, metrics): - logging.info("Made all letters lowercase") - super().finalize(metrics) - - -class SubRegex(ModifyManifestTextProcessor): - """ - Class for processor that converts a regex match to a string, as defined - by key-value pairs in regex_to_sub. - - Args: - regex_to_sub: dictionary where the keys are regex patterns that might - be in data['text'], and the values are the strings that will replace - the regex matches if they are found. - """ - - def __init__( - self, regex_to_sub: Dict, **kwargs, - ): - super().__init__(**kwargs) - self.regex_to_sub = regex_to_sub - - def _process_dataset_entry(self, data_entry) -> List: - replace_word_counter = collections.defaultdict(int) - for regex, sub in self.regex_to_sub.items(): - while re.search(regex, data_entry["text"]): - for match in re.finditer(regex, data_entry["text"]): - replace_word_counter[match.group(0)] += 1 - data_entry["text"] = re.sub(regex, sub, data_entry["text"]) - return [DataEntry(data=data_entry, metrics=replace_word_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for word, count in counter.items(): - total_counter[word] += count - logging.info("Some of the words matching the regex that were substituted") - total_counter_sorted = dict(sorted(total_counter.items(), key=lambda x: x[1], reverse=True)) - for word, count in total_counter_sorted.items(): - if count > 1: - logging.info(f"{word} {count}") - super().finalize(metrics) diff --git a/tools/speech_data_processor/sdp/processors/modify_manifest/data_to_dropbool.py b/tools/speech_data_processor/sdp/processors/modify_manifest/data_to_dropbool.py deleted file mode 100644 index 926950d7423b..000000000000 --- a/tools/speech_data_processor/sdp/processors/modify_manifest/data_to_dropbool.py +++ /dev/null @@ -1,563 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -import collections -import re -from typing import Dict, List - -from sdp.processors.base_processor import DataEntry -from sdp.processors.modify_manifest.modify_manifest import ModifyManifestTextProcessor -from sdp.utils.edit_spaces import remove_extra_spaces -from sdp.utils.get_diff import get_diff_with_subs_grouped -from sdp.utils.metrics_computation import get_cer, get_charrate, get_wer, get_wmr, get_wordrate - -from nemo.utils import logging - - -class DropHighLowCharrate(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if their character rate is - too low or too high. Character rate = (num of characters in "text")/ - (duration of audio). - A too-low or too-high character rate often implies that the ground - truth text is inaccurate. - - Args: - high_charrate_threshold: a float for the upper character rate threshold. - If the character rate of an utterance is higher than this number, - the utterance will be dropped. - low_charrate_threshold: a float for the lower character rate threshold. - If the character rate of an utterance is lower than this number, - the utterance will be dropped. - """ - - def __init__( - self, high_charrate_threshold: float, low_charrate_threshold: float, **kwargs, - ): - super().__init__(**kwargs) - - self.high_charrate_threshold = high_charrate_threshold - self.low_charrate_threshold = low_charrate_threshold - - def _process_dataset_entry(self, data_entry) -> List: - charrate = get_charrate(remove_extra_spaces(data_entry["text"]), data_entry["duration"]) - if charrate > self.high_charrate_threshold: - return [DataEntry(data=None, metrics=(0, 1))] - elif charrate < self.low_charrate_threshold: - return [DataEntry(data=None, metrics=(1, 0))] - - return [DataEntry(data=data_entry, metrics=(0, 0))] - - def finalize(self, metrics): - high_drop_counter = 0 - low_drop_counter = 0 - for (dropped_low, dropped_high) in metrics: - low_drop_counter += dropped_low - high_drop_counter += dropped_high - logging.info( - "Num of utterances that were dropped due to char rate > %d: %d", - self.high_charrate_threshold, - high_drop_counter, - ) - - logging.info( - "Num of utterances that were dropped due to char rate < %d: %d", - self.low_charrate_threshold, - low_drop_counter, - ) - super().finalize(metrics) - - -class DropHighLowWordrate(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if their word rate is - too low or too high. Word rate = (num of words in "text")/ - (duration of audio). - A too-low or too-high word rate often implies that the ground - truth text is inaccurate. - - Args: - high_wordrate_threshold: a float for the upper word rate threshold. - If the word rate of an utterance is higher than this number, - the utterance will be dropped. - low_wordrate_threshold: a float for the lower word rate threshold. - If the word rate of an utterance is lower than this number, - the utterance will be dropped. 
- """ - - def __init__( - self, high_wordrate_threshold: float, low_wordrate_threshold: float, **kwargs, - ): - super().__init__(**kwargs) - - self.high_wordrate_threshold = high_wordrate_threshold - self.low_wordrate_threshold = low_wordrate_threshold - - def _process_dataset_entry(self, data_entry) -> List: - wordrate = get_wordrate(data_entry["text"], data_entry["duration"]) - if wordrate > self.high_wordrate_threshold: - return [DataEntry(data=None, metrics=(0, 1))] - elif wordrate < self.low_wordrate_threshold: - return [DataEntry(data=None, metrics=(1, 0))] - - return [DataEntry(data=data_entry, metrics=(0, 0))] - - def finalize(self, metrics): - high_drop_counter = 0 - low_drop_counter = 0 - for (dropped_low, dropped_high) in metrics: - low_drop_counter += dropped_low - high_drop_counter += dropped_high - logging.info( - "Num of utterances that were dropped due to word rate > %d: %d", - self.high_wordrate_threshold, - high_drop_counter, - ) - logging.info( - "Num of utterances that were dropped due to word rate < %d: %d", - self.low_wordrate_threshold, - low_drop_counter, - ) - super().finalize(metrics) - - -class DropHighLowDuration(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if their audio duration - (in seconds) is too low or too high. - - Args: - high_duration_threshold: a float for the upper duration threshold (in seconds). - If the duration of an utterance's audio is higher than this number, - the utterance will be dropped. - low_duration_threshold: a float for the lower duration threshold (in seconds). - If the duration of an utterance's audio is lower than this number, - the utterance will be dropped. - """ - - def __init__( - self, high_duration_threshold: float, low_duration_threshold: float, **kwargs, - ): - super().__init__(**kwargs) - self.high_duration_threshold = high_duration_threshold - self.low_duration_threshold = low_duration_threshold - self.high_drop_counter = 0 - self.low_drop_counter = 0 - - def _process_dataset_entry(self, data_entry) -> List: - duration = data_entry["duration"] - if duration > self.high_duration_threshold: - return [DataEntry(data=None, metrics=(0, 1))] - elif duration < self.low_duration_threshold: - return [DataEntry(data=None, metrics=(1, 0))] - - return [DataEntry(data=data_entry, metrics=(0, 0))] - - def finalize(self, metrics): - high_drop_counter = 0 - low_drop_counter = 0 - for (dropped_low, dropped_high) in metrics: - low_drop_counter += dropped_low - high_drop_counter += dropped_high - logging.info( - "Num of utterances that were dropped due to duration > %d: %d", - self.high_duration_threshold, - high_drop_counter, - ) - logging.info( - "Num of utterances that were dropped due to duration < %d: %d", - self.low_duration_threshold, - low_drop_counter, - ) - super().finalize(metrics) - - -class DropNonAlphabet(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if they contain characters that - are not in our "alphabet". - - Args: - alphabet: a string containing all of the characters in our alphabet. - If an utterance contains at least one character that is not in the - 'alphabet', then that utterance will be dropped. - Note: don't forget to include spaces in your alphabet, unless you - want to make sure none of the utterances contain spaces. 
- """ - - def __init__( - self, alphabet: str, **kwargs, - ): - super().__init__(**kwargs) - self.alphabet = alphabet - - def _process_dataset_entry(self, data_entry) -> List: - drop_this_utt = False - non_alphabet_counter = collections.defaultdict(int) - for char in data_entry["text"]: - if char not in self.alphabet: - drop_this_utt = True - non_alphabet_counter[char] += 1 - if drop_this_utt: - return [DataEntry(data=None, metrics=non_alphabet_counter)] - return [DataEntry(data=data_entry, metrics=non_alphabet_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for counter in metrics: - for char, value in counter.items(): - total_counter[char] += value - logging.info("Num of non-alphabet characters") - for char, count in total_counter.items(): - logging.info(f"{char}: {count}") - super().finalize(metrics) - - -class DropASRErrorBeginningEnd(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if there is a sufficiently long - ASR mismatch at the beginning or end of the utterance. - - Args: - beginning_error_char_threshold: if there is an insertion or deletion at - the beginning of the utterance that has more characters than this number, - then the utterance will be dropped. - If there is a substitution at the beginning of the utterance, then the - utterance will be dropped if abs(len(deletion) - len(insertion)) > - beginning_error_char_threshold. - end_error_char_threshold: if there is an insertion or deletion at - the end of the utterance that has more characters than this number, - then the utterance will be dropped. - If there is a substitution at the end of the utterance, then the - utterance will be dropped if abs(len(deletion) - len(insertion)) > - end_error_char_threshold. - """ - - def __init__( - self, beginning_error_char_threshold: int, end_error_char_threshold: int, **kwargs, - ): - super().__init__(**kwargs) - self.beginning_error_char_threshold = beginning_error_char_threshold - self.end_error_char_threshold = end_error_char_threshold - - def _process_dataset_entry(self, data_entry) -> List: - orig_words, pred_words = data_entry["text"], data_entry["pred_text"] - - # remove spaces at start and end. Otherwise all utterances - # will have no errors at the begining (because both 'text' - # and 'pred_text' will begin with " ") - orig_words = remove_extra_spaces(orig_words) - pred_words = remove_extra_spaces(pred_words) - - diff = get_diff_with_subs_grouped(orig_words, pred_words) - - if len(diff) > 0: # i.e. if there are differences between text and pred_text - first_diff_entry = diff[0] - if first_diff_entry[0] == 1 or first_diff_entry[0] == -1: # i.e. diff is purely an insertion or deletion - if len(first_diff_entry[1]) > self.beginning_error_char_threshold: - return [DataEntry(data=None, metrics=(1, 0))] - elif first_diff_entry[0] != 0: # i.e. diff should be a tuple representing substitution - len_deletion = len(first_diff_entry[0][1]) - len_insertion = len(first_diff_entry[1][1]) - if abs(len_deletion - len_insertion) > self.beginning_error_char_threshold: - return [DataEntry(data=None, metrics=(1, 0))] - - last_diff_entry = diff[-1] - if last_diff_entry[0] == 1 or last_diff_entry[0] == -1: # i.e. diff is purely an insertion or deletion - if len(last_diff_entry[1]) > self.end_error_char_threshold: - return [DataEntry(data=None, metrics=(0, 1))] - elif last_diff_entry[0] != 0: # i.e. 
diff should be a tuple representing substitution - len_deletion = len(last_diff_entry[0][1]) - len_insertion = len(last_diff_entry[1][1]) - if abs(len_deletion - len_insertion) > self.end_error_char_threshold: - return [DataEntry(data=None, metrics=(0, 1))] - - return [DataEntry(data=data_entry, metrics=(0, 0))] - - def finalize(self, metrics): - beginning_drop_counter = 0 - end_drop_counter = 0 - for (dropped_beginning, dropped_end) in metrics: - beginning_drop_counter += dropped_beginning - end_drop_counter += dropped_end - logging.info( - "Num of utterances that were dropped due to asr " "insertions/deletions at the beginning: %d", - beginning_drop_counter, - ) - logging.info( - "Num of utterances that were dropped due to asr insertions/deletions at the end: %d", end_drop_counter, - ) - super().finalize(metrics) - - -class DropHighCER(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if there is a sufficiently - high CER between data['text'] and data['pred_text']. - Note: we only drop the utterance if CER > threshold (ie strictly greater - than) so that if we set the threshold to 0, we will not remove - utterances with CER == 0. - - Args: - cer_thershold: CER threshold above which the utterance will be dropped. - """ - - def __init__( - self, cer_threshold: float, **kwargs, - ): - super().__init__(**kwargs) - self.cer_threshold = cer_threshold - - def _process_dataset_entry(self, data_entry) -> List: - cer = get_cer(remove_extra_spaces(data_entry["text"]), remove_extra_spaces(data_entry["pred_text"])) - if cer > self.cer_threshold: - return [DataEntry(data=None, metrics=1)] - else: - return [DataEntry(data=data_entry, metrics=0)] - - def finalize(self, metrics): - drop_counter = 0 - for dropped in metrics: - drop_counter += dropped - logging.info( - "Num of utterances that were dropped due to CER > %d: %d", self.cer_threshold, drop_counter, - ) - super().finalize(metrics) - - -class DropHighWER(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if there is a sufficiently - high WER between data['text'] and data['pred_text']. - Note: we only drop the utterance if CER > threshold (ie strictly greater - than) so that if we set the threshold to 0, we will not remove - utterances with WER == 0. - - Args: - wer_thershold: WER threshold above which the utterance will be dropped. - """ - - def __init__( - self, wer_threshold: float, **kwargs, - ): - super().__init__(**kwargs) - self.wer_threshold = wer_threshold - - def _process_dataset_entry(self, data_entry) -> List: - wer = get_wer(data_entry["text"], data_entry["pred_text"]) - if wer > self.wer_threshold: - return [DataEntry(data=None, metrics=1)] - else: - return [DataEntry(data=data_entry, metrics=0)] - - def finalize(self, metrics): - drop_counter = 0 - for dropped in metrics: - drop_counter += dropped - logging.info( - "Num of utterances that were dropped due to WER > %d: %d", self.wer_threshold, drop_counter, - ) - super().finalize(metrics) - - -class DropLowWordMatchRate(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if there is a sufficiently - low WMR between data['text'] and data['pred_text']. - Note: we only drop the utterance if WMR < threshold (ie strictly lower - than) so that if we set the threshold to 100, we will not remove - utterances with WMR == 100. - - Args: - wmr_thershold: WMR threshold below which the utterance will be dropped. 
- """ - - def __init__( - self, wmr_threshold: float, **kwargs, - ): - super().__init__(**kwargs) - self.wmr_threshold = wmr_threshold - - def _process_dataset_entry(self, data_entry) -> List: - orig_words, pred_words = data_entry["text"], data_entry["pred_text"] - orig_words = remove_extra_spaces(orig_words) - pred_words = remove_extra_spaces(pred_words) - wmr = get_wmr(orig_words, pred_words) - if wmr < self.wmr_threshold: - return [DataEntry(data=None, metrics=1)] - else: - return [DataEntry(data=data_entry, metrics=0)] - - def finalize(self, metrics): - drop_counter = 0 - for dropped in metrics: - drop_counter += dropped - logging.info( - "Num of utterances that were dropped due to WMR < %d: %d", self.wmr_threshold, drop_counter, - ) - super().finalize(metrics) - - -class DropIfSubstringInAttribute(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if an attribute of 'data' contains - a string, as specified by attribute_to_substring. - - Args: - attribute_to_substring: a dictionary where the keys are existing attributes - of 'data', and the values are lists of strings which the utterances might - contain. If the specified attribute contains the specified string, that - utterance will be dropped. - """ - - def __init__( - self, attribute_to_substring: Dict, **kwargs, - ): - super().__init__(**kwargs) - self.attribute_to_substring = attribute_to_substring - - def _process_dataset_entry(self, data_entry) -> List: - for attribute in self.attribute_to_substring.keys(): - if attribute not in data_entry: - raise ValueError(f"attribute {attribute} not in data {data_entry}") - else: - for substring_to_drop in self.attribute_to_substring[attribute]: - if substring_to_drop in data_entry[attribute]: - return [DataEntry(data=None, metrics=f'"{attribute}" contains "{substring_to_drop}"')] - return [DataEntry(data=data_entry, metrics="")] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for value in metrics: - if value: - total_counter[value] += 1 - logging.info("Num of utterances that were dropped containing substring in attribute") - for idx, count in total_counter.items(): - logging.info(f"{idx}, {count}") - super().finalize(metrics) - - -class DropIfRegexInAttribute(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if an attribute of 'data' matches - a regex pattern, as specified by attribute_to_regex. - - Args: - attribute_to_regex: a dictionary where the keys are existing attributes - of 'data', and the values are lists of strings containing regex patterns - which the utterance might contain. If the specified attribute contains - the specified regex pattern, the utterance will be dropped. 
- """ - - def __init__( - self, attribute_to_regex: Dict, **kwargs, - ): - super().__init__(**kwargs) - self.attribute_to_regex = attribute_to_regex - - def _process_dataset_entry(self, data_entry) -> List: - drop_counter = collections.defaultdict(set) - for attribute in self.attribute_to_regex.keys(): - if attribute not in data_entry: - raise ValueError(f"attribute {attribute} not in data {data_entry}") - else: - for regex in self.attribute_to_regex[attribute]: - if re.search(regex, data_entry[attribute]): - for match in re.finditer(regex, data_entry[attribute]): - drop_counter[attribute].add(match.group(0)) - return [DataEntry(data=None, metrics=drop_counter)] - return [DataEntry(data=data_entry, metrics=drop_counter)] - - def finalize(self, metrics): - total_counter = collections.defaultdict(set) - for counter in metrics: - for attribute, value in counter.items(): - total_counter[attribute].add(value) - logging.info("Regex matches that were dropped in attribute") - for attribute, matches in total_counter.items(): - logging.info(f"{attribute}, {matches}") - super().finalize(metrics) - - -class DropIfSubstringInInsertion(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if a substring matches an insertion - made between data['text'] and data['pred_text']. - Note: we check for exact matches, so you need to be mindful of spaces, e.g. - you may wish to do substrings_in_insertion = ["nemo ", ...] instead - of substrings_in_insertion = ["nemo", ...] - - Args: - substrings_in_insertion: a list of strings which might be inserted in predicted - ASR text. If the insertion matches a string exactly, the utterance will - be dropped. - """ - - def __init__( - self, substrings_in_insertion: List[str], **kwargs, - ): - super().__init__(**kwargs) - self.substrings_in_insertion = substrings_in_insertion - - def _process_dataset_entry(self, data_entry) -> List: - - for substring_in_insertion in self.substrings_in_insertion: - if substring_in_insertion in data_entry["pred_text"]: - orig_words, pred_words = data_entry["text"], data_entry["pred_text"] - diff = get_diff_with_subs_grouped(orig_words, pred_words) - - for diff_entry in diff: - if diff_entry[0] == 1: # insertion in original string - if substring_in_insertion in diff_entry[1]: - return [DataEntry(data=None, metrics=diff_entry[1])] - return [DataEntry(data=data_entry, metrics="")] - - def finalize(self, metrics): - total_counter = collections.defaultdict(int) - for diff_entry in metrics: - if diff_entry: - total_counter[diff_entry] += 1 - logging.info("Some of the insertions that cause the utterance to be dropped:") - total_counter_sorted = dict(sorted(total_counter.items(), key=lambda x: x[1], reverse=True)) - - for insertion, count in total_counter_sorted.items(): - logging.info(f"{insertion}, {count}") - super().finalize(metrics) - - -class DropIfTextIsEmpty(ModifyManifestTextProcessor): - """ - Class for processor that drops utterances if data['text'] is an empty string. 
- """ - - def __init__( - self, **kwargs, - ): - super().__init__(**kwargs) - - def _process_dataset_entry(self, data_entry) -> List: - if "text" not in data_entry: - raise ValueError(f'attribute "text" not in data {data_entry}') - else: - if len(data_entry["text"].strip()) == 0: # {text: " "} is considered empty - return [DataEntry(data=None, metrics=1)] - else: - return [DataEntry(data=data_entry, metrics=0)] - - def finalize(self, metrics): - drop_counter = 0 - for dropped in metrics: - drop_counter += dropped - logging.info("Num of utterances that were dropped because text was empty: %d", drop_counter) - super().finalize(metrics) diff --git a/tools/speech_data_processor/sdp/processors/modify_manifest/modify_manifest.py b/tools/speech_data_processor/sdp/processors/modify_manifest/modify_manifest.py deleted file mode 100644 index 5c8ceefebe8e..000000000000 --- a/tools/speech_data_processor/sdp/processors/modify_manifest/modify_manifest.py +++ /dev/null @@ -1,95 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -from abc import abstractmethod -from typing import Dict, List, Optional - -from sdp.processors.base_processor import BaseParallelProcessor -from sdp.utils.edit_spaces import add_start_end_spaces, remove_extra_spaces - - -class ModifyManifestTextProcessor(BaseParallelProcessor): - """Base class useful for most "text-only" modifications of the manifest. - - This adds the following functionality on top of BaseParallelProcessor - - Adds space in the beginning and end of sentence for easier regex-based - processing. - - Automatically handles common test cases by comparing input to output - values. - - Args: - test_cases: an optional list of dicts containing test cases for checking - that the processor makes the changes that we are expecting. - The dicts must have a key 'input', the value of which is a dictionary - containing data which is our test input manifest line, and a key - 'output', the value of which is a dictionary containing data which is - the expected output manifest line. - - .. note:: - This class only supports one-to-one or one-to-none mappings. - """ - - def __init__(self, test_cases: Optional[List[Dict]] = None, **kwargs): - super().__init__(**kwargs) - self.test_cases = test_cases - - def test(self): - for test_case in self.test_cases: - generated_outputs = self.process_dataset_entry(test_case["input"]) - # can only return 1 or zero entries - if len(generated_outputs) == 1: - generated_output = generated_outputs[0].data - else: - generated_output = None - if generated_output != test_case["output"]: - raise RuntimeError( - "Runtime test failed.\n" - f"Test input: {test_case['input']}\n" - f"Generated output: {generated_output}\n" - f"Expected output: {test_case['output']}" - ) - - @abstractmethod - def _process_dataset_entry(self, data_entry): - pass - - def process_dataset_entry(self, data_entry): - """Wrapper for 'process_dataset_entry' abstract method. 
- - Before 'process_dataset_entry' is called, the function - 'add_start_end_spaces' is applied to the "text" and "pred_text" in - the input data. - After 'process_dataset_entry' is called, the function - 'remove_extra_spaces' is applied to the "text" and "pred_text" to - the output 'data' variable of the 'process_dataset_entry' method. - """ - # handle spaces - if "text" in data_entry: - data_entry["text"] = add_start_end_spaces(data_entry["text"]) - if "pred_text" in data_entry: - data_entry["pred_text"] = add_start_end_spaces(data_entry["pred_text"]) - - data_entries = self._process_dataset_entry(data_entry) - if len(data_entries) > 1: - raise RuntimeError("Cannot handle one-to-many relation") - - if len(data_entries) == 1 and data_entries[0].data is not None: - # handle spaces - if "text" in data_entries[0].data: - data_entries[0].data["text"] = remove_extra_spaces(data_entries[0].data["text"]) - if "pred_text" in data_entries[0].data: - data_entries[0].data["pred_text"] = remove_extra_spaces(data_entries[0].data["pred_text"]) - - return data_entries diff --git a/tools/speech_data_processor/sdp/processors/write_manifest.py b/tools/speech_data_processor/sdp/processors/write_manifest.py deleted file mode 100644 index f601985a1647..000000000000 --- a/tools/speech_data_processor/sdp/processors/write_manifest.py +++ /dev/null @@ -1,45 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -from typing import List - -from sdp.processors.base_processor import BaseProcessor -from tqdm import tqdm - - -class WriteManifest(BaseProcessor): - """ - Saves a copy of a manifest but only with the fields specified in fields_to_save. - - Args: - output_manifest_file: path of where the output file will be saved. - input_manifest_file: path of where the input file that we will be copying is saved. - fields_to_save: list of the fields in the input manifest that we want to copy over. - The output file will only contain these fields. - """ - - def __init__(self, output_manifest_file: str, input_manifest_file: str, fields_to_save: List[str]): - self.output_manifest_file = output_manifest_file - self.input_manifest_file = input_manifest_file - self.fields_to_save = fields_to_save - - def process(self): - with open(self.input_manifest_file, "rt", encoding="utf8") as fin, open( - self.output_manifest_file, "wt", encoding="utf8" - ) as fout: - for line in tqdm(fin): - line = json.loads(line) - new_line = {field: line[field] for field in self.fields_to_save} - fout.write(json.dumps(new_line) + "\n") diff --git a/tools/speech_data_processor/sdp/run_processors.py b/tools/speech_data_processor/sdp/run_processors.py deleted file mode 100644 index 28d790e9ecd2..000000000000 --- a/tools/speech_data_processor/sdp/run_processors.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import tempfile -import uuid - -import hydra -from omegaconf import OmegaConf - -from nemo.utils import logging - - -def run_processors(cfg): - logging.info(f"Hydra config: {OmegaConf.to_yaml(cfg)}") - processors_to_run = cfg.get("processors_to_run", "all") - - if processors_to_run == "all": - processors_to_run = ":" - # converting processors_to_run into Python slice - processors_to_run = slice(*map(lambda x: int(x.strip()) if x.strip() else None, processors_to_run.split(":"))) - processors_cfgs = cfg.processors[processors_to_run] - logging.info( - "Specified to run the following processors: %s ", [cfg["_target_"] for cfg in processors_cfgs], - ) - - processors = [] - # let's build all processors first to automatically check - # for errors in parameters - with tempfile.TemporaryDirectory() as tmp_dir: - for idx, processor_cfg in enumerate(processors_cfgs): - logging.info('=> Building processor "%s"', processor_cfg["_target_"]) - - # we assume that each processor defines "output_manifest_file" - # and "input_manifest_file" keys, which can be optional. In case they - # are missing, we create tmp files here for them - if "output_manifest_file" not in processor_cfg: - tmp_file_path = os.path.join(tmp_dir, str(uuid.uuid4())) - OmegaConf.set_struct(processor_cfg, False) - processor_cfg["output_manifest_file"] = tmp_file_path - OmegaConf.set_struct(processor_cfg, True) - if idx != len(processors_cfgs) - 1 and "input_manifest_file" not in processors_cfgs[idx + 1]: - OmegaConf.set_struct(processors_cfgs[idx + 1], False) - processors_cfgs[idx + 1]["input_manifest_file"] = tmp_file_path - OmegaConf.set_struct(processors_cfgs[idx + 1], True) - - processor = hydra.utils.instantiate(processor_cfg) - # running runtime tests to fail right-away if something is not - # matching users expectations - processor.test() - processors.append(processor) - - for processor in processors: - # TODO: add proper str method to all classes for good display - logging.info('=> Running processor "%s"', processor) - processor.process() diff --git a/tools/speech_data_processor/sdp/utils/__init__.py b/tools/speech_data_processor/sdp/utils/__init__.py deleted file mode 100644 index 2db92b257416..000000000000 --- a/tools/speech_data_processor/sdp/utils/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
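
For reference, the `processors_to_run` option handled in the deleted `run_processors.py` above accepts a Python-style slice string ("all", ":", "1:3", "2:", ...). The following standalone sketch (not part of the patch; the `to_slice` helper name is illustrative only) shows what that slice conversion does:

```python
# Convert a "start:stop" string (as used by processors_to_run) into a Python slice.
# Empty fields are open-ended, so ":" selects every processor in the config.
def to_slice(spec: str) -> slice:
    return slice(*map(lambda x: int(x.strip()) if x.strip() else None, spec.split(":")))

processors = ["create_manifest", "asr_inference", "drop_high_cer", "write_manifest"]
print(processors[to_slice(":")])    # all four processors
print(processors[to_slice("1:3")])  # ['asr_inference', 'drop_high_cer']
print(processors[to_slice("2:")])   # ['drop_high_cer', 'write_manifest']
```
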
diff --git a/tools/speech_data_processor/sdp/utils/common.py b/tools/speech_data_processor/sdp/utils/common.py deleted file mode 100644 index f9b7e96aaa91..000000000000 --- a/tools/speech_data_processor/sdp/utils/common.py +++ /dev/null @@ -1,51 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import tarfile -import urllib - -import wget - -from nemo.utils import logging - -# TODO: seems like this download code is saving initial file -# in the local directory! - - -def download_file(source_url: str, target_directory: str): - logging.info(f"Trying to download data from {source_url} " f"and save it in this directory: {target_directory}") - filename = os.path.basename(urllib.parse.urlparse(source_url).path) - target_filepath = os.path.join(target_directory, filename) - - if os.path.exists(target_filepath): - logging.info(f"Found file {target_filepath} => will not be attempting" f" download from {source_url}") - else: - wget.download(source_url, target_directory) - logging.info("Download completed") - - -def extract_archive(archive_path: str, extract_path: str) -> str: - logging.info(f"Attempting to extract all contents from tar file {archive_path}" f" and save in {extract_path}") - archive_file_basename = os.path.basename(archive_path) - archive_contents_dir = os.path.join(extract_path, archive_file_basename.split(".tar.gz")[0]) - - if os.path.exists(archive_contents_dir): - logging.info(f"Directory {archive_contents_dir} already exists => " "will not attempt to extract file") - else: - with tarfile.open(archive_path, "r") as archive: - archive.extractall(path=extract_path) - logging.info("Finished extracting") - - return archive_contents_dir diff --git a/tools/speech_data_processor/sdp/utils/edit_spaces.py b/tools/speech_data_processor/sdp/utils/edit_spaces.py deleted file mode 100644 index 33e664633156..000000000000 --- a/tools/speech_data_processor/sdp/utils/edit_spaces.py +++ /dev/null @@ -1,41 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -def remove_extra_spaces(input_string): - """ - Removes extra spaces in between words and at the start and end - of the string. - e.g. "abc xyz abc xyz" --> "abc xyz abc xyz" - e.g. " abc xyz " --> "abc xyz" - """ - output_string = " ".join(input_string.split()) - return output_string - - -def add_start_end_spaces(input_string): - """ - Adds spaces at the start and end of the input string. 
- This is useful for when we specify we are looking for a particular - word " ". This will ensure we will find the word even - if it is at the beginning or end of the utterances (ie. there will - definitely be two spaces around the word). - - e.g. "abc xyz" --> " abc xyz " - """ - # ensure no extra spaces - no_extra_spaces_string = remove_extra_spaces(input_string) - output_string = f" {no_extra_spaces_string} " - - return output_string diff --git a/tools/speech_data_processor/sdp/utils/get_diff.py b/tools/speech_data_processor/sdp/utils/get_diff.py deleted file mode 100644 index 2aa4ffdc3bb1..000000000000 --- a/tools/speech_data_processor/sdp/utils/get_diff.py +++ /dev/null @@ -1,77 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import List - -import diff_match_patch -from sdp.utils.edit_spaces import remove_extra_spaces - -diff = diff_match_patch.diff_match_patch() -diff.Diff_Timeout = 0 - - -def get_diff_with_subs_grouped(orig_words: str, pred_words: str) -> List[tuple]: - """ - Function to produce a list of word-level diffs, but with the substitutions - grouped together. - e.g. - orig_words = "hello there nemo" - pred_words = "hello my name is nemo" - will give an output of: - [(0, 'hello '), ((-1, 'there '), (1, 'my name is ')), (0, 'nemo ')] - (note how the 'there' nad 'my name is' entry are grouped together in a tuple) - - This is to make it easier to find substitutions in the diffs, as - dif_match_patch does not show substitutions clearly, only as a deletion followed by - an insertion. - - Args: - orig_words: a string containing the groud truth. - pred_words: a string containing the text predicted by ASR. - - Returns: - A list of tuples containing the word-level diffs between the ground truth - and ASR. 
- """ - - orig_words = remove_extra_spaces(orig_words) - orig_words = orig_words.replace(" ", "\n") + "\n" - - pred_words = remove_extra_spaces(pred_words) - pred_words = pred_words.replace(" ", "\n") + "\n" - - orig_enc, pred_enc, enc = diff.diff_linesToChars(orig_words, pred_words) - diffs = diff.diff_main(orig_enc, pred_enc, False) - diff.diff_charsToLines(diffs, enc) - diffs_post = [] - - for d in diffs: - diffs_post.append((d[0], d[1].replace("\n", " "))) - diffs = diffs_post - - diffs_group_subs = [] - i = 0 - while i < len(diffs): - if i < len(diffs) - 1: # if i == len(diffs), line accessing diffs[i+1] will raise error - if diffs[i][0] == -1 and diffs[i + 1][0] == 1: - diffs_group_subs.append((diffs[i], diffs[i + 1])) - i += 1 # skip extra diff entry so we don't append diffs[i+1] again - else: - diffs_group_subs.append(diffs[i]) - else: - diffs_group_subs.append(diffs[i]) - - i += 1 - - return diffs_group_subs diff --git a/tools/speech_data_processor/sdp/utils/metrics_computation.py b/tools/speech_data_processor/sdp/utils/metrics_computation.py deleted file mode 100644 index cacf5a2f901a..000000000000 --- a/tools/speech_data_processor/sdp/utils/metrics_computation.py +++ /dev/null @@ -1,63 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import difflib - -import editdistance - -sm = difflib.SequenceMatcher() - - -def get_cer(text, pred_text): - char_dist = editdistance.eval(text, pred_text) - num_chars = len(text) - cer = round(char_dist / num_chars * 100.0, 2) - - return cer - - -def get_wer(text, pred_text): - text_words = text.split() - pred_text_words = pred_text.split() - word_dist = editdistance.eval(text_words, pred_text_words) - - num_words = len(text_words) - wer = round(word_dist / num_words * 100.0, 2) - - return wer - - -def get_charrate(text, duration): - num_chars = len(text) - charrate = round(num_chars / duration, 2) - - return charrate - - -def get_wordrate(text, duration): - num_words = len(text.split()) - wordrate = round(num_words / duration, 2) - - return wordrate - - -def get_wmr(text, pred_text): - orig = text.strip().split() - sm.set_seqs(orig, pred_text.strip().split()) - num_matches = 0 - for m in sm.get_matching_blocks(): - for word_idx in range(m[0], m[0] + m[2]): - num_matches += 1 - wmr = round(num_matches / len(orig) * 100.0, 2) - return wmr diff --git a/tools/speech_data_processor/tests/__init__.py b/tools/speech_data_processor/tests/__init__.py deleted file mode 100644 index 2db92b257416..000000000000 --- a/tools/speech_data_processor/tests/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/tools/speech_data_processor/tests/prepare_test_data/prepare_mls_data.py b/tools/speech_data_processor/tests/prepare_test_data/prepare_mls_data.py deleted file mode 100644 index 7a420c515cc9..000000000000 --- a/tools/speech_data_processor/tests/prepare_test_data/prepare_mls_data.py +++ /dev/null @@ -1,52 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Will take the donwloaded tar file and create a version with only X entries.""" - -import argparse -import os -import shutil -import tarfile -import tempfile -from pathlib import Path - -if __name__ == "__main__": - parser = argparse.ArgumentParser("Preparing MLS test data") - parser.add_argument("--extracted_data_path", required=True, help="Path to the downloaded and extracted data.") - parser.add_argument("--num_entries", default=200, type=int, help="How many entries to keep (in each split)") - parser.add_argument("--test_data_folder", required=True, help="Where to place the prepared data") - - args = parser.parse_args() - with tempfile.TemporaryDirectory() as tmpdir: - tmpdir_path = Path(tmpdir) - for split in ["train", "dev", "test"]: - os.makedirs(tmpdir_path / split / "audio") - transcript_path = Path(args.extracted_data_path) / split / "transcripts.txt" - with open(transcript_path, "rt", encoding="utf8") as fin, open( - tmpdir_path / split / "transcripts.txt", "wt", encoding="utf8" - ) as fout: - for idx, line in enumerate(fin): - if idx == args.num_entries: - break - utt_id = line.split("\t", 1)[0] - src_flac_path = os.path.join( - args.extracted_data_path, split, "audio", *utt_id.split("_")[:2], utt_id + ".flac" - ) - fout.write(line) - tgt_flac_dir = os.path.join(tmpdir_path, split, "audio", *utt_id.split("_")[:2]) - os.makedirs(tgt_flac_dir, exist_ok=True) - shutil.copy(src_flac_path, os.path.join(tgt_flac_dir, utt_id + ".flac")) - with tarfile.open(os.path.join(args.test_data_folder, "data.tar.gz"), "w:gz") as tar: - # has to be the same as what's before .tar.gz - tar.add(tmpdir, arcname="data") diff --git a/tools/speech_data_processor/tests/test_all_cfgs.py b/tools/speech_data_processor/tests/test_all_cfgs.py deleted file mode 100644 index 56564e3df60d..000000000000 --- a/tools/speech_data_processor/tests/test_all_cfgs.py +++ /dev/null @@ -1,57 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import glob -import json -from pathlib import Path - -import pytest -from omegaconf import OmegaConf -from sdp.run_processors import run_processors - -CONFIG_BASE_DIR = Path(__file__).parents[1] / "dataset_configs" - - -def get_test_cases(): - """Returns paths to all configs that are checked in.""" - for config_path in glob.glob(f"{CONFIG_BASE_DIR}/**/*.yaml", recursive=True): - yield config_path - - -@pytest.mark.pleasefixme -@pytest.mark.parametrize("config_path", get_test_cases()) -def test_configs(config_path, tmp_path): - cfg = OmegaConf.load(config_path) - assert "processors" in cfg - cfg["processors_to_run"] = "all" - cfg["workspace_dir"] = str(tmp_path) - cfg["final_manifest"] = str(tmp_path / "final_manifest.json") - cfg["data_split"] = "train" - cfg["processors"][0]["use_test_data"] = True - # the test fails if any error in data processing pipeline end-to-end - run_processors(cfg) - # additionally, let's test that final generated manifest matches the - # reference file (ignoring the file paths) - with open(Path(config_path).parent / "test_data_reference.json", "rt", encoding="utf8") as reference_fin, open( - cfg["final_manifest"], "rt", encoding="utf8" - ) as generated_fin: - reference_lines = reference_fin.readlines() - generated_lines = generated_fin.readlines() - assert len(reference_lines) == len(generated_lines) - for reference_line, generated_line in zip(reference_lines, generated_lines): - reference_data = json.loads(reference_line) - generated_data = json.loads(generated_line) - reference_data.pop("audio_filepath") - generated_data.pop("audio_filepath") - assert reference_data == generated_data diff --git a/tools/speech_data_processor/tests/test_data_to_data.py b/tools/speech_data_processor/tests/test_data_to_data.py deleted file mode 100644 index ae75b9317cfc..000000000000 --- a/tools/speech_data_processor/tests/test_data_to_data.py +++ /dev/null @@ -1,99 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest -from sdp.processors.modify_manifest.data_to_data import ( - InsIfASRInsertion, - SubIfASRSubstitution, - SubMakeLowercase, - SubRegex, - SubSubstringToSpace, - SubSubstringToSubstring, -) - -test_params_list = [] - -test_params_list.extend( - [ - (SubSubstringToSpace, {"substrings": [","]}, {"text": "hello, nemo"}, {"text": "hello nemo"}), - (SubSubstringToSpace, {"substrings": ["-"]}, {"text": "ice-cream"}, {"text": "ice cream"}), - ] -) - -test_params_list.extend( - [ - ( - SubSubstringToSubstring, - {"substring_pairs": {"nemon": "nemo"}}, - {"text": "using nemon"}, - {"text": "using nemo"}, - ), - ] -) - -test_params_list.extend( - [ - ( - InsIfASRInsertion, - {"insert_words": [" nemo", "nemo ", " nemo "]}, - {"text": "i love the toolkit", "pred_text": "i love the nemo toolkit"}, - {"text": "i love the nemo toolkit", "pred_text": "i love the nemo toolkit"}, - ), - ( - InsIfASRInsertion, - {"insert_words": [" nemo", "nemo ", " nemo "]}, - {"text": "i love the toolkit", "pred_text": "i love the new nemo toolkit"}, - {"text": "i love the toolkit", "pred_text": "i love the new nemo toolkit"}, - ), - ] -) - -test_params_list.extend( - [ - ( - SubIfASRSubstitution, - {"sub_words": {"nmo ": "nemo "}}, - {"text": "i love the nmo toolkit", "pred_text": "i love the nemo toolkit"}, - {"text": "i love the nemo toolkit", "pred_text": "i love the nemo toolkit"}, - ), - ] -) - -test_params_list.extend( - [ - ( - SubIfASRSubstitution, - {"sub_words": {"nmo ": "nemo "}}, - {"text": "i love the nmo toolkit", "pred_text": "i love the nemo toolkit"}, - {"text": "i love the nemo toolkit", "pred_text": "i love the nemo toolkit"}, - ), - ] -) - -test_params_list.extend( - [(SubMakeLowercase, {}, {"text": "Hello Привет 123"}, {"text": "hello привет 123"},),] -) - -test_params_list.extend( - [(SubRegex, {"regex_to_sub": {"\s<.*>\s": " "}}, {"text": "hello world"}, {"text": "hello world"},),] -) - - -@pytest.mark.parametrize("test_class,class_kwargs,test_input,expected_output", test_params_list, ids=str) -def test_data_to_data(test_class, class_kwargs, test_input, expected_output): - processor = test_class(**class_kwargs, output_manifest_file=None) - - output = processor.process_dataset_entry(test_input)[0].data - - assert output == expected_output diff --git a/tools/speech_data_processor/tests/test_data_to_dropbool.py b/tools/speech_data_processor/tests/test_data_to_dropbool.py deleted file mode 100644 index 4a42f0c0651a..000000000000 --- a/tools/speech_data_processor/tests/test_data_to_dropbool.py +++ /dev/null @@ -1,236 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest -from sdp.processors.modify_manifest.data_to_dropbool import ( - DropASRErrorBeginningEnd, - DropHighCER, - DropHighLowCharrate, - DropHighLowDuration, - DropHighLowWordrate, - DropHighWER, - DropIfRegexInAttribute, - DropIfSubstringInAttribute, - DropIfSubstringInInsertion, - DropIfTextIsEmpty, - DropLowWordMatchRate, - DropNonAlphabet, -) - -test_params_list = [] - -test_params_list.extend( - [ - ( - DropHighLowCharrate, - {"high_charrate_threshold": 9.9, "low_charrate_threshold": 0}, - {"text": "0123456789", "duration": 1}, - True, - ), - ( - DropHighLowCharrate, - {"high_charrate_threshold": 99, "low_charrate_threshold": 10.1}, - {"text": "0123456789", "duration": 1}, - True, - ), - ( - DropHighLowCharrate, - {"high_charrate_threshold": 10.1, "low_charrate_threshold": 9.9}, - {"text": "0123456789", "duration": 1}, - False, - ), - ] -) - -test_params_list.extend( - [ - ( - DropHighLowWordrate, - {"high_wordrate_threshold": 3.9, "low_wordrate_threshold": 0}, - {"text": "11 22 33 44", "duration": 1}, - True, - ), - ( - DropHighLowWordrate, - {"high_wordrate_threshold": 99, "low_wordrate_threshold": 4.1}, - {"text": "11 22 33 44", "duration": 1}, - True, - ), - ( - DropHighLowWordrate, - {"high_wordrate_threshold": 4.1, "low_wordrate_threshold": 3.9}, - {"text": "11 22 33 44", "duration": 1}, - False, - ), - ] -) - -test_params_list.extend( - [ - (DropHighLowDuration, {"high_duration_threshold": 3.9, "low_duration_threshold": 0}, {"duration": 4}, True,), - (DropHighLowDuration, {"high_duration_threshold": 99, "low_duration_threshold": 4.1}, {"duration": 4}, True,), - ( - DropHighLowDuration, - {"high_duration_threshold": 4.1, "low_duration_threshold": 3.9}, - {"duration": 4}, - False, - ), - ] -) - - -test_params_list.extend( - [ - (DropNonAlphabet, {"alphabet": " abc"}, {"text": "ab ba cab dac"}, True,), - (DropNonAlphabet, {"alphabet": " abcd"}, {"text": "ab ba cab dac"}, False,), - ] -) - - -test_params_list.extend( - [ - ( - DropASRErrorBeginningEnd, - {"beginning_error_char_threshold": 0, "end_error_char_threshold": 2}, - {"text": "2", "pred_text": "1 2 3"}, - True, - ), - ( - DropASRErrorBeginningEnd, - {"beginning_error_char_threshold": 2, "end_error_char_threshold": 0}, - {"text": "2", "pred_text": "1 2 3"}, - True, - ), - ( - DropASRErrorBeginningEnd, - {"beginning_error_char_threshold": 2, "end_error_char_threshold": 2}, - {"text": "2", "pred_text": "1 2 3"}, - False, - ), - ( - DropASRErrorBeginningEnd, - {"beginning_error_char_threshold": 0, "end_error_char_threshold": 2}, - {"text": "sentence with some text here", "pred_text": "sentence with some text her"}, - False, - ), - ( - DropASRErrorBeginningEnd, - {"beginning_error_char_threshold": 0, "end_error_char_threshold": 2}, - { - "text": "sentence with some text here but actually more text was spoken", - "pred_text": "sentence with some text her", - }, - True, - ), - ] -) - -test_params_list.extend( - [ - (DropHighCER, {"cer_threshold": 9.9}, {"text": "0123456789", "pred_text": "012345678"}, True,), - (DropHighCER, {"cer_threshold": 10.1}, {"text": "0123456789", "pred_text": "012345678"}, False,), - ] -) - -test_params_list.extend( - [ - (DropHighWER, {"wer_threshold": 0}, {"text": "11 22", "pred_text": "11 22"}, False,), - (DropHighWER, {"wer_threshold": 50.1}, {"text": "11 22", "pred_text": "11 22 33"}, False,), - (DropHighWER, {"wer_threshold": 49.9}, {"text": "11 22", "pred_text": "11 22 33"}, True,), - ] -) - -test_params_list.extend( - [ - ( - DropLowWordMatchRate, - {"wmr_threshold": 50.1}, - {"text": 
"hello world i'm nemo", "pred_text": "hello world"}, - True, - ), - ( - DropLowWordMatchRate, - {"wmr_threshold": 49.9}, - {"text": "hello world i'm nemo", "pred_text": "hello world"}, - False, - ), - ] -) - -test_params_list.extend( - [ - ( - DropIfSubstringInAttribute, - {"attribute_to_substring": {"filepath": ["002"]}}, - {"text": "hello world", "filepath": "path/to/file/002.wav"}, - True, - ), - ( - DropIfSubstringInAttribute, - {"attribute_to_substring": {"filepath": ["002"]}}, - {"text": "hello world", "filepath": "path/to/file/001.wav"}, - False, - ), - ] -) - -test_params_list.extend( - [ - ( - DropIfRegexInAttribute, - {"attribute_to_regex": {"text": ["(\\D ){5,20}"]}}, - {"text": "h e l l o world"}, - True, - ), - (DropIfRegexInAttribute, {"attribute_to_regex": {"text": ["(\\D ){5,20}"]}}, {"text": "hello world"}, False,), - ] -) - -test_params_list.extend( - [ - ( - DropIfSubstringInInsertion, - {"substrings_in_insertion": ["might "]}, - {"text": "we miss certain words", "pred_text": "we might miss certain words"}, - True, - ), - ( - DropIfSubstringInInsertion, - {"substrings_in_insertion": ["might "]}, - {"text": "we may certain words", "pred_text": "we might miss certain words"}, - False, - ), - ] -) - -test_params_list.extend( - [ - (DropIfTextIsEmpty, {}, {"text": "", "pred_text": "uuuu"}, True,), - (DropIfTextIsEmpty, {}, {"text": "uhuh", "pred_text": "uuuu"}, False,), - ] -) - - -@pytest.mark.parametrize("test_class,class_kwargs,test_input,expected_output", test_params_list, ids=str) -def test_data_to_data(test_class, class_kwargs, test_input, expected_output): - processor = test_class(**class_kwargs, output_manifest_file=None) - - output = processor.process_dataset_entry(test_input) - if output: - output = output[0].data - - if expected_output: - assert output is None - else: - assert output == test_input diff --git a/tools/speech_data_processor/tests/test_utils.py b/tools/speech_data_processor/tests/test_utils.py deleted file mode 100644 index f1314572fe02..000000000000 --- a/tools/speech_data_processor/tests/test_utils.py +++ /dev/null @@ -1,26 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest -from sdp.utils.edit_spaces import add_start_end_spaces, remove_extra_spaces - - -@pytest.mark.parametrize("input,expected_output", [("abc xyz abc xyz", "abc xyz abc xyz"), (" abc xyz ", "abc xyz")]) -def test_remove_extra_spaces(input, expected_output): - assert remove_extra_spaces(input) == expected_output - - -@pytest.mark.parametrize("input,expected_output", [("abc", " abc "), ("abc xyz", " abc xyz ")]) -def test_add_start_end_spaces(input, expected_output): - assert add_start_end_spaces(input) == expected_output From 247dc27fde17a4c28ba5324edb79d54d44886c1e Mon Sep 17 00:00:00 2001 From: Kirthi Shankar Sivamani Date: Mon, 19 Dec 2022 16:01:46 -0800 Subject: [PATCH 081/154] Add interface for making amax reduction optional for FP8 (#5447) * add TE interface for making amax reduction optional Signed-off-by: Kirthi Shankar Sivamani * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Kirthi Shankar Sivamani Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- examples/nlp/language_modeling/conf/megatron_gpt_config.yaml | 1 + .../nlp/models/language_modeling/megatron/gpt_model.py | 2 ++ .../nlp/models/language_modeling/megatron_gpt_model.py | 1 + .../collections/nlp/modules/common/megatron/language_model.py | 4 ++++ nemo/collections/nlp/modules/common/megatron/transformer.py | 3 +++ 5 files changed, 11 insertions(+) diff --git a/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml b/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml index 0604c0287d05..13b8ffde521d 100755 --- a/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml +++ b/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml @@ -149,6 +149,7 @@ model: fp8_interval: 1 # scaling update interval fp8_amax_history_len: 1 # Number of steps for which amax history is recorded per tensor fp8_amax_compute_algo: most_recent # 'most_recent' or 'max'. Algorithm for computing amax from history + reduce_amax: True # Perform reduction to sync amax tensors across GPUs after every iteration use_emha: False # Use fused multi-head attention for large sequence-length. Note this is not yet supported. Please set to False. 
data: diff --git a/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py index de7b8b05bde7..f53ad92b1dea 100755 --- a/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py @@ -137,6 +137,7 @@ def __init__( fp8_interval=1, fp8_amax_history_len=1, fp8_amax_compute_algo='most_recent', + reduce_amax=True, use_emha=False, ): @@ -203,6 +204,7 @@ def __init__( fp8_interval=fp8_interval, fp8_amax_history_len=fp8_amax_history_len, fp8_amax_compute_algo=fp8_amax_compute_algo, + reduce_amax=reduce_amax, use_emha=use_emha, ) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 529a1772aab9..1973e6a16c85 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -190,6 +190,7 @@ def model_provider_func(self, pre_process, post_process): fp8_interval=self.cfg.get('fp8_interval', 1), fp8_amax_history_len=self.cfg.get('fp8_amax_history_len', 1), fp8_amax_compute_algo=self.cfg.get('fp8_amax_compute_algo', 'most_recent'), + reduce_amax=self.cfg.get('reduce_amax', True), use_emha=self.cfg.get('use_emha', False), ) diff --git a/nemo/collections/nlp/modules/common/megatron/language_model.py b/nemo/collections/nlp/modules/common/megatron/language_model.py index a676972cf4af..c2120bcd1c5e 100755 --- a/nemo/collections/nlp/modules/common/megatron/language_model.py +++ b/nemo/collections/nlp/modules/common/megatron/language_model.py @@ -84,6 +84,7 @@ def get_language_model( fp8_interval=1, fp8_amax_history_len=1, fp8_amax_compute_algo='most_recent', + reduce_amax=True, use_emha=False, ): """Build language model and return along with the key to save.""" @@ -146,6 +147,7 @@ def get_language_model( fp8_interval=fp8_interval, fp8_amax_history_len=fp8_amax_history_len, fp8_amax_compute_algo=fp8_amax_compute_algo, + reduce_amax=reduce_amax, use_emha=use_emha, ) # key used for checkpoints. 
@@ -433,6 +435,7 @@ def __init__( fp8_interval=1, fp8_amax_history_len=1, fp8_amax_compute_algo='most_recent', + reduce_amax=True, use_emha=False, ): super(TransformerLanguageModel, self).__init__() @@ -514,6 +517,7 @@ def __init__( fp8_interval=fp8_interval, fp8_amax_history_len=fp8_amax_history_len, fp8_amax_compute_algo=fp8_amax_compute_algo, + reduce_amax=reduce_amax, use_emha=use_emha, ) self._encoder_key = 'encoder' diff --git a/nemo/collections/nlp/modules/common/megatron/transformer.py b/nemo/collections/nlp/modules/common/megatron/transformer.py index 8c36d2d504d7..aa09954b5eb3 100644 --- a/nemo/collections/nlp/modules/common/megatron/transformer.py +++ b/nemo/collections/nlp/modules/common/megatron/transformer.py @@ -2006,6 +2006,7 @@ def __init__( fp8_interval=1, fp8_amax_history_len=1, fp8_amax_compute_algo='most_recent', + reduce_amax=True, use_emha=False, normalize_attention_scores=True, num_moe_experts=1, @@ -2079,6 +2080,7 @@ def __init__( self.fp8_interval = fp8_interval self.fp8_amax_history_len = fp8_amax_history_len self.fp8_amax_compute_algo = fp8_amax_compute_algo + self.reduce_amax = reduce_amax self.fp8_recipe = None @@ -2093,6 +2095,7 @@ def __init__( fp8_format=fp8_format, amax_history_len=self.fp8_amax_history_len, amax_compute_algo=self.fp8_amax_compute_algo, + reduce_amax=reduce_amax, ) self.is_first_microbatch = True From 6b13cd5986b72c225889e2492c7dcc8df07984c1 Mon Sep 17 00:00:00 2001 From: Evelina <10428420+ekmb@users.noreply.github.com> Date: Mon, 19 Dec 2022 18:10:03 -0800 Subject: [PATCH 082/154] [TTS] add tts dict cust notebook (#5662) * add tts dict cust notebook Signed-off-by: ekmb * review Signed-off-by: ekmb * fixed audio links Signed-off-by: ekmb * remove old notebook Signed-off-by: ekmb * fix typo Signed-off-by: ekmb Signed-off-by: ekmb Signed-off-by: Elena Rastorgueva --- nemo_text_processing/g2p/modules.py | 45 ++- .../tts/Pronunciation_customization.ipynb | 311 ++++++++++++++++++ tutorials/tts/audio_samples/default_ipa.wav | Bin 0 -> 96812 bytes .../tts/audio_samples/new_dict_entry.wav | Bin 0 -> 109100 bytes .../tts/audio_samples/phonemes_as_input.wav | Bin 0 -> 109100 bytes 5 files changed, 339 insertions(+), 17 deletions(-) create mode 100644 tutorials/tts/Pronunciation_customization.ipynb create mode 100644 tutorials/tts/audio_samples/default_ipa.wav create mode 100644 tutorials/tts/audio_samples/new_dict_entry.wav create mode 100644 tutorials/tts/audio_samples/phonemes_as_input.wav diff --git a/nemo_text_processing/g2p/modules.py b/nemo_text_processing/g2p/modules.py index 264cbc4d3784..f83c47d56210 100644 --- a/nemo_text_processing/g2p/modules.py +++ b/nemo_text_processing/g2p/modules.py @@ -316,23 +316,7 @@ def __init__( else: self.use_chars = use_chars - # parse the dictionary, update it, and generate symbol set. - if isinstance(phoneme_dict, str) or isinstance(phoneme_dict, pathlib.Path): - # load the dictionary file where there may exist a digit suffix after a word, which - # represents the pronunciation variant of that word. 
- phoneme_dict_obj = defaultdict(list) - _alt_re = re.compile(r"\([0-9]+\)") - with open(phoneme_dict, "r") as fdict: - for line in fdict: - if len(line) and ('A' <= line[0] <= 'Z' or line[0] == "'"): - parts = line.strip().split(maxsplit=1) - word = re.sub(_alt_re, "", parts[0]) - prons = re.sub(r"\s+", "", parts[1]) - phoneme_dict_obj[word].append(list(prons)) - else: - # Load phoneme_dict as dictionary object - logging.info("Loading phoneme_dict as a Dict object.") - phoneme_dict_obj = phoneme_dict + phoneme_dict_obj = self._parse_phoneme_dict(phoneme_dict) # verify if phoneme dict obj is empty if phoneme_dict_obj: @@ -372,6 +356,33 @@ def __init__( if set_graphemes_upper and heteronyms: self.heteronyms = [het.upper() for het in self.heteronyms] + @staticmethod + def _parse_phoneme_dict(phoneme_dict: Union[str, pathlib.Path]): + # parse the dictionary, update it, and generate symbol set. + if isinstance(phoneme_dict, str) or isinstance(phoneme_dict, pathlib.Path): + # load the dictionary file where there may exist a digit suffix after a word, which + # represents the pronunciation variant of that word. + phoneme_dict_obj = defaultdict(list) + _alt_re = re.compile(r"\([0-9]+\)") + with open(phoneme_dict, "r") as fdict: + for line in fdict: + if len(line) and ('A' <= line[0] <= 'Z' or line[0] == "'"): + parts = line.strip().split(maxsplit=1) + word = re.sub(_alt_re, "", parts[0]) + prons = re.sub(r"\s+", "", parts[1]) + phoneme_dict_obj[word].append(list(prons)) + else: + # Load phoneme_dict as dictionary object + logging.info("Loading phoneme_dict as a Dict object.") + phoneme_dict_obj = phoneme_dict + return phoneme_dict_obj + + def replace_dict(self, phoneme_dict: Union[str, pathlib.Path]): + """ + Replace model's phoneme dictionary with a custom one + """ + self.phoneme_dict = self._parse_phoneme_dict(phoneme_dict) + @staticmethod def _parse_file_by_lines(p: Union[str, pathlib.Path]) -> List[str]: with open(p, 'r') as f: diff --git a/tutorials/tts/Pronunciation_customization.ipynb b/tutorials/tts/Pronunciation_customization.ipynb new file mode 100644 index 000000000000..d7b97daf701f --- /dev/null +++ b/tutorials/tts/Pronunciation_customization.ipynb @@ -0,0 +1,311 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "# How do I customize TTS pronunciations?\n", + "\n", + "This tutorial walks you through the basics of NeMo TTS pronunciation customization. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "You can either run this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n", + "Instructions for setting up Colab are as follows:\n", + "1. Open a new Python 3 notebook.\n", + "2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n", + "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n", + "4. 
Run this cell to set up dependencies.\n", + "\"\"\"\n", + "\n", + "BRANCH = 'main'\n", + "# # If you're using Google Colab and not running locally, uncomment and run this cell.\n", + "# !apt-get install sox libsndfile1 ffmpeg\n", + "# !pip install wget text-unidecode pynini==2.1.4\n", + "# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\n", + "# !wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/nemo_text_processing/install_pynini.sh\n", + "# !bash install_pynini.sh" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Grapheme-to-phoneme (G2P) Overview\n", + "\n", + "Modern **text-to-speech** (TTS) models can learn pronunciations from raw text input and its corresponding audio data.\n", + "Sometimes, however, it is desirable to customize pronunciations, for example, for domain-specific terms. As a result, many TTS systems use grapheme and phonetic input during training to directly access and correct pronunciations at inference time.\n", + "\n", + "\n", + "[The International Phonetic Alphabet (IPA)](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and [ARPABET](https://en.wikipedia.org/wiki/ARPABET) are the most common phonetic alphabets. \n", + "\n", + "There are two ways to customize pronunciations:\n", + "\n", + "1. pass phonemes directly as input to the TTS model; note that such request-time overrides are best suited for one-off adjustments\n", + "2. configure the TTS model with the desired domain-specific terms using a custom phonetic dictionary\n", + "\n", + "Both methods require users to convert graphemes into phonemes (G2P). \n", + "\n", + "#### For G2P purposes, all words can be divided into the following groups:\n", + "* *known* words - words that are present in the model's phonetic dictionary\n", + "* *out-of-vocabulary (OOV)* words - words that are missing from the model's phonetic dictionary\n", + "* *[heteronyms](https://en.wikipedia.org/wiki/Heteronym_(linguistics))* - words with the same spelling but different pronunciations and/or meanings, e.g., *bass* (the fish) and *bass* (the musical instrument).\n", + "\n", + "#### Important NeMo flags:\n", + "* `your_spec_generator_model.vocab.g2p.phoneme_dict` - phoneme dictionary that maps words to their phonetic transcriptions, e.g., [ARPABET-based CMU Dictionary](https://github.com/NVIDIA/NeMo/blob/r1.14.0/scripts/tts_dataset_files/cmudict-0.7b_nv22.10) or [IPA-based CMU Dictionary](https://github.com/NVIDIA/NeMo/blob/r1.14.0/scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt)\n", + "* `your_spec_generator_model.vocab.g2p.heteronyms` - list of the model's heteronyms; the grapheme form of these words will be used even if the word is present in the phoneme dictionary.\n", + "* `your_spec_generator_model.vocab.g2p.ignore_ambiguous_words`: if set to **True**, words with more than one phonetic representation in the pronunciation dictionary are ignored. This flag is relevant to words that have multiple valid phonetic transcriptions in the dictionary but are not in the `your_spec_generator_model.vocab.g2p.heteronyms` list.\n", + "* `your_spec_generator_model.vocab.phoneme_probability` - phoneme probability flag in the Tokenizer and the corresponding flag in the G2P module: `your_spec_generator_model.vocab.g2p.phoneme_probability` ([0, 1]). If a word is present in the phoneme dictionary, we still want our TTS model to see graphemes and phonemes during training to handle OOV words during inference.
The `phoneme_probability` determines the probability of an unambiguous dictionary word appearing in phonetic form during model training, while `(1 - phoneme_probability)` is the probability of the graphemes. This flag is set to `1` in the parse() method during inference.\n", + "\n", + "To ensure the desired pronunciation, we need to add a new entry to the model's phonetic dictionary. If the target word is already in the dictionary, we need to remove the default pronunciation so that only the target pronunciation is present. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Default G2P\n", + "\n", + "Below we show how to analyze the default G2P output of NeMo models." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import nemo.collections.tts as nemo_tts\n", + "from nemo_text_processing.g2p.modules import IPAG2P\n", + "import soundfile as sf\n", + "import IPython.display as ipd\n", + "import torch\n", + "\n", + "# Load mel spectrogram generator\n", + "spec_generator = nemo_tts.models.FastPitchModel.from_pretrained(\"tts_en_fastpitch_ipa\").eval()\n", + "# Load vocoder\n", + "vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name=\"tts_hifigan\").eval()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# helper functions\n", + "\n", + "def generate_audio(input_text):\n", + " # parse() sets phoneme probability to 1, i.e. dictionary phoneme transcriptions are used for known words\n", + " parsed = spec_generator.parse(input_text)\n", + " spectrogram = spec_generator.generate_spectrogram(tokens=parsed)\n", + " audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)\n", + " display(ipd.Audio(audio.detach().to('cpu').numpy(), rate=22050))\n", + " \n", + "def display_postprocessed_text(text):\n", + " # to use dictionary entries for known words, not needed for generate_audio() as parse() handles this\n", + " spec_generator.vocab.phoneme_probability = 1\n", + " spec_generator.vocab.g2p.phoneme_probability = 1\n", + " print(f\"Input before tokenization: {' '.join(spec_generator.vocab.g2p(text))}\\n\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = \"paracetamol can help reduce fever.\"\n", + "generate_audio(text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Expected results if you run the tutorial:\n", + " \n", + "\n", + "\n", + "During preprocessing, unambiguous dictionary words are converted to phonemes, while OOV and words with multiple entries are kept as graphemes. For example, **paracetamol** is missing from the phoneme dictionary, and **can** has 2 forms."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display_postprocessed_text(text)\n", + "for word in [\"paracetamol\", \"can\"]:\n", + " word = word.upper()\n", + " phoneme_forms = spec_generator.vocab.g2p.phoneme_dict[word]\n", + " print(f\"Number of phoneme forms for '{word}': {len(phoneme_forms)} -- {phoneme_forms}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Input customization\n", + "\n", + "One can pass phonemes directly as input to the model to customize pronunciation.\n", + "\n", + "Let's replace the word **paracetamol** with the desired phonemic transcription in our example by adding vertical bars around each phone, e.g., `ˌpæɹəˈsitəmɔl` -> `|ˌ||p||æ||ɹ||ə||ˈ||s||i||t||ə||m||ɔ||l|`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"Original text: {text}\")\n", + "\n", + "new_pronunciation = \"ˌpæɹəˈsitəmɔl\"\n", + "phoneme_form = \"\".join([f\"|{s}|\" for s in new_pronunciation])\n", + "text_with_phonemes = text.replace(\"paracetamol\", phoneme_form)\n", + "print(f\"Text with phonemes: {text_with_phonemes}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "generate_audio(text_with_phonemes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Expected results if you run the tutorial:\n", + " \n", + "\n", + "\n", + "## Dictionary customization\n", + "\n", + "Below we show how to customize the phonetic dictionary for NeMo TTS models. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's add a new entry to the dictionary for the word **paracetamol**. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# we download the IPA-based CMU Dictionary and add a custom entry for the target word\n", + "ipa_cmu_dict = \"ipa_cmudict-0.7b_nv22.10.txt\"\n", + "if os.path.exists(ipa_cmu_dict):\n", + " ! rm $ipa_cmu_dict\n", + "\n", + "! wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tts_dataset_files/$ipa_cmu_dict\n", + "\n", + "with open(ipa_cmu_dict, \"a\") as f:\n", + " f.write(f\"PARACETAMOL {new_pronunciation}\\n\")\n", + " \n", + "! tail $ipa_cmu_dict" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# let's now use our updated dictionary as the model's phonetic dictionary\n", + "spec_generator.vocab.g2p.replace_dict(ipa_cmu_dict)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Paracetamol** is no longer an OOV word, and the model uses the phonetic form we provided:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display_postprocessed_text(text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's use the new phoneme dictionary for synthesis."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "generate_audio(text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Expected results if you run the tutorial:\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Resources\n", + "* [TTS pipeline costumizaiton](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#tts-pipeline-configuration)\n", + "* [Overview of TTS in NeMo](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/NeMo_TTS_Primer.ipynb)\n", + "* [G2P models in NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/text_processing/g2p/g2p.html)\n", + "* [Riva TTS documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "38", + "language": "python", + "name": "38" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.13" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tutorials/tts/audio_samples/default_ipa.wav b/tutorials/tts/audio_samples/default_ipa.wav new file mode 100644 index 0000000000000000000000000000000000000000..03eecf441a68fa2dca2981eb4484b2937c6006f0 GIT binary patch literal 96812 zcmWif1ymbL7l3Ca3Lzo5Lvbx$cc((#XiMFxPPDp+xM4FQ70ZEgPj zPYwq*?9SZ1HnVr{H+js+kv`iIFk{H{5lfRcdP)EQ2>a!E?9>5DYvfoiq z1VVrd2n9l*gIcJAID0e^B!X2S39JAM!B{XGECpA~S(Y3>EBu+aU@vVF$Ei&mIkCv7`29uT5j;$b}qMiVL6-*dp&) zne4zj7y~nyQ;Zw<57mPJW)(94BqC~NF4aa%q#q%Z5C{&TE?6qI6DX+d^l;#S4ny4# zEoeY6WIc2TZQu(lbt>$`-tQN<8hD|PkOHt7-e5>3m7$nBa4bjx1l$1=A;W(2lv&O^ zhjG9LOa|xRZ&v=#%oJ98+abYzI~H7I^*}>6Py^Fp6|2Q~u!+^eBX*W=U@Axk(X8HP zvYMF(-h;hh8Mp;{gJ8e|El|i>OTuc+36RhPpTK>r6>hUX{Xicu3=9VDtUcbcwkc;v zjsV$kDEJI^up_MkzU*u_U=n+clszj6OaZpw1h@$Qh0)*vltB;B0c}AooDO8Hg^vL( zOoe6e11o(4Yr6s%4z95_eF<;D`>a23z!|WO9T&2Gagmk6i~Ze~m0>R!0zSgUP|VJ^ z9}ETK*je4dQ%J*DkOy`k8Q>7O#cClD@WBMI4om}i&<4C0l zXbL3JOXvhRY0W_&-=M)`E(#@91yj7i$w7IR`@E z2<8ACPa&iWv4k)aspM%Ym`Q|=a5H?t2H+`V8Pbj%!`|RyIqx|;aVxAGO+kO4^HDD( zg^kBHupDtj+}JqPF#*hbx;Je@ucmh~-rydX2lvw^Y7e!FN}x4#C36TSgK*?AvIQvt zS3w1+Knjr)K+1aCC}cfqK-Z&p5E-lS8^9j;!NrUZ^P1tnTn1y-FwdEnOawGAGZ|0j z2{Q>kW^LlZdSW@`gF66%%g8eH5|)f@N0X5M9}h!sqmd|vdLkO|3Bi$g*hDX-AJK)( zI(C#Z%nRl=qhq(X17V-|&q7w8JsAJ>f7+lHt zGb-v|!b}}NaQq=^3;Qxbupd%{Xpms!EquWG)H9F>$1>&2WYEccAto7b>f`k7-sCYa7#h2B_yjtjm3U4eN0KmCF46yMxw3^PBc4#43^J#rpXV*`+K z<}8iSUDQb?3h9p}qJC@?aM2CeO1uvJ0Cur=J`r7s3Gn`SCVmll~@*X9mX>A z!BZ5$A7Js|6f+nSU<+2wb>uC??}1LD$T-mCVrntBGoRS_oXy>a2*5<-F$jfHC}sLF zYmo2ASWwCY5z9=w%&W}fh-~TuDCT_Qn{iKsKw3CGEL%m#1x0u*lSRqxTtb=|J()nPp_@S*IuHv)#y}o4gU5IY zcMdNbHBnCF8L|d$!60y9u9D*jlX;o(xM>6JjCJ#3c;Ct<;|qFW{V)xdjfZmExmU1rFpE5ITyC5|o&l@z zM)UwvL&lPcs6GA-QveOKja{3PK`GK3U4)i_MNA`oo4!k511RS@ZzXRD zHiB`Xrci6B2ueYQ6EahmIg*M(9&yTW7wjK20R4jHv-xBlSHq2GSE8?&D{jZx!8P(< z@mo2kkXEXS{6OV1HmEc91*bUA@RjHdy3oAKRAJy6eT<_G-wk8+S9LmFrDl$1rDmcQ zGkTb2m>kS9BgZVD&cpBcY3@$06>lA93O*QHf}|m%k$SiYK4wf%4L-vP_=34WpN5w~ zfAj)&5Od`Fim%xPI1O_svz;Riv5K__LyVMY};eQp;5~bUf|w?w0G~i7%8F)uc~WYCGCHj;b!{ONfiWnq3$F z1(WH=E94x_@J_TfLEc|s*Y#R8Li1ELNVT}zrfZ#Ynl|4QM;&2T zpD4XR9j$t;nyGEjlIp{ny_$KN4ceJ{3HcQD6m*H+T1y;GySE3g=&v3fH}TH6F+)DY zjtw#SM)tVs5^iU-(89aAF?V>gE!T;itekn5 z!6DMoG~2LEdr;}vS=g$QJ#E2SJKEe73l)aWRm$Ex%-atoKml{P`_3FRBlrS zYc?AA#={1=?uK@k(V9Muk-}rvN1Yq}I%5uu=FN0X{FLx)TyDRyp*9{eoBxDY@hof~ 
zT1|0ubK6?0t%~~=t}P2}lyoPWlt`ekQ)DA~CcbFl#P`A-kVS|BrqDCV2*XV^()p+b zHm20As_m+~)!3``Ob61nsY}oi)Ye-*y*Z+JzwCHRb6a=Ea@9h^0%8HP1!@92}#WG@%r89n+rCR;7wrvg5mMoy$!ks{7UZXQU}1j~L^tSP=-5>qy?%Ca%8 z^;EZ1`&WIjdvx3K#_GBYbrTx0o1e8Ot3^aHlF8=R?~Ii2FwZuJ82_61V6n74VpIIu z#j}!5F6lGl_OMMM2`=TLUYvh1DYw$%lO>mPPdBA*V6ON_#K*&5XJ#C#+H1%Yz4Hs{ z^`KXenB4&j9jAytqt}^5NF30)vb_i?%1xsN*A*`$1&(-B%3KD?wQooCJ7TJijP|{dV|MW{{{2Mtd8vI z=4p9=6N03nL405QDP5(ktD2Qx`#b#GuAdFR2j}NDPS>pvNBD2+^?q3VkU_o0flZQR zB-#+CiJ|h)-TZ69oyQe zhW|q%4g5L8JmgB8Pd@!Dk#d~3%0?)>$X-p>aUPArp4Y+n08lFa_Q&Xp6MqtpB6r@ z%4|#3Tr#+t-l*kmUmNGtEUN}J^Bcyu9aRmd2lA&|dD*U)+DQ+3)b*?%F>ua+h0MYW zbB~S*jUC=&jUbOWtUA+{*mR3QGHOY(?x6BG!K+9tz=_D)u*DQLUrNIa#`bnws+mD)rY%FIMeC% zN?F%E`G?x*s;|Xu1rEhm%Z@ak)dAi_*XIE_p*MSm28{Py;IM<=8_qM)8kN>p+pg)O zBURJ8y*p;NY-^fVyRCw%;8oqOi)mTXnxyDbl(*e(&z1it!{nVUKRYdq|I(Qpy+x*t z%*Urszwwi1N6y_pt8^ST$R|SVeoA_ozlFq!yT*-bREsHm>S}9-m0cX3JAzOPqvRlVb zB>QR;m5aL$C>Lv288~KF(-hs;Zfp6=+DB!_3a1uXmF=#nXt~nmtb1bGXOwF56dp}G z>O&f;TCMcE7@=U8%|VxSp;klOCw!V^oe(zdWc;ha{lZdQUvra93)DBeR_k=8c}A`A zocd6Ebjy;e`9(Vl7Z&oX1}VbHc<~p{#2}>S?~qA8;~l!VBh6i1TiZ(8uXb3gRfas{ z2UD>jOy!`s+UQ!TDQzhEQ9(2wRBY3aVRnI@$O!6-#Ml1<^Io8$T5~DHj`*yP=$J1%DjU~yxvgDW zXugi^=SeXk5^G%C`J|20nALDd?$AEV=q9XlQN|XIv6(7KsGJlz{71Cewc27MWu*wG}y+;iCFd(t_)3C*!zpZnzKE`9+_KGhZmf8*UYW_@7mc?xp zQU9nrmLHP7Ff+YqWYbA~7QWqbs%Qyko^f^ajnvhW4aaO6y_;NBKvC{Iyp_jHE8wB>ak4+6f4wk%)P12#28bfa%-D=%agim zwayK`4e9MMW|c6&|Nf9EQ+F@QT(o!Q&@mZ3ciH^|!n@MsN4jp9xuRUR-(JJrKZ&c2 zKN`z(_I_RR-^L&D1yq|JjdhvZ^G4s`0bgTt{Q35iu`2EN_G_}6Eu%WpG}}oUxr41i z3d|e3{2M=)tthxx=wA`qr06rlhbhhadK)h(7v!`pl?JRC-*MicVw9Rjhzczo}TV$hp zC;H`gzW0v-vfishY@9=c1@tEUI_-X4u|ZAzqWi=3@H=(Lcvo2{hjqs)u9OPP7gkMf zT;D!kY1CHe`lt_fJd_QrA5h&=ajL4JcBOo&ak%9G|L-H3<~&>dcm4fkF4Hd$T-qZ< zINI2^^HFD-E(aZAz1~UbG)G)Pu9a`g3;R6p^_^EaU;WCGG~=bcdL9@|jN}h~61vUt z8P-QXW#8HV3>@Yy5Z>2(X*cw4tR5 z0;Rax>Kp%|IixMN;&yt8k*SQ4%rU(0>?0dqx4v?1)vel5vc=j@ z9I^Z2ex8$WE-6@7uxi7CIq{0fZO$>MW#?IWew&lJ0dJLtJMD1xw0x^SUsv<%#w)?I zQK^Axg*Dz}rpwZ3+0Y@Q^9QDc_IJi`p3b?`vF&VUi~bMuhQCkTVYQw=k6hHQEx(m@ zJ#Fcao#}anbL&rb4KrV7d|({0LlxILqK;c(TTx#nkzLaH@dCY;4v(RI9|x^t)zTKG7Q@Eqv5N;E{ztsejFd+PY7FxB|0cfFXlbUPC@VDR%{ zZN2~VpJ#gqYBb8Oiq7e}AZ8cmlx2fxEI$ekPh{+e;-OGa&4;kt&!TJ)GlT< zB{uqYZEVS}8&LVI(!Tbpd>T8F>4x!oes#ehwHorl5!m%|o2EAaqrcl-M`ez!eQ z#=9dealB?f(c(67UEWYM{m;8bmGbKF zZcok|@0&y4Bphn7ymY@5bzK9ATG3z6-@&=TnEMUR)3&VajJJv> zeV;!2II(b0H)-kN_dNP^ACD-$&oSvRw1@GpHd#M}T#9WHPLh0+N+f&vy{HYHl{H`U zcV~$I%*gU6YON&|EA#`&Rn$_Wq&q`4yn1iRkz$9^E!F?FXOL%YQzJ{pCoKwHJ7#s> z-0o3rp+oIbk zx2qv9dN%pqwK}1kTJZGUj7Os$Wxi_2SR@+_9yk|=K8X1g{W&PlX^5aN*`b-MR_W$a zCXQPC*TLIaW%pSqGqyCfeO%{?#kJx!_Rk->#X4J!U83tLk2hs%B#G%lzqu)y12u%4K1O<&ut2WWvv-uIuL~ z?_WG`+@07IH#L6_C?&N>m`$tasSs`0509<<>~_D*>#rpD@86yHs#p4^Mm4Z?dJ_CF zW@?1VZ-Ijhzi1Zd+_j?#8cno(?C{oYqI;c#gug?ZRuhs1zI%Qlehn!~Z0M~TOsXjl z^0vyWactRzzgC&Ea|Rb2t@3S^8lFpgM>xjUt+v{HYTe<*Pe&z%jdiyWWTAh!kND@D zMg&EM;Gq*;Z7H(hLfV+8PaZZsqTdfLk}67gfj#0wU7|;Y%A67{8qI&YhpP*;hiDJc zK!+*bvjVSsUb0z3do}%&zb9?umrZ|)^11bLW2x)jA z8>8kTr>r@VDdP_;IIv-5(yKXr#$OFDb5+@3+#N!mgThr4bSN~zJ=p5A@^=2m&!JEM zJiPZZ_v?g;-MXok&c2>K7lauCyzHf%Sn{}PspdRkhh$4uxD|LAy?q^r;s2>ZDpE6t z|C;rallO16wxv-w&-~7CQyVWEQ#LdY`yG%WD*Rc_Ynj~TtG_Jx>R#9P#q3op+}HnF z)jS0b{x7)2KE-;dggLA@Y-*N3tJ+`~`vK^1%U7T9Gf<>8Y z{%Q)7E6bXYaF zO`5l7NOX9D#~K?a2Rlc`{anDT9##(jAP*XzW~IK3e%$)Z_O&=|S;g-T6Y6F6*7>^I z0f%g>B8wxuK;9^e&z52-=9K3$&waG>I?JonV8zVJ=t4AiUS3;CcGbK3L(OxVhSj@O z6%>kcE&e{w{ZsDW*r;ez+o(V4G$6uW*Qei{=;VrBtF{=HgiadVo2_m+|7Y{ZG1c>B z;FRzHA6wBu)y7=U_W@4^-7!A7{qarK+-8h;Yca(6iVM$PEJ@+^!`@-{INny5?031_ z_%-=%b6+4?K#f)8*SMGdSNybWVa=qbGTD3C_J%`Mijs@D+p_v+&C8iw8eD(8?P=Gi 
z&eje>^^w~*c*(>;>zLhBw|6F4&iFPkB)qssocjW=ecmY{_ThCt>ue_L6U(RmN_>&~ zaKw{mDZFokOKjSo;|}&CoL@QhwW>h}!X@M!P$#UniS*J1N`rsL*&_9R{W;BI+BrpE{5dEmpA)XGn;*|jY~GSzaF-EC1J zzZBaJ2z(sRRzF89P6d(`7qQzkRLuE z=6pz==WXti_KRibzZbu7eO~|M>pPFXJ?nMkC{choToi#8lDFs@+L5Q1I5{lwI_Q(( zy~0y2z60NA`zopxC)&@-2gsF;WUYP8k&4o?ktKDx_p%MywmB2?9u{)Sj@C?U%4qDA zr5Q>j$71))l&sjE{AJbZS%=4$4=(F7qTkS%9x>DUmG+WH-14l&4Bai|X}{%ahj3UfMk^ zXrg-LiNUW1)yK|^3F&*V*O!P;&%fLhU0CD%{LMd(zv=lh>EoBIvc?^Ti{K1%(BP*z zqH!~aqNMPP&0xnc=TPTYc2~u#xnla3*~UZ}mgpI^zH4AdqFmGHRX4a|Y00F*z(P&Q z!qVr(0VP{Yi>fx%9jkJw{33f#?&xu1M4#EcmXs!zOiPX5G*&t8&xCt%Z%2~Dz7Enx zhxV8!?5z!HI#8OP{W#;@pNx!^1um7nWb@lA<%ul=COQOmWN{fo_6?c~wS3am&QY~sZ zVir618{jjweBOn*cGJ4X^&WFLP7;4HzGXr{{KK*1hHej2Sy!3+wY)FiSk$*rP;$ER zX7%j4j1Gkw_h3-$w-#Lc=ok zomRj4O=YD87mDr_-z=^vo?P%>-lBYIu~RkHxW0KtlV6(>x#jYtU+JW=Q|HgwFn!Xv zL-8NS+>EoG^ek@WxJ7Y~`pgdo_Kna+e_s|;HK8~l&s6AF_M>iL>u9;G*|hJ9Or9^WmzFKQFk^F25(!FKuv0SI4Hq_IG^UKghI?%pg{)m2rkt6Eg`x$;8!^dfcc z^sMO2OSxq6>Z;QX2b!03j-;9#enyQL8#T*k#)PRK#|OkkPt;64Hyp>9EQP4&#htR*uUYxp<<_l)K|^PJBMzVtmG!w%96fO7fS{=_hvlk=?Aj zRaBqHZgC3(t5(+jXs~MyX&uodv9ldrB%Nc%Z z%#$&9hqaAN8#!il;)L|^Q{uD7oa%EgSnjr4I>k!Kq-b@`CyRR)=nGS-hRbrA7B$q> z->xAVhSys+w#bIdueE!0wW-D!P8dywMTR@96d~Q(u2ZUEy7vaPo}&xWz0@q%7&OP# zUCJ$rL^&;skk4yf)Hz;GOf(7Qd(_O-`-T; z;-*OHn6KE~F}`C^XK+VaTf9QvIlOy|vPh}a)*A`)Kw=+JMmU%!n+BUS=1YW<_=kKz z7)&?Rt=(RRYpB0um6L;GZ#1dP7R%!iM0+=muVW zbUt44t{`NZrLR%upVzc%(`46w!UsN$L6uMr6`hj6LF@SX)!&VQWNvd{m9)whvjph za$?cN_!5ix_LOUdbBxO*_r1RBLsdZ!gFgAmy{&rWxP5XrcpP(Sby#3MjGxMBfx$#i zeOXskTeLi?=}lv2y6$q@=@{nlR(jqdlsga4M+TDKhQZ1kiiFm3 zxu1M%qrOhn=+>Ox=qsBo4`|V~;q6;H2=zk!XG#w!Y!l{!PlhASLF66kAL=2q6FGpz zVza?xs6<;CYvUF366OJ#iQHrRt_Z%RMV6pkP|Ypk7|{<*J#(3}m@^3f&h_KR3&O>7 zg=*d-juH0|y7RUQk67#ysl>R|JpN~)2PB=ZgZiOwWrqUnSogPuZP=U(Kc z@|SX4xE2u5FW@z57n;l6$RE!=DxS`F0!KMZ@E52*>P1Dv3Fv85AUurSW*$=CNC+a3 zm$Zs$WB|t#*#{r+7V)%#U1%ilqHsI+j^!$ezX0P#;Bxd47-FsycyosltC^9EH7V9i zGybP1%-fYV#-J_qc59L&2n=R(`8y>u%}0%3&D2Al3Rzqwft#Q$bBK%Dq0Ex z_!XRPs9=(j4DMaX5ZBSANH4Sx%GUA7ZB!Y7L!8@87QuCNATo>7puS{!z*|rDFs0D< zOuY?-W=FwHjxQB&xJK`SJ;(>-7;^|d3LZhX3jBl*O#zgbAW0tsIcOptE~b%0oWps6 zHL^BJ=O^o@lA|yJ6!02&txOzOj^8oK;Q)@9R;nxMAJ`@E0Lx=w3DK$p+$qF)&`oKjVY@rcPy%NU1p&Z%i; z3bNRYP?DA_RFw@ukI;{@ejw?Ex5A|sr+HJvUVH;GN<555pedH0u`}p2(E%;Wb|9Y9 z#1f;>PBc}k)J@krX^etxx_o7oAR7)Mr{M>LLy7$$m~%ioRP)EK9Szq7+Gt=UegmCJ z9I-UXJ(R~BVmgY*@8UY+511*4(@V^ORF2{Y5lH-tUKDQSP7w&vo>;7nJx?I`K}7Op z9@k(I9fxn_x!SezR0f7quSsG?;{UV`2V<$HY@T~>x{A!y>&PeQ$ZjvHLl|ZhFrV-T zq#Ex)n+!q1=Um^mb*8r>M7I!h@+T|j;l*ID<|KZf3gRNz9B`ews$UO%U^J#OKIhq) zKj4`46BEvpST3;awZd#R|74aU_7)PRxP2Dd zUwEP|0c*89(%E1&k$hmTkmx&)F*3>#d4tr+QRKS7TdN>ru(6!7j?0F>yfj0v4i^i1 zemQfJzYkE>oAn)%Q^;`is_ixD1QMeYYzB%ypn;rx+f<#0#WssM+NnYt^5XVHjly(+5=qioWDa7!OFq4Nddvu@nt)(Yp zZ>}>gM!cb$`GPi0lYsbA;k<=lS^I3Sjr@_an?17YGqjh3Bq|1U2)&YG zM0!d|my7$fZ4m#kIxjnJnIN#M9cyK7 zT_W}nJL*3pcZF+lv7k4d-y@3}*OKZyQZlI;cQgwl%6~g=;C$2DA)Bo3=w90dz?%(E zJqk^n#&3=iad(N6ZyvnX{=x2bXB`9_Odr$IVDF~a>)(hx$rL7u^MyB2a}vB`drU1% z2}n7of!VKlfc)a0Y2S?hH4dl73hLEyJPb+Y-qxm~--(UVd)nzn8{Z6#P_mKXnp?bk z>snfOM7{19pnVt1b_$SG|8;_}WRqG_ySn5d-Fz=@I0Bwuk}(1b;h-8R`i+n4-W zdW1MbyOZ~vmyCw023cEZ9vPLAi|z3+j`yZJ0Zf*J*GRo2%CTD9d4Hq3{W|_~tuL)c zr*WjDpSd39AszV7E_ZH#?WUSI(JbyegN;^z%*K5?Kf8S|i54ys?QH*HU8q>;9Iid8 z`EI>fvPV6)`*`fL_B)L``o)_LR!s9gg9X&@blJ+A(QIBzN&&o(orSgx1niL_NDq0s^m<_qQJ?^Rqkz*6>d8_I0ju znruo?R@vR8ZZ*H*zr;{l(tse2I0=yQ>{B9#;Xogr$vhO6|xPkD~Ut&P+gYA3y;rv*Lj)J zgVY0~E2l~NT9eF42bshxy|-`zCzwp9eh7p%-evp!#vxA%7Wr_vQ;NKt9^3TKEppzD z7k5T5VbY!EFs4Ndt<}`F4uQ)R`3>;_bG=y8xycM1Kec{Tmbk|lhgS-He;@~yZ#CM0 
zErwamK0W&I-Q}kw$3)jT0B{Y;0DaC_$5!lM`yFco4v;4GPrn6#cQvDmst_w!`EzKR|^4caizdi7M_Y;HI zySaEJ8YMZ@(g}i*zWQaH3xJP26ix!R>`wEHI^LErIB8qOtvpdnf;0uat0!%1OH_7V z7XDQ`ZO2qnmAoZ$Zu``#=WuI$?1imp(D|r)k$u^{6c2kb8eGo%kA**x+OK$ z4xPLMstBx@FnI6X9T}0#L{%jm+ciylI=LtR0CHs4ykDyZ|h^AXqkp`ISZ)VEQ~ohxO8bgTJ<%H5gh^s+SQr6}qlEi3ih0`ir$ zo^^GjZEsTtwQIDfWxBQdr#mD33^o^|}v zTF?2leQJ&lln(fCKGa&;zhbnQ&m_3+tGh){9-RC&(k|CSAnP*3I-$I4Z9{~fmInED zp$Cyg4HR#vyzBf-Z2XlU%Vp9se)Zqe{6ekws8RPd^$Y$kn?vvMTU#>CaUS=6+W@DT z6kl%e4Q#d$ZD;msk2%!WeX!XB%Ui~{#F?knWQKUF;*{x@f1Iw@p2QUnHr=+8I{5{w ze2bv!8kNAv4s2}aZ*g+DNxj00!1d* z5SV`fD*b=dtrnFajrdVLZk@%AllizzB3-D#_DM~fZ0}l=d8NUVg>+7l(=J!}Pt+mD z`Jc2_pDwyd-j|y4=cG7IhWn+7lcdtUoDd~!{D_j}){ zCTn7tPz3_P0KGfrWD(mn!y(D&rab98xju-Ca>gP@bbXk4Y~7`c?PR_{wFs6|Z#0?K z@6EQxtz@WMwc@q9z@tJN%6>L)-yW-PCm9H%*GJFtsS4z(+67-IR4<=c+kLIgdk`fAgppUWiUshB63 zNyi9J>TiOD7I`ck;ca1U^cIT^$Ix10Ftwl4)c%h%#p_+D=<(iWOHI3FHt)QS5uNV1 z4``2{?ceB2Hp8$+O(XSJq}HXF-totw6|gTkn>)?8Pu$aZkXkQ2udOs6wEIViF*z=6 zt=Xoxf>iX0>by43F3PY%9ufMyX)C$ItGec`wJoo`bGC7%vo$I0=yJJDo-@vtRKP`@ z&d&Qg8_dCWTf6q)_2_6Tr7}b4FWgJipi4;^w%oWH`DhgKhY>c^ZFGVDj3C|glUa%! zVV=Nn`X<=S$~RTy4}ThBtp#Q)?lSrl8cXx3{z8A#8~h-=g4v)+;4ys)jiWowI?hi4 zG}_WUOE22Ra%Jl}QL4p5ML)~Sbgpg`_a;oBe=)WQ#{486h&tg$?N0FomUVb)HO;uy zeAdnpJX6iI<`IL)ctp_-9A1 zbsO(W_dM|wky`N)9J4Dj#On~|TeMJyd z5^p4aj<({4k`t)8xHtTaeM4_ykC01fwB=vZei(ytX~27~%P{V*3V4T^&!2=mGhYx$ z@dA1=b%%YH31_Y{tyHvN2bE7v<+{MP^f8!7Kc}bQa_S^{gC$~Y8Ed>6?PL7hChQL=W|CPFp%7WlIAHtHa3lqLi4LM312@Eh!a)w$ zLG%T`k?qtLu#z~>%x5TyqPcJ+Ok#>L6Ex6+Q7=>nP~;LQL~kR@kyLag!14QVAe_Y+ z3Z218;KsBt`;bB^k|lHobF}m?%7`6c8i4|4Q^(;!I0kf}HON{R2FlPCAQuBnDg7OI z)8m=m;1Y_1wdfUO18@e5(N20foD6)ZLdbzznehmUG_hA@!X34Oyh6obHmXH>A(7}J5CAtapGYm`h8W1F^eJRI za}xfCY-YNdP{?Nv!XNM*JkKmf(m^|;K^~$?J_Pv#ofO&_DCAxBw0 z^Cvuk$dM>$i)}+Du=|klxHqbzJ<%j=3TF*VOuWYiv8<*ya)zbLWN;6CkZDDhFs(F8 ziUJdLnQ~;Go=*W^Y9GA`6oCMiPuq+f#mbRVaGJfFW@IV+k2!cx8CSGW@nW&L#?vkSVwE$|kT&vdd3PZKi=go8F%3Y^eb zL`D}PG|OHlfPdKi<~mr$a;$IRGh_gR!;SDT?Z&KPsg?rxg9gYTM8Tvn*(^PKgdPrk zfeiFvn!pd1pG#y4;288Fy%_NDRK$VRcs$FyG$1PxOXMJX$9It`wnKCsvKlP~7KkU) zK(A%k$}Mw*a%4_0{@_1Wy4{QuwHBD@K6EI&1iTSTTFAcLxW|ZqFOtd}fnQJ*ZbHMr zRXUr^A#&E2_X7_&8(o65(is?swQe@ZrVi0xSVDFdJPZz^FThjaj#e=Zs6ETD7NKz< z9J__gU|GQ)7=uIjGGpHvl#rd1L#uQQOSsIe0lfZCvGqnNCWaWFr9AUg!zU3vH&9bW-SqAtswVgV| zQm&qG03yM1Ko7R-{w_>IU%*dj6#4{N4>lm>s6CtGyb%Ds85^*Xo&sHv)6kzD4D67V z)GH(cJYxx@C9Fs7po73mx&-**$LUH`j!IZYmIGpuvkU=NFh{YkNIf$VQ-e@iL7!nc z*+6PGc*gLUNHl?2&x}KoSY~bjYEA72XVC#rN38&VSr43q{!0td>j(yaqdtfY8-1ln zJ=n@jKw`mj#2>gKJK-VLSKH`W3;-GQW;l#xwU5H-)NYg^>lh(@fF=K0s26Ah;>XH7 z7==s~$VL{yWr#cW85&uZdN5*OiC`;miY{Z~p^DC>B&1 zHj%`7+aXglIe>c?4Pa|u7C{l@hUvRW8~#mB7U4l2gFnG5w3EDHUWjkw?4tDKTyPqHiw4qh@EXSj8wE6M1=j@t zIMA}%rCqGh8HfQ$J2nib^(_VjPN2@QWZph%6G$<=F|5?wAp()DW)YnW5;$_?DL9Ng z<})X9i=HCJvP?+QS>jOd@x2=VFoIG+{~a63)S7NGvKqe7PGj zkx@+OsSU!{_$C8M4yE^UG#H=mVaOsz^S;74#vR7r%zV*cIE>uIVOKde581N}ZVYw` zv1Q5FB;Ex;Vyh%y=@1>C*72s}EvCT;%ABMkIDvQxY~`FomXRy4JhPm7&+UV3(}ZYO zQ5nz`J4C)$mr*OZ*M-BVE8RhKBxfh*2lfjtC;0q-rL`8F+O_6aoDW<}(H$aK8HZev zhFFxr5<>-ZlVidjBld`rKuJ_M%_8$BW;2aTdVF-P^sT>Kxe^6^Ry_sjC)fCAL z6lB5eT%743r*nH->}B*h7cb4~}cn;OLK@QY|4!7JV+;7%7D)C&)H-8a4t?P6BQT#pFV>1NelGX4cYg=xfYg zm+lr2RKQm3?(Q7ByJOU`JI3zrF6s@Q< zi3g%rT!t$8TB=mlGS0&vAbQ9Ge~q^}pDQswpH0nFm!v#IV|~<>$T#9Wxt_Fy_^6IY z#nxc8teQ!T5f{l*h}z;OnIrpC9knUcJBF3^s%D^q^sP8ZZlTIkf~BLZQQ8AvsT$Rf zK0Cj*);o*dx}JRjRkDRZ5QN z;D*qxG$V<_N(t{Jud4dVqlJGYqkfxVzh<-W(=}8!XzOSPt0qbfr6be;%@R!|)R?zX zQB-X*iI7B69cJol%+n52-=LdPPv|b%)_RSunf50|(4;O!ds5v|wTjFn=TVQPo)Rmq zR(`5H#4pfpJaeDjE!UJM2_iR|o6WXo3z!wmc~2aBnLFd^Zdq!c=d2*N;eOfn*uL5_ 
z-Cfuz(nxL~+e(O(I;aTMWa5!BR4J2Zkz=W$QgdmJ%C6so{8Mi}T>K#UQl+qWx8-N3 zpC2NppEwOw4_yb}FM*?jD~A0SIwIN;KO=E^Qk^8CeCOmh$<-3i zm&=Z;91-CcXQ-iiP2P}Wxc}T&o!xC*Nn&AYQAY9S;+>^GOP-Y!mcB2oZ0TUVg4)95 zvJ^{-eYKNyopMLJH#k?h2D%2j8Aq7AAAg8l>&^5eask8+s;BJ7&S#zhqdZe&y2Jyx=_4N?_L95HrP6xz`N$>*=%)v% zV$MXbh<^~@HgR#~*)>Mino#{g+OD)ARX?QcPfUwj9a$WhZ@57R%H{bYcbTQ6xJOAu zi8KFj-p}HovhgK5iA>EM*9KPz)n7q&}iS@`wXewjRdQ)PpS|6^?bwTKO)ivc&-LIlxmZr)9qOp*tYOaq zeqk;6%I2BG++dgUUxayFGoTpsQb*Ku+vN+CPSZkjLw%M$M@)o=mnxR$j&Tfgnq9*A za+Ty1B0=7(91$i++3JV>LYyt7V(p9dIyLUta#@?4HsLLXH!#&$m#V6;F)=H4aaf7r zPx2!l=;~Zj`}@Ank3WQd9R9f|>&Lfw`TDX9`$cz0&p7V`Z`y|`^8Dlz#U-rcxi8rbF%Y<<4^lddn0=zcO&MF zFi6Q(PLe^yJZX$vmrSP1srpc*#A9+RyvPuxNLKT^ywyEpU3F2Hy~VSeDdaZsKlqj6 zH@TJ^t{f%4>K{cVrG2YEsoCw8zHMCX!aMeA->YR({hd{RB+ZSP96G`OzHT}>kW00# z$>00s>4zt8a^I-mY=67_L&faTzh0L1cYl;Fkq61AB&&ofGtpxufQ%#vDMd_{#-r!P zX7MREfbqDRIgdH^IF>sXxLSEupcea!*X#YrZefqIkJx2QzSr&j*E^Yc#P?RJQRASW zOUYsMN>x*JpqeM|$yO6jKu?AYN~Mz2N!o1fWPO%TkC4NWt~gcV*u*yF z? zPb!f>OhDhC6vbb5<1Rf^IVvTHYlN+Wi+{n@=E6BWpTZR~16eck*t^!d*Bj`);92iU z^XzqJd4@AGGmCR_V>ns3E4Nb`!XwI;7ZME7jHIY@xUaXRCs75c6*N#^i22GP>6i2d z5vTI%*SbN58-@g*G`}JKJp#%H1O`m-=l#n9*9V^s_6y-dHiT{qZyr%C;%4~F(A6Q% zprQT^e1;l&Yc*;ox-p#~Udw-pr&!AK)K$(UIafPZIUYFfJ1#jCXKPmz$8}q{y|*jV z{lYob*~As)e(T=t8NobcDl?+DjrR=mnIXOFytA1Qb~d+xo5t1QPw;gFJN%=!LXa?k zYszk7@ACD;zUW;sQf`3=O+#u8T}GXtpV8UWGFpdz6>Z2;ViTeZ1Tjb1BqvF!A}j0$ z=jRD;gl_^Xl=1cXKB#e8M{QM?(+<&{*IzK+@NMV!%=feJ8sCY&+x)HvFo7e2)WO|D zBEy$Qj*V;(5fM>2!Y6!C=#QYg{yHDEUZZ|TWs>ccd@-88!!&14!S+7&MtFO<+dBOm zPwZCvTbtYBZxPJ5%pc7CEaR+W>|-3oPQf#aoy2wIT5!vFKXIN=z!wPT#9g>|jZ)$i zmz=INk+qT}4V6EOD)E(=CfP+v%#bFc|4&P4h%`d(t61e>h*oTq4@*15=0a7WSO^yX z5s`Y5n@AbrEg?o2${*qy@U=xw+#;?MS4lIK&g5?Tj=HI)2)%b28B{*Qe66M!qro`N z)YA99U$K9)fSmysf*OX6jeH%oJ4zR|A(Du6h1Cw(6S&gvr}3UXQ`bdXPFVo~9#*?@4$$UG(EF2X-Ntflui0zD) z$H{M{u~M-VBp;G%D#daI1qdHRKATAw#qHt?5xs$==ZHQS(bMY$@ep@XE8>;!^|g!( zOn;f``b_cL@Bg2_%D=wPY2$fQk#80ML;g?vI|fj}*3gIHb0T*{K8^Si@i}}zSj&*m zfQzPf`sUh+>ao;YrH^z8ok8j_*F4W$4;`#+m9?F9qxH4*mnF_TuB>s{(6UbE2Ij)D zww7tu9NRZXocomLyr;yo%{z*z$-ZZ=a{ zCZpS!FK$(nh|5GN(SdA6y&%)cM`R_+O!|>46%LWXW6ET?i?mDnEG43+$s*|n==wxx zFP!IH+$nAt|Bm0p-{Xf1{iQmjuR2$KRee|Ux8CCu=l{X4g6|NM-lxpB#6K_a&!D+M zQ-cj*%_7f4y^b_Rd9M(ogiLYK&fN!<7Ss zFC~(-$eYAN;u;xG7g9OsJ2nM9d;m`aO0FL08i%+?Ix!tx7+Kc+fhI^*Uezt&h zL0Lg$urYXj(5c{hVGF`5hrbNvLe7M3is%*jCZbmO(69xekAp`BPVq}Hx-^gINNN|^ zO<68XVE%SrcI|O3aLsXM*dJQsEDbE%tPgBK_BFPh)=t)UR)1Tpt+!2UpY7P^y5rf! zJY|=2_xX0>Iw?+WCU2FSpnFJqh5g$jp$zo@)x4_xkNPaMb1OVv&IrHIQcq! zW4?nhOk5!T61oVdg#z(8uzvkTQ8*zi5-Xw)j=%B-sAXP$F1Zm==q|=fuf)x`GftE; zkzshLc&ORh9k7{mjEhW9{3-@*4*D;k$gi7kFW(^l=)h`0mcV6!hM*R~eL}j291d~? 
z{0x{D*f@|5u=>n1oYuBbH6nj1^QCNVo_D+Jpi_ijzS^;&0u?&|1z?tbs>%Qk`CSYnswI`<$dkF_|mb{6KMiTVqC-f%kK`sR^9FhB>3)y}|;YUe>rGZj9 z)ej>kLI4HNES#7SNrZLrc$#m5BssFNob^(+9WBu*^%L6|I zZwr|gTrJ2HBnM>$lfiQXhM~H(#5c(=-FLllp4Lo&v&DYRTFYj!0Ty91 zT8CNYT0UFWSxc;ktV^uVt-*HL(ZN;hN^)yGOT6i98sACa`IY=@-Yndb)XHH*>Rif5 zM58w#Dj7kjksEr2sHH3~fiJ&8?u3qg--tdyCiuyra+>_NTqdU}E0rb$Pdp)aQunED zlm&70H%b$9RBVCHeGAdQZ#}vI-a!YXYEqb(ARORnp@%SCxFR4ejEa0mPXOz_R|UE{UKR&Ec=EYdq9j`~G>R~(4{&n0r@452!2 z;a+hwxj?ozvyUlc7P5No2-}J!*ngOvY!-K$YsRH@L9UM9=gAICVMM04cU3zRNlhhWtLJmA_VNTjNF*A=wjiBs9+7q3cDavk}%C z1=`#rKc0Qx@FnMS^Ur;9CVr=XdGnVPWtaRci?cah3=_rA7j{SoiTd;^)nLtY{U_52 z-`4&DQ}OfS+Z)MwQ0G{05YWNSU(-dW^4kzTJzlBcn-*MUVbx*P zch(w|URHfm#mcdbL;D8=7+X?4?0Pf*Ys)Q_>x@1rK-Y@rNUfaT^ zB{wZyTs7Hge5g*M_C-NT;~CrC4-%E&ENr*6_m=u%Qm zbUIx20HJ#qus@=9b5xe)TFCFdinJyQ0H$Aox3mSdTJz5d>g z%wSO@WO|impTQ^Sd~{*rfr_cAL(=9~Sy!!bjhyObslf>yBJKuG@Y$tVEUtD8FY54P z{Fli&!?S}wD<6~(iCNvg8Gd~%NG*xA7#&AEenP79oj43D^OpWk^Ik_Ab{U=+d-@#o zE%W~pcr5U9z$yRxe#>ADN10j}7HhJQ1Jek>+<9-7Gtb(25n}zigjmuj}Ak z{%mz&xtJ|>mi(1DWCQvVy@J|A^hS2kM^Ve)gkIcLrUmXN!HmT_8g()^yi3?Z+>s|E z23VOC(6cj<{tNX5*Qi)(ExC}`s1zw>RE{Q4Z`U6(eF-d#JQ8n9Vk$03byR*_t#8d? zHQ6e`$#0|cLazi2G&CUAxM~y|etL37X7Bub^+U6_C2vOjH~!0!pUymI;fT^3wl|*T zf(mHo$-;WozD_eRZ~wBbiOIeVA7w`@Ki4)TO2Cv z<|cT5IFH+x+ZNfZwsv->{efMLI7S=hG`|`<{wN`eLLH_Dt2U~-sy@@h=}vSP>b4?? z41bS1%Z_CFdlk=4?-ZthJ<0z-94<~8EE|!nk0USOhIN=|Nfr<@kkvjUKcQ}E!u2Z+ zZ%zLO^^G2z_+N!1l@_INsckAhsdBHfw&LcvDPesAwwX%Q<-{b%oWdpFo8_GNT<>G6 z_xsK!*tQ;Z~S1mX;e(B zebRg^#!1E@M#eD6@Q;qvB-1^J@$v*|i;%&N_H1>ob`(2$I%hZyPR7yMb?$GL}+Zw_~OKA$%`swq&=-Nq)Jdq zVB(nQVPRDQFB#?$$2^_QV+yM0Hvc|6=kmw;?-#tk`KkTaB|rP;-7DN&cFsYw|4460 zGwp?~pQV3k9A-LhbQoS68X9jJ<4l81Z;U$(H4I-2dyF%TGYm=k-P+CS74%_pyJ8b( zai=;ip=xx#(j^UKr8o4{V@-H6M#mRret&|k5d@=@+9Z5Iaf zuee3rKkRrW#=FAv+%pofppi@lH0hSl-(?NpKJQA%W3#& z=7*vW+~>yM*j!YU71y)aod?jPGl$U8f2hpr1G+%t7h^}`G5u5BOI>~47~NdmS8YA* zCe3TjNZnIiHJzZTsXm9;?-HVl@>cxHk&K@wz#ZedCb?++#( zIki}5s=v{F897qq?I#g+m66hIp@I+~Y~(j`E!k^~g;BBB*`ZuIZxn6|51^h_CQ)sBYHFQ&SXt{)h4&|#?Fr|i8`C4UlT14rXr7W!&BjrOvpoaVVIf*wLFg{^$eEo433%Wj=Z za;$NTajtZ=al71~JR2DeKL-ezS8_vio-QRG02OhII4_4vOGO#cl6l+|M(-{4EcXsa z6}^S&#qQ)<3rW&=`MFHte$kj1L}roG$vxCwRfcYpah7SbuQ~91M2EOm36~OUB(1K{ zFXc(<(A4CLTN0YY%!$|$^42d|drMs6T5GOR)c4o5uj@Z0yf1#syifo1JS!*X#J_urj-Cqq4Jns6q)O8tG1c?k?c2_m@U{75`#kdr@d-AyGBz+i$IWGrsikR%@xCDe zS}RZcNS#NCN}RNXZ^0h+6gkh?8{7W17Feg*+S)5QnmLPH^}P9Pgiv105${SKc`4#{ zSBQBE0l8l#jggLvGT)io$=+k1u*ccjtby&$7P0>PIAJQ{DBY#Kz{%K@_ewo7L6xjM zsry@h%TzINTUg7eyD@Fb%}(5$+_6%ZR9&jI!nMTF<*vrmk4OrbsGTUcU>Z5vm31qK zM*XexW8ufEpM$f@WxKM6d?WJ)mDI6ab6scFp#$j_)mfd>@W$B9RNZH#?{U8_{=NJu z|9gHH{LFr}{rCEf^nGi(WZY%Q)LqwfQ%xl&$jyaSY;*4xm)pL_`qMnkywhCG;%i-N ztz;WuPj}vQpG8-(A8ZxgD|jSX?geDQHDque$(7JW{fe|xOcs}lC&kI4pBOC;6YZiP zmC5_i5uPTBh)v`}^iFO@e^#4xb@UeuF}^*6>V&t8+8vWt&X_1B?WkBerBkJwNp!-@ zIBRsj@MQm|nt$XT>^$dkbEQJ-k6AelJ{vyQ&N`hnGpk|tr>~QL-6*>4gZcqDBDs^nin!u@b3*_ql) zso_U?iSA@|Y5?j7zmorVM8-vyd-VYG{-}`s*r~Had zZ4DiD$2F%_(bRW2UO2%#aF2D?vfr_^E9+k}uXt9mvv^ubrBeU0WOJ!yoxP=NuBXr& z$Ch!e#2)f>Wft<+Bgq-meL7neuRf)US1nQXSJRra>N%>u^a~1THEIS`8J&;!N@oyh zaLM%%;mQMwXoXJjSsnN*#5ZDN^uluU%TKN_tzt-p+U38++u}Y&TSFuK%Cxc6J+V8J z={Ra0RvcE)CNC|obzarH9(l*}7ZeRGoo}9F^>-w=e|V>GJwzX%qq@;!)kWH0`r5{6 zrn5dfd_VZc`JVI%@L6E2qo1MCt3Hy*vx+tO7t9*ZMb}p6W=F97r1i7KVQFHWVI686 zY<*>&WLsrF>zL_Ox%RtaJ<*Jt{fmDl%#oHU?aBYBDysGB6Pm%$ch68u$>@6P4r?o- zA9Zu}2vrjOoIFqbM+~6*smExH+5!3_zOlg#!>>mYF(q*|6MrQwu8@^nny4$cCHiQ@ z$`HNZam{`C0Ncye$QoDDI)C=hn%}Q|d;7iD&oj9b^STu7Diy6qo&Me-++}gEQcQlM z_o>S?W3)5zHBqb4&e7aeXQ&p_&B0;EQ3)C*|0&iJ*76L;u^G&EZz#UHd53tTyf-}m zdIot@nJ>&c-2Cn^%h*%g5jty@d#UgJ$O 
zG(AwIxsJ}Ig3;N18hCd*qScB@)-2ZdG}ZRI7ce;ZP3X1ow8*rm;ZesUpM{SM{T6i7 zKhu<`TSzA;k-{uSbe*)vT6>tMl*N~!4#fP~T+jN|cG0mDsFl-RUv@JWC(IPjNSok~ zzrYNH4@6tC17eDgi8bgCze4#WJ0!FC0g>+nel3^9HpiW{fQe$evUAxiHkMn#mE-U7 z1BLg3QEV*!Exr|hh^+Wabcp3p!zsf1xvFFkndA?OR_#=WXou+<>rIA3h7e;v<0|8N zW1+#S|ET*zS5CKCyG1iyO`u-3lBQJCK|4iz4-xhzdR89>+m^0dpe@w=r%qQjr=}41 zzm`%5yNyVgfOhd@oO?0;(}rik7PZ z`-iUM@&oxeCIOt3I!QI~n^TY*D8~MViGPWQa64TnCCd+i9h;{NQCb1p_)*56)({JbgTxAA2qtO#NjxV=M1-5*HnR)%$%J~^?o=4{kUS2(eh{eH zBH}J_pE!!oq!V_`=?GA=Af!4x9(ZuMut{p8*Rg zhx@mP9_desEku2y1hKlwxW#o=#-kVZSIj^eMqGhi2_OaJ{Yrt%{0Hxtfvj~e*(U9k z0ufcODIP!+0(6vJ8xn71*27>Iq1!ihW~Oyp9Mrp8cvsF_p~ zsy)@5$|iY4jTP)Q9G%rug+}}$PQ^WDvsnjQjyuV1;CBdris_OaIgyn>#+DG3(F^(p z(74s9hExhT>OT1xb0vmhio#7qtVUysi=y0A<||bcQNAh<1%DPx_oa){b4ixk$y;Qb zTnlIOuMz<|1>jESz%D%)LSq4);xTum2&juQpvDiP5}8eAQtL5ap{A;f>bNRWT}7=^ zA3(g1r#I35^a094)+DzfiqBw5$z@sh(66^?;m8 z29tM*c9=)=7QEUKUe$O+Vqap`#Z4j>80)d*QL=yxrP4u{@|2Ex2kP}F4e&KPAfxyY z&!43v0_A!ItG^A&{31JLS&moQVaMyx_x~7z7NI~9&jOzPJGk9|IB_+yJ{gI5ALoe9 zLsGQPot73$eWkWiAIN*QR2@(B!5o)hVmNTme~{skI6T27FXrVIFE0#PN{<_JAuH=kHfA`DQA@jK>Wx+i1qII0f#H<pWN_CN8x>Tth^3RDG%DVR1uZ?(6obb!b6~w_TWx7 zS9uCN>sPF4x{``K<}G=qTvh&x=Nyu5Ne<*uhC+`u!u>iF=XV>(rK;fP4WQ&B$dyQ{ zs2C~$IVA^aB_9CgI*3dm&4`}Y1fshixcsub7?|!F_-Y2u=q1krbz+e@|Lysw0TcZN z&znG=BA??co3ugaT*7X9lg-IDKm%;V1fwLNp#xD@VZ{!z^;w~A0`AJO> ztDi!wgal89e(OtgLXKX8$vI~sXUDMrWy+th35%5l(2h;O_m!aCBNQ9rWCqAUtWp8! zyEaN6V1BP-KGHMfi#}rF&kbd(vIuze5TyWmejg>8;23wE>| zkOaMuT^*0EstZBWlge?>b_sZY6sj+Mh*l0kfe=lma# z&pj};=`f}v1t~f5NmMFaL7$*XSjT^e>V6`6lW$0WaDF>zN3_R$tR&DY z1?bou?0g5-GyI)Tpjd%6s_^>bkbU+sXjH2SbL(DUb1v0HQIbo6QBEt;T0YVqMKJtEM+B zdK;n!^gZ8CiASj^K(fDS3edEG{x!+N~?3bMJs*QvvPCJ>A9-kUfJ7czn#toJ4H z80%;N9Tq^CAu%^$L*_#Y`YBaW8Q~8@;%Elx0iC)0 zI8zm#Wdna4$0;s=tzL}1j|TT`f&F-jCs%{NF&v)dOx%bM<1?+$bu|R_ByYhThad~b z@aG_Ca072Y11?*Ox39wtxYc;~8Cb<^tc`)C@dMQcL#uBCZH{0j*9`1&7C5ak5r!3* zan?8SyPqJ>@37)W`2H2Xdyk*IgVYdsXAPXqDA=J>#BSm$xKIcERs)(ifV6|c$6P-+m)wd(&S+!bTZk8viy-+UQ6 zcn>)b!U+$>p7%pmGqL9XVE@MBJ;BJ(?uQ3A4D_G%KmPqsF++pO@ayv-OMgQmIzv`x z;zWO+MX6F7yKe*zo(w*?0Dk@iOGLsKe~J3PEGP zglC}Uah&UE6xg>;_QP2U<=K`u?VHFl6a7 zPW=^Da2RsC0UYxPv?^9S@cvgo2Q3C&9}sWw-dOu!z-KtdPb9ldd;F*x<#9ro^ldMEJAm7voMoX0G9cpD+92SGg- z?0GUy?Dy2k-~Q4`=$g-1%>!^d2MH{|Zc4G5Q{a)u@VpF=$Z)Kj2aSU8cMN>qdeCb1 zAV*suW9MKGeu5Hru%_cUgD64`f8{K8@EN=N%^huETRP$M#Q&b+1XZ(P+k?QT6`}E> z@Vp?bhag0y892T4{IzksG0@Sncn z6eA#M<)CF-;Tbh?ZoQz5Mu9KaLl+Ik|M$i%l)#S7c>8Z}F9Q<$Cw!FY&>Vkby?Iz) z3f7VanTv-8uK=12gq*Jgugu5!wuCks4$V3ct1}Qk@yUU(1izzqNn{HYyffuzqu8S5STdzE4Khy(4~}g(r#7^&6D#;JkgXD!=FFS;0?P@b=3= zqPpO=*!6!N$9jC~_iWQ9u=m5@MXrEW8Vrg}z|VR^hm^&tv168Et6F!QTypFeARVoraNq@2gzjGEclLzlY4O_Mcnj!|V zwIA@>1n@eC^Z2FQL}X|>ff}`an*_iGD>jXmw>c z5kKKU*&swH*Vq0rNPRByl4|+~aM;U~r&LE(E_ni4ART(U8#1mp$s{rk z)q5|Amvl3^jl4!F!~D!=iVYmJ1hs*(G=dr~2co*>xUvM%k$>eM)D=2To+>p(mF)ve z;;c*Z;$`AB`3=u$hzkE2sQh0<45reE)-oq`#O-$nA!7#OXtIj5S|rHRR0K7fY>b=L zFL^I2!S<0s^f!W{R77Lc%{NoN$sDm0x6^9M1Vpr|5>9Ao)EdieA;)W#O{5jQB^>fO z^dm^8x&dFl1(EG)#7%S?7^k!)_t9;rAfhI63!zjkL?-%D?J#4GR?_4p%6(!1HJMr> zFA&SiQbYjv6RC<=2SmE>T?4A7pLFKTQQ97BGpvO5T$afR#ui zmk}-GOG;Nv&F!NcC&nvV$pG{&{1bc85E~JN+JUOvgUT0F&piOL?GY-|^MPW{LpRd| z`ET5RP2@yXJ!r=j%1u#{e-g#0QrklHgU8TGnIhY9e@;~lh`@a(;hz(KD|4VxN|lOa zA}|Q6K$k4p1HJQ&x~eD=fxGo2tGCr8Uk_`P93>U1RQ5N8l`?Wy#W<0O+*3o?5H5wBXxWjRvr2}`&eH~pHR z;09p$UdgAFjf9_ESBW7{65kLHD(9kAZ04EGH^OwgGr~G?7;e=a5NrJ^Ka*17UHB6kNYz~0N?o8zpe>%sEcuT1 zA=_X@Pl(pkXJ}AL-i@vb2J~tPB5iom2RVfLLEe@VrIXT7at-L43EAF&Xp{*yC7sMs z?!cG2j6BUt>JxpR6yyfbqzv*Hy%htz-6@F0ZIdeCGu0Fe^bbW2khTdOrBDhee$tIw zWmV!|L`N19Pv9vXRF2E-fW!z@T!=~fpk4!BB5{@KO1a>{enAa!I?)Fa*SCl+-cV*p 
z6EI2kJ5fqEp)yhJGC-P!z8R~j5b7>0z8auZ2782;>pk zp%+7rd>)=zT|^BdAZ34mXFCuL$*M|oSdOW%$A^j9%3rcWnGF2w-IH2tM2jAOxCAZn2V3mAUV(=8?P!Tu=TO z=U^@DCT^xsBn%SW;uMKQMFgd&ktG}s*94eN4B-P{BCb*xkq*Oh&WupsBLY2H#}AiGFW=*ZREm8SAs&RA~5Huhxg_ zax|^gM`;rkhh7|g#mT~P)M_vs&40pN>7}08p81&Sn(3|Q9pGK)ed+n=x#8L7(Ro5V zYdx8sk)A-$NcSyQzH6LkvG+9U`>wf%dXF=`+1;#*RdegusayqNC33JIk&#{^r%KPn zv2risE4bhlBE;)Kf&R$f)uhYOe^cRfBrtLY1T280hZ;K z1oQQ>5#|w=KdtqxqpeeIZ*3Y>XdiGCIcGUXI!@bWTCZ78*lIf_IRA03caC-{E;QBl zUSwLZH@Qq;157|d??)|)KXNsHL$7s%r#BB#pEAXWTHq)21A0FFf$pwaN$v!drmN;? zduW%a-qD>^7gekD5B-WmXGM;WNQpWZcP?>7vOB3?a*K-gGZSc>VXFms--x~$3+Wa>L^#dSfO)T zo61?0c2)S4{59!(;=q`8p=|?(>dq+x1cj|Bj+Fjmvt8@U>gTU6xKP}*>`ED1c%k5Z z;h6$kVWZ+&rSaA?*6OH>sVy5 z6N!KP&m`nmI#{80^qRnR2t+oZOG|vF$q-UF}vumPbtleeb<2>!2;r4M| zcGdQJnXBFz?&i)Ihv0C!0=<6BBu|X%m-DN$zU!uYCUc3qBqRz4gd0+0q8*h_PoyPS zs!T|01l@q@tki+*FBbL!#}Fp874*VK=DPcY?Xc&Uz9`~;GFP!i?3I|2(bW>n)oa#l zU+rDmkqYZ7ud6z$(z*(Ua&3b(#+ic6Q<+=lZNVL6oVK-P+X^P=;A&%Di8g?YwAF=5OW_AuMU8{8{1xP=4xN>D~v}xEtN>!nyryHSYk|Drb>nqASU>nTg}}F{{1BUN>iyYNIx3 zrIZXDMR!zGR+Zig4C}|<@H}%rcU^LJb^Nq%v)?IOW(l-*SGcfSi9M4OV?Cku6Z)h) zt@gfNZr%9$v+J2_{#CnaZMkx4rCD(w10!^QNS|0v9LG2HRIryTE}#GY=c#W|xmWU6 z=FcdQibhz@x;A;oF#+5l){m2fHDm{kNn3}0C-b7nbpUG9#@^%pW#hg6o<8nL(42p> ziRkq54Z3IvcjCJEHE2x}&RR zCPajF)1RmRp!Se)npCnG^A(jQ=I;Z)FaL2guUSz}!KH%3r8etA$5rPEr_;ULyVARX zt4;J)ozh&Vcgh9gT0W0`&j<6fx%HfcTI>>Uu6GXGj~(dl?I7$1$9?xI?5Hi*kD1|l z;|T!H-0Vi?Pj3t}nf=J_Woge&m(}?OHSUd=+1}>f4c}g^ zES&TG+}F21&gTadr9ubUi+#*9E%hzsEZwYC?A2Y_-Z04ml;~E{E6(S#*#KZo-0TME zO9QK6SFsX1m1*Zb@3?NCM{2*-Qld~if5=xK zyEQ^;JgaW3`m_A#FhA2t6{Y^7+o#E*hD#rrrLH+;^Yi2LKIiiVjSJ@IZ!Hom5zcHu8%wlF2Go9J!9n2(gAApZuAP(kUdWU&BpvH{# z2+VkXnQ#kTB5vT{w+KIl52(_A2khw(AzPd#@lp->jMP+a0~wK&MZ{2e8&^m(x`=tX zmx4yM*K|MNxc@`{fY48&A)!=oQqZZeM-dw%uSV~WjVL#<+^3km5&0n%{3)M~rdfJX z8>h~qm#Kb`E#=w#aqnVxhGUnts^z?8x;4c%-nP;<&3@W`-#*V?V)r_R0L>?RlQ@wt zCoTrYisTj*M$fxt7)bAR)*QAOH_SAgQ{#KtqS z=cy;v``Fu)9n7hr6`bsTZZ-c%=q~&%n1l_ymw&~li2?F1AffL7^>Yn190A~@FyYdgO3F!22Kpv=zG~PQ1gh& z!M(E&YFD=lS|N#V$Yy%VTw3Q6M+3(u$6*KQcx=NtX)0D5=0YtZjXM6WNOh>FroVUpNY(g4ZvL7pP_mreoMde*vL>dB!ICtJ^j!qS#gF!OdYP?^O3L;9DCz zHO}phS$38EhRqJE9b@Zloo6{^zGB&9$+a%F`#7t)Yr3nrkGTm?Ez~7@m?&;0KSF2# z?KD$7DAobyZnoG}%oHn0=Yer)FGWi&B?3{D4Dp3fNBmDXEOY{=4IsXe#neGnsHRMN zOSjV?n}+!dLGOawh0Y6m6=99ejhPW^k8TyEiaZnkPx!#_r2VJ`H6qIU50Z=pW;|#ni)4q-jMX6N0Di61NIv+<#0P&)=@Ajxf6y zr@qxZwQN)Ay3*sNk4k4^LSN^SB_+C2qU^i*Kg%`S2*(F!Pj{KSiT4s?M`g!su9SPr zhlm5j$3lY8Sr{T3B{k|IR*GtA5jqXTNj>Gm$|2-dPX8~f(i(A_QOZg5QM#vEq?XmA zf&EnJzvyK{q3MHP?V#_$g(0m%r-kVvend=)42T>M`ETUs$Z3(g!!L&I4qY4)61+cP zkl#W~XMER--R(kPJ^>(JrD@*uCHx0sekbrd_%Ns(eJ~F9|!jTi#G~ zwQ$+~u{14xSvW2~@7L{L`+lv-yOK{9&M%^ihZlE5)nvDlg{6DVvus7qnVu)!u1rU! 
zE^B4Ka0hwZ>%{p|rp)7ZUk(_U9OP1VBX@J3I8DaUbySa3dsN%$6nZ(Lh52L|Zm$i= z@<25HM|Pu1s9buM>W=EZxI^mT3CWHxYw)Ar9tOFvHxZ99mnH1p~E8UKNcfyi?G> z;8ekaLbfoWXkX!?!e@o&il&zQQMSYqZA-HUJA$1@UB&KWsC?hZjpRQ?J}4rjL1=W?&@dzVh6RUahS-AhgN_HD@?YWm)tI4Q zr`@J*plU{G3FJ$-!|+;nICYK}#sSl!1BMKd6h#lFmO{<;fNTgAb_G)`Gg7x|2eIHEhLft8CsMb$&j0vHK(CLhu{##uvGz4L4rhAraj{UAB z&iu`^#&o=lFojsIStnYameefVTS%3z16{#o(>PtIxPRpcan*L;@i0Oq$y+TWUQ_R= z&txnet^Lnf?zNnKqP$jvSsQC08_EuO0I@&?;Yso+U!=C~JZFgHp?;%zd6l>kZV2bN zeRL)&31#ky)v?AW(2`NXz+T-YFI143Z$f0>LHP6XC36zhh}>YEag@ zB)qhd>4<57X<2FalCLF`ONW_mSSFUQvEOj^^b8TBQEgUH{eTlp3h{!h!_?*kO}aKk zGnI{EYSZKBqfpr1#~d-zL#Qa^ArH#OmA{bRkE0$zG04y@=sj#xZM43g@q|~LzOQb$ z{-Mv=05PCX;NXA`AY1Up9Y!DT4#p*3D|M%}U39cIU2~J0#T1ZV1kln9BbW;Z_ zgO#bIflcIUvNtH6yw0@HEJ4*@OKmS*fv%r!kgm4DX}sif&6_dqG3I-BLx22dcxsqu z9P6EemCLUCpc|rX$__%rEs06y6t)_jM7g0Ae~SBNV`&%mfF;Pnq!LZZ?XubP!#T$} z-rd9V!o9*Z*7+Y!Rjtb}SktYS$`foA%ZFRp|6JC2AtxfY#q!E)qyYoIDM1C@-6ppqM@WJCMjUYIRi1kZ1o*o(j8A%!ozALt4pU^;H_ zY;l*ndDjQW!g32d7B4yruvcn5f4JhY^OSk!@kc#J-2?cR==*p=k{^RtG*?L>TT#8} z%JgZnE)fYeS{C6?v#h{kFfdN$EEmjuf||XEnXC~tE4c~qT&tuxqA_ca>L=^kaM!7) zR1#;!+m2&LGJO%3>%(o)-qF`L%+>DVzOz*|^K_(6Vuw>RKvM3FY7v@AR~sqc-6g{h!VbXXp7V(h+F&R}YF;HDwka5pf=Ml5W<`)&Zpgr2z#{J$Gyk2M*F&o)i zaPg_BuSQ=(Jp4J`M0;Dim-(pNmgj)aGfCRwdF^WAc`ne>NM!|B_3OoFLVvL_*e6$U zy9kmVNO#3b{5sT{uHiZUF~3LLEB2PY3CmpPa60SFe?rFPfk?S8JN1}VSHu&_cGZaf zZiwo|j?nMlQYGpdr`HF04b)#{p$nGJ<8vOfNA;vZqn%D9N=52XW~}Be7sH%Gu6C2Q zNWWbp5xqoTF_{>T>SC~rg7&{Q)gw(I;WwHVJABYs(BT8=yi&}lqirp-7y@kMyi}+nieWPjR!liCRscM)si!F@~DPMsqvpj>J>y zxb~%iG0fuDk>gYy*_Nd=Ixdra&dlZvx-XiF%yZ;E&r-#7XLc&n9ofrZoWb+dS>y+L zCR>eNPM!mibPrL3BehY?EBS;c!#&cICR9YLwnjTT#jC=5Uh;f$k975ME#Qk4L$8Nr0PmjvL+QwrYH=knPZR(p91!{84<{ENk*O`P4@QGbuJfL zsf*|XtcUHZHyXd{29tWBi&&GM;FW5;tv$$oqrYi7Y8TOqn04A|USEyLy7o*zvMyPh zxduMbHqBo8kUCM=!8?R}`8^``{_+a-A-#wj1va+o?kM$PduS^%6~#@iVd81x64g$< z<0kEE9bfn;@g6@|xGxz9exG%Z_g~Re}Y729T za!WnkzI-);;qJicv9~;gAlQvuFLph1i<$&gMkpg{T5B`8b#!aB9&wGH%3bBOOe^R} z22gjXQRElZTdpMyMMZ~G{fF_>tl|WEF{1J=Dv0Z?5t*)JFtL}CDF;rlBcW63tX7b2 z3Zdd?ahg<9lKDuXy+R`EeqPSzZPMRNDvJ!5GDQfKyMRU4QN1VD5QYiSsNtLl))6cA z5Jo8(sHmH$d_df9E-{aL!!A%_g*`&9d`G^Eh~!9V0m0GlkW~g3lKrBss5wBdP@gM1 zX-ZR-Ye(%3PeTj2opvm$~VL`Wf>C#VzZ7og_*Jttfw8y zzl2%oD+Qpsu9G}dnI?VmRCc#<-{9LSFGw3LQzOV4go5g{3F-@sfg?(1cn8cR zv$R+KAoqd7XpK67HfnC@+UQ*DMy3{>MimneltT4Am5!>x4eCw;zI#7CrpCwgf4-2;6Cb{z5 zC3gotTbf4|Gj*6G^07P}8TK)R2QxN7T;whn#xilbx7w3TjM7E03EPS3tc|%&zE&Mj zlHXQK$?DV=B0{+V8h#^^0}ojtk3)NL6xsCMxs=gAj{J=9jZ45!c~&|*N%r25H2 zgn`mf@^AJa*O5I+?<0;#Z-mvd4wWh2l@`ha%=#udO zRj5+XAqB}FI)Y{NREDO4$Pg-<8bnT2E=yLpG5ZKv@*esD^AReVL1G8~5HE?l<@fS* zRUoUgO<5=ICqt08_8>ReojL)gW-^qXzp8)^kQ!_YV&MU1DL(2dEH@(rP` z6sFqAKMUb1-i*hP-5)r5nd#Ro$=Tap3hS)(^ z;a$>AzN`*KWkNE!1(k}?WY?PN3$cMvB080B)M#$Jw!S8jm8n|rhuDT&$tI>R zsX;AZsWe-?N@frkT^)i$= zhmakcjojpaR2OP8aYcTo{7ntzg0wCsU)hRS{5$0fZDPCAFQjYyS2>Ouui=>Oay!s| zLzO{P1vZsk$8?4^Ar86eO2ixbCX+*~l5Bj6Sd|E;?%~(th1Ni1R)l039xXo>) z6w0Tg>JRcf+>6I-8~%TZR5i*W`w1*J?&G)u~wA?Gw38+RmD{bXUZL z>{Kf%1y0qSp&IMX6f$Y#B)n&Rs1*mIj`*F@40?K-vQd5`U6oAoVI>uA<&Uv9)s^eY z_r!GG?yls108Vg*W32O*lXs4E$*xS#pJH{?lfILiNExC<>Y$j^4H!2a>HCZi8%&8x zp%^GEldMWDcwf|n8uc|?JCwmpi-C^N7i9AnmrhWQDOrzPQDA>&KX6Oeu#5WiN9Y7hU(%2 zRuCTAwPqluz?Jv4^|IHu&vG1aI$gv0 z5Ya7#swTpZ&SS=~ls<2JXTLpQFNiAmJM(QftUD~b z&C^Uiri*0@O(}>zgDbDtuXW~b7BefFrte3b^4wUC1 zC-9D6?jGexupTs3F)cI|nlsI5<}!0%^Ek`w@~!q4&R5QwuIcW+o>9U$say?U=EIBP z9g|BpVgoc!bzKabbw{~#VDfBWoV1O)sGf#$=(B8xCeN(Ku)BQn!^R|-tJbdhJ-KVG zn3^4G8mk?ud?>De=w$y5`q_+D)w%mx|0yQ&W@SJ4@#_16%)q?+C9f@`?2Fy6gd%7G z?jQ#mE%gx32?~h3zF?xu&U%j5wq)CBTfY6GUAn*MZPFd*DC?%O4@Fe&#vc>DC4Ue5 
znVfyDpv0mVW9j?a^M*2g9qkv!NFJBYxNeqjvkod}?7oiIj{UALF2Qle_Q6_Y9dG;Y zNOK?OYY8ts&0NK{EK@{LTwYY}(EL%w%SXtD{@OkaOP~Dmkx#8Si-qsXRmQ&mtk=**F zCeuuFilc#dZcYEL;|A!uU+R3V_m_U{dndGhQ)6}P9pij;taQ)C+Pun6XZ6ac@}m2# zW;cuObb6ff>do6lnYC?ii6#DYbgy_mZgcpc&=cWnLpz7H3U3~^G5D&X8{L3jM>H31 zn(Aae`276c-Vfp5i?aXCTblbOH@E0+*;MmTYcc zr76rbHjkPO<^C+P3VoTFC0FO4I4av@lT>=4)-MSQ@xVx!f}OL^SkA5$lsR#Z{dmJ(PeF{jcxawhulMiSq8CEqjp339q2v0N6MhV zW1>c_>Y-^E7W-N=&T+N)TweN*w2zP8_`D5xGd%sq8|>><{Umix9`*LCc&bI_85%a=K(^5XmUYFByrA5PIrDSo z<+jY(mUA+1Tv34Onq{Ybj_^qRME^l6+{m!%bz(Z~?Hbr_L+c-1gZiKByQ1}9RbK|S zp$5C^m>=d$|G4G3^Koo?#>?Zcmc4uVY0B3wnJcpz7Q8Vva1JCqy0O0VgPTW$#6Pao zCFyOPEBs^NP~RuIUUWzKh9koqTFR7OwESoL&Gkg6iRh3ZeiS<^9pn*WZT_=+ACKFM zI1hY;#i%@H)J!s#namtw4!}|7U*-h;iTFcV#7}k9D-Sotlu{-03IhuJ=JR>yii%59 zESs%wz*jvebX1R$Q#AL(|E^uyqPXR>#)&B{I|uZ;)Vo5fI=>wWS*;0^AC_0j+59^E ze#blR2kW1;d=>gW3aXSQc^`}Hri+dUu@`kl*U8x3w^i_e(Z40_tF)xztLVdF!-Bf{ z9?=~oSG&7gc9twEoo-394!0G@P{FQm`6Ed9(X< zue_qX8+wNt>l(J{)@Y6}3?YIZPxl0 znni4FrVkv((mlQG8J2FQpT#2zHsq#fZ^`PN)jYd(?zKFxg8oI7%kGr(uKhxg+FLs! zvTE(RO-D8xn{uID_r4bfPVeDqI>c&_S7MAZkDLkrw+WPs97Np$3onw*W7n_(L*v%+l}h|YC{a_S zAkQ;LnQgwcq3J~Nnf$dmd$Ut=TIWs5|57llIM&qEzSwh5OjM>)7ySk#UaNDuVXp=c zQ`U6t(64{*U2UEvmn6=QSnA(P+s64Y=j4YEFZaE2q~CpX>0!4=Q7?9V82sZw(K+j9 z_rFA}Cf@kM*DJ7l*tdvkk;#!$BcDXyjvW&BUyLBtSyQspa(m~F&zo1UxVWEntDSYF$z2VLVz1O}TQ{@*fu`*`dH4UP zU+*rDnpUdLR@f1y_&CTDW!t_setY57{nwXYTzGo_NsVWZUgv!Y&Hhp}%iO@>E6dDV zgUfGxaL^tx(86mwzC)W6qDf zmPMrHjPs)>K^!U$kYb1;YB#eD8IxhOhgqq)rOVPqYxi)g5$}CVcP6jO#r$=b$G*U- zmM$*L%lVoWl2!ZXqaVk9w#<22Oqe^^c1k}Ce@2UydQ|CGoo;xtouk*>-U(f9w^&v0 zYxSNn|0FZ`)7|GIAKiYC{3P+Et08Zj(=OPCjg5@>}O0 z&O4GP6bvbDTe{G)%CnSxy+2=T-#YKIya4Lw4ior#U_3aWo@~x z#4m5$*GI3WJkGrz^XNwU{?`ZJulscV+o}A1FJIOFc^N51J&#}>*T0F5}PtK?;d*787E-66wluTXJT~M-I)yj{RoiB>Y|10Ne*2>J6Ki6fFInxSml)kiXclru_6_I?xR?xLU zChCdqr{-U-BG;I8fNMLGsGv?&M#wY8ksi@8)%MfU*mR-vNXdUCRZIQLd~C(iTI~g2 zec-gvb@4?tB7P5Q*05=fM!(e?mE62`TD6o4DS^9~8NzhOP7|N^``4*&ra%8Hz3wZo z_qvY}pTB)wm6Klfr{gitDM!fPIaR;ZM;jO$JT16y2p@VqA~&iLMpymt^+D785{>7y zfix$@xqdHCD>D^M&F_)x$xgxP_jqQiY4PK-8VB}q8P zhk${Z=_s|gvk$a&Ezba>xT9^1tDW44acE-oE}tF|w~``jnroBE*|jGnN7kvAe7PD^ zp>FUe!$5X|TFc$Y^f-6z_wX-OKezm%&B**RA!FQ+tlaa(S*Cx>IoDg^t01}2-z1pC2U&woQQf6DPhGy)%}(jO4xnGB(ak-)RI>6u<&iZE^l#8c&=Z5 z_d=ogY1s+}vN-d8gV*nik&>~W@?+DHbGBZ%k90dcx# z9P|ty967E%!ghH%>cP4*%ZxE0E2F~WURSUuI;&i-Uc09JTcw1}k^aFUevkCksE?jl z>%d|mFE!_McBjlr-@kml@%>`fwY-~!rNx8GNNX==ju4=>AqA=**U1p?J2;?6(5m1m zAQklzvkbYJeY-sauh|7E}|cnT#3tqO9%k;BJloZ+D+fjOg|6+d}g z&WiSB))MnI^L?|Hze@CP$Jr|!lHz!dTc*HrZU$8frV_80Nk#Y*XHWhZ~0|;Sj?DqZ>+>@rLQTAg!4@%oGxf zrNf@<&g1rD<*Amo<}A~D^HHnG#yDTN?s&!vbH(#;8rua5`ZX}qByu>E0yF3^W+!eV z4VWB6S6^tXnq`{dnr)hq+7;SB?NV+sbB(UUe%GAQ&DHPI-!aVduHwHgC@`dM@XNqj zfinWxK)Ww*c+8EVMyfxh&wNXl#Wt-xz5Fh8*9u~Gn?c4+wV$%Dg~qfmDn4r3CHp4F zXGfazk^7dg0e-m^QGMHhJ;NnxQZy!x<;tM{S}tb@HTmc6KCWcvYsUs>oVyoai0A(o zyvw>!0)9|xDH+I5bw*A09aM;K1Bt38ahO~H1=MRgj`h=&X!{#p8<+WP^jYeC(pcNO zrO$hxt3I(lYY;toZ+wN{xA4w4gz3CB3)v8wLiVNu+*P(Xwm{KcSUv=b64rUdozCA7 zugYm^O_0Zakatn1{JYdn7~m;%_O(wee~+8lN9(lm`?lSV#jbxmjm3JXtJ^{zq@L0v znMJe}RZd!!mxsuDkP!2sf^C9o<~PJ|Af&~h#`-$)wmre&drG`Rg}Fx&lrCVX+(oUl zM0imy`V!kwGf5k)&o|UGcJd#dx6(|{d?3vnKW^nM3aaUgCVE% zm5+)C`HG(9?#=G2?knzWcZ#PD|6CX?t&x2cKiqrLz?TG;dD-eVg3i}LQ} zi^{`oKB&Su<(%d|%2$y(ASYM@8LuwPXm$@bLbFz5)HLUc*xFn-&dfGq`!G|Ht%;>t zlZ#PPpAYqKQ@F~MfB;wokI`&YHsTt>oMG-W8<~NKwieMWoe%z3b!2QdQM2gQ498Mj zgl4yPkItq$r<)E+{zq+qR^s-t$LL{XBC-csF&z5)R96nj?h8F8zEr#?2ZP6u4aUbc z#Jz_|b;KY1Xpi3Qa=vj`?DyS-W04QL}}ej#zt3^%gYI-iUub#yz(-e@qAgopmV2rbvBcrgF30-FuTA*q7u|KKft&+2Kv(pB~Y0s$IBVgPHDdsEnk(N$PK^;>_n82 
zhv+`+BCfmUp=P~y1a1SfbpxQPn#A2=A2K)TK~yGO=w{2!q{~7m{0n}%CPT+yaJ7Tp z%jUwSzNLJbcb; z6Ejh*eN!HeDxpZB0e{oe(qlxuN;K*NX9=g_H0&YnQ5%>+TrVizw1%FB6Z)6B$J)`F zgKS57E8&zSahl)*Z3E-J1K-C3o+7@oI2-Zb-!RL665T)_eNL_gsjLw)K6_Cw+lyRD zTnBY07uD#GWJQXXUc#H{u9zs*l77OO=Mm~4BIJ$oD`@i{sZBvpe?b$7U7uxFup`-c zb^!AieFsW`fv8kG4Q}XPpjcNVgNXj>Aw`m#$z|eZfq||!7}=g@VrSGg`k}hBw#dVTVW^tz&dtsTbAg`TIq5+jK`<5}!_Jm_|_nOa=A>yO#6TJk{7V!!#GTDDDBNC%zJ@ z3Gev5{8G3o_CYrFvG`MpP{ylgi3rNT6f{q65euXwdBQraxVZ=}+4PFcB z{C7`#&sX_IC!ANBsxCq>dN~wWxdKPw-LS zDFMnqayr!WUxayZhmwVPqFGFqrhxIjRo=OYi&?LU)9mN$Y!aKs zbYcqWYoL`5ruM@hy&5W7Z^|_!qgWGF6ifJS{9-73XQd8PGuhQ-o($qdh*EAkNR*V93AS`OOO zFwhU*kRzxx%0`9I3A8``9(lM8sLEXqI?+OqRF*1z!S=l<*OH&ZI=})OsjgA($qDjOiIuL1d&GRPn>0;Y z2)$So_{nvZP*5j2fm*SczRZ}}Yg~vX9{vKK*_+HNup2HCin1PWaShz$x?&&kpvXx} zP_1wTC%zic^%tOWIt9d#Qz`|*LKf=t&ykT-Q>r;Qz;{9RdkV@A&HYd5*3@U!Z1C3Gllcg72e(Nw^5L z1{aW(PL%VdyJ*dac!zm-5}Z|)7OEH66vgCoY9XD3IA<>7VYWiS`#YV0{67T}0-@BF z&!FyNk#q%=;6K48{fhDU4Yl5r!Dy;TxS-k$LmmEWq60X+fz&YS3g{#^P^W&Cw4+9O zK8OjS$~ZYwT8ZkL;h-x|k**=*v`VfFx?^W>w9G_0Ih)c#C(@5zPd}kE;5s>!K7mZ# z2XZqa`ftENTcloAs(}bSLH0(y%2cVN)K{7#9mAWPLZ!qRr7xIA5#UR@sasGV-)3Uj z81@=8$N_X!Du&b(p=wWzh2P~UxS{O;88rfAMIX@8wt}to2XdO7!4iqb4p|7+_iJ@2 zYFX=pHtmHTsR~x*Owc8J<4wnaY}rqFjb~pcKZA?w75TdSLmq+r=qqr5E`lE11?z8Z zY9PMO3QhDoC_d9cg6Ilr9gWsF4FXXrvcpLrMPz{+S`oYPeYnqEKov^{M%5jtM5}@^ z)tmHzYidKZLU)>`w}BPAgLH#e)k{5qGnMmy^;8={Y&ihB#B!Xio`a<}1bmXsV8}(N zBKCtwjILeaV^sma=OJpKzk}qh!lTrPD5Kqr3?ZJBv!EQYYii!^8KX}?_ zc!S%>&<`Lhp_)_%dF3vMF9{&IEdg1x9yo)8zzy63{*O0iJBjaCL;Vig=TcB+$HAE` z2k+1r+JI+dJ+#9GDu*hkK0>A66yu>3{HaOm4pin306QWFy2c7<$FyH}Ce-aeCX=vN z|BGF_5!~#Ou`+=<1C~@f%*U^&s6VN$0Qo8wTCK%OW2L3y3-`Y=IIKn~6F~_L2e<1a z_)6ak-!80ZZ)qUP)>dgVRXAQ#Esp#l1h*oX7~AdIU4sADO` zIuis%&-r2AfdV-0ounix8LNzjzD@N5nYuox1Z8N4N2r&wfdg)pY0zbw zf-tKCgYO^k0_T9?=mM*=6FC@OAl=Au;ErB`l3^Q0Lw~fx-{`H?;8IUhta5K?jP6RM zQg!)HIZvLi)B>exK4^!3qCbw}-;YI2@;%UTCt=_34I1EK^hqUhHrgu_Yv6XW6^Qh; zh*9bhkf=NUk9gij@|7xL%^=XrBbDkHotr@uIS;MwHdGiq13~>FwTfy$eIXBl%^Xbr zi+QR8J1HNmw@d0d^)@)6dd%(M|FOG%@$FtmiDpeIT6u^%$FZSU~SO?_i4xrrpOEiH4%u3wm+tBT43aX@R5L9M>X4dR~yUbEJ zOA6rf@akB!f}JSEY+M17$0dB%hM)~T0FU_#&X;pQ^1BT}*1?BKKwDTqMR_Gz)kC z<{qeq({L)P4DOykcy~68p@v|7p2ylklTNHtGchtggDE!#q+JGgsmkDcqdpAmHWLVy z-9W}0f<9RX8su|)_vN6;)dgYf6!@ZpL3(y!%y^-{d!U!^VBd(xbJoQiZith8EAV+H zgYniB^)Ch3ogU%GFP>-_R;ffh4;1Sl16~E$?;JR8+tJ1sK^pl*#*D$&cLBj}ASl$Q z&@=CFc7295GXwX#SVW)l@GLGu54BPe$SG^_FH10*`lwbYg>HdsTc-G-dh9xMPz&%C zoyfVEk=x1nWJOSbE@KAV1%vmc(hM~GU07v$%j-d^2!ifbf;WN%XG}9(vWJrsu|~87 z&zD2bJVYA|z?;N_sdg7XE@LG>4dUZ>a2UU%4k8bPJS*NT6T~E6tRpO#cn-962-c*J z_;;7UJJMq3_XVde6m8QS{HE#{Up>Lr{u30qIymvAg8eyKt)dpA{_QYU?F*PaRj>;# z#n)FwpWTCEY$RCVx8N?Cij{LB)-fljiIM2P&7d4h%37tnQeR2Iez+9;n6HYcG)GTm z;AyJjlV*a1cL`%64?7Q>T2b#i79{_P@U@J>oIHrRn}qv~6%@Z)=#4{o=22KxBGf{R zwRGhNW>g2vg?Av)&B0tXfz;ie><9Oa|7Rc1Avb}Fass!V+F-$6z$|uRRJF$}z6OTV zcxAfM2^{!&$}yal#(;|{fsHs9l)rS$PLh0vr)z}~`Wh_C0T@~BLD#0hgc}QvONOcg zA(13xJR3n|;Em4VjIb0Va4CLXi+wK#Ej<8i&{;Tx-2xZp3sC@;{|zw17ZOwO=GDQ* z%fQ^-hV^|i_WH444v)a+1Y_iO#2&RDr`JG?i*%?lN8=uI1^bZ-YSewK_MNcb{(~|7 z7*#=D>SOG4tp&kCj z8*jJOfx|fT?KllqE^OUIRod1%lNJgKL0J&gC+lC zuxDaFT@1z7Onh<~&YbnYiF^pQSR5$nZ?P6v!S1;de>()v77s>Xd(4Vo{Mk)trC;3A zZ{S$Qg4Q_%&0Oz5c=7|pi{1^2Y!xZP~P?z#l?C=R>GHmo#u@bi|d zy)mvC+?8qEJV)a95Aalju!o|45#!2?cNQ?eJ&5=8BFA7vjROxJX+oO%Wu=;IBdt_pEB#}DY<8I^qyMvZ|7PGJm7~b{4 zP^ym|?Qg6_G8h?s(Vu(p_H!{_r(s7mfZ^%EPHn}in}hilj$OAcSf_*VwomZgQqj}D zSe%jA+u!0mQHFKq7rpxzm9{%}v}0H;GVuP-a8`u}48CRt7^%P5*Hu9gev1(`0<{S=Ofo^;Zh)*jpi%h7$yCpw` z8$t(A$iFCK(R*Fc2XQ!MT>!Dhh+X{~ZegQAmp+TH4FU`L7*^Y#c%w+nhQF|8tiU_; 
z!I`K8pE(8nIR&f2e9Y*H=z}X@fqp^HY{zV`gw>=Jwc9TgRf)&^I0BOIdo>oT!&Z#C zZFsZ(*sIoICM*R@^*6Mg8S{|9irohD;y(N`<{*-p4GnZ_+&);;mUO__>WvYw5Jrd$}uG$x0tzT$;MdOUl4ynjl3CL`mv-Iqh=R&zHeAr*I@Noi?{iTbKone zIH#a3BeC*d!}`((Ot9X{Ce)x?!Oa_ox8Trg|KXHX3Rc{6^l~4Z**UbvS`h9ju%f)d zx*Co7S^;%zd$7XiFyv}<_`9(*O(hBX45o0^Ap@`>yV8}FL+Sw!#?pF*tV(Ivuj`;$Afj<7yl*V zj`Kq~jXQuOUzE3FcILosXb1XLgH`AkXMGR$l@DlB1A4L=`fm-!=`t{DTVmg-ja9lA ze)kZ&$|SI%-je~SB#uW#>pQH*D=;E@f*{rq&#)dN;vK4{m*ccNfEt0woC|AA3fvs_ zpa&acj0a=q-GjZM7gm=z^#x|sY&b*2;8u`|=bDfE>U=o)x@zlwRYce_R?W z^}_viD(>Q&rLEE_{B17k4vo0mR+qcUf5?k5T8ls^T>@Hz7L3V7;6LBPNDN1tPr^*i zfUigcXsC6_GR>t|G6&gAE=oI2_ey`DsWf7-GC^uCwDO#GRdbRKFZ(Flqw-?wA!|dcpEb@JVjX4OZzaoT zmA@@tX6tE>1PSJx>!o`fuNOB;rLw5xgPyn#yNrr4`#VOh33ZJoY65+aend~FOK^{y zORffkb%*isG|Qrpz>$<|cM zV#|EXeoJGkVl6CxV0&f1?6~I4b``l9o=F?{vku-aBphuH!tSz(C#O z%~pX!A!A(Ny~SsIUPQ@N*n`+HBJ53GEy)7eb5iUZch-OTH_L6>&*K*@A<9uV9G1|xKw=bwlc*AI8 z+>3Z2{&HMqOogZ_;X6a}1NZn}_GNv(dTrGgXoA`9)C1Khf8bMGy={SJUCG0O19>yS z6Pph8m2c6};)^BEOIDQ##bG5cOSYByTbkNNLVFW0eMOa=gBr`sVGncHHD|P)bzgOf z`e1!O{aih#AEPsBf9EPOtI02lBo5@^59b`|Y~)<$OmOaV)U|8NbImu(=9FA7I#$FN zhnb(+Y2m5*nV!qpHIuce+VMK0f%Y~A1ciNwd6}@fa!A$9RlZi5otRp2cf5Dp%a}>g zW1^-<&JKSU(kgJ8Pnw?Lc;dL|@7iSjR{CerjDn&0#d)*xuH{b1shzzxGx2BX_x$hg zf85BbleeO1P+4C~i}GCCTgNe1B7am|g=*c7>@01fA=m2~@_ZiS594d2>UGJW=;F19 zxQ48T*+_Yam&!pYO^ETlcY51x)=8FJ@XEHDZ<};wzNJFR4O3S6Jy#QPsj`&lO%>6; z+*$1)uLr&-gH4fag)>RxD`!@!o4Bvy>Iy3>Zb}GCe3QT?w6E}YY<$!QXgInXduq#Q z1K|-<-NWn?Emg~kii-;q^SftTemwpdS2ph zsp+ur2bEN61A9?pg`eebxS3H-Mn)2OE~y!5w+8RAkC234{p zzN+{s_I6B-*dy^(69y!9Oq!GUTSASvqKNT9=X{>%=W(OyKZ$p;RT$>}Vs}~gm$5}> zbJzWh{{Hl9!1uO4=4Ym4f6bYg`y?kSXF~QLxxxAQ#lubY?WL}#sDogXeW!VQMQTp8s@o>3l4vEm+2uFGM6XZI;@T%KuqYHD2i zzND}u!u;9!P~1VCLZsx4vKjvJDmO;o+2?Zb;;7JysR>--g!mIN5mC3I+sEHZ%&L@? zl#qBkzF#aEu`=kcPk*oFhW+{)y4Eb*sKohBr*%$Q=aQ!d=DdB`W3oSFZOLwt6`5K1 zBmSrS)Be+$c_wQDytlK>hVtRav(|F0L&b8LFiE*ey=VP(nR?z(-SDUGu;#R8gLb;^ zCu&br_B<0!w<9koo23LX)AN@n#I?=U3Y45Z<-2W3pgf!~S1XAzO|`P}cXAWA47tm2 zaw^k7TWsJ0eZnuqyDKh95-K?2zDJcrO^s_)X>WB~^@mkYS4@uW5iumh>37llmiHW= zmp%ms2h&de%X!-}u4G#gpMNxeYHmvIs+>Ey19CcLWqzOdt=0GOKTc#?GrQ!iC^VZU zTGDLOZ5$iqfz7C#8AV~OE*^AUe{3HP;b_BU@y|O=+0C;*-AYx zr3)qQL(bg}!B)p!wS1nfXL)_cW}9d|WQs0bZ)xP{t&&tWQww>Zg-oLUsdu_xc*xdp zL+q89r?COif5xOcBtHO;M#Kj~N4pm;~wk1`+gQp;icLiaeK z6Zp}ulupD-%7+;ykgeJq1Z>zC9oK3;UN zU}655oDG?dZ!zD)GUIZcd2C)cc#t+IN-bu}I-82EyX^PiKyCDF5xvx6au^-J267F# zs_boM4%bMRroW*3L-UDk%r0a0Q-32)@fhd(H2IWhMXhriP|Iq8y!6pk+qK5g)$R)l zVX*6=RDt`hX+mESi-cw5KEr)~Ti}e~zfU6c9%b9j;*B5ZKyx!1csgWB|$Gr%>-`N^JKKD^vw8c^t&BDF$sQOAc>h9_q=ghXXw?u)()w-;tm++mH;)R0vwSP*PoL}J+y#W2-0v8>ffqh z&L>DK#i8brisVJr@}vi7HqZtN`(7|HQnw{%+lBQ8c#Z%0m4hBh4(z)dfs}U zqyF{)n_msT!6Ch4!s9!~jE(dT4~jSxaU-0KZW~uCc1PF`{~6#HZ3x`v(@L`gHNFF= zX38p|z9*CaSD>A9EnQ8o%GQ>xETM{Gi`Ex+D=94QUXoIFy|ii3xct(B*QEo>eeKW7 zw}DzY+nMaVf*@IZfaNIGbsu_sLyoGQ22SP`3szFW_ zSy3Zd#MVN(P$peRrtz$_(9_-dAHRwGuGy|zNT;E~%$s?yALSe2+sEsI_W<8PKEr(X z`p*q+9DX%|3$GMwQV*Q9 zm+A|Q)s|p-*HK=`ig*nbrr%i~ZC|FbGF17Eey`2d9|G;7J2WuK-hN&QM#+~6nh{(f zWOC4oz>9(N1G)vS4)hPY8t}xguJ0|MOFkN7iLSEd4R?*(%#0@2$%8@D4dExdTe<#r zc6BbYKeMKpkDDS*X>h%cHMcNdGA%6|TNZCFv$O=Y=$SRh#yHkH{&u(>XI+8(Rlbn_ z&fgPO& z3XV$lG4_g%X3jFF#&yZr-1#q9!4q8u_gU91S2g!Xx3A~En*xVrgis`K;3xjeALL*2 zH~Cb)iTF)=iX7%dDM+d-nZ#fa@3cxsMD@0)_YgY^#(lRbnTB}dFMpeNL2lDdjxpu%X%4}i=1 zO-~52|I?8*4wcNvf*%mJ3-v(=sE4OtC8i2iyjMlx58<*91$Xwwo^I}=?g^f1d}X-# zMhf}7FYqgO`7}WmtrCkI)nss--oUd@Q5O3x5RaiYNSOI4y^Iez=1?^WjLJ z?`goF<30RZJ{UZuO#ZD9EZI;);ssrd8+`v1MA~4uSZq;u&Ui|&W2jNb(;Rb{ zslux8O%7w~gXY~CZqnIIOXdQccjItZuZC>%GI|%ipFR&A?q%HH7tlPN%Xo7s8gFe? 
zt)KRkrjzEiW}SAgZoU4keyaYcZiaT2CYRgAZRKv`Ttq+}I)h$F(;3D>H~67 zEABV$6`KPVk&Wx7sR`=PcP@iF$@Sv$@g0{kfy@@VI$CTmG&Hx#Lg)g1gR1OrIE>bd60 z_89njd_TS`{};w_ZDD}WU6>(wpqzPwcH1L5#mmxDxs}oYYOoj3fK)*SREsS8Mk)&0 zq}AxHBT$K+fbwA*`<4yj`f+o)?%Z)UjE!LbWJ9@0_}%YZ0=JC4#YCX~`Yqj$@nYBg z(k8J5OfmRImFWx6fYfKanGdLp{})=oL6i(XSKxH0CFDcI@=stN|K;I!1+mx%h(KzU z&rrkkK!j^HbQtU5G&~0R#?JB+X&`2Exa1He@d6yheZ;N8aACRdPB4lepl~|G@8g|( zeW9x`TgVk|z~Sbev`HEUKI`vLCw4`I^9MMj7m@F(gvj(h7| zCt`jb!4C{Ud^H7HjapRb|9<3?xzMbc;EFq+^d*nNE8!+0^@kAsU4@wF80-v3vDfuR z-e?Ziu!iXOAW+!)fpd{2pON#Sjj61xk(h7US(0 z^%e8^D*Q{r=yr5Bx;`C)cb1{JxkOE&rXx=IiE>g!|HsieKu5B*UASD;?sUiI%p@5l zW81c^JGO0`8QZpPGlN*&>2x}676149SJs`iZe|*F>Zs1U_p^Z~_!V>)ZuoN>wi;Ul zny+YR4?Acp)?ml5GRy|bj6ql&Mx*ITx0LCYHkH+{T^^nIF6gi3qV=>C>UV=KPJ|LxoQTaKZU^=VjFY?`Uf=6DWJ&u z0ln}E^usGaL^Q)s08yD3LM$el5mrzMnegxUF1#WB71|;d?+Y~lF#Hb8*eyYMa~AI3 z3dni>2Y)Nqji&MMKwsE`KLjVYY5Ynq1WpN6Tnp|a*Ny+h ze}L9l#&v=FJ)9T$)q)uqqx+zmWfU93I*LH8ghft5@4Eszj2sL|eE4B}6MhbNq zegm`BU*bM7hiFIy0r~JFTzd_evG0KTusaqGJ#_+B2iWwZfie6S@+IcLN%j~x6Yd0g zZXcmOFb7WQM?epq0K3o^pe{4Oct1 zBxIQM99H7~;v)F29>^q~3l-}{pol&TdaW^VTD^g{^0kE~pc0(NH;2(3!B^uK0K29O zByGuAKY&8zhc^A+C(Qwk)E>yteJ=h6->4BP9AhE1I!1H?IcE}RV-^c$x}i&WH3|(og8HG2s?!VP%ku(=EC>N2Q7{aba#*8=LaNE4263BF{n4xL&}ghpyi*4o`ag@ zYIyc(3l)Y*(9>1G>evF+@k?k!tS8n0e)P}=@t~ICK{2}tda)i2fhWKtP;;J$*kIPF z3Hq&B;s;36T_Z#TeS#Ah@ht3?GWdgVx*i9=>mmrkPRM>Q7ox>RpzqoRiS%ly-W~(Z z#weIq0-@eH1m<%K)Qkwo?2&p`V39{A4$ zv+@e~`yWt)8xC5EFVbb#c@i+p+LM-}Ugt*DKxSG6L$ zc-KRu>uGE*UFJ`B=9AG(oa>-41F=I6d7ksPOfBo{s?8sz+`bBWfFi@5$}OZfkZ*vS zILdv3n-;VH+v({h-EquPZBveRxAa*#Ii*!_BI#eunY?e+yXl&24=k1az_kP?)MaV` z+nnt#{g&-frIy$6Ka_Pv-czF`vHT>lin8-3uzn86L{#qdJ@CYn7XGF$4KX{$qc$z=m4UZ>l@Zd{Z48`4y%N9`+z~B!xq3g6?An5tp;`C~ za<%Zvy1(j1r{ZC-^ zTF2`7yTmD~f-S^)3wPO2q7{?CdBrJ^gwaDYQ_8Tj_E*Z;fqy-7Yzw9FcweSBH_^LN zm`Tr}uQ{u;lw!~w(zw_RvQdbq{vNIcl~hHENsOI$=!@CW!k@we{=(-w00 zD542D8B69UtR=Y!-%sBs-bz=vp8hL}Lb@hq;4*~1#2QGTTSq#$-pE{Zgf|^uA}p0? 
[base85-encoded GIT binary patch data omitted]
literal 0
HcmV?d00001

diff --git a/tutorials/tts/audio_samples/new_dict_entry.wav b/tutorials/tts/audio_samples/new_dict_entry.wav
new file mode 100644
index 0000000000000000000000000000000000000000..e8c61c12c66e1a07f674fb45876f87931197684f
GIT binary patch
literal 109100
[base85-encoded GIT binary patch data omitted]
z9NtrP%)TE*#`CUS)moK?}_<7DAEZ9v`^h8uEKtSJgPj|TMh>4;6e zupU2y4Y(4@aT2lF4EmZEKdUaHTP4DZ>|-vte!nAgYz5j@3bKwUaJ?K*Ddi()*$zeB zbgbq#$W)U4f3EFXp-e|5fE^rsSR!LP&~Cc$i_#3Hn$Te<{_|d24HPe0bM2-R9_!f z$}eK$FJfUM#M7bpw@oA^`2A3W12JtB-t`iY#j;S%n2Pi87irQDnx7T?M4HG%tc{dh zz-~_l*H(kP@D(y=4OY-TWX8WJWIge22jiz}{vS1`22#LQ+9R&R{pSQoBXX=Ecpc;U zm10$Q$Etac8^cFpFVU8WLB6*E&#w()AUsLImbru6?>6qtDcD6-aXOC%U2QH_^$_Gs z1`txGV~wl?!>T{Bge1&_U%}Js0cA`A=)#3q%^JM%-|^08;mKdcH}4Iy*fH$1W?1)A zz=0csClQSpbPdn*0QU9_FmFmh-F2g4&>GaW9f)i<5RXMrGOvQHHXgqhR78XgHyt##)3oc?OJjFb)lo+Bh*7ib>iZ&1n{{O#4 z$kZ*w58??j=O!SlR|8Y$HKsT33XOyp{7JBQnjkN0gcGJSR?Hj8Ags_#!URTA3KXRZ z@;{Km{zi7y2T!sc*p@HwJbGZy$4NGU8#x{HsdU`X&LdNsE-Vxd2xq~7{R2BG78Kpd zIA7)yF8H6$CA%U|DMr?vNNmBXJ%n3Hm4F%h)(stGC+vj9c&=;lEEXce`hci373*p& zo&+bUk9*sCu$~5D_q!2yLZCj&#*?WG0)2d7F76$<_y&(~-pz)u`w+1f)@%)|r`d!D znf6dpiixn^;BxSz)4gAYc39PQ3EyQ8}OztB)h~yFTqH%jf?uzxA6v)6kAB(8x$Iap-IB$1B7|IDGAoJXWy*pj9 zA7_UQr$T=`htnXx-$a)68!;Jd?1nf?rXfq4hRpg0xVVjRS9yj!oyIikLUE(`7+K1V zKqR8i2%O`OK@|_dubPLOl8$r`Ug9;@XcF-cV#7t83hP7{lz2mtZX;V(#*1-gqBpFPsN>c1OJ4t zBeW0|AaIO9EO-b0Pd(gAb8vFTNDd=9cS2SdgxszzKKHM^`w(RP-;gy{0MD-zXbmk; z138IPEgKmDgKT;R*ftOF>=u)0#keAfj{(1aPC8i#wMM=gxPWy)zK6 zj^Stii+FGe_wH6$u^$lYRv^wf!A(4fe6~66taq@cdgCp~BnOdYbVv27Bi?~Q=#E{K zC@vE@@f66R@wkm>!9{+LynY*YRwK-c6=B8Sz`e3M{)riNiSdXDI3FdM;JtK2{NIiB zRR#H$M;wMzX&3UTuf!Fy8I-3g+D4s*j`KYB(^2f^NuV#rBUgVbnT2{n9Q1YfFt5;& zR6}wWZyv{ecrm;Io+HCB5!raN zM#R2U?A2GK7S)lebcoa>eJ{JstdMt5m=u?k50#UYm6iV}`eA;@A%8CC<$owHDR@O^ zr9t@!6O=*n12VaEEOnEpEZL1}P$~SxKlvm5jeHZm#h&q=JMLpHzq6_HNO`03tBwVZ zSVyt_vHbw1^jA50mM?LhcYSlu^p^RCvllTvbxcS>wrdopL)rL9vW#ek36YKQ5^C&=*uQ)M&kFZ!P-AX67dX2)9p&T7 zJ&xgyFh{Z@#nA??K>f(jrKF4 zs@{f}L>EWDj@}w=izN9h(F zho^}<(KXPyviz1~k|W!}mOpT(_`eBfNuw;Fe5QG>OAGoC^e*VP;3N7SVUwdy#pWbj zOuCwUF?o0rpHM$ujJbp=f2KY(=$@uP`G~ndrx8v}d`|ZVd#Lh7r5VMq3UB2_n)I84I-&|s@XQ@(ppzM12dv^yvA4GRY#P0 ziUA6xLZO(g7_2xZ|DD+&ZB8wc92f4e6?~LuzU#iTp)<<4rM$r5vFDTxvE8*&mTKl5 zX0`Pn`xj3~fuXuF1C`Ngg~qDsuM5-H3_l$+FF{*rZ0g|ZiPZ9Qwd0z#iur+Ybw^c{$Z9W6zZEnCnKzJmgI(2Z&V9Z zC95n-mL*q9PDq)Uaz1%ga_gi$6;tANN9yz&)SHPI_KHApL(zEzu(U-ir zKUZhPX12?ypIIlX`j2Tp??6BFA%AE=VZjMwE3;T?cHH;OWS0rg0uG`xW>L<_nMF+1uJz+44&lTk?#Sk`J~K{0_-kIt6F$kia+c zo~)s&TPPd7H}OG|E%|2xozODwO+ufPtJMRwhSywB)tUI9!hbPaB9DX&4cQhlDfFM9 zdW!RbNN=irf;FsUUCB>#>ynP9cSWu9N9R1qI{U53w*#4_KilQ)D|}UWzj&MZfOU(7 zv(~odJC=KXu&I(&WLIflX{M|hX2O5Ub@B(y8TlvWZ>n#KW{iz~M_;2$s4+yQxR?Kp zeeA#QWqmSlb>9Wo6}QtFaCWs@O5T-hxBlsWCq~Nd;M6@Ocmo1iK{iQ!C1`S_A^Lsn z)R>xv`_boOxe6mw?p0<}npMm-l*MQa_hP3R@}pBClf%aZw^lu%teAl<_fPf4yPK7F zf;xMrCC!{;x>9&M_d)jg?2EZ~3c`$u#%89~ra9(j<{c$1tXu54uExGz>~WUiMsO9y z5`vR@m{FJl{f%rv?W5DB4Q1_QK}>|~DRm_9i&sX#A?q=!*Yn^dzt%GYF7kstQ(UcF z!^(R*KH6K_|L_Kj9(fkSQVl`Z*%{a@PYn`6Q=*5)tc;C_PK`>8SQ~vPc5h-{%9TpR zNqge{jmgG03y!@IC58;wIWz;+*OdjbnV3Rd&#m?5mCGD;%hGL|tX?x|+?2l{*OdFH zAl$goxT0`zzB5lzSl##rTEm{D`m%NQV8?aGbEm@FlrxKIL_DaihCnsR5PFn+hf*bP zgE{wi^i1gmx*h$N`d3<$zDa%(+Y1v|9XE!h_!+#NmH3;vcX(F2lH9g(r|+X=hk{cK z3v}|;_nIZ|)iXk8g&heSACjdH3my}4AbfAkpYh2Rk`g+{bA~+;gTj*{SwnDKQuJK? 
zVr_NpMcpUOE4hQ*CN>a)xqJST-dtCxBgyu=MP`{^@`t&r>4xc@=`WM2=xTnGg6bxz z)o$HqJz(ivYPapN_jmMhe)N3vALddplW`o3{QV&1R-|jma_Ley&K#spOZ!V}QrC!e zWJBsa(MJ56ZwddR5BysGJ=d9k&pz@6`I7zn*-74a&W`pyo-vX%Mn-StDtKD^M6#wT zMpsL>PnV{hp#7j-6jU)ZJ?enrP3*|nbJ1k@-ysddv!gf0I}BCBdh613oc^;9rT9EzCfJ5s)#CotHA4^^lf0ElJ=RuM_NaVo5phMueI85>v zX+i$~7aY%5vpd+s?00U8_>9Z;{9WFIJx%JBSEcj){hiBvGHSZ|xps@HOz~M!ODQVq zX<~ykVfOHG5w5U9q0u41`jeqaQFRUbqNaxyX%_|&p_RgC>SHuV6{qAXg@V~fN0Lh< zEMJe6`vV}04)lI;hL!g$S2@nvbfw9qxwg!*m!%sl3$5>LarT#G*KKs!UHE0zD31aC zYqT%Q-^thCyUKgQSDkImHRqM6%`fEr{5z0u{{qvs8>-R{R0MeOs@Okp2_1&JlA)+J zY{L!BEWF~A;3ld8Vf+t%y11XbO7$YnqJz7INR@q4ER{Eq)nF3j-{t9wVJcR$FsPq? zYjCCD20>DtEa)1fc#%hXE4aV7F1fzAs=6OKFFN$)tz7fqHrx|zhcdXFu5qV>g?AqG$a3ErU$}Rx z=aJ_xZv)?EUuEB7?*XqB?v7b3&4nUwu8zD?!6)){`4IF2TcZj-9~HzSl0S&Qhz+QZ z86{eDvCiRUXTUAJ5pfQ8sgBYx`9?wbtzXtc2-zrQNk74HWFVta+;R$z?{Eg~B$w6SP z4QH?wm}eBxFZ-ALh-$6`HJ!0!HDvzh@%IrR5KSa|((|O-@g_erEF&_D<+~JFsu7x3 znq`_knmL;18Ww;3Tys&ApfRY6l_Ahwh01ee3#8rXBV-pM2~2V$Dv=v_38!a2_}}_Z z`xVH@zxZzWli0=V2(|+I#y`ye4IY0({SVot=;(zDp?ntR^^iI?^E_ucTx{5O5Oe7x^86g!K#F}Rm~6AEzudJCtC{*@!_i=;x1YsVSx9i}~M)A?0 z_s7D!dnCMDOTiC6&NCofT*QRwW_V|oqOznD7U0(MgR^jb&=n7Z{zJ>|_5bNtf?C#rEf8O4>$M(=sH^XyZdLr9Wa&)M)!7+a1LbnH{x>az)7e+k3ki9KPjhHQzojM z0-p=MaZm6~2T{S49~}M`a94{*9dsaSUItsni!3|ioBA%kvsuL5LcM-OjAb2WXo#FW=lWN{poOe8+8hNdRj-Q<-(lTjnK`#1u>S)Ay-?=&P?m#c@7- z<|gA?9|e)8uee?a7L*{_{U&7cLHsIAnO|bNu?p6T)xO@J22I&QZx8PjxKEq#t%^N1 z(2o|mUG6_TYdtkR@g4;{-NU_myy3oac;jDvE&Oq8W3CZ&GnG&?nIVqDS~kEhYdUe6 zSO)s}dYY#n(#z-x^maOf&ZkqP?cgnY9L|nUs4~zHUegEZX0)5CNFShSX*B%x3c${7 zFa0S!E^EQe#J6q1Bh8^(SF17Wy7)_)ROq-dYl zTkJjTRe2YAMtLs4e|xyMs`tC+5}2l(H{EyEUzh!d?ap08?`|I779CW$#US?P5HrD+ zC?G3SPpJv?4?0#lQTjl79$ABneoxnumSawHf=n&zjVeVQd|nI1Qla3X9ieB_LG*06 z0ro>ps2bgs{+<4d9!V$A1RYI>(O$|5*2^P`rrYD+aa0fL7aQ&e>X;O~11FLU*@u__ zho^R^r1d}r@iA&a9mSsL1O|cXU&L>Qn_xcA^3UJ2GA<5ow6D?idHCHqp>s0=EbDg+0j7QL4)phrU$R3i0B-$^@3KO#UQ@|A&|QhB;SWH~y$Uoo#esTo zS__8)U_L%cJ@olQ#5wScZYMN{QgJYJA~}3ND9rLf2du(FQN}grDsvq$qiJT-SQon; zktmy+4Sq}p6ldMI;oM?)N~Xf&pfBHsZ^4h||Kls7>%U%j2!+HLWD={PJz9i`)I8Mi zYZ0f2q393)BB!N;A~c3-K#ik})C}A_{-h=JLaHM)upw}jjiYqb7p%zfIEzn`SFs0M z!k<f!yZMjgCAc@F1XH0qj*Ne5X@F2{dYf-3WP^lWXC7$}9*=vFL1 zH>MZ*E-%o3ltUZ%8lAR!fif_^;U6x1Lg!(Qpaqe4D}RDt3|?1n-i_PC1#T4HAO*Eu zEv^fm1IzAZ|AL=*2zQpN2eRaOem6e>UYN`HX#A92Vl=uBdFZ%}M%JWvYt{qR|F($6Z%}vNiQZ{3I(G*!6TV6%3#^Er(m*Jou}n!VPFGDs)nOvc{+dcg1)9ie5=oY8bxH zBk&Uv>9zFV*nb4*bFZP?n}HSE43pOFK`MBM3H43rurDAdkqt=;aSvUF7O2>kqvOCz zl=!aiQ4J2mDRvuu&=u$cZ$>Ym0Tc*h(f4hOns0SD5!FUF3tedw+)OUw zTlW{g3zN_jaR`H<#B&I}1gTKQ6M_P@>4T`;eiI@vzn=(hDlN>13g9~QDbKJL)qOhsqm3F_qC(1WM~hl42e6QiLp zX^7~(MQjBvT?X`1{~$+s3P<&Fbn12^ruo6l8iJYsZv1}aP#Ua(Il?MoGUy~Pg{A1= zUBNoeMjvD`x){~qw|E*Hq+{eZbn;T5OgV{I?I9zuTZcn$!;%-^@cNOg3oqXz)LbYN zA}9w`3KPgt=vc4A`yB-z-}}fm!{G|>%Ox-I|7w+W@M(yH?@|!X8#T_>SJ1!s(V4x8 z?td`$=s_qMoiJh{1W&q^y9lC=2YV?+*K|N7300o1I-df z3k#t0T85s-FBjDA$drDmvTjM*ASV1n?1J8|1Uiu(WI7Z}hu~|{k8A_2N*s9w+B-e* z5FR`;i4^>ON1`3H5XtCrJV%ePL^1^bo+h^8ZE=zZ$ePoLFX*SG5pML9dXtsFNF4&d zRs-3d6u?lUp>wT)oZ%p1!7gYpo5OeKn`l6{_ayow-GoZQNW|O8a8JGql~t@jK^xap zc#OaPDeQo9YAzz@Txb*C{!+@}Qw&1q7Y;Mfi#5TTcq^HU`^gFD5;}lzJPBR>TIgfv z1%5f=-IwUl=lCwhAttPao@68<-e_cLk?2X)!rIXhMR3*fqhA^aKHquzWpoqW6SKE8 zi+5|fhUSt9jwrX5nj`BjsVm62i$sz1m46jGmx&|yxlI1ya+6{^&O)_h4!2#N$W(); zK`pqy{~_~Iw_P^js&Y43O_E7pv$NtReW@_dGZrq|gse8GL9@LV>>;Xts&|5!zv6h$ zc2W0}ek9NOzIwM)q;j%cZ%<=Ss1}L2@(g;H?Goiybmp6o@!T;12Tcp+(wDi9WjwA1KUD)g1 zE%G{wLawv*2q$Rk6YD|e=;1mmv{60qPjhFghpM_dZ%ICqapEz}8_7^_HTo6=sTSkqM> zxCOp6rRb~XT}bz2*MVW{b6ELsg&2@K+leIckm@1J@mkA+Xi=Ii`Ip+~Z_lnIwbC+B 
z7K^})_A;1GAr5iFefNk6bvs#%WT5-1Z<<`IHZV&$v3wpeifYP?ly-Gw1gUA_+D7itux7e6{9=uBB<$w2Bwd4^+r@Mw*hec*e?epP6xm;6lMW$9%(Np~Tu z`~D?9$OgCt{UhX|}TeTH@W;c0;Tg*$aNBXI|6Zh=rJbvXB`9;AD zl~xBqBY7x$&p!+dk)C40#P7aYWIHhEe#k0GzDN#zPPG=XuUeAoogJi#3Q>vRCLBq?3#Jv(iN>nOEYA;`RIl$}gWL zO1w`!iokX9wvZk;D@+h`p+dRiTjrp}!|GSEAaSq%iu+&x8frD&Npwg^;UpU%^ZjSV zdQ45WuIG?YpI#;{@lN9+S*J5<;kNDw*)2KxK^rwg|{7k&K@LGO7sI&wU*dX#lO(~RyHq$)EMD-QScaFir+3OmQ`*2Q^r}k(1w=0Y z*xS)v&pFE7lAA6Z^c=NotR=$%b?b8el{wFLxm}jnQ z7DiSHPY$}H$dD&!Zv+q0B`QdHTiH6$t@hGAs6j-0AkQ<=eTS{%Tj06kl{liUm&;CC zDwOO&uh&$nEgfhqF8pYTE%lZicRaM#uxz*fZu?$laJ2Tc^}cbw^z3tc%C~cvOY~O; zA^SV`CD4?*%~cO{kiVygpmxCGj+ua48fI_DWZF1df2CkR}tBv zvqL-Sv$Usmn&8zSflpWMVSdUO#)?XFB3B=iW3j?J?-j=>%zZg+Pi^fi4a^DF=C(8D zheb-$d1Fgs%Mt<|*DKaY>soWXb)k(Z{ouHQt_1CV>CSPk@bp4IUny*ZrewbG5p$w` zXpSaGze^)(At$q4B`lJ57*9jMnY0XP|f@CalRSc zT4I-aw04kUlyZtEM_9u1!cIkGhbi^B z+OO&m?OW|lb&gz5FO#eaNX3P`gGD6+UKmrH4eiZrjje-AK3Ip@CRzf;+luo|NyYNw z+a}h0&C{ue;jd=%9o#k z9zZQmU`T08=^yZjG%^*Ld}@p%o+ZqD~vC#+SvVMK2614qm1^ ztn01&t{tf>R86Bh2m11h*%YugJG(|XR@$v)1Fa3sy-kinx~Nvs>cU$^yG=q-%Yr@y z2MX2{=9^Yo7+ZgPFZ(|0YD>kkT$h)vAdKe6VDhOSZoZrtO*@!id1b|Yri4zDUck2- zrf#ZiA=fL+vZd0tvLUExG?a~_nv(g%M9eoe4CG>}Arq=GouK8raf3Wxyk{gOt!}LNPu)RBgiB&4#Bh-#BHrq!g${^Hj}DIOZ|D^9A*_vl zi|)0qnKoEGMl}nQ;U^>-PU;u@)BT@Z+wEs;simK-O-eqQZW)7(dyU_U#KH@O?TY3W zRWxdfXB1Z|wioAHz@xGk!lUU#d59;|yTu>D+POnePd^T{A)8Zek>z}oUuAkL-e7v^ zPt_)+ow3MAC~9Cn@jV?voune@EmSjdU|_JY6xxh<%rC9u4zjPjQ$78h)7^`qpS%lI z?rmIXT&u{(VQV8!LOB#q^5E z4_O<0F=#^YhoBYOB=so<`YB;GJA|9YSM?5bzHlrqU1ptVIaIu;m@#!O>QLk<>{!^K z@JNBWaC_0NqQ8sMO-oBMEqzMUt?R5Y_D|&-TyuPD*>`LN@8`N8Kd_Q_>8VVpyf3p^ zcATy&y&|uz3RR~oo69+dVwTbUz}w%3I`Cb3Gr3b@7q8-Up9j6|Mhx)pfH&C^Z?5Pi z?vXPk8&L7CBWX{yP$a8Um8_Zm1cv;W`?Mm%&O@Gx2`2un(NBZ~r@A(?K`rSSkNjfv@1d z{0fCN3T^rAimn)|6&*Jnvbe1e?eXRMvhI$9Ztx8JLxs8E@ZJGYP(pU5uStK%-0%t* zBEKTt3*OyF#Rp}HJcRj8rUv`$D&@v3#$#$0kuA!^`>4R*#T+fgw_%t0f<1z>33~)O z`gW2#><(8egtkmTI!RvQ86fjv5lT%}@~CGirQ9qsY6_HR4xR+!X&; z%~4nf&qoCjRs>yBW>AZn`$evJ+$4hxyw8H!dQ`w6)NJMw+J4DVbnF*ZFv`k zRlZQZ4BYdJiXx>?8L7G}{|j}`fHaGm1pU=ga)IQI7zw)JBuqD@ut)uGeK%bP-9?y7 zdkszI1kY9P2=S-H%nVolDet6-)%{c(Gl9lGc#Q~$)c)jTFf{_K= z@&mbwd`m%k;c(+r(V34MX1N-D#GN!vbEq@_Mnt-@_kKsQA2noyhM$%3-_6w!e;oIcr)EQ+~u4H zb+2K8tG-B2x*(9VWp7lURpn}L&^gUg^}?WsA<5B);>X2ricOB08fA}p6SpQ_U!ii` z>zGl|10yHngxel8IcSmgZ{-(oA`^tGyovkbJMEffuTknQnP#?{Hk-7jkwv;fU4Gl# ze{!ehPb^qcG{zKT+GhOS)S#qQ=?Z$GS$@1bx zbj+7?{k(Je2a;!GABCVYsrLrg)9qLNr(LGs8eua09(OUegFy)wj8oAcWA9d26xY@u zMokHqgf$8ILw8go*FIGpkrfbagcq!dz3m(78tmw6TVZJrKeAm$#@Ml_UtxKEQa+Ra zA}<{mtSZJW#z97<>4e!}{ZMMOO}2M;Zg=nZ_Vw3c|K>k}S$UP1Pu-)vP$t;rH{{I} z%M^3uiOM@l9Xd$`%x>vO>F?6&lpHn6_kjY;KCj^(`46#;{gphM-L+huU7P(CC7k$z z-{p^U)nT_0&&b2fE9DT?NL{3^m3o!7p}tOdWK3-Atyq7|r0C2Tuc24m+V~-Htz()+ zSB{z%E{04DUZUf)>omno8Br_HSuk_o*^}Ph&WrXSTWZOA(wK7&Z3p#}a%7|LWV<@6Kif*Ake+Syh6gJkk*8ASQ%x&dJ$$4nZPqF*G zd-+siEB!=~uiC1K2zKk5Xn*Q_`UVlS;d|WUI8AJMOasG~*g^4s#=kM_jouUWGSVGx z3274?u5+qc#Z%f2_my~lI(OaQ*}dJqq;zG;R#P`4ZOkfaWL#HdDCn8DE%#OK!2G9$ z3C0h`bkj()-15SD%J!t}HGJ*{d*d)^FBK93@u(z+Vm4<3eGf!z#0+_`;xIay6=g?c z_hnvblC&|crW`~brfWjQFmb5xg`3Q6W!w0>`CoavyT*F`8V9e5)JK zwd7v;8O`k=ZBSL+DBUCdneau?HDfI?|3-_^*PvVO72QFe(O_tQ;O>jBB;BAjFC;&ZMrxdNz0Aq%!vy;jS*R>gc1ZdU zeT^z6?~;F#53$Z3pwH1!*u)vol|AOa>f7l3>>lP_$5(@Tdstu|SknC^Ey*)dQL#yt zqH7VN)^86!5uypZ6*(mKcI*H{wb*RKRl~XXW))t=RxwP9t{!nTtaE58JZ8q~_N#r& zCTgANW4E%RZ@+th^RxXQXk#`OFE(x{iZ7a0pv_Ou+mbgd?{2=cz*P9MDA-s~lmIWl zZ0p*x0tW{Vsd}ENzLxA$elC=h-N_ud-h@gkGFzFYa*eW?;v*Ch1LSX+wG1iaDFNM+ z&+rWY3O4U|=%=s49bv0)inqwU*;UuSg=ZvcvZIjBT@zYJ4p6n=&GJ+g8`LgD8FEWM 
zKI~4!fv8@F%`qcmzQiaE8L{Kyz46oHsFc**s_w$$JAITq`Z!P#xwA7?AA2s`mFPrUFY1!|N3a&y| zn5T(9g{#ZI5_e((;S@QABB<_kq4bdKDdU0`Vyi4!HdU&ZE|d1A_oE_ihrjQJz!fnV zwa+fRnj65*_ow>^Z@A}{FCEp6Yh-95?Md-V*))6fttH&&k z-C+0{V>V1NjE!4TF+L$THp1{%q$l)KXtF*-vs}|%ybZHZ%vOX}V0tK}aV zybCT9Ls&g@(w|2@fFp53AUO{t(?x5WZ> zC3T7}7sMB~E9_Sg%s1t)%xjrP>b-Q(OTxqN&u5r9G z@pMvB;zDb&xxf(C+LXQJr=+Qpi=vhIF5bkHMz@471^RiLIM044e$+V)2MGp(h z6n4w6n)fRg&7<;u6%Hz~*e2QsICQRA9?rKc5Djh)tD?i{zqroAXS5$am8eVhCHs?4 z$ya1!QFCz)&}0XTSi($l0m zw+)@OOO?&!4#_3aI6{UB(33Yfk{KlZCtN4&veLsv&kL>;OfQ&IkX10gAPM^AQu5a2 zj>vyjSg-h8=^6Vq$69x?@0ah5e`WBW(75O|raLF_b3k$08Q%eSy=g>6at}F-yiC@F z8`8FL2R?!L0baMag|55^?lcdw47(pX3XU*E(2LSGTpFAm>CQ=no8WuRBH!^cP>D2@ zE(V#{Y3(PY+&s*Z0kL{>;+^DUiBqjv);jSm;wLAqO&*-MI9?Z5#k5;js*aVfmn;xh zA%-I*+$P!=sT?T?bo9hH=atqe{5Rj0e=|Q+;3%9`a3z0Q{@lE@yvzAJ3py0dC|ze? z>Rj(>=D!sf7g`-I2%nC2V=}mWp*@hE14I~R+bHrX^l3IE>ys78`@~9O5Ye3&gjd3< zA?LWaY&`plDS+G9qd=IRNI#Bdf%oUl$RTFCPy+SNFE-zYI*t}qSUPVKoc1fofoZ3 zt)UmQ-}q_BJhTQbC4Z1dM2u*s=sCFoZg(@_F6b%Tk)46N+g4}?aFnmfRAq|j?(|OB zpMIlyP@|(iL$Pr7aShptw;>u}Gl_%ZPm&@I7U&NP-S!4|9Qq>9ax#Bm(U@R74n4HMM(7HfF-vD=peOPgPVWa%9 zdA;)HiHleaq8m%A|UYhJ(nB?bEmKNUB!y|#CBZS}qnlnrx{>(TGjDmKKQMjl}G zh@s>X(K7LKu}or-N+fl~yG2*YVZ>`3!H;3r;O26oaERN_qHGhOb!1V;Bl_t3VCT@B z=s009b`c!44hSq!4*c_a$m0~1)!hw(X^Qzn> z#7~H;Zf?b{6?XQdZf{}UGa_8qJMhD6 zY7-P>n)X;DxJkbbeQ<~1t`oitclp0A03b&-hmO2;k@!d}t_pIE*bQ7*6M0mWCDzEw zDW$5QF2;P)+%&dgeB}gRQrpy#Ns|(>gi3Ms;&#OA6P6@I;@ihgG}-k&)f#D(yoQZL zX7WpzeUTrb@u56_&@<6_zhne(zP9BY`ST$6Y~GE0Lt&$WMFqbKj}*Qt*kAayXkF=2 z=XB2?f1OY?@`oy*Z!;<2Fmst3C_DlGnRmot@(}1NhKo8thS`o?0>ArP=n3d^a)D0G zgtQR$0LkP4TZb*n9HqZf2O~R~ra}?6M)ZPAC6d5_s*j|S;-jLiYMuVG@wl0c+iks@ z5KUSLGpn|BaNPDdw^f(0IU#D@7b`WJ4BxcLs?D`!8Rc%xJY7pe4fE$1#CjqAyLEY7gV;_nJz}zAD_RG| zpMdw+WDXd*>2;cQ3YAnN8iV8LC_cnknK4w)u+G2QwZ`_cf=)gLdRcZsoke1@6LX#Vc(W zoQdvdo~B-zH{!b+)PxhFD&`cM#cu>Y$sT+JxsiMZ+2uNPo9RSX#FZuAB&Q)KXOdqD zHPICMQ+%MO{)$F~YkV1G2KEe-5M}Y#aO;%_Zc|gmdg)f_B6&I08ubBfHN!aLEtB3- z-Q2_U&2-ybYWW`fEH*nPHpXVQnjRZ=>jYq=kCac5Y$t2s?+}n>a4l&vQa!NVlMi*~ z>f)H9qwqbJV7}o5VxKrQ%Gj}a=q?mEANUbCrR_ofl#5ldn#bC6x@G#~dWn9TevJOH?yUa4 ze!FgtHbGseP$*i-E6TP@GT?R!Mc;CB==A8!aL-VUAmc0YJaJ8MOt4ihSy?o_a6sYY zLSJD_@$ce{l18OBORGTr^4w9`HQ6KaM}x;B$#er)?FWP+q&M0OdRd2%gTRe2ReDI; zQZ5!i|1I4m9!_RM zzgUd0g3Y22N6&&&{AF-ePVw*Y^mOGqF5Ao7@7NIAHrp%vQb#N2a5&SEx%1phJ?p*e z{FegTfIlrD4v=RLgPNl=(G}Mco1nLI3ZWOH04ZWPXQ2azMfCbqYb!ItNTLtn}#ppSqb8$KHy03CWekk8yr@MFA$oB^foJK+Jg ziZF5wm|UPMbOQ#|e|)C1XmgB*GyIdx5#bfUT`P;y-sm>0@i!dV)*VM=bY#%a8c*!n7Na%6eOMZb4{{6^uelPUV-D88$ zM-pL2ley?U>I>2fl-ytN@9Z&Vhu}pH5LdAVRD^vFr}akSB8Suc5iNcK_GeX46WaxS zCOL%eLS%x4na!(ZV(2gU!XE;qO<&1w#u6C-y{GT6O2Q9j0rgRYkh}RC%uD_zbOTMs z%dzt!4fwsX9%w`QP;`TkDqDk|i{>($kVfJz!noisz8OAXEv^e4ew&7+}D^91T2US8N=w*uFXAxBR#-1P}pmpyeJm&25V|tS8A4xap zsUD2P%e7?da8q_3^GbS4z9XE>$%O`@<0?CK)$ih~DBg(H2L}hcVO6BKbVzUweHQBE z^9wFsoEh9XD!u1vQh*UI-#ri-%?$;b@-0tBC*k{#oiO~ZnM@zcaGfPoJ>RT z^q#|M|2yG9eZ-ka3E3^06%|CywH>1;9q(mfd4zw0r1__yjr6a%ll(8_8#xr_%_~k# zyeE#i`-|78cL!$jjNI!>f}7=uelMAdB>1(_4!TSHaQ|?+hO`>KiT;C|u)T^EfgQwH`NSwnKH>I&JE=38A#<<} z&r)p}{tulkJs)^1ZcL5{bP;+;*YT$4ZQ%+&nJxg|9d9H9{~`)rH4YNsmx#%F&UG^%(YMATC2XQ z93h>Z6EJlfMGH@Fe^udvMUOV{BK{i0dcBm}RZMarz1dw}m&G?L{##y7(=%A)r6i|? 
z>WG?L8R3vr*-&l}5l5-KH)B+IMF)xvQ}!1x2wkLNBoih3X=`K;y-ZeDdC7ePdnMWt zwaDKi3%zf-u<|aE;m6psiWS@e+{mnF>q}}t&+2U9b(B|0i~ua0*TF z9gz!2Imq`7Tt~?w-A7L!gds1mQ&A;OhS$USUn_1dd7dj1dV-DCCbL&ORart?kEq~p z6})1okCph9&|B6{MnW=Wkn;sqLq`&85xsps zgY((dR_MU_Sy|m*Lpnyv1HnmXBjQnXv66E=72d?_@*mSD4e!DkY!9J>Xa@QfZnsm3 zIO$$?nOC7Ik+*SI=V}?P+#>&QQI>LNv=@I3?I(0srv{%z()4Gj@==#&b2NppqC4GB ztYh%8?+2wans#6N>q~Vxxq`U9YE)p6jPO-pdKiC34tQ9}b=hlXjD5FGt@xc^N1kR} zk_#S@Wg8Zcirg?<4w;1`Dmz`%O&Lc@B!z_HvK}pUqC`wf$I56I^AKiqAd75+-xsFf z4_!PqHNHy058vqY8Nuv=iu%FgbuKdz!j=a&OB!I`gU9epG*)m#M-yK~7u_z_W>o~X zcyf(jnbc?{VArR4I!oH?TR6*OlT38s3i4ONojjoP8P^6^kz(YApVyusmjoY#E^GC8 z71uL#z4QVyfUODLT{5*wnCsC|Z;U}!;yFPgqT*m>#R>j>>7DotR-bb(;UYE{mS`ic zXLz)-2SoJ_NSM;> zs5_&}UGHVdcsF_waUsMK|EOK=0nt3=4B1EjC%&Gz2ji0e!=CgX(DW0v_q8GfS&48a z@KHQPofSC~n52rxR{C4<@07122BE#;hi?<_GNc8=k!!7yKgJj$+qWK=O?CJiFW#W%5qq_ z*>_HX+OJX<6KaL~`RDNw`4_Uc_c&Qm{5({HJcv~debb4H(`AXK8QE*Co1`<|{Zp=) zrQW0X*qYV@C4N6`k+2*5XO#DVa@&~-N&X6?YBI2-lFqU}NuBa=&)zZ%_zMNa5;Xbu z=fmR87pVmI+cy)SB$j)i!7X+0W>5S+_ zx+ULH)rp^EJ8Jnsp3nQPKA?YBP)_t+zSi-#DkA>uuIig%O_iy>&et@L>G!#BOq}Xl zc82bTG&!W<@5l;5G5(#YU!x5_{8M!=PZmpzeW_Bqv#2tW9jHnIsS!OHnT=FaZVM&^ zEU|2qD!pYrM3FwNsJro0ULiRuVbqr!MoQi9*8o*=%9GbArb`-6!E?hST0vdH59w-p zD>GK&nfdUa6rye1I9F}|%=rCuZ1IAM`G5LK8mTc)^Ke13!S)bw%cB7+aSMpUvf}nS zHA~Cxm8{ZC;hsd6nXcO|2hOEbWhdr!l=zi<<;Zu^_*DG2Bf)Ya`d5Cts>vl2Yz5Wz zzu%%`r9Q26+q3dyD-1Vuons?MFcDJ}5b&@_Gd&<`~FEg!+vTpVZr4z3vZqKyv zpENCX`-JOpU;g|fuoY^4>?T_l6D}MgYM@MvoQrl={>^=-2Z-V&_M&uCH(8tf*V2dT z`$2J_NnG2)fv7dL%$L1#v-DGRoD3<*vE+!3fA5>#)*mRG*`Uh%5_(O!#zoCVa_WvK z1f1&}ED2cMPE&u`l$R61u_g32MOEKS{m!=D^5A@HeO3E$-6(qxMY_(KBVrP3w96hJ z_@_eW;yEStYJPS+&-qkluS21&;!{$$b)CW;Ia-lcvO=}UkpHEdd8q#RSHF3#uJo%- zyVr2Tev-POT7?!;ZP~SogWxc?FFP+YcmC!Q!)V4rM@MIzlM zPf(%rpP-pUzHphi408~fAG0$rjel&&p!TE5UWPQK@b4z+yC`qlQsjRL1Tir>EIQFx zm+exNZ9N`1&E=_R$8u6GJ;2#?fmz`dDb?eJl{&uM%~F`mFTf*qyH6Jomy#s4 zn}<04Q}21jK-Vs1S<(_(E?glqF^79ieD=?PVHmmCE>-tdUH>`3z?#Kbv#Zv2uwq}R z6+T)K?|N=-^jnlD6?l6&yl?EOY$CRY_|A`7snYN*svOJdUlqO7vq>Lu0`4ooWzIGrx@bfmvMDd}@go^HJOlN(o@0RPotSYPyQ zU_yM0lEZAWJ|I+MM--i?yu7#ymW}J^juKt8mm*YZimOpnFLBL}&an>O=-y$DFHSYK z@NZK5lixJm`zt@LzSDs2QFrwwkXBo;-nJ+Inl3TzwIA%?B)^%-D5~t8zudT)h~W+2 z&(z(Q6Oq3Nzu~T{M~6o7U&(gN7=D0kZ0LZ#LUwjq5>~A=3+WWcmh>c-%j<_Ez<#sD z*8Z+c{`0#>^6b#Wa6`okU%qi}NTAZVU-}8287AWSmR1kE!l#YZtj^=YJj?r^Bk&6K zUZ1~Hzqt)Vj+z^PdmGzV+PWZ7xm#Mr*G$KJ-&y@$@#Vl>%dOyR_Mo^ys8C*wo96x= zmtNXbXY*`Aj0xgjJ<;K5mA^&w{rRids?ZreJ-zxzQi7VdGD`ym(JI;>hLops{;JNG zetT18nY~f;tRD6eg@4pyg;>=8dwXq$=_WJ%S9ASlv*_QF)Ngp;OZAvLi4&h*O*G=$ zeaHE361y}r-z<4*>FNx5;?r*yuLNflF2ssqr=i~YcQ{4TdwaGT(vs8KYrtEuF?-k_A8ndf| z<16mDpVGKl(cJKO{VM)E#bKK8I(0dt@yUeB&!8WC#}`HH%(CqsJWcGyu9tVVrRf_v z+8cS6^LA!ihfa}-RIBvuDD&<53LFKpu!MVoI2>&6LhTpXy|$9 ziMl>ItLReF#?o%G6trQul4vP0$u}ToYiU*LK)Dm2f2l@F+J!EO`!PONCBI26RNj01WB{fpNlhyK`DX-GcV$DAz#sZD^RqL8Z=-1z8 zCHu67S6fY0%6Rja=iI3HStKdS5q0*StHc$QFYZ>`T~J)SrrDb36$yFeYwusEcY*Tn zWAtfoa*F5YZR}1B^xEYVHlcdvti*B>YeA*Nf}C5rf3%4&D^|UPQ2*|!eu>)pXMRd; zTUuhfVux)+^F^P!6t{2GEN=puL)~Mq*I9j~(5$U_;?5iJCLHmsAmP3u>R!wgaXATD z&WHJF>4PQ3KQi@wkgcNU?row|@&A6Q7X93KWbWDT9jd1Br=mtrbNw-Owcj)3=d9t} zO_C}?FS5C6bnX#}I;O4ToVQE*fnpc6SY3x5$d<>N2Ab*`^X(mNmGw>2KE`3?n)Z9q z-&RmR-`3nwOShVB!#pB}bAnfq*eGw8vW#e_W1_kdnH1b1tmG;yIty_wgZ^(;L@yM3 z)Kqp?lU-BE`!X^@mQ82M1a!1}ivF$xGxX-?24k_Cl1kwu#G$CgH80IHd{Qm9&*Dn3 z&c^EgU5@*4#k42TFK$Lo6>xqai`{(rl!L#H&^OBO&$EKx%gta^{g#f3>)R|F0uHtP znFQG)W`)C=w!&%mjH@{4mx(KlE92TN>c=!e6#Np=SvgsFM_x-=jkWq_ATLZ$pgKy& zr-)gknw_!8(R@dq{3$(6z7T)&b5Y_URqt$@v6k$heH+$W-7jL`rc+(iFCw{YFWF0% 
zQ|V>3!dRgwGDmtJd*)xDDo;I#jF)#N^!{^9z)-_C7JCmTM17D;(LstqbY13aM|?pW?OH93(?17x~-tnCntBPi-}iu+xDfH4#!vA+Ne85aPNPnVzDTr@3!E6UayJzls6Ek% zlCQKMi-;`z(_kecP3@+!u$sQ9!DRJ|&fqE9%i--vGd>P25Iv_(a&N#<{%>G%{+52i z`}&Sy6J%OKAD$4oq0$hKg2NFxIS;GJ_MtQ6b)z%!wfH>m3i%OqJ5`=kgXgOEnN@6%& zO_)HScmpIh%!rQ=$AT5`isBjcd+4fZ&hO)jk+8Q zNC9_V{*#FFl?e6mqexYpV^0bzxLbT>c{n_a-7C7n4Hh1;t>E^33%3e;imsyi!ddqo zbT83~4n(Jk$`PBvpL;T29yu%wz-j&?rAHeRDE5s>WozJVNeR1~eUI)Z#-S4;UICLG zV|Fr)@F-G2Nd3=$9@XI#JDxoSs?aEt&H05lSO(GsR8n1qFp6+GG@tA(9Hs9F$>IxW zBwUA>N|eA!h=PkEov~N^adf0m#8pAFgk@xRZZ9PzKcGw5Z=k=-LmP5;xy5)G{SK$H zF8(a=z^4Fv=OF(OZvfow+2|g=C9(zeFn(a2yalhgkwCC|!IpBh(Dv9!{tX8@a&Df` z5({t(h>e1YehCWI5nL93jh%w)2nDwff5o=p6UdSL6M^IR^Sg*!aDP}6w1$)MTfo*m zh)fVB!x!kzfB=QL<0On6~Ab9p=YT}iE_1uaq<;T(wv1!CA_7>=Ox8nbT z;xf#SCOilW`uahn8Q(}%L06%aP?Pgxd3;Um40|1vIQPL#JPh~Io7iJ)mT+A3l#O9} z;9Ee``Yn&3auTlj^>0r6lMa6VZek^KUC_X@ra zJ`LEPSsLE*VROZ_sIUM_9w3 z!_FZUco*o6pQAH{d~T=U!mhK=z(wH((7CVj`_K%~TsIYp(Q7P%y#n`yspvDVP#A$8 zLDus%q4V3%4Tlr{+T1AN9Qp**=)KX#pq*}yPK4XoMW};14PzmJyVN+eEAJMJ*dL(|za0on2L1}H z1&vUFe~$FUX7U|?zuOrU=;yd>pugfBp4=!Ojurh&8817KhuYZ<4x~s% zxCvyPO=xR4Pb)(D!W?BmYds5oZ5{Y;F~T?SIfy|L1P{!fr@{dA77%;|VC5cxuJ0tI z74JZPA$5Tj+)vnm?gf^6Ci)mr!;A?FXMp*=hL49ikqKwN<@xT|X(Wd=pfOk?II(De zTnofqAi0+S!QePh#jAps5GOQ7rz5#MBTPZx13UMSPzuf)|H3zW2kORc!dN88+l5Og z!>thdVdY>Y^#(VKjlw2=5cpTs1AnDr;Umy=vv?zN2%d~Uy72_xHGcx0ZZuM!H-j&OjlT#n z)d7lzS^Q8SYZoDYZX1H5%i-Nl0w4G=khI&OuOMEx5oW>eELLa^4g%--9*|!;LN58v z-xE3jaUlcf1_+G%385<7(%u2aLJx?E9IVY8A%srn!QBE@TQ+D(e{gfa4Fdu9f+s>- z^dHcge1tf-3wo#3pzR;ezYzZ8QX@hhZ-A#|5EHN`ZUYtiIp)^ z4?<3Z^8Sh-1FAA6%m9MASAZA^bN3uFpSL0}Vb&5t3-q*5gFla%VLp~Xj5VUeUn`Tzj~MKLhM61W3(6goPE;6UYL8U<{Tc z?UB>SVdyJYqJ0G|P@JCuooFEPh0lUGIZe1H^hb@5Ef1o11Xu$=7HEd5(1#EUiiGLt z4_K+4fmHAlcr#AE3zCaO;ZFMk{3-AGRxqj!k%@3edqL2l4|p#>1sMUOl_S)EcmYo< z1-)bo#3{I8G)15= znal5j)glI#Q6`X4+Cu!Ai^M{#9E1QX9;oAq@H=$CIt~keBgsN0d@nQnTweH%6vJrv zV2`sIX~N5Zm%bC64gLuGA=>Wc{{rI1LS!LO^dk^i{^RwZ2M?VZLWIwRO5q}W-LgnS z;AGE-cX=ml0a^n@0Qgda5vkAwEdfry2J&JPmwUK^*qM zT)Qm%0eU*cL*#~-YXIlBzp-5Yv(OTSKMC9qW*{j*(msMT=6}HZMEKfBSLnaEDcpzE zTm;TKb%7eu0GSDEVi7ojO-9}cAB8$FUbBFD^B8{K7Vz|_3-en7M4XpU*RO*4c@R3y zuLDu#4qpbeq~GBQdcoh#fE7Lp&E|6;u5SQWxQW1ZABW@$>maJl2M4T)5TCaJ8_Noc zf%#~8WWP`jB83F0iw+eCm~)>55sC`4!4+X4%rH5S%ojq&Nq}{CUg!gaG`=)fO| z;Qi<}VK!fYEP#=|2@y9HErA&H0SHkhe2_5450KwL z8&F{*_}VCtrnw??8E{mTXh&G%|8e)%0d2M$Fzu=XamEi%TM29qG4SrYpfCIj(12zE$K@!DaZMP5|2$3>!>rto3GKLX%B>f5cQq#1SdHfew_p|MlInGxO7$j&lN#P zLgzr1{tgkW3R=jEz+K_7@D|)ssv$ghF!Y8Bfrj3df$;Ad04rt~%>K9kpIsH#yK#g5daY87hv}khLzuylaCFfht4;|MV~X1Pgcobb==z3^V^B(h9!kzd~8a z%hO?%e;4{A&EX9w_~M)3Nvgw}?1TvW5u9RrL9QPIvzG@ynmw@6E((2+r$R+=4{QP} zb&N0&)E-iP3B;!~$S>t!%%HadnaEd#zM5OGj?Mv9rWvr`H$j}*22}ZE$Urc5s1*K` z6+jTmg`5!tZ{GIkDaf8n;dk|e_{jj}b3DX~IZ%;yL6Z3DXj|wI>WU18=jkncfoiB7 z%$%t}M_&Qax*NI}DwK4g98|Qi{4a=?w;&q5f=;3^RNwJ1SCSzDZQvV1P8$c`@;~pJ zMEEk7`GLY&v>{aXbD&D;0j#Pe`~~=4tKf^w0ZQK_UW4|4Z1NYn7S=o5NeicuNicGM zLvFbME2$w6p!y065E|C@9mon>p_)Ak>l%kSy&6X81^6;GgbMpJe8=t3zodoV{S;ze zUx-i8LJgh;_u#+nf;TOME+!YuELO0fd%^kUAdL4gbT#z-tb({LhUiiTR^(*i6ukLk zs8G)GYhhF~_-(LmbD_Rz1@ymd-~eTTe3XsEK{cESZ~aloMe2hi(r9=l6dX2|AiITq z$V+gx=z-3Fl@o(LgLw27=1LvN#;1^)ko#65Qm7E;!O8;fC!mIHf;_(m=13Fxw-%r; zG=j`C2e}4M+7IeZlh6TT_d;-HEEcT%0T_#;uqGD4YGNTbkL8CV+n^`wH|(v-p<3Xz zeijPAN7e)mK21;w%-AMKAG8DF7cRqYXAIQ%yOC6EGt!q&z$?R=jKT_-2P^G7R0l`F zjj33Rp=BO`&DIthsf8`bXo9ia^> zhc)tv-vX6)4;bg#@QkM*duGB{WuP|u1li8be+NB$h?@bc;3AJgw%Pz+{3N&s_5-zT z1~QyG1*ET@ux3^ONo_Jzo@S_a*FvAsiGfWRz-vo8~2xK$xNoRq7&Sl6F3BYVi zh8QToj_niFQ&S+9#3K$E;em)A*6vWqnH1E>KZV(_f9wYl=PSPx-b;#(0WQ~DWH0Pb 
zmcw)YfDBL{e)>n`FToH0KNX&`K2&PGz_W)HPH=aHpFkDcBwmh9hp60(8N)fb@!(&g zMZaNVq0$|R>}F}9FV_t+z)Nl^IIi`eE@RiZSdNd0`=TJaGR+NHNJ~iLc|;a`DHrH@I)jPMc`Qnqu;=} zax$zk1bq!FxDV8(PlV5W8ssV*_T#;PtTP3?Mt8@a7FpS_k|A4ef(62bBFlBuxG2xC-PNcY+6G$S;ueWjY4nU&;Njk7T_nr zUf>Euyn8TP7lBhwEOcWYLLXq~@IP1{Hk$Yej&>7>YeaS8ErF9W$sjqJ{6tJ7IiOWG zCFc__fE7TJiwO#Nvu^AU_E^(yV01hOi*Z5qt%PGm()f@K+>7q3#Vx#*i!I5c^|cgtNMGpi=4~syGq&O zqN0T2Odt=Ki^x1h&b!>H1#63grGzcs<}Ph!^V%1>5`CLPatft6syvlJFXUb!iK1Vi zqb&eNiCq*HHS;Drnx$` zy1i^Kdi6~=YT7Z)IWH$CazF(=;Z?CeKww3LTi_!r19zf_;IsA)#y`q_U@;&T%jw0!9*I=? zgj|GNr*9z};qqshuCMN{%B+5)oua*|cU$(yT`J@3800%L-baHGgoY9-eL)=Pin1@awzf{!5&fOB{c@l??d@(Oq>+{4p| z+rUhHj}1aI!2xC)UlACHBN!cXM(8P>L1goCre9v zF`F!`<)x9(9W+*qof-Qyrl)C{W}4QfU8a1jcr6Qww*%AUF5+iDQ7K___-MFa@PKcd zN9THGFJF4UgerOEz}!n67fTC@t`*<6agMp3iQYHfx!y0HSpOJ*Ebz$&(L32JP)=22 z>OfWw`7H@8b*e48y&<0PAzp0Nq7xOZ9KX zD0x>|OG#VFSTZ0CVOB;`L;ld#@HC*%=efN08>LK1R?(i~#pn_f&=aZWCM39luXYMTNOX#^QA9|@uE=*nQpD7w_ao_ zGFZ$VO*(z5adymsm|d|smL<9&y59P4>fY+xinsCyvaZmP*P44mSD;(cUts@+l zI6FGJIGoOxj<92?^Rz9iDOH<2zT_wYz*s0cz@e{34yoJ%H3cfc^obhXuZ!t|`t;JLdf8wAmNhCfRnEUMcBYa<=55y{;?P zUEpf#dg!9vKHr|8I5HN<+40n5YBr?@7F2)a0Jv@?fPWh}6N7%@ERdXM;-@h^%0s42 z0&eF%@VSLE33!JT?mf_aYtpmObn!ko-`+}%kA5RpDt0Lj%ERPZqNS`*vrpGYKgJX` z{xr5T-_^g-wYDV1o{c#Xt1`}n+x1bZsj7jhHmWv?_L4z(gxk)v1p2p@QBmr^L@(jF z^f^>BPn4d~zj`NGb?Gx#M%epVp4tg>$O9yqfFG&SyW<&#Cp1k$66#fSO+u zei508FOu05&1El%$^a;Q!_&~{aZ`Nm?|-M%sus0^z(FQHRaSv zDu*&w{$AXMSiuL`e11Ilm8nMA0?EEXp2l8>*W+2}vO7`NW#?Ht>iFqw=<4O2V6LouK;a3eI{p;Cx~jFv(AXKcpHwQ1=0~@*13} zU*Io--(O|Ww|8L6@LKi<)f`lye-I;i3VFvgi7uk@@#l&msyyi*!hy*pd$qR>O-+A{ zU-YMp3ycE{UZXK~xYZV?j_GG?q2qOeZlz|MRILHGmqlt<&X5v~4cEY&&6};!?N=JC@srIA*xj-tXQD{-^$bgZD%I!wVy?qrK=_ z;Lp3CyUgj}PT>_g0UYpVf=ezq3}aUCm@s0W;9R>Utc-53drJlF>Nxcg`9yLvn#v@|>U?DWWVZ!Pb7-wq$;d*;b{dpw|6q`-|o8Z zna-BZzMlO)JlGCA0y>0>LQ5mRD26`GDq4t7vb7PS&*ha_&Rk6lsI(X^*4UTHf zuqtrk5dFW4-3u-aR>fleA-#pFK>b6-17p0aaE#JK-qSV6hsuCru(%8GQL9P5>8=|b zhRcRD%}ecZ!)e0;W5n_)c3A9UQ>?CqhSU}59%(A++NvAMK9F(PRiqNu0M4dob9*9H z16RBm{;)sB|J@sOE_D2Ku<#u+9Oa#_foS)^e%GLl)6WLgQ8t3(f8#ar7HD_a2h~F>;F*{iP7k_sS6Lfd6FeCoGXKy~Y84e1HAZ@d z4qNJXdrUS0=dmH;#rix`o~g)a()Tcwv#hgBHjj!=j~^FjjJax% znlO{W*hr7)-fEJR7HK`Q8oCsDhmS-zapkC}|C6_>uY6D$;(ZgGHd~f`t#h5@lJmZM zh4Zd+sPiXq!o=Qz{-c3vff4?+z{XH|BqMs9%B0895 zJvf-IfjgU5+#}%hu7bTl4nLn;%LbU$%v9zKJ%Di&gk&Pnrv60lGVA1*HA~d#^0~l> zi<2KV)rfs=UT+w$JFIJBDr-tNr^Ju6dSVuu7gGoLeC+fZVI&w*Z54IQU+OxT}nl zPGT4!J`Cadh+P=URHiS52O@IS6&0-|ENmSe>HwKG&7{CFUU;dHrQJ1B{3K8kEC)hBew%)9~RJsRKW%Qf1P^!-wxJ( z$+gm*X2+KuX|(9}S!j{frjTDyB6M>6ZfYI+?r34~Fxi#&CkZ zlP~4U3r&F#!=h7!C#;4W&8_7Y!G5S7hqEo1a`a4kEggxPfL(f4xIx9ysz?M2DN_{( z6@Jpi4<}xz(=1(LZyTFw&S(Z0CY!67&&TYDUvIr=QR+XcU+T-7?ijKR^)-atBHjip zt?!~?#89M|ofnM{&GMH9Tp>-UmcN6up*`CnarAOk^R#sJc22RU+k%dI?qA-Nz%Sn( zZ)bvh+A#G3~OSPsfoFv(Ww(uGh{=>^}&?@6E(#5AXT`j)S%F7|JYz4bTGU$ zFw^zZ_R_&Q`*~`5m%4Abp4%VU^Iac3*LB0*vFYeQA<8|2-JOr`Cwv3?+Aj7Q^ORWt?i&xIov2ILa7ipX zf?67mQuU?h6`Zn%{6GC^qIj47L2RjIgRZwasP1cQVRo7a#K*+vSl$?>YvvmeGp~Q4 z7c^B=GI=L(TG?&$WL!zZa%*%xM#FMdB(>3i;gV+4Pbi@>|Z>iod2L~o{sW^|wzAWS?0 zmZX8I!HnZB!rf~K*wQ`t)o@}2=QThMy^1yvI)HojXMPsH4HSr%fcR&nx6uojchvTX zKb(!|BuyAH`Xsn0VwLdnG0HRYm&koqLVnj(vM!Hx>rbl(t4|oGSsuldO9~~TaR$?J z?M&n4m=ZH%>Y)FrE>=_$3*;Mc96W}D3ovV=o(B~E@uAJ(#i3r_)(-Q3wYhVrJHuVZ zbuuZY7~*~6pX~QS<#o%mH?TGmp;|FRsfy7!su}Z#`wZs@tKdfZ6`TQ#g1`S4 zoT*aK!+cXt0j{%kz#BP*-@w&ol7Zw_$lRvrFdxoD9*8TmJEBv9eh3FVntYOMki0xbUseS8q*cp6zOBD`zkFYgZ3fJ-eo~n(ezCb2str_N93zyVtt= z`YMJRL?2O|qAMfIBRJ(?>VQvE6X6-Zi7NqGeGZ)5|Al3u1^g2BE;|UaxJ=04hwun6 z#`@7gCX2ony%!#X^pw`-F3{sc-J^HKljLvI_mox0mp~Gjt!-#M8?UmQ)H^hDO&ent 
z#ki7Qq`pX89FwK%VaSTv2P`hBDOG1y$t0gdyCi8MoUDa;faCWjX!Gv#yMnUt2EWW1 zD(O&M0W!!KXFq36+wbBC&|%eCSAfD344VG}c#K1i1`x2X5gcH!OlQOQ!CWf}yY zN8gCn$|h>&Dtn6_0oS`wT@rIGUKG3Ca80|}m~5GCNle+AiY5DF>S#CXdssrTtzuoq zx!N4X7jYTT6}KY?lI5}f+$p+3u$Q--?_%IfC??R)HMUeJJXi9UeXMhXTjy9%`o4Ha zaX%a7TJO8!BfK+RpIjE-r_c_lFzQgNBbw-9dM5jvzX*i;0pPoE1*z{qJv|E;(5s;@;cwv~*NmfiFVYpb_Sc22TzmE}ID=Zp*Fz?=Bs-VBKo6y8 z@Ts~l^pmNDQ*7I?INXC+CO7HEYnRDWEhCb=W#5(& zC-*mv)hbPGVo7Vy7`O4GMkdQ7%aHHzeq;eYNXTasLajaT-PgPh|88F$x8Jt9@JbLgbL^|c+KFpoE`T_9wSH~_pSHgon>HqPk-na8sCjH`z1aV?bQWM% zRo~w~adWx@W`-_@F6kJ$eu8ueNC^l?3JB5-A_yWODJ7st3ldU-wB*n|Oy})W|IeNG zeeN^EFmvxYd+(KJt?ycEVq9+IjNl8F0BNOQHM`9tq{{jz*OB~*x%T|U1>Y3*&3~NT zF?~Xsl-A?Jw;6jg^0RE&a%S%P6=@0CEAo#PY|cNO|5<*+!U07~9UAivYbtw`7Cx)D zt8WO;0K3KI!czS&tq1l*5VlM4VkX-7rkd%ytk%#<>KB>d5aDa->EW*8{8Jle9c?=$JMxplL|SsMO#QmO|-@?Pz$Lgl1)2rTUb79=$gDMdVL$vnm8ueOd9_#L)1; z;bF0FV#|~q8?Qvs-`G-Cs!CtOMyCDpI<=B#dSREm3whmgTA7#9OF4aGj$*9j<+{3v|^XC=y!hc^@)WI>`Gf??Pou-W^=FvdwMSMAx3HNs4 z5>JD@iIKJ-JJiCcp+8YutNrwg%%s>x|Cy=2E}nzV%bpFEs-|V)A?3cKnVMh<4V)i& zKJq}|1nX2hy))Nr}*?+p#@LH7URHqpvHZBvgv2615}xRBU4W*Rew) zUBOYdzRdq=A&nttbwjbcw--I26Ght`Bdcsysm!l42B!~An?|3od2e5(eVrbaUL)gO z`qT7<84WY9=Vah%6%~F~bkecW`K3FIZowOsdU{jl5i|k)VJCl(Obq>?I7WyBxiCjK ztDE!*YKmsj=WD%{AxgaJ@kRUW?j&CYp@prHc`YV`8_)42~WfF(qO`xGTgG9Bv9I48Ttx?{X)^yXcrJXXf5%e33zs-EDJykc7PT79Gb1%B6Puz$NW6EnSUYW>NZ zlz_{|G2i|wbH0AmCaa28QaWntV{T=Nlb*}vtRGu*0@elV!Ae-$uqL6kB1T2Z(amD- z#+;4m9#b#+N%WJL?y=#q1(E9_#)pT8Rt|X)sMwd=zpdGp|8GT0#H9W<^aL!(58Hlw0)N_68}<)p~TK z=*-i)q<=-0Y=p5~ZwOK#pHf0!^5qcOlg)$)*|b6TNnE_CU&>572;o(35g5|3O*HB zHfU(@jG#k7U4lvn?F*bAkZ4b_O|ovYRJBYsb&@|3hYQP%aI9H4Q?oYtu6px4iJmvE zLe~-Zd{0Nu8TV;df~SgivbPHHyF6bn<$vls^$@7dH%v6iN8i5GZAN8X(X9Gh?V0)~ z^G~iSZPdrgR$m3*b?*T>kPi1Hd8d1RA$}C;>+h@Qd+PnyTc2r^Y04ZmU46~WLys21 zOq+IC(;MP%(sl72GB-e6DW%BY$<>&Tb&{ExX;L|I886L=mfDt==EjyymLIH@Y;}pf zcd`v&T3VKUhCRvt)K=5}mECE-ZSQW6wIy5ETMI1n%?>6P)sfqhpW71AE@VAH6 z&CH&uq1VyIsxR2vY-NSoTZ_{gtKEn{Cn^_|_G%z=rj96Gl?}db%y-@ETj-na8|Uro zJ;78%w|A8Hjc2MS+H3YzPzEqbZaL@kOu3?AL)la zQG6lPWzN?Ix^p?1Yj_Go$~p0SP_x-$8HsoYJy=`wJTXQdY+A&8uo|XcOtI!c=AX?& zEr%=}Ef!0NWxu77bq&+f9@44&fGOY9mw0X)^KmM|;6x7wXqX%qDR`g8hwXRAMJQDi88(!1(a^nLVld#dh2mv7Xg z^wFGwmkxDX)LGj1S{qGg@>Xxoc7rcOlq$cHOEF0<%si3^b#2Rf zIy2?b&-#jaqh&dh{(6|*OiKKgnPj>=UAD;{aU5vW7v$hJf#P_;)BV3_fpl$yE*Lv> zUHggYO4aCEzFtewbHFql)8Dbaa%jDoWLv*A-s!pYS?Q$bYrFJNqaiZ&rGA-sUnM?K zK|iQJ)$eH&wX5netXGC=WA4&MEtTm^)6guT{Ci590S0Iw*}N(|dtU+ba!Gg&ZWm0Z z&`ew^uBSs*YpJgk#|$1>+C;aXaAsXpky7c&v{IZ(jBb+HLX4Bf@|PBm2gNKE7oy_mz* zOB{hcJT0y#U;K%2PJbpY#0L*UD^?J8;sF}U9iJ|hV+XUe&CV!2OP{JV)6aPiD3#Ri zN?&!lXPa-W+DN-dR@>%n>1&FWcnk)&EnHCpjbuUNmZYOpWpEuyJf~a%$=HOn2iDEfC-j@E*xe8sfOS9JZaZc3NUbeMO_$K(*xDcNS;#U#Turu)(yxix)*zBe_I&zL&M ze=8;JruEl^MlIa}wACZO>kmrkulGHZd@FZAIf1x#kvYqwwdJTQ7;KSbKX{WI9 zQ$e?#(l_bdwcR`wFE#!KJvvEvqzweInWgs=zW{5J2G-~ga0V`YAMx~4$WTk+SHY?E zLf@VeD@Zj!1NAl>Vh)J9AfX15x+p#n=a) zzY(jGjN95t@U&qd7PGX5;BOj%jM)ViG)9<9bxgc4o!M4Fpp_Cq5d6n{=`*b8Eoip6 zVmesE7D9jKOkZNohgBSneCiofJ9Y8CN?wG&kzH?(GYP5nP|Y0uO%n$AgY(`yS`wSl0Yrf5x# z-P(9Ok(NvYdj!%WALN}^e~!dinaTT1w~KSc3FxRD%!S)3wq|--b8#{@?Sz;u4Wmjd zRqQW*1%7mr^uS1_swaw`D@(+l(px%pcW3JBZKI#~7kG;haNPsI<%||Sm^KF<0q^KD zbngM}Yh^I;&TQW@G;W3xCfKwFLT_!DHdK#No-qgSPwF!!2>D_Q=7<%LmuW8U21lN2 zoD)WYi0KUe>#;ga|3`bL%w(=yZ?&EF8NK2A=;zqkL!+~HPjfM=qc>9tTZ`+N0r#8< zawF+u7Aec7^GxMzY&tGAmQ$rarRvO*8Xzr^M@o%NpG&jF8sch0kh+2Rcn%)+dnN*= zfLxhp{0-7R1(bYeC@@`!6o-rbnB6%MN_}A*2Nkv)ZGTco6nAolkBuF|eKh|)uxr2T zTcJ`VuuX5kb(RKsv(#9ruhUD?4Qso;0xSAh>kM8$NS)7sjVZJPwO{eH$AExO(Le9!8x8M_68I*#_@1~55?K)yXUf{E*a zumKN<+(G9kqDt?z`%EM}t>wZC!^LQmXgzBm6!b87cyQg|XF+d*P6hW1nH*9#Xih+N 
zdl%a#rqmv^R<}N<8(;$X&d17s-s@h?)665gk2+sEb~#o$>Nu;o);NP)Rd{-x=Z^M- zdn$Q`dtrg7=<%s!J$vF$ z{LRT&@ueJKiSR1G;8hl{3MZxC=o!9EJ|KlljU|maPWj{x8c5$r4@@O&7Xq7v?uu9& zT|egM*cq{VV;)EMj2aQ17P39)Dqj71bG+#n=_$y-Z{fGio<%OZ<3z!eJX`Lz?BJ~I z%sN@&*{ySS=SCOwFB<4<&CJ|uo(I0FYBjwJs92M1V=@sPajmJgu>pEuZcvq=et~!G zS=N!3X{OcEL6G)!wI9`1>I)?k%QH;rql7A}Ky*YB?>gv{$s_momGM31=l4n%vZP)$ z4oWVfT5XT~roNt6$mifU$q6a64W!qP{ zUGY?frRDmTKAzA&{(kh_i1}e+=+U6D0Ud3-%;{7Sty8af4lrf@7*jeves|^7RM?|!05>N3H{2q_~=fxq#EJXKTqyeX=>SFN#jf2kCS3OQLQ6FLXCjNmU&`7 z^_;6}!Nn~7{fAdOpFKz|@o?$=Blo{}xbE@B=QH1o`Y@$aOpBsFnJ!l@G9#T|a4?&T%Y{duN)g71U6!x&BX^xJZMQy-=vpO;#IO@9j? zj8p7Nq@wyp`l@;(+>;%if}*_3x%F}%=Zg8)3Whmadk*RQExp64lqg%ScJiK@PwRYJ zw@q!m+K$RK%l()XUUFmH-k2B6^Y}EhdcZ(aus+H?x?o9GleFotS&WxxCM<&`HL9;Mx%cdx>|>-YY6@Gf=O z^9^qnr&r8vTf{(9de6tnA6eqO17F=2Qy%6v)h4dyyNOcOcJ{pPZkNpOyDGPs}f0;3-_~Jm$Hf{GbcsQE59p zAkX6YHIqITZy1sK7PX#Y@zwPn@U&w>`Yc|fJY$&rch94EyqK3rbyjcI-bug#Y`}NN!JSw_7>`u@f>wnemnIGOv6`w`CI{vO>=8OEX?nHg1>508k z$XDU}Bd122W5>t+uSDMjUt+IP4@yOp3Qb%bzd!0}$ST_f(WD5@7kTfqx@WBTuqv&3 z+THgL-;YXLNzd%pSv7Mv%GYzp`#JeN! zeffg9y|s%i(Z1O(2GkGe9Iz%JJ0K%q8&B^CY+06_roPe;;fDSliC}c40`RJk1u}LM9-9Gv zx&ofpD)8&?@Ge*5{ny35&j5S;FLPgyFvGPUwWJq`9-PYn!Vf-Ip_F;_)!Nj~cc0waeCz$4m-jb4Zuz3ao5k;Eq)*72o%=;$2X~G- zOg?EJ8`>r6yV&OO`SDxgU&XBUmS(fGI38u>O-{KJA ziN0Us30G-M)TxE9xv!mXn$Jb%*`c&hm#K}lK;5DHj7U6#wsHg0MtY1*G6kCU@ub;X zz9Ds%7Ew(!ftcKH+A_7Na@uPpkF&_pyXY2?)gOx%JJvg|xDLCAdzyIzeKw^$lcw+R zOcqQO=b7|aer7sjK4)od-AI4jMb`7y1Ge@7;q>k99To{TD>f=Ds&3@l@Z8Xp;72aJ z@NS1piIz}nA8Tpr9kVW{QoHe5i&OXb-gpDOjXg8mvt55Uqn)2SdOJ2d5}YZnaqi=u z+P<#J4|w_y^%%hk#s{pKDVu2}{g{jP%6`Yb$UfZ`W*un$K@O8Ld6roYvZIhVSQoXu zGS>IryUg3o`>FS$x0i3DQd(OG?mAW6EH5_?whpm>85kA3ETmWHccFtrhlL~tzYZ)N zFw!>GQpc1aU8Un~K6#rU_e`g+=tAMTf=&51^Ahsxc`Ne<=Wi&eQ*_tS$92=)&nqZj z606;3#L#PZvb@!_+}zP}i@8gI)~%L3<~F9kq!YqWT~<@PhhyV8h?Bg-~TP@>w0NcNhRx8}{(^#OZ>zYkSICI_B3{h@qRa5t^Wiw&u} z9#%^I?OFS`H8Q*BJ008Ii#>`vitfU9-J5*rdQVeaz?#rakx!#%#$1bj9C9rhT1lk>yAEk+Dt*bzdrMlUFIHe)gH{?YRvKf*g}v6+LA+r%v7p-T?Z*&($kS ztt|@zCWJH(KO2!6u`PU4XgNCkm9`~Xx|w!KZAGi_m)=9mSH}1Z&mQ+M*Cgk64y$8- z(cq%*Ko+?jm0e5R-Mrr`mw9d;CiOCfTYAtVyQJ+U9hIM2+FAZ&3d2~_dU?LoTr45n z(T`~tRfjT&zWAxWLSGAICr|fR)P>p#>anj2HKYadWz!w=QcJY;Q)?}&$$HeXz%tdc z%(BYzr6s|#hEB$%<)1~vXrZrFKT!tyPJ0J=TYDRMM|&?pd6RNZS*SK7G8RgG$OoZ7 z+`$y_c}&Ba3;uP6yg(i-Clj^#O7e;$#i!`Z2%dt^8}-O!N9yCXed;;I>)Qm11O_z|R5Sev`W1y;O@PfR4B89r;u_b*J(XbM7;IrI`BuRGFlP zXqUC^OxSV@Po$@&^44Sa>OnI?CWa+NRE-=J84+1O!WT9-)Dt`)Xji~~+Z0QnX$94K zc6By03NjpRi%f+>3-09a&W|gIq=WPw$Iq^gp2ogbs*UOaE71a3PL~guzA|^EXLkqA zx}0?hvB(alL(*MgpS~CjXj{)3R~u(v$KImJMF)z?Iy6V9tFODJH&zMJVvP!76kW+Z zW~=pI>*uyRwh?wyK;M9|0bc}20XF+FYqDh|{Mi^V6JWIKhFlLX>QLWgv3;KUl&0ZHrOFqdg2g@aKj1uJuvek4LB=B8k;e2e7nCr8b z7+gDM6cmX=c#5t=?dwwg4<^!2X5xxn{ZtKJbO)U}jy}*{ak%tYS}T8Tnn{oF zf2^ySAX(jh#pbpCW$9>MC+`)VMhE?xx>q^vD`dW7E^%_vGuHje)zwu%erE|)1Q(qa z*B)0rcbfZmPdo1`Z>TTXHwgUMI-k$CoM>cqEnL_1zo^g*r04ri`Wo4b-^ysx=9fHo-wpx~%3rurNbLBpi z0JQfmF6y5%@x#tH6J9QQvHRuXH|NrBWJW>e~wg$AnA= z^SC==YsBU7I$@uOR1R8he{HG9Ox;8J56VQ(Ea$$$_W9d$@8_iFoXc&*e8Xf%IdB8X z-nR509|ivG3N=9$&9!Vvfk%VahE50@9@aT@Nw7We8(WrnmAqc`(n(?~e0~fB-575} zo}@lFosNEv5{?u{s>9)2>t5vDqrB7t(Ay164)a9oaobXRJ^ZB;0mlQ{1l+I(G4mtF z`qaGA)S8&xGNH0@MO%sgv7AW^tGr9RkG+k2$9-*izMcvde+07-Xq=*6(FzrN!R>Rv z?^>8in=2oYN6V$;Z>8hJa%CaGh}X+%4b^2J0v`CT`hu`VnaT_`S=*)6rLV{os^mPv zQJ!^{$)7R7a6Ors*XEPv*6>oaX}Ro{zLsL817bDm!LCu!d0acChN)|nkCoC&TV*Re zdO{sb7AX_`RE>OVEVAKv_WXD13YQ7J@vqiUA@g3}tuNI_ zkeSHUdm7J-mb~jRh}}qH#jVALVrOEMBgKW{GI0R4JI90uV0phGzEX-T`*aXn4XE2$ zOk6jSDx{vqZT%O0ioQU<0X(;!(b%YISkYKZ$%^#TSL^rm^60SPM7Jl3mqm+InRyqj 
zr18w`y($TEb$OBeuY8T^5J~c4>N8#ti3>Hx>OO6y){i_SnMu8`ei;pSf+*b-qKG@V zb5|mxTZx|kO4lTf2=-Po6>ItaRO_h!NG`3pv4mQ$33NdTq0W8)*_e`geSNrooag5N zG-ILuR{ukvuWzA3s2>%Ei$QhF6SKw2QUli3OX??$lxBlI{7ZT&=~A-XAKiRbcF1;S zhbItyeQa_Mmy$gmX0FU+*8OlK7 zXWwXbiLh7H22e{-sEyV0=veNrh+jmd+6#Q=XVlP6BG-JFN~T$$6uWBo)P>AD8>gNo zr?5qniDmWETatJAQ#(k7+Y9X@XxWgcTpwyi!=U>W;>P`{Oe#xl{Rk>IGwAuSO$?Pr zNIRuy`5PkYmrOsIS6UWZ&)L4S4-2>!a512DKq>nuYdfkN`JK) z8JkOF26n2yDJRL+ZKa-J6#2{T>c855bl7)NuW{NKOy+2YUXf^9XHa6%Sv6t%%WGAu?Npy|Upcg;D?c3DwIJ;3v9{a1w`wRJsu)!-L-v z?W`}FLBeN)Xq`>gq@grQIwrj*Q_w~p2^Aj8B~2YnT}%y3`AFVPGA0|ya|BZ(I}|Q_ zMTTRi-UfViknZNZ!-$#xpofrw*+{MHex63!8Ue<7@X!r8(>_pcC;aLrM&>{|FY>@U>MMUYJjCN3Q>H!%;gPO`@bb`N?VWDY(YM4?f@OZ#=(74Qzvs57&hGNhei zLt&FHXrq)X-p@VNnHqh;xejgL$>m{ok;@a}i&pL`o77=o!=qUG=uEr5B_L0y%?IZqU}7FjiOFEhO=M7yw3e(!am@OjFpGVrR9TC zUFjC5pNJl*NW48otVGOx3(?je;qrg<9b`e?>W%3^kU*t%7E#%1XuB(926hRr$-#Xs zriwL~WpNdh*Ins*qKX^DLSY{9)C}VZ9VPywvTz+|pJGH(!FvRhyvpx+yt#w(8XadR~;dCp>A&vHOu2bWQ^3?>OGKz6=aA1L6+ws7pY8V?x@w$ zx>dIpx(VSFyEMf3ej{h?cEgIylZs|y*I?No@<5t{nVnZ%xJ@~%weq>Wfb^pe}Y zAk0F`>=g>Bk#UJj;I@f)4GZD8sI2b|dxbLMEOLqa#6Hy9)S!RFEp%A{xwf0&^hJ>FXSEHW6cRO?_A&ZSfXfVh7<3$p zU$csO#uM5CZHMO6=EIR+gKs(wO>gNHsN48govp4=bJWFJGTMI$a+rj-m`Bd#2X>W< zHTWB3@N;oIlOjJR*L$2g!dj+PrspOzv&ROQYngS^9x{kO$azu^>9iOrPQk*IHHP8^ zjVC`{Ui(Blqg5u#-=27JYvC6veEo=WSqvbP=;w>_FRD2cYz%RrbU;$13UXzvNr7|< z&D~O}L;k<1G*p^GrP&YCXHrRND;2t>gfGci4`ZUUh&CRAL`NIVv03rht996i8hSIV z;uf@WH#&JdF!GG+oToomdIU^$HE=V3YOBf0f1@qd_Gx)qb@0i5>Q22nwY9S0#hNBi zoqCnp_1$Q&4&-!3u``3LHq}UIxliGh<=EnT=!JK}ec=G-x&@1N4(j`cL8&};*hIUDTgWQS45^~QYUGs&6#o?scukv zX>s}$G;L$LmW4$WXfD4MyU*+KA=5(7#!IzkTrjElyIKUHp99jB#tmYhejO|Q- zI|3phOA3}N$|dA?(pIU1WQfbj*uACd_bh$5YKp7Gw_>0qQiC~%eEksNZ|0I6MiWfe z$8+EE=!c7Fif3fbyMecSq;Ej$HbZ|8L)Ld=Pw(jeL9+p9_T$E*(4CQ~BM39o!;<9?tH6LSApi# z22x2jUfeG}5^sV7I4X`ow%$_pJpvygk~(rLdc6Waxs8AD?P9@V{H0$aOBzky_yYYc zMpFa63mv+it9?Umuo3+w%EQZpxkI#&PTsB|pIHN4S}?ur3uH7y*Y$GbQ42ttJ=Gtg z)l=w06bDTnQ}3PvBB2ealiO61ccG%^vT=m!@ym4JNung35hTP}l7 z4r3{A2xpM>wREqVi#}Kd|NKSlbPu&`)#w?p2mbjO%6>|YJP0}an>_EQ@LpZG$lo2U zJ3JglO@9jfu!Y>{3baNg;W>Trnz1ecekzA_uYq&kkf*##26G%02v?!Z671y>{IfE} z{Q>(LUGV4Qp?M%&S(PqP+sWY9htFnkmmYZcE_m?*T=16Jg zfT&w(Ji>l{MtyQ2I>3X=R1sUekoZw7O{1Sag2GfD$ zieM9K(I16cCFpya6Y7e;R7m$Sl`aFhbZprTlAs5ES_0La_mJ0@`g2|8^aqiBJ_=2K zC(}6?3sD#UY8m@&LpKHsmSi;iU?sb`5RUS9a?9X~CoCbXDO?}8>!(@9L_5?z?wP`ELBU&)zFgwAH3dA2~KW1LL{tEKk|+S=cP7+p)=~B zHCkg&TJfnV+-oKr=-=O9R-J<97sN^Das?m#9g=uns03E`8rpIKUPgD$d@S`jhoNpE z*fam%OM{*C(8umEU)8vZKeqpjXU%ARtGfgRxq;|wP#AWtLF2g{JNq38oQ9J2AE58{GSAB`Kn?m22LEyi-z*o(|r%gixoWmQ4g>xpu2RpIb2YJq# zjdp1Vjhper;pC)t!5KT)(Mk65pqQ2tPuwN&I~o*o!Wd|wLn#oAR!DPx zQ;>z~^xw!rZr&gPb|PdY&^du%YHlKPJCXXI(RLU4tH3qJqRS5#`!8ELlTm!SGd@f| ztonHDN0;Jjr9t6AJpDDo27ZDioP(cw8p->EwO+;te+sTifZCC)rZ?xb2zqa4EkALM z9!R8>&S|IN&Cj4xZ)h?Rz4j+ka1F{FLl>-ouFJX3E2OeARO!bxR=_bo;a}0?2hDk; zm`lDviW<{9$Ai!8M928|J`~9qiA*$r-=g85XS`m6PAbKfDq}^u;PZ4xjvMoLjr05$ z4AXQvefc}4^rpsWBtGdtxM~`&IlPv@?VGR#XOZ9>I>tzxbp?2_4m8!;y132mT$khwJ9^;R^rITtN_#PkUf3w&U9$}Z+$ayKATfW3{Y(WCHA%DkFJ_@B(1{xTu@En%K&?ognsP8SuG)^vjQD6 zl(VZ09jr);kN%-0S=$R%bOV{*3%&Mn;ve9MTI^*79Q+GfeinS|hUZTd|4oBenn3^Y zNO2pk{+hLHfi|na&HN69ob)=XMLl&VIHwBN_rasjxyLoM?JNAumuR#%?Djnp^@ROw zL}usXjf_SrhoVz9ah2m}x(D#zWq9B`pDzP1cM!%POMWfc2U_((_jShK9?5x4FXn?K z?D8}8UK#k<1~)tKEiCv`_povF$%9({6+Fk!bwmAU(BTq)dBkor(M&FEaWtM!E%+n_ zt=zh1zb5u~=Gg~duIBW&LzlZ~Wjc+b4HEDyOYyIT4oNPq zUWhN~Ev7bQdFcNlXz%8pxtu^Ac~lRS@~=0FxKMTWSp#pT68gA2*XhDDMK>gFEIaNC z|3<;tew+U%7V3Ab%{e6GEq8H2G5@P9cdyGywdOP9(PiJV?vCuIB)(%szDJ^E;<%T} zE*`OeYHB!@o6zJf>k&D#Tvq%Dy?q^i%Ej)4vX>_4jz+909Dnu}_ud50ZH60;qX{qZ 
z-s7Ca4tVhYop2#sXh-gv7IR5QcF>ZYlxA%u;k!hxSdzcHIRhGhqVvk}^)agp28CC` zk0E7sW=_oCUFiz4|1%ooCRBXE|1_*MEiSO3e!j}$EMKAhE<%r+e1FGh9^#!1LY`Zp zZ`*QyBcNmzuKF>&@df8GAFlomA7K%eeI)-*L`ny9VzpV5h=e`Fy6jpb^g10w>pZps2 zB%I^ds6p(x68fSA>+@50Dq3|Fn(X&tJ$4K|whWu`8Pe5=cUFO;1K=hrbhC4MHsTRZ zJm<$)iL0!2BNkyQciaLu?n92#=+~se`TppejZgZufezmZ_+9Dn{d?&8wwTH{&>?~k%0hq^D%4c%6;qcb9wew0{8+_V_y*EV?Y8F$k8vFHoU)74&AJ5(+;iot>iWOfu8=lESno6VNY8LD5 z1kN)A-ToT-_@(>;5_q3Ii1b%Appn9f6k`4I*t>s!TCuhb<5Tnyh7$FWK>xjy(Ha^& zd5;sm2VcfQsp_nR|9eyi_}X^=tGG zQ2!BEds3`Pv-qyE2LC!Q!WlQ@dme)e}#tcqq# z<{U~E>ump%asV7vlE0KIw&*o^XA|C)%xU=8VKP~PU@$$@@%ifzncK7SaE z>-Pbc<8jU7>2V^uY%^EB0Zp@5rNLPwLMz+JEFn4?X?Q9T(6DhpO+krt``qa zC9w+sZ>n*Z6lm3wd-g(Kc1J69=DrbpE|@3pRQCQ)v0c6iRo=rR8GKi`dIsklSiJuv zIJG3JFN=(I;HxS;QJK?F;LsPW@?`P)-l92t&^UwBiQp9dy0Im9Z@@{k;wSnM!ao(b zq6__>1s_~R*8N{hl)_^D!9Mr0+Y4~b8Gd&KnM>oHCO+YRhWGnI*U(Q-`6&>6sBs?` P_Y?R@hYEgM^TPN)185uR literal 0 HcmV?d00001 diff --git a/tutorials/tts/audio_samples/phonemes_as_input.wav b/tutorials/tts/audio_samples/phonemes_as_input.wav new file mode 100644 index 0000000000000000000000000000000000000000..e8c61c12c66e1a07f674fb45876f87931197684f GIT binary patch literal 109100 zcmWh!1#}Zj7hZ{bo2HVwyBDfJad&rjhvGckdAPeiq_{gpTeR+yq;1^Y*8cow&*Yr! z?#$eKcdpF6-;5gAuU}L#2#oJJuJ631MPWPu06@}a?`$VF z5+^;ah&WALAg&O1i95s};x8d3+6WjR2r3W+z@GjVA`Hj{CIg9{-ZamB2=E8IfJ&l< zU;#cr3_u3Ngb$DeL<8PHFc1Rt05$_TfRUgBJAp4i2k;a43RD8`JY!HiPqzY%p8pqs zX`XlD0VT1X=tr#a%>0sQ@XRg%h68M11Cd9B5oyF|Vmm<)3xU}{J#m{DMNkP7zLV%A zQau#J0qH<@03jZFcsStsy+M=`2(bt_2NVGJiTdn4~?6FIlwdEBX9_~3w-iW_W~FK?D0^`0>%IffssIOARPEZEFy*z z{}8i@WMZ91UiC!0hfbqs#QsDEF^>37%=Gj=0KD{YwT)=;%vtD>OdN0qC<5|<(Vlhy zsPV`!!*lmPqLO$({32cx5gut)6H&kuVl#1&hy-o{8vw#X!C-(vV7L!)pRnN{iBO=9 z@CSrM5iTHtiNi!3!2l)zs7LNM2&ZSP6yOfg6QB~G@khiVJeNpuD@atqQZFj&1r&#_ z2T5QLTm!v^c7tC@zkwsjQ2P+ucUy!*X1!`Y?wE~zBDMgD#666M7h*GrCEz*_ZSC+H zDwp<%EP!4Bb0ICX3D}I6Vl}`GGD@>jFT!=icbp)?i8k~m>c*c#58yLkDt^`7#p&x< z;T+-q?Ou-Zfm$#bScH|M!-$Ds7}N@hVJ<0`w4Pix6|j)7tzGB^w92b6l`p9#Drx)5#nQ9?~9h*LxiaRL8`k03OacgNQ3w2G$>| z!vg^!aGSV_yRb@ZBmSB=1Om{%;8tJ>ei$2$yYbJ&UeC%I3{Z$v{2%-Yp6+QOKm~XN z>JJ@(=74d4pJ#n`U;T2Fv%Bgf%wo=pf`~R zXrUcse=-x=LGW=1pG=eheE~mQi2g)v#C`}MHIvfG_aQkp5SfLJBE+B?`UD*Z0IbtJ z5-CO};xXVnXfAXC>JAMC>EH|SGJKvKK*E6=cmo=Q@jRMp#Z!@NXR#{_+fD2T3ZUum z1n3R0lep*6#u^|UB9S)2i-_^)79<`qyN4idw3@gFJ_0kL?l1+~3+w_DNC}iRWGi$V z>IUBZ$xEqJnvKkbdSd%ragJRMHF67lM~a7k;o-E%e6jg1 z*md1?z*UHJ1FA`1$Xnn@;wM^y|A0PI?Tjb1Z}0?SJ9-hFhD}3Jw;1_`j>M910rm-< zg)K*iqP_4}z!B0<>L~gr+HUw7GTfPn%D@!Lee!hBipre(TnmXn_z%Pbs__Qkgh#Vs zAQZStOu;3@UHCrPLFRfK+Fz7}#t=)Oo8VhqgBD_aF$dC&oJIR$5_B#uhEVc*@>o!T z`4VcV2Q8f0gH{h2(QtQ_yUtC)dIR^sBxoM?*nZwT#=O}T44t7}VaCyWQEr2y(P(5B z>H>GuKQTG<9nf6V;B?y5wj9?v;upnAm6MEko^zUmjGZS%(~M*#z{bn5AJ|ZwjqgII zqPH?1J{5U z;0qYB3`FW2?OKi=C$@uA0ULH7p&}ye0JxLnBuil_u-;>}w*a+3s>jz$J@HU~yb38m zt{{`J;ovv83;dHfil*X!z-D+oNeHio%uphB?5S6i!Gmc$Ob3<038te$bM<{5V$J*Z^b)&bkPO^5; zo5}g`M#u*~Kw&U@aRBao)-U>CdMSg%oyPa^dng`bXuoY-H?BgZjjMb_jOxM7V}v1I8&wZy=j&;(XqlsxK=qc90`u4uKw6tU?_=7 z1!&o{e<*R3Ei?;bA7?oCBI_1S1mq&9I}!T{9)$U%@64UP_K5Ukae7WpZb4W+Klf9H zCFxq+l<*nBdjj_I{w0Svd^L+@PRT7X+`gpKs_3Skr|o5Ao4Z-=TEmnm$m}Dq5PO%WSna;_M?UoY5VQm-9NzFjrd1JBplBrH-RaVQIrCSuax;~b0 zXP!G2d4p{PdXqNLp0g%Tgm_K>{6 z)SgSb_Dvt0ax>ME)HNzC_?CAO{LNG)s;gUFORBxnaJ+>hIV{i8ZE}7f4`HM16ZAPG zCUDL5#d6X(-FU?O!TiEluH&jM$Uezrs-EWg?k_kO90~RZdShu0rBS0DsIFIBkT$np zmK2Gd;-t>M%D?)R=F7I_Xd^k78{qprz%I-R=*Az#C;?-TWsdERS|=OniEOcdGjF&1 z(xfq)dZS~&344b1=(8a2Y8H~s&%2v@IrC4X)q5S%B!2t1^xLwZ!|JZI&^rpHdpp`> z{ndl)=cq@$Uj)^MwFkWvD9P2f(c1phP@iK`mXGkLgaS&jqcfZ_V5*nD;lR|^k0?zG4VvQG80Q{Nq84vBz7v% 
zO8K|yZ@<3FeqE?|QuV4aPU6s@cq(^$@T8dGF}tHU!AHCz>7$7q&IIcj-BQhFRjh1Y z=NL&~oTz$;*1n>BW(7u8P_~Q1xe}{d1{@(t1>}N{ZnsR0B8QEEfkQp92H>O|QqZnDp zuYd?amUoOV?0cMdl)|?iluvIv)o`)-l6aAHj|R|P)b-Jat4}GjRAjB)YIpwQm~IJH zckA>OZD}rUE9`hHKd2jJ&BI^PVz`@q;DF-54M8&lGWjLU*|5UZW{T4k%4W0|w_L4H zuf5%{X~I;LplxbQ+-yqE7? zzZBn}zIkB*0cYuhjjG?+Td(Isp1gl+E7|_x_R~2+jdC*)i1WUxk&C3bG>4{D7|rfv%7t(YKuW^kn3u6 zFLZ+p-A$K`S1qsI|3Rx*1H8(4<-GqmYSsa!kX(m^o8~E6IjkPE3((lZ{rR()4zI`(FKBpWwWKlk?a7D%!X7;?#TH-zV(}ddup;8|@wFzcb_p z^Mon6o_N#v^wrbi*Wj-o)!*7i>b5zXnWTV`!Q-NPM=y>X7g6rt#&>&t<(~0-6Pz7# zKVTPafa9x5-Fl~)(N@(lK)260*%WJT*GiPbq=!Um@x*p*N4xU9mT3a*;~j_H6<8{^ z0_7m3ZUY`m>Pi=Ij*1G4b zIFoMnElAia*yYV)9%ifk9{aWUXrQH?D}HZ(QGGAviT2IS-)CFbs}~cwyc5B)Xj0U) zNI}>N;f{dEzTSSZ{)2rOzN7sv@t1IhfvYUfRgjq7^1YSX9;*`T#g@@dnSHGJtXk4u zDEie3i>dNHI>;W3mw-*s8{(*oW&ddV>0&~U=yGnUH;vbw@fEP!dg_X#EQw3&`gn6q1eE7vkHaG_;hml3gDD4qUaLi7<+8 zieaMi#xo7CT6@Y*7-F4o@M$0)sst9}z3`cyo!o576UHl+i8+iaA@Utnda+z7{@X08 z6W4nw9)g<4_WqQqOBUoT(oX!*Z$xJIz!1(?@+x8;EyJrj|2F>!sZLi~PksCQw&2Fo z+cEE&t0T3uDcgM0LPv*f5eE9M=J~Mg+&*4ZpC7*2ew~5)LwAN&2QL?_ruA|?QF^s4 zZK!TY5=~aDGRGrxurbIpOM+@@yR-R2(=E{_X_9WAeFJtLI#2#XzCk`s7Lk;seiSiv zG4(#l0=#syY!bsrg;LCG4yiA1<67c*%QJh7Ix@R;Vb(1Bu>CoG!~f-cgAIU_QpOt) za6@>G$HGanAwT;(yZoR3fBT+hmvwL7hMeNu4wQwA3ebB0$H*W{C{q|#p4tCSur*>t z%!TNu5%mE^*1tHWzu)QIR?(~xv1Mi2HP(4{%sSDimWPPjo4gyAwustml?Tkx?g;QH z^)Q3Py2w=0E>c(|3wVe4fu2PIT$}A@t;==g@*f>5B~x`q(#IfY?$%+^<9|%MG@jY- zY}aKWA8AVnALlVl#1wnU1%17S5(_jX)!W}iJ*MBk_`Ix)A)4=8!9MG+4PN7y%4&s9 zx&C!;g$A(Lfrn#^$=;d6)4s=#4YPagh47Y)d;=9G0 z-5MQO@MGkn31h~d7^2F}i!T<6S!-zN^q%aq-iX(7_A}^T(;Ugqzd4`6-rali{&QMY zT*q36gbI4c`Pc9z%yFcTXdSkfBxk{amt)8&`I)a$$HpuaK4O?rxA~<;F87nNrOTvO zWRn#ZSug3zj=I(hjc7wcV?tY|Y?e0FJ_4wq_T%*8&-DrRwy@_@ELe@J-Nvz8wG~+> znpT^lw82`gV?U7P9TImUE1A;$$vI*9`q&J-3JQok4ABD^Wj4cDp8Frua$Wh51X+KkoqIdcIM^_+D zi=Sbj=0CMdsZ=B>OJsG@l1{FKDKa(JH}YD>x4mqyQu-L{T~Fbj%=@0$sfS>kS9f}U z@Qr(o?Txw6G{n?w3^uKC_<~7XAfiW#A$x6MTDK{=hA!XYwuc=QvUuIRHu^++HwYTN zrPRs96hn(>V6E_n=*zxuD}KJO?;-U!g#$H|GmJlsiBKdy4{3AG1Wqv4`1;4>q?$4& zr^7KHc7Tr^Gmr#(lyMB$~G zqI5NtC(4cmoWM4lseh;{T^zT0A!cU|9$Iq3V~U;TC#d`r#_ zwzD63GN<)KKgtEp2!YXeo!4>FVUwn1@(;}$&GWx+OMc90Ni|?t6=e!-Kls6U-c+n_ zwA2%9?#IyW$xpMuuKww}VpD_8@>Wucv3d5c#*6APa!1DwNu~IcI8^dZ@=G$geMd*W zbf!YBeyI<$IBgf5Q_&8*5y*q)0a8rv?rs0avelen9%5y5hg}c667W#?JLq_D_rO|yZ<@k&Rh7{4xk6D^|NYVr-wJy3L0PNeud@KV zhYdpZ*e{ybn%nJ%fVrF#!KV_HT{h?CXFHM>hg@S1BnaCgQy-JueAcqYyhLBA$dsrX zKh|`wT3JP|JJh_reTAyXINcF~#Q>+ldtf5424`R+kS6y5q&vbwco-YJLYwF_CM-E= zN_MYqKHcx<{Z9WBb0;v{tBqOAa(F!uZVG2cC4|ZN>!Itq%EpGDai84pe}C!oC%kcp ze3-G!VMPa_-Q07WT^!%sENBb!w68hzc8ok;99Iy%F{G#9A#)!X?>gzxqd@HvHCMGu zNl_K6UaFduXXRHqc8E^5v^ULc*&|u1uo><;dl3`hH1aQa3ee%nP$jubk@I*syq?w1 zPY`)J{ZIav!nHkfawjKGiO>nMI2o*+%pKe$pWTAHz7#&jSP!1F_EEwjNzLJM`rq&6 zP{SF?2lX49*nJ%3pf4S`ZL;GK5=UwDUJ*Jeac#OYeQ)x;h_ix`GzRj@bVjpWd0BZ$ z)l(gz-lUYu{5wsOXOc-Bbuw>tx}IbnWs7pma@@7Y+KyTfbCl(mCEV_G1iMzCpBZjWfh(BTfY0@w?)c z!s<^ggQnwK-Fc2y>n+PZbA@rV{*n5REUtZVTc4I~Ev`0o`#rhV<3-mRUKx@M+x5S7 zt93s5LMmG^_W5u2l)U!m9(}z=uI=Ldg+h!*YZNx!J%2 zy|4}bJLU6$cha}*pT_<6ZuFOpG_0~kI%Zq72D0{#dV+qBJsJXe;lY1nXQbq$4ooB> z_X=lvmC)Iq7%|;b(GCGp2?Vj)HyR^V*E`0G5?e2|&Jag-V)C`>Z`wDy?)nq@cKu{S zvY|$ILyK$jbv(;U)RWl?3Qn%bAJ{Ko(9Z$KdI@sYCV!8d6I3Xa1daP5l62L?G##thl91D$&s@n zH-rugDCRT+3(U(qWHnuWiob0Bs`z=OX1|!Fx~d;xtkvoin>y!8=5(l)!z?1`w|7{0 zkL1l61sQXa8IgPZ(ph^!GP2TH;j}v6?u0 z99-tj3!qc4rJ4?bErJmYaMMUG0z>c9Gud#?lDhJL+JaYNKq{mC%e_>a1*J-7KyohN%K zDbTA4y;rwzO;Tt^XZq#jE72jrH9q&*Yw2zZiNd9hre2}UBt69)_F`R0=c1PGb+uK+ zRhMhNHHcbYw^QX0l^;~ERI`+N`3c!kX?Gb{xx%mrWpO@67`t}#yD=habj7f&euMLU z(&A(5!|#XBjBzB@WgN&{ntUbf4cp=9+i|UY^{3)DN8fJ#>Qj{>zNEfqsMFQS@z(0v 
z#H!sjIjxPVKsS{s2~b7#N!*-tHU4^(CZr;O5Tx*_UjK1ta@;H*<|V2YEObMre-xv{ zcw@J^OEr?3==wiRy+wUGKg$uNL3L7DC!Zi&+BvziPPR*P&e23G3VxMg>-}k1^+@ip zzWwqF*qIZPqT?RNeNKWi#$`)$q?zS$XMM|vMkQ49?DN7`5zlwN$^ITvb4@Z|<)!_m zVt2|~UNw|7ED&+j!(D2|G@&v2VbY(Z9dYvT{{r@T7qLgtnT^Mw`qfDh2#`s zsJ&YAq5VfwbFH?!=u46i~NY@1;;}7Pk z2M9)X898uhexIBiaY|*(yhxweF-h)>LD?}`7ZbMxU7+4H?`Z4%&G_8(_V@c!->U!m zG;i;a$U5bH+j}=3tCLr0YI}+@HCwR*T>sGBiKS^*(xX#G#sq|(^sD8ivl)y_v_(*f#Kb-;dq?W^c>Xq(02Z$R3rSRM@rK!>q+|4g4>7zItI@dznwk z#dqI6=l@z#yGbN&Kiy#z&1^`nSyXwh8fan3URw0!a|ltY=HbESoU`!oAm3ldWO0xrP& zbn}}k|L!f#C>{AXq_(rgEvB|5HqEW-@pFAC{d?K3&-Hxi9lMBn&!-^NC*pLNJ@~!f z7f;UP9QV2xo!{H%sqX~AdGAZ?G*UV;)pSGgusyR)(Q>2hxY*vImwlEODr%(z+6Ric zZ53@>+q=jnsMqT&4Nmh$X9U>BeHKhjxYp%)zIWl(o)5d_=Jo0-OFx^ry(=%jqo=m# zuHJcUn%@vHohyD!z>THtUbM{-t8L^CiXC zr!nA8;EBL5zN`4NcnNGe_mI~hK@VYAppS5|z{&gxEL2TNus15RO=0P%Xj+>HRd#*Y4LA1*Rg6`<@(>bf6o6s zR&%~>s8V7hz#>MZ7sMy?_1?R99o!<`6t6oB2%dv^n)#9oIhMWoGmtn`H>lMEh7Eozm3dHRp&?x zF61A`jm;`g8K1l?mNAaQ;g1m<_L<@Jm7#zJBi}9a z^dHsBl`7={^;7M6{XhC^x@DRX%Kh>)GN*iy+NS+xq+5U3T#i?cCH7@DxoMf{t0&sq z%UtSzEW9LPQObc-P8uocdcwQJuw+V>EkC!{`F>%&`FX2Sd_p65U7?M(pStd<^@<3k zRskyh?VK+f(JZV#TXU*LQFo*@UDiXl)&?UAtUs|1AB_(naH0m}l3q~O(0|a^(3X;A zKqO{$?Q%S_FSI?e1X#LRhFF)`2rJk6$h^{cR6km0*J^a%^+yb?#$KjP#+Uj%+L78M z-D{oHATuM*_n?&)?Ufd=D3lkgOad~x=0e$L4EjlrNLiH5Jtr)=?Ywwj7j@WZCLV+8Mf^y3N}C+C*KDzTFgWx41o-0{8`a zKIIXq9wGttSROXagWW-}5qKA30WpDCM%>39xE=O|<|q0Cnw_3~4oCA$dr=pp`=nW^ z@>TfAzsoBWuyVd)g3_!{b9=CCKAPawF~^eY)3;?6cctdQdC&5d-A?s<)T^}zH?K4O zT*B7K7s3p0KYBgB+csGjqkJH{?(q-Xr8M~m`8nyT_HE+JqPgNx?Hi@F%I@0fh9dJ& z+hs?ZbC0u&yB!?~IG_^JP|6p|W=aWpIY|j!27X~>NLP1?bFgi*DL{Wy6Q&-bdZj$8 zysDHdZ>bRVWo?#zwPCAqf%%p7k8Od?Z6RB)JIjexv=?6I0xpNGh<*|GPl~0>;ryg- znfY^b!QA!PC0QvMN7AxV{Nw(HPWIc$`AFL6K4u=SU#@wf0Ho*IpG%S@|Ie}$o#nDg zva_AxoubYTS)K~e|F9f#^3WDM1I&Xdq&@IVnB_qtZ3eF6Nf;e{@80CfaXRfLtJ?Hg z|5U?N2y1{YxER_m6I={)XYLafzwkeAw3HT#GnS6E+0jfeBDL5{53q`vTX& zmv9L679)bSo|VmZbM|{7{7ZhDf-Z&)jIe}14gVF97Bw$wa@4h`K2f0&6GPSrll}Dk zaIb3|wI{0Jf!;2Qq2R+BW=BYnJLBE&z|mhVz06m znG1{*!*cyG{Sw1G<8|XPLxDa=XV9L|t<^gXk>>N(D~`E{j|UmK*0T$_ju&~bb-u)H zU;*?VUg*Kce1RgNLbx00AT$iS=U8apgyy(Lq1mbZ zsQs$_qJ62IrW>Q{u8r1oRgYDlS5wrhRTAYa6-m>k*`k$c*XV{C#nxd?CbAHnfl-NG zpdP+R+D77$Yshj+A4&&Y2yFm+fe(o89`sKZ5Ji+Ab6wBfr}5{+WbC#3Fv{~FHEEvj z=LW}82eJloCh+pS6MT67uZ4SqR3TeKH;4Wgx-j&7Xjy1}SaRr;z{$RTykm?@9(+zG zxRfYDA!oF8uhnK3*?(Afnzrd8w3(XQ>R}pCcfvq3?lbu7hwHozw~fP1{Y;OH2aOJs z(i~#}9TwMD)R#DnHn_UE2D_i3T5JfC?8^6GTlgpotc6B;@OF2A?f812&MmZ7+IJ!^ zfjy)c(k*BZ%z-jMEffsw2kw#oB_~qjnYm0Clf))*mhlRC``7?yFF(ldhHrP@7XC_a zAHjIOh`WKC&)Lek!`Z-Wptg`MLIZ$2w8T~A=wkh8EHt?+s2y}&x3!v|SYhWD*L7!| zy^r;YjpN+!{NtSJ$ahU~O|o@$?XdsV-OzP0Ot!@$Mpu=q(RtV{w>LOqk?W37w8Qbk z2w}tEPLG^hkrlWv;0<>`(}_V~BOFIN#!)g*N*C$~>Ssx5|V(jgt$g(Aoikns32H}PNLLO`!N2YJcVm0eJOvy zpKv?*AvWBD*QSuJkwO8!C!cPowj&aJBC-!X317gDyKyuN#=%GKOrV_1BCjUZd(gT8 z6gQp23%RejuBj<(8o+ zi-<8aS$#pe9keGBPi(IpG@vhoNAgDxIerpNyrYnp^POV_aF-Rz-iaM?s@(hVFKiW* z?m9*sC%d>UfZ1G#E~h9d`|(pYp}i4EAPoR_xfg=jZabPo;-V$s9XQ1?%%-4yWiBIs zLzhvHdfdh0X?`{V2Ej|9f7~F-jS?^5pHkq2$vjx z!EVQ1Fptt3pb#33x{el1^>$@)NKKe<_2^e@HK&bk+jh z0qH2OUCc+8{>J;pb&O8ZYw(EUKh1Q?KCTV?M|az$XR;Wpv5ELVvx0cx|JiBA%D}16 zVBh0doHc~DkTHrf4;fGFL#V_Rgw9zFRp?FBarhB9)43nNO~Mh52lcrEzet80f5<8H z8^}A@K%z1rtdCPw)YjzhxbPP{(+kMZcp1p=RhN8DL1@UlJWL&2O;>*E7OsLA0Tjrbaxk zy*H2;uuhr*9V7pAte~RUYRg*=3sTbfE;e=-pGxmwytWL%-h=BXH#rd6U4NRgmxhzy zM++&a5||N*}xfrb+xZTzgyN)PtXRN#~Ae79qc=* z0n}p*x+93xg}TW8$ufz*+!<&;>2nWVYM+H~XX)843GR~ncToB`TAC@m=aFhPvs3B& zft@HXW^I6OI_5(<65A$Zkm-z;$pQI6rd9#%75AlXBWE+XuVKFbTj9*MqsTL+lJXg7 zH6|F6V#3|gbz_64!*(0N;<+xGg9BpR&LxqtM0l%hH|I9;!WI*tuEhNB 
zGqUYN%x0X+KWWZ@D#01*E7U=uADRwYpZhtYIL}t;CNI2f1d;GS+$H9{5GTZ&`mnjYx@?Wb5znL0-gXdY`Aqo-~mV*#?m z>s2R0-cCNQn!=>Rm#qcpbk=OlgPbD#T!eE9Z6`I$bzI?QUxPg8KGs~zcUO8q3ZanP zW&2WWxWbZYjR?A>T<=H^n_}3a*6`N(o>1Jgu$|f{))WD=|i+54Au@b$bQoPj`fiCr0zQYFv+TZ zBEB8Aj&?%%(Y+;LpkqM&SK%MlrRH;?I~>_^Zj6au^PL=WTezrQ%p3<|uB-S_;X%zT zhL7V2axZ?2^_*BvllctS@Mznpxza@MnUo{g9^|PiA!>`oQ;`d4kfgICXkO-o@Zaj~ zu2rT`Xy13FBoksTjR@ondNV;7lytdqyoWQg zBbWJHJq$WQP68&Ou!kOBoy8&9zN&!jpgR zVYuu)Q`rPxpv<#EbQ{S_CuVfzPHtZn2x~su(gktK8s;mXgZ25|yXku6DIcEx8Djt* zY$Wi1V@ezE*>B`m)O?~2ATOi{xW#Q_7&*e}@~5ts{BTJ>p~#l&e(n9YZjt{GPOhjc zs`ct{TqZVRm$9k5MhltZ=PnByuL7ioY_&>gJ4-hab3ErielmI>IhM&0*_!2m54FU; zG37&51Cb(ZB(1SnHUA~A(tz41zd6(b^*fr#J>NUSmLvsp3mR@?Qv)&iMba$tc54{r zGLWyca)0yI)KM8J5m&0mGv|^X;O8iloo4_EW0_xHNxAEL;vKcMX<2unVP(yj$Qs{l zah{d%qarKaUv(+;NxoxiHgTx@FzF8Og4V>eo4TD%GYZ~Gx9ZxXmHMAxce2W8j@ey1 zmcBaL*%HLA^PI}Mgr*q*E`=%4AFwa*=NqS4Zu@+4#=2z;v3(+~oXXQ4#T$bU8Ja!X z-IbHdqt|W--5i2^UE^Qum)APkdz0Y;kn1~MGz01gIjg9^NxW5*GW9yzLgoj2g5+%A z5x7^&@_;hRE?qZ=2QpLMm#g>RTocU?qD_68PJ^+59{HsHlK_#3#0~)MquTPJBdgfgIg5EijUOp!$q&8l$>_ z>M!OUvh-KiCG*s?Or6Qw#Y;^sY46&nX{Yob^5GAt?Luwycj|ohVd2`7K^2B|9ZTYb za8B!P#!8ly(cKcJT^AQ7FR?s}4eAtuKcGT?YKhe@*y<{hOTv!g6y-?4 zReF*Evkghgl62QyPBN-q+sY_(a2^YAVV%8FQAGuo3};)02QO@WO$zW&?wHOcz!#b_ zzTM#moq-B;^VuZ{_beI(U+jyuUiS|6j6 zW2Y$Z%V%fiDbp2i67sYGx{7c;`mB0icAb8NYEKOJ#wUW2NycCnL0$9hZJJL_$PpgRD&sz&DTLeLY zx4!jH>PnsQBV}MnBcvmTMR5lH`jePU-TSS06!Gybn-cg>a}cA|nH#`s5qnLC@0rql zr&|PWF2BvqME{ViV6B!F`H$=bz9lHKGl56v{BA$Ofa4!LJDFPK604d5__7GjbIK3N zX+bYx!G}8;ne@TGn*vyISs(NPxXxOdxBHGeD@fIt1wTprui+m0D#EUN zVfpRPaphnqoL+vt>_ap`i9G3Z;0I$2!2mbO`=s`4vsja(S4k&&P7beT9n_E*lPI&5 zPVXULZO5kgfpQZ*$`_T~4?gDHtzQ|m$qs7mX=x3ij3PhS&;pEvHu&zY)ke;xi%Wv@ z_fXK1@mV1dr97s~X7d%ro1m@ag)K+p>Rjf*Dp zQ7hGq=iYx;#^0N8p=3npUqnfm&CQ&P$T80En)SAsh9{(Ca#!H-BWi z0{*tf(Urm{vKh$4fJLSdi#ld{i<0zEI9PI=(@gfVG@D$p*JTPh7QYC8*Y?ck2Jdp) z>CmrczN?CttH~7<6B`U}+P{>1*I#!l`7nN$@HX#Y9s{@AC!lQaL{iuK)mf_yUF6@w z_dB!D%kJNtZdN&yXI(`dj(oBG4fw3`c38YVV`RfBA1ks-H6U<>gp8^sYf6(pVUWCP0tw`eY94(}NGy(xklV^p!J#04ckVux+JbYIdWdur=}h+*7& zl_`QWzg4ZVtSMZxB9Ad%xVYxBV21yUCSPtb{f)JhP_Yp6Bgd2QO4%~AA@rzojG|v~ znr)f+ZRo%nLv#sZZ;b5n-F&9)Yt(Z%Qre4qi0ET0^?%cPn{nR%PmO?^%9(GAR$mvg zF-ZH4l>l(@%XmQHRqbm(@_9T7}^qIw*xt?-Ru_>lTUE49O&xi)G{95;ZElrL= z;iUFQr18PC%EJO(oap)u!a}^p_1d5f*)QJ$8|WJ#B(5ZP@=r=H z^Egzp(&mkWM|H(CHaE4inK6wV2n;lEn3>FYSxL|i6X?7nh;Squ0YNEcvSe;lzRm*d z66CdQh>-t>>`3s&cet|Xqp%Ukq+P$IQJ^A*n zrVX|Zq(KuxbarLR$9q_fGPL<3EElcl4NQ8|6Df1{0FO3zL`1Z z?G*a(FCQ>l|1__Iln6GLycEFF~YjUNxjc9?`$> zRyU;E>yGhtQACSd*Pav3Qaz|?1=Ld;@HgBT^s{87=PJKfw#z+MI*)Bn7GZP6JMb^$ zALf;&jk8_N==j6cFMapT?F&2_rK4y6wQ~YP!mIVAD?;?ls@e-v>RIy$+TCk#auhxoq%2Cr~zo2ig zLMO=+@(TT)8kW|l5^|%^D#db94C+Sx5(<%6T0kc#^Q>bSmJnl`phIsZ+k!vP>2z zNa>r1SDZ|4sdprvFCIo}8to`LtUh90sF9KbEL(NW$s(?a{aA2jZAZ=={Os=q`D)el z(*4Rm5?5V!%@<-jNoda5$CB@ispU4+9JJTpCMhL#cgvPF6?h{*&*>IeHMQVilh^h} zTzku;SkN4YrUksfpP*9uB6NxUre9ashhLVsv0$xZc;kx|-RS+MhO(v-UC<}cB+F@Z z>sOLwc7WRAuRh?`K`kY89i{#&jQF96xkGoW9uVC<|Fd9Q{QhhUmXY`d2WT`U?dZ&|t_#^2eh zdTg}Sk^|c%8>%)LlRX9alaTdgLdqHuQvb!Kk8f38O~2OmuRY_Jik!fUxp{)l=ni^? zdxiX4;DPVeMuBqIpKMhJRkN~KWU6+qyECChd#NPlPXof`N#*GZv8=5{D&DE+?}%}g zg}mUm|4oP}z|L58K2225VjCxn-0=<7r$}|xH<9VuQRL4`LLH!N`E9ysmF9i!S_7s( z`SoS^RWjDIM0LC*G~sAT6Y5Chv~Ng^%a>PuG8(Egu)9R1*i)BUa722?vP@hqn(Mf& zoMLW&fueBl~GREgppbkTQ*3{*i zZ&9!2ZwKPK_~VYowy~;tGNf>f_FwJCoHil6BDJ73G^^R)>=np>nC9QNN&_NpmPFVe zrdE}Bt1qN+quy!Zd1V>eVSNZ^va}>zWtkSMkI7;FMTzW?26QEArIRZ`!7O= z`aJv=c(JemupaI(>xBuJx3*rN<=5oJz6@^RYH_G@jM_)s)^_1outJwz5rf$(8tZ0@4JB#eJg%-D5aX+McqfMID*6lT(jUH? 
z-s7Ca4tVhYop2#sXh-gv7IR5QcF>ZYlxA%u;k!hxSdzcHIRhGhqVvk}^)agp28CC` zk0E7sW=_oCUFiz4|1%ooCRBXE|1_*MEiSO3e!j}$EMKAhE<%r+e1FGh9^#!1LY`Zp zZ`*QyBcNmzuKF>&@df8GAFlomA7K%eeI)-*L`ny9VzpV5h=e`Fy6jpb^g10w>pZps2 zB%I^ds6p(x68fSA>+@50Dq3|Fn(X&tJ$4K|whWu`8Pe5=cUFO;1K=hrbhC4MHsTRZ zJm<$)iL0!2BNkyQciaLu?n92#=+~se`TppejZgZufezmZ_+9Dn{d?&8wwTH{&>?~k%0hq^D%4c%6;qcb9wew0{8+_V_y*EV?Y8F$k8vFHoU)74&AJ5(+;iot>iWOfu8=lESno6VNY8LD5 z1kN)A-ToT-_@(>;5_q3Ii1b%Appn9f6k`4I*t>s!TCuhb<5Tnyh7$FWK>xjy(Ha^& zd5;sm2VcfQsp_nR|9eyi_}X^=tGG zQ2!BEds3`Pv-qyE2LC!Q!WlQ@dme)e}#tcqq# z<{U~E>ump%asV7vlE0KIw&*o^XA|C)%xU=8VKP~PU@$$@@%ifzncK7SaE z>-Pbc<8jU7>2V^uY%^EB0Zp@5rNLPwLMz+JEFn4?X?Q9T(6DhpO+krt``qa zC9w+sZ>n*Z6lm3wd-g(Kc1J69=DrbpE|@3pRQCQ)v0c6iRo=rR8GKi`dIsklSiJuv zIJG3JFN=(I;HxS;QJK?F;LsPW@?`P)-l92t&^UwBiQp9dy0Im9Z@@{k;wSnM!ao(b zq6__>1s_~R*8N{hl)_^D!9Mr0+Y4~b8Gd&KnM>oHCO+YRhWGnI*U(Q-`6&>6sBs?` P_Y?R@hYEgM^TPN)185uR literal 0 HcmV?d00001 From 77da321581375ead63002088fdd61eebf7f6f0c5 Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Mon, 19 Dec 2022 18:58:03 -0800 Subject: [PATCH 083/154] [ASR] Audio processing base, multi-channel enhancement models (#5356) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Audio processing base model, enc-mask-dec enhancement, tests and modules Signed-off-by: Ante Jukić * Addressed review comments Signed-off-by: Ante Jukić * Fixed CodeQL warnings Signed-off-by: Ante Jukić * Addressed PR comments Signed-off-by: Ante Jukić * Addressed PR comments: - renamed AudioProcessingModel to AudioToAudioModel - various small modifications - updated unit tests Signed-off-by: Ante Jukić * Addressed comments - Moved spectrogram to audio_preprocessing - Renamed MultichannelFeatures - Updated config and unit tests Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- .../conf/multichannel_enhancement.yaml | 117 +++++ .../audio_to_audio/process_audio.py | 240 +++++++++ .../audio_to_audio/speech_enhancement.py | 61 +++ nemo/collections/asr/data/audio_to_audio.py | 76 ++- .../asr/data/audio_to_audio_dataset.py | 3 + nemo/collections/asr/losses/__init__.py | 1 + nemo/collections/asr/losses/audio_losses.py | 248 +++++++++ nemo/collections/asr/models/__init__.py | 2 + .../asr/models/audio_to_audio_model.py | 114 ++++ .../asr/models/enhancement_models.py | 434 +++++++++++++++ nemo/collections/asr/modules/__init__.py | 3 + nemo/collections/asr/modules/audio_modules.py | 492 ++++++++++++++++++ .../asr/modules/audio_preprocessing.py | 204 +++++++- .../asr/parts/utils/audio_utils.py | 59 +++ tests/collections/asr/test_asr_datasets.py | 138 ++--- tests/collections/asr/test_asr_losses.py | 341 ++++++++++++ tests/collections/asr/test_audio_modules.py | 191 +++++++ .../asr/test_audio_preprocessing.py | 189 +++++++ .../collections/asr/utils/test_audio_utils.py | 70 +++ 19 files changed, 2891 insertions(+), 92 deletions(-) create mode 100644 examples/asr/experimental/audio_to_audio/conf/multichannel_enhancement.yaml create mode 100644 examples/asr/experimental/audio_to_audio/process_audio.py create mode 100644 examples/asr/experimental/audio_to_audio/speech_enhancement.py create mode 100644 nemo/collections/asr/losses/audio_losses.py create mode 100644 nemo/collections/asr/models/audio_to_audio_model.py create mode 100644 nemo/collections/asr/models/enhancement_models.py create mode 100644 nemo/collections/asr/modules/audio_modules.py create mode 100644 tests/collections/asr/test_asr_losses.py create mode 100644 
tests/collections/asr/test_audio_modules.py create mode 100644 tests/collections/asr/test_audio_preprocessing.py diff --git a/examples/asr/experimental/audio_to_audio/conf/multichannel_enhancement.yaml b/examples/asr/experimental/audio_to_audio/conf/multichannel_enhancement.yaml new file mode 100644 index 000000000000..f38c1fe69bcb --- /dev/null +++ b/examples/asr/experimental/audio_to_audio/conf/multichannel_enhancement.yaml @@ -0,0 +1,117 @@ +# This configuration contains the default values for training a multichannel speech enhancement model. +# +name: "multichannel_enhancement" + +model: + sample_rate: 16000 + skip_nan_grad: false + num_outputs: 1 + + train_ds: + manifest_filepath: ??? + input_key: audio_filepath # key of the input signal path in the manifest + target_key: target_filepath # key of the target signal path in the manifest + target_channel_selector: 0 # target signal is the first channel from files in target_key + audio_duration: 4.0 # in seconds, audio segment duration for training + random_offset: true # if the file is longer than audio_duration, use random offset to select a subsegment + min_duration: ${model.train_ds.audio_duration} + batch_size: 64 # batch size may be increased based on the available memory + shuffle: true + num_workers: 8 + pin_memory: true + + validation_ds: + manifest_filepath: ??? + input_key: audio_filepath # key of the input signal path in the manifest + target_key: target_filepath + target_channel_selector: 0 # target signal is the first channel from files in target_key + batch_size: 1 # batch size may be increased based on the available memory + shuffle: false + num_workers: 4 + pin_memory: true + + test_ds: + manifest_filepath: ??? + input_key: audio_filepath # key of the input signal path in the manifest + target_key: target_filepath # key of the target signal path in the manifest + target_channel_selector: 0 # target signal is the first channel from files in target_key + batch_size: 1 # batch size may be increased based on the available memory + shuffle: false + num_workers: 4 + pin_memory: true + + encoder: + _target_: nemo.collections.asr.modules.audio_preprocessing.AudioToSpectrogram + fft_length: 512 # Length of the window and FFT for calculating spectrogram + hop_length: 256 # Hop length for calculating spectrogram + power: null + + decoder: + _target_: nemo.collections.asr.modules.audio_preprocessing.SpectrogramToAudio + fft_length: 512 # Length of the window and FFT for calculating spectrogram + hop_length: 256 # Hop length for calculating spectrogram + + mask_estimator: + _target_: nemo.collections.asr.modules.audio_modules.MaskEstimatorRNN + num_outputs: ${model.num_outputs} + num_subbands: 257 # Number of subbands of the input spectrogram + num_features: 256 # Number of features at RNN input + num_layers: 5 # Number of RNN layers + bidirectional: true # Use bi-directional RNN + + mask_processor: + _target_: nemo.collections.asr.modules.audio_modules.MaskBasedBeamformer # Mask-based multi-channel processing + ref_channel: 0 # Reference channel for the output + + loss: + _target_: nemo.collections.asr.losses.SDRLoss + scale_invariant: true # Use scale-invariant SDR + + optim: + name: adamw + lr: 1e-3 + # optimizer arguments + betas: [0.9, 0.98] + weight_decay: 0 + +trainer: + devices: -1 # number of GPUs, -1 would use all available GPUs + num_nodes: 1 + max_epochs: -1 + max_steps: -1 # computed at runtime if not set + val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations + accelerator: 
auto + strategy: ddp + accumulate_grad_batches: 1 + gradient_clip_val: null + precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP. + log_every_n_steps: 25 # Interval of logging. + enable_progress_bar: true + resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. + num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it + check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs + sync_batchnorm: true + enable_checkpointing: False # Provided by exp_manager + logger: false # Provided by exp_manager + +exp_manager: + exp_dir: null + name: ${name} + create_tensorboard_logger: true + create_checkpoint_callback: true + checkpoint_callback_params: + # in case of multiple validation sets, first one is used + monitor: "val_loss" + mode: "min" + save_top_k: 5 + always_save_nemo: true # saves the checkpoints as nemo files instead of PTL checkpoints + + # you need to set these two to true to continue the training + resume_if_exists: false + resume_ignore_no_checkpoint: false + + # You may use this section to create a W&B logger + create_wandb_logger: false + wandb_logger_kwargs: + name: null + project: null diff --git a/examples/asr/experimental/audio_to_audio/process_audio.py b/examples/asr/experimental/audio_to_audio/process_audio.py new file mode 100644 index 000000000000..6441f685ca4f --- /dev/null +++ b/examples/asr/experimental/audio_to_audio/process_audio.py @@ -0,0 +1,240 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import contextlib +import glob +import json +import os +from dataclasses import dataclass, is_dataclass +from pathlib import Path +from typing import List, Optional + +import pytorch_lightning as pl +import torch +from omegaconf import OmegaConf + +from nemo.collections.asr.models import AudioToAudioModel +from nemo.core.config import hydra_runner +from nemo.utils import logging, model_utils + + +""" +Process audio file on a single CPU/GPU. Useful for processing of moderate amounts of audio data. + +# Arguments + model_path: path to .nemo checkpoint for an AudioToAudioModel + pretrained_name: name of a pretrained AudioToAudioModel model (from NGC registry) + audio_dir: path to directory with audio files + dataset_manifest: path to dataset JSON manifest file (in NeMo format) + + input_channel_selector: list of channels to take from audio files, defaults to `None` and takes all available channels + input_key: key for audio filepath in the manifest file, defaults to `audio_filepath` + + output_dir: Directory where processed files will be saved + output_filename: Output filename where manifest pointing to processed files will be written + batch_size: batch size during inference + + cuda: Optional int to enable or disable execution of model on certain CUDA device. 
+    amp: Bool to decide if Automatic Mixed Precision should be used during inference
+    audio_type: Str filetype of the audio. Supported = wav, flac, mp3
+
+    overwrite_output: Bool which, when set, allows repeated processing runs to overwrite previous results.
+
+# Usage
+AudioToAudioModel can be specified by either `model_path` or `pretrained_name`.
+Data for processing can be defined with either `audio_dir` or `dataset_manifest`.
+Processed audio is saved in `output_dir`, and a manifest for processed files is saved
+in `output_filename`.
+
+```
+python process_audio.py \
+    model_path=null \
+    pretrained_name=null \
+    audio_dir="" \
+    dataset_manifest="" \
+    input_channel_selector=[] \
+    output_dir="" \
+    output_filename="" \
+    batch_size=1 \
+    cuda=0 \
+    amp=True
+```
+"""
+
+
+@dataclass
+class ProcessConfig:
+    # Required configs
+    model_path: Optional[str] = None  # Path to a .nemo file
+    pretrained_name: Optional[str] = None  # Name of a pretrained model
+    audio_dir: Optional[str] = None  # Path to a directory which contains audio files
+    dataset_manifest: Optional[str] = None  # Path to dataset's JSON manifest
+
+    # Audio configs
+    input_channel_selector: Optional[List] = None  # Union types not supported Optional[Union[List, int]]
+    input_key: Optional[str] = None  # Can be used with a manifest
+
+    # General configs
+    output_dir: Optional[str] = None
+    output_filename: Optional[str] = None
+    batch_size: int = 1
+    num_workers: int = 0
+
+    # Override model config
+    override_config_path: Optional[str] = None  # path to a yaml config that will override the internal config file
+
+    # Set `cuda` to int to define CUDA device. If 'None', will look for CUDA
+    # device anyway, and do inference on CPU only if CUDA device is not found.
+    # If `cuda` is a negative number, inference will be on CPU only.
+    cuda: Optional[int] = None
+    amp: bool = False
+    audio_type: str = "wav"
+
+    # Recompute model predictions, even if the output folder exists.
+ overwrite_output: bool = False + + +@hydra_runner(config_name="ProcessConfig", schema=ProcessConfig) +def main(cfg: ProcessConfig) -> ProcessConfig: + logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}') + + if is_dataclass(cfg): + cfg = OmegaConf.structured(cfg) + + if cfg.model_path is None and cfg.pretrained_name is None: + raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None!") + if cfg.audio_dir is None and cfg.dataset_manifest is None: + raise ValueError("Both cfg.audio_dir and cfg.dataset_manifest cannot be None!") + + # setup GPU + if cfg.cuda is None: + if torch.cuda.is_available(): + device = [0] # use 0th CUDA device + accelerator = 'gpu' + else: + device = 1 + accelerator = 'cpu' + else: + device = [cfg.cuda] + accelerator = 'gpu' + + map_location = torch.device('cuda:{}'.format(device[0]) if accelerator == 'gpu' else 'cpu') + + # setup model + if cfg.model_path is not None: + # restore model from .nemo file path + model_cfg = AudioToAudioModel.restore_from(restore_path=cfg.model_path, return_config=True) + classpath = model_cfg.target # original class path + imported_class = model_utils.import_class_by_path(classpath) # type: AudioToAudioModel + logging.info(f"Restoring model : {imported_class.__name__}") + audio_to_audio_model = imported_class.restore_from( + restore_path=cfg.model_path, override_config_path=cfg.override_config_path, map_location=map_location + ) # type: AudioToAudioModel + model_name = os.path.splitext(os.path.basename(cfg.model_path))[0] + else: + # restore model by name + audio_to_audio_model = AudioToAudioModel.from_pretrained( + model_name=cfg.pretrained_name, map_location=map_location + ) # type: AudioToAudioModel + model_name = cfg.pretrained_name + + trainer = pl.Trainer(devices=device, accelerator=accelerator) + audio_to_audio_model.set_trainer(trainer) + audio_to_audio_model = audio_to_audio_model.eval() + + if cfg.audio_dir is not None: + filepaths = list(glob.glob(os.path.join(cfg.audio_dir, f"**/*.{cfg.audio_type}"), recursive=True)) + else: + # get filenames from manifest + filepaths = [] + if os.stat(cfg.dataset_manifest).st_size == 0: + raise RuntimeError(f"The input dataset_manifest {cfg.dataset_manifest} is empty.") + + input_key = 'audio_filepath' if cfg.input_key is None else cfg.input_key + manifest_dir = Path(cfg.dataset_manifest).parent + with open(cfg.dataset_manifest, 'r') as f: + for line in f: + item = json.loads(line) + audio_file = Path(item[input_key]) + if not audio_file.is_file() and not audio_file.is_absolute(): + audio_file = manifest_dir / audio_file + filepaths.append(str(audio_file.absolute())) + + logging.info(f"\nProcessing {len(filepaths)} files...\n") + + # setup AMP (optional) + if cfg.amp and torch.cuda.is_available() and hasattr(torch.cuda, 'amp') and hasattr(torch.cuda.amp, 'autocast'): + logging.info("AMP enabled!\n") + autocast = torch.cuda.amp.autocast + else: + + @contextlib.contextmanager + def autocast(): + yield + + # Compute output filename + if cfg.output_dir is None: + # create default output filename + if cfg.audio_dir is not None: + cfg.output_dir = os.path.dirname(os.path.join(cfg.audio_dir, '.')) + f'_processed_{model_name}' + else: + cfg.output_dir = os.path.dirname(cfg.dataset_manifest) + f'_processed_{model_name}' + + # Compute output filename + if cfg.output_filename is None: + # create default output filename + cfg.output_filename = cfg.output_dir.rstrip('/') + '_manifest.json' + + # if transcripts should not be overwritten, and already exists, skip re-transcription 
step and return + if not cfg.overwrite_output and os.path.exists(cfg.output_dir): + logging.warning( + f"Previous output found at {cfg.output_dir}, and flag `overwrite_output`" + f"is {cfg.overwrite_output}. Returning without processing." + ) + + return cfg + + # Process audio + with autocast(): + with torch.no_grad(): + paths2processed_files = audio_to_audio_model.process( + paths2audio_files=filepaths, + output_dir=cfg.output_dir, + batch_size=cfg.batch_size, + num_workers=cfg.num_workers, + input_channel_selector=cfg.input_channel_selector, + ) + + logging.info(f"Finished processing {len(filepaths)} files!") + logging.info(f"Processed audio is available in the output directory: {cfg.output_dir}") + + # Prepare new/updated manifest with a new key for processed audio + with open(cfg.output_filename, 'w', encoding='utf-8') as f: + if cfg.dataset_manifest is not None: + with open(cfg.dataset_manifest, 'r') as fr: + for idx, line in enumerate(fr): + item = json.loads(line) + item['audio_filepath_unprocessed'] = item[input_key] + item['audio_filepath'] = paths2processed_files[idx] + f.write(json.dumps(item) + "\n") + else: + for idx, processed_file in enumerate(paths2processed_files): + item = {'audio_filepath': processed_file} + f.write(json.dumps(item) + "\n") + + return cfg + + +if __name__ == '__main__': + main() # noqa pylint: disable=no-value-for-parameter diff --git a/examples/asr/experimental/audio_to_audio/speech_enhancement.py b/examples/asr/experimental/audio_to_audio/speech_enhancement.py new file mode 100644 index 000000000000..d12f6c47af06 --- /dev/null +++ b/examples/asr/experimental/audio_to_audio/speech_enhancement.py @@ -0,0 +1,61 @@ +# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
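Both the training script that follows and the `train_ds` / `validation_ds` sections of the configuration above read a JSON-lines manifest whose keys match `input_key` and `target_key`. A minimal sketch of building such a manifest (file names and durations are placeholders, not part of this change):

```python
# Hedged sketch: write a JSON-lines manifest with the default input/target keys
# from multichannel_enhancement.yaml. All paths and durations are illustrative.
import json

entries = [
    {"audio_filepath": "mix_0001.wav", "target_filepath": "clean_0001.wav", "duration": 4.2},
    {"audio_filepath": "mix_0002.wav", "target_filepath": "clean_0002.wav", "duration": 5.7},
]
with open("train_manifest.json", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```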
+ +""" +# Training the model + +Basic run (on CPU for 50 epochs): + python examples/asr/experimental/audio_to_audio/speech_enhancement.py \ + # (Optional: --config-path= --config-name=) \ + model.train_ds.manifest_filepath="" \ + model.validation_ds.manifest_filepath="" \ + trainer.devices=1 \ + trainer.accelerator='cpu' \ + trainer.max_epochs=50 + +PyTorch Lightning Trainer arguments and args of the model and the optimizer can be added or overriden from CLI +""" +import pytorch_lightning as pl +from omegaconf import OmegaConf + +from nemo.collections.asr.models import EncMaskDecAudioToAudioModel +from nemo.core.config import hydra_runner +from nemo.utils import logging +from nemo.utils.exp_manager import exp_manager + + +@hydra_runner(config_path="./conf", config_name="multichannel_enhancement") +def main(cfg): + logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg, resolve=True)}') + + trainer = pl.Trainer(**cfg.trainer) + exp_manager(trainer, cfg.get("exp_manager", None)) + model = EncMaskDecAudioToAudioModel(cfg=cfg.model, trainer=trainer) + + # Initialize the weights of the model from another model, if provided via config + model.maybe_init_from_pretrained_checkpoint(cfg) + + # Train the model + trainer.fit(model) + + # Run on test data, if available + if hasattr(cfg.model, 'test_ds') and cfg.model.test_ds.manifest_filepath is not None: + if trainer.is_global_zero: + trainer = pl.Trainer(devices=1, accelerator=cfg.trainer.accelerator) + if model.prepare_test(trainer): + trainer.test(model) + + +if __name__ == '__main__': + main() # noqa pylint: disable=no-value-for-parameter diff --git a/nemo/collections/asr/data/audio_to_audio.py b/nemo/collections/asr/data/audio_to_audio.py index 19a2444f5f9f..6a10a4a300b5 100644 --- a/nemo/collections/asr/data/audio_to_audio.py +++ b/nemo/collections/asr/data/audio_to_audio.py @@ -15,9 +15,9 @@ import abc import math import random -from collections import OrderedDict +from collections import OrderedDict, namedtuple from dataclasses import dataclass -from typing import Callable, Dict, List, Optional, Tuple, Union +from typing import Callable, Dict, List, Optional, Tuple, Type, Union import librosa import numpy as np @@ -57,7 +57,7 @@ def _audio_collate_fn(batch: List[dict]) -> Tuple[torch.Tensor]: } ``` 1D tensors have shape (num_samples,) and 2D tensors - have shape (num_samples, num_channels) + have shape (num_channels, num_samples) Returns: A tuple containing signal tensor and signal length tensor (in samples) @@ -73,7 +73,7 @@ def _audio_collate_fn(batch: List[dict]) -> Tuple[torch.Tensor]: batched = tuple() for signal in signals: - signal_length = [b[signal].shape[0] for b in batch] + signal_length = [b[signal].shape[-1] for b in batch] # Batch length is determined by the longest signal in the batch batch_length = max(signal_length) b_signal = [] @@ -85,7 +85,7 @@ def _audio_collate_fn(batch: List[dict]) -> Tuple[torch.Tensor]: pad = (0, batch_length - s_len) elif b[signal].ndim == 2: # multi-channel signal - pad = (0, 0, 0, batch_length - s_len) + pad = (0, batch_length - s_len, 0, 0) else: raise RuntimeError( f'Signal {signal} has unsuported dimensions {signal.shape}. Currently, only 1D and 2D arrays are supported.' @@ -396,7 +396,7 @@ def get_samples( Returns: Numpy array with samples from audio file. The array has shape (num_samples,) for a single-channel signal - or (num_samples, num_channels) for a multi-channel signal. + or (num_channels, num_samples) for a multi-channel signal. 
""" output = cls.get_samples_synchronized( audio_files=[audio_file], @@ -439,7 +439,7 @@ def get_samples_synchronized( Returns: List with the same size as `audio_files` but containing numpy arrays with samples from each audio file. - Each array has shape (num_samples, ) or (num_samples, num_channels), for single- + Each array has shape (num_samples,) or (num_channels, num_samples), for single- or multi-channel signal, respectively. For example, if `audio_files = [path/to/file_1.wav, path/to/file_2.wav]`, the output will be a list `output = [samples_file_1, samples_file_2]`. @@ -516,7 +516,7 @@ def get_samples_from_file( channel_selector: Select a subset of available channels. Returns: - An array with shape (samples,) or (samples, channels) + An array with shape (samples,) or (channels, samples) """ if isinstance(audio_file, str): # Load samples from a single file @@ -566,7 +566,7 @@ def get_segment_from_file( channel_selector: Select a subset of available channels. Returns: - An array with shape (samples,) or (samples, channels) + An array with shape (samples,) or (channels, samples) """ if num_samples is None: segment = AudioSegment.from_file( @@ -581,7 +581,15 @@ def get_segment_from_file( offset=offset, channel_selector=channel_selector, ) - return segment.samples + + if segment.samples.ndim == 1: + # Single-channel signal + return segment.samples + elif segment.samples.ndim == 2: + # Use multi-channel format as (channels, samples) + return segment.samples.T + else: + raise RuntimeError(f'Unexpected samples shape: {segment.samples.shape}') @staticmethod def list_to_multichannel(signal: Union[np.ndarray, List[np.ndarray]]) -> np.ndarray: @@ -595,7 +603,7 @@ def list_to_multichannel(signal: Union[np.ndarray, List[np.ndarray]]) -> np.ndar Returns: Numpy array obtained by concatenating the elements of the list - along the channel dimension (axis=1). + along the channel dimension (axis=0). """ if not isinstance(signal, list): # Nothing to do there @@ -610,10 +618,10 @@ def list_to_multichannel(signal: Union[np.ndarray, List[np.ndarray]]) -> np.ndar # If multiple signals are provided in a list, we concatenate them along the channel dimension if signal[0].ndim == 1: # Single-channel individual files - mc_signal = np.stack(signal, axis=1) + mc_signal = np.stack(signal, axis=0) elif signal[0].ndim == 2: # Multi-channel individual files - mc_signal = np.concatenate(signal, axis=1) + mc_signal = np.concatenate(signal, axis=0) else: raise RuntimeError(f'Unexpected target with {signal[0].ndim} dimensions.') @@ -687,15 +695,14 @@ def output_types(self) -> Optional[Dict[str, NeuralType]]: """Returns definitions of module output ports. """ - def __init__( - self, collection: collections.Audio, audio_processor: Callable, - ): + def __init__(self, collection: collections.Audio, audio_processor: Callable, output_type: Type[namedtuple]): """Instantiates an audio dataset. """ super().__init__() self.collection = collection self.audio_processor = audio_processor + self.output_type = output_type def num_channels(self, signal_key) -> int: """Returns the number of channels for a particular signal in @@ -704,7 +711,7 @@ def num_channels(self, signal_key) -> int: More specifically, this will get the tensor from the first item in the dataset, check if it's a one- or two-dimensional tensor, and return the number of channels based on the size - of the second axis (shape[1]). + of the first axis (shape[0]). NOTE: This assumes that all examples have the same number of channels. 
@@ -722,7 +729,7 @@ def num_channels(self, signal_key) -> int: if item[signal_key].ndim == 1: return 1 elif item[signal_key].ndim == 2: - return item[signal_key].shape[1] + return item[signal_key].shape[0] else: raise RuntimeError( f'Unexpected number of dimension for signal {signal_key} with shape {item[signal_key].shape}' @@ -757,7 +764,12 @@ def __len__(self) -> int: def _collate_fn(self, batch) -> Tuple[torch.Tensor]: """Collate items in a batch. """ - return _audio_collate_fn(batch) + return self.output_type(*_audio_collate_fn(batch)) + + +AudioToTargetExample = namedtuple( + typename='AudioToTargetExample', field_names='input_signal input_length target_signal target_length' +) @experimental @@ -839,9 +851,7 @@ def __init__( channel_selectors=[input_channel_selector, target_channel_selector], ) - super().__init__( - collection=collection, audio_processor=audio_processor, - ) + super().__init__(collection=collection, audio_processor=audio_processor, output_type=AudioToTargetExample) @property def output_types(self) -> Optional[Dict[str, NeuralType]]: @@ -859,7 +869,7 @@ def output_types(self) -> Optional[Dict[str, NeuralType]]: ``` """ sc_audio_type = NeuralType(('B', 'T'), AudioSignal()) - mc_audio_type = NeuralType(('B', 'T', 'C'), AudioSignal()) + mc_audio_type = NeuralType(('B', 'C', 'T'), AudioSignal()) return OrderedDict( input_signal=sc_audio_type if self.num_channels('input_signal') == 1 else mc_audio_type, @@ -869,6 +879,12 @@ def output_types(self) -> Optional[Dict[str, NeuralType]]: ) +AudioToTargetWithReferenceExample = namedtuple( + typename='AudioToTargetWithReferenceExample', + field_names='input_signal input_length target_signal target_length reference_signal reference_length', +) + + @experimental class AudioToTargetWithReferenceDataset(BaseAudioDataset): """A dataset for audio-to-audio tasks where the goal is to use @@ -975,7 +991,7 @@ def __init__( ) super().__init__( - collection=collection, audio_processor=audio_processor, + collection=collection, audio_processor=audio_processor, output_type=AudioToTargetWithReferenceExample ) @property @@ -996,7 +1012,7 @@ def output_types(self) -> Optional[Dict[str, NeuralType]]: ``` """ sc_audio_type = NeuralType(('B', 'T'), AudioSignal()) - mc_audio_type = NeuralType(('B', 'T', 'C'), AudioSignal()) + mc_audio_type = NeuralType(('B', 'C', 'T'), AudioSignal()) return OrderedDict( input_signal=sc_audio_type if self.num_channels('input_signal') == 1 else mc_audio_type, @@ -1008,6 +1024,12 @@ def output_types(self) -> Optional[Dict[str, NeuralType]]: ) +AudioToTargetWithEmbeddingExample = namedtuple( + typename='AudioToTargetWithEmbeddingExample', + field_names='input_signal input_length target_signal target_length embedding_vector embedding_length', +) + + @experimental class AudioToTargetWithEmbeddingDataset(BaseAudioDataset): """A dataset for audio-to-audio tasks where the goal is to use @@ -1086,7 +1108,7 @@ def __init__( audio_processor.embedding_setup = SignalSetup(signals=['embedding_vector']) super().__init__( - collection=collection, audio_processor=audio_processor, + collection=collection, audio_processor=audio_processor, output_type=AudioToTargetWithEmbeddingExample ) @property @@ -1107,7 +1129,7 @@ def output_types(self) -> Optional[Dict[str, NeuralType]]: ``` """ sc_audio_type = NeuralType(('B', 'T'), AudioSignal()) - mc_audio_type = NeuralType(('B', 'T', 'C'), AudioSignal()) + mc_audio_type = NeuralType(('B', 'C', 'T'), AudioSignal()) return OrderedDict( input_signal=sc_audio_type if self.num_channels('input_signal') 
== 1 else mc_audio_type, diff --git a/nemo/collections/asr/data/audio_to_audio_dataset.py b/nemo/collections/asr/data/audio_to_audio_dataset.py index 52c2e429858d..b296d64b1f2a 100644 --- a/nemo/collections/asr/data/audio_to_audio_dataset.py +++ b/nemo/collections/asr/data/audio_to_audio_dataset.py @@ -30,6 +30,7 @@ def get_audio_to_target_dataset(config: dict) -> audio_to_audio.AudioToTargetDat input_key=config['input_key'], target_key=config['target_key'], audio_duration=config.get('audio_duration', None), + random_offset=config.get('random_offset', False), max_duration=config.get('max_duration', None), min_duration=config.get('min_duration', None), max_utts=config.get('max_utts', 0), @@ -55,6 +56,7 @@ def get_audio_to_target_with_reference_dataset(config: dict) -> audio_to_audio.A target_key=config['target_key'], reference_key=config['reference_key'], audio_duration=config.get('audio_duration', None), + random_offset=config.get('random_offset', False), max_duration=config.get('max_duration', None), min_duration=config.get('min_duration', None), max_utts=config.get('max_utts', 0), @@ -83,6 +85,7 @@ def get_audio_to_target_with_embedding_dataset(config: dict) -> audio_to_audio.A target_key=config['target_key'], embedding_key=config['embedding_key'], audio_duration=config.get('audio_duration', None), + random_offset=config.get('random_offset', False), max_duration=config.get('max_duration', None), min_duration=config.get('min_duration', None), max_utts=config.get('max_utts', 0), diff --git a/nemo/collections/asr/losses/__init__.py b/nemo/collections/asr/losses/__init__.py index 0747e9a37bea..3e50cea1d692 100644 --- a/nemo/collections/asr/losses/__init__.py +++ b/nemo/collections/asr/losses/__init__.py @@ -13,6 +13,7 @@ # limitations under the License. from nemo.collections.asr.losses.angularloss import AngularSoftmaxLoss +from nemo.collections.asr.losses.audio_losses import SDRLoss from nemo.collections.asr.losses.ctc import CTCLoss from nemo.collections.asr.losses.lattice_losses import LatticeLoss from nemo.collections.asr.losses.ssl_losses.contrastive import ContrastiveLoss diff --git a/nemo/collections/asr/losses/audio_losses.py b/nemo/collections/asr/losses/audio_losses.py new file mode 100644 index 000000000000..34343995a881 --- /dev/null +++ b/nemo/collections/asr/losses/audio_losses.py @@ -0,0 +1,248 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
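Since the collate function above now wraps its outputs in a named tuple, a batch can be consumed either by attribute or by plain tuple unpacking. A hedged sketch with synthetic tensors in the channels-first (B, C, T) layout:

```python
# Illustrative only: build an AudioToTargetExample by hand with placeholder tensors.
import torch

from nemo.collections.asr.data.audio_to_audio import AudioToTargetExample

example = AudioToTargetExample(
    input_signal=torch.zeros(8, 2, 64000),   # (batch, channels, time)
    input_length=torch.full((8,), 64000),
    target_signal=torch.zeros(8, 1, 64000),
    target_length=torch.full((8,), 64000),
)

# Attribute access and tuple unpacking both work; the enhancement model's
# training_step relies on the latter.
input_signal, input_length, target_signal, target_length = example
assert example.input_signal.shape == (8, 2, 64000)
```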
+ + +from typing import List, Optional + +import numpy as np +import torch + +from nemo.core.classes import Loss, Typing, typecheck +from nemo.core.neural_types import AudioSignal, LengthsType, LossType, MaskType, NeuralType +from nemo.utils import logging + +__all__ = ['SDRLoss'] + + +def temporal_mean( + input: torch.Tensor, + input_length: Optional[torch.Tensor] = None, + mask: Optional[torch.Tensor] = None, + keepdim: bool = False, + eps: float = 1e-10, +) -> torch.Tensor: + """Calculate mean along temporal dimension with optionally + averaging only over valid samples (based on the input length). + + Args: + input: Batch of signals, shape (B, T, C) + input_length: Optional, length of each example in the batch, shape (B,) + mask: Optional, temporal mask for each example in the batch, shape (B, T) + keepdim: Whether to keep the temporal dimension + eps: Regularization to avoid division by zero + + Returns: + (B, C, 1) if keepdim=True, otherwise (B, C) + """ + if (input_length is not None) and (mask is not None): + raise RuntimeError( + 'Argument `input_length` is mutually exclusive with `mask`. Both cannot be used at the same time.' + ) + + if input_length is None and mask is None: + # No length information, assume all samples are valid + mean = torch.mean(input, dim=-1, keepdim=keepdim) + elif input_length is not None: + assert (input_length <= input.shape[-1]).all(), f'Check input length {input_length}, input shape {input.shape}' + # Average only over valid elements + mean = [] + for b, b_len in enumerate(input_length): + mean_b = torch.sum(input[b, :, :b_len], axis=-1, keepdim=keepdim) / b_len + mean.append(mean_b) + mean = torch.stack(mean, axis=0) + elif mask is not None: + # Average using temporal mask + mean = mask.unsqueeze(1) * input + mean = torch.sum(mean, axis=-1, keepdim=keepdim) + normalization = torch.sum(mask, axis=-1, keepdim=keepdim) + mean = mean / (normalization.unsqueeze(1) + eps) + else: + raise RuntimeError(f'Unexpected input with both input_length and mask provided.') + + return mean + + +def calculate_sdr_batch( + estimate: torch.Tensor, + target: torch.Tensor, + input_length: Optional[torch.Tensor] = None, + mask: Optional[torch.Tensor] = None, + scale_invariant: bool = False, + remove_mean: bool = True, + sdr_max: Optional[float] = None, + eps: float = 1e-10, +) -> torch.Tensor: + """Calculate signal-to-distortion ratio per channel. + + SDR = 10 * log10( ||t||_2^2 / (||e-t||_2^2 + alpha * ||t||^2) + + where + alpha = 10^(-sdr_max/10) + + Optionally, apply scale-invariant scaling on the target signal. 
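+    With `scale_invariant=True`, the target is first rescaled as t <- (<e, t> / ||t||^2) * t
+    before the ratio is computed, so the result does not depend on the overall scale of the
+    estimate (this matches the rescaling implemented below).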
+ + Args: + estimate: estimated signal, shape (B, T, C) + target: target signal, shape (B, T, C) + input_length: Optional, length of valid samples, shape (B,) + mask: Optional, temporal mask, shape (B, T) + scale_invariant: Use scale invariant SDR + remove_mean: If True, mean will be removed before calculating SDR + eps: Small regularization constant + + Returns: + SDR in dB for each channel, shape (B, C) + """ + assert ( + estimate.shape == target.shape + ), f'Estimate shape ({estimate.shape}) not matching target shape ({target.shape})' + + if remove_mean: + estimate = estimate - temporal_mean(estimate, input_length=input_length, mask=mask, keepdim=True, eps=eps) + target = target - temporal_mean(target, input_length=input_length, mask=mask, keepdim=True, eps=eps) + + if scale_invariant: + estimate_dot_target = temporal_mean( + estimate * target, input_length=input_length, mask=mask, keepdim=True, eps=eps + ) + target_pow = temporal_mean(torch.abs(target) ** 2, input_length=input_length, mask=mask, keepdim=True, eps=eps) + target_scale = estimate_dot_target / (target_pow + eps) + target = target_scale * target + + distortion = estimate - target + + target_pow = temporal_mean(torch.abs(target) ** 2, input_length=input_length, mask=mask, eps=eps) + distortion_pow = temporal_mean(torch.abs(distortion) ** 2, input_length=input_length, mask=mask, eps=eps) + + if sdr_max is not None: + distortion_pow = distortion_pow + 10 ** (-sdr_max / 10) * target_pow + + sdr = target_pow / (distortion_pow + eps) + sdr = 10 * torch.log10(sdr + eps) + + return sdr + + +class SDRLoss(Loss, Typing): + """ + Computes signal-to-distortion ratio (SDR) loss with weighted average across channels. + + Args: + weight: weight for SDR of each output channel, used for averaging the loss across channels. Defaults to `None` (averaging). + reduction: batch reduction. Defaults to `mean` over the batch. + scale_invariant: If `True`, use scale-invariant SDR. Defaults to `False`. + remove_mean: Remove mean before calculating the loss. Defaults to `True`. + sdr_max: Soft thresholding of the loss to SDR_max. + eps: Small value for regularization. + """ + + def __init__( + self, + weight: Optional[List[float]] = None, + reduction: str = 'mean', + scale_invariant: bool = False, + remove_mean: bool = True, + sdr_max: Optional[float] = None, + eps: float = 1e-10, + ): + super().__init__() + + # SDR weight buffer + if weight is not None: + if any([w <= 0 for w in weight]): + raise ValueError(f'Weight must be positive! Current value: {weight}') + elif not np.isclose(sum(weight), 1, atol=1e-6): + raise ValueError(f'Weight should add to one, current weight: {weight}') + weight = torch.tensor(weight).reshape(1, -1) + logging.info(f'Channel weight set to %s', weight) + self.register_buffer('weight', weight) + self.weight: Optional[Tensor] + + # Batch reduction + self.reduction = reduction + if reduction == 'mean': + self.reduce = torch.mean + else: + raise ValueError(f'Unexpected reduction mode {reduction}.') + + # SDR calculation setup + self.scale_invariant = scale_invariant + self.remove_mean = remove_mean + self.sdr_max = sdr_max + self.eps = eps + + @property + def input_types(self): + """Input types definitions for SDRLoss. 
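+        Both `estimate` and `target` are expected in the multi-channel (B, C, T) layout declared
+        below; `input_length` and `mask` are optional and mutually exclusive.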
+ """ + signal_shape = ('B', 'C', 'T') + return { + "estimate": NeuralType(signal_shape, AudioSignal()), + "target": NeuralType(signal_shape, AudioSignal()), + "input_length": NeuralType(tuple('B'), LengthsType(), optional=True), + "mask": NeuralType(('B', 'T'), MaskType(), optional=True), + } + + @property + def output_types(self): + """Output types definitions for SDRLoss. + loss: + NeuralType(None) + """ + return {"loss": NeuralType(elements_type=LossType())} + + @typecheck() + def forward( + self, + estimate: torch.Tensor, + target: torch.Tensor, + input_length: Optional[torch.Tensor] = None, + mask: Optional[torch.Tensor] = None, + ) -> torch.Tensor: + """For input batch of multi-channel signals, calculate SDR between estimate and target for each channel, + perform averaging across channels (weighting optional), and apply reduction across the batch. + + Args: + estimate: Batch of signals, shape (B, T, C) + target: Batch of signals, shape (B, T, C) + input_length: Batch of lengths, shape (B,) + mask: Batch of temporal masks, shape (B, T) + + Returns: + Scalar loss. + """ + + sdr = calculate_sdr_batch( + estimate=estimate, + target=target, + input_length=input_length, + mask=mask, + scale_invariant=self.scale_invariant, + remove_mean=self.remove_mean, + sdr_max=self.sdr_max, + eps=self.eps, + ) + + # channel averaging + if self.weight is None: + sdr = torch.mean(sdr, dim=1) + else: + # weighting across channels + sdr = sdr * self.weight + sdr = torch.sum(sdr, dim=1) + + # reduction + sdr = self.reduce(sdr) + + return -sdr diff --git a/nemo/collections/asr/models/__init__.py b/nemo/collections/asr/models/__init__.py index 82e7b0697bae..a2af88130a90 100644 --- a/nemo/collections/asr/models/__init__.py +++ b/nemo/collections/asr/models/__init__.py @@ -13,10 +13,12 @@ # limitations under the License. from nemo.collections.asr.models.asr_model import ASRModel +from nemo.collections.asr.models.audio_to_audio_model import AudioToAudioModel from nemo.collections.asr.models.classification_models import EncDecClassificationModel from nemo.collections.asr.models.clustering_diarizer import ClusteringDiarizer from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE from nemo.collections.asr.models.ctc_models import EncDecCTCModel +from nemo.collections.asr.models.enhancement_models import EncMaskDecAudioToAudioModel from nemo.collections.asr.models.hybrid_rnnt_ctc_bpe_models import EncDecHybridRNNTCTCBPEModel from nemo.collections.asr.models.hybrid_rnnt_ctc_models import EncDecHybridRNNTCTCModel from nemo.collections.asr.models.k2_sequence_models import EncDecK2SeqModel, EncDecK2SeqModelBPE diff --git a/nemo/collections/asr/models/audio_to_audio_model.py b/nemo/collections/asr/models/audio_to_audio_model.py new file mode 100644 index 000000000000..d38a0557bf46 --- /dev/null +++ b/nemo/collections/asr/models/audio_to_audio_model.py @@ -0,0 +1,114 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from abc import ABC, abstractmethod +from typing import List, Union + +import torch + +from nemo.core.classes import ModelPT +from nemo.utils import logging, model_utils +from nemo.utils.decorators import experimental + +__all__ = ['AudioToAudioModel'] + + +@experimental +class AudioToAudioModel(ModelPT, ABC): + @abstractmethod + def process( + self, paths2audio_files: List[str], output_dir: str, batch_size: int = 4 + ) -> List[Union[str, List[str]]]: + """ + Takes paths to audio files and returns a list of paths to processed + audios. + + Args: + paths2audio_files: paths to audio files to be processed + output_dir: directory to save processed files + batch_size: batch size for inference + + Returns: + Paths to processed audio signals. + """ + pass + + @abstractmethod + def evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = 'val'): + pass + + def validation_step(self, batch, batch_idx, dataloader_idx: int = 0): + return self.evaluation_step(batch, batch_idx, dataloader_idx, 'val') + + def test_step(self, batch, batch_idx, dataloader_idx=0): + return self.evaluation_step(batch, batch_idx, dataloader_idx, 'test') + + def multi_evaluation_epoch_end(self, outputs, dataloader_idx: int = 0, tag: str = 'val'): + loss_mean = torch.stack([x[f'{tag}_loss'] for x in outputs]).mean() + tensorboard_logs = {f'{tag}_loss': loss_mean} + return {f'{tag}_loss': loss_mean, 'log': tensorboard_logs} + + def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): + return self.multi_evaluation_epoch_end(outputs, dataloader_idx, 'val') + + def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0): + return self.multi_evaluation_epoch_end(outputs, dataloader_idx, 'test') + + @classmethod + def list_available_models(cls) -> 'List[PretrainedModelInfo]': + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + Returns: + List of available pre-trained models. + """ + # recursively walk the subclasses to generate pretrained model info + list_of_models = model_utils.resolve_subclass_pretrained_model_info(cls) + return list_of_models + + def setup_optimization_flags(self): + """ + Utility method that must be explicitly called by the subclass in order to support optional optimization flags. + This method is the only valid place to access self.cfg prior to DDP training occurs. + + The subclass may chose not to support this method, therefore all variables here must be checked via hasattr() + """ + # Skip update if nan/inf grads appear on any rank. + self._skip_nan_grad = False + if "skip_nan_grad" in self._cfg and self._cfg["skip_nan_grad"]: + self._skip_nan_grad = self._cfg["skip_nan_grad"] + + def on_after_backward(self): + """ + zero-out the gradients which any of them is NAN or INF + """ + super().on_after_backward() + + if hasattr(self, '_skip_nan_grad') and self._skip_nan_grad: + device = next(self.parameters()).device + valid_gradients = torch.tensor([1], device=device, dtype=torch.float32) + + # valid_gradients = True + for param_name, param in self.named_parameters(): + if param.grad is not None: + is_not_nan_or_inf = not (torch.isnan(param.grad).any() or torch.isinf(param.grad).any()) + if not is_not_nan_or_inf: + valid_gradients = valid_gradients * 0 + break + + if torch.distributed.is_initialized(): + torch.distributed.all_reduce(valid_gradients, op=torch.distributed.ReduceOp.MIN) + + if valid_gradients < 1: + logging.warning(f'detected inf or nan values in gradients! 
Setting gradients to zero.') + self.zero_grad() diff --git a/nemo/collections/asr/models/enhancement_models.py b/nemo/collections/asr/models/enhancement_models.py new file mode 100644 index 000000000000..c9ca358e7854 --- /dev/null +++ b/nemo/collections/asr/models/enhancement_models.py @@ -0,0 +1,434 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import json +import os +import tempfile +from typing import Dict, List, Optional, Union + +import librosa +import soundfile as sf +import torch +from omegaconf import DictConfig +from pytorch_lightning import Trainer +from tqdm import tqdm + +from nemo.collections.asr.data import audio_to_audio_dataset +from nemo.collections.asr.data.audio_to_text_dataset import inject_dataloader_value_from_model_config +from nemo.collections.asr.models.audio_to_audio_model import AudioToAudioModel +from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType +from nemo.core.classes.common import PretrainedModelInfo, typecheck +from nemo.core.neural_types import AudioSignal, LengthsType, NeuralType +from nemo.utils import logging +from nemo.utils.decorators import experimental + +__all__ = ['EncMaskDecAudioToAudioModel'] + + +@experimental +class EncMaskDecAudioToAudioModel(AudioToAudioModel): + """Class for encoder-mask-decoder audio processing models. 
+ + The model consists of the following blocks: + - encoder: transforms input multi-channel audio signal into an encoded representation (analysis transform) + - mask_estimator: estimates a mask used by signal processor + - mask_processor: mask-based signal processor, combines the encoded input and the estimated mask + - decoder: transforms processor output into the time domain (synthesis transform) + """ + + def __init__(self, cfg: DictConfig, trainer: Trainer = None): + # Get global rank and total number of GPU workers for IterableDataset partitioning, if applicable + # Global_rank and local_rank is set by LightningModule in Lightning 1.2.0 + self.world_size = 1 + if trainer is not None: + self.world_size = trainer.world_size + + super().__init__(cfg=cfg, trainer=trainer) + self.sample_rate = self._cfg.sample_rate + + # Setup processing modules + self.encoder = EncMaskDecAudioToAudioModel.from_config_dict(self._cfg.encoder) + self.mask_estimator = EncMaskDecAudioToAudioModel.from_config_dict(self._cfg.mask_estimator) + self.mask_processor = EncMaskDecAudioToAudioModel.from_config_dict(self._cfg.mask_processor) + self.decoder = EncMaskDecAudioToAudioModel.from_config_dict(self._cfg.decoder) + + # Future enhancement: + # If subclasses need to modify the config before calling super() + # Check ASRBPE* classes do with their mixin + + # Setup loss + self.loss = EncMaskDecAudioToAudioModel.from_config_dict(self._cfg.loss) + + # Setup optional Optimization flags + self.setup_optimization_flags() + + @torch.no_grad() + def process( + self, + paths2audio_files: List[str], + output_dir: str, + batch_size: int = 1, + num_workers: Optional[int] = None, + input_channel_selector: Optional[ChannelSelectorType] = None, + ) -> List[str]: + """ + Process audio files provided in paths2audio_files. + Processed signals will be saved in output_dir. + + Args: + paths2audio_files: (a list) of paths to audio files. \ + Recommended length per file is between 5 and 25 seconds. \ + But it is possible to pass a few hours long file if enough GPU memory is available. + output_dir: + batch_size: (int) batch size to use during inference. + Bigger will result in better throughput performance but would use more memory. + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. 
+ + Returns: + """ + if paths2audio_files is None or len(paths2audio_files) == 0: + return {} + + if num_workers is None: + num_workers = min(batch_size, os.cpu_count() - 1) + + # Output + paths2processed_files = [] + + # Model's mode and device + mode = self.training + device = next(self.parameters()).device + + try: + # Switch model to evaluation mode + self.eval() + # Freeze weights + self.freeze() + + logging_level = logging.get_verbosity() + logging.set_verbosity(logging.WARNING) + + # Processing + with tempfile.TemporaryDirectory() as tmpdir: + # Save temporary manifest + temporary_manifest_filepath = os.path.join(tmpdir, 'manifest.json') + with open(temporary_manifest_filepath, 'w', encoding='utf-8') as fp: + for audio_file in paths2audio_files: + entry = {'input_filepath': audio_file, 'duration': librosa.get_duration(filename=audio_file)} + fp.write(json.dumps(entry) + '\n') + + config = { + 'manifest_filepath': temporary_manifest_filepath, + 'input_key': 'input_filepath', + 'input_channel_selector': input_channel_selector, + 'batch_size': min(batch_size, len(paths2audio_files)), + 'num_workers': num_workers, + 'channel_selector': input_channel_selector, + } + + # Create output dir if necessary + if not os.path.isdir(output_dir): + os.makedirs(output_dir) + + # DataLoader for the input files + temporary_dataloader = self._setup_process_dataloader(config) + + # Indexing of the original files, used to form the output file name + file_idx = 0 + + # Process batches + for test_batch in tqdm(temporary_dataloader, desc="Processing"): + input_signal = test_batch[0] + input_length = test_batch[1] + + # Expand channel dimension, if necessary + # For consistency, the model uses multi-channel format, even if the channel dimension is 1 + if input_signal.ndim == 2: + input_signal = input_signal.unsqueeze(1) + + processed_batch, _ = self.forward( + input_signal=input_signal.to(device), input_length=input_length.to(device) + ) + + for example_idx in range(processed_batch.size(0)): + # This assumes the data loader is not shuffling files + file_name = os.path.basename(paths2audio_files[file_idx]) + # Prepare output file + output_file = os.path.join(output_dir, f'processed_{file_name}') + # Crop the output signal to the actual length + output_signal = processed_batch[example_idx, :, : input_length[example_idx]].cpu().numpy() + # Write audio + sf.write(output_file, output_signal.T, self.sample_rate, 'float') + # Update the file counter + file_idx += 1 + # Save processed file + paths2processed_files.append(output_file) + + del test_batch + del processed_batch + + finally: + # set mode back to its original value + self.train(mode=mode) + if mode is True: + self.unfreeze() + logging.set_verbosity(logging_level) + + return paths2processed_files + + def _setup_dataloader_from_config(self, config: Optional[Dict]): + + is_concat = config.get('is_concat', False) + if is_concat: + raise NotImplementedError('Concat not implemented') + + # TODO: Consider moving `inject` from `audio_to_text_dataset` to a utility module? + # Automatically inject args from model config to dataloader config + inject_dataloader_value_from_model_config(self.cfg, config, key='sample_rate') + + # Instantiate tarred dataset loader or normal dataset loader + if config.get('is_tarred', False): + raise NotImplementedError('Tarred datasets not supported') + + if 'manifest_filepath' in config and config['manifest_filepath'] is None: + logging.warning(f"Could not load dataset as `manifest_filepath` was None. 
Provided config : {config}") + return None + + dataset = audio_to_audio_dataset.get_audio_to_target_dataset(config=config) + + if hasattr(dataset, 'collate_fn'): + collate_fn = dataset.collate_fn + else: + collate_fn = dataset.datasets[0].collate_fn + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=config['batch_size'], + collate_fn=collate_fn, + drop_last=config.get('drop_last', False), + shuffle=config['shuffle'], + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + ) + + def setup_training_data(self, train_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the training data loader via a Dict-like object. + + Args: + train_data_config: A config that contains the information regarding construction + of a training dataset. + + Supported Datasets: + - :class:`~nemo.collections.asr.data.audio_to_audio.AudioToTargetDataset` + """ + if 'shuffle' not in train_data_config: + train_data_config['shuffle'] = True + + # preserve config + self._update_dataset_config(dataset_name='train', config=train_data_config) + + self._train_dl = self._setup_dataloader_from_config(config=train_data_config) + + if 'is_tarred' in train_data_config and train_data_config['is_tarred']: + raise NotImplementedError('Tarred datasets not supported') + + def setup_validation_data(self, val_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the validation data loader via a Dict-like object. + + Args: + val_data_config: A config that contains the information regarding construction + of a validation dataset. + + Supported Datasets: + - :class:`~nemo.collections.asr.data.audio_to_audio.AudioToTargetDataset` + """ + if 'shuffle' not in val_data_config: + val_data_config['shuffle'] = False + + # preserve config + self._update_dataset_config(dataset_name='validation', config=val_data_config) + + self._validation_dl = self._setup_dataloader_from_config(config=val_data_config) + + def setup_test_data(self, test_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the test data loader via a Dict-like object. + + Args: + test_data_config: A config that contains the information regarding construction + of a test dataset. + + Supported Datasets: + - :class:`~nemo.collections.asr.data.audio_to_audio.AudioToTargetDataset` + """ + if 'shuffle' not in test_data_config: + test_data_config['shuffle'] = False + + # preserve config + self._update_dataset_config(dataset_name='test', config=test_data_config) + + self._test_dl = self._setup_dataloader_from_config(config=test_data_config) + + def _setup_process_dataloader(self, config: Dict) -> 'torch.utils.data.DataLoader': + """Prepare a dataloader for processing files. + + Args: + config: A python dictionary which contains the following keys: + manifest_filepath: path to a manifest file + input_key: key with audio filepaths in the manifest + input_channel_selector: Optional, used to select a subset of channels from input audio files + batch_size: batch size for the dataloader + num_workers: number of workers for the dataloader + + Returns: + A pytorch DataLoader for the given manifest filepath. 
+ """ + dl_config = { + 'manifest_filepath': config['manifest_filepath'], + 'sample_rate': self.sample_rate, + 'input_key': config['input_key'], + 'input_channel_selector': config.get('input_channel_selector', None), + 'target_key': None, + 'target_channel_selector': None, + 'batch_size': config['batch_size'], + 'shuffle': False, + 'num_workers': config.get('num_workers', min(config['batch_size'], os.cpu_count() - 1)), + 'pin_memory': True, + } + + temporary_dataloader = self._setup_dataloader_from_config(config=DictConfig(dl_config)) + return temporary_dataloader + + @property + def input_types(self) -> Dict[str, NeuralType]: + return { + "input_signal": NeuralType( + ('B', 'C', 'T'), AudioSignal(freq=self.sample_rate) + ), # multi-channel format, channel dimension can be 1 for single-channel audio + "input_length": NeuralType(tuple('B'), LengthsType()), + } + + @property + def output_types(self) -> Dict[str, NeuralType]: + return { + "output_signal": NeuralType( + ('B', 'C', 'T'), AudioSignal(freq=self.sample_rate) + ), # multi-channel format, channel dimension can be 1 for single-channel audio + "output_length": NeuralType(tuple('B'), LengthsType()), + } + + def match_batch_length(self, input: torch.Tensor, batch_length: int): + """Trim or pad the output to match the batch length. + + Args: + input: tensor with shape (B, C, T) + batch_length: int + + Returns: + Tensor with shape (B, C, T), where T matches the + batch length. + """ + input_length = input.size(-1) + pad_length = batch_length - input_length + pad = (0, pad_length) + # pad with zeros or crop + return torch.nn.functional.pad(input, pad, 'constant', 0) + + @typecheck() + def forward(self, input_signal, input_length): + """ + Forward pass of the model. + + Args: + input_signal: Tensor that represents a batch of raw audio signals, + of shape [B, T] or [B, T, C]. T here represents timesteps, with 1 second of audio represented as + `self.sample_rate` number of floating point values. + input_signal_length: Vector of length B, that contains the individual lengths of the audio + sequences. 
+ + Returns: + """ + batch_length = input_signal.size(-1) + + # Encoder + encoded, encoded_length = self.encoder(input=input_signal, input_length=input_length) + + # Mask estimator + mask, _ = self.mask_estimator(input=encoded, input_length=encoded_length) + + # Mask-based processor in the encoded domain + processed, processed_length = self.mask_processor(input=encoded, input_length=encoded_length, mask=mask) + + # Decoder + processed, processed_length = self.decoder(input=processed, input_length=processed_length) + + # Trim or pad the estimated signal to match input length + processed = self.match_batch_length(input=processed, batch_length=batch_length) + return processed, processed_length + + # PTL-specific methods + def training_step(self, batch, batch_idx): + input_signal, input_length, target_signal, target_length = batch + + # Expand channel dimension, if necessary + # For consistency, the model uses multi-channel format, even if the channel dimension is 1 + if input_signal.ndim == 2: + input_signal = input_signal.unsqueeze(1) + if target_signal.ndim == 2: + target_signal = target_signal.unsqueeze(1) + + processed_signal, _ = self.forward(input_signal=input_signal, input_length=input_length) + + loss_value = self.loss(estimate=processed_signal, target=target_signal, input_length=input_length) + + tensorboard_logs = { + 'train_loss': loss_value, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + 'global_step': torch.tensor(self.trainer.global_step, dtype=torch.float32), + } + + return {'loss': loss_value, 'log': tensorboard_logs} + + def evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = 'val'): + input_signal, input_length, target_signal, target_length = batch + + # Expand channel dimension, if necessary + # For consistency, the model uses multi-channel format, even if the channel dimension is 1 + if input_signal.ndim == 2: + input_signal = input_signal.unsqueeze(1) + if target_signal.ndim == 2: + target_signal = target_signal.unsqueeze(1) + + processed_signal, _ = self.forward(input_signal=input_signal, input_length=input_length) + + loss_value = self.loss(estimate=processed_signal, target=target_signal, input_length=input_length) + + self.log('global_step', torch.tensor(self.trainer.global_step, dtype=torch.float32)) + + return { + f'{tag}_loss': loss_value, + } + + @classmethod + def list_available_models(cls) -> Optional[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + + return results diff --git a/nemo/collections/asr/modules/__init__.py b/nemo/collections/asr/modules/__init__.py index 35be97044083..5ea4953b5171 100644 --- a/nemo/collections/asr/modules/__init__.py +++ b/nemo/collections/asr/modules/__init__.py @@ -12,12 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. 
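The forward pass above is a chain of four neural modules (encoder, mask estimator, mask processor, decoder). A minimal sketch of that flow using one plausible instantiation built from the modules added in this patch; the actual module choices come from the model config, and the values below are illustrative only (requires torchaudio):

# Sketch of the encoder -> mask estimator -> mask processor -> decoder chain (illustrative)
import torch
from nemo.collections.asr.modules.audio_modules import MaskEstimatorRNN, MaskReferenceChannel
from nemo.collections.asr.modules.audio_preprocessing import AudioToSpectrogram, SpectrogramToAudio

encoder = AudioToSpectrogram(fft_length=512, hop_length=128)
mask_estimator = MaskEstimatorRNN(num_outputs=1, num_subbands=encoder.num_subbands, num_features=256)
mask_processor = MaskReferenceChannel(ref_channel=0)
decoder = SpectrogramToAudio(fft_length=512, hop_length=128)

audio = torch.randn(2, 1, 16000)          # (B, C, T), single-channel example
lengths = torch.tensor([16000, 12000])

spec, spec_len = encoder(input=audio, input_length=lengths)
mask, _ = mask_estimator(input=spec, input_length=spec_len)
enhanced_spec, _ = mask_processor(input=spec, input_length=spec_len, mask=mask)
enhanced, out_len = decoder(input=enhanced_spec, input_length=spec_len)
# The model additionally trims or pads `enhanced` back to the input batch length.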
+from nemo.collections.asr.modules.audio_modules import MaskBasedBeamformer, MaskEstimatorRNN, MaskReferenceChannel
 from nemo.collections.asr.modules.audio_preprocessing import (
     AudioToMelSpectrogramPreprocessor,
     AudioToMFCCPreprocessor,
+    AudioToSpectrogram,
     CropOrPadSpectrogramAugmentation,
     MaskedPatchAugmentation,
     SpectrogramAugmentation,
+    SpectrogramToAudio,
 )
 from nemo.collections.asr.modules.beam_search_decoder import BeamSearchDecoderWithLM
 from nemo.collections.asr.modules.conformer_encoder import ConformerEncoder, ConformerEncoderAdapter
diff --git a/nemo/collections/asr/modules/audio_modules.py b/nemo/collections/asr/modules/audio_modules.py
new file mode 100644
index 000000000000..ff073361c45a
--- /dev/null
+++ b/nemo/collections/asr/modules/audio_modules.py
@@ -0,0 +1,492 @@
+# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Dict, Optional, Tuple
+
+import torch
+
+from nemo.collections.asr.parts.preprocessing.features import make_seq_mask_like
+from nemo.collections.asr.parts.utils.audio_utils import db2mag, wrap_to_pi
+from nemo.core.classes import NeuralModule, typecheck
+from nemo.core.neural_types import FloatType, LengthsType, NeuralType, SpectrogramType
+from nemo.utils import logging
+from nemo.utils.decorators import experimental
+
+try:
+    import torchaudio
+
+    HAVE_TORCHAUDIO = True
+except ModuleNotFoundError:
+    HAVE_TORCHAUDIO = False
+
+
+__all__ = [
+    'MaskEstimatorRNN',
+    'MaskReferenceChannel',
+    'MaskBasedBeamformer',
+]
+
+
+@experimental
+class SpectrogramToMultichannelFeatures(NeuralModule):
+    """Convert a complex-valued multi-channel spectrogram to
+    multichannel features.
+
+    Args:
+        num_subbands: Expected number of subbands in the input signal
+        num_input_channels: Optional, provides the number of channels
+                            of the input signal. Used to infer the number
+                            of output channels.
+        mag_reduction: Reduction across channels. Defaults to `rms`; if `None`, the
+                       magnitude of each channel is kept.
+        use_ipd: Use inter-channel phase difference (IPD).
+ mag_normalization: Normalization for magnitude features + ipd_normalization: Normalization for IPD features + """ + + def __init__( + self, + num_subbands: int, + num_input_channels: Optional[int] = None, + mag_reduction: Optional[str] = 'rms', + use_ipd: bool = False, + mag_normalization: Optional[str] = None, + ipd_normalization: Optional[str] = None, + ): + super().__init__() + self.mag_reduction = mag_reduction + self.use_ipd = use_ipd + + # TODO: normalization + if mag_normalization is not None: + raise NotImplementedError(f'Unknown magnitude normalization {mag_normalization}') + self.mag_normalization = mag_normalization + + if ipd_normalization is not None: + raise NotImplementedError(f'Unknown ipd normalization {ipd_normalization}') + self.ipd_normalization = ipd_normalization + + if self.use_ipd: + self._num_features = 2 * num_subbands + self._num_channels = num_input_channels + else: + self._num_features = num_subbands + self._num_channels = num_input_channels if self.mag_reduction is None else 1 + + @property + def input_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "input": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "input_length": NeuralType(('B',), LengthsType()), + } + + @property + def output_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "output": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "output_length": NeuralType(('B',), LengthsType()), + } + + @property + def num_features(self) -> int: + """Configured number of features + """ + return self._num_features + + @property + def num_channels(self) -> int: + """Configured number of channels + """ + if self._num_channels is not None: + return self._num_channels + else: + raise ValueError( + 'Num channels is not configured. To configure this, `num_input_channels` ' + 'must be provided when constructing the object.' + ) + + @typecheck() + def forward(self, input: torch.Tensor, input_length: torch.Tensor) -> torch.Tensor: + """Convert input batch of C-channel spectrograms into + a batch of time-frequency features with dimension num_feat. + The output number of channels may be the same as input, or + reduced to 1, e.g., if averaging over magnitude and not appending individual IPDs. 
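For orientation, the feature dimensionality follows directly from the constructor logic above; a small sketch of the two cases, with illustrative values:

# Feature-size bookkeeping for SpectrogramToMultichannelFeatures (illustrative values)
num_subbands, num_input_channels = 257, 4

# With IPD enabled: magnitude and IPD are stacked along the feature axis
num_features_ipd = 2 * num_subbands      # 514 features per channel
num_channels_ipd = num_input_channels    # channel count is preserved

# Without IPD and with a channel-wise magnitude reduction (e.g. 'rms')
num_features_mag = num_subbands          # 257 features
num_channels_mag = 1                     # channels collapsed by the reduction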
+ + Args: + input: Spectrogram for C channels with F subbands and N time frames, (B, C, F, N) + input_length: Length of valid entries along the time dimension, shape (B,) + + Returns: + num_feat_channels channels with num_feat features, shape (B, num_feat_channels, num_feat, N) + """ + # Magnitude spectrum + if self.mag_reduction is None: + mag = torch.abs(input) + elif self.mag_reduction == 'abs_mean': + mag = torch.abs(torch.mean(input, axis=1, keepdim=True)) + elif self.mag_reduction == 'mean_abs': + mag = torch.mean(torch.abs(input), axis=1, keepdim=True) + elif self.mag_reduction == 'rms': + mag = torch.sqrt(torch.mean(torch.abs(input) ** 2, axis=1, keepdim=True)) + else: + raise ValueError(f'Unexpected magnitude reduction {self.mag_reduction}') + + if self.mag_normalization is not None: + mag = self.mag_normalization(mag) + + features = mag + + if self.use_ipd: + # Calculate IPD relative to average spec + spec_mean = torch.mean(input, axis=1, keepdim=True) + ipd = torch.angle(input) - torch.angle(spec_mean) + # Modulo to [-pi, pi] + ipd = wrap_to_pi(ipd) + + if self.ipd_normalization is not None: + ipd = self.ipd_normalization(ipd) + + # Concatenate to existing features + features = torch.cat([features.expand(ipd.shape), ipd], axis=2) + + if self._num_channels is not None and features.size(1) != self._num_channels: + raise RuntimeError( + f'Number of channels in features {features.size(1)} is different than the configured number of channels {self._num_channels}' + ) + + return features, input_length + + +class MaskEstimatorRNN(NeuralModule): + """Estimate `num_outputs` masks from the input spectrogram + using stacked RNNs and projections. + + The module is structured as follows: + input --> spatial features --> input projection --> + --> stacked RNNs --> output projection for each output --> sigmoid + + Reference: + Multi-microphone neural speech separation for far-field multi-talker + speech recognition (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8462081) + + Args: + num_outputs: Number of output masks to estimate + num_subbands: Number of subbands of the input spectrogram + num_features: Number of features after the input projections + num_layers: Number of RNN layers + num_hidden_features: Number of hidden features in RNN layers + num_input_channels: Number of input channels + dropout: If non-zero, introduces dropout on the outputs of each RNN layer except the last layer, with dropout + probability equal to `dropout`. Default: 0 + bidirectional: If `True`, use bidirectional RNN. + rnn_type: Type of RNN, either `lstm` or `gru`. 
Default: `lstm` + mag_reduction: Channel-wise reduction for magnitude features + use_ipd: Use inter-channel phase difference (IPD) features + """ + + def __init__( + self, + num_outputs: int, + num_subbands: int, + num_features: int = 1024, + num_layers: int = 3, + num_hidden_features: Optional[int] = None, + num_input_channels: Optional[int] = None, + dropout: float = 0, + bidirectional=True, + rnn_type: str = 'lstm', + mag_reduction: str = 'rms', + use_ipd: bool = None, + ): + super().__init__() + if num_hidden_features is None: + num_hidden_features = num_features + + self.features = SpectrogramToMultichannelFeatures( + num_subbands=num_subbands, + num_input_channels=num_input_channels, + mag_reduction=mag_reduction, + use_ipd=use_ipd, + ) + + self.input_projection = torch.nn.Linear( + in_features=self.features.num_features * self.features.num_channels, out_features=num_features + ) + + if rnn_type == 'lstm': + self.rnn = torch.nn.LSTM( + input_size=num_features, + hidden_size=num_hidden_features, + num_layers=num_layers, + batch_first=True, + dropout=dropout, + bidirectional=bidirectional, + ) + elif rnn_type == 'gru': + self.rnn = torch.nn.GRU( + input_size=num_features, + hidden_size=num_hidden_features, + num_layers=num_layers, + batch_first=True, + dropout=dropout, + bidirectional=bidirectional, + ) + else: + raise ValueError(f'Unknown rnn_type: {rnn_type}') + + # Each output shares the RNN and has a separate projection + self.output_projections = torch.nn.ModuleList( + [ + torch.nn.Linear( + in_features=2 * num_features if bidirectional else num_features, out_features=num_subbands + ) + for _ in range(num_outputs) + ] + ) + self.output_nonlinearity = torch.nn.Sigmoid() + + @property + def input_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "input": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "input_length": NeuralType(('B',), LengthsType()), + } + + @property + def output_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "output": NeuralType(('B', 'C', 'D', 'T'), FloatType()), + "output_length": NeuralType(('B',), LengthsType()), + } + + @typecheck() + def forward(self, input: torch.Tensor, input_length: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: + """Estimate `num_outputs` masks from the input spectrogram. 
+ + Args: + input: C-channel input, shape (B, C, F, N) + input_length: Length of valid entries along the time dimension, shape (B,) + + Returns: + Returns `num_outputs` masks in a tensor, shape (B, num_outputs, F, N), + and output length with shape (B,) + """ + input, _ = self.features(input=input, input_length=input_length) + B, num_feature_channels, num_features, N = input.shape + + # (B, num_feat_channels, num_feat, N) -> (B, N, num_feat_channels, num_feat) + input = input.permute(0, 3, 1, 2) + + # (B, N, num_feat_channels, num_feat) -> (B, N, num_feat_channels * num_features) + input = input.view(B, N, -1) + + # Apply projection on num_feat + input = self.input_projection(input) + + # Apply RNN on the input sequence + input_packed = torch.nn.utils.rnn.pack_padded_sequence( + input, input_length.cpu(), batch_first=True, enforce_sorted=False + ).to(input.device) + self.rnn.flatten_parameters() + input_packed, _ = self.rnn(input_packed) + input, input_length = torch.nn.utils.rnn.pad_packed_sequence(input_packed, batch_first=True) + input_length = input_length.to(input.device) + + # Create `num_outputs` masks + output = [] + for output_projection in self.output_projections: + # Output projection + mask = output_projection(input) + mask = self.output_nonlinearity(mask) + + # Back to the original format + # (B, N, F) -> (B, F, N) + mask = mask.transpose(2, 1) + + # Append to the output + output.append(mask) + + # Stack along channel dimension to get (B, M, F, N) + output = torch.stack(output, axis=1) + + # Mask frames beyond input length + length_mask: torch.Tensor = make_seq_mask_like( + lengths=input_length, like=output, time_dim=-1, valid_ones=False + ) + output = output.masked_fill(length_mask, 0.0) + + return output, input_length + + +class MaskReferenceChannel(NeuralModule): + """A simple mask processor which applies mask + on ref_channel of the input signal. + + Args: + ref_channel: Index of the reference channel. + mask_min_db: Threshold mask to a minimal value before applying it, defaults to -200dB + mask_max_db: Threshold mask to a maximal value before applying it, defaults to 0dB + """ + + def __init__(self, ref_channel: int = 0, mask_min_db: float = -200, mask_max_db: float = 0): + super().__init__() + self.ref_channel = ref_channel + # Mask thresholding + self.mask_min = db2mag(mask_min_db) + self.mask_max = db2mag(mask_max_db) + + @property + def input_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "input": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "input_length": NeuralType(('B',), LengthsType()), + "mask": NeuralType(('B', 'C', 'D', 'T'), FloatType()), + } + + @property + def output_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "output": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "output_length": NeuralType(('B',), LengthsType()), + } + + @typecheck() + def forward( + self, input: torch.Tensor, input_length: torch.Tensor, mask: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor]: + """Apply mask on `ref_channel` of the input signal. + This can be used to generate multi-channel output. + If `mask` has `M` channels, the output will have `M` channels as well. 
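The shape logic described here amounts to broadcasting a (B, M, F, N) mask against a single reference channel. A small standalone sketch with random tensors, purely illustrative:

import torch

B, C, M, F, N = 2, 4, 2, 257, 100
spec = torch.randn(B, C, F, N, dtype=torch.cfloat)   # multi-channel spectrogram
mask = torch.rand(B, M, F, N)                        # M masks in [0, 1]

ref_channel = 0
# (B, 1, F, N) broadcasts against (B, M, F, N) -> (B, M, F, N)
output = mask * spec[:, ref_channel : ref_channel + 1, ...]
assert output.shape == (B, M, F, N)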
+
+        Args:
+            input: Input signal complex-valued spectrogram, shape (B, C, F, N)
+            input_length: Length of valid entries along the time dimension, shape (B,)
+            mask: Mask for M outputs, shape (B, M, F, N)
+
+        Returns:
+            M-channel output complex-valued spectrogram with shape (B, M, F, N)
+        """
+        # Apply thresholds
+        mask = torch.clamp(mask, min=self.mask_min, max=self.mask_max)
+
+        # Apply each output mask on the ref channel
+        output = mask * input[:, self.ref_channel : self.ref_channel + 1, ...]
+        return output, input_length
+
+
+class MaskBasedBeamformer(NeuralModule):
+    """Multi-channel processor using masks to estimate signal statistics.
+
+    Args:
+        filter_type: string denoting the type of the filter. Defaults to `mvdr_souden`
+        ref_channel: reference channel for processing
+        mask_min_db: Threshold mask to a minimal value before applying it, defaults to -200dB
+        mask_max_db: Threshold mask to a maximal value before applying it, defaults to 0dB
+    """
+
+    def __init__(
+        self,
+        filter_type: str = 'mvdr_souden',
+        ref_channel: int = 0,
+        mask_min_db: float = -200,
+        mask_max_db: float = 0,
+    ):
+        if not HAVE_TORCHAUDIO:
+            logging.error('Could not import torchaudio. Some features might not work.')
+
+            raise ModuleNotFoundError(
+                f"torchaudio is not installed but is necessary to instantiate a {self.__class__.__name__}"
+            )
+
+        super().__init__()
+        self.ref_channel = ref_channel
+        self.filter_type = filter_type
+        if self.filter_type == 'mvdr_souden':
+            self.psd = torchaudio.transforms.PSD()
+            self.filter = torchaudio.transforms.SoudenMVDR()
+        else:
+            raise ValueError(f'Unknown filter type {filter_type}')
+        # Mask thresholding
+        self.mask_min = db2mag(mask_min_db)
+        self.mask_max = db2mag(mask_max_db)
+
+    @property
+    def input_types(self) -> Dict[str, NeuralType]:
+        """Returns definitions of module input ports.
+        """
+        return {
+            "input": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()),
+            "input_length": NeuralType(('B',), LengthsType()),
+            "mask": NeuralType(('B', 'C', 'D', 'T'), FloatType()),
+        }
+
+    @property
+    def output_types(self) -> Dict[str, NeuralType]:
+        """Returns definitions of module output ports.
+        """
+        return {
+            "output": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()),
+            "output_length": NeuralType(('B',), LengthsType()),
+        }
+
+    @typecheck()
+    def forward(
+        self, input: torch.Tensor, input_length: torch.Tensor, mask: torch.Tensor
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        """Apply a mask-based beamformer to the input spectrogram.
+        This can be used to generate multi-channel output.
+        If `mask` has `M` channels, the output will have `M` channels as well.
+ + Args: + input: Input signal complex-valued spectrogram, shape (B, C, F, N) + input_length: Length of valid entries along the time dimension, shape (B,) + mask: Mask for M output signals, shape (B, M, F, N) + + Returns: + M-channel output signal complex-valued spectrogram, shape (B, M, F, N) + """ + # Apply threshold on the mask + mask = torch.clamp(mask, min=self.mask_min, max=self.mask_max) + # Length mask + length_mask: torch.Tensor = make_seq_mask_like( + lengths=input_length, like=mask[:, 0, ...], time_dim=-1, valid_ones=False + ) + # Use each mask to generate an output at ref_channel + output = [] + for m in range(mask.size(1)): + # Prepare mask for the desired and the undesired signal + mask_desired = mask[:, m, ...].masked_fill(length_mask, 0.0) + mask_undesired = (1 - mask_desired).masked_fill(length_mask, 0.0) + # Calculate PSDs + psd_desired = self.psd(input, mask_desired) + psd_undesired = self.psd(input, mask_undesired) + # Apply filter + output_m = self.filter(input, psd_desired, psd_undesired, reference_channel=self.ref_channel) + output_m = output_m.masked_fill(length_mask, 0.0) + # Save the current output (B, F, N) + output.append(output_m) + + output = torch.stack(output, axis=1) + + return output, input_length diff --git a/nemo/collections/asr/modules/audio_preprocessing.py b/nemo/collections/asr/modules/audio_preprocessing.py index f133426faf33..75a034cbafce 100644 --- a/nemo/collections/asr/modules/audio_preprocessing.py +++ b/nemo/collections/asr/modules/audio_preprocessing.py @@ -16,13 +16,17 @@ import random from abc import ABC, abstractmethod from dataclasses import dataclass -from typing import Any, Optional +from typing import Any, Dict, Optional, Tuple import torch from packaging import version from nemo.collections.asr.parts.numba.spec_augment import SpecAugmentNumba, spec_augment_launch_heuristics -from nemo.collections.asr.parts.preprocessing.features import FilterbankFeatures, FilterbankFeaturesTA +from nemo.collections.asr.parts.preprocessing.features import ( + FilterbankFeatures, + FilterbankFeaturesTA, + make_seq_mask_like, +) from nemo.collections.asr.parts.submodules.spectr_augment import SpecAugment, SpecCutout from nemo.core.classes import Exportable, NeuralModule, typecheck from nemo.core.neural_types import ( @@ -51,6 +55,8 @@ __all__ = [ 'AudioToMelSpectrogramPreprocessor', + 'AudioToSpectrogram', + 'SpectrogramToAudio', 'AudioToMFCCPreprocessor', 'SpectrogramAugmentation', 'MaskedPatchAugmentation', @@ -684,6 +690,200 @@ def restore_from(cls, restore_path: str): pass +class AudioToSpectrogram(NeuralModule): + """Transform a batch of input multi-channel signals into a batch of + STFT-based spectrograms. + + Args: + fft_length: length of FFT + hop_length: length of hops/shifts of the sliding window + power: exponent for magnitude spectrogram. Default `None` will + return a complex-valued spectrogram + """ + + def __init__(self, fft_length: int, hop_length: int, power: Optional[float] = None): + if not HAVE_TORCHAUDIO: + logging.error('Could not import torchaudio. 
Some features might not work.') + + raise ModuleNotFoundError( + "torchaudio is not installed but is necessary to instantiate a {self.__class__.__name__}" + ) + + super().__init__() + + # For now, assume FFT length is divisible by two + if fft_length % 2 != 0: + raise ValueError(f'fft_length = {fft_length} must be divisible by 2') + + self.stft = torchaudio.transforms.Spectrogram( + n_fft=fft_length, hop_length=hop_length, power=power, pad_mode='constant' + ) + + # number of subbands + self.F = fft_length // 2 + 1 + + @property + def num_subbands(self) -> int: + return self.F + + @property + def input_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "input": NeuralType(('B', 'C', 'T'), AudioSignal()), + "input_length": NeuralType(('B',), LengthsType()), + } + + @property + def output_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "output": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "output_length": NeuralType(('B',), LengthsType()), + } + + @typecheck() + def forward(self, input: torch.Tensor, input_length: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: + """Convert a batch of C-channel input signals + into a batch of complex-valued spectrograms. + + Args: + input: Time-domain input signal with C channels, shape (B, C, T) + input_length: Length of valid entries along the time dimension, shape (B,) + + Returns: + Output spectrogram with F subbands and N time frames, shape (B, C, F, N) + and output length with shape (B,). + """ + output_length = self.get_output_length(input_length=input_length) + + B, T = input.size(0), input.size(-1) + input = input.view(B, -1, T) + + # STFT output (B, C, F, N) + with torch.cuda.amp.autocast(enabled=False): + output = self.stft(input.float()) + + # Mask padded frames + length_mask: torch.Tensor = make_seq_mask_like( + lengths=output_length, like=output, time_dim=-1, valid_ones=False + ) + output = output.masked_fill(length_mask, 0.0) + + return output, output_length + + def get_output_length(self, input_length: torch.Tensor) -> torch.Tensor: + """Get length of valid frames for the output. + + Args: + input_length: number of valid samples, shape (B,) + + Returns: + Number of valid frames, shape (B,) + """ + output_length = input_length.div(self.stft.hop_length, rounding_mode='floor').add(1).long() + return output_length + + +class SpectrogramToAudio(NeuralModule): + """Transform a batch of input multi-channel spectrograms into a batch of + time-domain multi-channel signals. + + Args: + fft_length: length of FFT + hop_length: length of hops/shifts of the sliding window + power: exponent for magnitude spectrogram. Default `None` will + return a complex-valued spectrogram + """ + + def __init__(self, fft_length: int, hop_length: int): + if not HAVE_TORCHAUDIO: + logging.error('Could not import torchaudio. 
Some features might not work.') + + raise ModuleNotFoundError( + "torchaudio is not installed but is necessary to instantiate a {self.__class__.__name__}" + ) + + super().__init__() + + # For now, assume FFT length is divisible by two + if fft_length % 2 != 0: + raise ValueError(f'fft_length = {fft_length} must be divisible by 2') + + self.istft = torchaudio.transforms.InverseSpectrogram( + n_fft=fft_length, hop_length=hop_length, pad_mode='constant' + ) + + self.F = fft_length // 2 + 1 + + @property + def num_subbands(self) -> int: + return self.F + + @property + def input_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "input": NeuralType(('B', 'C', 'D', 'T'), SpectrogramType()), + "input_length": NeuralType(('B',), LengthsType()), + } + + @property + def output_types(self) -> Dict[str, NeuralType]: + """Returns definitions of module output ports. + """ + return { + "output": NeuralType(('B', 'C', 'T'), AudioSignal()), + "output_length": NeuralType(('B',), LengthsType()), + } + + @typecheck() + def forward(self, input: torch.Tensor, input_length: torch.Tensor) -> torch.Tensor: + """Convert input complex-valued spectrogram to a time-domain + signal. Multi-channel IO is supported. + + Args: + input: Input spectrogram for C channels, shape (B, C, F, N) + input_length: Length of valid entries along the time dimension, shape (B,) + + Returns: + Time-domain signal with T time-domain samples and C channels, (B, C, T) + and output length with shape (B,). + """ + output_length = self.get_output_length(input_length=input_length) + + B, F, N = input.size(0), input.size(-2), input.size(-1) + assert F == self.F, f'Number of subbands F={F} not matching self.F={self.F}' + input = input.view(B, -1, F, N) + + # iSTFT output (B, C, T) + with torch.cuda.amp.autocast(enabled=False): + output = self.istft(input.cfloat()) + + # Mask padded samples + length_mask: torch.Tensor = make_seq_mask_like( + lengths=output_length, like=output, time_dim=-1, valid_ones=False + ) + output = output.masked_fill(length_mask, 0.0) + + return output, output_length + + def get_output_length(self, input_length: torch.Tensor) -> torch.Tensor: + """Get length of valid samples for the output. + + Args: + input_length: number of valid frames, shape (B,) + + Returns: + Number of valid samples, shape (B,) + """ + output_length = input_length.sub(1).mul(self.istft.hop_length).long() + return output_length + + @dataclass class AudioToMelSpectrogramPreprocessorConfig: _target_: str = "nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor" diff --git a/nemo/collections/asr/parts/utils/audio_utils.py b/nemo/collections/asr/parts/utils/audio_utils.py index 93d399740be1..eda0c8957381 100644 --- a/nemo/collections/asr/parts/utils/audio_utils.py +++ b/nemo/collections/asr/parts/utils/audio_utils.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. 
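Before the utility additions below, it may help to note how the two transforms above pair up: the analysis produces floor(T / hop_length) + 1 frames and the synthesis returns (N - 1) * hop_length samples, so a round trip reproduces the input length whenever T is a multiple of the hop. A small sketch, assuming torchaudio is available and using illustrative sizes:

import torch
from nemo.collections.asr.modules.audio_preprocessing import AudioToSpectrogram, SpectrogramToAudio

fft_length, hop_length = 512, 128
analysis = AudioToSpectrogram(fft_length=fft_length, hop_length=hop_length)
synthesis = SpectrogramToAudio(fft_length=fft_length, hop_length=hop_length)

num_samples = 16000  # multiple of hop_length
x = torch.randn(1, 1, num_samples)
x_len = torch.tensor([num_samples])

spec, spec_len = analysis(input=x, input_length=x_len)
assert spec_len.item() == num_samples // hop_length + 1

y, y_len = synthesis(input=spec, input_length=spec_len)
assert y_len.item() == (spec_len.item() - 1) * hop_length == num_samples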
+import math from typing import Iterable, Optional, Union import librosa @@ -19,6 +20,7 @@ import numpy.typing as npt import scipy import soundfile as sf +import torch from scipy.spatial.distance import pdist, squareform from nemo.utils import logging @@ -400,3 +402,60 @@ def get_segment_start(signal: np.ndarray, segment: np.ndarray) -> int: ) cc = scipy.signal.correlate(signal, segment, mode='valid') return np.argmax(cc) + + +def calculate_sdr_numpy( + estimate: np.ndarray, + target: np.ndarray, + scale_invariant: bool = False, + remove_mean: bool = True, + sdr_max: Optional[float] = None, + eps: float = 1e-10, +) -> float: + """Calculate signal-to-distortion ratio. + + SDR = 10 * log10( ||t||_2^2 / (||e-t||_2^2 + alpha * ||t||^2) + + where + alpha = 10^(-sdr_max/10) + + Optionally, apply scale-invariant scaling to target signal. + + Args: + estimate: estimated signal + target: target signal + + Returns: + SDR in dB. + """ + if remove_mean: + estimate = estimate - np.mean(estimate) + target = target - np.mean(target) + + if scale_invariant: + estimate_dot_target = np.mean(estimate * target) + target_pow = np.mean(np.abs(target) ** 2) + target_scale = estimate_dot_target / (target_pow + eps) + target = target_scale * target + + target_pow = np.mean(np.abs(target) ** 2) + distortion_pow = np.mean(np.abs(estimate - target) ** 2) + + if sdr_max is not None: + distortion_pow = distortion_pow + 10 ** (-sdr_max / 10) * target_pow + + sdr = 10 * np.log10(target_pow / (distortion_pow + eps) + eps) + return sdr + + +def wrap_to_pi(x: torch.Tensor) -> torch.Tensor: + """Wrap angle in radians to [-pi, pi] + + Args: + x: angle in radians + + Returns: + Angle in radians wrapped to [-pi, pi] + """ + pi = torch.tensor(math.pi, device=x.device) + return torch.remainder(x + pi, 2 * pi) - pi diff --git a/tests/collections/asr/test_asr_datasets.py b/tests/collections/asr/test_asr_datasets.py index e8946f88b7b4..a8ce58d51ba8 100644 --- a/tests/collections/asr/test_asr_datasets.py +++ b/tests/collections/asr/test_asr_datasets.py @@ -774,10 +774,10 @@ def test_list_to_multichannel(self, num_channels, num_targets): _rng = np.random.default_rng(seed=random_seed) # Multi-channel signal - golden_target = _rng.normal(size=(num_samples, num_channels * num_targets)) + golden_target = _rng.normal(size=(num_channels * num_targets, num_samples)) # Create a list of num_targets signals with num_channels channels - target_list = [golden_target[:, n * num_channels : (n + 1) * num_channels] for n in range(num_targets)] + target_list = [golden_target[n * num_channels : (n + 1) * num_channels, :] for n in range(num_targets)] # Check the original signal is not modified assert (ASRAudioProcessor.list_to_multichannel(golden_target) == golden_target).all() @@ -812,7 +812,7 @@ def test_audio_collate_fn(self): for n in range(batch_size): item = dict() for signal, num_channels in signal_to_channels.items(): - random_signal = _rng.normal(size=(signal_to_length[signal][n], num_channels)) + random_signal = _rng.normal(size=(num_channels, signal_to_length[signal][n])) random_signal = np.squeeze(random_signal) # get rid of channel dimention for single-channel item[signal] = torch.tensor(random_signal) batch.append(item) @@ -844,9 +844,9 @@ def test_audio_collate_fn(self): uut_signal = b_signal[n][:uut_length, ...] golden_signal = batch[n][signal][:uut_length, ...].cpu().detach().numpy() - assert np.isclose( + assert np.allclose( uut_signal, golden_signal, atol=atol - ).all(), f'Example {n} signal {signal} value mismatch.' 
+ ), f'Example {n} signal {signal} value mismatch.' @pytest.mark.unit def test_audio_to_target_dataset(self): @@ -900,7 +900,7 @@ def test_audio_to_target_dataset(self): if num_channels == 1: random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n])) else: - random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n], num_channels)) + random_signal = _rng.uniform(low=-0.5, high=0.5, size=(num_channels, data_duration_samples[n])) data[signal].append(random_signal) with tempfile.TemporaryDirectory() as test_dir: @@ -917,7 +917,7 @@ def test_audio_to_target_dataset(self): signal_filename = f'{signal}_{n:02d}.wav' # write audio files - sf.write(os.path.join(test_dir, signal_filename), data[signal][n], sample_rate, 'float') + sf.write(os.path.join(test_dir, signal_filename), data[signal][n].T, sample_rate, 'float') # update metadata meta[data_key[signal]] = signal_filename @@ -967,14 +967,14 @@ def test_audio_to_target_dataset(self): assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' item_factory_signal = item_factory[signal].cpu().detach().numpy() - assert np.isclose( + assert np.allclose( item_factory_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' # Test 2 # - Filtering based on signal duration @@ -1001,9 +1001,9 @@ def test_audio_to_target_dataset(self): assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 2: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 2: Failed for example {n}, signal {signal} (random seed {random_seed})' # Test 3 # - Use channel selector @@ -1027,13 +1027,13 @@ def test_audio_to_target_dataset(self): for signal in data: cs = channel_selector[signal] item_signal = item[signal].cpu().detach().numpy() - golden_signal = data[signal][n][..., cs] + golden_signal = data[signal][n][cs, ...] 
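These test updates reflect the switch from channel-last (T, C) to channel-first (C, T) layout for multi-channel signals. A tiny sketch of how indexing changes under the new convention; the arrays are illustrative only:

import numpy as np

num_channels, num_samples = 4, 16000
signal = np.random.default_rng(0).normal(size=(num_channels, num_samples))  # channel-first (C, T)

first_two_channels = signal[0:2, :]   # channels are selected along the leading axis
segment = signal[..., 1000:2000]      # time is selected along the trailing axis
assert first_two_channels.shape == (2, num_samples)
assert segment.shape == (num_channels, 1000)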
assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 3: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 3: Failed for example {n}, signal {signal} (random seed {random_seed})' # Test 4 # - Use fixed duration (random segment selection) @@ -1067,7 +1067,7 @@ def test_audio_to_target_dataset(self): # of the first signal, and then use it fixed for other signals if golden_start is None: golden_start = get_segment_start( - signal=full_golden_signal[:, 0], segment=item_signal[:, 0] + signal=full_golden_signal[0, :], segment=item_signal[0, :] ) if not random_offset: assert ( @@ -1075,20 +1075,20 @@ def test_audio_to_target_dataset(self): ), f'Expecting the signal to start at 0 when random_offset is False' golden_end = golden_start + audio_duration_samples - golden_signal = full_golden_signal[golden_start:golden_end, ...] + golden_signal = full_golden_signal[..., golden_start:golden_end] # Test length is correct assert ( - len(item_signal) == audio_duration_samples - ), f'Test 4: Signal length ({len(item_signal)}) not matching the expected length ({audio_duration_samples})' + item_signal.shape[-1] == audio_duration_samples + ), f'Test 4: Signal length ({item_signal.shape[-1]}) not matching the expected length ({audio_duration_samples})' assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' # Test signal values - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 4: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 4: Failed for example {n}, signal {signal} (random seed {random_seed})' # Test 5: # - Test collate_fn @@ -1102,8 +1102,8 @@ def test_audio_to_target_dataset(self): assert signal_shape == ( batch_size, - audio_duration_samples, data_num_channels[signal], + audio_duration_samples, ), f'Test 5: Unexpected signal {signal} shape {signal_shape}' assert len(signal_len) == batch_size, f'Test 5: Unexpected length of signal_len ({len(signal_len)})' assert all(signal_len == audio_duration_samples), f'Test 5: Unexpected signal_len {signal_len}' @@ -1154,7 +1154,7 @@ def test_audio_to_target_dataset_with_target_list(self): if num_channels == 1: random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n])) else: - random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n], num_channels)) + random_signal = _rng.uniform(low=-0.5, high=0.5, size=(num_channels, data_duration_samples[n])) data[signal].append(random_signal) with tempfile.TemporaryDirectory() as test_dir: @@ -1176,7 +1176,7 @@ def test_audio_to_target_dataset_with_target_list(self): # write audio file sf.write( os.path.join(test_dir, signal_filename[-1]), - data[signal][n][:, ch], + data[signal][n][ch, :], sample_rate, 'float', ) @@ -1185,7 +1185,7 @@ def test_audio_to_target_dataset_with_target_list(self): signal_filename = f'{signal}_{n:02d}.wav' # write audio files - sf.write(os.path.join(test_dir, signal_filename), data[signal][n], sample_rate, 'float') + sf.write(os.path.join(test_dir, signal_filename), data[signal][n].T, sample_rate, 'float') # update metadata meta[data_key[signal]] = signal_filename @@ -1224,14 +1224,14 @@ def test_audio_to_target_dataset_with_target_list(self): assert ( 
item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' item_factory_signal = item_factory[signal].cpu().detach().numpy() - assert np.isclose( + assert np.allclose( item_factory_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' # Test 2 # Set target as the first channel of input_filepath and all files listed in target_filepath. @@ -1252,13 +1252,13 @@ def test_audio_to_target_dataset_with_target_list(self): golden_signal = data[signal][n] if signal == 'target_signal': # add the first channel of the input - golden_signal = np.concatenate([data['input_signal'][n][..., 0:1], golden_signal], axis=1) + golden_signal = np.concatenate([data['input_signal'][n][0:1, ...], golden_signal], axis=0) assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 2: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 2: Failed for example {n}, signal {signal} (random seed {random_seed})' @pytest.mark.unit def test_audio_to_target_dataset_for_inference(self): @@ -1304,7 +1304,7 @@ def test_audio_to_target_dataset_for_inference(self): if num_channels == 1: random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n])) else: - random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n], num_channels)) + random_signal = _rng.uniform(low=-0.5, high=0.5, size=(num_channels, data_duration_samples[n])) data[signal].append(random_signal) with tempfile.TemporaryDirectory() as test_dir: @@ -1316,7 +1316,7 @@ def test_audio_to_target_dataset_for_inference(self): # filenames signal_filename = f'{signal}_{n:02d}.wav' # write audio files - sf.write(os.path.join(test_dir, signal_filename), data[signal][n], sample_rate, 'float') + sf.write(os.path.join(test_dir, signal_filename), data[signal][n].T, sample_rate, 'float') # update metadata meta[data_key[signal]] = signal_filename meta['duration'] = data_duration[n] @@ -1359,14 +1359,14 @@ def test_audio_to_target_dataset_for_inference(self): assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' item_factory_signal = item_factory[signal].cpu().detach().numpy() - assert np.isclose( + assert np.allclose( item_factory_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' @pytest.mark.unit def test_audio_to_target_with_reference_dataset(self): @@ -1419,7 +1419,7 @@ def 
test_audio_to_target_with_reference_dataset(self): if num_channels == 1: random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n])) else: - random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_duration_samples[n], num_channels)) + random_signal = _rng.uniform(low=-0.5, high=0.5, size=(num_channels, data_duration_samples[n])) data[signal].append(random_signal) with tempfile.TemporaryDirectory() as test_dir: @@ -1436,7 +1436,7 @@ def test_audio_to_target_with_reference_dataset(self): signal_filename = f'{signal}_{n:02d}.wav' # write audio files - sf.write(os.path.join(test_dir, signal_filename), data[signal][n], sample_rate, 'float') + sf.write(os.path.join(test_dir, signal_filename), data[signal][n].T, sample_rate, 'float') # update metadata meta[data_key[signal]] = signal_filename @@ -1481,14 +1481,14 @@ def test_audio_to_target_with_reference_dataset(self): assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' item_factory_signal = item_factory[signal].cpu().detach().numpy() - assert np.isclose( + assert np.allclose( item_factory_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' # Test 2 # - Use fixed duration (random segment selection) @@ -1520,19 +1520,19 @@ def test_audio_to_target_with_reference_dataset(self): # Find random segment using correlation on the first channel # of the first signal, and then use it fixed for other signals if golden_start is None: - golden_start = get_segment_start(signal=full_golden_signal[:, 0], segment=item_signal[:, 0]) + golden_start = get_segment_start(signal=full_golden_signal[0, :], segment=item_signal[0, :]) golden_end = golden_start + audio_duration_samples - golden_signal = full_golden_signal[golden_start:golden_end, ...] + golden_signal = full_golden_signal[..., golden_start:golden_end] # Test length is correct assert ( - len(item_signal) == audio_duration_samples - ), f'Test 2: Signal {signal} length ({len(item_signal)}) not matching the expected length ({audio_duration_samples})' + item_signal.shape[-1] == audio_duration_samples + ), f'Test 2: Signal {signal} length ({item_signal.shape[-1]}) not matching the expected length ({audio_duration_samples})' # Test signal values - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 2: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 2: Failed for example {n}, signal {signal} (random seed {random_seed})' # Test 3 # - Use fixed duration (random segment selection) @@ -1569,22 +1569,28 @@ def test_audio_to_target_with_reference_dataset(self): # of the first signal, and then use it fixed for other signals if golden_start is None: golden_start = get_segment_start( - signal=full_golden_signal[:, 0], segment=item_signal[:, 0] + signal=full_golden_signal[0, :], segment=item_signal[0, :] ) golden_end = golden_start + audio_duration_samples - golden_signal = full_golden_signal[golden_start:golden_end, ...] 
+ golden_signal = full_golden_signal[..., golden_start:golden_end] # Test length is correct assert ( - len(item_signal) == audio_duration_samples - ), f'Test 3: Signal {signal} length ({len(item_signal)}) not matching the expected length ({audio_duration_samples})' + item_signal.shape[-1] == audio_duration_samples + ), f'Test 3: Signal {signal} length ({item_signal.shape[-1]}) not matching the expected length ({audio_duration_samples})' assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' # Test signal values - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 3: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 3: Failed for example {n}, signal {signal} (random seed {random_seed})' + + # Test 4: + # - Test collate_fn + batch_size = 16 + batch = [dataset.__getitem__(n) for n in range(batch_size)] + _ = dataset.collate_fn(batch) @pytest.mark.unit def test_audio_to_target_with_embedding_dataset(self): @@ -1637,7 +1643,7 @@ def test_audio_to_target_with_embedding_dataset(self): if num_channels == 1: random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_length)) else: - random_signal = _rng.uniform(low=-0.5, high=0.5, size=(data_length, num_channels)) + random_signal = _rng.uniform(low=-0.5, high=0.5, size=(num_channels, data_length)) data[signal].append(random_signal) with tempfile.TemporaryDirectory() as test_dir: @@ -1659,7 +1665,7 @@ def test_audio_to_target_with_embedding_dataset(self): signal_filename = f'{signal}_{n:02d}.wav' # write audio files - sf.write(os.path.join(test_dir, signal_filename), data[signal][n], sample_rate, 'float') + sf.write(os.path.join(test_dir, signal_filename), data[signal][n].T, sample_rate, 'float') # update metadata meta[data_key[signal]] = signal_filename @@ -1701,14 +1707,20 @@ def test_audio_to_target_with_embedding_dataset(self): assert ( item_signal.shape == golden_signal.shape ), f'Signal {signal}: item shape {item_signal.shape} not matching reference shape {golden_signal.shape}' - assert np.isclose( + assert np.allclose( item_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for example {n}, signal {signal} (random seed {random_seed})' item_factory_signal = item_factory[signal].cpu().detach().numpy() - assert np.isclose( + assert np.allclose( item_factory_signal, golden_signal, atol=atol - ).all(), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + ), f'Test 1: Failed for factory example {n}, signal {signal} (random seed {random_seed})' + + # Test 2: + # - Test collate_fn + batch_size = 16 + batch = [dataset.__getitem__(n) for n in range(batch_size)] + _ = dataset.collate_fn(batch) class TestUtilityFunctions: diff --git a/tests/collections/asr/test_asr_losses.py b/tests/collections/asr/test_asr_losses.py new file mode 100644 index 000000000000..5f9b5c108158 --- /dev/null +++ b/tests/collections/asr/test_asr_losses.py @@ -0,0 +1,341 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import pytest +import torch + +from nemo.collections.asr.losses.audio_losses import SDRLoss, calculate_sdr_batch +from nemo.collections.asr.parts.utils.audio_utils import calculate_sdr_numpy + + +class TestAudioLosses: + @pytest.mark.unit + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_sdr(self, num_channels: int): + """Test SDR calculation + """ + test_eps = [0, 1e-16, 1e-1] + batch_size = 8 + num_samples = 50 + num_batches = 10 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + for remove_mean in [True, False]: + for eps in test_eps: + + sdr_loss = SDRLoss(eps=eps, remove_mean=remove_mean) + + for n in range(num_batches): + + # Generate random signal + target = _rng.normal(size=(batch_size, num_channels, num_samples)) + # Random noise + scaling + noise = _rng.uniform(low=0.01, high=1) * _rng.normal(size=(batch_size, num_channels, num_samples)) + # Estimate + estimate = target + noise + + # DC bias for both + target += _rng.uniform(low=-1, high=1) + estimate += _rng.uniform(low=-1, high=1) + + # Tensors for testing the loss + tensor_estimate = torch.tensor(estimate) + tensor_target = torch.tensor(target) + + # Reference SDR + golden_sdr = np.zeros((batch_size, num_channels)) + for b in range(batch_size): + for m in range(num_channels): + golden_sdr[b, m] = calculate_sdr_numpy( + estimate=estimate[b, m, :], target=target[b, m, :], remove_mean=remove_mean, eps=eps, + ) + + # Calculate SDR in torch + uut_sdr = calculate_sdr_batch( + estimate=tensor_estimate, target=tensor_target, remove_mean=remove_mean, eps=eps, + ) + + # Calculate SDR loss + uut_sdr_loss = sdr_loss(estimate=tensor_estimate, target=tensor_target) + + # Compare torch SDR vs numpy + assert np.allclose( + uut_sdr.cpu().detach().numpy(), golden_sdr, atol=atol + ), f'SDR not matching for example {n}, eps={eps}, remove_mean={remove_mean}' + + # Compare SDR loss vs average of torch SDR + assert np.isclose( + uut_sdr_loss, -uut_sdr.mean(), atol=atol + ), f'SDRLoss not matching for example {n}, eps={eps}, remove_mean={remove_mean}' + + @pytest.mark.unit + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_sdr_weighted(self, num_channels: int): + """Test SDR calculation with weighting for channels + """ + batch_size = 8 + num_samples = 50 + num_batches = 10 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + channel_weight = _rng.uniform(low=0.01, high=1.0, size=num_channels) + channel_weight = channel_weight / np.sum(channel_weight) + sdr_loss = SDRLoss(weight=channel_weight) + + for n in range(num_batches): + + # Generate random signal + target = _rng.normal(size=(batch_size, num_channels, num_samples)) + # Random noise + scaling + noise = _rng.uniform(low=0.001, high=10) * _rng.normal(size=target.shape) + # Estimate + estimate = target + noise + + # Tensors for testing the loss + tensor_estimate = torch.tensor(estimate) + tensor_target = torch.tensor(target) + + # Reference SDR + golden_sdr = 0 + for b in range(batch_size): + sdr = [ + calculate_sdr_numpy(estimate=estimate[b, m, :], target=target[b, m, :]) 
+ for m in range(num_channels) + ] + # weighted sum + sdr = np.sum(np.array(sdr) * channel_weight) + golden_sdr += sdr + golden_sdr /= batch_size # average over batch + + # Calculate SDR + uut_sdr_loss = sdr_loss(estimate=tensor_estimate, target=tensor_target) + + # Compare + assert np.allclose( + uut_sdr_loss.cpu().detach().numpy(), -golden_sdr, atol=atol + ), f'SDRLoss not matching for example {n}' + + @pytest.mark.unit + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_sdr_input_length(self, num_channels): + """Test SDR calculation with input length. + """ + batch_size = 8 + max_num_samples = 50 + num_batches = 10 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + sdr_loss = SDRLoss() + + for n in range(num_batches): + + # Generate random signal + target = _rng.normal(size=(batch_size, num_channels, max_num_samples)) + # Random noise + scaling + noise = _rng.uniform(low=0.001, high=10) * _rng.normal(size=target.shape) + # Estimate + estimate = target + noise + + # Limit calculation to random input_length samples + input_length = _rng.integers(low=1, high=max_num_samples, size=batch_size) + + # Tensors for testing the loss + tensor_estimate = torch.tensor(estimate) + tensor_target = torch.tensor(target) + tensor_input_length = torch.tensor(input_length) + + # Reference SDR + golden_sdr = 0 + for b, b_len in enumerate(input_length): + sdr = [ + calculate_sdr_numpy(estimate=estimate[b, m, :b_len], target=target[b, m, :b_len]) + for m in range(num_channels) + ] + sdr = np.mean(np.array(sdr)) + golden_sdr += sdr + golden_sdr /= batch_size # average over batch + + # Calculate SDR + uut_sdr_loss = sdr_loss(estimate=tensor_estimate, target=tensor_target, input_length=tensor_input_length) + + # Compare + assert np.allclose( + uut_sdr_loss.cpu().detach().numpy(), -golden_sdr, atol=atol + ), f'SDRLoss not matching for example {n}' + + @pytest.mark.unit + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_sdr_scale_invariant(self, num_channels: int): + """Test SDR calculation with scale invariant option. 
+ """ + batch_size = 8 + max_num_samples = 50 + num_batches = 10 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + sdr_loss = SDRLoss(scale_invariant=True) + + for n in range(num_batches): + + # Generate random signal + target = _rng.normal(size=(batch_size, num_channels, max_num_samples)) + # Random noise + scaling + noise = _rng.uniform(low=0.001, high=10) * _rng.normal(size=target.shape) + # Estimate + estimate = target + noise + + # Limit calculation to random input_length samples + input_length = _rng.integers(low=1, high=max_num_samples, size=batch_size) + + # Tensors for testing the loss + tensor_estimate = torch.tensor(estimate) + tensor_target = torch.tensor(target) + tensor_input_length = torch.tensor(input_length) + + # Reference SDR + golden_sdr = 0 + for b, b_len in enumerate(input_length): + sdr = [ + calculate_sdr_numpy( + estimate=estimate[b, m, :b_len], target=target[b, m, :b_len], scale_invariant=True + ) + for m in range(num_channels) + ] + sdr = np.mean(np.array(sdr)) + golden_sdr += sdr + golden_sdr /= batch_size # average over batch + + # Calculate SDR loss + uut_sdr_loss = sdr_loss(estimate=tensor_estimate, target=tensor_target, input_length=tensor_input_length) + + # Compare + assert np.allclose( + uut_sdr_loss.cpu().detach().numpy(), -golden_sdr, atol=atol + ), f'SDRLoss not matching for example {n}' + + @pytest.mark.unit + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_sdr_binary_mask(self, num_channels): + """Test SDR calculation with temporal mask. + """ + batch_size = 8 + max_num_samples = 50 + num_batches = 10 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + sdr_loss = SDRLoss() + + for n in range(num_batches): + + # Generate random signal + target = _rng.normal(size=(batch_size, num_channels, max_num_samples)) + # Random noise + scaling + noise = _rng.uniform(low=0.001, high=10) * _rng.normal(size=target.shape) + # Estimate + estimate = target + noise + + # Limit calculation to masked samples + mask = _rng.integers(low=0, high=2, size=(batch_size, max_num_samples)) + + # Tensors for testing the loss + tensor_estimate = torch.tensor(estimate) + tensor_target = torch.tensor(target) + tensor_mask = torch.tensor(mask) + + # Reference SDR + golden_sdr = 0 + for b in range(batch_size): + sdr = [ + calculate_sdr_numpy(estimate=estimate[b, m, mask[b, :] > 0], target=target[b, m, mask[b, :] > 0]) + for m in range(num_channels) + ] + sdr = np.mean(np.array(sdr)) + golden_sdr += sdr + golden_sdr /= batch_size # average over batch + + # Calculate SDR loss + uut_sdr_loss = sdr_loss(estimate=tensor_estimate, target=tensor_target, mask=tensor_mask) + + # Compare + assert np.allclose( + uut_sdr_loss.cpu().detach().numpy(), -golden_sdr, atol=atol + ), f'SDRLoss not matching for example {n}' + + @pytest.mark.unit + @pytest.mark.parametrize('num_channels', [1]) + @pytest.mark.parametrize('sdr_max', [10, 0]) + def test_sdr_max(self, num_channels: int, sdr_max: float): + """Test SDR calculation with soft max threshold. 
+ """ + batch_size = 8 + max_num_samples = 50 + num_batches = 10 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + sdr_loss = SDRLoss(sdr_max=sdr_max) + + for n in range(num_batches): + + # Generate random signal + target = _rng.normal(size=(batch_size, num_channels, max_num_samples)) + # Random noise + scaling + noise = _rng.uniform(low=0.001, high=10) * _rng.normal(size=target.shape) + # Estimate + estimate = target + noise + + # Limit calculation to random input_length samples + input_length = _rng.integers(low=1, high=max_num_samples, size=batch_size) + + # Tensors for testing the loss + tensor_estimate = torch.tensor(estimate) + tensor_target = torch.tensor(target) + tensor_input_length = torch.tensor(input_length) + + # Reference SDR + golden_sdr = 0 + for b, b_len in enumerate(input_length): + sdr = [ + calculate_sdr_numpy(estimate=estimate[b, m, :b_len], target=target[b, m, :b_len], sdr_max=sdr_max) + for m in range(num_channels) + ] + sdr = np.mean(np.array(sdr)) + golden_sdr += sdr + golden_sdr /= batch_size # average over batch + + # Calculate SDR loss + uut_sdr_loss = sdr_loss(estimate=tensor_estimate, target=tensor_target, input_length=tensor_input_length) + + # Compare + assert np.allclose( + uut_sdr_loss.cpu().detach().numpy(), -golden_sdr, atol=atol + ), f'SDRLoss not matching for example {n}' diff --git a/tests/collections/asr/test_audio_modules.py b/tests/collections/asr/test_audio_modules.py new file mode 100644 index 000000000000..de32bd5651d9 --- /dev/null +++ b/tests/collections/asr/test_audio_modules.py @@ -0,0 +1,191 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import importlib +from typing import Optional + +import numpy as np +import pytest +import torch + +from nemo.collections.asr.modules.audio_modules import MaskReferenceChannel, SpectrogramToMultichannelFeatures +from nemo.collections.asr.modules.audio_preprocessing import AudioToSpectrogram + +try: + importlib.import_module('torchaudio') + + HAVE_TORCHAUDIO = True +except ModuleNotFoundError: + HAVE_TORCHAUDIO = False + + +class TestSpectrogramToMultichannelFeatures: + @pytest.mark.unit + @pytest.mark.skipif(not HAVE_TORCHAUDIO, reason="Modules in this test require torchaudio") + @pytest.mark.parametrize('fft_length', [256]) + @pytest.mark.parametrize('num_channels', [1, 4]) + @pytest.mark.parametrize('mag_reduction', [None, 'rms', 'abs_mean', 'mean_abs']) + def test_magnitude(self, fft_length: int, num_channels: int, mag_reduction: Optional[str]): + """Test calculation of spatial features for multi-channel audio. 
+ """ + atol = 1e-6 + batch_size = 8 + num_samples = fft_length * 50 + num_examples = 25 + random_seed = 42 + + _rng = np.random.default_rng(seed=random_seed) + + hop_length = fft_length // 4 + audio2spec = AudioToSpectrogram(fft_length=fft_length, hop_length=hop_length) + + spec2feat = SpectrogramToMultichannelFeatures( + num_subbands=audio2spec.num_subbands, mag_reduction=mag_reduction, use_ipd=False, mag_normalization=None, + ) + + for n in range(num_examples): + x = _rng.normal(size=(batch_size, num_channels, num_samples)) + + spec, spec_len = audio2spec(input=torch.Tensor(x), input_length=torch.Tensor([num_samples] * batch_size)) + + # UUT output + feat, _ = spec2feat(input=spec, input_length=spec_len) + feat_np = feat.cpu().detach().numpy() + + # Golden output + spec_np = spec.cpu().detach().numpy() + if mag_reduction is None: + feat_golden = np.abs(spec_np) + elif mag_reduction == 'rms': + feat_golden = np.sqrt(np.mean(np.abs(spec_np) ** 2, axis=1, keepdims=True)) + elif mag_reduction == 'mean_abs': + feat_golden = np.mean(np.abs(spec_np), axis=1, keepdims=True) + elif mag_reduction == 'abs_mean': + feat_golden = np.abs(np.mean(spec_np, axis=1, keepdims=True)) + else: + raise NotImplementedError() + + # Compare shape + assert feat_np.shape == feat_golden.shape, f'Feature shape not matching for example {n}' + + # Compare values + assert np.allclose(feat_np, feat_golden, atol=atol), f'Features not matching for example {n}' + + @pytest.mark.unit + @pytest.mark.skipif(not HAVE_TORCHAUDIO, reason="Modules in this test require torchaudio") + @pytest.mark.parametrize('fft_length', [256]) + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_ipd(self, fft_length: int, num_channels: int): + """Test calculation of IPD spatial features for multi-channel audio. + """ + atol = 1e-5 + batch_size = 8 + num_samples = fft_length * 50 + num_examples = 10 + random_seed = 42 + + _rng = np.random.default_rng(seed=random_seed) + + hop_length = fft_length // 4 + audio2spec = AudioToSpectrogram(fft_length=fft_length, hop_length=hop_length) + + spec2feat = SpectrogramToMultichannelFeatures( + num_subbands=audio2spec.num_subbands, + mag_reduction='rms', + use_ipd=True, + mag_normalization=None, + ipd_normalization=None, + ) + + for n in range(num_examples): + x = _rng.normal(size=(batch_size, num_channels, num_samples)) + + spec, spec_len = audio2spec(input=torch.Tensor(x), input_length=torch.Tensor([num_samples] * batch_size)) + + # UUT output + feat, _ = spec2feat(input=spec, input_length=spec_len) + feat_np = feat.cpu().detach().numpy() + ipd = feat_np[..., audio2spec.num_subbands :, :] + + # Golden output + spec_np = spec.cpu().detach().numpy() + spec_mean = np.mean(spec_np, axis=1, keepdims=True) + ipd_golden = np.angle(spec_np) - np.angle(spec_mean) + ipd_golden = np.remainder(ipd_golden + np.pi, 2 * np.pi) - np.pi + + # Compare shape + assert ipd.shape == ipd_golden.shape, f'Feature shape not matching for example {n}' + + # Compare values + assert np.allclose(ipd, ipd_golden, atol=atol), f'Features not matching for example {n}' + + +class TestMaskBasedProcessor: + @pytest.mark.unit + @pytest.mark.skipif(not HAVE_TORCHAUDIO, reason="Modules in this test require torchaudio") + @pytest.mark.parametrize('fft_length', [256]) + @pytest.mark.parametrize('num_channels', [1, 4]) + @pytest.mark.parametrize('num_masks', [1, 2]) + def test_mask_reference_channel(self, fft_length: int, num_channels: int, num_masks: int): + """Test masking of the reference channel. 
+ """ + if num_channels == 1: + # Only one channel available + ref_channels = [0] + else: + # Use first or last channel for MC signals + ref_channels = [0, num_channels - 1] + + atol = 1e-6 + batch_size = 8 + num_samples = fft_length * 50 + num_examples = 10 + random_seed = 42 + + _rng = np.random.default_rng(seed=random_seed) + + hop_length = fft_length // 4 + audio2spec = AudioToSpectrogram(fft_length=fft_length, hop_length=hop_length) + + for ref_channel in ref_channels: + + mask_processor = MaskReferenceChannel(ref_channel=ref_channel) + + for n in range(num_examples): + x = _rng.normal(size=(batch_size, num_channels, num_samples)) + + spec, spec_len = audio2spec( + input=torch.Tensor(x), input_length=torch.Tensor([num_samples] * batch_size) + ) + + # Randomly-generated mask + mask = _rng.uniform( + low=0.0, high=1.0, size=(batch_size, num_masks, audio2spec.num_subbands, spec.shape[-1]) + ) + + # UUT output + out, _ = mask_processor(input=spec, input_length=spec_len, mask=torch.tensor(mask)) + out_np = out.cpu().detach().numpy() + + # Golden output + spec_np = spec.cpu().detach().numpy() + out_golden = np.zeros_like(mask, dtype=spec_np.dtype) + for m in range(num_masks): + out_golden[:, m, ...] = spec_np[:, ref_channel, ...] * mask[:, m, ...] + + # Compare shape + assert out_np.shape == out_golden.shape, f'Output shape not matching for example {n}' + + # Compare values + assert np.allclose(out_np, out_golden, atol=atol), f'Output not matching for example {n}' diff --git a/tests/collections/asr/test_audio_preprocessing.py b/tests/collections/asr/test_audio_preprocessing.py new file mode 100644 index 000000000000..93dca28df217 --- /dev/null +++ b/tests/collections/asr/test_audio_preprocessing.py @@ -0,0 +1,189 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import importlib + +import numpy as np +import pytest +import torch + +from nemo.collections.asr.modules.audio_preprocessing import AudioToSpectrogram, SpectrogramToAudio + +try: + importlib.import_module('torchaudio') + + HAVE_TORCHAUDIO = True +except ModuleNotFoundError: + HAVE_TORCHAUDIO = False + + +class TestAudioSpectrogram: + @pytest.mark.unit + @pytest.mark.skipif(not HAVE_TORCHAUDIO, reason="Modules in this test require torchaudio") + @pytest.mark.parametrize('fft_length', [64, 512]) + @pytest.mark.parametrize('num_channels', [1, 3]) + def test_audio_to_spec(self, fft_length: int, num_channels: int): + """Test output length for audio to spectrogram. + + Create signals of arbitrary length and check output + length is matching the actual transform length. 
+ """ + hop_lengths = [fft_length // 2, fft_length // 3, fft_length // 4] + batch_size = 4 + num_examples = 20 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + for n in range(num_examples): + + # Generate time-domain examples with different length + input_length = _rng.integers(low=fft_length, high=100 * fft_length, size=batch_size) # in samples + x = _rng.normal(size=(batch_size, num_channels, np.max(input_length))) + x = torch.tensor(x) + for b in range(batch_size): + x[b, :, input_length[b] :] = 0 + + for hop_length in hop_lengths: + # Prepare transform + audio2spec = AudioToSpectrogram(fft_length=fft_length, hop_length=hop_length) + + # Transform the whole batch + batch_spec, batch_spec_len = audio2spec(input=x, input_length=torch.tensor(input_length)) + + for b in range(batch_size): + + # Transform just the current example + b_spec, b_spec_len = audio2spec( + input=x[b : b + 1, :, : input_length[b]], input_length=torch.tensor([input_length[b]]) + ) + actual_len = b_spec.size(-1) + + # Check lengths + assert ( + actual_len == b_spec_len + ), f'Output length not matching for example ({n}, {b}) with length {input_length[n]} (hop_length={hop_length}): true {actual_len} vs calculated {b_spec_len}.' + + assert ( + actual_len == batch_spec_len[b] + ), f'Output length not matching for example ({n}, {b}) with length {input_length[n]} (hop_length={hop_length}): true {actual_len} vs calculated batch len {batch_spec_len[b]}.' + + # Make sure transforming a batch is the same as transforming individual examples + assert torch.allclose( + batch_spec[b, ..., :actual_len], b_spec, atol=atol + ), f'Spectrograms not matching for example ({n}, {b}) with length {input_length[b]} (hop_length={hop_length})' + + @pytest.mark.unit + @pytest.mark.skipif(not HAVE_TORCHAUDIO, reason="Modules in this test require torchaudio") + @pytest.mark.parametrize('fft_length', [64, 512]) + @pytest.mark.parametrize('num_channels', [1, 3]) + def test_spec_to_audio(self, fft_length: int, num_channels: int): + """Test output length for spectrogram to audio. + + Create signals of arbitrary length and check output + length is matching the actual transform length. 
+ """ + hop_lengths = [fft_length // 2, fft_length // 3, fft_length // 4] + batch_size = 4 + num_examples = 20 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + for n in range(num_examples): + + # Generate spectrogram examples with different lengths + input_length = _rng.integers(low=10, high=100, size=batch_size) # in frames + input_shape = (batch_size, num_channels, fft_length // 2 + 1, np.max(input_length)) + spec = _rng.normal(size=input_shape) + 1j * _rng.normal(size=input_shape) + spec = torch.tensor(spec) + spec[..., 0, :] = spec[..., 0, :].real + spec[..., -1, :] = spec[..., -1, :].real + for b in range(batch_size): + spec[b, ..., input_length[b] :] = 0 + + for hop_length in hop_lengths: + # Prepare transform + spec2audio = SpectrogramToAudio(fft_length=fft_length, hop_length=hop_length) + + # Transform the whole batch + batch_x, batch_x_len = spec2audio(input=spec, input_length=torch.tensor(input_length)) + + for b in range(batch_size): + + # Transform just the current example + b_x, b_x_len = spec2audio( + input=spec[b : b + 1, ..., : input_length[b]], input_length=torch.tensor([input_length[b]]) + ) + + actual_len = b_x.size(-1) + + # Check lengths + assert ( + b_x_len == actual_len + ), f'Output length not matching for example ({n}, {b}) with {input_length[b]} frames (hop_length={hop_length}): true {actual_len} vs calculated {b_x_len}.' + + assert ( + batch_x_len[b] == actual_len + ), f'Output length not matching for example ({n}, {b}) with {input_length[b]} frames (hop_length={hop_length}): true {actual_len} vs calculated batch {batch_x_len[b]}.' + + # Make sure transforming a batch is the same as transforming individual examples + if input_length[b] < spec.size(-1): + # Discard the last bit of the signal which differs due to number of frames in batch (with zero padded frames) vs individual (only valid frames). + # The reason for this difference is normalization with `window_sumsquare` of the inverse STFT. More specifically, + # batched and non-batched transform are using on a different number of frames. + tail_length = max(fft_length // 2 - hop_length, 0) + else: + tail_length = 0 + valid_len = actual_len - tail_length + batch_x_valid = batch_x[b, :, :valid_len] + b_x_valid = b_x[..., :valid_len] + assert torch.allclose( + batch_x_valid, b_x_valid, atol=atol + ), f'Signals not matching for example ({n}, {b}) with length {input_length[b]} (hop_length={hop_length}): max abs diff {torch.max(torch.abs(batch_x_valid-b_x_valid))} at {torch.argmax(torch.abs(batch_x_valid-b_x_valid))}' + + @pytest.mark.unit + @pytest.mark.skipif(not HAVE_TORCHAUDIO, reason="Modules in this test require torchaudio") + @pytest.mark.parametrize('fft_length', [128, 1024]) + @pytest.mark.parametrize('num_channels', [1, 4]) + def test_audio_to_spectrogram_reconstruction(self, fft_length: int, num_channels: int): + """Test analysis and synthesis transform result in a perfect reconstruction. 
+ """ + batch_size = 4 + num_samples = fft_length * 50 + num_examples = 25 + random_seed = 42 + atol = 1e-6 + + _rng = np.random.default_rng(seed=random_seed) + + hop_lengths = [fft_length // 2, fft_length // 4] + + for hop_length in hop_lengths: + audio2spec = AudioToSpectrogram(fft_length=fft_length, hop_length=hop_length) + spec2audio = SpectrogramToAudio(fft_length=fft_length, hop_length=hop_length) + + for n in range(num_examples): + x = _rng.normal(size=(batch_size, num_channels, num_samples)) + + x_spec, x_spec_length = audio2spec( + input=torch.Tensor(x), input_length=torch.Tensor([num_samples] * batch_size) + ) + x_hat, x_hat_length = spec2audio(input=x_spec, input_length=x_spec_length) + + assert np.allclose( + x_hat.cpu().detach().numpy(), x, atol=atol + ), f'Reconstructed not matching for example {n} (hop length {hop_length})' diff --git a/tests/collections/asr/utils/test_audio_utils.py b/tests/collections/asr/utils/test_audio_utils.py index 489a7a774078..43fd3096b13c 100644 --- a/tests/collections/asr/utils/test_audio_utils.py +++ b/tests/collections/asr/utils/test_audio_utils.py @@ -22,6 +22,7 @@ from nemo.collections.asr.parts.utils.audio_utils import SOUND_VELOCITY as sound_velocity from nemo.collections.asr.parts.utils.audio_utils import ( + calculate_sdr_numpy, db2mag, estimated_coherence, generate_approximate_noise_field, @@ -294,3 +295,72 @@ def test_get_segment_start(self): assert ( estimated_start == start ), f'Example {n}: estimated start ({estimated_start}) not matching the actual start ({start})' + + @pytest.mark.unit + def test_calculate_sdr_numpy(self): + atol = 1e-6 + random_seed = 42 + num_examples = 50 + num_samples = 2000 + + _rng = np.random.default_rng(seed=random_seed) + + for n in range(num_examples): + # Generate signal + target = _rng.normal(size=num_samples) + # Adjust the estimate + golden_sdr = _rng.integers(low=-10, high=10) + estimate = target * (1 + 10 ** (-golden_sdr / 20)) + + # UUT + estimated_sdr = calculate_sdr_numpy(estimate=estimate, target=target, remove_mean=False) + + assert np.isclose( + estimated_sdr, golden_sdr, atol=atol + ), f'Example {n}: estimated ({estimated_sdr}) not matching the actual value ({golden_sdr})' + + # Add random mean and use remove_mean=True + # SDR should not change + target += _rng.uniform(low=-10, high=10) + estimate += _rng.uniform(low=-10, high=10) + + # UUT + estimated_sdr = calculate_sdr_numpy(estimate=estimate, target=target, remove_mean=True) + + assert np.isclose( + estimated_sdr, golden_sdr, atol=atol + ), f'Example {n}: estimated ({estimated_sdr}) not matching the actual value ({golden_sdr})' + + @pytest.mark.unit + def test_calculate_sdr_numpy_scale_invariant(self): + atol = 1e-6 + random_seed = 42 + num_examples = 50 + num_samples = 2000 + + _rng = np.random.default_rng(seed=random_seed) + + for n in range(num_examples): + # Generate signal + target = _rng.normal(size=num_samples) + # Adjust the estimate + estimate = target + _rng.uniform(low=0.01, high=1) * _rng.normal(size=target.size) + + # scaled target + target_scaled = target / (np.linalg.norm(target) + 1e-16) + target_scaled = np.sum(estimate * target_scaled) * target_scaled + + golden_sdr = calculate_sdr_numpy( + estimate=estimate, target=target_scaled, scale_invariant=False, remove_mean=False + ) + + # UUT + estimated_sdr = calculate_sdr_numpy( + estimate=estimate, target=target, scale_invariant=True, remove_mean=False + ) + + print(golden_sdr, estimated_sdr) + + assert np.isclose( + estimated_sdr, golden_sdr, atol=atol + ), f'Example {n}: 
estimated ({estimated_sdr}) not matching the actual value ({golden_sdr})' From e4210780edde6a04676acffc3191532462aeaedd Mon Sep 17 00:00:00 2001 From: Sean Naren Date: Tue, 20 Dec 2022 19:41:21 +0000 Subject: [PATCH 084/154] Expose ClusteringDiarizer device (#5681) * Expose device for users to set Signed-off-by: SeanNaren * Expose device for users to set Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva --- .../diarization/conf/inference/diar_infer_general.yaml | 1 + .../diarization/conf/inference/diar_infer_meeting.yaml | 1 + .../diarization/conf/inference/diar_infer_telephonic.yaml | 1 + nemo/collections/asr/models/clustering_diarizer.py | 3 ++- 4 files changed, 5 insertions(+), 1 deletion(-) diff --git a/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml b/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml index 8036e2882b71..a4bfa167f103 100644 --- a/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml +++ b/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml @@ -9,6 +9,7 @@ name: &name "ClusterDiarizer" num_workers: 1 sample_rate: 16000 batch_size: 64 +device: null # can specify a specific device, i.e: cuda:1 (default cuda if cuda available, else cpu) diarizer: manifest_filepath: ??? diff --git a/examples/speaker_tasks/diarization/conf/inference/diar_infer_meeting.yaml b/examples/speaker_tasks/diarization/conf/inference/diar_infer_meeting.yaml index d31be0ca2646..1636dfda9bab 100644 --- a/examples/speaker_tasks/diarization/conf/inference/diar_infer_meeting.yaml +++ b/examples/speaker_tasks/diarization/conf/inference/diar_infer_meeting.yaml @@ -9,6 +9,7 @@ name: &name "ClusterDiarizer" num_workers: 1 sample_rate: 16000 batch_size: 64 +device: null # can specify a specific device, i.e: cuda:1 (default cuda if cuda available, else cpu) diarizer: manifest_filepath: ??? diff --git a/examples/speaker_tasks/diarization/conf/inference/diar_infer_telephonic.yaml b/examples/speaker_tasks/diarization/conf/inference/diar_infer_telephonic.yaml index a2b04a69130c..3ac94a22cd31 100644 --- a/examples/speaker_tasks/diarization/conf/inference/diar_infer_telephonic.yaml +++ b/examples/speaker_tasks/diarization/conf/inference/diar_infer_telephonic.yaml @@ -9,6 +9,7 @@ name: &name "ClusterDiarizer" num_workers: 1 sample_rate: 16000 batch_size: 64 +device: null # can specify a specific device, i.e: cuda:1 (default cuda if cuda available, else cpu) diarizer: manifest_filepath: ??? diff --git a/nemo/collections/asr/models/clustering_diarizer.py b/nemo/collections/asr/models/clustering_diarizer.py index 5b690f65e649..25186027a4c7 100644 --- a/nemo/collections/asr/models/clustering_diarizer.py +++ b/nemo/collections/asr/models/clustering_diarizer.py @@ -106,7 +106,8 @@ def __init__(self, cfg: DictConfig, speaker_model=None): # Clustering params self._cluster_params = self._diarizer_params.clustering.parameters - self._device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") + default_device = "cuda" if torch.cuda.is_available() else "cpu" + self._device = torch.device(cfg.device if cfg.device else default_device) @classmethod def list_available_models(cls): From 7a1319b13cdc25ddb5849345bb9981054aecbee9 Mon Sep 17 00:00:00 2001 From: Somshubra Majumdar Date: Tue, 20 Dec 2022 11:50:15 -0800 Subject: [PATCH 085/154] Add Beam Search support to ASR transcribe() (#5443) * Add support for beam decoding via high level API. 
Signed-off-by: smajumdar * Add ctc decoding section Signed-off-by: smajumdar * Update ctc transcribe API to return results from beam search Signed-off-by: smajumdar * Add argument to preserve arpa file Signed-off-by: smajumdar * Update script to use hydra config, add some support for future compute timesteps, add doc for ctc decoding Signed-off-by: smajumdar * Update eval script and doc to use new API Signed-off-by: smajumdar * Add tests for ctc greedy decoding Signed-off-by: smajumdar * Address reviewer comments and add docstrings Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix changes and address comments Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: smajumdar Co-authored-by: Samuel Kriman Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- docs/source/asr/api.rst | 19 + docs/source/asr/asr_language_modeling.rst | 74 +-- examples/asr/transcribe_speech.py | 3 +- nemo/collections/asr/metrics/wer.py | 117 ++++- nemo/collections/asr/metrics/wer_bpe.py | 36 +- nemo/collections/asr/models/ctc_models.py | 11 +- .../asr/parts/submodules/ctc_beam_decoding.py | 454 ++++++++++++++++++ .../ngram_lm/eval_beamsearch_ngram.py | 363 ++++++++------ .../ngram_lm/train_kenlm.py | 12 +- .../asr/decoding/test_ctc_decoding.py | 189 ++++++++ 10 files changed, 1064 insertions(+), 214 deletions(-) create mode 100644 nemo/collections/asr/parts/submodules/ctc_beam_decoding.py create mode 100644 tests/collections/asr/decoding/test_ctc_decoding.py diff --git a/docs/source/asr/api.rst b/docs/source/asr/api.rst index 590ca6d6b7f3..3f857f0270ac 100644 --- a/docs/source/asr/api.rst +++ b/docs/source/asr/api.rst @@ -199,6 +199,25 @@ Audio Augmentors Miscellaneous Classes --------------------- +CTC Decoding +~~~~~~~~~~~~ + +.. autoclass:: nemo.collections.asr.metrics.wer.CTCDecoding + :show-inheritance: + :members: + +.. autoclass:: nemo.collections.asr.metrics.wer_bpe.CTCBPEDecoding + :show-inheritance: + :members: + +.. autoclass:: nemo.collections.asr.parts.submodules.ctc_greedy_decoding.GreedyCTCInfer + :show-inheritance: + :members: + +.. autoclass:: nemo.collections.asr.parts.submodules.ctc_beam_decoding.BeamCTCInfer + :show-inheritance: + :members: + RNNT Decoding ~~~~~~~~~~~~~ diff --git a/docs/source/asr/asr_language_modeling.rst b/docs/source/asr/asr_language_modeling.rst index 26b8e56454cd..cf996670fd68 100644 --- a/docs/source/asr/asr_language_modeling.rst +++ b/docs/source/asr/asr_language_modeling.rst @@ -133,37 +133,43 @@ can get skipped. The following is the list of the arguments for the evaluation script: -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| **Argument** |**Type**| **Default** | **Description** | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| nemo_model_file | str | Required | The path of the `.nemo` file of the ASR model to extract the tokenizer. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| input_manifest | str | Required | Path to the training file, it can be a text file or JSON manifest. 
| -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| kenlm_model_file | str | Required | The path to store the KenLM binary model file. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| preds_output_folder | str | None | The path to an optional folder to store the predictions. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| probs_cache_file | str | None | The cache file for storing the outputs of the model. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| acoustic_batch_size | int | 16 | The batch size to calculate log probabilities. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| use_amp | bool | ``False`` | Whether to use AMP if available to calculate log probabilities. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| device | str | cuda | The device to load the model onto to calculate log probabilities. | -| | | | It can `cpu`, `cuda`, `cuda:0`, `cuda:1`, ... | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| decoding_mode | str | beamsearch_ngram | The decoding scheme to be used for evaluation. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| beam_width | float | Required | The width or list of the widths of the beam search decoding. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| beam_alpha | float | Required | The alpha parameter or list of the alphas for the beam search decoding. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| beam_beta | float | Required | The beta parameter or list of the betas for the beam search decoding. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ -| beam_batch_size | int | 128 | The batch size to be used for beam search decoding. | -| | | | Larger batch size can be a little faster, but uses larger memory. | -+---------------------+--------+------------------+-------------------------------------------------------------------------+ ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| **Argument** | **Type** | **Default** | **Description** | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| nemo_model_file | str | Required | The path of the `.nemo` file of the ASR model to extract the tokenizer. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| input_manifest | str | Required | Path to the training file, it can be a text file or JSON manifest. 
| ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| kenlm_model_file | str | Required | The path to store the KenLM binary model file. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| preds_output_folder | str | None | The path to an optional folder to store the predictions. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| probs_cache_file | str | None | The cache file for storing the outputs of the model. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| acoustic_batch_size | int | 16 | The batch size to calculate log probabilities. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| use_amp | bool | False | Whether to use AMP if available to calculate log probabilities. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| device | str | cuda | The device to load the model onto to calculate log probabilities. | +| | | | It can `cpu`, `cuda`, `cuda:0`, `cuda:1`, ... | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| decoding_mode | str | beamsearch_ngram | The decoding scheme to be used for evaluation. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| beam_width | float | Required | List of the width or list of the widths of the beam search decoding. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| beam_alpha | float | Required | List of the alpha parameter for the beam search decoding. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| beam_beta | float | Required | List of the beta parameter for the beam search decoding. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| beam_batch_size | int | 128 | The batch size to be used for beam search decoding. | +| | | | Larger batch size can be a little faster, but uses larger memory. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| decoding_strategy | str | beam | String argument for type of decoding strategy for the model. | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ +| decoding | Dict | BeamCTC | Subdict of beam search configs. Values found via | +| | Config | InferConfig | python eval_beamsearch_ngram.py --help | ++---------------------+----------+------------------+-------------------------------------------------------------------------+ + Width of the beam search (`--beam_width`) specifies the number of top candidates/predictions the beam search decoder would search for. Larger beams result in more accurate but slower predictions. @@ -183,9 +189,9 @@ For instance, the following set of parameters would results in 2*1*2=4 beam sear .. 
code-block:: python eval_beamsearch_ngram.py ... \ - --beam_width 64 128 \ - --beam_alpha 1.0 \ - --beam_beta 1.0 0.5 + beam_width=[64,128] \ + beam_alpha=[1.0] \ + beam_beta=[1.0,0.5] .. _neural_rescoring: diff --git a/examples/asr/transcribe_speech.py b/examples/asr/transcribe_speech.py index 6c9044ef1149..3db0ca1e5e67 100644 --- a/examples/asr/transcribe_speech.py +++ b/examples/asr/transcribe_speech.py @@ -65,7 +65,8 @@ audio_type: Str filetype of the audio. Supported = wav, flac, mp3 overwrite_transcripts: Bool which when set allows repeated transcriptions to overwrite previous results. - + + ctc_decoding: Decoding sub-config for CTC. Refer to documentation for specific values. rnnt_decoding: Decoding sub-config for RNNT. Refer to documentation for specific values. # Usage diff --git a/nemo/collections/asr/metrics/wer.py b/nemo/collections/asr/metrics/wer.py index 9757f6c5df43..9a4de1285457 100644 --- a/nemo/collections/asr/metrics/wer.py +++ b/nemo/collections/asr/metrics/wer.py @@ -23,10 +23,10 @@ from omegaconf import DictConfig, OmegaConf from torchmetrics import Metric -from nemo.collections.asr.parts.submodules import ctc_greedy_decoding +from nemo.collections.asr.parts.submodules import ctc_beam_decoding, ctc_greedy_decoding from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceConfig, ConfidenceMixin from nemo.collections.asr.parts.utils.rnnt_utils import Hypothesis, NBestHypotheses -from nemo.utils import logging +from nemo.utils import logging, logging_mode __all__ = ['word_error_rate', 'word_error_rate_detail', 'WER', 'move_dimension_to_the_front'] @@ -152,6 +152,7 @@ class AbstractCTCDecoding(ConfidenceMixin): strategy: str value which represents the type of decoding that can occur. Possible values are : - greedy (for greedy decoding). + - beam (for DeepSpeed KenLM based decoding). compute_timestamps: A bool flag, which determines whether to compute the character/subword, or word based timestamp mapping the output log-probabilities to discrite intervals of timestamps. @@ -230,6 +231,25 @@ class AbstractCTCDecoding(ConfidenceMixin): compute_timestamps: Same as above, overrides above value. preserve_frame_confidence: Same as above, overrides above value. + "beam": + beam_size: int, defining the beam size for beam search. Must be >= 1. + If beam_size == 1, will perform cached greedy search. This might be slightly different + results compared to the greedy search above. + + return_best_hypothesis: optional bool, whether to return just the best hypothesis or all of the + hypotheses after beam search has concluded. This flag is set by default. + + beam_alpha: float, the strength of the Language model on the final score of a token. + final_score = acoustic_score + beam_alpha * lm_score + beam_beta * seq_length. + + beam_beta: float, the strength of the sequence length penalty on the final score of a token. + final_score = acoustic_score + beam_alpha * lm_score + beam_beta * seq_length. + + kenlm_path: str, path to a KenLM ARPA or .binary file (depending on the strategy chosen). + If the path is invalid (file is not found at path), will raise a deferred error at the moment + of calculation of beam search, so that users may update / change the decoding strategy + to point to the correct file. + blank_id: The id of the RNNT blank token. 
""" @@ -258,7 +278,7 @@ def __init__(self, decoding_cfg, blank_id: int): self.batch_dim_index = self.cfg.get('batch_dim_index', 0) self.word_seperator = self.cfg.get('word_seperator', ' ') - possible_strategies = ['greedy'] + possible_strategies = ['greedy', 'beam'] if self.cfg.strategy not in possible_strategies: raise ValueError(f"Decoding strategy must be one of {possible_strategies}. Given {self.cfg.strategy}") @@ -266,17 +286,22 @@ def __init__(self, decoding_cfg, blank_id: int): if self.preserve_alignments is None: if self.cfg.strategy in ['greedy']: self.preserve_alignments = self.cfg.greedy.get('preserve_alignments', False) + else: + self.preserve_alignments = self.cfg.beam.get('preserve_alignments', False) # Update compute timestamps if self.compute_timestamps is None: if self.cfg.strategy in ['greedy']: self.compute_timestamps = self.cfg.greedy.get('compute_timestamps', False) + elif self.cfg.strategy in ['beam']: + self.compute_timestamps = self.cfg.beam.get('compute_timestamps', False) # initialize confidence-related fields self._init_confidence(self.cfg.get('confidence_cfg', None)) # we need timestamps to extract non-blank per-frame confidence - self.compute_timestamps |= self.preserve_frame_confidence + if self.compute_timestamps is not None: + self.compute_timestamps |= self.preserve_frame_confidence if self.cfg.strategy == 'greedy': @@ -288,6 +313,40 @@ def __init__(self, decoding_cfg, blank_id: int): confidence_method_cfg=self.confidence_method_cfg, ) + elif self.cfg.strategy == 'beam': + + self.decoding = ctc_beam_decoding.BeamCTCInfer( + blank_id=blank_id, + beam_size=self.cfg.beam.get('beam_size', 1), + search_type='default', + return_best_hypothesis=self.cfg.beam.get('return_best_hypothesis', True), + preserve_alignments=self.preserve_alignments, + compute_timestamps=self.compute_timestamps, + beam_alpha=self.cfg.beam.get('beam_alpha', 1.0), + beam_beta=self.cfg.beam.get('beam_beta', 0.0), + kenlm_path=self.cfg.beam.get('kenlm_path', None), + ) + + self.decoding.override_fold_consecutive_value = False + + elif self.cfg.strategy == 'pyctcdecode': + + # NOTE: Currently not supported + self.decoding = ctc_beam_decoding.BeamCTCInfer( + blank_id=blank_id, + beam_size=self.cfg.beam.get('beam_size', 1), + search_type='pyctcdecode', + return_best_hypothesis=self.cfg.beam.get('return_best_hypothesis', True), + preserve_alignments=self.preserve_alignments, + compute_timestamps=self.compute_timestamps, + beam_alpha=self.cfg.beam.get('beam_alpha', 1.0), + beam_beta=self.cfg.beam.get('beam_beta', 0.0), + kenlm_path=self.cfg.beam.get('kenlm_path', None), + # pyctcdecode_cfg=self.cfg.beam.get('pyctcdecode_cfg', None), + ) + + self.decoding.override_fold_consecutive_value = False + else: raise ValueError( f"Incorrect decoding strategy supplied. Must be one of {possible_strategies}\n" @@ -326,6 +385,18 @@ def ctc_decoder_predictions_tensor( if isinstance(decoder_outputs, torch.Tensor): decoder_outputs = move_dimension_to_the_front(decoder_outputs, self.batch_dim_index) + if ( + hasattr(self.decoding, 'override_fold_consecutive_value') + and self.decoding.override_fold_consecutive_value is not None + ): + logging.info( + f"Beam search requires that consecutive ctc tokens are not folded. 
\n" + f"Overriding provided value of `fold_consecutive` = {fold_consecutive} to " + f"{self.decoding.override_fold_consecutive_value}", + mode=logging_mode.ONCE, + ) + fold_consecutive = self.decoding.override_fold_consecutive_value + with torch.inference_mode(): # Resolve the forward step of the decoding strategy hypotheses_list = self.decoding( @@ -822,13 +893,15 @@ def preserve_frame_confidence(self, value): class CTCDecoding(AbstractCTCDecoding): """ - Used for performing CTC auto-regressive / non-auto-regressive decoding of the logprobs. + Used for performing CTC auto-regressive / non-auto-regressive decoding of the logprobs for character + based models. Args: decoding_cfg: A dict-like object which contains the following key-value pairs. strategy: str value which represents the type of decoding that can occur. Possible values are : - greedy (for greedy decoding). + - beam (for DeepSpeed KenLM based decoding). compute_timestamps: A bool flag, which determines whether to compute the character/subword, or word based timestamp mapping the output log-probabilities to discrite intervals of timestamps. @@ -906,7 +979,25 @@ class CTCDecoding(AbstractCTCDecoding): preserve_alignments: Same as above, overrides above value. compute_timestamps: Same as above, overrides above value. preserve_frame_confidence: Same as above, overrides above value. - confidence_method: Same as above, overrides confidence_cfg.method. + + "beam": + beam_size: int, defining the beam size for beam search. Must be >= 1. + If beam_size == 1, will perform cached greedy search. This might be slightly different + results compared to the greedy search above. + + return_best_hypothesis: optional bool, whether to return just the best hypothesis or all of the + hypotheses after beam search has concluded. This flag is set by default. + + beam_alpha: float, the strength of the Language model on the final score of a token. + final_score = acoustic_score + beam_alpha * lm_score + beam_beta * seq_length. + + beam_beta: float, the strength of the sequence length penalty on the final score of a token. + final_score = acoustic_score + beam_alpha * lm_score + beam_beta * seq_length. + + kenlm_path: str, path to a KenLM ARPA or .binary file (depending on the strategy chosen). + If the path is invalid (file is not found at path), will raise a deferred error at the moment + of calculation of beam search, so that users may update / change the decoding strategy + to point to the correct file. blank_id: The id of the RNNT blank token. """ @@ -920,6 +1011,11 @@ def __init__( super().__init__(decoding_cfg=decoding_cfg, blank_id=blank_id) + # Finalize Beam Search Decoding framework + if isinstance(self.decoding, ctc_beam_decoding.AbstractBeamCTCInfer): + self.decoding.set_vocabulary(self.vocabulary) + self.decoding.set_decoding_type('char') + def _aggregate_token_confidence(self, hypothesis: Hypothesis) -> List[float]: """ Implemented by subclass in order to aggregate token confidence to a word-level confidence. 
@@ -1088,9 +1184,6 @@ class CTCDecodingConfig: # compute ctc time stamps compute_timestamps: Optional[bool] = None - # confidence config - confidence_cfg: ConfidenceConfig = ConfidenceConfig() - # token representing word seperator word_seperator: str = " " @@ -1102,3 +1195,9 @@ class CTCDecodingConfig: # greedy decoding config greedy: ctc_greedy_decoding.GreedyCTCInferConfig = ctc_greedy_decoding.GreedyCTCInferConfig() + + # beam decoding config + beam: ctc_beam_decoding.BeamCTCInferConfig = ctc_beam_decoding.BeamCTCInferConfig(beam_size=4) + + # confidence config + confidence_cfg: ConfidenceConfig = ConfidenceConfig() diff --git a/nemo/collections/asr/metrics/wer_bpe.py b/nemo/collections/asr/metrics/wer_bpe.py index 55fdebbfb77b..1cee87fc68fd 100644 --- a/nemo/collections/asr/metrics/wer_bpe.py +++ b/nemo/collections/asr/metrics/wer_bpe.py @@ -20,6 +20,7 @@ from torchmetrics import Metric from nemo.collections.asr.metrics.wer import AbstractCTCDecoding, CTCDecodingConfig +from nemo.collections.asr.parts.submodules import ctc_beam_decoding from nemo.collections.asr.parts.utils.rnnt_utils import Hypothesis from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec from nemo.utils import logging @@ -27,13 +28,15 @@ class CTCBPEDecoding(AbstractCTCDecoding): """ - Used for performing CTC auto-regressive / non-auto-regressive decoding of the logprobs. + Used for performing CTC auto-regressive / non-auto-regressive decoding of the logprobs for subword based + models. Args: decoding_cfg: A dict-like object which contains the following key-value pairs. strategy: str value which represents the type of decoding that can occur. Possible values are : - greedy (for greedy decoding). + - beam (for DeepSpeed KenLM based decoding). compute_timestamps: A bool flag, which determines whether to compute the character/subword, or word based timestamp mapping the output log-probabilities to discrite intervals of timestamps. @@ -111,7 +114,25 @@ class CTCBPEDecoding(AbstractCTCDecoding): preserve_alignments: Same as above, overrides above value. compute_timestamps: Same as above, overrides above value. preserve_frame_confidence: Same as above, overrides above value. - confidence_method: Same as above, overrides confidence_cfg.method. + + "beam": + beam_size: int, defining the beam size for beam search. Must be >= 1. + If beam_size == 1, will perform cached greedy search. This might be slightly different + results compared to the greedy search above. + + return_best_hypothesis: optional bool, whether to return just the best hypothesis or all of the + hypotheses after beam search has concluded. This flag is set by default. + + beam_alpha: float, the strength of the Language model on the final score of a token. + final_score = acoustic_score + beam_alpha * lm_score + beam_beta * seq_length. + + beam_beta: float, the strength of the sequence length penalty on the final score of a token. + final_score = acoustic_score + beam_alpha * lm_score + beam_beta * seq_length. + + kenlm_path: str, path to a KenLM ARPA or .binary file (depending on the strategy chosen). + If the path is invalid (file is not found at path), will raise a deferred error at the moment + of calculation of beam search, so that users may update / change the decoding strategy + to point to the correct file. tokenizer: NeMo tokenizer object, which inherits from TokenizerSpec. 
""" @@ -122,6 +143,17 @@ def __init__(self, decoding_cfg, tokenizer: TokenizerSpec): super().__init__(decoding_cfg=decoding_cfg, blank_id=blank_id) + # Finalize Beam Search Decoding framework + if isinstance(self.decoding, ctc_beam_decoding.AbstractBeamCTCInfer): + if hasattr(self.tokenizer.tokenizer, 'get_vocab'): + vocab_dict = self.tokenizer.tokenizer.get_vocab() + vocab = list(vocab_dict.keys()) + self.decoding.set_vocabulary(vocab) + else: + logging.warning("Could not resolve the vocabulary of the tokenizer !") + + self.decoding.set_decoding_type('subword') + def _aggregate_token_confidence(self, hypothesis: Hypothesis) -> List[float]: """ Implemented by subclass in order to aggregate token confidence to a word-level confidence. diff --git a/nemo/collections/asr/models/ctc_models.py b/nemo/collections/asr/models/ctc_models.py index d41e321e1df3..90ee8026ee55 100644 --- a/nemo/collections/asr/models/ctc_models.py +++ b/nemo/collections/asr/models/ctc_models.py @@ -204,13 +204,10 @@ def transcribe( if current_hypotheses[idx].alignments is None: current_hypotheses[idx].alignments = current_hypotheses[idx].y_sequence - hypotheses += current_hypotheses - - # Keep following for beam search integration - # if all_hyp is not None: - # all_hypotheses += all_hyp - # else: - # all_hypotheses += current_hypotheses + if all_hyp is None: + hypotheses += current_hypotheses + else: + hypotheses += all_hyp del greedy_predictions del logits diff --git a/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py b/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py new file mode 100644 index 000000000000..ad017bb20eb8 --- /dev/null +++ b/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py @@ -0,0 +1,454 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +from dataclasses import dataclass +from typing import Iterable, List, Optional, Tuple, Union + +import torch + +from nemo.collections.asr.parts.utils import rnnt_utils +from nemo.core.classes import Typing, typecheck +from nemo.core.neural_types import HypothesisType, LengthsType, LogprobsType, NeuralType +from nemo.utils import logging + + +def pack_hypotheses( + hypotheses: List[rnnt_utils.NBestHypotheses], logitlen: torch.Tensor, +) -> List[rnnt_utils.NBestHypotheses]: + + if logitlen is not None: + if hasattr(logitlen, 'cpu'): + logitlen_cpu = logitlen.to('cpu') + else: + logitlen_cpu = logitlen + + for idx, hyp in enumerate(hypotheses): # type: rnnt_utils.NBestHypotheses + for candidate_idx, cand in enumerate(hyp.n_best_hypotheses): + cand.y_sequence = torch.tensor(cand.y_sequence, dtype=torch.long) + + if logitlen is not None: + cand.length = logitlen_cpu[idx] + + if cand.dec_state is not None: + cand.dec_state = _states_to_device(cand.dec_state) + + return hypotheses + + +def _states_to_device(dec_state, device='cpu'): + if torch.is_tensor(dec_state): + dec_state = dec_state.to(device) + + elif isinstance(dec_state, (list, tuple)): + dec_state = tuple(_states_to_device(dec_i, device) for dec_i in dec_state) + + return dec_state + + +class AbstractBeamCTCInfer(Typing): + """A beam CTC decoder. + + Provides a common abstraction for sample level beam decoding. + + Args: + blank_id: int, index of the blank token. Can be 0 or len(vocabulary). + beam_size: int, size of the beam used in the underlying beam search engine. + + """ + + @property + def input_types(self): + """Returns definitions of module input ports. + """ + return { + "decoder_output": NeuralType(('B', 'T', 'D'), LogprobsType()), + "decoder_lengths": NeuralType(tuple('B'), LengthsType()), + } + + @property + def output_types(self): + """Returns definitions of module output ports. + """ + return {"predictions": [NeuralType(elements_type=HypothesisType())]} + + def __init__(self, blank_id: int, beam_size: int): + self.blank_id = blank_id + + if beam_size < 1: + raise ValueError("Beam search size cannot be less than 1!") + + self.beam_size = beam_size + + # Variables set by corresponding setter methods + self.vocab = None + self.decoding_type = None + + # Internal variable, used to prevent double reduction of consecutive tokens (ctc collapse) + self.override_fold_consecutive_value = None + + def set_vocabulary(self, vocab: List[str]): + """ + Set the vocabulary of the decoding framework. + + Args: + vocab: List of str. Each token corresponds to its location in the vocabulary emitted by the model. + Note that this vocabulary must NOT contain the "BLANK" token. + """ + self.vocab = vocab + + def set_decoding_type(self, decoding_type: str): + """ + Sets the decoding type of the framework. Can support either char or subword models. + + Args: + decoding_type: Str corresponding to decoding type. Only supports "char" and "subword". + """ + decoding_type = decoding_type.lower() + supported_types = ['char', 'subword'] + + if decoding_type not in supported_types: + raise ValueError( + f"Unsupported decoding type. Supported types = {supported_types}.\n" f"Given = {decoding_type}" + ) + + self.decoding_type = decoding_type + + @typecheck() + def forward( + self, decoder_output: torch.Tensor, decoder_lengths: torch.Tensor, + ) -> Tuple[List[Union[rnnt_utils.Hypothesis, rnnt_utils.NBestHypotheses]]]: + """Returns a list of hypotheses given an input batch of the encoder hidden embedding. 
+        Output token is generated auto-regressively.
+
+        Args:
+            decoder_output: A tensor of size (batch, timesteps, features) or (batch, timesteps) (each timestep is a label).
+            decoder_lengths: list of int representing the length of each output sequence.
+
+        Returns:
+            packed list containing batch number of sentences (Hypotheses).
+        """
+        raise NotImplementedError()
+
+    def __call__(self, *args, **kwargs):
+        return self.forward(*args, **kwargs)
+
+
+class BeamCTCInfer(AbstractBeamCTCInfer):
+    """A beam CTC decoder.
+
+    Provides a common abstraction for sample level beam search decoding, optionally fused with a
+    KenLM n-gram language model.
+
+    Args:
+        blank_id: int, index of the blank token. Can be 0 or len(vocabulary).
+        beam_size: int, size of the beam used in the underlying beam search engine.
+        search_type: str, the beam search implementation to use. One of: "default" / "nemo" or "pyctcdecode".
+        return_best_hypothesis: bool, whether to return only the best hypothesis or all hypotheses
+            found during beam search.
+        preserve_alignments: Bool flag which preserves the history of logprobs generated during
+            decoding (sample / batched). When set to true, the Hypothesis will contain
+            the non-null value for `logprobs` in it. Here, `logprobs` is a torch.Tensor.
+        compute_timestamps: A bool flag, which determines whether to compute the character/subword, or
+            word based timestamp mapping the output log-probabilities to discrete intervals of timestamps.
+            The timestamps will be available in the returned Hypothesis.timestep as a dictionary.
+        beam_alpha: float, the strength of the language model on the final score of a token.
+        beam_beta: float, the strength of the sequence length penalty on the final score of a token.
+        kenlm_path: str, path to a KenLM ARPA or .binary file.
+
+    """
+
+    def __init__(
+        self,
+        blank_id: int,
+        beam_size: int,
+        search_type: str = "default",
+        return_best_hypothesis: bool = True,
+        preserve_alignments: bool = False,
+        compute_timestamps: bool = False,
+        beam_alpha: float = 1.0,
+        beam_beta: float = 0.0,
+        kenlm_path: str = None,
+        # pyctcdecode_cfg: Optional['PyCTCDecodeConfig'] = None,
+    ):
+        super().__init__(blank_id=blank_id, beam_size=beam_size)
+
+        self.search_type = search_type
+        self.return_best_hypothesis = return_best_hypothesis
+        self.preserve_alignments = preserve_alignments
+        self.compute_timestamps = compute_timestamps
+
+        if self.compute_timestamps:
+            raise ValueError("Currently this flag is not supported for beam search algorithms.")
+
+        self.vocab = None  # This must be set by the user via set_vocabulary() before calling forward()!
+
+        if search_type == "default" or search_type == "nemo":
+            self.search_algorithm = self.default_beam_search
+        elif search_type == "pyctcdecode":
+            self.search_algorithm = self._pyctcdecode_beam_search
+
+            raise NotImplementedError("The search type of `pyctcdecode` is currently not supported.")
+        else:
+            raise NotImplementedError(
+                f"The search type ({search_type}) supplied is not supported!\n"
+                f"Please use one of: (default, nemo, pyctcdecode)"
+            )
+
+        self.beam_alpha = beam_alpha
+        self.beam_beta = beam_beta
+
+        # Default beam search args
+        self.kenlm_path = kenlm_path
+
+        # PyCTCDecode params
+        # if pyctcdecode_cfg is None:
+        #     pyctcdecode_cfg = PyCTCDecodeConfig()
+        # self.pyctcdecode_cfg = pyctcdecode_cfg  # type: PyCTCDecodeConfig
+
+        # Default beam search scorer functions
+        self.default_beam_scorer = None
+        self.pyctcdecode_beam_scorer = None
+        self.token_offset = 0
+
+    @typecheck()
+    def forward(
+        self, decoder_output: torch.Tensor, decoder_lengths: torch.Tensor,
+    ) -> Tuple[List[Union[rnnt_utils.Hypothesis, rnnt_utils.NBestHypotheses]]]:
+        """Returns a list of hypotheses given an input batch of the encoder hidden embedding.
+        Output token is generated auto-regressively.
+
+        Args:
+            decoder_output: A tensor of size (batch, timesteps, features).
+            decoder_lengths: list of int representing the length of each output sequence.
+
+        Returns:
+            packed list containing batch number of sentences (Hypotheses).
+ """ + if self.vocab is None: + raise RuntimeError("Please set the vocabulary with `set_vocabulary()` before calling this function.") + + if self.decoding_type is None: + raise ValueError("Please set the decoding type with `set_decoding_type()` before calling this function.") + + with torch.inference_mode(): + # Process each sequence independently + prediction_tensor = decoder_output + + if prediction_tensor.ndim != 3: + raise ValueError( + f"`decoder_output` must be a tensor of shape [B, T, V] (log probs, float). " + f"Provided shape = {prediction_tensor.shape}" + ) + + # determine type of input - logprobs or labels + out_len = decoder_lengths if decoder_lengths is not None else None + hypotheses = self.search_algorithm(prediction_tensor, out_len) + + # Pack results into Hypotheses + packed_result = pack_hypotheses(hypotheses, decoder_lengths) + + # Pack the result + if self.return_best_hypothesis and isinstance(packed_result[0], rnnt_utils.NBestHypotheses): + packed_result = [res.n_best_hypotheses[0] for res in packed_result] # type: Hypothesis + + return (packed_result,) + + @torch.no_grad() + def default_beam_search( + self, x: torch.Tensor, out_len: torch.Tensor + ) -> List[Union[rnnt_utils.Hypothesis, rnnt_utils.NBestHypotheses]]: + """ + + Args: + x: Tensor of shape [B, T, V+1] + out_len: + + Returns: + + """ + if self.compute_timestamps: + raise ValueError( + f"Beam Search with strategy `{self.search_type}` does not support time stamp calculation!" + ) + + if self.default_beam_scorer is None: + # Check for filepath + if self.kenlm_path is None or not os.path.exists(self.kenlm_path): + raise FileNotFoundError( + f"KenLM binary file not found at : {self.kenlm_path}. " + f"Please set a valid path in the decoding config." + ) + + # perform token offset for subword models + if self.decoding_type == 'subword': + vocab = [chr(idx + self.token_offset) for idx in range(len(self.vocab))] + else: + # char models + vocab = self.vocab + + # Must import at runtime to avoid circular dependency due to module level import. + from nemo.collections.asr.modules.beam_search_decoder import BeamSearchDecoderWithLM + + self.default_beam_scorer = BeamSearchDecoderWithLM( + vocab=vocab, + lm_path=self.kenlm_path, + beam_width=self.beam_size, + alpha=self.beam_alpha, + beta=self.beam_beta, + num_cpus=max(1, os.cpu_count()), + input_tensor=False, + ) + + x = x.to('cpu') + + with typecheck.disable_checks(): + data = [x[sample_id, : out_len[sample_id], :].softmax(dim=-1) for sample_id in range(len(x))] + beams_batch = self.default_beam_scorer.forward(log_probs=data, log_probs_length=None) + + # For each sample in the batch + nbest_hypotheses = [] + for beams_idx, beams in enumerate(beams_batch): + # For each beam candidate / hypothesis in each sample + hypotheses = [] + for candidate_idx, candidate in enumerate(beams): + hypothesis = rnnt_utils.Hypothesis( + score=0.0, y_sequence=[], dec_state=None, timestep=[], last_token=None + ) + + # For subword encoding, NeMo will double encode the subword (multiple tokens) into a + # singular unicode id. In doing so, we preserve the semantic of the unicode token, and + # compress the size of the final KenLM ARPA / Binary file. + # In order to do double encoding, we shift the subword by some token offset. + # This step is ignored for character based models. 
+ if self.decoding_type == 'subword': + pred_token_ids = [ord(c) - self.token_offset for c in candidate[1]] + else: + pred_token_ids = candidate[1] + + # We preserve the token ids and the score for this hypothesis + hypothesis.y_sequence = pred_token_ids + hypothesis.score = candidate[0] + + # If alignment must be preserved, we preserve a view of the output logprobs. + # Note this view is shared amongst all beams within the sample, be sure to clone it if you + # require specific processing for each sample in the beam. + # This is done to preserve memory. + if self.preserve_alignments: + hypothesis.alignments = x[beams_idx][: out_len[beams_idx]] + + hypotheses.append(hypothesis) + + # Wrap the result in NBestHypothesis. + hypotheses = rnnt_utils.NBestHypotheses(hypotheses) + nbest_hypotheses.append(hypotheses) + + return nbest_hypotheses + + @torch.no_grad() + def _pyctcdecode_beam_search( + self, x: torch.Tensor, out_len: torch.Tensor + ) -> List[Union[rnnt_utils.Hypothesis, rnnt_utils.NBestHypotheses]]: + logging.warning("pyctcdecode is not yet integrated.") + + raise NotImplementedError( + "Currently, pyctcdecode has not ben formally integrated into the decoding framework." + ) + + # try: + # import pyctcdecode + # except (ImportError, ModuleNotFoundError): + # raise ImportError( + # f"Could not load `pyctcdecode` library. Please install it from pip using :\n" + # f"pip install --upgrade pyctcdecode" + # ) + # + # if self.pyctcdecode_beam_scorer is None: + # self.pyctcdecode_beam_scorer = pyctcdecode.build_ctcdecoder( + # labels=self.vocab, kenlm_model_path=self.kenlm_path, alpha=self.beam_alpha, beta=self.beam_beta + # ) # type: pyctcdecode.BeamSearchDecoderCTC + # + # x = x.to('cpu') + # + # with typecheck.disable_checks(): + # beams_batch = [] + # for sample_id in range(len(x)): + # logprobs = x[sample_id, : out_len[sample_id], :] + # result = self.pyctcdecode_beam_scorer.decode_beams( + # logprobs, + # beam_width=self.beam_size, + # beam_prune_logp=self.pyctcdecode_cfg.beam_prune_logp, + # token_min_logp=self.pyctcdecode_cfg.token_min_logp, + # prune_history=self.pyctcdecode_cfg.prune_history, + # hotwords=self.pyctcdecode_cfg.hotwords, + # hotword_weight=self.pyctcdecode_cfg.hotword_weight, + # lm_start_state=None, + # ) # Output format: text, last_lm_state, text_frames, logit_score, lm_score + # beams_batch.append(result) + # + # nbest_hypotheses = [] + # for beams_idx, beams in enumerate(beams_batch): + # hypotheses = [] + # for candidate_idx, candidate in enumerate(beams): + # # Candidate = (text, last_lm_state, text_frames, logit_score, lm_score) + # hypothesis = rnnt_utils.Hypothesis( + # score=0.0, y_sequence=[], dec_state=None, timestep=[], last_token=None + # ) + # + # # TODO: Requires token ids to be returned rather than text. 
+ # pred_token_ids = None + # + # hypothesis.y_sequence = pred_token_ids + # hypothesis.text = candidate[0] # text + # hypothesis.score = candidate[4] # score + # + # if self.preserve_alignments: + # hypothesis.alignments = x[beams_idx][: out_len[beams_idx]] + # + # hypotheses.append(hypothesis) + # + # hypotheses = rnnt_utils.NBestHypotheses(hypotheses) + # nbest_hypotheses.append(hypotheses) + # + # return nbest_hypotheses + + def set_decoding_type(self, decoding_type: str): + super().set_decoding_type(decoding_type) + + # Please check train_kenlm.py in scripts/asr_language_modeling/ to find out why we need + # TOKEN_OFFSET for BPE-based models + if self.decoding_type == 'subword': + self.token_offset = 100 + + +# NOTE: Unused as of now +@dataclass +class PyCTCDecodeConfig: + # These arguments cannot be imported from pyctcdecode (optional dependency) + # Therefore we copy the values explicitly + # Taken from pyctcdecode.constant + beam_prune_logp: float = -10.0 + token_min_logp: float = -5.0 + prune_history: bool = False + hotwords: Optional[Iterable[str]] = None + hotword_weight: float = 10.0 + + +@dataclass +class BeamCTCInferConfig: + beam_size: int + search_type: str = 'default' + preserve_alignments: bool = False + compute_timestamps: bool = False + return_best_hypothesis: bool = True + + beam_alpha: float = 1.0 + beam_beta: float = 0.0 + kenlm_path: Optional[str] = None + + # pyctcdecode_cfg: PyCTCDecodeConfig = PyCTCDecodeConfig() diff --git a/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py b/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py index 277c4695a939..e563cf7471c9 100644 --- a/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py +++ b/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py @@ -13,71 +13,139 @@ # limitations under the License. # +""" # This script would evaluate an N-gram language model trained with KenLM library (https://github.com/kpu/kenlm) in # fusion with beam search decoders on top of a trained ASR model. NeMo's beam search decoders are capable of using the # KenLM's N-gram models to find the best candidates. This script supports both character level and BPE level # encodings and models which is detected automatically from the type of the model. # You may train the LM model with 'scripts/ngram_lm/train_kenlm.py'. -# -# USAGE: python eval_beamsearch_ngram.py --nemo_model_file \ -# --input_manifest \ -# --beam_width \ -# --beam_alpha \ -# --beam_beta \ -# --preds_output_folder \ -# --decoding_mode beamsearch_ngram -# ... -# + +# Config Help + +To discover all arguments of the script, please run : +python eval_beamsearch_ngram.py --help +python eval_beamsearch_ngram.py --cfg job + +# USAGE + +python eval_beamsearch_ngram.py nemo_model_file= \ + input_manifest= \ + beam_width=[] \ + beam_alpha=[] \ + beam_beta=[] \ + preds_output_folder= \ + probs_cache_file=null \ + decoding_mode=beamsearch_ngram + ... + + +# Grid Search for Hyper parameters + +For grid search, you can provide a list of arguments as follows - + + beam_width=[4,8,16,....] 
\ + beam_alpha=[-2.0,-1.0,...,1.0,2.0] \ + beam_beta=[-1.0,-0.5,0.0,...,1.0] \ + # You may find more info on how to use this script at: # https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html +""" -# Please check train_kenlm.py to find out why we need TOKEN_OFFSET for BPE-based models -TOKEN_OFFSET = 100 -import argparse import contextlib import json import os import pickle +from dataclasses import dataclass, field, is_dataclass from pathlib import Path +from typing import List, Optional import editdistance -import kenlm_utils import numpy as np import torch +from omegaconf import MISSING, OmegaConf from sklearn.model_selection import ParameterGrid from tqdm.auto import tqdm -import nemo import nemo.collections.asr as nemo_asr +from nemo.collections.asr.parts.submodules import ctc_beam_decoding +from nemo.core.config import hydra_runner from nemo.utils import logging +# fmt: off + + +@dataclass +class EvalBeamSearchNGramConfig: + """ + Evaluate an ASR model with beam search decoding and n-gram KenLM language model. + """ + # # The path of the '.nemo' file of the ASR model or the name of a pretrained model (ngc / huggingface) + nemo_model_file: str = MISSING + + # File paths + input_manifest: str = MISSING # The manifest file of the evaluation set + kenlm_model_file: Optional[str] = None # The path of the KenLM binary model file + preds_output_folder: Optional[str] = None # The optional folder where the predictions are stored + probs_cache_file: Optional[str] = None # The cache file for storing the logprobs of the model + + # Parameters for inference + acoustic_batch_size: int = 16 # The batch size to calculate log probabilities + beam_batch_size: int = 128 # The batch size to be used for beam search decoding + device: str = "cuda" # The device to load the model onto to calculate log probabilities + use_amp: bool = False # Whether to use AMP if available to calculate log probabilities + + # Beam Search hyperparameters + + # The decoding scheme to be used for evaluation. 
+ # Can be one of ["greedy", "beamsearch", "beamsearch_ngram"] + decoding_mode: str = "beamsearch_ngram" + + beam_width: List[int] = field(default_factory=lambda: [128]) # The width or list of the widths for the beam search decoding + beam_alpha: List[float] = field(default_factory=lambda: [1.0]) # The alpha parameter or list of the alphas for the beam search decoding + beam_beta: List[float] = field(default_factory=lambda: [0.0]) # The beta parameter or list of the betas for the beam search decoding + + decoding_strategy: str = "beam" # Supports only beam for now + decoding: ctc_beam_decoding.BeamCTCInferConfig = ctc_beam_decoding.BeamCTCInferConfig(beam_size=128) + + +# fmt: on + def beam_search_eval( - all_probs, - target_transcripts, - vocab, - ids_to_text_func=None, - preds_output_file=None, - lm_path=None, - beam_alpha=1.0, - beam_beta=0.0, - beam_width=128, - beam_batch_size=128, - progress_bar=True, + model: nemo_asr.models.ASRModel, + cfg: EvalBeamSearchNGramConfig, + all_probs: List[torch.Tensor], + target_transcripts: List[str], + preds_output_file: str = None, + lm_path: str = None, + beam_alpha: float = 1.0, + beam_beta: float = 0.0, + beam_width: int = 128, + beam_batch_size: int = 128, + progress_bar: bool = True, ): - # creating the beam search decoder - beam_search_lm = nemo_asr.modules.BeamSearchDecoderWithLM( - vocab=vocab, - beam_width=beam_width, - alpha=beam_alpha, - beta=beam_beta, - lm_path=lm_path, - num_cpus=max(os.cpu_count(), 1), - input_tensor=False, - ) + level = logging.getEffectiveLevel() + logging.setLevel(logging.CRITICAL) + # Reset config + model.change_decoding_strategy(None) + + # Override the beam search config with current search candidate configuration + cfg.decoding.beam_size = beam_width + cfg.decoding.beam_alpha = beam_alpha + cfg.decoding.beam_beta = beam_beta + cfg.decoding.return_best_hypothesis = False + cfg.decoding.kenlm_path = cfg.kenlm_model_file + + # Update model's decoding strategy config + model.cfg.decoding.strategy = cfg.decoding_strategy + model.cfg.decoding.beam = cfg.decoding + + # Update model's decoding strategy + model.change_decoding_strategy(model.cfg.decoding) + logging.setLevel(level) wer_dist_first = cer_dist_first = 0 wer_dist_best = cer_dist_best = 0 @@ -97,9 +165,19 @@ def beam_search_eval( it = range(int(np.ceil(len(all_probs) / beam_batch_size))) for batch_idx in it: # disabling type checking - with nemo.core.typecheck.disable_checks(): - probs_batch = all_probs[batch_idx * beam_batch_size : (batch_idx + 1) * beam_batch_size] - beams_batch = beam_search_lm.forward(log_probs=probs_batch, log_probs_length=None,) + probs_batch = all_probs[batch_idx * beam_batch_size : (batch_idx + 1) * beam_batch_size] + probs_lens = torch.tensor([prob.shape[0] for prob in probs_batch]) + with torch.no_grad(): + packed_batch = torch.zeros(len(probs_batch), max(probs_lens), probs_batch[0].shape[-1], device='cpu') + + for prob_index in range(len(probs_batch)): + packed_batch[prob_index, : probs_lens[prob_index], :] = torch.tensor( + probs_batch[prob_index], device=packed_batch.device, dtype=packed_batch.dtype + ) + + _, beams_batch = model.decoding.ctc_decoder_predictions_tensor( + packed_batch, decoder_lengths=probs_lens, return_hypotheses=True, + ) for beams_idx, beams in enumerate(beams_batch): target = target_transcripts[sample_idx + beams_idx] @@ -108,12 +186,8 @@ def beam_search_eval( words_count += len(target_split_w) chars_count += len(target_split_c) wer_dist_min = cer_dist_min = 10000 - for candidate_idx, candidate in 
enumerate(beams): - if ids_to_text_func is not None: - # For BPE encodings, need to shift by TOKEN_OFFSET to retrieve the original sub-word ids - pred_text = ids_to_text_func([ord(c) - TOKEN_OFFSET for c in candidate[1]]) - else: - pred_text = candidate[1] + for candidate_idx, candidate in enumerate(beams): # type: (int, ctc_beam_decoding.rnnt_utils.Hypothesis) + pred_text = candidate.text pred_split_w = pred_text.split() wer_dist = editdistance.eval(target_split_w, pred_split_w) pred_split_c = list(pred_text) @@ -127,7 +201,7 @@ def beam_search_eval( wer_dist_first += wer_dist cer_dist_first += cer_dist - score = candidate[0] + score = candidate.score if preds_output_file: out_file.write('{}\t{}\n'.format(pred_text, score)) wer_dist_best += wer_dist_min @@ -157,84 +231,35 @@ def beam_search_eval( ) logging.info(f"=================================================================================") + return wer_dist_first / words_count, cer_dist_first / chars_count -def main(): - parser = argparse.ArgumentParser( - description='Evaluate an ASR model with beam search decoding and n-gram KenLM language model.' - ) - parser.add_argument( - "--nemo_model_file", - required=True, - type=str, - help="The path of the '.nemo' file of the ASR model or name of a pretrained model", - ) - parser.add_argument( - "--kenlm_model_file", required=False, default=None, type=str, help="The path of the KenLM binary model file" - ) - parser.add_argument("--input_manifest", required=True, type=str, help="The manifest file of the evaluation set") - parser.add_argument( - "--preds_output_folder", default=None, type=str, help="The optional folder where the predictions are stored" - ) - parser.add_argument( - "--probs_cache_file", default=None, type=str, help="The cache file for storing the outputs of the model" - ) - parser.add_argument( - "--acoustic_batch_size", default=16, type=int, help="The batch size to calculate log probabilities" - ) - parser.add_argument( - "--device", default="cuda", type=str, help="The device to load the model onto to calculate log probabilities" - ) - parser.add_argument( - "--use_amp", action="store_true", help="Whether to use AMP if available to calculate log probabilities" - ) - parser.add_argument( - "--decoding_mode", - choices=["greedy", "beamsearch", "beamsearch_ngram"], - default="beamsearch_ngram", - type=str, - help="The decoding scheme to be used for evaluation.", - ) - parser.add_argument( - "--beam_width", - required=False, - type=int, - nargs="+", - help="The width or list of the widths for the beam search decoding", - ) - parser.add_argument( - "--beam_alpha", - required=False, - type=float, - nargs="+", - help="The alpha parameter or list of the alphas for the beam search decoding", - ) - parser.add_argument( - "--beam_beta", - required=False, - type=float, - nargs="+", - help="The beta parameter or list of the betas for the beam search decoding", - ) - parser.add_argument( - "--beam_batch_size", default=128, type=int, help="The batch size to be used for beam search decoding" - ) - args = parser.parse_args() - if args.nemo_model_file.endswith('.nemo'): - asr_model = nemo_asr.models.ASRModel.restore_from(args.nemo_model_file, map_location=torch.device(args.device)) +@hydra_runner(config_path=None, config_name='EvalBeamSearchNGramConfig', schema=EvalBeamSearchNGramConfig) +def main(cfg: EvalBeamSearchNGramConfig): + if is_dataclass(cfg): + cfg = OmegaConf.structured(cfg) # type: EvalBeamSearchNGramConfig + + valid_decoding_modes = ["greedy", "beamsearch", "beamsearch_ngram"] 
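+ # For reference, a typical invocation overrides these fields Hydra-style (the file names below
+ # are placeholders, not real paths):
+ #   python eval_beamsearch_ngram.py nemo_model_file=model.nemo input_manifest=dev.json \
+ #       kenlm_model_file=lm.bin preds_output_folder=preds decoding_mode=beamsearch_ngram \
+ #       beam_width=[32,64] beam_alpha=[1.0,2.0] beam_beta=[0.0]
+ # Any other field of EvalBeamSearchNGramConfig can be overridden the same way; decoding_mode is
+ # validated against the list above.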
+ if cfg.decoding_mode not in valid_decoding_modes: + raise ValueError( + f"Given decoding_mode={cfg.decoding_mode} is invalid. Available options are :\n" f"{valid_decoding_modes}" + ) + + if cfg.nemo_model_file.endswith('.nemo'): + asr_model = nemo_asr.models.ASRModel.restore_from(cfg.nemo_model_file, map_location=torch.device(cfg.device)) else: logging.warning( "nemo_model_file does not end with .nemo, therefore trying to load a pretrained model with this name." ) asr_model = nemo_asr.models.ASRModel.from_pretrained( - args.nemo_model_file, map_location=torch.device(args.device) + cfg.nemo_model_file, map_location=torch.device(cfg.device) ) target_transcripts = [] - manifest_dir = Path(args.input_manifest).parent - with open(args.input_manifest, 'r') as manifest_file: + manifest_dir = Path(cfg.input_manifest).parent + with open(cfg.input_manifest, 'r') as manifest_file: audio_file_paths = [] - for line in tqdm(manifest_file, desc=f"Reading Manifest {args.input_manifest} ...", ncols=120): + for line in tqdm(manifest_file, desc=f"Reading Manifest {cfg.input_manifest} ...", ncols=120): data = json.loads(line) audio_file = Path(data['audio_filepath']) if not audio_file.is_file() and not audio_file.is_absolute(): @@ -242,35 +267,42 @@ def main(): target_transcripts.append(data['text']) audio_file_paths.append(str(audio_file.absolute())) - if args.probs_cache_file and os.path.exists(args.probs_cache_file): - logging.info(f"Found a pickle file of probabilities at '{args.probs_cache_file}'.") - logging.info(f"Loading the cached pickle file of probabilities from '{args.probs_cache_file}' ...") - with open(args.probs_cache_file, 'rb') as probs_file: + if cfg.probs_cache_file and os.path.exists(cfg.probs_cache_file): + logging.info(f"Found a pickle file of probabilities at '{cfg.probs_cache_file}'.") + logging.info(f"Loading the cached pickle file of probabilities from '{cfg.probs_cache_file}' ...") + with open(cfg.probs_cache_file, 'rb') as probs_file: all_probs = pickle.load(probs_file) if len(all_probs) != len(audio_file_paths): raise ValueError( - f"The number of samples in the probabilities file '{args.probs_cache_file}' does not " + f"The number of samples in the probabilities file '{cfg.probs_cache_file}' does not " f"match the manifest file. You may need to delete the probabilities cached file." 
) else: - if args.use_amp: + + @contextlib.contextmanager + def default_autocast(): + yield + + if cfg.use_amp: if torch.cuda.is_available() and hasattr(torch.cuda, 'amp') and hasattr(torch.cuda.amp, 'autocast'): logging.info("AMP is enabled!\n") autocast = torch.cuda.amp.autocast + + else: + autocast = default_autocast else: - @contextlib.contextmanager - def autocast(): - yield + autocast = default_autocast with autocast(): with torch.no_grad(): - all_logits = asr_model.transcribe(audio_file_paths, batch_size=args.acoustic_batch_size, logprobs=True) - all_probs = [kenlm_utils.softmax(logits) for logits in all_logits] - if args.probs_cache_file: - logging.info(f"Writing pickle files of probabilities at '{args.probs_cache_file}'...") - with open(args.probs_cache_file, 'wb') as f_dump: + all_logits = asr_model.transcribe(audio_file_paths, batch_size=cfg.acoustic_batch_size, logprobs=True) + + all_probs = all_logits + if cfg.probs_cache_file: + logging.info(f"Writing pickle files of probabilities at '{cfg.probs_cache_file}'...") + with open(cfg.probs_cache_file, 'wb') as f_dump: pickle.dump(all_probs, f_dump) wer_dist_greedy = 0 @@ -297,66 +329,81 @@ def autocast(): logging.info('Greedy WER/CER = {:.2%}/{:.2%}'.format(wer_dist_greedy / words_count, cer_dist_greedy / chars_count)) - encoding_level = kenlm_utils.SUPPORTED_MODELS.get(type(asr_model).__name__, None) - if not encoding_level: - logging.warning( - f"Model type '{type(asr_model).__name__}' may not be supported. Would try to train a char-level LM." - ) - encoding_level = 'char' - - vocab = asr_model.decoder.vocabulary - ids_to_text_func = None - if encoding_level == "subword": - vocab = [chr(idx + TOKEN_OFFSET) for idx in range(len(vocab))] - ids_to_text_func = asr_model.tokenizer.ids_to_text - # delete the model to free the memory - del asr_model - - if args.decoding_mode == "beamsearch_ngram": - if not os.path.exists(args.kenlm_model_file): - raise FileNotFoundError(f"Could not find the KenLM model file '{args.kenlm_model_file}'.") - lm_path = args.kenlm_model_file + asr_model = asr_model.to('cpu') + + if cfg.decoding_mode == "beamsearch_ngram": + if not os.path.exists(cfg.kenlm_model_file): + raise FileNotFoundError(f"Could not find the KenLM model file '{cfg.kenlm_model_file}'.") + lm_path = cfg.kenlm_model_file else: lm_path = None # 'greedy' decoding_mode would skip the beam search decoding - if args.decoding_mode in ["beamsearch_ngram", "beamsearch"]: - if args.beam_width is None or args.beam_alpha is None or args.beam_beta is None: + if cfg.decoding_mode in ["beamsearch_ngram", "beamsearch"]: + if cfg.beam_width is None or cfg.beam_alpha is None or cfg.beam_beta is None: raise ValueError("beam_width, beam_alpha and beam_beta are needed to perform beam search decoding.") - params = {'beam_width': args.beam_width, 'beam_alpha': args.beam_alpha, 'beam_beta': args.beam_beta} + params = {'beam_width': cfg.beam_width, 'beam_alpha': cfg.beam_alpha, 'beam_beta': cfg.beam_beta} hp_grid = ParameterGrid(params) hp_grid = list(hp_grid) + best_wer_beam_size, best_cer_beam_size = None, None + best_wer_alpha, best_cer_alpha = None, None + best_wer_beta, best_cer_beta = None, None + best_wer, best_cer = 1e6, 1e6 + logging.info(f"==============================Starting the beam search decoding===============================") logging.info(f"Grid search size: {len(hp_grid)}") logging.info(f"It may take some time...") logging.info(f"==============================================================================================") - if 
args.preds_output_folder and not os.path.exists(args.preds_output_folder): - os.mkdir(args.preds_output_folder) + if cfg.preds_output_folder and not os.path.exists(cfg.preds_output_folder): + os.mkdir(cfg.preds_output_folder) for hp in hp_grid: - if args.preds_output_folder: + if cfg.preds_output_folder: preds_output_file = os.path.join( - args.preds_output_folder, + cfg.preds_output_folder, f"preds_out_width{hp['beam_width']}_alpha{hp['beam_alpha']}_beta{hp['beam_beta']}.tsv", ) else: preds_output_file = None - beam_search_eval( + candidate_wer, candidate_cer = beam_search_eval( + asr_model, + cfg, all_probs=all_probs, target_transcripts=target_transcripts, - vocab=vocab, - ids_to_text_func=ids_to_text_func, preds_output_file=preds_output_file, lm_path=lm_path, beam_width=hp["beam_width"], beam_alpha=hp["beam_alpha"], beam_beta=hp["beam_beta"], - beam_batch_size=args.beam_batch_size, + beam_batch_size=cfg.beam_batch_size, progress_bar=True, ) + if candidate_cer < best_cer: + best_cer_beam_size = hp["beam_width"] + best_cer_alpha = hp["beam_alpha"] + best_cer_beta = hp["beam_beta"] + best_cer = candidate_cer + + if candidate_wer < best_wer: + best_wer_beam_size = hp["beam_width"] + best_wer_alpha = hp["beam_alpha"] + best_wer_beta = hp["beam_beta"] + best_wer = candidate_wer + + logging.info( + f'Best WER Candidate = {best_wer:.2%} :: Beam size = {best_wer_beam_size}, ' + f'Beam alpha = {best_wer_alpha}, Beam beta = {best_wer_beta}' + ) + + logging.info( + f'Best CER Candidate = {best_cer:.2%} :: Beam size = {best_cer_beam_size}, ' + f'Beam alpha = {best_cer_alpha}, Beam beta = {best_cer_beta}' + ) + logging.info(f"=================================================================================") + if __name__ == '__main__': main() diff --git a/scripts/asr_language_modeling/ngram_lm/train_kenlm.py b/scripts/asr_language_modeling/ngram_lm/train_kenlm.py index f19695611814..41c4f31d31c6 100644 --- a/scripts/asr_language_modeling/ngram_lm/train_kenlm.py +++ b/scripts/asr_language_modeling/ngram_lm/train_kenlm.py @@ -27,7 +27,8 @@ # --train_file \ # --kenlm_model_file \ -# --ngram_length +# --ngram_length \ +# --preserve_arpa # # After training is done, the binary LM model is stored at the path specified by '--kenlm_model_file'. # You may find more info on how to use this script at: @@ -81,6 +82,9 @@ def main(): parser.add_argument( "--do_lowercase", action='store_true', help="Whether to apply lower case conversion on the training text" ) + parser.add_argument( + '--preserve_arpa', required=False, action='store_true', help='Whether to preserve the intermediate ARPA file.' + ) args = parser.parse_args() """ TOKENIZER SETUP """ @@ -156,8 +160,10 @@ def main(): os.remove(encoded_train_file) logging.info(f"Deleted the temporary encoded training file '{encoded_train_file}'.") - os.remove(arpa_file) - logging.info(f"Deleted the arpa file '{arpa_file}'.") + + if not args.preserve_arpa: + os.remove(arpa_file) + logging.info(f"Deleted the arpa file '{arpa_file}'.") if __name__ == '__main__': diff --git a/tests/collections/asr/decoding/test_ctc_decoding.py b/tests/collections/asr/decoding/test_ctc_decoding.py new file mode 100644 index 000000000000..2ff3abbad756 --- /dev/null +++ b/tests/collections/asr/decoding/test_ctc_decoding.py @@ -0,0 +1,189 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from functools import lru_cache + +import pytest +import torch +from omegaconf import DictConfig, OmegaConf + +from nemo.collections.asr.metrics.wer import CTCDecoding, CTCDecodingConfig +from nemo.collections.asr.metrics.wer_bpe import CTCBPEDecoding, CTCBPEDecodingConfig +from nemo.collections.asr.parts.mixins import mixins +from nemo.collections.asr.parts.utils.rnnt_utils import Hypothesis + + +def char_vocabulary(): + return [' ', 'a', 'b', 'c', 'd', 'e', 'f'] + + +@pytest.fixture() +@lru_cache(maxsize=8) +def tmp_tokenizer(test_data_dir): + cfg = DictConfig({'dir': os.path.join(test_data_dir, "asr", "tokenizers", "an4_wpe_128"), 'type': 'wpe'}) + + class _TmpASRBPE(mixins.ASRBPEMixin): + def register_artifact(self, _, vocab_path): + return vocab_path + + asrbpe = _TmpASRBPE() + asrbpe._setup_tokenizer(cfg) + return asrbpe.tokenizer + + +def check_char_timestamps(hyp: Hypothesis, decoding: CTCDecoding): + assert hyp.timestep is not None + assert isinstance(hyp.timestep, dict) + assert 'timestep' in hyp.timestep + assert 'char' in hyp.timestep + assert 'word' in hyp.timestep + + words = hyp.text.split(decoding.word_seperator) + words = list(filter(lambda x: x != '', words)) + assert len(hyp.timestep['word']) == len(words) + + +def check_subword_timestamps(hyp: Hypothesis, decoding: CTCBPEDecoding): + assert hyp.timestep is not None + assert isinstance(hyp.timestep, dict) + assert 'timestep' in hyp.timestep + assert 'char' in hyp.timestep + assert 'word' in hyp.timestep + + chars = list(hyp.text) + chars = list(filter(lambda x: x not in ['', ' ', '#'], chars)) + all_chars = [list(decoding.tokenizer.tokens_to_text(data['char'])) for data in hyp.timestep['char']] + all_chars = [char for subword in all_chars for char in subword] + all_chars = list(filter(lambda x: x not in ['', ' ', '#'], all_chars)) + assert len(chars) == len(all_chars) + + +class TestCTCDecoding: + @pytest.mark.unit + def test_constructor(self): + cfg = CTCDecodingConfig() + vocab = char_vocabulary() + decoding = CTCDecoding(decoding_cfg=cfg, vocabulary=vocab) + assert decoding is not None + + @pytest.mark.unit + def test_constructor_subword(self, tmp_tokenizer): + cfg = CTCBPEDecodingConfig() + decoding = CTCBPEDecoding(decoding_cfg=cfg, tokenizer=tmp_tokenizer) + assert decoding is not None + + @pytest.mark.unit + def test_char_decoding_greedy_forward(self,): + cfg = CTCDecodingConfig(strategy='greedy') + vocab = char_vocabulary() + decoding = CTCDecoding(decoding_cfg=cfg, vocabulary=vocab) + + B, T = 4, 20 + V = len(char_vocabulary()) + 1 + input_signal = torch.randn(size=(B, T, V)) + length = torch.randint(low=1, high=T, size=[B]) + + with torch.no_grad(): + texts, _ = decoding.ctc_decoder_predictions_tensor( + input_signal, length, fold_consecutive=True, return_hypotheses=False + ) + + for text in texts: + assert isinstance(text, str) + + @pytest.mark.unit + @pytest.mark.parametrize('alignments', [False, True]) + @pytest.mark.parametrize('timestamps', [False, True]) + def test_char_decoding_greedy_forward_hypotheses(self, alignments, timestamps): + cfg = CTCDecodingConfig(strategy='greedy', 
preserve_alignments=alignments, compute_timestamps=timestamps) + vocab = char_vocabulary() + decoding = CTCDecoding(decoding_cfg=cfg, vocabulary=vocab) + + B, T = 4, 20 + V = len(char_vocabulary()) + 1 + input_signal = torch.randn(size=(B, T, V)) + length = torch.randint(low=1, high=T, size=[B]) + + with torch.no_grad(): + hyps, _ = decoding.ctc_decoder_predictions_tensor( + input_signal, length, fold_consecutive=True, return_hypotheses=True + ) + + for idx, hyp in enumerate(hyps): + assert isinstance(hyp, Hypothesis) + assert torch.is_tensor(hyp.y_sequence) + assert isinstance(hyp.text, str) + + # alignments check + if alignments: + assert hyp.alignments is not None + assert isinstance(hyp.alignments, tuple) + assert len(hyp.alignments[0]) == length[idx] + assert len(hyp.alignments[1]) == length[idx] + + # timestamps check + if timestamps: + check_char_timestamps(hyp, decoding) + + @pytest.mark.unit + def test_subword_decoding_greedy_forward(self, tmp_tokenizer): + cfg = CTCBPEDecodingConfig(strategy='greedy') + decoding = CTCBPEDecoding(decoding_cfg=cfg, tokenizer=tmp_tokenizer) + + B, T = 4, 20 + V = decoding.tokenizer.tokenizer.vocab_size + 1 + input_signal = torch.randn(size=(B, T, V)) + length = torch.randint(low=1, high=T, size=[B]) + + with torch.no_grad(): + texts, _ = decoding.ctc_decoder_predictions_tensor( + input_signal, length, fold_consecutive=True, return_hypotheses=False + ) + + for text in texts: + assert isinstance(text, str) + + @pytest.mark.unit + @pytest.mark.parametrize('alignments', [False, True]) + @pytest.mark.parametrize('timestamps', [False, True]) + def test_subword_decoding_greedy_forward_hypotheses(self, tmp_tokenizer, alignments, timestamps): + cfg = CTCBPEDecodingConfig(strategy='greedy', preserve_alignments=alignments, compute_timestamps=timestamps) + decoding = CTCBPEDecoding(decoding_cfg=cfg, tokenizer=tmp_tokenizer) + + B, T = 4, 20 + V = decoding.tokenizer.tokenizer.vocab_size + 1 + input_signal = torch.randn(size=(B, T, V)) + length = torch.randint(low=1, high=T, size=[B]) + + with torch.no_grad(): + hyps, _ = decoding.ctc_decoder_predictions_tensor( + input_signal, length, fold_consecutive=True, return_hypotheses=True + ) + + for idx, hyp in enumerate(hyps): + assert isinstance(hyp, Hypothesis) + assert torch.is_tensor(hyp.y_sequence) + assert isinstance(hyp.text, str) + + # alignments check + if alignments: + assert hyp.alignments is not None + assert isinstance(hyp.alignments, tuple) + assert len(hyp.alignments[0]) == length[idx] + assert len(hyp.alignments[1]) == length[idx] + + # timestamps check + if timestamps: + check_subword_timestamps(hyp, decoding) From 4eead4f281f4711ce3d5dc2178e3594b12a7136f Mon Sep 17 00:00:00 2001 From: mikolajblaz Date: Tue, 20 Dec 2022 22:45:20 +0100 Subject: [PATCH 086/154] Propagate attention_dropout flag for GPT-3 (#5669) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Propagate attention_dropout flag for GPT-3 Signed-off-by: Mikołaj Błaż * Add default to megatron_gpt_config Signed-off-by: Mikołaj Błaż Signed-off-by: Mikołaj Błaż Co-authored-by: Oleksii Kuchaiev Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- examples/nlp/language_modeling/conf/megatron_gpt_config.yaml | 1 + .../nlp/models/language_modeling/megatron/gpt_model.py | 2 ++ .../nlp/models/language_modeling/megatron_gpt_model.py | 1 + .../nlp/modules/common/megatron/language_model.py | 5 +++++ 4 files changed, 9 insertions(+) diff --git 
a/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml b/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml index 13b8ffde521d..7ab1642e5df9 100755 --- a/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml +++ b/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml @@ -60,6 +60,7 @@ model: init_method_std: 0.02 # Standard deviation of the zero mean normal distribution used for weight initialization.') use_scaled_init_method: True # use scaled residuals initialization hidden_dropout: 0.1 # Dropout probability for hidden state transformer. + attention_dropout: 0.1 # Dropout probability for attention kv_channels: null # Projection weights dimension in multi-head attention. Set to hidden_size // num_attention_heads if null apply_query_key_layer_scaling: True # scale Q * K^T by 1 / layer-number. normalization: layernorm # Type of normalization layers diff --git a/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py index f53ad92b1dea..6fc11f091fa2 100755 --- a/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py @@ -113,6 +113,7 @@ def __init__( fp16_lm_cross_entropy=False, use_cpu_initialization=False, hidden_dropout=0.1, + attention_dropout=0.1, precision=16, fp32_residual_connection=False, activations_checkpoint_granularity=None, @@ -165,6 +166,7 @@ def __init__( vocab_size=vocab_size, hidden_size=hidden_size, hidden_dropout=hidden_dropout, + attention_dropout=attention_dropout, num_tokentypes=num_tokentypes, max_position_embeddings=max_position_embeddings, num_layers=num_layers, diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 1973e6a16c85..7ed174280634 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -165,6 +165,7 @@ def model_provider_func(self, pre_process, post_process): fp16_lm_cross_entropy=self.cfg.get('fp16_lm_cross_entropy', False), use_cpu_initialization=self.cfg.get('use_cpu_initialization', False), hidden_dropout=self.cfg.get('hidden_dropout', 0.1), + attention_dropout=self.cfg.get('attention_dropout', 0.1), precision=self.cfg.get('precision', 16), fp32_residual_connection=self.cfg.get('fp32_residual_connection', False), activations_checkpoint_granularity=self.cfg.get('activations_checkpoint_granularity', None), diff --git a/nemo/collections/nlp/modules/common/megatron/language_model.py b/nemo/collections/nlp/modules/common/megatron/language_model.py index c2120bcd1c5e..ece81c742ea2 100755 --- a/nemo/collections/nlp/modules/common/megatron/language_model.py +++ b/nemo/collections/nlp/modules/common/megatron/language_model.py @@ -59,6 +59,7 @@ def get_language_model( init_method_std=0.02, use_cpu_initialization=False, hidden_dropout=0.1, + attention_dropout=0.1, precision=16, fp32_residual_connection=False, activations_checkpoint_method=None, @@ -122,6 +123,7 @@ def get_language_model( post_process=post_process, use_cpu_initialization=use_cpu_initialization, hidden_dropout=hidden_dropout, + attention_dropout=attention_dropout, precision=precision, fp32_residual_connection=fp32_residual_connection, activations_checkpoint_method=activations_checkpoint_method, @@ -410,6 +412,7 @@ def __init__( post_process=True, use_cpu_initialization=False, hidden_dropout=0.1, + 
attention_dropout=0.1, precision=16, fp32_residual_connection=False, activations_checkpoint_method=None, @@ -497,6 +500,7 @@ def __init__( normalization=normalization, layernorm_epsilon=layernorm_epsilon, hidden_dropout=hidden_dropout, + attention_dropout=attention_dropout, use_cpu_initialization=use_cpu_initialization, persist_layer_norm=persist_layer_norm, openai_gelu=openai_gelu, @@ -544,6 +548,7 @@ def __init__( normalization=normalization, layernorm_epsilon=layernorm_epsilon, hidden_dropout=hidden_dropout, + attention_dropout=attention_dropout, use_cpu_initialization=use_cpu_initialization, bias_activation_fusion=bias_activation_fusion, bias_dropout_add_fusion=bias_dropout_add_fusion, From 2d213161d29f39f3ef78b3da9544e659c1d1299e Mon Sep 17 00:00:00 2001 From: Sandeep Subramanian Date: Tue, 20 Dec 2022 16:37:19 -1000 Subject: [PATCH 087/154] Enc-Dec model size reporting fixes (#5623) * Update for enc-dec models Signed-off-by: MaximumEntropy * Fix for bert as well Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for PP Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../language_modeling/megatron_base_model.py | 80 +++++++++++++++++++ .../language_modeling/megatron_bert_model.py | 29 +------ .../language_modeling/megatron_gpt_model.py | 30 +------ .../megatron_lm_encoder_decoder_model.py | 17 ++++ 4 files changed, 103 insertions(+), 53 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_base_model.py b/nemo/collections/nlp/models/language_modeling/megatron_base_model.py index 7119e47acb98..46d836e524e6 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_base_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_base_model.py @@ -484,3 +484,83 @@ def is_data_parallel_rank_zero(self): return True else: return False + + def _get_total_params_across_model_parallel_groups_gpt_bert(self, model): + """Returns the total number of parameters across all model parallel groups.""" + # log number of parameters + if isinstance(model, list): + num_parameters_on_device = sum( + [sum([p.nelement() for p in model_module.parameters()]) for model_module in model] + ) + if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( + ignore_virtual=True + ): + # substract the embedding weights on the last virtual stage + num_word_embedding_parameters = sum([p.nelement() for p in model[-1].word_embeddings_weight()]) + num_parameters_on_device -= num_word_embedding_parameters + else: + num_parameters_on_device = sum([p.nelement() for p in model.parameters()]) + + if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( + ignore_virtual=True + ): + # substract the embedding weights on the last stage + num_word_embedding_parameters = sum([p.nelement() for p in model.word_embeddings_weight()]) + num_parameters_on_device -= num_word_embedding_parameters + + # to be summed across data parallel group + total_num_parameters = torch.tensor(num_parameters_on_device).cuda() + + torch.distributed.all_reduce(total_num_parameters, 
group=parallel_state.get_model_parallel_group()) + + return num_parameters_on_device, total_num_parameters + + def _get_total_params_across_model_parallel_groups_enc_dec(self, model): + """Returns the total number of parameters across all model parallel groups.""" + # log number of parameters + # TODO: If/when we add interleaved model parallelism, we will need to add another if/else here. + num_parameters_on_device = sum([p.nelement() for p in model.parameters()]) + + if parallel_state.get_pipeline_model_parallel_world_size() > 1 and ( + parallel_state.get_pipeline_model_parallel_rank() == self.cfg.get('pipeline_model_parallel_split_rank', 0) + or parallel_state.is_pipeline_last_stage() + ): + # If the current rank is the in the decoder first stage (decoder emb) or last rank (output layer), subtract those weights since it is already accounted for in the encoder first stage. + # TODO: If we support embedding untying with PP > 1, we will need to update this. + num_word_embedding_parameters = sum([p.nelement() for p in model.word_embeddings_weight()]) + num_parameters_on_device -= num_word_embedding_parameters + + # Subtract decoder position embedding params that are shared with encoder. + if ( + parallel_state.is_pipeline_stage_at_split() + and self.cfg.encoder.get("position_embedding_type", "learned_absolute") == "learned_absolute" + ): + num_position_embedding_parameters = sum([p.nelement() for p in model.position_embeddings_weight()]) + num_parameters_on_device -= num_position_embedding_parameters + + # Check and remove RPE embeddings from the encoder that are replicated. + if ( + parallel_state.get_pipeline_model_parallel_world_size() > 1 + and parallel_state.is_pipeline_stage_before_split() + and not parallel_state.is_pipeline_first_stage() + and self.cfg.encoder.get("position_embedding_type", "learned_absolute") == "relative" + ): + # substract the RPE params on intermediate pipeline stages. + num_rpe_params = sum([p.nelement() for p in model.encoder_relative_position_embeddings_weight()]) + num_parameters_on_device -= num_rpe_params + + # Check and remove RPE embeddings from the decoder that are replicated. + if ( + parallel_state.get_pipeline_model_parallel_world_size() > 1 + and parallel_state.is_pipeline_stage_after_split() + and not parallel_state.is_pipeline_stage_at_split() + and self.cfg.encoder.get("position_embedding_type", "learned_absolute") == "relative" + ): + # substract the RPE params on intermediate pipeline stages. + num_rpe_params = sum([p.nelement() for p in model.decoder_relative_position_embeddings_weight()]) + num_parameters_on_device -= num_rpe_params + + # to be summed across data parallel group + total_num_parameters = torch.tensor(num_parameters_on_device).cuda() + torch.distributed.all_reduce(total_num_parameters, group=parallel_state.get_model_parallel_group()) + return num_parameters_on_device, total_num_parameters diff --git a/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py b/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py index ef4fa18b10b4..ddeb077e2c5a 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py @@ -552,32 +552,9 @@ def setup(self, stage=None): stage (str, optional): Can be 'fit', 'validate', 'test' or 'predict'. Defaults to None. 
""" - # log number of parameters - if isinstance(self.model, list): - num_parameters_on_device = sum( - [sum([p.nelement() for p in model_module.parameters()]) for model_module in self.model] - ) - if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( - ignore_virtual=True - ): - # substract the embedding weights on the last virtual stage - num_word_embedding_parameters = sum([p.nelement() for p in self.model[-1].word_embeddings_weight()]) - num_parameters_on_device -= num_word_embedding_parameters - else: - num_parameters_on_device = sum([p.nelement() for p in self.model.parameters()]) - - if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( - ignore_virtual=True - ): - # substract the embedding weights on the last stage - num_word_embedding_parameters = sum([p.nelement() for p in self.model.word_embeddings_weight()]) - - num_parameters_on_device -= num_word_embedding_parameters - - # to be summed across data parallel group - total_num_parameters = torch.tensor(num_parameters_on_device).cuda() - - torch.distributed.all_reduce(total_num_parameters, group=parallel_state.get_model_parallel_group()) + num_parameters_on_device, total_num_parameters = self._get_total_params_across_model_parallel_groups_gpt_bert( + self.model + ) logging.info( f'Pipeline model parallel rank: {parallel_state.get_pipeline_model_parallel_rank()}, ' diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 7ed174280634..6ae9c2bb64bf 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -717,33 +717,9 @@ def setup(self, stage=None): Args: stage (str, optional): Can be 'fit', 'validate', 'test' or 'predict'. Defaults to None. 
""" - - # log number of parameters - if isinstance(self.model, list): - num_parameters_on_device = sum( - [sum([p.nelement() for p in model_module.parameters()]) for model_module in self.model] - ) - if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( - ignore_virtual=True - ): - # substract the embedding weights on the last virtual stage - num_word_embedding_parameters = sum([p.nelement() for p in self.model[-1].word_embeddings_weight()]) - num_parameters_on_device -= num_word_embedding_parameters - else: - num_parameters_on_device = sum([p.nelement() for p in self.model.parameters()]) - - if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage( - ignore_virtual=True - ): - # substract the embedding weights on the last stage - num_word_embedding_parameters = sum([p.nelement() for p in self.model.word_embeddings_weight()]) - - num_parameters_on_device -= num_word_embedding_parameters - - # to be summed across data parallel group - total_num_parameters = torch.tensor(num_parameters_on_device).cuda() - - torch.distributed.all_reduce(total_num_parameters, group=parallel_state.get_model_parallel_group()) + num_parameters_on_device, total_num_parameters = self._get_total_params_across_model_parallel_groups_gpt_bert( + self.model + ) logging.info( f'Pipeline model parallel rank: {parallel_state.get_pipeline_model_parallel_rank()}, ' diff --git a/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py b/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py index 9b804cef9446..6eb34758b979 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py @@ -909,7 +909,24 @@ def build_pretraining_data_loader(self, dataset, consumed_samples, num_workers): ) def setup(self, stage=None): + """ PTL hook that is executed after DDP spawns. + We setup datasets here as megatron datasets require DDP to instantiate. + See https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#setup for more information. + Args: + stage (str, optional): Can be 'fit', 'validate', 'test' or 'predict'. Defaults to None. 
+ """ + num_parameters_on_device, total_num_parameters = self._get_total_params_across_model_parallel_groups_enc_dec( + self.enc_dec_model + ) + + logging.info( + f'Pipeline model parallel rank: {parallel_state.get_pipeline_model_parallel_rank()}\n' + f'Tensor model parallel rank: {parallel_state.get_tensor_model_parallel_rank()}\n' + f'Number of model parameters on device: {num_parameters_on_device:.2e}\n' + f'Total number of model parameters: {total_num_parameters:.2e}\n' + ) resume_checkpoint_path = self.trainer._checkpoint_connector.resume_from_checkpoint_fit_path + if resume_checkpoint_path: init_consumed_samples = self._extract_consumed_samples_from_ckpt(resume_checkpoint_path) else: From d3c15b0d6d89c7cbf3929e3221a5ba50aa58d159 Mon Sep 17 00:00:00 2001 From: Hainan Xu Date: Tue, 20 Dec 2022 21:41:37 -0500 Subject: [PATCH 088/154] Multiblank Transducer (#5527) * multi-blank transducers Signed-off-by: Hainan Xu * one line bug fix Signed-off-by: Hainan Xu * change interface of RNNTDecoding class to extract num-extra-output from joint instead of constructor Signed-off-by: Hainan Xu * addressed PR comments Signed-off-by: Hainan Xu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Hainan Xu Co-authored-by: Hainan Xu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- README.rst | 1 + .../conformer_multiblank_transducer_bpe.yaml | 262 +++++++ nemo/collections/asr/losses/rnnt.py | 43 +- nemo/collections/asr/losses/rnnt_pytorch.py | 125 +++ nemo/collections/asr/metrics/rnnt_wer.py | 100 ++- nemo/collections/asr/metrics/rnnt_wer_bpe.py | 2 +- nemo/collections/asr/modules/rnnt.py | 8 +- nemo/collections/asr/modules/rnnt_abstract.py | 4 + .../asr/parts/numba/rnnt_loss/__init__.py | 2 +- .../asr/parts/numba/rnnt_loss/rnnt.py | 120 +++ .../asr/parts/numba/rnnt_loss/rnnt_pytorch.py | 188 ++++- .../rnnt_loss/utils/cuda_utils/gpu_rnnt.py | 232 ++++++ .../utils/cuda_utils/gpu_rnnt_kernel.py | 478 ++++++++++++ .../parts/submodules/rnnt_greedy_decoding.py | 736 ++++++++++++++++++ .../asr/numba/rnnt_loss/test_rnnt_pytorch.py | 40 +- .../asr/test_asr_rnnt_encdec_model.py | 88 +++ 16 files changed, 2391 insertions(+), 38 deletions(-) create mode 100644 examples/asr/conf/conformer/multiblank/conformer_multiblank_transducer_bpe.yaml diff --git a/README.rst b/README.rst index acec586245ef..bab662d8f5ce 100644 --- a/README.rst +++ b/README.rst @@ -65,6 +65,7 @@ Key Features * `Automatic Speech Recognition (ASR) `_ * Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC, ... 
* Supports CTC and Transducer/RNNT losses/decoders
+ * NeMo Original `Multi-blank Transducers `_
* Beam Search decoding
* `Language Modelling for ASR `_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer
* Streaming and Buffered ASR (CTC/Transducer) - `Chunked Inference Examples `_
diff --git a/examples/asr/conf/conformer/multiblank/conformer_multiblank_transducer_bpe.yaml b/examples/asr/conf/conformer/multiblank/conformer_multiblank_transducer_bpe.yaml
new file mode 100644
index 000000000000..bf30db4d17fb
--- /dev/null
+++ b/examples/asr/conf/conformer/multiblank/conformer_multiblank_transducer_bpe.yaml
@@ -0,0 +1,262 @@
+# It contains the default values for training a Multiblank Conformer-Transducer ASR model with stateless decoders, large size (~120M) with Transducer loss and sub-word encoding.
+
+# Architecture and training config:
+# Default learning parameters in this config are set for effective batch size of 2K. To train it with smaller effective
+# batch sizes, you may need to re-tune the learning parameters or use higher accumulate_grad_batches.
+# Here are the recommended configs for different variants of Conformer-Transducer, other parameters are the same as in this config file.
+#
+# +-------------+---------+---------+----------+--------------+--------------------------+
+# | Model | d_model | n_heads | n_layers | weight_decay | pred_hidden/joint_hidden |
+# +=============+=========+========+===========+==============+==========================+
+# | Small (14M)| 176 | 4 | 16 | 0.0 | 320 |
+# +-------------+---------+--------+-----------+--------------+--------------------------+
+# | Medium (32M)| 256 | 4 | 16 | 1e-3 | 640 |
+# +-------------+---------+--------+-----------+--------------+--------------------------+
+# | Large (120M)| 512 | 8 | 17 | 1e-3 | 640 |
+# +-----------------------------------------------------------+--------------------------+
+#
+
+# You may find more info about Conformer-Transducer here: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#conformer-transducer
+# Multiblank transducer is described in https://arxiv.org/pdf/2211.03541
+# Pre-trained models of Conformer-Transducer can be found here: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/results.html
+
+name: "Multiblank-Conformer-Transducer-BPE"
+
+model:
+ loss:
+ loss_name: "multiblank_rnnt"
+
+ multiblank_rnnt_kwargs:
+ # FastEmit regularization: https://arxiv.org/abs/2010.11148
+ # You may enable FastEmit to reduce the latency of the model for streaming
+ fastemit_lambda: 0.0 # Recommended values to be in range [1e-4, 1e-2], 0.001 is a good start.
+ clamp: -1.0 # if > 0, applies gradient clamping in range [-clamp, clamp] for the joint tensor only.
+ # Multiblank RNN-T: https://arxiv.org/pdf/2211.03541.pdf
+ big_blank_durations: [2, 4, 8]
+ sigma: 0.05
+
+ sample_rate: 16000
+ compute_eval_loss: false # eval samples can be very long and exhaust memory. Disable computation of transducer loss during validation/testing with this flag.
+ log_prediction: true # enables logging sample predictions in the output during training
+ skip_nan_grad: false
+
+ model_defaults:
+ enc_hidden: ${model.encoder.d_model}
+ pred_hidden: 640
+ joint_hidden: 640
+
+ train_ds:
+ manifest_filepath: ???
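+ # For reference, each line of the manifest is a JSON record; a typical entry looks like
+ # (the path, duration and text below are placeholders):
+ # {"audio_filepath": "/data/audio/utt_0001.wav", "duration": 3.2, "text": "transcript of the utterance"}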
+ sample_rate: ${model.sample_rate} + batch_size: 16 # you may increase batch_size if your memory allows + shuffle: true + num_workers: 8 + pin_memory: true + use_start_end_token: false + trim_silence: false + max_duration: 16.7 # it is set for LibriSpeech, you may need to update it for your dataset + min_duration: 0.1 + # tarred datasets + is_tarred: false + tarred_audio_filepaths: null + shuffle_n: 2048 + # bucketing params + bucketing_strategy: "synced_randomized" + bucketing_batch_size: null + + validation_ds: + manifest_filepath: ??? + sample_rate: ${model.sample_rate} + batch_size: 16 + shuffle: false + num_workers: 8 + pin_memory: true + use_start_end_token: false + + test_ds: + manifest_filepath: null + sample_rate: ${model.sample_rate} + batch_size: 16 + shuffle: false + num_workers: 8 + pin_memory: true + use_start_end_token: false + + # You may find more detail on how to train a tokenizer at: /scripts/tokenizers/process_asr_text_tokenizer.py + tokenizer: + dir: ??? # path to directory which contains either tokenizer.model (bpe) or vocab.txt (for wpe) + type: bpe # Can be either bpe (SentencePiece tokenizer) or wpe (WordPiece tokenizer) + + preprocessor: + _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor + sample_rate: ${model.sample_rate} + normalize: "per_feature" + window_size: 0.025 + window_stride: 0.01 + window: "hann" + features: 80 + n_fft: 512 + frame_splicing: 1 + dither: 0.00001 + pad_to: 0 + + spec_augment: + _target_: nemo.collections.asr.modules.SpectrogramAugmentation + freq_masks: 2 # set to zero to disable it + time_masks: 10 # set to zero to disable it + freq_width: 27 + time_width: 0.05 + + encoder: + _target_: nemo.collections.asr.modules.ConformerEncoder + feat_in: ${model.preprocessor.features} + feat_out: -1 # you may set it if you need different output size other than the default d_model + n_layers: 17 + d_model: 512 + + # Sub-sampling params + subsampling: striding # vggnet, striding, stacking or stacking_norm, dw_striding + subsampling_factor: 4 # must be power of 2 for striding and vggnet + subsampling_conv_channels: -1 # set to -1 to make it equal to the d_model + causal_downsampling: false + + # Feed forward module's params + ff_expansion_factor: 4 + + # Multi-headed Attention Module's params + self_attention_model: rel_pos # rel_pos or abs_pos + n_heads: 8 # may need to be lower for smaller d_models + # [left, right] specifies the number of steps to be seen from left and right of each step in self-attention + att_context_size: [-1, -1] # -1 means unlimited context + att_context_style: regular # regular or chunked_limited + xscaling: true # scales up the input embeddings by sqrt(d_model) + untie_biases: true # unties the biases of the TransformerXL layers + pos_emb_max_len: 5000 + + # Convolution module's params + conv_kernel_size: 31 + conv_norm_type: 'batch_norm' # batch_norm or layer_norm or groupnormN (N specifies the number of groups) + # conv_context_size can be"causal" or a list of two integers while conv_context_size[0]+conv_context_size[1]+1==conv_kernel_size + # null means [(kernel_size-1)//2, (kernel_size-1)//2], and 'causal' means [(kernel_size-1), 0] + conv_context_size: null + + ### regularization + dropout: 0.1 # The dropout used in most of the Conformer Modules + dropout_emb: 0.0 # The dropout used for embeddings + dropout_att: 0.1 # The dropout for multi-headed attention modules + + decoder: + _target_: nemo.collections.asr.modules.RNNTDecoder + normalization_mode: null # Currently only null is supported for 
export. + random_state_sampling: false # Random state sampling: https://arxiv.org/pdf/1910.11455.pdf + blank_as_pad: true # This flag must be set in order to support exporting of RNNT models + efficient inference. + + prednet: + pred_hidden: ${model.model_defaults.pred_hidden} + pred_rnn_layers: 1 + t_max: null + dropout: 0.2 + + joint: + _target_: nemo.collections.asr.modules.RNNTJoint + log_softmax: null # 'null' would set it automatically according to CPU/GPU device + preserve_memory: false # dramatically slows down training, but might preserve some memory + + # Fuses the computation of prediction net + joint net + loss + WER calculation + # to be run on sub-batches of size `fused_batch_size`. + # When this flag is set to true, consider the `batch_size` of *_ds to be just `encoder` batch size. + # `fused_batch_size` is the actual batch size of the prediction net, joint net and transducer loss. + # Using small values here will preserve a lot of memory during training, but will make training slower as well. + # An optimal ratio of fused_batch_size : *_ds.batch_size is 1:1. + # However, to preserve memory, this ratio can be 1:8 or even 1:16. + # Extreme case of 1:B (i.e. fused_batch_size=1) should be avoided as training speed would be very slow. + fuse_loss_wer: true + fused_batch_size: 16 + + jointnet: + joint_hidden: ${model.model_defaults.joint_hidden} + activation: "relu" + dropout: 0.2 + num_extra_outputs: 3 + + decoding: + strategy: "greedy_batch" # can be greedy, greedy_batch, beam, tsd, alsd. + + # this must not be None in order to use the multi-blank specific decoding method. + # you could set this to [1, 1, 1] so that big blanks are treated the same + # as standard blanks during decoding, which usually improves accuracy although runs slower. + big_blank_durations: [2, 4, 8] + + # greedy strategy config + greedy: + max_symbols: 10 + + # beam strategy config + beam: + beam_size: 2 + return_best_hypothesis: False + score_norm: true + tsd_max_sym_exp: 50 # for Time Synchronous Decoding + alsd_max_target_len: 2.0 # for Alignment-Length Synchronous Decoding + + # Adds Gaussian noise to the gradients of the decoder to avoid overfitting + variational_noise: + start_step: 0 + std: 0.0 + + optim: + name: adamw + lr: 5.0 + # optimizer arguments + betas: [0.9, 0.98] + weight_decay: 1e-3 + + # scheduler setup + sched: + name: NoamAnnealing + d_model: ${model.encoder.d_model} + # scheduler config override + warmup_steps: 10000 + warmup_ratio: null + min_lr: 1e-6 + +trainer: + devices: -1 # number of GPUs, -1 would use all available GPUs + num_nodes: 1 + max_epochs: 500 + max_steps: -1 # computed at runtime if not set + val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations + accelerator: auto + strategy: ddp + accumulate_grad_batches: 1 + gradient_clip_val: 0.0 + precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP. + log_every_n_steps: 10 # Interval of logging. + enable_progress_bar: True + resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. 
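+ # The "effective batch size of 2K" mentioned at the top of this file is approximately
+ # train_ds.batch_size * devices * num_nodes * accumulate_grad_batches,
+ # e.g. 16 * 8 GPUs * 16 nodes * 1 = 2048 (the GPU and node counts are only an illustration).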
+ num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it + check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs + sync_batchnorm: true + enable_checkpointing: False # Provided by exp_manager + logger: false # Provided by exp_manager + benchmark: false # needs to be false for models with variable-length speech input as it slows down training + + +exp_manager: + exp_dir: null + name: ${name} + create_tensorboard_logger: true + create_checkpoint_callback: true + checkpoint_callback_params: + # in case of multiple validation sets, first one is used + monitor: "val_wer" + mode: "min" + save_top_k: 5 + always_save_nemo: True # saves the checkpoints as nemo files instead of PTL checkpoints + resume_if_exists: false + resume_ignore_no_checkpoint: false + + create_wandb_logger: false + wandb_logger_kwargs: + name: null + project: null diff --git a/nemo/collections/asr/losses/rnnt.py b/nemo/collections/asr/losses/rnnt.py index 6c275575206b..50f955d36551 100644 --- a/nemo/collections/asr/losses/rnnt.py +++ b/nemo/collections/asr/losses/rnnt.py @@ -34,7 +34,7 @@ import torch from omegaconf import DictConfig, OmegaConf -from nemo.collections.asr.losses.rnnt_pytorch import RNNTLossPytorch +from nemo.collections.asr.losses.rnnt_pytorch import MultiblankRNNTLossPytorch, RNNTLossPytorch from nemo.core.classes import Loss, typecheck from nemo.core.neural_types import LabelsType, LengthsType, LogprobsType, LossType, NeuralType from nemo.core.utils.numba_utils import NUMBA_INSTALLATION_MESSAGE @@ -48,7 +48,7 @@ WARP_RNNT_AVAILABLE = False try: - from nemo.collections.asr.parts.numba.rnnt_loss import RNNTLossNumba + from nemo.collections.asr.parts.numba.rnnt_loss import RNNTLossNumba, MultiblankRNNTLossNumba NUMBA_RNNT_AVAILABLE = True except (ImportError, ModuleNotFoundError): @@ -95,6 +95,20 @@ class RNNTLossConfig: is_available=True, installation_msg="Pure Pytorch implementation of RNN-T loss. Slow and for debugging purposes only.", ), + "multiblank_rnnt": RNNTLossConfig( + loss_name="multiblank_rnnt", + lib_name="numba", + min_version='0.53.0', + is_available=NUMBA_RNNT_AVAILABLE, + installation_msg=NUMBA_INSTALLATION_MESSAGE, + ), + "multiblank_rnnt_pytorch": RNNTLossConfig( + loss_name="pytorch", + lib_name="torch", + min_version='0.0', + is_available=True, + installation_msg="Pure Pytorch implementation of Multiblank RNN-T loss. 
Slow and for debugging purposes only.", + ), } RNNT_LOSS_RESOLVER['default'] = RNNT_LOSS_RESOLVER['warprnnt_numba'] @@ -177,6 +191,29 @@ def resolve_rnnt_loss(loss_name: str, blank_idx: int, loss_kwargs: dict = None) loss_func = RNNTLossPytorch(blank=blank_idx, reduction='none') _warn_unused_additional_kwargs(loss_name, loss_kwargs) + elif loss_name == 'multiblank_rnnt': + fastemit_lambda = loss_kwargs.pop('fastemit_lambda', 0.0) + clamp = loss_kwargs.pop('clamp', -1.0) + big_blank_durations = loss_kwargs.pop('big_blank_durations', None) + sigma = loss_kwargs.pop('sigma', 0.0) + loss_func = MultiblankRNNTLossNumba( + blank=blank_idx, + big_blank_durations=big_blank_durations, + reduction='none', + fastemit_lambda=fastemit_lambda, + clamp=clamp, + sigma=sigma, + ) + _warn_unused_additional_kwargs(loss_name, loss_kwargs) + + elif loss_name == 'multiblank_rnnt_pytorch': + big_blank_durations = loss_kwargs.pop('big_blank_durations', None) + sigma = loss_kwargs.pop('sigma', 0.0) + loss_func = MultiblankRNNTLossPytorch( + blank=blank_idx, big_blank_durations=big_blank_durations, reduction='none', sigma=sigma + ) + _warn_unused_additional_kwargs(loss_name, loss_kwargs) + else: raise ValueError( f"Invalid value of `loss_name`: {loss_name}. Allowed loss names are :" f"{loss_function_names}" @@ -311,7 +348,7 @@ def forward(self, log_probs, targets, input_lengths, target_lengths): # Reduce transcript length to correct alignment if additional padding was applied. # Transcript: [B, L] -> [B, L']; If L' < L if targets.shape[1] != max_targets_len: - targets = targets.narrow(dim=1, start=0, length=max_targets_len) + targets = targets.narrow(dim=1, start=0, length=max_targets_len).contiguous() # Temporarily override loss reduction loss_reduction = self._loss.reduction diff --git a/nemo/collections/asr/losses/rnnt_pytorch.py b/nemo/collections/asr/losses/rnnt_pytorch.py index 8400edac4e23..ab0b5cf4f630 100644 --- a/nemo/collections/asr/losses/rnnt_pytorch.py +++ b/nemo/collections/asr/losses/rnnt_pytorch.py @@ -110,3 +110,128 @@ def compute_forward_prob(self, acts, labels, act_lens, label_lens): log_prob = torch.stack(log_probs) return log_prob + + +class MultiblankRNNTLossPytorch(Loss): + """ + Pure Python implementation of multi-blank transducer loss (https://arxiv.org/pdf/2211.03541.pdf) + """ + + @property + def input_types(self): + """Input types definitions for CTCLoss. + """ + return { + "acts": NeuralType(('B', 'T', 'T', 'D'), LogprobsType()), + "labels": NeuralType(('B', 'T'), LabelsType()), + "act_lens": NeuralType(tuple('B'), LengthsType()), + "label_lens": NeuralType(tuple('B'), LengthsType()), + } + + @property + def output_types(self): + """Output types definitions for CTCLoss. 
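Editor's sketch, not part of the patch: how the two resolver entries registered above would be exercised. resolve_rnnt_loss is the module-level function whose multi-blank branches are shown above; the blank index and kwarg values here are hypothetical placeholders that would normally come from the model config.

    from nemo.collections.asr.losses.rnnt import resolve_rnnt_loss

    # 'multiblank_rnnt' resolves to MultiblankRNNTLossNumba; 'multiblank_rnnt_pytorch'
    # resolves to the slow pure-PyTorch reference implementation.
    loss = resolve_rnnt_loss(
        loss_name='multiblank_rnnt',
        blank_idx=1027,  # hypothetical: len(vocabulary) + number of big blanks
        loss_kwargs={'big_blank_durations': [2, 4, 8], 'sigma': 0.05},
    )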
+ loss: + NeuralType(None) + """ + return {"loss": NeuralType(elements_type=LossType())} + + def __init__(self, blank, big_blank_durations, reduction, sigma): + super().__init__() + self.blank = blank + self.big_blank_durations = big_blank_durations + self.reduction = reduction + self.sigma = sigma + + def forward(self, acts, labels, act_lens, label_lens): + acts = torch.log_softmax(acts, -1) - self.sigma + forward_logprob = self.compute_forward_prob(acts, labels, act_lens, label_lens) + + losses = -forward_logprob + if self.reduction == 'mean_batch': + losses = losses.mean() # global batch size average + elif self.reduction == 'mean': + losses = torch.div(losses, label_lens).mean() + elif self.reduction == 'sum': + losses = losses.sum() + elif self.reduction == 'mean_volume': + losses = losses.sum() / label_lens.sum() # same as above but longer samples weigh more + + return losses + + def compute_forward_prob(self, acts, labels, act_lens, label_lens): + B, T, U, _ = acts.shape + + log_alpha = torch.zeros(B, T, U, device=acts.device) + for t in range(T): + for u in range(U): + if u == 0: + if t == 0: + # this is the base case: (t=0, u=0) with log-alpha = 0. + log_alpha[:, t, u] = 0.0 + else: + # this is case for (t = 0, u > 0), reached by (t, u - d) + # emitting a blank symbol of duration d. + log_alpha[:, t, u] = log_alpha[:, t - 1, u] + acts[:, t - 1, 0, self.blank] + for i, d in enumerate(self.big_blank_durations): + if t >= d: + tt = log_alpha[:, t - d, u] + acts[:, t - d, 0, self.blank - 1 - i] + log_alpha[:, t, u] = torch.logsumexp( + torch.stack([1.0 * log_alpha[:, t, u], tt]), dim=0 + ) + + else: + if t == 0: + # in case of (u > 0, t = 0), this is only reached from + # (t, u - 1) with a label emission. + gathered = torch.gather( + acts[:, t, u - 1], dim=1, index=labels[:, u - 1].view(-1, 1).type(torch.int64) + ).reshape(-1) + log_alpha[:, t, u] = log_alpha[:, t, u - 1] + gathered + else: + # here both t and u are > 0, this state is reachable + # with two possibilities: (t - d, u) with emission of + # blank with duration d, or (t, u - 1) with a label emission. + + # first we take care of the standard blank. + log_alpha[:, t, u] = torch.logsumexp( + torch.stack( + [ + log_alpha[:, t - 1, u] + acts[:, t - 1, u, self.blank], + log_alpha[:, t, u - 1] + + torch.gather( + acts[:, t, u - 1], dim=1, index=labels[:, u - 1].view(-1, 1).type(torch.int64) + ).reshape(-1), + ] + ), + dim=0, + ) + + # now we go over all big blanks. They need to be considered if current t >= blank duration d. + for i, d in enumerate(self.big_blank_durations): + if t >= d: + tt = log_alpha[:, t - d, u] + acts[:, t - d, u, self.blank - 1 - i] + log_alpha[:, t, u] = torch.logsumexp( + torch.stack([1.0 * log_alpha[:, t, u], tt]), dim=0 + ) + + log_probs = [] + for b in range(B): + # here we need to add the final blank emission weights, which needs + # to consider all possible blank durations. 
+ to_append = ( + log_alpha[b, act_lens[b] - 1, label_lens[b]] + acts[b, act_lens[b] - 1, label_lens[b], self.blank] + ) + + for i, d in enumerate(self.big_blank_durations): + if act_lens[b] >= d: + tt = ( + log_alpha[b, act_lens[b] - d, label_lens[b]] + + acts[b, act_lens[b] - d, label_lens[b], self.blank - 1 - i] + ) + to_append = torch.logsumexp(torch.stack([1.0 * to_append, tt]), dim=0) + + log_probs.append(to_append) + log_prob = torch.stack(log_probs) + + return log_prob diff --git a/nemo/collections/asr/metrics/rnnt_wer.py b/nemo/collections/asr/metrics/rnnt_wer.py index 476b5a43c663..2777ca4d856c 100644 --- a/nemo/collections/asr/metrics/rnnt_wer.py +++ b/nemo/collections/asr/metrics/rnnt_wer.py @@ -195,6 +195,8 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): super(AbstractRNNTDecoding, self).__init__() self.cfg = decoding_cfg self.blank_id = blank_id + self.num_extra_outputs = joint.num_extra_outputs + self.big_blank_durations = self.cfg.get("big_blank_durations", None) self.compute_hypothesis_token_set = self.cfg.get("compute_hypothesis_token_set", False) self.compute_langs = decoding_cfg.get('compute_langs', False) self.preserve_alignments = self.cfg.get('preserve_alignments', None) @@ -202,6 +204,10 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): self.compute_timestamps = self.cfg.get('compute_timestamps', None) self.word_seperator = self.cfg.get('word_seperator', ' ') + if self.big_blank_durations is not None: + if blank_id == 0: + raise ValueError("blank_id must equal len(vocabs) for multi-blank RNN-T models") + possible_strategies = ['greedy', 'greedy_batch', 'beam', 'tsd', 'alsd', 'maes'] if self.cfg.strategy not in possible_strategies: raise ValueError(f"Decoding strategy must be one of {possible_strategies}") @@ -240,32 +246,58 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): pass if self.cfg.strategy == 'greedy': - - self.decoding = greedy_decode.GreedyRNNTInfer( - decoder_model=decoder, - joint_model=joint, - blank_index=self.blank_id, - max_symbols_per_step=( - self.cfg.greedy.get('max_symbols', None) or self.cfg.greedy.get('max_symbols_per_step', None) - ), - preserve_alignments=self.preserve_alignments, - preserve_frame_confidence=self.preserve_frame_confidence, - confidence_method_cfg=self.confidence_method_cfg, - ) + if self.big_blank_durations is None: + self.decoding = greedy_decode.GreedyRNNTInfer( + decoder_model=decoder, + joint_model=joint, + blank_index=self.blank_id, + max_symbols_per_step=( + self.cfg.greedy.get('max_symbols', None) or self.cfg.greedy.get('max_symbols_per_step', None) + ), + preserve_alignments=self.preserve_alignments, + preserve_frame_confidence=self.preserve_frame_confidence, + confidence_method_cfg=self.confidence_method_cfg, + ) + else: + self.decoding = greedy_decode.GreedyMultiblankRNNTInfer( + decoder_model=decoder, + joint_model=joint, + blank_index=self.blank_id, + big_blank_durations=self.big_blank_durations, + max_symbols_per_step=( + self.cfg.greedy.get('max_symbols', None) or self.cfg.greedy.get('max_symbols_per_step', None) + ), + preserve_alignments=self.preserve_alignments, + preserve_frame_confidence=self.preserve_frame_confidence, + confidence_method_cfg=self.confidence_method_cfg, + ) elif self.cfg.strategy == 'greedy_batch': - - self.decoding = greedy_decode.GreedyBatchedRNNTInfer( - decoder_model=decoder, - joint_model=joint, - blank_index=self.blank_id, - max_symbols_per_step=( - self.cfg.greedy.get('max_symbols', None) or 
self.cfg.greedy.get('max_symbols_per_step', None) - ), - preserve_alignments=self.preserve_alignments, - preserve_frame_confidence=self.preserve_frame_confidence, - confidence_method_cfg=self.confidence_method_cfg, - ) + if self.big_blank_durations is None: + self.decoding = greedy_decode.GreedyBatchedRNNTInfer( + decoder_model=decoder, + joint_model=joint, + blank_index=self.blank_id, + max_symbols_per_step=( + self.cfg.greedy.get('max_symbols', None) or self.cfg.greedy.get('max_symbols_per_step', None) + ), + preserve_alignments=self.preserve_alignments, + preserve_frame_confidence=self.preserve_frame_confidence, + confidence_method_cfg=self.confidence_method_cfg, + ) + else: + self.decoding = greedy_decode.GreedyBatchedMultiblankRNNTInfer( + decoder_model=decoder, + joint_model=joint, + blank_index=self.blank_id, + big_blank_durations=self.big_blank_durations, + max_symbols_per_step=( + self.cfg.greedy.get('max_symbols', None) or self.cfg.greedy.get('max_symbols_per_step', None) + ), + preserve_alignments=self.preserve_alignments, + preserve_frame_confidence=self.preserve_frame_confidence, + confidence_method_cfg=self.confidence_method_cfg, + ) elif self.cfg.strategy == 'beam': @@ -437,8 +469,14 @@ def decode_hypothesis(self, hypotheses_list: List[Hypothesis]) -> List[Union[Hyp prediction = prediction.tolist() # RNN-T sample level is already preprocessed by implicit RNNT decoding - # Simply remove any blank tokens - prediction = [p for p in prediction if p != self.blank_id] + # Simply remove any blank and possibly big blank tokens + if self.blank_id != 0: + num_extra_outputs = 0 + if self.big_blank_durations is not None: + num_extra_outputs += len(self.big_blank_durations) + prediction = [p for p in prediction if p < self.blank_id - num_extra_outputs] + else: + prediction = [p for p in prediction if p != self.blank_id] # De-tokenize the integer tokens; if not computing timestamps if self.compute_timestamps is True: @@ -1005,10 +1043,14 @@ class RNNTDecoding(AbstractRNNTDecoding): def __init__( self, decoding_cfg, decoder, joint, vocabulary, ): - blank_id = len(vocabulary) + blank_id = ( + len(vocabulary) + joint.num_extra_outputs + ) # we need to ensure blank is the last token in the vocab. This is needed for multi-blank RNN-T models. self.labels_map = dict([(i, vocabulary[i]) for i in range(len(vocabulary))]) - super(RNNTDecoding, self).__init__(decoding_cfg=decoding_cfg, decoder=decoder, joint=joint, blank_id=blank_id) + super(RNNTDecoding, self).__init__( + decoding_cfg=decoding_cfg, decoder=decoder, joint=joint, blank_id=blank_id, + ) def _aggregate_token_confidence(self, hypothesis: Hypothesis) -> List[float]: """ @@ -1046,7 +1088,7 @@ def decode_ids_to_tokens(self, tokens: List[int]) -> List[str]: Returns: A list of decoded tokens. 
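Editorial toy illustration, not from the patch: the effect of the filtering above with a hypothetical 5-token vocabulary and two big blanks. Here blank_id = len(vocabulary) + num_extra_outputs = 7, and every index at or above blank_id - num_extra_outputs (the big blanks and the standard blank) is dropped from the decoded hypothesis.

    vocab_size, num_extra_outputs = 5, 2       # hypothetical sizes
    blank_id = vocab_size + num_extra_outputs  # standard blank is the last output index (7)
    prediction = [3, 6, 1, 7, 5, 2]            # 5 and 6 are big blanks, 7 is the standard blank
    kept = [p for p in prediction if p < blank_id - num_extra_outputs]
    print(kept)                                # -> [3, 1, 2]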
""" - token_list = [self.labels_map[c] for c in tokens if c != self.blank_id] + token_list = [self.labels_map[c] for c in tokens if c < self.blank_id - self.num_extra_outputs] return token_list def decode_tokens_to_lang(self, tokens: List[int]) -> str: diff --git a/nemo/collections/asr/metrics/rnnt_wer_bpe.py b/nemo/collections/asr/metrics/rnnt_wer_bpe.py index 74b264492314..2a8b981e7a74 100644 --- a/nemo/collections/asr/metrics/rnnt_wer_bpe.py +++ b/nemo/collections/asr/metrics/rnnt_wer_bpe.py @@ -199,7 +199,7 @@ def __init__(self, decoding_cfg, decoder, joint, tokenizer: TokenizerSpec): self.tokenizer = tokenizer super(RNNTBPEDecoding, self).__init__( - decoding_cfg=decoding_cfg, decoder=decoder, joint=joint, blank_id=blank_id + decoding_cfg=decoding_cfg, decoder=decoder, joint=joint, blank_id=blank_id + joint.num_extra_outputs ) def _aggregate_token_confidence(self, hypothesis: Hypothesis) -> List[float]: diff --git a/nemo/collections/asr/modules/rnnt.py b/nemo/collections/asr/modules/rnnt.py index d4cea3ae9070..03122f4d8e46 100644 --- a/nemo/collections/asr/modules/rnnt.py +++ b/nemo/collections/asr/modules/rnnt.py @@ -1174,6 +1174,7 @@ def __init__( self, jointnet: Dict[str, Any], num_classes: int, + num_extra_outputs: int = 0, vocabulary: Optional[List] = None, log_softmax: Optional[bool] = None, preserve_memory: bool = False, @@ -1186,7 +1187,8 @@ def __init__( self.vocabulary = vocabulary self._vocab_size = num_classes - self._num_classes = num_classes + 1 # add 1 for blank symbol + self._num_extra_outputs = num_extra_outputs + self._num_classes = num_classes + 1 + num_extra_outputs # 1 is for blank if experimental_fuse_loss_wer is not None: # Override fuse_loss_wer from deprecated argument @@ -1484,6 +1486,10 @@ def _update_adapter_cfg_input_dim(self, cfg: DictConfig): def num_classes_with_blank(self): return self._num_classes + @property + def num_extra_outputs(self): + return self._num_extra_outputs + @property def loss(self): return self._loss diff --git a/nemo/collections/asr/modules/rnnt_abstract.py b/nemo/collections/asr/modules/rnnt_abstract.py index 463292439e99..e473f64e5716 100644 --- a/nemo/collections/asr/modules/rnnt_abstract.py +++ b/nemo/collections/asr/modules/rnnt_abstract.py @@ -64,6 +64,10 @@ def joint(self, f: torch.Tensor, g: torch.Tensor) -> torch.Tensor: def num_classes_with_blank(self): raise NotImplementedError() + @property + def num_extra_outputs(self): + raise NotImplementedError() + class AbstractRNNTDecoder(NeuralModule, ABC): """ diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/__init__.py b/nemo/collections/asr/parts/numba/rnnt_loss/__init__.py index 77c5b60fdfb3..66e30c77590a 100644 --- a/nemo/collections/asr/parts/numba/rnnt_loss/__init__.py +++ b/nemo/collections/asr/parts/numba/rnnt_loss/__init__.py @@ -13,4 +13,4 @@ # limitations under the License. 
from nemo.collections.asr.parts.numba.rnnt_loss.rnnt import rnnt_loss_cpu, rnnt_loss_gpu -from nemo.collections.asr.parts.numba.rnnt_loss.rnnt_pytorch import RNNTLossNumba +from nemo.collections.asr.parts.numba.rnnt_loss.rnnt_pytorch import MultiblankRNNTLossNumba, RNNTLossNumba diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py index 401bad1a587e..64c8955006ed 100644 --- a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py +++ b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py @@ -234,3 +234,123 @@ def rnnt_loss_gpu( del gpu_workspace, wrapper return True + + +def multiblank_rnnt_loss_gpu( + acts: torch.Tensor, + labels: torch.Tensor, + input_lengths: torch.Tensor, + label_lengths: torch.Tensor, + costs: torch.Tensor, + grads: torch.Tensor, + blank_label: int, + big_blank_durations: list, + fastemit_lambda: float, + clamp: float, + num_threads: int, + sigma: float, +): + """ + Wrapper method for accessing GPU Multi-blank RNNT loss (https://arxiv.org/pdf/2211.03541.pdf). + + CUDA implementation ported from [HawkAaron/warp-transducer](https://github.com/HawkAaron/warp-transducer). + + Args: + acts: Activation tensor of shape [B, T, U, V + num_big_blanks + 1]. + labels: Ground truth labels of shape [B, U]. + input_lengths: Lengths of the acoustic sequence as a vector of ints [B]. + label_lengths: Lengths of the target sequence as a vector of ints [B]. + costs: Zero vector of length [B] in which costs will be set. + grads: Zero tensor of shape [B, T, U, V + num_big_blanks + 1] where the gradient will be set. + blank_label: Index of the standard blank token in the vocabulary. + big_blank_durations: A list of supported durations for big blank symbols + in the model, e.g. [2, 4, 8]. Note we only include durations for ``big + blanks'' here and it should not include 1 for the standard blank. + Those big blanks have vocabulary indices after the standard blank index. + fastemit_lambda: Float scaling factor for FastEmit regularization. Refer to + FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization. + clamp: Float value. When set to value >= 0.0, will clamp the gradient to [-clamp, clamp]. + num_threads: Number of threads for OpenMP. + sigma: logit-undernormalization weight used in the multi-blank model. Refer to + the multi-blank paper https://arxiv.org/pdf/2211.03541 for detailed explanations. 
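Editorial sketch, not part of the patch: a CPU-runnable toy call that exercises the tensor shapes described above, using the pure-PyTorch MultiblankRNNTLossPytorch added earlier in this patch instead of the CUDA path; all sizes are hypothetical.

    import torch
    from nemo.collections.asr.losses.rnnt_pytorch import MultiblankRNNTLossPytorch

    B, T, U, V = 2, 8, 3, 10                        # batch, frames, target length, vocabulary (toy)
    big_blank_durations = [2, 4, 8]
    num_outputs = V + 1 + len(big_blank_durations)  # real tokens + standard blank + big blanks

    acts = torch.randn(B, T, U + 1, num_outputs)    # raw joint-net logits; log_softmax - sigma is applied inside
    labels = torch.randint(0, V, (B, U))
    act_lens = torch.tensor([T, T - 2])
    label_lens = torch.tensor([U, U - 1])

    loss_fn = MultiblankRNNTLossPytorch(
        blank=num_outputs - 1,                      # standard blank must be the last index
        big_blank_durations=big_blank_durations,
        reduction='mean',
        sigma=0.05,
    )
    print(loss_fn(acts=acts, labels=labels, act_lens=act_lens, label_lens=label_lens))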
+ """ + minibatch_size = acts.shape[0] + maxT = acts.shape[1] + maxU = acts.shape[2] + alphabet_size = acts.shape[3] + + if hasattr(cuda, 'external_stream'): + stream = cuda.external_stream(torch.cuda.current_stream(acts.device).cuda_stream) + else: + stream = cuda.default_stream() + + if num_threads < 0: + num_threads = multiprocessing.cpu_count() + + num_threads = max(1, num_threads) # have to use at least 1 thread + + gpu_size, status = rnnt_helper.get_workspace_size(maxT, maxU, minibatch_size, gpu=True) + + if status != global_constants.RNNTStatus.RNNT_STATUS_SUCCESS: + raise RuntimeError("Invalid parameter passed when calculating working space memory") + + # Select GPU index + cuda.select_device(acts.device.index) + gpu_workspace = torch.zeros(gpu_size, device=acts.device, dtype=acts.dtype, requires_grad=False) + + big_blank_workspace = torch.zeros( + len(big_blank_durations), device=acts.device, dtype=torch.long, requires_grad=False + ) + + for i in range(0, len(big_blank_durations)): + big_blank_workspace[i] = big_blank_durations[i] + + ### VIEW TENSORS AS VECTORS FOR POINTER INDEXING ### + acts, acts_shape = rnnt_helper.flatten_tensor(acts) + + wrapper = gpu_rnnt.MultiblankGPURNNT( + minibatch=minibatch_size, + maxT=maxT, + maxU=maxU, + alphabet_size=alphabet_size, + workspace=gpu_workspace, + big_blank_workspace=big_blank_workspace, + num_big_blanks=len(big_blank_durations), + blank=blank_label, + fastemit_lambda=fastemit_lambda, + clamp=clamp, + num_threads=num_threads, + stream=stream, + sigma=sigma, + ) + + if grads is None: + status = wrapper.score_forward( + acts=acts.data, + costs=costs.data, + pad_labels=labels.data, + label_lengths=label_lengths.data, + input_lengths=input_lengths.data, + ) + + if status != global_constants.RNNTStatus.RNNT_STATUS_SUCCESS: + raise RuntimeError("Could not calculate forward scores") + + else: + ### FLATTEN GRAD TENSOR ### + grads, grads_shape = rnnt_helper.flatten_tensor(grads) + + status = wrapper.cost_and_grad( + acts=acts.data, + grads=grads.data, + costs=costs.data, + pad_labels=labels.data, + label_lengths=label_lengths.data, + input_lengths=input_lengths.data, + ) + + if status != global_constants.RNNTStatus.RNNT_STATUS_SUCCESS: + raise RuntimeError("Could not calculate forward scores") + + del gpu_workspace, big_blank_workspace, wrapper + return True diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py index f28bbdb720ae..b8c8291243ef 100644 --- a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py +++ b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py @@ -34,7 +34,7 @@ from nemo.collections.asr.parts.numba.rnnt_loss import rnnt from nemo.collections.asr.parts.numba.rnnt_loss.utils.cpu_utils import cpu_rnnt -__all__ = ['rnnt_loss', 'RNNTLossNumba'] +__all__ = ['rnnt_loss', 'RNNTLossNumba', 'MultiblankRNNTLossNumba'] class _RNNTNumba(Function): @@ -91,6 +91,73 @@ def backward(ctx, grad_output): return ctx.grads.mul_(grad_output), None, None, None, None, None, None, None +class _MultiblankRNNTNumba(Function): + """ + Numba class for multi-blank transducer loss (https://arxiv.org/pdf/2211.03541.pdf) + """ + + @staticmethod + def forward( + ctx, acts, labels, act_lens, label_lens, blank, big_blank_durations, reduction, fastemit_lambda, clamp, sigma + ): + """ + big_blank_durations: list of durations for multi-blank transducer, e.g. + [2, 4, 8]. + sigma: hyper-parameter for logit under-normalization method for training + multi-blank transducers. 
Recommended value 0.05. + Refer to https://arxiv.org/pdf/2211.03541 for detailed explanations for + the above parameters; + For other parameters for this class, refer to comment for class _RNNTNumba + """ + is_cuda = acts.is_cuda + + certify_inputs(acts, labels, act_lens, label_lens) + if clamp < 0: + raise ValueError("`clamp` must be 0.0 or positive float value.") + + if is_cuda: + loss_func = rnnt.multiblank_rnnt_loss_gpu + else: + raise NotImplementedError() + + grads = torch.zeros_like(acts) if acts.requires_grad else None + minibatch_size = acts.size(0) + costs = torch.zeros(minibatch_size, device=acts.device, dtype=acts.dtype) + + loss_func( + acts, + labels=labels, + input_lengths=act_lens, + label_lengths=label_lens, + costs=costs, + grads=grads, + blank_label=blank, + big_blank_durations=big_blank_durations, + fastemit_lambda=fastemit_lambda, + clamp=clamp, + sigma=sigma, + num_threads=0, + ) + + if reduction in ['sum', 'mean']: + costs = costs.sum().unsqueeze_(-1) + if reduction == 'mean': + costs /= minibatch_size + + if grads is not None: + grads /= minibatch_size + + ctx.grads = grads + + return costs + + @staticmethod + def backward(ctx, grad_output): + if grad_output is not None and ctx.grads is not None: + grad_output = grad_output.view(-1, 1, 1, 1).to(ctx.grads) + return ctx.grads.mul_(grad_output), None, None, None, None, None, None, None, None, None, None + + def rnnt_loss( acts, labels, act_lens, label_lens, blank=0, reduction='mean', fastemit_lambda: float = 0.0, clamp: float = 0.0 ): @@ -122,6 +189,54 @@ def rnnt_loss( return _RNNTNumba.apply(acts, labels, act_lens, label_lens, blank, reduction, fastemit_lambda, clamp) +def multiblank_rnnt_loss( + acts, + labels, + act_lens, + label_lens, + blank, + big_blank_durations=[], + reduction='mean', + fastemit_lambda: float = 0.0, + clamp: float = 0.0, +): + """ + Multi-blank RNN Transducer (https://arxiv.org/pdf/2211.03541.pdf) Loss (functional form) + Args: + acts: Tensor of (batch x seqLength x labelLength x outputDim) containing output from network + labels: 2 dimensional Tensor containing all the targets of the batch with zero padded + act_lens: Tensor of size (batch) containing size of each output sequence from the network + label_lens: Tensor of (batch) containing label length of each example + blank (int): standard blank label. + big_blank_durations: list of durations for multi-blank transducer, e.g. + [2, 4, 8]. + sigma: hyper-parameter for logit under-normalization method for training + multi-blank transducers. Recommended value 0.05. + Refer to https://arxiv.org/pdf/2211.03541 for detailed explanations for + the last two params. + reduction (string, optional): Specifies the reduction to apply to the output: + 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, + 'mean': the output losses will be divided by the target lengths and + then the mean over the batch is taken. Default: 'mean' + """ + if not acts.is_cuda: + # Since CPU requires log_softmax to be computed explicitly, we need to perform grad clipping + # *after* we have obtained the gradients of loss(logsoftmax()). + # This is highly wasteful since it requires a copy of the entire joint tensor which is expensive. + # CUDA version is much more efficient since it performs an inplace logsoftmax, and therefore + # can inplace clamp the gradient. + if clamp > 0.0: + acts = cpu_rnnt.LogSoftmaxGradModification.apply(acts, clamp) + + # NOTE manually done log_softmax for CPU version, + # log_softmax is computed within GPU version. 
+ acts = torch.nn.functional.log_softmax(acts, -1) + + return _MultiblankRNNTNumba.apply( + acts, labels, act_lens, label_lens, blank, big_blank_durations, reduction, fastemit_lambda, clamp + ) + + class RNNTLossNumba(Module): """ Parameters: @@ -168,6 +283,77 @@ def forward(self, acts, labels, act_lens, label_lens): ) +class MultiblankRNNTLossNumba(Module): + """ + Parameters: + blank (int): standard blank label. + big_blank_durations: list of durations for multi-blank transducer, e.g. + [2, 4, 8]. + sigma: hyper-parameter for logit under-normalization method for training + multi-blank transducers. Recommended value 0.05. + Refer to https://arxiv.org/pdf/2211.03541 for detailed explanations for + the above parameters; + reduction (string, optional): Specifies the reduction to apply to the output: + 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, + 'mean': the output losses will be divided by the target lengths and + then the mean over the batch is taken. Default: 'mean' + fastemit_lambda: Float scaling factor for FastEmit regularization. Refer to + FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization. + clamp: Float value. When set to value >= 0.0, will clamp the gradient to [-clamp, clamp]. + """ + + def __init__( + self, + blank, + big_blank_durations, + reduction='mean', + fastemit_lambda: float = 0.0, + clamp: float = -1, + sigma: float = 0.0, + ): + super(MultiblankRNNTLossNumba, self).__init__() + self.blank = blank + self.big_blank_durations = big_blank_durations + self.fastemit_lambda = fastemit_lambda + self.clamp = float(clamp) if clamp > 0 else 0.0 + self.reduction = reduction + self.loss = _MultiblankRNNTNumba.apply + self.sigma = sigma + + def forward(self, acts, labels, act_lens, label_lens): + """ + log_probs: Tensor of (batch x seqLength x labelLength x outputDim) containing output from network + labels: 2 dimensional Tensor containing all the targets of the batch with zero padded + act_lens: Tensor of size (batch) containing size of each output sequence from the network + label_lens: Tensor of (batch) containing label length of each example + """ + if not acts.is_cuda: + # Since CPU requires log_softmax to be computed explicitly, we need to perform grad clipping + # *after* we have obtained the gradients of loss(logsoftmax()). + # This is highly wasteful since it requires a copy of the entire joint tensor which is expensive. + # CUDA version is much more efficient since it performs an inplace logsoftmax, and therefore + # can inplace clamp the gradient. + if self.clamp > 0.0: + acts = cpu_rnnt.LogSoftmaxGradModification.apply(acts, self.clamp) + + # NOTE manually done log_softmax for CPU version, + # log_softmax is computed within GPU version. 
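Editorial aside, not part of the patch: what the sigma "logit under-normalization" amounts to numerically. Every log-probability is shifted down by sigma, so each frame's scores sum to exp(-sigma) rather than 1; a quick self-contained check:

    import math
    import torch

    sigma = 0.05
    logits = torch.randn(1, 14)                       # one frame over 14 joint outputs (toy)
    logp = torch.log_softmax(logits, dim=-1) - sigma  # under-normalized log-probabilities
    print(torch.exp(logp).sum().item(), math.exp(-sigma))  # both roughly 0.951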
+ acts = torch.nn.functional.log_softmax(acts, -1) + + return self.loss( + acts, + labels, + act_lens, + label_lens, + self.blank, + self.big_blank_durations, + self.reduction, + self.fastemit_lambda, + self.clamp, + self.sigma, + ) + + def check_type(var, t, name): if var.dtype is not t: raise TypeError("{} must be {}".format(name, t)) diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py b/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py index 8dd7e78e8733..dca4e732c062 100644 --- a/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py +++ b/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py @@ -288,3 +288,235 @@ def _prepare_workspace(self) -> Tuple[int, Tuple[torch.Tensor, ...]]: used_offset += self.minibatch_ return used_offset, (denom, alphas, betas, llForward, llBackward) + + +class MultiblankGPURNNT(GPURNNT): + def __init__( + self, + sigma: float, + num_big_blanks: int, + minibatch: int, + maxT: int, + maxU: int, + alphabet_size: int, + workspace, + big_blank_workspace, + blank: int, + fastemit_lambda: float, + clamp: float, + num_threads: int, + stream, + ): + """ + Helper class to launch the CUDA Kernels to compute Multi-blank Transducer Loss (https://arxiv.org/pdf/2211.03541). + + Args: + sigma: Hyper-parameter related to the logit-normalization method in training multi-blank transducers. + num_big_blanks: Number of big blank symbols the model has. This should not include the standard blank symbol. + minibatch: Int representing the batch size. + maxT: The maximum possible acoustic sequence length. Represents T in the logprobs tensor. + maxU: The maximum possible target sequence length. Represents U in the logprobs tensor. + alphabet_size: The vocabulary dimension V + 1 + num-big-blanks + workspace: An allocated chunk of memory that will be sliced off and reshaped into required + blocks used as working memory. + big_blank_workspace: An allocated chunk of memory that will be sliced off and reshaped into required + blocks used as working memory specifically for the multi-blank related computations. + blank: Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab. + fastemit_lambda: Float scaling factor for FastEmit regularization. Refer to + FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization. + clamp: Float value. When set to value >= 0.0, will clamp the gradient to [-clamp, clamp]. + num_threads: Number of OMP threads to launch. + stream: Numba Cuda Stream. + """ + super().__init__( + minibatch, maxT, maxU, alphabet_size, workspace, blank, fastemit_lambda, clamp, num_threads, stream + ) + self.big_blank_workspace = cuda.as_cuda_array( + big_blank_workspace + ) # a flat vector of integer numbers that represents allocated memory slices + + self.num_big_blanks = num_big_blanks + self.sigma = sigma + + def compute_cost_and_score( + self, + acts: torch.Tensor, + grads: Optional[torch.Tensor], + costs: torch.Tensor, + labels: torch.Tensor, + label_lengths: torch.Tensor, + input_lengths: torch.Tensor, + ) -> global_constants.RNNTStatus: + """ + Compute both the loss and the gradients. + + Args: + acts: A flattened tensor of shape [B, T, U, V+1] representing the activation matrix. + grad: A flattented zero tensor of same shape as acts. + costs: A zero vector of length B which will be updated inplace with the log probability costs. 
+ flat_labels: A flattened matrix of labels of shape [B, U] + label_lengths: A vector of length B that contains the original lengths of the acoustic sequence. + input_lengths: A vector of length B that contains the original lengths of the target sequence. + + Updates: + This will launch kernels that will update inline the following variables: + - grads: Gradients of the activation matrix wrt the costs vector. + - costs: Negative log likelihood of the forward variable. + + Returns: + An enum that either represents a successful RNNT operation or failure. + """ + training = grads is not None + + if training: + grads *= 0.0 # zero grads + + _, (denom, alphas, betas, llForward, llBackward, bigblank_durations) = self._prepare_workspace() + + ######## START EXECUTION ######## + self.log_softmax(acts, denom) + + # Compute alphas + gpu_rnnt_kernel.compute_multiblank_alphas_kernel[self.minibatch_, self.maxU_, self.stream_, 0]( + acts, + denom, + self.sigma, + alphas, + llForward, + input_lengths, + label_lengths, + labels, + self.minibatch_, + self.maxT_, + self.maxU_, + self.alphabet_size_, + self.blank_, + bigblank_durations, + self.num_big_blanks, + ) + + if training: + # Compute betas + gpu_rnnt_kernel.compute_multiblank_betas_kernel[self.minibatch_, self.maxU_, self.stream_, 0]( + acts, + denom, + self.sigma, + betas, + llBackward, + input_lengths, + label_lengths, + labels, + self.minibatch_, + self.maxT_, + self.maxU_, + self.alphabet_size_, + self.blank_, + bigblank_durations, + self.num_big_blanks, + ) + + # Compute gradient + grad_blocks_per_grid = self.minibatch_ * self.maxT_ * self.maxU_ + grad_threads_per_block = gpu_rnnt_kernel.GPU_RNNT_THREAD_SIZE + gpu_rnnt_kernel.compute_multiblank_grad_kernel[ + grad_blocks_per_grid, grad_threads_per_block, self.stream_, 0 + ]( + grads, + acts, + denom, + self.sigma, + alphas, + betas, + llForward, + input_lengths, + label_lengths, + labels, + self.minibatch_, + self.maxT_, + self.maxU_, + self.alphabet_size_, + self.blank_, + bigblank_durations, + self.num_big_blanks, + self.fastemit_lambda_, + self.clamp_, + ) + + # // cost copy, negate (for log likelihood) and update with additional regularizers + # This needs to be done via CUDA, because we used temporary memory llForward + # passed to alpha, which was updated with log likelihoods. + # But copying this data into a pytorch pointer is more difficult (numba api is one way) + # Therefore launch a pointwise CUDA kernel to update the costs inplace from data of llForward + # Then negate to compute the loglikelihood. 
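Editorial sketch with a hypothetical batch size: the 1-D launch geometry used for the cost copy below, one thread per batch element and at most 32 threads per block.

    B = 48                                            # hypothetical minibatch size
    threads_per_block = min(B, 32)
    blocks_per_grid = (B + (threads_per_block - 1)) // threads_per_block
    print(blocks_per_grid, threads_per_block)         # -> 2 32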
+ threadsperblock = min(costs.shape[0], 32) + blockspergrid = (costs.shape[0] + (threadsperblock - 1)) // threadsperblock + rnnt_helper.compute_costs_data[blockspergrid, threadsperblock, self.stream_, 0]( + llForward, costs, self.fastemit_lambda_ + ) + self.stream_.synchronize() + + return global_constants.RNNTStatus.RNNT_STATUS_SUCCESS + + def cost_and_grad( + self, + acts: torch.Tensor, + grads: torch.Tensor, + costs: torch.Tensor, + pad_labels: torch.Tensor, + label_lengths: torch.Tensor, + input_lengths: torch.Tensor, + ): + if ( + acts is None + or grads is None + or costs is None + or pad_labels is None + or label_lengths is None + or input_lengths is None + ): + return global_constants.RNNTStatus.RNNT_STATUS_INVALID_VALUE + + return self.compute_cost_and_score(acts, grads, costs, pad_labels, label_lengths, input_lengths) + + def score_forward( + self, + acts: torch.Tensor, + costs: torch.Tensor, + pad_labels: torch.Tensor, + label_lengths: torch.Tensor, + input_lengths: torch.Tensor, + ): + if acts is None or costs is None or pad_labels is None or label_lengths is None or input_lengths is None: + return global_constants.RNNTStatus.RNNT_STATUS_INVALID_VALUE + + return self.compute_cost_and_score(acts, None, costs, pad_labels, label_lengths, input_lengths) + + def _prepare_workspace(self) -> (int, Tuple[torch.Tensor]): + """ + Helper method that uses the workspace and constructs slices of it that can be used. + + Returns: + An int, representing the offset of the used workspace (practically, the slice of the workspace consumed) + A tuple of tensors representing the shared workspace. + """ + used_offset = 0 + + # // denom + denom = self.gpu_workspace[used_offset : used_offset + self.maxT_ * self.maxU_ * self.minibatch_] + used_offset += self.maxT_ * self.maxU_ * self.minibatch_ + + # // alphas & betas + alphas = self.gpu_workspace[used_offset : used_offset + self.maxT_ * self.maxU_ * self.minibatch_] + used_offset += self.maxT_ * self.maxU_ * self.minibatch_ + betas = self.gpu_workspace[used_offset : used_offset + self.maxT_ * self.maxU_ * self.minibatch_] + used_offset += self.maxT_ * self.maxU_ * self.minibatch_ + + # // logllh + llForward = self.gpu_workspace[used_offset : used_offset + self.minibatch_] + used_offset += self.minibatch_ + llBackward = self.gpu_workspace[used_offset : used_offset + self.minibatch_] + used_offset += self.minibatch_ + + bigblank_durations = self.big_blank_workspace[: self.num_big_blanks] + + return used_offset, (denom, alphas, betas, llForward, llBackward, bigblank_durations) diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt_kernel.py b/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt_kernel.py index db92147b3362..dbeb1544e7e3 100644 --- a/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt_kernel.py +++ b/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt_kernel.py @@ -397,3 +397,481 @@ def compute_grad_kernel( # update internal index through the thread_buffer; # until idx < V + 1, such that entire vocabulary has been updated. 
idx += GPU_RNNT_THREAD_SIZE + + +@cuda.jit() +def compute_multiblank_alphas_kernel( + acts: torch.Tensor, + denom: torch.Tensor, + sigma: float, + alphas: torch.Tensor, + llForward: torch.Tensor, + xlen: torch.Tensor, + ylen: torch.Tensor, + mlabels: torch.Tensor, + minibatch: int, + maxT: int, + maxU: int, + alphabet_size: int, + blank_: int, + big_blank_duration: torch.Tensor, + num_big_blanks: int, +): + """ + Compute alpha (forward variable) probabilities for multi-blank transducuer loss (https://arxiv.org/pdf/2211.03541). + + Args: + acts: Tensor of shape [B, T, U, V + 1 + num_big_blanks] flattened. Represents the logprobs activation tensor. + denom: Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor + across entire vocabulary. + sigma: Hyper-parameter for logit-undernormalization technique for training multi-blank transducers. + alphas: Zero tensor of shape [B, T, U]. Will be updated inside the kernel with the forward variable + probabilities. + llForward: Zero tensor of shape [B]. Represents the log-likelihood of the forward pass. + Returned as the forward pass loss that is reduced by the optimizer. + xlen: Vector of length B which contains the actual acoustic sequence lengths in the padded + activation tensor. + ylen: Vector of length B which contains the actual target sequence lengths in the padded + activation tensor. + mlabels: Matrix of shape [B, U+1] (+1 here is due to token - usually the RNNT blank). + The matrix contains the padded target transcription that must be predicted. + minibatch: Int representing the batch size. + maxT: The maximum possible acoustic sequence length. Represents T in the logprobs tensor. + maxU: The maximum possible target sequence length. Represents U in the logprobs tensor. + alphabet_size: The vocabulary dimension V+1 (inclusive of RNNT blank). + blank_: Index of the RNNT standard blank token in the vocabulary. + big_blank_durations: Vector of supported big blank durations of the model. + num_big_blanks: Number of big blanks of the model. + + Updates: + Kernel inplace updates the following inputs: + - alphas: forward variable scores. + - llForward: log-likelihood of forward variable. + """ + # // launch B blocks, each block has U threads + b = cuda.blockIdx.x # // batch id + u = cuda.threadIdx.x # label id, u + T = xlen[b] # select AM length of current sample + U = ylen[b] + 1 # select target length of current sample, +1 for the blank token + + labels: torch.Tensor = mlabels[b] # mb label start point, equivalent to mlabels + b * (maxU - 1) + offset = b * maxT * maxU # pointer indexing offset + + # Initilize alpha[b, t=0, u=0] for all b in B + if u == 0: + alphas[offset] = 0 + + # sync until all alphas are initialized + cuda.syncthreads() + + # Ordinary alpha calculations, broadcast across B=b and U=u + # Look up forward variable calculation from rnnt_numpy.forward_pass() + # Note: because of the logit under-normalization, everytime logp() is called, + # it is always followed by a `-sigma` term. + for n in range(1, T + U - 1): + t = n - u + + if u == 0: + # for t in range(1, T) step to initialize alphas[b, t, 0] + if t > 0 and t < T: + alphas[offset + t * maxU + u] = ( + alphas[offset + (t - 1) * maxU + u] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t - 1, 0, blank_) + - sigma + ) + + # Now add the weights for big blanks. 
+ for i in range(num_big_blanks): + if t >= big_blank_duration[i]: + alphas[offset + t * maxU + u] = rnnt_helper.log_sum_exp( + alphas[offset + t * maxU + u], + alphas[offset + (t - big_blank_duration[i]) * maxU + u] + + logp( + denom, acts, maxT, maxU, alphabet_size, b, t - big_blank_duration[i], 0, blank_ - 1 - i + ) + - sigma, + ) + + elif u < U: + # for u in range(1, U) step to initialize alphas[b, 0, u] + if t == 0: + alphas[offset + u] = ( + alphas[offset + u - 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, 0, u - 1, labels[u - 1]) + - sigma + ) + + # for t in range(1, T) for u in range(1, U) step to compute alphas[b, t, u] + elif t > 0 and t < T: + no_emit = ( + alphas[offset + (t - 1) * maxU + u] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t - 1, u, blank_) + - sigma + ) + emit = ( + alphas[offset + t * maxU + u - 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t, u - 1, labels[u - 1]) + - sigma + ) + + alphas[offset + t * maxU + u] = rnnt_helper.log_sum_exp(emit, no_emit) + + # Now add the weights for big blanks. + for i in range(num_big_blanks): + if t >= big_blank_duration[i]: + # big-blank weight here is + # alpha(t - duration, u) * p(big-blank | t - duration, u) / exp(sigma), in log domain + # do this all all big-blanks if the above condition is met + big_blank_no_emit = ( + alphas[offset + (t - big_blank_duration[i]) * maxU + u] + + logp( + denom, acts, maxT, maxU, alphabet_size, b, t - big_blank_duration[i], u, blank_ - 1 - i + ) + - sigma + ) + alphas[offset + t * maxU + u] = rnnt_helper.log_sum_exp( + alphas[offset + t * maxU + u], big_blank_no_emit + ) + + # sync across all B=b and U=u + cuda.syncthreads() + + # After final sync, alphas[b, T-1, U - 1] + logprobs[b, T-1, U-1, blank] + denom[b, T-1, U-1] gives + # log-likelihood of forward pass. + if u == 0: + loglike = ( + alphas[offset + (T - 1) * maxU + U - 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, T - 1, U - 1, blank_) + - sigma + ) + + # Now add the weights for big blanks for the final weight computation. + for i in range(num_big_blanks): + if T >= big_blank_duration[i]: + big_blank_loglike = ( + alphas[offset + (T - big_blank_duration[i]) * maxU + U - 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, T - big_blank_duration[i], U - 1, blank_ - 1 - i) + - sigma + ) + loglike = rnnt_helper.log_sum_exp(loglike, big_blank_loglike) + + llForward[b] = loglike + + +@cuda.jit() +def compute_multiblank_betas_kernel( + acts: torch.Tensor, + denom: torch.Tensor, + sigma: float, + betas: torch.Tensor, + llBackward: torch.Tensor, + xlen: torch.Tensor, + ylen: torch.Tensor, + mlabels: torch.Tensor, # [B, U] + minibatch: int, + maxT: int, + maxU: int, + alphabet_size: int, + blank_: int, + big_blank_duration: torch.Tensor, + num_big_blanks: int, +): + """ + Compute beta (backward variable) probabilities for multi-blank transducer loss (https://arxiv.org/pdf/2211.03541). + + Args: + acts: Tensor of shape [B, T, U, V + 1 + num-big-blanks] flattened. Represents the logprobs activation tensor. + denom: Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor + across entire vocabulary. + sigma: Hyper-parameter for logit-undernormalization technique for training multi-blank transducers. + betas: Zero tensor of shape [B, T, U]. Will be updated inside the kernel with the backward variable + probabilities. + llBackward: Zero tensor of shape [B]. Represents the log-likelihood of the backward pass. 
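Editor's summary of the forward recursion implemented by compute_multiblank_alphas_kernel above (and, on CPU, by MultiblankRNNTLossPytorch.compute_forward_prob earlier in this patch). Writing \varnothing for the standard blank, \varnothing_i for the big blank of duration d_i, \sigma for the under-normalization constant and \mathrm{LSE} for logsumexp:

    \alpha_{0,0} = 0, \qquad
    \alpha_{t,u} = \mathrm{LSE}\Big(
        \alpha_{t-1,u} + \log P(\varnothing \mid t-1, u) - \sigma,\;
        \alpha_{t,u-1} + \log P(y_u \mid t, u-1) - \sigma,\;
        \big\{ \alpha_{t-d_i,u} + \log P(\varnothing_i \mid t-d_i, u) - \sigma \big\}_{i \,:\, t \ge d_i}
    \Big)

    \log p(\mathbf{y} \mid \mathbf{x}) = \mathrm{LSE}\Big(
        \alpha_{T-1,U-1} + \log P(\varnothing \mid T-1, U-1) - \sigma,\;
        \big\{ \alpha_{T-d_i,U-1} + \log P(\varnothing_i \mid T-d_i, U-1) - \sigma \big\}_{i \,:\, d_i \le T}
    \Big)

Terms whose indices fall outside the lattice (t - 1 < 0, u - 1 < 0, or t - d_i < 0) are simply omitted, which matches the boundary cases handled explicitly in the kernels.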
+ Returned as the backward pass loss that is reduced by the optimizer. + xlen: Vector of length B which contains the actual acoustic sequence lengths in the padded + activation tensor. + ylen: Vector of length B which contains the actual target sequence lengths in the padded + activation tensor. + mlabels: Matrix of shape [B, U+1] (+1 here is due to token - usually the RNNT blank). + The matrix contains the padded target transcription that must be predicted. + minibatch: Int representing the batch size. + maxT: The maximum possible acoustic sequence length. Represents T in the logprobs tensor. + maxU: The maximum possible target sequence length. Represents U in the logprobs tensor. + alphabet_size: The vocabulary dimension V+1 (inclusive of RNNT blank). + blank_: Index of the RNNT standard blank token in the vocabulary. + big_blank_durations: Vector of supported big blank durations of the model. + num_big_blanks: Number of big blanks of the model. + + Updates: + Kernel inplace updates the following inputs: + - betas: backward variable scores. + - llBackward: log-likelihood of backward variable. + """ + # // launch B blocks, each block has U threads + b = cuda.blockIdx.x # // batch id + u = cuda.threadIdx.x # label id, u + T = xlen[b] # select AM length of current sample + U = ylen[b] + 1 # select target length of current sample, +1 for the blank token + + labels: torch.Tensor = mlabels[b] # mb label start point, equivalent to mlabels + b * (maxU - 1) + offset = b * maxT * maxU # pointer indexing offset + + # Note: just like the alphas, because of the logit under-normalization, everytime + # logp() is called, it is always followed by a `-sigma` term. + + # Initilize beta[b, t=T-1, u=U-1] for all b in B with log_probs[b, t=T-1, u=U-1, blank] + if u == 0: + betas[offset + (T - 1) * maxU + U - 1] = ( + logp(denom, acts, maxT, maxU, alphabet_size, b, T - 1, U - 1, blank_) - sigma + ) + + # sync until all betas are initialized + cuda.syncthreads() + + # Ordinary beta calculations, broadcast across B=b and U=u + # Look up backward variable calculation from rnnt_numpy.backward_pass() + for n in range(T + U - 2, -1, -1): + t = n - u + + if u == (U - 1): + # for t in reversed(range(T - 1)) step to initialize betas[b, t, U-1] + if t >= 0 and t < (T - 1): + # beta[t, U - 1] = beta[t + 1, U - 1] * p(blank | t, U - 1) / exp(sigma) + # this part is the same as regular RNN-T. 
+ betas[offset + t * maxU + U - 1] = ( + betas[offset + (t + 1) * maxU + U - 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t, U - 1, blank_) + - sigma + ) + + # now add the weights from big blanks + for i in range(num_big_blanks): + if t + big_blank_duration[i] < T: + # adding to beta[t, U - 1] of weight (in log domain), + # beta[t + duration, U - 1] * p(big-blank | t, U - 1) / exp(sigma) + betas[offset + t * maxU + U - 1] = rnnt_helper.log_sum_exp( + betas[offset + t * maxU + U - 1], + betas[offset + (t + big_blank_duration[i]) * maxU + U - 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t, U - 1, blank_ - 1 - i) + - sigma, + ) + elif t + big_blank_duration[i] == T and big_blank_duration[i] != 1: + # adding to beta[T - duration, U - 1] of weight (in log domain), + # p(big-blank | T - duration, U - 1) / exp(sigma) + betas[offset + t * maxU + U - 1] = rnnt_helper.log_sum_exp( + betas[offset + t * maxU + U - 1], + logp(denom, acts, maxT, maxU, alphabet_size, b, t, U - 1, blank_ - 1 - i) - sigma, + ) + + elif u < U: + if t == T - 1: + # for u in reversed(range(U - 1)) step to initialize betas[b, T-1, u] + betas[offset + (T - 1) * maxU + u] = ( + betas[offset + (T - 1) * maxU + u + 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, T - 1, u, labels[u]) + - sigma + ) + elif (t >= 0) and (t < T - 1): + # for t in reversed(range(T - 1)) for u in reversed(range(U - 1)) step to compute betas[b, t, u] + no_emit = ( + betas[offset + (t + 1) * maxU + u] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t, u, blank_) + - sigma + ) + emit = ( + betas[offset + t * maxU + u + 1] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t, u, labels[u]) + - sigma + ) + betas[offset + t * maxU + u] = rnnt_helper.log_sum_exp(emit, no_emit) + + # now add the weights from big blanks + for i in range(num_big_blanks): + if t < T - big_blank_duration[i]: + # added weight for the big-blank, + # beta[t + duration, u] * p(big-blank | t, u) / exp(sigma) + big_blank_no_emit = ( + betas[offset + (t + big_blank_duration[i]) * maxU + u] + + logp(denom, acts, maxT, maxU, alphabet_size, b, t, u, blank_ - 1 - i) + - sigma + ) + betas[offset + t * maxU + u] = rnnt_helper.log_sum_exp( + betas[offset + t * maxU + u], big_blank_no_emit + ) + + # sync across all B=b and U=u + cuda.syncthreads() + + # After final sync, betas[b, 0, 0] gives + # log-likelihood of backward pass. + if u == 0: + llBackward[b] = betas[offset] + + +@cuda.jit() +def compute_multiblank_grad_kernel( + grads: torch.Tensor, + acts: torch.Tensor, + denom: torch.Tensor, + sigma: float, + alphas: torch.Tensor, + betas: torch.Tensor, + logll: torch.Tensor, + xlen: torch.Tensor, + ylen: torch.Tensor, + mlabels: torch.Tensor, # [B, U] + minibatch: int, + maxT: int, + maxU: int, + alphabet_size: int, + blank_: int, + big_blank_duration: torch.Tensor, + num_big_blanks: int, + fastemit_lambda: float, + clamp: float, +): + """ + Compute gradients for multi-blank transducer loss (https://arxiv.org/pdf/2211.03541). + + Args: + grads: Zero Tensor of shape [B, T, U, V + 1 + num_big_blanks]. Is updated by this kernel to contain the gradients + of this batch of samples. + acts: Tensor of shape [B, T, U, V + 1 + num_big_blanks] flattened. Represents the logprobs activation tensor. + denom: Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor + across entire vocabulary. + sigma: Hyper-parameter for logit-undernormalization technique for training multi-blank transducers. 
+ alphas: Alpha variable, contains forward probabilities. A tensor of shape [B, T, U]. + betas: Beta varoable, contains backward probabilities. A tensor of shape [B, T, U]. + logll: Log-likelihood of the forward variable, represented as a vector of shape [B]. + Represents the log-likelihood of the forward pass. + xlen: Vector of length B which contains the actual acoustic sequence lengths in the padded + activation tensor. + ylen: Vector of length B which contains the actual target sequence lengths in the padded + activation tensor. + mlabels: Matrix of shape [B, U+1] (+1 here is due to token - usually the RNNT blank). + The matrix contains the padded target transcription that must be predicted. + minibatch: Int representing the batch size. + maxT: The maximum possible acoustic sequence length. Represents T in the logprobs tensor. + maxU: The maximum possible target sequence length. Represents U in the logprobs tensor. + alphabet_size: The vocabulary dimension V+1 (inclusive of RNNT blank). + blank_: Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab. + fastemit_lambda: Float scaling factor for FastEmit regularization. Refer to + FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization. + clamp: Float value. When set to value >= 0.0, will clamp the gradient to [-clamp, clamp]. + big_blank_durations: Vector of supported big blank durations of the model. + num_big_blanks: Number of big blanks of the model. + + Updates: + Kernel inplace updates the following inputs: + - grads: Gradients with respect to the log likelihood (logll). + """ + # Kernel call: + # blocks_per_grid = minibatch (b) * maxT (t) * maxU (u) + # threads_per_block = constant buffer size of parallel threads (v :: Constant) + tid = cuda.threadIdx.x # represents v, taking steps of some constant size + idx = tid # index of v < V+1; in steps of constant buffer size + col = cuda.blockIdx.x # represents a fused index of b * t * u + + # Decompose original indices from fused `col` + u = col % maxU # (b * t * u) % u = u + bt = (col - u) // maxU # (b * t * u - u) // U = b * t + t = bt % maxT # (b * t) % t = t + mb = (bt - t) // maxT # (b * t - t) // T = b + + # constants + T = xlen[mb] # select AM length of current sample + U = ylen[mb] + 1 # select target length of current sample, +1 for the blank token + labels: torch.Tensor = mlabels[mb] # labels = mlabels + mb * (maxU - 1); + + # Buffered gradient calculations, broadcast across B=b, T=t and U=u, looped over V with some constant stride. + # Look up gradient calculation from rnnt_numpy.compute_gradient() + if t < T and u < U: + # For cuda kernels, maximum number of threads per block is limited to some value. + # However, it may be the case that vocabulary size is larger than this limit + # To work around this, an arbitrary thread buffer size is chosen such that, + # 1) each element within the thread pool operates independently of the other + # 2) An inner while loop moves the index of each buffer element by the size of the buffer itself, + # such that all elements of the vocabulary size are covered in (V + 1 // thread_buffer) number of steps. 
+ # As such, each thread will perform the while loop at least (V + 1 // thread_buffer) number of times + while idx < alphabet_size: + # remember, `col` represents the tri-index [b, t, u] + # therefore; logpk = denom[b, t, u] + acts[b, t, u, v] + logpk = denom[col] + acts[col * alphabet_size + idx] + # initialize the grad of the sample acts[b, t, u, v] + grad = math.exp(alphas[col] + betas[col] + logpk - logll[mb]) + + # In all of the following computation, whenever logpk is used, we + # need to subtract sigma based on our derivation of the gradient of + # the logit under-normalization method. + + # If FastEmit regularization is enabled, calculate the gradeint of probability of predicting the next label + # at the current timestep. + # The formula for this is Equation 9 in https://arxiv.org/abs/2010.11148, multiplied by the log probability + # of the current step (t, u), normalized by the total log likelihood. + # Once the gradient has been calculated, scale it by `fastemit_lambda`, as in Equation 10. + if fastemit_lambda > 0.0 and u < U - 1: + fastemit_grad = fastemit_lambda * math.exp( + alphas[col] # alphas(t, u) + + (denom[col] + acts[col * alphabet_size + labels[u]]) + + betas[col + 1] # betas(t, u+1) + + logpk # log Pr(k|t, u) + - sigma + - logll[mb] # total log likelihood for normalization + ) + else: + fastemit_grad = 0.0 + + # Update the gradient of act[b, t, u, v] with the gradient from FastEmit regularization + grad = grad + fastemit_grad + + # grad to last blank transition + # grad[b, T-1, U-1, v=blank] -= exp(alphas[b, t, u) + logpk - sigma - logll[b]) + if (idx == blank_) and (t == T - 1) and (u == U - 1): + grad -= math.exp(alphas[col] + logpk - sigma - logll[mb]) + else: + # this is one difference of the multi-blank gradient from standard RNN-T + # gradient -- basically, wherever the blank_ symbol is addressed in the + # original code, we need to do similar things to big blanks, and we need + # to change the if conditions to match the duration of the big-blank. + # grad[b, T-duration, U-1, v=big-blank] -= exp(alphas[b, t, u) + logpk - sigma - logll[b]) + for i in range(num_big_blanks): + if (idx == blank_ - 1 - i) and (t == T - big_blank_duration[i]) and (u == U - 1): + grad -= math.exp(alphas[col] + logpk - sigma - logll[mb]) + + # grad of blank across t < T; + # grad[b, t 0.0: + g = grads[col * alphabet_size + idx] + g = min(g, clamp) + g = max(g, -clamp) + grads[col * alphabet_size + idx] = g + + # update internal index through the thread_buffer; + # until idx < V + 1, such that entire vocabulary has been updated. + idx += GPU_RNNT_THREAD_SIZE diff --git a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py index 5c0e15b8d470..a48af434bf2b 100644 --- a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py @@ -1446,6 +1446,742 @@ def _get_initial_states(self, batchsize): return input_states +class GreedyMultiblankRNNTInfer(GreedyRNNTInfer): + """A greedy transducer decoder for multi-blank RNN-T. + + Sequence level greedy decoding, performed auto-repressively. + + Args: + decoder_model: rnnt_utils.AbstractRNNTDecoder implementation. + joint_model: rnnt_utils.AbstractRNNTJoint implementation. + blank_index: int index of the blank token. Must be len(vocabulary) for multi-blank RNNTs. + big_blank_durations: a list containing durations for big blanks the model supports. + max_symbols_per_step: Optional int. 
The maximum number of symbols that can be added + to a sequence in a single time step; if set to None then there is + no limit. + preserve_alignments: Bool flag which preserves the history of alignments generated during + greedy decoding (sample / batched). When set to true, the Hypothesis will contain + the non-null value for `alignments` in it. Here, `alignments` is a List of List of + Tuple(Tensor (of length V + 1 + num-big-blanks), Tensor(scalar, label after argmax)). + The length of the list corresponds to the Acoustic Length (T). + Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more targets from a vocabulary. + U is the number of target tokens for the current timestep Ti. + preserve_frame_confidence: Bool flag which preserves the history of per-frame confidence scores generated + during greedy decoding (sample / batched). When set to true, the Hypothesis will contain + the non-null value for `frame_confidence` in it. Here, `frame_confidence` is a List of List of floats. + The length of the list corresponds to the Acoustic Length (T). + Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. + U is the number of target tokens for the current timestep Ti. + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame + confidence scores. + name: The method name (str). + Supported values: + - 'max_prob' for using the maximum token probability as a confidence. + - 'entropy' for using normalized entropy of a log-likelihood vector. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. + Supported values: + - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided, + the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). + Note that for this entropy, the temperature should comply the following inequality: + 1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size. + - 'tsallis' for the Tsallis entropy with the Boltzmann constant one. + Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)), + where α is a parameter. When α == 1, it works like the Gibbs entropy. + More: https://en.wikipedia.org/wiki/Tsallis_entropy + - 'renui' for the Rényi entropy. + Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)), + where α is a parameter. When α == 1, it works like the Gibbs entropy. + More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy + temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0. + When the temperature equals one, scaling is not applied to 'max_prob', + and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i)) + entropy_norm: A mapping of the entropy value to the interval [0,1]. + Supported values: + - 'lin' for using the linear mapping. + - 'exp' for using exponential mapping with linear shift. 
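Editorial toy sketch, not part of the patch: how the greedy loop below turns an argmax index into a skip duration (mirroring big_blank_durations[self._blank_index - k - 1]) and then skips frames. Hypothetical sizes: 10 real tokens, big blanks of durations [2, 4, 8] at indices 10-12, standard blank at index 13; the toy assumes each decoded frame immediately emits some kind of blank.

    blank_index, durations = 13, [2, 4, 8]

    def blank_duration(k):
        # Big blanks occupy the indices just below the standard blank.
        if blank_index - len(durations) <= k < blank_index:
            return durations[blank_index - k - 1]   # e.g. k=12 -> 2, k=11 -> 4, k=10 -> 8
        return 1                                    # standard blank: advance one frame as usual

    emitted, t, decoded_frames = [12, 13, 10, 13], 0, []
    for k in emitted:
        decoded_frames.append(t)
        t += blank_duration(k)                      # a big blank of duration d skips the next d - 1 frames
    print(decoded_frames)                           # -> [0, 2, 3, 11]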
+ """ + + def __init__( + self, + decoder_model: rnnt_abstract.AbstractRNNTDecoder, + joint_model: rnnt_abstract.AbstractRNNTJoint, + blank_index: int, + big_blank_durations: list, + max_symbols_per_step: Optional[int] = None, + preserve_alignments: bool = False, + preserve_frame_confidence: bool = False, + confidence_method_cfg: Optional[DictConfig] = None, + ): + super().__init__( + decoder_model=decoder_model, + joint_model=joint_model, + blank_index=blank_index, + max_symbols_per_step=max_symbols_per_step, + preserve_alignments=preserve_alignments, + preserve_frame_confidence=preserve_frame_confidence, + confidence_method_cfg=confidence_method_cfg, + ) + self.big_blank_durations = big_blank_durations + self._SOS = blank_index - len(big_blank_durations) + + @torch.no_grad() + def _greedy_decode( + self, x: torch.Tensor, out_len: torch.Tensor, partial_hypotheses: Optional[rnnt_utils.Hypothesis] = None + ): + # x: [T, 1, D] + # out_len: [seq_len] + + # Initialize blank state and empty label set in Hypothesis + hypothesis = rnnt_utils.Hypothesis(score=0.0, y_sequence=[], dec_state=None, timestep=[], last_token=None) + + if partial_hypotheses is not None: + hypothesis.last_token = partial_hypotheses.last_token + hypothesis.y_sequence = ( + partial_hypotheses.y_sequence.cpu().tolist() + if isinstance(partial_hypotheses.y_sequence, torch.Tensor) + else partial_hypotheses.y_sequence + ) + if partial_hypotheses.dec_state is not None: + hypothesis.dec_state = self.decoder.batch_concat_states([partial_hypotheses.dec_state]) + hypothesis.dec_state = _states_to_device(hypothesis.dec_state, x.device) + + if self.preserve_alignments: + # Alignments is a 2-dimensional dangling list representing T x U + hypothesis.alignments = [[]] + + if self.preserve_frame_confidence: + hypothesis.frame_confidence = [[]] + + # if this variable is > 1, it means the last emission was a big-blank and we need to skip frames. + big_blank_duration = 1 + + # For timestep t in X_t + for time_idx in range(out_len): + if big_blank_duration > 1: + # skip frames until big_blank_duration == 1. + big_blank_duration -= 1 + continue + # Extract encoder embedding at timestep t + # f = x[time_idx, :, :].unsqueeze(0) # [1, 1, D] + f = x.narrow(dim=0, start=time_idx, length=1) + + # Setup exit flags and counter + not_blank = True + symbols_added = 0 + + # While blank is not predicted, or we dont run out of max symbols per timestep + while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): + # In the first timestep, we initialize the network with RNNT Blank + # In later timesteps, we provide previous predicted label as input. + if hypothesis.last_token is None and hypothesis.dec_state is None: + last_label = self._SOS + else: + last_label = label_collate([[hypothesis.last_token]]) + + # Perform prediction network and joint network steps. + g, hidden_prime = self._pred_step(last_label, hypothesis.dec_state) + # If preserving per-frame confidence, log_normalize must be true + logp = self._joint_step(f, g, log_normalize=True if self.preserve_frame_confidence else None)[ + 0, 0, 0, : + ] + + del g + + # torch.max(0) op doesnt exist for FP 16. + if logp.dtype != torch.float32: + logp = logp.float() + + # get index k, of max prob + v, k = logp.max(0) + k = k.item() # K is the label at timestep t_s in inner loop, s >= 0. + + # Note, we have non-blanks in the vocab first, followed by big blanks, and standard blank at last. + # here we check if it's a big blank and if yes, set the duration variable. 
+ if k >= self._blank_index - len(self.big_blank_durations) and k < self._blank_index: + big_blank_duration = self.big_blank_durations[self._blank_index - k - 1] + + if self.preserve_alignments: + # insert logprobs into last timestep + hypothesis.alignments[-1].append((logp.to('cpu'), torch.tensor(k, dtype=torch.int32))) + + if self.preserve_frame_confidence: + # insert confidence into last timestep + hypothesis.frame_confidence[-1].append(self._get_confidence(logp)) + + del logp + + # If any type of blank token is predicted, exit inner loop, move onto next timestep t + if k >= self._blank_index - len(self.big_blank_durations): + not_blank = False + + if self.preserve_alignments: + # convert Ti-th logits into a torch array + hypothesis.alignments.append([]) # blank buffer for next timestep + + if self.preserve_frame_confidence: + hypothesis.frame_confidence.append([]) # blank buffer for next timestep + else: + # Append token to label set, update RNN state. + hypothesis.y_sequence.append(k) + hypothesis.score += float(v) + hypothesis.timestep.append(time_idx) + hypothesis.dec_state = hidden_prime + hypothesis.last_token = k + + # Increment token counter. + symbols_added += 1 + + # Remove trailing empty list of Alignments + if self.preserve_alignments: + if len(hypothesis.alignments[-1]) == 0: + del hypothesis.alignments[-1] + + # Remove trailing empty list of per-frame confidence + if self.preserve_frame_confidence: + if len(hypothesis.frame_confidence[-1]) == 0: + del hypothesis.frame_confidence[-1] + + # Unpack the hidden states + hypothesis.dec_state = self.decoder.batch_select_state(hypothesis.dec_state, 0) + + return hypothesis + + +class GreedyBatchedMultiblankRNNTInfer(GreedyBatchedRNNTInfer): + """A batch level greedy transducer decoder. + Batch level greedy decoding, performed auto-repressively. + Args: + decoder_model: rnnt_utils.AbstractRNNTDecoder implementation. + joint_model: rnnt_utils.AbstractRNNTJoint implementation. + blank_index: int index of the blank token. Must be len(vocabulary) for multi-blank RNNTs. + big_blank_durations: a list containing durations for big blanks the model supports. + max_symbols_per_step: Optional int. The maximum number of symbols that can be added + to a sequence in a single time step; if set to None then there is + no limit. + preserve_alignments: Bool flag which preserves the history of alignments generated during + greedy decoding (sample / batched). When set to true, the Hypothesis will contain + the non-null value for `alignments` in it. Here, `alignments` is a List of List of + Tuple(Tensor (of length V + 1 + num-big-blanks), Tensor(scalar, label after argmax)). + The length of the list corresponds to the Acoustic Length (T). + Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more targets from a vocabulary. + U is the number of target tokens for the current timestep Ti. + preserve_frame_confidence: Bool flag which preserves the history of per-frame confidence scores generated + during greedy decoding (sample / batched). When set to true, the Hypothesis will contain + the non-null value for `frame_confidence` in it. Here, `frame_confidence` is a List of List of floats. + The length of the list corresponds to the Acoustic Length (T). + Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. + U is the number of target tokens for the current timestep Ti. + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame + confidence scores. 
+ name: The method name (str). + Supported values: + - 'max_prob' for using the maximum token probability as a confidence. + - 'entropy' for using normalized entropy of a log-likelihood vector. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. + Supported values: + - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided, + the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). + Note that for this entropy, the temperature should comply the following inequality: + 1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size. + - 'tsallis' for the Tsallis entropy with the Boltzmann constant one. + Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)), + where α is a parameter. When α == 1, it works like the Gibbs entropy. + More: https://en.wikipedia.org/wiki/Tsallis_entropy + - 'renui' for the Rényi entropy. + Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)), + where α is a parameter. When α == 1, it works like the Gibbs entropy. + More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy + temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0. + When the temperature equals one, scaling is not applied to 'max_prob', + and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i)) + entropy_norm: A mapping of the entropy value to the interval [0,1]. + Supported values: + - 'lin' for using the linear mapping. + - 'exp' for using exponential mapping with linear shift. + """ + + def __init__( + self, + decoder_model: rnnt_abstract.AbstractRNNTDecoder, + joint_model: rnnt_abstract.AbstractRNNTJoint, + blank_index: int, + big_blank_durations: List[int], + max_symbols_per_step: Optional[int] = None, + preserve_alignments: bool = False, + preserve_frame_confidence: bool = False, + confidence_method_cfg: Optional[DictConfig] = None, + ): + super().__init__( + decoder_model=decoder_model, + joint_model=joint_model, + blank_index=blank_index, + max_symbols_per_step=max_symbols_per_step, + preserve_alignments=preserve_alignments, + preserve_frame_confidence=preserve_frame_confidence, + confidence_method_cfg=confidence_method_cfg, + ) + self.big_blank_durations = big_blank_durations + + # Depending on availability of `blank_as_pad` support + # switch between more efficient batch decoding technique + if self.decoder.blank_as_pad: + self._greedy_decode = self._greedy_decode_blank_as_pad + else: + self._greedy_decode = self._greedy_decode_masked + self._SOS = blank_index - len(big_blank_durations) + + def _greedy_decode_blank_as_pad( + self, + x: torch.Tensor, + out_len: torch.Tensor, + device: torch.device, + partial_hypotheses: Optional[List[rnnt_utils.Hypothesis]] = None, + ): + if partial_hypotheses is not None: + raise NotImplementedError("`partial_hypotheses` support is not supported") + + with torch.inference_mode(): + # x: [B, T, D] + # out_len: [B] + # device: torch.device + + # Initialize list of Hypothesis + batchsize = x.shape[0] + hypotheses = [ + rnnt_utils.Hypothesis(score=0.0, y_sequence=[], timestep=[], dec_state=None) for _ in range(batchsize) + ] + + # Initialize Hidden state matrix (shared by entire batch) + hidden = None + + # If alignments need to be preserved, register a danling list to hold the values + if self.preserve_alignments: + # alignments is a 3-dimensional dangling list representing B x T x U + for hyp in hypotheses: + hyp.alignments = [[]] + + # If confidence scores need to be 
preserved, register a danling list to hold the values + if self.preserve_frame_confidence: + # frame_confidence is a 3-dimensional dangling list representing B x T x U + for hyp in hypotheses: + hyp.frame_confidence = [[]] + hyp.y_3best = [[]] + hyp.frame_confidence_3best = [[[]]] + hyp.logp = [[]] + + # Last Label buffer + Last Label without blank buffer + # batch level equivalent of the last_label + last_label = torch.full([batchsize, 1], fill_value=self._SOS, dtype=torch.long, device=device) + + # this mask is true for if the emission is *any type* of blank. + blank_mask = torch.full([batchsize], fill_value=0, dtype=torch.bool, device=device) + + # Get max sequence length + max_out_len = out_len.max() + + # We have a mask for each big blank. A mask is "true" means: the previous emission is exactly the big-blank + # with the corresponding duration, or has larger duration. E.g., for big_blank_mask for duration 2, it will + # be set true if the previous emission was a big blank with duration 4, or 3 or 2; but false if prevoius + # emission was a standard blank (with duration = 1). + big_blank_masks = [torch.full([batchsize], fill_value=0, dtype=torch.bool, device=device)] * len( + self.big_blank_durations + ) + + # if this variable > 1, it means the previous emission was big-blank and we need to skip frames. + big_blank_duration = 1 + + for time_idx in range(max_out_len): + if big_blank_duration > 1: + # skip frames until big_blank_duration == 1 + big_blank_duration -= 1 + continue + f = x.narrow(dim=1, start=time_idx, length=1) # [B, 1, D] + + # Prepare t timestamp batch variables + not_blank = True + symbols_added = 0 + + # Reset all blank masks + blank_mask.mul_(False) + for i in range(len(big_blank_masks)): + big_blank_masks[i].mul_(False) + + # Update blank mask with time mask + # Batch: [B, T, D], but Bi may have seq len < max(seq_lens_in_batch) + # Forcibly mask with "blank" tokens, for all sample where current time step T > seq_len + blank_mask = time_idx >= out_len + for i in range(len(big_blank_masks)): + big_blank_masks[i] = time_idx >= out_len + + # Start inner loop + while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): + # Batch prediction and joint network steps + # If very first prediction step, submit SOS tag (blank) to pred_step. + # This feeds a zero tensor as input to AbstractRNNTDecoder to prime the state + if time_idx == 0 and symbols_added == 0 and hidden is None: + g, hidden_prime = self._pred_step(self._SOS, hidden, batch_size=batchsize) + else: + # Perform batch step prediction of decoder, getting new states and scores ("g") + g, hidden_prime = self._pred_step(last_label, hidden, batch_size=batchsize) + + # Batched joint step - Output = [B, V + 1 + num-big-blanks] + # If preserving per-frame confidence, log_normalize must be true + logp = self._joint_step(f, g, log_normalize=True if self.preserve_frame_confidence else None)[ + :, 0, 0, : + ] + + if logp.dtype != torch.float32: + logp = logp.float() + + # Get index k, of max prob for batch + v, k = logp.max(1) + del g + + # Update blank mask with current predicted blanks + # This is accumulating blanks over all time steps T and all target steps min(max_symbols, U) + k_is_blank = k >= self._blank_index - len(self.big_blank_durations) + blank_mask.bitwise_or_(k_is_blank) + + for i in range(len(big_blank_masks)): + # using <= since as we mentioned before, the mask doesn't store exact matches. 
+ # instead, it is True when the predicted blank's duration is >= the duration that the + # mask corresponds to. + k_is_big_blank = k <= self._blank_index - 1 - i + + # need to do a bitwise_and since it could also be a non-blank. + k_is_big_blank.bitwise_and_(k_is_blank) + big_blank_masks[i].bitwise_or_(k_is_big_blank) + + del k_is_blank + + # If preserving alignments, check if sequence length of sample has been reached + # before adding alignment + if self.preserve_alignments: + # Insert logprobs into last timestep per sample + logp_vals = logp.to('cpu') + logp_ids = logp_vals.max(1)[1] + for batch_idx in range(batchsize): + if time_idx < out_len[batch_idx]: + hypotheses[batch_idx].alignments[-1].append( + (logp_vals[batch_idx], logp_ids[batch_idx]) + ) + del logp_vals + + # If preserving per-frame confidence, check if sequence length of sample has been reached + # before adding confidence scores + if self.preserve_frame_confidence: + # Insert probabilities into last timestep per sample + confidence = self._get_confidence(logp) + for batch_idx in range(batchsize): + if time_idx < out_len[batch_idx]: + hypotheses[batch_idx].frame_confidence[-1].append(confidence[batch_idx]) + del logp + + # If all samples predict / have predicted prior blanks, exit loop early + # This is equivalent to if single sample predicted k + if blank_mask.all(): + not_blank = False + + for i in range(len(big_blank_masks) + 1): + # The task here is find the shortest blank duration of all batches. + # so we start from the shortest blank duration and go up, + # and stop once we found the duration whose corresponding mask isn't all True. + if i == len(big_blank_masks) or not big_blank_masks[i].all(): + big_blank_duration = self.big_blank_durations[i - 1] if i > 0 else 1 + break + + # If preserving alignments, convert the current Uj alignments into a torch.Tensor + # Then preserve U at current timestep Ti + # Finally, forward the timestep history to Ti+1 for that sample + # All of this should only be done iff the current time index <= sample-level AM length. + # Otherwise ignore and move to next sample / next timestep. + if self.preserve_alignments: + + # convert Ti-th logits into a torch array + for batch_idx in range(batchsize): + + # this checks if current timestep <= sample-level AM length + # If current timestep > sample-level AM length, no alignments will be added + # Therefore the list of Uj alignments is empty here. 
+ if len(hypotheses[batch_idx].alignments[-1]) > 0: + hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep + + # Do the same if preserving per-frame confidence + if self.preserve_frame_confidence: + + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: + hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep + hypotheses[batch_idx].y_3best.append([]) + hypotheses[batch_idx].frame_confidence_3best.append([]) + hypotheses[batch_idx].logp.append([]) + else: + # Collect batch indices where blanks occurred now/past + blank_indices = (blank_mask == 1).nonzero(as_tuple=False) + + # Recover prior state for all samples which predicted blank now/past + if hidden is not None: + # LSTM has 2 states + hidden_prime = self.decoder.batch_copy_states(hidden_prime, hidden, blank_indices) + + elif len(blank_indices) > 0 and hidden is None: + # Reset state if there were some blank and other non-blank predictions in batch + # Original state is filled with zeros so we just multiply + # LSTM has 2 states + hidden_prime = self.decoder.batch_copy_states(hidden_prime, None, blank_indices, value=0.0) + + # Recover prior predicted label for all samples which predicted blank now/past + k[blank_indices] = last_label[blank_indices, 0] + + # Update new label and hidden state for next iteration + last_label = k.clone().view(-1, 1) + hidden = hidden_prime + + # Update predicted labels, accounting for time mask + # If blank was predicted even once, now or in the past, + # Force the current predicted label to also be blank + # This ensures that blanks propogate across all timesteps + # once they have occured (normally stopping condition of sample level loop). + for kidx, ki in enumerate(k): + if blank_mask[kidx] == 0: + hypotheses[kidx].y_sequence.append(ki) + hypotheses[kidx].timestep.append(time_idx) + hypotheses[kidx].score += float(v[kidx]) + + symbols_added += 1 + + # Remove trailing empty list of alignments at T_{am-len} x Uj + if self.preserve_alignments: + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].alignments[-1]) == 0: + del hypotheses[batch_idx].alignments[-1] + + # Remove trailing empty list of confidence scores at T_{am-len} x Uj + if self.preserve_frame_confidence: + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) == 0: + del hypotheses[batch_idx].frame_confidence[-1] + del hypotheses[batch_idx].y_3best[-1] + del hypotheses[batch_idx].frame_confidence_3best[-1] + del hypotheses[batch_idx].logp[-1] + + # Preserve states + for batch_idx in range(batchsize): + hypotheses[batch_idx].dec_state = self.decoder.batch_select_state(hidden, batch_idx) + + return hypotheses + + def _greedy_decode_masked( + self, + x: torch.Tensor, + out_len: torch.Tensor, + device: torch.device, + partial_hypotheses: Optional[List[rnnt_utils.Hypothesis]] = None, + ): + if partial_hypotheses is not None: + raise NotImplementedError("`partial_hypotheses` support is not supported") + + if self.big_blank_durations != [1] * len(self.big_blank_durations): + raise NotImplementedError( + "Efficient frame-skipping version for multi-blank masked decoding is not supported." 
+ ) + + # x: [B, T, D] + # out_len: [B] + # device: torch.device + + # Initialize state + batchsize = x.shape[0] + hypotheses = [ + rnnt_utils.Hypothesis(score=0.0, y_sequence=[], timestep=[], dec_state=None) for _ in range(batchsize) + ] + + # Initialize Hidden state matrix (shared by entire batch) + hidden = None + + # If alignments need to be preserved, register a danling list to hold the values + if self.preserve_alignments: + # alignments is a 3-dimensional dangling list representing B x T x U + for hyp in hypotheses: + hyp.alignments = [[]] + else: + hyp.alignments = None + + # If confidence scores need to be preserved, register a danling list to hold the values + if self.preserve_frame_confidence: + # frame_confidence is a 3-dimensional dangling list representing B x T x U + for hyp in hypotheses: + hyp.frame_confidence = [[]] + + # Last Label buffer + Last Label without blank buffer + # batch level equivalent of the last_label + last_label = torch.full([batchsize, 1], fill_value=self._blank_index, dtype=torch.long, device=device) + last_label_without_blank = last_label.clone() + + # Mask buffers + blank_mask = torch.full([batchsize], fill_value=0, dtype=torch.bool, device=device) + + # Get max sequence length + max_out_len = out_len.max() + + with torch.inference_mode(): + for time_idx in range(max_out_len): + f = x.narrow(dim=1, start=time_idx, length=1) # [B, 1, D] + + # Prepare t timestamp batch variables + not_blank = True + symbols_added = 0 + + # Reset blank mask + blank_mask.mul_(False) + + # Update blank mask with time mask + # Batch: [B, T, D], but Bi may have seq len < max(seq_lens_in_batch) + # Forcibly mask with "blank" tokens, for all sample where current time step T > seq_len + blank_mask = time_idx >= out_len + + # Start inner loop + while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): + # Batch prediction and joint network steps + # If very first prediction step, submit SOS tag (blank) to pred_step. 
+ # This feeds a zero tensor as input to AbstractRNNTDecoder to prime the state + if time_idx == 0 and symbols_added == 0 and hidden is None: + g, hidden_prime = self._pred_step(self._SOS, hidden, batch_size=batchsize) + else: + # Set a dummy label for the blank value + # This value will be overwritten by "blank" again the last label update below + # This is done as vocabulary of prediction network does not contain "blank" token of RNNT + last_label_without_blank_mask = last_label >= self._blank_index + last_label_without_blank[last_label_without_blank_mask] = 0 # temp change of label + last_label_without_blank[~last_label_without_blank_mask] = last_label[ + ~last_label_without_blank_mask + ] + + # Perform batch step prediction of decoder, getting new states and scores ("g") + g, hidden_prime = self._pred_step(last_label_without_blank, hidden, batch_size=batchsize) + + # Batched joint step - Output = [B, V + 1 + num-big-blanks] + # If preserving per-frame confidence, log_normalize must be true + logp = self._joint_step(f, g, log_normalize=True if self.preserve_frame_confidence else None)[ + :, 0, 0, : + ] + + if logp.dtype != torch.float32: + logp = logp.float() + + # Get index k, of max prob for batch + v, k = logp.max(1) + del g + + # Update blank mask with current predicted blanks + # This is accumulating blanks over all time steps T and all target steps min(max_symbols, U) + k_is_blank = k == self._blank_index + blank_mask.bitwise_or_(k_is_blank) + + # If preserving alignments, check if sequence length of sample has been reached + # before adding alignment + if self.preserve_alignments: + # Insert logprobs into last timestep per sample + logp_vals = logp.to('cpu') + logp_ids = logp_vals.max(1)[1] + for batch_idx in range(batchsize): + if time_idx < out_len[batch_idx]: + hypotheses[batch_idx].alignments[-1].append( + (logp_vals[batch_idx], logp_ids[batch_idx]) + ) + del logp_vals + + # If preserving per-frame confidence, check if sequence length of sample has been reached + # before adding confidence scores + if self.preserve_frame_confidence: + # Insert probabilities into last timestep per sample + confidence = self._get_confidence(logp) + for batch_idx in range(batchsize): + if time_idx < out_len[batch_idx]: + hypotheses[batch_idx].frame_confidence[-1].append(confidence[batch_idx]) + del logp + + # If all samples predict / have predicted prior blanks, exit loop early + # This is equivalent to if single sample predicted k + if blank_mask.all(): + not_blank = False + + # If preserving alignments, convert the current Uj alignments into a torch.Tensor + # Then preserve U at current timestep Ti + # Finally, forward the timestep history to Ti+1 for that sample + # All of this should only be done iff the current time index <= sample-level AM length. + # Otherwise ignore and move to next sample / next timestep. + if self.preserve_alignments: + + # convert Ti-th logits into a torch array + for batch_idx in range(batchsize): + + # this checks if current timestep <= sample-level AM length + # If current timestep > sample-level AM length, no alignments will be added + # Therefore the list of Uj alignments is empty here. 
+ if len(hypotheses[batch_idx].alignments[-1]) > 0: + hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep + + # Do the same if preserving per-frame confidence + if self.preserve_frame_confidence: + + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: + hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep + else: + # Collect batch indices where blanks occurred now/past + blank_indices = (blank_mask == 1).nonzero(as_tuple=False) + + # Recover prior state for all samples which predicted blank now/past + if hidden is not None: + # LSTM has 2 states + hidden_prime = self.decoder.batch_copy_states(hidden_prime, hidden, blank_indices) + + elif len(blank_indices) > 0 and hidden is None: + # Reset state if there were some blank and other non-blank predictions in batch + # Original state is filled with zeros so we just multiply + # LSTM has 2 states + hidden_prime = self.decoder.batch_copy_states(hidden_prime, None, blank_indices, value=0.0) + + # Recover prior predicted label for all samples which predicted blank now/past + k[blank_indices] = last_label[blank_indices, 0] + + # Update new label and hidden state for next iteration + last_label = k.view(-1, 1) + hidden = hidden_prime + + # Update predicted labels, accounting for time mask + # If blank was predicted even once, now or in the past, + # Force the current predicted label to also be blank + # This ensures that blanks propogate across all timesteps + # once they have occured (normally stopping condition of sample level loop). + for kidx, ki in enumerate(k): + if blank_mask[kidx] == 0: + hypotheses[kidx].y_sequence.append(ki) + hypotheses[kidx].timestep.append(time_idx) + hypotheses[kidx].score += float(v[kidx]) + + symbols_added += 1 + + # Remove trailing empty list of alignments at T_{am-len} x Uj + if self.preserve_alignments: + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].alignments[-1]) == 0: + del hypotheses[batch_idx].alignments[-1] + + # Remove trailing empty list of confidence scores at T_{am-len} x Uj + if self.preserve_frame_confidence: + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) == 0: + del hypotheses[batch_idx].frame_confidence[-1] + + # Preserve states + for batch_idx in range(batchsize): + hypotheses[batch_idx].dec_state = self.decoder.batch_select_state(hidden, batch_idx) + + return hypotheses + + @dataclass class GreedyRNNTInferConfig: max_symbols_per_step: Optional[int] = 10 diff --git a/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py b/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py index e1b0ce314b38..386f349ac139 100644 --- a/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py +++ b/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py @@ -12,13 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+import random + import numpy as np import pytest import torch -from nemo.collections.asr.losses.rnnt import RNNTLossPytorch +from nemo.collections.asr.losses.rnnt import MultiblankRNNTLossPytorch, RNNTLossPytorch from nemo.collections.asr.parts.numba.rnnt_loss.rnnt_numpy import RNNTLoss as RNNTLoss_Numpy -from nemo.collections.asr.parts.numba.rnnt_loss.rnnt_pytorch import RNNTLossNumba +from nemo.collections.asr.parts.numba.rnnt_loss.rnnt_pytorch import MultiblankRNNTLossNumba, RNNTLossNumba from nemo.core.utils import numba_utils from nemo.core.utils.numba_utils import __NUMBA_MINIMUM_VERSION__ @@ -458,5 +460,39 @@ def zero_grad(): assert np.allclose(pt_grads1_p_2, np_grads1 + np_grads2, atol=1e-5) +class TestMultiblankRNNTLoss: + @pytest.mark.unit + @pytest.mark.parametrize('device', DEVICES) + def test_case_randomized_act_label(self, device): + if device == 'cuda': + numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) + + B, T, U, V = 4, 8, 4, 8 # here V is number of non blank labels + big_blank_durations = [2, 4, 8] + sigma = 0.1 + + acts = torch.rand([B, T, U, V + 1 + len(big_blank_durations)]) + labels = [[random.randrange(0, V) for i in range(U - 1)] for j in range(B)] + + fn_pt = MultiblankRNNTLossNumba( + blank=V + len(big_blank_durations), + reduction='sum', + big_blank_durations=big_blank_durations, + sigma=sigma, + ) + pt_cost, pt_grads = wrap_and_call(fn_pt, acts, labels, device) + + fn_ag = MultiblankRNNTLossPytorch( + blank=V + len(big_blank_durations), + reduction='sum', + big_blank_durations=big_blank_durations, + sigma=sigma, + ) # ag for automatic gradient computation + ag_cost, ag_grads = wrap_and_call(fn_ag, acts, labels, device) + + assert np.allclose(pt_cost, ag_cost, rtol=1e-6), "multi-blank costs mismatch." + assert np.allclose(pt_grads, ag_grads, rtol=1e-2), "multi-blank gradient mismatch." 
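+        # Layout assumed above (following the convention noted in the multi-blank loss/decoding code):
+        # each (b, t, u) slice of `acts` holds V + 1 + len(big_blank_durations) = 8 + 1 + 3 = 12 logits,
+        # with the V real tokens first, the big blanks next, and the standard blank last, which is why
+        # blank=V + len(big_blank_durations) is passed to both loss implementations.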
+ + if __name__ == "__main__": pytest.main([__file__]) diff --git a/tests/collections/asr/test_asr_rnnt_encdec_model.py b/tests/collections/asr/test_asr_rnnt_encdec_model.py index 1ad46d4af37a..7c023802ab6c 100644 --- a/tests/collections/asr/test_asr_rnnt_encdec_model.py +++ b/tests/collections/asr/test_asr_rnnt_encdec_model.py @@ -317,6 +317,94 @@ def test_greedy_decoding(self, greedy_class): with torch.no_grad(): _ = greedy(encoder_output=enc_out, encoded_lengths=enc_len) + @pytest.mark.skipif( + not NUMBA_RNNT_LOSS_AVAILABLE, reason='RNNTLoss has not been compiled with appropriate numba version.', + ) + @pytest.mark.unit + @pytest.mark.parametrize( + "greedy_class", [greedy_decode.GreedyMultiblankRNNTInfer, greedy_decode.GreedyBatchedMultiblankRNNTInfer], + ) + def test_multiblank_rnnt_greedy_decoding(self, greedy_class): + token_list = [" ", "a", "b", "c"] + vocab_size = len(token_list) + big_blank_durations = [2, 4] + + encoder_output_size = 4 + decoder_output_size = 4 + joint_output_shape = 4 + + prednet_cfg = {'pred_hidden': decoder_output_size, 'pred_rnn_layers': 1} + jointnet_cfg = { + 'encoder_hidden': encoder_output_size, + 'pred_hidden': decoder_output_size, + 'joint_hidden': joint_output_shape, + 'activation': 'relu', + } + + decoder = RNNTDecoder(prednet_cfg, vocab_size) + joint_net = RNNTJoint( + jointnet_cfg, vocab_size, vocabulary=token_list, num_extra_outputs=len(big_blank_durations) + ) + + greedy = greedy_class( + decoder, + joint_net, + blank_index=len(token_list), + big_blank_durations=big_blank_durations, + max_symbols_per_step=5, + ) + + # (B, D, T) + enc_out = torch.randn(1, encoder_output_size, 30) + enc_len = torch.tensor([30], dtype=torch.int32) + + with torch.no_grad(): + _ = greedy(encoder_output=enc_out, encoded_lengths=enc_len) + @pytest.mark.skipif( + not NUMBA_RNNT_LOSS_AVAILABLE, reason='RNNTLoss has not been compiled with appropriate numba version.', + ) From b71836badb335cbf87338ed8b71d40699aa5f59d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 20 Dec 2022 22:28:15 -0800 Subject: [PATCH 089/154] [TTS][ZH] fix broken link for the script. (#5680) * change to main branch.
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- tutorials/tts/FastPitch_ChineseTTS_Training.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb index 49455b1947a3..e0c58fdf6ebd 100644 --- a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb @@ -882,7 +882,7 @@ "- Finetuning with #1 has artifacts from the original audio (noise) that get passed on as input to the vocoder resulting in artifacts in vocoder output in the form of noise.\n", "- On the other hand, #2.1 (i.e. `Mel spectrogram predicted from FastPitch with groundtruth alignment and duration`) gives the best results because it enables HiFi-GAN to learn mel spectrograms generated by FastPitch as well as duration distributions closer to the real world (i.e. ground truth) durations. \n", "\n", - "From implementation perspective - we follow the same process described in [Finetuning FastPitch for a new speaker](FastPitch_Finetuning.ipynb) - i.e. take the latest checkpoint from FastPitch training and predict spectrograms for each of the input records in `train_manifest.json`, `test_manifest.json` and `val_manifest.json`. NeMo provides an efficient script, [scripts/dataset_processing/tts/generate_mels.py](https://raw.githubusercontent.com/nvidia/NeMo/$BRANCH/scripts/dataset_processing/tts/generate_mels.py), to generate Mel-spectrograms in the directory `NeMoChineseTTS/mels` and also create new JSON manifests with a suffix `_mel` by adding a new key `\"mel_filepath\"`. For example, `train_manifest.json` corresponds to `train_manifest_mel.json` saved in the same directory. You can run the following CLI to obtain the new JSON manifests." + "From implementation perspective - we follow the same process described in [Finetuning FastPitch for a new speaker](FastPitch_Finetuning.ipynb) - i.e. take the latest checkpoint from FastPitch training and predict spectrograms for each of the input records in `train_manifest.json`, `test_manifest.json` and `val_manifest.json`. NeMo provides an efficient script, [scripts/dataset_processing/tts/generate_mels.py](https://raw.githubusercontent.com/nvidia/NeMo/main/scripts/dataset_processing/tts/generate_mels.py), to generate Mel-spectrograms in the directory `NeMoChineseTTS/mels` and also create new JSON manifests with a suffix `_mel` by adding a new key `\"mel_filepath\"`. For example, `train_manifest.json` corresponds to `train_manifest_mel.json` saved in the same directory. You can run the following CLI to obtain the new JSON manifests." 
] }, { From effc7f584cebcc2f2d6bb93d163f30e1e7efc719 Mon Sep 17 00:00:00 2001 From: Evelina <10428420+ekmb@users.noreply.github.com> Date: Wed, 21 Dec 2022 13:19:08 -0800 Subject: [PATCH 090/154] [TN/TTS docs] TN customization, g2p docs moved to tts (#5683) * TN customization, g2p docs moved to tts Signed-off-by: ekmb * link new TTS tutorial Signed-off-by: ekmb * combine 3 and 4 Signed-off-by: ekmb * remove note Signed-off-by: ekmb Signed-off-by: ekmb Signed-off-by: Elena Rastorgueva --- .../nlp/text_normalization/wfst/intro.rst | 7 ++++ .../wfst/wfst_customization.rst | 37 ++++++++++++++++++ .../wfst/wfst_resources.rst | 15 +++++++ .../wfst/wfst_text_normalization.rst | 13 ++---- .../wfst/wfst_text_processing_deployment.rst | 2 + docs/source/starthere/tutorials.rst | 9 +++-- docs/source/text_processing/intro.rst | 12 ------ .../text_processing/text_processing_all.bib | 34 ---------------- .../{text_processing/g2p => tts}/g2p.rst | 2 +- .../images/data_labeling_pipeline.png | Bin docs/source/tts/intro.rst | 5 ++- docs/source/tts/tts_all.bib | 35 +++++++++++++++++ 12 files changed, 109 insertions(+), 62 deletions(-) create mode 100644 docs/source/nlp/text_normalization/wfst/wfst_customization.rst create mode 100644 docs/source/nlp/text_normalization/wfst/wfst_resources.rst delete mode 100644 docs/source/text_processing/intro.rst delete mode 100644 docs/source/text_processing/text_processing_all.bib rename docs/source/{text_processing/g2p => tts}/g2p.rst (99%) rename docs/source/{text_processing/g2p => tts}/images/data_labeling_pipeline.png (100%) diff --git a/docs/source/nlp/text_normalization/wfst/intro.rst b/docs/source/nlp/text_normalization/wfst/intro.rst index 316e2a693bf2..05fff8391915 100644 --- a/docs/source/nlp/text_normalization/wfst/intro.rst +++ b/docs/source/nlp/text_normalization/wfst/intro.rst @@ -3,13 +3,20 @@ WFST-based (Inverse) Text Normalization NeMo supports Text Normalization (TN), audio-based TN and Inverse Text Normalization (ITN) tasks. +.. note:: + + *TN/ITN is transitioning from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.* + + WFST-based TN/ITN: .. toctree:: :maxdepth: 2 wfst_text_normalization + wfst_customization wfst_text_processing_deployment + resources diff --git a/docs/source/nlp/text_normalization/wfst/wfst_customization.rst b/docs/source/nlp/text_normalization/wfst/wfst_customization.rst new file mode 100644 index 000000000000..9e30c58b9645 --- /dev/null +++ b/docs/source/nlp/text_normalization/wfst/wfst_customization.rst @@ -0,0 +1,37 @@ +.. _wfst_customization: + +Grammar customization +===================== + +.. note:: + + *TN/ITN is transitioning from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.* + + +All grammar development is done with [Pynini library](https://www.opengrm.org/twiki/bin/view/GRM/Pynini). +These grammars can be exported to .far files and used with Riva/Sparrowhawk, see :doc:`Text Processing Deployment ` for details. + +Steps to customize grammars +--------------------------- + +1. Install [NeMo-TN from source](https://github.com/NVIDIA/NeMo-text-processing#from-source) +2. 
Run [nemo_text_processing/text_normalization/normalize.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize.py) or [nemo_text_processing/inverse_text_normalization/inverse_normalize.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/inverse_text_normalization/inverse_normalize.py) with `--verbose` flag to evaluate current behavior on the target case, see argument details in the scripts and [this tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_(Inverse)_Normalization.ipynb) +3. Modify existing grammars or add new grammars to cover the target case using [Tutorial on how to write new grammars](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb) +4. Add new test cases [here](https://github.com/NVIDIA/NeMo-text-processing/tree/main/tests/nemo_text_processing): + - Run python tests: + + .. code-block:: bash + + (optionally build grammars first and save to CACHE_DIR) + cd tests/nemo_text_processing && + cd pytest /test_*.py --cpu --tn_cache_dir=CACHE_DIR_WITH_FAR_FILES (--run_audio_based flag to also run audio-based TN tests, optional) + + - Run Sparrowhawk tests: + + .. code-block:: bash + + cd tools/text_processing_deployment && + bash export_grammars.sh --GRAMMARS= --LANGUAGE= --MODE=test + + +WFST TN/ITN resources could be found in :doc:`here `. diff --git a/docs/source/nlp/text_normalization/wfst/wfst_resources.rst b/docs/source/nlp/text_normalization/wfst/wfst_resources.rst new file mode 100644 index 000000000000..97870770ffa4 --- /dev/null +++ b/docs/source/nlp/text_normalization/wfst/wfst_resources.rst @@ -0,0 +1,15 @@ +.. _wfst_resources: + +Resources and Documentation +=========================== + +- List of `TN/ITN issues `_, use `TN/ITN` label +- TN/ITN related `discussions `_, use `TN/ITN` label +- Documentation on how to generate `.far files for deployment in Riva (via Sparrowhawk) `_ +- Tutorial that provides an `Overview of NeMo-TN/ITN `_ +- Tutorial on `how to write new grammars `_ in `Pynini `_ + + + + + diff --git a/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst b/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst index 28e573289642..3abdcdbb2d57 100644 --- a/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst +++ b/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst @@ -162,19 +162,12 @@ Language Support Matrix | Chinese | zh | x | | | | +------------------+----------+----------+----------+--------------------+----------------------+ -Grammar customization ---------------------- -.. note:: - - In-depth walk through `NeMo/tutorials/text_processing/WFST_tutorial.ipynb `__ in `Google's Colab `_. - - -Deploy to C++ ------------------ -See :doc:`Text Procesing Deployment ` for details. +See :doc:`Grammar customization ` for grammar customization details. +See :doc:`Text Processing Deployment ` for deployment in C++ details. +WFST TN/ITN resources could be found in :doc:`here `. 
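+
+For a quick sanity check of grammar behavior from Python, the following is a minimal sketch; the
+``Normalizer`` and ``InverseNormalizer`` classes come from the ``nemo_text_processing`` package, and
+exact import paths and constructor arguments may differ between releases:
+
+.. code-block:: python
+
+    from nemo_text_processing.text_normalization.normalize import Normalizer
+    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
+
+    # TN: convert written form into spoken form for TTS, e.g. "$2.50" -> "two dollars fifty cents"
+    normalizer = Normalizer(input_case='cased', lang='en')
+    print(normalizer.normalize("It costs $2.50.", verbose=False))
+
+    # ITN: convert spoken-style (e.g. ASR) output back into written form
+    inverse_normalizer = InverseNormalizer(lang='en')
+    print(inverse_normalizer.inverse_normalize("it costs two dollars fifty cents", verbose=False))
+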
References ---------- diff --git a/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst b/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst index 34810845b263..188ab81bd2fd 100644 --- a/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst +++ b/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst @@ -90,6 +90,8 @@ Go to script folder: This returns "ITN result: $2.50. TN result: two dollars fifty cents" +See :doc:`WFST Resources ` for more details. + References ---------- diff --git a/docs/source/starthere/tutorials.rst b/docs/source/starthere/tutorials.rst index 7452601bf021..df75194e2238 100644 --- a/docs/source/starthere/tutorials.rst +++ b/docs/source/starthere/tutorials.rst @@ -166,15 +166,18 @@ To run a tutorial: * - TTS - Inference and Model Selection - `TTS Inference and Model Selection `_ + * - TTS + - Pronunciation_customization + - `TTS Pronunciation_customization `_ * - Tools - CTC Segmentation - `CTC Segmentation `_ - * - Text Processing + * - Text Processing (TN/ITN) - Text Normalization and Inverse Normalization for ASR and TTS - `Text Normalization `_ - * - Text Processing + * - Text Processing (TN/ITN) - Inverse Text Normalization for ASR - Thutmose Tagger - `Inverse Text Normalization with Thutmose Tagger `_ - * - Text Processing + * - Text Processing (TN/ITN) - Constructing Normalization Grammars with WFSTs - `WFST Tutorial `_ diff --git a/docs/source/text_processing/intro.rst b/docs/source/text_processing/intro.rst deleted file mode 100644 index 2323063d714c..000000000000 --- a/docs/source/text_processing/intro.rst +++ /dev/null @@ -1,12 +0,0 @@ -NeMo Text Processing -==================== - -NeMo provides a set of models for text processing input and/or output of Automatic Speech Recognitions (ASR) and Text-to-Speech (TTS) synthesis models: \ -`https://github.com/NVIDIA/NeMo/tree/stable/nemo_text_processing `__ . - -.. 
toctree:: - :maxdepth: 1 - - g2p/g2p - - diff --git a/docs/source/text_processing/text_processing_all.bib b/docs/source/text_processing/text_processing_all.bib deleted file mode 100644 index d41deebe83dc..000000000000 --- a/docs/source/text_processing/text_processing_all.bib +++ /dev/null @@ -1,34 +0,0 @@ -@article{xue2021byt5, - title={ByT5: Towards a token-free future with pre-trained byte-to-byte models 2021}, - author={Xue, Linting and Barua, Aditya and Constant, Noah and Al-Rfou, Rami and Narang, Sharan and Kale, Mihir and Roberts, Adam and Raffel, Colin}, - journal={arXiv preprint arXiv:2105.13626}, - year={2021} -} - -@article{vrezavckova2021t5g2p, - title={T5g2p: Using text-to-text transfer transformer for grapheme-to-phoneme conversion}, - author={{\v{R}}ez{\'a}{\v{c}}kov{\'a}, Mark{\'e}ta and {\v{S}}vec, Jan and Tihelka, Daniel}, - year={2021}, - journal={International Speech Communication Association} -} - -@article{zhu2022byt5, - title={ByT5 model for massively multilingual grapheme-to-phoneme conversion}, - author={Zhu, Jian and Zhang, Cong and Jurgens, David}, - journal={arXiv preprint arXiv:2204.03067}, - year={2022} -} - -@article{ggulati2020conformer, - title={Conformer: Convolution-augmented transformer for speech recognition}, - author={Gulati, Anmol and Qin, James and Chiu, Chung-Cheng and Parmar, Niki and Zhang, Yu and Yu, Jiahui and Han, Wei and Wang, Shibo and Zhang, Zhengdong and Wu, Yonghui and others}, - journal={arXiv preprint arXiv:2005.08100}, - year={2020} -} - -@inproceedings{gorman2018improving, - title={Improving homograph disambiguation with supervised machine learning}, - author={Gorman, Kyle and Mazovetskiy, Gleb and Nikolaev, Vitaly}, - booktitle={Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, - year={2018} -} \ No newline at end of file diff --git a/docs/source/text_processing/g2p/g2p.rst b/docs/source/tts/g2p.rst similarity index 99% rename from docs/source/text_processing/g2p/g2p.rst rename to docs/source/tts/g2p.rst index b2d8b27c207d..36a717d3184f 100644 --- a/docs/source/text_processing/g2p/g2p.rst +++ b/docs/source/tts/g2p.rst @@ -203,7 +203,7 @@ G2P requires NeMo NLP and ASR collections installed. See `Installation instructi References ---------- -.. bibliography:: ../text_processing_all.bib ../../tts/tts_all.bib ../../nlp/nlp_all.bib +.. bibliography:: tts_all.bib :style: plain :labelprefix: g2p- :keyprefix: g2p-- diff --git a/docs/source/text_processing/g2p/images/data_labeling_pipeline.png b/docs/source/tts/images/data_labeling_pipeline.png similarity index 100% rename from docs/source/text_processing/g2p/images/data_labeling_pipeline.png rename to docs/source/tts/images/data_labeling_pipeline.png diff --git a/docs/source/tts/intro.rst b/docs/source/tts/intro.rst index e2a6104ac50d..3964319234b3 100644 --- a/docs/source/tts/intro.rst +++ b/docs/source/tts/intro.rst @@ -1,5 +1,5 @@ -Introduction -====================== +Text-to-Speech (TTS) +==================== Text-to-Speech (TTS) synthesis refers to a system that converts textual inputs into natural human speech. The synthesized speech is expected to sound intelligible and natural. With the resurgence of deep neural networks, TTS research has achieved tremendous progress. NeMo implementation focuses on the state-of-the-art neural TTS where both **cascaded** and **end-to-end** (upcoming) systems are included, @@ -17,5 +17,6 @@ We will illustrate details in the following sections. configs api resources + g2p .. 
include:: resources.rst diff --git a/docs/source/tts/tts_all.bib b/docs/source/tts/tts_all.bib index 7f437583f165..3e1615aecad3 100644 --- a/docs/source/tts/tts_all.bib +++ b/docs/source/tts/tts_all.bib @@ -67,3 +67,38 @@ @inproceedings{badlani2022one year={2022}, organization={IEEE} } + +@article{xue2021byt5, + title={ByT5: Towards a token-free future with pre-trained byte-to-byte models 2021}, + author={Xue, Linting and Barua, Aditya and Constant, Noah and Al-Rfou, Rami and Narang, Sharan and Kale, Mihir and Roberts, Adam and Raffel, Colin}, + journal={arXiv preprint arXiv:2105.13626}, + year={2021} +} + +@article{vrezavckova2021t5g2p, + title={T5g2p: Using text-to-text transfer transformer for grapheme-to-phoneme conversion}, + author={{\v{R}}ez{\'a}{\v{c}}kov{\'a}, Mark{\'e}ta and {\v{S}}vec, Jan and Tihelka, Daniel}, + year={2021}, + journal={International Speech Communication Association} +} + +@article{zhu2022byt5, + title={ByT5 model for massively multilingual grapheme-to-phoneme conversion}, + author={Zhu, Jian and Zhang, Cong and Jurgens, David}, + journal={arXiv preprint arXiv:2204.03067}, + year={2022} +} + +@article{ggulati2020conformer, + title={Conformer: Convolution-augmented transformer for speech recognition}, + author={Gulati, Anmol and Qin, James and Chiu, Chung-Cheng and Parmar, Niki and Zhang, Yu and Yu, Jiahui and Han, Wei and Wang, Shibo and Zhang, Zhengdong and Wu, Yonghui and others}, + journal={arXiv preprint arXiv:2005.08100}, + year={2020} +} + +@inproceedings{gorman2018improving, + title={Improving homograph disambiguation with supervised machine learning}, + author={Gorman, Kyle and Mazovetskiy, Gleb and Nikolaev, Vitaly}, + booktitle={Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, + year={2018} +} \ No newline at end of file From 9fa98d56a6cbef583a37b1edc2a33ef3cba0abf5 Mon Sep 17 00:00:00 2001 From: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Date: Wed, 21 Dec 2022 15:01:04 -0800 Subject: [PATCH 091/154] Add prompt learning tests (#5649) * patch to allow using tokenizers without additional_special_tokens_ids attribute Signed-off-by: arendu * added gpt prompt learning and t5 prompt learning, made them run one after the other Signed-off-by: arendu * fixed changes Signed-off-by: arendu * gave unique names Signed-off-by: arendu * num workers set to 0 Signed-off-by: arendu * fixes to make num_workers>0 fast by using persistent_workers flag in dataloaders Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated to num_workers 8 Signed-off-by: arendu * updates to make num_workers arg in gpt/t5 infernce/training work Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: arendu * add num_workers arg in jenkins Signed-off-by: arendu * bs fix Signed-off-by: arendu * numworkers > 0 added for gpt prompt learning eval Signed-off-by: arendu * added num_workers Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 414 ++++++++++-------- ...egatron_gpt_prompt_learning_inference.yaml | 1 + .../megatron_gpt_prompt_learning_eval.py | 3 + .../conf/megatron_gpt_adapter_inference.yaml | 1 + .../megatron_gpt_adapter_tuning_config.yaml | 2 +- 
.../conf/megatron_gpt_ia3_inference.yaml | 1 + .../tuning/megatron_gpt_adapter_eval.py | 4 + .../tuning/megatron_gpt_adapter_tuning.py | 2 + .../tuning/megatron_gpt_ia3_eval.py | 4 + .../tuning/megatron_gpt_ia3_tuning.py | 2 + .../tuning/megatron_t5_adapter_eval.py | 3 + .../tuning/megatron_t5_adapter_tuning.py | 2 + .../tuning/megatron_t5_ia3_eval.py | 3 + .../tuning/megatron_t5_ia3_tuning.py | 2 + .../megatron_gpt_prompt_learning_model.py | 1 + .../megatron_t5_prompt_learning_model.py | 1 + 16 files changed, 266 insertions(+), 180 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 4b7ed2ce833f..b6d7cf56fcf3 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -716,9 +716,10 @@ pipeline { name='test_tp1_pp2' \ exp_manager.exp_dir='examples/adapter_tuning' \ trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.val_check_interval=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ + model.data.num_workers=1 \ model.tensor_model_parallel_size=1 \ model.pipeline_model_parallel_size=2 \ model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ @@ -732,6 +733,7 @@ pipeline { adapter_model_file='examples/adapter_tuning/test_tp1_pp2.nemo' \ language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ trainer.devices=2 \ + data.num_workers=1 \ tensor_model_parallel_size=1 \ pipeline_model_parallel_size=2 \ data.global_batch_size=2 \ @@ -760,9 +762,10 @@ pipeline { name='test_tp2_pp1' \ exp_manager.exp_dir='examples/adapter_tuning' \ trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.val_check_interval=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ + model.data.num_workers=1 \ model.tensor_model_parallel_size=2 \ model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ model.existing_tasks=[] \ @@ -778,6 +781,7 @@ pipeline { tensor_model_parallel_size=2 \ data.global_batch_size=2 \ data.micro_batch_size=2 \ + data.num_workers=1 \ data.test_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ pred_file_path='examples/adapter_tuning/test_tp2_pp1/preds.txt'" sh "rm -rf examples/adapter_tuning/test_tp2_pp1.nemo" @@ -802,9 +806,10 @@ pipeline { name='test_tp1_pp2' \ exp_manager.exp_dir='examples/ia3_tuning' \ trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.val_check_interval=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ + model.data.num_workers=1 \ model.tensor_model_parallel_size=1 \ model.pipeline_model_parallel_size=2 \ model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ @@ -818,6 +823,7 @@ pipeline { adapter_model_file='examples/ia3_tuning/test_tp1_pp2.nemo' \ language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ trainer.devices=2 \ + data.num_workers=1 \ tensor_model_parallel_size=1 \ pipeline_model_parallel_size=2 \ data.global_batch_size=2 \ @@ -846,9 +852,10 @@ pipeline { name='test_tp2_pp1' \ exp_manager.exp_dir='examples/ia3_tuning' \ trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.val_check_interval=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ + model.data.num_workers=1 \ model.tensor_model_parallel_size=2 \ model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ model.existing_tasks=[] \ @@ -861,6 +868,7 @@ pipeline { adapter_model_file='examples/ia3_tuning/test_tp2_pp1.nemo' \ 
language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ trainer.devices=2 \ + data.num_workers=1 \ tensor_model_parallel_size=2 \ data.global_batch_size=2 \ data.micro_batch_size=2 \ @@ -888,9 +896,10 @@ pipeline { name='test_tp2_pp1' \ exp_manager.exp_dir='examples/adapter_tuning' \ trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.val_check_interval=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ + model.data.num_workers=1 \ model.tensor_model_parallel_size=2 \ model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ model.existing_tasks=[] \ @@ -903,6 +912,7 @@ pipeline { adapter_model_file='examples/adapter_tuning/test_tp2_pp1.nemo' \ gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ inference.greedy=True \ + num_workers=1 \ inference.add_BOS=False \ trainer.devices=2 \ tensor_model_parallel_size=2 \ @@ -929,9 +939,10 @@ pipeline { name='test_tp1_pp2' \ exp_manager.exp_dir='examples/adapter_tuning' \ trainer.devices=2 \ - trainer.max_steps=6 \ - trainer.val_check_interval=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ + model.data.num_workers=1 \ model.tensor_model_parallel_size=1 \ model.pipeline_model_parallel_size=2 \ model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ @@ -947,6 +958,7 @@ pipeline { inference.greedy=True \ inference.add_BOS=False \ trainer.devices=2 \ + num_workers=1 \ tensor_model_parallel_size=2 \ data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']" sh "rm -rf examples/adapter_tuning/test_tp1_pp2.nemo" @@ -3450,111 +3462,125 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' } } - // TODO: fix these tests! They are taking ~10 minutes to complete - // stage('L2: Megatron GPT Prompt Tuning') { - // when { - // anyOf { - // branch 'main' - // changeRequest target: 'main' - // } - // } - // failFast true - // parallel{ - // stage('GPT Prompt Learning TP=1 PP=1') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ - // --config-name=megatron_gpt_prompt_learning_config \ - // name='/home/TestData/nlp/prompt_learning/prompt_tuning_test' \ - // trainer.devices=1 \ - // trainer.max_steps=6 \ - // trainer.max_epochs=null \ - // model.tensor_model_parallel_size=1 \ - // model.virtual_prompt_style='prompt-tuning' \ - // model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp1.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['rte'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.global_batch_size=4" - // sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test" - // sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test.nemo" - // } - // } - // } - // } - - // TODO: fix these tests! 
They are taking ~10 minutes to complete - // stage('L2: Megatron GPT Prompt Learning') { - // when { - // anyOf { - // branch 'main' - // changeRequest target: 'main' - // } - // } - // failFast true - // parallel{ - // stage('GPT Prompt Learning TP=2 PP=1') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ - // --config-name=megatron_gpt_prompt_learning_config \ - // name='/home/TestData/nlp/prompt_learning/p_tuning_test_tp' \ - // trainer.devices=2 \ - // trainer.max_steps=6 \ - // trainer.max_epochs=null \ - // model.tensor_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['rte'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.global_batch_size=4" - // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp" - // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ - // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo' \ - // gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ - // inference.greedy=True \ - // inference.add_BOS=False \ - // trainer.devices=2 \ - // tensor_model_parallel_size=2 \ - // pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt \ - // data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']" - // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo" - // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt" - // } - // } - // stage('GPT Prompt Learning TP=1 PP=2') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ - // --config-name=megatron_gpt_prompt_learning_config \ - // name='/home/TestData/nlp/prompt_learning/p_tuning_test_pp' \ - // trainer.devices=2 \ - // trainer.max_steps=6 \ - // trainer.max_epochs=null \ - // model.pipeline_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['boolq'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ - // model.global_batch_size=4" - // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp" - // sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ - // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo' \ - // gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ - // inference.greedy=True \ - // inference.add_BOS=False \ - // trainer.devices=2 \ - // pipeline_model_parallel_size=2 \ - // pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt \ - // data_paths=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl']" - // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo" - // sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt" - // } - // } - // } - // } + stage('L2: Megatron GPT Prompt Tuning TP1 PP1') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + parallel{ + stage('GPT Prompt Learning TP=1 PP=1') { + steps { + sh "python 
examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ + --config-name=megatron_gpt_prompt_learning_config \ + name='/home/TestData/nlp/prompt_learning/prompt_tuning_test' \ + trainer.devices=1 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ + trainer.max_epochs=null \ + model.data.num_workers=1 \ + model.tensor_model_parallel_size=1 \ + model.virtual_prompt_style='prompt-tuning' \ + model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp1.nemo' \ + model.existing_tasks=[] \ + model.new_tasks=['rte'] \ + model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + model.global_batch_size=4" + sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test" + sh "rm -rf /home/TestData/nlp/prompt_learning/prompt_tuning_test.nemo" + } + } + } + } + stage('L2: Megatron GPT Prompt Tuning TP2 PP1') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + parallel{ + stage('GPT Prompt Learning TP=2 PP=1') { + steps { + sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ + --config-name=megatron_gpt_prompt_learning_config \ + name='/home/TestData/nlp/prompt_learning/p_tuning_test_tp' \ + trainer.devices=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ + trainer.max_epochs=null \ + model.data.num_workers=1 \ + model.tensor_model_parallel_size=2 \ + model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ + model.existing_tasks=[] \ + model.new_tasks=['rte'] \ + model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ + model.global_batch_size=4" + sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp" + sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ + virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo' \ + gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp2_pp1.nemo' \ + inference.greedy=True \ + inference.add_BOS=False \ + trainer.devices=2 \ + tensor_model_parallel_size=2 \ + pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt \ + data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']" + sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo" + sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt" + } + } + } + } + stage('L2: Megatron GPT Prompt Tuning TP1 PP2') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + parallel{ + stage('GPT Prompt Learning TP=1 PP=2') { + steps { + sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \ + --config-name=megatron_gpt_prompt_learning_config \ + name='/home/TestData/nlp/prompt_learning/p_tuning_test_pp' \ + trainer.devices=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ + trainer.max_epochs=null \ + model.data.num_workers=1 \ + model.pipeline_model_parallel_size=2 \ + model.language_model_path='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ + model.existing_tasks=[] \ + model.new_tasks=['boolq'] \ + model.data.train_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ + model.data.validation_ds=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl'] \ + model.global_batch_size=4" + sh "rm -rf 
/home/TestData/nlp/prompt_learning/p_tuning_test_pp" + sh "python examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \ + virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo' \ + gpt_model_file='/home/TestData/nlp/megatron_gpt/tiny/megatron_14m_gpt_tp1_pp2.nemo' \ + inference.greedy=True \ + inference.add_BOS=False \ + trainer.devices=2 \ + pipeline_model_parallel_size=2 \ + pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt \ + data_paths=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl']" + sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo" + sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt" + } + } + } + } // TODO: Add this test back. Test was failing on CI machines due to HW error // stage('L2: Megatron GPT Convert from Megatron-LM checkpoing and Eval') { @@ -3833,7 +3859,8 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' sh "rm -rf examples/nlp/language_modeling/t5_index_mappings" } } - stage('L2: Megatron T5 Prompt Learning') { + + stage('L2: Megatron T5 Prompt Learning TP1 PP1') { when { anyOf { branch 'main' @@ -3848,9 +3875,10 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' --config-name=megatron_t5_prompt_learning \ name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test' \ trainer.devices=1 \ - trainer.max_steps=6 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ trainer.max_epochs=null \ - model.tensor_model_parallel_size=1 \ + model.data.num_workers=1 \ model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \ model.existing_tasks=[] \ model.new_tasks=['squad'] \ @@ -3870,69 +3898,97 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt" } } - // TODO: Fix these tests, they are taking ~10 minutes to complete - // stage('T5 Prompt Learning TP=2 PP=1') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - // --config-name=megatron_t5_prompt_learning \ - // name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2' \ - // trainer.devices=2 \ - // trainer.max_steps=6 \ - // trainer.max_epochs=null \ - // model.tensor_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['squad'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.global_batch_size=8 \ - // model.micro_batch_size=8" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2" - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \ - // tensor_model_parallel_size=2 \ - // trainer.devices=2 \ - // data.global_batch_size=8 \ - // data.micro_batch_size=8" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt" - // } - // } - 
// stage('T5 Prompt Learning TP=1 PP=2') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - // --config-name=megatron_t5_prompt_learning \ - // name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2' \ - // trainer.devices=2 \ - // trainer.max_steps=6 \ - // trainer.max_epochs=null \ - // model.pipeline_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['squad'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.global_batch_size=8 \ - // model.micro_batch_size=8" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2" - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt' \ - // tensor_model_parallel_size=2 \ - // trainer.devices=2 \ - // data.global_batch_size=8 \ - // data.micro_batch_size=8" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt" - // } - // } } } + + stage('L2: Megatron T5 Prompt Learning TP2 PP1') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + parallel{ + stage('T5 Prompt Learning TP=2 PP=1') { + steps { + sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ + --config-name=megatron_t5_prompt_learning \ + name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2' \ + trainer.devices=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ + trainer.max_epochs=null \ + model.data.num_workers=1 \ + model.tensor_model_parallel_size=2 \ + model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ + model.existing_tasks=[] \ + model.new_tasks=['squad'] \ + model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + model.global_batch_size=8 \ + model.micro_batch_size=8" + sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2" + sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ + virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \ + language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ + data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \ + tensor_model_parallel_size=2 \ + trainer.devices=2 \ + data.global_batch_size=8 \ + data.micro_batch_size=8" + sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo" + sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt" + } + } + } + } + + stage('L2: Megatron T5 Prompt Learning TP1 PP2') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + parallel{ + stage('T5 Prompt Learning TP=1 PP=2') { + steps { + sh "python 
examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ + --config-name=megatron_t5_prompt_learning \ + name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2' \ + trainer.devices=2 \ + trainer.max_steps=1 \ + trainer.val_check_interval=1 \ + trainer.max_epochs=null \ + model.data.num_workers=1 \ + model.pipeline_model_parallel_size=2 \ + model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ + model.existing_tasks=[] \ + model.new_tasks=['squad'] \ + model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + model.global_batch_size=8 \ + model.micro_batch_size=8" + sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2" + sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ + virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo' \ + language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ + data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ + pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt' \ + tensor_model_parallel_size=2 \ + trainer.devices=2 \ + data.global_batch_size=8 \ + data.micro_batch_size=8" + sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo" + sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt" + } + } + } + } + stage('L2: Megatron UL2 Pretraining and Resume Training TP=2') { when { anyOf { diff --git a/examples/nlp/language_modeling/conf/megatron_gpt_prompt_learning_inference.yaml b/examples/nlp/language_modeling/conf/megatron_gpt_prompt_learning_inference.yaml index fef8c26759cd..e0f1bef24c0d 100644 --- a/examples/nlp/language_modeling/conf/megatron_gpt_prompt_learning_inference.yaml +++ b/examples/nlp/language_modeling/conf/megatron_gpt_prompt_learning_inference.yaml @@ -24,4 +24,5 @@ gpt_model_file: null # GPT nemo file path virtual_prompt_model_file: ??? # path to a MegatronGPTPromptLearningModel model if you want to use soft prompts pred_file_path: ??? # Path will model predictions will be written data_paths: # paths to .jsonl files you want to perform inference on +num_workers: 8 \ No newline at end of file diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py index 182ef69b78d2..38c08ad46a52 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py @@ -13,6 +13,7 @@ # limitations under the License. import torch +import torch.multiprocessing as mp from apex.transformer import parallel_state from omegaconf import OmegaConf from omegaconf.omegaconf import open_dict @@ -25,6 +26,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner +mp.set_start_method("spawn", force=True) """ This is the script to run GPT text generation. 
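For readers less familiar with the multiprocessing change above: the eval and tuning scripts now call mp.set_start_method("spawn", force=True) at import time so that any DataLoader workers they create are spawned rather than forked; a common reason for preferring spawn (not stated in the patch itself) is that a CUDA context initialized in the parent process cannot be reused in forked workers. Below is a minimal standalone sketch of the pattern — the dataset, batch size and worker count are placeholders, not values taken from the NeMo scripts:

import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

mp.set_start_method("spawn", force=True)  # same module-level call the patch adds

def main():
    # the patched scripts read this from the config, e.g. cfg.get("num_workers", 1)
    num_workers = 1
    dataset = TensorDataset(torch.arange(16, dtype=torch.float32).unsqueeze(1))
    loader = DataLoader(dataset, batch_size=4, num_workers=num_workers)
    for (batch,) in loader:
        print(batch.shape)  # torch.Size([4, 1])

if __name__ == "__main__":  # required with spawn: worker processes re-import __main__
    main()
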
@@ -143,6 +145,7 @@ def placeholder(): tokens_to_generate=length_params["max_length"], drop_last=False, shuffle=False, + num_workers=cfg.get("num_workers", 1), ) config = OmegaConf.to_container(cfg.inference) diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_inference.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_inference.yaml index 5325704fb72a..0fdf326d169f 100644 --- a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_inference.yaml +++ b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_inference.yaml @@ -31,3 +31,4 @@ data_paths: ??? # prompts for GPT inference server: False # whether launch the inference server port: 5555 # the port number for the inference server batch_size: 8 +num_workers: 8 diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_tuning_config.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_tuning_config.yaml index 4160d15b9a1f..d2c6118bb34e 100755 --- a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_tuning_config.yaml +++ b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_adapter_tuning_config.yaml @@ -116,7 +116,7 @@ model: validation_ds: ??? # expects a paths to validation data files add_eos: True shuffle: True - num_workers: 0 + num_workers: 8 pin_memory: True diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_ia3_inference.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_ia3_inference.yaml index d3f84564e252..d04995d015b5 100644 --- a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_ia3_inference.yaml +++ b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_ia3_inference.yaml @@ -29,3 +29,4 @@ checkpoint_name: null # PTL checkpoint file name, only used for PTL checkpoint l hparams_file: null # model configuration file, only used for PTL checkpoint loading data_paths: ??? # prompts for GPT inference batch_size: 8 +num_workers: 8 \ No newline at end of file diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py index a970cb2a7d9d..671a61a8e93c 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py @@ -14,6 +14,7 @@ import torch +import torch.multiprocessing as mp from apex.transformer import parallel_state from omegaconf import OmegaConf from omegaconf.omegaconf import open_dict @@ -23,6 +24,8 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner +mp.set_start_method("spawn", force=True) + """ This is the script to run an Adapter Tuned GPT Model for text generation. @@ -101,6 +104,7 @@ def dummy(): tokens_to_generate=cfg.inference.tokens_to_generate, drop_last=False, shuffle=False, + num_workers=cfg.get("num_workers", 1), ) config = OmegaConf.to_container(cfg.inference) diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py index 5d03d96d77f5..0551db4fc418 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py @@ -13,6 +13,7 @@ # limitations under the License. 
+import torch.multiprocessing as mp from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer @@ -29,6 +30,7 @@ from nemo.utils import logging from nemo.utils.exp_manager import exp_manager +mp.set_start_method("spawn", force=True) """ This is the script to train an Adapter infused GPT Model for text generation. diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py index 872a65ae922c..5c1930daa3e6 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py @@ -14,6 +14,7 @@ import torch +import torch.multiprocessing as mp from apex.transformer import parallel_state from omegaconf import OmegaConf from omegaconf.omegaconf import open_dict @@ -23,6 +24,8 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner +mp.set_start_method("spawn", force=True) + """ This is the script to run an Adapter Tuned GPT Model for text generation. @@ -101,6 +104,7 @@ def dummy(): tokens_to_generate=cfg.inference.tokens_to_generate, drop_last=False, shuffle=False, + num_workers=cfg.get("num_workers", 1), ) config = OmegaConf.to_container(cfg.inference) diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py index d5474bcdde5b..85ce524719e7 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py @@ -13,6 +13,7 @@ # limitations under the License. +import torch.multiprocessing as mp from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer @@ -29,6 +30,7 @@ from nemo.utils import logging from nemo.utils.exp_manager import exp_manager +mp.set_start_method("spawn", force=True) """ This is the script to train an Adapter infused GPT Model for text generation. diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py index 81c64e468e73..1aabea41480b 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py @@ -14,6 +14,7 @@ import torch +import torch.multiprocessing as mp from apex.transformer import parallel_state from omegaconf import OmegaConf from omegaconf.omegaconf import open_dict @@ -25,6 +26,8 @@ from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState +mp.set_start_method("spawn", force=True) + """ This is the script to run an Adapter Tuned GPT Model for text generation. diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py index 15c253984148..18d5f1543d2c 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py @@ -13,6 +13,7 @@ # limitations under the License. 
+import torch.multiprocessing as mp from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer @@ -29,6 +30,7 @@ from nemo.utils import logging from nemo.utils.exp_manager import exp_manager +mp.set_start_method("spawn", force=True) """ This is the script to train an Adapter infused GPT Model for text generation. diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py index b39371e11bdd..c88d73762cae 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py @@ -14,6 +14,7 @@ import torch +import torch.multiprocessing as mp from apex.transformer import parallel_state from omegaconf import OmegaConf from omegaconf.omegaconf import open_dict @@ -25,6 +26,8 @@ from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState +mp.set_start_method("spawn", force=True) + """ This is the script to run an Adapter Tuned GPT Model for text generation. diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py index 27dfd41fed75..185d5a94c3ca 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py @@ -13,6 +13,7 @@ # limitations under the License. +import torch.multiprocessing as mp from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer @@ -29,6 +30,7 @@ from nemo.utils import logging from nemo.utils.exp_manager import exp_manager +mp.set_start_method("spawn", force=True) """ This is the script to train an Adapter infused GPT Model for text generation. diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py index 96f0649061dc..65e554fafe29 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py @@ -871,6 +871,7 @@ def build_virtual_prompt_dataset( drop_last=drop_last, num_workers=num_workers, pin_memory=pin_memory, + persistent_workers=True, # (@adithyare and @eharper) We need this to make spawn=True to work. 
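The persistent_workers flag added just above is the other half of the spawn change: with spawned workers, tearing the worker pool down and re-creating it on every epoch is expensive, and the authors' inline comment notes the flag is needed to make the spawn-based setup work. A standalone sketch of the resulting dataloader shape follows — the dataset, batch size and worker count are placeholders, and the explicit multiprocessing_context argument stands in for the global set_start_method call the scripts use instead:

import torch
from torch.utils.data import DataLoader, TensorDataset

def build_loader(dataset, micro_batch_size=4, num_workers=2):
    return DataLoader(
        dataset,
        batch_size=micro_batch_size,
        shuffle=True,
        num_workers=num_workers,              # must be > 0 when persistent_workers=True
        pin_memory=torch.cuda.is_available(),
        persistent_workers=True,              # keep spawned workers alive across epochs
        multiprocessing_context="spawn",      # spawn instead of fork for the workers
    )

if __name__ == "__main__":
    loader = build_loader(TensorDataset(torch.randn(32, 8)))
    for epoch in range(2):                    # the second epoch reuses the same workers
        for (batch,) in loader:
            pass
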
) return dataset, dataloader diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py index 643841c09f14..534bf194021b 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py @@ -440,6 +440,7 @@ def build_virtual_prompt_dataset( drop_last=drop_last, num_workers=num_workers, pin_memory=pin_memory, + persistent_workers=True, # (@adithyare and @eharper) We need to set this to True to get around issues with spawn=True ) print('build success', len(dataloader), dataset_paths) return dataset, dataloader From e25b5dce8eca47512d989caa86a015ebc797191d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 21 Dec 2022 16:04:23 -0700 Subject: [PATCH 092/154] remove output (#5689) (#5690) Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: ericharper Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- .../Token_Classification-BioMegatron.ipynb | 179 ++++-------------- 1 file changed, 34 insertions(+), 145 deletions(-) diff --git a/tutorials/nlp/Token_Classification-BioMegatron.ipynb b/tutorials/nlp/Token_Classification-BioMegatron.ipynb index b07dfb061625..1cd13c429e92 100644 --- a/tutorials/nlp/Token_Classification-BioMegatron.ipynb +++ b/tutorials/nlp/Token_Classification-BioMegatron.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "id": "b7a434f4", "metadata": {}, "outputs": [], @@ -34,7 +34,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "id": "challenging-pioneer", "metadata": {}, "outputs": [], @@ -95,7 +95,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "id": "federal-beads", "metadata": {}, "outputs": [], @@ -107,22 +107,10 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "id": "relevant-juvenile", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Downloading NCBI data...\n", - "Archive: DATA_DIR/NCBI_corpus.zip\n", - " inflating: DATA_DIR/NCBI_corpus_development.txt \n", - " inflating: DATA_DIR/NCBI_corpus_testing.txt \n", - " inflating: DATA_DIR/NCBI_corpus_training.txt \n" - ] - } - ], + "outputs": [], "source": [ "print('Downloading NCBI data...')\n", "wget.download('https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/NCBI_corpus.zip', DATA_DIR)\n", @@ -131,18 +119,10 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "id": "radical-castle", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "9288106\tClustering of missense mutations in the ataxia-telangiectasia gene in a sporadic T-cell leukaemia.\tAtaxia-telangiectasia ( A-T ) is a recessive multi-system disorder caused by mutations in the ATM gene at 11q22-q23 ( ref . 3 ) . The risk of cancer , especially lymphoid neoplasias , is substantially elevated in A-T patients and has long been associated with chromosomal instability . By analysing tumour DNA from patients with sporadic T-cell prolymphocytic leukaemia ( T-PLL ) , a rare clonal malignancy with similarities to a mature T-cell leukaemia seen in A-T , we demonstrate a high frequency of ATM mutations in T-PLL . 
In marked contrast to the ATM mutation pattern in A-T , the most frequent nucleotide changes in this leukaemia were missense mutations . These clustered in the region corresponding to the kinase domain , which is highly conserved in ATM-related proteins in mouse , yeast and Drosophila . The resulting amino-acid substitutions are predicted to interfere with ATP binding or substrate recognition . Two of seventeen mutated T-PLL samples had a previously reported A-T allele . In contrast , no mutations were detected in the p53 gene , suggesting that this tumour suppressor is not frequently altered in this leukaemia . Occasional missense mutations in ATM were also found in tumour DNA from patients with B-cell non-Hodgkins lymphomas ( B-NHL ) and a B-NHL cell line . The evidence of a significant proportion of loss-of-function mutations and a complete absence of the normal copy of ATM in the majority of mutated tumours establishes somatic inactivation of this gene in the pathogenesis of sporadic T-PLL and suggests that ATM acts as a tumour suppressor . As constitutional DNA was not available , a putative hereditary predisposition to T-PLL will require further investigation . . \n" - ] - } - ], + "outputs": [], "source": [ "# If you want to see more examples, you can explore the text of the corpus using the file browser to the left, or open files directly, for example typing a command like the following in a code-cell:\n", "\n", @@ -169,21 +149,10 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "id": "present-interference", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'DATA_DIR/NER/test.tsv'" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "NER_DATA_DIR = f'{DATA_DIR}/NER'\n", "wget.download('https://raw.githubusercontent.com/spyysalo/ncbi-disease/master/conll/train.tsv', NER_DATA_DIR)\n", @@ -193,21 +162,10 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "id": "identical-figure", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "total 1.5M\n", - "-rw-r--r-- 1 root root 196K Apr 8 00:56 devel.tsv\n", - "-rw-r--r-- 1 root root 201K Apr 8 00:56 test.tsv\n", - "-rw-r--r-- 1 root root 1.1M Apr 8 00:56 train.tsv\n" - ] - } - ], + "outputs": [], "source": [ "!ls -lh $NER_DATA_DIR" ] @@ -222,7 +180,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "id": "utility-wesley", "metadata": {}, "outputs": [], @@ -232,44 +190,20 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": null, "id": "suited-jenny", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'import_from_iob_format (2).py'" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/token_classification/data/import_from_iob_format.py')" ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": null, "id": "sensitive-victoria", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-04-08 00:57:03 import_from_iob_format:119] Processing DATA_DIR/NER/train.tsv\n", - "[NeMo I 2022-04-08 00:57:03 import_from_iob_format:124] Processing of the DATA_DIR/NER/train.tsv is complete\n", - "[NeMo I 2022-04-08 00:57:06 import_from_iob_format:119] Processing 
DATA_DIR/NER/dev.tsv\n", - "[NeMo I 2022-04-08 00:57:06 import_from_iob_format:124] Processing of the DATA_DIR/NER/dev.tsv is complete\n", - "[NeMo I 2022-04-08 00:57:08 import_from_iob_format:119] Processing DATA_DIR/NER/test.tsv\n", - "[NeMo I 2022-04-08 00:57:08 import_from_iob_format:124] Processing of the DATA_DIR/NER/test.tsv is complete\n" - ] - } - ], + "outputs": [], "source": [ "! python import_from_iob_format.py --data_file=$NER_DATA_DIR/train.tsv\n", "! python import_from_iob_format.py --data_file=$NER_DATA_DIR/dev.tsv\n", @@ -286,54 +220,20 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "id": "sound-surgeon", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Identification of APC2 , a homologue of the adenomatous polyposis coli tumour suppressor . \n", - "The adenomatous polyposis coli ( APC ) tumour - suppressor protein controls the Wnt signalling pathway by forming a complex with glycogen synthase kinase 3beta ( GSK - 3beta ) , axin / conductin and betacatenin . \n", - "Complex formation induces the rapid degradation of betacatenin . \n", - "In colon carcinoma cells , loss of APC leads to the accumulation of betacatenin in the nucleus , where it binds to and activates the Tcf - 4 transcription factor ( reviewed in [ 1 ] [ 2 ] ) . \n", - "Here , we report the identification and genomic structure of APC homologues . \n", - "Mammalian APC2 , which closely resembles APC in overall domain structure , was functionally analyzed and shown to contain two SAMP domains , both of which are required for binding to conductin . \n", - "Like APC , APC2 regulates the formation of active betacatenin - Tcf complexes , as demonstrated using transient transcriptional activation assays in APC - / - colon carcinoma cells . \n", - "Human APC2 maps to chromosome 19p13 . \n", - "3 . \n", - "APC and APC2 may therefore have comparable functions in development and cancer . 
\n" - ] - } - ], + "outputs": [], "source": [ "!head $NER_DATA_DIR/text_train.txt" ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "id": "spectacular-strain", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "O O O O O O O O B-Disease I-Disease I-Disease I-Disease O O \n", - "O B-Disease I-Disease I-Disease I-Disease I-Disease I-Disease I-Disease O O O O O O O O O O O O O O O O O O O O O O O O O O O O O \n", - "O O O O O O O O O \n", - "O B-Disease I-Disease O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O \n", - "O O O O O O O O O O O O O \n", - "O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O \n", - "O O O O O O O O O O O O O O O O O O O O O O O O O O B-Disease I-Disease O O \n", - "O O O O O O O \n", - "O O \n", - "O O O O O O O O O O O B-Disease O \n" - ] - } - ], + "outputs": [], "source": [ "!head $NER_DATA_DIR/labels_train.txt" ] @@ -377,7 +277,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": null, "id": "speaking-grant", "metadata": {}, "outputs": [], @@ -389,18 +289,10 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "id": "demanding-ballet", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "config file is already exists\n" - ] - } - ], + "outputs": [], "source": [ "# download the model's configuration file \n", "config_dir = WORK_DIR + '/configs/'\n", @@ -414,18 +306,10 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": null, "id": "criminal-outdoors", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WORK_DIR/configs/token_classification_config.yaml\n" - ] - } - ], + "outputs": [], "source": [ "# this line will print the entire config of the model\n", "config_path = f'{WORK_DIR}/configs/{MODEL_CONFIG}'\n", @@ -438,7 +322,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": null, "id": "informed-purse", "metadata": {}, "outputs": [], @@ -784,7 +668,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python 3.8.0 ('test_r1.10.0_pip')", "language": "python", "name": "python3" }, @@ -798,9 +682,14 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.12" + "version": "3.8.0" + }, + "vscode": { + "interpreter": { + "hash": "30504a7d8129b3c45f1978a1de0804c162ca7894685891a914c7f1dc31e854c4" + } } }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} From 8a4b45b8f6df3faa264b330dc4c781797692bc8b Mon Sep 17 00:00:00 2001 From: Sandeep Subramanian Date: Wed, 21 Dec 2022 15:19:14 -0800 Subject: [PATCH 093/154] Minor fixes (#5691) Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Signed-off-by: Elena Rastorgueva --- examples/nlp/language_modeling/megatron_bert_pretraining.py | 1 - examples/nlp/language_modeling/megatron_gpt_pretraining.py | 1 - examples/nlp/language_modeling/megatron_gpt_prompt_learning.py | 1 - examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py | 1 + 4 files changed, 1 insertion(+), 3 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_bert_pretraining.py b/examples/nlp/language_modeling/megatron_bert_pretraining.py index cc59a866f6ed..d988b1736dc2 100644 --- a/examples/nlp/language_modeling/megatron_bert_pretraining.py +++ b/examples/nlp/language_modeling/megatron_bert_pretraining.py @@ -15,7 +15,6 @@ from 
lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_bert_model import MegatronBertModel diff --git a/examples/nlp/language_modeling/megatron_gpt_pretraining.py b/examples/nlp/language_modeling/megatron_gpt_pretraining.py index ff055c97e209..a8093dbd899a 100644 --- a/examples/nlp/language_modeling/megatron_gpt_pretraining.py +++ b/examples/nlp/language_modeling/megatron_gpt_pretraining.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py index e3fa95013f72..55cab50851a6 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py @@ -16,7 +16,6 @@ from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer -from pytorch_lightning.callbacks.timer import Timer from nemo.collections.nlp.models.language_modeling.megatron_gpt_prompt_learning_model import ( MegatronGPTPromptLearningModel, diff --git a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py index ffdb6b20f780..8c63cc5dce02 100644 --- a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py +++ b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py @@ -16,6 +16,7 @@ import tempfile import torch.multiprocessing as mp +from lightning_lite.plugins.environments import TorchElasticEnvironment from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector From bdaf4310f4bf4fa21049e34c323af1551529e827 Mon Sep 17 00:00:00 2001 From: fayejf <36722593+fayejf@users.noreply.github.com> Date: Fri, 23 Dec 2022 16:14:13 -0800 Subject: [PATCH 094/154] temp disbale speaker reco CI (#5696) Signed-off-by: fayejf Signed-off-by: fayejf Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index b6d7cf56fcf3..aa624258d65f 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -334,21 +334,21 @@ pipeline { } failFast true parallel { - - stage('Speaker Recognition') { - steps { - sh 'python examples/speaker_tasks/recognition/speaker_reco.py \ - model.train_ds.batch_size=10 \ - model.validation_ds.batch_size=2 \ - model.train_ds.manifest_filepath=/home/TestData/an4_speaker/train.json \ - model.validation_ds.manifest_filepath=/home/TestData/an4_speaker/dev.json \ - trainer.devices=[1] \ - trainer.accelerator="gpu" \ - +trainer.fast_dev_run=True \ - exp_manager.exp_dir=examples/speaker_tasks/recognition/speaker_recognition_results' - sh 'rm -rf 
examples/speaker_tasks/recognition/speaker_recognition_results' - } - } + // Temp disable this test due to long duration. TODO: bring it back + // stage('Speaker Recognition') { + // steps { + // sh 'python examples/speaker_tasks/recognition/speaker_reco.py \ + // model.train_ds.batch_size=10 \ + // model.validation_ds.batch_size=2 \ + // model.train_ds.manifest_filepath=/home/TestData/an4_speaker/train.json \ + // model.validation_ds.manifest_filepath=/home/TestData/an4_speaker/dev.json \ + // trainer.devices=[1] \ + // trainer.accelerator="gpu" \ + // +trainer.fast_dev_run=True \ + // exp_manager.exp_dir=examples/speaker_tasks/recognition/speaker_recognition_results' + // sh 'rm -rf examples/speaker_tasks/recognition/speaker_recognition_results' + // } + // } stage('Speaker Diarization') { steps { From 4e292c9743e6f331e3ef90004d98638c265413e0 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 23 Dec 2022 17:50:31 -0700 Subject: [PATCH 095/154] some tokenizers do not have additional_special_tokens_ids attribute (#5642) (#5648) Signed-off-by: arendu Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- .../language_modeling/megatron_t5_adapter_model.py | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py index f0ba7a8b69bd..319897ec8eb1 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py @@ -128,9 +128,13 @@ def validation_step(self, batch, batch_idx, inference=False): idx = pred.index(self.tokenizer.eos_id) pred = pred[:idx] - pred = [id for id in pred if id not in self.tokenizer.tokenizer.additional_special_tokens_ids] - label = [id for id in label if id not in self.tokenizer.tokenizer.additional_special_tokens_ids] - enc_input = [id for id in enc_input if id not in self.tokenizer.tokenizer.additional_special_tokens_ids] + additional_special_tokens_ids = [] + if hasattr(self.tokenizer.tokenizer, "additional_special_tokens_ids"): + additional_special_tokens_ids = self.tokenizer.tokenizer.additional_special_tokens_ids + + pred = [id for id in pred if id not in additional_special_tokens_ids] + label = [id for id in label if id not in additional_special_tokens_ids] + enc_input = [id for id in enc_input if id not in additional_special_tokens_ids] pred = self.tokenizer.ids_to_text(pred) label = self.tokenizer.ids_to_text(label) From 2cab473ea1f7468feb695260416ce862797855cf Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 27 Dec 2022 10:21:17 -0800 Subject: [PATCH 096/154] Bump setuptools from 59.5.0 to 65.5.1 in /requirements (#5704) Bumps [setuptools](https://github.com/pypa/setuptools) from 59.5.0 to 65.5.1. - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst) - [Commits](https://github.com/pypa/setuptools/compare/v59.5.0...v65.5.1) --- updated-dependencies: - dependency-name: setuptools dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- requirements/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements/requirements.txt b/requirements/requirements.txt index aad836335461..bfb3079feca2 100644 --- a/requirements/requirements.txt +++ b/requirements/requirements.txt @@ -5,7 +5,7 @@ onnx>=1.7.0 python-dateutil ruamel.yaml scikit-learn -setuptools==59.5.0 +setuptools==65.5.1 tensorboard text-unidecode torch From 17ac65e5ceac5338bac44a4f10eec7b9f77f694d Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Wed, 28 Dec 2022 11:19:07 -0700 Subject: [PATCH 097/154] Merge 1.14.0 main (#5705) * update branch Signed-off-by: ericharper * [TTS][ZH] fix broken link for the script. (#5666) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * update readme Signed-off-by: ericharper * update branch Signed-off-by: ericharper * update package info Signed-off-by: ericharper * unpin lightning Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- README.rst | 2 +- nemo/package_info.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index bab662d8f5ce..c3b4018c68d5 100644 --- a/README.rst +++ b/README.rst @@ -227,7 +227,7 @@ Install it manually if not using the NVIDIA PyTorch container. git clone https://github.com/ericharper/apex.git cd apex - git checkout nm_v1.13.0 + git checkout nm_v1.14.0 pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./ Transformer Engine diff --git a/nemo/package_info.py b/nemo/package_info.py index 071179db59c3..4c7b3acd15ee 100644 --- a/nemo/package_info.py +++ b/nemo/package_info.py @@ -14,7 +14,7 @@ MAJOR = 1 -MINOR = 14 +MINOR = 15 PATCH = 0 PRE_RELEASE = 'rc0' From 0bb6aa6c32a73377d61f9a0c9352dc77536f8e02 Mon Sep 17 00:00:00 2001 From: milesial Date: Tue, 3 Jan 2023 23:50:16 +0100 Subject: [PATCH 098/154] Don't print exp_manager warning when max_steps == -1 (#5725) Signed-off-by: Alexandre Milesi Signed-off-by: Elena Rastorgueva --- nemo/utils/exp_manager.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index fd62cd7c71f3..29d1a929d721 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -1041,7 +1041,7 @@ def configure_checkpointing( f"). It is very likely this run will fail with ModelCheckpoint(monitor='{params.monitor}') not found " "in the returned metrics. Please ensure that validation is run within trainer.max_epochs." ) - elif trainer.max_steps is not None: + elif trainer.max_steps is not None and trainer.max_steps != -1: logging.warning( "The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to " f"{trainer.max_steps}. 
Please ensure that max_steps will run for at least " From 13065f332300a42fb01b122faf9dfc83e381e57f Mon Sep 17 00:00:00 2001 From: Nithin Rao Date: Wed, 4 Jan 2023 22:29:20 +0530 Subject: [PATCH 099/154] pin torchmetrics version (#5720) * fix torchmetrics version Signed-off-by: nithinraok * add lower bound Signed-off-by: nithinraok Signed-off-by: nithinraok Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 30 ++++++++++----------- nemo/collections/asr/models/label_models.py | 4 +-- requirements/requirements_lightning.txt | 2 +- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index aa624258d65f..9ad095049086 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -334,21 +334,21 @@ pipeline { } failFast true parallel { - // Temp disable this test due to long duration. TODO: bring it back - // stage('Speaker Recognition') { - // steps { - // sh 'python examples/speaker_tasks/recognition/speaker_reco.py \ - // model.train_ds.batch_size=10 \ - // model.validation_ds.batch_size=2 \ - // model.train_ds.manifest_filepath=/home/TestData/an4_speaker/train.json \ - // model.validation_ds.manifest_filepath=/home/TestData/an4_speaker/dev.json \ - // trainer.devices=[1] \ - // trainer.accelerator="gpu" \ - // +trainer.fast_dev_run=True \ - // exp_manager.exp_dir=examples/speaker_tasks/recognition/speaker_recognition_results' - // sh 'rm -rf examples/speaker_tasks/recognition/speaker_recognition_results' - // } - // } + stage('Speaker Recognition') { + steps { + sh 'python examples/speaker_tasks/recognition/speaker_reco.py \ + model.train_ds.batch_size=10 \ + model.validation_ds.batch_size=2 \ + model.train_ds.manifest_filepath=/home/TestData/an4_speaker/train.json \ + model.validation_ds.manifest_filepath=/home/TestData/an4_speaker/dev.json \ + trainer.max_epochs=10 \ + trainer.devices=[1] \ + trainer.accelerator="gpu" \ + +trainer.fast_dev_run=True \ + exp_manager.exp_dir=examples/speaker_tasks/recognition/speaker_recognition_results' + sh 'rm -rf examples/speaker_tasks/recognition/speaker_recognition_results' + } + } stage('Speaker Diarization') { steps { diff --git a/nemo/collections/asr/models/label_models.py b/nemo/collections/asr/models/label_models.py index 62e4dd5e456f..94048e5ab0c4 100644 --- a/nemo/collections/asr/models/label_models.py +++ b/nemo/collections/asr/models/label_models.py @@ -165,7 +165,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None): self.encoder = EncDecSpeakerLabelModel.from_config_dict(cfg.encoder) self.decoder = EncDecSpeakerLabelModel.from_config_dict(cfg.decoder) - self._macro_accuracy = Accuracy(num_classes=num_classes, average='macro', task='multiclass') + self._macro_accuracy = Accuracy(num_classes=num_classes, average='macro') self.labels = None if hasattr(self._cfg, 'spec_augment') and self._cfg.spec_augment is not None: @@ -365,7 +365,7 @@ def evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = acc_top_k = self._accuracy(logits=logits, labels=labels) correct_counts, total_counts = self._accuracy.correct_counts_k, self._accuracy.total_counts_k self._macro_accuracy.update(preds=logits, target=labels) - stats = self._macro_accuracy._final_state() + stats = self._macro_accuracy._get_final_stats() return { f'{tag}_loss': loss_value, diff --git a/requirements/requirements_lightning.txt b/requirements/requirements_lightning.txt index faa23c758746..9876002b806d 100644 --- a/requirements/requirements_lightning.txt +++ b/requirements/requirements_lightning.txt @@ -2,7 +2,7 @@ hydra-core>=1.2.0,<1.3 
omegaconf>=2.2,<2.3 pytorch-lightning>=1.8.3 pyyaml<6 # Pinned until omegaconf works with pyyaml>=6 -torchmetrics +torchmetrics>=0.4.1rc0,<=0.10.3 transformers>=4.0.1,<=4.21.2 wandb webdataset>=0.1.48,<=0.1.62 From 4bb9e893ac9db435ba699fe9ce4757a8e53c124c Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Wed, 4 Jan 2023 11:31:18 -0700 Subject: [PATCH 100/154] Update to pytorch 22.12 container (#5694) * update to pytorch 22.12 container Signed-off-by: ericharper * please fix waveglow export in 22.12 container Signed-off-by: ericharper * Update torch.stft() calls due to deprecation of return_complex=False (#5729) Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang * Update ASR torch.stft() call to use return_complex=True (#5730) Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang Signed-off-by: ericharper Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Elena Rastorgueva --- Dockerfile | 8 ++------ Jenkinsfile | 2 +- README.rst | 4 ++-- ci.groovy | 2 +- nemo/collections/asr/parts/preprocessing/features.py | 7 +++---- nemo/collections/tts/losses/stftlosses.py | 2 +- nemo/collections/tts/models/base.py | 12 +++++++++--- nemo/collections/tts/models/hifigan.py | 5 +++-- nemo/collections/tts/models/univnet.py | 5 +++-- nemo/collections/tts/modules/univnet_modules.py | 4 +++- tests/collections/tts/test_waveglow.py | 1 + 11 files changed, 29 insertions(+), 23 deletions(-) diff --git a/Dockerfile b/Dockerfile index 43fb42b12992..e7cd3e944523 100644 --- a/Dockerfile +++ b/Dockerfile @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:22.11-py3 +ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:22.12-py3 # build an image that includes only the nemo dependencies, ensures that dependencies @@ -34,10 +34,6 @@ RUN apt-get update && \ WORKDIR /tmp/ -RUN git clone https://github.com/NVIDIA/apex.git -b 22.11-devel && \ - cd apex && \ - pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./ - # uninstall stuff from base container RUN pip3 uninstall -y sacrebleu torchtext @@ -65,7 +61,7 @@ COPY . . # start building the final container FROM nemo-deps as nemo -ARG NEMO_VERSION=1.14.0 +ARG NEMO_VERSION=1.15.0 # Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container # version information as runtime environment variable for introspection purposes diff --git a/Jenkinsfile b/Jenkinsfile index 9ad095049086..af9b73e68e40 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -1,7 +1,7 @@ pipeline { agent { docker { - image 'nvcr.io/nvidia/pytorch:22.11-py3' + image 'nvcr.io/nvidia/pytorch:22.12-py3' args '--device=/dev/nvidia0 --gpus all --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache:/root/.cache --shm-size=8g' } } diff --git a/README.rst b/README.rst index c3b4018c68d5..695f298c76d8 100644 --- a/README.rst +++ b/README.rst @@ -257,13 +257,13 @@ To build a nemo container with Dockerfile from a branch, please run DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest . -If you chose to work with main branch, we recommend using NVIDIA's PyTorch container version 22.11-py3 and then installing from GitHub. +If you chose to work with main branch, we recommend using NVIDIA's PyTorch container version 22.12-py3 and then installing from GitHub. .. 
code-block:: bash docker run --gpus all -it --rm -v :/NeMo --shm-size=8g \ -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \ - stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:22.11-py3 + stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:22.12-py3 Examples -------- diff --git a/ci.groovy b/ci.groovy index f97397e9385f..484501eb0fd5 100644 --- a/ci.groovy +++ b/ci.groovy @@ -15,7 +15,7 @@ spec: path: /vol/scratch1/scratch.okuchaiev_blossom containers: - name: latestdlfw - image: nvcr.io/nvidia/pytorch:22.11-py3 + image: nvcr.io/nvidia/pytorch:22.12-py3 command: - cat volumeMounts: diff --git a/nemo/collections/asr/parts/preprocessing/features.py b/nemo/collections/asr/parts/preprocessing/features.py index b433144677e8..62e6969a0dff 100644 --- a/nemo/collections/asr/parts/preprocessing/features.py +++ b/nemo/collections/asr/parts/preprocessing/features.py @@ -290,7 +290,7 @@ def __init__( win_length=self.win_length, center=False if exact_pad else True, window=self.window.to(dtype=torch.float), - return_complex=False, + return_complex=True, ) self.normalize = normalize @@ -393,11 +393,10 @@ def forward(self, x, seq_len): with torch.cuda.amp.autocast(enabled=False): x = self.stft(x) - # torch returns real, imag; so convert to magnitude + # torch stft returns complex tensor (of shape [B,N,T]); so convert to magnitude # guard is needed for sqrt if grads are passed through guard = 0 if not self.use_grads else CONSTANT - if x.dtype in [torch.cfloat, torch.cdouble]: - x = torch.view_as_real(x) + x = torch.view_as_real(x) x = torch.sqrt(x.pow(2).sum(-1) + guard) if self.training and self.nb_augmentation_prob > 0.0: diff --git a/nemo/collections/tts/losses/stftlosses.py b/nemo/collections/tts/losses/stftlosses.py index 1e06535ded8d..65320bf9dd65 100644 --- a/nemo/collections/tts/losses/stftlosses.py +++ b/nemo/collections/tts/losses/stftlosses.py @@ -61,7 +61,7 @@ def stft(x, fft_size, hop_size, win_length, window): Returns: Tensor: Magnitude spectrogram (B, #frames, fft_size // 2 + 1). 
""" - x_stft = torch.stft(x, fft_size, hop_size, win_length, window, return_complex=False) + x_stft = torch.view_as_real(torch.stft(x, fft_size, hop_size, win_length, window, return_complex=True)) real = x_stft[..., 0] imag = x_stft[..., 1] diff --git a/nemo/collections/tts/models/base.py b/nemo/collections/tts/models/base.py index a7ce3b603eca..6fe2fb3f8806 100644 --- a/nemo/collections/tts/models/base.py +++ b/nemo/collections/tts/models/base.py @@ -146,9 +146,15 @@ def check_children_attributes(self): ) from e def yet_another_patch(audio, n_fft, hop_length, win_length, window): - spec = torch.stft(audio, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window) - if spec.dtype in [torch.cfloat, torch.cdouble]: - spec = torch.view_as_real(spec) + spec = torch.stft( + audio, + n_fft=n_fft, + hop_length=hop_length, + win_length=win_length, + window=window, + return_complex=True, + ) + spec = torch.view_as_real(spec) return torch.sqrt(spec.pow(2).sum(-1)), torch.atan2(spec[..., -1], spec[..., 0]) self.stft = lambda x: yet_another_patch( diff --git a/nemo/collections/tts/models/hifigan.py b/nemo/collections/tts/models/hifigan.py index b98ede98e2bd..a431d84a9e11 100644 --- a/nemo/collections/tts/models/hifigan.py +++ b/nemo/collections/tts/models/hifigan.py @@ -286,7 +286,8 @@ def validation_step(self, batch, batch_idx): def _bias_denoise(self, audio, mel): def stft(x): - comp = torch.stft(x.squeeze(1), n_fft=1024, hop_length=256, win_length=1024) + comp = torch.stft(x.squeeze(1), n_fft=1024, hop_length=256, win_length=1024, return_complex=True) + comp = torch.view_as_real(comp) real, imag = comp[..., 0], comp[..., 1] mags = torch.sqrt(real ** 2 + imag ** 2) phase = torch.atan2(imag, real) @@ -294,7 +295,7 @@ def stft(x): def istft(mags, phase): comp = torch.stack([mags * torch.cos(phase), mags * torch.sin(phase)], dim=-1) - x = torch.istft(comp, n_fft=1024, hop_length=256, win_length=1024) + x = torch.istft(torch.view_as_complex(comp), n_fft=1024, hop_length=256, win_length=1024) return x # Create bias tensor diff --git a/nemo/collections/tts/models/univnet.py b/nemo/collections/tts/models/univnet.py index 6cbdf615e677..aec37a6dda26 100644 --- a/nemo/collections/tts/models/univnet.py +++ b/nemo/collections/tts/models/univnet.py @@ -237,7 +237,8 @@ def validation_step(self, batch, batch_idx): def _bias_denoise(self, audio, mel): def stft(x): - comp = torch.stft(x.squeeze(1), n_fft=1024, hop_length=256, win_length=1024) + comp = torch.stft(x.squeeze(1), n_fft=1024, hop_length=256, win_length=1024, return_complex=True) + comp = torch.view_as_real(comp) real, imag = comp[..., 0], comp[..., 1] mags = torch.sqrt(real ** 2 + imag ** 2) phase = torch.atan2(imag, real) @@ -245,7 +246,7 @@ def stft(x): def istft(mags, phase): comp = torch.stack([mags * torch.cos(phase), mags * torch.sin(phase)], dim=-1) - x = torch.istft(comp, n_fft=1024, hop_length=256, win_length=1024) + x = torch.istft(torch.view_as_complex(comp), n_fft=1024, hop_length=256, win_length=1024) return x # Create bias tensor diff --git a/nemo/collections/tts/modules/univnet_modules.py b/nemo/collections/tts/modules/univnet_modules.py index 930815e188c4..153fec9ee084 100644 --- a/nemo/collections/tts/modules/univnet_modules.py +++ b/nemo/collections/tts/modules/univnet_modules.py @@ -574,7 +574,9 @@ def spectrogram(self, x): n_fft, hop_length, win_length = self.resolution x = F.pad(x, (int((n_fft - hop_length) / 2), int((n_fft - hop_length) / 2)), mode='reflect') x = x.squeeze(1) - x = torch.stft(x, n_fft=n_fft, 
hop_length=hop_length, win_length=win_length, center=False) # [B, F, TT, 2] + x = torch.view_as_real( + torch.stft(x, n_fft=n_fft, hop_length=hop_length, win_length=win_length, center=False, return_complex=True) + ) # [B, F, TT, 2] (Note: torch.stft() returns complex tensor [B, F, TT]; converted via view_as_real) mag = torch.norm(x, p=2, dim=-1) # [B, F, TT] return mag diff --git a/tests/collections/tts/test_waveglow.py b/tests/collections/tts/test_waveglow.py index 889fc9c58616..fdf4344e02f9 100644 --- a/tests/collections/tts/test_waveglow.py +++ b/tests/collections/tts/test_waveglow.py @@ -70,6 +70,7 @@ def forward_wrapper(self, spec, z=None): class TestWaveGlow: + @pytest.mark.pleasefixme @pytest.mark.run_only_on('GPU') @pytest.mark.unit def test_export_to_onnx(self): From e7dd06d4224c1028d0bcbbd6aa134a7b189031f1 Mon Sep 17 00:00:00 2001 From: pks Date: Wed, 4 Jan 2023 21:17:48 +0100 Subject: [PATCH 101/154] add keep_initializers_as_inputs to _export method (#5731) Signed-off-by: Patrick Simianer Signed-off-by: Patrick Simianer Signed-off-by: Elena Rastorgueva --- nemo/core/classes/exportable.py | 1 + 1 file changed, 1 insertion(+) diff --git a/nemo/core/classes/exportable.py b/nemo/core/classes/exportable.py index 50266dab3dbe..1404d5117a3c 100644 --- a/nemo/core/classes/exportable.py +++ b/nemo/core/classes/exportable.py @@ -79,6 +79,7 @@ def export( dynamic_axes=dynamic_axes, check_tolerance=check_tolerance, export_modules_as_functions=export_modules_as_functions, + keep_initializers_as_inputs=keep_initializers_as_inputs, ) # Propagate input example (default scenario, may need to be overriden) if input_example is not None: From 9a02343c8350b4195e9d7dc3ef163033ef8959c4 Mon Sep 17 00:00:00 2001 From: Yi Dong <43824965+yidong72@users.noreply.github.com> Date: Wed, 4 Jan 2023 15:39:21 -0500 Subject: [PATCH 102/154] added tab former doc to the index page (#5733) Signed-off-by: Yi Dong Signed-off-by: Yi Dong Signed-off-by: Elena Rastorgueva --- README.rst | 1 + docs/source/starthere/tutorials.rst | 3 +++ 2 files changed, 4 insertions(+) diff --git a/README.rst b/README.rst index 695f298c76d8..13c456eedd7a 100644 --- a/README.rst +++ b/README.rst @@ -94,6 +94,7 @@ Key Features * `Dialogue State Tracking `_ * `Prompt Learning `_ * `NGC collection of pre-trained NLP models. `_ + * `Synthetic Tabular Data Generation `_ * `Speech synthesis (TTS) `_ * Spectrogram generation: Tacotron2, GlowTTS, TalkNet, FastPitch, FastSpeech2, Mixer-TTS, Mixer-TTS-X * Vocoders: WaveGlow, SqueezeWave, UniGlow, MelGAN, HiFiGAN, UnivNet diff --git a/docs/source/starthere/tutorials.rst b/docs/source/starthere/tutorials.rst index df75194e2238..51bc2f186964 100644 --- a/docs/source/starthere/tutorials.rst +++ b/docs/source/starthere/tutorials.rst @@ -139,6 +139,9 @@ To run a tutorial: * - NLP - P-Tuning/Prompt-Tuning - `P-Tuning/Prompt-Tuning `_ + * - NLP + - Synthetic Tabular Data Generation + - `Synthetic Tabular Data Generation `_ * - TTS - NeMo TTS Primer - `NeMo TTS Primer `_ From a91e7e37e2bb0b3599b9ab1187fabfb29085c5fc Mon Sep 17 00:00:00 2001 From: Micha Livne Date: Thu, 5 Jan 2023 01:03:15 +0200 Subject: [PATCH 103/154] ALiBi Positional Embeddings (#5467) * 1. Working on alibi positional embeddings. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. 
Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Added encoder and decoder alibi classes. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Simplified code. 2. Added bidirectional support. Signed-off-by: Micha Livne * 1. Added support in config to alibi. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Added Jenkins tests. Signed-off-by: Micha Livne * 1. Added missing file. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne Signed-off-by: Micha Livne Co-authored-by: Micha Livne Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 96 +++++++++++++++ .../megatron_lm_encoder_decoder_model.py | 4 +- .../alibi_relative_position_embedding.py | 116 ++++++++++++++++++ .../megatron/token_level_encoder_decoder.py | 33 ++++- 4 files changed, 243 insertions(+), 6 deletions(-) create mode 100644 nemo/collections/nlp/modules/common/megatron/alibi_relative_position_embedding.py diff --git a/Jenkinsfile b/Jenkinsfile index af9b73e68e40..a3354bca11a3 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -3745,6 +3745,102 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' sh "rm -rf examples/nlp/language_modeling/t5_index_mappings" } } + stage('L2: Megatron T5 with ALiBi Pretraining and Resume Training TP=2') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + steps { + sh "python examples/nlp/language_modeling/megatron_t5_pretraining.py \ + trainer.devices=2 \ + trainer.accelerator=gpu \ + trainer.log_every_n_steps=1 \ + trainer.val_check_interval=10 \ + trainer.limit_val_batches=2 \ + trainer.accumulate_grad_batches=1 \ + trainer.max_steps=10 \ + trainer.precision=16 \ + trainer.gradient_clip_val=1.0 \ + exp_manager.exp_dir=examples/nlp/language_modeling/t5_pretrain_results \ + model.tensor_model_parallel_size=2 \ + model.seq_length=128 \ + model.encoder.num_layers=4 \ + model.encoder.hidden_size=64 \ + model.encoder.num_attention_heads=8 \ + model.encoder.activation='swiglu' \ + model.encoder.masked_softmax_fusion=False \ + model.encoder.bias_activation_fusion=False \ + model.encoder.activations_checkpoint_method='block' \ + model.encoder.activations_checkpoint_num_layers=1 \ + model.encoder.position_embedding_type=alibi \ + model.decoder.num_layers=2 \ + model.decoder.hidden_size=64 \ + model.decoder.num_attention_heads=8 \ + model.decoder.activation='swiglu' \ + model.decoder.masked_softmax_fusion=False \ + model.decoder.bias_activation_fusion=False \ + model.decoder.activations_checkpoint_method='block' \ + model.decoder.activations_checkpoint_num_layers=1 \ + 
model.encoder.transformer_block_type='pre_ln' \ + model.decoder.transformer_block_type='pre_ln' \ + model.data.data_prefix=[.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src,.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref] \ + model.data.index_mapping_dir=examples/nlp/language_modeling/t5_index_mappings \ + model.data.data_impl=text_mmap \ + +model.data.data_impl_kwargs.newline_int=10 \ + +model.data.data_impl_kwargs.header_lines=0 \ + +model.data.data_impl_kwargs.workers=null \ + +model.data.data_impl_kwargs.sort_dataset_paths=False \ + model.share_token_embeddings=False \ + model.share_decoder_tokens_head_embeddings=False" + sh "python examples/nlp/language_modeling/megatron_t5_pretraining.py \ + trainer.devices=2 \ + trainer.accelerator=gpu \ + trainer.log_every_n_steps=1 \ + trainer.val_check_interval=10 \ + trainer.limit_val_batches=2 \ + trainer.accumulate_grad_batches=1 \ + trainer.max_steps=10 \ + trainer.precision=16 \ + trainer.gradient_clip_val=1.0 \ + exp_manager.exp_dir=examples/nlp/language_modeling/t5_pretrain_results \ + exp_manager.resume_if_exists=True \ + model.tensor_model_parallel_size=2 \ + model.seq_length=128 \ + model.encoder.num_layers=4 \ + model.encoder.hidden_size=64 \ + model.encoder.num_attention_heads=8 \ + model.encoder.activation='swiglu' \ + model.encoder.masked_softmax_fusion=False \ + model.encoder.bias_activation_fusion=False \ + model.encoder.activations_checkpoint_method='block' \ + model.encoder.activations_checkpoint_num_layers=1 \ + model.encoder.position_embedding_type=alibi \ + model.decoder.num_layers=2 \ + model.decoder.hidden_size=64 \ + model.decoder.num_attention_heads=8 \ + model.decoder.activation='swiglu' \ + model.decoder.masked_softmax_fusion=False \ + model.decoder.bias_activation_fusion=False \ + model.decoder.activations_checkpoint_method='block' \ + model.decoder.activations_checkpoint_num_layers=1 \ + model.encoder.transformer_block_type='pre_ln' \ + model.decoder.transformer_block_type='pre_ln' \ + model.data.data_prefix=[.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src,.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref] \ + model.data.index_mapping_dir=examples/nlp/language_modeling/t5_index_mappings \ + model.data.data_impl=text_mmap \ + +model.data.data_impl_kwargs.newline_int=10 \ + +model.data.data_impl_kwargs.header_lines=0 \ + +model.data.data_impl_kwargs.workers=null \ + +model.data.data_impl_kwargs.sort_dataset_paths=False \ + model.share_token_embeddings=False \ + model.share_decoder_tokens_head_embeddings=False" + sh "rm -rf examples/nlp/language_modeling/t5_pretrain_results" + sh "rm -rf examples/nlp/language_modeling/t5_index_mappings" + } + } stage('L2: Megatron T5 Pretraining and Resume Training PP=2') { when { anyOf { diff --git a/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py b/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py index 6eb34758b979..0a18ac354761 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py @@ -953,8 +953,8 @@ def setup(self, stage=None): ), "share_decoder_tokens_head_embeddings must be True when using pipeline model parallel > 1" self.enc_dec_model.sync_initial_word_embeddings() if ( - self.cfg.encoder.get('position_embedding_type') != 'relative' - and self.cfg.decoder.get('position_embedding_type') != 'relative' + self.cfg.encoder.get('position_embedding_type') == 'learned_absolute' + 
and self.cfg.decoder.get('position_embedding_type') == 'learned_absolute' ): self.enc_dec_model.sync_initial_position_embeddings() # Synchronize RPE embeddings across pipeline parallel ranks. diff --git a/nemo/collections/nlp/modules/common/megatron/alibi_relative_position_embedding.py b/nemo/collections/nlp/modules/common/megatron/alibi_relative_position_embedding.py new file mode 100644 index 000000000000..6b16e0e6d921 --- /dev/null +++ b/nemo/collections/nlp/modules/common/megatron/alibi_relative_position_embedding.py @@ -0,0 +1,116 @@ +# coding=utf-8 +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math + +import torch + +__all__ = ['ALiBiRelativePositionEmbedding'] + + +def get_slopes(n): + def get_slopes_power_of_2(n): + start = 2 ** (-(2 ** -(math.log2(n) - 3))) + ratio = start + return [start * ratio ** i for i in range(n)] + + if math.log2(n).is_integer(): + slopes = get_slopes_power_of_2(n) + else: + closest_power_of_2 = 2 ** math.floor(math.log2(n)) + slopes = ( + get_slopes_power_of_2(closest_power_of_2) + + get_slopes(2 * closest_power_of_2)[0::2][: n - closest_power_of_2] + ) + + return slopes + + +def build_slopes(num_attention_heads, alibi_num_heads): + """ + Builds a slopes tensor. + """ + slopes = torch.Tensor(get_slopes(alibi_num_heads) + [0] * (num_attention_heads - alibi_num_heads)).cuda() + return slopes.unsqueeze(-1).unsqueeze(-1) + + +def build_relative_position(query_length, key_length, num_attention_heads): + context_position = torch.arange(query_length)[:, None].cuda() + memory_position = torch.arange(key_length)[None, :].cuda() + # shape (query_length, key_length, num_heads) + relative_position = memory_position - context_position + + # shape (num_attention_heads, max_seq_len, max_seq_len) + relative_position = torch.abs(relative_position).unsqueeze(0).expand(num_attention_heads, -1, -1) + + return relative_position + + +class ALiBiRelativePositionEmbedding(torch.nn.Module): + """ + ALiBi (Attention with Linear Biases) relative position embedding for auto-regressive decoder + and joint encoder (symmetric for forward and backward distance). + Based on https://arxiv.org/bas/2108.12409 + """ + + def __init__(self, bidirectional, num_attention_heads, layer_type, alibi_num_heads=None, max_seq_len=512): + """ + Args: + bidirectional: Whether to use bidirectional relative position embedding + num_attention_heads: Number of attention heads + layer_type: Layer type. Can be one of [LayerType.encoder or LayerType.decoder]. Willdetermine the bias construction + alibi_num_heads: Number of attention heads for which alibi bias will be used + max_seq_len: Maximum sequence length for precomputed relative positions. Larger sizes will result in more memory usage by computing alibi mask on-the-fly. 
+ """ + super().__init__() + + if alibi_num_heads is None: + alibi_num_heads = num_attention_heads + + if alibi_num_heads > num_attention_heads: + raise ValueError( + f"alibi_num_heads ({alibi_num_heads}) cannot be larger than num_attention_heads ({num_attention_heads})" + ) + + self.bidirectional = bidirectional + self.num_attention_heads = num_attention_heads + # LayerType.encoder or LayerType.decoder. Is only needed to determine the group for the all_reduce + self.layer_type = layer_type + # define the size of pre-computed relative position slopes. + # define the number of attention heads for which alibi mask will be pre-computed (the rest are disabled). + self.alibi_num_heads = alibi_num_heads + # Larger sizes will result in more memory usage by computing alibi mask on-the-fly. + self.max_seq_len = max_seq_len + + # cache the slopes + self.slopes = build_slopes(num_attention_heads, alibi_num_heads) + # cache the relative position bias. shape (num_attention_heads, max_seq_len, max_seq_len) + self.relative_position = build_relative_position(max_seq_len, max_seq_len, num_attention_heads) + + def forward(self, query_seq_length, key_seq_length): + # used cached relative position if possible + max_seq_len = max(query_seq_length, key_seq_length) + if max_seq_len > self.max_seq_len: + relative_position = build_relative_position(max_seq_len, max_seq_len, self.num_attention_heads) + else: + relative_position = self.relative_position + # shape (num_attention_heads, query_seq_length, key_seq_length) + relative_position = relative_position[:, :query_seq_length, :key_seq_length] + # if not bidirectional, mask out the future positions + if not self.bidirectional: + relative_position = torch.tril(relative_position) + + # shape (1, num_heads, query_length, key_length) + return relative_position.unsqueeze(0) * self.slopes diff --git a/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py b/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py index f9a09983901a..2fc06952e7d8 100644 --- a/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py +++ b/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py @@ -15,6 +15,9 @@ import torch from omegaconf import DictConfig +from nemo.collections.nlp.modules.common.megatron.alibi_relative_position_embedding import ( + ALiBiRelativePositionEmbedding, +) from nemo.collections.nlp.modules.common.megatron.language_model import Embedding from nemo.collections.nlp.modules.common.megatron.layer_type import LayerType from nemo.collections.nlp.modules.common.megatron.megatron_decoders import get_decoder_model @@ -154,6 +157,17 @@ def __init__( if parallel_state.get_pipeline_model_parallel_rank() != 0: self.encoder_relative_position_embeddings_weight().data.fill_(0) self.encoder_relative_position_embeddings_weight().shared = True + elif self.encoder_cfg.get('position_embedding_type', 'learned_absolute') == 'alibi': + self.encoder_relative_position_embedding = ALiBiRelativePositionEmbedding( + bidirectional=True, + num_attention_heads=encoder_cfg.num_attention_heads, + layer_type=LayerType.encoder, + alibi_num_heads=None, + max_seq_len=max_position_embeddings, + ) + self._encoder_relative_position_embedding_key = "encoder_relative_position_embedding" + else: + self.encoder_relative_position_embedding = None encoder = get_encoder_model( arch=encoder_cfg.arch, @@ -263,6 +277,17 @@ def __init__( ): self.decoder_cross_attention_relative_position_embeddings_weight().data.fill_(0) 
self.decoder_cross_attention_relative_position_embeddings_weight().shared = True + elif self.decoder_cfg.get('position_embedding_type', 'learned_absolute') == 'alibi': + self.decoder_relative_position_embedding = ALiBiRelativePositionEmbedding( + bidirectional=False, + num_attention_heads=decoder_cfg.num_attention_heads, + layer_type=LayerType.decoder, + alibi_num_heads=None, + max_seq_len=max_position_embeddings, + ) + self._decoder_relative_position_embedding_key = "decoder_relative_position_embedding" + else: + self.decoder_relative_position_embedding = None decoder = get_decoder_model( arch=decoder_cfg.arch, @@ -450,7 +475,7 @@ def forward( enc_seq_length = enc_input_ids.size(1) if self.pre_process and self.add_encoder: # We don't need position ids for RPE, because the embedding layer does not have position embeddings. - if self.encoder_cfg.get("position_embedding_type", "learned_absolute") != 'relative': + if self.encoder_relative_position_embedding is None: enc_position_ids = build_position_ids(enc_input_ids) else: enc_position_ids = None @@ -461,7 +486,7 @@ def forward( # This should only happen with PP > 1 for enc-dec prompt learning models enc_seq_length = enc_attn_mask.size(1) - if self.encoder_cfg.get("position_embedding_type", "learned_absolute") == 'relative' and self.add_encoder: + if self.add_encoder and self.encoder_relative_position_embedding is not None: encoder_self_attention_relative_position_bias = self.encoder_relative_position_embedding( query_seq_length=enc_seq_length, key_seq_length=enc_seq_length, ) @@ -485,7 +510,7 @@ def forward( if self.pre_process and self.add_decoder: # We don't need position ids for RPE, because the embedding layer does not have position embeddings. - if self.decoder_cfg.get("position_embedding_type", "learned_absolute") != 'relative': + if self.decoder_relative_position_embedding is None: dec_position_ids = build_position_ids(dec_input_ids) else: dec_position_ids = None @@ -494,7 +519,7 @@ def forward( # Note: This is when the decoder itself is split across PP ranks. 
dec_input = None - if self.decoder_cfg.get("position_embedding_type", "learned_absolute") == 'relative' and self.add_decoder: + if self.add_decoder and self.decoder_relative_position_embedding is not None: decoder_self_attention_relative_position_bias = self.decoder_relative_position_embedding( query_seq_length=dec_input_ids.size(1), key_seq_length=dec_input_ids.size(1) ) From d67cb7cc043e7709d8b1e2eef171de1e84b28f9b Mon Sep 17 00:00:00 2001 From: Sean Naren Date: Thu, 5 Jan 2023 19:27:49 +0000 Subject: [PATCH 104/154] Ensure EMA checkpoints are also deleted when normal checkpoints are (#5724) * Ensure EMA checkpoints are also deleted when normal checkpoints are Signed-off-by: SeanNaren * Simplify test Signed-off-by: SeanNaren * Remove comment Signed-off-by: SeanNaren * Fix bug where `save_best_model` caused a crash Signed-off-by: SeanNaren * Swap to logging only on rank 0 Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva --- nemo/collections/common/callbacks/ema.py | 19 ++++---- nemo/utils/exp_manager.py | 8 ++++ requirements/requirements_lightning.txt | 2 +- tests/collections/common/test_ema.py | 57 +++++++++++++++++++++++- 4 files changed, 74 insertions(+), 12 deletions(-) diff --git a/nemo/collections/common/callbacks/ema.py b/nemo/collections/common/callbacks/ema.py index 63222db68f52..49ebbbae040a 100644 --- a/nemo/collections/common/callbacks/ema.py +++ b/nemo/collections/common/callbacks/ema.py @@ -13,13 +13,13 @@ # limitations under the License. import contextlib import copy -import logging import os import threading from typing import Any, Dict, Iterable import pytorch_lightning as pl import torch +from lightning_utilities.core.rank_zero import rank_zero_info from pytorch_lightning import Callback from pytorch_lightning.utilities.exceptions import MisconfigurationException @@ -117,25 +117,26 @@ def on_load_checkpoint( ) -> None: checkpoint_callback = trainer.checkpoint_callback - if trainer.ckpt_path and checkpoint_callback is not None and 'NeMo' in type(checkpoint_callback).__name__: + # use the connector as NeMo calls the connector directly in the exp_manager when restoring. + connector = trainer._checkpoint_connector + ckpt_path = connector.resume_checkpoint_path + + if ckpt_path and checkpoint_callback is not None and 'NeMo' in type(checkpoint_callback).__name__: ext = checkpoint_callback.FILE_EXTENSION - if trainer.ckpt_path.endswith(f'-EMA{ext}'): - logging.info( + if ckpt_path.endswith(f'-EMA{ext}'): + rank_zero_info( "loading EMA based weights. " "The callback will treat the loaded EMA weights as the main weights" " and create a new EMA copy when training." ) return - ema_path = trainer.ckpt_path.replace(ext, f'-EMA{ext}') + ema_path = ckpt_path.replace(ext, f'-EMA{ext}') if os.path.exists(ema_path): ema_state_dict = torch.load(ema_path, map_location=torch.device('cpu')) - # this is wrong, basically when we save the EMA weights, optimizer_states actually contains the model parameters - # as we swapped the model parameters with the state dict parameters. 
- # we could enforce that if you trained with EMA and want to continue training checkpoint['optimizer_states'] = ema_state_dict['optimizer_states'] del ema_state_dict - logging.info("EMA state has been restored.") + rank_zero_info("EMA state has been restored.") else: raise MisconfigurationException( "Unable to find the associated EMA weights when re-loading, " diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 29d1a929d721..08b67327af0c 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -990,6 +990,14 @@ def _save_checkpoint(self, trainer: 'pytorch_lightning.Trainer', filepath: str) else: super()._save_checkpoint(trainer, filepath) + def _remove_checkpoint(self, trainer: "pytorch_lightning.Trainer", filepath: str) -> None: + super()._remove_checkpoint(trainer, filepath) + ema_callback = self._ema_callback(trainer) + if ema_callback is not None: + # remove EMA copy of the state dict as well. + filepath = self._ema_format_filepath(filepath) + super()._remove_checkpoint(trainer, filepath) + def _ema_format_filepath(self, filepath: str) -> str: return filepath.replace(self.FILE_EXTENSION, f'-EMA{self.FILE_EXTENSION}') diff --git a/requirements/requirements_lightning.txt b/requirements/requirements_lightning.txt index 9876002b806d..4b2d71f861bc 100644 --- a/requirements/requirements_lightning.txt +++ b/requirements/requirements_lightning.txt @@ -1,6 +1,6 @@ hydra-core>=1.2.0,<1.3 omegaconf>=2.2,<2.3 -pytorch-lightning>=1.8.3 +pytorch-lightning>=1.8.6 pyyaml<6 # Pinned until omegaconf works with pyyaml>=6 torchmetrics>=0.4.1rc0,<=0.10.3 transformers>=4.0.1,<=4.21.2 diff --git a/tests/collections/common/test_ema.py b/tests/collections/common/test_ema.py index 16de01e0fbe4..2e22d6a1a275 100644 --- a/tests/collections/common/test_ema.py +++ b/tests/collections/common/test_ema.py @@ -27,7 +27,10 @@ from nemo.collections.common.callbacks.ema import EMAOptimizer from nemo.core import ModelPT from nemo.utils.exp_manager import exp_manager -from tests.collections.nlp.test_gpt_model import DEVICE_CAPABILITY + +DEVICE_CAPABILITY = None +if torch.cuda.is_available(): + DEVICE_CAPABILITY = torch.cuda.get_device_capability() def extract_ema_weights(pl_module, trainer): @@ -69,13 +72,22 @@ def val_dataloader(self): dataset = RandomDataset(32, 16) return torch.utils.data.DataLoader(dataset, batch_size=2) + def test_dataloader(self): + dataset = RandomDataset(32, 16) + dl = torch.utils.data.DataLoader(dataset, batch_size=2) + self._test_names = ['test_{}_'.format(idx) for idx in range(len(dl))] + return dl + def forward(self, batch): return self.l1(self.bn(batch)).sum() + def training_step(self, batch, batch_idx): + return self(batch) + def validation_step(self, batch, batch_idx): return self(batch) - def training_step(self, batch, batch_idx): + def test_step(self, batch, batch_idx): return self(batch) def configure_optimizers(self): @@ -90,6 +102,9 @@ def setup_training_data(self, train_data_config: Union[DictConfig, Dict]): def setup_validation_data(self, val_data_config: Union[DictConfig, Dict]): pass + def setup_test_data(self, val_data_config: Union[DictConfig, Dict]): + pass + def validation_epoch_end(self, loss): self.log("val_loss", torch.stack(loss).mean()) @@ -248,6 +263,27 @@ def test_exp_manager_ema_weights(self, tmpdir): for saved_weight, ema_weight in zip(duplicate_model.state_dict().values(), ema_weights): assert torch.allclose(saved_weight.cpu(), ema_weight.cpu()) + @pytest.mark.unit + def test_exp_manager_ema_weights_topk(self, tmpdir): + """Test to 
ensure that EMA correctly ensures we only keep topk checkpoints.""" + tmp_path = tmpdir / "exp_manager_test" + model = ExampleModel() + save_top_k = 3 + + trainer = Trainer(max_epochs=10, enable_checkpointing=False, logger=False, devices=1) + exp_manager( + trainer, + { + "ema": {"enable": True}, + "explicit_log_dir": str(tmp_path), + "checkpoint_callback_params": {"save_top_k": save_top_k}, + }, + ) + trainer.fit(model) + + # we save 3 checkpoints for the model, 3 accompanied EMA weights, the last checkpoint and nemo model. + assert len(os.listdir(tmp_path / "checkpoints/")) == (save_top_k + 1) * 2 + 1 + class TestEMATrain: @pytest.mark.unit @@ -320,6 +356,23 @@ def run_training_test(self, accumulate_grad_batches, validate_original_weights, trainer.callbacks.insert(0, EMAValidationAssertCallback()) trainer.fit(model=model, val_dataloaders=model.train_dataloader()) + @pytest.mark.unit + def test_ema_run_with_save_best_model(self, tmpdir): + """Test to ensure that we save the model correctly when save best model is set to True.""" + tmp_path = tmpdir / "exp_manager_test" + model = ExampleModel() + + trainer = Trainer(max_epochs=1, enable_checkpointing=False, logger=False, devices=1, limit_train_batches=1) + exp_manager( + trainer, + { + "ema": {"enable": True}, + "explicit_log_dir": str(tmp_path), + "checkpoint_callback_params": {"save_best_model": True}, + }, + ) + trainer.fit(model) + class EMAAssertCallback(Callback): def on_train_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None: From 5e09cb49c93845bd390966fcd271c52d282c70e1 Mon Sep 17 00:00:00 2001 From: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Thu, 5 Jan 2023 12:14:09 -0800 Subject: [PATCH 105/154] Fix P-Tuning Truncation (#5663) * untokenize truncated field Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated truncation method arugments Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva --- .../megatron/gpt_prompt_learning_dataset.py | 61 +++++++------------ 1 file changed, 22 insertions(+), 39 deletions(-) diff --git a/nemo/collections/nlp/data/language_modeling/megatron/gpt_prompt_learning_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/gpt_prompt_learning_dataset.py index 69cd485b0ca5..030b9774b3ea 100755 --- a/nemo/collections/nlp/data/language_modeling/megatron/gpt_prompt_learning_dataset.py +++ b/nemo/collections/nlp/data/language_modeling/megatron/gpt_prompt_learning_dataset.py @@ -165,7 +165,15 @@ def load_data(self, dataset): # Try to truncate input text to fit into the max sequence length if len(input_ids) > self.max_seq_length: - input_ids = self._truncate_input(truncation_field, input_ids, taskname, doc) + input_ids = self._truncate_input( + truncation_field, + input_ids, + taskname, + doc, + prompt_template, + prompt_template_fields, + virtual_token_splits, + ) # Skip example if the final length doesn't fit length requirements even after truncation if self.min_seq_length <= len(input_ids) <= self.max_seq_length: @@ -264,7 +272,9 @@ def _insert_virtual_token_placeholders(self, input_example, virtual_token_splits return input_example - def _truncate_input(self, truncation_field, input_ids, taskname, doc): 
+ def _truncate_input( + self, truncation_field, input_ids, taskname, doc, prompt_template, prompt_template_fields, virtual_token_splits + ): """ Try to truncate input text to fit into the max sequence length """ logging.info( f"Input greater than max sequence length. Attempting to truncate: '{truncation_field}' in task: '{taskname}'" @@ -272,17 +282,22 @@ def _truncate_input(self, truncation_field, input_ids, taskname, doc): # Truncate the text ids in this part of input to try and fit max sequence length if truncation_field is not None and truncation_field in doc.keys(): - truncation_length = len(input_ids) - self.max_seq_length + truncation_length = (len(input_ids) - self.max_seq_length) + 1 field_text = doc[truncation_field] - field_text = self._add_leading_space(taskname, truncation_field, field_text) # Truncate field text field_text_ids = self.tokenizer.text_to_ids(field_text) truncated_text_ids = field_text_ids[: -min(truncation_length, len(field_text_ids))] + truncated_field_text = self.tokenizer.ids_to_text(truncated_text_ids) + doc[truncation_field] = truncated_field_text + + # Re-insert the truncated text string into the text prompt + input_example = prompt_template + input_example = self._insert_text_in_template(input_example, prompt_template_fields, doc) + input_example = self._insert_virtual_token_placeholders(input_example, virtual_token_splits) - # Replace original text ids with truncated text ids - field_start, field_end = find_subsequence_location(input_ids, field_text_ids) - input_ids = input_ids[:field_start] + truncated_text_ids + input_ids[field_end + 1 :] + # Re-tokenize the whole prompt + input_ids = self.tokenizer.text_to_ids(input_example) return input_ids @@ -405,35 +420,3 @@ def inference_collate_fn(self, batch): input_ids = torch.cuda.LongTensor(input_ids) return task_id_nums, (input_ids, input_lengths) - - -def find_subsequence_location(sequence, subsequence): - """ Finds the start and end index of the first occurance - of a given subsequence within a larger list. Returns - the two indices corresponding to the postition of - the first and last token of the subseqeunce. - Assumes subsequence is known to be in sequence. - """ - assert len(sequence) >= len(subsequence), "subsequence too long" - - start_idx = None - next_subseq_token = subsequence[0] - next_subsequence_idx = 1 - - for seq_idx, token in enumerate(sequence): - if token == next_subseq_token: - if start_idx is None: - start_idx = seq_idx - - if next_subsequence_idx == len(subsequence): - end_idx = seq_idx - return start_idx, end_idx - else: - next_subseq_token = subsequence[next_subsequence_idx] - next_subsequence_idx += 1 - else: - start_idx = None - next_subseq_token = subsequence[0] - next_subsequence_idx = 1 - - raise ValueError("Subsequence not found in sequence") From 23d3e306bb753cb135ec74c0d1b2b3fd3d031f7b Mon Sep 17 00:00:00 2001 From: schaltung Date: Thu, 5 Jan 2023 17:51:12 -0500 Subject: [PATCH 106/154] Update 00_NeMo_Primer.ipynb (#5740) Fixed a minor typo in primer tutorial. 
Signed-off-by: schaltung Signed-off-by: schaltung Signed-off-by: Elena Rastorgueva --- tutorials/00_NeMo_Primer.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/00_NeMo_Primer.ipynb b/tutorials/00_NeMo_Primer.ipynb index 5e5dcbb92c1e..50aa60260b35 100644 --- a/tutorials/00_NeMo_Primer.ipynb +++ b/tutorials/00_NeMo_Primer.ipynb @@ -1249,7 +1249,7 @@ "\n", "Tutorials are meant to be more in-depth explanation of the workflow in the discussed task - usually involving a small amount of data to train a small model on a task, along with some explanation of the task itself.\n", "\n", - "White the tutorials are a great example of the simplicity of NeMo, please note for the best performance when training on real datasets, we advice the use of the example scripts instead of the tutorial notebooks. \n", + "While the tutorials are a great example of the simplicity of NeMo, please note for the best performance when training on real datasets, we advice the use of the example scripts instead of the tutorial notebooks. \n", "\n", "NeMo Tutorials directory can be found here - https://github.com/NVIDIA/NeMo/tree/main/tutorials" ] From 90fc37372a776248a43fa3fad04c5645bf9c2a42 Mon Sep 17 00:00:00 2001 From: Kaden Uhlig Date: Fri, 6 Jan 2023 05:47:58 +0100 Subject: [PATCH 107/154] Support non-standard padding token id (#5543) * Support non-standard padding token id Read the id of the padding token from the tokenizer when creating the embedding, rather than always defaulting to 0. This allows use of (admittedly bizarre) non-standard tokenizer models that don't give the padding token the id 0. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Numeri Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sandeep Subramanian Signed-off-by: Elena Rastorgueva --- .../machine_translation/mt_enc_dec_model.py | 13 +++++++++++++ nemo/collections/nlp/modules/common/lm_utils.py | 4 +++- .../nlp/modules/common/transformer/transformer.py | 4 ++++ .../common/transformer/transformer_bottleneck.py | 2 ++ .../common/transformer/transformer_modules.py | 15 ++++++++------- .../common/transformer/transformer_utils.py | 4 ++++ 6 files changed, 34 insertions(+), 8 deletions(-) diff --git a/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py b/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py index 3427da520b94..ddc3515332fe 100644 --- a/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py +++ b/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py @@ -38,6 +38,7 @@ from nemo.collections.common.tokenizers.en_ja_tokenizers import EnJaProcessor, JaMecabProcessor from nemo.collections.common.tokenizers.indic_tokenizers import IndicProcessor from nemo.collections.common.tokenizers.moses_tokenizers import MosesProcessor +from nemo.collections.common.tokenizers.tabular_tokenizer import TabularTokenizer from nemo.collections.nlp.data import TarredTranslationDataset, TranslationDataset from nemo.collections.nlp.models.enc_dec_nlp_model import EncDecNLPModel from nemo.collections.nlp.models.machine_translation.mt_enc_dec_config import MTEncDecModelConfig @@ -166,6 +167,11 @@ def __init__(self, cfg: MTEncDecModelConfig, trainer: Trainer = None): model_name = encoder_cfg_dict.pop('model_name', None) pretrained = encoder_cfg_dict.pop('pretrained', False) checkpoint_file = encoder_cfg_dict.pop('checkpoint_file', None) + if 
isinstance(self.encoder_tokenizer, TabularTokenizer): + # TabularTokenizer does not include a padding token, so this uses the prior default of 0. + encoder_padding_idx = 0 + else: + encoder_padding_idx = self.encoder_tokenizer.pad_id self.encoder = get_transformer( library=library, model_name=model_name, @@ -174,6 +180,7 @@ def __init__(self, cfg: MTEncDecModelConfig, trainer: Trainer = None): encoder=True, pre_ln_final_layer_norm=encoder_cfg_dict.get('pre_ln_final_layer_norm', False), checkpoint_file=checkpoint_file, + padding_idx=encoder_padding_idx, ) # decoder from NeMo, Megatron-LM, or HuggingFace @@ -182,6 +189,11 @@ def __init__(self, cfg: MTEncDecModelConfig, trainer: Trainer = None): library = decoder_cfg_dict.pop('library', 'nemo') model_name = decoder_cfg_dict.pop('model_name', None) pretrained = decoder_cfg_dict.pop('pretrained', False) + if isinstance(self.decoder_tokenizer, TabularTokenizer): + # TabularTokenizer does not include a padding token, so this uses the prior default of 0. + decoder_padding_idx = 0 + else: + decoder_padding_idx = self.decoder_tokenizer.pad_id self.decoder = get_transformer( library=library, model_name=model_name, @@ -189,6 +201,7 @@ def __init__(self, cfg: MTEncDecModelConfig, trainer: Trainer = None): config_dict=decoder_cfg_dict, encoder=False, pre_ln_final_layer_norm=decoder_cfg_dict.get('pre_ln_final_layer_norm', False), + padding_idx=decoder_padding_idx, ) # validate hidden_size of encoder and decoder diff --git a/nemo/collections/nlp/modules/common/lm_utils.py b/nemo/collections/nlp/modules/common/lm_utils.py index d02883f5df1c..485c30ee9072 100644 --- a/nemo/collections/nlp/modules/common/lm_utils.py +++ b/nemo/collections/nlp/modules/common/lm_utils.py @@ -150,7 +150,8 @@ def get_transformer( config_dict: Optional[dict] = None, checkpoint_file: Optional[str] = None, encoder: bool = True, - pre_ln_final_layer_norm=True, + pre_ln_final_layer_norm: bool = True, + padding_idx: int = 0, ) -> Union[EncoderModule, DecoderModule]: """Gets Transformer based model to be used as an Encoder or Decoder in NeMo NLP. First choose the library to get the transformer from. 
This can be huggingface, @@ -191,6 +192,7 @@ def get_transformer( config_dict=config_dict, encoder=encoder, pre_ln_final_layer_norm=pre_ln_final_layer_norm, + padding_idx=padding_idx, ) if checkpoint_file is not None: diff --git a/nemo/collections/nlp/modules/common/transformer/transformer.py b/nemo/collections/nlp/modules/common/transformer/transformer.py index 65c9233fc37b..5870303be93e 100644 --- a/nemo/collections/nlp/modules/common/transformer/transformer.py +++ b/nemo/collections/nlp/modules/common/transformer/transformer.py @@ -92,6 +92,7 @@ def __init__( mask_future: bool = False, pre_ln: bool = False, pre_ln_final_layer_norm: bool = True, + padding_idx: int = 0, ): super().__init__() @@ -106,6 +107,7 @@ def __init__( num_token_types=num_token_types, embedding_dropout=embedding_dropout, learn_positional_encodings=learn_positional_encodings, + padding_idx=padding_idx, ) self._encoder = TransformerEncoder( @@ -179,6 +181,7 @@ def __init__( hidden_act: str = 'relu', pre_ln: bool = False, pre_ln_final_layer_norm: bool = True, + padding_idx: int = 0, ): super().__init__() @@ -197,6 +200,7 @@ def __init__( num_token_types=num_token_types, embedding_dropout=embedding_dropout, learn_positional_encodings=learn_positional_encodings, + padding_idx=padding_idx, ) self._decoder = TransformerDecoder( diff --git a/nemo/collections/nlp/modules/common/transformer/transformer_bottleneck.py b/nemo/collections/nlp/modules/common/transformer/transformer_bottleneck.py index 0d1ddea2f6bd..b646ad8f382d 100644 --- a/nemo/collections/nlp/modules/common/transformer/transformer_bottleneck.py +++ b/nemo/collections/nlp/modules/common/transformer/transformer_bottleneck.py @@ -82,6 +82,7 @@ def __init__( hidden_steps: int = -1, hidden_blocks: int = 1, hidden_init_method: str = "default", + padding_idx: int = 0, # default whether forward() method returns hidden or (hidden, mask) return_mask=True, ): @@ -102,6 +103,7 @@ def __init__( mask_future=mask_future, pre_ln=pre_ln, pre_ln_final_layer_norm=pre_ln_final_layer_norm, + padding_idx=padding_idx, ) self._arch = arch diff --git a/nemo/collections/nlp/modules/common/transformer/transformer_modules.py b/nemo/collections/nlp/modules/common/transformer/transformer_modules.py index 25fb781f0cd4..de9af5ffab75 100644 --- a/nemo/collections/nlp/modules/common/transformer/transformer_modules.py +++ b/nemo/collections/nlp/modules/common/transformer/transformer_modules.py @@ -98,18 +98,19 @@ class TransformerEmbedding(nn.Module): def __init__( self, - vocab_size, - hidden_size, - max_sequence_length=512, - num_token_types=2, - embedding_dropout=0.0, - learn_positional_encodings=False, + vocab_size: int, + hidden_size: int, + max_sequence_length: int = 512, + num_token_types: int = 2, + embedding_dropout: float = 0.0, + learn_positional_encodings: bool = False, + padding_idx: int = 0, ): super().__init__() self.max_sequence_length = max_sequence_length self.learn_positional_encodings = learn_positional_encodings - self.token_embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=0) + self.token_embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=padding_idx) if learn_positional_encodings: self.position_embedding = nn.Embedding(max_sequence_length, hidden_size) else: diff --git a/nemo/collections/nlp/modules/common/transformer/transformer_utils.py b/nemo/collections/nlp/modules/common/transformer/transformer_utils.py index ab1c887fc7bc..ba558b9c025a 100644 --- a/nemo/collections/nlp/modules/common/transformer/transformer_utils.py +++ 
b/nemo/collections/nlp/modules/common/transformer/transformer_utils.py @@ -29,6 +29,7 @@ def get_nemo_transformer( config_dict: Optional[Union[dict, DictConfig]] = None, encoder: bool = True, pre_ln_final_layer_norm: bool = True, + padding_idx: int = 0, ) -> Union[TransformerEncoderNM, TransformerDecoderNM]: """Returns NeMo transformer. The following configurations are mandatory: @@ -85,6 +86,7 @@ def get_nemo_transformer( pre_ln=cfg.get('pre_ln', False), pre_ln_final_layer_norm=pre_ln_final_layer_norm, num_token_types=cfg.get('num_token_types', 2), + padding_idx=padding_idx, ) elif arch in TransformerBottleneckEncoderNM._SUPPORTED_ARCH: model = TransformerBottleneckEncoderNM( @@ -109,6 +111,7 @@ def get_nemo_transformer( hidden_blocks=cfg.get('hidden_blocks', 1), hidden_init_method=cfg.get('hidden_init_method', 'default'), return_mask=cfg.get('return_mask', True), + padding_idx=padding_idx, ) else: raise ValueError(f"Unknown arch = {arch}") @@ -129,6 +132,7 @@ def get_nemo_transformer( pre_ln=cfg.get('pre_ln', False), pre_ln_final_layer_norm=pre_ln_final_layer_norm, num_token_types=cfg.get('num_token_types', 2), + padding_idx=padding_idx, ) return model From 9f46caff98ac3d583de37f268fee787d4620418a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 6 Jan 2023 05:53:37 -0800 Subject: [PATCH 108/154] typo and link fixed (#5741) (#5744) Signed-off-by: ekmb Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- tutorials/tts/Pronunciation_customization.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tutorials/tts/Pronunciation_customization.ipynb b/tutorials/tts/Pronunciation_customization.ipynb index d7b97daf701f..f6ae09f7aa8a 100644 --- a/tutorials/tts/Pronunciation_customization.ipynb +++ b/tutorials/tts/Pronunciation_customization.ipynb @@ -239,7 +239,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Paracetomol** is no longer an OOV, and the model uses the phonetic form we provided:" + "**Paracetamol** is no longer an OOV, and the model uses the phonetic form we provided:" ] }, { @@ -280,9 +280,9 @@ "metadata": {}, "source": [ "# Resources\n", - "* [TTS pipeline costumizaiton](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#tts-pipeline-configuration)\n", + "* [TTS pipeline customization](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#tts-pipeline-configuration)\n", "* [Overview of TTS in NeMo](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/NeMo_TTS_Primer.ipynb)\n", - "* [G2P models in NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/text_processing/g2p/g2p.html)\n", + "* [G2P models in NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tts/g2p.html)\n", "* [Riva TTS documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html)" ] } From 29e36bb06007b4ef80225c1b13843c3ed8207a57 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 6 Jan 2023 10:25:38 -0800 Subject: [PATCH 109/154] link fixed (#5745) (#5746) Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- tutorials/tts/NeMo_TTS_Primer.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorials/tts/NeMo_TTS_Primer.ipynb 
b/tutorials/tts/NeMo_TTS_Primer.ipynb index 21c366155b17..a0e0b3f7097a 100644 --- a/tutorials/tts/NeMo_TTS_Primer.ipynb +++ b/tutorials/tts/NeMo_TTS_Primer.ipynb @@ -358,7 +358,7 @@ "\n", "For non-phonetic languages like English it is still possible to train a TTS model directly on the graphemes. But doing so will make the pronunciation of some words less accurate.\n", "\n", - "Details on how NeMo G2P works can be found in our [G2P documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/text_processing/g2p/g2p.html)." + "Details on how NeMo G2P works can be found in our [G2P documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tts/g2p.html)." ] }, { @@ -2072,4 +2072,4 @@ }, "nbformat": 4, "nbformat_minor": 1 -} +} \ No newline at end of file From e457a0d05b6367531b95ebaae4986eab2c809b49 Mon Sep 17 00:00:00 2001 From: Ryan Langman Date: Fri, 6 Jan 2023 11:20:48 -0800 Subject: [PATCH 110/154] [TTS] Update Spanish TTS model to 1.15 (#5742) Signed-off-by: Ryan Signed-off-by: Elena Rastorgueva --- docs/source/tts/data/ngc_models_am.csv | 2 +- docs/source/tts/data/ngc_models_vocoder.csv | 2 +- nemo/collections/tts/models/fastpitch.py | 4 ++-- nemo/collections/tts/models/hifigan.py | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/source/tts/data/ngc_models_am.csv b/docs/source/tts/data/ngc_models_am.csv index c31922ca907f..6233368e369d 100644 --- a/docs/source/tts/data/ngc_models_am.csv +++ b/docs/source/tts/data/ngc_models_am.csv @@ -8,5 +8,5 @@ en-US,RAD-TTS,TBD,TBD,TBD,ARPABET,nemo.collections.tts.models.radtts.RadTTSModel en-US,tts_en_tacotron2,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.tacotron2.Tacotron2Model,`tts_en_tacotron2 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.0.0/files/tts_en_tacotron2.nemo`` de-DE,tts_de_fastpitch_multispeaker_5,HUI Audio Corpus German,44100Hz,5,ARPABET,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_de_fastpitch_multispeaker_5 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_fastpitch_multispeaker_5.nemo`` de-DE,tts_de_fastpitch_singlespeaker,Thorsten Müller (German Neutral-TTS dataset),22050Hz,1,ARPABET,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_de_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.10.0/files/tts_de_fastpitch_align.nemo`` -es,tts_es_fastpitch_multispeaker,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,grapheme,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_fastpitch_multispeaker.nemo`` +es,tts_es_fastpitch_multispeaker,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,IPA,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_fastpitch_multispeaker.nemo`` zh-CN ,tts_zh_fastpitch_sfspeech,SFSpeech Chinese/English Bilingual Speech,22050Hz,1,pinyin,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_zh_fastpitch_hifigan_sfspeech `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.14.0/files/tts_zh_fastpitch_sfspeech.nemo`` diff --git a/docs/source/tts/data/ngc_models_vocoder.csv 
b/docs/source/tts/data/ngc_models_vocoder.csv index b6f23b3425f9..b049931c211e 100644 --- a/docs/source/tts/data/ngc_models_vocoder.csv +++ b/docs/source/tts/data/ngc_models_vocoder.csv @@ -9,5 +9,5 @@ en-US,tts_waveglow_88m,librosa.filters.mel,LJSpeech,22050Hz,1,nemo.collections.t en-US,tts_waveglow_268m,librosa.filters.mel,LJSpeech,22050Hz,1,nemo.collections.tts.models.waveglow.WaveGlowModel,`tts_waveglow_268m `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_waveglow_268m/versions/1.0.0rc1/files/tts_waveglow_268m.nemo`` de-DE,tts_de_hui_hifigan_ft_fastpitch_multispeaker_5,FastPitch,HUI Audio Corpus German,44100Hz,5,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_de_fastpitch_multispeaker_5 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_hui_hifigan_ft_fastpitch_multispeaker_5.nemo`` de-DE,tts_de_slr_hifigan_ft_fastpitch_singlespeaker,FastPitch,Thorsten Müller (German Neutral-TTS dataset),22050Hz,1,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_de_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.10.0/files/tts_de_hifigan.nemo`` -es,tts_es_hifigan_ft_fastpitch_multispeaker,FastPitch,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo`` +es,tts_es_hifigan_ft_fastpitch_multispeaker,FastPitch,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo`` zh-CN ,tts_zh_hifigan_sfspeech,FastPitch,SFSpeech Chinese/English Bilingual Speech,22050Hz,1,nemo.collections.tts.models.hifigan.HifiGanModel,`tts_zh_fastpitch_hifigan_sfspeech `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.14.0/files/tts_zh_hifigan_sfspeech.nemo`` diff --git a/nemo/collections/tts/models/fastpitch.py b/nemo/collections/tts/models/fastpitch.py index 95745140db45..60dd4fe06271 100644 --- a/nemo/collections/tts/models/fastpitch.py +++ b/nemo/collections/tts/models/fastpitch.py @@ -601,10 +601,10 @@ def list_available_models(cls) -> 'List[PretrainedModelInfo]': ) list_of_models.append(model) - # es, 174 speakers, 44100Hz, OpenSLR + # es, 174 speakers, 44100Hz, OpenSLR (IPA) model = PretrainedModelInfo( pretrained_model_name="tts_es_fastpitch_multispeaker", - location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_fastpitch_multispeaker.nemo", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_fastpitch_multispeaker.nemo", description="This model is trained on 174 speakers in 6 crowdsourced Latin American Spanish OpenSLR datasets sampled at 44100Hz and can be used to generate male and female Spanish voices with Latin American accents.", class_=cls, ) diff --git a/nemo/collections/tts/models/hifigan.py b/nemo/collections/tts/models/hifigan.py index a431d84a9e11..e1b4bd5b0cd8 100644 --- a/nemo/collections/tts/models/hifigan.py +++ b/nemo/collections/tts/models/hifigan.py @@ -406,7 +406,7 @@ def 
list_available_models(cls) -> 'Optional[Dict[str, str]]': # Spanish, multi-speaker, 44100 Hz, Latin American Spanish OpenSLR model = PretrainedModelInfo( pretrained_model_name="tts_es_hifigan_ft_fastpitch_multispeaker", - location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.14.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo", description="This model is trained on the audio from 6 crowdsourced Latin American Spanish OpenSLR " "datasets and finetuned on the mel-spectrograms generated from the FastPitch checkpoint " "`tts_es_fastpitch_multispeaker`. This model has been tested on generating male and female " From 1a413c192f46ccc841dc73c1d7ddc05dd21f7b63 Mon Sep 17 00:00:00 2001 From: Igor Gitman Date: Fri, 6 Jan 2023 11:52:07 -0800 Subject: [PATCH 111/154] Fix for incorrect computation of batched alignment in transducers (#5692) * Fix rnnt alignment bug and add test Signed-off-by: Igor Gitman * Add tests/fixes for more decoding configurations Signed-off-by: Igor Gitman * Add tests/fixes for frame confidence computation Signed-off-by: Igor Gitman * Rename test file to avoid local execution Signed-off-by: Igor Gitman * Add test to jenkinsfile Signed-off-by: Igor Gitman * Proper fix for alignments + remove code duplication Signed-off-by: Igor Gitman * Return back separate mask processing Signed-off-by: Igor Gitman * Override cleanup fixture Signed-off-by: Igor Gitman * Add a TODO for multiblank RNNT Signed-off-by: Igor Gitman Signed-off-by: Igor Gitman Signed-off-by: Elena Rastorgueva --- Jenkinsfile | 16 ++++ .../parts/submodules/rnnt_greedy_decoding.py | 34 ++++---- .../asr/decoding/rnnt_alignments_check.py | 83 +++++++++++++++++++ 3 files changed, 119 insertions(+), 14 deletions(-) create mode 100644 tests/collections/asr/decoding/rnnt_alignments_check.py diff --git a/Jenkinsfile b/Jenkinsfile index a3354bca11a3..6bcdb89a98c8 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -988,6 +988,22 @@ pipeline { } } } + stage('L2: Transducer alignment') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + parallel { + stage('Running pytest') { + steps { + sh 'pytest tests/collections/asr/decoding/rnnt_alignments_check.py --durations=-1' + } + } + } + } stage('L2: Segmentation Tool') { when { diff --git a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py index a48af434bf2b..5e98b03f2fe2 100644 --- a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py @@ -403,7 +403,6 @@ def _greedy_decode( # Setup exit flags and counter not_blank = True symbols_added = 0 - # While blank is not predicted, or we dont run out of max symbols per timestep while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): # In the first timestep, we initialize the network with RNNT Blank @@ -632,13 +631,13 @@ def _greedy_decode_blank_as_pad( # Initialize Hidden state matrix (shared by entire batch) hidden = None - # If alignments need to be preserved, register a danling list to hold the values + # If alignments need to be preserved, register a dangling list to hold the values if self.preserve_alignments: # alignments is a 3-dimensional dangling list representing B x T x U for hyp in hypotheses: 
hyp.alignments = [[]] - # If confidence scores need to be preserved, register a danling list to hold the values + # If confidence scores need to be preserved, register a dangling list to hold the values if self.preserve_frame_confidence: # frame_confidence is a 3-dimensional dangling list representing B x T x U for hyp in hypotheses: @@ -670,7 +669,6 @@ def _greedy_decode_blank_as_pad( # Batch: [B, T, D], but Bi may have seq len < max(seq_lens_in_batch) # Forcibly mask with "blank" tokens, for all sample where current time step T > seq_len blank_mask = time_idx >= out_len - # Start inner loop while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): # Batch prediction and joint network steps @@ -699,6 +697,7 @@ def _greedy_decode_blank_as_pad( # This is accumulating blanks over all time steps T and all target steps min(max_symbols, U) k_is_blank = k == self._blank_index blank_mask.bitwise_or_(k_is_blank) + all_blanks = torch.all(blank_mask) del k_is_blank @@ -708,8 +707,11 @@ def _greedy_decode_blank_as_pad( # Insert logprobs into last timestep per sample logp_vals = logp.to('cpu') logp_ids = logp_vals.max(1)[1] - for batch_idx in range(batchsize): - if time_idx < out_len[batch_idx]: + for batch_idx, is_blank in enumerate(blank_mask): + # we only want to update non-blanks, unless we are at the last step in the loop where + # all elements produced blanks, otherwise there will be duplicate predictions + # saved in alignments + if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): hypotheses[batch_idx].alignments[-1].append( (logp_vals[batch_idx], logp_ids[batch_idx]) ) @@ -720,14 +722,14 @@ def _greedy_decode_blank_as_pad( if self.preserve_frame_confidence: # Insert probabilities into last timestep per sample confidence = self._get_confidence(logp) - for batch_idx in range(batchsize): - if time_idx < out_len[batch_idx]: + for batch_idx, is_blank in enumerate(blank_mask): + if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): hypotheses[batch_idx].frame_confidence[-1].append(confidence[batch_idx]) del logp # If all samples predict / have predicted prior blanks, exit loop early # This is equivalent to if single sample predicted k - if blank_mask.all(): + if all_blanks: not_blank = False # If preserving alignments, convert the current Uj alignments into a torch.Tensor @@ -787,7 +789,6 @@ def _greedy_decode_blank_as_pad( hypotheses[kidx].y_sequence.append(ki) hypotheses[kidx].timestep.append(time_idx) hypotheses[kidx].score += float(v[kidx]) - symbols_added += 1 # Remove trailing empty list of alignments at T_{am-len} x Uj @@ -912,6 +913,7 @@ def _greedy_decode_masked( # This is accumulating blanks over all time steps T and all target steps min(max_symbols, U) k_is_blank = k == self._blank_index blank_mask.bitwise_or_(k_is_blank) + all_blanks = torch.all(blank_mask) # If preserving alignments, check if sequence length of sample has been reached # before adding alignment @@ -919,11 +921,15 @@ def _greedy_decode_masked( # Insert logprobs into last timestep per sample logp_vals = logp.to('cpu') logp_ids = logp_vals.max(1)[1] - for batch_idx in range(batchsize): - if time_idx < out_len[batch_idx]: + for batch_idx, is_blank in enumerate(blank_mask): + # we only want to update non-blanks, unless we are at the last step in the loop where + # all elements produced blanks, otherwise there will be duplicate predictions + # saved in alignments + if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): 
hypotheses[batch_idx].alignments[-1].append( (logp_vals[batch_idx], logp_ids[batch_idx]) ) + del logp_vals # If preserving per-frame confidence, check if sequence length of sample has been reached @@ -931,8 +937,8 @@ def _greedy_decode_masked( if self.preserve_frame_confidence: # Insert probabilities into last timestep per sample confidence = self._get_confidence(logp) - for batch_idx in range(batchsize): - if time_idx < out_len[batch_idx]: + for batch_idx, is_blank in enumerate(blank_mask): + if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): hypotheses[batch_idx].frame_confidence[-1].append(confidence[batch_idx]) del logp diff --git a/tests/collections/asr/decoding/rnnt_alignments_check.py b/tests/collections/asr/decoding/rnnt_alignments_check.py new file mode 100644 index 000000000000..24000ff7ead7 --- /dev/null +++ b/tests/collections/asr/decoding/rnnt_alignments_check.py @@ -0,0 +1,83 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# NOTE: the file name does not contain "test" on purpose to avoid executing +# these tests outside of the CI machines environment, where test data is +# stored + +import pytest +from examples.asr.transcribe_speech import TranscriptionConfig +from omegaconf import OmegaConf + +from nemo.collections.asr.parts.utils.transcribe_utils import prepare_audio_data, setup_model + +TEST_DATA_PATH = "/home/TestData/an4_dataset/an4_val.json" +PRETRAINED_MODEL_NAME = "stt_en_conformer_transducer_small" + + +def get_rnnt_alignments(strategy: str): + cfg = OmegaConf.structured(TranscriptionConfig(pretrained_name=PRETRAINED_MODEL_NAME)) + cfg.rnnt_decoding.confidence_cfg.preserve_frame_confidence = True + cfg.rnnt_decoding.preserve_alignments = True + cfg.rnnt_decoding.strategy = strategy + cfg.dataset_manifest = TEST_DATA_PATH + filepaths = prepare_audio_data(cfg)[0][:10] # selecting 10 files only + + model = setup_model(cfg, map_location="cuda")[0] + model.change_decoding_strategy(cfg.rnnt_decoding) + + transcriptions = model.transcribe( + paths2audio_files=filepaths, + batch_size=cfg.batch_size, + num_workers=cfg.num_workers, + return_hypotheses=True, + channel_selector=cfg.channel_selector, + )[0] + + for transcription in transcriptions: + for align_elem, frame_confidence in zip(transcription.alignments, transcription.frame_confidence): + assert len(align_elem) == len(frame_confidence) # frame confidences have to match alignments + assert len(align_elem) > 0 # no empty alignments + for idx, pred in enumerate(align_elem): + if idx < len(align_elem) - 1: + assert pred[1].item() != model.decoder.blank_idx # all except last have to be non-blank + else: + assert pred[1].item() == model.decoder.blank_idx # last one has to be blank + return transcriptions + + +@pytest.fixture(autouse=True) +def cleanup_local_folder(): + """Overriding global fixture to make sure it's not applied for this test. + + Otherwise, there will be errors in the CI in github. 
+ """ + return + + +# TODO: add the same tests for multi-blank RNNT decoding +def test_rnnt_alignments(): + # using greedy as baseline and comparing all other configurations to it + ref_transcriptions = get_rnnt_alignments("greedy") + transcriptions = get_rnnt_alignments("greedy_batch") + # comparing that label sequence in alignments is exactly the same + # we can't compare logits as well, because they are expected to be + # slightly different in batched and single-sample mode + assert len(ref_transcriptions) == len(transcriptions) + for ref_transcription, transcription in zip(ref_transcriptions, transcriptions): + for ref_align_elem, align_elem in zip(ref_transcription.alignments, transcription.alignments): + assert len(ref_align_elem) == len(align_elem) + for ref_pred, pred in zip(ref_align_elem, align_elem): + assert ref_pred[1].item() == pred[1].item() From 33af65d3f5fad43ae4c70a37d46cebf8d230a53e Mon Sep 17 00:00:00 2001 From: Sandeep Subramanian Date: Fri, 6 Jan 2023 14:12:37 -0800 Subject: [PATCH 112/154] Move Attention and MLP classes to a separate file in Megatron transformers (#5453) * Move attention and mlp to separate files Signed-off-by: MaximumEntropy * Add new attention and mlp files Signed-off-by: MaximumEntropy * Fix import in tests Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports in attention Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing import Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva --- .../nlp/modules/common/megatron/attention.py | 839 +++++++++++++ .../nlp/modules/common/megatron/mlp.py | 358 ++++++ .../modules/common/megatron/transformer.py | 1116 +---------------- .../collections/nlp/test_retrieval_module.py | 2 +- .../nlp/test_retrieval_module_inference.py | 2 +- 5 files changed, 1203 insertions(+), 1114 deletions(-) create mode 100644 nemo/collections/nlp/modules/common/megatron/attention.py create mode 100644 nemo/collections/nlp/modules/common/megatron/mlp.py diff --git a/nemo/collections/nlp/modules/common/megatron/attention.py b/nemo/collections/nlp/modules/common/megatron/attention.py new file mode 100644 index 000000000000..522fd80372e0 --- /dev/null +++ b/nemo/collections/nlp/modules/common/megatron/attention.py @@ -0,0 +1,839 @@ +# coding=utf-8 +# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import math + +import torch +import torch.nn.functional as F +from einops import rearrange, repeat + +from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import AdapterName, InfusedAdapterConfig +from nemo.collections.nlp.modules.common.megatron.fused_softmax import MatchedScaleMaskSoftmax +from nemo.collections.nlp.modules.common.megatron.module import MegatronModule +from nemo.collections.nlp.modules.common.megatron.rotary_pos_embedding import apply_rotary_pos_emb +from nemo.collections.nlp.modules.common.megatron.utils import ApexGuardDefaults, attention_mask_func +from nemo.core import adapter_mixins + +try: + from apex.transformer import parallel_state, tensor_parallel + from apex.transformer.enums import AttnMaskType, AttnType + from apex.transformer.utils import divide as safe_divide + + HAVE_APEX = True + +except (ImportError, ModuleNotFoundError): + + HAVE_APEX = False + + # fake missing classes with None attributes + ModelType = AttnMaskType = AttnType = LayerType = ApexGuardDefaults() + +""" We use the following notation throughout this file: + h: hidden size + n: number of attention heads + p: number of model parallel partitions + np: n/p + hp: h/p + hn: h/n + b: batch size + s: sequence length + l: number of layers + Transformer takes input of size [s, b, h] and returns a + tensor of the same size. We use the following arguments: + hyperparameters: transformer hyperparameters +""" + + +class ParallelAttention(MegatronModule, adapter_mixins.AdapterModuleMixin): + """Parallel self-attention layer abstract class. + + Self-attention layer takes input with size [s, b, h] + and returns output of the same size. + """ + + def __init__( + self, + init_method, + output_layer_init_method, + layer_number, + num_attention_heads, + hidden_size, + attention_type=AttnType.self_attn, + attn_mask_type=AttnMaskType.padding, + precision=16, + apply_query_key_layer_scaling=True, + kv_channels=None, + use_cpu_initialization=False, + masked_softmax_fusion=True, + attention_dropout=0.1, + layer_type=None, + megatron_legacy=False, + bias=True, + headscale=False, + activations_checkpoint_granularity=None, + sequence_parallel=False, + gradient_accumulation_fusion=False, + normalize_attention_scores=True, + ): + super(ParallelAttention, self).__init__() + + self.layer_number = max(1, layer_number) + self.attention_type = attention_type + self.attn_mask_type = attn_mask_type + self.normalize_attention_scores = normalize_attention_scores + + self.megatron_legacy = megatron_legacy + + self.set_accepted_adapter_types([InfusedAdapterConfig._target_]) + + if kv_channels is None: + assert ( + hidden_size % num_attention_heads == 0 + ), 'hidden_size must be divisible by num_attention_heads if kv_channels is None' + kv_channels = hidden_size // num_attention_heads + projection_size = kv_channels * num_attention_heads + + # Per attention head and per partition values. + world_size = parallel_state.get_tensor_model_parallel_world_size() + self.hidden_size_per_attention_head = safe_divide(projection_size, num_attention_heads) + self.num_attention_heads_per_partition = safe_divide(num_attention_heads, world_size) + self.num_attention_heads_partition_offset = ( + self.num_attention_heads_per_partition * parallel_state.get_tensor_model_parallel_rank() + ) + + no_async_tensor_model_parallel_allreduce = ( + parallel_state.get_tensor_model_parallel_world_size() == 1 or sequence_parallel + ) + + # Strided linear layer. 
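+ # Illustrative sizes for the strided QKV projection below, using the notation block at the
+ # top of this file and assuming h=1024, n=16, p=2 (so hn=64, np=8, hp=512): for
+ # self-attention, query_key_value maps [sq, b, h] -> [sq, b, 3*hp] on each partition,
+ # which forward() later views as [sq, b, np, 3*hn] and splits into query/key/value.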
+ if attention_type == AttnType.self_attn: + self.query_key_value = tensor_parallel.ColumnParallelLinear( + hidden_size, + 3 * projection_size, + gather_output=False, + init_method=init_method, + use_cpu_initialization=use_cpu_initialization, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + else: + assert attention_type == AttnType.cross_attn + self.query = tensor_parallel.ColumnParallelLinear( + hidden_size, + projection_size, + gather_output=False, + init_method=init_method, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + self.key_value = tensor_parallel.ColumnParallelLinear( + hidden_size, + 2 * projection_size, + gather_output=False, + init_method=init_method, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + self.core_attention = CoreAttention( + layer_number=self.layer_number, + num_attention_heads=num_attention_heads, + hidden_size=hidden_size, + attention_type=self.attention_type, + attn_mask_type=self.attn_mask_type, + precision=precision, + apply_query_key_layer_scaling=apply_query_key_layer_scaling, + kv_channels=kv_channels, + masked_softmax_fusion=masked_softmax_fusion, + attention_dropout=attention_dropout, + sequence_parallel=sequence_parallel, + normalize_attention_scores=normalize_attention_scores, + ) + + # Output. + self.dense = tensor_parallel.RowParallelLinear( + projection_size, + hidden_size, + input_is_parallel=True, + init_method=output_layer_init_method, + skip_bias_add=True, + use_cpu_initialization=use_cpu_initialization, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + self.headscale = headscale + if headscale: + self.head_scale_tensor = torch.nn.Parameter( + torch.ones(1, self.num_attention_heads_per_partition, 1, 1), requires_grad=True + ) + + # Inference key-value memory + self.inference_key_memory = None + self.inference_value_memory = None + self.inference_current_sequence_len = 0 + + # relative position embedding + self.layer_type = layer_type + + def _checkpointed_attention_forward( + self, + query_layer, + key_layer, + value_layer, + attention_mask, + rotary_pos_emb=None, + relative_position_bias=None, + headscale_tensor=None, + ): + """Forward method with activation checkpointing.""" + + def custom_forward(*inputs): + if len(inputs) == 7: + query_layer = inputs[0] + key_layer = inputs[1] + value_layer = inputs[2] + attention_mask = inputs[3] + rotary_pos_emb = inputs[4] + relative_position_bias = inputs[5] + headscale_tensor = inputs[6] + elif len(inputs) == 8: + query_layer = inputs[0] + key_layer = inputs[1] + value_layer = inputs[2] + attention_mask = inputs[3] + rotary_pos_emb = (inputs[4], inputs[5]) + relative_position_bias = inputs[6] + headscale_tensor = inputs[7] + else: + raise ValueError('unexpected number of inputs') + output_ = self.core_attention( + query_layer, + key_layer, + value_layer, + attention_mask, + rotary_pos_emb=rotary_pos_emb, + relative_position_bias=relative_position_bias, + headscale_tensor=headscale_tensor, + ) + return output_ + + if rotary_pos_emb is None: + 
rot_tuple = (rotary_pos_emb,) + else: + rot_tuple = (rotary_pos_emb[0], rotary_pos_emb[1]) + + hidden_states = tensor_parallel.checkpoint( + custom_forward, + False, + query_layer, + key_layer, + value_layer, + attention_mask, + *rot_tuple, + relative_position_bias, + headscale_tensor, + ) + + return hidden_states + + def _allocate_memory(self, inference_max_sequence_len, batch_size, dtype): + return torch.empty( + inference_max_sequence_len, + batch_size, + self.num_attention_heads_per_partition, + self.hidden_size_per_attention_head, + dtype=dtype, + device=torch.cuda.current_device(), + ) + + def _transpose_last_dim(self, mixed_layer, num_splits, num_splits_first): + input_shape = mixed_layer.size() + if num_splits_first: + """[s, b, num_splits * np * hn] + -->(view) [s, b, num_splits, np, hn] + -->(tranpose) [s, b, np, num_splits, hn] + -->(view) [s, b, np * num_splits * hn] """ + + intermediate_shape = input_shape[:-1] + ( + num_splits, + self.num_attention_heads_per_partition, + self.hidden_size_per_attention_head, + ) + + mixed_layer = mixed_layer.view(*intermediate_shape) + mixed_layer = mixed_layer.transpose(-2, -3).contiguous() + else: + """[s, b, np * hn * num_splits] + -->(view) [s, b, np, hn, num_splits] + -->(tranpose) [s, b, np, num_splits, hn] + -->(view) [s, b, np * num_splits * hn] """ + + intermediate_shape = input_shape[:-1] + ( + self.num_attention_heads_per_partition, + self.hidden_size_per_attention_head, + num_splits, + ) + + mixed_layer = mixed_layer.view(*intermediate_shape) + mixed_layer = mixed_layer.transpose(-1, -2).contiguous() + mixed_layer = mixed_layer.view(*input_shape) + + return mixed_layer + + def forward( + self, + hidden_states, + attention_mask, + layer_past=None, + get_key_value=False, + encoder_output=None, + set_inference_key_value_memory=False, + inference_max_sequence_len=None, + rotary_pos_emb=None, # rotary positional embedding + relative_position_bias=None, + checkpoint_core_attention=False, + ): + # hidden_states: [sq, b, h] + + # ================================================= + # Pre-allocate memory for key-values for inference. + # ================================================= + if set_inference_key_value_memory: + assert inference_max_sequence_len and inference_max_sequence_len > 0 + self.inference_key_memory = self._allocate_memory( + inference_max_sequence_len, hidden_states.size(1), hidden_states.dtype + ) + self.inference_value_memory = self._allocate_memory( + inference_max_sequence_len, hidden_states.size(1), hidden_states.dtype + ) + self.inference_current_sequence_len = 0 + + # Some consistency check. + if inference_max_sequence_len: + assert self.inference_current_sequence_len < self.inference_key_memory.size(0) + assert inference_max_sequence_len == self.inference_key_memory.size(0) + # This is added for safety. In case inference_max_sequence_len + # is not provided, make sure there is no potential memory left + # from previous inference. 
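+ # Illustrative cache size: with inference_max_sequence_len=64, batch=2 and the example
+ # partition sizes used earlier (np=8, hn=64), _allocate_memory gives a [64, 2, 8, 64]
+ # tensor for each of the key and value memories; both are dropped just below whenever no
+ # max length is supplied.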
+ if not inference_max_sequence_len: + self.inference_key_memory = None + self.inference_value_memory = None + + # ===================== + # Query, Key, and Value + # ===================== + + if self.attention_type == AttnType.self_attn: + # Attention heads [sq, b, h] --> [sq, b, (np * 3 * hn)] + mixed_x_layer, _ = self.query_key_value(hidden_states) + + # [sq, b, (np * 3 * hn)] --> [sq, b, np, 3 * hn] + new_tensor_shape = mixed_x_layer.size()[:-1] + ( + self.num_attention_heads_per_partition, + 3 * self.hidden_size_per_attention_head, + ) + if self.megatron_legacy: + mixed_x_layer = self._transpose_last_dim(mixed_x_layer, 3, True) + mixed_x_layer = mixed_x_layer.view(*new_tensor_shape) + + # [sq, b, np, 3 * hn] --> 3 [sq, b, np, hn] + (query_layer, key_layer, value_layer) = tensor_parallel.split_tensor_along_last_dim(mixed_x_layer, 3) + else: + # Attention heads [sk, b, h] --> [sk, b, (np * 2 * hn)] + mixed_kv_layer, _ = self.key_value(encoder_output) + + # [sk, b, (np * 2 * hn)] --> [sk, b, np, 2 * hn] + new_tensor_shape = mixed_kv_layer.size()[:-1] + ( + self.num_attention_heads_per_partition, + 2 * self.hidden_size_per_attention_head, + ) + if self.megatron_legacy: + mixed_kv_layer = self._transpose_last_dim(mixed_kv_layer, 2, True) + mixed_kv_layer = mixed_kv_layer.view(*new_tensor_shape) + + # [sk, b, np, 2 * hn] --> 2 [sk, b, np, hn] + (key_layer, value_layer) = tensor_parallel.split_tensor_along_last_dim(mixed_kv_layer, 2) + + # Attention head [sq, b, h] --> [sq, b, hp] + query_layer, _ = self.query(hidden_states) + # [sq, b, hp] --> [sq, b, np, hn] + new_tensor_shape = query_layer.size()[:-1] + ( + self.num_attention_heads_per_partition, + self.hidden_size_per_attention_head, + ) + query_layer = query_layer.view(*new_tensor_shape) + + if self.is_adapter_available(): + key_infused_adapter = self.get_adapter_module(AdapterName.KEY_INFUSED) + value_infused_adapter = self.get_adapter_module(AdapterName.VALUE_INFUSED) + if key_infused_adapter: + assert value_infused_adapter is not None, "Expected value_infused_adapter not found!" + kls = key_layer.shape + key_layer = key_infused_adapter(key_layer.reshape(kls[0], kls[1], -1)).reshape(kls) + if value_infused_adapter: + assert key_infused_adapter is not None, "Expected key_infused_adapter not found!" + vls = value_layer.shape + value_layer = value_infused_adapter(value_layer.reshape(vls[0], vls[1], -1)).reshape(vls) + + # =================================================== + # Adjust key, value, and attention mask for inference + # =================================================== + + # duplicate the pos_emb for self attention + if rotary_pos_emb is not None: + rotary_pos_emb = rotary_pos_emb if isinstance(rotary_pos_emb, tuple) else ((rotary_pos_emb,) * 2) + + if inference_max_sequence_len: + # Adjust the range variables. + start = self.inference_current_sequence_len + self.inference_current_sequence_len += key_layer.size(0) + end = self.inference_current_sequence_len + # Copy key and values. + self.inference_key_memory[start:end, ...] = key_layer + self.inference_value_memory[start:end, ...] = value_layer + key_layer = self.inference_key_memory[:end, ...] + value_layer = self.inference_value_memory[:end, ...] + # Adjust attention mask + attention_mask = attention_mask[..., start:end, :end] + # adjust the key rotary positional embedding + if rotary_pos_emb is not None: + q_pos_emb, k_pos_emb = rotary_pos_emb + if not set_inference_key_value_memory: + # In inference, we compute one token at a time. 
+ # Select the correct positional embedding. + q_pos_emb = q_pos_emb[end - 1 : end] + k_pos_emb = k_pos_emb[:end, :, :, :] + rotary_pos_emb = (q_pos_emb, k_pos_emb) + + if layer_past is not None: + past_key, past_value = layer_past + key_layer = torch.cat((past_key.type_as(key_layer), key_layer), dim=0) + value_layer = torch.cat((past_value.type_as(value_layer), value_layer), dim=0) + + if get_key_value: + present = (key_layer, value_layer) + + if checkpoint_core_attention: + context_layer = self._checkpointed_attention_forward( + query_layer, + key_layer, + value_layer, + attention_mask, + rotary_pos_emb=rotary_pos_emb, + relative_position_bias=relative_position_bias, + headscale_tensor=self.head_scale_tensor if self.headscale else None, + ) + else: + context_layer = self.core_attention( + query_layer, + key_layer, + value_layer, + attention_mask, + layer_past=layer_past, + get_key_value=get_key_value, + rotary_pos_emb=rotary_pos_emb, + relative_position_bias=relative_position_bias, + headscale_tensor=self.head_scale_tensor if self.headscale else None, + ) + + # ================= + # Output. [sq, b, h] + # ================= + + output, bias = self.dense(context_layer) + + if get_key_value: + output = [output, present] + + return output, bias + + +class ParallelChunkedCrossAttention(MegatronModule): + """Parallel chunked cross-attention layer class. + + Self-attention layer takes input with size [b, s, h] + and returns output of the same size. + """ + + def __init__( + self, + init_method, + output_layer_init_method, + layer_number, + num_attention_heads, + hidden_size, + precision=16, + apply_query_key_layer_scaling=True, + kv_channels=None, + use_cpu_initialization=False, + masked_softmax_fusion=True, + attention_dropout=0.1, + megatron_legacy=False, + chunk_size=64, # each chunk, how many tokens + bias=True, + headscale=False, + gradient_accumulation_fusion=False, + normalize_attention_scores=True, + ): + super(ParallelChunkedCrossAttention, self).__init__() + self.cross_attention = ParallelAttention( + init_method=init_method, + output_layer_init_method=output_layer_init_method, + layer_number=layer_number, + num_attention_heads=num_attention_heads, + hidden_size=hidden_size, + attention_type=AttnType.cross_attn, + attn_mask_type=AttnMaskType.padding, + precision=precision, + apply_query_key_layer_scaling=apply_query_key_layer_scaling, + kv_channels=kv_channels, + use_cpu_initialization=use_cpu_initialization, + masked_softmax_fusion=masked_softmax_fusion, + attention_dropout=attention_dropout, + megatron_legacy=megatron_legacy, + bias=bias, + headscale=headscale, + gradient_accumulation_fusion=gradient_accumulation_fusion, + normalize_attention_scores=normalize_attention_scores, + ) + self.chunk_size = chunk_size + + def forward( + self, + hidden_states, + attention_mask, + encoder_output=None, + set_inference_key_value_memory=False, + inference_max_sequence_len=None, + rotary_pos_emb=None, + checkpoint_core_attention=False, + ): + if checkpoint_core_attention: + raise ValueError( + 'checkpoint_core_attention during forward not implemented yet for ParallelChunkedCrossAttention' + ) + + # hidden_states is assumed to have dimension [token length, batch, dimension] + # derive variables + # encoder_output here is the retrieved context + context = encoder_output + # context is assumed to have dimension [num_chunks, num_neighbors, context_token_len, batch, dimension] + chunk_size = self.chunk_size + b, n, dim = ( + hidden_states.shape[1], + hidden_states.shape[0], + hidden_states.shape[2], 
+ ) + default_bias = self.cross_attention.dense.bias + if set_inference_key_value_memory: + seq_index = (n // chunk_size) * chunk_size + self.current_len = n + elif inference_max_sequence_len is not None: + # only handles single token increment + assert n == 1 + self.current_len += n + token_pos = (self.current_len - 1) % chunk_size + chunk_id = self.current_len // chunk_size + if chunk_id <= 0: + # if sequence length less than chunk size, do an early return + return torch.zeros_like(hidden_states), default_bias + causal_padding = chunk_size - 1 + # pad it as a full chunk, put it at the end of the chunk position + hidden_states = F.pad(hidden_states, (0, 0, 0, 0, causal_padding, 0), value=0.0) + # only use the relevant context + context = context[chunk_id - 1 : chunk_id, :, :, :, :] + attention_mask = rearrange(attention_mask, '(b k) 1 q v -> b k 1 q v', b=b) + # select the relevant chunk attn mask + attention_mask = attention_mask[:, chunk_id - 1] + seq_index = chunk_size + else: + # this is normal forward without inference + seq_index = (n // chunk_size) * chunk_size + + # if sequence length less than chunk size, do an early return + if n < self.chunk_size and set_inference_key_value_memory and inference_max_sequence_len is not None: + return torch.zeros_like(hidden_states), default_bias + + num_chunks, num_retrieved = ( + context.shape[-5], + context.shape[-4], + ) + + # causal padding + causal_padding = chunk_size - 1 + + x = F.pad(hidden_states, (0, 0, 0, 0, -causal_padding, causal_padding), value=0.0) + + # remove sequence which is ahead of the neighbors retrieved (during inference) + + # seq_index = (n // chunk_size) * chunk_size + x, x_remainder = x[:seq_index], x[seq_index:] + + seq_remain_len = x_remainder.shape[0] + + # take care of rotary positional embedding + # make sure queries positions are properly shifted to the future + + q_pos_emb, k_pos_emb = rotary_pos_emb + # currently implementation is broken + # q need to extend to causal_padding, and just do + # q_pos_emb = F.pad(q_pos_emb, (0, 0, -causal_padding, 0), value = 0.) + if inference_max_sequence_len is not None and not set_inference_key_value_memory: + q_pos_emb = F.pad( + q_pos_emb, (0, 0, 0, 0, 0, 0, -causal_padding - token_pos, -causal_padding + token_pos), value=0.0 + ) + else: + q_pos_emb = F.pad(q_pos_emb, (0, 0, 0, 0, 0, 0, -causal_padding, 0), value=0.0) + + k_pos_emb = repeat(k_pos_emb, 'n b h d -> (r n) b h d', r=num_retrieved) + rotary_pos_emb = (q_pos_emb, k_pos_emb) + + # make sure number context chunks is enough + assert x.shape[0] // chunk_size == num_chunks + + # reshape so we have chunk to chunk attention, without breaking causality + x = rearrange(x, '(k n) b d -> n (b k) d', k=num_chunks) + context = rearrange(context, 'k r n b d -> (r n) (b k) d') + # cross attention + out, bias = self.cross_attention(x, attention_mask, encoder_output=context, rotary_pos_emb=rotary_pos_emb) + + # reshape back to original sequence + + out = rearrange(out, 'n (b k) d -> (k n) b d', b=b) + + # pad back to original, with 0s at the beginning (which will be added to the residual and be fine) + + out = F.pad(out, (0, 0, 0, 0, causal_padding, -causal_padding + seq_remain_len), value=0.0) + if not set_inference_key_value_memory and inference_max_sequence_len is not None: + out = out[-1:] + return out, bias + + +class CoreAttention(MegatronModule): + """ Region where selective activation recomputation is applied. + See Figure 3. 
in Reducing Activation Recomputation in Large Transformer Models + https://arxiv.org/pdf/2205.05198.pdf for more details. + + """ + + def __init__( + self, + layer_number, + num_attention_heads, + hidden_size, + attention_type=AttnType.self_attn, + attn_mask_type=AttnMaskType.padding, + precision=16, + apply_query_key_layer_scaling=True, + kv_channels=None, + masked_softmax_fusion=True, + attention_dropout=0.1, + sequence_parallel=False, + normalize_attention_scores=True, + ): + + super(CoreAttention, self).__init__() + + self.precision = precision + self.fp16 = precision == 16 + self.bf16 = precision == 'bf16' + + self.apply_query_key_layer_scaling = apply_query_key_layer_scaling + self.attention_softmax_in_fp32 = False + if self.apply_query_key_layer_scaling: + self.attention_softmax_in_fp32 = True + self.layer_number = max(1, layer_number) + self.attention_type = attention_type + self.attn_mask_type = attn_mask_type + self.sequence_parallel = sequence_parallel + # If True, will scale attention scores by 1 / sqrt(hidden_size_per_attention_head). + # This arg is been provided mostly to support weight conversion of Huggingface models. (ex: T5v1.1) + self.normalize_attention_scores = normalize_attention_scores + + if kv_channels is None: + assert ( + hidden_size % num_attention_heads == 0 + ), 'hidden_size must be divisible by num_attention_heads if kv_channels is None' + kv_channels = hidden_size // num_attention_heads + + projection_size = kv_channels * num_attention_heads + + # Per attention head and per partition values. + world_size = parallel_state.get_tensor_model_parallel_world_size() + self.hidden_size_per_partition = safe_divide(projection_size, world_size) + self.hidden_size_per_attention_head = safe_divide(projection_size, num_attention_heads) + self.num_attention_heads_per_partition = safe_divide(num_attention_heads, world_size) + self.num_attention_heads_partition_offset = ( + self.num_attention_heads_per_partition * parallel_state.get_tensor_model_parallel_rank() + ) + + coeff = None + self.norm_factor = math.sqrt(self.hidden_size_per_attention_head) + if self.apply_query_key_layer_scaling: + coeff = self.layer_number + self.norm_factor *= coeff + + self.scale_mask_softmax = MatchedScaleMaskSoftmax( + self.fp16, + self.bf16, + self.attn_mask_type, + masked_softmax_fusion, + attention_mask_func, + self.attention_softmax_in_fp32, + coeff, + ) + + # Dropout. Note that for a single iteration, this layer will generate + # different outputs on different number of parallel partitions but + # on average it should not be partition dependent. + self.attention_dropout = torch.nn.Dropout(attention_dropout) + + def forward( + self, + query_layer, + key_layer, + value_layer, + attention_mask, + layer_past=None, + get_key_value=False, + rotary_pos_emb=None, + relative_position_bias=None, + headscale_tensor=None, + ): + + # =================================== + # Raw attention scores. [b, np, s, s] + # =================================== + + # [b, np, sq, sk] + output_size = (query_layer.size(1), query_layer.size(2), query_layer.size(0), key_layer.size(0)) + + # TODO: figure out how to do this + # apply relative positional encoding (rotary embedding) + if rotary_pos_emb is not None: + q_pos_emb, k_pos_emb = rotary_pos_emb + + query_layer = apply_rotary_pos_emb(query_layer, q_pos_emb) + key_layer = apply_rotary_pos_emb(key_layer, k_pos_emb) + # TODO, can apply positional embedding to value_layer so it has + # absolute positional embedding. 
+ # otherwise, only relative positional embedding takes effect + # value_layer = apply_rotary_pos_emb(value_layer, k_pos_emb) + + # [sq, b, np, hn] -> [sq, b * np, hn] + query_layer = query_layer.view(output_size[2], output_size[0] * output_size[1], -1) + # [sk, b, np, hn] -> [sk, b * np, hn] + key_layer = key_layer.view(output_size[3], output_size[0] * output_size[1], -1) + + # preallocting input tensor: [b * np, sq, sk] + matmul_input_buffer = torch.empty( + output_size[0] * output_size[1], + output_size[2], + output_size[3], + dtype=query_layer.dtype, + device=torch.cuda.current_device(), + ) + + # Raw attention scores. [b * np, sq, sk] + matmul_result = torch.baddbmm( + matmul_input_buffer, + query_layer.transpose(0, 1), # [b * np, sq, hn] + key_layer.transpose(0, 1).transpose(1, 2), # [b * np, hn, sk] + beta=0.0, + alpha=(1.0 / self.norm_factor) if self.normalize_attention_scores else 1.0, + ) + + # change view to [b, np, sq, sk] + attention_scores = matmul_result.view(*output_size) + + if relative_position_bias is not None: + attention_scores += relative_position_bias[ + :, + self.num_attention_heads_partition_offset : self.num_attention_heads_partition_offset + + self.num_attention_heads_per_partition, + : attention_scores.size(2), + : attention_scores.size(3), + ] + + # ================================================== + # Update attention mask for inference. [b, np, sq, sk] + # ================================================== + + if get_key_value: + with torch.no_grad(): + if layer_past is not None: + attention_mask = attention_mask[ + ..., attention_scores.size(3) - 1, : attention_scores.size(3) + ].unsqueeze(2) + else: + attention_mask = attention_mask[..., : attention_scores.size(3), : attention_scores.size(3)] + + # =========================== + # Attention probs and dropout + # =========================== + + # attention scores and attention mask [b, np, sq, sk] + attention_probs = self.scale_mask_softmax(attention_scores, attention_mask) + + # This is actually dropping out entire tokens to attend to, which might + # seem a bit unusual, but is taken from the original Transformer paper. + + if not self.sequence_parallel: + with tensor_parallel.random.get_cuda_rng_tracker().fork(): + attention_probs = self.attention_dropout(attention_probs) + else: + attention_probs = self.attention_dropout(attention_probs) + + # ========================= + # Context layer. [sq, b, hp] + # ========================= + + # value_layer -> context layer. 
+ # [sk, b, np, hn] --> [b, np, sq, hn] + + # context layer shape: [b, np, sq, hn] + output_size = (value_layer.size(1), value_layer.size(2), query_layer.size(0), value_layer.size(3)) + + # change view [sk, b * np, hn] + value_layer = value_layer.view(value_layer.size(0), output_size[0] * output_size[1], -1) + + # change view [b * np, sq, sk] + attention_probs = attention_probs.view(output_size[0] * output_size[1], output_size[2], -1) + + # matmul: [b * np, sq, hn] + context_layer = torch.bmm(attention_probs, value_layer.transpose(0, 1)) + + # change view [b, np, sq, hn] + context_layer = context_layer.view(*output_size) + + if headscale_tensor is not None: + context_layer = context_layer * headscale_tensor + + # [b, np, sq, hn] --> [sq, b, np, hn] + context_layer = context_layer.permute(2, 0, 1, 3).contiguous() + + # [sq, b, np, hn] --> [sq, b, hp] + new_context_layer_shape = context_layer.size()[:-2] + (self.hidden_size_per_partition,) + context_layer = context_layer.view(*new_context_layer_shape) + + return context_layer diff --git a/nemo/collections/nlp/modules/common/megatron/mlp.py b/nemo/collections/nlp/modules/common/megatron/mlp.py new file mode 100644 index 000000000000..940d75ce7548 --- /dev/null +++ b/nemo/collections/nlp/modules/common/megatron/mlp.py @@ -0,0 +1,358 @@ +# coding=utf-8 +# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import torch +import torch.nn.functional as F + +from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import ( + AdapterName, + MLPInfusedAdapterConfig, +) +from nemo.collections.nlp.modules.common.megatron.fused_bias_geglu import fused_bias_geglu +from nemo.collections.nlp.modules.common.megatron.fused_bias_gelu import fused_bias_gelu +from nemo.collections.nlp.modules.common.megatron.fused_layer_norm import get_layer_norm +from nemo.collections.nlp.modules.common.megatron.layer_norm_1p import LayerNorm1P +from nemo.collections.nlp.modules.common.megatron.module import MegatronModule +from nemo.collections.nlp.modules.common.megatron.utils import ApexGuardDefaults, erf_gelu +from nemo.collections.nlp.modules.common.megatron.utils import openai_gelu as openai_gelu_func +from nemo.core import adapter_mixins + +try: + from apex.transformer import parallel_state, tensor_parallel + from apex.transformer.parallel_state import get_tensor_model_parallel_world_size + from apex.normalization import MixedFusedRMSNorm + + HAVE_APEX = True + +except (ImportError, ModuleNotFoundError): + + HAVE_APEX = False + + # fake missing classes with None attributes + ModelType = AttnMaskType = AttnType = LayerType = ApexGuardDefaults() + + +class ParallelMLP(MegatronModule, adapter_mixins.AdapterModuleMixin): + """MLP. + + MLP will take the input with h hidden state, project it to 4*h + hidden dimension, perform nonlinear transformation, and project the + state back into h hidden dimension. 
+ """ + + def __init__( + self, + init_method, + output_layer_init_method, + hidden_size, + ffn_hidden_size, + use_cpu_initialization=False, + bias_activation_fusion=True, + openai_gelu=False, + onnx_safe=False, + activation='gelu', + bias=True, + transformer_block_type='pre_ln', + normalization='layernorm', + layernorm_epsilon=1e-5, + persist_layer_norm=False, + sequence_parallel=False, + gradient_accumulation_fusion=False, + dropout=0.0, + ): + super(ParallelMLP, self).__init__() + self.activation = activation + self.bias = bias + self.transformer_block_type = transformer_block_type + self.normalization = normalization + self.layernorm_epsilon = layernorm_epsilon + self.persist_layer_norm = persist_layer_norm + self.activation = activation + self.dropout = dropout + self.set_accepted_adapter_types([MLPInfusedAdapterConfig._target_]) + + if activation not in ['gelu', 'geglu', 'reglu', 'swiglu']: + raise ValueError(f"Activation {activation} not supported. Only gelu, geglu, reglu, swiglu are supported.") + + no_async_tensor_model_parallel_allreduce = ( + parallel_state.get_tensor_model_parallel_world_size() == 1 or sequence_parallel + ) + # Project to 4h. + self.dense_h_to_4h = tensor_parallel.ColumnParallelLinear( + hidden_size, + ffn_hidden_size, # NOTE: When using geglu, divide ffn dim by 2/3 to keep overall params the same. + gather_output=False, + init_method=init_method, + skip_bias_add=True, + use_cpu_initialization=use_cpu_initialization, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + if activation in ['geglu', 'reglu', 'swiglu']: + # Separate linear layer for *GLU activations. + # Source: https://github.com/huggingface/transformers/blob/bee361c6f1f7704f8c688895f2f86f6e5ff84727/src/transformers/models/t5/modeling_t5.py#L292 + self.dense_h_to_4h_2 = tensor_parallel.ColumnParallelLinear( + hidden_size, + ffn_hidden_size, # NOTE: When using *glu, divide ffn dim by 2/3 to keep overall params the same. + gather_output=False, + init_method=init_method, + skip_bias_add=True, + use_cpu_initialization=use_cpu_initialization, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + self.glu_activation_family = activation in ['geglu', 'reglu', 'swiglu'] + bias_activation_fusion_unavailable = activation in ['reglu', 'swiglu'] + + if bias_activation_fusion_unavailable and bias_activation_fusion: + raise ValueError( + f"Cannot use bias_activation_fusion with {activation} activation. Please turn bias gelu fusion off." + ) + + if self.glu_activation_family and onnx_safe and self.bias_activation_fusion: + raise ValueError( + f"Cannot use onnx_safe with specificed activation function and bias_activation_fusion : {activation} Please turn onnx safe off." + ) + + if bias_activation_fusion and not bias: + raise ValueError( + f"Cannot use bias_activation_fusion without bias terms. Please set bias=True or bias_activation_fusion=False." + ) + + self.bias_activation_fusion = bias_activation_fusion + + # Give openai_gelu precedence over other activations if set, for HF compatibility. Normally this is off and shouldn't affect regular model training. 
+ if openai_gelu: + self.activation_func = openai_gelu_func + elif activation in ["gelu", "geglu"]: + self.activation_func = F.gelu + elif onnx_safe: + self.activation_func = erf_gelu + elif activation == "reglu": + self.activation_func = F.relu + elif activation == "swiglu": + # SiLU or sigmoid linear unit is the same as swish with beta = 1 (which is what https://arxiv.org/pdf/2002.05202.pdf uses.) + self.activation_func = F.silu + + # Project back to h. + self.dense_4h_to_h = tensor_parallel.RowParallelLinear( + ffn_hidden_size, + hidden_size, + input_is_parallel=True, + init_method=output_layer_init_method, + skip_bias_add=True, + use_cpu_initialization=use_cpu_initialization, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + # Normformer normalization + if transformer_block_type == 'normformer': + if normalization == 'layernorm': + self.normalization = get_layer_norm( + ffn_hidden_size // get_tensor_model_parallel_world_size(), layernorm_epsilon, persist_layer_norm + ) + elif normalization == 'layernorm1p': + self.normalization = LayerNorm1P( + ffn_hidden_size // get_tensor_model_parallel_world_size(), + layernorm_epsilon, + sequence_parallel_enabled=sequence_parallel, + ) + else: + self.normalization = MixedFusedRMSNorm( + ffn_hidden_size // get_tensor_model_parallel_world_size(), layernorm_epsilon + ) + + def forward(self, hidden_states): + + # [s, b, 4hp] + intermediate_parallel, bias_parallel = self.dense_h_to_4h(hidden_states) + + if self.glu_activation_family: + intermediate_parallel_2, bias_parallel_2 = self.dense_h_to_4h_2(hidden_states) + + if self.bias_activation_fusion: + if self.activation == 'gelu': + intermediate_parallel = fused_bias_gelu(intermediate_parallel, bias_parallel) + elif self.activation == 'geglu': + intermediate_parallel = fused_bias_geglu( + intermediate_parallel, bias_parallel, intermediate_parallel_2, bias_parallel_2 + ) + + elif self.activation in ['reglu', 'swiglu'] or ( + self.glu_activation_family and not self.bias_activation_fusion + ): + if bias_parallel is not None: + intermediate_parallel = self.activation_func(intermediate_parallel + bias_parallel) * ( + intermediate_parallel_2 + bias_parallel_2 + ) + else: + intermediate_parallel = self.activation_func(intermediate_parallel) * intermediate_parallel_2 + + else: + if bias_parallel is not None: + intermediate_parallel = self.activation_func(intermediate_parallel + bias_parallel) + else: + intermediate_parallel = self.activation_func(intermediate_parallel) + + if self.dropout > 0: + intermediate_parallel = F.dropout(intermediate_parallel, p=self.dropout, training=self.training) + + infused_adapter = self.get_adapter_module(AdapterName.MLP_INFUSED) + if infused_adapter: + intermediate_parallel = infused_adapter(intermediate_parallel) + + # Normformer normalization + if self.transformer_block_type == 'normformer': + intermediate_parallel = self.normalization(intermediate_parallel) + + # [s, b, h] + output, output_bias = self.dense_4h_to_h(intermediate_parallel) + return output, output_bias + + +class SwitchMLP(MegatronModule): + """Top-1 MoE + + Curently supports Sinkhorn based expert routing.""" + + def __init__( + self, + num_experts, + init_method, + output_layer_init_method, + hidden_size, + ffn_hidden_size, + use_cpu_initialization=False, + bias_activation_fusion=True, + openai_gelu=False, + onnx_safe=False, + activation='gelu', + bias=True, + transformer_block_type='pre_ln', + normalization='layernorm', + 
layernorm_epsilon=1e-5, + persist_layer_norm=False, + sequence_parallel=False, + gradient_accumulation_fusion=False, + dropout=0.0, + ): + super(SwitchMLP, self).__init__() + + self.num_experts = num_experts + self.route_algo = SwitchMLP.sinkhorn + self.router = tensor_parallel.RowParallelLinear( + hidden_size, + num_experts, + input_is_parallel=False, + init_method=init_method, + skip_bias_add=False, + use_cpu_initialization=use_cpu_initialization, + bias=bias, + sequence_parallel_enabled=sequence_parallel, + gradient_accumulation_fusion=gradient_accumulation_fusion, + ) + + mlp_args = { + 'init_method': init_method, + 'output_layer_init_method': output_layer_init_method, + 'hidden_size': hidden_size, + 'ffn_hidden_size': ffn_hidden_size, + 'use_cpu_initialization': use_cpu_initialization, + 'bias_activation_fusion': bias_activation_fusion, + 'openai_gelu': openai_gelu, + 'onnx_safe': onnx_safe, + 'activation': activation, + 'bias': bias, + 'transformer_block_type': transformer_block_type, + 'normalization': normalization, + 'layernorm_epsilon': layernorm_epsilon, + 'persist_layer_norm': persist_layer_norm, + 'sequence_parallel': sequence_parallel, + 'gradient_accumulation_fusion': gradient_accumulation_fusion, + 'dropout': dropout, + } + self.experts = torch.nn.ModuleList([ParallelMLP(**mlp_args) for _ in range(num_experts)]) + + def forward(self, hidden_states): + hidden_shape = hidden_states.shape + route, _ = self.router(hidden_states) + route = route.view(-1, self.num_experts) + if self.training: + with torch.no_grad(): + norm_route = self.route_algo( + route.detach().to(dtype=torch.float32) + ) # explicit fp32 conversion for stability + _, max_ind = torch.max(norm_route, dim=1) + route = torch.sigmoid(route) + max_prob = route[torch.arange(route.size(0)), max_ind] + else: + route = torch.sigmoid(route) + max_prob, max_ind = torch.max(route, dim=1) + max_prob = torch.unsqueeze(max_prob, 1) + + hidden_states = hidden_states.view(-1, hidden_shape[-1]) + + local_indices = (max_ind == 0).nonzero() + hidden = hidden_states[local_indices, :] + output, output_bias = self.experts[0](hidden) + output_bias = output_bias.expand_as(output) + + output_total = torch.empty_like(hidden_states, dtype=output.dtype) + output_bias_total = torch.empty_like(hidden_states, dtype=output_bias.dtype) + + output_total[local_indices, :] = output + output_bias_total[local_indices, :] = output_bias + + for expert_num, expert in enumerate(self.experts): + if expert_num == 0: + continue + local_indices = (max_ind == expert_num).nonzero() + hidden = hidden_states[local_indices, :] + output, output_bias = expert(hidden) + output_bias = output_bias.expand_as(output) + output_total[local_indices, :] = output + output_bias_total[local_indices, :] = output_bias + + output_total = output_total * max_prob + output_bias_total = output_bias_total * max_prob + output_total = output_total.view(hidden_shape) + output_bias_total = output_bias_total.view(hidden_shape) + + return output_total, output_bias_total + + @classmethod + def sinkhorn(cls, cost, tol=0.0001): + "Megatron-LMs sinkhorn implementation" + + cost = torch.exp(cost) + d0 = torch.ones(cost.size(0), device=cost.device, dtype=cost.dtype) + d1 = torch.ones(cost.size(1), device=cost.device, dtype=cost.dtype) + + eps = 0.00000001 + error = 1e9 + d1_old = d1 + while error > tol: + d0 = (1 / d0.size(0)) * 1 / (torch.sum(d1 * cost, 1) + eps) + d1 = (1 / d1.size(0)) * 1 / (torch.sum(d0.unsqueeze(1) * cost, 0) + eps) + error = torch.mean(torch.abs(d1_old - d1)) + d1_old = d1 
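+ # The value returned below, d1 * cost * d0.unsqueeze(1), rescales exp(cost) so that it is
+ # approximately balanced across rows (tokens) and columns (experts); during training
+ # SwitchMLP.forward() takes an argmax over this rescaled matrix, which encourages a more
+ # even spread of tokens over experts.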
+ return d1 * cost * d0.unsqueeze(1) diff --git a/nemo/collections/nlp/modules/common/megatron/transformer.py b/nemo/collections/nlp/modules/common/megatron/transformer.py index aa09954b5eb3..a8002d82952f 100644 --- a/nemo/collections/nlp/modules/common/megatron/transformer.py +++ b/nemo/collections/nlp/modules/common/megatron/transformer.py @@ -14,45 +14,36 @@ # limitations under the License. """Transformer.""" -import math from contextlib import nullcontext from typing import Any, Callable, Optional import torch -import torch.nn.functional as F -from einops import rearrange, repeat +from einops import rearrange from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import ( AdapterName, - InfusedAdapterConfig, - MLPInfusedAdapterConfig, ParallelLinearAdapterConfig, ) +from nemo.collections.nlp.modules.common.megatron.attention import ParallelAttention, ParallelChunkedCrossAttention from nemo.collections.nlp.modules.common.megatron.fused_bias_dropout_add import ( bias_dropout_add, bias_dropout_add_fused_inference, bias_dropout_add_fused_train, dropout_add, ) -from nemo.collections.nlp.modules.common.megatron.fused_bias_geglu import fused_bias_geglu -from nemo.collections.nlp.modules.common.megatron.fused_bias_gelu import fused_bias_gelu from nemo.collections.nlp.modules.common.megatron.fused_layer_norm import get_layer_norm -from nemo.collections.nlp.modules.common.megatron.fused_softmax import MatchedScaleMaskSoftmax from nemo.collections.nlp.modules.common.megatron.layer_norm_1p import LayerNorm1P from nemo.collections.nlp.modules.common.megatron.layer_type import LayerType +from nemo.collections.nlp.modules.common.megatron.mlp import ParallelMLP, SwitchMLP from nemo.collections.nlp.modules.common.megatron.module import MegatronModule -from nemo.collections.nlp.modules.common.megatron.rotary_pos_embedding import apply_rotary_pos_emb -from nemo.collections.nlp.modules.common.megatron.utils import ApexGuardDefaults, attention_mask_func, erf_gelu -from nemo.collections.nlp.modules.common.megatron.utils import openai_gelu as openai_gelu_func +from nemo.collections.nlp.modules.common.megatron.utils import ApexGuardDefaults from nemo.core import adapter_mixins from nemo.utils import logging try: from apex.transformer import parallel_state, tensor_parallel from apex.transformer.enums import AttnMaskType, AttnType, ModelType - from apex.transformer.utils import divide as safe_divide - from apex.transformer.parallel_state import get_tensor_model_parallel_world_size from apex.normalization import MixedFusedRMSNorm HAVE_APEX = True @@ -100,1105 +91,6 @@ def __init__(self): """ -class ParallelMLP(MegatronModule, adapter_mixins.AdapterModuleMixin): - """MLP. - - MLP will take the input with h hidden state, project it to 4*h - hidden dimension, perform nonlinear transformation, and project the - state back into h hidden dimension. 
- """ - - def __init__( - self, - init_method, - output_layer_init_method, - hidden_size, - ffn_hidden_size, - use_cpu_initialization=False, - bias_activation_fusion=True, - openai_gelu=False, - onnx_safe=False, - activation='gelu', - bias=True, - transformer_block_type='pre_ln', - normalization='layernorm', - layernorm_epsilon=1e-5, - persist_layer_norm=False, - sequence_parallel=False, - gradient_accumulation_fusion=False, - dropout=0.0, - ): - super(ParallelMLP, self).__init__() - self.activation = activation - self.bias = bias - self.transformer_block_type = transformer_block_type - self.normalization = normalization - self.layernorm_epsilon = layernorm_epsilon - self.persist_layer_norm = persist_layer_norm - self.activation = activation - self.dropout = dropout - self.set_accepted_adapter_types([MLPInfusedAdapterConfig._target_]) - - if activation not in ['gelu', 'geglu', 'reglu', 'swiglu']: - raise ValueError(f"Activation {activation} not supported. Only gelu, geglu, reglu, swiglu are supported.") - - no_async_tensor_model_parallel_allreduce = ( - parallel_state.get_tensor_model_parallel_world_size() == 1 or sequence_parallel - ) - # Project to 4h. - self.dense_h_to_4h = tensor_parallel.ColumnParallelLinear( - hidden_size, - ffn_hidden_size, # NOTE: When using geglu, divide ffn dim by 2/3 to keep overall params the same. - gather_output=False, - init_method=init_method, - skip_bias_add=True, - use_cpu_initialization=use_cpu_initialization, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - if activation in ['geglu', 'reglu', 'swiglu']: - # Separate linear layer for *GLU activations. - # Source: https://github.com/huggingface/transformers/blob/bee361c6f1f7704f8c688895f2f86f6e5ff84727/src/transformers/models/t5/modeling_t5.py#L292 - self.dense_h_to_4h_2 = tensor_parallel.ColumnParallelLinear( - hidden_size, - ffn_hidden_size, # NOTE: When using *glu, divide ffn dim by 2/3 to keep overall params the same. - gather_output=False, - init_method=init_method, - skip_bias_add=True, - use_cpu_initialization=use_cpu_initialization, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - self.glu_activation_family = activation in ['geglu', 'reglu', 'swiglu'] - bias_activation_fusion_unavailable = activation in ['reglu', 'swiglu'] - - if bias_activation_fusion_unavailable and bias_activation_fusion: - raise ValueError( - f"Cannot use bias_activation_fusion with {activation} activation. Please turn bias gelu fusion off." - ) - - if self.glu_activation_family and onnx_safe and self.bias_activation_fusion: - raise ValueError( - f"Cannot use onnx_safe with specificed activation function and bias_activation_fusion : {activation} Please turn onnx safe off." - ) - - if bias_activation_fusion and not bias: - raise ValueError( - f"Cannot use bias_activation_fusion without bias terms. Please set bias=True or bias_activation_fusion=False." - ) - - self.bias_activation_fusion = bias_activation_fusion - - # Give openai_gelu precedence over other activations if set, for HF compatibility. Normally this is off and shouldn't affect regular model training. 
- if openai_gelu: - self.activation_func = openai_gelu_func - elif activation in ["gelu", "geglu"]: - self.activation_func = F.gelu - elif onnx_safe: - self.activation_func = erf_gelu - elif activation == "reglu": - self.activation_func = F.relu - elif activation == "swiglu": - # SiLU or sigmoid linear unit is the same as swish with beta = 1 (which is what https://arxiv.org/pdf/2002.05202.pdf uses.) - self.activation_func = F.silu - - # Project back to h. - self.dense_4h_to_h = tensor_parallel.RowParallelLinear( - ffn_hidden_size, - hidden_size, - input_is_parallel=True, - init_method=output_layer_init_method, - skip_bias_add=True, - use_cpu_initialization=use_cpu_initialization, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - # Normformer normalization - if transformer_block_type == 'normformer': - if normalization == 'layernorm': - self.normalization = get_layer_norm( - ffn_hidden_size // get_tensor_model_parallel_world_size(), layernorm_epsilon, persist_layer_norm - ) - elif normalization == 'layernorm1p': - self.normalization = LayerNorm1P( - ffn_hidden_size // get_tensor_model_parallel_world_size(), - layernorm_epsilon, - sequence_parallel_enabled=sequence_parallel, - ) - else: - self.normalization = MixedFusedRMSNorm( - ffn_hidden_size // get_tensor_model_parallel_world_size(), layernorm_epsilon - ) - - def forward(self, hidden_states): - - # [s, b, 4hp] - intermediate_parallel, bias_parallel = self.dense_h_to_4h(hidden_states) - - if self.glu_activation_family: - intermediate_parallel_2, bias_parallel_2 = self.dense_h_to_4h_2(hidden_states) - - if self.bias_activation_fusion: - if self.activation == 'gelu': - intermediate_parallel = fused_bias_gelu(intermediate_parallel, bias_parallel) - elif self.activation == 'geglu': - intermediate_parallel = fused_bias_geglu( - intermediate_parallel, bias_parallel, intermediate_parallel_2, bias_parallel_2 - ) - - elif self.activation in ['reglu', 'swiglu'] or ( - self.glu_activation_family and not self.bias_activation_fusion - ): - if bias_parallel is not None: - intermediate_parallel = self.activation_func(intermediate_parallel + bias_parallel) * ( - intermediate_parallel_2 + bias_parallel_2 - ) - else: - intermediate_parallel = self.activation_func(intermediate_parallel) * intermediate_parallel_2 - - else: - if bias_parallel is not None: - intermediate_parallel = self.activation_func(intermediate_parallel + bias_parallel) - else: - intermediate_parallel = self.activation_func(intermediate_parallel) - - if self.dropout > 0: - intermediate_parallel = F.dropout(intermediate_parallel, p=self.dropout, training=self.training) - - infused_adapter = self.get_adapter_module(AdapterName.MLP_INFUSED) - if infused_adapter: - intermediate_parallel = infused_adapter(intermediate_parallel) - - # Normformer normalization - if self.transformer_block_type == 'normformer': - intermediate_parallel = self.normalization(intermediate_parallel) - - # [s, b, h] - output, output_bias = self.dense_4h_to_h(intermediate_parallel) - return output, output_bias - - -class SwitchMLP(MegatronModule): - """Top-1 MoE - - Curently supports Sinkhorn based expert routing.""" - - def __init__( - self, - num_experts, - init_method, - output_layer_init_method, - hidden_size, - ffn_hidden_size, - use_cpu_initialization=False, - bias_activation_fusion=True, - openai_gelu=False, - onnx_safe=False, - activation='gelu', - bias=True, - transformer_block_type='pre_ln', - normalization='layernorm', - 
layernorm_epsilon=1e-5, - persist_layer_norm=False, - sequence_parallel=False, - gradient_accumulation_fusion=False, - dropout=0.0, - ): - super(SwitchMLP, self).__init__() - - self.num_experts = num_experts - self.route_algo = SwitchMLP.sinkhorn - self.router = tensor_parallel.RowParallelLinear( - hidden_size, - num_experts, - input_is_parallel=False, - init_method=init_method, - skip_bias_add=False, - use_cpu_initialization=use_cpu_initialization, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - mlp_args = { - 'init_method': init_method, - 'output_layer_init_method': output_layer_init_method, - 'hidden_size': hidden_size, - 'ffn_hidden_size': ffn_hidden_size, - 'use_cpu_initialization': use_cpu_initialization, - 'bias_activation_fusion': bias_activation_fusion, - 'openai_gelu': openai_gelu, - 'onnx_safe': onnx_safe, - 'activation': activation, - 'bias': bias, - 'transformer_block_type': transformer_block_type, - 'normalization': normalization, - 'layernorm_epsilon': layernorm_epsilon, - 'persist_layer_norm': persist_layer_norm, - 'sequence_parallel': sequence_parallel, - 'gradient_accumulation_fusion': gradient_accumulation_fusion, - 'dropout': dropout, - } - self.experts = torch.nn.ModuleList([ParallelMLP(**mlp_args) for _ in range(num_experts)]) - - def forward(self, hidden_states): - hidden_shape = hidden_states.shape - route, _ = self.router(hidden_states) - route = route.view(-1, self.num_experts) - if self.training: - with torch.no_grad(): - norm_route = self.route_algo( - route.detach().to(dtype=torch.float32) - ) # explicit fp32 conversion for stability - _, max_ind = torch.max(norm_route, dim=1) - route = torch.sigmoid(route) - max_prob = route[torch.arange(route.size(0)), max_ind] - else: - route = torch.sigmoid(route) - max_prob, max_ind = torch.max(route, dim=1) - max_prob = torch.unsqueeze(max_prob, 1) - - hidden_states = hidden_states.view(-1, hidden_shape[-1]) - - local_indices = (max_ind == 0).nonzero() - hidden = hidden_states[local_indices, :] - output, output_bias = self.experts[0](hidden) - output_bias = output_bias.expand_as(output) - - output_total = torch.empty_like(hidden_states, dtype=output.dtype) - output_bias_total = torch.empty_like(hidden_states, dtype=output_bias.dtype) - - output_total[local_indices, :] = output - output_bias_total[local_indices, :] = output_bias - - for expert_num, expert in enumerate(self.experts): - if expert_num == 0: - continue - local_indices = (max_ind == expert_num).nonzero() - hidden = hidden_states[local_indices, :] - output, output_bias = expert(hidden) - output_bias = output_bias.expand_as(output) - output_total[local_indices, :] = output - output_bias_total[local_indices, :] = output_bias - - output_total = output_total * max_prob - output_bias_total = output_bias_total * max_prob - output_total = output_total.view(hidden_shape) - output_bias_total = output_bias_total.view(hidden_shape) - - return output_total, output_bias_total - - @classmethod - def sinkhorn(cls, cost, tol=0.0001): - "Megatron-LMs sinkhorn implementation" - - cost = torch.exp(cost) - d0 = torch.ones(cost.size(0), device=cost.device, dtype=cost.dtype) - d1 = torch.ones(cost.size(1), device=cost.device, dtype=cost.dtype) - - eps = 0.00000001 - error = 1e9 - d1_old = d1 - while error > tol: - d0 = (1 / d0.size(0)) * 1 / (torch.sum(d1 * cost, 1) + eps) - d1 = (1 / d1.size(0)) * 1 / (torch.sum(d0.unsqueeze(1) * cost, 0) + eps) - error = torch.mean(torch.abs(d1_old - d1)) - d1_old = d1 
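# Illustration only (not from the NeMo source): the Sinkhorn iteration in
# SwitchMLP.sinkhorn, shown in the removed code here, rescales exp(logits) so that
# each token row and each expert column ends up with roughly equal total mass;
# taking the argmax over the balanced matrix (rather than the raw logits) is what
# gives load-balanced top-1 routing during training. A self-contained sketch of
# the same normalization, with hypothetical names:
import torch

def sinkhorn_balance(logits: torch.Tensor, tol: float = 1e-4, eps: float = 1e-8) -> torch.Tensor:
    cost = torch.exp(logits)
    d0 = torch.ones(cost.size(0), dtype=cost.dtype)
    d1 = torch.ones(cost.size(1), dtype=cost.dtype)
    error, d1_old = 1e9, d1
    while error > tol:
        d0 = 1.0 / (cost.size(0) * (torch.sum(d1 * cost, dim=1) + eps))
        d1 = 1.0 / (cost.size(1) * (torch.sum(d0.unsqueeze(1) * cost, dim=0) + eps))
        error = torch.mean(torch.abs(d1_old - d1))
        d1_old = d1
    return d1 * cost * d0.unsqueeze(1)

routing_logits = torch.randn(16, 4)                          # 16 tokens, 4 experts
balanced = sinkhorn_balance(routing_logits)
print(balanced.sum(dim=0))                                   # roughly uniform mass per expert column
print(torch.bincount(balanced.argmax(dim=1), minlength=4))   # tokens assigned to each expert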
- return d1 * cost * d0.unsqueeze(1) - - -class CoreAttention(MegatronModule): - """ Region where selective activation recomputation is applied. - See Figure 3. in Reducing Activation Recomputation in Large Transformer Models - https://arxiv.org/pdf/2205.05198.pdf for more details. - - """ - - def __init__( - self, - layer_number, - num_attention_heads, - hidden_size, - attention_type=AttnType.self_attn, - attn_mask_type=AttnMaskType.padding, - precision=16, - apply_query_key_layer_scaling=True, - kv_channels=None, - masked_softmax_fusion=True, - attention_dropout=0.1, - sequence_parallel=False, - normalize_attention_scores=True, - ): - - super(CoreAttention, self).__init__() - - self.precision = precision - self.fp16 = precision == 16 - self.bf16 = precision == 'bf16' - - self.apply_query_key_layer_scaling = apply_query_key_layer_scaling - self.attention_softmax_in_fp32 = False - if self.apply_query_key_layer_scaling: - self.attention_softmax_in_fp32 = True - self.layer_number = max(1, layer_number) - self.attention_type = attention_type - self.attn_mask_type = attn_mask_type - self.sequence_parallel = sequence_parallel - # If True, will scale attention scores by 1 / sqrt(hidden_size_per_attention_head). - # This arg is been provided mostly to support weight conversion of Huggingface models. (ex: T5v1.1) - self.normalize_attention_scores = normalize_attention_scores - - if kv_channels is None: - assert ( - hidden_size % num_attention_heads == 0 - ), 'hidden_size must be divisible by num_attention_heads if kv_channels is None' - kv_channels = hidden_size // num_attention_heads - - projection_size = kv_channels * num_attention_heads - - # Per attention head and per partition values. - world_size = parallel_state.get_tensor_model_parallel_world_size() - self.hidden_size_per_partition = safe_divide(projection_size, world_size) - self.hidden_size_per_attention_head = safe_divide(projection_size, num_attention_heads) - self.num_attention_heads_per_partition = safe_divide(num_attention_heads, world_size) - self.num_attention_heads_partition_offset = ( - self.num_attention_heads_per_partition * parallel_state.get_tensor_model_parallel_rank() - ) - - coeff = None - self.norm_factor = math.sqrt(self.hidden_size_per_attention_head) - if self.apply_query_key_layer_scaling: - coeff = self.layer_number - self.norm_factor *= coeff - - self.scale_mask_softmax = MatchedScaleMaskSoftmax( - self.fp16, - self.bf16, - self.attn_mask_type, - masked_softmax_fusion, - attention_mask_func, - self.attention_softmax_in_fp32, - coeff, - ) - - # Dropout. Note that for a single iteration, this layer will generate - # different outputs on different number of parallel partitions but - # on average it should not be partition dependent. - self.attention_dropout = torch.nn.Dropout(attention_dropout) - - def forward( - self, - query_layer, - key_layer, - value_layer, - attention_mask, - layer_past=None, - get_key_value=False, - rotary_pos_emb=None, - relative_position_bias=None, - headscale_tensor=None, - ): - - # =================================== - # Raw attention scores. 
[b, np, s, s] - # =================================== - - # [b, np, sq, sk] - output_size = (query_layer.size(1), query_layer.size(2), query_layer.size(0), key_layer.size(0)) - - # TODO: figure out how to do this - # apply relative positional encoding (rotary embedding) - if rotary_pos_emb is not None: - q_pos_emb, k_pos_emb = rotary_pos_emb - - query_layer = apply_rotary_pos_emb(query_layer, q_pos_emb) - key_layer = apply_rotary_pos_emb(key_layer, k_pos_emb) - # TODO, can apply positional embedding to value_layer so it has - # absolute positional embedding. - # otherwise, only relative positional embedding takes effect - # value_layer = apply_rotary_pos_emb(value_layer, k_pos_emb) - - # [sq, b, np, hn] -> [sq, b * np, hn] - query_layer = query_layer.view(output_size[2], output_size[0] * output_size[1], -1) - # [sk, b, np, hn] -> [sk, b * np, hn] - key_layer = key_layer.view(output_size[3], output_size[0] * output_size[1], -1) - - # preallocting input tensor: [b * np, sq, sk] - matmul_input_buffer = torch.empty( - output_size[0] * output_size[1], - output_size[2], - output_size[3], - dtype=query_layer.dtype, - device=torch.cuda.current_device(), - ) - - # Raw attention scores. [b * np, sq, sk] - matmul_result = torch.baddbmm( - matmul_input_buffer, - query_layer.transpose(0, 1), # [b * np, sq, hn] - key_layer.transpose(0, 1).transpose(1, 2), # [b * np, hn, sk] - beta=0.0, - alpha=(1.0 / self.norm_factor) if self.normalize_attention_scores else 1.0, - ) - - # change view to [b, np, sq, sk] - attention_scores = matmul_result.view(*output_size) - - if relative_position_bias is not None: - attention_scores += relative_position_bias[ - :, - self.num_attention_heads_partition_offset : self.num_attention_heads_partition_offset - + self.num_attention_heads_per_partition, - : attention_scores.size(2), - : attention_scores.size(3), - ] - - # ================================================== - # Update attention mask for inference. [b, np, sq, sk] - # ================================================== - - if get_key_value: - with torch.no_grad(): - if layer_past is not None: - attention_mask = attention_mask[ - ..., attention_scores.size(3) - 1, : attention_scores.size(3) - ].unsqueeze(2) - else: - attention_mask = attention_mask[..., : attention_scores.size(3), : attention_scores.size(3)] - - # =========================== - # Attention probs and dropout - # =========================== - - # attention scores and attention mask [b, np, sq, sk] - attention_probs = self.scale_mask_softmax(attention_scores, attention_mask) - - # This is actually dropping out entire tokens to attend to, which might - # seem a bit unusual, but is taken from the original Transformer paper. - - if not self.sequence_parallel: - with tensor_parallel.random.get_cuda_rng_tracker().fork(): - attention_probs = self.attention_dropout(attention_probs) - else: - attention_probs = self.attention_dropout(attention_probs) - - # ========================= - # Context layer. [sq, b, hp] - # ========================= - - # value_layer -> context layer. 
- # [sk, b, np, hn] --> [b, np, sq, hn] - - # context layer shape: [b, np, sq, hn] - output_size = (value_layer.size(1), value_layer.size(2), query_layer.size(0), value_layer.size(3)) - - # change view [sk, b * np, hn] - value_layer = value_layer.view(value_layer.size(0), output_size[0] * output_size[1], -1) - - # change view [b * np, sq, sk] - attention_probs = attention_probs.view(output_size[0] * output_size[1], output_size[2], -1) - - # matmul: [b * np, sq, hn] - context_layer = torch.bmm(attention_probs, value_layer.transpose(0, 1)) - - # change view [b, np, sq, hn] - context_layer = context_layer.view(*output_size) - - if headscale_tensor is not None: - context_layer = context_layer * headscale_tensor - - # [b, np, sq, hn] --> [sq, b, np, hn] - context_layer = context_layer.permute(2, 0, 1, 3).contiguous() - - # [sq, b, np, hn] --> [sq, b, hp] - new_context_layer_shape = context_layer.size()[:-2] + (self.hidden_size_per_partition,) - context_layer = context_layer.view(*new_context_layer_shape) - - return context_layer - - -class ParallelAttention(MegatronModule, adapter_mixins.AdapterModuleMixin): - """Parallel self-attention layer abstract class. - - Self-attention layer takes input with size [s, b, h] - and returns output of the same size. - """ - - def __init__( - self, - init_method, - output_layer_init_method, - layer_number, - num_attention_heads, - hidden_size, - attention_type=AttnType.self_attn, - attn_mask_type=AttnMaskType.padding, - precision=16, - apply_query_key_layer_scaling=True, - kv_channels=None, - use_cpu_initialization=False, - masked_softmax_fusion=True, - attention_dropout=0.1, - layer_type=None, - megatron_legacy=False, - bias=True, - headscale=False, - activations_checkpoint_granularity=None, - sequence_parallel=False, - gradient_accumulation_fusion=False, - normalize_attention_scores=True, - ): - super(ParallelAttention, self).__init__() - - self.layer_number = max(1, layer_number) - self.attention_type = attention_type - self.attn_mask_type = attn_mask_type - self.normalize_attention_scores = normalize_attention_scores - - self.megatron_legacy = megatron_legacy - - self.set_accepted_adapter_types([InfusedAdapterConfig._target_]) - - if kv_channels is None: - assert ( - hidden_size % num_attention_heads == 0 - ), 'hidden_size must be divisible by num_attention_heads if kv_channels is None' - kv_channels = hidden_size // num_attention_heads - projection_size = kv_channels * num_attention_heads - - # Per attention head and per partition values. - world_size = parallel_state.get_tensor_model_parallel_world_size() - self.hidden_size_per_attention_head = safe_divide(projection_size, num_attention_heads) - self.num_attention_heads_per_partition = safe_divide(num_attention_heads, world_size) - self.num_attention_heads_partition_offset = ( - self.num_attention_heads_per_partition * parallel_state.get_tensor_model_parallel_rank() - ) - - no_async_tensor_model_parallel_allreduce = ( - parallel_state.get_tensor_model_parallel_world_size() == 1 or sequence_parallel - ) - - # Strided linear layer. 
- if attention_type == AttnType.self_attn: - self.query_key_value = tensor_parallel.ColumnParallelLinear( - hidden_size, - 3 * projection_size, - gather_output=False, - init_method=init_method, - use_cpu_initialization=use_cpu_initialization, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - else: - assert attention_type == AttnType.cross_attn - self.query = tensor_parallel.ColumnParallelLinear( - hidden_size, - projection_size, - gather_output=False, - init_method=init_method, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - self.key_value = tensor_parallel.ColumnParallelLinear( - hidden_size, - 2 * projection_size, - gather_output=False, - init_method=init_method, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - no_async_tensor_model_parallel_allreduce=no_async_tensor_model_parallel_allreduce, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - self.core_attention = CoreAttention( - layer_number=self.layer_number, - num_attention_heads=num_attention_heads, - hidden_size=hidden_size, - attention_type=self.attention_type, - attn_mask_type=self.attn_mask_type, - precision=precision, - apply_query_key_layer_scaling=apply_query_key_layer_scaling, - kv_channels=kv_channels, - masked_softmax_fusion=masked_softmax_fusion, - attention_dropout=attention_dropout, - sequence_parallel=sequence_parallel, - normalize_attention_scores=normalize_attention_scores, - ) - - # Output. - self.dense = tensor_parallel.RowParallelLinear( - projection_size, - hidden_size, - input_is_parallel=True, - init_method=output_layer_init_method, - skip_bias_add=True, - use_cpu_initialization=use_cpu_initialization, - bias=bias, - sequence_parallel_enabled=sequence_parallel, - gradient_accumulation_fusion=gradient_accumulation_fusion, - ) - - self.headscale = headscale - if headscale: - self.head_scale_tensor = torch.nn.Parameter( - torch.ones(1, self.num_attention_heads_per_partition, 1, 1), requires_grad=True - ) - - # Inference key-value memory - self.inference_key_memory = None - self.inference_value_memory = None - self.inference_current_sequence_len = 0 - - # relative position embedding - self.layer_type = layer_type - - def _checkpointed_attention_forward( - self, - query_layer, - key_layer, - value_layer, - attention_mask, - rotary_pos_emb=None, - relative_position_bias=None, - headscale_tensor=None, - ): - """Forward method with activation checkpointing.""" - - def custom_forward(*inputs): - if len(inputs) == 7: - query_layer = inputs[0] - key_layer = inputs[1] - value_layer = inputs[2] - attention_mask = inputs[3] - rotary_pos_emb = inputs[4] - relative_position_bias = inputs[5] - headscale_tensor = inputs[6] - elif len(inputs) == 8: - query_layer = inputs[0] - key_layer = inputs[1] - value_layer = inputs[2] - attention_mask = inputs[3] - rotary_pos_emb = (inputs[4], inputs[5]) - relative_position_bias = inputs[6] - headscale_tensor = inputs[7] - else: - raise ValueError('unexpected number of inputs') - output_ = self.core_attention( - query_layer, - key_layer, - value_layer, - attention_mask, - rotary_pos_emb=rotary_pos_emb, - relative_position_bias=relative_position_bias, - headscale_tensor=headscale_tensor, - ) - return output_ - - if rotary_pos_emb is None: - 
rot_tuple = (rotary_pos_emb,) - else: - rot_tuple = (rotary_pos_emb[0], rotary_pos_emb[1]) - - hidden_states = tensor_parallel.checkpoint( - custom_forward, - False, - query_layer, - key_layer, - value_layer, - attention_mask, - *rot_tuple, - relative_position_bias, - headscale_tensor, - ) - - return hidden_states - - def _allocate_memory(self, inference_max_sequence_len, batch_size, dtype): - return torch.empty( - inference_max_sequence_len, - batch_size, - self.num_attention_heads_per_partition, - self.hidden_size_per_attention_head, - dtype=dtype, - device=torch.cuda.current_device(), - ) - - def _transpose_last_dim(self, mixed_layer, num_splits, num_splits_first): - input_shape = mixed_layer.size() - if num_splits_first: - """[s, b, num_splits * np * hn] - -->(view) [s, b, num_splits, np, hn] - -->(tranpose) [s, b, np, num_splits, hn] - -->(view) [s, b, np * num_splits * hn] """ - - intermediate_shape = input_shape[:-1] + ( - num_splits, - self.num_attention_heads_per_partition, - self.hidden_size_per_attention_head, - ) - - mixed_layer = mixed_layer.view(*intermediate_shape) - mixed_layer = mixed_layer.transpose(-2, -3).contiguous() - else: - """[s, b, np * hn * num_splits] - -->(view) [s, b, np, hn, num_splits] - -->(tranpose) [s, b, np, num_splits, hn] - -->(view) [s, b, np * num_splits * hn] """ - - intermediate_shape = input_shape[:-1] + ( - self.num_attention_heads_per_partition, - self.hidden_size_per_attention_head, - num_splits, - ) - - mixed_layer = mixed_layer.view(*intermediate_shape) - mixed_layer = mixed_layer.transpose(-1, -2).contiguous() - mixed_layer = mixed_layer.view(*input_shape) - - return mixed_layer - - def forward( - self, - hidden_states, - attention_mask, - layer_past=None, - get_key_value=False, - encoder_output=None, - set_inference_key_value_memory=False, - inference_max_sequence_len=None, - rotary_pos_emb=None, # rotary positional embedding - relative_position_bias=None, - checkpoint_core_attention=False, - ): - # hidden_states: [sq, b, h] - - # ================================================= - # Pre-allocate memory for key-values for inference. - # ================================================= - if set_inference_key_value_memory: - assert inference_max_sequence_len and inference_max_sequence_len > 0 - self.inference_key_memory = self._allocate_memory( - inference_max_sequence_len, hidden_states.size(1), hidden_states.dtype - ) - self.inference_value_memory = self._allocate_memory( - inference_max_sequence_len, hidden_states.size(1), hidden_states.dtype - ) - self.inference_current_sequence_len = 0 - - # Some consistency check. - if inference_max_sequence_len: - assert self.inference_current_sequence_len < self.inference_key_memory.size(0) - assert inference_max_sequence_len == self.inference_key_memory.size(0) - # This is added for safety. In case inference_max_sequence_len - # is not provided, make sure there is no potential memory left - # from previous inference. 
- if not inference_max_sequence_len: - self.inference_key_memory = None - self.inference_value_memory = None - - # ===================== - # Query, Key, and Value - # ===================== - - if self.attention_type == AttnType.self_attn: - # Attention heads [sq, b, h] --> [sq, b, (np * 3 * hn)] - mixed_x_layer, _ = self.query_key_value(hidden_states) - - # [sq, b, (np * 3 * hn)] --> [sq, b, np, 3 * hn] - new_tensor_shape = mixed_x_layer.size()[:-1] + ( - self.num_attention_heads_per_partition, - 3 * self.hidden_size_per_attention_head, - ) - if self.megatron_legacy: - mixed_x_layer = self._transpose_last_dim(mixed_x_layer, 3, True) - mixed_x_layer = mixed_x_layer.view(*new_tensor_shape) - - # [sq, b, np, 3 * hn] --> 3 [sq, b, np, hn] - (query_layer, key_layer, value_layer) = tensor_parallel.split_tensor_along_last_dim(mixed_x_layer, 3) - else: - # Attention heads [sk, b, h] --> [sk, b, (np * 2 * hn)] - mixed_kv_layer, _ = self.key_value(encoder_output) - - # [sk, b, (np * 2 * hn)] --> [sk, b, np, 2 * hn] - new_tensor_shape = mixed_kv_layer.size()[:-1] + ( - self.num_attention_heads_per_partition, - 2 * self.hidden_size_per_attention_head, - ) - if self.megatron_legacy: - mixed_kv_layer = self._transpose_last_dim(mixed_kv_layer, 2, True) - mixed_kv_layer = mixed_kv_layer.view(*new_tensor_shape) - - # [sk, b, np, 2 * hn] --> 2 [sk, b, np, hn] - (key_layer, value_layer) = tensor_parallel.split_tensor_along_last_dim(mixed_kv_layer, 2) - - # Attention head [sq, b, h] --> [sq, b, hp] - query_layer, _ = self.query(hidden_states) - # [sq, b, hp] --> [sq, b, np, hn] - new_tensor_shape = query_layer.size()[:-1] + ( - self.num_attention_heads_per_partition, - self.hidden_size_per_attention_head, - ) - query_layer = query_layer.view(*new_tensor_shape) - - if self.is_adapter_available(): - key_infused_adapter = self.get_adapter_module(AdapterName.KEY_INFUSED) - value_infused_adapter = self.get_adapter_module(AdapterName.VALUE_INFUSED) - if key_infused_adapter: - assert value_infused_adapter is not None, "Expected value_infused_adapter not found!" - kls = key_layer.shape - key_layer = key_infused_adapter(key_layer.reshape(kls[0], kls[1], -1)).reshape(kls) - if value_infused_adapter: - assert key_infused_adapter is not None, "Expected key_infused_adapter not found!" - vls = value_layer.shape - value_layer = value_infused_adapter(value_layer.reshape(vls[0], vls[1], -1)).reshape(vls) - - # =================================================== - # Adjust key, value, and attention mask for inference - # =================================================== - - # duplicate the pos_emb for self attention - if rotary_pos_emb is not None: - rotary_pos_emb = rotary_pos_emb if isinstance(rotary_pos_emb, tuple) else ((rotary_pos_emb,) * 2) - - if inference_max_sequence_len: - # Adjust the range variables. - start = self.inference_current_sequence_len - self.inference_current_sequence_len += key_layer.size(0) - end = self.inference_current_sequence_len - # Copy key and values. - self.inference_key_memory[start:end, ...] = key_layer - self.inference_value_memory[start:end, ...] = value_layer - key_layer = self.inference_key_memory[:end, ...] - value_layer = self.inference_value_memory[:end, ...] - # Adjust attention mask - attention_mask = attention_mask[..., start:end, :end] - # adjust the key rotary positional embedding - if rotary_pos_emb is not None: - q_pos_emb, k_pos_emb = rotary_pos_emb - if not set_inference_key_value_memory: - # In inference, we compute one token at a time. 
- # Select the correct positional embedding. - q_pos_emb = q_pos_emb[end - 1 : end] - k_pos_emb = k_pos_emb[:end, :, :, :] - rotary_pos_emb = (q_pos_emb, k_pos_emb) - - if layer_past is not None: - past_key, past_value = layer_past - key_layer = torch.cat((past_key.type_as(key_layer), key_layer), dim=0) - value_layer = torch.cat((past_value.type_as(value_layer), value_layer), dim=0) - - if get_key_value: - present = (key_layer, value_layer) - - if checkpoint_core_attention: - context_layer = self._checkpointed_attention_forward( - query_layer, - key_layer, - value_layer, - attention_mask, - rotary_pos_emb=rotary_pos_emb, - relative_position_bias=relative_position_bias, - headscale_tensor=self.head_scale_tensor if self.headscale else None, - ) - else: - context_layer = self.core_attention( - query_layer, - key_layer, - value_layer, - attention_mask, - layer_past=layer_past, - get_key_value=get_key_value, - rotary_pos_emb=rotary_pos_emb, - relative_position_bias=relative_position_bias, - headscale_tensor=self.head_scale_tensor if self.headscale else None, - ) - - # ================= - # Output. [sq, b, h] - # ================= - - output, bias = self.dense(context_layer) - - if get_key_value: - output = [output, present] - - return output, bias - - -class ParallelChunkedCrossAttention(MegatronModule): - """Parallel chunked cross-attention layer class. - - Self-attention layer takes input with size [b, s, h] - and returns output of the same size. - """ - - def __init__( - self, - init_method, - output_layer_init_method, - layer_number, - num_attention_heads, - hidden_size, - precision=16, - apply_query_key_layer_scaling=True, - kv_channels=None, - use_cpu_initialization=False, - masked_softmax_fusion=True, - attention_dropout=0.1, - megatron_legacy=False, - chunk_size=64, # each chunk, how many tokens - bias=True, - headscale=False, - gradient_accumulation_fusion=False, - normalize_attention_scores=True, - ): - super(ParallelChunkedCrossAttention, self).__init__() - self.cross_attention = ParallelAttention( - init_method=init_method, - output_layer_init_method=output_layer_init_method, - layer_number=layer_number, - num_attention_heads=num_attention_heads, - hidden_size=hidden_size, - attention_type=AttnType.cross_attn, - attn_mask_type=AttnMaskType.padding, - precision=precision, - apply_query_key_layer_scaling=apply_query_key_layer_scaling, - kv_channels=kv_channels, - use_cpu_initialization=use_cpu_initialization, - masked_softmax_fusion=masked_softmax_fusion, - attention_dropout=attention_dropout, - megatron_legacy=megatron_legacy, - bias=bias, - headscale=headscale, - gradient_accumulation_fusion=gradient_accumulation_fusion, - normalize_attention_scores=normalize_attention_scores, - ) - self.chunk_size = chunk_size - - def forward( - self, - hidden_states, - attention_mask, - encoder_output=None, - set_inference_key_value_memory=False, - inference_max_sequence_len=None, - rotary_pos_emb=None, - checkpoint_core_attention=False, - ): - if checkpoint_core_attention: - raise ValueError( - 'checkpoint_core_attention during forward not implemented yet for ParallelChunkedCrossAttention' - ) - - # hidden_states is assumed to have dimension [token length, batch, dimension] - # derive variables - # encoder_output here is the retrieved context - context = encoder_output - # context is assumed to have dimension [num_chunks, num_neighbors, context_token_len, batch, dimension] - chunk_size = self.chunk_size - b, n, dim = ( - hidden_states.shape[1], - hidden_states.shape[0], - hidden_states.shape[2], 
- ) - default_bias = self.cross_attention.dense.bias - if set_inference_key_value_memory: - seq_index = (n // chunk_size) * chunk_size - self.current_len = n - elif inference_max_sequence_len is not None: - # only handles single token increment - assert n == 1 - self.current_len += n - token_pos = (self.current_len - 1) % chunk_size - chunk_id = self.current_len // chunk_size - if chunk_id <= 0: - # if sequence length less than chunk size, do an early return - return torch.zeros_like(hidden_states), default_bias - causal_padding = chunk_size - 1 - # pad it as a full chunk, put it at the end of the chunk position - hidden_states = F.pad(hidden_states, (0, 0, 0, 0, causal_padding, 0), value=0.0) - # only use the relevant context - context = context[chunk_id - 1 : chunk_id, :, :, :, :] - attention_mask = rearrange(attention_mask, '(b k) 1 q v -> b k 1 q v', b=b) - # select the relevant chunk attn mask - attention_mask = attention_mask[:, chunk_id - 1] - seq_index = chunk_size - else: - # this is normal forward without inference - seq_index = (n // chunk_size) * chunk_size - - # if sequence length less than chunk size, do an early return - if n < self.chunk_size and set_inference_key_value_memory and inference_max_sequence_len is not None: - return torch.zeros_like(hidden_states), default_bias - - num_chunks, num_retrieved = ( - context.shape[-5], - context.shape[-4], - ) - - # causal padding - causal_padding = chunk_size - 1 - - x = F.pad(hidden_states, (0, 0, 0, 0, -causal_padding, causal_padding), value=0.0) - - # remove sequence which is ahead of the neighbors retrieved (during inference) - - # seq_index = (n // chunk_size) * chunk_size - x, x_remainder = x[:seq_index], x[seq_index:] - - seq_remain_len = x_remainder.shape[0] - - # take care of rotary positional embedding - # make sure queries positions are properly shifted to the future - - q_pos_emb, k_pos_emb = rotary_pos_emb - # currently implementation is broken - # q need to extend to causal_padding, and just do - # q_pos_emb = F.pad(q_pos_emb, (0, 0, -causal_padding, 0), value = 0.) 
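# Illustration only (not from the NeMo source): the F.pad calls in this removed
# block rely on negative padding acting as a crop. Pad sizes are given in pairs
# starting from the last dimension, so for a [seq, batch, dim] tensor the pair
# (-k, m) in the last position crops k steps from the front of the sequence and
# appends m zero steps at the end -- the causal shift used for chunked
# cross-attention. A minimal check of that behaviour:
import torch
import torch.nn.functional as F

x = torch.arange(6.0).reshape(6, 1, 1)          # pretend [seq, batch, dim]
y = F.pad(x, (0, 0, 0, 0, -2, 2), value=0.0)    # crop first 2 steps, append 2 zeros
print(y[:, 0, 0])                                # tensor([2., 3., 4., 5., 0., 0.])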
- if inference_max_sequence_len is not None and not set_inference_key_value_memory: - q_pos_emb = F.pad( - q_pos_emb, (0, 0, 0, 0, 0, 0, -causal_padding - token_pos, -causal_padding + token_pos), value=0.0 - ) - else: - q_pos_emb = F.pad(q_pos_emb, (0, 0, 0, 0, 0, 0, -causal_padding, 0), value=0.0) - - k_pos_emb = repeat(k_pos_emb, 'n b h d -> (r n) b h d', r=num_retrieved) - rotary_pos_emb = (q_pos_emb, k_pos_emb) - - # make sure number context chunks is enough - assert x.shape[0] // chunk_size == num_chunks - - # reshape so we have chunk to chunk attention, without breaking causality - x = rearrange(x, '(k n) b d -> n (b k) d', k=num_chunks) - context = rearrange(context, 'k r n b d -> (r n) (b k) d') - # cross attention - out, bias = self.cross_attention(x, attention_mask, encoder_output=context, rotary_pos_emb=rotary_pos_emb) - - # reshape back to original sequence - - out = rearrange(out, 'n (b k) d -> (k n) b d', b=b) - - # pad back to original, with 0s at the beginning (which will be added to the residual and be fine) - - out = F.pad(out, (0, 0, 0, 0, causal_padding, -causal_padding + seq_remain_len), value=0.0) - if not set_inference_key_value_memory and inference_max_sequence_len is not None: - out = out[-1:] - return out, bias - - def get_bias_dropout_add(training): def _bias_dropout_add(x, bias, residual, prob): return bias_dropout_add(x, bias, residual, prob, training) diff --git a/tests/collections/nlp/test_retrieval_module.py b/tests/collections/nlp/test_retrieval_module.py index b5da4e20085f..175fdeaadab6 100644 --- a/tests/collections/nlp/test_retrieval_module.py +++ b/tests/collections/nlp/test_retrieval_module.py @@ -18,6 +18,7 @@ from einops import rearrange from pytorch_lightning.trainer.trainer import Trainer +from nemo.collections.nlp.modules.common.megatron.attention import ParallelChunkedCrossAttention from nemo.collections.nlp.modules.common.megatron.layer_type import LayerType from nemo.collections.nlp.modules.common.megatron.megatron_init import initialize_model_parallel_for_nemo from nemo.collections.nlp.modules.common.megatron.retrieval_token_level_encoder_decoder import ( @@ -28,7 +29,6 @@ MegatronRetrievalTransformerEncoderModule, ) from nemo.collections.nlp.modules.common.megatron.rotary_pos_embedding import RotaryEmbedding -from nemo.collections.nlp.modules.common.megatron.transformer import ParallelChunkedCrossAttention from nemo.collections.nlp.modules.common.megatron.utils import ( build_attention_mask_3d, init_method_normal, diff --git a/tests/collections/nlp/test_retrieval_module_inference.py b/tests/collections/nlp/test_retrieval_module_inference.py index 94e179a9e80f..4c655bc1f70f 100644 --- a/tests/collections/nlp/test_retrieval_module_inference.py +++ b/tests/collections/nlp/test_retrieval_module_inference.py @@ -19,6 +19,7 @@ from einops import rearrange from pytorch_lightning.trainer.trainer import Trainer +from nemo.collections.nlp.modules.common.megatron.attention import ParallelChunkedCrossAttention from nemo.collections.nlp.modules.common.megatron.layer_type import LayerType from nemo.collections.nlp.modules.common.megatron.megatron_init import initialize_model_parallel_for_nemo from nemo.collections.nlp.modules.common.megatron.retrieval_token_level_encoder_decoder import ( @@ -29,7 +30,6 @@ MegatronRetrievalTransformerEncoderModule, ) from nemo.collections.nlp.modules.common.megatron.rotary_pos_embedding import RotaryEmbedding -from nemo.collections.nlp.modules.common.megatron.transformer import ParallelChunkedCrossAttention from 
nemo.collections.nlp.modules.common.megatron.utils import ( build_attention_mask_3d, init_method_normal, From e28fcc93d1541b94621f446f4c6307e95c7647cc Mon Sep 17 00:00:00 2001 From: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Date: Fri, 6 Jan 2023 18:59:53 -0800 Subject: [PATCH 113/154] Adithyare/prompt learning seed (#5749) * patch to allow using tokenizers without additional_special_tokens_ids attribute Signed-off-by: arendu * seeding for param-efficient learning methods Signed-off-by: arendu * seeding the datasampler Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed seed_everything Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../language_modeling/megatron_gpt_prompt_learning_model.py | 2 +- .../language_modeling/megatron_t5_prompt_learning_model.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py index 65e554fafe29..1241268fdd42 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py @@ -848,7 +848,7 @@ def build_virtual_prompt_dataset( rank = parallel_state.get_data_parallel_rank() data_parallel_size = parallel_state.get_data_parallel_world_size() sampler = torch.utils.data.distributed.DistributedSampler( - dataset, num_replicas=data_parallel_size, rank=rank, shuffle=shuffle + dataset, num_replicas=data_parallel_size, rank=rank, shuffle=shuffle, seed=self.cfg.seed ) assert batch_size % data_parallel_size == 0, "Global batch size must be evenly divisible by data parallel size" diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py index 534bf194021b..90274910de72 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py @@ -429,7 +429,7 @@ def build_virtual_prompt_dataset( rank = parallel_state.get_data_parallel_rank() world_size = parallel_state.get_data_parallel_world_size() sampler = torch.utils.data.distributed.DistributedSampler( - dataset, num_replicas=world_size, rank=rank, shuffle=shuffle + dataset, num_replicas=world_size, rank=rank, shuffle=shuffle, seed=self.cfg.seed ) dataloader = torch.utils.data.DataLoader( From de82689b45c1d4a001a519efe347e808efe9c3a8 Mon Sep 17 00:00:00 2001 From: Jonghwan Hyeon Date: Sun, 8 Jan 2023 19:57:18 +0900 Subject: [PATCH 114/154] Set the stream position to 0 for pydub (#5752) Signed-off-by: Jonghwan Hyeon Signed-off-by: Jonghwan Hyeon Signed-off-by: Elena Rastorgueva --- nemo/collections/asr/parts/preprocessing/segment.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/nemo/collections/asr/parts/preprocessing/segment.py b/nemo/collections/asr/parts/preprocessing/segment.py index 1b22dcf66840..7aee5c0cca70 100644 --- a/nemo/collections/asr/parts/preprocessing/segment.py +++ b/nemo/collections/asr/parts/preprocessing/segment.py @@ -219,6 +219,9 @@ def from_file( f"NeMo will fallback to loading via pydub." 
) + if hasattr(audio_file, "seek"): + audio_file.seek(0) + if HAVE_PYDUB and samples is None: try: samples = Audio.from_file(audio_file) From c0b36db770d95595d99cbe6d984a9c459bb9fc5d Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Mon, 9 Jan 2023 16:01:00 -0800 Subject: [PATCH 115/154] Fix: conformer encoder forward when length is None (#5761) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- nemo/collections/asr/modules/conformer_encoder.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/asr/modules/conformer_encoder.py b/nemo/collections/asr/modules/conformer_encoder.py index 04ad67f4d65f..6717c4d62ed2 100644 --- a/nemo/collections/asr/modules/conformer_encoder.py +++ b/nemo/collections/asr/modules/conformer_encoder.py @@ -414,7 +414,7 @@ def forward(self, audio_signal, length, cache_last_channel=None, cache_last_time if length is None: length = audio_signal.new_full( - audio_signal.size(0), max_audio_length, dtype=torch.int32, device=audio_signal.device + (audio_signal.size(0),), max_audio_length, dtype=torch.int32, device=audio_signal.device ) if cache_last_time is not None: From 4190144d0a96eaf6f3c38022ae6cc663fc8ae106 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 9 Jan 2023 16:43:32 -0800 Subject: [PATCH 116/154] Update Tacotron2 NGC checkpoint load to latest version (#5760) (#5762) Signed-off-by: Jocelyn Huang Signed-off-by: Elena Rastorgueva --- docs/source/tts/data/ngc_models_am.csv | 2 +- nemo/collections/tts/models/tacotron2.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/tts/data/ngc_models_am.csv b/docs/source/tts/data/ngc_models_am.csv index 6233368e369d..84cff6625bee 100644 --- a/docs/source/tts/data/ngc_models_am.csv +++ b/docs/source/tts/data/ngc_models_am.csv @@ -5,7 +5,7 @@ en-US,tts_en_fastpitch_multispeaker,HiFiTTS,44100Hz,10,ARPABET,nemo.collections. 
en-US,tts_en_lj_mixertts,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.mixer_tts.MixerTTSModel,`tts_en_lj_mixertts `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_mixertts/versions/1.6.0/files/tts_en_lj_mixertts.nemo`` en-US,tts_en_lj_mixerttsx,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.mixer_tts.MixerTTSModel,`tts_en_lj_mixerttsx `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_mixerttsx/versions/1.6.0/files/tts_en_lj_mixerttsx.nemo`` en-US,RAD-TTS,TBD,TBD,TBD,ARPABET,nemo.collections.tts.models.radtts.RadTTSModel,TBD, -en-US,tts_en_tacotron2,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.tacotron2.Tacotron2Model,`tts_en_tacotron2 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.0.0/files/tts_en_tacotron2.nemo`` +en-US,tts_en_tacotron2,LJSpeech,22050Hz,1,ARPABET,nemo.collections.tts.models.tacotron2.Tacotron2Model,`tts_en_tacotron2 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.10.0/files/tts_en_tacotron2.nemo`` de-DE,tts_de_fastpitch_multispeaker_5,HUI Audio Corpus German,44100Hz,5,ARPABET,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_de_fastpitch_multispeaker_5 `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_fastpitch_multispeaker_5.nemo`` de-DE,tts_de_fastpitch_singlespeaker,Thorsten Müller (German Neutral-TTS dataset),22050Hz,1,ARPABET,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_de_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.10.0/files/tts_de_fastpitch_align.nemo`` es,tts_es_fastpitch_multispeaker,OpenSLR crowdsourced Latin American Spanish,44100Hz,174,IPA,nemo.collections.tts.models.fastpitch.FastPitchModel,`tts_es_multispeaker_fastpitchhifigan `_,``https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_fastpitch_multispeaker.nemo`` diff --git a/nemo/collections/tts/models/tacotron2.py b/nemo/collections/tts/models/tacotron2.py index bbcc7d48af79..ed5fac10631c 100644 --- a/nemo/collections/tts/models/tacotron2.py +++ b/nemo/collections/tts/models/tacotron2.py @@ -382,7 +382,7 @@ def list_available_models(cls) -> 'List[PretrainedModelInfo]': list_of_models = [] model = PretrainedModelInfo( pretrained_model_name="tts_en_tacotron2", - location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.0.0/files/tts_en_tacotron2.nemo", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.10.0/files/tts_en_tacotron2.nemo", description="This model is trained on LJSpeech sampled at 22050Hz, and can be used to generate female English voices with an American accent.", class_=cls, aliases=["Tacotron2-22050Hz"], From d369a0eb272c510ea6632fbff75dbd1fd1360198 Mon Sep 17 00:00:00 2001 From: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Date: Mon, 9 Jan 2023 18:12:07 -0800 Subject: [PATCH 117/154] [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. 
(#5753) * refine GermanCharsTokenizer to support only graphemes as inputs by removing sentence-level phoneme representation; * refine GermanCharsTokenizer to preserve mixed cases from the original input graphemes; * add a new Thorsten's 22.10 dataset; * revise thorsten voice neutral datasets preparation script to support two versions of thorsten's voice datasets in a single script; Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- docs/source/tts/data/datasets.csv | 2 +- docs/source/tts/datasets.rst | 57 +++- ...ml => fastpitch_align_22050_grapheme.yaml} | 34 ++- .../de/fastpitch_align_44100_grapheme.yaml | 245 ++++++++++++++++ ...aml => fastpitch_align_44100_phoneme.yaml} | 0 .../common/parts/preprocessing/parsers.py | 6 +- .../tokenizers/text_to_speech/ipa_lexicon.py | 41 ++- .../text_to_speech/tts_tokenizers.py | 47 ++-- nemo/collections/tts/data/data_utils.py | 2 - nemo/collections/tts/models/fastpitch.py | 4 +- nemo/collections/tts/torch/data.py | 4 - nemo/collections/tts/torch/g2ps.py | 2 +- nemo/collections/tts/torch/tts_tokenizers.py | 2 +- nemo_text_processing/g2p/data/data_utils.py | 34 ++- nemo_text_processing/g2p/modules.py | 8 +- .../tts/hui_acg/get_data.py | 12 +- .../tts/openslr_95/get_data.py | 105 ------- .../ds_conf/ds_for_fastpitch_align.yaml | 8 +- .../tts/thorsten_neutral/get_data.py | 263 ++++++++++++++++++ .../text_to_speech/test_tts_tokenizers.py | 2 +- 20 files changed, 684 insertions(+), 194 deletions(-) rename examples/tts/conf/de/{fastpitch_align_22050.yaml => fastpitch_align_22050_grapheme.yaml} (86%) create mode 100644 examples/tts/conf/de/fastpitch_align_44100_grapheme.yaml rename examples/tts/conf/de/{fastpitch_align_44100.yaml => fastpitch_align_44100_phoneme.yaml} (100%) delete mode 100644 scripts/dataset_processing/tts/openslr_95/get_data.py rename scripts/dataset_processing/tts/{openslr_95 => thorsten_neutral}/ds_conf/ds_for_fastpitch_align.yaml (88%) create mode 100644 scripts/dataset_processing/tts/thorsten_neutral/get_data.py diff --git a/docs/source/tts/data/datasets.csv b/docs/source/tts/data/datasets.csv index 4c8d553ec3c1..6d25b6ea1500 100644 --- a/docs/source/tts/data/datasets.csv +++ b/docs/source/tts/data/datasets.csv @@ -3,7 +3,7 @@ English,en-US,LJSpeech,1,1,0,23.92,23.92,0.00,"22,050Hz",https://keithito.com/LJ English,en-US,LibriTTS (clean),1230,592,638,262.62,133.97,128.65,"24,000Hz",https://www.openslr.org/60/ English,en-US,HiFiTTS,10,6,4,291.60,158.30,133.30,"44,100Hz",http://www.openslr.org/109/ German,de-DE,Thorsten Müller (German Neutral-TTS dataset),1,0,1,22.96,0.00,22.96,"22,050Hz",https://www.openslr.org/95/ -German,de-DE,HUI-Audio-Corpus-German (clean),118,0,0,253.85,0.00,0.00,"44,100Hz",https://opendata.iisys.de/datasets.html +German,de-DE,HUI-Audio-Corpus-German (clean),118,n/a,n/a,253.85,0.00,0.00,"44,100Hz",https://opendata.iisys.de/datasets.html Spanish,es-AR,Crowdsourced high-quality Argentinian Spanish,44,31,13,8.03,5.61,2.42,"48,000Hz",https://www.openslr.org/61/ Spanish,es-CL,Crowdsourced high-quality Chilean Spanish,31,13,18,7.15,2.84,4.31,"48,000Hz",https://www.openslr.org/71/ Spanish,es-CO,Crowdsourced high-quality Colombian Spanish,33,16,17,7.58,3.74,3.84,"48,000Hz",https://www.openslr.org/72/ diff --git a/docs/source/tts/datasets.rst b/docs/source/tts/datasets.rst index abdfee1af7f8..083566bc7cb8 100644 --- a/docs/source/tts/datasets.rst +++ b/docs/source/tts/datasets.rst @@ -66,9 +66,8 @@ LibriTTS $ python 
scripts/dataset_processing/tts/libritts/get_data.py \ --data-root \ - --manifests-path \ - --val-size 0.01 \ - --test-size 0.01 + --data-sets "ALL" + --num-workers 4 $ python scripts/dataset_processing/tts/extract_sup_data.py \ --config-path ljspeech/ds_conf \ @@ -89,22 +88,55 @@ The texts of this dataset has been normalized already. So there is no extra need * Command Line Instruction: TBD -Thorsten Müller (German Neutral-TTS dataset) +Thorsten Müller's German Neutral-TTS Datasets ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -* Dataset URL: https://www.openslr.org/resources/95/ -* Dataset Processing Script: https://github.com/NVIDIA/NeMo/tree/stable/scripts/dataset_processing/tts/openslr_95/get_data.py +There are two German neutral datasets released by Thorsten Müller for now, 21.02 and 22.10, respectively. Version 22.10 has been recorded with a better recording setup, such as recording chamber and better microphone. So it is advised to train models on the 22.10 version because its audio quality is better and it has a way more natural speech flow and higher character rate per second speech. The two datasets are described below and defined in `scripts/dataset_processing/tts/thorsten_neutral/get_data.py:THORSTEN_NEUTRAL `_. + +.. code-block:: python + + # Thorsten Müller published two neural voice datasets, 21.02 and 22.10. + THORSTEN_NEUTRAL = { + "21_02": { + "url": "https://zenodo.org/record/5525342/files/thorsten-neutral_v03.tgz?download=1", + "dir_name": "thorsten-de_v03", + "metadata": ["metadata.csv"], + }, + "22_10": { + "url": "https://zenodo.org/record/7265581/files/ThorstenVoice-Dataset_2022.10.zip?download=1", + "dir_name": "ThorstenVoice-Dataset_2022.10", + "metadata": ["metadata_train.csv", "metadata_dev.csv", "metadata_test.csv"], + }, + } + +* Thorsten Müller's German Datasets repo: https://github.com/thorstenMueller/Thorsten-Voice +* Dataset Processing Script: https://github.com/NVIDIA/NeMo/tree/stable/scripts/dataset_processing/tts/thorsten_neutral/get_data.py * Command Line Instruction: .. code-block:: bash - $ python scripts/dataset_processing/tts/openslr_95/get_data.py \ + # Version 22.10 + $ python scripts/dataset_processing/tts/thorsten_neutral/get_data.py \ --data-root \ - --val-size 0.1 \ - --test-size 0.2 \ - --seed-for-ds-split 100 - + --manifests-root \ + --data-version "22_10" \ + --val-size 100 \ + --test-size 100 \ + --seed-for-ds-split 100 \ + --normalize-text + + # Version 21.02 + $ python scripts/dataset_processing/tts/thorsten_neutral/get_data.py \ + --data-root \ + --manifests-root \ + --data-version "21_02" \ + --val-size 100 \ + --test-size 100 \ + --seed-for-ds-split 100 \ + --normalize-text + + # extract pitch and compute pitch normalization params for each version. 
$ python scripts/dataset_processing/tts/extract_sup_data.py \ - --config-path openslr_95/ds_conf \ + --config-path thorsten_neutral/ds_conf \ --config-name ds_for_fastpitch_align.yaml \ manifest_filepath= \ sup_data_path= @@ -119,6 +151,7 @@ HUI Audio Corpus German $ python scripts/dataset_processing/tts/hui_acg/get_data.py \ --data-root \ + --manifests-root \ --set-type clean \ --min-duration 0.1 \ --max-duration 15 \ diff --git a/examples/tts/conf/de/fastpitch_align_22050.yaml b/examples/tts/conf/de/fastpitch_align_22050_grapheme.yaml similarity index 86% rename from examples/tts/conf/de/fastpitch_align_22050.yaml rename to examples/tts/conf/de/fastpitch_align_22050_grapheme.yaml index 62c8543b98a4..3217f976347d 100644 --- a/examples/tts/conf/de/fastpitch_align_22050.yaml +++ b/examples/tts/conf/de/fastpitch_align_22050_grapheme.yaml @@ -15,8 +15,8 @@ pitch_fmax: 2093.004522404789 # these frame-wise values depend on pitch_fmin and pitch_fmax, you can get values # by running `scripts/dataset_processing/tts/extract_sup_data.py` -pitch_mean: ??? # e.g. 132.524658203125 for http://www.openslr.org/95/ -pitch_std: ??? # e.g. 37.389366149902 for http://www.openslr.org/95/ +pitch_mean: ??? # e.g. 132.524658203125 for https://zenodo.org/record/5525342/files/thorsten-neutral_v03.tgz?download=1 +pitch_std: ??? # e.g. 37.389366149902 for https://zenodo.org/record/5525342/files/thorsten-neutral_v03.tgz?download=1 # Default values for dataset with sample_rate=22050 sample_rate: 22050 @@ -70,7 +70,6 @@ model: punct: true apostrophe: true pad_with_space: true - phonemes: true train_ds: dataset: @@ -86,10 +85,13 @@ model: n_mels: ${model.n_mel_channels} lowfreq: ${model.lowfreq} highfreq: ${model.highfreq} - max_duration: 14 # change to null to include longer audios. + max_duration: 15 # change to null to include longer audios. min_duration: 0.1 ignore_file: null - trim: false + trim: true + trim_top_db: 50 + trim_frame_length: ${model.n_window_size} + trim_hop_length: ${model.n_window_stride} pitch_fmin: ${model.pitch_fmin} pitch_fmax: ${model.pitch_fmax} pitch_norm: true @@ -101,6 +103,7 @@ model: shuffle: true batch_size: 32 num_workers: 12 + pin_memory: true validation_ds: dataset: @@ -116,10 +119,13 @@ model: n_mels: ${model.n_mel_channels} lowfreq: ${model.lowfreq} highfreq: ${model.highfreq} - max_duration: 14 # change to null to include longer audios. + max_duration: 15 # change to null to include longer audios. 
min_duration: 0.1 ignore_file: null - trim: false + trim: true + trim_top_db: 50 + trim_frame_length: ${model.n_window_size} + trim_hop_length: ${model.n_window_stride} pitch_fmin: ${model.pitch_fmin} pitch_fmax: ${model.pitch_fmax} pitch_norm: true @@ -130,7 +136,8 @@ model: drop_last: false shuffle: false batch_size: 32 - num_workers: 2 + num_workers: 8 + pin_memory: true preprocessor: _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor @@ -203,8 +210,7 @@ model: optim: name: adamw lr: 1e-3 - # optimizer arguments - betas: [0.9, 0.999] + betas: [ 0.9, 0.999 ] weight_decay: 1e-6 sched: @@ -215,15 +221,15 @@ model: trainer: num_nodes: 1 - devices: -1 # number of gpus + devices: -1 # specify all GPUs regardless of its availability accelerator: gpu strategy: ddp precision: 16 - max_epochs: 1000 + max_epochs: 1500 accumulate_grad_batches: 1 gradient_clip_val: 1000.0 - enable_checkpointing: false # Provided by exp_manager - logger: false # Provided by exp_manager + enable_checkpointing: false # Provided by exp_manager + logger: false # Provided by exp_manager log_every_n_steps: 100 check_val_every_n_epoch: 5 benchmark: false diff --git a/examples/tts/conf/de/fastpitch_align_44100_grapheme.yaml b/examples/tts/conf/de/fastpitch_align_44100_grapheme.yaml new file mode 100644 index 000000000000..5395ccb57080 --- /dev/null +++ b/examples/tts/conf/de/fastpitch_align_44100_grapheme.yaml @@ -0,0 +1,245 @@ +# This config contains the default values for training FastPitch model with aligner using 44.1KHz sampling +# rate. If you want to train model on other dataset, you can change config values according to your dataset. +# Most dataset-specific arguments are in the head of the config file, see below. + +name: FastPitch + +train_dataset: ??? +validation_datasets: ??? +sup_data_path: ??? +sup_data_types: [ "align_prior_matrix", "pitch", "speaker_id" ] + +# Default values from librosa.pyin +pitch_fmin: 65.40639132514966 +pitch_fmax: 2093.004522404789 + +# these frame-wise values depend on pitch_fmin and pitch_fmax, you can get values +# by running `scripts/dataset_processing/tts/extract_sup_data.py` +pitch_mean: ??? # e.g. 151.9578857421875 for top-5 speakers in https://github.com/iisys-hof/HUI-Audio-Corpus-German +pitch_std: ??? # e.g. 87.1997680664063 for top-5 speakers in https://github.com/iisys-hof/HUI-Audio-Corpus-German + +# Default values for dataset with sample_rate=44100 +sample_rate: 44100 +n_mel_channels: 80 +n_window_size: 2048 +n_window_stride: 512 +n_fft: 2048 +lowfreq: 0 +highfreq: null +window: hann + +whitelist_path: "nemo_text_processing/text_normalization/de/data/whitelist.tsv" + +model: + learn_alignment: true + bin_loss_warmup_epochs: 100 + + n_speakers: 119 # change to the number of speakers in your dataset. 
Make sure it is at least the largest speaker_id + 1 + max_token_duration: 75 + symbols_embedding_dim: 384 + pitch_embedding_kernel_size: 3 + + pitch_fmin: ${pitch_fmin} + pitch_fmax: ${pitch_fmax} + + pitch_mean: ${pitch_mean} + pitch_std: ${pitch_std} + + sample_rate: ${sample_rate} + n_mel_channels: ${n_mel_channels} + n_window_size: ${n_window_size} + n_window_stride: ${n_window_stride} + n_fft: ${n_fft} + lowfreq: ${lowfreq} + highfreq: ${highfreq} + window: ${window} + + text_normalizer: + _target_: nemo_text_processing.text_normalization.normalize.Normalizer + lang: de + input_case: cased + whitelist: ${whitelist_path} + + text_normalizer_call_kwargs: + verbose: false + punct_pre_process: true + punct_post_process: true + + text_tokenizer: + _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.GermanCharsTokenizer + punct: true + apostrophe: true + pad_with_space: true + + train_ds: + dataset: + _target_: nemo.collections.tts.torch.data.TTSDataset + manifest_filepath: ${train_dataset} + sample_rate: ${model.sample_rate} + sup_data_path: ${sup_data_path} + sup_data_types: ${sup_data_types} + n_fft: ${model.n_fft} + win_length: ${model.n_window_size} + hop_length: ${model.n_window_stride} + window: ${model.window} + n_mels: ${model.n_mel_channels} + lowfreq: ${model.lowfreq} + highfreq: ${model.highfreq} + max_duration: 15 # change to null to include longer audios. + min_duration: 0.1 + ignore_file: null + trim: true + trim_top_db: 50 + trim_frame_length: ${model.n_window_size} + trim_hop_length: ${model.n_window_stride} + pitch_fmin: ${model.pitch_fmin} + pitch_fmax: ${model.pitch_fmax} + pitch_norm: true + pitch_mean: ${model.pitch_mean} + pitch_std: ${model.pitch_std} + + dataloader_params: + drop_last: false + shuffle: true + batch_size: 32 + num_workers: 12 + pin_memory: true + + validation_ds: + dataset: + _target_: nemo.collections.tts.torch.data.TTSDataset + manifest_filepath: ${validation_datasets} + sample_rate: ${model.sample_rate} + sup_data_path: ${sup_data_path} + sup_data_types: ${sup_data_types} + n_fft: ${model.n_fft} + win_length: ${model.n_window_size} + hop_length: ${model.n_window_stride} + window: ${model.window} + n_mels: ${model.n_mel_channels} + lowfreq: ${model.lowfreq} + highfreq: ${model.highfreq} + max_duration: 15 # change to null to include longer audios. 
+ min_duration: 0.1 + ignore_file: null + trim: true + trim_top_db: 50 + trim_frame_length: ${model.n_window_size} + trim_hop_length: ${model.n_window_stride} + pitch_fmin: ${model.pitch_fmin} + pitch_fmax: ${model.pitch_fmax} + pitch_norm: true + pitch_mean: ${model.pitch_mean} + pitch_std: ${model.pitch_std} + + dataloader_params: + drop_last: false + shuffle: false + batch_size: 32 + num_workers: 8 + pin_memory: true + + preprocessor: + _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor + features: ${model.n_mel_channels} + lowfreq: ${model.lowfreq} + highfreq: ${model.highfreq} + n_fft: ${model.n_fft} + n_window_size: ${model.n_window_size} + window_size: false + n_window_stride: ${model.n_window_stride} + window_stride: false + pad_to: 1 + pad_value: 0 + sample_rate: ${model.sample_rate} + window: ${model.window} + normalize: null + preemph: null + dither: 0.0 + frame_splicing: 1 + log: true + log_zero_guard_type: add + log_zero_guard_value: 1e-05 + mag_power: 1.0 + + input_fft: #n_embed and padding_idx are added by the model + _target_: nemo.collections.tts.modules.transformer.FFTransformerEncoder + n_layer: 6 + n_head: 1 + d_model: ${model.symbols_embedding_dim} + d_head: 64 + d_inner: 1536 + kernel_size: 3 + dropout: 0.1 + dropatt: 0.1 + dropemb: 0.0 + d_embed: ${model.symbols_embedding_dim} + + output_fft: + _target_: nemo.collections.tts.modules.transformer.FFTransformerDecoder + n_layer: 6 + n_head: 1 + d_model: ${model.symbols_embedding_dim} + d_head: 64 + d_inner: 1536 + kernel_size: 3 + dropout: 0.1 + dropatt: 0.1 + dropemb: 0.0 + + alignment_module: + _target_: nemo.collections.tts.modules.aligner.AlignmentEncoder + n_text_channels: ${model.symbols_embedding_dim} + + duration_predictor: + _target_: nemo.collections.tts.modules.fastpitch.TemporalPredictor + input_size: ${model.symbols_embedding_dim} + kernel_size: 3 + filter_size: 256 + dropout: 0.1 + n_layers: 2 + + pitch_predictor: + _target_: nemo.collections.tts.modules.fastpitch.TemporalPredictor + input_size: ${model.symbols_embedding_dim} + kernel_size: 3 + filter_size: 256 + dropout: 0.1 + n_layers: 2 + + optim: + name: adamw + lr: 1e-3 + betas: [ 0.9, 0.999 ] + weight_decay: 1e-6 + + sched: + name: NoamAnnealing + warmup_steps: 1000 + last_epoch: -1 + d_model: 1 # Disable scaling based on model dim + +trainer: + num_nodes: 1 + devices: -1 # specify all GPUs regardless of its availability + accelerator: gpu + strategy: ddp + precision: 16 + max_epochs: 1500 + accumulate_grad_batches: 1 + gradient_clip_val: 1000.0 + enable_checkpointing: false # Provided by exp_manager + logger: false # Provided by exp_manager + log_every_n_steps: 100 + check_val_every_n_epoch: 5 + benchmark: false + +exp_manager: + exp_dir: null + name: ${name} + create_tensorboard_logger: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: val_loss + resume_if_exists: false + resume_ignore_no_checkpoint: false diff --git a/examples/tts/conf/de/fastpitch_align_44100.yaml b/examples/tts/conf/de/fastpitch_align_44100_phoneme.yaml similarity index 100% rename from examples/tts/conf/de/fastpitch_align_44100.yaml rename to examples/tts/conf/de/fastpitch_align_44100_phoneme.yaml diff --git a/nemo/collections/common/parts/preprocessing/parsers.py b/nemo/collections/common/parts/preprocessing/parsers.py index 9541fb2c3981..10a3522ef241 100644 --- a/nemo/collections/common/parts/preprocessing/parsers.py +++ b/nemo/collections/common/parts/preprocessing/parsers.py @@ -12,10 +12,10 @@ # See the License for the 
specific language governing permissions and # limitations under the License. -''' +""" A collection of simple character based parsers. These parser handle cleaning and tokenization by default. We currently support English. -''' +""" import string from typing import List, Optional @@ -223,7 +223,7 @@ def make_parser(labels: Optional[List[str]] = None, name: str = 'base', **kwargs Args: labels: List of labels to allocate indexes for. If set to - None then labels would be ascii table list. Essentially, this is a + None then labels would be ascii table list. Essentially, this is an id to str mapping (default: None). name: Concise name of parser to create (default: 'base'). (default: -1). diff --git a/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py b/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py index e919339cce01..06a3766afb59 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py +++ b/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py @@ -35,11 +35,11 @@ 'U', 'V', 'W', 'X', 'Y', 'Z', 'Á', 'É', 'Í', 'Ñ', 'Ó', 'Ú', 'Ü' ), + # ref: https://en.wikipedia.org/wiki/German_orthography#Alphabet "de-DE": ( 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', - 'U', 'V', 'W', 'X', 'Y', 'Z', 'Ä', 'É', 'Ê', 'Ñ', - 'Ö', 'Ü' + 'U', 'V', 'W', 'X', 'Y', 'Z', 'Ä', 'Ö', 'Ü', 'ẞ', ), } @@ -66,6 +66,8 @@ ) } +GRAPHEME_CHARACTER_CASES = ["upper", "lower", "mixed"] + # fmt: on @@ -74,14 +76,27 @@ def validate_locale(locale): raise ValueError(f"Unsupported locale '{locale}'. " f"Supported locales {SUPPORTED_LOCALES}") -def get_grapheme_character_set(locale): +def get_grapheme_character_set(locale: str, case: str = "upper") -> str: if locale not in GRAPHEME_CHARACTER_SETS: raise ValueError( f"Grapheme character set not found for locale '{locale}'. " f"Supported locales {GRAPHEME_CHARACTER_SETS.keys()}" ) - char_set = set(GRAPHEME_CHARACTER_SETS[locale]) - return char_set + + charset_str_origin = ''.join(GRAPHEME_CHARACTER_SETS[locale]) + if case == "upper": + # Directly call .upper() will convert 'ß' into 'SS' according to https://bugs.python.org/issue30810. + charset_str = charset_str_origin.replace('ß', 'ẞ').upper() + elif case == "lower": + charset_str = charset_str_origin.lower() + elif case == "mixed": + charset_str = charset_str_origin.replace('ß', 'ẞ').upper() + charset_str_origin.lower() + else: + raise ValueError( + f"Grapheme character case not found: '{case}'. 
Supported cases are {GRAPHEME_CHARACTER_CASES}" + ) + + return charset_str def get_ipa_character_set(locale): @@ -101,11 +116,23 @@ def get_ipa_punctuation_list(locale): punct_set = set(DEFAULT_PUNCTUATION) if locale in ["de-DE", "es-ES"]: - # https://en.wikipedia.org/wiki/Guillemet#Uses + # ref: https://en.wikipedia.org/wiki/Guillemet#Uses punct_set.update(['«', '»', '‹', '›']) if locale == "de-DE": - punct_set.update(['„', '“']) + # ref: https://en.wikipedia.org/wiki/German_orthography#Punctuation + punct_set.update( + [ + '„', # double low-9 quotation mark, U+201E, decimal 8222 + '“', # left double quotation mark, U+201C, decimal 8220 + '‚', # single low-9 quotation mark, U+201A, decimal 8218 + '‘', # left single quotation mark, U+2018, decimal 8216 + '‒', # figure dash, U+2012, decimal 8210 + '–', # en dash, U+2013, decimal 8211 + '—', # em dash, U+2014, decimal 8212 + ] + ) elif locale == "es-ES": + # ref: https://en.wikipedia.org/wiki/Spanish_orthography#Punctuation punct_set.update(['¿', '¡']) punct_list = sorted(list(punct_set)) diff --git a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py index 08c7bb09cac3..6a55982ff86f 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py +++ b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py @@ -27,7 +27,11 @@ spanish_text_preprocessing, ) -from nemo.collections.common.tokenizers.text_to_speech.ipa_lexicon import get_ipa_punctuation_list, validate_locale +from nemo.collections.common.tokenizers.text_to_speech.ipa_lexicon import ( + get_grapheme_character_set, + get_ipa_punctuation_list, + validate_locale, +) from nemo.utils import logging from nemo.utils.decorators import experimental @@ -85,6 +89,7 @@ def decode(self, tokens: List[int]) -> str: class BaseCharsTokenizer(BaseTokenizer): # fmt: off + # TODO @xueyang: unify definition of the default PUNCT_LIST and import from ipa_lexicon.py PUNCT_LIST = ( # Derived from LJSpeech and "/" additionally ',', '.', '!', '?', '-', ':', ';', '/', '"', '(', @@ -138,13 +143,13 @@ def encode(self, text): text = self.text_preprocessing_func(text) for c in text: - # Add space if last one isn't one + # Add a whitespace if the current char is a whitespace while the previous char is not a whitespace. if c == space and len(cs) > 0 and cs[-1] != space: cs.append(c) - # Add next char + # Add the current char that is an alphanumeric or an apostrophe. elif (c.isalnum() or c == "'") and c in tokens: cs.append(c) - # Add punct + # Add a punctuation that has a single char. elif (c in self.PUNCT_LIST) and self.punct: cs.append(c) # Warn about unknown char @@ -195,25 +200,22 @@ def __init__( class GermanCharsTokenizer(BaseCharsTokenizer): - # fmt: off - PUNCT_LIST = ( # Derived from LJSpeech and "/" additionally - ',', '.', '!', '?', '-', - ':', ';', '/', '"', '(', - ')', '[', ']', '{', '}', - ) - # fmt: on + + _LOCALE = "de-DE" + _PUNCT_LIST = get_ipa_punctuation_list(_LOCALE) + _CHARSET_STR = get_grapheme_character_set(locale=_LOCALE, case="mixed") def __init__( self, + chars=_CHARSET_STR, punct=True, apostrophe=True, add_blank_at=None, pad_with_space=False, - non_default_punct_list=None, + non_default_punct_list=_PUNCT_LIST, text_preprocessing_func=german_text_preprocessing, - phonemes=True, ): - """Deutsch char-based tokenizer. + """German grapheme-based tokenizer. Args: punct: Whether to reserve grapheme for basic punctuation or not. apostrophe: Whether to use apostrophe or not. 
@@ -221,16 +223,11 @@ def __init__( if None then no blank in labels. pad_with_space: Whether to pad text with spaces at the beginning and at the end or not. non_default_punct_list: List of punctuation marks which will be used instead default. - text_preprocessing_func: Text preprocessing function for correct execution of the tokenizer. - Currently, it only applies lower() function. + text_preprocessing_func: Text preprocessing function for correct execution of the tokenizer. By default, it + would keep any word unchanged. """ - - de_alphabet = "abcdefghijklmnopqrstuvwxyzäöüß" - if phonemes: - de_ipa = "ʊʃŋɜːɛɾəɪçɔøɡœɑÜ„1Q̃ɒʒÄɹÖʌθàó̈ðéɐá" - de_alphabet += de_ipa super().__init__( - chars=de_alphabet, + chars=chars, punct=punct, apostrophe=apostrophe, add_blank_at=add_blank_at, @@ -394,7 +391,7 @@ def __init__( pad_with_space: Whether to pad text with spaces at the beginning and at the end or not. text_preprocessing_func: Text preprocessing function for correct execution of the tokenizer. Basically, it replaces all non-unicode characters with unicode ones. - Note that lower() function shouldn't applied here, in case the text contains phonemes (it will be handled by g2p). + Note that lower() function shouldn't be applied here, in case the text contains phonemes (it will be handled by g2p). """ self.phoneme_probability = None @@ -661,7 +658,6 @@ def __init__( g2p, punct=True, non_default_punct_list=None, - chars=False, *, space=' ', silence=None, @@ -676,7 +672,6 @@ def __init__( g2p: Grapheme to phoneme module. punct: Whether to reserve grapheme for basic punctuation or not. non_default_punct_list: List of punctuation marks which will be used instead default. - chars: Whether to additionally use chars together with phonemes. It is useful if g2p module can return chars too. space: Space token as string. silence: Silence token as string (will be disabled if it is None). apostrophe: Whether to use apostrophe or not. @@ -686,7 +681,7 @@ def __init__( pad_with_space: Whether to pad text with spaces at the beginning and at the end or not. text_preprocessing_func: Text preprocessing function for correct execution of the tokenizer. Basically, it replaces all non-unicode characters with unicode ones. - Note that lower() function shouldn't applied here, in case the text contains phonemes (it will be handled by g2p). + Note that lower() function shouldn't be applied here, in case the text contains phonemes (it will be handled by g2p). 
""" tokens = [] self.space, tokens = len(tokens), tokens + [space] # Space diff --git a/nemo/collections/tts/data/data_utils.py b/nemo/collections/tts/data/data_utils.py index 7a32cb02c2ea..f8be72b01be7 100644 --- a/nemo/collections/tts/data/data_utils.py +++ b/nemo/collections/tts/data/data_utils.py @@ -40,8 +40,6 @@ def get_sup_data_file_path(entry: dict, base_audio_path: Path, sup_data_path: Pa audio_path = Path(entry["audio_filepath"]) rel_audio_path = audio_path.relative_to(base_audio_path).with_suffix("") audio_id = str(rel_audio_path).replace(os.sep, "_") - if "is_phoneme" in entry and entry["is_phoneme"] == 1: - audio_id += "_phoneme" file_name = f"{audio_id}.pt" file_path = Path(os.path.join(sup_data_path, file_name)) return file_path diff --git a/nemo/collections/tts/models/fastpitch.py b/nemo/collections/tts/models/fastpitch.py index 60dd4fe06271..7e639a81514a 100644 --- a/nemo/collections/tts/models/fastpitch.py +++ b/nemo/collections/tts/models/fastpitch.py @@ -95,6 +95,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None): assert self.vocab is not None input_fft_kwargs["n_embed"] = len(self.vocab.tokens) input_fft_kwargs["padding_idx"] = self.vocab.pad + # TODO @xueyang: remove AudioToCharWithPriorAndPitchDataset because it has been deprecated already. elif self.ds_class_name == "AudioToCharWithPriorAndPitchDataset": logging.warning( "AudioToCharWithPriorAndPitchDataset class has been deprecated. No support for" @@ -211,6 +212,7 @@ def _setup_tokenizer(self, cfg): text_tokenizer_kwargs["g2p"] = instantiate(cfg.text_tokenizer.g2p, **g2p_kwargs) + # TODO @xueyang: rename the instance of tokenizer because vocab is misleading. self.vocab = instantiate(cfg.text_tokenizer, **text_tokenizer_kwargs) @property @@ -506,7 +508,7 @@ def __setup_dataloader_from_config(self, cfg, shuffle_should_be: bool = True, na if "dataset" not in cfg or not isinstance(cfg.dataset, DictConfig): raise ValueError(f"No dataset for {name}") if "dataloader_params" not in cfg or not isinstance(cfg.dataloader_params, DictConfig): - raise ValueError(f"No dataloder_params for {name}") + raise ValueError(f"No dataloader_params for {name}") if shuffle_should_be: if 'shuffle' not in cfg.dataloader_params: logging.warning( diff --git a/nemo/collections/tts/torch/data.py b/nemo/collections/tts/torch/data.py index 069052b818e4..2c23a456c0f0 100644 --- a/nemo/collections/tts/torch/data.py +++ b/nemo/collections/tts/torch/data.py @@ -118,7 +118,6 @@ def __init__( "normalized_text": (Optional), "mel_filepath": (Optional), "duration": (Optional), - "is_phoneme": <0: default, 1: if normalized_text is phonemes> (Optional) sample_rate (int): The sample rate of the audio. Or the sample rate that we will resample all files to. text_tokenizer (Optional[Union[BaseTokenizer, Callable[[str], List[int]]]]): BaseTokenizer or callable which represents text tokenizer. tokens (Optional[List[str]]): Tokens from text_tokenizer. Should be specified if text_tokenizer is not BaseTokenizer. 
@@ -224,7 +223,6 @@ def __init__( "mel_filepath": item["mel_filepath"] if "mel_filepath" in item else None, "duration": item["duration"] if "duration" in item else None, "speaker_id": item["speaker"] if "speaker" in item else None, - "is_phoneme": item["is_phoneme"] if "is_phoneme" in item else None, } if "normalized_text" in item: @@ -487,8 +485,6 @@ def __getitem__(self, index): # Let's keep audio name and all internal directories in rel_audio_path_as_text_id to avoid any collisions rel_audio_path = Path(sample["audio_filepath"]).relative_to(self.base_data_dir).with_suffix("") rel_audio_path_as_text_id = str(rel_audio_path).replace("/", "_") - if sample["is_phoneme"] == 1: - rel_audio_path_as_text_id += "_phoneme" # Load audio features = self.featurizer.process( diff --git a/nemo/collections/tts/torch/g2ps.py b/nemo/collections/tts/torch/g2ps.py index 554f214975ba..d66af6f82da7 100644 --- a/nemo/collections/tts/torch/g2ps.py +++ b/nemo/collections/tts/torch/g2ps.py @@ -12,6 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# TODO (xueyang): deprecate this file since no other places import modules from here anymore. However, +# TODO @xueyang: deprecate this file since no other places import modules from here anymore. However, # all checkpoints uploaded in ngc used this path. So it requires to update all ngc checkpoints g2p path as well. from nemo_text_processing.g2p.modules import IPAG2P, BaseG2p, EnglishG2p diff --git a/nemo/collections/tts/torch/tts_tokenizers.py b/nemo/collections/tts/torch/tts_tokenizers.py index 0cac295b6c06..6e665fff11f9 100644 --- a/nemo/collections/tts/torch/tts_tokenizers.py +++ b/nemo/collections/tts/torch/tts_tokenizers.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -# TODO (xueyang): deprecate this file since no other places import modules from here anymore. However, +# TODO @xueyang: deprecate this file since no other places import modules from here anymore. However, # all checkpoints uploaded in ngc used this path. So it requires to update all ngc checkpoints path as well. from nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers import ( BaseCharsTokenizer, diff --git a/nemo_text_processing/g2p/data/data_utils.py b/nemo_text_processing/g2p/data/data_utils.py index 15522a796bf5..e481be9be2f9 100644 --- a/nemo_text_processing/g2p/data/data_utils.py +++ b/nemo_text_processing/g2p/data/data_utils.py @@ -130,6 +130,36 @@ def english_text_preprocessing(text, lower=True): return text +def german_text_preprocessing(text: str) -> str: + """ + TODO @xueyang: Apply NFC form may be too aggressive since it would ignore some accented characters that do not exist + in predefined German alphabet (nemo.collections.common.tokenizers.text_to_speech.ipa_lexicon.IPA_CHARACTER_SETS), + such as 'é'. This is not expected. A better solution is to add an extra normalization with NFD to discard the + diacritics and consider 'é' and 'e' produce similar pronunciations. + + Note that the tokenizer needs to run `unicodedata.normalize("NFC", x)` before calling `encode` function, + especially for the characters that have diacritics, such as 'ö' in the German alphabet. 'ö' can be encoded as + b'\xc3\xb6' (one char) as well as b'o\xcc\x88' (two chars). 
Without the normalization of composing two chars + together and without a complete predefined set of diacritics, when the tokenizer reads the input sentence + char-by-char, it would skip the combining diaeresis b'\xcc\x88', resulting in indistinguishable pronunciations + for 'ö' and 'o'. + + Args: + text (str): the original input sentence. + + Returns: + NFC normalized sentence (str). + """ + res = [] + for c in unicodedata.normalize("NFC", text): + if c in ['’']: # right single quotation mark as an apostrophe, U+2019, decimal 8217 + res.append("'") + else: + res.append(c) + + return ''.join(res) + + def _word_tokenize(words: List[Tuple[str, str, str]]) -> List[Tuple[List[str], bool]]: """ Process a list of words and attach indicators showing if each word is unchangeable or not. Each word representation @@ -202,10 +232,6 @@ def any_locale_text_preprocessing(text): return text.lower() -def german_text_preprocessing(text): - return text.lower() - - def spanish_text_preprocessing(text): return text.lower() diff --git a/nemo_text_processing/g2p/modules.py b/nemo_text_processing/g2p/modules.py index f83c47d56210..fbdfd7887841 100644 --- a/nemo_text_processing/g2p/modules.py +++ b/nemo_text_processing/g2p/modules.py @@ -82,7 +82,7 @@ def __init__( heteronyms (str, Path, List): Path to file with heteronyms (every line is new word) or list of words. encoding: Encoding type. phoneme_probability (Optional[float]): The probability (0. self.phoneme_probability: return word, True - # punctuation + # punctuation or whitespace. if re.search(r"[a-zA-ZÀ-ÿ\d]", word) is None: return list(word), True @@ -281,7 +281,7 @@ def __init__( Args: phoneme_dict (str, Path, Dict): Path to file in CMUdict format or a IPA dict object with CMUdict-like entries. a dictionary file example: scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.06.txt; - a dictionary object example: {..., "WIRE": [["ˈ", "w", "a", "ɪ", "ɚ"], ["ˈ", "w", "a", "ɪ", "ɹ"]], ...} + a dictionary object example: {..., "Wire": [["ˈ", "w", "a", "ɪ", "ɚ"], ["ˈ", "w", "a", "ɪ", "ɹ"]], ...} locale: Locale used to determine default tokenization logic. Supports ["en-US", "de-DE", "es-ES"]. Defaults to "en-US". Specify None if implementing custom logic for a new locale. @@ -292,7 +292,7 @@ def __init__( use_chars: Whether to include chars/graphemes in the token list. This should be set to True if phoneme_probability is used or if apply_to_oov_word ever returns graphemes. phoneme_probability (Optional[float]): The probability (0. 
0, "Not enough data for train, val and test" - - def save(p, data): - with open(p, 'w') as f: - for d in data: - f.write(json.dumps(d) + '\n') - - save(dataset_path / "train_manifest.json", train_entries) - save(dataset_path / "val_manifest.json", validate_entries) - save(dataset_path / "test_manifest.json", test_entries) - - -def main(): - args = get_args() - dataset_root = args.data_root - dataset_root.mkdir(parents=True, exist_ok=True) - __process_data( - dataset_root / EXTRACTED_FOLDER, args.val_size, args.test_size, args.seed_for_ds_split, - ) - - -if __name__ == "__main__": - main() diff --git a/scripts/dataset_processing/tts/openslr_95/ds_conf/ds_for_fastpitch_align.yaml b/scripts/dataset_processing/tts/thorsten_neutral/ds_conf/ds_for_fastpitch_align.yaml similarity index 88% rename from scripts/dataset_processing/tts/openslr_95/ds_conf/ds_for_fastpitch_align.yaml rename to scripts/dataset_processing/tts/thorsten_neutral/ds_conf/ds_for_fastpitch_align.yaml index 5734e5814165..709a212395b9 100644 --- a/scripts/dataset_processing/tts/openslr_95/ds_conf/ds_for_fastpitch_align.yaml +++ b/scripts/dataset_processing/tts/thorsten_neutral/ds_conf/ds_for_fastpitch_align.yaml @@ -21,7 +21,10 @@ dataset: max_duration: null min_duration: 0.1 ignore_file: null - trim: false + trim: true + trim_top_db: 50 + trim_frame_length: ${dataset.win_length} + trim_hop_length: ${dataset.hop_length} pitch_fmin: 65.40639132514966 pitch_fmax: 2093.004522404789 @@ -40,5 +43,4 @@ dataset: _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.GermanCharsTokenizer punct: true apostrophe: true - pad_with_space: true - phonemes: true \ No newline at end of file + pad_with_space: true \ No newline at end of file diff --git a/scripts/dataset_processing/tts/thorsten_neutral/get_data.py b/scripts/dataset_processing/tts/thorsten_neutral/get_data.py new file mode 100644 index 000000000000..9422c0cd5498 --- /dev/null +++ b/scripts/dataset_processing/tts/thorsten_neutral/get_data.py @@ -0,0 +1,263 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +This script is used to generate JSON manifests for mel-generator model training. The usage is below. + +$ python scripts/dataset_processing/tts/thorsten_neutral/get_data.py \ + --data-root ~/experiments/thorsten_neutral \ + --manifests-root ~/experiments/thorsten_neutral \ + --data-version "22_10" \ + --min-duration 0.1 \ + --normalize-text +""" + +import argparse +import json +import random +import shutil +import subprocess +import urllib.request +from pathlib import Path + +from joblib import Parallel, delayed +from nemo_text_processing.text_normalization.normalize import Normalizer +from tqdm import tqdm + +from nemo.utils import logging + +# Thorsten Müller published two neural voice datasets, 21.02 and 22.10. 
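The manifests this script writes are newline-delimited JSON, one utterance per line (see `__process_data` and `__save_json` below). As a minimal reader sketch, assuming the default `train_manifest.json` output name under `--manifests-root` and reusing the example path from the usage string above:

```python
import json
from pathlib import Path

# Illustrative only: inspect a manifest generated by get_data.py.
# Assumes the default output name "train_manifest.json" under --manifests-root.
manifest_path = Path("~/experiments/thorsten_neutral/train_manifest.json").expanduser()

with open(manifest_path, "r", encoding="utf-8") as f:
    for line in f:
        entry = json.loads(line)  # one JSON object per line
        # Keys written by __process_data: "audio_filepath", "duration", "text";
        # the *_text_normed.json variants additionally carry "normalized_text".
        print(entry["audio_filepath"], entry["duration"], entry["text"])
```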
+THORSTEN_NEUTRAL = { + "21_02": { + "url": "https://zenodo.org/record/5525342/files/thorsten-neutral_v03.tgz?download=1", + "dir_name": "thorsten-de_v03", + "metadata": ["metadata.csv"], + }, + "22_10": { + "url": "https://zenodo.org/record/7265581/files/ThorstenVoice-Dataset_2022.10.zip?download=1", + "dir_name": "ThorstenVoice-Dataset_2022.10", + "metadata": ["metadata_train.csv", "metadata_dev.csv", "metadata_test.csv"], + }, +} + + +def get_args(): + parser = argparse.ArgumentParser( + formatter_class=argparse.ArgumentDefaultsHelpFormatter, + description="Download Thorsten Müller's neutral voice dataset and create manifests with predefined split. " + "Thorsten Müller published two neural voice datasets, 21.02 and 22.10, where 22.10 provides better " + "audio quality. Please choose one of the two for your TTS models. Details about the dataset are " + "in https://github.com/thorstenMueller/Thorsten-Voice.", + ) + parser.add_argument("--data-root", required=True, type=Path, help="where the resulting dataset will reside.") + parser.add_argument("--manifests-root", required=True, type=Path, help="where the manifests files will reside.") + parser.add_argument("--data-version", default="22_10", choices=["21_02", "22_10"], type=str) + parser.add_argument("--min-duration", default=0.1, type=float) + parser.add_argument("--max-duration", default=float('inf'), type=float) + parser.add_argument("--val-size", default=100, type=int) + parser.add_argument("--test-size", default=100, type=int) + parser.add_argument( + "--num-workers", + default=-1, + type=int, + help="Specify the max number of concurrent Python worker processes. " + "If -1 all CPUs are used. If 1 no parallel computing is used.", + ) + parser.add_argument( + "--normalize-text", + default=False, + action='store_true', + help="Normalize original text and add a new entry 'normalized_text' to .json file if True.", + ) + parser.add_argument( + "--seed-for-ds-split", + default=100, + type=float, + help="Seed for deterministic split of train/dev/test, NVIDIA's default is 100.", + ) + args = parser.parse_args() + return args + + +def __maybe_download_file(source_url, destination_path): + if not destination_path.exists(): + logging.info(f"Downloading data: {source_url} --> {destination_path}") + tmp_file_path = destination_path.with_suffix(".tmp") + urllib.request.urlretrieve(source_url, filename=tmp_file_path) + tmp_file_path.rename(destination_path) + else: + logging.info(f"Skipped downloading data because it exists: {destination_path}") + + +def __extract_file(filepath, data_dir): + logging.info(f"Unzipping data: {filepath} --> {data_dir}") + shutil.unpack_archive(filepath, data_dir) + logging.info(f"Unzipping data is complete: {filepath}.") + + +def __save_json(json_file, dict_list): + logging.info(f"Saving JSON split to {json_file}.") + with open(json_file, "w") as f: + for d in dict_list: + f.write(json.dumps(d) + "\n") + + +def __text_normalization(json_file, num_workers=-1): + text_normalizer_call_kwargs = { + "punct_pre_process": True, + "punct_post_process": True, + } + text_normalizer = Normalizer( + lang="de", input_case="cased", overwrite_cache=True, cache_dir=str(json_file.parent / "cache_dir"), + ) + + def normalizer_call(x): + return text_normalizer.normalize(x, **text_normalizer_call_kwargs) + + def add_normalized_text(line_dict): + normalized_text = normalizer_call(line_dict["text"]) + line_dict.update({"normalized_text": normalized_text}) + return line_dict + + logging.info(f"Normalizing text for {json_file}.") + with 
open(json_file, 'r', encoding='utf-8') as fjson: + lines = fjson.readlines() + # Note: you need to verify which backend works well on your cluster. + # backend="loky" is fine on multi-core Ubuntu OS; backend="threading" on Slurm. + dict_list = Parallel(n_jobs=num_workers)( + delayed(add_normalized_text)(json.loads(line)) for line in tqdm(lines) + ) + + json_file_text_normed = json_file.parent / f"{json_file.stem}_text_normed{json_file.suffix}" + with open(json_file_text_normed, 'w', encoding="utf-8") as fjson_norm: + for dct in dict_list: + fjson_norm.write(json.dumps(dct) + "\n") + logging.info(f"Normalizing text is complete: {json_file} --> {json_file_text_normed}") + + +def __process_data( + unzipped_dataset_path, metadata, min_duration, max_duration, val_size, test_size, seed_for_ds_split +): + logging.info("Preparing JSON train/val/test splits.") + + entries = list() + not_found_wavs = list() + wrong_duration_wavs = list() + + for metadata_fname in metadata: + meta_file = unzipped_dataset_path / metadata_fname + with open(meta_file, 'r') as fmeta: + for line in tqdm(fmeta): + items = line.strip().split('|') + wav_file_stem, text = items[0], items[1] + wav_file = unzipped_dataset_path / "wavs" / f"{wav_file_stem}.wav" + + # skip audios if they do not exist. + if not wav_file.exists(): + not_found_wavs.append(wav_file) + logging.warning(f"Skipping {wav_file}: it is not found.") + continue + + # skip audios if their duration is out of range. + duration = subprocess.check_output(f"soxi -D {wav_file}", shell=True) + duration = float(duration) + if min_duration <= duration <= max_duration: + entry = { + 'audio_filepath': str(wav_file), + 'duration': duration, + 'text': text, + } + entries.append(entry) + elif duration < min_duration: + wrong_duration_wavs.append(wav_file) + logging.warning(f"Skipping {wav_file}: it is too short, less than {min_duration} seconds.") + continue + else: + wrong_duration_wavs.append(wav_file) + logging.warning(f"Skipping {wav_file}: it is too long, greater than {max_duration} seconds.") + continue + + random.Random(seed_for_ds_split).shuffle(entries) + train_size = len(entries) - val_size - test_size + if train_size <= 0: + raise ValueError("Not enough data for the train split.") + + logging.info("Preparing JSON train/val/test splits is complete.") + train, val, test = ( + entries[:train_size], + entries[train_size : train_size + val_size], + entries[train_size + val_size :], + ) + + return train, val, test, not_found_wavs, wrong_duration_wavs + + +def main(): + args = get_args() + data_root = args.data_root + manifests_root = args.manifests_root + data_version = args.data_version + + dataset_root = data_root / f"ThorstenVoice-Dataset-{data_version}" + dataset_root.mkdir(parents=True, exist_ok=True) + + # download and extract dataset + dataset_url = THORSTEN_NEUTRAL[data_version]["url"] + zipped_dataset_path = dataset_root / Path(dataset_url).name.split("?")[0] + __maybe_download_file(dataset_url, zipped_dataset_path) + __extract_file(zipped_dataset_path, dataset_root) + + # generate train/dev/test splits + unzipped_dataset_path = dataset_root / THORSTEN_NEUTRAL[data_version]["dir_name"] + entries_train, entries_val, entries_test, not_found_wavs, wrong_duration_wavs = __process_data( + unzipped_dataset_path=unzipped_dataset_path, + metadata=THORSTEN_NEUTRAL[data_version]["metadata"], + min_duration=args.min_duration, + max_duration=args.max_duration, + val_size=args.val_size, + test_size=args.test_size, + seed_for_ds_split=args.seed_for_ds_split, + ) + + # save 
json splits. + train_json = manifests_root / "train_manifest.json" + val_json = manifests_root / "val_manifest.json" + test_json = manifests_root / "test_manifest.json" + __save_json(train_json, entries_train) + __save_json(val_json, entries_val) + __save_json(test_json, entries_test) + + # save skipped audios that are not found into a file. + if len(not_found_wavs) > 0: + skipped_not_found_file = manifests_root / "skipped_not_found_wavs.list" + with open(skipped_not_found_file, "w") as f_notfound: + for line in not_found_wavs: + f_notfound.write(f"{line}\n") + + # save skipped audios that are too short or too long into a file. + if len(wrong_duration_wavs) > 0: + skipped_wrong_duration_file = manifests_root / "skipped_wrong_duration_wavs.list" + with open(skipped_wrong_duration_file, "w") as f_wrong_dur: + for line in wrong_duration_wavs: + f_wrong_dur.write(f"{line}\n") + + # normalize text if requested. New json file, train_manifest_text_normed.json, will be generated. + if args.normalize_text: + __text_normalization(train_json, args.num_workers) + __text_normalization(val_json, args.num_workers) + __text_normalization(test_json, args.num_workers) + + +if __name__ == "__main__": + main() diff --git a/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py b/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py index 6077c78c6523..f81726d44b06 100644 --- a/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py +++ b/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py @@ -81,7 +81,7 @@ def test_english_chars_tokenizer_accented_character(self): @pytest.mark.unit def test_german_chars_tokenizer(self): input_text = "Was ist dein Lieblingsgetränk?" - expected_output = "was ist dein lieblingsgetränk?" + expected_output = "Was ist dein Lieblingsgetränk?" tokenizer = GermanCharsTokenizer() chars, tokens = self._parse_text(tokenizer, input_text) From 0239a5787d73756a98c7f6360a18c70e261f4628 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 04:22:22 -0800 Subject: [PATCH 118/154] Refactor so token, word and additonal segment-level alignments are generated in the same run Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 10 +- tools/nemo_forced_aligner/align.py | 102 ++++---- tools/nemo_forced_aligner/utils/data_prep.py | 242 ++++++++++++++---- tools/nemo_forced_aligner/utils/make_ctm.py | 249 +++++-------------- 4 files changed, 300 insertions(+), 303 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index b31e9ecf0d44..694a9ffebf79 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -27,7 +27,7 @@ Call the `align.py` script, specifying the parameters as follows: * `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. -* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. +* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. 
There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. * **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). @@ -35,12 +35,10 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). -* **[OPTIONAL]** `ctm_grouping_separator`: the string used to separate CTM segments. If the separator is `“”` (empty string) or `None`, the CTM segments will be the tokens used by the ASR model. If the separator is anything else, e.g. `“ “`, `“|”` or `“”`, the segments will be the blocks of text separated by that separator. (Default: `“ “`, so for languages such as English, the CTM segments will be words.) -> Note: if the separator is not `“”` or `“ “`, it will be removed from the ground truth text and the CTM, ie it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be amalgamated into one, i.e. the following texts will be treated equivalently: `“abc|def”`, `“abc |def”`, `“abc| def”`, `“abc | def"`. +* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. NFA will always produce token-level and word-level CTM files in: `/tokens/.ctm` and `/words/.ctm`. If `additional_ctm_grouping_separator` is specified, an additional folder `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be empty string or space (" "), as space, as space-separated word-level CTMs will always be saved in `/words/.ctm`.) +> Note: the `additional_ctm_grouping_separator` will be removed from the ground truth text and all the output CTMs, ie it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be amalgamated into one, i.e. if `additional_ctm_grouping_separator="|"`, the following texts will be treated equivalently: `“abc|def”`, `“abc |def”`, `“abc| def”`, `“abc | def"`. -> Note: if you pass in a hydra override `ctm_grouping_separator=" "`, hydra will remove the whitespace, thus converting that `" "` to `""`. If you want to pass in a space, make sure to use `ctm_grouping_separator="\ "`, or just do not pass in this override, as the default value is `" "` anyway. - -* **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from output CTMs. (Default: False). Note: "" tokens can only be present if your CTMs are token-level. Therefore, NFA will throw an error if you set `remove_blank_tokens_from_ctm` to `True` if the `ctm_grouping_separator` is not `""` or `None`. +* **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from token-level output CTMs. (Default: False). 
* **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 1e793ffc40c1..d4d0373015e1 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -12,13 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. +import os from dataclasses import dataclass, is_dataclass from typing import Optional import torch from omegaconf import OmegaConf from utils.data_prep import get_audio_sr, get_log_probs_y_T_U, get_manifest_lines -from utils.make_ctm import make_segment_ctm, make_token_ctm +from utils.make_ctm import make_ctm from utils.viterbi_decoding import viterbi_decoding from nemo.collections.asr.models.ctc_models import EncDecCTCModel @@ -52,21 +53,13 @@ viterbi_device: string specifying the device that will be used for doing Viterbi decoding. The string needs to be in a format recognized by torch.device(). batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. - ctm_grouping_separator: the string used to separate CTM segments. - If the separator is “” or None, each line of the output CTM will be the tokens used by the ASR model. - If the separator is anything else, e.g. “ “, “|” or “”, the segments will be the blocks of - text separated by that separator. - Default: “ “, ie for languages such as English, the CTM segments will be words. - Note: if the separator is not “” or “ “, it will be removed from the CTM, ie it is treated as a marker - which is not part of the ground truth. It will essentially be treated as a space, and any additional spaces - around it will be amalgamated into one, i.e. the following texts will be treated as equivalent: - “abc|def” - “abc |def” - “abc| def” - “abc | def” - remove_blank_tokens_from_ctm: a boolean denoting whether to remove tokens from output CTMs. - Note: "" tokens can only be present if your CTMs are token-level. Therefore, NFA will throw an error - if you set `remove_blank_tokens_from_ctm` to `True` if the `ctm_grouping_separator` is not `""` or `None`. + additional_ctm_grouping_separator: the string used to separate CTM segments if you want to obtain CTM files at a + level that is not the token level or the word level. NFA will always produce token-level and word-level CTM + files in: `/tokens/.ctm` and `/words/.ctm`. + If `additional_ctm_grouping_separator` is specified, an additional folder + `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs + for `addtional_ctm_grouping_separator`-separated segments. + remove_blank_tokens_from_ctm: a boolean denoting whether to remove tokens from token-level output CTMs. n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. 
Note also that any spaces that are present in the audio_filepath @@ -95,7 +88,7 @@ class AlignmentConfig: transcribe_device: str = "cpu" viterbi_device: str = "cpu" batch_size: int = 1 - ctm_grouping_separator: Optional[str] = " " + additional_ctm_grouping_separator: Optional[str] = None remove_blank_tokens_from_ctm: bool = False minimum_timestamp_duration: float = 0 n_parts_for_ctm_id: int = 1 @@ -125,31 +118,12 @@ def main(cfg: AlignmentConfig): if cfg.output_ctm_folder is None: raise ValueError("cfg.output_ctm_folder must be specified") - if (cfg.ctm_grouping_separator != "" or cfg.ctm_grouping_separator is None) and cfg.remove_blank_tokens_from_ctm: - raise ValueError( - "cfg.remove_blank_tokens_from_ctm can only be set to True if producing token-level CTMs, " - " (i.e. if cfg.ctm_grouping_separator is empty string or None" - ) + if cfg.additional_ctm_grouping_separator == "" or cfg.additional_ctm_grouping_separator == " ": + raise ValueError("cfg.additional_grouping_separaor cannot be empty string or space character") if cfg.minimum_timestamp_duration < 0: raise ValueError("cfg.minimum_timestamp_duration cannot be a negative number") - # Log info about selected params - if cfg.ctm_grouping_separator == "" or cfg.ctm_grouping_separator is None: - logging.info( - f"ctm_grouping_separator is {cfg.ctm_grouping_separator} => " "each line of the output CTM will be a token" - ) - elif cfg.ctm_grouping_separator == " ": - logging.info( - f"ctm_grouping_separator is {cfg.ctm_grouping_separator} => " - "each line of the output CTM will be a space-separated word" - ) - else: - logging.info( - f"ctm_grouping_separator is {cfg.ctm_grouping_separator} => " - f"each line of the output CTM will be the text that is inbetween {cfg.ctm_grouping_separator}" - ) - # init devices transcribe_device = torch.device(cfg.transcribe_device) viterbi_device = torch.device(cfg.viterbi_device) @@ -183,30 +157,46 @@ def main(cfg: AlignmentConfig): for start, end in zip(starts, ends): data = get_manifest_lines(cfg.manifest_filepath, start, end) - log_probs, y, T, U = get_log_probs_y_T_U(data, model, cfg.ctm_grouping_separator) + log_probs, y, T, U, token_info, word_info, segment_info = get_log_probs_y_T_U( + data, model, cfg.additional_ctm_grouping_separator + ) alignments = viterbi_decoding(log_probs, y, T, U, viterbi_device) - if cfg.ctm_grouping_separator == "" or cfg.ctm_grouping_separator is None: - make_token_ctm( - data, - alignments, - model, - cfg.model_downsample_factor, - cfg.output_ctm_folder, - cfg.remove_blank_tokens_from_ctm, - cfg.minimum_timestamp_duration, - cfg.n_parts_for_ctm_id, - audio_sr, - ) + make_ctm( + token_info, + alignments, + data, + model, + cfg.model_downsample_factor, + os.path.join(cfg.output_ctm_folder, "tokens"), + cfg.remove_blank_tokens_from_ctm, + cfg.n_parts_for_ctm_id, + cfg.minimum_timestamp_duration, + audio_sr, + ) - else: - make_segment_ctm( - data, + make_ctm( + word_info, + alignments, + data, + model, + cfg.model_downsample_factor, + os.path.join(cfg.output_ctm_folder, "words"), + False, # dont try to remove blank tokens because we dont expect them to be there + cfg.n_parts_for_ctm_id, + cfg.minimum_timestamp_duration, + audio_sr, + ) + + if cfg.additional_ctm_grouping_separator: + make_ctm( + segment_info, alignments, + data, model, cfg.model_downsample_factor, - cfg.output_ctm_folder, - cfg.ctm_grouping_separator, + os.path.join(cfg.output_ctm_folder, "additional_segments"), + False, # dont try to remove blank tokens because we dont expect them to be there 
cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, audio_sr, diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 9a0bd425e1cb..824a525f6310 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -52,54 +52,190 @@ def get_manifest_lines(manifest_filepath, start, end): return data -def get_processed_text_and_segments(text, separator): - """ - If the separator is not empty string, ie CTM segments will not just be tokens, - this function replaces the separator with a space, amalgamates any spaces around - it into one, and returns the new processed text as well as a list of strings - containing the segments to be returned in the CTM. - """ - if separator == "": - return text, None - - # get rid of any whitespace around text - text = text.strip() - - segments = text.split(separator) - - # remove any spaces at start and end of segments - segments = [seg.strip() for seg in segments] - - processed_text = " ".join(segments) - - return processed_text, segments - - -def tokenize_text(model, text): - """ Works for token-based and character-based models """ - if hasattr(model, "tokenizer"): - return model.tokenizer.text_to_ids(text) +def get_char_tokens(text, model): + tokens = [] + for character in text: + if character in model.decoder.vocabulary: + tokens.append(model.decoder.vocabulary.index(character)) + else: + tokens.append(len(model.decoder.vocabulary)) # return unk token (same as blank token) + + return tokens + + +def get_y_token_word_segment_info(text, model, separator): + + if hasattr(model, 'tokenizer'): + + if not separator: # if separator is not defined - treat the whole text as one segment + segments = [text] + else: + segments = text.split(separator) + + # remove any spaces at start and end of segments + segments = [seg.strip() for seg in segments] + + BLANK_ID = len(model.decoder.vocabulary) # TODO: check + BLANK_TOKEN = "" + y_token_ids = [] + y_token_ids_with_blanks = [BLANK_ID] + y_tokens = [] + y_tokens_with_blanks = [BLANK_TOKEN] + token_info = [{"text": BLANK_TOKEN, "s_start": 0, "s_end": 0,}] + word_info = [] + segment_info = [] + + segment_s_pointer = 1 + word_s_pointer = 1 + + for segment in segments: + words = segment.split(" ") + for word in words: + + word_tokens = model.tokenizer.text_to_tokens(word) + word_ids = model.tokenizer.text_to_ids(word) + for token, id_ in zip(word_tokens, word_ids): + y_token_ids.append(id_) + y_token_ids_with_blanks.extend([id_, BLANK_ID]) + y_tokens.append(token) + y_tokens_with_blanks.extend([token, BLANK_TOKEN]) + + token_info.append( + { + "text": token, + "s_start": len(y_tokens_with_blanks) - 2, + "s_end": len(y_tokens_with_blanks) - 2, + } + ) + token_info.append( + { + "text": BLANK_TOKEN, + "s_start": len(y_tokens_with_blanks) - 1, + "s_end": len(y_tokens_with_blanks) - 1, + } + ) + + word_info.append( + { + "text": word, + "s_start": word_s_pointer, + "s_end": word_s_pointer + (len(word_tokens) - 1) * 2, # TODO check this, + } + ) + word_s_pointer += len(word_tokens) * 2 # TODO check this + + segment_tokens = model.tokenizer.text_to_tokens(segment) + segment_info.append( + { + "text": segment, + "s_start": segment_s_pointer, + "s_end": segment_s_pointer + (len(segment_tokens) - 1) * 2, + } + ) + segment_s_pointer += len(segment_tokens) * 2 + + return y_token_ids_with_blanks, token_info, word_info, segment_info elif hasattr(model.decoder, "vocabulary"): - # i.e. 
assume it is character-based model (in theory could be TalkNet - that will not work) - tokens = [] - for character in text: - if character in model.decoder.vocabulary: - tokens.append(model.decoder.vocabulary.index(character)) - else: - tokens.append(len(model.decoder.vocabulary)) # return unk token (same as blank token) - return tokens + segments = text.split(separator) + + # remove any spaces at start and end of segments + segments = [seg.strip() for seg in segments] + + BLANK_ID = len(model.decoder.vocabulary) # TODO: check this is correct + BLANK_TOKEN = "" + SPACE_ID = model.decoder.vocabulary.index(" ") + SPACE_TOKEN = "" + y_token_ids = [] + y_token_ids_with_blanks = [BLANK_ID] + y_tokens = [] + y_tokens_with_blanks = [BLANK_TOKEN] + token_info = [{"text": BLANK_TOKEN, "s_start": 0, "s_end": 0,}] + word_info = [] + segment_info = [] + + segment_s_pointer = 1 + word_s_pointer = 1 + + for i_segment, segment in enumerate(segments): + words = segment.split(" ") + for i_word, word in enumerate(words): + + word_tokens = list(word) + word_ids = get_char_tokens(word, model) + for token, id_ in zip(word_tokens, word_ids): + y_token_ids.append(id_) + y_token_ids_with_blanks.extend([id_, BLANK_ID]) + y_tokens.append(token) + y_tokens_with_blanks.extend([token, BLANK_TOKEN]) + + token_info.append( + { + "text": token, + "s_start": len(y_tokens_with_blanks) - 2, + "s_end": len(y_tokens_with_blanks) - 2, + } + ) + token_info.append( + { + "text": BLANK_TOKEN, + "s_start": len(y_tokens_with_blanks) - 1, + "s_end": len(y_tokens_with_blanks) - 1, + } + ) + + # add space token unless this is the final word in the final segment + if not (i_segment == len(segments) - 1 and i_word == len(words) - 1): + y_token_ids.append(SPACE_ID) + y_token_ids_with_blanks.extend([SPACE_ID, BLANK_ID]) + y_tokens.append(SPACE_TOKEN) + y_tokens_with_blanks.extend([SPACE_TOKEN, BLANK_TOKEN]) + token_info.append( + { + "text": SPACE_TOKEN, + "s_start": len(y_tokens_with_blanks) - 2, + "s_end": len(y_tokens_with_blanks) - 2, + } + ) + token_info.append( + { + "text": BLANK_TOKEN, + "s_start": len(y_tokens_with_blanks) - 1, + "s_end": len(y_tokens_with_blanks) - 1, + } + ) + word_info.append( + { + "text": word, + "s_start": word_s_pointer, + "s_end": word_s_pointer + len(word_tokens) * 2 - 2, # TODO check this, + } + ) + word_s_pointer += len(word_tokens) * 2 + 2 # TODO check this + + segment_tokens = get_char_tokens(segment, model) + segment_info.append( + { + "text": segment, + "s_start": segment_s_pointer, + "s_end": segment_s_pointer + (len(segment_tokens) - 1) * 2, + } + ) + segment_s_pointer += len(segment_tokens) * 2 + 2 + + return y_token_ids_with_blanks, token_info, word_info, segment_info else: - raise RuntimeError("Can't tokenize this model atm") + raise RuntimeError("Cannot get tokens of this model.") def get_log_probs_y_T_U(data, model, separator): """ Preparing some tensors to be used during Viterbi decoding. Returns: - log_probs, y, T, U_dash (ie. U with every other token being a blank) + log_probs, y, T, U (y and U are s.t. 
every other token is a blank), + token_info_list, word_info_list, segment_info_list """ audio_filepaths = [line["audio_filepath"] for line in data] @@ -116,17 +252,24 @@ def get_log_probs_y_T_U(data, model, separator): y_list = [] U_list = [] + token_info_list = [] + word_info_list = [] + segment_info_list = [] + for line in data: - processed_text, _ = get_processed_text_and_segments(line['text'], separator) - y_line = tokenize_text(model, processed_text) + y_line, token_info, word_info, segment_info = get_y_token_word_segment_info(line["text"], model, separator) + y_list.append(y_line) U_list.append(len(y_line)) + token_info_list.append(token_info) + word_info_list.append(word_info) + segment_info_list.append(segment_info) T_max = max(T_list) U_max = max(U_list) - blank_index = len(model.decoder.vocabulary) V = len(model.decoder.vocabulary) + 1 T = torch.tensor(T_list) + U = torch.tensor(U_list) # make log_probs tensor of shape (B x T_max x V) log_probs = V_NEG_NUM * torch.ones((B, T_max, V)) @@ -134,19 +277,10 @@ def get_log_probs_y_T_U(data, model, separator): t = log_probs_b.shape[0] log_probs[b, :t, :] = log_probs_b - # make y tensor of shape (B x U_dash_max) - U_dash_max = 2 * U_max + 1 - y = V * torch.ones((B, U_dash_max), dtype=torch.int64) - U_dash_list = [] + # make y tensor of shape (B x U_max) + y = V * torch.ones((B, U_max), dtype=torch.int64) for b, y_b in enumerate(y_list): - y_dash_b = [blank_index] - for y_b_element in y_b: - y_dash_b.append(y_b_element) - y_dash_b.append(blank_index) - U_dash = len(y_dash_b) - y[b, :U_dash] = torch.tensor(y_dash_b) - U_dash_list.append(U_dash) - - U_dash = torch.tensor(U_dash_list) + U_b = U[b] + y[b, :U_b] = torch.tensor(y_b) - return log_probs, y, T, U_dash + return log_probs, y, T, U, token_info_list, word_info_list, segment_info_list diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index db2b1c7c691e..4f33ed4f75d1 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -17,8 +17,6 @@ import soundfile as sf -from .data_prep import get_processed_text_and_segments, tokenize_text - def _get_utt_id(audio_filepath, n_parts_for_ctm_id): fp_parts = Path(audio_filepath).parts[-n_parts_for_ctm_id:] @@ -27,68 +25,77 @@ def _get_utt_id(audio_filepath, n_parts_for_ctm_id): return utt_id -def make_token_ctm( - data, - alignments, +def add_t_start_end_to_boundary_info(boundary_info, alignment): + # first remove boundary_info of any items that are not in the alignment + # the only items we expect not to be in the alignment are blanks which the alignment chooses to skip + # we will iterate boundary_info in reverse order for this to make popping the items simple + s_in_alignment = set(alignment) + for boundary_info_pointer in range(len(boundary_info) - 1, -1, -1): + s_in_boundary_info = set( + range(boundary_info[boundary_info_pointer]["s_start"], boundary_info[boundary_info_pointer]["s_end"] + 1) + ) + item_not_in_alignment = True + for s_ in s_in_boundary_info: + if s_ in s_in_alignment: + item_not_in_alignment = False + + if item_not_in_alignment: + boundary_info.pop(boundary_info_pointer) + + # now updated boundary_info with s_start and s_end + boundary_info_pointer = 0 + for t, s_at_t in enumerate(alignment): + if s_at_t == boundary_info[boundary_info_pointer]["s_start"]: + if "t_start" not in boundary_info[boundary_info_pointer]: + boundary_info[boundary_info_pointer]["t_start"] = t + + if t < len(alignment) - 1: + if alignment[t + 1] > 
boundary_info[boundary_info_pointer]["s_end"]: + if "t_end" not in boundary_info[boundary_info_pointer]: + boundary_info[boundary_info_pointer]["t_end"] = t + + boundary_info_pointer += 1 + else: # i.e. t == len(alignment) - 1, i.e. we are a the final element in alignment + # add final t_end if we haven't already + if "t_end" not in boundary_info[boundary_info_pointer]: + boundary_info[boundary_info_pointer]["t_end"] = t + + if boundary_info_pointer == len(boundary_info): + # we have finished populating boundary_info with t_start and t_end, + # but we might have some final remaining elements (blanks) in the alignment which we dont care about + # => break, so as not to cause issues trying to access boundary_info[boundary_info_pointer] + break + + return boundary_info + + +def make_ctm( + boundary_info_list, + alignment_list, + data_list, model, model_downsample_factor, output_ctm_folder, remove_blank_tokens_from_ctm, - minimum_timestamp_duration, n_parts_for_ctm_id, + minimum_timestamp_duration, audio_sr, ): """ - Note: assume order of utts in data matches order of utts in alignments + Note: assume same order of utts in boundary_info, alignments and data """ - assert len(data) == len(alignments) + assert len(boundary_info_list) == len(alignment_list) == len(data_list) + + BLANK_TOKEN = "" # TODO: move outside of function os.makedirs(output_ctm_folder, exist_ok=True) - blank_index = len(model.decoder.vocabulary) spectrogram_hop_length = model.preprocessor.featurizer.hop_length timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor - for manifest_line, alignment in zip(data, alignments): - - # alignments currently list into loc in sentence. - # we want to get token_ids for utterance - token_ids = tokenize_text(model, manifest_line["text"]) - u_list = [blank_index] - for token_id in token_ids: - u_list.extend([token_id, blank_index]) - - # make 'tokens_info' - a list of dictionaries: - # e.g. [ - # {"token": "", "u": 0, "t_start": 0, "t_end", 1 }, - # {"token": "_sh", "u": 0, "t_start": 2, "t_end", 4 }, - # ... 
- # ] - tokens_info = [] - for t, u in enumerate(alignment): - alignment_token_in_vocab = u_list[u] - if alignment_token_in_vocab == blank_index: - alignment_token_as_string = "" - else: - alignment_token_as_string = model.decoder.vocabulary[alignment_token_in_vocab] - if alignment_token_as_string == " ": - alignment_token_as_string = "" - - if t == 0: - tokens_info.append( - {"token": alignment_token_as_string, "u": u, "t_start": t,} - ) - - else: - if u == tokens_info[-1]["u"]: - pass - else: - tokens_info[-1]["t_end"] = t - 1 - tokens_info.append( - {"token": alignment_token_as_string, "u": u, "t_start": t,} - ) - - tokens_info[-1]["t_end"] = len(alignment) - 1 + for boundary_info, alignment, manifest_line in zip(boundary_info_list, alignment_list, data_list): + + boundary_info = add_t_start_end_to_boundary_info(boundary_info, alignment) # get utt_id that will be used for saving CTM file as .ctm utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) @@ -99,10 +106,10 @@ def make_token_ctm( audio_file_duration = f.frames / f.samplerate with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: - for token_info in tokens_info: - token = token_info["token"] - start_sample = token_info["t_start"] * timestep_to_sample_ratio - end_sample = (token_info["t_end"] + 1) * timestep_to_sample_ratio - 1 + for segment_info in boundary_info: + text = segment_info["text"] + start_sample = segment_info["t_start"] * timestep_to_sample_ratio + end_sample = (segment_info["t_end"] + 1) * timestep_to_sample_ratio - 1 start_time = start_sample / audio_sr end_time = end_sample / audio_sr @@ -112,139 +119,7 @@ def make_token_ctm( start_time = max(token_mid_point - minimum_timestamp_duration / 2, 0) end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) - if not (remove_blank_tokens_from_ctm and token == ""): - f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {token}\n") - - return None - - -def make_segment_ctm( - data, - alignments, - model, - model_downsample_factor, - output_ctm_folder, - separator, - n_parts_for_ctm_id, - minimum_timestamp_duration, - audio_sr, -): - """ - Note: assume order of utts in data matches order of utts in alignments - """ - assert len(data) == len(alignments) - - os.makedirs(output_ctm_folder, exist_ok=True) - - spectrogram_hop_length = model.preprocessor.featurizer.hop_length - timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor - - for manifest_line, alignment in zip(data, alignments): - # make 'segments_info' - a list of dictionaries: - # e.g. [ - # {"segment": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end", 0 }, - # {"segment": "she", "u_start": 1, "u_end": 5, "t_start": 1, "t_end", 5 }, - # {"segment": "had", "u_start": 7, "u_end": 10, "t_start": 6, "t_end", 8 } - # ... 
- # ] - segments_info = [{"segment": "", "u_start": 0, "u_end": 0, "t_start": 0, "t_end": None}] - u_counter = 1 - _, segments = get_processed_text_and_segments(manifest_line['text'], separator) - if separator not in model.decoder.vocabulary: - for segment in segments: - segment_info = { - "segment": segment.replace(" ", ""), - "u_start": u_counter, - "u_end": None, - "t_start": None, - "t_end": None, - } - segment_tokens = tokenize_text(model, segment) - segment_info["u_end"] = segment_info["u_start"] + 2 * (len(segment_tokens) - 1) - segments_info.append(segment_info) - - u_counter += 2 * len(segment_tokens) - - if " " in model.decoder.vocabulary: - u_counter += 2 - - else: - for segment_i, segment in enumerate(segments): - segment_info = { - "segment": segment.replace(" ", ""), - "u_start": u_counter, - "u_end": None, - "t_start": None, - "t_end": None, - } - segment_tokens = tokenize_text(model, segment) - - segment_info["u_end"] = segment_info["u_start"] + 2 * (len(segment_tokens) - 1) - u_counter += 2 * len(segment_tokens) - - segments_info.append(segment_info) - - if segment_i < len(manifest_line["text"].split(separator)) - 1: - # add the space after every segment except the final segment - segment_info = { - "segment": separator.replace(" ", ""), - "u_start": u_counter, - "u_end": None, - "t_start": None, - "t_end": None, - } - segment_tokens = tokenize_text(model, " ") - - segment_info["u_end"] = segment_info["u_start"] + 2 * len(segment_tokens) - 1 - u_counter += 2 * len(segment_tokens) - - segments_info.append(segment_info) - - segments_info_pointer = 0 - for t, u_at_t in enumerate(alignment): - if segments_info_pointer < len(segments_info): - if u_at_t == segments_info[segments_info_pointer]["u_start"]: - if segments_info[segments_info_pointer]["t_start"] is None: - segments_info[segments_info_pointer]["t_start"] = t - - if u_at_t > segments_info[segments_info_pointer]["u_end"]: - if segments_info[segments_info_pointer]["t_end"] is None: - segments_info[segments_info_pointer]["t_end"] = t - 1 - - segments_info_pointer += 1 - - if segments_info_pointer < len(segments_info): - if u_at_t == segments_info[segments_info_pointer]["u_start"]: - if segments_info[segments_info_pointer]["t_start"] is None: - segments_info[segments_info_pointer]["t_start"] = t - - # don't forget the final t_end as our for-loop might miss it - if alignment[-1] == segments_info[-1]["u_end"]: - segments_info[-1]["t_end"] = len(alignment) - 1 - - # get utt_id that will be used for saving CTM file as .ctm - utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) - - # get audio file duration if we will need it later - if minimum_timestamp_duration > 0: - with sf.SoundFile(manifest_line["audio_filepath"]) as f: - audio_file_duration = f.frames / f.samplerate - - with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: - for segment_info in segments_info: - if not segment_info["segment"] == "": - segment = segment_info["segment"] - start_sample = segment_info["t_start"] * timestep_to_sample_ratio - end_sample = (segment_info["t_end"] + 1) * timestep_to_sample_ratio - 1 - - start_time = start_sample / audio_sr - end_time = end_sample / audio_sr - - if minimum_timestamp_duration > 0 and minimum_timestamp_duration > end_time - start_time: - token_mid_point = (start_time + end_time) / 2 - start_time = max(token_mid_point - minimum_timestamp_duration / 2, 0) - end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) - - f_ctm.write(f"{utt_id} 1 
{start_time:.5f} {end_time - start_time:.5f} {segment}\n") + if not (text == BLANK_TOKEN and remove_blank_tokens_from_ctm): + f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {text}\n") return None From 0236daae5e0ccae3de10d6ba438484f7f9362bbe Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 07:33:24 -0800 Subject: [PATCH 119/154] change CTM rounding to remove unnecessary decimal figures Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/make_ctm.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_ctm.py index 4f33ed4f75d1..b14d79d1b5da 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_ctm.py @@ -120,6 +120,6 @@ def make_ctm( end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) if not (text == BLANK_TOKEN and remove_blank_tokens_from_ctm): - f_ctm.write(f"{utt_id} 1 {start_time:.5f} {end_time - start_time:.5f} {text}\n") + f_ctm.write(f"{utt_id} 1 {start_time:.2f} {end_time - start_time:.2f} {text}\n") return None From 31edb15255accc71f710eae98fbbdb7be9ecd795 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 07:45:04 -0800 Subject: [PATCH 120/154] Move obtaining start and end of batch line IDs to separate util function Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 12 +++--------- tools/nemo_forced_aligner/utils/data_prep.py | 16 ++++++++++++++++ 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index d4d0373015e1..6497954ff2c8 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -18,7 +18,7 @@ import torch from omegaconf import OmegaConf -from utils.data_prep import get_audio_sr, get_log_probs_y_T_U, get_manifest_lines +from utils.data_prep import get_audio_sr, get_batch_starts_ends, get_log_probs_y_T_U, get_manifest_lines from utils.make_ctm import make_ctm from utils.viterbi_decoding import viterbi_decoding @@ -144,14 +144,8 @@ def main(cfg: AlignmentConfig): "timestamps in output CTM." ) - # define start and end line IDs of batches - with open(cfg.manifest_filepath, 'r') as f: - num_lines_in_manifest = sum(1 for _ in f) - - starts = [x for x in range(0, num_lines_in_manifest, cfg.batch_size)] - ends = [x - 1 for x in starts] - ends.pop(0) - ends.append(num_lines_in_manifest) + # get start and end line IDs of batches + starts, ends = get_batch_starts_ends(cfg.manifest_filepath, cfg.batch_size) # get alignment and save in CTM batch-by-batch for start, end in zip(starts, ends): diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 824a525f6310..8effa1891bca 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -21,6 +21,22 @@ V_NEG_NUM = -1e30 +def get_batch_starts_ends(manifest_filepath, batch_size): + """ + Get the start and end ids of the lines we will use for each 'batch'. 
+ """ + + with open(manifest_filepath, 'r') as f: + num_lines_in_manifest = sum(1 for _ in f) + + starts = [x for x in range(0, num_lines_in_manifest, batch_size)] + ends = [x - 1 for x in starts] + ends.pop(0) + ends.append(num_lines_in_manifest) + + return starts, ends + + def get_audio_sr(manifest_filepath): """ Measure the sampling rate of the audio file in the first line From 6258cc3f5b3a7ad3de49c261ba5f2121683f2f08 Mon Sep 17 00:00:00 2001 From: milesial Date: Tue, 10 Jan 2023 19:46:35 +0100 Subject: [PATCH 121/154] Sanitize params before DLLogger log_hyperparams (#5736) * Sanitize params before DLLogger log_hyperparams Signed-off-by: Alexandre Milesi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alexandre Milesi Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva --- nemo/utils/dllogger.py | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/nemo/utils/dllogger.py b/nemo/utils/dllogger.py index a9b7970a283d..5b5e674585d3 100644 --- a/nemo/utils/dllogger.py +++ b/nemo/utils/dllogger.py @@ -12,9 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. +from argparse import Namespace from pathlib import Path +from lightning_utilities.core.apply_func import apply_to_collection +from omegaconf import DictConfig, ListConfig, OmegaConf from pytorch_lightning.loggers import Logger +from pytorch_lightning.utilities.logger import _convert_params, _flatten_dict, _sanitize_callable_params +from pytorch_lightning.utilities.parsing import AttributeDict from nemo.utils import logging @@ -57,6 +62,13 @@ def __init__(self, stdout: bool, verbose: bool, json_file: str): dllogger.init(backends=backends) def log_hyperparams(self, params, *args, **kwargs): + if isinstance(params, Namespace): + params = vars(params) + elif isinstance(params, AttributeDict): + params = dict(params) + params = apply_to_collection(params, (DictConfig, ListConfig), OmegaConf.to_container, resolve=True) + params = apply_to_collection(params, Path, str) + params = _sanitize_callable_params(_flatten_dict(_convert_params(params))) dllogger.log(step="PARAMETER", data=params) def log_metrics(self, metrics, step=None): From 71ea99966b638181ad9e1ed2e724cc8a11665034 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 12:59:33 -0800 Subject: [PATCH 122/154] Allow to run alignment on transcribed pred_text Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 21 +++++- tools/nemo_forced_aligner/align.py | 66 +++++++++++++++++-- tools/nemo_forced_aligner/utils/data_prep.py | 50 ++++++++++++-- .../{make_ctm.py => make_output_files.py} | 43 ++++++++++++ 4 files changed, 165 insertions(+), 15 deletions(-) rename tools/nemo_forced_aligner/utils/{make_ctm.py => make_output_files.py} (76%) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 694a9ffebf79..6e7589e01dec 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -29,6 +29,8 @@ Call the `align.py` script, specifying the parameters as follows: * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. 
By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. +* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the specified model and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). + * **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). * **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. (Default: 'cpu'). @@ -45,19 +47,32 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `minimum_timestamp_duration`: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio file. Note that this may cause timestamps to overlap. (Default: 0, i.e. no modifications to predicted duration). # Input manifest file format -NFA needs to be provided with a 'manifest' file where each line specifies the absolute "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: +By default, NFA needs to be provided with a 'manifest' file where each line specifies the absolute "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: ```json {"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"} ``` + +You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `/_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will not attempt to do alignment with `align_using_pred_text=true` if there are existing `"pred_text"` fields in the original manifest. + > Note: NFA does not require `"duration"` fields in the manifest, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth `"text"` is. # Output CTM file format -For each utterance specified in a line of `manifest_filepath`, one CTM file will be generated at the location of `/.ctm`. +For each utterance specified in a line of `manifest_filepath`, several CTM files will be generated: +* a CTM file containing token-level alignments at `/tokens/.ctm`, +* a CTM file containing word-level alignments at `/words/.ctm`, +* if `additional_ctm_grouping_separator` is specified, there will also be a CTM file containing those segments at `output_ctm_folder/additional_segments`. 
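As a quick illustration of how these output files can be consumed downstream, here is a minimal sketch (editorial note, not part of the patch; `parse_ctm` and the example path are hypothetical) that reads one of the word-level CTM files, assuming the space-separated line format described just below:

```python
# Minimal sketch: read an NFA word-level CTM file.
# Each line is: <utt_id> <channel> <start_time> <duration> <text>, separated by single spaces;
# the channel is always 1 for NFA output.
def parse_ctm(ctm_filepath):
    entries = []
    with open(ctm_filepath, "r") as f:
        for line in f:
            utt_id, _channel, start, duration, text = line.strip().split(" ", 4)
            entries.append(
                {"utt_id": utt_id, "start": float(start), "duration": float(duration), "text": text}
            )
    return entries

# e.g. words = parse_ctm("<output_ctm_folder>/words/<utt_id>.ctm")
```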
Each CTM file will contain lines of the format: -` 1 `. +` 1 `. Note the second item in the line (the 'channel ID', which is required by the CTM file format) is always 1, as NFA operates on single channel audio. +# Output JSON manifest file format +A new manifest file will be saved at `/_with_ctm_paths.json`. It will contain the same fields as the original manifest, and additionally: +* `"token_level_ctm_filepath"` +* `"word_level_ctm_filepath"` +* `"additonal_segment_level_ctm_filepath"` (if `additional_ctm_grouping_separator` is specified) +* `"pred_text"` (if `align_using_pred_text=true`) + # How do I evaluate the alignment accuracy? Ideally you would have some 'true' CTM files to compare with your generated CTM files. diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 6497954ff2c8..90b23c8b6243 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -18,8 +18,15 @@ import torch from omegaconf import OmegaConf -from utils.data_prep import get_audio_sr, get_batch_starts_ends, get_log_probs_y_T_U, get_manifest_lines -from utils.make_ctm import make_ctm +from utils.data_prep import ( + get_audio_sr, + get_batch_starts_ends, + get_log_probs_y_T_U, + get_manifest_lines, + is_entry_in_all_lines, + is_entry_in_any_lines, +) +from utils.make_output_files import make_ctm, make_new_manifest from utils.viterbi_decoding import viterbi_decoding from nemo.collections.asr.models.ctc_models import EncDecCTCModel @@ -48,6 +55,8 @@ manifest_filepath: filepath to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. output_ctm_folder: the folder where output CTM files will be saved. + align_using_pred_text: if True, will transcribe the audio using the specified model and then use that transcription + as the 'ground truth' for the forced alignment. transcribe_device: string specifying the device that will be used for generating log-probs (i.e. "transcribing"). The string needs to be in a format recognized by torch.device(). viterbi_device: string specifying the device that will be used for doing Viterbi decoding. @@ -85,6 +94,7 @@ class AlignmentConfig: output_ctm_folder: Optional[str] = None # General configs + align_using_pred_text: bool = False transcribe_device: str = "cpu" viterbi_device: str = "cpu" batch_size: int = 1 @@ -118,12 +128,36 @@ def main(cfg: AlignmentConfig): if cfg.output_ctm_folder is None: raise ValueError("cfg.output_ctm_folder must be specified") + if cfg.batch_size < 1: + raise ValueError("cfg.batch_size cannot be zero or a negative number") + if cfg.additional_ctm_grouping_separator == "" or cfg.additional_ctm_grouping_separator == " ": - raise ValueError("cfg.additional_grouping_separaor cannot be empty string or space character") + raise ValueError("cfg.additional_grouping_separator cannot be empty string or space character") if cfg.minimum_timestamp_duration < 0: raise ValueError("cfg.minimum_timestamp_duration cannot be a negative number") + # Validate manifest contents + if not is_entry_in_all_lines(cfg.manifest_filepath, "audio_filepath"): + raise RuntimeError( + "At least one line in cfg.manifest_filepath does not contain an 'audio_filepath' entry. " + "All lines must contain an 'audio_filepath' entry." + ) + + if cfg.align_using_pred_text: + if is_entry_in_any_lines(cfg.manifest_filepath, "pred_text"): + raise RuntimeError( + "Cannot specify cfg.align_using_pred_text=True when the manifest at cfg.manifest_filepath " + "contains 'pred_text' entries. 
This is because the audio will be transcribed and may produce " + "a different 'pred_text'. This may cause confusion." + ) + else: + if not is_entry_in_all_lines(cfg.manifest_filepath, "text"): + raise RuntimeError( + "At least one line in cfg.manifest_filepath does not contain a 'text' entry. " + "NFA requires all lines to contain a 'text' entry when cfg.align_using_pred_text=True." + ) + # init devices transcribe_device = torch.device(cfg.transcribe_device) viterbi_device = torch.device(cfg.viterbi_device) @@ -147,13 +181,23 @@ def main(cfg: AlignmentConfig): # get start and end line IDs of batches starts, ends = get_batch_starts_ends(cfg.manifest_filepath, cfg.batch_size) + if cfg.align_using_pred_text: + # record pred_texts to save them in the new manifest at the end of this script + pred_text_all_lines = [] + else: + pred_text_all_lines = None + # get alignment and save in CTM batch-by-batch for start, end in zip(starts, ends): data = get_manifest_lines(cfg.manifest_filepath, start, end) - log_probs, y, T, U, token_info, word_info, segment_info = get_log_probs_y_T_U( - data, model, cfg.additional_ctm_grouping_separator + log_probs, y, T, U, token_info, word_info, segment_info, pred_text_list = get_log_probs_y_T_U( + data, model, cfg.additional_ctm_grouping_separator, cfg.align_using_pred_text, ) + + if cfg.align_using_pred_text: + pred_text_all_lines.extend(pred_text_list) + alignments = viterbi_decoding(log_probs, y, T, U, viterbi_device) make_ctm( @@ -176,7 +220,7 @@ def main(cfg: AlignmentConfig): model, cfg.model_downsample_factor, os.path.join(cfg.output_ctm_folder, "words"), - False, # dont try to remove blank tokens because we dont expect them to be there + False, # dont try to remove blank tokens because we dont expect them to be there anyway cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, audio_sr, @@ -190,12 +234,20 @@ def main(cfg: AlignmentConfig): model, cfg.model_downsample_factor, os.path.join(cfg.output_ctm_folder, "additional_segments"), - False, # dont try to remove blank tokens because we dont expect them to be there + False, # dont try to remove blank tokens because we dont expect them to be there anyway cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, audio_sr, ) + make_new_manifest( + cfg.output_ctm_folder, + cfg.manifest_filepath, + cfg.additional_ctm_grouping_separator, + cfg.n_parts_for_ctm_id, + pred_text_all_lines, + ) + return None diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 8effa1891bca..0605cf9ba591 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -37,6 +37,37 @@ def get_batch_starts_ends(manifest_filepath, batch_size): return starts, ends +def is_entry_in_any_lines(manifest_filepath, entry): + """ + Returns True if entry is a key in any of the JSON lines in manifest_filepath + """ + + entry_in_manifest = False + + with open(manifest_filepath, 'r') as f: + for line in f: + data = json.loads(line) + + if entry in data: + entry_in_manifest = True + + return entry_in_manifest + + +def is_entry_in_all_lines(manifest_filepath, entry): + """ + Returns True is entry is a key in all of the JSON lines in manifest_filepath. 
+ """ + with open(manifest_filepath, 'r') as f: + for line in f: + data = json.loads(line) + + if entry not in data: + return False + + return True + + def get_audio_sr(manifest_filepath): """ Measure the sampling rate of the audio file in the first line @@ -246,12 +277,13 @@ def get_y_token_word_segment_info(text, model, separator): raise RuntimeError("Cannot get tokens of this model.") -def get_log_probs_y_T_U(data, model, separator): +def get_log_probs_y_T_U(data, model, separator, align_using_pred_text): """ Preparing some tensors to be used during Viterbi decoding. Returns: log_probs, y, T, U (y and U are s.t. every other token is a blank), - token_info_list, word_info_list, segment_info_list + token_info_list, word_info_list, segment_info_list, + pred_text_list """ audio_filepaths = [line["audio_filepath"] for line in data] @@ -262,9 +294,11 @@ def get_log_probs_y_T_U(data, model, separator): log_probs_list = [] T_list = [] + pred_text_list = [] for hypothesis in hypotheses: log_probs_list.append(hypothesis.y_sequence) T_list.append(hypothesis.y_sequence.shape[0]) + pred_text_list.append(hypothesis.text) y_list = [] U_list = [] @@ -272,8 +306,14 @@ def get_log_probs_y_T_U(data, model, separator): word_info_list = [] segment_info_list = [] - for line in data: - y_line, token_info, word_info, segment_info = get_y_token_word_segment_info(line["text"], model, separator) + for i_line, line in enumerate(data): + if align_using_pred_text: + gt_text_for_alignment = pred_text_list[i_line] + else: + gt_text_for_alignment = line["text"] + y_line, token_info, word_info, segment_info = get_y_token_word_segment_info( + gt_text_for_alignment, model, separator + ) y_list.append(y_line) U_list.append(len(y_line)) @@ -299,4 +339,4 @@ def get_log_probs_y_T_U(data, model, separator): U_b = U[b] y[b, :U_b] = torch.tensor(y_b) - return log_probs, y, T, U, token_info_list, word_info_list, segment_info_list + return log_probs, y, T, U, token_info_list, word_info_list, segment_info_list, pred_text_list diff --git a/tools/nemo_forced_aligner/utils/make_ctm.py b/tools/nemo_forced_aligner/utils/make_output_files.py similarity index 76% rename from tools/nemo_forced_aligner/utils/make_ctm.py rename to tools/nemo_forced_aligner/utils/make_output_files.py index b14d79d1b5da..3daafc487850 100644 --- a/tools/nemo_forced_aligner/utils/make_ctm.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +import json import os from pathlib import Path @@ -123,3 +124,45 @@ def make_ctm( f_ctm.write(f"{utt_id} 1 {start_time:.2f} {end_time - start_time:.2f} {text}\n") return None + + +def make_new_manifest( + output_ctm_folder, + original_manifest_filepath, + additional_ctm_grouping_separator, + n_parts_for_ctm_id, + pred_text_all_lines, +): + if pred_text_all_lines: + with open(original_manifest_filepath, 'r') as f: + num_lines_in_manifest = sum(1 for _ in f) + + if not num_lines_in_manifest == len(pred_text_all_lines): + raise RuntimeError( + f"Number of lines in the original manifest ({num_lines_in_manifest}) does not match " + f"the number of pred_texts we have ({len(pred_text_all_lines)}). Something has gone wrong." 
+ ) + + tgt_manifest_name = str(Path(original_manifest_filepath).stem) + "_with_ctm_paths.json" + tgt_manifest_filepath = str(Path(output_ctm_folder) / tgt_manifest_name) + + with open(original_manifest_filepath, 'r') as fin, open(tgt_manifest_filepath, 'w') as fout: + for i_line, line in enumerate(fin): + data = json.loads(line) + + utt_id = _get_utt_id(data["audio_filepath"], n_parts_for_ctm_id) + + data["token_level_ctm_filepath"] = str(Path(output_ctm_folder) / "tokens" / f"{utt_id}.ctm") + data["word_level_ctm_filepath"] = str(Path(output_ctm_folder) / "words" / f"{utt_id}.ctm") + + if additional_ctm_grouping_separator: + data["additional_segment_level_ctm_filepath"] = str( + Path(output_ctm_folder) / "additional_segments" / f"{utt_id}.ctm" + ) + + if pred_text_all_lines: + data['pred_text'] = pred_text_all_lines[i_line] + + new_line = json.dumps(data) + + fout.write(f"{new_line}\n") From a1642a50abd6aa666db0e91431207593bfcb0032 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 13:02:56 -0800 Subject: [PATCH 123/154] Update README Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 6e7589e01dec..4fc855c96f16 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -29,7 +29,7 @@ Call the `align.py` script, specifying the parameters as follows: * `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. -* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the specified model and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). +* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). * **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). 
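For reference, the new manifest written by `make_new_manifest` above contains one JSON object per input line, with the CTM filepaths (and, when `align_using_pred_text=True`, the `pred_text`) added. A rough sketch of one such object is shown below as a Python dict (editorial illustration, not part of the patch; the concrete paths and strings are hypothetical, while the key names are taken from the code above).

```python
# Approximate shape of one line in <output_ctm_folder>/<manifest name>_with_ctm_paths.json
# (illustrative values; keys mirror what make_new_manifest writes).
manifest_line = {
    "audio_filepath": "/absolute/path/to/utt1.wav",
    "text": "the transcription of the utterance",
    "token_level_ctm_filepath": "<output_ctm_folder>/tokens/utt1.ctm",
    "word_level_ctm_filepath": "<output_ctm_folder>/words/utt1.ctm",
    # present only if additional_ctm_grouping_separator was specified:
    "additional_segment_level_ctm_filepath": "<output_ctm_folder>/additional_segments/utt1.ctm",
    # present only if align_using_pred_text=True (in that case "text" may be absent):
    "pred_text": "the transcription produced by the ASR model",
}
```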
From 466ea8cc9f5c8a9a74f2568046d22962a4667c7f Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 13:09:02 -0800 Subject: [PATCH 124/154] update README Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 4fc855c96f16..f2f6575b1fae 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -79,4 +79,4 @@ Ideally you would have some 'true' CTM files to compare with your generated CTM Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. The Gecko tool requires you to upload an audio file and at least one CTM file. The Gecko tool can be accessed here: https://gong-io.github.io/gecko/. More information about the Gecko tool can be found on its Github page here: https://github.com/gong-io/gecko. -Note that Gecko may not display some tokens/words/segments properly if their timestamps are too short. To ameliorate this, you may try setting `minimum_timestamp_duration` to some suitable number. If you are producing token-level CTMs, it may also be helpful to set `remove_blank_tokens_from_ctm` to `True`, to make the Gecko visualization less cluttered. +Note that Gecko may not display some tokens/words/segments properly if their timestamps are too short. To ameliorate this, you may try setting `minimum_timestamp_duration` to some suitable number. If you are analyzing token-level CTMs, it may also be helpful to set `remove_blank_tokens_from_ctm` to `True`, to make the Gecko visualization less cluttered. From 6c2b04332458b6f5aff346f2bfdd83fbf4207b6a Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 13:15:44 -0800 Subject: [PATCH 125/154] Rename output_ctm_folder to output_dir Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 18 +++++++-------- tools/nemo_forced_aligner/align.py | 22 +++++++++---------- .../utils/make_output_files.py | 16 +++++++------- 3 files changed, 28 insertions(+), 28 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index f2f6575b1fae..a0541cd9096e 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -9,7 +9,7 @@ python /NeMo/tools/nemo_forced_aligner/align.py \ pretrained_name="stt_en_citrinet_1024_gamma_0_25" \ model_downsample_factor=8 \ manifest_filepath= \ - output_ctm_folder= + output_dir= ``` ## How do I use NeMo Forced Aligner? @@ -27,9 +27,9 @@ Call the `align.py` script, specifying the parameters as follows: * `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. -* `output_ctm_folder`: The folder where to save CTM files containing the generated alignments. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. +* `output_dir`: The folder where to save CTM files containing the generated alignments and new JSON manifest containing paths to those CTM files. 
There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. The new JSON manifest will be at `/_with_ctm_paths.json`. -* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). +* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). * **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). @@ -37,7 +37,7 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). -* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. NFA will always produce token-level and word-level CTM files in: `/tokens/.ctm` and `/words/.ctm`. If `additional_ctm_grouping_separator` is specified, an additional folder `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be empty string or space (" "), as space, as space-separated word-level CTMs will always be saved in `/words/.ctm`.) +* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. NFA will always produce token-level and word-level CTM files in: `/tokens/.ctm` and `/words/.ctm`. If `additional_ctm_grouping_separator` is specified, an additional folder `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be empty string or space (" "), as space, as space-separated word-level CTMs will always be saved in `/words/.ctm`.) > Note: the `additional_ctm_grouping_separator` will be removed from the ground truth text and all the output CTMs, ie it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be amalgamated into one, i.e. if `additional_ctm_grouping_separator="|"`, the following texts will be treated equivalently: `“abc|def”`, `“abc |def”`, `“abc| def”`, `“abc | def"`. 
* **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from token-level output CTMs. (Default: False). @@ -52,22 +52,22 @@ By default, NFA needs to be provided with a 'manifest' file where each line spec {"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"} ``` -You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `/_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will not attempt to do alignment with `align_using_pred_text=true` if there are existing `"pred_text"` fields in the original manifest. +You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `/_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will not attempt to do alignment with `align_using_pred_text=true` if there are existing `"pred_text"` fields in the original manifest. > Note: NFA does not require `"duration"` fields in the manifest, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth `"text"` is. # Output CTM file format For each utterance specified in a line of `manifest_filepath`, several CTM files will be generated: -* a CTM file containing token-level alignments at `/tokens/.ctm`, -* a CTM file containing word-level alignments at `/words/.ctm`, -* if `additional_ctm_grouping_separator` is specified, there will also be a CTM file containing those segments at `output_ctm_folder/additional_segments`. +* a CTM file containing token-level alignments at `/tokens/.ctm`, +* a CTM file containing word-level alignments at `/words/.ctm`, +* if `additional_ctm_grouping_separator` is specified, there will also be a CTM file containing those segments at `output_dir/additional_segments`. Each CTM file will contain lines of the format: ` 1 `. Note the second item in the line (the 'channel ID', which is required by the CTM file format) is always 1, as NFA operates on single channel audio. # Output JSON manifest file format -A new manifest file will be saved at `/_with_ctm_paths.json`. It will contain the same fields as the original manifest, and additionally: +A new manifest file will be saved at `/_with_ctm_paths.json`. 
It will contain the same fields as the original manifest, and additionally: * `"token_level_ctm_filepath"` * `"word_level_ctm_filepath"` * `"additonal_segment_level_ctm_filepath"` (if `additional_ctm_grouping_separator` is specified) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 90b23c8b6243..19c15a5dfabd 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -37,7 +37,7 @@ """ Align the utterances in manifest_filepath. -Results are saved in ctm files in output_ctm_folder. +Results are saved in ctm files in output_dir. Arguments: pretrained_name: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded @@ -54,7 +54,7 @@ If the ASR model is a Citirnet model, its downsample factor is 8. manifest_filepath: filepath to the manifest of the data you want to align, containing 'audio_filepath' and 'text' fields. - output_ctm_folder: the folder where output CTM files will be saved. + output_dir: the folder where output CTM files and new JSON manifest will be saved. align_using_pred_text: if True, will transcribe the audio using the specified model and then use that transcription as the 'ground truth' for the forced alignment. transcribe_device: string specifying the device that will be used for generating log-probs (i.e. "transcribing"). @@ -64,9 +64,9 @@ batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. additional_ctm_grouping_separator: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. NFA will always produce token-level and word-level CTM - files in: `/tokens/.ctm` and `/words/.ctm`. + files in: `/tokens/.ctm` and `/words/.ctm`. If `additional_ctm_grouping_separator` is specified, an additional folder - `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs + `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. remove_blank_tokens_from_ctm: a boolean denoting whether to remove tokens from token-level output CTMs. 
n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath @@ -91,7 +91,7 @@ class AlignmentConfig: model_path: Optional[str] = None model_downsample_factor: Optional[int] = None manifest_filepath: Optional[str] = None - output_ctm_folder: Optional[str] = None + output_dir: Optional[str] = None # General configs align_using_pred_text: bool = False @@ -125,8 +125,8 @@ def main(cfg: AlignmentConfig): if cfg.manifest_filepath is None: raise ValueError("cfg.manifest_filepath must be specified") - if cfg.output_ctm_folder is None: - raise ValueError("cfg.output_ctm_folder must be specified") + if cfg.output_dir is None: + raise ValueError("cfg.output_dir must be specified") if cfg.batch_size < 1: raise ValueError("cfg.batch_size cannot be zero or a negative number") @@ -206,7 +206,7 @@ def main(cfg: AlignmentConfig): data, model, cfg.model_downsample_factor, - os.path.join(cfg.output_ctm_folder, "tokens"), + os.path.join(cfg.output_dir, "tokens"), cfg.remove_blank_tokens_from_ctm, cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, @@ -219,7 +219,7 @@ def main(cfg: AlignmentConfig): data, model, cfg.model_downsample_factor, - os.path.join(cfg.output_ctm_folder, "words"), + os.path.join(cfg.output_dir, "words"), False, # dont try to remove blank tokens because we dont expect them to be there anyway cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, @@ -233,7 +233,7 @@ def main(cfg: AlignmentConfig): data, model, cfg.model_downsample_factor, - os.path.join(cfg.output_ctm_folder, "additional_segments"), + os.path.join(cfg.output_dir, "additional_segments"), False, # dont try to remove blank tokens because we dont expect them to be there anyway cfg.n_parts_for_ctm_id, cfg.minimum_timestamp_duration, @@ -241,7 +241,7 @@ def main(cfg: AlignmentConfig): ) make_new_manifest( - cfg.output_ctm_folder, + cfg.output_dir, cfg.manifest_filepath, cfg.additional_ctm_grouping_separator, cfg.n_parts_for_ctm_id, diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index 3daafc487850..6ce7957af4f8 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -76,7 +76,7 @@ def make_ctm( data_list, model, model_downsample_factor, - output_ctm_folder, + output_dir, remove_blank_tokens_from_ctm, n_parts_for_ctm_id, minimum_timestamp_duration, @@ -89,7 +89,7 @@ def make_ctm( BLANK_TOKEN = "" # TODO: move outside of function - os.makedirs(output_ctm_folder, exist_ok=True) + os.makedirs(output_dir, exist_ok=True) spectrogram_hop_length = model.preprocessor.featurizer.hop_length timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor @@ -106,7 +106,7 @@ def make_ctm( with sf.SoundFile(manifest_line["audio_filepath"]) as f: audio_file_duration = f.frames / f.samplerate - with open(os.path.join(output_ctm_folder, f"{utt_id}.ctm"), "w") as f_ctm: + with open(os.path.join(output_dir, f"{utt_id}.ctm"), "w") as f_ctm: for segment_info in boundary_info: text = segment_info["text"] start_sample = segment_info["t_start"] * timestep_to_sample_ratio @@ -127,7 +127,7 @@ def make_ctm( def make_new_manifest( - output_ctm_folder, + output_dir, original_manifest_filepath, additional_ctm_grouping_separator, n_parts_for_ctm_id, @@ -144,7 +144,7 @@ def make_new_manifest( ) tgt_manifest_name = str(Path(original_manifest_filepath).stem) + "_with_ctm_paths.json" - tgt_manifest_filepath = str(Path(output_ctm_folder) / tgt_manifest_name) + tgt_manifest_filepath = 
str(Path(output_dir) / tgt_manifest_name) with open(original_manifest_filepath, 'r') as fin, open(tgt_manifest_filepath, 'w') as fout: for i_line, line in enumerate(fin): @@ -152,12 +152,12 @@ def make_new_manifest( utt_id = _get_utt_id(data["audio_filepath"], n_parts_for_ctm_id) - data["token_level_ctm_filepath"] = str(Path(output_ctm_folder) / "tokens" / f"{utt_id}.ctm") - data["word_level_ctm_filepath"] = str(Path(output_ctm_folder) / "words" / f"{utt_id}.ctm") + data["token_level_ctm_filepath"] = str(Path(output_dir) / "tokens" / f"{utt_id}.ctm") + data["word_level_ctm_filepath"] = str(Path(output_dir) / "words" / f"{utt_id}.ctm") if additional_ctm_grouping_separator: data["additional_segment_level_ctm_filepath"] = str( - Path(output_ctm_folder) / "additional_segments" / f"{utt_id}.ctm" + Path(output_dir) / "additional_segments" / f"{utt_id}.ctm" ) if pred_text_all_lines: From 5b658271e106f28be91b8a9ec99c98bfb47ebb15 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 13:38:30 -0800 Subject: [PATCH 126/154] rename n_parts_for_ctm to audio_filepath_parts_in_utt_id Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 4 ++-- tools/nemo_forced_aligner/align.py | 18 +++++++++--------- .../utils/make_output_files.py | 12 ++++++------ 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index a0541cd9096e..09d0e2057c56 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -27,7 +27,7 @@ Call the `align.py` script, specifying the parameters as follows: * `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. -* `output_dir`: The folder where to save CTM files containing the generated alignments and new JSON manifest containing paths to those CTM files. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `n_parts_for_ctm_id`. The new JSON manifest will be at `/_with_ctm_paths.json`. +* `output_dir`: The folder where to save CTM files containing the generated alignments and new JSON manifest containing paths to those CTM files. There will be one CTM file per utterance (ie one CTM file per line in the manifest). The files will be called `/{tokens,words,additional_segments}/.ctm` and each line in each file will start with ``. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `audio_filepath_parts_in_utt_id`. The new JSON manifest will be at `/_with_ctm_paths.json`. * **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). 
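To make the renamed `audio_filepath_parts_in_utt_id` option more concrete, below is a small sketch (editorial note, not part of the patch) that mirrors `_get_utt_id` from `make_output_files.py`; the example paths are hypothetical.

```python
from pathlib import Path

def get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id=1):
    # keep only the last N parts of the path, join them with underscores, drop the extension
    fp_parts = Path(audio_filepath).parts[-audio_filepath_parts_in_utt_id:]
    utt_id = Path("_".join(fp_parts)).stem
    # spaces would break the space-separated CTM lines, so they are replaced with dashes
    return utt_id.replace(" ", "-")

# e.g. get_utt_id("/data/speaker1/utt1.wav", 1) -> "utt1"
#      get_utt_id("/data/speaker1/utt1.wav", 2) -> "speaker1_utt1"
```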
@@ -42,7 +42,7 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from token-level output CTMs. (Default: False). -* **[OPTIONAL]** `n_parts_for_ctm_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. +* **[OPTIONAL]** `audio_filepath_parts_in_utt_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. * **[OPTIONAL]** `minimum_timestamp_duration`: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio file. Note that this may cause timestamps to overlap. (Default: 0, i.e. no modifications to predicted duration). diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 19c15a5dfabd..59634101e6af 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -69,14 +69,14 @@ `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. remove_blank_tokens_from_ctm: a boolean denoting whether to remove tokens from token-level output CTMs. - n_parts_for_ctm_id: int specifying how many of the 'parts' of the audio_filepath + audio_filepath_parts_in_utt_id: int specifying how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. - e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 1 => utt_id will be "e1" - e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 2 => utt_id will be "d_e1" - e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and n_parts_for_ctm_id is 3 => utt_id will be "c_d_e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and audio_filepath_parts_in_utt_id is 1 => utt_id will be "e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and audio_filepath_parts_in_utt_id is 2 => utt_id will be "d_e1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and audio_filepath_parts_in_utt_id is 3 => utt_id will be "c_d_e1" minimum_timestamp_duration: a float indicating a minimum duration (in seconds) for timestamps in the CTM. 
If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio @@ -101,7 +101,7 @@ class AlignmentConfig: additional_ctm_grouping_separator: Optional[str] = None remove_blank_tokens_from_ctm: bool = False minimum_timestamp_duration: float = 0 - n_parts_for_ctm_id: int = 1 + audio_filepath_parts_in_utt_id: int = 1 @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) @@ -208,7 +208,7 @@ def main(cfg: AlignmentConfig): cfg.model_downsample_factor, os.path.join(cfg.output_dir, "tokens"), cfg.remove_blank_tokens_from_ctm, - cfg.n_parts_for_ctm_id, + cfg.audio_filepath_parts_in_utt_id, cfg.minimum_timestamp_duration, audio_sr, ) @@ -221,7 +221,7 @@ def main(cfg: AlignmentConfig): cfg.model_downsample_factor, os.path.join(cfg.output_dir, "words"), False, # dont try to remove blank tokens because we dont expect them to be there anyway - cfg.n_parts_for_ctm_id, + cfg.audio_filepath_parts_in_utt_id, cfg.minimum_timestamp_duration, audio_sr, ) @@ -235,7 +235,7 @@ def main(cfg: AlignmentConfig): cfg.model_downsample_factor, os.path.join(cfg.output_dir, "additional_segments"), False, # dont try to remove blank tokens because we dont expect them to be there anyway - cfg.n_parts_for_ctm_id, + cfg.audio_filepath_parts_in_utt_id, cfg.minimum_timestamp_duration, audio_sr, ) @@ -244,7 +244,7 @@ def main(cfg: AlignmentConfig): cfg.output_dir, cfg.manifest_filepath, cfg.additional_ctm_grouping_separator, - cfg.n_parts_for_ctm_id, + cfg.audio_filepath_parts_in_utt_id, pred_text_all_lines, ) diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index 6ce7957af4f8..e7ec00d0c6e2 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -19,8 +19,8 @@ import soundfile as sf -def _get_utt_id(audio_filepath, n_parts_for_ctm_id): - fp_parts = Path(audio_filepath).parts[-n_parts_for_ctm_id:] +def _get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id): + fp_parts = Path(audio_filepath).parts[-audio_filepath_parts_in_utt_id:] utt_id = Path("_".join(fp_parts)).stem utt_id = utt_id.replace(" ", "-") # replace any spaces in the filepath with dashes return utt_id @@ -78,7 +78,7 @@ def make_ctm( model_downsample_factor, output_dir, remove_blank_tokens_from_ctm, - n_parts_for_ctm_id, + audio_filepath_parts_in_utt_id, minimum_timestamp_duration, audio_sr, ): @@ -99,7 +99,7 @@ def make_ctm( boundary_info = add_t_start_end_to_boundary_info(boundary_info, alignment) # get utt_id that will be used for saving CTM file as .ctm - utt_id = _get_utt_id(manifest_line['audio_filepath'], n_parts_for_ctm_id) + utt_id = _get_utt_id(manifest_line['audio_filepath'], audio_filepath_parts_in_utt_id) # get audio file duration if we will need it later if minimum_timestamp_duration > 0: @@ -130,7 +130,7 @@ def make_new_manifest( output_dir, original_manifest_filepath, additional_ctm_grouping_separator, - n_parts_for_ctm_id, + audio_filepath_parts_in_utt_id, pred_text_all_lines, ): if pred_text_all_lines: @@ -150,7 +150,7 @@ def make_new_manifest( for i_line, line in enumerate(fin): data = json.loads(line) - utt_id = _get_utt_id(data["audio_filepath"], n_parts_for_ctm_id) + utt_id = _get_utt_id(data["audio_filepath"], audio_filepath_parts_in_utt_id) data["token_level_ctm_filepath"] = str(Path(output_dir) / "tokens" / f"{utt_id}.ctm") 
data["word_level_ctm_filepath"] = str(Path(output_dir) / "words" / f"{utt_id}.ctm") From bb32cea080f4d72fc13f6f5b4a39a7c5302a32db Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 14:24:47 -0800 Subject: [PATCH 127/154] Rename some variables to improve readability Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 41 ++++--- tools/nemo_forced_aligner/utils/data_prep.py | 100 ++++++++++-------- .../utils/make_output_files.py | 61 ++++++----- .../utils/viterbi_decoding.py | 38 +++---- 4 files changed, 133 insertions(+), 107 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 59634101e6af..b86cf566a565 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -22,7 +22,7 @@ get_audio_sr, get_batch_starts_ends, get_log_probs_y_T_U, - get_manifest_lines, + get_manifest_lines_batch, is_entry_in_all_lines, is_entry_in_any_lines, ) @@ -189,21 +189,30 @@ def main(cfg: AlignmentConfig): # get alignment and save in CTM batch-by-batch for start, end in zip(starts, ends): - data = get_manifest_lines(cfg.manifest_filepath, start, end) - - log_probs, y, T, U, token_info, word_info, segment_info, pred_text_list = get_log_probs_y_T_U( - data, model, cfg.additional_ctm_grouping_separator, cfg.align_using_pred_text, + manifest_lines_batch = get_manifest_lines_batch(cfg.manifest_filepath, start, end) + + ( + log_probs_batch, + y_batch, + T_batch, + U_batch, + token_info_batch, + word_info_batch, + segment_info_batch, + pred_text_batch, + ) = get_log_probs_y_T_U( + manifest_lines_batch, model, cfg.additional_ctm_grouping_separator, cfg.align_using_pred_text, ) if cfg.align_using_pred_text: - pred_text_all_lines.extend(pred_text_list) + pred_text_all_lines.extend(pred_text_batch) - alignments = viterbi_decoding(log_probs, y, T, U, viterbi_device) + alignments_batch = viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device) make_ctm( - token_info, - alignments, - data, + token_info_batch, + alignments_batch, + manifest_lines_batch, model, cfg.model_downsample_factor, os.path.join(cfg.output_dir, "tokens"), @@ -214,9 +223,9 @@ def main(cfg: AlignmentConfig): ) make_ctm( - word_info, - alignments, - data, + word_info_batch, + alignments_batch, + manifest_lines_batch, model, cfg.model_downsample_factor, os.path.join(cfg.output_dir, "words"), @@ -228,9 +237,9 @@ def main(cfg: AlignmentConfig): if cfg.additional_ctm_grouping_separator: make_ctm( - segment_info, - alignments, - data, + segment_info_batch, + alignments_batch, + manifest_lines_batch, model, cfg.model_downsample_factor, os.path.join(cfg.output_dir, "additional_segments"), diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 0605cf9ba591..d3766ac08c96 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -84,19 +84,19 @@ def get_audio_sr(manifest_filepath): return f_audio.samplerate -def get_manifest_lines(manifest_filepath, start, end): - data = [] +def get_manifest_lines_batch(manifest_filepath, start, end): + manifest_lines_batch = [] with open(manifest_filepath, "r") as f: for line_i, line in enumerate(f): if line_i == start and line_i == end: - data.append(json.loads(line)) + manifest_lines_batch.append(json.loads(line)) break if line_i == end: break if line_i >= start: - data.append(json.loads(line)) - return data + manifest_lines_batch.append(json.loads(line)) + return 
manifest_lines_batch def get_char_tokens(text, model): @@ -111,6 +111,9 @@ def get_char_tokens(text, model): def get_y_token_word_segment_info(text, model, separator): + """ + Get y, token_info, word_info and segment_info for the text provided, tokenized by the model provided. + """ if hasattr(model, 'tokenizer'): @@ -123,7 +126,7 @@ def get_y_token_word_segment_info(text, model, separator): segments = [seg.strip() for seg in segments] BLANK_ID = len(model.decoder.vocabulary) # TODO: check - BLANK_TOKEN = "" + BLANK_TOKEN = "" y_token_ids = [] y_token_ids_with_blanks = [BLANK_ID] y_tokens = [] @@ -277,7 +280,7 @@ def get_y_token_word_segment_info(text, model, separator): raise RuntimeError("Cannot get tokens of this model.") -def get_log_probs_y_T_U(data, model, separator, align_using_pred_text): +def get_log_probs_y_T_U(manifest_lines_batch, model, separator, align_using_pred_text): """ Preparing some tensors to be used during Viterbi decoding. Returns: @@ -286,57 +289,66 @@ def get_log_probs_y_T_U(data, model, separator, align_using_pred_text): pred_text_list """ - audio_filepaths = [line["audio_filepath"] for line in data] - B = len(audio_filepaths) + audio_filepaths_batch = [line["audio_filepath"] for line in manifest_lines_batch] + B = len(audio_filepaths_batch) with torch.no_grad(): - hypotheses = model.transcribe(audio_filepaths, return_hypotheses=True, batch_size=B) + hypotheses = model.transcribe(audio_filepaths_batch, return_hypotheses=True, batch_size=B) - log_probs_list = [] - T_list = [] - pred_text_list = [] + log_probs_list_batch = [] + T_list_batch = [] + pred_text_batch = [] for hypothesis in hypotheses: - log_probs_list.append(hypothesis.y_sequence) - T_list.append(hypothesis.y_sequence.shape[0]) - pred_text_list.append(hypothesis.text) + log_probs_list_batch.append(hypothesis.y_sequence) + T_list_batch.append(hypothesis.y_sequence.shape[0]) + pred_text_batch.append(hypothesis.text) - y_list = [] - U_list = [] - token_info_list = [] - word_info_list = [] - segment_info_list = [] + y_list_batch = [] + U_list_batch = [] + token_info_batch = [] + word_info_batch = [] + segment_info_batch = [] - for i_line, line in enumerate(data): + for i_line, line in enumerate(manifest_lines_batch): if align_using_pred_text: - gt_text_for_alignment = pred_text_list[i_line] + gt_text_for_alignment = pred_text_batch[i_line] else: gt_text_for_alignment = line["text"] - y_line, token_info, word_info, segment_info = get_y_token_word_segment_info( + y_utt, token_info_utt, word_info_utt, segment_info_utt = get_y_token_word_segment_info( gt_text_for_alignment, model, separator ) - y_list.append(y_line) - U_list.append(len(y_line)) - token_info_list.append(token_info) - word_info_list.append(word_info) - segment_info_list.append(segment_info) + y_list_batch.append(y_utt) + U_list_batch.append(len(y_utt)) + token_info_batch.append(token_info_utt) + word_info_batch.append(word_info_utt) + segment_info_batch.append(segment_info_utt) - T_max = max(T_list) - U_max = max(U_list) + T_max = max(T_list_batch) + U_max = max(U_list_batch) V = len(model.decoder.vocabulary) + 1 - T = torch.tensor(T_list) - U = torch.tensor(U_list) + T_batch = torch.tensor(T_list_batch) + U_batch = torch.tensor(U_list_batch) - # make log_probs tensor of shape (B x T_max x V) - log_probs = V_NEG_NUM * torch.ones((B, T_max, V)) - for b, log_probs_b in enumerate(log_probs_list): - t = log_probs_b.shape[0] - log_probs[b, :t, :] = log_probs_b + # make log_probs_batch tensor of shape (B x T_max x V) + log_probs_batch = V_NEG_NUM * 
torch.ones((B, T_max, V)) + for b, log_probs_utt in enumerate(log_probs_list_batch): + t = log_probs_utt.shape[0] + log_probs_batch[b, :t, :] = log_probs_utt # make y tensor of shape (B x U_max) - y = V * torch.ones((B, U_max), dtype=torch.int64) - for b, y_b in enumerate(y_list): - U_b = U[b] - y[b, :U_b] = torch.tensor(y_b) - - return log_probs, y, T, U, token_info_list, word_info_list, segment_info_list, pred_text_list + y_batch = V * torch.ones((B, U_max), dtype=torch.int64) + for b, y_utt in enumerate(y_list_batch): + U_utt = U_batch[b] + y_batch[b, :U_utt] = torch.tensor(y_utt) + + return ( + log_probs_batch, + y_batch, + T_batch, + U_batch, + token_info_batch, + word_info_batch, + segment_info_batch, + pred_text_batch, + ) diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index e7ec00d0c6e2..c8d1dc8075a0 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -26,14 +26,17 @@ def _get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id): return utt_id -def add_t_start_end_to_boundary_info(boundary_info, alignment): +def add_t_start_end_to_boundary_info(boundary_info_utt, alignment_utt): # first remove boundary_info of any items that are not in the alignment # the only items we expect not to be in the alignment are blanks which the alignment chooses to skip # we will iterate boundary_info in reverse order for this to make popping the items simple - s_in_alignment = set(alignment) - for boundary_info_pointer in range(len(boundary_info) - 1, -1, -1): + s_in_alignment = set(alignment_utt) + for boundary_info_pointer in range(len(boundary_info_utt) - 1, -1, -1): s_in_boundary_info = set( - range(boundary_info[boundary_info_pointer]["s_start"], boundary_info[boundary_info_pointer]["s_end"] + 1) + range( + boundary_info_utt[boundary_info_pointer]["s_start"], + boundary_info_utt[boundary_info_pointer]["s_end"] + 1, + ) ) item_not_in_alignment = True for s_ in s_in_boundary_info: @@ -41,39 +44,39 @@ def add_t_start_end_to_boundary_info(boundary_info, alignment): item_not_in_alignment = False if item_not_in_alignment: - boundary_info.pop(boundary_info_pointer) + boundary_info_utt.pop(boundary_info_pointer) # now updated boundary_info with s_start and s_end boundary_info_pointer = 0 - for t, s_at_t in enumerate(alignment): - if s_at_t == boundary_info[boundary_info_pointer]["s_start"]: - if "t_start" not in boundary_info[boundary_info_pointer]: - boundary_info[boundary_info_pointer]["t_start"] = t + for t, s_at_t in enumerate(alignment_utt): + if s_at_t == boundary_info_utt[boundary_info_pointer]["s_start"]: + if "t_start" not in boundary_info_utt[boundary_info_pointer]: + boundary_info_utt[boundary_info_pointer]["t_start"] = t - if t < len(alignment) - 1: - if alignment[t + 1] > boundary_info[boundary_info_pointer]["s_end"]: - if "t_end" not in boundary_info[boundary_info_pointer]: - boundary_info[boundary_info_pointer]["t_end"] = t + if t < len(alignment_utt) - 1: + if alignment_utt[t + 1] > boundary_info_utt[boundary_info_pointer]["s_end"]: + if "t_end" not in boundary_info_utt[boundary_info_pointer]: + boundary_info_utt[boundary_info_pointer]["t_end"] = t boundary_info_pointer += 1 else: # i.e. t == len(alignment) - 1, i.e. 
we are a the final element in alignment # add final t_end if we haven't already - if "t_end" not in boundary_info[boundary_info_pointer]: - boundary_info[boundary_info_pointer]["t_end"] = t + if "t_end" not in boundary_info_utt[boundary_info_pointer]: + boundary_info_utt[boundary_info_pointer]["t_end"] = t - if boundary_info_pointer == len(boundary_info): + if boundary_info_pointer == len(boundary_info_utt): # we have finished populating boundary_info with t_start and t_end, # but we might have some final remaining elements (blanks) in the alignment which we dont care about # => break, so as not to cause issues trying to access boundary_info[boundary_info_pointer] break - return boundary_info + return boundary_info_utt def make_ctm( - boundary_info_list, - alignment_list, - data_list, + boundary_info_batch, + alignments_batch, + manifest_lines_batch, model, model_downsample_factor, output_dir, @@ -83,9 +86,9 @@ def make_ctm( audio_sr, ): """ - Note: assume same order of utts in boundary_info, alignments and data + Note: assume same order of utts in boundary_info_batch, alignments_batch and manifest_lines_batch """ - assert len(boundary_info_list) == len(alignment_list) == len(data_list) + assert len(boundary_info_batch) == len(alignments_batch) == len(manifest_lines_batch) BLANK_TOKEN = "" # TODO: move outside of function @@ -94,9 +97,11 @@ def make_ctm( spectrogram_hop_length = model.preprocessor.featurizer.hop_length timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor - for boundary_info, alignment, manifest_line in zip(boundary_info_list, alignment_list, data_list): + for boundary_info_utt, alignment_utt, manifest_line in zip( + boundary_info_batch, alignments_batch, manifest_lines_batch + ): - boundary_info = add_t_start_end_to_boundary_info(boundary_info, alignment) + boundary_info_utt = add_t_start_end_to_boundary_info(boundary_info_utt, alignment_utt) # get utt_id that will be used for saving CTM file as .ctm utt_id = _get_utt_id(manifest_line['audio_filepath'], audio_filepath_parts_in_utt_id) @@ -107,10 +112,10 @@ def make_ctm( audio_file_duration = f.frames / f.samplerate with open(os.path.join(output_dir, f"{utt_id}.ctm"), "w") as f_ctm: - for segment_info in boundary_info: - text = segment_info["text"] - start_sample = segment_info["t_start"] * timestep_to_sample_ratio - end_sample = (segment_info["t_end"] + 1) * timestep_to_sample_ratio - 1 + for boundary_info_ in boundary_info_utt: + text = boundary_info_["text"] + start_sample = boundary_info_["t_start"] * timestep_to_sample_ratio + end_sample = (boundary_info_["t_end"] + 1) * timestep_to_sample_ratio - 1 start_time = start_sample / audio_sr end_time = end_sample / audio_sr diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py index a7b52272cec9..55228b1e708f 100644 --- a/tools/nemo_forced_aligner/utils/viterbi_decoding.py +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -17,36 +17,36 @@ V_NEG_NUM = -1e30 -def viterbi_decoding(log_probs, y, T, U, device): +def viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device): """ Does Viterbi decoding. Returns: - alignments: list of lists containing locations for the tokens we align to at each timestep. + alignments_batch: list of lists containing locations for the tokens we align to at each timestep. Looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] 
""" - B, T_max, _ = log_probs.shape - U_max = y.shape[1] + B, T_max, _ = log_probs_batch.shape + U_max = y_batch.shape[1] # transfer all tensors to device - log_probs = log_probs.to(device) - y = y.to(device) - T = T.to(device) - U = U.to(device) + log_probs_batch = log_probs_batch.to(viterbi_device) + y_batch = y_batch.to(viterbi_device) + T_batch = T_batch.to(viterbi_device) + U_batch = U_batch.to(viterbi_device) - padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=device) - log_probs_padded = torch.cat((log_probs, padding_for_log_probs), dim=2) - log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y.unsqueeze(1).repeat(1, T_max, 1)) + padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=viterbi_device) + log_probs_padded = torch.cat((log_probs_batch, padding_for_log_probs), dim=2) + log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y_batch.unsqueeze(1).repeat(1, T_max, 1)) v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) backpointers = -999 * torch.ones_like(v_matrix) v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2] - y_shifted_left = torch.roll(y, shifts=2, dims=1) - letter_repetition_mask = y - y_shifted_left + y_shifted_left = torch.roll(y_batch, shifts=2, dims=1) + letter_repetition_mask = y_batch - y_shifted_left letter_repetition_mask[:, :2] = 1 # make sure dont apply mask to first 2 tokens letter_repetition_mask = letter_repetition_mask == 0 - bp_absolute_template = torch.arange(U_max, device=device).unsqueeze(0).repeat(B, 1) + bp_absolute_template = torch.arange(U_max, device=viterbi_device).unsqueeze(0).repeat(B, 1) for t in range(1, T_max): @@ -73,15 +73,15 @@ def viterbi_decoding(log_probs, y, T, U, device): backpointers[:, t, :] = bp_absolute # trace backpointers TODO: parallelize over batch_size - alignments = [] + alignments_batch = [] for b in range(B): - T_b = int(T[b]) - U_b = int(U[b]) + T_b = int(T_batch[b]) + U_b = int(U_batch[b]) final_state = int(torch.argmax(v_matrix[b, T_b - 1, U_b - 2 : U_b])) + U_b - 2 alignment_b = [final_state] for t in range(T_b - 1, 0, -1): alignment_b.insert(0, int(backpointers[b, t, alignment_b[0]])) - alignments.append(alignment_b) + alignments_batch.append(alignment_b) - return alignments + return alignments_batch From cdb187623ca8072a05d5f96ba0a227d5c47b0bf4 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 14:38:31 -0800 Subject: [PATCH 128/154] move constants to separate file Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/constants.py | 19 +++++++++++++++++++ tools/nemo_forced_aligner/utils/data_prep.py | 7 ++----- .../utils/make_output_files.py | 6 ++++-- .../utils/viterbi_decoding.py | 12 ++++++------ 4 files changed, 31 insertions(+), 13 deletions(-) create mode 100644 tools/nemo_forced_aligner/utils/constants.py diff --git a/tools/nemo_forced_aligner/utils/constants.py b/tools/nemo_forced_aligner/utils/constants.py new file mode 100644 index 000000000000..14a2cf3d62e8 --- /dev/null +++ b/tools/nemo_forced_aligner/utils/constants.py @@ -0,0 +1,19 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +BLANK_TOKEN = "" + +SPACE_TOKEN = "" + +V_NEGATIVE_NUM = -1e30 diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index d3766ac08c96..8da494d3eda3 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -18,7 +18,7 @@ import soundfile as sf import torch -V_NEG_NUM = -1e30 +from .constants import BLANK_TOKEN, SPACE_TOKEN, V_NEGATIVE_NUM def get_batch_starts_ends(manifest_filepath, batch_size): @@ -126,7 +126,6 @@ def get_y_token_word_segment_info(text, model, separator): segments = [seg.strip() for seg in segments] BLANK_ID = len(model.decoder.vocabulary) # TODO: check - BLANK_TOKEN = "" y_token_ids = [] y_token_ids_with_blanks = [BLANK_ID] y_tokens = [] @@ -194,9 +193,7 @@ def get_y_token_word_segment_info(text, model, separator): segments = [seg.strip() for seg in segments] BLANK_ID = len(model.decoder.vocabulary) # TODO: check this is correct - BLANK_TOKEN = "" SPACE_ID = model.decoder.vocabulary.index(" ") - SPACE_TOKEN = "" y_token_ids = [] y_token_ids_with_blanks = [BLANK_ID] y_tokens = [] @@ -331,7 +328,7 @@ def get_log_probs_y_T_U(manifest_lines_batch, model, separator, align_using_pred U_batch = torch.tensor(U_list_batch) # make log_probs_batch tensor of shape (B x T_max x V) - log_probs_batch = V_NEG_NUM * torch.ones((B, T_max, V)) + log_probs_batch = V_NEGATIVE_NUM * torch.ones((B, T_max, V)) for b, log_probs_utt in enumerate(log_probs_list_batch): t = log_probs_utt.shape[0] log_probs_batch[b, :t, :] = log_probs_utt diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index c8d1dc8075a0..479295372f1e 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -18,6 +18,8 @@ import soundfile as sf +from .constants import BLANK_TOKEN, SPACE_TOKEN + def _get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id): fp_parts = Path(audio_filepath).parts[-audio_filepath_parts_in_utt_id:] @@ -90,8 +92,6 @@ def make_ctm( """ assert len(boundary_info_batch) == len(alignments_batch) == len(manifest_lines_batch) - BLANK_TOKEN = "" # TODO: move outside of function - os.makedirs(output_dir, exist_ok=True) spectrogram_hop_length = model.preprocessor.featurizer.hop_length @@ -126,6 +126,8 @@ def make_ctm( end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) if not (text == BLANK_TOKEN and remove_blank_tokens_from_ctm): + # replace any spaces with so we dont introduce extra space characters to our CTM files + text = text.replace(" ", SPACE_TOKEN) f_ctm.write(f"{utt_id} 1 {start_time:.2f} {end_time - start_time:.2f} {text}\n") return None diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py index 55228b1e708f..3ed1f147ef2d 100644 --- a/tools/nemo_forced_aligner/utils/viterbi_decoding.py +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -14,7 +14,7 @@ import torch -V_NEG_NUM = -1e30 +from .constants import V_NEGATIVE_NUM def 
viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device): @@ -33,11 +33,11 @@ def viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device) T_batch = T_batch.to(viterbi_device) U_batch = U_batch.to(viterbi_device) - padding_for_log_probs = V_NEG_NUM * torch.ones((B, T_max, 1), device=viterbi_device) + padding_for_log_probs = V_NEGATIVE_NUM * torch.ones((B, T_max, 1), device=viterbi_device) log_probs_padded = torch.cat((log_probs_batch, padding_for_log_probs), dim=2) log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y_batch.unsqueeze(1).repeat(1, T_max, 1)) - v_matrix = V_NEG_NUM * torch.ones_like(log_probs_reordered) + v_matrix = V_NEGATIVE_NUM * torch.ones_like(log_probs_reordered) backpointers = -999 * torch.ones_like(v_matrix) v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2] @@ -54,11 +54,11 @@ def viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device) v_prev = v_matrix[:, t - 1, :] v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1) - v_prev_shifted[:, 0] = V_NEG_NUM + v_prev_shifted[:, 0] = V_NEGATIVE_NUM v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1) - v_prev_shifted2[:, :2] = V_NEG_NUM - v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEG_NUM) + v_prev_shifted2[:, :2] = V_NEGATIVE_NUM + v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEGATIVE_NUM) v_prev_dup = torch.cat( (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2, From 20eac8e77a14f98b0ead22146959c1dc9e61aab4 Mon Sep 17 00:00:00 2001 From: Sandeep Subramanian Date: Tue, 10 Jan 2023 14:57:43 -0800 Subject: [PATCH 129/154] Add extra data args to support proper finetuning of HF converted T5 checkpoints (#5719) * Initial addition of extra args Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change defaults Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../conf/megatron_t0_config.yaml | 2 +- .../megatron_t5_config_finetune_eval.yaml | 4 ++++ ...megatron_t5_config_finetune_glue_eval.yaml | 3 +++ ...megatron_t5_config_finetune_glue_mnli.yaml | 7 +++++++ ...megatron_t5_config_finetune_glue_xnli.yaml | 12 ++++++++++++ .../conf/megatron_t5_finetune.yaml | 6 ++++++ .../common/sequence_to_sequence_dataset.py | 19 +++++++++++++------ .../megatron_finetune_model.py | 6 ++++++ 8 files changed, 52 insertions(+), 7 deletions(-) diff --git a/examples/nlp/language_modeling/conf/megatron_t0_config.yaml b/examples/nlp/language_modeling/conf/megatron_t0_config.yaml index 503dc17d2acc..d6ecb146b4cb 100644 --- a/examples/nlp/language_modeling/conf/megatron_t0_config.yaml +++ b/examples/nlp/language_modeling/conf/megatron_t0_config.yaml @@ -62,7 +62,7 @@ model: max_tgt_seq_length: 512 drop_last: True concat_sampling_probabilities: ??? # When providing a list of datasets, this arg defines the sampling probabilities from each dataset when strategy='random' - replace_bos_with_pad: True # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. + replace_bos_with_pad: False # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. add_bos_to_input: False # Adds bos to the input sequence. add_eos_to_input: False # Adds eos to the input sequence. 
seed: 1234 diff --git a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_eval.yaml b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_eval.yaml index bc1a7420df48..95d33113e3cf 100644 --- a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_eval.yaml +++ b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_eval.yaml @@ -40,6 +40,10 @@ model: drop_last: False # TODO: Figure out if there is a way to avoid dropping last. write_predictions_to_file: False output_file_path_prefix: null # Prefix of the file to write predictions to. + replace_bos_with_pad: False # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. + add_bos_to_input: False # Adds bos to the input sequence. + add_eos_to_input: False # Adds eos to the input sequence. + metric: name: "exact_string_match" # Name of the evaluation metric to use. average: micro # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. diff --git a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_eval.yaml b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_eval.yaml index 024ad5f66ae9..59dddda07f19 100644 --- a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_eval.yaml +++ b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_eval.yaml @@ -41,6 +41,9 @@ model: drop_last: False write_predictions_to_file: False output_file_path_prefix: null # Prefix of the file to write predictions to. + replace_bos_with_pad: False # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. + add_bos_to_input: False # Adds bos to the input sequence. + add_eos_to_input: False # Adds eos to the input sequence. metric: name: "exact_string_match" # Name of the evaluation metric to use. average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. diff --git a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_mnli.yaml b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_mnli.yaml index ff61c5fde20c..e7098e974211 100644 --- a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_mnli.yaml +++ b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_mnli.yaml @@ -62,6 +62,10 @@ model: pin_memory: True max_seq_length: 512 drop_last: True + replace_bos_with_pad: False # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. + add_bos_to_input: False # Adds bos to the input sequence. + add_eos_to_input: False # Adds eos to the input sequence. + validation_ds: task_name: 'mnli' @@ -75,6 +79,9 @@ model: drop_last: False # TODO: Figure out if there is a way to avoid dropping last. write_predictions_to_file: False output_file_path_prefix: null # Prefix of the file to write predictions to. + replace_bos_with_pad: ${data.train_ds.replace_bos_with_pad} + add_bos_to_input: ${data.train_ds.add_bos_to_input} + add_eos_to_input: ${data.train_ds.add_eos_to_input} metric: name: "exact_string_match" # Name of the evaluation metric to use. average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. 
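
Note on the three new data flags added in the configs above: the sketch below illustrates how add_bos_to_input, add_eos_to_input and replace_bos_with_pad are meant to combine on the source side, mirroring the SequenceToSequenceDataset change later in this patch. This is a minimal illustration, not the actual NeMo dataset code; DummyTokenizer and its token ids are made-up placeholders.

# Minimal sketch of source-side tokenization under the new flags.
# DummyTokenizer and its ids are hypothetical stand-ins, not the real NeMo tokenizer API.
class DummyTokenizer:
    bos_id, eos_id, pad_id = 1, 2, 0

    def text_to_ids(self, text):
        # pretend every whitespace-separated word maps to one id
        return [10 + i for i, _ in enumerate(text.split())]


def build_src_ids(tokenizer, line, add_bos_to_input=False, add_eos_to_input=False, replace_bos_with_pad=False):
    src = tokenizer.text_to_ids(line.strip())
    if add_bos_to_input:
        # Google's T5 checkpoints expect pad in place of bos, hence replace_bos_with_pad
        src = [tokenizer.pad_id if replace_bos_with_pad else tokenizer.bos_id] + src
    if add_eos_to_input:
        src = src + [tokenizer.eos_id]
    return src


print(build_src_ids(DummyTokenizer(), "premise hypothesis", add_bos_to_input=True, replace_bos_with_pad=True))
# prints [0, 10, 11]: pad id in the bos position, no eos appended
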
diff --git a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_xnli.yaml b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_xnli.yaml index 486a6da14135..ad625e405976 100644 --- a/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_xnli.yaml +++ b/examples/nlp/language_modeling/conf/megatron_t5_config_finetune_glue_xnli.yaml @@ -63,6 +63,10 @@ model: pin_memory: True max_seq_length: 512 drop_last: True + replace_bos_with_pad: False # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. + add_bos_to_input: True # Adds bos to the input sequence. + add_eos_to_input: True # Adds eos to the input sequence. + validation_ds: task_name: 'xnli' @@ -76,6 +80,10 @@ model: drop_last: False write_predictions_to_file: False prediction_file_path_prefix: null # Prefix of the file to write predictions to. + replace_bos_with_pad: ${data.train_ds.replace_bos_with_pad} + add_bos_to_input: ${data.train_ds.add_bos_to_input} + add_eos_to_input: ${data.train_ds.add_eos_to_input} + metric: name: "exact_string_match" # Name of the evaluation metric to use. average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. @@ -93,6 +101,10 @@ model: drop_last: False write_predictions_to_file: False output_file_path_prefix: null # Prefix of the file to write predictions to. + replace_bos_with_pad: ${data.train_ds.replace_bos_with_pad} + add_bos_to_input: ${data.train_ds.add_bos_to_input} + add_eos_to_input: ${data.train_ds.add_eos_to_input} + metric: name: "exact_string_match" # Name of the evaluation metric to use. average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. diff --git a/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml b/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml index 8c383aad9c78..eb2dc81bd4b9 100644 --- a/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml +++ b/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml @@ -65,6 +65,9 @@ model: concat_sampling_technique: temperature # When providing a list of datasets, this arg defines the sampling strategy. Options: ['temperature', 'random'] concat_sampling_temperature: 5 # When providing a list of datasets, this arg defines the sampling temperature when strategy='temperature' concat_sampling_probabilities: null # When providing a list of datasets, this arg defines the sampling probabilities from each dataset when strategy='random' + replace_bos_with_pad: False # Replaces bos with pad for both the encoder and decoder. This is necessary when using Google's T5 checkpoints. + add_bos_to_input: False # Adds bos to the input sequence. + add_eos_to_input: False # Adds eos to the input sequence. validation_ds: src_file_name: ??? # Path to the txt file corresponding to the source data. @@ -80,6 +83,9 @@ model: drop_last: False # TODO: Figure out if there is a way to avoid dropping last. write_predictions_to_file: False output_file_path_prefix: null # Prefix of the file to write predictions to. + replace_bos_with_pad: ${data.train_ds.replace_bos_with_pad} + add_bos_to_input: ${data.train_ds.add_bos_to_input} + add_eos_to_input: ${data.train_ds.add_eos_to_input} metric: name: "exact_string_match" # Name of the evaluation metric to use. 
average: micro # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. diff --git a/nemo/collections/nlp/data/common/sequence_to_sequence_dataset.py b/nemo/collections/nlp/data/common/sequence_to_sequence_dataset.py index 9e75c681dc89..d3106e7c9410 100644 --- a/nemo/collections/nlp/data/common/sequence_to_sequence_dataset.py +++ b/nemo/collections/nlp/data/common/sequence_to_sequence_dataset.py @@ -40,6 +40,9 @@ def __init__( tgt_tokenizer: TokenizerSpec, max_src_seq_length: int, max_tgt_seq_length: int, + add_bos_to_input: bool = True, + add_eos_to_input: bool = True, + replace_bos_with_pad: bool = False, ): super().__init__() self.src_file_name = src_file_name @@ -48,6 +51,9 @@ def __init__( self.tgt_tokenizer = tgt_tokenizer self.max_src_seq_length = max_src_seq_length self.max_tgt_seq_length = max_tgt_seq_length + self.add_bos_to_input = add_bos_to_input + self.add_eos_to_input = add_eos_to_input + self.replace_bos_with_pad = replace_bos_with_pad assert self.max_src_seq_length > 0 assert self.max_tgt_seq_length > 0 self._check_files_exist() @@ -75,13 +81,14 @@ def _get_examples(self): for i, (src, tgt) in enumerate(zip(f_src, f_tgt)): if i % 10000 == 0 and i != 0: logging.info(f"Read {i} lines from {self.src_file_name} & {self.tgt_file_name}") - src = ( - [self.src_tokenizer.bos_id] - + self.src_tokenizer.text_to_ids(src.strip()) - + [self.src_tokenizer.eos_id] - ) + src = self.src_tokenizer.text_to_ids(src.strip()) + if self.add_bos_to_input: + src = [self.src_tokenizer.pad_id if self.replace_bos_with_pad else self.src_tokenizer.bos_id] + src + if self.add_eos_to_input: + src = src + [self.src_tokenizer.eos_id] + tgt = ( - [self.tgt_tokenizer.bos_id] + [self.tgt_tokenizer.pad_id if self.replace_bos_with_pad else self.tgt_tokenizer.bos_id] + self.tgt_tokenizer.text_to_ids(tgt.strip()) + [self.tgt_tokenizer.eos_id] ) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py b/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py index e445bf7f4482..fd61b41cae7d 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py @@ -598,6 +598,9 @@ def _build_train_dataset(self, data_cfg): tgt_tokenizer=self.tokenizer, max_src_seq_length=data_cfg.max_src_seq_length, max_tgt_seq_length=data_cfg.max_tgt_seq_length, + add_bos_to_input=data_cfg.get('add_bos_to_input', True), + add_eos_to_input=data_cfg.get('add_eos_to_input', True), + replace_bos_with_pad=data_cfg.get('replace_bos_with_pad', False), ) datasets.append(dataset) @@ -650,6 +653,9 @@ def _build_eval_dataset(self, data_cfg): tgt_tokenizer=self.tokenizer, max_src_seq_length=data_cfg.max_src_seq_length, max_tgt_seq_length=data_cfg.max_tgt_seq_length, + add_bos_to_input=data_cfg.get('add_bos_to_input', True), + add_eos_to_input=data_cfg.get('add_eos_to_input', True), + replace_bos_with_pad=data_cfg.get('replace_bos_with_pad', False), ) datasets.append(dataset) From 94d4c50dacce6aecc3ee0d40d57b3a88e546e1da Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 15:16:41 -0800 Subject: [PATCH 130/154] Rename some functions Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 4 ++-- tools/nemo_forced_aligner/utils/data_prep.py | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py 
b/tools/nemo_forced_aligner/align.py index b86cf566a565..93a56a23ff64 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -21,7 +21,7 @@ from utils.data_prep import ( get_audio_sr, get_batch_starts_ends, - get_log_probs_y_T_U, + get_batch_tensors_and_boundary_info, get_manifest_lines_batch, is_entry_in_all_lines, is_entry_in_any_lines, @@ -200,7 +200,7 @@ def main(cfg: AlignmentConfig): word_info_batch, segment_info_batch, pred_text_batch, - ) = get_log_probs_y_T_U( + ) = get_batch_tensors_and_boundary_info( manifest_lines_batch, model, cfg.additional_ctm_grouping_separator, cfg.align_using_pred_text, ) diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 8da494d3eda3..67241f832a60 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -110,7 +110,7 @@ def get_char_tokens(text, model): return tokens -def get_y_token_word_segment_info(text, model, separator): +def get_y_and_boundary_info_for_utt(text, model, separator): """ Get y, token_info, word_info and segment_info for the text provided, tokenized by the model provided. """ @@ -277,7 +277,7 @@ def get_y_token_word_segment_info(text, model, separator): raise RuntimeError("Cannot get tokens of this model.") -def get_log_probs_y_T_U(manifest_lines_batch, model, separator, align_using_pred_text): +def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, align_using_pred_text): """ Preparing some tensors to be used during Viterbi decoding. Returns: @@ -311,7 +311,7 @@ def get_log_probs_y_T_U(manifest_lines_batch, model, separator, align_using_pred gt_text_for_alignment = pred_text_batch[i_line] else: gt_text_for_alignment = line["text"] - y_utt, token_info_utt, word_info_utt, segment_info_utt = get_y_token_word_segment_info( + y_utt, token_info_utt, word_info_utt, segment_info_utt = get_y_and_boundary_info_for_utt( gt_text_for_alignment, model, separator ) From 26bf464b3bf4e19c188b48a7906599d614c0db31 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 10 Jan 2023 15:39:59 -0800 Subject: [PATCH 131/154] update year Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 2 +- tools/nemo_forced_aligner/utils/constants.py | 2 +- tools/nemo_forced_aligner/utils/data_prep.py | 2 +- tools/nemo_forced_aligner/utils/make_output_files.py | 2 +- tools/nemo_forced_aligner/utils/viterbi_decoding.py | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 93a56a23ff64..048d423c74ee 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/tools/nemo_forced_aligner/utils/constants.py b/tools/nemo_forced_aligner/utils/constants.py index 14a2cf3d62e8..894f880401cb 100644 --- a/tools/nemo_forced_aligner/utils/constants.py +++ b/tools/nemo_forced_aligner/utils/constants.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
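
For reference while reading the forced aligner refactoring above: a small usage sketch of the _get_utt_id helper shown earlier in this patch series, which derives the CTM file name from the last audio_filepath_parts_in_utt_id parts of the audio path. The example filepath and the value 2 are made up for illustration.

from pathlib import Path

# Same logic as the _get_utt_id helper in utils/make_output_files.py, copied here for illustration.
def get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id):
    fp_parts = Path(audio_filepath).parts[-audio_filepath_parts_in_utt_id:]
    utt_id = Path("_".join(fp_parts)).stem
    return utt_id.replace(" ", "-")  # spaces are replaced so the space-delimited CTM lines stay parseable

# Hypothetical input: keep the last 2 path parts in the utterance id.
print(get_utt_id("/data/my corpus/utt 01.wav", 2))  # prints my-corpus_utt-01
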
diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 67241f832a60..e5176a862ce7 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index 479295372f1e..aede6564fa63 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py index 3ed1f147ef2d..a5599b267fbf 100644 --- a/tools/nemo_forced_aligner/utils/viterbi_decoding.py +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. From d928171cf242f5db00e2607932216e3f1c85449c Mon Sep 17 00:00:00 2001 From: Boris Fomitchev Date: Tue, 10 Jan 2023 17:38:45 -0800 Subject: [PATCH 132/154] No-script TS export, prepared for ONNX export (#5653) * Changed unfold to reshape, merged padding chenges * Almost working ONNX export of RadTTS * restored radtts function * Added explicit assume_padded flag * Fixing attn_mask * Fixing unfold * Trying no hx * Back with hx * Made fx only for tracing * Tests annotated * Fully working no-script TS export, prepared for ONNX export * Restored no-autocast block, addressed code review * Fine-tuning autocast option * Protecting InstanceNorm * Forcing eval and param freeze on export Signed-off-by: Boris Fomitchev Signed-off-by: Elena Rastorgueva --- Dockerfile | 1 - nemo/collections/tts/helpers/helpers.py | 36 ++++++-- nemo/collections/tts/losses/radttsloss.py | 2 +- nemo/collections/tts/losses/tacotron2loss.py | 3 +- nemo/collections/tts/models/aligner.py | 2 +- .../tts/modules/attribute_prediction_model.py | 2 +- nemo/collections/tts/modules/common.py | 69 +++++---------- nemo/collections/tts/modules/radtts.py | 86 ++++++++++--------- nemo/core/classes/exportable.py | 28 +++--- nemo/utils/__init__.py | 1 + nemo/utils/cast_utils.py | 12 +++ nemo/utils/export_utils.py | 28 +++--- tests/collections/tts/test_tts_exportables.py | 22 ++++- 13 files changed, 159 insertions(+), 133 deletions(-) diff --git a/Dockerfile b/Dockerfile index e7cd3e944523..8c9624144abd 100644 --- a/Dockerfile +++ b/Dockerfile @@ -16,7 +16,6 @@ ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:22.12-py3 - # build an image that includes only the nemo dependencies, ensures that dependencies # are included first for optimal caching, and useful for building a development # image (by specifying build target as `nemo-deps`) diff --git a/nemo/collections/tts/helpers/helpers.py b/nemo/collections/tts/helpers/helpers.py index 
eba32b1a00eb..1adb574955e9 100644 --- a/nemo/collections/tts/helpers/helpers.py +++ b/nemo/collections/tts/helpers/helpers.py @@ -43,7 +43,7 @@ # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. from enum import Enum -from typing import Dict, Optional, Sequence +from typing import Optional, Tuple import librosa import matplotlib.pylab as plt @@ -133,14 +133,38 @@ def binarize_attention_parallel(attn, in_lens, out_lens): return torch.from_numpy(attn_out).to(attn.device) -def get_mask_from_lengths(lengths, max_len: Optional[int] = None): - if max_len is None: - max_len = lengths.max() - ids = torch.arange(0, max_len, device=lengths.device, dtype=torch.long) - mask = (ids < lengths.unsqueeze(1)).bool() +def get_mask_from_lengths(lengths: Optional[torch.Tensor] = None, x: Optional[torch.Tensor] = None,) -> torch.Tensor: + """Constructs binary mask from a 1D torch tensor of input lengths + + Args: + lengths: Optional[torch.tensor] (torch.tensor): 1D tensor with lengths + x: Optional[torch.tensor] = tensor to be used on, last dimension is for mask + Returns: + mask (torch.tensor): num_sequences x max_length x 1 binary tensor + """ + if lengths is None: + assert x is not None + return torch.ones(x.shape[-1], dtype=torch.bool, device=x.device) + else: + if x is None: + max_len = torch.max(lengths) + else: + max_len = x.shape[-1] + ids = torch.arange(0, max_len, device=lengths.device, dtype=lengths.dtype) + mask = ids < lengths.unsqueeze(1) return mask +@torch.jit.script +def sort_tensor(context: torch.Tensor, lens: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: + lens_sorted, ids_sorted = torch.sort(lens, descending=True) + unsort_ids = torch.zeros_like(ids_sorted) + for i in range(ids_sorted.shape[0]): + unsort_ids[ids_sorted[i]] = i + context = context[ids_sorted] + return context, lens_sorted, unsort_ids + + @jit(nopython=True) def mas(attn_map, width=1): # assumes mel x text diff --git a/nemo/collections/tts/losses/radttsloss.py b/nemo/collections/tts/losses/radttsloss.py index 3c9c6b7ea2af..66aa982c6bd4 100644 --- a/nemo/collections/tts/losses/radttsloss.py +++ b/nemo/collections/tts/losses/radttsloss.py @@ -15,8 +15,8 @@ import torch from torch.nn import functional as F +from nemo.collections.tts.helpers.helpers import get_mask_from_lengths from nemo.collections.tts.losses.aligner_loss import ForwardSumLoss -from nemo.collections.tts.modules.common import get_mask_from_lengths from nemo.core.classes import Loss diff --git a/nemo/collections/tts/losses/tacotron2loss.py b/nemo/collections/tts/losses/tacotron2loss.py index 2645aa7cb82a..2e1350e4e47d 100644 --- a/nemo/collections/tts/losses/tacotron2loss.py +++ b/nemo/collections/tts/losses/tacotron2loss.py @@ -68,9 +68,8 @@ def forward(self, *, spec_pred_dec, spec_pred_postnet, gate_pred, spec_target, s spec_pred_dec = torch.nn.functional.pad(spec_pred_dec, (0, pad_amount), value=pad_value) spec_pred_postnet = torch.nn.functional.pad(spec_pred_postnet, (0, pad_amount), value=pad_value) gate_pred = torch.nn.functional.pad(gate_pred, (0, pad_amount), value=1e3) - max_len = spec_pred_dec.shape[2] - mask = ~get_mask_from_lengths(spec_target_len, max_len=max_len) + mask = ~get_mask_from_lengths(spec_target_len, spec_pred_dec) mask = mask.expand(spec_target.shape[1], mask.size(0), mask.size(1)) mask = mask.permute(1, 0, 2) spec_pred_dec.data.masked_fill_(mask, pad_value) diff --git a/nemo/collections/tts/models/aligner.py b/nemo/collections/tts/models/aligner.py index c068060acd40..66705d516e0d 100644 --- 
a/nemo/collections/tts/models/aligner.py +++ b/nemo/collections/tts/models/aligner.py @@ -117,7 +117,7 @@ def forward(self, *, spec, spec_len, text, text_len, attn_prior=None): attn_soft, attn_logprob = self.alignment_encoder( queries=spec, keys=self.embed(text).transpose(1, 2), - mask=get_mask_from_lengths(text_len).unsqueeze(-1) == 0, + mask=get_mask_from_lengths(text_len).unsqueeze(-1), attn_prior=attn_prior, ) diff --git a/nemo/collections/tts/modules/attribute_prediction_model.py b/nemo/collections/tts/modules/attribute_prediction_model.py index d7aea9e40b9b..1dc952373d89 100644 --- a/nemo/collections/tts/modules/attribute_prediction_model.py +++ b/nemo/collections/tts/modules/attribute_prediction_model.py @@ -92,7 +92,7 @@ def forward(self, txt_enc, spk_emb, x, lens): outputs = {'x_hat': x_hat, 'x': x} return outputs - def infer(self, z, txt_enc, spk_emb, lens=None): + def infer(self, txt_enc, spk_emb, lens=None): x_hat = self.forward(txt_enc, spk_emb, x=None, lens=lens)['x_hat'] x_hat = self.denormalize(x_hat) return x_hat diff --git a/nemo/collections/tts/modules/common.py b/nemo/collections/tts/modules/common.py index 63c28f12a4a7..0c3dbed24e84 100644 --- a/nemo/collections/tts/modules/common.py +++ b/nemo/collections/tts/modules/common.py @@ -24,6 +24,7 @@ from torch.nn import functional as F from torch.nn.utils.rnn import PackedSequence +from nemo.collections.tts.helpers.helpers import get_mask_from_lengths, sort_tensor from nemo.collections.tts.helpers.splines import ( piecewise_linear_inverse_transform, piecewise_linear_transform, @@ -32,36 +33,6 @@ from nemo.collections.tts.modules.submodules import ConvNorm, LinearNorm, MaskedInstanceNorm1d -@torch.jit.script -def get_mask_from_lengths_and_val(lengths, val): - """Constructs binary mask from a 1D torch tensor of input lengths - - Args: - lengths (torch.tensor): 1D tensor - Returns: - mask (torch.tensor): num_sequences x max_length x 1 binary tensor - """ - max_len = val.shape[-1] - ids = torch.arange(0, max_len, device=lengths.device) - mask = ids < lengths.unsqueeze(1) - return mask - - -@torch.jit.script -def get_mask_from_lengths(lengths): - """Constructs binary mask from a 1D torch tensor of input lengths - - Args: - lengths (torch.tensor): 1D tensor - Returns: - mask (torch.tensor): num_sequences x max_length x 1 binary tensor - """ - max_len = torch.max(lengths) - ids = torch.arange(0, max_len, device=lengths.device) - mask = ids < lengths.unsqueeze(1) - return mask - - @torch.jit.script def fused_add_tanh_sigmoid_multiply(input_a, input_b): t_act = torch.tanh(input_a) @@ -92,18 +63,8 @@ def forward(self, x): return x -@torch.jit.script -def sort_tensor(context: Tensor, lens: Tensor) -> Tuple[Tensor, Tensor, Tensor]: - lens_sorted, ids_sorted = torch.sort(lens, descending=True) - unsort_ids = torch.zeros_like(ids_sorted) - for i in range(ids_sorted.shape[0]): - unsort_ids[ids_sorted[i]] = i - context = context[ids_sorted] - return context, lens_sorted, unsort_ids - - class BiLSTM(nn.Module): - def __init__(self, input_size, hidden_size, num_layers=1, lstm_norm_fn="spectral"): + def __init__(self, input_size, hidden_size, num_layers=1, lstm_norm_fn="spectral", max_batch_size=64): super().__init__() self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True) if lstm_norm_fn is not None: @@ -116,6 +77,9 @@ def __init__(self, input_size, hidden_size, num_layers=1, lstm_norm_fn="spectral lstm_norm_fn_pntr(self.bilstm, 'weight_hh_l0') lstm_norm_fn_pntr(self.bilstm, 'weight_hh_l0_reverse') 
+ + self.real_hidden_size: int = self.bilstm.proj_size if self.bilstm.proj_size > 0 else self.bilstm.hidden_size + self.bilstm.flatten_parameters() def lstm_tensor(self, context: Tensor, lens: Tensor, enforce_sorted: bool = False) -> Tuple[Tensor, Tensor]: @@ -127,16 +91,25 @@ def lstm_tensor(self, context: Tensor, lens: Tensor, enforce_sorted: bool = Fals def lstm_sequence(self, seq: PackedSequence) -> Tuple[Tensor, Tensor]: if not (torch.jit.is_scripting() or torch.jit.is_tracing()): self.bilstm.flatten_parameters() - ret, _ = self.bilstm(seq) + ret, _ = self.bilstm(seq) + else: + # Calculate sizes and prepare views to our zero buffer to pass as hx + max_batch_size = seq.batch_sizes[0] + common_shape = (self.bilstm.num_layers * 2, max_batch_size) + hx = ( + seq.data.new_zeros(*common_shape, self.real_hidden_size), + seq.data.new_zeros(*common_shape, self.bilstm.hidden_size), + ) + ret, _ = self.bilstm(seq, hx) return nn.utils.rnn.pad_packed_sequence(ret, batch_first=True) def forward(self, context: Tensor, lens: Tensor) -> Tensor: - context, lens_sorted, unsort_ids = sort_tensor(context, lens) + context, lens, unsort_ids = sort_tensor(context, lens) dtype = context.dtype # this is only needed for Torchscript to run in Triton # (https://github.com/pytorch/pytorch/issues/89241) with torch.cuda.amp.autocast(enabled=False): - ret = self.lstm_tensor(context.to(dtype=torch.float32), lens_sorted, enforce_sorted=True) + ret = self.lstm_tensor(context.to(dtype=torch.float32), lens, enforce_sorted=True) return ret[0].to(dtype=dtype)[unsort_ids] @@ -185,7 +158,7 @@ def __init__( self.dense = nn.Linear(n_channels, out_dim) def forward(self, context: Tensor, lens: Tensor) -> Tensor: - mask = get_mask_from_lengths_and_val(lens, context) + mask = get_mask_from_lengths(lens, context) mask = mask.to(dtype=context.dtype).unsqueeze(1) for conv in self.convolutions: context = self.dropout(F.relu(conv(context, mask))) @@ -341,10 +314,8 @@ def forward(self, z_w_context, seq_lens: Optional[Tensor] = None): # seq_lens: tensor array of sequence sequence lengths # output should be b x n_mel_channels x z_w_context.shape(2) - if seq_lens is not None: - mask = get_mask_from_lengths_and_val(seq_lens, z_w_context).unsqueeze(1).float() - else: - mask = torch.ones_like(z_w_context) + mask = get_mask_from_lengths(seq_lens, z_w_context).unsqueeze(1).to(dtype=z_w_context.dtype) + for i in range(self.n_layers): z_w_context = self.layers[i](z_w_context, mask) z_w_context = torch.relu(z_w_context) diff --git a/nemo/collections/tts/modules/radtts.py b/nemo/collections/tts/modules/radtts.py index 7bb3f2882334..63bb3bce6b92 100644 --- a/nemo/collections/tts/modules/radtts.py +++ b/nemo/collections/tts/modules/radtts.py @@ -18,6 +18,7 @@ import torch.nn.functional as F from torch import nn +from nemo.collections.tts.helpers.helpers import get_mask_from_lengths from nemo.collections.tts.helpers.helpers import mas_width1 as mas from nemo.collections.tts.helpers.helpers import regulate_len from nemo.collections.tts.modules.attribute_prediction_model import get_attribute_prediction_model @@ -29,7 +30,6 @@ Invertible1x1Conv, Invertible1x1ConvLUS, LinearNorm, - get_mask_from_lengths, get_radtts_encoder, ) from nemo.core.classes import Exportable, NeuralModule @@ -224,7 +224,7 @@ def __init__( 'padding': 0, 'dilation': 1, } - self.unfold = nn.Unfold(**self.unfold_params) + self.unfold_mod = nn.Unfold(**self.unfold_params) self.exit_steps = [] self.n_early_size = n_early_size @@ -325,14 +325,14 @@ def encode_text(self, text, in_lens): 
text_enc = self.encoder(text_embeddings, in_lens).transpose(1, 2) return text_enc, text_embeddings - def preprocess_context(self, context, speaker_vecs, out_lens, f0, energy_avg): + def preprocess_context(self, context, speaker_vecs, out_lens, f0, energy_avg, assume_padded=False): if self.n_group_size > 1: - context = self.unfold(context.unsqueeze(-1)) + context = self.unfold(context, assume_padded=assume_padded) if f0 is not None: - f0 = self.unfold(f0[:, None, :, None]) + f0 = self.unfold(f0[:, None, :], assume_padded=assume_padded) if energy_avg is not None: - energy_avg = self.unfold(energy_avg[:, None, :, None]) + energy_avg = self.unfold(energy_avg[:, None, :], assume_padded=assume_padded) speaker_vecs = speaker_vecs[..., None].expand(-1, -1, context.shape[2]) context_w_spkvec = torch.cat((context, speaker_vecs), 1) @@ -358,17 +358,30 @@ def preprocess_context(self, context, speaker_vecs, out_lens, f0, energy_avg): return context_w_spkvec def fold(self, mel): - """Inverse of the self.unfold(mel.unsqueeze(-1)) operation used for the - grouping or "squeeze" operation on input + """Inverse of the self.unfold() operation used for the + grouping or "squeeze" operation on input - Args: - mel: B x C x T tensor of temporal data - """ + Args: + mel: B x C x T tensor of temporal data + """ + b, d, t = mel.shape + mel = mel.reshape(b, -1, self.n_group_size, t).transpose(2, 3) + return mel.reshape(b, -1, t * self.n_group_size) - mel = nn.functional.fold(mel, output_size=(mel.shape[2] * self.n_group_size, 1), **self.unfold_params).squeeze( - -1 - ) - return mel + def unfold(self, mel, assume_padded=False): + """operation used for the + grouping or "squeeze" operation on input + + Args: + mel: B x C x T tensor of temporal data + """ + # for inference, mel is being padded beforehand + if assume_padded: + b, d, t = mel.shape + mel = mel.reshape(b, d, -1, self.n_group_size).transpose(2, 3) + return mel.reshape(b, d * self.n_group_size, -1) + else: + return self.unfold_mod(mel.unsqueeze(-1)) def binarize_attention(self, attn, in_lens, out_lens): """For training purposes only. Binarizes attention with MAS. 
These will @@ -435,7 +448,7 @@ def forward( attn_hard = None if 'atn' in self.include_modules or 'dec' in self.include_modules: # make sure to do the alignments before folding - attn_mask = get_mask_from_lengths(in_lens)[..., None] == 0 + attn_mask = ~get_mask_from_lengths(in_lens)[..., None] # attn_mask shld be 1 for unsd t-steps in text_enc_w_spkvec tensor attn_soft, attn_logprob = self.attention( mel, text_embeddings, out_lens, attn_mask, key_lens=in_lens, attn_prior=attn_prior @@ -463,7 +476,7 @@ def forward( # might truncate some frames at the end, but that's ok # sometimes referred to as the "squeeze" operation # invert this by calling self.fold(mel_or_z) - mel = self.unfold(mel.unsqueeze(-1)) + mel = self.unfold(mel) # where context is folded # mask f0 in case values are interpolated context_w_spkvec = self.preprocess_context( @@ -589,7 +602,6 @@ def infer( ): batch_size = text.shape[0] - n_tokens = text.shape[1] spk_vec = self.encode_speaker(speaker_id) if speaker_id_text is None: speaker_id_text = speaker_id @@ -601,9 +613,7 @@ def infer( if dur is None: # get token durations - z_dur = txt_enc.new_zeros((batch_size, 1, n_tokens), dtype=torch.float) - z_dur = torch.normal(z_dur) * sigma_txt - dur = self.dur_pred_layer.infer(z_dur, txt_enc, spk_vec_text, lens=in_lens) + dur = self.dur_pred_layer.infer(txt_enc, spk_vec_text, lens=in_lens) dur = pad_dur(dur, txt_enc) dur = dur[:, 0] dur = dur.clamp(0, token_duration_max) @@ -623,9 +633,11 @@ def infer( if voiced_mask is None: if self.use_vpred_module: # get logits - voiced_mask = self.v_pred_module.infer(None, txt_enc_time_expanded, spk_vec_attributes, lens=out_lens) + voiced_mask = self.v_pred_module.infer(txt_enc_time_expanded, spk_vec_attributes, lens=out_lens) voiced_mask_bool = torch.sigmoid(voiced_mask[:, 0]) > 0.5 voiced_mask = voiced_mask_bool.to(dur.dtype) + else: + voiced_mask_bool = None else: voiced_mask_bool = voiced_mask.bool() @@ -641,18 +653,12 @@ def infer( f0_bias = -f0_bias[..., 0] if f0 is None: - n_f0_feature_channels = 2 if self.use_first_order_features else 1 - z_f0 = torch.normal(txt_enc.new_zeros(batch_size, n_f0_feature_channels, max_out_len)) * sigma_f0 - f0 = self.infer_f0(z_f0, ap_txt_enc_time_expanded, spk_vec_attributes, voiced_mask_bool, out_lens)[:, 0] + f0 = self.infer_f0(ap_txt_enc_time_expanded, spk_vec_attributes, voiced_mask_bool, out_lens)[:, 0] f0 = adjust_f0(f0, f0_mean, f0_std, voiced_mask_bool) if energy_avg is None: - n_energy_feature_channels = 2 if self.use_first_order_features else 1 - z_energy_avg = ( - torch.normal(txt_enc.new_zeros(batch_size, n_energy_feature_channels, max_out_len)) * sigma_energy - ) - energy_avg = self.infer_energy(z_energy_avg, ap_txt_enc_time_expanded, spk_vec, out_lens)[:, 0] + energy_avg = self.infer_energy(ap_txt_enc_time_expanded, spk_vec, out_lens)[:, 0] # replication pad, because ungrouping with different group sizes # may lead to mismatched lengths @@ -660,10 +666,12 @@ def infer( (energy_avg, f0) = pad_energy_avg_and_f0(energy_avg, f0, max_out_len) context_w_spkvec = self.preprocess_context( - txt_enc_time_expanded, spk_vec, out_lens, (f0 + f0_bias) * voiced_mask, energy_avg + txt_enc_time_expanded, spk_vec, out_lens, (f0 + f0_bias) * voiced_mask, energy_avg, assume_padded=True ) - residual = torch.normal(txt_enc.new_zeros(batch_size, 80 * self.n_group_size, torch.max(n_groups))) * sigma + residual = txt_enc.new_zeros(batch_size, 80 * self.n_group_size, torch.max(n_groups)) + if sigma > 0.0: + residual = torch.normal(residual) * sigma # map from z sample 
to data num_steps_to_exit = len(self.exit_steps) @@ -683,10 +691,10 @@ def infer( if self.n_group_size > 1: mel = self.fold(mel) - return {'mel': mel, 'out_lens': out_lens, 'dur': dur, 'f0': f0, 'energy_avg': energy_avg} + return mel, out_lens, dur, f0, energy_avg - def infer_f0(self, residual, txt_enc_time_expanded, spk_vec, voiced_mask=None, lens=None): - f0 = self.f0_pred_module.infer(residual, txt_enc_time_expanded, spk_vec, lens) + def infer_f0(self, txt_enc_time_expanded, spk_vec, voiced_mask=None, lens=None): + f0 = self.f0_pred_module.infer(txt_enc_time_expanded, spk_vec, lens) # constants if self.ap_pred_log_f0: @@ -713,8 +721,8 @@ def infer_f0(self, residual, txt_enc_time_expanded, spk_vec, voiced_mask=None, l f0.masked_fill_(~voiced_mask, 0.0) return f0 - def infer_energy(self, residual, txt_enc_time_expanded, spk_vec, lens): - energy = self.energy_pred_module.infer(residual, txt_enc_time_expanded, spk_vec, lens) + def infer_energy(self, txt_enc_time_expanded, spk_vec, lens): + energy = self.energy_pred_module.infer(txt_enc_time_expanded, spk_vec, lens) # magic constants if self.use_first_order_features: @@ -795,7 +803,7 @@ def forward_for_export(self, text, lens, speaker_id, speaker_id_text, speaker_id """ Runs the generator, for inputs and outputs see input_types, and output_types """ - (mel, n_frames, dur, _, _) = self.infer( + mel, n_frames, dur, _, _ = self.infer( speaker_id, text, speaker_id_text=speaker_id_text, @@ -807,5 +815,5 @@ def forward_for_export(self, text, lens, speaker_id, speaker_id_text, speaker_id f0_mean=145.0, f0_std=30.0, in_lens=lens, - ).values() + ) return mel.float(), n_frames, dur.float() diff --git a/nemo/core/classes/exportable.py b/nemo/core/classes/exportable.py index 1404d5117a3c..76de3edec13f 100644 --- a/nemo/core/classes/exportable.py +++ b/nemo/core/classes/exportable.py @@ -16,7 +16,6 @@ import torch from pytorch_lightning.core.module import _jit_is_scripting -from torch.onnx import TrainingMode from nemo.core.classes import typecheck from nemo.core.utils.neural_type_utils import get_dynamic_axes, get_io_names @@ -56,7 +55,6 @@ def export( verbose=False, do_constant_folding=True, onnx_opset_version=None, - training=TrainingMode.EVAL, check_trace: Union[bool, List[torch.Tensor]] = False, dynamic_axes=None, check_tolerance=0.01, @@ -74,7 +72,6 @@ def export( verbose=verbose, do_constant_folding=do_constant_folding, onnx_opset_version=onnx_opset_version, - training=training, check_trace=check_trace, dynamic_axes=dynamic_axes, check_tolerance=check_tolerance, @@ -96,7 +93,6 @@ def _export( verbose=False, do_constant_folding=True, onnx_opset_version=None, - training=TrainingMode.EVAL, check_trace: Union[bool, List[torch.Tensor]] = False, dynamic_axes=None, check_tolerance=0.01, @@ -106,6 +102,10 @@ def _export( my_args = locals().copy() my_args.pop('self') + self.eval() + for param in self.parameters(): + param.requires_grad = False + exportables = [] for m in self.modules(): if isinstance(m, Exportable): @@ -115,9 +115,9 @@ def _export( format = get_export_format(output) output_descr = f"{qual_name} exported to {format}" - # Pytorch's default for None is too low, can't pass None through + # Pytorch's default opset version is too low, using reasonable latest one if onnx_opset_version is None: - onnx_opset_version = 13 + onnx_opset_version = 16 try: # Disable typechecks @@ -127,9 +127,7 @@ def _export( forward_method, old_forward_method = wrap_forward_method(self) # Set module mode - with torch.onnx.select_model_mode_for_export( - self, training 
- ), torch.inference_mode(), torch.no_grad(), torch.jit.optimized_execution(True), _jit_is_scripting(): + with torch.inference_mode(), torch.no_grad(), torch.jit.optimized_execution(True), _jit_is_scripting(): if input_example is None: input_example = self.input_module.input_example() @@ -153,9 +151,8 @@ def _export( check_trace_input = [input_example] else: check_trace_input = check_trace - + jitted_model = self if format == ExportFormat.TORCHSCRIPT: - jitted_model = torch.jit.trace_module( self, {"forward": tuple(input_list) + tuple(input_dict.values())}, @@ -163,24 +160,21 @@ def _export( check_trace=check_trace, check_tolerance=check_tolerance, ) - if not self.training: - jitted_model = torch.jit.optimize_for_inference(torch.jit.freeze(jitted_model)) + jitted_model = torch.jit.optimize_for_inference(torch.jit.freeze(jitted_model)) if verbose: logging.info(f"JIT code:\n{jitted_model.code}") jitted_model.save(output) jitted_model = torch.jit.load(output) if check_trace: - verify_torchscript(jitted_model, output, check_trace_input, input_names, check_tolerance) - + verify_torchscript(jitted_model, output, check_trace_input, check_tolerance) elif format == ExportFormat.ONNX: # dynamic axis is a mapping from input/output_name => list of "dynamic" indices if dynamic_axes is None: dynamic_axes = get_dynamic_axes(self.input_module.input_types, input_names) dynamic_axes.update(get_dynamic_axes(self.output_module.output_types, output_names)) - torch.onnx.export( - self, + jitted_model, input_example, output, input_names=input_names, diff --git a/nemo/utils/__init__.py b/nemo/utils/__init__.py index 0b2748e580af..2c424f72e411 100644 --- a/nemo/utils/__init__.py +++ b/nemo/utils/__init__.py @@ -16,6 +16,7 @@ from nemo.utils.app_state import AppState from nemo.utils.cast_utils import ( CastToFloat, + CastToFloatAll, avoid_bfloat16_autocast_context, avoid_float16_autocast_context, cast_all, diff --git a/nemo/utils/cast_utils.py b/nemo/utils/cast_utils.py index f973a4719e24..eeb48f35ffa7 100644 --- a/nemo/utils/cast_utils.py +++ b/nemo/utils/cast_utils.py @@ -73,3 +73,15 @@ def forward(self, x): with torch.cuda.amp.autocast(enabled=False): ret = self.mod.forward(x.to(torch.float32)).to(x.dtype) return ret + + +class CastToFloatAll(torch.nn.Module): + def __init__(self, mod): + super(CastToFloatAll, self).__init__() + self.mod = mod + + def forward(self, *args): + from_dtype = args[0].dtype + with torch.cuda.amp.autocast(enabled=False): + ret = self.mod.forward(*cast_all(args, from_dtype=from_dtype, to_dtype=torch.float32)) + return cast_all(ret, from_dtype=torch.float32, to_dtype=from_dtype) diff --git a/nemo/utils/export_utils.py b/nemo/utils/export_utils.py index 02d99c6ba7fd..93b939c3b426 100644 --- a/nemo/utils/export_utils.py +++ b/nemo/utils/export_utils.py @@ -22,7 +22,7 @@ import torch.nn as nn import torch.nn.functional as F -from nemo.utils import CastToFloat, logging +from nemo.utils import CastToFloat, CastToFloatAll, logging try: import onnxruntime @@ -117,13 +117,13 @@ def to_onnxrt_input(ort_input_names, input_names, input_dict, input_list): return odict -def verify_torchscript(model, output, input_examples, input_names, check_tolerance=0.01): +def verify_torchscript(model, output, input_examples, check_tolerance=0.01): all_good = True for input_example in input_examples: input_list, input_dict = parse_input_example(input_example) - output_example = model.forward(*input_list, **input_dict) # We disable autocast here to make sure exported TS will run under Triton or other C++ env with 
torch.cuda.amp.autocast(enabled=False): + output_example = model.forward(*input_list, **input_dict) ts_model = torch.jit.load(output) all_good = all_good and run_ts_and_compare( ts_model, input_list, input_dict, output_example, check_tolerance @@ -404,16 +404,7 @@ def script_module(m: nn.Module): return torch.jit.script(m) -default_replacements = { - "BatchNorm1d": wrap_module(nn.BatchNorm1d, CastToFloat), - "BatchNorm2d": wrap_module(nn.BatchNorm2d, CastToFloat), - "LayerNorm": wrap_module(nn.LayerNorm, CastToFloat), - "MatchedScaleMaskSoftmax": wrap_module(None, replace_MatchedScaleMaskSoftmax), -} - -script_replacements = { - "BiLSTM": script_module, -} +script_replacements = {} def replace_for_export(model: nn.Module) -> nn.Module: @@ -426,6 +417,17 @@ def replace_for_export(model: nn.Module) -> nn.Module: Returns: model, possibly modified in-place """ + from nemo.collections.tts.modules.submodules import MaskedInstanceNorm1d + + default_replacements = { + "BatchNorm1d": wrap_module(nn.BatchNorm1d, CastToFloat), + "BatchNorm2d": wrap_module(nn.BatchNorm2d, CastToFloat), + "LayerNorm": wrap_module(nn.LayerNorm, CastToFloat), + "InstanceNorm1d": wrap_module(nn.InstanceNorm1d, CastToFloat), + "MaskedInstanceNorm1d": wrap_module(MaskedInstanceNorm1d, CastToFloatAll), + "MatchedScaleMaskSoftmax": wrap_module(None, replace_MatchedScaleMaskSoftmax), + } + replace_modules(model, default_Apex_replacements) replace_modules(model, default_replacements) # This one has to be the last diff --git a/tests/collections/tts/test_tts_exportables.py b/tests/collections/tts/test_tts_exportables.py index bf2c0842eb91..023f542551ca 100644 --- a/tests/collections/tts/test_tts_exportables.py +++ b/tests/collections/tts/test_tts_exportables.py @@ -71,7 +71,7 @@ def test_HifiGanModel_export_to_onnx(self, hifigan_model): model = hifigan_model.cuda() assert hifigan_model.generator is not None with tempfile.TemporaryDirectory() as tmpdir: - filename = os.path.join(tmpdir, 'hfg.pt') + filename = os.path.join(tmpdir, 'hfg.onnx') model.export(output=filename, verbose=True, check_trace=True) @pytest.mark.run_only_on('GPU') @@ -80,5 +80,21 @@ def test_RadTTSModel_export_to_torchscript(self, radtts_model): model = radtts_model.cuda() with tempfile.TemporaryDirectory() as tmpdir: filename = os.path.join(tmpdir, 'rad.ts') - with torch.cuda.amp.autocast(enabled=True): - model.export(output=filename, verbose=True, check_trace=True) + with torch.cuda.amp.autocast(enabled=True, cache_enabled=False, dtype=torch.float16): + input_example1 = model.input_module.input_example(max_batch=3, max_dim=777) + input_example2 = model.input_module.input_example(max_batch=16, max_dim=1024) + model.export(output=filename, verbose=True, input_example=input_example1, check_trace=[input_example2]) + + @pytest.mark.pleasefixme('ONNX not working yet. 
Restore when Pytorch fixes LSTM/ONNX bugs.') + @pytest.mark.run_only_on('GPU') + @pytest.mark.unit + def test_RadTTSModel_export_to_onnx(self, radtts_model): + model = radtts_model.cuda() + with tempfile.TemporaryDirectory() as tmpdir: + filename = os.path.join(tmpdir, 'rad.onnx') + with torch.cuda.amp.autocast(enabled=False): + input_example1 = model.input_module.input_example(max_batch=3, max_dim=776) + input_example2 = model.input_module.input_example(max_batch=16, max_dim=998) + model.export( + output=filename, input_example=input_example1, verbose=True, check_trace=[input_example2], + ) From 1c5586de36e805927c17990039b609adfac2d5dc Mon Sep 17 00:00:00 2001 From: fayejf <36722593+fayejf@users.noreply.github.com> Date: Wed, 11 Jan 2023 08:57:25 -0800 Subject: [PATCH 133/154] ASR evaluator (#5728) * backbone Signed-off-by: fayejf * engineer and analyzer Signed-off-by: fayejf * offline_by_chunked Signed-off-by: fayejf * test_ds wip Signed-off-by: fayejf * temp remove inference Signed-off-by: fayejf * mandarin yaml Signed-off-by: fayejf * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * augmentor and a few updates Signed-off-by: fayejf * address alerts and revert unnecessary changes Signed-off-by: fayejf * Add readme Signed-off-by: fayejf * rename Signed-off-by: fayejf * typo fix Signed-off-by: fayejf * small fix Signed-off-by: fayejf * add missing header Signed-off-by: fayejf * rename augmentor_config to augmentor Signed-off-by: fayejf * raname inference_mode to inference Signed-off-by: fayejf * move utils.py Signed-off-by: fayejf * update temp file Signed-off-by: fayejf * make wer cer clear Signed-off-by: fayejf * seed_everything Signed-off-by: fayejf * fix missing rn augmentor_config in rnnt Signed-off-by: fayejf * fix rnnt transcribe Signed-off-by: fayejf * add more docstring and style fix Signed-off-by: fayejf * address codeQL Signed-off-by: fayejf * reflect comments Signed-off-by: fayejf * update readme Signed-off-by: fayejf * clearer Signed-off-by: fayejf Signed-off-by: fayejf Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../ctc/speech_to_text_buffered_infer_ctc.py | 5 + .../speech_to_text_buffered_infer_rnnt.py | 5 + examples/asr/transcribe_speech.py | 15 + nemo/collections/asr/models/ctc_bpe_models.py | 3 + nemo/collections/asr/models/ctc_models.py | 11 +- .../collections/asr/models/rnnt_bpe_models.py | 3 + nemo/collections/asr/models/rnnt_models.py | 9 +- .../asr/parts/utils/transcribe_utils.py | 7 +- tools/asr_evaluator/README.md | 44 ++ tools/asr_evaluator/asr_evaluator.py | 92 ++++ tools/asr_evaluator/conf/eval.yaml | 69 +++ tools/asr_evaluator/utils.py | 425 ++++++++++++++++++ 12 files changed, 685 insertions(+), 3 deletions(-) create mode 100644 tools/asr_evaluator/README.md create mode 100644 tools/asr_evaluator/asr_evaluator.py create mode 100644 tools/asr_evaluator/conf/eval.yaml create mode 100644 tools/asr_evaluator/utils.py diff --git a/examples/asr/asr_chunked_inference/ctc/speech_to_text_buffered_infer_ctc.py b/examples/asr/asr_chunked_inference/ctc/speech_to_text_buffered_infer_ctc.py index e3c6eae269ad..5755297d1600 100644 --- a/examples/asr/asr_chunked_inference/ctc/speech_to_text_buffered_infer_ctc.py +++ b/examples/asr/asr_chunked_inference/ctc/speech_to_text_buffered_infer_ctc.py @@ 
-41,6 +41,7 @@ from dataclasses import dataclass, is_dataclass from typing import Optional +import pytorch_lightning as pl import torch from omegaconf import OmegaConf @@ -71,6 +72,7 @@ class TranscriptionConfig: num_workers: int = 0 append_pred: bool = False # Sets mode of work, if True it will add new field transcriptions. pred_name_postfix: Optional[str] = None # If you need to use another model name, rather than standard one. + random_seed: Optional[int] = None # seed number going to be used in seed_everything() # Chunked configs chunk_len_in_secs: float = 1.6 # Chunk length in seconds @@ -96,6 +98,9 @@ def main(cfg: TranscriptionConfig) -> TranscriptionConfig: if is_dataclass(cfg): cfg = OmegaConf.structured(cfg) + if cfg.random_seed: + pl.seed_everything(cfg.random_seed) + if cfg.model_path is None and cfg.pretrained_name is None: raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None!") if cfg.audio_dir is None and cfg.dataset_manifest is None: diff --git a/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py b/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py index 78bc37f9e027..f2b6d143bdb2 100644 --- a/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py +++ b/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py @@ -62,6 +62,7 @@ from dataclasses import dataclass, is_dataclass from typing import Optional +import pytorch_lightning as pl import torch from omegaconf import OmegaConf, open_dict @@ -95,6 +96,7 @@ class TranscriptionConfig: num_workers: int = 0 append_pred: bool = False # Sets mode of work, if True it will add new field transcriptions. pred_name_postfix: Optional[str] = None # If you need to use another model name, rather than standard one. + random_seed: Optional[int] = None # seed number going to be used in seed_everything() # Chunked configs chunk_len_in_secs: float = 1.6 # Chunk length in seconds @@ -127,6 +129,9 @@ def main(cfg: TranscriptionConfig) -> TranscriptionConfig: if is_dataclass(cfg): cfg = OmegaConf.structured(cfg) + if cfg.random_seed: + pl.seed_everything(cfg.random_seed) + if cfg.model_path is None and cfg.pretrained_name is None: raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None!") if cfg.audio_dir is None and cfg.dataset_manifest is None: diff --git a/examples/asr/transcribe_speech.py b/examples/asr/transcribe_speech.py index 3db0ca1e5e67..54db5e16218c 100644 --- a/examples/asr/transcribe_speech.py +++ b/examples/asr/transcribe_speech.py @@ -108,6 +108,7 @@ class TranscriptionConfig: dataset_manifest: Optional[str] = None # Path to dataset's JSON manifest channel_selector: Optional[int] = None # Used to select a single channel from multi-channel files audio_key: str = 'audio_filepath' # Used to override the default audio key in dataset_manifest + eval_config_yaml: Optional[str] = None # Path to a yaml file of config of evaluation # General configs output_filename: Optional[str] = None @@ -115,6 +116,7 @@ class TranscriptionConfig: num_workers: int = 0 append_pred: bool = False # Sets mode of work, if True it will add new field transcriptions. pred_name_postfix: Optional[str] = None # If you need to use another model name, rather than standard one. 
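For orientation, the following is a minimal sketch of how the two options introduced above (`random_seed` and `eval_config_yaml`) and the new `augmentor` keyword are meant to be used together, based on the changes in this patch. The YAML path, audio file, and pretrained model name are placeholders for illustration, not values taken from the patch.

```python
# Sketch only: seed the run, pull the augmentor section out of the evaluation YAML,
# and pass it to transcribe() so augmentation is applied on the fly during inference.
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr

pl.seed_everything(42)  # corresponds to cfg.random_seed

eval_config = OmegaConf.load("eval.yaml")          # corresponds to cfg.eval_config_yaml (placeholder path)
augmentor = eval_config.test_ds.get("augmentor")   # e.g. noise/silence perturbation settings

asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")  # placeholder model name
transcriptions = asr_model.transcribe(
    paths2audio_files=["sample.wav"],  # placeholder audio file
    batch_size=4,
    augmentor=augmentor,               # new keyword added in this patch
)
```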
+ random_seed: Optional[int] = None # seed number going to be used in seed_everything() # Set to True to output greedy timestamp information (only supported models) compute_timestamps: bool = False @@ -152,11 +154,21 @@ def main(cfg: TranscriptionConfig) -> TranscriptionConfig: if is_dataclass(cfg): cfg = OmegaConf.structured(cfg) + if cfg.random_seed: + pl.seed_everything(cfg.random_seed) + if cfg.model_path is None and cfg.pretrained_name is None: raise ValueError("Both cfg.model_path and cfg.pretrained_name cannot be None!") if cfg.audio_dir is None and cfg.dataset_manifest is None: raise ValueError("Both cfg.audio_dir and cfg.dataset_manifest cannot be None!") + # Load augmentor from exteranl yaml file which contains eval info, could be extend to other feature such VAD, P&C + augmentor = None + if cfg.eval_config_yaml: + eval_config = OmegaConf.load(cfg.eval_config_yaml) + augmentor = eval_config.test_ds.get("augmentor") + logging.info(f"Will apply on-the-fly augmentation on samples during transcription: {augmentor} ") + # setup GPU if cfg.cuda is None: if torch.cuda.is_available(): @@ -253,6 +265,7 @@ def autocast(): num_workers=cfg.num_workers, return_hypotheses=return_hypotheses, channel_selector=cfg.channel_selector, + augmentor=augmentor, ) else: logging.warning( @@ -264,6 +277,7 @@ def autocast(): num_workers=cfg.num_workers, return_hypotheses=return_hypotheses, channel_selector=cfg.channel_selector, + augmentor=augmentor, ) else: transcriptions = asr_model.transcribe( @@ -272,6 +286,7 @@ def autocast(): num_workers=cfg.num_workers, return_hypotheses=return_hypotheses, channel_selector=cfg.channel_selector, + augmentor=augmentor, ) logging.info(f"Finished transcribing {len(filepaths)} files !") diff --git a/nemo/collections/asr/models/ctc_bpe_models.py b/nemo/collections/asr/models/ctc_bpe_models.py index ea23fd05ab77..47b1b805e9d3 100644 --- a/nemo/collections/asr/models/ctc_bpe_models.py +++ b/nemo/collections/asr/models/ctc_bpe_models.py @@ -227,6 +227,9 @@ def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLo 'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False), } + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) return temporary_datalayer diff --git a/nemo/collections/asr/models/ctc_models.py b/nemo/collections/asr/models/ctc_models.py index 90ee8026ee55..86180ff2caa9 100644 --- a/nemo/collections/asr/models/ctc_models.py +++ b/nemo/collections/asr/models/ctc_models.py @@ -116,8 +116,12 @@ def transcribe( return_hypotheses: bool = False, num_workers: int = 0, channel_selector: Optional[ChannelSelectorType] = None, + augmentor: DictConfig = None, ) -> List[str]: """ + If modify this function, please remember update transcribe_partial_audio() in + nemo/collections/asr/parts/utils/trancribe_utils.py + Uses greedy decoding to transcribe audio files. Use this method for debugging and prototyping. Args: @@ -131,7 +135,7 @@ def transcribe( With hypotheses can do some postprocessing like getting timestamp or rescoring num_workers: (int) number of workers for DataLoader channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. - + augmentor: (DictConfig): Augment audio samples during transcription if augmentor is applied. 
Returns: A list of transcriptions (or raw log probabilities if logprobs is True) in the same order as paths2audio_files """ @@ -182,6 +186,9 @@ def transcribe( 'channel_selector': channel_selector, } + if augmentor: + config['augmentor'] = augmentor + temporary_datalayer = self._setup_transcribe_dataloader(config) for test_batch in tqdm(temporary_datalayer, desc="Transcribing"): logits, logits_len, greedy_predictions = self.forward( @@ -724,6 +731,8 @@ def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLo 'pin_memory': True, 'channel_selector': config.get('channel_selector', None), } + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) return temporary_datalayer diff --git a/nemo/collections/asr/models/rnnt_bpe_models.py b/nemo/collections/asr/models/rnnt_bpe_models.py index 3037eecee2a6..6b7e9cf72c80 100644 --- a/nemo/collections/asr/models/rnnt_bpe_models.py +++ b/nemo/collections/asr/models/rnnt_bpe_models.py @@ -579,5 +579,8 @@ def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLo 'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False), } + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) return temporary_datalayer diff --git a/nemo/collections/asr/models/rnnt_models.py b/nemo/collections/asr/models/rnnt_models.py index 1574b933c1b3..459e5ecab989 100644 --- a/nemo/collections/asr/models/rnnt_models.py +++ b/nemo/collections/asr/models/rnnt_models.py @@ -217,6 +217,7 @@ def transcribe( partial_hypothesis: Optional[List['Hypothesis']] = None, num_workers: int = 0, channel_selector: Optional[ChannelSelectorType] = None, + augmentor: DictConfig = None, ) -> Tuple[List[str], Optional[List['Hypothesis']]]: """ Uses greedy decoding to transcribe audio files. Use this method for debugging and prototyping. @@ -232,7 +233,7 @@ def transcribe( With hypotheses can do some postprocessing like getting timestamp or rescoring num_workers: (int) number of workers for DataLoader channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. - + augmentor: (DictConfig): Augment audio samples during transcription if augmentor is applied. Returns: A list of transcriptions in the same order as paths2audio_files. 
Will also return """ @@ -277,6 +278,9 @@ def transcribe( 'channel_selector': channel_selector, } + if augmentor: + config['augmentor'] = augmentor + temporary_datalayer = self._setup_transcribe_dataloader(config) for test_batch in tqdm(temporary_datalayer, desc="Transcribing"): encoded, encoded_len = self.forward( @@ -938,6 +942,9 @@ def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLo 'pin_memory': True, } + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) return temporary_datalayer diff --git a/nemo/collections/asr/parts/utils/transcribe_utils.py b/nemo/collections/asr/parts/utils/transcribe_utils.py index ac400690ec0d..2e66e5866937 100644 --- a/nemo/collections/asr/parts/utils/transcribe_utils.py +++ b/nemo/collections/asr/parts/utils/transcribe_utils.py @@ -334,13 +334,16 @@ def write_transcription( def transcribe_partial_audio( asr_model, - path2manifest: str, + path2manifest: str = None, batch_size: int = 4, logprobs: bool = False, return_hypotheses: bool = False, num_workers: int = 0, channel_selector: Optional[int] = None, + augmentor: DictConfig = None, ) -> List[str]: + """ + See description of this function in trancribe() in nemo/collections/asr/models/ctc_models.py """ assert isinstance(asr_model, EncDecCTCModel), "Currently support CTC model only." @@ -377,6 +380,8 @@ def transcribe_partial_audio( 'num_workers': num_workers, 'channel_selector': channel_selector, } + if augmentor: + config['augmentor'] = augmentor temporary_datalayer = asr_model._setup_transcribe_dataloader(config) for test_batch in tqdm(temporary_datalayer, desc="Transcribing"): diff --git a/tools/asr_evaluator/README.md b/tools/asr_evaluator/README.md new file mode 100644 index 000000000000..158f653ad8b9 --- /dev/null +++ b/tools/asr_evaluator/README.md @@ -0,0 +1,44 @@ +ASR evaluator +-------------------- + +A tool for thoroughly evaluating the performance of ASR models and other features such as Voice Activity Detection. + +Features: + - Simple step to evaluate a model in all three modes currently supported by NeMo: offline, chunked, and offline_by_chunked. + - On-the-fly data augmentation (such as silence, noise, etc.,) for ASR robustness evaluation. + - Investigate the model's performance by detailed insertion, deletion, and substitution error rates for each and all samples. + - Evaluate models' reliability on different target groups such as gender, and audio length if metadata is presented. + + +ASR evaluator contains two main parts: +- **ENGINE**. To conduct ASR inference. +- **ANALYST**. To evaluate model performance based on predictions. + +In Analyst, we can evaluate on metadata (such as duration, emotion, etc.) if it presents in manifest. For example, with the following config, we can calculate WERs for audios in different interval groups, where each group (in seconds) is defined by [[0,2],[2,5],[5,10],[10,20],[20,100000]]. Also, we can calculate the WERs for three groups of emotions, where each group is defined by [['happy','laugh'],['neutral'],['sad']]. Moreover, if we set save_wer_per_class=True, it will calculate WERs for audios in all classes presented in the data (i.e. above 5 classes + 'cry' which presented in data but not in the slot). + +``` +analyst: + metadata: + duration: + enable: True + slot: [[0,2],[2,5],[5,10],[10,20],[20,100000]] + save_wer_per_class: False # whether to save wer for each presented class. 
+ + emotion: + enable: True + slot: [['happy','laugh'],['neutral'],['sad']] # we could have 'cry' in data but not in slot we focus on. + save_wer_per_class: False + ``` + + +Check `./conf/eval.yaml` for the supported configuration. + +If you plan to evaluate/add new tasks such as Punctuation and Capitalization, add it to the engine. + +Run +``` +python asr_evaluator.py \ +engine.pretrained_name="stt_en_conformer_transducer_large" \ +engine.inference_mode.mode="offline" \ +engine.test_ds.augmentor.noise.manifest_path= +``` \ No newline at end of file diff --git a/tools/asr_evaluator/asr_evaluator.py b/tools/asr_evaluator/asr_evaluator.py new file mode 100644 index 000000000000..da81f33bc8b5 --- /dev/null +++ b/tools/asr_evaluator/asr_evaluator.py @@ -0,0 +1,92 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import json + +import git +from omegaconf import OmegaConf +from utils import cal_target_metadata_wer, cal_write_wer, run_asr_inference + +from nemo.core.config import hydra_runner +from nemo.utils import logging + + +""" +This script serves as evaluator of ASR models +Usage: + python python asr_evaluator.py \ +engine.pretrained_name="stt_en_conformer_transducer_large" \ +engine.inference.mode="offline" \ +engine.test_ds.augmentor.noise.manifest_path= \ +..... 
+ +Check out parameters in ./conf/eval.yaml +""" + + +@hydra_runner(config_path="conf", config_name="eval.yaml") +def main(cfg): + report = {} + logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}') + + # Store git hash for reproducibility + if cfg.env.save_git_hash: + repo = git.Repo(search_parent_directories=True) + report['git_hash'] = repo.head.object.hexsha + + ## Engine + # Could skip next line to use generated manifest + + # If need to change more parameters for ASR inference, change it in + # 1) shell script in eval_utils.py in nemo/collections/asr/parts/utils or + # 2) TranscriptionConfig on top of the executed scripts such as transcribe_speech.py in examples/asr + cfg.engine = run_asr_inference(cfg=cfg.engine) + + ## Analyst + cfg, total_res, eval_metric = cal_write_wer(cfg) + report.update({"res": total_res}) + + for target in cfg.analyst.metadata: + if cfg.analyst.metadata[target].enable: + occ_avg_wer = cal_target_metadata_wer( + manifest=cfg.analyst.metric_calculator.output_filename, + target=target, + meta_cfg=cfg.analyst.metadata[target], + eval_metric=eval_metric, + ) + report[target] = occ_avg_wer + + config_engine = OmegaConf.to_object(cfg.engine) + report.update(config_engine) + + config_metric_calculator = OmegaConf.to_object(cfg.analyst.metric_calculator) + report.update(config_metric_calculator) + + pretty = json.dumps(report, indent=4) + res = "%.3f" % (report["res"][eval_metric] * 100) + logging.info(pretty) + logging.info(f"Overall {eval_metric} is {res} %") + + ## Writer + report_file = "report.json" + if "report_filename" in cfg.writer and cfg.writer.report_filename: + report_file = cfg.writer.report_filename + + with open(report_file, "a") as fout: + json.dump(report, fout) + fout.write('\n') + fout.flush() + + +if __name__ == "__main__": + main() diff --git a/tools/asr_evaluator/conf/eval.yaml b/tools/asr_evaluator/conf/eval.yaml new file mode 100644 index 000000000000..e8711021cc1c --- /dev/null +++ b/tools/asr_evaluator/conf/eval.yaml @@ -0,0 +1,69 @@ +env: + save_git_hash: True + +engine: + model_path: null + pretrained_name: null + + inference: + mode: offline # choose from offline, chunked or offline_by_chunked + chunk_len_in_secs: 1.6 #null # Need to specify if use buffered inference (default for offline_by_chunked is 20) + total_buffer_in_secs: 4 #null # Need to specify if use buffered inference (default for offline_by_chunked is 22) + model_stride: 4 # Model downsampling factor, 8 for Citrinet models and 4 for Conformer models + + test_ds: + manifest_filepath: null + sample_rate: 16000 + batch_size: 32 + + augmentor: + silence: + prob: 0.8 + min_start_silence_secs: 0 + max_start_silence_secs: 5 + min_end_silence_secs: 0 + max_end_silence_secs: 5 + + noise: + manifest_path: null + prob: 0.8 + min_snr_db: 0 + max_snr_db: 15 + + output_filename: null + random_seed: 42 + +analyst: + metric_calculator: + clean_groundtruth_text: True + langid: "en" # speciify language to clean text. Note use text normalization in NeMo for better performancce + output_filename: null # specify it if wanna skip engine and use previously generated manifest + use_cer: False + + metadata: + duration: + enable: True + slot: [[0,2],[2,5],[5,10],[10,20],[20,100000]] # a slot accepts List[List[str]] or List[List[float]]. i.e. 1.8s belongs to slot [0,2] + save_wer_per_class: False # whether to save wer for each presented class. + + gender: + enable: False + slot: [["female"]] # One could also report only one group/class though there are multiple classes in the data. 
+ save_wer_per_class: True + + speaker: + enable: True + save_wer_per_class: False + + age: + enable: False + slot: null + save_wer_per_class: False + + emotion: + enable: True + slot: [['happy','laugh'],['neutral'],['sad']] + save_wer_per_class: False + +writer: + report_filename: null \ No newline at end of file diff --git a/tools/asr_evaluator/utils.py b/tools/asr_evaluator/utils.py new file mode 100644 index 000000000000..d2704c8a4e61 --- /dev/null +++ b/tools/asr_evaluator/utils.py @@ -0,0 +1,425 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import json +import subprocess +import tempfile +from pathlib import Path +from typing import Tuple + +from omegaconf import DictConfig, OmegaConf, open_dict + +from nemo.collections.asr.metrics.wer import word_error_rate_detail +from nemo.utils import logging + + +def run_asr_inference(cfg: DictConfig) -> DictConfig: + """ + Execute ASR inference based on input mode and parameters. + """ + if (cfg.model_path and cfg.pretrained_name) or (not cfg.model_path and not cfg.pretrained_name): + raise ValueError("Please specify either cfg.model_path or cfg.pretrained_name!") + + if cfg.inference.mode == "offline": + cfg = run_offline_inference(cfg) + + elif cfg.inference.mode == "chunked": + if ( + "total_buffer_in_secs" not in cfg.inference + or "chunk_len_in_secs" not in cfg.inference + or not cfg.inference.total_buffer_in_secs + or not cfg.inference.chunk_len_in_secs + ): + raise ValueError(f"Please specify both total_buffer_in_secs and chunk_len_in_secs for chunked inference") + cfg = run_chunked_inference(cfg) + + elif cfg.inference.mode == "offline_by_chunked": + # When use Conformer to transcribe long audio sample, we could probably encounter CUDA out of memory issue. + # Here we use offline_by_chunked mode to simulate offline mode for Conformer. + # And we specify default total_buffer_in_secs=22 and chunk_len_in_secs=20 to avoid above problem. + OmegaConf.set_struct(cfg, True) + if 'total_buffer_in_secs' not in cfg.inference or not cfg.inference.total_buffer_in_secs: + with open_dict(cfg): + cfg.inference.total_buffer_in_secs = 22 + logging.info( + f"Does not provide total_buffer_in_secs required by {cfg.inference.mode} mode. Using default value {cfg.inference.total_buffer_in_secs}" + ) + if 'chunk_len_in_secs' not in cfg.inference or not cfg.inference.chunk_len_in_secs: + with open_dict(cfg): + cfg.inference.chunk_len_in_secs = 20 + logging.info( + f"Does not provide total_buffer_in_secs required by {cfg.inference.mode} mode. 
Using default value {cfg.inference.chunk_len_in_secs}" + ) + cfg = run_chunked_inference(cfg) + + else: + raise ValueError(f"inference could only be offline or chunked, but got {cfg.inference.mode}") + + return cfg + + +def run_chunked_inference(cfg: DictConfig) -> DictConfig: + if "output_filename" not in cfg or not cfg.output_filename: + if cfg.model_path: + model_name = Path(cfg.model_path).stem + else: + model_name = cfg.pretrained_name + dataset_name = Path(cfg.test_ds.manifest_filepath).stem + mode_name = ( + cfg.inference.mode + + "B" + + str(cfg.inference.total_buffer_in_secs) + + "C" + + str(cfg.inference.chunk_len_in_secs) + ) + + OmegaConf.set_struct(cfg, True) + with open_dict(cfg): + cfg.output_filename = model_name + "-" + dataset_name + "-" + mode_name + ".json" + + script_path = ( + Path(__file__).parents[2] + / "examples" + / "asr" + / "asr_chunked_inference" + / "ctc" + / "speech_to_text_buffered_infer_ctc.py" + ) + + if (cfg.pretrained_name and 'transducer' in cfg.pretrained_name) or ( + cfg.model_path and 'transducer' in cfg.model_path + ): + script_path = ( + Path(__file__).parents[2] + / "examples" + / "asr" + / "asr_chunked_inference" + / "rnnt" + / "speech_to_text_buffered_infer_rnnt.py" + ) + + subprocess.run( + f"python {script_path} " + f"model_path={cfg.model_path} " + f"pretrained_name={cfg.pretrained_name} " + f"dataset_manifest={cfg.test_ds.manifest_filepath} " + f"output_filename={cfg.output_filename} " + f"random_seed={cfg.random_seed} " + f"batch_size={cfg.test_ds.batch_size} " + f"chunk_len_in_secs={cfg.inference.chunk_len_in_secs} " + f"total_buffer_in_secs={cfg.inference.total_buffer_in_secs} " + f"model_stride={cfg.inference.model_stride} ", + shell=True, + check=True, + ) + return cfg + + +def run_offline_inference(cfg: DictConfig) -> DictConfig: + if "output_filename" not in cfg or not cfg.output_filename: + if cfg.model_path: + model_name = Path(cfg.model_path).stem + else: + model_name = cfg.pretrained_name + dataset_name = Path(cfg.test_ds.manifest_filepath).stem + mode_name = cfg.inference.mode + + OmegaConf.set_struct(cfg, True) + with open_dict(cfg): + cfg.output_filename = model_name + "-" + dataset_name + "-" + mode_name + ".json" + + with tempfile.NamedTemporaryFile(mode='w', encoding='utf-8') as f: + OmegaConf.save(cfg, f) + f.seek(0) # reset file pointer + script_path = Path(__file__).parents[2] / "examples" / "asr" / "transcribe_speech.py" + + # If need to move other config such as decoding strategy, could either: + # 1) change TranscriptionConfig on top of the executed scripts such as transcribe_speech.py in examples/asr, or + # 2) add command as "rnnt_decoding.strategy=greedy_batch " to below script + subprocess.run( + f"python {script_path} " + f"model_path={cfg.model_path} " + f"pretrained_name={cfg.pretrained_name} " + f"dataset_manifest={cfg.test_ds.manifest_filepath} " + f"output_filename={cfg.output_filename} " + f"batch_size={cfg.test_ds.batch_size} " + f"random_seed={cfg.random_seed} " + f"eval_config_yaml={f.name} ", + shell=True, + check=True, + ) + + return cfg + + +def clean_label(_str: str, num_to_words: bool = True, langid="en") -> str: + """ + Remove unauthorized characters in a string, lower it and remove unneeded spaces + """ + replace_with_space = [char for char in '/?*\",.:=?_{|}~¨«·»¡¿„…‧‹›≪≫!:;ː→'] + replace_with_blank = [char for char in '`¨´‘’“”`ʻ‘’“"‘”'] + replace_with_apos = [char for char in '‘’ʻ‘’‘'] + _str = _str.strip() + _str = _str.lower() + for i in replace_with_blank: + _str = _str.replace(i, "") + for i 
in replace_with_space: + _str = _str.replace(i, " ") + for i in replace_with_apos: + _str = _str.replace(i, "'") + if num_to_words: + if langid == "en": + _str = convert_num_to_words(_str, langid="en") + else: + logging.info( + "Currently support basic num_to_words in English only. Please use Text Normalization to convert other languages! Skipping!" + ) + + ret = " ".join(_str.split()) + return ret + + +def convert_num_to_words(_str: str, langid: str = "en") -> str: + """ + Convert digits to corresponding words. Note this is a naive approach and could be replaced with text normalization. + """ + if langid == "en": + num_to_words = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"] + _str = _str.strip() + words = _str.split() + out_str = "" + num_word = [] + for word in words: + if word.isdigit(): + num = int(word) + while num: + digit = num % 10 + digit_word = num_to_words[digit] + num_word.append(digit_word) + num = int(num / 10) + if not (num): + num_str = "" + num_word = num_word[::-1] + for ele in num_word: + num_str += ele + " " + out_str += num_str + " " + num_word.clear() + else: + out_str += word + " " + out_str = out_str.strip() + else: + raise ValueError( + "Currently support basic num_to_words in English only. Please use Text Normalization to convert other languages!" + ) + return out_str + + +def cal_write_wer(cfg: DictConfig, pred_text_attr_name: str = None) -> Tuple[DictConfig, dict]: + """ + Calculate wer, inserion, deletion and substitution rate based on groundtruth text and pred_text_attr_name (pred_text) + We use WER in function name as a convention, but it currently Error Rate (ER) support Word Error Rate (WER) and Character Error Rate (CER) + """ + samples = [] + hyps = [] + refs = [] + + with open(cfg.engine.output_filename, 'r') as fp: + for line in fp: + sample = json.loads(line) + + if 'text' not in sample: + raise ValueError( + "ground-truth text does not present in manifest! Cannot calculate Word Error Rate. Exiting!" 
+ ) + + if not pred_text_attr_name: + pred_text_attr_name = "pred_text" + + hyp = sample[pred_text_attr_name] + ref = sample['text'] + + if cfg.analyst.metric_calculator.clean_groundtruth_text: + ref = clean_label(ref, langid=cfg.analyst.metric_calculator.langid) + + wer, tokens, ins_rate, del_rate, sub_rate = word_error_rate_detail( + hypotheses=[hyp], references=[ref], use_cer=cfg.analyst.metric_calculator.use_cer + ) + eval_metric = "wer" + if cfg.analyst.metric_calculator.use_cer: + eval_metric = "cer" + + sample[eval_metric] = wer # evaluatin metric, could be word error rate of character error rate + sample['tokens'] = tokens # number of word/characters/tokens + sample['ins_rate'] = ins_rate # insertion error rate + sample['del_rate'] = del_rate # deletion error rate + sample['sub_rate'] = sub_rate # substitution error rate + + samples.append(sample) + hyps.append(hyp) + refs.append(ref) + + total_wer, total_tokens, total_ins_rate, total_del_rate, total_sub_rate = word_error_rate_detail( + hypotheses=hyps, references=refs, use_cer=cfg.analyst.metric_calculator.use_cer + ) + + if "output_filename" not in cfg.analyst.metric_calculator or not cfg.analyst.metric_calculator.output_filename: + # overwrite the current generated manifest + OmegaConf.set_struct(cfg, True) + with open_dict(cfg): + cfg.analyst.metric_calculator.output_filename = cfg.engine.output_filename + + with open(cfg.analyst.metric_calculator.output_filename, 'w') as fout: + for sample in samples: + json.dump(sample, fout) + fout.write('\n') + fout.flush() + + total_res = { + "samples": len(samples), + "tokens": total_tokens, + eval_metric: total_wer, + "ins_rate": total_ins_rate, + "del_rate": total_del_rate, + "sub_rate": total_sub_rate, + } + return cfg, total_res, eval_metric + + +def cal_target_metadata_wer(manifest: str, target: str, meta_cfg: DictConfig, eval_metric: str = "wer",) -> dict: + """ + Caculating number of samples (samples), number of words/characters/tokens (tokens), + wer/cer, insertion error rate (ins_rate), deletion error rate (del_rate), substitution error rate (sub_rate) of the group/slot of target metadata. + + The group could be [female, male] or slot group like [0-2s, 2-5s, >5s audios] + + + Args: + manifest (str): Filepath of the generated manifest which contains prediction and eval result for each samples. + target (str): Target metadata. Execute the target metadata if field presents in manifest. + such as 'duration', 'speaker', 'emotion', etc. + meta_cfg (DictConfig): Config for calculating group eval_metric for the target metadata. + eval_metric: (str): Supported evaluation metrics. Currently support 'wer' and 'cer'. + + Return: + ret (dict): Generated dictionary containing all results regarding the target metadata. + """ + + if eval_metric not in ['wer', 'cer']: + raise ValueError( + "Currently support wer and cer as eval_metric. 
Please implement it in cal_target_metadata_wer if using different eval_metric" + ) + + wer_per_class = {} + with open(manifest, 'r') as fp: + for line in fp: + sample = json.loads(line) + if target in sample: + target_class = sample[target] + if target_class not in wer_per_class: + wer_per_class[target_class] = { + 'samples': 0, + 'tokens': 0, + "errors": 0, + "inss": 0, + "dels": 0, + "subs": 0, + } + wer_per_class[target_class]['samples'] += 1 + + tokens = sample["tokens"] + wer_per_class[target_class]["tokens"] += tokens + wer_per_class[target_class]["errors"] += tokens * sample[eval_metric] + wer_per_class[target_class]["inss"] += tokens * sample["ins_rate"] + wer_per_class[target_class]["dels"] += tokens * sample["del_rate"] + wer_per_class[target_class]["subs"] += tokens * sample["sub_rate"] + + if len(wer_per_class) > 0: + res_wer_per_class = {} + for target_class in wer_per_class: + res_wer_per_class[target_class] = {} + res_wer_per_class[target_class]["samples"] = wer_per_class[target_class]["samples"] + res_wer_per_class[target_class][eval_metric] = ( + wer_per_class[target_class]["errors"] / wer_per_class[target_class]["tokens"] + ) + res_wer_per_class[target_class]["tokens"] = wer_per_class[target_class]["tokens"] + res_wer_per_class[target_class]["ins_rate"] = ( + wer_per_class[target_class]["inss"] / wer_per_class[target_class]["tokens"] + ) + res_wer_per_class[target_class]["del_rate"] = ( + wer_per_class[target_class]["dels"] / wer_per_class[target_class]["tokens"] + ) + res_wer_per_class[target_class]["sub_rate"] = ( + wer_per_class[target_class]["subs"] / wer_per_class[target_class]["tokens"] + ) + else: + logging.info(f"metadata '{target}' does not present in manifest. Skipping! ") + return None + + values = ['samples', 'tokens', 'errors', 'inss', 'dels', 'subs'] + slot_wer = {} + if 'slot' in meta_cfg and meta_cfg.slot: + for target_class in wer_per_class: + for s in meta_cfg.slot: + if isinstance(s[0], float) or isinstance(s[0], int): + if s[0] <= target_class < s[1]: + slot_key = "slot-" + ",".join(str(i) for i in s) + if slot_key not in slot_wer: + slot_wer[slot_key] = { + 'samples': 0, + 'tokens': 0, + "errors": 0, + "inss": 0, + "dels": 0, + "subs": 0, + } + + for v in values: + slot_wer[slot_key][v] += wer_per_class[target_class][v] + break + + elif isinstance(s[0], str): + if target_class in s: + slot_key = "slot-" + ",".join(s) + if slot_key not in slot_wer: + slot_wer[slot_key] = { + 'samples': 0, + 'tokens': 0, + "errors": 0, + "inss": 0, + "dels": 0, + "subs": 0, + } + + for v in values: + slot_wer[slot_key][v] += wer_per_class[target_class][v] + break + else: + raise ValueError("Current only support target metadata belongs to numeric or string ") + + for slot_key in slot_wer: + slot_wer[slot_key]['wer'] = slot_wer[slot_key]['errors'] / slot_wer[slot_key]['tokens'] + slot_wer[slot_key]['ins_rate'] = slot_wer[slot_key]['inss'] / slot_wer[slot_key]['tokens'] + slot_wer[slot_key]['del_rate'] = slot_wer[slot_key]['dels'] / slot_wer[slot_key]['tokens'] + slot_wer[slot_key]['sub_rate'] = slot_wer[slot_key]['subs'] / slot_wer[slot_key]['tokens'] + slot_wer[slot_key].pop('errors') + slot_wer[slot_key].pop('inss') + slot_wer[slot_key].pop('dels') + slot_wer[slot_key].pop('subs') + res_wer_per_class.update(slot_wer) + + ret = None + if meta_cfg.save_wer_per_class: + ret = res_wer_per_class + if (not meta_cfg.save_wer_per_class) and ('slot' in meta_cfg and meta_cfg.slot): + ret = slot_wer + return ret From b672a3a0b24407df7a559a761c64f7feef279258 Mon Sep 17 00:00:00 
2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 11 Jan 2023 13:07:04 -0800 Subject: [PATCH 134/154] Docs g2p update (#5769) (#5775) * links update, riva docs link Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../wfst/wfst_customization.rst | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/docs/source/nlp/text_normalization/wfst/wfst_customization.rst b/docs/source/nlp/text_normalization/wfst/wfst_customization.rst index 9e30c58b9645..c4c1b7cb47fe 100644 --- a/docs/source/nlp/text_normalization/wfst/wfst_customization.rst +++ b/docs/source/nlp/text_normalization/wfst/wfst_customization.rst @@ -5,19 +5,19 @@ Grammar customization .. note:: - *TN/ITN is transitioning from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.* + TN/ITN is transitioning from `NVIDIA/NeMo `_ repository to a standalone `NVIDIA/NeMo-text-processing `_ repository. All updates and discussions/issues should go to the new repository. -All grammar development is done with [Pynini library](https://www.opengrm.org/twiki/bin/view/GRM/Pynini). +All grammar development is done with `Pynini library `_. These grammars can be exported to .far files and used with Riva/Sparrowhawk, see :doc:`Text Processing Deployment ` for details. Steps to customize grammars --------------------------- -1. Install [NeMo-TN from source](https://github.com/NVIDIA/NeMo-text-processing#from-source) -2. Run [nemo_text_processing/text_normalization/normalize.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize.py) or [nemo_text_processing/inverse_text_normalization/inverse_normalize.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/inverse_text_normalization/inverse_normalize.py) with `--verbose` flag to evaluate current behavior on the target case, see argument details in the scripts and [this tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_(Inverse)_Normalization.ipynb) -3. Modify existing grammars or add new grammars to cover the target case using [Tutorial on how to write new grammars](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb) -4. Add new test cases [here](https://github.com/NVIDIA/NeMo-text-processing/tree/main/tests/nemo_text_processing): +1. Install `NeMo-TN from source `_ +2. Run `nemo_text_processing/text_normalization/normalize.py `_ or `nemo_text_processing/inverse_text_normalization/inverse_normalize.py `_ with `--verbose` flag to evaluate current behavior on the target case, see argument details in the scripts and `this tutorial `_ +3. Modify existing grammars or add new grammars to cover the target case using `Tutorial on how to write new grammars `_ +4. Add new test cases `here `_: - Run python tests: .. code-block:: bash @@ -35,3 +35,8 @@ Steps to customize grammars WFST TN/ITN resources could be found in :doc:`here `. + +Riva resources +-------------- + - `Riva Text Normalization customization for TTS `_. + - `Riva ASR/Inverse Text Normalization customization `_. 
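To make step 2 above concrete, here is a hedged sketch of the programmatic equivalent, assuming the standard `Normalizer` entry point exposed by NeMo-text-processing; verify the exact constructor arguments against the installed version before relying on it.

```python
# Sketch: inspect current normalization behavior on a target case before customizing grammars.
from nemo_text_processing.text_normalization.normalize import Normalizer

normalizer = Normalizer(input_case="cased", lang="en")
# verbose=True prints intermediate tagged/verbalized forms, which helps identify
# which grammar handled (or mishandled) the input before you modify or add rules.
print(normalizer.normalize("The train leaves at 10:30 on 12/05/2023.", verbose=True))
```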
\ No newline at end of file From 7e8ba5191398deabff504844904dbf7a37578385 Mon Sep 17 00:00:00 2001 From: Yang Zhang Date: Wed, 11 Jan 2023 16:12:36 -0500 Subject: [PATCH 135/154] adding back tar script for decoder dataset for duplex (#5773) * adding back tar script for decoder dataset for duplex Signed-off-by: Yang Zhang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Yang Zhang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- .../nn_text_normalization.rst | 6 +- .../data/create_tarred_dataset.py | 299 ++++++++++++++++++ .../duplex_text_normalization_train.py | 7 +- .../text_normalization/decoder_dataset.py | 4 +- requirements/requirements_docs.txt | 1 + 5 files changed, 311 insertions(+), 6 deletions(-) create mode 100644 examples/nlp/duplex_text_normalization/data/create_tarred_dataset.py diff --git a/docs/source/nlp/text_normalization/nn_text_normalization.rst b/docs/source/nlp/text_normalization/nn_text_normalization.rst index 5b3ddfdec53b..89f18a69a5b4 100644 --- a/docs/source/nlp/text_normalization/nn_text_normalization.rst +++ b/docs/source/nlp/text_normalization/nn_text_normalization.rst @@ -105,9 +105,9 @@ Tarred datasets can be created as follows: .. code:: - python examples/nlp/duplex_text_normalization/create_tarred_dataset.py \ - --input_files = "" \ - --input_files = "" \ + python examples/nlp/duplex_text_normalization/data/create_tarred_dataset.py \ + --input_files = "" \ + --input_files = "" \ --out_dir="" diff --git a/examples/nlp/duplex_text_normalization/data/create_tarred_dataset.py b/examples/nlp/duplex_text_normalization/data/create_tarred_dataset.py new file mode 100644 index 000000000000..f8eb6970f49c --- /dev/null +++ b/examples/nlp/duplex_text_normalization/data/create_tarred_dataset.py @@ -0,0 +1,299 @@ +# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import json +import os +import pickle +import random +import tarfile +from glob import glob +from typing import List, Tuple + +from joblib import Parallel, delayed +from tqdm import tqdm +from transformers import AutoTokenizer + +import nemo.collections.nlp.data.text_normalization.constants as constants +from nemo.collections.nlp.data.text_normalization.decoder_dataset import TextNormalizationDecoderDataset +from nemo.utils import logging + + +""" +The script builds tar files for Tarred TextNormalizationDecoderDataset + +See `text_normalization doc ` +for more details on data format, and en/data_processing.py on how to pre-process the data before tarring. + +To run the script, use: + + python create_tarred_dataset.py \ + --input_files = "train_processed/output-00099-of-00100" \ + --input_files = "train_processed/output-00098-of-00100" \ + --lang = "en" \ + --out_dir="TARRED_DATA_OUTPUT_DIR" + +See the argparse help for more arguments. 
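As a rough picture of what this script leaves in `--out_dir`, the sketch below is inferred from the tar-renaming and metadata-dumping code further down; all counts and file names are illustrative, not outputs of an actual run.

```python
# Illustrative layout only (assuming mode=tn, batch_size=16, num_batches_per_tarfile=2, max_seq_length=80).
expected_out_dir = {
    "tar_files": [
        # renamed to {mode}.{batch_size}_bs.{num_batches_per_tarfile}_b_per_tar.{max_seq_length}_len.{index}.tar
        "tn.16_bs.2_b_per_tar.80_len.0.tar",
        "tn.16_bs.2_b_per_tar.80_len.1.tar",
    ],
    "metadata.json": {
        "num_batches": 128,  # total batches kept after remainder/factor batches are dropped (made-up number)
        "text_tar_filepaths": "tn.16_bs.2_b_per_tar.80_len._OP_0..1_CL_.tar",  # brace-expansion pattern
        "tar_files": ["<full paths of the tar files above>"],
    },
}
```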
+""" + + +def _preprocess_file(input_file: str) -> List[Tuple[List[str]]]: + """ + Performs initial preprocessing, i.e., urls formatting, removal of "_trans" from Ru set + + Args: + input_file: path to a file in google TN format + + Returns: + Processed data. Each element is a Tuple(List[semiotic classes], List[written words], List[spoken words]) + """ + print(f"Reading and running initial pre-processing of {input_file}...") + cur_split = [] + with open(input_file, 'r', encoding='utf-8') as f: + # Loop through each line of the file + cur_classes, cur_tokens, cur_outputs = [], [], [] + for linectx, line in tqdm(enumerate(f)): + es = line.strip().split('\t') + if len(es) == 2 and es[0] == '': + cur_split.append((cur_classes, cur_tokens, cur_outputs)) + # Reset + cur_classes, cur_tokens, cur_outputs = [], [], [] + continue + assert len(es) == 3 + cur_classes.append(es[0]) + cur_tokens.append(es[1]) + cur_outputs.append(es[2]) + return cur_split + + +def _write_batches_to_tarfiles( + input_file: str, + tokenizer: AutoTokenizer, + tokenizer_name: str, + mode: str, + lang: str, + max_seq_len: int, + batch_size: int, + out_dir: str, + num_batches_per_tarfile: int, + decoder_data_augmentation: bool = False, +): + """ + Creates tar files for the input file, i.e.: + 1. Creates a TextNormalizationDecoderDataset from the input file + 2. Constructs batches of size `batch_size` + 3. Saves each created batch to a pickle file and then adds `num_batches_per_tarfile` + of the pickle files to a tarfile. + + Args: + input_file: path to cleaned data file. See en/data_processing.py for cleaning. + tokenizer: tokenizer + tokenizer_name: the name of the tokenizer, usually corresponds to the pre-trained LM + mode: model training mode + max_seq_len: maximum length of the sequence (examples that are longer will be discarded) + batch_size: batch size + out_dir: path to output directory + num_batches_per_tarfile: number of batches saved in each tar file + decoder_data_augmentation: Set to True to enable data augmentation for the decoder model + lang: data language + """ + + dataset = TextNormalizationDecoderDataset( + input_file=input_file, + raw_instances=_preprocess_file(input_file=input_file), + tokenizer=tokenizer, + tokenizer_name=tokenizer_name, + mode=mode, + max_len=max_seq_len, + decoder_data_augmentation=decoder_data_augmentation, + lang=lang, + use_cache=False, + max_insts=-1, + do_tokenize=False, + initial_shuffle=True, + ) + dataset.batchify(batch_size) + file_name = os.path.basename(input_file) + tar_file_ctr = 0 + tar_file_path = os.path.join( + out_dir, '%s-batches.%d.%d.%d.tar' % (file_name, batch_size, max_seq_len, tar_file_ctr) + ) + tar_file_ptr = tarfile.open(tar_file_path, 'w') + total_batch_ctr = 0 + batch_ctr = 0 + for batch in dataset.batches: + total_batch_ctr += 1 + batch_ctr += 1 + pickle_file = os.path.join(out_dir, '%s-batch-%d.pkl' % (file_name, total_batch_ctr)) + + pickle.dump(batch, open(pickle_file, 'wb')) + tar_file_ptr.add(pickle_file) + os.remove(pickle_file) + + if batch_ctr == num_batches_per_tarfile: + tar_file_ctr += 1 + tar_file_ptr.close() + tar_file_path = os.path.join( + out_dir, f'%s-batches.%d.%d.%d.tar' % (file_name, batch_size, max_seq_len, tar_file_ctr) + ) + tar_file_ptr = tarfile.open(tar_file_path, 'w',) + batch_ctr = 0 + + # return tar files paths that have batches remaining + remainder_tar_file_path = tar_file_ptr.name + tar_file_ptr.close() + + return total_batch_ctr, remainder_tar_file_path + + +if __name__ == '__main__': + parser = 
argparse.ArgumentParser(description='(Inverse) Text Normalization tarred dataset creation') + parser.add_argument('--transformer_name', type=str, default="t5-small", help='Name of the pretrained LM.') + parser.add_argument('--mode', type=str, default='tn', choices=constants.MODES, help='(I)TN model training mode.') + parser.add_argument('--lang', type=str, default='en', choices=constants.SUPPORTED_LANGS, help='language.') + parser.add_argument( + '--decoder_data_augmentation', + action="store_true", + help='Set to True to use data augmentation for the decoder model.', + ) + parser.add_argument( + '-in', + '--input_files', + action='append', + required=True, + help="Example: -in train_processed/output-00099-of-00100 -in train_processed/output-00098-of-00100", + ) + parser.add_argument('--out_dir', type=str, required=True, help='Path to store dataloader and tokenizer models.') + parser.add_argument( + '--max_seq_length', type=int, default=80, help='Maximum sequence length, longer examples will be discarded.' + ) + parser.add_argument('--min_seq_length', type=int, default=1, help='Minimum sequence length.') + parser.add_argument( + '--num_batches_per_tarfile', + type=int, + default=2, + help='Number batches, i.e., pickle files, included in a single .tar file.', + ) + parser.add_argument('--n_jobs', type=int, default=-2, help='The maximum number of concurrently running jobs.') + parser.add_argument( + '--batch_size', type=int, default=16, help='Batch size, i.e., number of examples in a single pickle file' + ) + parser.add_argument( + '--factor', default=8, type=int, help='The final number of tar files will be divisible by the "factor" value' + ) + + args = parser.parse_args() + + # check if tar files exist + if os.path.exists(args.out_dir): + tar_files_in_out_dir = glob(f'{args.out_dir}/*.tar') + if tar_files_in_out_dir: + raise ValueError( + f'Tar files detected in {args.out_dir}. Delete the files to re-construct the dataset in the same directory.' 
+ ) + else: + os.makedirs(args.out_dir) + + world_size = 1 + tokenizer = AutoTokenizer.from_pretrained(args.transformer_name) + + results_list = Parallel(n_jobs=args.n_jobs)( + delayed(_write_batches_to_tarfiles)( + input_file=input_file, + tokenizer=tokenizer, + tokenizer_name=args.transformer_name, + mode=args.mode, + lang=args.lang, + batch_size=args.batch_size, + max_seq_len=args.max_seq_length, + decoder_data_augmentation=args.decoder_data_augmentation, + out_dir=args.out_dir, + num_batches_per_tarfile=args.num_batches_per_tarfile, + ) + for input_file in args.input_files + ) + + total_batches = sum([batch_count for batch_count, _ in results_list]) + + # save batches from tar files containing the left over batches (if there's enough batches) + remainder_tar_file_ctr = 0 + remainder_tar_file_path = os.path.join( + args.out_dir, f'remainder-batches.tokens.{args.batch_size}.tar_file_{remainder_tar_file_ctr}.tar' + ) + remainder_tar_file_ptr = tarfile.open(remainder_tar_file_path, 'w') + batch_in_tar_ctr = 0 + for _, tar_file_path in results_list: + tar_file_ptr = tarfile.open(tar_file_path, 'r') + for member in tar_file_ptr.getmembers(): + remainder_tar_file_ptr.addfile(member, tar_file_ptr.extractfile(member.name)) + batch_in_tar_ctr += 1 + if batch_in_tar_ctr == args.num_batches_per_tarfile: + remainder_tar_file_ctr += 1 + remainder_tar_file_ptr.close() + remainder_tar_file_path = os.path.join( + args.out_dir, f'remainder-batches.tokens.{args.batch_size}.tar_file_{remainder_tar_file_ctr}.tar', + ) + remainder_tar_file_ptr = tarfile.open(remainder_tar_file_path, 'w',) + batch_in_tar_ctr = 0 + tar_file_ptr.close() + os.remove(tar_file_path) + + # log the number of batches remaining as they will be discarded + num_batches_discarded = len(remainder_tar_file_ptr.getmembers()) + remainder_tar_file_ptr.close() + os.remove(remainder_tar_file_path) + + tar_file_paths = glob(f'{args.out_dir}/*.tar') + if args.factor != 1: + num_tar_files = len(tar_file_paths) + num_tars_to_drop = num_tar_files % args.factor + num_batches_discarded += num_tars_to_drop * args.num_batches_per_tarfile + + random.shuffle(tar_file_paths) + for _ in range(num_tars_to_drop): + os.remove(tar_file_paths.pop(-1)) + + total_batches -= num_batches_discarded + logging.info(f'Number of batches discarded: {num_batches_discarded}, total batches kept: {total_batches}') + + # dump metadata to json + metadata = {} + metadata['num_batches'] = total_batches + + # rename tar files so they can be more easily used with CLI and YAML + file_name = f'{args.mode}.{args.batch_size}_bs.{args.num_batches_per_tarfile}_b_per_tar.{args.max_seq_length}_len' + for index, path in enumerate(tar_file_paths): + os.rename(path, os.path.join(args.out_dir, f'{file_name}.{index}.tar')) + + text_tar_filepaths = f'{file_name}._OP_0..{index}_CL_.tar' + logging.info(f'Files for brace expansion: "{text_tar_filepaths}"') + metadata['text_tar_filepaths'] = text_tar_filepaths + + # add tar files to metadata + tar_file_paths = glob(f'{args.out_dir}/*.tar') + metadata['tar_files'] = tar_file_paths + metadata_path = os.path.join(args.out_dir, 'metadata.json') + json.dump(metadata, open(metadata_path, 'w')) + + num_tar_files = len(tar_file_paths) + if num_tar_files < world_size: + raise ValueError( + ( + f'Number of tar files found: {num_tar_files} is less than world size: {world_size}. ' + f'There should be at least one tar file per GPU (ideally many tar files per GPU). ' + f'This may be due to dataset size. 
' + f'Decrease num_batches_per_tarfile or num_tokens_per_batch to increase the number of tarfiles. ' + f'Also using shard_strategy=replicate will use all available tarfiles for every GPU. ' + ) + ) diff --git a/examples/nlp/duplex_text_normalization/duplex_text_normalization_train.py b/examples/nlp/duplex_text_normalization/duplex_text_normalization_train.py index 134dc417f1af..f2daff7a8deb 100644 --- a/examples/nlp/duplex_text_normalization/duplex_text_normalization_train.py +++ b/examples/nlp/duplex_text_normalization/duplex_text_normalization_train.py @@ -34,7 +34,7 @@ at the example config file to see the list of parameters used for training. USAGE Example: -1. Obtain a processed dataset (refer to the `text_normalization doc `) +1. Obtain a processed dataset (refer to the `text_normalization doc `) 2. Run: # python duplex_text_normalization_train.py \ data.validation_ds.data_path=PATH_TO_VALIDATION_FILE \ @@ -66,6 +66,11 @@ lang={en,ru,de} \ tagger_model.do_training=false +To use tarred dataset for decoder training set: + + data.train_ds.use_tarred_dataset=True \ + data.train_ds.tar_metadata_file=\PATH_TO\\metadata.json + Information on the arguments: Most arguments in the example config file are quite self-explanatory (e.g., diff --git a/nemo/collections/nlp/data/text_normalization/decoder_dataset.py b/nemo/collections/nlp/data/text_normalization/decoder_dataset.py index 724e2d3229e5..33f693892528 100644 --- a/nemo/collections/nlp/data/text_normalization/decoder_dataset.py +++ b/nemo/collections/nlp/data/text_normalization/decoder_dataset.py @@ -33,7 +33,7 @@ from nemo.core.classes import Dataset from nemo.utils import logging -__all__ = ['TextNormalizationDecoderDataset'] +__all__ = ['TextNormalizationDecoderDataset', 'TarredTextNormalizationDecoderDataset'] class TextNormalizationDecoderDataset(Dataset): @@ -45,7 +45,7 @@ class TextNormalizationDecoderDataset(Dataset): Args: input_file: path to the raw data file (e.g., train.tsv). For more info about the data format, refer to the - `text_normalization doc `. + `text_normalization doc `. 
raw_instances: processed raw instances in the Google TN dataset format (used for tarred dataset) tokenizer: tokenizer of the model that will be trained on the dataset tokenizer_name: name of the tokenizer, diff --git a/requirements/requirements_docs.txt b/requirements/requirements_docs.txt index 43cb0bb2c185..e9dc73e0ffaf 100644 --- a/requirements/requirements_docs.txt +++ b/requirements/requirements_docs.txt @@ -1,3 +1,4 @@ +Jinja2<3.1 latexcodec numpy sphinx>=3.0 From 015d36e6607be0a1262ccb51fc9af8d2847d387d Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Wed, 11 Jan 2023 16:11:20 -0800 Subject: [PATCH 136/154] [ASR][Test] Enable test for cache audio with a single worker (#5763) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva --- tests/collections/asr/test_asr_datasets.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/collections/asr/test_asr_datasets.py b/tests/collections/asr/test_asr_datasets.py index a8ce58d51ba8..73b0be22cea0 100644 --- a/tests/collections/asr/test_asr_datasets.py +++ b/tests/collections/asr/test_asr_datasets.py @@ -1725,8 +1725,7 @@ def test_audio_to_target_with_embedding_dataset(self): class TestUtilityFunctions: @pytest.mark.unit - # TEMP WORKAROUND: Skip `True`, multiprocessing failing in CI (#5607) - @pytest.mark.parametrize('cache_audio', [False]) + @pytest.mark.parametrize('cache_audio', [False, True]) def test_cache_datastore_manifests(self, cache_audio: bool): """Test caching of manifest and audio files. """ @@ -1794,7 +1793,8 @@ def fake_get(self): with mock.patch( 'nemo.collections.asr.data.audio_to_text.is_datastore_path', lambda x: True ), mock.patch.object(DataStoreObject, 'get', fake_get): - cache_datastore_manifests(manifest_filepaths, cache_audio=cache_audio) + # Use a single worker for this test to avoid failure with mock & multiprocessing (#5607) + cache_datastore_manifests(manifest_filepaths, cache_audio=cache_audio, num_workers=1) # Manifests need to be compared store_files_to_compare = manifest_filepaths From e4898326b14d558bad245ce0b3ecff2b5e1c9b68 Mon Sep 17 00:00:00 2001 From: Boris Fomitchev Date: Wed, 11 Jan 2023 19:05:38 -0800 Subject: [PATCH 137/154] Fixing masking in RadTTS bottleneck layer (#5771) * Fixing masking in RadTTS bottleneck layer Signed-off-by: Boris Fomitchev Signed-off-by: Elena Rastorgueva --- nemo/collections/tts/models/radtts.py | 9 +++++++ .../tts/modules/attribute_prediction_model.py | 25 +++++++++++-------- 2 files changed, 23 insertions(+), 11 deletions(-) diff --git a/nemo/collections/tts/models/radtts.py b/nemo/collections/tts/models/radtts.py index 2f51f70bd571..337da8db300d 100644 --- a/nemo/collections/tts/models/radtts.py +++ b/nemo/collections/tts/models/radtts.py @@ -395,6 +395,15 @@ def tb_logger(self): self._tb_logger = tb_logger return self._tb_logger + def load_state_dict(self, state_dict, strict=True): + # Override load_state_dict to be backward-compatible with old checkpoints + new_state_dict = {} + for k, v in state_dict.items(): + k = k.replace("projection_fn.weight", "projection_fn.conv.weight") + k = k.replace("projection_fn.bias", "projection_fn.conv.bias") + new_state_dict[k] = v + super().load_state_dict(new_state_dict, strict=strict) + @property def input_module(self): return self.model diff --git a/nemo/collections/tts/modules/attribute_prediction_model.py 
b/nemo/collections/tts/modules/attribute_prediction_model.py index 1dc952373d89..d6595c94efea 100644 --- a/nemo/collections/tts/modules/attribute_prediction_model.py +++ b/nemo/collections/tts/modules/attribute_prediction_model.py @@ -13,10 +13,12 @@ # limitations under the License. import torch -from torch import nn -from torch.nn import functional as F +import torch.nn as nn +import torch.nn.functional as F -from nemo.collections.tts.modules.common import ConvLSTMLinear, ConvNorm +from nemo.collections.tts.helpers.helpers import get_mask_from_lengths +from nemo.collections.tts.modules.common import ConvLSTMLinear +from nemo.collections.tts.modules.submodules import ConvNorm, MaskedInstanceNorm1d def get_attribute_prediction_model(config): @@ -54,18 +56,20 @@ def __init__(self, in_dim, reduction_factor, norm='weightnorm', non_linearity='r reduced_dim = int(in_dim / reduction_factor) self.out_dim = reduced_dim if self.reduction_factor > 1: - fn = ConvNorm(in_dim, reduced_dim, kernel_size=3) if norm == 'weightnorm': - fn = torch.nn.utils.weight_norm(fn.conv, name='weight') + norm_args = {"use_weight_norm": True} elif norm == 'instancenorm': - fn = nn.Sequential(fn, nn.InstanceNorm1d(reduced_dim, affine=True)) - + norm_args = {"norm_fn": MaskedInstanceNorm1d} + else: + norm_args = {} + fn = ConvNorm(in_dim, reduced_dim, kernel_size=3, **norm_args) self.projection_fn = fn self.non_linearity = non_linearity - def forward(self, x): + def forward(self, x, lens): if self.reduction_factor > 1: - x = self.projection_fn(x) + mask = get_mask_from_lengths(lens, x).unsqueeze(1) + x = self.projection_fn(x, mask) if self.non_linearity == 'relu': x = F.relu(x) elif self.non_linearity == 'leakyrelu': @@ -77,7 +81,6 @@ class DAP(AttributeProcessing): def __init__(self, n_speaker_dim, bottleneck_hparams, take_log_of_input, arch_hparams): super(DAP, self).__init__(take_log_of_input) self.bottleneck_layer = BottleneckLayerLayer(**bottleneck_hparams) - arch_hparams['in_dim'] = self.bottleneck_layer.out_dim + n_speaker_dim self.feat_pred_fn = ConvLSTMLinear(**arch_hparams) @@ -85,7 +88,7 @@ def forward(self, txt_enc, spk_emb, x, lens): if x is not None: x = self.normalize(x) - txt_enc = self.bottleneck_layer(txt_enc) + txt_enc = self.bottleneck_layer(txt_enc, lens) spk_emb_expanded = spk_emb[..., None].expand(-1, -1, txt_enc.shape[2]) context = torch.cat((txt_enc, spk_emb_expanded), 1) x_hat = self.feat_pred_fn(context, lens) From e35d0423dc99a0b6c708a7f7cfaa475b9147591f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 11 Jan 2023 19:09:52 -0800 Subject: [PATCH 138/154] Update torchaudio dependency version for tutorials (#5781) (#5782) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva --- tutorials/asr/Online_ASR_Microphone_Demo.ipynb | 2 +- tutorials/asr/Online_Noise_Augmentation.ipynb | 2 +- tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb | 2 +- tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb | 2 +- tutorials/asr/Speech_Commands.ipynb | 2 +- tutorials/asr/Voice_Activity_Detection.ipynb | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tutorials/asr/Online_ASR_Microphone_Demo.ipynb b/tutorials/asr/Online_ASR_Microphone_Demo.ipynb index 5d2f1451d1bf..b7031a074c02 100644 --- a/tutorials/asr/Online_ASR_Microphone_Demo.ipynb +++ b/tutorials/asr/Online_ASR_Microphone_Demo.ipynb @@ -30,7 +30,7 @@ "!python -m pip install 
git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]\n", "\n", "## Install TorchAudio\n", - "!pip install torchaudio>=0.10.0 -f https://download.pytorch.org/whl/torch_stable.html\n", + "!pip install torchaudio>=0.13.0 -f https://download.pytorch.org/whl/torch_stable.html\n", "\n", "## Grab the config we'll use in this example\n", "!mkdir configs" diff --git a/tutorials/asr/Online_Noise_Augmentation.ipynb b/tutorials/asr/Online_Noise_Augmentation.ipynb index 5756c7d58ebe..d08282e712ef 100644 --- a/tutorials/asr/Online_Noise_Augmentation.ipynb +++ b/tutorials/asr/Online_Noise_Augmentation.ipynb @@ -35,7 +35,7 @@ "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]\n", "\n", "## Install TorchAudio\n", - "!pip install torchaudio>=0.10.0 -f https://download.pytorch.org/whl/torch_stable.html\n", + "!pip install torchaudio>=0.13.0 -f https://download.pytorch.org/whl/torch_stable.html\n", "\n", "## Grab the config we'll use in this example\n", "!mkdir configs" diff --git a/tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb b/tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb index 2076bc06982b..21b85eaea940 100644 --- a/tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb +++ b/tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb @@ -30,7 +30,7 @@ "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]\n", "\n", "## Install TorchAudio\n", - "!pip install torchaudio>=0.10.0 -f https://download.pytorch.org/whl/torch_stable.html" + "!pip install torchaudio>=0.13.0 -f https://download.pytorch.org/whl/torch_stable.html" ] }, { diff --git a/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb b/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb index 2488e46287a6..a9ff324508d1 100644 --- a/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb +++ b/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb @@ -32,7 +32,7 @@ "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]\n", "\n", "## Install TorchAudio\n", - "!pip install torchaudio>=0.10.0 -f https://download.pytorch.org/whl/torch_stable.html" + "!pip install torchaudio>=0.13.0 -f https://download.pytorch.org/whl/torch_stable.html" ] }, { diff --git a/tutorials/asr/Speech_Commands.ipynb b/tutorials/asr/Speech_Commands.ipynb index 14cf1dc3812f..8aece9c0bcf8 100644 --- a/tutorials/asr/Speech_Commands.ipynb +++ b/tutorials/asr/Speech_Commands.ipynb @@ -64,7 +64,7 @@ "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]\n", "\n", "## Install TorchAudio\n", - "!pip install torchaudio>=0.10.0 -f https://download.pytorch.org/whl/torch_stable.html\n", + "!pip install torchaudio>=0.13.0 -f https://download.pytorch.org/whl/torch_stable.html\n", "\n", "## Grab the config we'll use in this example\n", "!mkdir configs" diff --git a/tutorials/asr/Voice_Activity_Detection.ipynb b/tutorials/asr/Voice_Activity_Detection.ipynb index f0d2ef14ce6f..e9cc3daff999 100644 --- a/tutorials/asr/Voice_Activity_Detection.ipynb +++ b/tutorials/asr/Voice_Activity_Detection.ipynb @@ -31,7 +31,7 @@ "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]\n", "\n", "## Install TorchAudio\n", - "!pip install torchaudio>=0.10.0 -f https://download.pytorch.org/whl/torch_stable.html\n", + "!pip install torchaudio>=0.13.0 -f https://download.pytorch.org/whl/torch_stable.html\n", "\n", "## Grab the config we'll use in this example\n", "!mkdir configs" From 
db1fd2bd842ebed5f4f17173ad00cefdebcd69e7 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 11 Jan 2023 21:08:14 -0800 Subject: [PATCH 139/154] [TTS][ZH] bugfix import jieba errors. (#5776) (#5784) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva --- nemo_text_processing/g2p/modules.py | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/nemo_text_processing/g2p/modules.py b/nemo_text_processing/g2p/modules.py index fbdfd7887841..07337fd9da5f 100644 --- a/nemo_text_processing/g2p/modules.py +++ b/nemo_text_processing/g2p/modules.py @@ -551,6 +551,7 @@ def __init__( None, 'jieba', ], f"{word_segmenter} is not supported now. Please choose correct word_segmenter." + phoneme_dict = ( self._parse_as_pinyin_dict(phoneme_dict) if isinstance(phoneme_dict, str) or isinstance(phoneme_dict, pathlib.Path) @@ -572,12 +573,21 @@ def __init__( mapping_file=mapping_file, ) self.tones = {'1': '#1', '2': '#2', '3': '#3', '4': '#4', '5': '#5'} - self.word_segmenter = word_segmenter + + if word_segmenter == "jieba": + try: + import jieba + except ImportError as e: + logging.error(e) + + # Cut sentences into words to improve polyphone disambiguation + self.word_segmenter = jieba.cut + else: + self.word_segmenter = lambda x: [x] try: from pypinyin import lazy_pinyin, Style from pypinyin_dict.pinyin_data import cc_cedict - import jieba except ImportError as e: logging.error(e) @@ -612,13 +622,7 @@ def __call__(self, text): 'ge4', 'i', 'P', 'h', 'o', 'n', 'e', '。'] """ pinyin_seq = [] - if self.word_segmenter == 'jieba': - # Cut sentences into words to improve polyphone disambiguation - words_list = list(jieba.cut(text)) - elif self.word_segmenter is None: - words_list = [text] - else: - raise NotImplementedError + words_list = self.word_segmenter(text) for word in words_list: pinyin_seq += self._lazy_pinyin( From aac3d97c83cfe0bc9b3dd46b76ddb13cabf836f5 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 12 Jan 2023 06:56:58 -0800 Subject: [PATCH 140/154] fix typos Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 4 ++-- tools/nemo_forced_aligner/utils/data_prep.py | 3 +-- tools/nemo_forced_aligner/utils/make_output_files.py | 3 +-- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 09d0e2057c56..ad613a7095a4 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -5,7 +5,7 @@ A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based model ## Usage example ``` bash -python /NeMo/tools/nemo_forced_aligner/align.py \ +python /tools/nemo_forced_aligner/align.py \ pretrained_name="stt_en_citrinet_1024_gamma_0_25" \ model_downsample_factor=8 \ manifest_filepath= \ @@ -37,7 +37,7 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). -* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. NFA will always produce token-level and word-level CTM files in: `/tokens/.ctm` and `/words/.ctm`. 
If `additional_ctm_grouping_separator` is specified, an additional folder `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be empty string or space (" "), as space, as space-separated word-level CTMs will always be saved in `/words/.ctm`.) +* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. NFA will always produce token-level and word-level CTM files in: `/tokens/.ctm` and `/words/.ctm`. If `additional_ctm_grouping_separator` is specified, an additional folder `/{tokens/words/additional_segments}/.ctm` will be created containing CTMs for `addtional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be empty string or space (" "), as space-separated word-level CTMs will always be saved in `/words/.ctm`.) > Note: the `additional_ctm_grouping_separator` will be removed from the ground truth text and all the output CTMs, ie it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be amalgamated into one, i.e. if `additional_ctm_grouping_separator="|"`, the following texts will be treated equivalently: `“abc|def”`, `“abc |def”`, `“abc| def”`, `“abc | def"`. * **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove tokens from token-level output CTMs. (Default: False). diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index e5176a862ce7..7422e7738aa3 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -17,8 +17,7 @@ import soundfile as sf import torch - -from .constants import BLANK_TOKEN, SPACE_TOKEN, V_NEGATIVE_NUM +from utils.constants import BLANK_TOKEN, SPACE_TOKEN, V_NEGATIVE_NUM def get_batch_starts_ends(manifest_filepath, batch_size): diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index aede6564fa63..a972b13a9982 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -17,8 +17,7 @@ from pathlib import Path import soundfile as sf - -from .constants import BLANK_TOKEN, SPACE_TOKEN +from utils.constants import BLANK_TOKEN, SPACE_TOKEN def _get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id): From 58a461e3d2fc594986b0249602a3b4bcff471091 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Thu, 12 Jan 2023 08:32:09 -0800 Subject: [PATCH 141/154] update requirements.txt Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt index 84e0b7dbae6d..bc5691de3513 100644 --- a/tools/nemo_forced_aligner/requirements.txt +++ b/tools/nemo_forced_aligner/requirements.txt @@ -1 +1 @@ -nemo +nemo_toolkit[asr] From 22d1a395be5a44e53715923c1429e754e1b841fd Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 04:03:27 -0800 Subject: [PATCH 142/154] Make default devices None and set to GPU if it is available Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 4 ++-- tools/nemo_forced_aligner/align.py | 33 ++++++++++++++++++++++------- 2 files 
changed, 27 insertions(+), 10 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index ad613a7095a4..bc148104d7d6 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -31,9 +31,9 @@ Call the `align.py` script, specifying the parameters as follows: * **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `/{original manifest name}_with_ctm_paths.json`. To avoid over-writing other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). -* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). (Default: 'cpu'). +* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). If None, NFA will set it to 'cuda' if it is available (otherwise will set it to 'cpu'). If specified `transcribe_device` needs to be a string that can be input to the `torch.device()` method. (Default: `None`). -* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. (Default: 'cpu'). +* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. If None, NFA will set it to 'cuda' if it is available (otherwise will set it to 'cpu'). If specified `transcribe_device` needs to be a string that can be input to the `torch.device()` method.(Default: `None`). * **[OPTIONAL]** `batch_size`: The batch_size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 048d423c74ee..24b49f86306e 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -57,10 +57,12 @@ output_dir: the folder where output CTM files and new JSON manifest will be saved. align_using_pred_text: if True, will transcribe the audio using the specified model and then use that transcription as the 'ground truth' for the forced alignment. - transcribe_device: string specifying the device that will be used for generating log-probs (i.e. "transcribing"). - The string needs to be in a format recognized by torch.device(). - viterbi_device: string specifying the device that will be used for doing Viterbi decoding. - The string needs to be in a format recognized by torch.device(). + transcribe_device: None, or a string specifying the device that will be used for generating log-probs (i.e. "transcribing"). + The string needs to be in a format recognized by torch.device(). If None, NFA will set it to 'cuda' if it is available + (otherwise will set it to 'cpu'). + viterbi_device: None, or string specifying the device that will be used for doing Viterbi decoding. + The string needs to be in a format recognized by torch.device(). If None, NFA will set it to 'cuda' if it is available + (otherwise will set it to 'cpu'). batch_size: int specifying batch size that will be used for generating log-probs and doing Viterbi decoding. additional_ctm_grouping_separator: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level. 
NFA will always produce token-level and word-level CTM @@ -95,8 +97,8 @@ class AlignmentConfig: # General configs align_using_pred_text: bool = False - transcribe_device: str = "cpu" - viterbi_device: str = "cpu" + transcribe_device: Optional[str] = None + viterbi_device: Optional[str] = None batch_size: int = 1 additional_ctm_grouping_separator: Optional[str] = None remove_blank_tokens_from_ctm: bool = False @@ -159,8 +161,23 @@ def main(cfg: AlignmentConfig): ) # init devices - transcribe_device = torch.device(cfg.transcribe_device) - viterbi_device = torch.device(cfg.viterbi_device) + if cfg.transcribe_device is None: + transcribe_device = torch.device("cuda" if torch.cuda.is_available else "cpu") + else: + transcribe_device = torch.device(cfg.transcribe_device) + logging.info(f"Device to be used for transcription step (`transcribe_device`) is {transcribe_device}") + + if cfg.viterbi_device is None: + viterbi_device = torch.device("cuda" if torch.cuda.is_available else "cpu") + else: + viterbi_device = torch.device(cfg.viterbi_device) + logging.info(f"Device to be used for viterbi step (`viterbi_device`) is {viterbi_device}") + + if transcribe_device.type == 'cuda' or viterbi_device.type == 'cuda': + logging.warning( + 'One or both of transcribe_device and viterbi_device are GPUs. If you run into OOM errors ' + 'it may help to change both devices to be the CPU.' + ) # load model model, _ = setup_model(cfg, transcribe_device) From c8e692e4f85925324378adca686cfa3086cd39f0 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 04:23:13 -0800 Subject: [PATCH 143/154] add warning for non-zero minimum_timestamp_duration Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/align.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 24b49f86306e..5f2a781a381f 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -195,6 +195,12 @@ def main(cfg: AlignmentConfig): "timestamps in output CTM." ) + if cfg.minimum_timestamp_duration > 0: + logging.warning( + f"cfg.minimum_timestamp_duration has been set to {cfg.minimum_timestamp_duration} seconds. " + "This may cause the alignments for some tokens/words/additional segments to be overlapping." + ) + # get start and end line IDs of batches starts, ends = get_batch_starts_ends(cfg.manifest_filepath, cfg.batch_size) From 1a298d5abda7640f38152ecec038b3d26d89092e Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 04:29:06 -0800 Subject: [PATCH 144/154] Clarify phrasing in README regarding raising error if pred_text exists Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index bc148104d7d6..26f33bcb67c6 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -52,7 +52,7 @@ By default, NFA needs to be provided with a 'manifest' file where each line spec {"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"} ``` -You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. 
In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `/_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will not attempt to do alignment with `align_using_pred_text=true` if there are existing `"pred_text"` fields in the original manifest. +You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `/_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will raise an error if `align_using_pred_text=true` and there are existing `"pred_text"` fields in the original manifest. > Note: NFA does not require `"duration"` fields in the manifest, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes on Conformer CTC models, up to around 1.5 hours for QuartzNet models, and up to several hours for Citrinet models. NFA will also produce better alignments the more accurate the ground-truth `"text"` is. From 47acf80d2c6b40e90fd9223bb78e49a892409833 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 04:40:33 -0800 Subject: [PATCH 145/154] Update README section on evaluating alignment accuracy Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/README.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index 26f33bcb67c6..1f96eba98887 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -75,8 +75,10 @@ A new manifest file will be saved at `/ # How do I evaluate the alignment accuracy? -Ideally you would have some 'true' CTM files to compare with your generated CTM files. +Ideally you would have some 'true' CTM files to compare with your generated CTM files. With these you could obtain metrics such as the mean (absolute) errors between predicted starts/ends and the 'true' starts/ends of the segments. Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. The Gecko tool requires you to upload an audio file and at least one CTM file. The Gecko tool can be accessed here: https://gong-io.github.io/gecko/. More information about the Gecko tool can be found on its Github page here: https://github.com/gong-io/gecko. -Note that Gecko may not display some tokens/words/segments properly if their timestamps are too short. To ameliorate this, you may try setting `minimum_timestamp_duration` to some suitable number. If you are analyzing token-level CTMs, it may also be helpful to set `remove_blank_tokens_from_ctm` to `True`, to make the Gecko visualization less cluttered. 
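To make the "mean (absolute) errors" idea above concrete, here is a minimal sketch (not part of the patch) that compares a predicted CTM against a reference CTM, assuming both files contain the same entries in the same order and follow the `<utt_id> <channel> <start> <duration> <text>` layout that NFA writes:

```python
# Hedged sketch: the file paths and the one-to-one correspondence between the two CTMs are assumptions.
def read_ctm(path):
    """Return a list of (start, end) times in seconds, one per CTM line."""
    entries = []
    with open(path) as f:
        for line in f:
            _, _, start, duration, _ = line.strip().split(maxsplit=4)
            entries.append((float(start), float(start) + float(duration)))
    return entries

def mean_abs_errors(pred_ctm_path, ref_ctm_path):
    pred, ref = read_ctm(pred_ctm_path), read_ctm(ref_ctm_path)
    n = len(pred)
    start_mae = sum(abs(p[0] - r[0]) for p, r in zip(pred, ref)) / n
    end_mae = sum(abs(p[1] - r[1]) for p, r in zip(pred, ref)) / n
    return start_mae, end_mae
```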
+**Note**: the following may help improve your experience viewing the CTMs in Gecko: +* setting `minimum_timestamp_duration` to a larger number, as Gecko may not display some tokens/words/segments properly if their timestamps are too short. +* setting `remove_blank_tokens_from_ctm=true` if you are analyzing token-level CTMs, as it will make the Gecko visualization less cluttered. From 211dc067b040d3f4942f8e8203555fd9f5a268c4 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 07:33:56 -0800 Subject: [PATCH 146/154] fix some code in creating segments Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/data_prep.py | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 7422e7738aa3..b882351916d7 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -114,15 +114,15 @@ def get_y_and_boundary_info_for_utt(text, model, separator): Get y, token_info, word_info and segment_info for the text provided, tokenized by the model provided. """ - if hasattr(model, 'tokenizer'): + if not separator: # if separator is not defined - treat the whole text as one segment + segments = [text] + else: + segments = text.split(separator) - if not separator: # if separator is not defined - treat the whole text as one segment - segments = [text] - else: - segments = text.split(separator) + # remove any spaces at start and end of segments + segments = [seg.strip() for seg in segments] - # remove any spaces at start and end of segments - segments = [seg.strip() for seg in segments] + if hasattr(model, 'tokenizer'): BLANK_ID = len(model.decoder.vocabulary) # TODO: check y_token_ids = [] @@ -186,11 +186,6 @@ def get_y_and_boundary_info_for_utt(text, model, separator): elif hasattr(model.decoder, "vocabulary"): - segments = text.split(separator) - - # remove any spaces at start and end of segments - segments = [seg.strip() for seg in segments] - BLANK_ID = len(model.decoder.vocabulary) # TODO: check this is correct SPACE_ID = model.decoder.vocabulary.index(" ") y_token_ids = [] From 00b38c4ce5536e0ad99c19475b09e5c00f93c204 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 08:49:40 -0800 Subject: [PATCH 147/154] Add some unit tests for NFA boundary_info creation Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/requirements.txt | 1 + .../test_get_y_and_boundary_info_for_utt.py | 144 ++++++++++++++++++ 2 files changed, 145 insertions(+) create mode 100644 tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt index bc5691de3513..f00728e27ec9 100644 --- a/tools/nemo_forced_aligner/requirements.txt +++ b/tools/nemo_forced_aligner/requirements.txt @@ -1 +1,2 @@ nemo_toolkit[asr] +pytest diff --git a/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py b/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py new file mode 100644 index 000000000000..42627dccdbd5 --- /dev/null +++ b/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py @@ -0,0 +1,144 @@ +import pytest +from utils.data_prep import get_y_and_boundary_info_for_utt + +from nemo.collections.asr.models import ASRModel + +EN_TEXT = "hi world | hey" + +EN_QN_EXPECTED_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': 'h', 's_start': 
1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': 'i', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'w', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + {'text': 'o', 's_start': 9, 's_end': 9}, + {'text': '', 's_start': 10, 's_end': 10}, + {'text': 'r', 's_start': 11, 's_end': 11}, + {'text': '', 's_start': 12, 's_end': 12}, + {'text': 'l', 's_start': 13, 's_end': 13}, + {'text': '', 's_start': 14, 's_end': 14}, + {'text': 'd', 's_start': 15, 's_end': 15}, + {'text': '', 's_start': 16, 's_end': 16}, + {'text': '', 's_start': 17, 's_end': 17}, + {'text': '', 's_start': 18, 's_end': 18}, + {'text': 'h', 's_start': 19, 's_end': 19}, + {'text': '', 's_start': 20, 's_end': 20}, + {'text': 'e', 's_start': 21, 's_end': 21}, + {'text': '', 's_start': 22, 's_end': 22}, + {'text': 'y', 's_start': 23, 's_end': 23}, + {'text': '', 's_start': 24, 's_end': 24}, +] + +EN_QN_EXPECTED_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EN_QN_EXPECTED_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EN_CN_EXPECTED_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': '▁hi', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': '▁world', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '▁he', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'y', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, +] + +EN_CN_EXPECTED_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 1}, + {'text': 'world', 's_start': 3, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, +] + +EN_CN_EXPECTED_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, +] + + +ZH_TEXT = "人工 智能|技术" + +ZH_EXPECTED_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': '人', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': '工', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': '智', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + {'text': '能', 's_start': 9, 's_end': 9}, + {'text': '', 's_start': 10, 's_end': 10}, + {'text': '', 's_start': 11, 's_end': 11}, + {'text': '', 's_start': 12, 's_end': 12}, + {'text': '技', 's_start': 13, 's_end': 13}, + {'text': '', 's_start': 14, 's_end': 14}, + {'text': '术', 's_start': 15, 's_end': 15}, + {'text': '', 's_start': 16, 's_end': 16}, +] + +ZH_EXPECTED_WORD_INFO = [ + {'text': '人工', 's_start': 1, 's_end': 3}, + {'text': '智能', 's_start': 7, 's_end': 9}, + {'text': '技术', 's_start': 13, 's_end': 15}, +] + +ZH_EXPECTED_SEGMENT_INFO = [ + {'text': '人工 智能', 's_start': 1, 's_end': 9}, + {'text': '技术', 's_start': 13, 's_end': 15}, +] + + +@pytest.mark.parametrize( + "text,model_pretrained_name,separator,expected_token_info", + [ + (EN_TEXT, "stt_en_quartznet15x5", "|", EN_QN_EXPECTED_TOKEN_INFO), + (EN_TEXT, "stt_en_citrinet_256_gamma_0_25", "|", EN_CN_EXPECTED_TOKEN_INFO), + (ZH_TEXT, "stt_zh_citrinet_512", "|", ZH_EXPECTED_TOKEN_INFO), + ], +) +def test_token_info(text, model_pretrained_name, separator, expected_token_info): + 
model = ASRModel.from_pretrained(model_pretrained_name) + _, token_info, *_ = get_y_and_boundary_info_for_utt(text, model, separator) + assert token_info == expected_token_info + + +@pytest.mark.parametrize( + "text,model_pretrained_name,separator,expected_word_info", + [ + (EN_TEXT, "stt_en_quartznet15x5", "|", EN_QN_EXPECTED_WORD_INFO), + (EN_TEXT, "stt_en_citrinet_256_gamma_0_25", "|", EN_CN_EXPECTED_WORD_INFO), + (ZH_TEXT, "stt_zh_citrinet_512", "|", ZH_EXPECTED_WORD_INFO), + ], +) +def test_word_info(text, model_pretrained_name, separator, expected_word_info): + model = ASRModel.from_pretrained(model_pretrained_name) + _, _, word_info, _ = get_y_and_boundary_info_for_utt(text, model, separator) + assert word_info == expected_word_info + + +@pytest.mark.parametrize( + "text,model_pretrained_name,separator,expected_segment_info", + [ + (EN_TEXT, "stt_en_quartznet15x5", "|", EN_QN_EXPECTED_SEGMENT_INFO), + (EN_TEXT, "stt_en_citrinet_256_gamma_0_25", "|", EN_CN_EXPECTED_SEGMENT_INFO), + (ZH_TEXT, "stt_zh_citrinet_512", "|", ZH_EXPECTED_SEGMENT_INFO), + ], +) +def test_segment_info(text, model_pretrained_name, separator, expected_segment_info): + model = ASRModel.from_pretrained(model_pretrained_name) + *_, segment_info = get_y_and_boundary_info_for_utt(text, model, separator) + assert segment_info == expected_segment_info From 6d9c68370d72b2b418476523914fc9f54dc06c9c Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 10:50:39 -0800 Subject: [PATCH 148/154] Added test for function adding t_start and t_end Signed-off-by: Elena Rastorgueva --- .../test_add_t_start_end_to_boundary_info.py | 107 ++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py diff --git a/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py b/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py new file mode 100644 index 000000000000..f96301432188 --- /dev/null +++ b/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py @@ -0,0 +1,107 @@ +import pytest +from utils.make_output_files import add_t_start_end_to_boundary_info + +ALIGNMENT = [ + 1, + 1, + 3, + 3, + 4, + 5, + 7, + 7, + 9, + 10, + 11, + 12, + 13, + 15, + 17, + 17, + 19, + 21, + 23, + 23, +] + +INPUT_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': 'h', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': 'i', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'w', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + {'text': 'o', 's_start': 9, 's_end': 9}, + {'text': '', 's_start': 10, 's_end': 10}, + {'text': 'r', 's_start': 11, 's_end': 11}, + {'text': '', 's_start': 12, 's_end': 12}, + {'text': 'l', 's_start': 13, 's_end': 13}, + {'text': '', 's_start': 14, 's_end': 14}, + {'text': 'd', 's_start': 15, 's_end': 15}, + {'text': '', 's_start': 16, 's_end': 16}, + {'text': '', 's_start': 17, 's_end': 17}, + {'text': '', 's_start': 18, 's_end': 18}, + {'text': 'h', 's_start': 19, 's_end': 19}, + {'text': '', 's_start': 20, 's_end': 20}, + {'text': 'e', 's_start': 21, 's_end': 21}, + {'text': '', 's_start': 22, 's_end': 22}, + {'text': 'y', 's_start': 23, 's_end': 23}, + {'text': '', 's_start': 24, 's_end': 24}, +] + +EXPECTED_OUTPUT_TOKEN_INFO = [ + {'text': 'h', 's_start': 1, 's_end': 1, 't_start': 0, 't_end': 1}, + 
{'text': 'i', 's_start': 3, 's_end': 3, 't_start': 2, 't_end': 3}, + {'text': '', 's_start': 4, 's_end': 4, 't_start': 4, 't_end': 4}, + {'text': '', 's_start': 5, 's_end': 5, 't_start': 5, 't_end': 5}, + {'text': 'w', 's_start': 7, 's_end': 7, 't_start': 6, 't_end': 7}, + {'text': 'o', 's_start': 9, 's_end': 9, 't_start': 8, 't_end': 8}, + {'text': '', 's_start': 10, 's_end': 10, 't_start': 9, 't_end': 9}, + {'text': 'r', 's_start': 11, 's_end': 11, 't_start': 10, 't_end': 10}, + {'text': '', 's_start': 12, 's_end': 12, 't_start': 11, 't_end': 11}, + {'text': 'l', 's_start': 13, 's_end': 13, 't_start': 12, 't_end': 12}, + {'text': 'd', 's_start': 15, 's_end': 15, 't_start': 13, 't_end': 13}, + {'text': '', 's_start': 17, 's_end': 17, 't_start': 14, 't_end': 15}, + {'text': 'h', 's_start': 19, 's_end': 19, 't_start': 16, 't_end': 16}, + {'text': 'e', 's_start': 21, 's_end': 21, 't_start': 17, 't_end': 17}, + {'text': 'y', 's_start': 23, 's_end': 23, 't_start': 18, 't_end': 19}, +] + + +INPUT_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EXPECTED_OUTPUT_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 3, 't_start': 0, 't_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15, 't_start': 6, 't_end': 13}, + {'text': 'hey', 's_start': 19, 's_end': 23, 't_start': 16, 't_end': 19}, +] + +INPUT_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EXPECTED_OUTPUT_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 15, 't_start': 0, 't_end': 13}, + {'text': 'hey', 's_start': 19, 's_end': 23, 't_start': 16, 't_end': 19}, +] + + +@pytest.mark.parametrize( + "input_boundary_info_utt,alignment_utt,expected_output_boundary_info_utt", + [ + (INPUT_TOKEN_INFO, ALIGNMENT, EXPECTED_OUTPUT_TOKEN_INFO), + (INPUT_WORD_INFO, ALIGNMENT, EXPECTED_OUTPUT_WORD_INFO), + (INPUT_SEGMENT_INFO, ALIGNMENT, EXPECTED_OUTPUT_SEGMENT_INFO), + ], +) +def test_add_t_start_end_to_boundary_info(input_boundary_info_utt, alignment_utt, expected_output_boundary_info_utt): + output_boundary_info_utt = add_t_start_end_to_boundary_info(input_boundary_info_utt, alignment_utt) + assert output_boundary_info_utt == expected_output_boundary_info_utt From dd516e9f144f6ff73b079f2c0de7554ef93aa6b6 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 11:59:38 -0800 Subject: [PATCH 149/154] add comments to get_y_and_boundary_info_for_utt and remove redundant variables Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/data_prep.py | 158 +++++++++++-------- 1 file changed, 93 insertions(+), 65 deletions(-) diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index b882351916d7..b0a158041197 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -111,7 +111,40 @@ def get_char_tokens(text, model): def get_y_and_boundary_info_for_utt(text, model, separator): """ - Get y, token_info, word_info and segment_info for the text provided, tokenized by the model provided. + Get y_token_ids_with_blanks, token_info, word_info and segment_info for the text provided, tokenized + by the model provided. + y_token_ids_with_blanks is a list of the indices of the text tokens with the blank token id in between every + text token. 
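As a standalone illustration of that interleaving (toy token ids only; in the real code the blank id is `len(model.decoder.vocabulary)`):

```python
# Toy sketch: two tokens with ids [5, 9] and a hypothetical BLANK_ID of 28.
BLANK_ID = 28
token_ids = [5, 9]
y_token_ids_with_blanks = [BLANK_ID]
for token_id in token_ids:
    y_token_ids_with_blanks.extend([token_id, BLANK_ID])
assert y_token_ids_with_blanks == [28, 5, 28, 9, 28]
```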
+ token_info, word_info and segment_info are lists of dictionaries containing information about + where the tokens/words/segments start and end. + For example, 'hi world | hey ' with separator = '|' and tokenized by a BPE tokenizer can have token_info like: + token_info = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': '▁hi', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': '▁world', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '▁he', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'y', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + ] + 's_start' and 's_end' indicate where in the sequence of tokens does each token start and end. + + The word_info will be as follows: + word_info = [ + {'text': 'hi', 's_start': 1, 's_end': 1}, + {'text': 'world', 's_start': 3, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, + ] + 's_start' and 's_end' indicate where in the sequence of tokens does each word start and end. + + segment_info will be as follows: + segment_info = [ + {'text': 'hi world', 's_start': 1, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, + ] + 's_start' and 's_end' indicate where in the sequence of tokens does each segment start and end. """ if not separator: # if separator is not defined - treat the whole text as one segment @@ -125,44 +158,41 @@ def get_y_and_boundary_info_for_utt(text, model, separator): if hasattr(model, 'tokenizer'): BLANK_ID = len(model.decoder.vocabulary) # TODO: check - y_token_ids = [] + y_token_ids_with_blanks = [BLANK_ID] - y_tokens = [] - y_tokens_with_blanks = [BLANK_TOKEN] token_info = [{"text": BLANK_TOKEN, "s_start": 0, "s_end": 0,}] word_info = [] segment_info = [] - segment_s_pointer = 1 - word_s_pointer = 1 + segment_s_pointer = 1 # first segment will start at s=1 because s=0 is a blank + word_s_pointer = 1 # first word will start at s=1 because s=0 is a blank for segment in segments: - words = segment.split(" ") + words = segment.split(" ") # we define words to be space-separated sub-strings for word in words: word_tokens = model.tokenizer.text_to_tokens(word) word_ids = model.tokenizer.text_to_ids(word) for token, id_ in zip(word_tokens, word_ids): - y_token_ids.append(id_) + # add the text token and the blank that follows it + # to our token-based variables y_token_ids_with_blanks.extend([id_, BLANK_ID]) - y_tokens.append(token) - y_tokens_with_blanks.extend([token, BLANK_TOKEN]) - - token_info.append( - { - "text": token, - "s_start": len(y_tokens_with_blanks) - 2, - "s_end": len(y_tokens_with_blanks) - 2, - } - ) - token_info.append( - { - "text": BLANK_TOKEN, - "s_start": len(y_tokens_with_blanks) - 1, - "s_end": len(y_tokens_with_blanks) - 1, - } + token_info.extend( + [ + { + "text": token, + "s_start": len(y_token_ids_with_blanks) - 2, + "s_end": len(y_token_ids_with_blanks) - 2, + }, + { + "text": BLANK_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 1, + "s_end": len(y_token_ids_with_blanks) - 1, + }, + ] ) + # add the word to word_info and increment the word_s_pointer word_info.append( { "text": word, @@ -172,6 +202,7 @@ def get_y_and_boundary_info_for_utt(text, model, separator): ) word_s_pointer += len(word_tokens) * 2 # TODO check this + # add the segment to segment_info and increment the segment_s_pointer segment_tokens = model.tokenizer.text_to_tokens(segment) segment_info.append( { @@ -184,68 +215,64 @@ def get_y_and_boundary_info_for_utt(text, model, separator): return 
y_token_ids_with_blanks, token_info, word_info, segment_info - elif hasattr(model.decoder, "vocabulary"): + elif hasattr(model.decoder, "vocabulary"): # i.e. tokenization is simply character-based BLANK_ID = len(model.decoder.vocabulary) # TODO: check this is correct SPACE_ID = model.decoder.vocabulary.index(" ") - y_token_ids = [] + y_token_ids_with_blanks = [BLANK_ID] - y_tokens = [] - y_tokens_with_blanks = [BLANK_TOKEN] token_info = [{"text": BLANK_TOKEN, "s_start": 0, "s_end": 0,}] word_info = [] segment_info = [] - segment_s_pointer = 1 - word_s_pointer = 1 + segment_s_pointer = 1 # first segment will start at s=1 because s=0 is a blank + word_s_pointer = 1 # first word will start at s=1 because s=0 is a blank for i_segment, segment in enumerate(segments): - words = segment.split(" ") + words = segment.split(" ") # we define words to be space-separated characters for i_word, word in enumerate(words): + # convert string to list of characters word_tokens = list(word) + # convert list of characters to list of their ids in the vocabulary word_ids = get_char_tokens(word, model) for token, id_ in zip(word_tokens, word_ids): - y_token_ids.append(id_) + # add the text token and the blank that follows it + # to our token-based variables y_token_ids_with_blanks.extend([id_, BLANK_ID]) - y_tokens.append(token) - y_tokens_with_blanks.extend([token, BLANK_TOKEN]) - - token_info.append( - { - "text": token, - "s_start": len(y_tokens_with_blanks) - 2, - "s_end": len(y_tokens_with_blanks) - 2, - } - ) - token_info.append( - { - "text": BLANK_TOKEN, - "s_start": len(y_tokens_with_blanks) - 1, - "s_end": len(y_tokens_with_blanks) - 1, - } + token_info.extend( + [ + { + "text": token, + "s_start": len(y_token_ids_with_blanks) - 2, + "s_end": len(y_token_ids_with_blanks) - 2, + }, + { + "text": BLANK_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 1, + "s_end": len(y_token_ids_with_blanks) - 1, + }, + ] ) - # add space token unless this is the final word in the final segment + # add space token (and the blank after it) unless this is the final word in the final segment if not (i_segment == len(segments) - 1 and i_word == len(words) - 1): - y_token_ids.append(SPACE_ID) y_token_ids_with_blanks.extend([SPACE_ID, BLANK_ID]) - y_tokens.append(SPACE_TOKEN) - y_tokens_with_blanks.extend([SPACE_TOKEN, BLANK_TOKEN]) - token_info.append( - { - "text": SPACE_TOKEN, - "s_start": len(y_tokens_with_blanks) - 2, - "s_end": len(y_tokens_with_blanks) - 2, - } - ) - token_info.append( - { - "text": BLANK_TOKEN, - "s_start": len(y_tokens_with_blanks) - 1, - "s_end": len(y_tokens_with_blanks) - 1, - } + token_info.extend( + ( + { + "text": SPACE_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 2, + "s_end": len(y_token_ids_with_blanks) - 2, + }, + { + "text": BLANK_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 1, + "s_end": len(y_token_ids_with_blanks) - 1, + }, + ) ) + # add the word to word_info and increment the word_s_pointer word_info.append( { "text": word, @@ -255,6 +282,7 @@ def get_y_and_boundary_info_for_utt(text, model, separator): ) word_s_pointer += len(word_tokens) * 2 + 2 # TODO check this + # add the segment to segment_info and increment the segment_s_pointer segment_tokens = get_char_tokens(segment, model) segment_info.append( { From 55731fd35ec03f77374e86394b54986464d3056c Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 12:14:20 -0800 Subject: [PATCH 150/154] add comments to get_batch_tensors_and_boundary_info Signed-off-by: Elena Rastorgueva --- 
tools/nemo_forced_aligner/utils/data_prep.py | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index b0a158041197..7dda1e1a06fb 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -301,16 +301,20 @@ def get_y_and_boundary_info_for_utt(text, model, separator): def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, align_using_pred_text): """ - Preparing some tensors to be used during Viterbi decoding. Returns: - log_probs, y, T, U (y and U are s.t. every other token is a blank), - token_info_list, word_info_list, segment_info_list, - pred_text_list + log_probs, y, T, U (y and U are s.t. every other token is a blank) - these are the tensors we will need + during Viterbi decoding. + token_info_list, word_info_list, segment_info_list - these are lists of dictionaries which we will need + for writing the CTM files with the human-readable alignments. + pred_text_list - this is a list of the transcriptions from our model which we will save to our output JSON + file if align_using_pred_text is True. """ + # get hypotheses by calling 'transcribe' + # we will use the output log_probs, the duration of the log_probs, + # and (optionally) the predicted ASR text from the hypotheses audio_filepaths_batch = [line["audio_filepath"] for line in manifest_lines_batch] B = len(audio_filepaths_batch) - with torch.no_grad(): hypotheses = model.transcribe(audio_filepaths_batch, return_hypotheses=True, batch_size=B) @@ -322,6 +326,9 @@ def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, T_list_batch.append(hypothesis.y_sequence.shape[0]) pred_text_batch.append(hypothesis.text) + # we loop over every line in the manifest that is in our current batch, + # and record the y (list of tokens, including blanks), U (list of lengths of y) and + # token_info_batch, word_info_batch, segment_info_batch y_list_batch = [] U_list_batch = [] token_info_batch = [] @@ -343,6 +350,7 @@ def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, word_info_batch.append(word_info_utt) segment_info_batch.append(segment_info_utt) + # turn log_probs, y, T, U into dense tensors for fast computation during Viterbi decoding T_max = max(T_list_batch) U_max = max(U_list_batch) V = len(model.decoder.vocabulary) + 1 From 7f613eca378cc3086071e5e73fa249b52f33d37f Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 13:04:57 -0800 Subject: [PATCH 151/154] Add comments to make_output_files.py Signed-off-by: Elena Rastorgueva --- .../utils/make_output_files.py | 52 ++++++++++++++++--- 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py index a972b13a9982..830bf476ff2f 100644 --- a/tools/nemo_forced_aligner/utils/make_output_files.py +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -28,8 +28,31 @@ def _get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id): def add_t_start_end_to_boundary_info(boundary_info_utt, alignment_utt): + """ + We use the list of alignments to add the timesteps where each token/word/segment is predicted to + start and end. + boundary_info_utt can be any one of the variables referred to as `token_info`, `word_info`, `segment_info` + in other parts of the code. + + e.g. 
the input boundary info could be + boundary_info_utt = [ + {'text': 'hi', 's_start': 1, 's_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, + ] + + and the alignment could be + alignment_utt = [ 1, 1, 3, 3, 4, 5, 7, 7, 9, 10, 11, 12, 13, 15, 17, 17, 19, 21, 23, 23] + + in which case the output would be: + boundary_info_utt = [ + {'text': 'hi', 's_start': 1, 's_end': 3, 't_start': 0, 't_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15, 't_start': 6, 't_end': 13}, + {'text': 'hey', 's_start': 19, 's_end': 23, 't_start': 16, 't_end': 19}, + ] + """ # first remove boundary_info of any items that are not in the alignment - # the only items we expect not to be in the alignment are blanks which the alignment chooses to skip + # the only items we expect not to be in the alignment are blanks that the alignment chooses to skip # we will iterate boundary_info in reverse order for this to make popping the items simple s_in_alignment = set(alignment_utt) for boundary_info_pointer in range(len(boundary_info_utt) - 1, -1, -1): @@ -47,14 +70,15 @@ def add_t_start_end_to_boundary_info(boundary_info_utt, alignment_utt): if item_not_in_alignment: boundary_info_utt.pop(boundary_info_pointer) - # now updated boundary_info with s_start and s_end + # now update boundary_info with t_start and t_end boundary_info_pointer = 0 for t, s_at_t in enumerate(alignment_utt): if s_at_t == boundary_info_utt[boundary_info_pointer]["s_start"]: if "t_start" not in boundary_info_utt[boundary_info_pointer]: + # we have just reached the start of the word/token/segment in the alignment => update t_start boundary_info_utt[boundary_info_pointer]["t_start"] = t - if t < len(alignment_utt) - 1: + if t < len(alignment_utt) - 1: # this if is to avoid accessing an index that is not in the list if alignment_utt[t + 1] > boundary_info_utt[boundary_info_pointer]["s_end"]: if "t_end" not in boundary_info_utt[boundary_info_pointer]: boundary_info_utt[boundary_info_pointer]["t_end"] = t @@ -87,14 +111,19 @@ def make_ctm( audio_sr, ): """ - Note: assume same order of utts in boundary_info_batch, alignments_batch and manifest_lines_batch + Function to save CTM files for all the utterances in the incoming batch. 
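To make the conversion in `make_ctm` below concrete, here is a small sketch with illustrative numbers (a 0.01 s window stride at 16 kHz and `model_downsample_factor=8` are assumptions, not values fixed by the patch):

```python
# One output timestep covers hop_length * model_downsample_factor audio samples.
hop_length = 160                 # featurizer hop length (illustrative)
model_downsample_factor = 8      # e.g. Citrinet-style 8x downsampling (illustrative)
audio_sr = 16000
timestep_to_sample_ratio = hop_length * model_downsample_factor   # 1280 samples, i.e. 0.08 s per timestep

t_start, t_end = 25, 31          # timesteps taken from the Viterbi alignment
start_time = t_start * timestep_to_sample_ratio / audio_sr                 # 2.00 s
end_time = ((t_end + 1) * timestep_to_sample_ratio - 1) / audio_sr         # ~2.56 s
print(f"utt_id 1 {start_time:.2f} {end_time - start_time:.2f} hello")      # one CTM line
```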
""" + assert len(boundary_info_batch) == len(alignments_batch) == len(manifest_lines_batch) + # we also assume that utterances are in the same order in boundary_info_batch, alignments_batch + # and manifest_lines_batch - this should be the case unless there is a strange bug upstream in the + # code os.makedirs(output_dir, exist_ok=True) - spectrogram_hop_length = model.preprocessor.featurizer.hop_length - timestep_to_sample_ratio = spectrogram_hop_length * model_downsample_factor + # the ratio to convert from timesteps (the units of 't_start' and 't_end' in boundary_info_utt) + # to the number of samples ('samples' in the sense of 16000 'samples' per second) + timestep_to_sample_ratio = model.preprocessor.featurizer.hop_length * model_downsample_factor for boundary_info_utt, alignment_utt, manifest_line in zip( boundary_info_batch, alignments_batch, manifest_lines_batch @@ -111,7 +140,7 @@ def make_ctm( audio_file_duration = f.frames / f.samplerate with open(os.path.join(output_dir, f"{utt_id}.ctm"), "w") as f_ctm: - for boundary_info_ in boundary_info_utt: + for boundary_info_ in boundary_info_utt: # loop over every token/word/segment text = boundary_info_["text"] start_sample = boundary_info_["t_start"] * timestep_to_sample_ratio end_sample = (boundary_info_["t_end"] + 1) * timestep_to_sample_ratio - 1 @@ -120,13 +149,16 @@ def make_ctm( end_time = end_sample / audio_sr if minimum_timestamp_duration > 0 and minimum_timestamp_duration > end_time - start_time: + # make the predicted duration of the token/word/segment longer, growing it outwards equal + # amounts from the predicted center of the token/word/segment token_mid_point = (start_time + end_time) / 2 start_time = max(token_mid_point - minimum_timestamp_duration / 2, 0) end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration) - if not (text == BLANK_TOKEN and remove_blank_tokens_from_ctm): + if not (text == BLANK_TOKEN and remove_blank_tokens_from_ctm): # don't save blanks if we don't want to # replace any spaces with so we dont introduce extra space characters to our CTM files text = text.replace(" ", SPACE_TOKEN) + f_ctm.write(f"{utt_id} 1 {start_time:.2f} {end_time - start_time:.2f} {text}\n") return None @@ -139,6 +171,10 @@ def make_new_manifest( audio_filepath_parts_in_utt_id, pred_text_all_lines, ): + """ + Function to save a new manifest with the same info as the original manifest, but also the paths to the + CTM files for each utterance and the "pred_text" if it was used for the alignment. 
+ """ if pred_text_all_lines: with open(original_manifest_filepath, 'r') as f: num_lines_in_manifest = sum(1 for _ in f) From 989e273133a3acbfa27a7454a73474acd7117aa4 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Fri, 13 Jan 2023 14:51:55 -0800 Subject: [PATCH 152/154] add comments to viterbi decoding code Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/utils/data_prep.py | 4 ++ .../utils/viterbi_decoding.py | 61 +++++++++++++++++-- 2 files changed, 59 insertions(+), 6 deletions(-) diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py index 7dda1e1a06fb..26d8a328b50d 100644 --- a/tools/nemo_forced_aligner/utils/data_prep.py +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -353,6 +353,7 @@ def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, # turn log_probs, y, T, U into dense tensors for fast computation during Viterbi decoding T_max = max(T_list_batch) U_max = max(U_list_batch) + # V = the number of tokens in the vocabulary + 1 for the blank token. V = len(model.decoder.vocabulary) + 1 T_batch = torch.tensor(T_list_batch) U_batch = torch.tensor(U_list_batch) @@ -364,6 +365,9 @@ def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, log_probs_batch[b, :t, :] = log_probs_utt # make y tensor of shape (B x U_max) + # populate it initially with all 'V' numbers so that the 'V's will remain in the areas that + # are 'padding'. This will be useful for when we make 'log_probs_reorderd' during Viterbi decoding + # in a different function. y_batch = V * torch.ones((B, U_max), dtype=torch.int64) for b, y_utt in enumerate(y_list_batch): U_utt = U_batch[b] diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py index a5599b267fbf..bc9a45dda527 100644 --- a/tools/nemo_forced_aligner/utils/viterbi_decoding.py +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -13,62 +13,111 @@ # limitations under the License. import torch - -from .constants import V_NEGATIVE_NUM +from utils.constants import V_NEGATIVE_NUM def viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device): """ - Does Viterbi decoding. + Do Viterbi decoding with an efficient algorithm (the only for-loop in the 'forward pass' is over the time dimension). + Args: + log_probs_batch: tensor of shape (B, T_max, V). The parts of log_probs_batch which are 'padding' are filled + with 'V_NEGATIVE_NUM' - a large negative number which represents a very low probability. + y_batch: tensor of shape (B, U_max) - contains token IDs including blanks in every other position. The parts of + y_batch which are padding are filled with the number 'V'. V = the number of tokens in the vocabulary + 1 for + the blank token. + T_batch: tensor of shape (B, 1) - contains the durations of the log_probs_batch (so we can ignore the + parts of log_probs_batch which are padding) + U_batch: tensor of shape (B, 1) - contains the lengths of y_batch (so we can ignore the parts of y_batch + which are padding). + viterbi_device: the torch device on which Viterbi decoding will be done. + Returns: alignments_batch: list of lists containing locations for the tokens we align to at each timestep. - Looks like: [[0, 0, 1, 2, 2, 3, 3, ... ], [0, 1, 2, 2, 2, 3, 4, ....], ...] + Looks like: [[0, 0, 1, 2, 2, 3, 3, ..., ], ..., [0, 1, 2, 2, 2, 3, 4, ....]]. + Each list inside alignments_batch is of length T_batch[location of utt in batch]. 
""" B, T_max, _ = log_probs_batch.shape U_max = y_batch.shape[1] - # transfer all tensors to device + # transfer all tensors to viterbi_device log_probs_batch = log_probs_batch.to(viterbi_device) y_batch = y_batch.to(viterbi_device) T_batch = T_batch.to(viterbi_device) U_batch = U_batch.to(viterbi_device) + # make tensor that we will put at timesteps beyond the duration of the audio padding_for_log_probs = V_NEGATIVE_NUM * torch.ones((B, T_max, 1), device=viterbi_device) + # make log_probs_padded tensor of shape (B, T_max, V +1 ) where all of + # log_probs_padded[:,:,-1] is the 'V_NEGATIVE_NUM' log_probs_padded = torch.cat((log_probs_batch, padding_for_log_probs), dim=2) + # make log_probs_reordered tensor of shape (B, T_max, U_max) + # it contains the log_probs for only the tokens that are in the Ground Truth, and in the order + # that they occur log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y_batch.unsqueeze(1).repeat(1, T_max, 1)) + # initialize tensors of viterbi probabilies and backpointers v_matrix = V_NEGATIVE_NUM * torch.ones_like(log_probs_reordered) backpointers = -999 * torch.ones_like(v_matrix) v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2] + # Make a letter_repetition_mask the same shape as y_batch + # the letter_repetition_mask will have 'True' where the token (including blanks) is the same + # as the token two places before it in the ground truth (and 'False everywhere else). + # We will use letter_repetition_mask to determine whether the Viterbi algorithm needs to look two tokens back or + # three tokens back y_shifted_left = torch.roll(y_batch, shifts=2, dims=1) letter_repetition_mask = y_batch - y_shifted_left letter_repetition_mask[:, :2] = 1 # make sure dont apply mask to first 2 tokens letter_repetition_mask = letter_repetition_mask == 0 + # bp_absolute_template is a tensor we will need during the Viterbi decoding to convert our argmaxes from indices between 0 and 2, + # to indices in the range (0, U_max-1) indicating from which token the mostly path up to that point came from. + # it is a tensor of shape (B, U_max) that looks like + # bp_absolute_template = [ + # [0, 1, 2, ...,, U_max] + # [0, 1, 2, ...,, U_max] + # [0, 1, 2, ...,, U_max] + # ... 
+    #    rows repeated so that there are B rows in total
+    # ]
     bp_absolute_template = torch.arange(U_max, device=viterbi_device).unsqueeze(0).repeat(B, 1)
 
     for t in range(1, T_max):
+        # e_current is a tensor of shape (B, U_max) of the log probs of every possible token at the current timestep
         e_current = log_probs_reordered[:, t, :]
+        # v_prev is a tensor of shape (B, U_max) of the viterbi probabilities 1 timestep back and in the same token position
         v_prev = v_matrix[:, t - 1, :]
+
+        # v_prev_shifted is a tensor of shape (B, U_max) of the viterbi probabilities 1 timestep back and 1 token position back
         v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1)
+        # by doing a roll shift of size 1, we have brought the viterbi probability in the final token position to the
+        # first token position - let's overcome this by 'zeroing out' the probabilities in the first token position
         v_prev_shifted[:, 0] = V_NEGATIVE_NUM
 
+        # v_prev_shifted2 is a tensor of shape (B, U_max) of the viterbi probabilities 1 timestep back and 2 token positions back
         v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1)
-        v_prev_shifted2[:, :2] = V_NEGATIVE_NUM
+        v_prev_shifted2[:, :2] = V_NEGATIVE_NUM  # zero out as we did for v_prev_shifted
+        # use our letter_repetition_mask to remove the connections between 2 blanks (so we don't skip over a letter)
+        # and to remove the connections between 2 consecutive letters (so we don't skip over a blank)
         v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEGATIVE_NUM)
 
+        # we need this v_prev_dup tensor so we can calculate the viterbi probability of every possible
+        # token position simultaneously
         v_prev_dup = torch.cat(
             (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2,
         )
+        # candidates_v_current are our candidate viterbi probabilities for every token position, from which
+        # we will pick the max and record the argmax
         candidates_v_current = v_prev_dup + e_current.unsqueeze(2)
         v_current, bp_relative = torch.max(candidates_v_current, dim=2)
 
+        # convert our argmaxes from indices between 0 and 2, to indices in the range (0, U_max-1) indicating
+        # from which token the most likely path up to that point came
         bp_absolute = bp_absolute_template - bp_relative
+        # update our tensors containing all the viterbi probabilities and backpointers
         v_matrix[:, t, :] = v_current
         backpointers[:, t, :] = bp_absolute
 

From e11258db8cb15cc467f1da558b6cff861699f374 Mon Sep 17 00:00:00 2001
From: Elena Rastorgueva
Date: Tue, 17 Jan 2023 09:39:59 -0800
Subject: [PATCH 153/154] Add copyright headers

Signed-off-by: Elena Rastorgueva
---
 .../tests/test_add_t_start_end_to_boundary_info.py | 14 ++++++++++++++
 .../tests/test_get_y_and_boundary_info_for_utt.py  | 14 ++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py b/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py
index f96301432188..406c4be1fb70 100644
--- a/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py
+++ b/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py
@@ -1,3 +1,17 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import pytest from utils.make_output_files import add_t_start_end_to_boundary_info diff --git a/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py b/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py index 42627dccdbd5..f5bc722d5a1c 100644 --- a/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py +++ b/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py @@ -1,3 +1,17 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import pytest from utils.data_prep import get_y_and_boundary_info_for_utt From 3bd71bec34ca0d7a62995a3fca8391da25457c57 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva Date: Tue, 17 Jan 2023 09:43:12 -0800 Subject: [PATCH 154/154] Change req to nemo_toolkit[all] Signed-off-by: Elena Rastorgueva --- tools/nemo_forced_aligner/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt index f00728e27ec9..3af8ebf1b488 100644 --- a/tools/nemo_forced_aligner/requirements.txt +++ b/tools/nemo_forced_aligner/requirements.txt @@ -1,2 +1,2 @@ -nemo_toolkit[asr] +nemo_toolkit[all] pytest
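
For reference, the recursion spelled out in the viterbi_decoding comments above can be condensed into a short self-contained sketch on toy data. The sizes, the lack of padding handling and the simplified backtrace are all illustrative assumptions, not the NeMo implementation:

    import torch

    V_NEG = float("-inf")                                      # stand-in for V_NEGATIVE_NUM

    T, blank_id = 6, 3                                         # 6 timesteps, vocab of 3 tokens + blank
    y = torch.tensor([[blank_id, 0, blank_id, 1, blank_id]])   # ground truth "ab" with interleaved blanks, U = 5
    B, U = y.shape

    # pretend these are the log probs already gathered into ground-truth token order, shape (B, T, U)
    log_probs_reordered = torch.log_softmax(torch.randn(B, T, U), dim=-1)

    v_matrix = torch.full_like(log_probs_reordered, V_NEG)
    backpointers = torch.full_like(v_matrix, -999)
    v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2]         # paths may start in the first blank or first token

    # True where a token equals the token two positions earlier (blocks transitions that would skip a letter)
    letter_repetition_mask = (y - torch.roll(y, shifts=2, dims=1)) == 0
    letter_repetition_mask[:, :2] = False

    bp_absolute_template = torch.arange(U).unsqueeze(0).repeat(B, 1)

    for t in range(1, T):
        e_current = log_probs_reordered[:, t, :]
        v_prev = v_matrix[:, t - 1, :]
        v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1)
        v_prev_shifted[:, 0] = V_NEG
        v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1)
        v_prev_shifted2[:, :2] = V_NEG
        v_prev_shifted2 = v_prev_shifted2.masked_fill(letter_repetition_mask, V_NEG)
        candidates = torch.stack((v_prev, v_prev_shifted, v_prev_shifted2), dim=2) + e_current.unsqueeze(2)
        v_current, bp_relative = torch.max(candidates, dim=2)
        v_matrix[:, t, :] = v_current
        backpointers[:, t, :] = bp_absolute_template - bp_relative

    # backtrace for the single utterance: end in the final token or final blank, whichever scores higher
    u = int(torch.argmax(v_matrix[0, -1, U - 2 :])) + (U - 2)
    alignment = []
    for t in range(T - 1, -1, -1):
        alignment.insert(0, u)
        if t > 0:
            u = int(backpointers[0, t, u])
    print(alignment)   # a monotonically non-decreasing path of length T, e.g. [0, 1, 2, 3, 4, 4]
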