From 59d9325c2c27fc095a120290ba97a442f93c042e Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 16:13:28 +0400 Subject: [PATCH 001/112] Create pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 docs/source/nlp/pos_emb.rst diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst new file mode 100644 index 000000000000..9b0a5fc15f97 --- /dev/null +++ b/docs/source/nlp/pos_emb.rst @@ -0,0 +1,2 @@ +Supported positional embedding types +------------------------------------ From 35ba3ec7687e37e095a41e0b4419ad697e2198a5 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 18:15:05 +0400 Subject: [PATCH 002/112] Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 80 +++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst index 9b0a5fc15f97..59b22c501772 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/pos_emb.rst @@ -1,2 +1,82 @@ Supported positional embedding types ------------------------------------ +.. list-table:: *Supported positional embeddings in GPT models* + :widths: 10 30 60 + :header-rows: 1 + + * - Parameter value + - How to use + - Description + + * - **learned_absolute** + - .. code:: + + model.position_embedding_type='learned_absolute' + - `Absolute Position Encodings `_ are position embeddings used in Transformer-based models, added to input embeddings in the encoder and decoder sections. These encodings match the dimension of embeddings and are created using sine and cosine functions of various frequencies. Each dimension in the encoding corresponds to a sinusoid with wavelengths forming a geometric progression. + + * - **rope** + - .. code:: + + model.position_embedding_type='rope' + model.rotary_percentage=1.0 + - `Rotary Position Embedding (RoPE) `_ incorporates positional information by utilizing a rotation matrix to encode the absolute positions of tokens while maintaining relative positional relationships in self-attention formulations by leveraging the geometric properties of vectors and complex numbers, applying a rotation based on a preset non-zero constant and the relative positions of the tokens to the word embeddings. + + * - **alibi** + - .. code:: + + model.position_embedding_type='alibi' + - `Attention with Linear Biases (ALiBi) `_ modifies the way attention scores are computed in the attention sublayer of the network. ALiBi introduces a static, non-learned bias after the query-key dot product during the computation of attention scores. This bias is added in the form of a head-specific slope that is determined before training, creating a geometric sequence of slopes for the different heads in the model. The method has an inductive bias towards recency, penalizing attention scores between distant query-key pairs with the penalty increasing as the distance grows, and it leverages different rates of penalty increase across different heads based on the slope magnitude. + + * - **kerple** + - .. code:: + + model.position_embedding_type='kerple' + - `Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) `_ generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. + + * - **xpos** + - .. code:: + + model.position_embedding_type='xpos' + - `Extrapolatable Position Embedding (xPos) `_ + + * - **sandwich** + - .. code:: + + model.position_embedding_type='sandwich' + - `Sandwich `_ + +.. list-table:: *Supported positional embeddings in T5 models* + :widths: 10 30 60 + :header-rows: 1 + + * - Parameter value + - How to use + - Description + + * - **learned_absolute** + - .. code:: + + model.encoder.position_embedding_type='learned_absolute' + model.decoder.position_embedding_type='learned_absolute' + - `Absolute Position Encodings `_ + + * - **relative** + - .. code:: + + model.encoder.position_embedding_type='relative' + model.decoder.position_embedding_type='relative' + - `Relative Position Representations `_ + + * - **alibi** + - .. code:: + + model.encoder.position_embedding_type='alibi' + model.decoder.position_embedding_type='alibi' + - `Attention with Linear Biases (ALiBi) `_ modifies the way attention scores are computed in the attention sublayer of the network. ALiBi introduces a static, non-learned bias after the query-key dot product during the computation of attention scores. This bias is added in the form of a head-specific slope that is determined before training, creating a geometric sequence of slopes for the different heads in the model. The method has an inductive bias towards recency, penalizing attention scores between distant query-key pairs with the penalty increasing as the distance grows, and it leverages different rates of penalty increase across different heads based on the slope magnitude. + + * - **kerple** + - .. code:: + + model.encoder.position_embedding_type='kerple' + model.decoder.position_embedding_type='kerple' + - `Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) `_ generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. From 2939667abc6432ae340852d700dcd6dcc9dab5ed Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 18:23:35 +0400 Subject: [PATCH 003/112] Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst index 59b22c501772..4df0285b8d9b 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/pos_emb.rst @@ -1,5 +1,11 @@ -Supported positional embedding types ------------------------------------- +Positional embedding +-------------------- + +Positional embeddings are used to give the model information about the position of each element in a sequence. Megatron LLM supports the following positional embedding types: + +GPT +^^^ + .. list-table:: *Supported positional embeddings in GPT models* :widths: 10 30 60 :header-rows: 1 @@ -45,6 +51,9 @@ Supported positional embedding types model.position_embedding_type='sandwich' - `Sandwich `_ +T5 +^^^ + .. list-table:: *Supported positional embeddings in T5 models* :widths: 10 30 60 :header-rows: 1 From 99a23504b3dd5dc53c7be4016e54bf0d61517413 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 18:24:16 +0400 Subject: [PATCH 004/112] Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst index 4df0285b8d9b..6b3a98da9bf8 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/pos_emb.rst @@ -67,7 +67,7 @@ T5 model.encoder.position_embedding_type='learned_absolute' model.decoder.position_embedding_type='learned_absolute' - - `Absolute Position Encodings `_ + - `Absolute Position Encodings `_ are position embeddings used in Transformer-based models, added to input embeddings in the encoder and decoder sections. These encodings match the dimension of embeddings and are created using sine and cosine functions of various frequencies. Each dimension in the encoding corresponds to a sinusoid with wavelengths forming a geometric progression. * - **relative** - .. code:: From 2e5af172150df709d7093f0e115ce60ae1767355 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 18:25:03 +0400 Subject: [PATCH 005/112] Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst index 6b3a98da9bf8..37c53f6b2ca9 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/pos_emb.rst @@ -1,5 +1,5 @@ -Positional embedding --------------------- +Positional embeddings +--------------------- Positional embeddings are used to give the model information about the position of each element in a sequence. Megatron LLM supports the following positional embedding types: From 5314d2dbe8d30e323154785d55dfa8481b34e630 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 19:39:24 +0400 Subject: [PATCH 006/112] Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst index 37c53f6b2ca9..c491f861fa99 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/pos_emb.rst @@ -52,7 +52,7 @@ GPT - `Sandwich `_ T5 -^^^ +^^ .. list-table:: *Supported positional embeddings in T5 models* :widths: 10 30 60 @@ -89,3 +89,33 @@ T5 model.encoder.position_embedding_type='kerple' model.decoder.position_embedding_type='kerple' - `Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) `_ generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. + +Positional interpolation +------------------------ +`Position Interpolation (PI) `_ is a method introduced to extend the context window sizes of Rotary Position Embedding (RoPE)-based pretrained large language models (LLMs). The central principle of PI is to reduce the position indices so that they align with the initial context window size through interpolation. + +Positional Interpolation is supported in Megatron GPT SFT models. Set RoPE Interpolation factor for sequence length :code:`seq_len_interpolation_factor` to enable it. + +.. code:: + + model.position_embedding_type='rope' + model.rotary_percentage=1.0 + model.seq_len_interpolation_factor: null + +Flash attention +--------------- +`FlashAttention `_ is a method designed to enhance the efficiency of Transformer models, which are widely utilized in applications such as natural language processing. Traditional Transformers are slow and consume a lot of memory, especially with long sequences, due to the quadratic time and memory complexity of self-attention. FlashAttention, an IO-aware exact attention algorithm that leverages tiling to minimize the number of memory reads/writes between the GPU's high bandwidth memory (HBM) and on-chip SRAM. This approach is designed to be more efficient in terms of IO complexity compared to standard attention mechanisms. + +GPT +^^^ +To enable Flash Attention while Megatron GPT model training or fine-tuning, modify the following configuration: +.. code:: + + model.use_flash_attention=True + +T5 +^^ +To enable Flash Attention while Megatron T5 model training, modify the following configuration: +.. code:: + + model.use_flash_attention=True From 36de4c34b765fc3faea1678a1a328510d0494b26 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 19:40:40 +0400 Subject: [PATCH 007/112] Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/pos_emb.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/pos_emb.rst index c491f861fa99..99c19af91073 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/pos_emb.rst @@ -109,6 +109,7 @@ Flash attention GPT ^^^ To enable Flash Attention while Megatron GPT model training or fine-tuning, modify the following configuration: + .. code:: model.use_flash_attention=True @@ -116,6 +117,7 @@ To enable Flash Attention while Megatron GPT model training or fine-tuning, modi T5 ^^ To enable Flash Attention while Megatron T5 model training, modify the following configuration: + .. code:: model.use_flash_attention=True From 28d5b0d8e8aa4e101906d3b1f2046114c3177956 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 19:52:54 +0400 Subject: [PATCH 008/112] Update and rename docs/source/nlp/pos_emb.rst to docs/source/nlp/nemo_megatron /positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../positional_embeddings.rst} | 20 ------------------- 1 file changed, 20 deletions(-) rename docs/source/nlp/{pos_emb.rst => nemo_megatron /positional_embeddings.rst} (86%) diff --git a/docs/source/nlp/pos_emb.rst b/docs/source/nlp/nemo_megatron /positional_embeddings.rst similarity index 86% rename from docs/source/nlp/pos_emb.rst rename to docs/source/nlp/nemo_megatron /positional_embeddings.rst index 99c19af91073..84a62f81c77d 100644 --- a/docs/source/nlp/pos_emb.rst +++ b/docs/source/nlp/nemo_megatron /positional_embeddings.rst @@ -101,23 +101,3 @@ Positional Interpolation is supported in Megatron GPT SFT models. Set RoPE Inter model.position_embedding_type='rope' model.rotary_percentage=1.0 model.seq_len_interpolation_factor: null - -Flash attention ---------------- -`FlashAttention `_ is a method designed to enhance the efficiency of Transformer models, which are widely utilized in applications such as natural language processing. Traditional Transformers are slow and consume a lot of memory, especially with long sequences, due to the quadratic time and memory complexity of self-attention. FlashAttention, an IO-aware exact attention algorithm that leverages tiling to minimize the number of memory reads/writes between the GPU's high bandwidth memory (HBM) and on-chip SRAM. This approach is designed to be more efficient in terms of IO complexity compared to standard attention mechanisms. - -GPT -^^^ -To enable Flash Attention while Megatron GPT model training or fine-tuning, modify the following configuration: - -.. code:: - - model.use_flash_attention=True - -T5 -^^ -To enable Flash Attention while Megatron T5 model training, modify the following configuration: - -.. code:: - - model.use_flash_attention=True From 85f4e639da4f5c24508a19843d5c149c0b5a9ee8 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 19:56:43 +0400 Subject: [PATCH 009/112] Rename positional_embeddings.rst to positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../{nemo_megatron => nemo_megatron}/positional_embeddings.rst | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/source/nlp/{nemo_megatron => nemo_megatron}/positional_embeddings.rst (100%) diff --git a/docs/source/nlp/nemo_megatron /positional_embeddings.rst b/docs/source/nlp/nemo_megatron/positional_embeddings.rst similarity index 100% rename from docs/source/nlp/nemo_megatron /positional_embeddings.rst rename to docs/source/nlp/nemo_megatron/positional_embeddings.rst From ce6113d1325f0a7b4523e8b4ab60f4cafc578d20 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Fri, 8 Sep 2023 19:57:34 +0400 Subject: [PATCH 010/112] Create flash_attention.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/nemo_megatron/flash_attention.rst | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 docs/source/nlp/nemo_megatron/flash_attention.rst diff --git a/docs/source/nlp/nemo_megatron/flash_attention.rst b/docs/source/nlp/nemo_megatron/flash_attention.rst new file mode 100644 index 000000000000..fd0d519e577c --- /dev/null +++ b/docs/source/nlp/nemo_megatron/flash_attention.rst @@ -0,0 +1,19 @@ +Flash attention +--------------- +`FlashAttention `_ is a method designed to enhance the efficiency of Transformer models, which are widely utilized in applications such as natural language processing. Traditional Transformers are slow and consume a lot of memory, especially with long sequences, due to the quadratic time and memory complexity of self-attention. FlashAttention, an IO-aware exact attention algorithm that leverages tiling to minimize the number of memory reads/writes between the GPU's high bandwidth memory (HBM) and on-chip SRAM. This approach is designed to be more efficient in terms of IO complexity compared to standard attention mechanisms. + +GPT +^^^ +To enable Flash Attention while Megatron GPT model training or fine-tuning, modify the following configuration: + +.. code:: + + model.use_flash_attention=True + +T5 +^^ +To enable Flash Attention while Megatron T5 model training, modify the following configuration: + +.. code:: + + model.use_flash_attention=True From 2e03e6cf5795ad4ac8288be81583e9209ef1f477 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Mon, 11 Sep 2023 14:30:57 +0400 Subject: [PATCH 011/112] Changed value for model.seq_len_interpolation_factor to 2 Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/nemo_megatron/positional_embeddings.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/nlp/nemo_megatron/positional_embeddings.rst b/docs/source/nlp/nemo_megatron/positional_embeddings.rst index 84a62f81c77d..68dce5bd119a 100644 --- a/docs/source/nlp/nemo_megatron/positional_embeddings.rst +++ b/docs/source/nlp/nemo_megatron/positional_embeddings.rst @@ -100,4 +100,4 @@ Positional Interpolation is supported in Megatron GPT SFT models. Set RoPE Inter model.position_embedding_type='rope' model.rotary_percentage=1.0 - model.seq_len_interpolation_factor: null + model.seq_len_interpolation_factor: 2 From b27d077f0a38e67498c287a8da19c7e1da9e5b05 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Mon, 11 Sep 2023 14:57:26 +0400 Subject: [PATCH 012/112] fixed flash_attention enabling for t5 Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/nlp/nemo_megatron/flash_attention.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/source/nlp/nemo_megatron/flash_attention.rst b/docs/source/nlp/nemo_megatron/flash_attention.rst index fd0d519e577c..78b3b9c2470c 100644 --- a/docs/source/nlp/nemo_megatron/flash_attention.rst +++ b/docs/source/nlp/nemo_megatron/flash_attention.rst @@ -16,4 +16,5 @@ To enable Flash Attention while Megatron T5 model training, modify the following .. code:: - model.use_flash_attention=True + model.encoder.use_flash_attention=True + model.decoder.use_flash_attention=True From 252123da993d15a2bb8f893114a7fce01f56ee11 Mon Sep 17 00:00:00 2001 From: anteju <108555623+anteju@users.noreply.github.com> Date: Fri, 8 Sep 2023 16:30:25 -0700 Subject: [PATCH 013/112] [TTS] Added a callback for logging initial data (#7384) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ante Jukić Signed-off-by: Sasha Meister --- nemo/collections/tts/parts/utils/callbacks.py | 111 +++++++++++++++--- 1 file changed, 95 insertions(+), 16 deletions(-) diff --git a/nemo/collections/tts/parts/utils/callbacks.py b/nemo/collections/tts/parts/utils/callbacks.py index 304efad3f5f7..c1be48bdaa3d 100644 --- a/nemo/collections/tts/parts/utils/callbacks.py +++ b/nemo/collections/tts/parts/utils/callbacks.py @@ -110,7 +110,7 @@ def create_id(filepath: Path) -> str: class ArtifactGenerator(ABC): @abstractmethod def generate_artifacts( - self, model: LightningModule, batch_dict: Dict + self, model: LightningModule, batch_dict: Dict, initial_log: bool = False ) -> Tuple[List[AudioArtifact], List[ImageArtifact]]: """ Create artifacts for the input model and test batch. @@ -118,6 +118,8 @@ def generate_artifacts( Args: model: Model instance being trained to use for inference. batch_dict: Test batch to generate artifacts for. + initial_log: Flag to denote if this is the initial log, can + be used to save ground-truth data only once. Returns: List of audio and image artifacts to log. @@ -214,17 +216,48 @@ def _log_image(self, image: ImageArtifact, log_dir: Path, step: int): wandb_image = (wandb.Image(image_plot, caption=image.id),) self.wandb_logger.log({image.id: wandb_image}) + def _log_artifacts(self, audio_list: list, image_list: list, log_dir: Optional[Path] = None, global_step: int = 0): + """Log audio and image artifacts. + """ + if log_dir is not None: + log_dir.mkdir(parents=True, exist_ok=True) + + for audio in audio_list: + self._log_audio(audio=audio, log_dir=log_dir, step=global_step) + + for image in image_list: + self._log_image(image=image, log_dir=log_dir, step=global_step) + + def on_fit_start(self, trainer: Trainer, model: LightningModule): + """Log initial data artifacts. + """ + audio_list = [] + image_list = [] + for batch_dict in self.data_loader: + for key, value in batch_dict.items(): + if isinstance(value, torch.Tensor): + batch_dict[key] = value.to(model.device) + + for generator in self.generators: + audio, images = generator.generate_artifacts(model=model, batch_dict=batch_dict, initial_log=True) + audio_list += audio + image_list += images + + if len(audio_list) == len(image_list) == 0: + logging.debug('List are empty, no initial artifacts to log.') + return + + log_dir = self.output_dir / f"initial" if self.output_dir else None + + self._log_artifacts(audio_list=audio_list, image_list=image_list, log_dir=log_dir) + def on_train_epoch_end(self, trainer: Trainer, model: LightningModule): + """Log artifacts at the end of an epoch. + """ epoch = 1 + model.current_epoch if (epoch not in self.log_epochs) and (epoch % self.epoch_frequency != 0): return - if self.output_dir: - log_dir = self.output_dir / f"epoch_{epoch}" - log_dir.mkdir(parents=True, exist_ok=True) - else: - log_dir = None - audio_list = [] image_list = [] for batch_dict in self.data_loader: @@ -237,11 +270,13 @@ def on_train_epoch_end(self, trainer: Trainer, model: LightningModule): audio_list += audio image_list += images - for audio in audio_list: - self._log_audio(audio=audio, log_dir=log_dir, step=model.global_step) + if len(audio_list) == len(image_list) == 0: + logging.debug('List are empty, no artifacts to log at epoch %d.', epoch) + return - for image in image_list: - self._log_image(image=image, log_dir=log_dir, step=model.global_step) + log_dir = self.output_dir / f"epoch_{epoch}" if self.output_dir else None + + self._log_artifacts(audio_list=audio_list, image_list=image_list, log_dir=log_dir) class VocoderArtifactGenerator(ArtifactGenerator): @@ -250,8 +285,11 @@ class VocoderArtifactGenerator(ArtifactGenerator): """ def generate_artifacts( - self, model: LightningModule, batch_dict: Dict + self, model: LightningModule, batch_dict: Dict, initial_log: bool = False ) -> Tuple[List[AudioArtifact], List[ImageArtifact]]: + if initial_log: + # Currently, nothing to log before training starts + return [], [] audio_artifacts = [] @@ -297,7 +335,16 @@ def __init__(self, log_audio: bool = True, log_encoding: bool = False, log_dequa logging.debug('\tlog_encoding: %s', self.log_encoding) logging.debug('\tlog_dequantized: %s', self.log_dequantized) - def _generate_audio(self, model, audio_ids, audio, audio_len): + def _generate_audio(self, model, audio_ids, audio, audio_len, save_input: bool = False): + """Generate audio artifacts. + + Args: + model: callable model, outputs (audio_pred, audio_pred_len) + audio_ids: list of IDs for the examples in audio batch + audio: tensor of input audio signals, shape (B, T) + audio_len: tensor of lengths for each example in the batch, shape (B,) + save_input: if True, save input audio signals + """ if not self.log_audio: return [] @@ -317,9 +364,29 @@ def _generate_audio(self, model, audio_ids, audio, audio_len): ) audio_artifacts.append(audio_artifact) + if save_input: + # save input audio + for i, audio_id in enumerate(audio_ids): + audio_in_i = audio[i, : audio_len[i]].cpu().numpy() + audio_artifact = AudioArtifact( + id=f"audio_in_{audio_id}", + data=audio_in_i, + filename=f"{audio_id}_audio_in.wav", + sample_rate=model.sample_rate, + ) + audio_artifacts.append(audio_artifact) + return audio_artifacts def _generate_images(self, model, audio_ids, audio, audio_len): + """Generate image artifacts. + + Args: + model: model, needs to support `model.encode_audio`, `model.quantize` and `model.dequantize` + audio_ids: list of IDs for the examples in audio batch + audio: tensor of input audio signals, shape (B, T) + audio_len: tensor of lengths for each example in the batch, shape (B,) + """ image_artifacts = [] if not self.log_encoding and not self.log_dequantized: @@ -363,8 +430,14 @@ def _generate_images(self, model, audio_ids, audio, audio_len): return image_artifacts def generate_artifacts( - self, model: LightningModule, batch_dict: Dict + self, model: LightningModule, batch_dict: Dict, initial_log: bool = False ) -> Tuple[List[AudioArtifact], List[ImageArtifact]]: + """ + Args: + model: model used to process input to generate artifacts + batch_dict: dictionary obtained form the dataloader + initial_log: save input audio for the initial log + """ audio_filepaths = batch_dict.get("audio_filepaths") audio_ids = [create_id(p) for p in audio_filepaths] @@ -372,7 +445,9 @@ def generate_artifacts( audio = batch_dict.get("audio") audio_len = batch_dict.get("audio_lens") - audio_artifacts = self._generate_audio(model=model, audio_ids=audio_ids, audio=audio, audio_len=audio_len) + audio_artifacts = self._generate_audio( + model=model, audio_ids=audio_ids, audio=audio, audio_len=audio_len, save_input=initial_log + ) image_artifacts = self._generate_images(model=model, audio_ids=audio_ids, audio=audio, audio_len=audio_len) return audio_artifacts, image_artifacts @@ -518,9 +593,13 @@ def _generate_gta_predictions(self, model: LightningModule, audio_ids: List[str] return audio_artifacts, image_artifacts def generate_artifacts( - self, model: LightningModule, batch_dict: Dict + self, model: LightningModule, batch_dict: Dict, initial_log: bool = False ) -> Tuple[List[AudioArtifact], List[ImageArtifact]]: + if initial_log: + # Currently, nothing to log before training starts + return [], [] + audio_artifacts = [] image_artifacts = [] audio_filepaths = batch_dict.get("audio_filepaths") From 5341a7398de11c4339bc466b9c1f6a6f835e2409 Mon Sep 17 00:00:00 2001 From: Abhinav Khattar Date: Fri, 8 Sep 2023 17:22:05 -0700 Subject: [PATCH 014/112] Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar * update commit Signed-off-by: Abhinav Khattar --------- Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister --- Dockerfile | 2 +- Jenkinsfile | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Dockerfile b/Dockerfile index e935d3bd810c..e4f04359e6eb 100644 --- a/Dockerfile +++ b/Dockerfile @@ -55,7 +55,7 @@ RUN git clone https://github.com/NVIDIA/apex.git && \ # install megatron core, this can be removed once 0.3 pip package is released RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \ cd Megatron-LM && \ - git checkout 01c8704453af7e26134441224c8a351746ca0349 && \ + git checkout ab0336a5c8eab77aa74ae604ba1e73decbf6d560 && \ pip install -e . # uninstall stuff from base container diff --git a/Jenkinsfile b/Jenkinsfile index 45910b740655..04bc96ff1596 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -59,11 +59,11 @@ pipeline { stage('Megatron Core installation') { steps { - // pinned MCore https://github.com/NVIDIA/Megatron-LM/commit/01c8704453af7e26134441224c8a351746ca0349 + // pinned MCore https://github.com/NVIDIA/Megatron-LM/commit/ab0336a5c8eab77aa74ae604ba1e73decbf6d560 // ToT for 23.08 branch sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \ cd Megatron-LM && \ - git checkout 01c8704453af7e26134441224c8a351746ca0349 && \ + git checkout ab0336a5c8eab77aa74ae604ba1e73decbf6d560 && \ pip install -e .' } } From 68142d6716fd1a41581a31a5e1212ec003356b59 Mon Sep 17 00:00:00 2001 From: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Date: Fri, 8 Sep 2023 22:43:35 -0500 Subject: [PATCH 015/112] Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover * move precision copy before super constructor Signed-off-by: Maanu Grover * use trainer arg Signed-off-by: Maanu Grover --------- Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_base_model.py | 6 ++++-- .../nlp/models/language_modeling/megatron_bert_model.py | 6 +++--- .../models/language_modeling/megatron_gpt_adapter_model.py | 2 +- .../nlp/models/language_modeling/megatron_gpt_model.py | 2 +- .../language_modeling/megatron_gpt_prompt_learning_model.py | 2 +- .../models/language_modeling/megatron_retrieval_model.py | 2 +- .../language_modeling/megatron_t5_prompt_learning_model.py | 2 +- 7 files changed, 12 insertions(+), 10 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_base_model.py b/nemo/collections/nlp/models/language_modeling/megatron_base_model.py index a7b1b9521e3c..29b1886c957a 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_base_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_base_model.py @@ -102,10 +102,12 @@ def __init__(self, cfg: DictConfig, trainer: Trainer, no_lm_init=True): # this prevents base constructor from initializing tokenizer self.tokenizer = None + with open_dict(cfg): + if cfg.get('precision', None) is None and trainer is not None: + cfg.precision = trainer.precision + super().__init__(cfg, trainer=trainer, no_lm_init=no_lm_init) - with open_dict(self.cfg): - self.cfg.precision = trainer.precision # TODO: @maanug-nv consolidate into one attribute (requires lots of changes in subclasses) self.torch_dtype = utils_funcs.torch_dtype_from_precision(self.cfg.precision) # Mixed precision datatype self.autocast_dtype = self.torch_dtype # Mixed precision datatype diff --git a/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py b/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py index 73548fda3348..083b6391bf0e 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py @@ -124,12 +124,12 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): converted_model = [] for module in self.model: converted_model.append( - Float16Module(config=self.model_parallel_config, module=module, precision=cfg.precision) + Float16Module(config=self.model_parallel_config, module=module, precision=self.cfg.precision) ) self.model = converted_model else: self.model = Float16Module( - config=self.model_parallel_config, module=self.model, precision=cfg.precision + config=self.model_parallel_config, module=self.model, precision=self.cfg.precision ) if hasattr(self, '_nsys_profile_enabled'): @@ -358,7 +358,7 @@ def training_step(self, dataloader_iter, batch_idx): torch.distributed.broadcast(loss_mean, get_last_rank()) - if self.cfg.precision == 16: + if self.torch_dtype == torch.float16: loss_scale = self.trainer.precision_plugin.scaler._scale if loss_scale is not None: self.log('loss_scale', loss_scale, batch_size=1) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py index 2985ab4df3bb..c6b4d055ef6e 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_adapter_model.py @@ -223,7 +223,7 @@ def training_step(self, dataloader_iter, batch_idx): # we can avoid this broadcast by updating the PTL log function to accept specific ranks torch.distributed.broadcast(loss_mean, get_last_rank()) - if self.cfg.precision == 16: + if self.torch_dtype == torch.float16: loss_scale = self.trainer.precision_plugin.scaler._scale if loss_scale is not None: self.log('loss_scale', loss_scale, batch_size=1) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 1d0e08abe6dd..1d2d68b9b6f5 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -625,7 +625,7 @@ def training_step(self, dataloader_iter, batch_idx): torch.distributed.broadcast(loss_mean, get_last_rank()) # (@adithyare) we need to check for the _scaler attribute to enable pp>1 for adapter training - if self.cfg.precision == 16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): + if self.torch_dtype == torch.float16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): loss_scale = self.trainer.precision_plugin.scaler._scale if loss_scale is not None: self.log('loss_scale', loss_scale, batch_size=1) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py index d0dca0173319..688d3a49159c 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py @@ -349,7 +349,7 @@ def training_step(self, dataloader_iter, batch_idx): # we can avoid this broadcast by updating the PTL log function to accept specific ranks torch.distributed.broadcast(loss_mean, get_last_rank()) - if self.cfg.precision == 16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): + if self.torch_dtype == torch.float16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): loss_scale = self.trainer.precision_plugin.scaler._scale if loss_scale is not None: self.log('loss_scale', loss_scale, batch_size=1) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py b/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py index 4323d65a977e..e418151308b0 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py @@ -262,7 +262,7 @@ def training_step(self, batch, batch_idx): reduced_loss = average_losses_across_data_parallel_group([lm_loss]) self._reduced_loss_buffer.append(reduced_loss[0]) - if self.cfg.precision == 16: + if self.torch_dtype == torch.float16: loss_scale = self.trainer.precision_plugin.scaler._scale if loss_scale is not None: self.log('loss_scale', loss_scale, batch_size=1) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py index f085e38573af..f1a13c2abee4 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py @@ -274,7 +274,7 @@ def training_step(self, dataloader_iter, batch_idx): # we can avoid this broadcast by updating the PTL log function to accept specific ranks torch.distributed.broadcast(loss_mean, get_last_rank()) - if self.cfg.precision == 16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): + if self.torch_dtype == torch.float16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): loss_scale = self.trainer.precision_plugin.scaler._scale if loss_scale is not None: self.log('loss_scale', loss_scale, batch_size=1) From 2a60fb0fc924e6739b4a262d166acb4e80a403b3 Mon Sep 17 00:00:00 2001 From: Somshubra Majumdar Date: Fri, 8 Sep 2023 20:44:17 -0700 Subject: [PATCH 016/112] Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar * Add support for auto extracting tokenizer model Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar * Fix issue with missing tokenizer Signed-off-by: smajumdar * Refactor Signed-off-by: smajumdar * Refactor Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../megatron_change_num_partitions.py | 43 ++++++++++++++++++- 1 file changed, 41 insertions(+), 2 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_change_num_partitions.py b/examples/nlp/language_modeling/megatron_change_num_partitions.py index 1f2195ad1b2b..f7ada6ab762f 100644 --- a/examples/nlp/language_modeling/megatron_change_num_partitions.py +++ b/examples/nlp/language_modeling/megatron_change_num_partitions.py @@ -13,6 +13,8 @@ # limitations under the License. import os +import shutil +import tarfile import tempfile from argparse import ArgumentParser from typing import Dict, List @@ -233,7 +235,7 @@ def compute_tp_splits( for i in range(tp_size): tp_qkv = torch.cat([tp_qkv_splits[item] for item in range(i, tp_size * 2, tp_size)]) split.append(tp_qkv) - elif ('dense_h_to_4h.weight' in param_name or 'linear_fc1.weight' in param_name) and fast_glu_activation: + elif ('dense_h_to_4h' in param_name or 'linear_fc1' in param_name) and fast_glu_activation: # For Megatron GPT model with Fast Glu activation # Handle gated linear units # concat all the first halves ('W's) and all the second halves ('V's) @@ -275,7 +277,7 @@ def compute_tp_merge(idx, name, param, partitions_pp, model_cfg): concated = torch.cat([partitions_pp[i][idx].data for i in range(len(partitions_pp))], dim=0) # Logic for Fast Glu activation - if 'dense_h_to_4h.weight' in name and fast_glu_activation: + if 'dense_h_to_4h' in name and fast_glu_activation: # concat all the first halves ('W's) and all the second halves ('V's) wk_splits = [] for tpr in range(len(partitions_pp)): @@ -977,6 +979,31 @@ def main(): if vp_size > 1: set_virtual_parallel_rank_safely(0) + # Extract tokenizer artifact from the model to temp directory + logging.info("Extracting tokenizer artifact from NeMo file...") + temp_dir = tempfile.mkdtemp() + tokenizer_model_path = None + with tarfile.open(args.model_file, "r") as tar: + for member in tar.getmembers(): + if '.model' in member.name: + extracted_file = tar.extractfile(member) + extracted_file_path = os.path.join(temp_dir, member.name) + + if tokenizer_model_path is None: + logging.info(f"Found tokenizer. Extracting {member.name} to {extracted_file_path}") + + tokenizer_model_path = extracted_file_path + with open(extracted_file_path, "wb") as f: + f.write(extracted_file.read()) + else: + if args.tokenizer_model_path is None: + logging.warning( + f"\n\nFound multiple tokenizer artifacts in the model file.\n" + f"Using only {tokenizer_model_path}.\n" + f"If this is incorrect, manually pass the correct tokenizer using " + f"`--tokenizer_model_path`.\n\n" + ) + # If input model has TP > 1 or PP > 1 # Reconstruct the model to have TP = 1 and PP = 1 # Note that this is a forward loop that will process PP [0..N] TP [0..M] in sequential order. @@ -1384,6 +1411,15 @@ def main(): with open_dict(model.cfg): model.cfg.tokenizer.model = args.tokenizer_model_path + else: + if tokenizer_model_path is None: + logging.warning("Could not extract tokenizer model file from checkpoint.") + + else: + # Extract tokenizer info + with open_dict(model.cfg): + model.cfg.tokenizer.model = tokenizer_model_path + model.cfg, restore_dict = force_cpu_model(model.cfg) model = cls(model.cfg, trainer) @@ -1458,6 +1494,9 @@ def main(): logging.info("Successfully finished changing partitions!") + if temp_dir is not None: + shutil.rmtree(temp_dir, ignore_errors=True) + if __name__ == '__main__': main() From 8b5ef4175af7b9cbc5582845b0a0da408bddb44c Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Fri, 8 Sep 2023 22:25:35 -0600 Subject: [PATCH 017/112] Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper * move dist ckpt Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper * when using mcore we can change tp pp on the fly Signed-off-by: eharper * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper * fix load dist ckpt Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper * remove import Signed-off-by: eharper --------- Signed-off-by: eharper Signed-off-by: jasonwan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan Signed-off-by: Sasha Meister --- .../conf/megatron_gpt_inference.yaml | 2 + .../conf/megatron_llama_inference.yaml | 1 + .../language_modeling/megatron_gpt_eval.py | 49 +++-- .../language_modeling/megatron_gpt_model.py | 27 ++- nemo/collections/nlp/models/nlp_model.py | 45 +++- nemo/collections/nlp/parts/nlp_overrides.py | 196 +++++++++++++----- nemo/core/optim/distributed_adam.py | 6 +- .../convert_hf_llama_to_nemo.py | 3 +- 8 files changed, 238 insertions(+), 91 deletions(-) diff --git a/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml b/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml index f2912e27fccf..2570251bcdee 100644 --- a/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml +++ b/examples/nlp/language_modeling/conf/megatron_gpt_inference.yaml @@ -17,6 +17,8 @@ trainer: accelerator: gpu logger: False # logger provided by exp_manager precision: 16 # 16, 32, or bf16 + use_distributed_sampler: False + tensor_model_parallel_size: -1 pipeline_model_parallel_size: -1 diff --git a/examples/nlp/language_modeling/conf/megatron_llama_inference.yaml b/examples/nlp/language_modeling/conf/megatron_llama_inference.yaml index 9e82a7f8a4e7..e508b01858f5 100644 --- a/examples/nlp/language_modeling/conf/megatron_llama_inference.yaml +++ b/examples/nlp/language_modeling/conf/megatron_llama_inference.yaml @@ -17,6 +17,7 @@ trainer: accelerator: gpu logger: False # logger provided by exp_manager precision: 32 # 16, 32, or bf16 + use_distributed_sampler: False tensor_model_parallel_size: -1 pipeline_model_parallel_size: -1 diff --git a/examples/nlp/language_modeling/megatron_gpt_eval.py b/examples/nlp/language_modeling/megatron_gpt_eval.py index bc1a446226be..cbac3ab29ce2 100644 --- a/examples/nlp/language_modeling/megatron_gpt_eval.py +++ b/examples/nlp/language_modeling/megatron_gpt_eval.py @@ -169,25 +169,28 @@ def main(cfg) -> None: # trainer required for restoring model parallel models trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer) - if ( - cfg.tensor_model_parallel_size < 0 - or cfg.pipeline_model_parallel_size < 0 - or cfg.get('pipeline_model_parallel_split_rank', -1) < 0 - ): - save_restore_connector = NLPSaveRestoreConnector() - if os.path.isdir(cfg.gpt_model_file): - save_restore_connector.model_extracted_dir = cfg.gpt_model_file - model_config = MegatronGPTModel.restore_from( - restore_path=cfg.gpt_model_file, - trainer=trainer, - return_config=True, - save_restore_connector=save_restore_connector, - ) + if cfg.gpt_model_file is not None: + if ( + cfg.tensor_model_parallel_size < 0 + or cfg.pipeline_model_parallel_size < 0 + or cfg.get('pipeline_model_parallel_split_rank', -1) < 0 + ): + save_restore_connector = NLPSaveRestoreConnector() + if os.path.isdir(cfg.gpt_model_file): + save_restore_connector.model_extracted_dir = cfg.gpt_model_file + model_config = MegatronGPTModel.restore_from( + restore_path=cfg.gpt_model_file, + trainer=trainer, + return_config=True, + save_restore_connector=save_restore_connector, + ) - with open_dict(cfg): - cfg.tensor_model_parallel_size = model_config.get('tensor_model_parallel_size', 1) - cfg.pipeline_model_parallel_size = model_config.get('pipeline_model_parallel_size', 1) - cfg.pipeline_model_parallel_split_rank = model_config.get('pipeline_model_parallel_split_rank', 0) + # with dist checkpointing we don't need to set this + if not model_config.get('mcore_gpt', False): + with open_dict(cfg): + cfg.tensor_model_parallel_size = model_config.get('tensor_model_parallel_size', 1) + cfg.pipeline_model_parallel_size = model_config.get('pipeline_model_parallel_size', 1) + cfg.pipeline_model_parallel_split_rank = model_config.get('pipeline_model_parallel_split_rank', 0) assert ( cfg.trainer.devices * cfg.trainer.num_nodes @@ -211,6 +214,10 @@ def main(cfg) -> None: pretrained_cfg.activations_checkpoint_granularity = None pretrained_cfg.activations_checkpoint_method = None pretrained_cfg.precision = trainer.precision + if pretrained_cfg.get('mcore_gpt', False): + # with dist checkpointing we can use the model parallel config specified by the user + pretrained_cfg.tensor_model_parallel_size = cfg.tensor_model_parallel_size + pretrained_cfg.pipeline_model_parallel_size = cfg.pipeline_model_parallel_size if trainer.precision == "16": pretrained_cfg.megatron_amp_O2 = False elif trainer.precision in ['bf16', 'bf16-mixed'] and cfg.get('megatron_amp_O2', False): @@ -242,7 +249,11 @@ def main(cfg) -> None: pipeline_model_parallel_size_=cfg.pipeline_model_parallel_size, pipeline_model_parallel_split_rank_=cfg.pipeline_model_parallel_split_rank, ) - checkpoint_path = inject_model_parallel_rank(os.path.join(cfg.checkpoint_dir, cfg.checkpoint_name)) + checkpoint_path = os.path.join(cfg.checkpoint_dir, cfg.checkpoint_name) + # checkpoint_path is a dir in case of distributed checkpointing + if not os.path.isdir(checkpoint_path): + # legacy checkpoint needs model parallel rank injection + checkpoint_path = inject_model_parallel_rank(os.path.join(cfg.checkpoint_dir, cfg.checkpoint_name)) model = MegatronGPTModel.load_from_checkpoint(checkpoint_path, hparams_file=cfg.hparams_file, trainer=trainer) else: raise ValueError("need at least a nemo file or checkpoint dir") diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 1d2d68b9b6f5..4524ca5b21c7 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -1291,17 +1291,22 @@ def on_load_checkpoint(self, checkpoint) -> None: # mcore uses distributed checkpointing if self.mcore_gpt: - for index, module in enumerate(self.get_gpt_module_list()): - if parallel_state.get_virtual_pipeline_model_parallel_world_size() is not None: - checkpoint_state_dict = checkpoint['state_dict'][f'model_{index}'] - else: - checkpoint_state_dict = checkpoint['state_dict'] - # checkpoint_state_dict has "model." but module does not so we need to remove it when loading - checkpoint_state_dict = { - key.replace('model.', ''): checkpoint_state_dict.pop(key) - for key in list(checkpoint_state_dict.keys()) - } - module.load_state_dict(checkpoint_state_dict, strict=True) + if 'state_dict' in checkpoint: + for index, module in enumerate(self.get_gpt_module_list()): + if parallel_state.get_virtual_pipeline_model_parallel_world_size() is not None: + checkpoint_state_dict = checkpoint['state_dict'][f'model_{index}'] + else: + checkpoint_state_dict = checkpoint['state_dict'] + # checkpoint_state_dict has "model." but module does not so we need to remove it when loading + checkpoint_state_dict = { + key.replace('model.', ''): checkpoint_state_dict.pop(key) + for key in list(checkpoint_state_dict.keys()) + } + module.load_state_dict(checkpoint_state_dict, strict=True) + else: + # when restoring a distributed checkpoint from a ptl checkpoint we need to defer loading the state_dict + # see NLPModel.on_load_checkpoint + checkpoint['state_dict'] = {} # legacy checkpointing for interleaved else: diff --git a/nemo/collections/nlp/models/nlp_model.py b/nemo/collections/nlp/models/nlp_model.py index e7e8d3b137f4..0f0de87d3887 100644 --- a/nemo/collections/nlp/models/nlp_model.py +++ b/nemo/collections/nlp/models/nlp_model.py @@ -41,6 +41,15 @@ from nemo.core.classes.exportable import Exportable from nemo.utils import AppState, logging +try: + from megatron.core import dist_checkpointing, parallel_state + + HAVE_MEGATRON_CORE = True + +except (ImportError, ModuleNotFoundError): + + HAVE_MEGATRON_CORE = False + __all__ = ['NLPModel'] NEMO_NLP_TMP = os.path.join(os.path.dirname(str(TRANSFORMERS_CACHE)), "nemo_nlp_tmp") @@ -310,6 +319,20 @@ def load_from_checkpoint( checkpoint = None try: cls._set_model_restore_state(is_being_restored=True) + + # dist checkpoint is a dir + checkpoint_dir = None + if os.path.isdir(checkpoint_path): + # store dir for later use + checkpoint_dir = checkpoint_path + + # metadata is stored in common.pt + checkpoint_path = os.path.join(checkpoint_path, 'common.pt') + + # we defer loading the state_dict until the class has been initialized + # we need to set this for ptl_load_state + strict = False + # TODO: replace with proper PTL API with pl_legacy_patch(): if map_location is not None: @@ -342,7 +365,7 @@ def load_from_checkpoint( config_kwargs.pop('trainer') cfg.update(config_kwargs) - if cfg.get('megatron_amp_O2', False): + if cfg.get('megatron_amp_O2', False) and checkpoint_dir is None: new_state_dict = {} for key in checkpoint['state_dict'].keys(): new_key = key.replace('model.', 'model.module.', 1) @@ -355,6 +378,26 @@ def load_from_checkpoint( model = ptl_load_state(cls, checkpoint, strict=strict, cfg=cfg, **kwargs) # cfg = checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].cfg + # if the checkpoint is distributed, we deferred loading the state_dict until now + if checkpoint_dir is not None: + sharded_state_dict = model.sharded_state_dict() + checkpoint['state_dict'] = sharded_state_dict + # dist checkpointing needs torch.distributed to load the checkpoint + if parallel_state.is_unitialized(): + + def dummy(): + return + + if model.trainer.strategy.launcher is not None: + model.trainer.strategy.launcher.launch(dummy, trainer=model.trainer) + model.trainer.strategy.setup_environment() + # load the checkpoint from disk + checkpoint = dist_checkpointing.load(sharded_state_dict=checkpoint, checkpoint_dir=checkpoint_dir) + # restore the weights + model.on_load_checkpoint(checkpoint) + if hasattr(model, 'setup_transformer_engine_tp_groups'): + model.setup_transformer_engine_tp_groups() + # NMT models do not have a `tokenizer` attribute, they instead have an encoder_tokenizer and decoder_tokenizer attribute. if hasattr(cfg, "tokenizer"): if cfg.tokenizer.get("tokenizer_model") is not None: diff --git a/nemo/collections/nlp/parts/nlp_overrides.py b/nemo/collections/nlp/parts/nlp_overrides.py index 8b7970df4667..273768f7776d 100644 --- a/nemo/collections/nlp/parts/nlp_overrides.py +++ b/nemo/collections/nlp/parts/nlp_overrides.py @@ -41,7 +41,7 @@ from nemo.core.optim import MainParamsOptimizerWrapper from nemo.utils import AppState, logging from nemo.utils.get_rank import is_global_rank_zero -from nemo.utils.model_utils import ckpt_to_dir, inject_model_parallel_rank +from nemo.utils.model_utils import ckpt_to_dir, inject_model_parallel_rank, uninject_model_parallel_rank try: from apex.transformer.pipeline_parallel.utils import get_num_microbatches @@ -448,12 +448,38 @@ def __init__(self) -> None: def save_to(self, model, save_path: str): app_state = AppState() - if app_state.model_parallel_size is not None and app_state.model_parallel_size > 1: + + # Check if using distributed checkpointing + dist_ckpt = hasattr(model, 'sharded_state_dict') and model.sharded_state_dict() is not None + + dist_ckpt_dir = None + + if (app_state.model_parallel_size is not None and app_state.model_parallel_size > 1) or dist_ckpt: dir_name = os.path.dirname(save_path) - # first we save the weights for each model parallel rank - if app_state.model_parallel_size is not None and app_state.model_parallel_size > 1: + # dist ckpt calls save on every rank + if dist_ckpt: + # model weights is a directory + dist_ckpt_dir = ckpt_to_dir(os.path.join(dir_name, self.model_weights_ckpt)) + fs = get_filesystem(dist_ckpt_dir) + if is_global_rank_zero(): + fs.makedirs(dist_ckpt_dir, exist_ok=True) + sharded_state_dict = model.sharded_state_dict() + # dist checkpoint needs torch.distributed to save the checkpoint + if parallel_state.is_unitialized(): + + def dummy(): + return + + if model.trainer.strategy.launcher is not None: + model.trainer.strategy.launcher.launch(dummy, trainer=model.trainer) + model.trainer.strategy.setup_environment() + dist_checkpointing.save(sharded_state_dict=sharded_state_dict, checkpoint_dir=dist_ckpt_dir) + + else: + + # first we save the weights for each model parallel rank if app_state.data_parallel_rank == 0: if app_state.pipeline_model_parallel_size == 1: mp_model_weights = os.path.join( @@ -468,54 +494,56 @@ def save_to(self, model, save_path: str): self._save_state_dict_to_disk(model.state_dict(), mp_model_weights) - if torch.distributed.is_initialized(): - torch.distributed.barrier() - - # create nemo file from folder with all mp_ranks checkpoints - if ( - app_state.pipeline_model_parallel_rank == 0 - and app_state.tensor_model_parallel_rank == 0 - and app_state.data_parallel_rank == 0 - ): - with tempfile.TemporaryDirectory() as tmpdir: - - if app_state.pipeline_model_parallel_size == 1: - # move weights to the tmpdir - for tp_rank in range(app_state.tensor_model_parallel_size): - os.makedirs(os.path.join(tmpdir, f'mp_rank_{tp_rank:02d}')) - mp_model_weights = os.path.join( - dir_name, f'mp_rank_{tp_rank:02d}_' + self.model_weights_ckpt - ) - shutil.move( - mp_model_weights, - os.path.join(tmpdir, f'mp_rank_{tp_rank:02d}', self.model_weights_ckpt), - ) - else: - # move weights to the tmpdir - for tp_rank, pp_rank in itertools.product( - range(app_state.tensor_model_parallel_size), - range(app_state.pipeline_model_parallel_size), - ): - os.makedirs(os.path.join(tmpdir, f'tp_rank_{tp_rank:02d}_pp_rank_{pp_rank:03d}')) - mp_model_weights = os.path.join( - dir_name, f'tp_rank_{tp_rank:02d}_pp_rank_{pp_rank:03d}_' + self.model_weights_ckpt - ) - shutil.move( - mp_model_weights, - os.path.join( - tmpdir, f'tp_rank_{tp_rank:02d}_pp_rank_{pp_rank:03d}', self.model_weights_ckpt - ), - ) - - # create config and artifacts in tmpdir - config_yaml = os.path.join(tmpdir, self.model_config_yaml) - model.to_config_file(path2yaml_file=config_yaml) - if hasattr(model, 'artifacts') and model.artifacts is not None: - self._handle_artifacts(model, nemo_file_folder=tmpdir) - self._update_artifact_paths(model, path2yaml_file=config_yaml) - - # create tar file - self._make_nemo_file_from_folder(save_path, tmpdir) + if torch.distributed.is_initialized(): + torch.distributed.barrier() + + # create nemo file from folder with all mp_ranks checkpoints + if ( + app_state.pipeline_model_parallel_rank == 0 + and app_state.tensor_model_parallel_rank == 0 + and app_state.data_parallel_rank == 0 + ): + with tempfile.TemporaryDirectory() as tmpdir: + + if dist_ckpt: + shutil.move(str(dist_ckpt_dir), tmpdir) + + elif app_state.pipeline_model_parallel_size == 1: + # move weights to the tmpdir + for tp_rank in range(app_state.tensor_model_parallel_size): + os.makedirs(os.path.join(tmpdir, f'mp_rank_{tp_rank:02d}')) + mp_model_weights = os.path.join( + dir_name, f'mp_rank_{tp_rank:02d}_' + self.model_weights_ckpt + ) + shutil.move( + mp_model_weights, + os.path.join(tmpdir, f'mp_rank_{tp_rank:02d}', self.model_weights_ckpt), + ) + else: + # move weights to the tmpdir + for tp_rank, pp_rank in itertools.product( + range(app_state.tensor_model_parallel_size), range(app_state.pipeline_model_parallel_size), + ): + os.makedirs(os.path.join(tmpdir, f'tp_rank_{tp_rank:02d}_pp_rank_{pp_rank:03d}')) + mp_model_weights = os.path.join( + dir_name, f'tp_rank_{tp_rank:02d}_pp_rank_{pp_rank:03d}_' + self.model_weights_ckpt + ) + shutil.move( + mp_model_weights, + os.path.join( + tmpdir, f'tp_rank_{tp_rank:02d}_pp_rank_{pp_rank:03d}', self.model_weights_ckpt + ), + ) + + # create config and artifacts in tmpdir + config_yaml = os.path.join(tmpdir, self.model_config_yaml) + model.to_config_file(path2yaml_file=config_yaml) + if hasattr(model, 'artifacts') and model.artifacts is not None: + self._handle_artifacts(model, nemo_file_folder=tmpdir) + self._update_artifact_paths(model, path2yaml_file=config_yaml) + + # create tar file + self._make_nemo_file_from_folder(save_path, tmpdir) else: return super().save_to(model, save_path) @@ -539,6 +567,21 @@ def modify_state_dict(self, conf, state_dict): return state_dict + def _load_state_dict_from_disk(self, model_weights, map_location=None): + # if model_weights with the extension removed is a directory, we assume it is a distributed checkpoint + # we need to defer loading the state dict so we return None + uninject_model_weights = uninject_model_parallel_rank(model_weights) + + # legacy model_weights will have mp rank injected + if os.path.isfile(model_weights): + return super()._load_state_dict_from_disk(model_weights, map_location) + + # dist checkpoint will be a dir + elif os.path.isdir(os.path.splitext(uninject_model_weights)[0]): + return None + else: + raise ValueError(f'Expected {model_weights} to be a file or directory.') + def restore_from( self, calling_cls, @@ -571,6 +614,7 @@ def restore_from( Returns: An instance of type cls or its underlying config (if return_config is set). """ + # Get path where the command is executed - the artifacts will be "retrieved" there # (original .nemo behavior) loaded_params = super().load_config_and_state_dict( @@ -579,8 +623,52 @@ def restore_from( if not isinstance(loaded_params, tuple) or return_config is True: return loaded_params conf, instance, state_dict = loaded_params - state_dict = self.modify_state_dict(conf, state_dict) - super().load_instance_with_state_dict(instance, state_dict, strict) + + # if we're using dist checkpointing then state_dict will be None + if state_dict is None: + # dist checkpointing needs torch.distributed to load the checkpoint + if parallel_state.is_unitialized(): + + def dummy(): + return + + if trainer.strategy.launcher is not None: + trainer.strategy.launcher.launch(dummy, trainer=trainer) + trainer.strategy.setup_environment() + + with tempfile.TemporaryDirectory() as tmpdir: + # Check if self.model_extracted_dir is set, and is a valid path + if self.model_extracted_dir is not None and os.path.isdir(self.model_extracted_dir): + # Log that NeMo will use the provided `model_extracted_dir` + logging.info( + f"Restoration will occur within pre-extracted directory : " f"`{self.model_extracted_dir}`." + ) + + # Override `tmpdir` above with the pre-extracted `model_extracted_dir` + tmpdir = self.model_extracted_dir + + else: + # Extract the nemo file into the temporary directory + self._unpack_nemo_file( + path2file=restore_path, out_folder=tmpdir, extract_config_only=return_config is True + ) + checkpoint = {} + sharded_state_dict = instance.sharded_state_dict() + checkpoint['state_dict'] = sharded_state_dict + # remove model weights extension + tmp_model_weights_ckpt = os.path.join(tmpdir, self.model_weights_ckpt) + tmp_model_weights_dir = os.path.splitext(tmp_model_weights_ckpt)[0] + assert os.path.isdir(tmp_model_weights_dir), f'Expected {tmp_model_weights_dir} to be a directory.' + checkpoint = dist_checkpointing.load( + sharded_state_dict=checkpoint, checkpoint_dir=tmp_model_weights_dir + ) + instance.on_load_checkpoint(checkpoint) + if hasattr(instance, 'setup_transformer_engine_tp_groups'): + instance.setup_transformer_engine_tp_groups() + + else: + state_dict = self.modify_state_dict(conf, state_dict) + super().load_instance_with_state_dict(instance, state_dict, strict) logging.info(f'Model {instance.__class__.__name__} was successfully restored from {restore_path}.') return instance diff --git a/nemo/core/optim/distributed_adam.py b/nemo/core/optim/distributed_adam.py index 62bba769f652..d7bc049c1808 100644 --- a/nemo/core/optim/distributed_adam.py +++ b/nemo/core/optim/distributed_adam.py @@ -21,11 +21,7 @@ from megatron.core import parallel_state from megatron.core.dist_checkpointing.dict_utils import dict_list_map_inplace from megatron.core.dist_checkpointing.mapping import ShardedTensor -from megatron.core.dist_checkpointing.optimizer import ( - get_param_id_to_sharded_param_map, - make_sharded_optimizer_tensor, - optim_state_to_sharding_state, -) +from megatron.core.dist_checkpointing.optimizer import get_param_id_to_sharded_param_map, optim_state_to_sharding_state def _str_to_dtype(dtype: Union[str, torch.dtype]) -> torch.dtype: diff --git a/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py b/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py index 69f229ec5e10..c281088f8c5c 100644 --- a/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py +++ b/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py @@ -37,6 +37,7 @@ from nemo.collections.nlp.parts.nlp_overrides import ( GradScaler, MegatronHalfPrecisionPlugin, + NLPDDPStrategy, NLPSaveRestoreConnector, PipelineMixedPrecisionPlugin, ) @@ -179,7 +180,7 @@ def convert(args): nemo_config.precision = precision print(f"nemo_config: {nemo_config}") - trainer = Trainer(plugins=plugins, accelerator='cpu', precision=precision) + trainer = Trainer(plugins=plugins, accelerator='cpu', precision=precision, strategy=NLPDDPStrategy()) hidden_size = hf_config["hidden_size"] head_num = hf_config["num_attention_heads"] From b24fdb96c2dbc2eb69195432d720c174e3c5c7a7 Mon Sep 17 00:00:00 2001 From: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Date: Fri, 8 Sep 2023 21:25:54 -0700 Subject: [PATCH 018/112] fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang Co-authored-by: Jimmy Zhang Signed-off-by: Sasha Meister --- .../language_modeling/megatron_gpt_model.py | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 4524ca5b21c7..0362d6f2d1d7 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -884,16 +884,16 @@ def fwd_output_only_func(dataloader_iter, model): if attention_mask is not None: attention_mask = attention_mask.cuda() attention_mask = attention_mask[0:1] - if self.mcore_gpt: - # if first step, then clear KV cache, otherwise reuse inference_paarms - if set_inference_key_value_memory[0].item(): - self.inference_params = InferenceParams( - max_batch_size=tokens.size(0), max_sequence_length=inference_max_sequence_len[0].item() - ) - extra_arg['inference_params'] = self.inference_params - else: - extra_arg['set_inference_key_value_memory'] = set_inference_key_value_memory[0].item() - extra_arg['inference_max_sequence_len'] = inference_max_sequence_len[0].item() + if self.mcore_gpt: + # if first step, then clear KV cache, otherwise reuse inference_paarms + if set_inference_key_value_memory[0].item(): + self.inference_params = InferenceParams( + max_batch_size=tokens.size(0), max_sequence_length=inference_max_sequence_len[0].item() + ) + extra_arg['inference_params'] = self.inference_params + else: + extra_arg['set_inference_key_value_memory'] = set_inference_key_value_memory[0].item() + extra_arg['inference_max_sequence_len'] = inference_max_sequence_len[0].item() output_tensor = model(tokens, position_ids, attention_mask, **extra_arg) # Advance inference sequence offset. From c85dcbab468dfe4d003312290328c2a7441b96f8 Mon Sep 17 00:00:00 2001 From: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Date: Fri, 8 Sep 2023 21:31:43 -0700 Subject: [PATCH 019/112] Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../megatron_bart_pretraining.py | 5 +++- .../megatron_gpt_continue_training.py | 3 ++- .../language_modeling/megatron_gpt_eval.py | 4 +-- .../megatron_gpt_prompt_learning.py | 3 ++- .../megatron_gpt_prompt_learning_eval.py | 4 +-- .../megatron_retro_cal_shape.py | 9 +++++-- .../megatron_retro_fine_tune.py | 3 ++- .../megatron_retro_mutransfer_pretrain.py | 9 +++++-- .../megatron_retro_pretraining.py | 3 ++- .../megatron_t5_lm_adaptation_finetune.py | 5 +++- .../megatron_t5_prompt_learning.py | 3 ++- .../megatron_t5_seq2seq_finetune.py | 3 ++- .../tuning/megatron_gpt_adapter_tuning.py | 3 ++- .../tuning/megatron_gpt_ia3_tuning.py | 3 ++- .../tuning/megatron_gpt_peft_tuning.py | 3 ++- .../tuning/megatron_gpt_sft.py | 3 ++- .../tuning/megatron_t5_adapter_tuning.py | 3 ++- .../tuning/megatron_t5_ia3_tuning.py | 3 ++- .../tuning/megatron_t5_lora_tuning.py | 3 ++- .../nlp/parts/megatron_trainer_builder.py | 11 ++++++-- nemo/collections/nlp/parts/nlp_overrides.py | 26 +++++++++++++++++++ nemo/utils/exp_manager.py | 9 ++++++- 22 files changed, 95 insertions(+), 26 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_bart_pretraining.py b/examples/nlp/language_modeling/megatron_bart_pretraining.py index 2b3c7ad1afef..72c7a755c0ec 100644 --- a/examples/nlp/language_modeling/megatron_bart_pretraining.py +++ b/examples/nlp/language_modeling/megatron_bart_pretraining.py @@ -21,6 +21,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_bart_model import MegatronBARTModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -63,7 +64,9 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[ModelSummary(max_depth=3)]) + trainer = Trainer( + plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[ModelSummary(max_depth=3), CustomProgressBar()] + ) exp_manager(trainer, cfg.exp_manager) diff --git a/examples/nlp/language_modeling/megatron_gpt_continue_training.py b/examples/nlp/language_modeling/megatron_gpt_continue_training.py index 2cb18c9a6714..c829a91c4092 100644 --- a/examples/nlp/language_modeling/megatron_gpt_continue_training.py +++ b/examples/nlp/language_modeling/megatron_gpt_continue_training.py @@ -23,6 +23,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -156,7 +157,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) diff --git a/examples/nlp/language_modeling/megatron_gpt_eval.py b/examples/nlp/language_modeling/megatron_gpt_eval.py index cbac3ab29ce2..04125c6f750e 100644 --- a/examples/nlp/language_modeling/megatron_gpt_eval.py +++ b/examples/nlp/language_modeling/megatron_gpt_eval.py @@ -27,7 +27,7 @@ from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer from nemo.collections.nlp.modules.common.text_generation_utils import generate from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam -from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector +from nemo.collections.nlp.parts.nlp_overrides import CustomProgressBar, NLPDDPStrategy, NLPSaveRestoreConnector from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState from nemo.utils.model_utils import inject_model_parallel_rank @@ -167,7 +167,7 @@ def remove_padded_prompts(response, nb_paddings): def main(cfg) -> None: # trainer required for restoring model parallel models - trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer) + trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer, callbacks=[CustomProgressBar()]) if cfg.gpt_model_file is not None: if ( diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py index 278cb454a50a..89077f099aeb 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py @@ -21,6 +21,7 @@ MegatronGPTPromptLearningModel, ) from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -74,7 +75,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # load existing or init new soft prompt GPT model diff --git a/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py b/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py index 038bf3c2587b..e096bafc6132 100644 --- a/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py +++ b/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py @@ -26,7 +26,7 @@ MegatronGPTPromptLearningModel, ) from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam -from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector +from nemo.collections.nlp.parts.nlp_overrides import CustomProgressBar, NLPDDPStrategy, NLPSaveRestoreConnector from nemo.core.config import hydra_runner from nemo.utils import logging @@ -83,7 +83,7 @@ def main(cfg) -> None: raise EnvironmentError("GPU is needed for the inference") # trainer required for restoring model parallel models - trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer) + trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer, callbacks=[CustomProgressBar()]) if ( cfg.tensor_model_parallel_size < 0 diff --git a/examples/nlp/language_modeling/megatron_retro_cal_shape.py b/examples/nlp/language_modeling/megatron_retro_cal_shape.py index a50736e88653..008ce1445929 100644 --- a/examples/nlp/language_modeling/megatron_retro_cal_shape.py +++ b/examples/nlp/language_modeling/megatron_retro_cal_shape.py @@ -19,7 +19,12 @@ from nemo.collections.nlp.models.language_modeling.megatron_retrieval_model import MegatronRetrievalModel from nemo.collections.nlp.modules.common.megatron.mup.shape import make_base_shapes -from nemo.collections.nlp.parts.nlp_overrides import GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy +from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, + GradScaler, + MegatronHalfPrecisionPlugin, + NLPDDPStrategy, +) from nemo.core.config import hydra_runner from nemo.utils import logging @@ -56,7 +61,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams with open_dict(cfg): diff --git a/examples/nlp/language_modeling/megatron_retro_fine_tune.py b/examples/nlp/language_modeling/megatron_retro_fine_tune.py index 15f1d541b6e7..031191c22b0a 100644 --- a/examples/nlp/language_modeling/megatron_retro_fine_tune.py +++ b/examples/nlp/language_modeling/megatron_retro_fine_tune.py @@ -24,6 +24,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_retro_fine_tune_model import MegatronRetroFinetuneModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -102,7 +103,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) logging.info(f'Resuming training from checkpoint: {trainer.ckpt_path}') diff --git a/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py b/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py index 65a98c8b9b2d..79b25fd2c5e8 100644 --- a/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py +++ b/examples/nlp/language_modeling/megatron_retro_mutransfer_pretrain.py @@ -20,7 +20,12 @@ from nemo.collections.nlp.models.language_modeling.megatron_retrieval_model import MegatronRetrievalModel from nemo.collections.nlp.modules.common.megatron.mup.optim import MuAdam, MuAdamW -from nemo.collections.nlp.parts.nlp_overrides import GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy +from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, + GradScaler, + MegatronHalfPrecisionPlugin, + NLPDDPStrategy, +) from nemo.core.config import hydra_runner from nemo.core.config.optimizers import AdamParams, AdamWParams from nemo.core.optim.optimizers import register_optimizer @@ -62,7 +67,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) diff --git a/examples/nlp/language_modeling/megatron_retro_pretraining.py b/examples/nlp/language_modeling/megatron_retro_pretraining.py index 3d0de78dc4d8..e90fdddf1169 100644 --- a/examples/nlp/language_modeling/megatron_retro_pretraining.py +++ b/examples/nlp/language_modeling/megatron_retro_pretraining.py @@ -23,6 +23,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_retrieval_model import MegatronRetrievalModel from nemo.collections.nlp.modules.common.megatron.megatron_init import initialize_model_parallel_for_nemo from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -65,7 +66,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) diff --git a/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py b/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py index 89e7e4e70321..0a10d434127d 100644 --- a/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py +++ b/examples/nlp/language_modeling/megatron_t5_lm_adaptation_finetune.py @@ -21,6 +21,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -64,7 +65,9 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[ModelSummary(max_depth=3)]) + trainer = Trainer( + plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[ModelSummary(max_depth=3), CustomProgressBar()] + ) exp_manager(trainer, cfg.exp_manager) # update resume from checkpoint found by exp_manager diff --git a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py index 9923297f6c33..ba335e39c225 100644 --- a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py @@ -21,6 +21,7 @@ MegatronT5PromptLearningModel, ) from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, NLPDDPStrategy, NLPSaveRestoreConnector, @@ -64,7 +65,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # load existing or init new soft prompt T5 model diff --git a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py index 1c59431f0515..e8e4ff0f9868 100644 --- a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py +++ b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py @@ -26,6 +26,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_t0_model import MegatronT0Model from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -177,7 +178,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py index 66e03a94db74..7aebad7e9010 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py @@ -20,6 +20,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_adapter_model import MegatronGPTAdapterLearningModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -90,7 +91,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # load existing or init new soft prompt GPT model diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py index 7467528c40da..b74d215b6ef7 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py @@ -20,6 +20,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_adapter_model import MegatronGPTInfusedAdapterModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -89,7 +90,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # load existing or init new soft prompt GPT model diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py index bf4473594d24..2c9293b2600e 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py @@ -35,6 +35,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTModel from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -213,7 +214,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # update resume from checkpoint found by exp_manager if cfg.model.resume_from_checkpoint is not None: diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py index b5c37e4936c0..15d3dc53c348 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py @@ -23,6 +23,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -174,7 +175,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py index ad3a6465112d..c096a75d0b9d 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py @@ -20,6 +20,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_t5_adapter_model import MegatronT5AdapterLearningModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -89,7 +90,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # load existing or init new soft prompt GPT model diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py index cc5cf931cfce..46b590acd725 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py @@ -20,6 +20,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_t5_adapter_model import MegatronT5InfusedAdapterModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -90,7 +91,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py index de796aef3f86..fb982a273662 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py @@ -20,6 +20,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_t5_adapter_model import MegatronT5LoraModel from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -89,7 +90,7 @@ def main(cfg) -> None: if cfg.get('cluster_type', None) == 'BCP': plugins.append(TorchElasticEnvironment()) - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) + trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) exp_manager(trainer, cfg.exp_manager) # load existing or init new soft prompt GPT model diff --git a/nemo/collections/nlp/parts/megatron_trainer_builder.py b/nemo/collections/nlp/parts/megatron_trainer_builder.py index e5af76cbd1ec..02f0a7f4f4aa 100644 --- a/nemo/collections/nlp/parts/megatron_trainer_builder.py +++ b/nemo/collections/nlp/parts/megatron_trainer_builder.py @@ -16,7 +16,9 @@ from pytorch_lightning import Trainer from pytorch_lightning.callbacks import ModelSummary from pytorch_lightning.plugins.environments import TorchElasticEnvironment + from nemo.collections.nlp.parts.nlp_overrides import ( + CustomProgressBar, GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, @@ -83,7 +85,7 @@ def _plugins(self) -> list: def create_trainer(self) -> Trainer: strategy = self._training_strategy() plugins = self._plugins() - return Trainer(plugins=plugins, strategy=strategy, **self.cfg.trainer) + return Trainer(plugins=plugins, strategy=strategy, **self.cfg.trainer, callbacks=[CustomProgressBar()]) class MegatronBertTrainerBuilder(MegatronTrainerBuilder): @@ -102,4 +104,9 @@ class MegatronT5TrainerBuilder(MegatronTrainerBuilder): def create_trainer(self) -> Trainer: strategy = self._training_strategy() plugins = self._plugins() - return Trainer(plugins=plugins, strategy=strategy, **self.cfg.trainer, callbacks=[ModelSummary(max_depth=3)]) + return Trainer( + plugins=plugins, + strategy=strategy, + **self.cfg.trainer, + callbacks=[ModelSummary(max_depth=3), CustomProgressBar()] + ) diff --git a/nemo/collections/nlp/parts/nlp_overrides.py b/nemo/collections/nlp/parts/nlp_overrides.py index 273768f7776d..d7eaaa90e536 100644 --- a/nemo/collections/nlp/parts/nlp_overrides.py +++ b/nemo/collections/nlp/parts/nlp_overrides.py @@ -25,6 +25,8 @@ import torch from lightning_fabric.utilities.cloud_io import get_filesystem from omegaconf import OmegaConf +from pytorch_lightning.callbacks.progress import TQDMProgressBar +from pytorch_lightning.callbacks.progress.tqdm_progress import _update_n from pytorch_lightning.loops.fetchers import _DataFetcher from pytorch_lightning.overrides.base import _LightningModuleWrapperBase from pytorch_lightning.plugins import ClusterEnvironment @@ -1073,3 +1075,27 @@ def _fetch_next_batch(self, iterator: Iterator) -> None: assert isinstance(dataloader, Sized) # `_has_len` is True self.done = self.fetched >= len(dataloader) self.on_fetch_end(batch, start_output) + + +class CustomProgressBar(TQDMProgressBar): + """ + Add CustomProgressBar to remove 's/it' and display progress per step instead of per microbatch + for megatron models + """ + + def init_train_tqdm(self): + """ + Override bar_format to not have 's/it' + """ + self.bar = super().init_train_tqdm() + self.bar.bar_format = "{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}{postfix}]" + return self.bar + + def on_train_batch_end(self, trainer, pl_module, *_, **__): + """ + Override parent class on_train_batch_end to update progress bar per global_step instead of per microbatch + """ + n = trainer.global_step + if self._should_update(n, self.train_progress_bar.total): + _update_n(self.train_progress_bar, n) + self.train_progress_bar.set_postfix(self.get_metrics(trainer, pl_module)) diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index c3e938bbb751..3deb814ae2df 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -190,7 +190,14 @@ def _on_batch_start(self, name): def _on_batch_end(self, name, pl_module): self.timer.stop(name) # Set the `batch_size=1` as WAR for `dataloader_iter`, which is not used for any metric - pl_module.log(name, self.timer[name], on_step=True, on_epoch=False, batch_size=1) + pl_module.log( + name + ' in s', + self.timer[name], + on_step=True, + on_epoch=False, + batch_size=1, + prog_bar=(name == "train_step_timing"), + ) def on_train_batch_start(self, trainer, pl_module, batch, batch_idx): self._on_batch_start("train_step_timing") From ae5d3ceae8fb25c403de4cf597756079273535ca Mon Sep 17 00:00:00 2001 From: Abhinav Khattar Date: Fri, 8 Sep 2023 22:45:42 -0700 Subject: [PATCH 020/112] Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/modules/common/megatron/transformer.py | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/nemo/collections/nlp/modules/common/megatron/transformer.py b/nemo/collections/nlp/modules/common/megatron/transformer.py index fc8f363a556f..06a2e306482e 100644 --- a/nemo/collections/nlp/modules/common/megatron/transformer.py +++ b/nemo/collections/nlp/modules/common/megatron/transformer.py @@ -981,14 +981,14 @@ def __init__( elif self.activations_checkpoint_method == 'block': logging.info( ( - f'Using block activation checkpointing requires activations_checkpoint_num_layers to be set.' - f'Got: {self.activations_checkpoint_num_layers}. Setting to 1 by default.' + f'Using block activation checkpointing with granularity selective forces all layers to use checkpointing.' ) ) else: raise ValueError( f'activations_checkpoint_method should be "uniform" or "block" when using granularity selective.' ) + self.activations_checkpoint_num_layers = num_layers # forcing all layers elif self.activations_checkpoint_granularity == 'full': if self.activations_checkpoint_method in ['uniform', 'block']: if not self.activations_checkpoint_num_layers: @@ -998,6 +998,7 @@ def __init__( f'Got: {self.activations_checkpoint_num_layers}. Setting to 1 by default.' ) ) + self.activations_checkpoint_num_layers = 1 # keeping the old default else: raise ValueError( f'activations_checkpoint_method should be "uniform" or "block" when using granularity full.' @@ -1047,6 +1048,13 @@ def __init__( # TODO: Add similar assert for encoder-decoder. self.num_layers = self.get_num_layers(num_layers) + + if ( + self.activations_checkpoint_num_layers is not None + and self.activations_checkpoint_num_layers > self.num_layers + ): + self.activations_checkpoint_num_layers = self.num_layers + # Transformer layers. def build_layer(layer_number): if isinstance(layer_type, list): From 153c53b730a186585bb7367d080c3a230b61a763 Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Sat, 9 Sep 2023 00:44:45 -0600 Subject: [PATCH 021/112] make loss mask default to false (#7407) Signed-off-by: eharper Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_gpt_model.py | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 0362d6f2d1d7..45586bffcdce 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -207,6 +207,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): self._validate_trainer() # build the transformer config + # TODO: add type hint once pip package is out self.transformer_config = self.build_transformer_config() self.megatron_amp_o2 = cfg.get('megatron_amp_O2', False) @@ -275,6 +276,12 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): self.inference_params = None + # default to false since this doesn't work with sequence parallelism currently + self.use_loss_mask = self.cfg.get('use_loss_mask', False) + + if self.use_loss_mask and self.transformer_config.sequence_parallel: + raise ValueError('Loss mask is not supported with sequence parallelism.') + def get_gpt_module_list(self): if isinstance(self.model, list): return [ @@ -826,8 +833,11 @@ def fwd_output_and_loss_func(dataloader_iter, model, checkpoint_activations_all_ 'labels': batch['labels'], 'loss_mask': batch['loss_mask'], } + if not self.mcore_gpt: forward_args['checkpoint_activations_all_layers'] = checkpoint_activations_all_layers + if not self.use_loss_mask: + forward_args.pop('loss_mask') else: # TODO: @eharper can we add this to mcore? forward_args.pop('loss_mask') From b61b9ab1ddba7af03b15cadfb60d3d36a40223e6 Mon Sep 17 00:00:00 2001 From: Sangkug Lym Date: Sat, 9 Sep 2023 13:54:32 -0700 Subject: [PATCH 022/112] Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym Signed-off-by: Sasha Meister --- .../conf/tp_overlap/ub_cfg_a100_h6144_tp4_mbs4_seqlen2048.yaml | 1 + .../conf/tp_overlap/ub_cfg_a100_h8192_tp8_mbs4_seqlen2048.yaml | 1 + .../conf/tp_overlap/ub_cfg_h100_h6144_tp4_mbs4_seqlen2048.yaml | 1 + .../conf/tp_overlap/ub_cfg_h100_h8192_tp8_mbs4_seqlen2048.yaml | 1 + 4 files changed, 4 insertions(+) create mode 100644 examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp4_mbs4_seqlen2048.yaml create mode 100644 examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h8192_tp8_mbs4_seqlen2048.yaml create mode 100644 examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp4_mbs4_seqlen2048.yaml create mode 100644 examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h8192_tp8_mbs4_seqlen2048.yaml diff --git a/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp4_mbs4_seqlen2048.yaml b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp4_mbs4_seqlen2048.yaml new file mode 100644 index 000000000000..2c11fa89af4b --- /dev/null +++ b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp4_mbs4_seqlen2048.yaml @@ -0,0 +1 @@ +# dummy file to build hydra configs diff --git a/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h8192_tp8_mbs4_seqlen2048.yaml b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h8192_tp8_mbs4_seqlen2048.yaml new file mode 100644 index 000000000000..2c11fa89af4b --- /dev/null +++ b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h8192_tp8_mbs4_seqlen2048.yaml @@ -0,0 +1 @@ +# dummy file to build hydra configs diff --git a/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp4_mbs4_seqlen2048.yaml b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp4_mbs4_seqlen2048.yaml new file mode 100644 index 000000000000..2c11fa89af4b --- /dev/null +++ b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp4_mbs4_seqlen2048.yaml @@ -0,0 +1 @@ +# dummy file to build hydra configs diff --git a/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h8192_tp8_mbs4_seqlen2048.yaml b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h8192_tp8_mbs4_seqlen2048.yaml new file mode 100644 index 000000000000..2c11fa89af4b --- /dev/null +++ b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h8192_tp8_mbs4_seqlen2048.yaml @@ -0,0 +1 @@ +# dummy file to build hydra configs From 3f36e112dcc979ff5e2c8b9291b896a2b67c808f Mon Sep 17 00:00:00 2001 From: Abhinav Khattar Date: Mon, 11 Sep 2023 11:52:25 -0600 Subject: [PATCH 023/112] add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister --- .../conf/tp_overlap/ub_cfg_a100_h6144_tp8_mbs4_seqlen2048.yaml | 1 + .../conf/tp_overlap/ub_cfg_h100_h6144_tp8_mbs4_seqlen2048.yaml | 1 + 2 files changed, 2 insertions(+) create mode 100644 examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp8_mbs4_seqlen2048.yaml create mode 100644 examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp8_mbs4_seqlen2048.yaml diff --git a/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp8_mbs4_seqlen2048.yaml b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp8_mbs4_seqlen2048.yaml new file mode 100644 index 000000000000..2c11fa89af4b --- /dev/null +++ b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_a100_h6144_tp8_mbs4_seqlen2048.yaml @@ -0,0 +1 @@ +# dummy file to build hydra configs diff --git a/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp8_mbs4_seqlen2048.yaml b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp8_mbs4_seqlen2048.yaml new file mode 100644 index 000000000000..2c11fa89af4b --- /dev/null +++ b/examples/nlp/language_modeling/conf/tp_overlap/ub_cfg_h100_h6144_tp8_mbs4_seqlen2048.yaml @@ -0,0 +1 @@ +# dummy file to build hydra configs From 53224a253bcb2c88fb7b1888fd1847be9eb4313d Mon Sep 17 00:00:00 2001 From: George <37293288+Jorjeous@users.noreply.github.com> Date: Tue, 12 Sep 2023 00:08:50 +0400 Subject: [PATCH 024/112] New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd Signed-off-by: Sasha Meister --- docs/source/starthere/tutorials.rst | 3 + tutorials/tools/SDE_HowTo_v2.ipynb | 2786 +++++++++++++++++++++++++++ 2 files changed, 2789 insertions(+) create mode 100644 tutorials/tools/SDE_HowTo_v2.ipynb diff --git a/docs/source/starthere/tutorials.rst b/docs/source/starthere/tutorials.rst index 8c74f88df08c..29ba2f300b74 100644 --- a/docs/source/starthere/tutorials.rst +++ b/docs/source/starthere/tutorials.rst @@ -190,6 +190,9 @@ To run a tutorial: * - Tools - NeMo Forced Aligner - `NeMo Forced Aligner `_ + * - Tools + - Speech Data Explorer + - `Speech Data Explorer `_ * - Tools - CTC Segmentation - `CTC Segmentation `_ diff --git a/tutorials/tools/SDE_HowTo_v2.ipynb b/tutorials/tools/SDE_HowTo_v2.ipynb new file mode 100644 index 000000000000..a6d3f8dd2723 --- /dev/null +++ b/tutorials/tools/SDE_HowTo_v2.ipynb @@ -0,0 +1,2786 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Introduction" + ], + "metadata": { + "id": "sHAHlxz1HYc0" + }, + "id": "sHAHlxz1HYc0" + }, + { + "cell_type": "markdown", + "source": [ + "[Speech Data Explorer](https://github.com/NVIDIA/NeMo/tree/main/tools/speech_data_explorer) (SDE) is a visual tool for interactive exploration of speech datasets and error analysis of Automatic Speech Recognition (ASR) models. This tutorial demonstrates how to use SDE in Comparison mode to evaluate two ASR models on a given test set and identify differences in their predictions." + ], + "metadata": { + "id": "9BVTGynbHSoy" + }, + "id": "9BVTGynbHSoy" + }, + { + "cell_type": "markdown", + "source": [ + "# Installation" + ], + "metadata": { + "id": "57pDMtWtHdxv" + }, + "id": "57pDMtWtHdxv" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u_sKMlVpcAvO" + }, + "source": [ + "First, let's install NeMo:" + ], + "id": "u_sKMlVpcAvO" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c3919489", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "87f4e2f4-a06c-432d-d986-429fbe6714af" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Cloning into 'NeMo'...\n", + "remote: Enumerating objects: 121443, done.\u001b[K\n", + "remote: Counting objects: 100% (1811/1811), done.\u001b[K\n", + "remote: Compressing objects: 100% (945/945), done.\u001b[K\n", + "remote: Total 121443 (delta 1299), reused 1238 (delta 864), pack-reused 119632\u001b[K\n", + "Receiving objects: 100% (121443/121443), 228.05 MiB | 21.34 MiB/s, done.\n", + "Resolving deltas: 100% (90608/90608), done.\n", + "Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]\n", + "Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease\n", + "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]\n", + "Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]\n", + "Get:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]\n", + "Hit:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n", + "Hit:7 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease\n", + "Get:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]\n", + "Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease\n", + "Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease\n", + "Get:11 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [962 kB]\n", + "Get:12 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [1,059 kB]\n", + "Get:13 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [993 kB]\n", + "Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,254 kB]\n", + "Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,230 kB]\n", + "Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [1,079 kB]\n", + "Get:17 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy/main amd64 Packages [21.8 kB]\n", + "Fetched 6,959 kB in 1s (5,794 kB/s)\n", + "Reading package lists... Done\n", + "Reading package lists... Done\n", + "Building dependency tree... Done\n", + "Reading state information... Done\n", + "libsndfile1 is already the newest version (1.0.31-2build1).\n", + "ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).\n", + "0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.\n", + "Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (23.1.2)\n", + "Collecting pip\n", + " Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m12.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: pip\n", + " Attempting uninstall: pip\n", + " Found existing installation: pip 23.1.2\n", + " Uninstalling pip-23.1.2:\n", + " Successfully uninstalled pip-23.1.2\n", + "Successfully installed pip-23.2.1\n", + "Uninstalling stuff\n", + "\u001b[33mWARNING: Skipping nemo_toolkit as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Skipping sacrebleu as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Skipping nemo_asr as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Skipping nemo_nlp as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Skipping nemo_tts as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0mInstalling nemo\n", + "Obtaining file:///content/NeMo\n", + " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", + " Checking if build backend supports build_editable ... \u001b[?25l\u001b[?25hdone\n", + " Getting requirements to build editable ... \u001b[?25l\u001b[?25hdone\n", + " Preparing editable metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting huggingface-hub (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for huggingface-hub from https://files.pythonhosted.org/packages/7f/c4/adcbe9a696c135578cabcbdd7331332daad4d49b7c43688bc2d36b3a47d2/huggingface_hub-0.16.4-py3-none-any.whl.metadata\n", + " Downloading huggingface_hub-0.16.4-py3-none-any.whl.metadata (12 kB)\n", + "Requirement already satisfied: numba in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.56.4)\n", + "Requirement already satisfied: numpy<1.24,>=1.22 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.23.5)\n", + "Collecting onnx>=1.7.0 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for onnx>=1.7.0 from https://files.pythonhosted.org/packages/47/d4/f2d212558245e252b936247666c3f5981e6dba62ec470ff8be3df3389364/onnx-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading onnx-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)\n", + "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (2.8.2)\n", + "Collecting ruamel.yaml (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for ruamel.yaml from https://files.pythonhosted.org/packages/d9/0e/2a05efa11ea33513fbdf4a2e2576fe94fd8fa5ad226dbb9c660886390974/ruamel.yaml-0.17.32-py3-none-any.whl.metadata\n", + " Downloading ruamel.yaml-0.17.32-py3-none-any.whl.metadata (17 kB)\n", + "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.2.2)\n", + "Collecting setuptools==65.5.1 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading setuptools-65.5.1-py3-none-any.whl (1.2 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m13.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: tensorboard in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (2.12.3)\n", + "Requirement already satisfied: text-unidecode in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.3)\n", + "Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (2.0.1+cu118)\n", + "Requirement already satisfied: tqdm>=4.41.0 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (4.66.1)\n", + "Collecting wget (from nemo-toolkit==1.21.0rc0)\n", + " Downloading wget-3.2.zip (10 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.14.1)\n", + "Collecting black==19.10b0 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading black-19.10b0-py36-none-any.whl (97 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m97.5/97.5 kB\u001b[0m \u001b[31m12.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting click==8.0.2 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading click-8.0.2-py3-none-any.whl (97 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m97.6/97.6 kB\u001b[0m \u001b[31m12.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting isort<6.0.0,>5.1.0 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading isort-5.12.0-py3-none-any.whl (91 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m91.2/91.2 kB\u001b[0m \u001b[31m11.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting parameterized (from nemo-toolkit==1.21.0rc0)\n", + " Downloading parameterized-0.9.0-py2.py3-none-any.whl (20 kB)\n", + "Requirement already satisfied: pytest in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (7.4.1)\n", + "Collecting pytest-runner (from nemo-toolkit==1.21.0rc0)\n", + " Downloading pytest_runner-6.0.0-py3-none-any.whl (7.2 kB)\n", + "Requirement already satisfied: sphinx in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (5.0.2)\n", + "Collecting sphinxcontrib-bibtex (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for sphinxcontrib-bibtex from https://files.pythonhosted.org/packages/79/59/fafc5c480506cc356e2a7ea009d7c7d75812475b4385fe851ae55575661c/sphinxcontrib_bibtex-2.6.1-py3-none-any.whl.metadata\n", + " Downloading sphinxcontrib_bibtex-2.6.1-py3-none-any.whl.metadata (6.1 kB)\n", + "Collecting wandb (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for wandb from https://files.pythonhosted.org/packages/fe/10/18b03623c460fd433525d9b4739af58c5e69f5974328dcdd037cfbc855d7/wandb-0.15.10-py3-none-any.whl.metadata\n", + " Downloading wandb-0.15.10-py3-none-any.whl.metadata (9.6 kB)\n", + "Collecting hydra-core<=1.3.2,>1.3 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m154.5/154.5 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting omegaconf<=2.3 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pytorch-lightning<=2.0.7,>=2.0 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pytorch-lightning<=2.0.7,>=2.0 from https://files.pythonhosted.org/packages/d5/ef/39994adec1fe1d5f25fd0dd0a82abcd8bd61fc968283790b9da7463f0279/pytorch_lightning-2.0.7-py3-none-any.whl.metadata\n", + " Downloading pytorch_lightning-2.0.7-py3-none-any.whl.metadata (23 kB)\n", + "Collecting torchmetrics>=0.11.0 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for torchmetrics>=0.11.0 from https://files.pythonhosted.org/packages/e3/86/47091c33ecf05f8826d134fd518485d4c68ca524c053b2fdd4e041c20547/torchmetrics-1.1.1-py3-none-any.whl.metadata\n", + " Downloading torchmetrics-1.1.1-py3-none-any.whl.metadata (21 kB)\n", + "Collecting transformers>=4.0.1 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for transformers>=4.0.1 from https://files.pythonhosted.org/packages/13/30/54b59e73400df3de506ad8630284e9fd63f4b94f735423d55fc342181037/transformers-4.33.1-py3-none-any.whl.metadata\n", + " Downloading transformers-4.33.1-py3-none-any.whl.metadata (119 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.9/119.9 kB\u001b[0m \u001b[31m12.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting webdataset<=0.1.62,>=0.1.48 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading webdataset-0.1.62-py3-none-any.whl (32 kB)\n", + "Requirement already satisfied: inflect in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (7.0.0)\n", + "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.5.3)\n", + "Collecting pydantic<2 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pydantic<2 from https://files.pythonhosted.org/packages/bc/e0/0371e9b6c910afe502e5fe18cc94562bfd9399617c7b4f5b6e13c29115b3/pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (149 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.3/149.3 kB\u001b[0m \u001b[31m17.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting sacremoses>=0.0.43 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading sacremoses-0.0.53.tar.gz (880 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m880.6/880.6 kB\u001b[0m \u001b[31m24.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting sentencepiece<1.0.0 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m34.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting youtokentome>=1.0.5 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading youtokentome-1.0.6.tar.gz (86 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.7/86.7 kB\u001b[0m \u001b[31m11.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting braceexpand (from nemo-toolkit==1.21.0rc0)\n", + " Downloading braceexpand-0.1.7-py2.py3-none-any.whl (5.9 kB)\n", + "Requirement already satisfied: editdistance in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.6.2)\n", + "Collecting g2p-en (from nemo-toolkit==1.21.0rc0)\n", + " Downloading g2p_en-2.1.0-py3-none-any.whl (3.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m51.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: ipywidgets in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (7.7.1)\n", + "Collecting jiwer (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for jiwer from https://files.pythonhosted.org/packages/0d/4f/ee537ab20144811dd99321735ff92ef2b3a3230b77ed7454bed4c44d21fc/jiwer-3.0.3-py3-none-any.whl.metadata\n", + " Downloading jiwer-3.0.3-py3-none-any.whl.metadata (2.6 kB)\n", + "Collecting kaldi-python-io (from nemo-toolkit==1.21.0rc0)\n", + " Downloading kaldi-python-io-1.2.2.tar.gz (8.8 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting kaldiio (from nemo-toolkit==1.21.0rc0)\n", + " Downloading kaldiio-2.18.0-py3-none-any.whl (28 kB)\n", + "Requirement already satisfied: librosa>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.10.1)\n", + "Collecting marshmallow (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for marshmallow from https://files.pythonhosted.org/packages/ed/3c/cebfdcad015240014ff08b883d1c0c427f2ba45ae8c6572851b6ef136cad/marshmallow-3.20.1-py3-none-any.whl.metadata\n", + " Downloading marshmallow-3.20.1-py3-none-any.whl.metadata (7.8 kB)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (3.7.1)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (23.1)\n", + "Collecting pyannote.core (from nemo-toolkit==1.21.0rc0)\n", + " Downloading pyannote.core-5.0.0-py3-none-any.whl (58 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.5/58.5 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pyannote.metrics (from nemo-toolkit==1.21.0rc0)\n", + " Downloading pyannote.metrics-3.2.1-py3-none-any.whl (51 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.4/51.4 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pydub (from nemo-toolkit==1.21.0rc0)\n", + " Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", + "Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.10.1)\n", + "Requirement already satisfied: soundfile in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.12.1)\n", + "Collecting sox (from nemo-toolkit==1.21.0rc0)\n", + " Downloading sox-1.4.1-py2.py3-none-any.whl (39 kB)\n", + "Collecting texterrors (from nemo-toolkit==1.21.0rc0)\n", + " Downloading texterrors-0.4.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m50.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting boto3 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for boto3 from https://files.pythonhosted.org/packages/6e/7f/ffb72ddc9f465183af04623baf9d0ea70e73cb0957407e4c333b9e0263fb/boto3-1.28.43-py3-none-any.whl.metadata\n", + " Downloading boto3-1.28.43-py3-none-any.whl.metadata (6.7 kB)\n", + "Collecting datasets (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for datasets from https://files.pythonhosted.org/packages/09/7e/fd4d6441a541dba61d0acb3c1fd5df53214c2e9033854e837a99dd9e0793/datasets-2.14.5-py3-none-any.whl.metadata\n", + " Downloading datasets-2.14.5-py3-none-any.whl.metadata (19 kB)\n", + "Collecting einops (from nemo-toolkit==1.21.0rc0)\n", + " Downloading einops-0.6.1-py3-none-any.whl (42 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m42.2/42.2 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting faiss-cpu (from nemo-toolkit==1.21.0rc0)\n", + " Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m17.6/17.6 MB\u001b[0m \u001b[31m60.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting fasttext (from nemo-toolkit==1.21.0rc0)\n", + " Downloading fasttext-0.9.2.tar.gz (68 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m68.8/68.8 kB\u001b[0m \u001b[31m8.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting flask-restful (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for flask-restful from https://files.pythonhosted.org/packages/d7/7b/f0b45f0df7d2978e5ae51804bb5939b7897b2ace24306009da0cc34d8d1f/Flask_RESTful-0.3.10-py2.py3-none-any.whl.metadata\n", + " Downloading Flask_RESTful-0.3.10-py2.py3-none-any.whl.metadata (1.0 kB)\n", + "Collecting ftfy (from nemo-toolkit==1.21.0rc0)\n", + " Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.1/53.1 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: gdown in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (4.6.6)\n", + "Requirement already satisfied: h5py in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (3.9.0)\n", + "Collecting ijson (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for ijson from https://files.pythonhosted.org/packages/6b/78/2cbeb7020a7a319d148c92331951cfc710864990e32ff6c7f4859729fb48/ijson-3.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading ijson-3.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)\n", + "Requirement already satisfied: jieba in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.42.1)\n", + "Collecting markdown2 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for markdown2 from https://files.pythonhosted.org/packages/f1/98/61276a753f078dd2f3171c9a69fd3f451d220e806b2b1cdca41b8e368b0f/markdown2-2.4.10-py2.py3-none-any.whl.metadata\n", + " Downloading markdown2-2.4.10-py2.py3-none-any.whl.metadata (2.0 kB)\n", + "Collecting megatron-core==0.2.0 (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for megatron-core==0.2.0 from https://files.pythonhosted.org/packages/33/f1/d94f2282b91950e31223efc39138748d71907dbf857e5523d1e73619fd62/megatron_core-0.2.0-py3-none-any.whl.metadata\n", + " Downloading megatron_core-0.2.0-py3-none-any.whl.metadata (1.6 kB)\n", + "Requirement already satisfied: nltk>=3.6.5 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (3.8.1)\n", + "Collecting opencc (from nemo-toolkit==1.21.0rc0)\n", + " Downloading OpenCC-1.1.6-cp310-cp310-manylinux1_x86_64.whl (778 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m778.3/778.3 kB\u001b[0m \u001b[31m70.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pangu (from nemo-toolkit==1.21.0rc0)\n", + " Downloading pangu-4.0.6.1-py3-none-any.whl (6.4 kB)\n", + "Collecting rapidfuzz (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for rapidfuzz from https://files.pythonhosted.org/packages/35/04/9ca97b17da457ed294519477da2aad0799c9ba8eebf37761a5ca94c35534/rapidfuzz-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading rapidfuzz-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)\n", + "Collecting rouge-score (from nemo-toolkit==1.21.0rc0)\n", + " Downloading rouge_score-0.1.2.tar.gz (17 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting sacrebleu[ja] (from nemo-toolkit==1.21.0rc0)\n", + " Downloading sacrebleu-2.3.1-py3-none-any.whl (118 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m118.9/118.9 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting sentence-transformers (from nemo-toolkit==1.21.0rc0)\n", + " Downloading sentence-transformers-2.2.2.tar.gz (85 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.0/86.0 kB\u001b[0m \u001b[31m12.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: tensorstore in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.1.41)\n", + "Collecting zarr (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for zarr from https://files.pythonhosted.org/packages/ba/55/0f5ec28561a1698ac5c11edc5724f8c6d48d01baecf740ffd62107d95e7f/zarr-2.16.1-py3-none-any.whl.metadata\n", + " Downloading zarr-2.16.1-py3-none-any.whl.metadata (5.8 kB)\n", + "Collecting attrdict (from nemo-toolkit==1.21.0rc0)\n", + " Downloading attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)\n", + "Collecting kornia (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for kornia from https://files.pythonhosted.org/packages/55/da/72cb83aa364ebb4d0109965e20c5d33d7063ccab15332c3fd0acfd5609c9/kornia-0.7.0-py2.py3-none-any.whl.metadata\n", + " Downloading kornia-0.7.0-py2.py3-none-any.whl.metadata (12 kB)\n", + "Collecting nemo-text-processing (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for nemo-text-processing from https://files.pythonhosted.org/packages/bd/82/b776d01ba650c3ab42ba5e381e34b35e507a21b16ad1246f51026fa13f0b/nemo_text_processing-0.2.0rc0-py3-none-any.whl.metadata\n", + " Downloading nemo_text_processing-0.2.0rc0-py3-none-any.whl.metadata (7.2 kB)\n", + "Collecting pypinyin (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pypinyin from https://files.pythonhosted.org/packages/00/fc/3e82bf38739a7b2c4f699245ce6c84ff254723c678c2cdc5d2ecbddf9afb/pypinyin-0.49.0-py2.py3-none-any.whl.metadata\n", + " Downloading pypinyin-0.49.0-py2.py3-none-any.whl.metadata (12 kB)\n", + "Collecting pypinyin-dict (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pypinyin-dict from https://files.pythonhosted.org/packages/89/31/16c26425685a84191503a226450837e0f4d540c164665d8567f2472861a9/pypinyin_dict-0.6.0-py2.py3-none-any.whl.metadata\n", + " Downloading pypinyin_dict-0.6.0-py2.py3-none-any.whl.metadata (3.6 kB)\n", + "Collecting progress>=1.5 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading progress-1.6.tar.gz (7.8 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: tabulate>=0.8.7 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.9.0)\n", + "Collecting textdistance>=4.1.5 (from nemo-toolkit==1.21.0rc0)\n", + " Downloading textdistance-4.5.0-py3-none-any.whl (31 kB)\n", + "Requirement already satisfied: attrs>=18.1.0 in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (23.1.0)\n", + "Requirement already satisfied: appdirs in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (1.4.4)\n", + "Requirement already satisfied: toml>=0.9.4 in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (0.10.2)\n", + "Collecting typed-ast>=1.4.0 (from black==19.10b0->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for typed-ast>=1.4.0 from https://files.pythonhosted.org/packages/e2/ed/b9b8b794b37b55c9247b1e8d38b0361e8158795c181636d34d6c11b506e7/typed_ast-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading typed_ast-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)\n", + "Requirement already satisfied: regex in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (2023.6.3)\n", + "Collecting pathspec<1,>=0.6 (from black==19.10b0->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pathspec<1,>=0.6 from https://files.pythonhosted.org/packages/b4/2a/9b1be29146139ef459188f5e420a66e835dda921208db600b7037093891f/pathspec-0.11.2-py3-none-any.whl.metadata\n", + " Downloading pathspec-0.11.2-py3-none-any.whl.metadata (19 kB)\n", + "Collecting antlr4-python3-runtime==4.9.* (from hydra-core<=1.3.2,>1.3->nemo-toolkit==1.21.0rc0)\n", + " Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "INFO: pip is looking at multiple versions of jiwer to determine which version is compatible with other requirements. This could take a while.\n", + "Collecting jiwer (from nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for jiwer from https://files.pythonhosted.org/packages/23/a3/92c29a5e422acd87e3b4f2e6dc0ce877070cc9b2f81d30fe84122032338a/jiwer-3.0.2-py3-none-any.whl.metadata\n", + " Downloading jiwer-3.0.2-py3-none-any.whl.metadata (2.6 kB)\n", + " Downloading jiwer-3.0.1-py3-none-any.whl (21 kB)\n", + " Downloading jiwer-3.0.0-py3-none-any.whl (21 kB)\n", + " Downloading jiwer-2.6.0-py3-none-any.whl (20 kB)\n", + " Downloading jiwer-2.5.2-py3-none-any.whl (15 kB)\n", + "Collecting rapidfuzz (from nemo-toolkit==1.21.0rc0)\n", + " Downloading rapidfuzz-2.13.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.2/2.2 MB\u001b[0m \u001b[31m96.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (3.0.0)\n", + "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (1.3.2)\n", + "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (4.4.2)\n", + "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (1.7.0)\n", + "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (0.3.6)\n", + "Requirement already satisfied: typing-extensions>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (4.7.1)\n", + "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (0.3)\n", + "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (1.0.5)\n", + "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (1.1.0)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (0.11.0)\n", + "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (4.42.1)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (1.4.5)\n", + "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (9.4.0)\n", + "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (3.1.1)\n", + "Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba->nemo-toolkit==1.21.0rc0) (0.39.1)\n", + "Requirement already satisfied: PyYAML>=5.1.0 in /usr/local/lib/python3.10/dist-packages (from omegaconf<=2.3->nemo-toolkit==1.21.0rc0) (6.0.1)\n", + "Requirement already satisfied: protobuf>=3.20.2 in /usr/local/lib/python3.10/dist-packages (from onnx>=1.7.0->nemo-toolkit==1.21.0rc0) (3.20.3)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil->nemo-toolkit==1.21.0rc0) (1.16.0)\n", + "Requirement already satisfied: fsspec[http]>2021.06.0 in /usr/local/lib/python3.10/dist-packages (from pytorch-lightning<=2.0.7,>=2.0->nemo-toolkit==1.21.0rc0) (2023.6.0)\n", + "Collecting lightning-utilities>=0.7.0 (from pytorch-lightning<=2.0.7,>=2.0->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for lightning-utilities>=0.7.0 from https://files.pythonhosted.org/packages/46/ee/8641eeb6a062f383b7d6875604e1f3f83bd2c93a0b4dbcabd3150b32de6e/lightning_utilities-0.9.0-py3-none-any.whl.metadata\n", + " Downloading lightning_utilities-0.9.0-py3-none-any.whl.metadata (4.6 kB)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->nemo-toolkit==1.21.0rc0) (3.2.0)\n", + "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from soundfile->nemo-toolkit==1.21.0rc0) (1.15.1)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (3.12.3)\n", + "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (1.12)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (3.1)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (3.1.2)\n", + "Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (2.0.0)\n", + "Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->nemo-toolkit==1.21.0rc0) (3.27.2)\n", + "Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->nemo-toolkit==1.21.0rc0) (16.0.6)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (2.31.0)\n", + "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers>=4.0.1->nemo-toolkit==1.21.0rc0)\n", + " Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m81.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting safetensors>=0.3.1 (from transformers>=4.0.1->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/6c/f0/c17bbdb1e5f9dab29d44cade445135789f75f8f08ea2728d04493ea8412b/safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.7 kB)\n", + "Collecting botocore<1.32.0,>=1.31.43 (from boto3->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for botocore<1.32.0,>=1.31.43 from https://files.pythonhosted.org/packages/88/37/68fd026cde5d1c802ab34290285c5019e7ba3de3eea8e6c07756cfb827ae/botocore-1.31.43-py3-none-any.whl.metadata\n", + " Downloading botocore-1.31.43-py3-none-any.whl.metadata (6.0 kB)\n", + "Collecting jmespath<2.0.0,>=0.7.1 (from boto3->nemo-toolkit==1.21.0rc0)\n", + " Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)\n", + "Collecting s3transfer<0.7.0,>=0.6.0 (from boto3->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for s3transfer<0.7.0,>=0.6.0 from https://files.pythonhosted.org/packages/d9/17/a3b666f5ef9543cfd3c661d39d1e193abb9649d0cfbbfee3cf3b51d5af02/s3transfer-0.6.2-py3-none-any.whl.metadata\n", + " Downloading s3transfer-0.6.2-py3-none-any.whl.metadata (1.8 kB)\n", + "Requirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets->nemo-toolkit==1.21.0rc0) (9.0.0)\n", + "Collecting dill<0.3.8,>=0.3.0 (from datasets->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for dill<0.3.8,>=0.3.0 from https://files.pythonhosted.org/packages/f5/3a/74a29b11cf2cdfcd6ba89c0cecd70b37cd1ba7b77978ce611eb7a146a832/dill-0.3.7-py3-none-any.whl.metadata\n", + " Downloading dill-0.3.7-py3-none-any.whl.metadata (9.9 kB)\n", + "Collecting xxhash (from datasets->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for xxhash from https://files.pythonhosted.org/packages/13/c3/e942893f4864a424514c81640f114980cfd5aff7e7414d1e0255f4571111/xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)\n", + "Collecting multiprocess (from datasets->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for multiprocess from https://files.pythonhosted.org/packages/35/a8/36d8d7b3e46b377800d8dec47891cdf05842d1a2366909ae4a0c89fbc5e6/multiprocess-0.70.15-py310-none-any.whl.metadata\n", + " Downloading multiprocess-0.70.15-py310-none-any.whl.metadata (7.2 kB)\n", + "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->nemo-toolkit==1.21.0rc0) (3.8.5)\n", + "Collecting pybind11>=2.2 (from fasttext->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pybind11>=2.2 from https://files.pythonhosted.org/packages/06/55/9f73c32dda93fa4f539fafa268f9504e83c489f460c380371d94296126cd/pybind11-2.11.1-py3-none-any.whl.metadata\n", + " Using cached pybind11-2.11.1-py3-none-any.whl.metadata (9.5 kB)\n", + "Collecting aniso8601>=0.82 (from flask-restful->nemo-toolkit==1.21.0rc0)\n", + " Downloading aniso8601-9.0.1-py2.py3-none-any.whl (52 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m52.8/52.8 kB\u001b[0m \u001b[31m7.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: Flask>=0.8 in /usr/local/lib/python3.10/dist-packages (from flask-restful->nemo-toolkit==1.21.0rc0) (2.2.5)\n", + "Requirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from flask-restful->nemo-toolkit==1.21.0rc0) (2023.3.post1)\n", + "Requirement already satisfied: wcwidth>=0.2.5 in /usr/local/lib/python3.10/dist-packages (from ftfy->nemo-toolkit==1.21.0rc0) (0.2.6)\n", + "Collecting distance>=0.1.3 (from g2p-en->nemo-toolkit==1.21.0rc0)\n", + " Downloading Distance-0.1.3.tar.gz (180 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m180.3/180.3 kB\u001b[0m \u001b[31m24.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from gdown->nemo-toolkit==1.21.0rc0) (4.11.2)\n", + "Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (5.5.6)\n", + "Requirement already satisfied: ipython-genutils~=0.2.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.0)\n", + "Requirement already satisfied: traitlets>=4.3.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (5.7.1)\n", + "Requirement already satisfied: widgetsnbextension~=3.6.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (3.6.5)\n", + "Requirement already satisfied: ipython>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (7.34.0)\n", + "Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (3.0.8)\n", + "Collecting cdifflib (from nemo-text-processing->nemo-toolkit==1.21.0rc0)\n", + " Downloading cdifflib-1.2.6.tar.gz (11 kB)\n", + " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", + " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", + " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n", + " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting pynini==2.1.5 (from nemo-text-processing->nemo-toolkit==1.21.0rc0)\n", + " Downloading pynini-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (161.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m161.3/161.3 MB\u001b[0m \u001b[31m6.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: Cython>=0.29 in /usr/local/lib/python3.10/dist-packages (from pynini==2.1.5->nemo-text-processing->nemo-toolkit==1.21.0rc0) (0.29.36)\n", + "Requirement already satisfied: sortedcontainers>=2.0.4 in /usr/local/lib/python3.10/dist-packages (from pyannote.core->nemo-toolkit==1.21.0rc0) (2.4.0)\n", + "Collecting pyannote.database>=4.0.1 (from pyannote.metrics->nemo-toolkit==1.21.0rc0)\n", + " Downloading pyannote.database-5.0.1-py3-none-any.whl (48 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.1/48.1 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting docopt>=0.6.2 (from pyannote.metrics->nemo-toolkit==1.21.0rc0)\n", + " Downloading docopt-0.6.2.tar.gz (25 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: iniconfig in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (2.0.0)\n", + "Requirement already satisfied: pluggy<2.0,>=0.12 in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (1.3.0)\n", + "Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (1.1.3)\n", + "Requirement already satisfied: tomli>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (2.0.1)\n", + "Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from rouge-score->nemo-toolkit==1.21.0rc0) (1.4.0)\n", + "Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml->nemo-toolkit==1.21.0rc0)\n", + " Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m485.6/485.6 kB\u001b[0m \u001b[31m49.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting portalocker (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", + " Downloading portalocker-2.7.0-py2.py3-none-any.whl (15 kB)\n", + "Collecting colorama (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", + " Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n", + "Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0) (4.9.3)\n", + "Collecting mecab-python3==1.0.5 (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", + " Downloading mecab_python3-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (581 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m581.1/581.1 kB\u001b[0m \u001b[31m56.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting ipadic<2.0,>=1.0 (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", + " Downloading ipadic-1.0.0.tar.gz (13.4 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.4/13.4 MB\u001b[0m \u001b[31m96.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (from sentence-transformers->nemo-toolkit==1.21.0rc0) (0.15.2+cu118)\n", + "Requirement already satisfied: sphinxcontrib-applehelp in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.7)\n", + "Requirement already satisfied: sphinxcontrib-devhelp in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.5)\n", + "Requirement already satisfied: sphinxcontrib-jsmath in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.1)\n", + "Requirement already satisfied: sphinxcontrib-htmlhelp>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.0.4)\n", + "Requirement already satisfied: sphinxcontrib-serializinghtml>=1.1.5 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.1.9)\n", + "Requirement already satisfied: sphinxcontrib-qthelp in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.6)\n", + "Requirement already satisfied: Pygments>=2.0 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.16.1)\n", + "Requirement already satisfied: docutils<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (0.18.1)\n", + "Requirement already satisfied: snowballstemmer>=1.1 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.2.0)\n", + "Requirement already satisfied: babel>=1.3 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.12.1)\n", + "Requirement already satisfied: alabaster<0.8,>=0.7 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (0.7.13)\n", + "Requirement already satisfied: imagesize in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.4.1)\n", + "Collecting docutils<0.19,>=0.14 (from sphinx->nemo-toolkit==1.21.0rc0)\n", + " Downloading docutils-0.17.1-py2.py3-none-any.whl (575 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m575.5/575.5 kB\u001b[0m \u001b[31m42.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pybtex>=0.24 (from sphinxcontrib-bibtex->nemo-toolkit==1.21.0rc0)\n", + " Downloading pybtex-0.24.0-py2.py3-none-any.whl (561 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m561.4/561.4 kB\u001b[0m \u001b[31m48.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pybtex-docutils>=1.0.0 (from sphinxcontrib-bibtex->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for pybtex-docutils>=1.0.0 from https://files.pythonhosted.org/packages/11/b1/ce1f4596211efb5410e178a803f08e59b20bedb66837dcf41e21c54f9ec1/pybtex_docutils-1.0.3-py3-none-any.whl.metadata\n", + " Downloading pybtex_docutils-1.0.3-py3-none-any.whl.metadata (4.3 kB)\n", + "Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (1.57.0)\n", + "Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (2.17.3)\n", + "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (1.0.0)\n", + "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (3.4.4)\n", + "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (0.7.1)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (2.3.7)\n", + "Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (0.41.2)\n", + "Collecting plac (from texterrors->nemo-toolkit==1.21.0rc0)\n", + " Downloading plac-1.3.5-py2.py3-none-any.whl (22 kB)\n", + "Collecting loguru (from texterrors->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for loguru from https://files.pythonhosted.org/packages/19/a9/4e91197b121a41c640367641a510fd9a05bb7a3259fc9678ee2976c8fd00/loguru-0.7.1-py3-none-any.whl.metadata\n", + " Downloading loguru-0.7.1-py3-none-any.whl.metadata (22 kB)\n", + "Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from texterrors->nemo-toolkit==1.21.0rc0) (2.3.0)\n", + "Collecting Levenshtein (from texterrors->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for Levenshtein from https://files.pythonhosted.org/packages/e6/02/0a4ed6a9e2b78f6b57f25a87fc194d7d10c2bbe95d985f36390e86285232/Levenshtein-0.21.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", + " Downloading Levenshtein-0.21.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)\n", + "Collecting GitPython!=3.1.29,>=1.0.0 (from wandb->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for GitPython!=3.1.29,>=1.0.0 from https://files.pythonhosted.org/packages/0f/c6/bb9e2276b6fed126aa21e292493b45a3df4cfba7cbfcf2ab8809a6b0e718/GitPython-3.1.35-py3-none-any.whl.metadata\n", + " Downloading GitPython-3.1.35-py3-none-any.whl.metadata (10 kB)\n", + "Requirement already satisfied: psutil>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from wandb->nemo-toolkit==1.21.0rc0) (5.9.5)\n", + "Collecting sentry-sdk>=1.0.0 (from wandb->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for sentry-sdk>=1.0.0 from https://files.pythonhosted.org/packages/17/22/dbd5f854f373214d48585eeb6844e50a8dd1600f435d9033493f76f66dfa/sentry_sdk-1.30.0-py2.py3-none-any.whl.metadata\n", + " Downloading sentry_sdk-1.30.0-py2.py3-none-any.whl.metadata (9.6 kB)\n", + "Collecting docker-pycreds>=0.4.0 (from wandb->nemo-toolkit==1.21.0rc0)\n", + " Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)\n", + "Collecting pathtools (from wandb->nemo-toolkit==1.21.0rc0)\n", + " Downloading pathtools-0.1.2.tar.gz (11 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting setproctitle (from wandb->nemo-toolkit==1.21.0rc0)\n", + " Downloading setproctitle-1.3.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30 kB)\n", + "Collecting asciitree (from zarr->nemo-toolkit==1.21.0rc0)\n", + " Downloading asciitree-0.3.3.tar.gz (4.0 kB)\n", + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting fasteners (from zarr->nemo-toolkit==1.21.0rc0)\n", + " Downloading fasteners-0.18-py3-none-any.whl (18 kB)\n", + "Collecting numcodecs>=0.10.0 (from zarr->nemo-toolkit==1.21.0rc0)\n", + " Downloading numcodecs-0.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.7/6.7 MB\u001b[0m \u001b[31m116.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting urllib3<1.27,>=1.25.4 (from botocore<1.32.0,>=1.31.43->boto3->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for urllib3<1.27,>=1.25.4 from https://files.pythonhosted.org/packages/c5/05/c214b32d21c0b465506f95c4f28ccbcba15022e000b043b72b3df7728471/urllib3-1.26.16-py2.py3-none-any.whl.metadata\n", + " Downloading urllib3-1.26.16-py2.py3-none-any.whl.metadata (48 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.4/48.4 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->soundfile->nemo-toolkit==1.21.0rc0) (2.21)\n", + "Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from Flask>=0.8->flask-restful->nemo-toolkit==1.21.0rc0) (2.1.2)\n", + "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (3.2.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (6.0.4)\n", + "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (4.0.3)\n", + "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (1.9.2)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (1.4.0)\n", + "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (1.3.1)\n", + "Collecting gitdb<5,>=4.0.1 (from GitPython!=3.1.29,>=1.0.0->wandb->nemo-toolkit==1.21.0rc0)\n", + " Downloading gitdb-4.0.10-py3-none-any.whl (62 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (5.3.1)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (0.3.0)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (4.9)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard->nemo-toolkit==1.21.0rc0) (1.3.1)\n", + "Requirement already satisfied: jupyter-client in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets->nemo-toolkit==1.21.0rc0) (6.1.12)\n", + "Requirement already satisfied: tornado>=4.2 in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets->nemo-toolkit==1.21.0rc0) (6.3.2)\n", + "Collecting jedi>=0.16 (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for jedi>=0.16 from https://files.pythonhosted.org/packages/8e/46/7e3ae3aa2dcfcffc5138c6cef5448523218658411c84a2000bf75c8d3ec1/jedi-0.19.0-py2.py3-none-any.whl.metadata\n", + " Downloading jedi-0.19.0-py2.py3-none-any.whl.metadata (22 kB)\n", + "Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.7.5)\n", + "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (3.0.39)\n", + "Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.0)\n", + "Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.1.6)\n", + "Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (4.8.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->nemo-toolkit==1.21.0rc0) (2.1.3)\n", + "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from numcodecs>=0.10.0->zarr->nemo-toolkit==1.21.0rc0) (0.4)\n", + "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.0->librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (3.10.0)\n", + "Requirement already satisfied: typer[all]>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (0.9.0)\n", + "Collecting latexcodec>=1.0.4 (from pybtex>=0.24->sphinxcontrib-bibtex->nemo-toolkit==1.21.0rc0)\n", + " Downloading latexcodec-2.0.1-py2.py3-none-any.whl (18 kB)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (3.4)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (2023.7.22)\n", + "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->nemo-toolkit==1.21.0rc0) (1.3.0)\n", + "Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.10/dist-packages (from widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (6.5.5)\n", + "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->gdown->nemo-toolkit==1.21.0rc0) (2.5)\n", + "Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (1.7.1)\n", + "Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->GitPython!=3.1.29,>=1.0.0->wandb->nemo-toolkit==1.21.0rc0)\n", + " Downloading smmap-5.0.0-py3-none-any.whl (24 kB)\n", + "Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.8.3)\n", + "Requirement already satisfied: pyzmq<25,>=17 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (23.2.1)\n", + "Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (23.1.0)\n", + "Requirement already satisfied: jupyter-core>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (5.3.1)\n", + "Requirement already satisfied: nbformat in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (5.9.2)\n", + "Requirement already satisfied: nbconvert>=5 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (6.5.4)\n", + "Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.5.7)\n", + "Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.8.2)\n", + "Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.17.1)\n", + "Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.17.1)\n", + "Requirement already satisfied: nbclassic>=0.4.7 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.0.0)\n", + "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.7.0)\n", + "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (0.5.0)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard->nemo-toolkit==1.21.0rc0) (3.2.2)\n", + "Collecting shellingham<2.0.0,>=1.3.0 (from typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0)\n", + " Obtaining dependency information for shellingham<2.0.0,>=1.3.0 from https://files.pythonhosted.org/packages/57/70/0265437683625b2e6491736706d3d679d90e2a26f6bff59f4e46e09872b9/shellingham-1.5.3-py2.py3-none-any.whl.metadata\n", + " Downloading shellingham-1.5.3-py2.py3-none-any.whl.metadata (3.4 kB)\n", + "Requirement already satisfied: rich<14.0.0,>=10.11.0 in /usr/local/lib/python3.10/dist-packages (from typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (13.5.2)\n", + "Requirement already satisfied: jupyter-server>=1.8 in /usr/local/lib/python3.10/dist-packages (from nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.24.0)\n", + "Requirement already satisfied: notebook-shim>=0.2.3 in /usr/local/lib/python3.10/dist-packages (from nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.3)\n", + "Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (6.0.0)\n", + "Requirement already satisfied: defusedxml in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.7.1)\n", + "Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.2)\n", + "Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.8.4)\n", + "Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.8.0)\n", + "Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.5.0)\n", + "Requirement already satisfied: tinycss2 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.2.1)\n", + "Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (2.18.0)\n", + "Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (4.19.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich<14.0.0,>=10.11.0->typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (3.0.0)\n", + "Requirement already satisfied: argon2-cffi-bindings in /usr/local/lib/python3.10/dist-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (21.2.0)\n", + "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (2023.7.1)\n", + "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.30.2)\n", + "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.10.2)\n", + "Requirement already satisfied: anyio<4,>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from jupyter-server>=1.8->nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (3.7.1)\n", + "Requirement already satisfied: websocket-client in /usr/local/lib/python3.10/dist-packages (from jupyter-server>=1.8->nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.6.2)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich<14.0.0,>=10.11.0->typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (0.1.2)\n", + "Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.5.1)\n", + "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<4,>=3.1.0->jupyter-server>=1.8->nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.3.0)\n", + "Downloading megatron_core-0.2.0-py3-none-any.whl (46 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m4.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading onnx-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.6/14.6 MB\u001b[0m \u001b[31m71.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m81.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pytorch_lightning-2.0.7-py3-none-any.whl (724 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m725.0/725.0 kB\u001b[0m \u001b[31m55.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading torchmetrics-1.1.1-py3-none-any.whl (763 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m763.4/763.4 kB\u001b[0m \u001b[31m51.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading transformers-4.33.1-py3-none-any.whl (7.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.6/7.6 MB\u001b[0m \u001b[31m90.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m31.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading boto3-1.28.43-py3-none-any.whl (135 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m135.8/135.8 kB\u001b[0m \u001b[31m18.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading datasets-2.14.5-py3-none-any.whl (519 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m519.6/519.6 kB\u001b[0m \u001b[31m48.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading Flask_RESTful-0.3.10-py2.py3-none-any.whl (26 kB)\n", + "Downloading ijson-3.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (111 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m111.8/111.8 kB\u001b[0m \u001b[31m14.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading kornia-0.7.0-py2.py3-none-any.whl (705 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m705.7/705.7 kB\u001b[0m \u001b[31m35.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading markdown2-2.4.10-py2.py3-none-any.whl (39 kB)\n", + "Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nemo_text_processing-0.2.0rc0-py3-none-any.whl (2.4 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.4/2.4 MB\u001b[0m \u001b[31m50.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pypinyin-0.49.0-py2.py3-none-any.whl (1.4 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.4/1.4 MB\u001b[0m \u001b[31m69.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pypinyin_dict-0.6.0-py2.py3-none-any.whl (9.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.5/9.5 MB\u001b[0m \u001b[31m86.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading ruamel.yaml-0.17.32-py3-none-any.whl (112 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m112.2/112.2 kB\u001b[0m \u001b[31m11.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading sphinxcontrib_bibtex-2.6.1-py3-none-any.whl (40 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.9/40.9 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading wandb-0.15.10-py3-none-any.whl (2.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m76.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading zarr-2.16.1-py3-none-any.whl (206 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m206.9/206.9 kB\u001b[0m \u001b[31m21.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading botocore-1.31.43-py3-none-any.whl (11.2 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.2/11.2 MB\u001b[0m \u001b[31m87.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading dill-0.3.7-py3-none-any.whl (115 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m14.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading GitPython-3.1.35-py3-none-any.whl (188 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m188.8/188.8 kB\u001b[0m \u001b[31m23.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading lightning_utilities-0.9.0-py3-none-any.whl (23 kB)\n", + "Downloading pathspec-0.11.2-py3-none-any.whl (29 kB)\n", + "Using cached pybind11-2.11.1-py3-none-any.whl (227 kB)\n", + "Downloading pybtex_docutils-1.0.3-py3-none-any.whl (6.4 kB)\n", + "Downloading s3transfer-0.6.2-py3-none-any.whl (79 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.8/79.8 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m77.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading sentry_sdk-1.30.0-py2.py3-none-any.whl (218 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m218.8/218.8 kB\u001b[0m \u001b[31m26.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading typed_ast-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (824 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m824.7/824.7 kB\u001b[0m \u001b[31m52.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading Levenshtein-0.21.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (172 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m172.5/172.5 kB\u001b[0m \u001b[31m21.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading loguru-0.7.1-py3-none-any.whl (61 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m61.4/61.4 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading multiprocess-0.70.15-py310-none-any.whl (134 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m16.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m14.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading jedi-0.19.0-py2.py3-none-any.whl (1.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m66.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading urllib3-1.26.16-py2.py3-none-any.whl (143 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m143.1/143.1 kB\u001b[0m \u001b[31m18.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading shellingham-1.5.3-py2.py3-none-any.whl (9.7 kB)\n", + "Building wheels for collected packages: antlr4-python3-runtime, progress, sacremoses, youtokentome, fasttext, kaldi-python-io, nemo-toolkit, rouge-score, sentence-transformers, wget, distance, docopt, ipadic, asciitree, cdifflib, pathtools\n", + " Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=ad5d3c0bd8ee550570d532ca332fe7204baee65f2066eefcf5fc70cd2afdb382\n", + " Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88\n", + " Building wheel for progress (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for progress: filename=progress-1.6-py3-none-any.whl size=9610 sha256=79cc5bc84a633414a3856bc3a35542e0489aaffa62e7cdae95e51474a38f064c\n", + " Stored in directory: /root/.cache/pip/wheels/a2/68/5f/c339b20a41659d856c93ccdce6a33095493eb82c3964aac5a1\n", + " Building wheel for sacremoses (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=1fd4f16d193691e8ff7869591f5d11f5a32e21289fbb9a8422a800bc1bf781c7\n", + " Stored in directory: /root/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb\n", + " Building wheel for youtokentome (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for youtokentome: filename=youtokentome-1.0.6-cp310-cp310-linux_x86_64.whl size=1883668 sha256=d3e4925cf4b245a76184b4afe612a4fffd683de42889b5c1a07ca44be98dde9b\n", + " Stored in directory: /root/.cache/pip/wheels/df/85/f8/301d2ba45f43f30bed2fe413efa760bc726b8b660ed9c2900c\n", + " Building wheel for fasttext (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for fasttext: filename=fasttext-0.9.2-cp310-cp310-linux_x86_64.whl size=4199767 sha256=5bcc699963ee85f0f02846c1e278140b3ba3d95a7877d60b48c5b10f8f424bb1\n", + " Stored in directory: /root/.cache/pip/wheels/a5/13/75/f811c84a8ab36eedbaef977a6a58a98990e8e0f1967f98f394\n", + " Building wheel for kaldi-python-io (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for kaldi-python-io: filename=kaldi_python_io-1.2.2-py3-none-any.whl size=8948 sha256=d2e946cb4c2298a69803c0492c1533a3d08d313ced846ba7f1555a11ab164bd6\n", + " Stored in directory: /root/.cache/pip/wheels/b7/23/5f/49d3a826be576faf61d84e8028e1914bb36a5586ee2613b087\n", + " Building editable for nemo-toolkit (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for nemo-toolkit: filename=nemo_toolkit-1.21.0rc0-0.editable-py3-none-any.whl size=9181 sha256=d5c26c7630db00d8c6a483e19f89aa11a4d7a11920dcb052d8bb6a56689eb77d\n", + " Stored in directory: /tmp/pip-ephem-wheel-cache-4nra31ak/wheels/41/a7/71/00ccdddfb43c015e8d025853cafb90117dd722fd8ec557581b\n", + " Building wheel for rouge-score (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24932 sha256=7f61e2fe8b2f326d21f3a0a06e5b58e2f1103a5523bac58ff728481bfdf203d8\n", + " Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n", + " Building wheel for sentence-transformers (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125923 sha256=b4fb59313f79b135e68441f9b4780a35d2658bac193693e6177887cb018824a7\n", + " Stored in directory: /root/.cache/pip/wheels/62/f2/10/1e606fd5f02395388f74e7462910fe851042f97238cbbd902f\n", + " Building wheel for wget (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9655 sha256=dd1cfe282f727f91b5a5db464ae58f43372f62a80d0a083733b36c68edc6ee76\n", + " Stored in directory: /root/.cache/pip/wheels/8b/f1/7f/5c94f0a7a505ca1c81cd1d9208ae2064675d97582078e6c769\n", + " Building wheel for distance (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for distance: filename=Distance-0.1.3-py3-none-any.whl size=16258 sha256=287628136639656b068a6f2e67ab0f5a80bc4ad466590f19499978136c5e3726\n", + " Stored in directory: /root/.cache/pip/wheels/e8/bb/de/f71bf63559ea9a921059a5405806f7ff6ed612a9231c4a9309\n", + " Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13705 sha256=9af9a132b49b73c475f9838a56ff79e968af41bb509c8b38b81f7c08c3da87dd\n", + " Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac\n", + " Building wheel for ipadic (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for ipadic: filename=ipadic-1.0.0-py3-none-any.whl size=13556703 sha256=9e40daac3101fb3d52fcf9c986017638569c576c3be36805d1437e722a2d8803\n", + " Stored in directory: /root/.cache/pip/wheels/5b/ea/e3/2f6e0860a327daba3b030853fce4483ed37468bbf1101c59c3\n", + " Building wheel for asciitree (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for asciitree: filename=asciitree-0.3.3-py3-none-any.whl size=5034 sha256=69d1478059cdad8cb54b8a1a29e66bd3229cf3458dcf96aeedb978381d8116f3\n", + " Stored in directory: /root/.cache/pip/wheels/7f/4e/be/1171b40f43b918087657ec57cf3b81fa1a2e027d8755baa184\n", + " Building wheel for cdifflib (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for cdifflib: filename=cdifflib-1.2.6-cp310-cp310-linux_x86_64.whl size=27681 sha256=f4ccba50f62b7265ea20b84bac762b6ed3dacae8d42ea7346d54e07da2a6ad9e\n", + " Stored in directory: /root/.cache/pip/wheels/87/a7/fd/8061e24ed08689045cb6d1ca303768dc463b20a5a338174841\n", + " Building wheel for pathtools (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for pathtools: filename=pathtools-0.1.2-py3-none-any.whl size=8791 sha256=95b669a715a9eefa4cd1222ac5c899a97064dbb765dccbd3a5b35c0208aa1389\n", + " Stored in directory: /root/.cache/pip/wheels/e7/f3/22/152153d6eb222ee7a56ff8617d80ee5207207a8c00a7aab794\n", + "Successfully built antlr4-python3-runtime progress sacremoses youtokentome fasttext kaldi-python-io nemo-toolkit rouge-score sentence-transformers wget distance docopt ipadic asciitree cdifflib pathtools\n", + "Installing collected packages: wget, tokenizers, sentencepiece, safetensors, pydub, progress, plac, pathtools, pangu, opencc, mecab-python3, ipadic, ijson, faiss-cpu, docopt, distance, braceexpand, asciitree, antlr4-python3-runtime, aniso8601, xxhash, webdataset, urllib3, typed-ast, textdistance, sox, smmap, shellingham, setuptools, setproctitle, ruamel.yaml.clib, rapidfuzz, pytest-runner, pypinyin, pynini, pydantic, pybind11, portalocker, pathspec, parameterized, onnx, omegaconf, numcodecs, marshmallow, markdown2, loguru, lightning-utilities, latexcodec, kaldiio, kaldi-python-io, jmespath, jedi, isort, ftfy, fasteners, einops, docutils, docker-pycreds, dill, colorama, click, cdifflib, attrdict, zarr, youtokentome, sentry-sdk, sacremoses, sacrebleu, ruamel.yaml, pypinyin-dict, pybtex, pyannote.core, multiprocess, Levenshtein, jiwer, hydra-core, gitdb, fasttext, botocore, black, texterrors, s3transfer, rouge-score, pybtex-docutils, huggingface-hub, GitPython, g2p-en, flask-restful, wandb, transformers, pyannote.database, datasets, boto3, pyannote.metrics, nemo-text-processing, torchmetrics, sphinxcontrib-bibtex, sentence-transformers, pytorch-lightning, nemo-toolkit, megatron-core, kornia\n", + " Attempting uninstall: urllib3\n", + " Found existing installation: urllib3 2.0.4\n", + " Uninstalling urllib3-2.0.4:\n", + " Successfully uninstalled urllib3-2.0.4\n", + " Attempting uninstall: setuptools\n", + " Found existing installation: setuptools 67.7.2\n", + " Uninstalling setuptools-67.7.2:\n", + " Successfully uninstalled setuptools-67.7.2\n", + " Attempting uninstall: pydantic\n", + " Found existing installation: pydantic 2.3.0\n", + " Uninstalling pydantic-2.3.0:\n", + " Successfully uninstalled pydantic-2.3.0\n", + " Attempting uninstall: docutils\n", + " Found existing installation: docutils 0.18.1\n", + " Uninstalling docutils-0.18.1:\n", + " Successfully uninstalled docutils-0.18.1\n", + " Attempting uninstall: click\n", + " Found existing installation: click 8.1.7\n", + " Uninstalling click-8.1.7:\n", + " Successfully uninstalled click-8.1.7\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "cvxpy 1.3.2 requires setuptools>65.5.1, but you have setuptools 65.5.1 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mSuccessfully installed GitPython-3.1.35 Levenshtein-0.21.1 aniso8601-9.0.1 antlr4-python3-runtime-4.9.3 asciitree-0.3.3 attrdict-2.0.1 black-19.10b0 boto3-1.28.43 botocore-1.31.43 braceexpand-0.1.7 cdifflib-1.2.6 click-8.0.2 colorama-0.4.6 datasets-2.14.5 dill-0.3.7 distance-0.1.3 docker-pycreds-0.4.0 docopt-0.6.2 docutils-0.17.1 einops-0.6.1 faiss-cpu-1.7.4 fasteners-0.18 fasttext-0.9.2 flask-restful-0.3.10 ftfy-6.1.1 g2p-en-2.1.0 gitdb-4.0.10 huggingface-hub-0.16.4 hydra-core-1.3.2 ijson-3.2.3 ipadic-1.0.0 isort-5.12.0 jedi-0.19.0 jiwer-2.5.2 jmespath-1.0.1 kaldi-python-io-1.2.2 kaldiio-2.18.0 kornia-0.7.0 latexcodec-2.0.1 lightning-utilities-0.9.0 loguru-0.7.1 markdown2-2.4.10 marshmallow-3.20.1 mecab-python3-1.0.5 megatron-core-0.2.0 multiprocess-0.70.15 nemo-text-processing-0.2.0rc0 nemo-toolkit-1.21.0rc0 numcodecs-0.11.0 omegaconf-2.3.0 onnx-1.14.1 opencc-1.1.6 pangu-4.0.6.1 parameterized-0.9.0 pathspec-0.11.2 pathtools-0.1.2 plac-1.3.5 portalocker-2.7.0 progress-1.6 pyannote.core-5.0.0 pyannote.database-5.0.1 pyannote.metrics-3.2.1 pybind11-2.11.1 pybtex-0.24.0 pybtex-docutils-1.0.3 pydantic-1.10.12 pydub-0.25.1 pynini-2.1.5 pypinyin-0.49.0 pypinyin-dict-0.6.0 pytest-runner-6.0.0 pytorch-lightning-2.0.7 rapidfuzz-2.13.7 rouge-score-0.1.2 ruamel.yaml-0.17.32 ruamel.yaml.clib-0.2.7 s3transfer-0.6.2 sacrebleu-2.3.1 sacremoses-0.0.53 safetensors-0.3.3 sentence-transformers-2.2.2 sentencepiece-0.1.99 sentry-sdk-1.30.0 setproctitle-1.3.2 setuptools-65.5.1 shellingham-1.5.3 smmap-5.0.0 sox-1.4.1 sphinxcontrib-bibtex-2.6.1 textdistance-4.5.0 texterrors-0.4.4 tokenizers-0.13.3 torchmetrics-1.1.1 transformers-4.33.1 typed-ast-1.5.5 urllib3-1.26.16 wandb-0.15.10 webdataset-0.1.62 wget-3.2 xxhash-3.3.0 youtokentome-1.0.6 zarr-2.16.1\n", + "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0mAll done!\n", + "Reading package lists... Done\n", + "Building dependency tree... Done\n", + "Reading state information... Done\n", + "The following additional packages will be installed:\n", + " libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base\n", + " libsox3 libwavpack1\n", + "Suggested packages:\n", + " libsox-fmt-all\n", + "The following NEW packages will be installed:\n", + " libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base\n", + " libsox3 libwavpack1 sox\n", + "0 upgraded, 7 newly installed, 0 to remove and 16 not upgraded.\n", + "Need to get 617 kB of archives.\n", + "After this operation, 1,764 kB of additional disk space will be used.\n", + "Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopencore-amrnb0 amd64 0.1.5-1 [94.8 kB]\n", + "Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopencore-amrwb0 amd64 0.1.5-1 [49.1 kB]\n", + "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 libsox3 amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [240 kB]\n", + "Get:4 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 libsox-fmt-alsa amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [11.2 kB]\n", + "Get:5 http://archive.ubuntu.com/ubuntu jammy/main amd64 libwavpack1 amd64 5.4.0-1build2 [83.7 kB]\n", + "Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 libsox-fmt-base amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [33.7 kB]\n", + "Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 sox amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [104 kB]\n", + "Fetched 617 kB in 0s (5,224 kB/s)\n", + "Selecting previously unselected package libopencore-amrnb0:amd64.\n", + "(Reading database ... 120901 files and directories currently installed.)\n", + "Preparing to unpack .../0-libopencore-amrnb0_0.1.5-1_amd64.deb ...\n", + "Unpacking libopencore-amrnb0:amd64 (0.1.5-1) ...\n", + "Selecting previously unselected package libopencore-amrwb0:amd64.\n", + "Preparing to unpack .../1-libopencore-amrwb0_0.1.5-1_amd64.deb ...\n", + "Unpacking libopencore-amrwb0:amd64 (0.1.5-1) ...\n", + "Selecting previously unselected package libsox3:amd64.\n", + "Preparing to unpack .../2-libsox3_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", + "Unpacking libsox3:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Selecting previously unselected package libsox-fmt-alsa:amd64.\n", + "Preparing to unpack .../3-libsox-fmt-alsa_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", + "Unpacking libsox-fmt-alsa:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Selecting previously unselected package libwavpack1:amd64.\n", + "Preparing to unpack .../4-libwavpack1_5.4.0-1build2_amd64.deb ...\n", + "Unpacking libwavpack1:amd64 (5.4.0-1build2) ...\n", + "Selecting previously unselected package libsox-fmt-base:amd64.\n", + "Preparing to unpack .../5-libsox-fmt-base_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", + "Unpacking libsox-fmt-base:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Selecting previously unselected package sox.\n", + "Preparing to unpack .../6-sox_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", + "Unpacking sox (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Setting up libsox3:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Setting up libopencore-amrwb0:amd64 (0.1.5-1) ...\n", + "Setting up libsox-fmt-alsa:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Setting up libwavpack1:amd64 (5.4.0-1build2) ...\n", + "Setting up libopencore-amrnb0:amd64 (0.1.5-1) ...\n", + "Setting up libsox-fmt-base:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Setting up sox (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", + "Processing triggers for man-db (2.10.2-1) ...\n", + "Processing triggers for libc-bin (2.35-0ubuntu3.1) ...\n", + "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link\n", + "\n", + "/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link\n", + "\n", + "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link\n", + "\n", + "/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link\n", + "\n", + "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link\n", + "\n", + "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link\n", + "\n", + "Collecting dash>=2.1.0 (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Obtaining dependency information for dash>=2.1.0 from https://files.pythonhosted.org/packages/9b/b4/d522c16b41a8da013fd60a67f9618e57c504cd2c80e02a7a861413b93906/dash-2.13.0-py3-none-any.whl.metadata\n", + " Downloading dash-2.13.0-py3-none-any.whl.metadata (11 kB)\n", + "Collecting dash_bootstrap_components>=1.0.3 (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 2))\n", + " Obtaining dependency information for dash_bootstrap_components>=1.0.3 from https://files.pythonhosted.org/packages/cd/2a/cf963336e8b6745406d357e2f2b33ff1f236531fcadbe250096931855ec0/dash_bootstrap_components-1.5.0-py3-none-any.whl.metadata\n", + " Downloading dash_bootstrap_components-1.5.0-py3-none-any.whl.metadata (5.2 kB)\n", + "Collecting diff_match_patch (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 3))\n", + " Downloading diff_match_patch-20230430-py3-none-any.whl (42 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m42.8/42.8 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: editdistance in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 4)) (0.6.2)\n", + "Requirement already satisfied: jiwer in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 5)) (2.5.2)\n", + "Requirement already satisfied: librosa>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.10.1)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 7)) (1.23.5)\n", + "Requirement already satisfied: plotly in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 8)) (5.15.0)\n", + "Requirement already satisfied: SoundFile in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 9)) (0.12.1)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 10)) (4.66.1)\n", + "Requirement already satisfied: Flask<2.3.0,>=1.0.4 in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.2.5)\n", + "Collecting Werkzeug<2.3.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Downloading Werkzeug-2.2.3-py3-none-any.whl (233 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m233.6/233.6 kB\u001b[0m \u001b[31m9.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting dash-html-components==2.0.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)\n", + "Collecting dash-core-components==2.0.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)\n", + "Collecting dash-table==5.0.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)\n", + "Requirement already satisfied: typing-extensions>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (4.7.1)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.31.0)\n", + "Collecting retrying (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Downloading retrying-1.3.4-py3-none-any.whl (11 kB)\n", + "Collecting ansi2html (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", + " Downloading ansi2html-1.8.0-py3-none-any.whl (16 kB)\n", + "Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (1.5.7)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (65.5.1)\n", + "Requirement already satisfied: rapidfuzz==2.13.7 in /usr/local/lib/python3.10/dist-packages (from jiwer->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 5)) (2.13.7)\n", + "Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (3.0.0)\n", + "Requirement already satisfied: scipy>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.10.1)\n", + "Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.2.2)\n", + "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.3.2)\n", + "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (4.4.2)\n", + "Requirement already satisfied: numba>=0.51.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.56.4)\n", + "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.7.0)\n", + "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.3.6)\n", + "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.3)\n", + "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.0.5)\n", + "Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from plotly->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 8)) (8.2.3)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from plotly->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 8)) (23.1)\n", + "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from SoundFile->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 9)) (1.15.1)\n", + "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->SoundFile->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 9)) (2.21)\n", + "Requirement already satisfied: Jinja2>=3.0 in /usr/local/lib/python3.10/dist-packages (from Flask<2.3.0,>=1.0.4->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (3.1.2)\n", + "Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from Flask<2.3.0,>=1.0.4->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.1.2)\n", + "Requirement already satisfied: click>=8.0 in /usr/local/lib/python3.10/dist-packages (from Flask<2.3.0,>=1.0.4->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (8.0.2)\n", + "Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.51.0->librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.39.1)\n", + "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.0->librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (3.10.0)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (3.2.0)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (3.4)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (1.26.16)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2023.7.22)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.20.0->librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (3.2.0)\n", + "Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from Werkzeug<2.3.0->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.1.3)\n", + "Requirement already satisfied: six>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from retrying->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (1.16.0)\n", + "Downloading dash-2.13.0-py3-none-any.whl (10.4 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.4/10.4 MB\u001b[0m \u001b[31m46.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading dash_bootstrap_components-1.5.0-py3-none-any.whl (221 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m221.2/221.2 kB\u001b[0m \u001b[31m23.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: dash-table, dash-html-components, dash-core-components, Werkzeug, retrying, diff_match_patch, ansi2html, dash, dash_bootstrap_components\n", + " Attempting uninstall: Werkzeug\n", + " Found existing installation: Werkzeug 2.3.7\n", + " Uninstalling Werkzeug-2.3.7:\n", + " Successfully uninstalled Werkzeug-2.3.7\n", + "Successfully installed Werkzeug-2.2.3 ansi2html-1.8.0 dash-2.13.0 dash-core-components-2.0.0 dash-html-components-2.0.0 dash-table-5.0.0 dash_bootstrap_components-1.5.0 diff_match_patch-20230430 retrying-1.3.4\n", + "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0m" + ] + } + ], + "source": [ + "!git clone https://github.com/NVIDIA/NeMo.git\n", + "\n", + "!apt-get update && apt-get install -y libsndfile1 ffmpeg\n", + "!cd NeMo;./reinstall.sh\n", + "!apt-get install sox\n", + "\n", + "!pip3 install -r ./NeMo/tools/speech_data_explorer/requirements.txt" + ], + "id": "c3919489" + }, + { + "cell_type": "markdown", + "source": [ + "# Dataset" + ], + "metadata": { + "id": "TOOhimHTrL9r" + }, + "id": "TOOhimHTrL9r" + }, + { + "cell_type": "markdown", + "source": [ + "In this tutorial we use LibriSpeech test-other dataset to evaluate ASR models. Let's download and prepare the dataset." + ], + "metadata": { + "id": "Eg2mqxC0XrOV" + }, + "id": "Eg2mqxC0XrOV" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "LQYnm_hsYqyt", + "outputId": "da5ebf03-2fca-4800-f162-b3efb1e08668" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "100% 314M/314M [00:03<00:00, 110MB/s]\n", + "100% 90/90 [00:35<00:00, 2.50it/s]\n" + ] + } + ], + "source": [ + "!python3 ./NeMo/scripts/dataset_processing/get_librispeech_data.py --data_sets test_other --data_root ." + ], + "id": "LQYnm_hsYqyt" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9737e0f1" + }, + "source": [ + "Before starting SDE, we need to run ASR inference on a given dataset and save predictions into a JSON manifest.\n" + ], + "id": "9737e0f1" + }, + { + "cell_type": "markdown", + "source": [ + "It is assumed that you already have a JSON manifest containing the `audio_filepath` and reference `text` fields. Here is a minimal example:" + ], + "metadata": { + "id": "t5ruQ3x84Yiq" + }, + "id": "t5ruQ3x84Yiq" + }, + { + "cell_type": "markdown", + "source": [ + "```\n", + "{\n", + " \"audio_filepath\": \"/LibriSpeech/dev-clean-processed/2902-9008-0000.wav\",\n", + " \"text\": \"the place seemed fragrant with all the riches of greek\",\n", + "}\n", + "```" + ], + "metadata": { + "id": "ic-QvihbHMQa" + }, + "id": "ic-QvihbHMQa" + }, + { + "cell_type": "markdown", + "source": [ + "# Transcription" + ], + "metadata": { + "id": "O7XnVkY0rXka" + }, + "id": "O7XnVkY0rXka" + }, + { + "cell_type": "markdown", + "source": [ + "To compare two models, JSON file should contain predictions from 1st (e.g., `QuartzNet15x5`) and 2nd (e.g., `Conformer-CTC Small`) models.\n", + "\n", + "NeMo includes a Python script for ASR inference: [`transcribe_speech.py`](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/transcribe_speech.py).\n", + "\n", + "`transcribe_speech.py` accepts `append_pred` flag that allows saving an ASR transcript in the JSON file with a given custom field (like `pred_text_QN`). `pred_name_postfix` parameter defines the custom field's name. In this example it is set to the abbreviated model name `QN`." + ], + "metadata": { + "id": "rYe0xVTzjbhQ" + }, + "id": "rYe0xVTzjbhQ" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "928a9127", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6558faad-e02d-4ede-f052-55d0f91f9b44" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "2023-09-08 15:46:53.368750: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n", + "[NeMo W 2023-09-08 15:47:01 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.\n", + " See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.\n", + " ret = run_job(\n", + " \n", + "[NeMo I 2023-09-08 15:47:01 transcribe_speech:181] Hydra config: model_path: null\n", + " pretrained_name: QuartzNet15x5Base-En\n", + " audio_dir: null\n", + " dataset_manifest: ./test_other.json\n", + " channel_selector: null\n", + " audio_key: audio_filepath\n", + " eval_config_yaml: null\n", + " output_filename: test_other_QN.json\n", + " batch_size: 8\n", + " num_workers: 0\n", + " append_pred: true\n", + " pred_name_postfix: QN\n", + " random_seed: null\n", + " compute_timestamps: false\n", + " preserve_alignment: false\n", + " compute_langs: false\n", + " cuda: 0\n", + " allow_mps: false\n", + " amp: true\n", + " amp_dtype: float16\n", + " audio_type: wav\n", + " overwrite_transcripts: true\n", + " ctc_decoding:\n", + " strategy: greedy\n", + " preserve_alignments: null\n", + " compute_timestamps: null\n", + " word_seperator: ' '\n", + " ctc_timestamp_type: all\n", + " batch_dim_index: 0\n", + " greedy:\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " preserve_frame_confidence: false\n", + " confidence_measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " confidence_method_cfg: DEPRECATED\n", + " beam:\n", + " beam_size: 4\n", + " search_type: default\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " return_best_hypothesis: true\n", + " beam_alpha: 1.0\n", + " beam_beta: 0.0\n", + " kenlm_path: null\n", + " flashlight_cfg:\n", + " lexicon_path: null\n", + " boost_path: null\n", + " beam_size_token: 16\n", + " beam_threshold: 20.0\n", + " unk_weight: -.inf\n", + " sil_weight: 0.0\n", + " pyctcdecode_cfg:\n", + " beam_prune_logp: -10.0\n", + " token_min_logp: -5.0\n", + " prune_history: false\n", + " hotwords: null\n", + " hotword_weight: 10.0\n", + " confidence_cfg:\n", + " preserve_frame_confidence: false\n", + " preserve_token_confidence: false\n", + " preserve_word_confidence: false\n", + " exclude_blank: true\n", + " aggregation: min\n", + " measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " method_cfg: DEPRECATED\n", + " temperature: 1.0\n", + " rnnt_decoding:\n", + " model_type: rnnt\n", + " strategy: greedy_batch\n", + " compute_hypothesis_token_set: false\n", + " preserve_alignments: null\n", + " confidence_cfg:\n", + " preserve_frame_confidence: false\n", + " preserve_token_confidence: false\n", + " preserve_word_confidence: false\n", + " exclude_blank: true\n", + " aggregation: min\n", + " measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " method_cfg: DEPRECATED\n", + " fused_batch_size: -1\n", + " compute_timestamps: null\n", + " compute_langs: false\n", + " word_seperator: ' '\n", + " rnnt_timestamp_type: all\n", + " greedy:\n", + " max_symbols_per_step: 10\n", + " preserve_alignments: false\n", + " preserve_frame_confidence: false\n", + " confidence_measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " confidence_method_cfg: DEPRECATED\n", + " beam:\n", + " beam_size: 4\n", + " search_type: default\n", + " score_norm: true\n", + " return_best_hypothesis: true\n", + " tsd_max_sym_exp_per_step: 50\n", + " alsd_max_target_len: 1.0\n", + " nsc_max_timesteps_expansion: 1\n", + " nsc_prefix_alpha: 1\n", + " maes_num_steps: 2\n", + " maes_prefix_alpha: 1\n", + " maes_expansion_gamma: 2.3\n", + " maes_expansion_beta: 2\n", + " language_model: null\n", + " softmax_temperature: 1.0\n", + " preserve_alignments: false\n", + " ngram_lm_model: null\n", + " ngram_lm_alpha: 0.0\n", + " hat_subtract_ilm: false\n", + " hat_ilm_weight: 0.0\n", + " temperature: 1.0\n", + " decoder_type: null\n", + " att_context_size: null\n", + " model_change:\n", + " conformer:\n", + " self_attention_model: null\n", + " att_context_size: null\n", + " calculate_wer: true\n", + " clean_groundtruth_text: false\n", + " langid: en\n", + " use_cer: false\n", + " return_transcriptions: false\n", + " return_hypotheses: true\n", + " \n", + "[NeMo I 2023-09-08 15:47:01 transcribe_speech:227] Inference will be done on device: cuda:0\n", + "[NeMo I 2023-09-08 15:47:01 cloud:68] Downloading from: https://api.ngc.nvidia.com/v2/models/nvidia/nemospeechmodels/versions/1.0.0a5/files/QuartzNet15x5Base-En.nemo to /root/.cache/torch/NeMo/NeMo_1.21.0rc0/QuartzNet15x5Base-En/2b066be39e9294d7100fb176ec817722/QuartzNet15x5Base-En.nemo\n", + "[NeMo I 2023-09-08 15:47:07 common:913] Instantiating model from pre-trained checkpoint\n", + "[NeMo I 2023-09-08 15:47:08 features:289] PADDING: 16\n", + "[NeMo I 2023-09-08 15:47:10 save_restore_connector:249] Model EncDecCTCModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.21.0rc0/QuartzNet15x5Base-En/2b066be39e9294d7100fb176ec817722/QuartzNet15x5Base-En.nemo.\n", + "GPU available: True (cuda), used: True\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n", + "[NeMo I 2023-09-08 15:47:10 ctc_models:346] Changed decoding strategy to \n", + " strategy: greedy\n", + " preserve_alignments: null\n", + " compute_timestamps: false\n", + " word_seperator: ' '\n", + " ctc_timestamp_type: all\n", + " batch_dim_index: 0\n", + " greedy:\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " preserve_frame_confidence: false\n", + " confidence_measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " confidence_method_cfg: DEPRECATED\n", + " beam:\n", + " beam_size: 4\n", + " search_type: default\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " return_best_hypothesis: true\n", + " beam_alpha: 1.0\n", + " beam_beta: 0.0\n", + " kenlm_path: null\n", + " flashlight_cfg:\n", + " lexicon_path: null\n", + " boost_path: null\n", + " beam_size_token: 16\n", + " beam_threshold: 20.0\n", + " unk_weight: -.inf\n", + " sil_weight: 0.0\n", + " pyctcdecode_cfg:\n", + " beam_prune_logp: -10.0\n", + " token_min_logp: -5.0\n", + " prune_history: false\n", + " hotwords: null\n", + " hotword_weight: 10.0\n", + " confidence_cfg:\n", + " preserve_frame_confidence: false\n", + " preserve_token_confidence: false\n", + " preserve_word_confidence: false\n", + " exclude_blank: true\n", + " aggregation: min\n", + " measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " method_cfg: DEPRECATED\n", + " temperature: 1.0\n", + " \n", + "[NeMo I 2023-09-08 15:47:10 transcribe_utils:235] \n", + " Transcribing 2939 files...\n", + " \n", + "[NeMo I 2023-09-08 15:47:10 transcribe_speech:303] AMP enabled!\n", + " \n", + "Transcribing: 100% 368/368 [00:53<00:00, 6.82it/s]\n", + "[NeMo I 2023-09-08 15:48:04 transcribe_speech:349] Finished transcribing 2939 files !\n", + "[NeMo I 2023-09-08 15:48:04 transcribe_speech:350] Writing transcriptions into file: test_other_QN.json\n", + "[NeMo I 2023-09-08 15:48:04 transcribe_utils:284] Transcripts will be written in \"test_other_QN.json\" file\n", + "[NeMo I 2023-09-08 15:48:05 transcribe_speech:368] Finished writing predictions to test_other_QN.json!\n", + "[NeMo I 2023-09-08 15:48:05 transcribe_speech:380] Writing prediction and error rate of each sample to test_other_QN.json!\n", + "[NeMo I 2023-09-08 15:48:05 transcribe_speech:381] {'samples': 2939, 'tokens': 52343, 'wer': 0.10062472536919932, 'ins_rate': 0.007298015016334563, 'del_rate': 0.009361328162313968, 'sub_rate': 0.08396538219055079}\n" + ] + } + ], + "source": [ + "!python3 ./NeMo/examples/asr/transcribe_speech.py \\\n", + "pretrained_name=\"QuartzNet15x5Base-En\" \\\n", + "dataset_manifest=./test_other.json \\\n", + "output_filename=\"test_other_QN.json\" \\\n", + "batch_size=8 \\\n", + "compute_timestamps=False \\\n", + "compute_langs=False \\\n", + "cuda=0 amp=True \\\n", + "append_pred=True \\\n", + "pred_name_postfix=\"QN\"" + ], + "id": "928a9127" + }, + { + "cell_type": "markdown", + "source": [ + "`transcribe_speech.py` also reports overall WER for the given dataset. In our case, QuartzNet's WER is 10.0%.\n", + "\n", + "Here is the first line of the newly created `test_other_QN.json` manifest file with QuartzNet's predictions:" + ], + "metadata": { + "id": "OfhJrSkvqrs6" + }, + "id": "OfhJrSkvqrs6" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5e274c99", + "outputId": "0a1a3a56-32ec-42c4-f16c-c298c643e851" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{\"audio_filepath\": \"/content/LibriSpeech/test-other-processed/6938-70848-0000.wav\", \"duration\": 3.315, \"text\": \"even the sun came out pale and watery at noon\", \"pred_text_QN\": \"even the sun came out pale and watery at noon\", \"wer\": 0.0, \"tokens\": 10, \"ins_rate\": 0.0, \"del_rate\": 0.0, \"sub_rate\": 0.0}\n" + ] + } + ], + "source": [ + "!head -n 1 test_other_QN.json" + ], + "id": "5e274c99" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "21761c16" + }, + "source": [ + "Let's run inference with the second ASR model (`Conformer-CTC Small`):" + ], + "id": "21761c16" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e88e4ed3", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c55fda37-a235-46be-b32d-c25a9a1464a4" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "2023-09-08 15:48:30.111329: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n", + "[NeMo W 2023-09-08 15:48:36 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.\n", + " See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.\n", + " ret = run_job(\n", + " \n", + "[NeMo I 2023-09-08 15:48:36 transcribe_speech:181] Hydra config: model_path: null\n", + " pretrained_name: stt_en_conformer_ctc_small\n", + " audio_dir: null\n", + " dataset_manifest: ./test_other_QN.json\n", + " channel_selector: null\n", + " audio_key: audio_filepath\n", + " eval_config_yaml: null\n", + " output_filename: test_other_QN_Conf_small.json\n", + " batch_size: 8\n", + " num_workers: 0\n", + " append_pred: true\n", + " pred_name_postfix: conf_s\n", + " random_seed: null\n", + " compute_timestamps: false\n", + " preserve_alignment: false\n", + " compute_langs: false\n", + " cuda: 0\n", + " allow_mps: false\n", + " amp: true\n", + " amp_dtype: float16\n", + " audio_type: wav\n", + " overwrite_transcripts: true\n", + " ctc_decoding:\n", + " strategy: greedy\n", + " preserve_alignments: null\n", + " compute_timestamps: null\n", + " word_seperator: ' '\n", + " ctc_timestamp_type: all\n", + " batch_dim_index: 0\n", + " greedy:\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " preserve_frame_confidence: false\n", + " confidence_measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " confidence_method_cfg: DEPRECATED\n", + " beam:\n", + " beam_size: 4\n", + " search_type: default\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " return_best_hypothesis: true\n", + " beam_alpha: 1.0\n", + " beam_beta: 0.0\n", + " kenlm_path: null\n", + " flashlight_cfg:\n", + " lexicon_path: null\n", + " boost_path: null\n", + " beam_size_token: 16\n", + " beam_threshold: 20.0\n", + " unk_weight: -.inf\n", + " sil_weight: 0.0\n", + " pyctcdecode_cfg:\n", + " beam_prune_logp: -10.0\n", + " token_min_logp: -5.0\n", + " prune_history: false\n", + " hotwords: null\n", + " hotword_weight: 10.0\n", + " confidence_cfg:\n", + " preserve_frame_confidence: false\n", + " preserve_token_confidence: false\n", + " preserve_word_confidence: false\n", + " exclude_blank: true\n", + " aggregation: min\n", + " measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " method_cfg: DEPRECATED\n", + " temperature: 1.0\n", + " rnnt_decoding:\n", + " model_type: rnnt\n", + " strategy: greedy_batch\n", + " compute_hypothesis_token_set: false\n", + " preserve_alignments: null\n", + " confidence_cfg:\n", + " preserve_frame_confidence: false\n", + " preserve_token_confidence: false\n", + " preserve_word_confidence: false\n", + " exclude_blank: true\n", + " aggregation: min\n", + " measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " method_cfg: DEPRECATED\n", + " fused_batch_size: -1\n", + " compute_timestamps: null\n", + " compute_langs: false\n", + " word_seperator: ' '\n", + " rnnt_timestamp_type: all\n", + " greedy:\n", + " max_symbols_per_step: 10\n", + " preserve_alignments: false\n", + " preserve_frame_confidence: false\n", + " confidence_measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " confidence_method_cfg: DEPRECATED\n", + " beam:\n", + " beam_size: 4\n", + " search_type: default\n", + " score_norm: true\n", + " return_best_hypothesis: true\n", + " tsd_max_sym_exp_per_step: 50\n", + " alsd_max_target_len: 1.0\n", + " nsc_max_timesteps_expansion: 1\n", + " nsc_prefix_alpha: 1\n", + " maes_num_steps: 2\n", + " maes_prefix_alpha: 1\n", + " maes_expansion_gamma: 2.3\n", + " maes_expansion_beta: 2\n", + " language_model: null\n", + " softmax_temperature: 1.0\n", + " preserve_alignments: false\n", + " ngram_lm_model: null\n", + " ngram_lm_alpha: 0.0\n", + " hat_subtract_ilm: false\n", + " hat_ilm_weight: 0.0\n", + " temperature: 1.0\n", + " decoder_type: null\n", + " att_context_size: null\n", + " model_change:\n", + " conformer:\n", + " self_attention_model: null\n", + " att_context_size: null\n", + " calculate_wer: true\n", + " clean_groundtruth_text: false\n", + " langid: en\n", + " use_cer: false\n", + " return_transcriptions: false\n", + " return_hypotheses: true\n", + " \n", + "[NeMo I 2023-09-08 15:48:36 transcribe_speech:227] Inference will be done on device: cuda:0\n", + "[NeMo I 2023-09-08 15:48:36 cloud:68] Downloading from: https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_en_conformer_ctc_small/versions/1.6.0/files/stt_en_conformer_ctc_small.nemo to /root/.cache/torch/NeMo/NeMo_1.21.0rc0/stt_en_conformer_ctc_small/5d2d8e5b2b5adb8f5091363c6ba19c55/stt_en_conformer_ctc_small.nemo\n", + "[NeMo I 2023-09-08 15:48:41 common:913] Instantiating model from pre-trained checkpoint\n", + "[NeMo I 2023-09-08 15:48:42 mixins:170] Tokenizer SentencePieceTokenizer initialized with 1024 tokens\n", + "[NeMo W 2023-09-08 15:48:43 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", + " Train config : \n", + " manifest_filepath: /data/NeMo_ASR_SET/English/v2.0/train/tarred_audio_manifest.json\n", + " sample_rate: 16000\n", + " batch_size: 64\n", + " shuffle: true\n", + " num_workers: 8\n", + " pin_memory: true\n", + " use_start_end_token: false\n", + " trim_silence: false\n", + " max_duration: 20.0\n", + " min_duration: 0.1\n", + " shuffle_n: 2048\n", + " is_tarred: true\n", + " tarred_audio_filepaths: /data/NeMo_ASR_SET/English/v2.0/train/audio__OP_0..4095_CL_.tar\n", + " \n", + "[NeMo W 2023-09-08 15:48:43 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). \n", + " Validation config : \n", + " manifest_filepath:\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-other.json\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-clean.json\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-other.json\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-clean.json\n", + " sample_rate: 16000\n", + " batch_size: 64\n", + " shuffle: false\n", + " num_workers: 8\n", + " pin_memory: true\n", + " use_start_end_token: false\n", + " is_tarred: false\n", + " tarred_audio_filepaths: na\n", + " \n", + "[NeMo W 2023-09-08 15:48:43 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", + " Test config : \n", + " manifest_filepath:\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-other.json\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-clean.json\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-other.json\n", + " - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-clean.json\n", + " sample_rate: 16000\n", + " batch_size: 64\n", + " shuffle: false\n", + " num_workers: 8\n", + " pin_memory: true\n", + " use_start_end_token: false\n", + " is_tarred: false\n", + " tarred_audio_filepaths: na\n", + " \n", + "[NeMo I 2023-09-08 15:48:43 features:289] PADDING: 0\n", + "[NeMo I 2023-09-08 15:48:43 save_restore_connector:249] Model EncDecCTCModelBPE was successfully restored from /root/.cache/torch/NeMo/NeMo_1.21.0rc0/stt_en_conformer_ctc_small/5d2d8e5b2b5adb8f5091363c6ba19c55/stt_en_conformer_ctc_small.nemo.\n", + "GPU available: True (cuda), used: True\n", + "TPU available: False, using: 0 TPU cores\n", + "IPU available: False, using: 0 IPUs\n", + "HPU available: False, using: 0 HPUs\n", + "[NeMo I 2023-09-08 15:48:44 ctc_bpe_models:314] Changed decoding strategy to \n", + " strategy: greedy\n", + " preserve_alignments: null\n", + " compute_timestamps: false\n", + " word_seperator: ' '\n", + " ctc_timestamp_type: all\n", + " batch_dim_index: 0\n", + " greedy:\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " preserve_frame_confidence: false\n", + " confidence_measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " confidence_method_cfg: DEPRECATED\n", + " beam:\n", + " beam_size: 4\n", + " search_type: default\n", + " preserve_alignments: false\n", + " compute_timestamps: false\n", + " return_best_hypothesis: true\n", + " beam_alpha: 1.0\n", + " beam_beta: 0.0\n", + " kenlm_path: null\n", + " flashlight_cfg:\n", + " lexicon_path: null\n", + " boost_path: null\n", + " beam_size_token: 16\n", + " beam_threshold: 20.0\n", + " unk_weight: -.inf\n", + " sil_weight: 0.0\n", + " pyctcdecode_cfg:\n", + " beam_prune_logp: -10.0\n", + " token_min_logp: -5.0\n", + " prune_history: false\n", + " hotwords: null\n", + " hotword_weight: 10.0\n", + " confidence_cfg:\n", + " preserve_frame_confidence: false\n", + " preserve_token_confidence: false\n", + " preserve_word_confidence: false\n", + " exclude_blank: true\n", + " aggregation: min\n", + " measure_cfg:\n", + " name: entropy\n", + " entropy_type: tsallis\n", + " alpha: 0.33\n", + " entropy_norm: exp\n", + " temperature: DEPRECATED\n", + " method_cfg: DEPRECATED\n", + " temperature: 1.0\n", + " \n", + "[NeMo I 2023-09-08 15:48:44 transcribe_utils:235] \n", + " Transcribing 2939 files...\n", + " \n", + "[NeMo I 2023-09-08 15:48:44 transcribe_speech:303] AMP enabled!\n", + " \n", + "Transcribing: 100% 368/368 [00:46<00:00, 7.88it/s]\n", + "[NeMo I 2023-09-08 15:49:31 transcribe_speech:349] Finished transcribing 2939 files !\n", + "[NeMo I 2023-09-08 15:49:31 transcribe_speech:350] Writing transcriptions into file: test_other_QN_Conf_small.json\n", + "[NeMo I 2023-09-08 15:49:31 transcribe_utils:284] Transcripts will be written in \"test_other_QN_Conf_small.json\" file\n", + "[NeMo I 2023-09-08 15:49:31 transcribe_speech:368] Finished writing predictions to test_other_QN_Conf_small.json!\n", + "[NeMo I 2023-09-08 15:49:32 transcribe_speech:380] Writing prediction and error rate of each sample to test_other_QN_Conf_small.json!\n", + "[NeMo I 2023-09-08 15:49:32 transcribe_speech:381] {'samples': 2939, 'tokens': 52343, 'wer': 0.08180654528781307, 'ins_rate': 0.007450853027147852, 'del_rate': 0.006820396232543034, 'sub_rate': 0.0675352960281222}\n" + ] + } + ], + "source": [ + "!python ./NeMo/examples/asr/transcribe_speech.py \\\n", + "pretrained_name=\"stt_en_conformer_ctc_small\" \\\n", + "dataset_manifest=./test_other_QN.json \\\n", + "output_filename=\"test_other_QN_Conf_small.json\" \\\n", + "batch_size=8 \\\n", + "compute_timestamps=False \\\n", + "compute_langs=False \\\n", + "cuda=0 amp=True \\\n", + "append_pred=True \\\n", + "pred_name_postfix=\"conf_s\"" + ], + "id": "e88e4ed3" + }, + { + "cell_type": "markdown", + "source": [ + "Conformer's WER is 8.1%.\n", + "\n", + "Here is the first line of `test_other_QN_Conf_small.json` manifest file with transcripts by both ASR models\n", + "- `pred_text_QN` by QuartzNet\n", + "- `pred_text_conf_s` by Conformer-Small" + ], + "metadata": { + "id": "MyvMpkcGqhff" + }, + "id": "MyvMpkcGqhff" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "551a2b33", + "outputId": "7dbbc35f-44fd-49c4-c035-2a07f033cdaf" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{\"audio_filepath\": \"/content/LibriSpeech/test-other-processed/6938-70848-0000.wav\", \"duration\": 3.315, \"text\": \"even the sun came out pale and watery at noon\", \"pred_text_QN\": \"even the sun came out pale and watery at noon\", \"wer\": 0.0, \"tokens\": 10, \"ins_rate\": 0.0, \"del_rate\": 0.0, \"sub_rate\": 0.0, \"pred_text_conf_s\": \"even the sun came out pale and watery at noon\"}\n" + ] + } + ], + "source": [ + "!head -n 1 test_other_QN_Conf_small.json" + ], + "id": "551a2b33" + }, + { + "cell_type": "markdown", + "source": [ + "# Launching SDE" + ], + "metadata": { + "id": "wYQAP60iretG" + }, + "id": "wYQAP60iretG" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "67a15347" + }, + "source": [ + "Now it's time to start SDE!\n", + "\n", + "SDE is a Python web application that can be run either locally or on a remote server (a cloud instance) from a command line.\n", + "\n", + "In this tutorial, we use Google Colab. So we need to get a proxy link to the instance:" + ], + "id": "67a15347" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "id": "Fmh2uHfLgTrL", + "outputId": "dd93bd85-ae6a-4024-fae6-d18e9eb402e8" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "https://1p2l450rmms-496ff2e9c6d22116-8080-colab.googleusercontent.com/\n" + ] + } + ], + "source": [ + "from google.colab.output import eval_js\n", + "print(eval_js(\"google.colab.kernel.proxyPort(8050)\"))" + ], + "id": "Fmh2uHfLgTrL" + }, + { + "cell_type": "markdown", + "source": [ + "Now let's start SDE with the following shell command. `nc` parameter provides names of ASR transcripts for Comparison mode." + ], + "metadata": { + "id": "hABptmgK1amL" + }, + "id": "hABptmgK1amL" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e69fc876" + }, + "outputs": [], + "source": [ + "!python3 ./NeMo/tools/speech_data_explorer/data_explorer.py test_other_QN_Conf_small.json --port=8050 \\\n", + "-nc pred_text_QN pred_text_conf_s" + ], + "id": "e69fc876" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "710ea9c4" + }, + "source": [ + "Please click on the aforementioned proxy link to view SDE application in another web-browser tab." + ], + "id": "710ea9c4" + }, + { + "cell_type": "markdown", + "source": [ + "#Quick guide on SDE interface" + ], + "metadata": { + "id": "rLNS2kFhr-GL" + }, + "id": "rLNS2kFhr-GL" + }, + { + "cell_type": "markdown", + "source": [ + "SDE provides general dataset's statistics (number of utterances, hours, character set size, histograms, etc.) on `Statistics` page. It allows users to do interactive exploration and analysis on the dataset on `Samples` page. All these functions are described in detail in [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/tools/speech_data_explorer.html)." + ], + "metadata": { + "id": "AI1Wkahb2w2-" + }, + "id": "AI1Wkahb2w2-" + }, + { + "cell_type": "markdown", + "source": [ + "![снимок 1.png]()" + ], + "metadata": { + "id": "IsNRQkg9CQ6n" + }, + "id": "IsNRQkg9CQ6n" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "67693388" + }, + "source": [ + "In this tutorial, we are interested in `Comparison tool` page.\n", + "\n", + "The `Comparison` page has two main modes (levels) of comparison:\n", + "- word level (metrics for individual vocabulary words are compared)\n", + "- utterance level (metrics for individual dataset's utterances are compared)\n", + "\n", + "In each mode, the dataset is visualized as an interactive scatterplot. Each marker represents a data unit (either a word, or an utterance). X-coordinate encodes a selected metric for one model, Y-coordinate does the same for the other model.\n", + "\n", + "By default, word level comparison is selected on `Comparison tool` page. In second (2) and third (3) box you can choose what will be shown on the axes of the scatterplot (that is, metrics for 1st and 2nd models). In our example, it is word level accuracy for QuartzNet and Conformer-Small: `accuracy_model_pred_text_{model_name}`." + ], + "id": "67693388" + }, + { + "cell_type": "markdown", + "source": [ + "Depending on comparison level, these fields provide the following options:\n", + "\n", + "\n", + "* word accuracy (ratio of the correctly recognized number of words to the total number of this word in the entire dataset)\n", + "* utterance WER\n", + "* utterance CER" + ], + "metadata": { + "id": "dUXwvaU8ASxv" + }, + "id": "dUXwvaU8ASxv" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "xHfBizBuv5Pg" + }, + "id": "xHfBizBuv5Pg" + }, + { + "cell_type": "markdown", + "source": [ + "![Снимок 2.png]()" + ], + "metadata": { + "id": "WhRiRrVCDUVO" + }, + "id": "WhRiRrVCDUVO" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1c0c797c" + }, + "source": [ + "By default, the scatterplot displays all data units (either words or utterances). To allow users to select more interesting subsets, SDE supports flexible filtering queries like in the following example (for word level):\n" + ], + "id": "1c0c797c" + }, + { + "cell_type": "markdown", + "source": [ + "![3.png]()" + ], + "metadata": { + "id": "BtabXT7PEGe-" + }, + "id": "BtabXT7PEGe-" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1f8cadb0" + }, + "source": [ + "You can type the filtering queries either in a specific column header, or in a custom filter expression textbox (putting lowercased column names in curly brackets).\n", + "\n", + "\n" + ], + "id": "1f8cadb0" + }, + { + "cell_type": "markdown", + "source": [ + "Below is an example of complicated query, that allows us to display utterances where both ASR models yield different WERs (for utterance level):" + ], + "metadata": { + "id": "9COXr5cXsp8X" + }, + "id": "9COXr5cXsp8X" + }, + { + "cell_type": "markdown", + "source": [ + "![image (1).png]()" + ], + "metadata": { + "id": "5OQDmR00sSub" + }, + "id": "5OQDmR00sSub" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f7f64b2d" + }, + "source": [ + "The scatterplot is fully interactive: you can zoom, swipe, and view extended information about a data point hovering over it." + ], + "id": "f7f64b2d" + }, + { + "cell_type": "markdown", + "source": [ + "![4.png]()" + ], + "metadata": { + "id": "oalX8opDEdMX" + }, + "id": "oalX8opDEdMX" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7f557223" + }, + "source": [ + "Let's examine an utterance level mode.\n", + "\n", + "You can choose either WER or CER as a metric:" + ], + "id": "7f557223" + }, + { + "cell_type": "markdown", + "source": [ + "![5.png]()" + ], + "metadata": { + "id": "WdYYdwMdFUHo" + }, + "id": "WdYYdwMdFUHo" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ee18d226" + }, + "source": [ + "It is easy to add complex filtering expressions:" + ], + "id": "ee18d226" + }, + { + "cell_type": "markdown", + "source": [ + "![6.png]()" + ], + "metadata": { + "id": "I3JT9-afEgAY" + }, + "id": "I3JT9-afEgAY" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3aefad51" + }, + "source": [ + "The scatterplot is interactive. Clicking on any data point automatically selects corresponding data row in dataset table." + ], + "id": "3aefad51" + }, + { + "cell_type": "markdown", + "source": [ + "![7.png]()" + ], + "metadata": { + "id": "64rWT4JeFy93" + }, + "id": "64rWT4JeFy93" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e3441c51" + }, + "source": [ + "After selecting a data point, you can listen to the utterance. Just click on player icon.\n", + "\n", + "Also you can analyze audio signal in time and frequency domain using interactive plots of the signal and its spectrogram.\n" + ], + "id": "e3441c51" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a41728b3" + }, + "source": [ + "![8.png]()" + ], + "id": "a41728b3" + }, + { + "cell_type": "markdown", + "source": [ + "![9.png]()" + ], + "metadata": { + "id": "MiycGa3NGQQT" + }, + "id": "MiycGa3NGQQT" + }, + { + "cell_type": "markdown", + "source": [ + "# Interesting examples and use cases" + ], + "metadata": { + "id": "EVn6Bu8fsEWu" + }, + "id": "EVn6Bu8fsEWu" + }, + { + "cell_type": "markdown", + "source": [ + "Using the {wer_model_1} != {wer_model_2} query you can remove all sentences/words that have the same WER, this can be very convenient if the models you are comparing are similar.\n", + "\n", + "In the following example, you can see how QuartzNet and Conformer-Small transcribe the same utterance:\n", + "\n", + "* reference transcript is `the school of the wilderness`\n", + "* QuartzNet's transcript is `the school of the wearerness` (the error in the last word)\n", + "* Conformer-Small's transcript is `the score of the word in its` (almost completely wrong prediction)\n", + "\n" + ], + "metadata": { + "id": "J8ko_-Q1tX6v" + }, + "id": "J8ko_-Q1tX6v" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "U1qKT9tEWk0j" + }, + "id": "U1qKT9tEWk0j" + }, + { + "cell_type": "markdown", + "source": [ + "General ASR models might make mistakes in proper nouns and narrow domain specific terms since they are rarely found in training datasets." + ], + "metadata": { + "id": "MK7B7xniCPky" + }, + "id": "MK7B7xniCPky" + }, + { + "cell_type": "markdown", + "source": [ + "1) Name: `Shiloh`" + ], + "metadata": { + "id": "sAP9F-bHtpVi" + }, + "id": "sAP9F-bHtpVi" + }, + { + "cell_type": "markdown", + "source": [ + "![image (3).png]()" + ], + "metadata": { + "id": "e6LCtKS0s9LE" + }, + "id": "e6LCtKS0s9LE" + }, + { + "cell_type": "markdown", + "source": [ + "2) Domain specific term: `rheumatic`" + ], + "metadata": { + "id": "vXf2v13ys84x" + }, + "id": "vXf2v13ys84x" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "BI7sIWh7o8zj" + }, + "id": "BI7sIWh7o8zj" + }, + { + "cell_type": "markdown", + "source": [ + "It is very likely that, in the word level mode, many words will receive the same accuracy, and will overlap with each other on the scatterplot. That is why the graph will display only one point at a given location. To resolve this issue, there is an option to space them with a given radius:" + ], + "metadata": { + "id": "1D7a1_p5svWu" + }, + "id": "1D7a1_p5svWu" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "VUL1xTrXpHEL" + }, + "id": "VUL1xTrXpHEL" + }, + { + "cell_type": "markdown", + "source": [ + "This part of the graph shows an enlarged view in coordinates (0, 100). That is, words correctly recognized by the Conformer-Small and not recognized by QuartzNet." + ], + "metadata": { + "id": "2wOPEk4ktKrb" + }, + "id": "2wOPEk4ktKrb" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "J1jw_1c5pMRc" + }, + "id": "J1jw_1c5pMRc" + }, + { + "cell_type": "markdown", + "source": [ + "Let's write down words recognized only by QuartzNet and words recognized only by Conformer-Small:" + ], + "metadata": { + "id": "IjXTQJZuC3Z-" + }, + "id": "IjXTQJZuC3Z-" + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "data = {\n", + " \"QuartzNet_words\": [\"allspice\", \"southwark\", \"dante\", \"panada\", \"favour\", \"vapours\", \"fuchs\", \"battlefields\", \"morrel\", \"postscript\"],\n", + " \"Conformer_words\": [\"gothic\", \"tablespoons\", \"heidelberg\", \"bough\", \"nocturnal\", \"meekin\", \"-\", \"-\", \"-\", \"-\"]\n", + "}\n", + "\n", + "df = pd.DataFrame(data)\n", + "\n", + "df\n", + "\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + }, + "id": "57bPPFyxSPiJ", + "outputId": "0b003e3c-2c26-4e38-c4f6-8d4d4b568ea1" + }, + "id": "57bPPFyxSPiJ", + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " QuartzNet_words Conformer_words\n", + "0 allspice gothic\n", + "1 southwark tablespoons\n", + "2 dante heidelberg\n", + "3 panada bough\n", + "4 favour nocturnal\n", + "5 vapours meekin\n", + "6 fuchs -\n", + "7 battlefields -\n", + "8 morrel -\n", + "9 postscript -" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
QuartzNet_wordsConformer_words
0allspicegothic
1southwarktablespoons
2danteheidelberg
3panadabough
4favournocturnal
5vapoursmeekin
6fuchs-
7battlefields-
8morrel-
9postscript-
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 17 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "UghOVs2xUG1b" + }, + "id": "UghOVs2xUG1b" + }, + { + "cell_type": "markdown", + "source": [ + "Words in the first group seem to have varied origins, with a mix of English, Germanic, and potentially Latin-derived terms.\n", + "The second group, on the other hand, also appears to contain words of varied linguistic origins but might have a slightly more European or Old World feel, especially with words like `gothic` and `heidelberg`.\n", + "\n", + "**Word Length**\n", + "\n", + "The average word length in the first group might be longer with words like `battlefields` and `postscript`.\n", + "The second group also contains long words, but when we consider `bough` or `gothic`, it might have a slightly shorter average length.\n", + "\n", + "**Functional vs. Descriptive**\n", + "\n", + "The first group contains a mix of nouns that are both functional (like `allspice` or `panada`) and more abstract or descriptive (like `dante` or `vapours`).\n", + "The second group also contains nouns, but they seem more descriptive or pertaining to concepts or themes like `gothic` or `nocturnal`." + ], + "metadata": { + "id": "5_rKgNoovt1k" + }, + "id": "5_rKgNoovt1k" + }, + { + "cell_type": "markdown", + "source": [ + "Next, we will look at several interesting examples that were easily discovered using this tool." + ], + "metadata": { + "id": "C0hcCrZI98fF" + }, + "id": "C0hcCrZI98fF" + }, + { + "cell_type": "markdown", + "source": [ + "# Examples with audio" + ], + "metadata": { + "id": "FV8rLFfr9KVv" + }, + "id": "FV8rLFfr9KVv" + }, + { + "cell_type": "markdown", + "source": [ + "(It is worth noting that the examples below are taken from LibriSpeech dev-clean set, while the rest of the tutorial is based on LibriSpeech test-other)" + ], + "metadata": { + "id": "W35kfqsYkVlf" + }, + "id": "W35kfqsYkVlf" + }, + { + "cell_type": "code", + "source": [ + "#This cell is made so that you can quickly listen to those utterances.\n", + "from IPython.display import Audio, display\n", + "!python3 ./NeMo/scripts/dataset_processing/get_librispeech_data.py --data_sets dev_clean --data_root ." + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9Q1aCuBC8QZm", + "outputId": "9bf30ea1-c15f-434d-c5cf-56a1111f9a0f" + }, + "id": "9Q1aCuBC8QZm", + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "100% 97/97 [00:05<00:00, 16.64it/s]\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "BjgoarWOrntG" + }, + "id": "BjgoarWOrntG" + }, + { + "cell_type": "markdown", + "source": [ + "Above is example how Conformer-Small trying to use more common phrases fails with recognizing word `southwark` (Southwark is a district of Central London). We see how Conformer-Small fails on proper nouns and just names in general." + ], + "metadata": { + "id": "QogvdaUzrq2L" + }, + "id": "QogvdaUzrq2L" + }, + { + "cell_type": "code", + "source": [ + "sound_file = \"/content/LibriSpeech/dev-clean-processed/5895-34629-0012.wav\"\n", + "display(Audio(sound_file, autoplay=True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 76 + }, + "id": "iA-wfLFE-Lke", + "outputId": "e4b35c24-3047-448c-8006-176fe68c80be" + }, + "id": "iA-wfLFE-Lke", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "In this case, the speaker does not really pause between `over` and `all`, and QuartzNet transcribes it as a single word." + ], + "metadata": { + "id": "jyX2PVgovdjj" + }, + "id": "jyX2PVgovdjj" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "z3jISeCTrRVI" + }, + "id": "z3jISeCTrRVI" + }, + { + "cell_type": "code", + "source": [ + "sound_file = \"/content/LibriSpeech/dev-clean-processed/652-129742-0008.wav\"\n", + "display(Audio(sound_file, autoplay=True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 76 + }, + "id": "5aYtqrTG8Xpa", + "outputId": "b5ff0a91-0bc2-42e6-a575-611d3170731d" + }, + "id": "5aYtqrTG8Xpa", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "This is an example of an error in the dataset.\n", + "The audio is challenging (quality is not very good, and the speaker is singing this phrase). But it is obvious that the reference transcript should be `grub pile grub pile`. Likely, the extra space in word `pile` was introduced by replacing a hyphen character with a space in the original dataset:" + ], + "metadata": { + "id": "jz9qolfuvdFj" + }, + "id": "jz9qolfuvdFj" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "FrjjbUv0A3g9" + }, + "id": "FrjjbUv0A3g9" + }, + { + "cell_type": "code", + "source": [ + "sound_file = \"/content/LibriSpeech/dev-clean-processed/6313-76958-0023.wav\"\n", + "display(Audio(sound_file, autoplay=True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 76 + }, + "id": "sdKQ2hvB7bIa", + "outputId": "b4e7ddcd-f124-47ea-ac93-67f68efdf707" + }, + "id": "sdKQ2hvB7bIa", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Here is another example. I would not say that the models were wrong. When listening to the audio, you might actually think that the announcer is saying “guessed”.\n", + "\n", + "Therefore, if we see that both models make the same errors - this is a good reason to check the dataset." + ], + "metadata": { + "id": "uXbTGzVCvd8K" + }, + "id": "uXbTGzVCvd8K" + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "D4xWxRk8BLtb" + }, + "id": "D4xWxRk8BLtb" + }, + { + "cell_type": "code", + "source": [ + "sound_file = \"/content/LibriSpeech/dev-clean-processed/2428-83705-0008.wav\"\n", + "display(Audio(sound_file, autoplay=True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 76 + }, + "id": "MsOJGMOn8ERg", + "outputId": "09269135-92b6-415d-cb4f-8d7b86d31f3e" + }, + "id": "MsOJGMOn8ERg", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " " + ] + }, + "metadata": {} + } + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "gpuType": "T4" + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + }, + "accelerator": "GPU" + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From a8b85b73f5ad815077e477408dee49caf6bdc43c Mon Sep 17 00:00:00 2001 From: Eric Harper Date: Mon, 11 Sep 2023 20:35:08 -0600 Subject: [PATCH 025/112] Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../megatron_ckpt_to_nemo.py | 27 +++++++++++++------ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py b/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py index e2fd1d4bbcd1..1161614e0a53 100644 --- a/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py +++ b/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py @@ -24,10 +24,12 @@ --pipeline_model_parallel_size """ +import dis import os from argparse import ArgumentParser import torch +from genericpath import isdir from megatron.core import parallel_state from omegaconf import open_dict from pytorch_lightning.plugins.environments import TorchElasticEnvironment @@ -40,7 +42,7 @@ from nemo.collections.nlp.models.language_modeling.megatron_retrieval_model import MegatronRetrievalModel from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model from nemo.collections.nlp.models.machine_translation.megatron_nmt_model import MegatronNMTModel -from nemo.collections.nlp.parts.nlp_overrides import NLPSaveRestoreConnector +from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector from nemo.utils import AppState, logging from nemo.utils.distributed import initialize_distributed from nemo.utils.model_utils import inject_model_parallel_rank @@ -100,12 +102,16 @@ def convert(local_rank, rank, world_size, args): app_state = AppState() app_state.data_parallel_rank = 0 num_nodes = world_size // args.gpus_per_node + plugins = [] + strategy = "auto" if args.bcp: - trainer = Trainer( - devices=args.gpus_per_node, num_nodes=num_nodes, accelerator='gpu', plugins=[TorchElasticEnvironment()] - ) - else: - trainer = Trainer(devices=args.gpus_per_node, num_nodes=num_nodes, accelerator='gpu') + plugins.append(TorchElasticEnvironment()) + if args.model_type == 'gpt': + strategy = NLPDDPStrategy() + + trainer = Trainer( + devices=args.gpus_per_node, num_nodes=num_nodes, accelerator='gpu', plugins=plugins, strategy=strategy + ) app_state.pipeline_model_parallel_size = args.pipeline_model_parallel_size app_state.tensor_model_parallel_size = args.tensor_model_parallel_size @@ -135,8 +141,13 @@ def convert(local_rank, rank, world_size, args): app_state.pipeline_model_parallel_rank = parallel_state.get_pipeline_model_parallel_rank() app_state.tensor_model_parallel_rank = parallel_state.get_tensor_model_parallel_rank() - # inject model parallel rank - checkpoint_path = inject_model_parallel_rank(os.path.join(args.checkpoint_folder, args.checkpoint_name)) + # check for distributed checkpoint + dist_ckpt_dir = os.path.join(args.checkpoint_folder, args.checkpoint_name) + if os.path.isdir(dist_ckpt_dir): + checkpoint_path = dist_ckpt_dir + else: + # legacy checkpoint needs model parallel injection + checkpoint_path = inject_model_parallel_rank(os.path.join(args.checkpoint_folder, args.checkpoint_name)) logging.info( f'rank: {rank}, local_rank: {local_rank}, is loading checkpoint: {checkpoint_path} for tp_rank: {app_state.tensor_model_parallel_rank} and pp_rank: {app_state.pipeline_model_parallel_rank}' From a81bdbe6bc5962e279d1616e5c3358e0a3104aab Mon Sep 17 00:00:00 2001 From: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Date: Mon, 11 Sep 2023 21:08:01 -0700 Subject: [PATCH 026/112] Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_base_model.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_base_model.py b/nemo/collections/nlp/models/language_modeling/megatron_base_model.py index 29b1886c957a..3a4fcd5747af 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_base_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_base_model.py @@ -211,8 +211,8 @@ def _reconfigure_val_batches(self): """ # Override limit_val_batches to be a multiple of num microbatches and so there are limit_val_batches//num_micro_batches num of global batches self.trainer.limit_val_batches *= get_num_microbatches() - # Override num sanity steps equal to num of microbatches and perform one val_step - self.trainer.num_sanity_val_steps = get_num_microbatches() + # Override num sanity steps to be a multiple of num of microbatches + self.trainer.num_sanity_val_steps *= get_num_microbatches() def _enable_nvidia_optimizations(self): "These optimizations are present in NVIDIA NGC PyTorch Containers" From b79ec2a4850e6dd37377dbed48f745a1c5bf35fd Mon Sep 17 00:00:00 2001 From: PeganovAnton Date: Tue, 12 Sep 2023 06:28:39 +0200 Subject: [PATCH 027/112] Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov Signed-off-by: Sasha Meister --- .../nlp/modules/common/text_generation_utils.py | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/nemo/collections/nlp/modules/common/text_generation_utils.py b/nemo/collections/nlp/modules/common/text_generation_utils.py index 36b30aae47b9..33bd59fd9b30 100644 --- a/nemo/collections/nlp/modules/common/text_generation_utils.py +++ b/nemo/collections/nlp/modules/common/text_generation_utils.py @@ -160,6 +160,14 @@ def get_computeprob_response(tokenizer, response, inputs): token_len = int(inputs[1][batch_id].item()) new_token_id = inputs[0][batch_id][:token_len].tolist() new_text = tokenizer.ids_to_text(new_token_id) + else: + raise TypeError( + f"Unsupported type of `inputs[0]`: {type(inputs[0])}. Supported types: `str`, `torch.Tensor`." + ) + else: + raise TypeError( + f"Unsupported type of parameter `inputs`: {type(inputs)}. Supported types: `list` and `tuple`" + ) new_token_ids.append(new_token_id) new_tokens.append(response['tokens'][batch_id][:token_len]) new_texts.append(new_text) From 0d85ab8bf81c94e7a5ddb89a82848a21d96090a1 Mon Sep 17 00:00:00 2001 From: Nikolay Karpov Date: Tue, 12 Sep 2023 13:13:34 +0400 Subject: [PATCH 028/112] check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov Signed-off-by: Sasha Meister --- .../ngram_lm/install_beamsearch_decoders.sh | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh b/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh index 3ba337a6afd3..b09ed351bd15 100755 --- a/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh +++ b/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh @@ -17,20 +17,25 @@ shopt -s expand_aliases NEMO_PATH=/workspace/nemo # Path to NeMo folder: /workspace/nemo if you use NeMo/Dockerfile -if [ "$#" -eq 1 ] -then +if [ "$#" -eq 1 ]; then NEMO_PATH=$1 fi KENLM_MAX_ORDER=10 # Maximum order of KenLM model, also specified in the setup_os2s_decoders.py +if [ -d "$NEMO_PATH" ]; then + echo "The folder '$NEMO_PATH' exists." +else + echo "Error: The folder '$NEMO_PATH' does not exist. Specify it as a first command line positional argument!" + exit 1 +fi cd $NEMO_PATH if [ $(id -u) -eq 0 ]; then - alias aptupdate='apt-get update' - alias b2install='./b2' - else - alias aptupdate='sudo apt-get update' - alias b2install='sudo ./b2' + alias aptupdate='apt-get update' + alias b2install='./b2' +else + alias aptupdate='sudo apt-get update' + alias b2install='sudo ./b2' fi aptupdate && apt-get upgrade -y && apt-get install -y liblzma-dev && rm -rf /var/lib/apt/lists/* # liblzma needed for flashlight decoder From f7fc3bd08171c7f0a10ded2d7fa2bd5d932ed562 Mon Sep 17 00:00:00 2001 From: Adi Renduchintala Date: Wed, 13 Sep 2023 09:20:30 -0700 Subject: [PATCH 029/112] layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_gpt_peft_models.py | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py index 7b772ea33a47..ba7dd4529326 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py @@ -434,6 +434,11 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): self.name_key_to_cfg[k] = infused_adapter_cfg else: raise ValueError(f"PEFT Key {k} is unknown.") + + self.layer_selection = cfg.peft.ia3_tuning.get("layer_selection", None) + if self.layer_selection is None: + self.layer_selection = list(range(1, cfg.num_layers + 1)) + super().__init__(cfg, trainer) From 3c90b5e9a8b9d22c92accd85ecbb7d8b904046c5 Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Fri, 15 Sep 2023 08:52:58 +1000 Subject: [PATCH 030/112] Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong Signed-off-by: Sasha Meister --- tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb b/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb index b1d70d8446c2..9654a0fe6924 100644 --- a/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb +++ b/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb @@ -672,7 +672,8 @@ "# !git clone --depth 1 --branch v1.8.0 https://github.com/microsoft/onnxruntime.git .\n", "# !./build.sh --skip_tests --config Release --build_shared_lib --parallel --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu --build_wheel\n", "# !pip install ./build/Linux/Release/dist/onnxruntime*.whl\n", - "# %cd .." + "# %cd ..\n", + "!pip install einops\n" ] }, { From 12e77d4a27a32dd1ca35c076b21afb8c08f2124d Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Fri, 15 Sep 2023 16:50:51 +1000 Subject: [PATCH 031/112] Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong Signed-off-by: Sasha Meister --- tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb b/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb index 9654a0fe6924..31035ac19643 100644 --- a/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb +++ b/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb @@ -24,7 +24,7 @@ "\n", "## Install dependencies\n", "!pip install wget\n", - "!apt-get install sox libsndfile1 ffmpeg portaudio19-dev\n", + "!apt-get install libsndfile1 ffmpeg portaudio19-dev\n", "!pip install text-unidecode\n", "!pip install pyaudio\n", "\n", From aa3f36d6c67fb02fd7d33e57c2ff4e2b926b7fba Mon Sep 17 00:00:00 2001 From: Samuele Cornell Date: Mon, 18 Sep 2023 18:06:13 +0100 Subject: [PATCH 032/112] Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell Signed-off-by: Sasha Meister --- tools/speech_data_simulator/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/speech_data_simulator/README.md b/tools/speech_data_simulator/README.md index 94b917af88f3..ad589ef9194d 100644 --- a/tools/speech_data_simulator/README.md +++ b/tools/speech_data_simulator/README.md @@ -89,7 +89,7 @@ python scripts/dataset_processing/get_librispeech_data.py \ python /scripts/speaker_tasks/create_alignment_manifest.py \ --input_manifest_filepath \ --base_alignment_path \ - --output_path train-clean-100-align.json \ + --output_manifest_filepath train-clean-100-align.json \ --ctm_output_directory ./ctm_out \ --libri_dataset_split train-clean-100 ``` From 0a8eaf24cdb687ee0bf94ba20f9c9e59d320ce47 Mon Sep 17 00:00:00 2001 From: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Date: Mon, 18 Sep 2023 18:44:36 -0400 Subject: [PATCH 033/112] Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang Signed-off-by: Sasha Meister --- .../megatron_gpt_continue_training.py | 4 +++- .../modules/common/megatron/language_model.py | 4 +++- .../rotary_position_embedding.py | 21 ++++++++++++++----- 3 files changed, 22 insertions(+), 7 deletions(-) mode change 100644 => 100755 examples/nlp/language_modeling/megatron_gpt_continue_training.py diff --git a/examples/nlp/language_modeling/megatron_gpt_continue_training.py b/examples/nlp/language_modeling/megatron_gpt_continue_training.py old mode 100644 new mode 100755 index c829a91c4092..d866530f1312 --- a/examples/nlp/language_modeling/megatron_gpt_continue_training.py +++ b/examples/nlp/language_modeling/megatron_gpt_continue_training.py @@ -61,7 +61,9 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False): gpt_cfg.max_position_embeddings = cfg.model.max_position_embeddings gpt_cfg.seq_len_interpolation_factor = cfg.model.seq_len_interpolation_factor gpt_cfg.use_flash_attention = cfg.model.use_flash_attention - + assert ( + gpt_cfg.encoder_seq_length == gpt_cfg.max_position_embeddings * gpt_cfg.seq_len_interpolation_factor + ), 'seq_length should be equal to max_position_embedding * seq_len_interpolation_factor' # This is needed when modifying a hparam file directly to load `.ckpt` files. # This is not needed to modify the cfg in `.nemo` files. if add_cfg_to_tree: diff --git a/nemo/collections/nlp/modules/common/megatron/language_model.py b/nemo/collections/nlp/modules/common/megatron/language_model.py index cb68faa0a100..bc0ef1df4ec2 100755 --- a/nemo/collections/nlp/modules/common/megatron/language_model.py +++ b/nemo/collections/nlp/modules/common/megatron/language_model.py @@ -554,7 +554,9 @@ def __init__( if rotary_percentage < 1: rotary_dim = int(rotary_dim * rotary_percentage) self.rotary_pos_emb = RotaryEmbedding( - rotary_dim, seq_len_interpolation_factor=seq_len_interpolation_factor + rotary_dim, + seq_len_interpolation_factor=seq_len_interpolation_factor, + pretrained_max_position_embeddings=max_position_embeddings, ) elif position_embedding_type == 'alibi': diff --git a/nemo/collections/nlp/modules/common/megatron/position_embedding/rotary_position_embedding.py b/nemo/collections/nlp/modules/common/megatron/position_embedding/rotary_position_embedding.py index c97010ecb911..6dba53dd1e7a 100644 --- a/nemo/collections/nlp/modules/common/megatron/position_embedding/rotary_position_embedding.py +++ b/nemo/collections/nlp/modules/common/megatron/position_embedding/rotary_position_embedding.py @@ -25,25 +25,36 @@ class RotaryEmbedding(nn.Module): Implements Rotary Position Embedding from https://arxiv.org/abs/2104.09864. """ - def __init__(self, dim: int, seq_len_interpolation_factor: int = None): + def __init__( + self, dim: int, seq_len_interpolation_factor: int = None, pretrained_max_position_embeddings: int = None + ): """ Args: dim (int): rotary embedding dimension seq_len_interpolation_factor (int): if not None, discrete positions will be interpolated by this factor via the trick in https://arxiv.org/abs/2306.15595. + pretrained_max_position_embeddings (int): pre-trained max_position_embeddings before position interpolation. """ super().__init__() self.seq_len_interpolation_factor = seq_len_interpolation_factor inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim)) self.register_buffer('inv_freq', inv_freq) + self.pretrained_max_position_embeddings = pretrained_max_position_embeddings def forward(self, max_seq_len, offset=0): seq = torch.arange(max_seq_len, device=self.inv_freq.device) + offset - if self.seq_len_interpolation_factor is not None: - seq = seq.type_as(self.inv_freq) - seq *= 1 / self.seq_len_interpolation_factor - freqs = einsum('i , j -> i j', seq.type_as(self.inv_freq), self.inv_freq) + seq = seq.type_as(self.inv_freq) + + if self.pretrained_max_position_embeddings is not None and self.seq_len_interpolation_factor is not None: + if max_seq_len > self.pretrained_max_position_embeddings * self.seq_len_interpolation_factor: + # dynamic linear scaling (length > position we have learned) + seq *= 1 / (max_seq_len / self.pretrained_max_position_embeddings) + else: + # fixed linear scaling + seq *= 1 / self.seq_len_interpolation_factor + + freqs = einsum('i , j -> i j', seq, self.inv_freq) # first part even vector components, second part odd vector components, # 2 * dim in dimension size emb = torch.cat((freqs, freqs), dim=-1) From 5bb294098d92b666c4172f3facead60e350f46bd Mon Sep 17 00:00:00 2001 From: Kunal Dhawan Date: Mon, 18 Sep 2023 17:48:37 -0700 Subject: [PATCH 034/112] Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/core/classes/modelPT.py | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/nemo/core/classes/modelPT.py b/nemo/core/classes/modelPT.py index c5cc99d5208d..6e616319b12b 100644 --- a/nemo/core/classes/modelPT.py +++ b/nemo/core/classes/modelPT.py @@ -856,12 +856,18 @@ def train_dataloader(self): return self._train_dl def val_dataloader(self): - if self._validation_dl is not None: - return self._validation_dl + if self._validation_dl is None: + # None dataloader no longer supported in PTL2.0 + self._validation_dl = [] + + return self._validation_dl def test_dataloader(self): - if self._test_dl is not None: - return self._test_dl + if self._test_dl is None: + # None dataloader no longer supported in PTL2.0 + self._test_dl = [] + + return self._test_dl def on_validation_epoch_end(self) -> Optional[Dict[str, Dict[str, torch.Tensor]]]: """ From bdcde4629ca9b43ce373d7ee49e6cdbc7c41dce3 Mon Sep 17 00:00:00 2001 From: Aleksandr Laptev Date: Tue, 19 Sep 2023 07:53:05 +0700 Subject: [PATCH 035/112] [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/collections/asr/metrics/rnnt_wer.py | 28 ++-- nemo/collections/asr/metrics/rnnt_wer_bpe.py | 8 +- nemo/collections/asr/metrics/wer.py | 18 +-- nemo/collections/asr/metrics/wer_bpe.py | 8 +- .../asr/models/confidence_ensemble.py | 12 +- .../parts/submodules/ctc_greedy_decoding.py | 41 ++---- .../parts/submodules/rnnt_greedy_decoding.py | 128 +++++++----------- .../asr_confidence_benchmarking_utils.py | 6 +- .../asr/parts/utils/asr_confidence_utils.py | 105 ++++++-------- .../confidence_ensembles/build_ensemble.py | 4 +- .../confidence_ensembles/ensemble_config.yaml | 2 +- .../confidence/benchmark_asr_confidence.py | 8 +- .../asr/confidence/test_asr_confidence.py | 27 ++-- .../test_asr_hybrid_rnnt_ctc_model_char.py | 6 +- .../asr/test_asr_rnnt_encdec_model.py | 6 +- .../asr/test_confidence_ensembles.py | 6 +- tutorials/asr/ASR_Confidence_Estimation.ipynb | 14 +- 17 files changed, 172 insertions(+), 255 deletions(-) diff --git a/nemo/collections/asr/metrics/rnnt_wer.py b/nemo/collections/asr/metrics/rnnt_wer.py index fc083a2ab3f3..97c9c4575982 100644 --- a/nemo/collections/asr/metrics/rnnt_wer.py +++ b/nemo/collections/asr/metrics/rnnt_wer.py @@ -100,16 +100,16 @@ class AbstractRNNTDecoding(ConfidenceMixin): from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -140,7 +140,7 @@ class AbstractRNNTDecoding(ConfidenceMixin): timestep during greedy decoding. Setting to larger values allows longer sentences to be decoded, at the cost of increased execution time. preserve_frame_confidence: Same as above, overrides above value. - confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg. + confidence_method_cfg: Same as above, overrides confidence_cfg.method_cfg. "beam": beam_size: int, defining the beam size for beam search. Must be >= 1. @@ -277,7 +277,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): ), preserve_alignments=self.preserve_alignments, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) else: self.decoding = greedy_decode.GreedyTDTInfer( @@ -291,7 +291,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): ), preserve_alignments=self.preserve_alignments, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) else: self.decoding = greedy_decode.GreedyMultiblankRNNTInfer( @@ -304,7 +304,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): ), preserve_alignments=self.preserve_alignments, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) elif self.cfg.strategy == 'greedy_batch': @@ -320,7 +320,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): ), preserve_alignments=self.preserve_alignments, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) else: self.decoding = greedy_decode.GreedyBatchedTDTInfer( @@ -334,7 +334,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): ), preserve_alignments=self.preserve_alignments, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) else: @@ -348,7 +348,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int): ), preserve_alignments=self.preserve_alignments, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) elif self.cfg.strategy == 'beam': @@ -1005,16 +1005,16 @@ class RNNTDecoding(AbstractRNNTDecoding): from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -1047,7 +1047,7 @@ class RNNTDecoding(AbstractRNNTDecoding): preserve_frame_confidence: Same as above, overrides above value. - confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg. + confidence_method_cfg: Same as above, overrides confidence_cfg.method_cfg. "beam": beam_size: int, defining the beam size for beam search. Must be >= 1. diff --git a/nemo/collections/asr/metrics/rnnt_wer_bpe.py b/nemo/collections/asr/metrics/rnnt_wer_bpe.py index 40ae00b413b3..e8ea8f399b99 100644 --- a/nemo/collections/asr/metrics/rnnt_wer_bpe.py +++ b/nemo/collections/asr/metrics/rnnt_wer_bpe.py @@ -100,16 +100,16 @@ class RNNTBPEDecoding(AbstractRNNTDecoding): from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -142,7 +142,7 @@ class RNNTBPEDecoding(AbstractRNNTDecoding): preserve_frame_confidence: Same as above, overrides above value. - confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg. + confidence_method_cfg: Same as above, overrides confidence_cfg.method_cfg. "beam": beam_size: int, defining the beam size for beam search. Must be >= 1. diff --git a/nemo/collections/asr/metrics/wer.py b/nemo/collections/asr/metrics/wer.py index 7802e3b8a0c9..14fa46b308ab 100644 --- a/nemo/collections/asr/metrics/wer.py +++ b/nemo/collections/asr/metrics/wer.py @@ -258,16 +258,16 @@ class AbstractCTCDecoding(ConfidenceMixin): from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -300,7 +300,7 @@ class AbstractCTCDecoding(ConfidenceMixin): preserve_alignments: Same as above, overrides above value. compute_timestamps: Same as above, overrides above value. preserve_frame_confidence: Same as above, overrides above value. - confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg. + confidence_method_cfg: Same as above, overrides confidence_cfg.method_cfg. "beam": beam_size: int, defining the beam size for beam search. Must be >= 1. @@ -389,7 +389,7 @@ def __init__(self, decoding_cfg, blank_id: int): preserve_alignments=self.preserve_alignments, compute_timestamps=self.compute_timestamps, preserve_frame_confidence=self.preserve_frame_confidence, - confidence_measure_cfg=self.confidence_measure_cfg, + confidence_method_cfg=self.confidence_method_cfg, ) elif self.cfg.strategy == 'beam': @@ -1037,16 +1037,16 @@ class CTCDecoding(AbstractCTCDecoding): from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -1079,7 +1079,7 @@ class CTCDecoding(AbstractCTCDecoding): preserve_alignments: Same as above, overrides above value. compute_timestamps: Same as above, overrides above value. preserve_frame_confidence: Same as above, overrides above value. - confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg. + confidence_method_cfg: Same as above, overrides confidence_cfg.method_cfg. "beam": beam_size: int, defining the beam size for beam search. Must be >= 1. diff --git a/nemo/collections/asr/metrics/wer_bpe.py b/nemo/collections/asr/metrics/wer_bpe.py index 0a277e57e86a..b95bb62008ae 100644 --- a/nemo/collections/asr/metrics/wer_bpe.py +++ b/nemo/collections/asr/metrics/wer_bpe.py @@ -74,16 +74,16 @@ class CTCBPEDecoding(AbstractCTCDecoding): from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -116,7 +116,7 @@ class CTCBPEDecoding(AbstractCTCDecoding): preserve_alignments: Same as above, overrides above value. compute_timestamps: Same as above, overrides above value. preserve_frame_confidence: Same as above, overrides above value. - confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg. + confidence_method_cfg: Same as above, overrides confidence_cfg.method_cfg. "beam": beam_size: int, defining the beam size for beam search. Must be >= 1. diff --git a/nemo/collections/asr/models/confidence_ensemble.py b/nemo/collections/asr/models/confidence_ensemble.py index bf65ff96ef5c..dcbb0a05976c 100644 --- a/nemo/collections/asr/models/confidence_ensemble.py +++ b/nemo/collections/asr/models/confidence_ensemble.py @@ -25,7 +25,7 @@ from nemo.collections.asr.models.hybrid_rnnt_ctc_models import EncDecHybridRNNTCTCModel from nemo.collections.asr.parts.utils.asr_confidence_utils import ( ConfidenceConfig, - ConfidenceMeasureConfig, + ConfidenceMethodConfig, get_confidence_aggregation_bank, get_confidence_measure_bank, ) @@ -61,7 +61,7 @@ def to_confidence_config(self) -> ConfidenceConfig: return ConfidenceConfig( exclude_blank=self.exclude_blank, aggregation=self.aggregation, - measure_cfg=ConfidenceMeasureConfig( + method_cfg=ConfidenceMethodConfig( name=name, entropy_type=entropy_type, alpha=self.alpha, entropy_norm=entropy_norm, ), ) @@ -126,7 +126,7 @@ def compute_confidence(hypothesis: Hypothesis, confidence_cfg: ConfidenceConfig) hypothesis: generated hypothesis as returned from the transcribe method of the ASR model. confidence_cfg: confidence config specifying what kind of - measure/aggregation should be used. + method/aggregation should be used. Returns: float: confidence score. @@ -135,12 +135,12 @@ def compute_confidence(hypothesis: Hypothesis, confidence_cfg: ConfidenceConfig) filtered_logprobs = get_filtered_logprobs(hypothesis, confidence_cfg.exclude_blank) vocab_size = filtered_logprobs.shape[1] aggr_func = get_confidence_aggregation_bank()[confidence_cfg.aggregation] - if confidence_cfg.measure_cfg.name == "max_prob": + if confidence_cfg.method_cfg.name == "max_prob": conf_type = "max_prob" alpha = 1.0 else: - conf_type = f"entropy_{confidence_cfg.measure_cfg.entropy_type}_{confidence_cfg.measure_cfg.entropy_norm}" - alpha = confidence_cfg.measure_cfg.alpha + conf_type = f"entropy_{confidence_cfg.method_cfg.entropy_type}_{confidence_cfg.method_cfg.entropy_norm}" + alpha = confidence_cfg.method_cfg.alpha conf_func = get_confidence_measure_bank()[conf_type] conf_value = aggr_func(conf_func(filtered_logprobs, v=vocab_size, t=alpha)).cpu().item() diff --git a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py index 1f29a511fc9c..44ae9f4a134b 100644 --- a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py @@ -19,7 +19,7 @@ from omegaconf import DictConfig, OmegaConf from nemo.collections.asr.parts.utils import rnnt_utils -from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceMeasureConfig, ConfidenceMeasureMixin +from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceMethodConfig, ConfidenceMethodMixin from nemo.core.classes import Typing, typecheck from nemo.core.neural_types import HypothesisType, LengthsType, LogprobsType, NeuralType from nemo.utils import logging @@ -55,7 +55,7 @@ def _states_to_device(dec_state, device='cpu'): return dec_state -class GreedyCTCInfer(Typing, ConfidenceMeasureMixin): +class GreedyCTCInfer(Typing, ConfidenceMethodMixin): """A greedy CTC decoder. Provides a common abstraction for sample level and batch level greedy decoding. @@ -71,15 +71,15 @@ class GreedyCTCInfer(Typing, ConfidenceMeasureMixin): preserve_frame_confidence: Bool flag which preserves the history of per-frame confidence scores generated during decoding. When set to true, the Hypothesis will contain the non-null value for `frame_confidence` in it. Here, `frame_confidence` is a List of floats. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -130,7 +130,7 @@ def __init__( preserve_alignments: bool = False, compute_timestamps: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__() @@ -140,8 +140,8 @@ def __init__( self.compute_timestamps = compute_timestamps | preserve_frame_confidence self.preserve_frame_confidence = preserve_frame_confidence - # set confidence calculation measure - self._init_confidence_measure(confidence_measure_cfg) + # set confidence calculation method + self._init_confidence_method(confidence_method_cfg) @typecheck() def forward( @@ -253,27 +253,12 @@ class GreedyCTCInferConfig: preserve_alignments: bool = False compute_timestamps: bool = False preserve_frame_confidence: bool = False - confidence_measure_cfg: Optional[ConfidenceMeasureConfig] = ConfidenceMeasureConfig() - confidence_method_cfg: str = "DEPRECATED" + confidence_method_cfg: Optional[ConfidenceMethodConfig] = ConfidenceMethodConfig() def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed - self.confidence_measure_cfg = OmegaConf.structured( - self.confidence_measure_cfg - if isinstance(self.confidence_measure_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.confidence_measure_cfg) + self.confidence_method_cfg = OmegaConf.structured( + self.confidence_method_cfg + if isinstance(self.confidence_method_cfg, ConfidenceMethodConfig) + else ConfidenceMethodConfig(**self.confidence_method_cfg) ) - if self.confidence_method_cfg != "DEPRECATED": - logging.warning( - "`confidence_method_cfg` is deprecated and will be removed in the future. " - "Please use `confidence_measure_cfg` instead." - ) - - # TODO (alaptev): delete the following two lines sometime in the future - logging.warning("Re-writing `confidence_measure_cfg` with the value of `confidence_method_cfg`.") - # OmegaConf.structured ensures that post_init check is always executed - self.confidence_measure_cfg = OmegaConf.structured( - self.confidence_method_cfg - if isinstance(self.confidence_method_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.confidence_method_cfg) - ) diff --git a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py index dfa3ac27854b..185a3abf1151 100644 --- a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py @@ -35,7 +35,7 @@ from nemo.collections.asr.modules import rnnt_abstract from nemo.collections.asr.parts.utils import rnnt_utils -from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceMeasureConfig, ConfidenceMeasureMixin +from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceMethodConfig, ConfidenceMethodMixin from nemo.collections.common.parts.rnn import label_collate from nemo.core.classes import Typing, typecheck from nemo.core.neural_types import AcousticEncodedRepresentation, ElementType, HypothesisType, LengthsType, NeuralType @@ -69,7 +69,7 @@ def _states_to_device(dec_state, device='cpu'): return dec_state -class _GreedyRNNTInfer(Typing, ConfidenceMeasureMixin): +class _GreedyRNNTInfer(Typing, ConfidenceMethodMixin): """A greedy transducer decoder. Provides a common abstraction for sample level and batch level greedy decoding. @@ -96,15 +96,15 @@ class _GreedyRNNTInfer(Typing, ConfidenceMeasureMixin): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -154,7 +154,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__() self.decoder = decoder_model @@ -166,8 +166,8 @@ def __init__( self.preserve_alignments = preserve_alignments self.preserve_frame_confidence = preserve_frame_confidence - # set confidence calculation measure - self._init_confidence_measure(confidence_measure_cfg) + # set confidence calculation method + self._init_confidence_method(confidence_method_cfg) def __call__(self, *args, **kwargs): return self.forward(*args, **kwargs) @@ -263,15 +263,15 @@ class GreedyRNNTInfer(_GreedyRNNTInfer): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -305,7 +305,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__( decoder_model=decoder_model, @@ -314,7 +314,7 @@ def __init__( max_symbols_per_step=max_symbols_per_step, preserve_alignments=preserve_alignments, preserve_frame_confidence=preserve_frame_confidence, - confidence_measure_cfg=confidence_measure_cfg, + confidence_method_cfg=confidence_method_cfg, ) @typecheck() @@ -502,15 +502,15 @@ class GreedyBatchedRNNTInfer(_GreedyRNNTInfer): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -544,7 +544,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__( decoder_model=decoder_model, @@ -553,7 +553,7 @@ def __init__( max_symbols_per_step=max_symbols_per_step, preserve_alignments=preserve_alignments, preserve_frame_confidence=preserve_frame_confidence, - confidence_measure_cfg=confidence_measure_cfg, + confidence_method_cfg=confidence_method_cfg, ) # Depending on availability of `blank_as_pad` support @@ -1478,15 +1478,15 @@ class GreedyMultiblankRNNTInfer(GreedyRNNTInfer): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -1521,7 +1521,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__( decoder_model=decoder_model, @@ -1530,7 +1530,7 @@ def __init__( max_symbols_per_step=max_symbols_per_step, preserve_alignments=preserve_alignments, preserve_frame_confidence=preserve_frame_confidence, - confidence_measure_cfg=confidence_measure_cfg, + confidence_method_cfg=confidence_method_cfg, ) self.big_blank_durations = big_blank_durations self._SOS = blank_index - len(big_blank_durations) @@ -1682,15 +1682,15 @@ class GreedyBatchedMultiblankRNNTInfer(GreedyBatchedRNNTInfer): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -1725,7 +1725,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__( decoder_model=decoder_model, @@ -1734,7 +1734,7 @@ def __init__( max_symbols_per_step=max_symbols_per_step, preserve_alignments=preserve_alignments, preserve_frame_confidence=preserve_frame_confidence, - confidence_measure_cfg=confidence_measure_cfg, + confidence_method_cfg=confidence_method_cfg, ) self.big_blank_durations = big_blank_durations @@ -2203,31 +2203,15 @@ class GreedyRNNTInferConfig: max_symbols_per_step: Optional[int] = 10 preserve_alignments: bool = False preserve_frame_confidence: bool = False - confidence_measure_cfg: Optional[ConfidenceMeasureConfig] = ConfidenceMeasureConfig() - confidence_method_cfg: str = "DEPRECATED" + confidence_method_cfg: Optional[ConfidenceMethodConfig] = ConfidenceMethodConfig() def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed - self.confidence_measure_cfg = OmegaConf.structured( - self.confidence_measure_cfg - if isinstance(self.confidence_measure_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.confidence_measure_cfg) + self.confidence_method_cfg = OmegaConf.structured( + self.confidence_method_cfg + if isinstance(self.confidence_method_cfg, ConfidenceMethodConfig) + else ConfidenceMethodConfig(**self.confidence_method_cfg) ) - if self.confidence_method_cfg != "DEPRECATED": - logging.warning( - "`confidence_method_cfg` is deprecated and will be removed in the future. " - "Please use `confidence_measure_cfg` instead." - ) - - # TODO (alaptev): delete the following two lines sometime in the future - logging.warning("Re-writing `confidence_measure_cfg` with the value of `confidence_method_cfg`.") - # OmegaConf.structured ensures that post_init check is always executed - self.confidence_measure_cfg = OmegaConf.structured( - self.confidence_method_cfg - if isinstance(self.confidence_method_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.confidence_method_cfg) - ) - self.confidence_method_cfg = "DEPRECATED" @dataclass @@ -2235,31 +2219,15 @@ class GreedyBatchedRNNTInferConfig: max_symbols_per_step: Optional[int] = 10 preserve_alignments: bool = False preserve_frame_confidence: bool = False - confidence_measure_cfg: Optional[ConfidenceMeasureConfig] = ConfidenceMeasureConfig() - confidence_method_cfg: str = "DEPRECATED" + confidence_method_cfg: Optional[ConfidenceMethodConfig] = ConfidenceMethodConfig() def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed - self.confidence_measure_cfg = OmegaConf.structured( - self.confidence_measure_cfg - if isinstance(self.confidence_measure_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.confidence_measure_cfg) + self.confidence_method_cfg = OmegaConf.structured( + self.confidence_method_cfg + if isinstance(self.confidence_method_cfg, ConfidenceMethodConfig) + else ConfidenceMethodConfig(**self.confidence_method_cfg) ) - if self.confidence_method_cfg != "DEPRECATED": - logging.warning( - "`confidence_method_cfg` is deprecated and will be removed in the future. " - "Please use `confidence_measure_cfg` instead." - ) - - # TODO (alaptev): delete the following two lines sometime in the future - logging.warning("Re-writing `confidence_measure_cfg` with the value of `confidence_method_cfg`.") - # OmegaConf.structured ensures that post_init check is always executed - self.confidence_measure_cfg = OmegaConf.structured( - self.confidence_method_cfg - if isinstance(self.confidence_method_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.confidence_method_cfg) - ) - self.confidence_method_cfg = "DEPRECATED" class GreedyTDTInfer(_GreedyRNNTInfer): @@ -2288,15 +2256,15 @@ class GreedyTDTInfer(_GreedyRNNTInfer): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -2331,7 +2299,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__( decoder_model=decoder_model, @@ -2340,7 +2308,7 @@ def __init__( max_symbols_per_step=max_symbols_per_step, preserve_alignments=preserve_alignments, preserve_frame_confidence=preserve_frame_confidence, - confidence_measure_cfg=confidence_measure_cfg, + confidence_method_cfg=confidence_method_cfg, ) self.durations = durations @@ -2544,15 +2512,15 @@ class GreedyBatchedTDTInfer(_GreedyRNNTInfer): The length of the list corresponds to the Acoustic Length (T). Each value in the list (Ti) is a torch.Tensor (U), representing 1 or more confidence scores. U is the number of target tokens for the current timestep Ti. - confidence_measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + confidence_method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -2587,7 +2555,7 @@ def __init__( max_symbols_per_step: Optional[int] = None, preserve_alignments: bool = False, preserve_frame_confidence: bool = False, - confidence_measure_cfg: Optional[DictConfig] = None, + confidence_method_cfg: Optional[DictConfig] = None, ): super().__init__( decoder_model=decoder_model, @@ -2596,7 +2564,7 @@ def __init__( max_symbols_per_step=max_symbols_per_step, preserve_alignments=preserve_alignments, preserve_frame_confidence=preserve_frame_confidence, - confidence_measure_cfg=confidence_measure_cfg, + confidence_method_cfg=confidence_method_cfg, ) self.durations = durations diff --git a/nemo/collections/asr/parts/utils/asr_confidence_benchmarking_utils.py b/nemo/collections/asr/parts/utils/asr_confidence_benchmarking_utils.py index 958195a4bb11..0e057e012542 100644 --- a/nemo/collections/asr/parts/utils/asr_confidence_benchmarking_utils.py +++ b/nemo/collections/asr/parts/utils/asr_confidence_benchmarking_utils.py @@ -173,11 +173,11 @@ def apply_confidence_parameters(decoding_cfg, hp): """ new_decoding_cfg = copy.deepcopy(decoding_cfg) confidence_cfg_fields = ("aggregation", "exclude_blank") - confidence_measure_cfg_fields = ("name", "alpha", "entropy_type", "entropy_norm") + confidence_method_cfg_fields = ("name", "alpha", "entropy_type", "entropy_norm") with open_dict(new_decoding_cfg): for p, v in hp.items(): if p in confidence_cfg_fields: new_decoding_cfg.confidence_cfg[p] = v - elif p in confidence_measure_cfg_fields: - new_decoding_cfg.confidence_cfg.measure_cfg[p] = v + elif p in confidence_method_cfg_fields: + new_decoding_cfg.confidence_cfg.method_cfg[p] = v return new_decoding_cfg diff --git a/nemo/collections/asr/parts/utils/asr_confidence_utils.py b/nemo/collections/asr/parts/utils/asr_confidence_utils.py index 29c49529a509..ddfac3744c6a 100644 --- a/nemo/collections/asr/parts/utils/asr_confidence_utils.py +++ b/nemo/collections/asr/parts/utils/asr_confidence_utils.py @@ -25,7 +25,7 @@ from nemo.utils import logging -class ConfidenceMeasureConstants: +class ConfidenceMethodConstants: NAMES = ("max_prob", "entropy") ENTROPY_TYPES = ("gibbs", "tsallis", "renyi") ENTROPY_NORMS = ("lin", "exp") @@ -48,17 +48,17 @@ def print(cls): @dataclass -class ConfidenceMeasureConfig: - """A Config which contains the measure name and settings to compute per-frame confidence scores. +class ConfidenceMethodConfig: + """A Config which contains the method name and settings to compute per-frame confidence scores. Args: - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. entropy_type: Which type of entropy to use (str). - Used if confidence_measure_cfg.name is set to `entropy`. + Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -92,31 +92,25 @@ class ConfidenceMeasureConfig: def __post_init__(self): if self.temperature != "DEPRECATED": - logging.warning( - "`temperature` is deprecated and will be removed in the future. Please use `alpha` instead." - ) - - # TODO (alaptev): delete the following two lines sometime in the future - logging.warning("Re-writing `alpha` with the value of `temperature`.") # self.temperature has type str self.alpha = float(self.temperature) self.temperature = "DEPRECATED" - if self.name not in ConfidenceMeasureConstants.NAMES: + if self.name not in ConfidenceMethodConstants.NAMES: raise ValueError( f"`name` must be one of the following: " - f"{'`' + '`, `'.join(ConfidenceMeasureConstants.NAMES) + '`'}. Provided: `{self.name}`" + f"{'`' + '`, `'.join(ConfidenceMethodConstants.NAMES) + '`'}. Provided: `{self.name}`" ) - if self.entropy_type not in ConfidenceMeasureConstants.ENTROPY_TYPES: + if self.entropy_type not in ConfidenceMethodConstants.ENTROPY_TYPES: raise ValueError( f"`entropy_type` must be one of the following: " - f"{'`' + '`, `'.join(ConfidenceMeasureConstants.ENTROPY_TYPES) + '`'}. Provided: `{self.entropy_type}`" + f"{'`' + '`, `'.join(ConfidenceMethodConstants.ENTROPY_TYPES) + '`'}. Provided: `{self.entropy_type}`" ) if self.alpha <= 0.0: raise ValueError(f"`alpha` must be > 0. Provided: {self.alpha}") - if self.entropy_norm not in ConfidenceMeasureConstants.ENTROPY_NORMS: + if self.entropy_norm not in ConfidenceMethodConstants.ENTROPY_NORMS: raise ValueError( f"`entropy_norm` must be one of the following: " - f"{'`' + '`, `'.join(ConfidenceMeasureConstants.ENTROPY_NORMS) + '`'}. Provided: `{self.entropy_norm}`" + f"{'`' + '`, `'.join(ConfidenceMethodConstants.ENTROPY_NORMS) + '`'}. Provided: `{self.entropy_norm}`" ) @@ -142,15 +136,15 @@ class ConfidenceConfig: from the `token_confidence`. aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence. Valid options are `mean`, `min`, `max`, `prod`. - measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame + method_cfg: A dict-like object which contains the method name and settings to compute per-frame confidence scores. - name: The measure name (str). + name: The method name (str). Supported values: - 'max_prob' for using the maximum token probability as a confidence. - 'entropy' for using a normalized entropy of a log-likelihood vector. - entropy_type: Which type of entropy to use (str). Used if confidence_measure_cfg.name is set to `entropy`. + entropy_type: Which type of entropy to use (str). Used if confidence_method_cfg.name is set to `entropy`. Supported values: - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided, the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)). @@ -181,34 +175,19 @@ class ConfidenceConfig: preserve_word_confidence: bool = False exclude_blank: bool = True aggregation: str = "min" - measure_cfg: ConfidenceMeasureConfig = ConfidenceMeasureConfig() - method_cfg: str = "DEPRECATED" + method_cfg: ConfidenceMethodConfig = ConfidenceMethodConfig() def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed - self.measure_cfg = OmegaConf.structured( - self.measure_cfg - if isinstance(self.measure_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.measure_cfg) + self.method_cfg = OmegaConf.structured( + self.method_cfg + if isinstance(self.method_cfg, ConfidenceMethodConfig) + else ConfidenceMethodConfig(**self.method_cfg) ) - if self.method_cfg != "DEPRECATED": - logging.warning( - "`method_cfg` is deprecated and will be removed in the future. Please use `measure_cfg` instead." - ) - - # TODO (alaptev): delete the following two lines sometime in the future - logging.warning("Re-writing `measure_cfg` with the value of `method_cfg`.") - # OmegaConf.structured ensures that post_init check is always executed - self.measure_cfg = OmegaConf.structured( - self.method_cfg - if isinstance(self.method_cfg, ConfidenceMeasureConfig) - else ConfidenceMeasureConfig(**self.method_cfg) - ) - self.method_cfg = "DEPRECATED" if self.aggregation not in ConfidenceConstants.AGGREGATIONS: raise ValueError( f"`aggregation` has to be one of the following: " - f"{'`' + '`, `'.join(ConfidenceMeasureConstants.AGGREGATIONS) + '`'}. Provided: `{self.aggregation}`" + f"{'`' + '`, `'.join(ConfidenceConstants.AGGREGATIONS) + '`'}. Provided: `{self.aggregation}`" ) @@ -284,7 +263,7 @@ def entropy_gibbs_exp(x, v, t): def get_confidence_aggregation_bank(): """Generate a dictionary with confidence aggregation functions. - Supported confidence measures: + Supported confidence aggregation functions: min: minimum max: maximum mean: arithmetic mean @@ -305,26 +284,26 @@ def get_confidence_aggregation_bank(): return confidence_aggregation_bank -class ConfidenceMeasureMixin(ABC): - """Confidence Measure Mixin class. +class ConfidenceMethodMixin(ABC): + """Confidence Method Mixin class. - It initializes per-frame confidence measure. + It initializes per-frame confidence method. """ - def _init_confidence_measure(self, confidence_measure_cfg: Optional[DictConfig] = None): - """Initialize per-frame confidence measure from config. + def _init_confidence_method(self, confidence_method_cfg: Optional[DictConfig] = None): + """Initialize per-frame confidence method from config. """ # OmegaConf.structured ensures that post_init check is always executed - confidence_measure_cfg = OmegaConf.structured( - ConfidenceMeasureConfig() - if confidence_measure_cfg is None - else ConfidenceMeasureConfig(**confidence_measure_cfg) + confidence_method_cfg = OmegaConf.structured( + ConfidenceMethodConfig() + if confidence_method_cfg is None + else ConfidenceMethodConfig(**confidence_method_cfg) ) - # set confidence calculation measure + # set confidence calculation method # we suppose that self.blank_id == len(vocabulary) self.num_tokens = (self.blank_id if hasattr(self, "blank_id") else self._blank_index) + 1 - self.alpha = confidence_measure_cfg.alpha + self.alpha = confidence_method_cfg.alpha # init confidence measure bank self.confidence_measure_bank = get_confidence_measure_bank() @@ -332,14 +311,14 @@ def _init_confidence_measure(self, confidence_measure_cfg: Optional[DictConfig] measure = None # construct measure_name measure_name = "" - if confidence_measure_cfg.name == "max_prob": + if confidence_method_cfg.name == "max_prob": measure_name = "max_prob" - elif confidence_measure_cfg.name == "entropy": + elif confidence_method_cfg.name == "entropy": measure_name = '_'.join( - [confidence_measure_cfg.name, confidence_measure_cfg.entropy_type, confidence_measure_cfg.entropy_norm] + [confidence_method_cfg.name, confidence_method_cfg.entropy_type, confidence_method_cfg.entropy_norm] ) else: - raise ValueError(f"Unsupported `confidence_measure_cfg.name`: `{confidence_measure_cfg.name}`") + raise ValueError(f"Unsupported `confidence_method_cfg.name`: `{confidence_method_cfg.name}`") if measure_name not in self.confidence_measure_bank: raise ValueError(f"Unsupported measure setup: `{measure_name}`") measure = partial(self.confidence_measure_bank[measure_name], v=self.num_tokens, t=self.alpha) @@ -359,7 +338,7 @@ def _init_confidence(self, confidence_cfg: Optional[DictConfig] = None): confidence_cfg = OmegaConf.structured( ConfidenceConfig() if confidence_cfg is None else ConfidenceConfig(**confidence_cfg) ) - self.confidence_measure_cfg = confidence_cfg.measure_cfg + self.confidence_method_cfg = confidence_cfg.method_cfg # extract the config self.preserve_word_confidence = confidence_cfg.get('preserve_word_confidence', False) @@ -384,11 +363,11 @@ def _init_confidence(self, confidence_cfg: Optional[DictConfig] = None): if self.cfg.strategy in ['greedy', 'greedy_batch']: self.preserve_frame_confidence = self.cfg.greedy.get('preserve_frame_confidence', False) # OmegaConf.structured ensures that post_init check is always executed - confidence_measure_cfg = OmegaConf.structured(self.cfg.greedy).get('confidence_measure_cfg', None) - self.confidence_measure_cfg = ( - OmegaConf.structured(ConfidenceMeasureConfig()) - if confidence_measure_cfg is None - else OmegaConf.structured(ConfidenceMeasureConfig(**confidence_measure_cfg)) + confidence_method_cfg = OmegaConf.structured(self.cfg.greedy).get('confidence_method_cfg', None) + self.confidence_method_cfg = ( + OmegaConf.structured(ConfidenceMethodConfig()) + if confidence_method_cfg is None + else OmegaConf.structured(ConfidenceMethodConfig(**confidence_method_cfg)) ) @abstractmethod diff --git a/scripts/confidence_ensembles/build_ensemble.py b/scripts/confidence_ensembles/build_ensemble.py index bc32a4f99840..99bfa6187b30 100644 --- a/scripts/confidence_ensembles/build_ensemble.py +++ b/scripts/confidence_ensembles/build_ensemble.py @@ -97,7 +97,7 @@ ) from nemo.collections.asr.parts.utils.asr_confidence_utils import ( ConfidenceConfig, - ConfidenceMeasureConfig, + ConfidenceMethodConfig, get_confidence_aggregation_bank, get_confidence_measure_bank, ) @@ -214,7 +214,7 @@ class BuildEnsembleConfig: preserve_frame_confidence=True, exclude_blank=True, aggregation="mean", - measure_cfg=ConfidenceMeasureConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), + method_cfg=ConfidenceMethodConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), ) temperature: float = 1.0 diff --git a/scripts/confidence_ensembles/ensemble_config.yaml b/scripts/confidence_ensembles/ensemble_config.yaml index 590318ee3b28..8184d4d5acb5 100644 --- a/scripts/confidence_ensembles/ensemble_config.yaml +++ b/scripts/confidence_ensembles/ensemble_config.yaml @@ -16,7 +16,7 @@ temperature: 1.0 confidence: exclude_blank: True aggregation: mean - measure_cfg: + method_cfg: name: entropy entropy_type: renyi alpha: 0.25 diff --git a/scripts/speech_recognition/confidence/benchmark_asr_confidence.py b/scripts/speech_recognition/confidence/benchmark_asr_confidence.py index 8922fe09176d..246aa61c2c0e 100644 --- a/scripts/speech_recognition/confidence/benchmark_asr_confidence.py +++ b/scripts/speech_recognition/confidence/benchmark_asr_confidence.py @@ -83,11 +83,11 @@ def get_experiment_params(cfg): """ blank = "no_blank" if cfg.exclude_blank else "blank" aggregation = cfg.aggregation - method_name = cfg.measure_cfg.name - alpha = cfg.measure_cfg.alpha + method_name = cfg.method_cfg.name + alpha = cfg.method_cfg.alpha if method_name == "entropy": - entropy_type = cfg.measure_cfg.entropy_type - entropy_norm = cfg.measure_cfg.entropy_norm + entropy_type = cfg.method_cfg.entropy_type + entropy_norm = cfg.method_cfg.entropy_norm experiment_param_list = [ aggregation, str(cfg.exclude_blank), diff --git a/tests/collections/asr/confidence/test_asr_confidence.py b/tests/collections/asr/confidence/test_asr_confidence.py index 11b127424908..e95a0bd8127b 100644 --- a/tests/collections/asr/confidence/test_asr_confidence.py +++ b/tests/collections/asr/confidence/test_asr_confidence.py @@ -106,32 +106,21 @@ def test_run_confidence_benchmark( @pytest.mark.integration @pytest.mark.with_downloads @pytest.mark.parametrize('model_name', ("ctc", "rnnt")) - @pytest.mark.parametrize('arg', ("method_cfg", "temperature", "all")) - def test_deprecated_config_args(self, model_name, arg, conformer_ctc_bpe_model, conformer_rnnt_bpe_model): - assert ConfidenceConfig().measure_cfg.alpha == 0.33, "default `alpha` is supposed to be 0.33" + def test_deprecated_config_args(self, model_name, conformer_ctc_bpe_model, conformer_rnnt_bpe_model): + assert ConfidenceConfig().method_cfg.alpha == 0.33, "default `alpha` is supposed to be 0.33" model = conformer_ctc_bpe_model if model_name == "ctc" else conformer_rnnt_bpe_model assert isinstance(model, ASRModel) - if arg == "all": - conf = OmegaConf.create({"temperature": 0.5}) - test_args_main = {"method_cfg": conf} - test_args_greedy = {"confidence_method_cfg": conf} - elif arg == "method_cfg": - conf = OmegaConf.create({"alpha": 0.5}) - test_args_main = {"method_cfg": conf} - test_args_greedy = {"confidence_method_cfg": conf} - elif arg == "temperature": - conf = OmegaConf.create({"temperature": 0.5}) - test_args_main = {"measure_cfg": conf} - test_args_greedy = {"confidence_measure_cfg": conf} - else: - raise NotImplementedError(arg) + + conf = OmegaConf.create({"temperature": 0.5}) + test_args_main = {"method_cfg": conf} + test_args_greedy = {"confidence_method_cfg": conf} confidence_cfg = ConfidenceConfig(preserve_word_confidence=True, **test_args_main) model.change_decoding_strategy( RNNTDecodingConfig(fused_batch_size=-1, strategy="greedy", confidence_cfg=confidence_cfg) if model_name == "rnnt" else CTCDecodingConfig(confidence_cfg=confidence_cfg) ) - assert model.cfg.decoding.confidence_cfg.measure_cfg.alpha == 0.5 + assert model.cfg.decoding.confidence_cfg.method_cfg.alpha == 0.5 model.change_decoding_strategy( RNNTDecodingConfig( fused_batch_size=-1, @@ -141,4 +130,4 @@ def test_deprecated_config_args(self, model_name, arg, conformer_ctc_bpe_model, if model_name == "rnnt" else CTCDecodingConfig(greedy=GreedyCTCInferConfig(preserve_frame_confidence=True, **test_args_greedy)) ) - assert model.cfg.decoding.greedy.confidence_measure_cfg.alpha == 0.5 + assert model.cfg.decoding.greedy.confidence_method_cfg.alpha == 0.5 diff --git a/tests/collections/asr/test_asr_hybrid_rnnt_ctc_model_char.py b/tests/collections/asr/test_asr_hybrid_rnnt_ctc_model_char.py index 8687ed683833..22926b6516ee 100644 --- a/tests/collections/asr/test_asr_hybrid_rnnt_ctc_model_char.py +++ b/tests/collections/asr/test_asr_hybrid_rnnt_ctc_model_char.py @@ -242,8 +242,7 @@ def test_decoding_change(self, hybrid_asr_model): @pytest.mark.unit def test_GreedyRNNTInferConfig(self): - # confidence_method_cfg is deprecated - IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index', 'confidence_method_cfg'] + IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index'] result = assert_dataclass_signature_match( greedy_decode.GreedyRNNTInfer, greedy_decode.GreedyRNNTInferConfig, ignore_args=IGNORE_ARGS @@ -257,8 +256,7 @@ def test_GreedyRNNTInferConfig(self): @pytest.mark.unit def test_GreedyBatchedRNNTInferConfig(self): - # confidence_method_cfg is deprecated - IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index', 'confidence_method_cfg'] + IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index'] result = assert_dataclass_signature_match( greedy_decode.GreedyBatchedRNNTInfer, greedy_decode.GreedyBatchedRNNTInferConfig, ignore_args=IGNORE_ARGS diff --git a/tests/collections/asr/test_asr_rnnt_encdec_model.py b/tests/collections/asr/test_asr_rnnt_encdec_model.py index 775a146c74c4..68f1e38f797b 100644 --- a/tests/collections/asr/test_asr_rnnt_encdec_model.py +++ b/tests/collections/asr/test_asr_rnnt_encdec_model.py @@ -242,8 +242,7 @@ def test_decoding_change(self, asr_model): @pytest.mark.unit def test_GreedyRNNTInferConfig(self): - # confidence_method_cfg is deprecated - IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index', 'confidence_method_cfg'] + IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index'] result = assert_dataclass_signature_match( greedy_decode.GreedyRNNTInfer, greedy_decode.GreedyRNNTInferConfig, ignore_args=IGNORE_ARGS @@ -257,8 +256,7 @@ def test_GreedyRNNTInferConfig(self): @pytest.mark.unit def test_GreedyBatchedRNNTInferConfig(self): - # confidence_method_cfg is deprecated - IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index', 'confidence_method_cfg'] + IGNORE_ARGS = ['decoder_model', 'joint_model', 'blank_index'] result = assert_dataclass_signature_match( greedy_decode.GreedyBatchedRNNTInfer, greedy_decode.GreedyBatchedRNNTInferConfig, ignore_args=IGNORE_ARGS diff --git a/tests/collections/asr/test_confidence_ensembles.py b/tests/collections/asr/test_confidence_ensembles.py index b8b027dd3426..e926475009e2 100644 --- a/tests/collections/asr/test_confidence_ensembles.py +++ b/tests/collections/asr/test_confidence_ensembles.py @@ -19,7 +19,7 @@ from nemo.collections.asr.metrics.wer import CTCDecodingConfig from nemo.collections.asr.models import EncDecCTCModel, EncDecHybridRNNTCTCModel, EncDecRNNTModel from nemo.collections.asr.models.confidence_ensemble import ConfidenceEnsembleModel -from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceConfig, ConfidenceMeasureConfig +from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceConfig, ConfidenceMethodConfig def get_model_config(model_class): @@ -117,7 +117,7 @@ def test_model_creation_2models(self, tmp_path, model_class0, model_class1): preserve_frame_confidence=True, exclude_blank=True, aggregation="mean", - measure_cfg=ConfidenceMeasureConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), + method_cfg=ConfidenceMethodConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), ) # just checking that no errors are raised when creating the model @@ -148,7 +148,7 @@ def test_model_creation_5models(self, tmp_path): preserve_frame_confidence=True, exclude_blank=True, aggregation="mean", - measure_cfg=ConfidenceMeasureConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), + method_cfg=ConfidenceMethodConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), ) # just checking that no errors are raised when creating the model diff --git a/tutorials/asr/ASR_Confidence_Estimation.ipynb b/tutorials/asr/ASR_Confidence_Estimation.ipynb index 2a1ad024a889..7a92ed026f07 100644 --- a/tutorials/asr/ASR_Confidence_Estimation.ipynb +++ b/tutorials/asr/ASR_Confidence_Estimation.ipynb @@ -443,8 +443,8 @@ "from nemo.collections.asr.parts.utils.asr_confidence_utils import (\n", " ConfidenceConfig,\n", " ConfidenceConstants,\n", - " ConfidenceMeasureConfig,\n", - " ConfidenceMeasureConstants,\n", + " ConfidenceMethodConfig,\n", + " ConfidenceMethodConstants,\n", ")\n", "from nemo.collections.asr.parts.utils.asr_confidence_benchmarking_utils import (\n", " apply_confidence_parameters,\n", @@ -454,11 +454,11 @@ ")\n", "\n", "\n", - "# List allowed options for ConfidenceMeasureConfig and ConfidenceConfig\n", - "print(f\"Allowed options for ConfidenceMeasureConfig: {ConfidenceMeasureConstants.print()}\\n\")\n", + "# List allowed options for ConfidenceMethodConfig and ConfidenceConfig\n", + "print(f\"Allowed options for ConfidenceMethodConfig: {ConfidenceMethodConstants.print()}\\n\")\n", "print(f\"Allowed options for ConfidenceConfig: {ConfidenceConstants.print()}\\n\")\n", "\n", - "# Initialize ConfidenceConfig and ConfidenceMeasureConfig\n", + "# Initialize ConfidenceConfig and ConfidenceMethodConfig\n", "confidence_cfg = ConfidenceConfig(\n", " preserve_frame_confidence=True, # Internally set to true if preserve_token_confidence == True\n", " # or preserve_word_confidence == True\n", @@ -466,7 +466,7 @@ " preserve_word_confidence=True,\n", " aggregation=\"prod\", # How to aggregate frame scores to token scores and token scores to word scores\n", " exclude_blank=False, # If true, only non-blank emissions contribute to confidence scores\n", - " measure_cfg=ConfidenceMeasureConfig( # Config for per-frame scores calculation (before aggregation)\n", + " method_cfg=ConfidenceMethodConfig( # Config for per-frame scores calculation (before aggregation)\n", " name=\"max_prob\", # Or \"entropy\" (default), which usually works better\n", " entropy_type=\"gibbs\", # Used only for name == \"entropy\". Recommended: \"tsallis\" (default) or \"renyi\"\n", " alpha=0.5, # Low values (<1) increase sensitivity, high values decrease sensitivity\n", @@ -1058,7 +1058,7 @@ " preserve_word_confidence=True,\n", " preserve_token_confidence=True,\n", " aggregation=\"min\",\n", - " measure_cfg=DictConfig({\"entropy_type\": \"tsallis\", \"alpha\": 1.5, \"entropy_norm\": \"lin\"}),\n", + " method_cfg=DictConfig({\"entropy_type\": \"tsallis\", \"alpha\": 1.5, \"entropy_norm\": \"lin\"}),\n", ")\n", "\n", "model.change_decoding_strategy(\n", From 3b45cbe77c6ade484e4e867f52624b20e1e3126a Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Wed, 20 Sep 2023 01:01:14 +1000 Subject: [PATCH 036/112] Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/tts/datasets.rst | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/docs/source/tts/datasets.rst b/docs/source/tts/datasets.rst index b5317ce01f64..dabf50b30dae 100644 --- a/docs/source/tts/datasets.rst +++ b/docs/source/tts/datasets.rst @@ -172,18 +172,24 @@ SFSpeech Chinese/English Bilingual Speech ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Dataset URL: https://catalog.ngc.nvidia.com/orgs/nvidia/resources/sf_bilingual_speech_zh_en * Dataset Processing Script: https://github.com/NVIDIA/NeMo/tree/stable/scripts/dataset_processing/tts/sfbilingual/get_data.py -* Command Line Instruction: +* Command Line Instruction: please refer details in Section 1 (NGC Registry CLI installation), Section 2 (Downloading SFSpeech Dataset), and Section 3 (Creatiung Data Manifests) from https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb. Below code block briefly describes the steps. .. code-block:: bash + # [prerequisite] Install and setup 'ngc' cli tool by following document https://docs.ngc.nvidia.com/cli/cmd.html + + $ ngc registry resource download-version "nvidia/sf_bilingual_speech_zh_en:v1" + + $ unzip sf_bilingual_speech_zh_en_vv1/SF_bilingual.zip -d + $ python scripts/dataset_processing/tts/sfbilingual/get_data.py \ - --data-root \ - --val-size 0.1 \ - --test-size 0.2 \ + --data-root /SF_bilingual \ + --val-size 0.005 \ + --test-size 0.01 \ --seed-for-ds-split 100 $ python scripts/dataset_processing/tts/extract_sup_data.py \ --config-path sfbilingual/ds_conf \ --config-name ds_for_fastpitch_align.yaml \ manifest_filepath= \ - sup_data_path= \ No newline at end of file + sup_data_path= From b9ddba746c03dcd4b6ef0b999279376bf7a1ed1b Mon Sep 17 00:00:00 2001 From: Aleksandr Laptev Date: Tue, 19 Sep 2023 22:53:14 +0700 Subject: [PATCH 037/112] RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev * tests added Signed-off-by: Aleksandr Laptev --------- Signed-off-by: Aleksandr Laptev Signed-off-by: Sasha Meister --- .../parts/submodules/rnnt_greedy_decoding.py | 278 ++++++++---------- .../asr/test_asr_rnnt_encdec_model.py | 238 ++++++++++++++- 2 files changed, 363 insertions(+), 153 deletions(-) diff --git a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py index 185a3abf1151..95b0bdf5fd13 100644 --- a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py @@ -441,13 +441,6 @@ def _greedy_decode( # If blank token is predicted, exit inner loop, move onto next timestep t if k == self._blank_index: not_blank = False - - if self.preserve_alignments: - # convert Ti-th logits into a torch array - hypothesis.alignments.append([]) # blank buffer for next timestep - - if self.preserve_frame_confidence: - hypothesis.frame_confidence.append([]) # blank buffer for next timestep else: # Append token to label set, update RNN state. hypothesis.y_sequence.append(k) @@ -459,6 +452,13 @@ def _greedy_decode( # Increment token counter. symbols_added += 1 + if self.preserve_alignments: + # convert Ti-th logits into a torch array + hypothesis.alignments.append([]) # blank buffer for next timestep + + if self.preserve_frame_confidence: + hypothesis.frame_confidence.append([]) # blank buffer for next timestep + # Remove trailing empty list of Alignments if self.preserve_alignments: if len(hypothesis.alignments[-1]) == 0: @@ -642,9 +642,6 @@ def _greedy_decode_blank_as_pad( # frame_confidence is a 3-dimensional dangling list representing B x T x U for hyp in hypotheses: hyp.frame_confidence = [[]] - hyp.y_3best = [[]] - hyp.frame_confidence_3best = [[[]]] - hyp.logp = [[]] # Last Label buffer + Last Label without blank buffer # batch level equivalent of the last_label @@ -731,32 +728,6 @@ def _greedy_decode_blank_as_pad( # This is equivalent to if single sample predicted k if all_blanks: not_blank = False - - # If preserving alignments, convert the current Uj alignments into a torch.Tensor - # Then preserve U at current timestep Ti - # Finally, forward the timestep history to Ti+1 for that sample - # All of this should only be done iff the current time index <= sample-level AM length. - # Otherwise ignore and move to next sample / next timestep. - if self.preserve_alignments: - - # convert Ti-th logits into a torch array - for batch_idx in range(batchsize): - - # this checks if current timestep <= sample-level AM length - # If current timestep > sample-level AM length, no alignments will be added - # Therefore the list of Uj alignments is empty here. - if len(hypotheses[batch_idx].alignments[-1]) > 0: - hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep - - # Do the same if preserving per-frame confidence - if self.preserve_frame_confidence: - - for batch_idx in range(batchsize): - if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: - hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep - hypotheses[batch_idx].y_3best.append([]) - hypotheses[batch_idx].frame_confidence_3best.append([]) - hypotheses[batch_idx].logp.append([]) else: # Collect batch indices where blanks occurred now/past blank_indices = (blank_mask == 1).nonzero(as_tuple=False) @@ -791,6 +762,29 @@ def _greedy_decode_blank_as_pad( hypotheses[kidx].score += float(v[kidx]) symbols_added += 1 + # If preserving alignments, convert the current Uj alignments into a torch.Tensor + # Then preserve U at current timestep Ti + # Finally, forward the timestep history to Ti+1 for that sample + # All of this should only be done iff the current time index <= sample-level AM length. + # Otherwise ignore and move to next sample / next timestep. + if self.preserve_alignments: + + # convert Ti-th logits into a torch array + for batch_idx in range(batchsize): + + # this checks if current timestep <= sample-level AM length + # If current timestep > sample-level AM length, no alignments will be added + # Therefore the list of Uj alignments is empty here. + if len(hypotheses[batch_idx].alignments[-1]) > 0: + hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep + + # Do the same if preserving per-frame confidence + if self.preserve_frame_confidence: + + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: + hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep + # Remove trailing empty list of alignments at T_{am-len} x Uj if self.preserve_alignments: for batch_idx in range(batchsize): @@ -802,9 +796,6 @@ def _greedy_decode_blank_as_pad( for batch_idx in range(batchsize): if len(hypotheses[batch_idx].frame_confidence[-1]) == 0: del hypotheses[batch_idx].frame_confidence[-1] - del hypotheses[batch_idx].y_3best[-1] - del hypotheses[batch_idx].frame_confidence_3best[-1] - del hypotheses[batch_idx].logp[-1] # Preserve states for batch_idx in range(batchsize): @@ -946,29 +937,6 @@ def _greedy_decode_masked( # This is equivalent to if single sample predicted k if blank_mask.all(): not_blank = False - - # If preserving alignments, convert the current Uj alignments into a torch.Tensor - # Then preserve U at current timestep Ti - # Finally, forward the timestep history to Ti+1 for that sample - # All of this should only be done iff the current time index <= sample-level AM length. - # Otherwise ignore and move to next sample / next timestep. - if self.preserve_alignments: - - # convert Ti-th logits into a torch array - for batch_idx in range(batchsize): - - # this checks if current timestep <= sample-level AM length - # If current timestep > sample-level AM length, no alignments will be added - # Therefore the list of Uj alignments is empty here. - if len(hypotheses[batch_idx].alignments[-1]) > 0: - hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep - - # Do the same if preserving per-frame confidence - if self.preserve_frame_confidence: - - for batch_idx in range(batchsize): - if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: - hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep else: # Collect batch indices where blanks occurred now/past blank_indices = (blank_mask == 1).nonzero(as_tuple=False) @@ -1004,6 +972,29 @@ def _greedy_decode_masked( symbols_added += 1 + # If preserving alignments, convert the current Uj alignments into a torch.Tensor + # Then preserve U at current timestep Ti + # Finally, forward the timestep history to Ti+1 for that sample + # All of this should only be done iff the current time index <= sample-level AM length. + # Otherwise ignore and move to next sample / next timestep. + if self.preserve_alignments: + + # convert Ti-th logits into a torch array + for batch_idx in range(batchsize): + + # this checks if current timestep <= sample-level AM length + # If current timestep > sample-level AM length, no alignments will be added + # Therefore the list of Uj alignments is empty here. + if len(hypotheses[batch_idx].alignments[-1]) > 0: + hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep + + # Do the same if preserving per-frame confidence + if self.preserve_frame_confidence: + + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: + hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep + # Remove trailing empty list of alignments at T_{am-len} x Uj if self.preserve_alignments: for batch_idx in range(batchsize): @@ -1624,13 +1615,6 @@ def _greedy_decode( # If any type of blank token is predicted, exit inner loop, move onto next timestep t if k >= self._blank_index - len(self.big_blank_durations): not_blank = False - - if self.preserve_alignments: - # convert Ti-th logits into a torch array - hypothesis.alignments.append([]) # blank buffer for next timestep - - if self.preserve_frame_confidence: - hypothesis.frame_confidence.append([]) # blank buffer for next timestep else: # Append token to label set, update RNN state. hypothesis.y_sequence.append(k) @@ -1642,6 +1626,13 @@ def _greedy_decode( # Increment token counter. symbols_added += 1 + if self.preserve_alignments: + # convert Ti-th logits into a torch array + hypothesis.alignments.append([]) # blank buffer for next timestep + + if self.preserve_frame_confidence: + hypothesis.frame_confidence.append([]) # blank buffer for next timestep + # Remove trailing empty list of Alignments if self.preserve_alignments: if len(hypothesis.alignments[-1]) == 0: @@ -1781,9 +1772,6 @@ def _greedy_decode_blank_as_pad( # frame_confidence is a 3-dimensional dangling list representing B x T x U for hyp in hypotheses: hyp.frame_confidence = [[]] - hyp.y_3best = [[]] - hyp.frame_confidence_3best = [[[]]] - hyp.logp = [[]] # Last Label buffer + Last Label without blank buffer # batch level equivalent of the last_label @@ -1897,40 +1885,6 @@ def _greedy_decode_blank_as_pad( # This is equivalent to if single sample predicted k if blank_mask.all(): not_blank = False - - for i in range(len(big_blank_masks) + 1): - # The task here is find the shortest blank duration of all batches. - # so we start from the shortest blank duration and go up, - # and stop once we found the duration whose corresponding mask isn't all True. - if i == len(big_blank_masks) or not big_blank_masks[i].all(): - big_blank_duration = self.big_blank_durations[i - 1] if i > 0 else 1 - break - - # If preserving alignments, convert the current Uj alignments into a torch.Tensor - # Then preserve U at current timestep Ti - # Finally, forward the timestep history to Ti+1 for that sample - # All of this should only be done iff the current time index <= sample-level AM length. - # Otherwise ignore and move to next sample / next timestep. - if self.preserve_alignments: - - # convert Ti-th logits into a torch array - for batch_idx in range(batchsize): - - # this checks if current timestep <= sample-level AM length - # If current timestep > sample-level AM length, no alignments will be added - # Therefore the list of Uj alignments is empty here. - if len(hypotheses[batch_idx].alignments[-1]) > 0: - hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep - - # Do the same if preserving per-frame confidence - if self.preserve_frame_confidence: - - for batch_idx in range(batchsize): - if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: - hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep - hypotheses[batch_idx].y_3best.append([]) - hypotheses[batch_idx].frame_confidence_3best.append([]) - hypotheses[batch_idx].logp.append([]) else: # Collect batch indices where blanks occurred now/past blank_indices = (blank_mask == 1).nonzero(as_tuple=False) @@ -1966,6 +1920,37 @@ def _greedy_decode_blank_as_pad( symbols_added += 1 + for i in range(len(big_blank_masks) + 1): + # The task here is find the shortest blank duration of all batches. + # so we start from the shortest blank duration and go up, + # and stop once we found the duration whose corresponding mask isn't all True. + if i == len(big_blank_masks) or not big_blank_masks[i].all(): + big_blank_duration = self.big_blank_durations[i - 1] if i > 0 else 1 + break + + # If preserving alignments, convert the current Uj alignments into a torch.Tensor + # Then preserve U at current timestep Ti + # Finally, forward the timestep history to Ti+1 for that sample + # All of this should only be done iff the current time index <= sample-level AM length. + # Otherwise ignore and move to next sample / next timestep. + if self.preserve_alignments: + + # convert Ti-th logits into a torch array + for batch_idx in range(batchsize): + + # this checks if current timestep <= sample-level AM length + # If current timestep > sample-level AM length, no alignments will be added + # Therefore the list of Uj alignments is empty here. + if len(hypotheses[batch_idx].alignments[-1]) > 0: + hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep + + # Do the same if preserving per-frame confidence + if self.preserve_frame_confidence: + + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: + hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep + # Remove trailing empty list of alignments at T_{am-len} x Uj if self.preserve_alignments: for batch_idx in range(batchsize): @@ -1977,9 +1962,6 @@ def _greedy_decode_blank_as_pad( for batch_idx in range(batchsize): if len(hypotheses[batch_idx].frame_confidence[-1]) == 0: del hypotheses[batch_idx].frame_confidence[-1] - del hypotheses[batch_idx].y_3best[-1] - del hypotheses[batch_idx].frame_confidence_3best[-1] - del hypotheses[batch_idx].logp[-1] # Preserve states for batch_idx in range(batchsize): @@ -2121,29 +2103,6 @@ def _greedy_decode_masked( # This is equivalent to if single sample predicted k if blank_mask.all(): not_blank = False - - # If preserving alignments, convert the current Uj alignments into a torch.Tensor - # Then preserve U at current timestep Ti - # Finally, forward the timestep history to Ti+1 for that sample - # All of this should only be done iff the current time index <= sample-level AM length. - # Otherwise ignore and move to next sample / next timestep. - if self.preserve_alignments: - - # convert Ti-th logits into a torch array - for batch_idx in range(batchsize): - - # this checks if current timestep <= sample-level AM length - # If current timestep > sample-level AM length, no alignments will be added - # Therefore the list of Uj alignments is empty here. - if len(hypotheses[batch_idx].alignments[-1]) > 0: - hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep - - # Do the same if preserving per-frame confidence - if self.preserve_frame_confidence: - - for batch_idx in range(batchsize): - if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: - hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep else: # Collect batch indices where blanks occurred now/past blank_indices = (blank_mask == 1).nonzero(as_tuple=False) @@ -2179,6 +2138,29 @@ def _greedy_decode_masked( symbols_added += 1 + # If preserving alignments, convert the current Uj alignments into a torch.Tensor + # Then preserve U at current timestep Ti + # Finally, forward the timestep history to Ti+1 for that sample + # All of this should only be done iff the current time index <= sample-level AM length. + # Otherwise ignore and move to next sample / next timestep. + if self.preserve_alignments: + + # convert Ti-th logits into a torch array + for batch_idx in range(batchsize): + + # this checks if current timestep <= sample-level AM length + # If current timestep > sample-level AM length, no alignments will be added + # Therefore the list of Uj alignments is empty here. + if len(hypotheses[batch_idx].alignments[-1]) > 0: + hypotheses[batch_idx].alignments.append([]) # blank buffer for next timestep + + # Do the same if preserving per-frame confidence + if self.preserve_frame_confidence: + + for batch_idx in range(batchsize): + if len(hypotheses[batch_idx].frame_confidence[-1]) > 0: + hypotheses[batch_idx].frame_confidence.append([]) # blank buffer for next timestep + # Remove trailing empty list of alignments at T_{am-len} x Uj if self.preserve_alignments: for batch_idx in range(batchsize): @@ -2443,19 +2425,6 @@ def _greedy_decode( # If blank token is predicted, exit inner loop, move onto next timestep t if k == self._blank_index: not_blank = False - - # this rarely happens, but we manually increment the `skip` number - # if blank is emitted and duration=0 is predicted. This prevents possible - # infinite loops. - if skip == 0: - skip = 1 - - if self.preserve_alignments: - # convert Ti-th logits into a torch array - hypothesis.alignments.append([]) # blank buffer for next timestep - - if self.preserve_frame_confidence: - hypothesis.frame_confidence.append([]) # blank buffer for next timestep else: # Append token to label set, update RNN state. hypothesis.y_sequence.append(k) @@ -2469,6 +2438,19 @@ def _greedy_decode( time_idx += skip need_loop = skip == 0 + # this rarely happens, but we manually increment the `skip` number + # if blank is emitted and duration=0 is predicted. This prevents possible + # infinite loops. + if skip == 0: + skip = 1 + + if self.preserve_alignments: + # convert Ti-th logits into a torch array + hypothesis.alignments.append([]) # blank buffer for next timestep + + if self.preserve_frame_confidence: + hypothesis.frame_confidence.append([]) # blank buffer for next timestep + if symbols_added == self.max_symbols: time_idx += 1 @@ -2652,9 +2634,6 @@ def _greedy_decode_blank_as_pad( # frame_confidence is a 3-dimensional dangling list representing B x T x U for hyp in hypotheses: hyp.frame_confidence = [[]] - hyp.y_3best = [[]] - hyp.frame_confidence_3best = [[[]]] - hyp.logp = [[]] # Last Label buffer + Last Label without blank buffer # batch level equivalent of the last_label @@ -2781,9 +2760,6 @@ def _greedy_decode_blank_as_pad( for batch_idx in range(batchsize): if len(hypotheses[batch_idx].frame_confidence[-1]) == 0: del hypotheses[batch_idx].frame_confidence[-1] - del hypotheses[batch_idx].y_3best[-1] - del hypotheses[batch_idx].frame_confidence_3best[-1] - del hypotheses[batch_idx].logp[-1] # Preserve states for batch_idx in range(batchsize): diff --git a/tests/collections/asr/test_asr_rnnt_encdec_model.py b/tests/collections/asr/test_asr_rnnt_encdec_model.py index 68f1e38f797b..12e08006a3e4 100644 --- a/tests/collections/asr/test_asr_rnnt_encdec_model.py +++ b/tests/collections/asr/test_asr_rnnt_encdec_model.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. import copy +from typing import Any, Dict, List, Optional, Tuple import pytest import torch @@ -31,6 +32,71 @@ ) or numba_utils.numba_cuda_is_supported(__NUMBA_MINIMUM_VERSION__) +@pytest.fixture() +def max_symbols_setup(): + from nemo.collections.asr.modules.rnnt_abstract import AbstractRNNTDecoder, AbstractRNNTJoint + from nemo.collections.asr.parts.utils.rnnt_utils import Hypothesis + + class DummyRNNTDecoder(AbstractRNNTDecoder): + def predict( + self, + y: Optional[torch.Tensor] = None, + state: Optional[torch.Tensor] = None, + add_sos: bool = False, + batch_size: Optional[int] = None, + ) -> Tuple[torch.Tensor, List[torch.Tensor]]: + if y is not None and state is not None: + return (y + state) / 2, y * state + elif state is not None: + return torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat(state.size()), state + elif y is not None: + return y, torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat(y.size()) + return ( + torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat([1, 1, 1]), + torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat([1, 1, 1]), + ) + + def initialize_state(self, y: torch.Tensor) -> List[torch.Tensor]: + return [torch.tensor()] + + def score_hypothesis( + self, hypothesis: Hypothesis, cache: Dict[Tuple[int], Any] + ) -> Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor]: + return torch.tensor(), [torch.tensor()], torch.tensor() + + def batch_select_state(self, batch_states: List[torch.Tensor], idx: int) -> List[List[torch.Tensor]]: + if batch_states is not None: + states = batch_states[0][idx] + states = states.long() + return [states] + else: + return None + + def batch_copy_states( + self, + old_states: List[torch.Tensor], + new_states: List[torch.Tensor], + ids: List[int], + value: Optional[float] = None, + ) -> List[torch.Tensor]: + if value is None: + old_states[0][ids, :] = new_states[0][ids, :] + + return old_states + + class DummyRNNTJoint(AbstractRNNTJoint): + def joint(self, f: torch.Tensor, g: torch.Tensor) -> torch.Tensor: + return f.unsqueeze(dim=2) + g.unsqueeze(dim=1) + + setup = {} + setup["decoder"] = DummyRNNTDecoder(vocab_size=2, blank_idx=2, blank_as_pad=True) + setup["decoder_masked"] = DummyRNNTDecoder(vocab_size=2, blank_idx=2, blank_as_pad=False) + setup["joint"] = DummyRNNTJoint() + setup["encoder_output"] = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]]], dtype=torch.float32).transpose(1, 2) + setup["encoded_lengths"] = torch.tensor([3]) + return setup + + @pytest.fixture() def asr_model(): preprocessor = {'cls': 'nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor', 'params': dict({})} @@ -589,11 +655,16 @@ def test_greedy_decoding_preserve_alignment(self, greedy_class): decoder = RNNTDecoder(prednet_cfg, vocab_size) + max_symbols_per_step = 5 for joint_type in [RNNTJoint, HATJoint]: joint_net = joint_type(jointnet_cfg, vocab_size, vocabulary=token_list) greedy = greedy_class( - decoder, joint_net, blank_index=len(token_list) - 1, preserve_alignments=True, max_symbols_per_step=5 + decoder, + joint_net, + blank_index=len(token_list), + preserve_alignments=True, + max_symbols_per_step=max_symbols_per_step, ) # (B, D, T) @@ -604,12 +675,175 @@ def test_greedy_decoding_preserve_alignment(self, greedy_class): hyp = greedy(encoder_output=enc_out, encoded_lengths=enc_len)[0][0] # type: rnnt_utils.Hypothesis assert hyp.alignments is not None + timestep_count = { + u.item(): c.item() for u, c in zip(*torch.unique(torch.tensor(hyp.timestep), return_counts=True)) + } for t in range(len(hyp.alignments)): - for u in range(len(hyp.alignments[t])): + + # check that the number of alignment elements is consistent with hyp.timestep + alignment_len = len(hyp.alignments[t]) + assert alignment_len <= max_symbols_per_step + if t in timestep_count: # non-blank + assert alignment_len == timestep_count[t] + (1 if alignment_len < max_symbols_per_step else 0) + else: # blank + assert alignment_len == 1 + + for u in range(alignment_len): logp, label = hyp.alignments[t][u] assert torch.is_tensor(logp) assert torch.is_tensor(label) + @pytest.mark.skipif( + not NUMBA_RNNT_LOSS_AVAILABLE, reason='RNNTLoss has not been compiled with appropriate numba version.', + ) + @pytest.mark.unit + @pytest.mark.parametrize( + "greedy_class", [greedy_decode.GreedyRNNTInfer, greedy_decode.GreedyBatchedRNNTInfer], + ) + def test_greedy_decoding_preserve_frame_confidence(self, greedy_class): + token_list = [" ", "a", "b", "c"] + vocab_size = len(token_list) + + encoder_output_size = 4 + decoder_output_size = 4 + joint_output_shape = 4 + + prednet_cfg = {'pred_hidden': decoder_output_size, 'pred_rnn_layers': 1} + jointnet_cfg = { + 'encoder_hidden': encoder_output_size, + 'pred_hidden': decoder_output_size, + 'joint_hidden': joint_output_shape, + 'activation': 'relu', + } + + decoder = RNNTDecoder(prednet_cfg, vocab_size) + + max_symbols_per_step = 5 + for joint_type in [RNNTJoint, HATJoint]: + joint_net = joint_type(jointnet_cfg, vocab_size, vocabulary=token_list) + + greedy = greedy_class( + decoder, + joint_net, + blank_index=len(token_list), + preserve_alignments=True, + preserve_frame_confidence=True, + max_symbols_per_step=max_symbols_per_step, + ) + + # (B, D, T) + enc_out = torch.randn(1, encoder_output_size, 30) + enc_len = torch.tensor([30], dtype=torch.int32) + + with torch.no_grad(): + hyp = greedy(encoder_output=enc_out, encoded_lengths=enc_len)[0][0] # type: rnnt_utils.Hypothesis + assert hyp.frame_confidence is not None + + timestep_count = { + u.item(): c.item() for u, c in zip(*torch.unique(torch.tensor(hyp.timestep), return_counts=True)) + } + for t in range(len(hyp.frame_confidence)): + + # check that the number of confidence elements is consistent with hyp.timestep + confidence_len = len(hyp.frame_confidence[t]) + assert confidence_len <= max_symbols_per_step + if t in timestep_count: # non-blank + assert confidence_len == timestep_count[t] + ( + 1 if confidence_len < max_symbols_per_step else 0 + ) + else: # blank + assert confidence_len == 1 + + for u in range(confidence_len): + score = hyp.frame_confidence[t][u] + assert 0 <= score <= 1 + + @pytest.mark.skipif( + not NUMBA_RNNT_LOSS_AVAILABLE, reason='RNNTLoss has not been compiled with appropriate numba version.', + ) + @pytest.mark.unit + @pytest.mark.parametrize( + "greedy_class", [greedy_decode.GreedyRNNTInfer, greedy_decode.GreedyBatchedRNNTInfer], + ) + @pytest.mark.parametrize("max_symbols_per_step", [0, 1, 5]) + def test_greedy_decoding_max_symbols_alignment(self, max_symbols_setup, greedy_class, max_symbols_per_step): + decoders = [max_symbols_setup["decoder"]] + if greedy_class is greedy_decode.GreedyBatchedRNNTInfer: + decoders.append(max_symbols_setup["decoder_masked"]) + joint = max_symbols_setup["joint"] + encoder_output = max_symbols_setup["encoder_output"] + encoded_lengths = max_symbols_setup["encoded_lengths"] + + for decoder in decoders: + greedy = greedy_class( + decoder_model=decoder, + joint_model=joint, + blank_index=decoder.blank_idx, + max_symbols_per_step=max_symbols_per_step, + preserve_alignments=True, + ) + + with torch.no_grad(): + hyp = greedy(encoder_output=encoder_output, encoded_lengths=encoded_lengths)[0][0] + assert hyp.alignments is not None + + timestep_count = { + u.item(): c.item() for u, c in zip(*torch.unique(torch.tensor(hyp.timestep), return_counts=True)) + } + for t in range(len(hyp.alignments)): + + # check that the number of confidence elements is consistent with hyp.timestep + alignment_len = len(hyp.alignments[t]) + assert alignment_len <= max_symbols_per_step + if t in timestep_count: # non-blank + assert alignment_len == timestep_count[t] + (1 if alignment_len < max_symbols_per_step else 0) + else: # blank or max_symbols_per_step == 0 + assert alignment_len <= 1 + + @pytest.mark.skipif( + not NUMBA_RNNT_LOSS_AVAILABLE, reason='RNNTLoss has not been compiled with appropriate numba version.', + ) + @pytest.mark.unit + @pytest.mark.parametrize( + "greedy_class", [greedy_decode.GreedyRNNTInfer, greedy_decode.GreedyBatchedRNNTInfer], + ) + @pytest.mark.parametrize("max_symbols_per_step", [0, 1, 5]) + def test_greedy_decoding_max_symbols_confidence(self, max_symbols_setup, greedy_class, max_symbols_per_step): + decoders = [max_symbols_setup["decoder"]] + if greedy_class is greedy_decode.GreedyBatchedRNNTInfer: + decoders.append(max_symbols_setup["decoder_masked"]) + joint = max_symbols_setup["joint"] + encoder_output = max_symbols_setup["encoder_output"] + encoded_lengths = max_symbols_setup["encoded_lengths"] + + for decoder in decoders: + greedy = greedy_class( + decoder_model=decoder, + joint_model=joint, + blank_index=decoder.blank_idx, + max_symbols_per_step=max_symbols_per_step, + preserve_frame_confidence=True, + ) + + with torch.no_grad(): + hyp = greedy(encoder_output=encoder_output, encoded_lengths=encoded_lengths)[0][0] + assert hyp.frame_confidence is not None + + timestep_count = { + u.item(): c.item() for u, c in zip(*torch.unique(torch.tensor(hyp.timestep), return_counts=True)) + } + for t in range(len(hyp.frame_confidence)): + + # check that the number of confidence elements is consistent with hyp.timestep + confidence_len = len(hyp.frame_confidence[t]) + assert confidence_len <= max_symbols_per_step + if t in timestep_count: # non-blank + assert confidence_len == timestep_count[t] + ( + 1 if confidence_len < max_symbols_per_step else 0 + ) + else: # blank or max_symbols_per_step == 0 + assert confidence_len <= 1 + @pytest.mark.skipif( not NUMBA_RNNT_LOSS_AVAILABLE, reason='RNNTLoss has not been compiled with appropriate numba version.', ) From a5492041e2eb60401a4e0722ff4304760b7c9380 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 19 Sep 2023 10:44:46 -0600 Subject: [PATCH 038/112] Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Sasha Meister --- nemo/utils/exp_manager.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 3deb814ae2df..1629aa5cbb50 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -578,8 +578,8 @@ def check_resume( end_dist_checkpoints = [d for d in dist_checkpoints if d.match("*end")] last_dist_checkpoints = [d for d in dist_checkpoints if d.match("*last")] - end_checkpoints = end_dist_checkpoints if end_dist_checkpoints else list(checkpoint_dir.glob("*end.ckpt")) - last_checkpoints = last_dist_checkpoints if last_dist_checkpoints else list(checkpoint_dir.glob("*last.ckpt")) + end_checkpoints = end_dist_checkpoints if end_dist_checkpoints else list(checkpoint_dir.rglob("*end.ckpt")) + last_checkpoints = last_dist_checkpoints if last_dist_checkpoints else list(checkpoint_dir.rglob("*last.ckpt")) if not checkpoint_dir.exists() or (not len(end_checkpoints) > 0 and not len(last_checkpoints) > 0): if resume_ignore_no_checkpoint: From 0ff844ff8fa9f1e9aa0839969c389da0e8a347f0 Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Wed, 20 Sep 2023 02:49:18 +1000 Subject: [PATCH 039/112] Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong * Update tacotron2.py Signed-off-by: Jason --------- Signed-off-by: Robin Dong Signed-off-by: Jason Co-authored-by: Jason Signed-off-by: Sasha Meister --- nemo/collections/tts/modules/tacotron2.py | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/nemo/collections/tts/modules/tacotron2.py b/nemo/collections/tts/modules/tacotron2.py index 7a7c63eb5ad4..dc86074abbe2 100644 --- a/nemo/collections/tts/modules/tacotron2.py +++ b/nemo/collections/tts/modules/tacotron2.py @@ -312,11 +312,8 @@ def infer(self, *, memory, memory_lengths): self.initialize_decoder_states(memory, mask=mask) - mel_lengths = torch.zeros([memory.size(0)], dtype=torch.int32) - not_finished = torch.ones([memory.size(0)], dtype=torch.int32) - if torch.cuda.is_available(): - mel_lengths = mel_lengths.cuda() - not_finished = not_finished.cuda() + mel_lengths = torch.zeros([memory.size(0)], dtype=torch.int32).to(memory.device) + not_finished = torch.ones([memory.size(0)], dtype=torch.int32).to(memory.device) mel_outputs, gate_outputs, alignments = [], [], [] stepped = False From b01b01f34d5bdfe9018ce119b12ec9d52c50d9dd Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Wed, 20 Sep 2023 03:19:56 +1000 Subject: [PATCH 040/112] Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../tts/ljspeech/get_data.py | 23 +++---------------- 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/scripts/dataset_processing/tts/ljspeech/get_data.py b/scripts/dataset_processing/tts/ljspeech/get_data.py index d8a0b1c2834c..c8aeed5dbfca 100644 --- a/scripts/dataset_processing/tts/ljspeech/get_data.py +++ b/scripts/dataset_processing/tts/ljspeech/get_data.py @@ -27,11 +27,6 @@ def get_args(): parser = argparse.ArgumentParser(description='Download LJSpeech and create manifests with predefined split') parser.add_argument("--data-root", required=True, type=Path) - parser.add_argument( - '--whitelist-path', - type=str, - default="lj_speech.tsv extracted from the readme file in the dataset. You can also download the file from https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv", - ) args = parser.parse_args() return args @@ -57,20 +52,9 @@ def __extract_file(filepath, data_dir): print(f"Error while extracting {filepath}. Already extracted?") -def __process_data(data_root, whitelist_path): - if whitelist_path is None: - wget.download( - "https://raw.githubusercontent.com/NVIDIA/NeMo-text-processing/main/nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv", - out=str(data_root), - ) - whitelist_path = data_root / "lj_speech.tsv" - +def __process_data(data_root): text_normalizer = Normalizer( - lang="en", - input_case="cased", - whitelist=whitelist_path, - overwrite_cache=True, - cache_dir=data_root / "cache_dir", + lang="en", input_case="cased", overwrite_cache=True, cache_dir=data_root / "cache_dir", ) text_normalizer_call_kwargs = {"punct_pre_process": True, "punct_post_process": True} normalizer_call = lambda x: text_normalizer.normalize(x, **text_normalizer_call_kwargs) @@ -117,9 +101,8 @@ def main(): __extract_file(str(tarred_data_path), str(args.data_root)) data_root = args.data_root / "LJSpeech-1.1" - whitelist_path = args.whitelist_path - __process_data(data_root, whitelist_path) + __process_data(data_root) if __name__ == '__main__': From 422d464990acfbaa5260fb884680520216be9511 Mon Sep 17 00:00:00 2001 From: Ryan Langman Date: Tue, 19 Sep 2023 10:56:27 -0700 Subject: [PATCH 041/112] [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan * [TTS] Fix audio codec tests Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister --- .../tts/losses/audio_codec_loss.py | 6 +- nemo/collections/tts/models/audio_codec.py | 6 +- .../tts/modules/audio_codec_modules.py | 28 ++++---- .../tts/modules/encodec_modules.py | 64 +++++++++++-------- .../tts/modules/test_audio_codec_modules.py | 6 +- 5 files changed, 61 insertions(+), 49 deletions(-) diff --git a/nemo/collections/tts/losses/audio_codec_loss.py b/nemo/collections/tts/losses/audio_codec_loss.py index bde96fadb4c2..8819282f07bd 100644 --- a/nemo/collections/tts/losses/audio_codec_loss.py +++ b/nemo/collections/tts/losses/audio_codec_loss.py @@ -40,8 +40,8 @@ def __init__(self, loss_fn, loss_scale: float = 1.0): @property def input_types(self): return { - "target": NeuralType(('B', 'D', 'T'), RegressionValuesType()), "predicted": NeuralType(('B', 'D', 'T'), PredictionsType()), + "target": NeuralType(('B', 'D', 'T'), RegressionValuesType()), "target_len": NeuralType(tuple('B'), LengthsType()), } @@ -97,7 +97,7 @@ def input_types(self): @property def output_types(self): return { - "loss": [NeuralType(elements_type=LossType())], + "loss": NeuralType(elements_type=LossType()), } @typecheck() @@ -146,7 +146,7 @@ def input_types(self): @property def output_types(self): return { - "loss": [NeuralType(elements_type=LossType())], + "loss": NeuralType(elements_type=LossType()), } @typecheck() diff --git a/nemo/collections/tts/models/audio_codec.py b/nemo/collections/tts/models/audio_codec.py index 6414fa20e52d..63140b77f2b5 100644 --- a/nemo/collections/tts/models/audio_codec.py +++ b/nemo/collections/tts/models/audio_codec.py @@ -484,9 +484,11 @@ def configure_optimizers(self): sched_config = optim_config.pop("sched", None) OmegaConf.set_struct(optim_config, True) - gen_params = itertools.chain(self.audio_encoder.parameters(), self.audio_decoder.parameters()) - disc_params = self.discriminator.parameters() + vq_params = self.vector_quantizer.parameters() if self.vector_quantizer else [] + gen_params = itertools.chain(self.audio_encoder.parameters(), self.audio_decoder.parameters(), vq_params) optim_g = instantiate(optim_config, params=gen_params) + + disc_params = self.discriminator.parameters() optim_d = instantiate(optim_config, params=disc_params) if sched_config is None: diff --git a/nemo/collections/tts/modules/audio_codec_modules.py b/nemo/collections/tts/modules/audio_codec_modules.py index aaf4fb0a7f21..90c53b1f4337 100644 --- a/nemo/collections/tts/modules/audio_codec_modules.py +++ b/nemo/collections/tts/modules/audio_codec_modules.py @@ -12,15 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. -from typing import Iterable, Optional, Tuple +from typing import Optional, Tuple -import torch import torch.nn as nn -from einops import rearrange from nemo.collections.tts.parts.utils.helpers import mask_sequence_tensor +from nemo.core.classes.common import typecheck from nemo.core.classes.module import NeuralModule -from nemo.core.neural_types.elements import AudioSignal, EncodedRepresentation, LengthsType, VoidType +from nemo.core.neural_types.elements import LengthsType, VoidType from nemo.core.neural_types.neural_type import NeuralType @@ -64,21 +63,22 @@ def __init__( def input_types(self): return { "inputs": NeuralType(('B', 'C', 'T'), VoidType()), - "lengths": NeuralType(tuple('B'), LengthsType()), + "input_len": NeuralType(tuple('B'), LengthsType()), } @property def output_types(self): return { - "out": [NeuralType(('B', 'C', 'T'), VoidType())], + "out": NeuralType(('B', 'C', 'T'), VoidType()), } def remove_weight_norm(self): nn.utils.remove_weight_norm(self.conv) - def forward(self, inputs, lengths): + @typecheck() + def forward(self, inputs, input_len): out = self.conv(inputs) - out = mask_sequence_tensor(out, lengths) + out = mask_sequence_tensor(out, input_len) return out @@ -101,21 +101,22 @@ def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride def input_types(self): return { "inputs": NeuralType(('B', 'C', 'T'), VoidType()), - "lengths": NeuralType(tuple('B'), LengthsType()), + "input_len": NeuralType(tuple('B'), LengthsType()), } @property def output_types(self): return { - "out": [NeuralType(('B', 'C', 'T'), VoidType())], + "out": NeuralType(('B', 'C', 'T'), VoidType()), } def remove_weight_norm(self): nn.utils.remove_weight_norm(self.conv) - def forward(self, inputs, lengths): + @typecheck() + def forward(self, inputs, input_len): out = self.conv(inputs) - out = mask_sequence_tensor(out, lengths) + out = mask_sequence_tensor(out, input_len) return out @@ -151,11 +152,12 @@ def input_types(self): @property def output_types(self): return { - "out": [NeuralType(('B', 'C', 'H', 'T'), VoidType())], + "out": NeuralType(('B', 'C', 'H', 'T'), VoidType()), } def remove_weight_norm(self): nn.utils.remove_weight_norm(self.conv) + @typecheck() def forward(self, inputs): return self.conv(inputs) diff --git a/nemo/collections/tts/modules/encodec_modules.py b/nemo/collections/tts/modules/encodec_modules.py index 031b2001e5ca..b05187ccb74b 100644 --- a/nemo/collections/tts/modules/encodec_modules.py +++ b/nemo/collections/tts/modules/encodec_modules.py @@ -72,13 +72,13 @@ def __init__(self, channels: int): def input_types(self): return { "inputs": NeuralType(('B', 'C', 'T_input'), VoidType()), - "lengths": NeuralType(tuple('B'), LengthsType()), + "input_len": NeuralType(tuple('B'), LengthsType()), } @property def output_types(self): return { - "out": [NeuralType(('B', 'C', 'T_out'), VoidType())], + "out": NeuralType(('B', 'C', 'T_out'), VoidType()), } def remove_weight_norm(self): @@ -86,14 +86,15 @@ def remove_weight_norm(self): self.res_conv1.remove_weight_norm() self.res_conv2.remove_weight_norm() - def forward(self, inputs, lengths): + @typecheck() + def forward(self, inputs, input_len): res = self.activation(inputs) - res = self.res_conv1(res, lengths) + res = self.res_conv1(inputs=res, input_len=input_len) res = self.activation(res) - res = self.res_conv2(res, lengths) + res = self.res_conv2(inputs=res, input_len=input_len) - out = self.pre_conv(inputs, lengths) + res - out = mask_sequence_tensor(out, lengths) + out = self.pre_conv(inputs=inputs, input_len=input_len) + res + out = mask_sequence_tensor(out, input_len) return out @@ -112,20 +113,21 @@ def __init__(self, dim: int, num_layers: int, rnn_type: str = "lstm", use_skip: def input_types(self): return { "inputs": NeuralType(('B', 'C', 'T'), VoidType()), - "lengths": NeuralType(tuple('B'), LengthsType()), + "input_len": NeuralType(tuple('B'), LengthsType()), } @property def output_types(self): return { - "out": [NeuralType(('B', 'C', 'T'), VoidType())], + "out": NeuralType(('B', 'C', 'T'), VoidType()), } - def forward(self, inputs, lengths): + @typecheck() + def forward(self, inputs, input_len): inputs = rearrange(inputs, "B C T -> B T C") packed_inputs = nn.utils.rnn.pack_padded_sequence( - inputs, lengths=lengths.cpu(), batch_first=True, enforce_sorted=False + inputs, lengths=input_len.cpu(), batch_first=True, enforce_sorted=False ) packed_out, _ = self.rnn(packed_inputs) out, _ = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True) @@ -183,15 +185,15 @@ def __init__( @property def input_types(self): return { - "audio": NeuralType(('B', 'C', 'T_audio'), AudioSignal()), + "audio": NeuralType(('B', 'T_audio'), AudioSignal()), "audio_len": NeuralType(tuple('B'), LengthsType()), } @property def output_types(self): return { - "encoded": [NeuralType(('B', 'D', 'T_encoded'), EncodedRepresentation())], - "encoded_len": [NeuralType(tuple('B'), LengthsType())], + "encoded": NeuralType(('B', 'D', 'T_encoded'), EncodedRepresentation()), + "encoded_len": NeuralType(tuple('B'), LengthsType()), } def remove_weight_norm(self): @@ -201,26 +203,27 @@ def remove_weight_norm(self): for down_sample_conv in self.down_sample_conv_layers: down_sample_conv.remove_weight_norm() + @typecheck() def forward(self, audio, audio_len): encoded_len = audio_len audio = rearrange(audio, "B T -> B 1 T") # [B, C, T_audio] - out = self.pre_conv(audio, encoded_len) + out = self.pre_conv(inputs=audio, input_len=encoded_len) for res_block, down_sample_conv, down_sample_rate in zip( self.res_blocks, self.down_sample_conv_layers, self.down_sample_rates ): # [B, C, T] - out = res_block(out, encoded_len) + out = res_block(inputs=out, input_len=encoded_len) out = self.activation(out) encoded_len = encoded_len // down_sample_rate # [B, 2 * C, T / down_sample_rate] - out = down_sample_conv(out, encoded_len) + out = down_sample_conv(inputs=out, input_len=encoded_len) - out = self.rnn(out, encoded_len) + out = self.rnn(inputs=out, input_len=encoded_len) out = self.activation(out) # [B, encoded_dim, T_encoded] - encoded = self.post_conv(out, encoded_len) + encoded = self.post_conv(inputs=out, input_len=encoded_len) return encoded, encoded_len @@ -274,7 +277,7 @@ def input_types(self): @property def output_types(self): return { - "audio": NeuralType(('B', 'C', 'T_audio'), AudioSignal()), + "audio": NeuralType(('B', 'T_audio'), AudioSignal()), "audio_len": NeuralType(tuple('B'), LengthsType()), } @@ -285,23 +288,24 @@ def remove_weight_norm(self): for res_block in self.res_blocks: res_block.remove_weight_norm() + @typecheck() def forward(self, inputs, input_len): audio_len = input_len # [B, C, T_encoded] - out = self.pre_conv(inputs, audio_len) - out = self.rnn(out, audio_len) + out = self.pre_conv(inputs=inputs, input_len=audio_len) + out = self.rnn(inputs=out, input_len=audio_len) for res_block, up_sample_conv, up_sample_rate in zip( self.res_blocks, self.up_sample_conv_layers, self.up_sample_rates ): audio_len = audio_len * up_sample_rate out = self.activation(out) # [B, C / 2, T * up_sample_rate] - out = up_sample_conv(out, audio_len) - out = res_block(out, audio_len) + out = up_sample_conv(inputs=out, input_len=audio_len) + out = res_block(inputs=out, input_len=audio_len) out = self.activation(out) # [B, 1, T_audio] - out = self.post_conv(out, audio_len) + out = self.post_conv(inputs=out, input_len=audio_len) audio = self.out_activation(out) audio = rearrange(audio, "B 1 T -> B T") return audio, audio_len @@ -356,6 +360,7 @@ def output_types(self): "fmap": [NeuralType(('B', 'D', 'T_spec', 'C'), VoidType())], } + @typecheck() def forward(self, audio): fmap = [] @@ -363,11 +368,11 @@ def forward(self, audio): out = self.stft(audio) for conv in self.conv_layers: # [batch, filters, T_spec, fft // 2**i] - out = conv(out) + out = conv(inputs=out) out = self.activation(out) fmap.append(out) # [batch, 1, T_spec, fft // 8] - scores = self.conv_post(out) + scores = self.conv_post(inputs=out) fmap.append(scores) scores = rearrange(scores, "B 1 T C -> B C T") @@ -382,7 +387,7 @@ def __init__(self, resolutions): @property def input_types(self): return { - "audio": NeuralType(('B', 'T_audio'), AudioSignal()), + "audio_real": NeuralType(('B', 'T_audio'), AudioSignal()), "audio_gen": NeuralType(('B', 'T_audio'), AudioSignal()), } @@ -395,6 +400,7 @@ def output_types(self): "fmaps_gen": [[NeuralType(('B', 'D', 'T_spec', 'C'), VoidType())]], } + @typecheck() def forward(self, audio_real, audio_gen): scores_real = [] scores_gen = [] @@ -627,6 +633,7 @@ def output_types(self): "indices": NeuralType(('B', 'T'), Index()), } + @typecheck() def forward(self, inputs, input_len): input_flat = rearrange(inputs, "B T D -> (B T) D") self._init_codes(input_flat) @@ -746,6 +753,7 @@ def output_types(self): "commit_loss": NeuralType((), LossType()), } + @typecheck() def forward(self, inputs: Tensor, input_len: Tensor) -> Tuple[Tensor, Tensor, float]: commit_loss = 0.0 residual = rearrange(inputs, "B D T -> B T D") diff --git a/tests/collections/tts/modules/test_audio_codec_modules.py b/tests/collections/tts/modules/test_audio_codec_modules.py index 948b1220f39c..4650a6508edd 100644 --- a/tests/collections/tts/modules/test_audio_codec_modules.py +++ b/tests/collections/tts/modules/test_audio_codec_modules.py @@ -40,7 +40,7 @@ def test_conv1d(self): lengths = torch.tensor([self.len1, self.len2], dtype=torch.int32) conv = Conv1dNorm(in_channels=self.in_channels, out_channels=self.out_channels, kernel_size=self.kernel_size) - out = conv(inputs, lengths) + out = conv(inputs=inputs, input_len=lengths) assert out.shape == (self.batch_size, self.out_channels, self.max_len) assert torch.all(out[0, :, : self.len1] != 0.0) @@ -66,7 +66,7 @@ def test_conv1d_downsample(self): stride=stride, padding=padding, ) - out = conv(inputs, lengths) + out = conv(inputs=inputs, input_len=lengths) assert out.shape == (self.batch_size, self.out_channels, out_len) assert torch.all(out[0, :, :out_len_1] != 0.0) @@ -87,7 +87,7 @@ def test_conv1d_transpose_upsample(self): conv = ConvTranspose1dNorm( in_channels=self.in_channels, out_channels=self.out_channels, kernel_size=self.kernel_size, stride=stride ) - out = conv(inputs, lengths) + out = conv(inputs=inputs, input_len=lengths) assert out.shape == (self.batch_size, self.out_channels, out_len) assert torch.all(out[0, :, :out_len_1] != 0.0) From b4bf4ee3d9a7a10b7513d3e1004587b864334c70 Mon Sep 17 00:00:00 2001 From: Ryan Langman Date: Wed, 20 Sep 2023 09:03:59 -0700 Subject: [PATCH 042/112] [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister --- .../tts/data/text_to_speech_dataset.py | 12 +- nemo/collections/tts/data/vocoder_dataset.py | 13 +- nemo/collections/tts/parts/utils/callbacks.py | 185 ++++++++++++------ 3 files changed, 149 insertions(+), 61 deletions(-) diff --git a/nemo/collections/tts/data/text_to_speech_dataset.py b/nemo/collections/tts/data/text_to_speech_dataset.py index 23ddb50346a2..8530356c5cd0 100644 --- a/nemo/collections/tts/data/text_to_speech_dataset.py +++ b/nemo/collections/tts/data/text_to_speech_dataset.py @@ -46,6 +46,7 @@ class DatasetMeta: @dataclass class DatasetSample: + dataset_name: str manifest_entry: Dict[str, Any] audio_dir: Path feature_dir: Path @@ -180,6 +181,7 @@ def _preprocess_manifest( speaker_index = 0 sample = DatasetSample( + dataset_name=dataset_name, manifest_entry=entry, audio_dir=Path(dataset.audio_dir), feature_dir=Path(dataset.feature_dir), @@ -204,7 +206,12 @@ def __getitem__(self, index): audio, _ = librosa.load(audio_filepath_abs, sr=self.sample_rate) tokens = self.text_tokenizer(data.text) - example = {"audio_filepath": audio_filepath_rel, "audio": audio, "tokens": tokens} + example = { + "dataset_name": data.dataset_name, + "audio_filepath": audio_filepath_rel, + "audio": audio, + "tokens": tokens, + } if data.speaker is not None: example["speaker"] = data.speaker @@ -229,6 +236,7 @@ def __getitem__(self, index): return example def collate_fn(self, batch: List[dict]): + dataset_name_list = [] audio_filepath_list = [] audio_list = [] audio_len_list = [] @@ -238,6 +246,7 @@ def collate_fn(self, batch: List[dict]): prior_list = [] for example in batch: + dataset_name_list.append(example["dataset_name"]) audio_filepath_list.append(example["audio_filepath"]) audio_tensor = torch.tensor(example["audio"], dtype=torch.float32) @@ -264,6 +273,7 @@ def collate_fn(self, batch: List[dict]): batch_tokens = stack_tensors(token_list, max_lens=[token_max_len], pad_value=self.text_tokenizer.pad) batch_dict = { + "dataset_names": dataset_name_list, "audio_filepaths": audio_filepath_list, "audio": batch_audio, "audio_lens": batch_audio_len, diff --git a/nemo/collections/tts/data/vocoder_dataset.py b/nemo/collections/tts/data/vocoder_dataset.py index 6bf03068a395..a5a30870dfff 100644 --- a/nemo/collections/tts/data/vocoder_dataset.py +++ b/nemo/collections/tts/data/vocoder_dataset.py @@ -43,6 +43,7 @@ class DatasetMeta: @dataclass class DatasetSample: + dataset_name: str manifest_entry: dict audio_dir: Path @@ -165,7 +166,7 @@ def _preprocess_manifest( samples = [] sample_weights = [] for entry in filtered_entries: - sample = DatasetSample(manifest_entry=entry, audio_dir=Path(dataset.audio_dir),) + sample = DatasetSample(dataset_name=dataset_name, manifest_entry=entry, audio_dir=Path(dataset.audio_dir)) samples.append(sample) sample_weights.append(dataset.sample_weight) @@ -182,7 +183,12 @@ def __getitem__(self, index): audio, audio_len = self._sample_audio(audio_filepath_abs) - example = {"audio_filepath": audio_filepath_rel, "audio": audio, "audio_len": audio_len} + example = { + "dataset_name": data.dataset_name, + "audio_filepath": audio_filepath_rel, + "audio": audio, + "audio_len": audio_len, + } for processor in self.feature_processors: processor.process(example) @@ -190,11 +196,13 @@ def __getitem__(self, index): return example def collate_fn(self, batch: List[dict]): + dataset_name_list = [] audio_filepath_list = [] audio_list = [] audio_len_list = [] for example in batch: + dataset_name_list.append(example["dataset_name"]) audio_filepath_list.append(example["audio_filepath"]) audio_list.append(example["audio"]) audio_len_list.append(example["audio_len"]) @@ -205,6 +213,7 @@ def collate_fn(self, batch: List[dict]): batch_audio = stack_tensors(audio_list, max_lens=[audio_max_len]) batch_dict = { + "dataset_names": dataset_name_list, "audio_filepaths": audio_filepath_list, "audio": batch_audio, "audio_lens": batch_audio_len, diff --git a/nemo/collections/tts/parts/utils/callbacks.py b/nemo/collections/tts/parts/utils/callbacks.py index c1be48bdaa3d..09e5c41112f3 100644 --- a/nemo/collections/tts/parts/utils/callbacks.py +++ b/nemo/collections/tts/parts/utils/callbacks.py @@ -27,6 +27,7 @@ from pytorch_lightning.loggers import TensorBoardLogger from pytorch_lightning.loggers.logger import Logger from pytorch_lightning.loggers.wandb import WandbLogger +from torch import Tensor from nemo.collections.tts.parts.utils.helpers import create_plot from nemo.utils import logging @@ -81,14 +82,14 @@ class AudioArtifact: id: str data: np.ndarray sample_rate: int - filename: str + filepath: Path @dataclass class ImageArtifact: id: str data: np.ndarray - filename: str + filepath: Path x_axis: str y_axis: str @@ -187,7 +188,8 @@ def __init__( def _log_audio(self, audio: AudioArtifact, log_dir: Path, step: int): if log_dir: - filepath = log_dir / audio.filename + filepath = log_dir / audio.filepath + filepath.parent.mkdir(parents=True, exist_ok=True) sf.write(file=filepath, data=audio.data, samplerate=audio.sample_rate) if self.tensorboard_logger: @@ -201,7 +203,8 @@ def _log_audio(self, audio: AudioArtifact, log_dir: Path, step: int): def _log_image(self, image: ImageArtifact, log_dir: Path, step: int): if log_dir: - filepath = log_dir / image.filename + filepath = log_dir / image.filepath + filepath.parent.mkdir(parents=True, exist_ok=True) else: filepath = None @@ -247,7 +250,7 @@ def on_fit_start(self, trainer: Trainer, model: LightningModule): logging.debug('List are empty, no initial artifacts to log.') return - log_dir = self.output_dir / f"initial" if self.output_dir else None + log_dir = self.output_dir / "initial" if self.output_dir else None self._log_artifacts(audio_list=audio_list, image_list=image_list, log_dir=log_dir) @@ -287,28 +290,38 @@ class VocoderArtifactGenerator(ArtifactGenerator): def generate_artifacts( self, model: LightningModule, batch_dict: Dict, initial_log: bool = False ) -> Tuple[List[AudioArtifact], List[ImageArtifact]]: - if initial_log: - # Currently, nothing to log before training starts - return [], [] - - audio_artifacts = [] + dataset_names = batch_dict.get("dataset_names") audio_filepaths = batch_dict.get("audio_filepaths") audio_ids = [create_id(p) for p in audio_filepaths] audio = batch_dict.get("audio") audio_len = batch_dict.get("audio_lens") + audio_artifacts = [] + + if initial_log: + # Log ground truth audio + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_gt_path = Path(f"{dataset_name}/{audio_id}_gt.wav") + audio_gt_i = audio[i, : audio_len[i]].cpu().numpy() + audio_artifact = AudioArtifact( + id=f"audio_gt_{audio_id}", data=audio_gt_i, filepath=audio_gt_path, sample_rate=model.sample_rate, + ) + audio_artifacts.append(audio_artifact) + return audio_artifacts, [] + spec, spec_len = model.audio_to_melspec_precessor(audio, audio_len) with torch.no_grad(): audio_pred = model.forward(spec=spec) audio_pred = rearrange(audio_pred, "B 1 T -> B T") - for i, audio_id in enumerate(audio_ids): - audio_pred_i = audio_pred[i][: audio_len[i]].cpu().numpy() + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_pred_path = Path(f"{dataset_name}/{audio_id}.wav") + audio_pred_i = audio_pred[i, : audio_len[i]].cpu().numpy() audio_artifact = AudioArtifact( - id=f"audio_{audio_id}", data=audio_pred_i, filename=f"{audio_id}.wav", sample_rate=model.sample_rate, + id=f"audio_{audio_id}", data=audio_pred_i, filepath=audio_pred_path, sample_rate=model.sample_rate, ) audio_artifacts.append(audio_artifact) @@ -327,19 +340,26 @@ def __init__(self, log_audio: bool = True, log_encoding: bool = False, log_dequa self.log_encoding = log_encoding # Log dequantized encoded representation of the input audio (decoder input) self.log_dequantized = log_dequantized - # Input audio will be logged only once - self.input_audio_logged = False logging.debug('Initialized %s with', self.__class__.__name__) logging.debug('\tlog_audio: %s', self.log_audio) logging.debug('\tlog_encoding: %s', self.log_encoding) logging.debug('\tlog_dequantized: %s', self.log_dequantized) - def _generate_audio(self, model, audio_ids, audio, audio_len, save_input: bool = False): + def _generate_audio( + self, + model: LightningModule, + dataset_names: List[str], + audio_ids: List[str], + audio: Tensor, + audio_len: Tensor, + save_input: bool = False, + ): """Generate audio artifacts. Args: model: callable model, outputs (audio_pred, audio_pred_len) + dataset_names: list of dataset names for the examples in audio batch audio_ids: list of IDs for the examples in audio batch audio: tensor of input audio signals, shape (B, T) audio_len: tensor of lengths for each example in the batch, shape (B,) @@ -354,35 +374,34 @@ def _generate_audio(self, model, audio_ids, audio, audio_len, save_input: bool = audio_artifacts = [] # Log output audio - for i, audio_id in enumerate(audio_ids): + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_pred_path = Path(f"{dataset_name}/{audio_id}_audio_out.wav") audio_pred_i = audio_pred[i, : audio_pred_len[i]].cpu().numpy() audio_artifact = AudioArtifact( - id=f"audio_out_{audio_id}", - data=audio_pred_i, - filename=f"{audio_id}_audio_out.wav", - sample_rate=model.sample_rate, + id=f"audio_out_{audio_id}", data=audio_pred_i, filepath=audio_pred_path, sample_rate=model.sample_rate, ) audio_artifacts.append(audio_artifact) if save_input: # save input audio - for i, audio_id in enumerate(audio_ids): + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_in_path = Path(f"{dataset_name}/{audio_id}_audio_in.wav") audio_in_i = audio[i, : audio_len[i]].cpu().numpy() audio_artifact = AudioArtifact( - id=f"audio_in_{audio_id}", - data=audio_in_i, - filename=f"{audio_id}_audio_in.wav", - sample_rate=model.sample_rate, + id=f"audio_in_{audio_id}", data=audio_in_i, filepath=audio_in_path, sample_rate=model.sample_rate, ) audio_artifacts.append(audio_artifact) return audio_artifacts - def _generate_images(self, model, audio_ids, audio, audio_len): + def _generate_images( + self, model: LightningModule, dataset_names: List[str], audio_ids: List[str], audio: Tensor, audio_len: Tensor + ): """Generate image artifacts. Args: model: model, needs to support `model.encode_audio`, `model.quantize` and `model.dequantize` + dataset_names: list of dataset names for the examples in audio batch audio_ids: list of IDs for the examples in audio batch audio: tensor of input audio signals, shape (B, T) audio_len: tensor of lengths for each example in the batch, shape (B,) @@ -397,12 +416,13 @@ def _generate_images(self, model, audio_ids, audio, audio_len): encoded, encoded_len = model.encode_audio(audio=audio, audio_len=audio_len) if self.log_encoding: - for i, audio_id in enumerate(audio_ids): + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + encoded_path = Path(f"{dataset_name}/{audio_id}_encoded.png") encoded_i = encoded[i, :, : encoded_len[i]].cpu().numpy() encoded_artifact = ImageArtifact( id=f"encoded_{audio_id}", data=encoded_i, - filename=f"{audio_id}_encoded.png", + filepath=encoded_path, x_axis="Audio Frames", y_axis="Channels", ) @@ -416,12 +436,13 @@ def _generate_images(self, model, audio_ids, audio, audio_len): tokens = model.quantize(encoded=encoded, encoded_len=encoded_len) dequantized = model.dequantize(tokens=tokens, tokens_len=encoded_len) - for i, audio_id in enumerate(audio_ids): + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + dequantized_path = Path(f"{dataset_name}/{audio_id}_dequantized.png") dequantized_i = dequantized[i, :, : encoded_len[i]].cpu().numpy() dequantized_artifact = ImageArtifact( id=f"dequantized_{audio_id}", data=dequantized_i, - filename=f"{audio_id}_dequantized.png", + filepath=dequantized_path, x_axis="Audio Frames", y_axis="Channels", ) @@ -439,6 +460,7 @@ def generate_artifacts( initial_log: save input audio for the initial log """ + dataset_names = batch_dict.get("dataset_names") audio_filepaths = batch_dict.get("audio_filepaths") audio_ids = [create_id(p) for p in audio_filepaths] @@ -446,9 +468,16 @@ def generate_artifacts( audio_len = batch_dict.get("audio_lens") audio_artifacts = self._generate_audio( - model=model, audio_ids=audio_ids, audio=audio, audio_len=audio_len, save_input=initial_log + model=model, + dataset_names=dataset_names, + audio_ids=audio_ids, + audio=audio, + audio_len=audio_len, + save_input=initial_log, + ) + image_artifacts = self._generate_images( + model=model, dataset_names=dataset_names, audio_ids=audio_ids, audio=audio, audio_len=audio_len ) - image_artifacts = self._generate_images(model=model, audio_ids=audio_ids, audio=audio, audio_len=audio_len) return audio_artifacts, image_artifacts @@ -486,7 +515,36 @@ def __init__( type=audio_params.vocoder_type, ) - def _generate_audio(self, mels, mels_len, hop_length): + def _create_ground_truth_artifacts( + self, model: LightningModule, dataset_names: List[str], audio_ids: List[str], batch_dict: Dict + ): + audio = batch_dict.get("audio") + audio_lens = batch_dict.get("audio_lens") + spec, spec_len = model.preprocessor(input_signal=audio, length=audio_lens) + + audio_artifacts = [] + image_artifacts = [] + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_gt_path = Path(f"{dataset_name}/{audio_id}_gt.wav") + audio_gt_i = audio[i, : audio_lens[i]].cpu().numpy() + audio_artifact = AudioArtifact( + id=f"audio_gt_{audio_id}", + data=audio_gt_i, + filepath=audio_gt_path, + sample_rate=self.vocoder.sample_rate, + ) + audio_artifacts.append(audio_artifact) + + spec_gt_path = Path(f"{dataset_name}/{audio_id}_spec_gt.png") + spec_gt_i = spec[i, :, : spec_len[i]].cpu().numpy() + spec_artifact = ImageArtifact( + id=f"spec_{audio_id}", data=spec_gt_i, filepath=spec_gt_path, x_axis="Audio Frames", y_axis="Channels", + ) + image_artifacts.append(spec_artifact) + + return audio_artifacts, image_artifacts + + def _generate_audio(self, mels: Tensor, mels_len: Tensor, hop_length: int): voc_input = mels.to(self.vocoder.device) with torch.no_grad(): audio_pred = self.vocoder.convert_spectrogram_to_audio(spec=voc_input) @@ -495,7 +553,9 @@ def _generate_audio(self, mels, mels_len, hop_length): audio_pred_lens = librosa.core.frames_to_samples(mels_len_array, hop_length=hop_length) return audio_pred, audio_pred_lens - def _generate_predictions(self, model: LightningModule, audio_ids: List[str], batch_dict: Dict): + def _generate_predictions( + self, model: LightningModule, dataset_names: List[str], audio_ids: List[str], batch_dict: Dict + ): audio_artifacts = [] image_artifacts = [] @@ -508,14 +568,11 @@ def _generate_predictions(self, model: LightningModule, audio_ids: List[str], ba mels_pred, mels_pred_len, *_ = model.forward(text=text, input_lens=text_lens, speaker=speaker,) if self.log_spectrogram: - for i, audio_id in enumerate(audio_ids): - spec_i = mels_pred[i][:, : mels_pred_len[i]].cpu().numpy() + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + spec_path = Path(f"{dataset_name}/{audio_id}_spec.png") + spec_i = mels_pred[i, :, : mels_pred_len[i]].cpu().numpy() spec_artifact = ImageArtifact( - id=f"spec_{audio_id}", - data=spec_i, - filename=f"{audio_id}_spec.png", - x_axis="Audio Frames", - y_axis="Channels", + id=f"spec_{audio_id}", data=spec_i, filepath=spec_path, x_axis="Audio Frames", y_axis="Channels", ) image_artifacts.append(spec_artifact) @@ -524,19 +581,22 @@ def _generate_predictions(self, model: LightningModule, audio_ids: List[str], ba audio_pred, audio_pred_lens = self._generate_audio( mels=mels_pred, mels_len=mels_pred_len, hop_length=model.preprocessor.hop_length ) - for i, audio_id in enumerate(audio_ids): - audio_pred_i = audio_pred[i][: audio_pred_lens[i]].cpu().numpy() + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_pred_path = Path(f"{dataset_name}/{audio_id}.wav") + audio_pred_i = audio_pred[i, : audio_pred_lens[i]].cpu().numpy() audio_artifact = AudioArtifact( id=f"audio_{audio_id}", data=audio_pred_i, - filename=f"{audio_id}.wav", + filepath=audio_pred_path, sample_rate=self.vocoder.sample_rate, ) audio_artifacts.append(audio_artifact) return audio_artifacts, image_artifacts - def _generate_gta_predictions(self, model: LightningModule, audio_ids: List[str], batch_dict: Dict): + def _generate_gta_predictions( + self, model: LightningModule, dataset_names: List[str], audio_ids: List[str], batch_dict: Dict + ): audio_artifacts = [] image_artifacts = [] @@ -564,12 +624,13 @@ def _generate_gta_predictions(self, model: LightningModule, audio_ids: List[str] if self.log_alignment: attn = rearrange(attn, "B 1 T_spec T_text -> B T_text T_spec") - for i, audio_id in enumerate(audio_ids): - attn_i = attn[i][: text_lens[i], : mels_pred_len[i]].cpu().numpy() + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + attn_path = Path(f"{dataset_name}/{audio_id}_align.png") + attn_i = attn[i, : text_lens[i], : mels_pred_len[i]].cpu().numpy() alignment_artifact = ImageArtifact( id=f"align_{audio_id}", data=attn_i, - filename=f"{audio_id}_align.png", + filepath=attn_path, x_axis="Audio Frames", y_axis="Text Tokens", ) @@ -580,12 +641,13 @@ def _generate_gta_predictions(self, model: LightningModule, audio_ids: List[str] audio_pred, audio_pred_lens = self._generate_audio( mels=mels_pred, mels_len=mels_pred_len, hop_length=model.preprocessor.hop_length ) - for i, audio_id in enumerate(audio_ids): - audio_pred_i = audio_pred[i][: audio_pred_lens[i]].cpu().numpy() + for i, (dataset_name, audio_id) in enumerate(zip(dataset_names, audio_ids)): + audio_pred_path = Path(f"{dataset_name}/{audio_id}_gta.wav") + audio_pred_i = audio_pred[i, : audio_pred_lens[i]].cpu().numpy() audio_artifact = AudioArtifact( id=f"audio_gta_{audio_id}", data=audio_pred_i, - filename=f"{audio_id}_gta.wav", + filepath=audio_pred_path, sample_rate=self.vocoder.sample_rate, ) audio_artifacts.append(audio_artifact) @@ -596,23 +658,30 @@ def generate_artifacts( self, model: LightningModule, batch_dict: Dict, initial_log: bool = False ) -> Tuple[List[AudioArtifact], List[ImageArtifact]]: + dataset_names = batch_dict.get("dataset_names") + audio_filepaths = batch_dict.get("audio_filepaths") + audio_ids = [create_id(p) for p in audio_filepaths] + if initial_log: - # Currently, nothing to log before training starts - return [], [] + # Log ground truth audio and spectrograms + audio_gt, spec_gt = self._create_ground_truth_artifacts( + model=model, dataset_names=dataset_names, audio_ids=audio_ids, batch_dict=batch_dict + ) + return audio_gt, spec_gt audio_artifacts = [] image_artifacts = [] - audio_filepaths = batch_dict.get("audio_filepaths") - audio_ids = [create_id(p) for p in audio_filepaths] if self.log_audio or self.log_spectrogram: - audio_pred, spec_pred = self._generate_predictions(model=model, batch_dict=batch_dict, audio_ids=audio_ids) + audio_pred, spec_pred = self._generate_predictions( + model=model, dataset_names=dataset_names, audio_ids=audio_ids, batch_dict=batch_dict + ) audio_artifacts += audio_pred image_artifacts += spec_pred if self.log_audio_gta or self.log_alignment: audio_gta_pred, alignments = self._generate_gta_predictions( - model=model, batch_dict=batch_dict, audio_ids=audio_ids + model=model, dataset_names=dataset_names, audio_ids=audio_ids, batch_dict=batch_dict ) audio_artifacts += audio_gta_pred image_artifacts += alignments From 7b444dc3699bb5c855cfb9aa4cab650f7fa445c6 Mon Sep 17 00:00:00 2001 From: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Date: Wed, 20 Sep 2023 15:10:25 -0400 Subject: [PATCH 043/112] Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/data/language_modeling/megatron/gpt_sft_dataset.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py index 72ba68945391..2c655d5cde6b 100644 --- a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py +++ b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py @@ -272,7 +272,10 @@ def _multiple_truncation(self, template_ids: List[List[int]], template_ids_keys: for i, (ids, key) in enumerate(zip(template_ids, template_ids_keys)): if key in self.truncation_fields: truncation_length = truncation_length_list.pop() - assert len(ids) >= truncation_length, f'{key} is not long enough to truncate.' + if len(ids) < truncation_length: + logging.warning(f'{key} is not long enough to truncate.') + truncation_length = len(ids) + if self.truncation_method == 'left': window_offset = truncation_length elif self.truncation_method == 'right': @@ -328,6 +331,7 @@ def _process_example(self, example): if len(input_ids) > self.max_seq_length: logging.warning(f'Input ids length {len(input_ids)} exceed max sequence length {self.max_seq_length}') input_ids = input_ids[: self.max_seq_length] + answer_ids = input_ids[answer_start_idx:] # store metadata in dataset, in case user may have keys required in the prediction json files metadata = {k: v for k, v in example.items() if k not in self.prompt_template_keys} From f5505837560135237163d8e73fc0cdece27c34c9 Mon Sep 17 00:00:00 2001 From: Maxime Burchi <60737204+burchim@users.noreply.github.com> Date: Wed, 20 Sep 2023 23:14:16 +0200 Subject: [PATCH 044/112] Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi * transpose conv1d inputs Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi * add collection classes Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * clean references Signed-off-by: mburchi * freeze unfreeze transcribe cv models Signed-off-by: mburchi * correct manifest get_full_path bug Signed-off-by: mburchi * update for PR Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi * add self.out = None to asr subsampling Signed-off-by: mburchi * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman Signed-off-by: Sasha Meister --- .../asr/parts/submodules/subsampling.py | 1 + .../common/parts/preprocessing/collections.py | 144 +++ .../common/parts/preprocessing/manifest.py | 21 +- nemo/collections/multimodal/__init__.py | 13 + .../multimodal/speech_cv/__init__.py | 25 + .../multimodal/speech_cv/data/__init__.py | 13 + .../speech_cv/data/video_to_text.py | 870 +++++++++++++++++ .../speech_cv/data/video_to_text_dataset.py | 287 ++++++ .../multimodal/speech_cv/models/__init__.py | 27 + .../speech_cv/models/visual_ctc_bpe_models.py | 314 ++++++ .../speech_cv/models/visual_ctc_models.py | 692 +++++++++++++ .../visual_hybrid_rnnt_ctc_bpe_models.py | 455 +++++++++ .../models/visual_hybrid_rnnt_ctc_models.py | 644 ++++++++++++ .../models/visual_rnnt_bpe_models.py | 321 ++++++ .../speech_cv/models/visual_rnnt_models.py | 920 ++++++++++++++++++ .../multimodal/speech_cv/modules/__init__.py | 20 + .../linear_projection_video_front_end.py | 143 +++ .../modules/resnet_video_front_end.py | 84 ++ .../speech_cv/modules/video_augment.py | 225 +++++ .../speech_cv/modules/video_preprocessing.py | 138 +++ .../multimodal/speech_cv/parts/__init__.py | 13 + .../speech_cv/parts/preprocessing/features.py | 62 ++ .../speech_cv/parts/submodules/__init__.py | 13 + .../speech_cv/parts/submodules/conv2d.py | 72 ++ .../parts/submodules/global_avg_pool2d.py | 28 + .../speech_cv/parts/submodules/permute.py | 28 + .../speech_cv/parts/submodules/resnet.py | 175 ++++ .../parts/submodules/resnet_block.py | 86 ++ .../submodules/resnet_bottleneck_block.py | 107 ++ nemo/core/neural_types/elements.py | 16 + 30 files changed, 5952 insertions(+), 5 deletions(-) create mode 100644 nemo/collections/multimodal/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/data/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/data/video_to_text.py create mode 100644 nemo/collections/multimodal/speech_cv/data/video_to_text_dataset.py create mode 100644 nemo/collections/multimodal/speech_cv/models/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/models/visual_ctc_bpe_models.py create mode 100644 nemo/collections/multimodal/speech_cv/models/visual_ctc_models.py create mode 100644 nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_bpe_models.py create mode 100644 nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_models.py create mode 100644 nemo/collections/multimodal/speech_cv/models/visual_rnnt_bpe_models.py create mode 100644 nemo/collections/multimodal/speech_cv/models/visual_rnnt_models.py create mode 100644 nemo/collections/multimodal/speech_cv/modules/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/modules/linear_projection_video_front_end.py create mode 100644 nemo/collections/multimodal/speech_cv/modules/resnet_video_front_end.py create mode 100644 nemo/collections/multimodal/speech_cv/modules/video_augment.py create mode 100644 nemo/collections/multimodal/speech_cv/modules/video_preprocessing.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/preprocessing/features.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/__init__.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/conv2d.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/global_avg_pool2d.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/permute.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/resnet.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/resnet_block.py create mode 100644 nemo/collections/multimodal/speech_cv/parts/submodules/resnet_bottleneck_block.py diff --git a/nemo/collections/asr/parts/submodules/subsampling.py b/nemo/collections/asr/parts/submodules/subsampling.py index b04138629055..068cd36022b0 100644 --- a/nemo/collections/asr/parts/submodules/subsampling.py +++ b/nemo/collections/asr/parts/submodules/subsampling.py @@ -369,6 +369,7 @@ def __init__( self.out = torch.nn.Linear(conv_channels * int(out_length), feat_out) self.conv2d_subsampling = True elif subsampling in ["striding_conv1d", "dw_striding_conv1d"]: + self.out = None self.conv2d_subsampling = False else: raise ValueError(f"Not valid sub-sampling: {subsampling}!") diff --git a/nemo/collections/common/parts/preprocessing/collections.py b/nemo/collections/common/parts/preprocessing/collections.py index ed9e53ae6ffe..66def034400f 100644 --- a/nemo/collections/common/parts/preprocessing/collections.py +++ b/nemo/collections/common/parts/preprocessing/collections.py @@ -199,6 +199,114 @@ def __init__( super().__init__(data) +class VideoText(_Collection): + """List of video-transcript text correspondence with preprocessing.""" + + OUTPUT_TYPE = collections.namedtuple( + typename='AudioTextEntity', + field_names='id video_file duration text_tokens offset text_raw speaker orig_sr lang', + ) + + def __init__( + self, + ids: List[int], + video_files: List[str], + durations: List[float], + texts: List[str], + offsets: List[str], + speakers: List[Optional[int]], + orig_sampling_rates: List[Optional[int]], + token_labels: List[Optional[int]], + langs: List[Optional[str]], + parser: parsers.CharParser, + min_duration: Optional[float] = None, + max_duration: Optional[float] = None, + max_number: Optional[int] = None, + do_sort_by_duration: bool = False, + index_by_file_id: bool = False, + ): + """Instantiates video-text manifest with filters and preprocessing. + + Args: + ids: List of examples positions. + video_files: List of video files. + durations: List of float durations. + texts: List of raw text transcripts. + offsets: List of duration offsets or None. + speakers: List of optional speakers ids. + orig_sampling_rates: List of original sampling rates of audio files. + langs: List of language ids, one for eadh sample, or None. + parser: Instance of `CharParser` to convert string to tokens. + min_duration: Minimum duration to keep entry with (default: None). + max_duration: Maximum duration to keep entry with (default: None). + max_number: Maximum number of samples to collect. + do_sort_by_duration: True if sort samples list by duration. Not compatible with index_by_file_id. + index_by_file_id: If True, saves a mapping from filename base (ID) to index in data. + """ + + output_type = self.OUTPUT_TYPE + data, duration_filtered, num_filtered, total_duration = [], 0.0, 0, 0.0 + if index_by_file_id: + self.mapping = {} + + for id_, video_file, duration, offset, text, speaker, orig_sr, token_labels, lang in zip( + ids, video_files, durations, offsets, texts, speakers, orig_sampling_rates, token_labels, langs + ): + # Duration filters. + if min_duration is not None and duration < min_duration: + duration_filtered += duration + num_filtered += 1 + continue + + if max_duration is not None and duration > max_duration: + duration_filtered += duration + num_filtered += 1 + continue + + if token_labels is not None: + text_tokens = token_labels + else: + if text != '': + if hasattr(parser, "is_aggregate") and parser.is_aggregate and isinstance(text, str): + if lang is not None: + text_tokens = parser(text, lang) + else: + raise ValueError("lang required in manifest when using aggregate tokenizers") + else: + text_tokens = parser(text) + else: + text_tokens = [] + + if text_tokens is None: + duration_filtered += duration + num_filtered += 1 + continue + + total_duration += duration + + data.append(output_type(id_, video_file, duration, text_tokens, offset, text, speaker, orig_sr, lang)) + if index_by_file_id: + file_id, _ = os.path.splitext(os.path.basename(video_file)) + if file_id not in self.mapping: + self.mapping[file_id] = [] + self.mapping[file_id].append(len(data) - 1) + + # Max number of entities filter. + if len(data) == max_number: + break + + if do_sort_by_duration: + if index_by_file_id: + logging.warning("Tried to sort dataset by duration, but cannot since index_by_file_id is set.") + else: + data.sort(key=lambda entity: entity.duration) + + logging.info("Dataset loaded with %d files totalling %.2f hours", len(data), total_duration / 3600) + logging.info("%d files were filtered totalling %.2f hours", num_filtered, duration_filtered / 3600) + + super().__init__(data) + + class ASRAudioText(AudioText): """`AudioText` collector from asr structured json files.""" @@ -235,6 +343,42 @@ def __init__(self, manifests_files: Union[str, List[str]], *args, **kwargs): ) +class ASRVideoText(VideoText): + """`VideoText` collector from cv structured json files.""" + + def __init__(self, manifests_files: Union[str, List[str]], *args, **kwargs): + """Parse lists of video files, durations and transcripts texts. + + Args: + manifests_files: Either single string file or list of such - + manifests to yield items from. + *args: Args to pass to `VideoText` constructor. + **kwargs: Kwargs to pass to `VideoText` constructor. + """ + + ids, video_files, durations, texts, offsets, = ( + [], + [], + [], + [], + [], + ) + speakers, orig_srs, token_labels, langs = [], [], [], [] + for item in manifest.item_iter(manifests_files): + ids.append(item['id']) + video_files.append(item['video_file']) + durations.append(item['duration']) + texts.append(item['text']) + offsets.append(item['offset']) + speakers.append(item['speaker']) + orig_srs.append(item['orig_sr']) + token_labels.append(item['token_labels']) + langs.append(item['lang']) + super().__init__( + ids, video_files, durations, texts, offsets, speakers, orig_srs, token_labels, langs, *args, **kwargs + ) + + class SpeechLabel(_Collection): """List of audio-label correspondence with preprocessing.""" diff --git a/nemo/collections/common/parts/preprocessing/manifest.py b/nemo/collections/common/parts/preprocessing/manifest.py index 98194505c589..882895570711 100644 --- a/nemo/collections/common/parts/preprocessing/manifest.py +++ b/nemo/collections/common/parts/preprocessing/manifest.py @@ -91,16 +91,26 @@ def __parse_item(line: str, manifest_file: str) -> Dict[str, Any]: item['audio_file'] = item.pop('audio_filename') elif 'audio_filepath' in item: item['audio_file'] = item.pop('audio_filepath') - elif 'audio_file' not in item: + + # Video File + if 'video_filename' in item: + item['video_file'] = item.pop('video_filename') + elif 'video_filepath' in item: + item['video_file'] = item.pop('video_filepath') + + if 'video_file' not in item and 'audio_file' not in item: raise ValueError( - f"Manifest file {manifest_file} has invalid json line structure: {line} without proper audio file key." + f"Manifest file {manifest_file} has invalid json line structure: {line} without proper audio/video file key." ) - # If the audio path is a relative path and does not exist, + # If the audio/video path is a relative path and does not exist, # try to attach the parent directory of manifest to the audio path. # Revert to the original path if the new path still doesn't exist. # Assume that the audio path is like "wavs/xxxxxx.wav". - item['audio_file'] = get_full_path(audio_file=item['audio_file'], manifest_file=manifest_file) + if 'audio_file' in item: + item['audio_file'] = get_full_path(audio_file=item['audio_file'], manifest_file=manifest_file) + if 'video_file' in item: + item['video_file'] = get_full_path(audio_file=item['video_file'], manifest_file=manifest_file) # Duration. if 'duration' not in item: @@ -144,7 +154,8 @@ def __parse_item(line: str, manifest_file: str) -> Dict[str, Any]: item['feature_file'] = get_full_path(audio_file=item['feature_file'], manifest_file=manifest_file) item = dict( - audio_file=item['audio_file'], + audio_file=item.get('audio_file', None), + video_file=item.get('video_file', None), duration=item['duration'], text=item['text'], rttm_file=item['rttm_file'], diff --git a/nemo/collections/multimodal/__init__.py b/nemo/collections/multimodal/__init__.py new file mode 100644 index 000000000000..4fc50543f1d2 --- /dev/null +++ b/nemo/collections/multimodal/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemo/collections/multimodal/speech_cv/__init__.py b/nemo/collections/multimodal/speech_cv/__init__.py new file mode 100644 index 000000000000..e13e4812457b --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/__init__.py @@ -0,0 +1,25 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from nemo.collections.multimodal.speech_cv import data, models, modules +from nemo.package_info import __version__ + +# Set collection version equal to NeMo version. +__version = __version__ + +# Authorship. +__author__ = "NVIDIA Corporation" + +# Set collection name. +__description__ = "Speech Computer Vision collection" diff --git a/nemo/collections/multimodal/speech_cv/data/__init__.py b/nemo/collections/multimodal/speech_cv/data/__init__.py new file mode 100644 index 000000000000..9e3250071955 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/data/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemo/collections/multimodal/speech_cv/data/video_to_text.py b/nemo/collections/multimodal/speech_cv/data/video_to_text.py new file mode 100644 index 000000000000..d0b903a5895b --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/data/video_to_text.py @@ -0,0 +1,870 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union + +import braceexpand +import torch +import webdataset as wd + +from nemo.collections.asr.data.audio_to_text import cache_datastore_manifests, expand_sharded_filepaths +from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType +from nemo.collections.common import tokenizers +from nemo.collections.common.parts.preprocessing import collections, parsers +from nemo.collections.multimodal.speech_cv.parts.preprocessing.features import VideoFeaturizer +from nemo.core.classes import Dataset, IterableDataset +from nemo.core.neural_types import * +from nemo.utils import logging +from nemo.utils.data_utils import datastore_path_to_webdataset_url, is_datastore_path + + +def _video_speech_collate_fn(batch, pad_id): + """collate batch of video sig, video len, tokens, tokens len + Args: + batch (Optional[FloatTensor], Optional[LongTensor], LongTensor, + LongTensor): A tuple of tuples of signal, signal lengths, + encoded tokens, and encoded tokens length. This collate func + assumes the signals are 4d torch tensors (Time, Height, Width, Channels). + """ + packed_batch = list(zip(*batch)) + + if len(packed_batch) == 5: + _, video_lengths, _, tokens_lengths, sample_ids = packed_batch + elif len(packed_batch) == 4: + sample_ids = None + _, video_lengths, _, tokens_lengths = packed_batch + else: + raise ValueError("Expects 4 or 5 tensors in the batch!") + + # Max Video Len + max_video_len = 0 + has_video = video_lengths[0] is not None + if has_video: + max_video_len = max(video_lengths).item() + + # Max Token Len + max_tokens_len = max(tokens_lengths).item() + + video_signal, tokens = [], [] + for b in batch: + + if len(b) == 5: + video_sig, video_sig_len, tokens_i, tokens_i_len, _ = b + else: + video_sig, video_sig_len, tokens_i, tokens_i_len = b + + # Pad and Append Video + if has_video: + video_sig_len = video_sig_len.item() + if video_sig_len < max_video_len: + pad = (0, 0, 0, 0, 0, 0, 0, max_video_len - video_sig_len) + video_sig = torch.nn.functional.pad(video_sig, pad) + video_signal.append(video_sig) + + # Pad and Append Token + tokens_i_len = tokens_i_len.item() + if tokens_i_len < max_tokens_len: + pad = (0, max_tokens_len - tokens_i_len) + tokens_i = torch.nn.functional.pad(tokens_i, pad, value=pad_id) + tokens.append(tokens_i) + + # Stack Video + if has_video: + video_signal = torch.stack(video_signal) + video_lengths = torch.stack(video_lengths) + else: + video_signal, video_lengths = None, None + + # Stack Text + tokens = torch.stack(tokens) + tokens_lengths = torch.stack(tokens_lengths) + + # Return + if sample_ids is None: + return video_signal, video_lengths, tokens, tokens_lengths + else: + sample_ids = torch.tensor(sample_ids, dtype=torch.int32) + return video_signal, video_lengths, tokens, tokens_lengths, sample_ids + + +class _VideoTextDataset(Dataset): + """ + Dataset that loads tensors via a json file containing paths to video files, transcripts, and durations (in seconds). + Each new line is a different sample. Example below: + {"video_filepath": "/path/to/video.mp4", "text_filepath": "/path/to/video.txt", "duration": 23.147} + ... + {"video_filepath": "/path/to/video.mp4", "text": "the transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + Args: + manifest_filepath: Path to manifest json as described above. Can be comma-separated paths. + parser: Str for a language specific preprocessor or a callable. + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + max_duration: If video exceeds this length, do not include in dataset + min_duration: If video is less than this length, do not include in dataset + max_utts: Limit number of utterances + trim: whether or not to trim silence. Defaults to False + bos_id: Id of beginning of sequence symbol to append if not None + eos_id: Id of end of sequence symbol to append if not None + pad_id: Id of pad symbol. Defaults to 0 + return_sample_id (bool): whether to return the sample_id as a part of each sample + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + """ + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + """Returns definitions of module output ports. + """ + return { + 'video_signal': NeuralType(('B', 'C', 'T', 'H', 'W'), VideoSignal()), + 'video_sig_length': NeuralType(tuple('B'), LengthsType()), + 'transcripts': NeuralType(('B', 'T'), LabelsType()), + 'transcript_length': NeuralType(tuple('B'), LengthsType()), + 'sample_id': NeuralType(tuple('B'), LengthsType(), optional=True), + } + + def __init__( + self, + manifest_filepath: str, + parser: Union[str, Callable], + int_values: bool = False, + max_duration: Optional[int] = None, + min_duration: Optional[int] = None, + max_utts: int = 0, + trim: bool = False, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + return_sample_id: bool = False, + channel_selector: Optional[ChannelSelectorType] = None, + ): + if type(manifest_filepath) == str: + manifest_filepath = manifest_filepath.split(",") + + # If necessary, cache manifests and audio from object store + cache_datastore_manifests(manifest_filepaths=manifest_filepath, cache_audio=True) + + self.manifest_processor = VSRManifestProcessor( + manifest_filepath=manifest_filepath, + parser=parser, + max_duration=max_duration, + min_duration=min_duration, + max_utts=max_utts, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + ) + self.video_featurizer = VideoFeaturizer() + self.trim = trim + self.return_sample_id = return_sample_id + self.channel_selector = channel_selector + + def get_manifest_sample(self, sample_id): + return self.manifest_processor.collection[sample_id] + + def __getitem__(self, index): + + # Select Sample + sample = self.manifest_processor.collection[index] + + # Offset + offset = sample.offset + if offset is None: + offset = 0 + + # Load Video + video_features = self.video_featurizer.process(sample.video_file, offset=offset, duration=sample.duration) + vf, vfl = video_features, torch.tensor(video_features.shape[0]).long() + + # Load Tokens + t, tl = self.manifest_processor.process_text_by_sample(sample=sample) + + if self.return_sample_id: + output = vf, vfl, torch.tensor(t).long(), torch.tensor(tl).long(), index + else: + output = vf, vfl, torch.tensor(t).long(), torch.tensor(tl).long() + + return output + + def __len__(self): + return len(self.manifest_processor.collection) + + def _collate_fn(self, batch): + return _video_speech_collate_fn(batch, pad_id=self.manifest_processor.pad_id) + + +class VSRManifestProcessor: + """ + Class that processes a manifest json file containing paths to video files, transcripts, and durations (in seconds). + Each new line is a different sample. Example below: + {"video_filepath": "/path/to/video.mp4", "text_filepath": "/path/to/video.txt", "duration": 23.147} + ... + {"video_filepath": "/path/to/video.mp4", "text": "the transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + Args: + manifest_filepath: Path to manifest json as described above. Can be comma-separated paths. + parser: Str for a language specific preprocessor or a callable. + max_duration: If video exceeds this length, do not include in dataset. + min_duration: If video is less than this length, do not include in dataset. + max_utts: Limit number of utterances. + bos_id: Id of beginning of sequence symbol to append if not None. + eos_id: Id of end of sequence symbol to append if not None. + pad_id: Id of pad symbol. Defaults to 0. + """ + + def __init__( + self, + manifest_filepath: str, + parser: Union[str, Callable], + max_duration: Optional[float] = None, + min_duration: Optional[float] = None, + max_utts: int = 0, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + index_by_file_id: bool = False, + ): + self.parser = parser + + self.collection = collections.ASRVideoText( + manifests_files=manifest_filepath, + parser=parser, + min_duration=min_duration, + max_duration=max_duration, + max_number=max_utts, + index_by_file_id=index_by_file_id, + ) + + self.eos_id = eos_id + self.bos_id = bos_id + self.pad_id = pad_id + + def process_text_by_id(self, index: int) -> Tuple[List[int], int]: + sample = self.collection[index] + return self.process_text_by_sample(sample) + + def process_text_by_file_id(self, file_id: str) -> Tuple[List[int], int]: + manifest_idx = self.collection.mapping[file_id][0] + sample = self.collection[manifest_idx] + return self.process_text_by_sample(sample) + + def process_text_by_sample(self, sample: collections.ASRAudioText.OUTPUT_TYPE) -> Tuple[List[int], int]: + t, tl = sample.text_tokens, len(sample.text_tokens) + + if self.bos_id is not None: + t = [self.bos_id] + t + tl += 1 + if self.eos_id is not None: + t = t + [self.eos_id] + tl += 1 + + return t, tl + + +class VideoToBPEDataset(_VideoTextDataset): + """ + Dataset that loads tensors via a json file containing paths to video + files, transcripts, and durations (in seconds). Each new line is a + different sample. Example below: + {"video_filepath": "/path/to/video.mp4", "text_filepath": + "/path/to/video.txt", "duration": 23.147} + ... + {"video_filepath": "/path/to/video.mp4", "text": "the + transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + + In practice, the dataset and manifest used for character encoding and byte pair encoding + are exactly the same. The only difference lies in how the dataset tokenizes the text in + the manifest. + + Args: + manifest_filepath: Path to manifest json as described above. Can + be comma-separated paths. + tokenizer: A subclass of the Tokenizer wrapper found in the common collection, + nemo.collections.common.tokenizers.TokenizerSpec. ASR Models support a subset of + all available tokenizers. + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + max_duration: If video exceeds this length, do not include in dataset + min_duration: If video is less than this length, do not include + in dataset + max_utts: Limit number of utterances + trim: Whether to trim silence segments + use_start_end_token: Boolean which dictates whether to add [BOS] and [EOS] + tokens to beginning and ending of speech respectively. + return_sample_id (bool): whether to return the sample_id as a part of each sample + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + """ + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + """Returns definitions of module output ports. + """ + return { + 'video_signal': NeuralType(('B', 'C', 'T', 'H', 'W'), VideoSignal()), + 'video_sig_length': NeuralType(tuple('B'), LengthsType()), + 'transcripts': NeuralType(('B', 'T'), LabelsType()), + 'transcript_length': NeuralType(tuple('B'), LengthsType()), + 'sample_id': NeuralType(tuple('B'), LengthsType(), optional=True), + } + + def __init__( + self, + manifest_filepath: str, + tokenizer: 'nemo.collections.common.tokenizers.TokenizerSpec', + int_values: bool = False, + max_duration: Optional[int] = None, + min_duration: Optional[int] = None, + max_utts: int = 0, + trim: bool = False, + use_start_end_token: bool = True, + return_sample_id: bool = False, + channel_selector: Optional[ChannelSelectorType] = None, + ): + if use_start_end_token and hasattr(tokenizer, "bos_id") and tokenizer.bos_id > 0: + bos_id = tokenizer.bos_id + else: + bos_id = None + + if use_start_end_token and hasattr(tokenizer, "eos_id") and tokenizer.eos_id > 0: + eos_id = tokenizer.eos_id + else: + eos_id = None + + if hasattr(tokenizer, "pad_id") and tokenizer.pad_id > 0: + pad_id = tokenizer.pad_id + else: + pad_id = 0 + + class TokenizerWrapper: + def __init__(self, tokenizer): + if isinstance(tokenizer, tokenizers.aggregate_tokenizer.AggregateTokenizer): + self.is_aggregate = True + else: + self.is_aggregate = False + self._tokenizer = tokenizer + + def __call__(self, *args): + if isinstance(args[0], List) and self.is_aggregate: + t = [] + for span in args[0]: + t.extend(self._tokenizer.text_to_ids(span['str'], span['lang'])) + return t + + t = self._tokenizer.text_to_ids(*args) + return t + + super().__init__( + manifest_filepath=manifest_filepath, + parser=TokenizerWrapper(tokenizer), + int_values=int_values, + max_duration=max_duration, + min_duration=min_duration, + max_utts=max_utts, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + trim=trim, + return_sample_id=return_sample_id, + channel_selector=channel_selector, + ) + + +class VideoToCharDataset(_VideoTextDataset): + """ + Dataset that loads tensors via a json file containing paths to video + files, transcripts, and durations (in seconds). Each new line is a + different sample. Example below: + {"video_filepath": "/path/to/video.mp4", "text_filepath": + "/path/to/video.txt", "duration": 23.147} + ... + {"video_filepath": "/path/to/video.mp4", "text": "the + transcription", "offset": 301.75, "duration": 0.82, "utt": + "utterance_id", "ctm_utt": "en_4156", "side": "A"} + + Args: + manifest_filepath: Path to manifest json as described above. Can + be comma-separated paths. + labels: String containing all the possible characters to map to + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + max_duration: If video exceeds this length, do not include in dataset + min_duration: If video is less than this length, do not include + in dataset + max_utts: Limit number of utterances + blank_index: blank character index, default = -1 + unk_index: unk_character index, default = -1 + normalize: whether to normalize transcript text (default): True + bos_id: Id of beginning of sequence symbol to append if not None + eos_id: Id of end of sequence symbol to append if not None + return_sample_id (bool): whether to return the sample_id as a part of each sample + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + """ + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + """Returns definitions of module output ports. + """ + return { + 'video_signal': NeuralType(('B', 'C', 'T', 'H', 'W'), VideoSignal()), + 'video_sig_length': NeuralType(tuple('B'), LengthsType()), + 'transcripts': NeuralType(('B', 'T'), LabelsType()), + 'transcript_length': NeuralType(tuple('B'), LengthsType()), + 'sample_id': NeuralType(tuple('B'), LengthsType(), optional=True), + } + + def __init__( + self, + manifest_filepath: str, + labels: Union[str, List[str]], + int_values: bool = False, + max_duration: Optional[float] = None, + min_duration: Optional[float] = None, + max_utts: int = 0, + blank_index: int = -1, + unk_index: int = -1, + normalize: bool = True, + trim: bool = False, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + parser: Union[str, Callable] = 'en', + return_sample_id: bool = False, + channel_selector: Optional[ChannelSelectorType] = None, + ): + self.labels = labels + + parser = parsers.make_parser( + labels=labels, name=parser, unk_id=unk_index, blank_id=blank_index, do_normalize=normalize + ) + + super().__init__( + manifest_filepath=manifest_filepath, + parser=parser, + int_values=int_values, + max_duration=max_duration, + min_duration=min_duration, + max_utts=max_utts, + trim=trim, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + return_sample_id=return_sample_id, + channel_selector=channel_selector, + ) + + +class _TarredVideoToTextDataset(IterableDataset): + """ + A similar Dataset to the VideoToCharDataset/VideoToBPEDataset, but which loads tarred video files. + + Accepts a single comma-separated JSON manifest file (in the same style as for the VideoToCharDataset/VideoToBPEDataset), + as well as the path(s) to the tarball(s) containing the mp4 files. Each line of the manifest should + contain the information for one video file, including at least the transcript and name of the audio + file within the tarball. + + Valid formats for the audio_tar_filepaths argument include: + (1) a single string that can be brace-expanded, e.g. 'path/to/audio.tar' or 'path/to/audio_{1..100}.tar.gz', or + (2) a list of file paths that will not be brace-expanded, e.g. ['audio_1.tar', 'audio_2.tar', ...]. + + Note: For brace expansion in (1), there may be cases where `{x..y}` syntax cannot be used due to shell interference. + This occurs most commonly inside SLURM scripts. Therefore we provide a few equivalent replacements. + Supported opening braces - { <=> (, [, < and the special tag _OP_. + Supported closing braces - } <=> ), ], > and the special tag _CL_. + For SLURM based tasks, we suggest the use of the special tags for ease of use. + + See the WebDataset documentation for more information about accepted data and input formats. + + If using multiple workers the number of shards should be divisible by world_size to ensure an + even split among workers. If it is not divisible, logging will give a warning but training will proceed. + In addition, if using mutiprocessing, each shard MUST HAVE THE SAME NUMBER OF ENTRIES after filtering + is applied. We currently do not check for this, but your program may hang if the shards are uneven! + + Notice that a few arguments are different from the AudioToCharDataset; for example, shuffle (bool) has been + replaced by shuffle_n (int). + + Additionally, please note that the len() of this DataLayer is assumed to be the length of the manifest + after filtering. An incorrect manifest length may lead to some DataLoader issues down the line. + + Args: + audio_tar_filepaths: Either a list of audio tarball filepaths, or a + string (can be brace-expandable). + manifest_filepath (str): Path to the manifest. + parser (callable): A callable which is used to pre-process the text output. + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + shuffle_n (int): How many samples to look ahead and load to be shuffled. + See WebDataset documentation for more details. + Defaults to 0. + min_duration (float): Dataset parameter. + All training files which have a duration less than min_duration + are dropped. Note: Duration is read from the manifest JSON. + Defaults to 0.1. + max_duration (float): Dataset parameter. + All training files which have a duration more than max_duration + are dropped. Note: Duration is read from the manifest JSON. + Defaults to None. + blank_index (int): Blank character index, defaults to -1. + unk_index (int): Unknown character index, defaults to -1. + normalize (bool): Dataset parameter. + Whether to use automatic text cleaning. + It is highly recommended to manually clean text for best results. + Defaults to True. + trim (bool): Whether to use trim silence from beginning and end + of audio signal using librosa.effects.trim(). + Defaults to False. + bos_id (id): Dataset parameter. + Beginning of string symbol id used for seq2seq models. + Defaults to None. + eos_id (id): Dataset parameter. + End of string symbol id used for seq2seq models. + Defaults to None. + pad_id (id): Token used to pad when collating samples in batches. + If this is None, pads using 0s. + Defaults to None. + shard_strategy (str): Tarred dataset shard distribution strategy chosen as a str value during ddp. + - `scatter`: The default shard strategy applied by WebDataset, where each node gets + a unique set of shards, which are permanently pre-allocated and never changed at runtime. + - `replicate`: Optional shard strategy, where each node gets all of the set of shards + available in the tarred dataset, which are permanently pre-allocated and never changed at runtime. + The benefit of replication is that it allows each node to sample data points from the entire + dataset independently of other nodes, and reduces dependence on value of `shuffle_n`. + + .. warning:: + Replicated strategy allows every node to sample the entire set of available tarfiles, + and therefore more than one node may sample the same tarfile, and even sample the same + data points! As such, there is no assured guarantee that all samples in the dataset will be + sampled at least once during 1 epoch. Scattered strategy, on the other hand, on specific + occasions (when the number of shards is not divisible with ``world_size``), will not sample + the entire dataset. For these reasons it is not advisable to use tarred datasets as validation + or test datasets. + global_rank (int): Worker rank, used for partitioning shards. Defaults to 0. + world_size (int): Total number of processes, used for partitioning shards. Defaults to 0. + return_sample_id (bool): whether to return the sample_id as a part of each sample + """ + + def __init__( + self, + audio_tar_filepaths: Union[str, List[str]], + manifest_filepath: str, + parser: Callable, + int_values: bool = False, + shuffle_n: int = 0, + min_duration: Optional[float] = None, + max_duration: Optional[float] = None, + trim: bool = False, + bos_id: Optional[int] = None, + eos_id: Optional[int] = None, + pad_id: int = 0, + shard_strategy: str = "scatter", + global_rank: int = 0, + world_size: int = 0, + return_sample_id: bool = False, + ): + # If necessary, cache manifests from object store + cache_datastore_manifests(manifest_filepaths=manifest_filepath) + + self.manifest_processor = VSRManifestProcessor( + manifest_filepath=manifest_filepath, + parser=parser, + max_duration=max_duration, + min_duration=min_duration, + max_utts=0, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + index_by_file_id=True, # Must set this so the manifest lines can be indexed by file ID + ) + + self.video_featurizer = VideoFeaturizer() + self.trim = trim + self.eos_id = eos_id + self.bos_id = bos_id + self.pad_id = pad_id + self.return_sample_id = return_sample_id + + audio_tar_filepaths = expand_sharded_filepaths( + audio_tar_filepaths=audio_tar_filepaths, + shard_strategy=shard_strategy, + world_size=world_size, + global_rank=global_rank, + ) + + # Put together WebDataset + self._dataset = wd.WebDataset(urls=audio_tar_filepaths, nodesplitter=None) + + if shuffle_n > 0: + self._dataset = self._dataset.shuffle(shuffle_n) + else: + logging.info("WebDataset will not shuffle files within the tar files.") + + self._dataset = ( + self._dataset.map(wd.autodecode.Decoder([wd.torch_video])) + .rename(video="mp4", key='__key__') + .to_tuple('video', 'key') + .pipe(self._filter) + .pipe(self._loop_offsets) + .map(f=self._build_sample) + ) + + def _filter(self, iterator): + """This function is used to remove samples that have been filtered out by ASRVideoText already. + Otherwise, we would get a KeyError as _build_sample attempts to find the manifest entry for a sample + that was filtered out (e.g. for duration). + Note that if using multi-GPU training, filtering may lead to an imbalance in samples in each shard, + which may make your code hang as one process will finish before the other. + """ + + class TarredAudioFilter: + def __init__(self, collection): + self.iterator = iterator + self.collection = collection + + def __iter__(self): + return self + + def __next__(self): + while True: + try: + video_bytes, audio_filename = next(self.iterator) + except: + print("except") + continue + file_id, _ = os.path.splitext(os.path.basename(audio_filename)) + if file_id in self.collection.mapping: + return video_bytes, audio_filename + + return TarredAudioFilter(self.manifest_processor.collection) + + def _loop_offsets(self, iterator): + """This function is used to iterate through utterances with different offsets for each file. + """ + + class TarredAudioLoopOffsets: + def __init__(self, collection): + self.iterator = iterator + self.collection = collection + self.current_fn = None + self.current_video_bytes = None + self.offset_id = 0 + + def __iter__(self): + return self + + def __next__(self): + if self.current_fn is None: + self.current_video_bytes, self.current_fn = next(self.iterator) + self.offset_id = 0 + else: + offset_list = self.collection.mapping[self.current_fn] + if len(offset_list) == self.offset_id + 1: + self.current_video_bytes, self.current_fn = next(self.iterator) + self.offset_id = 0 + else: + self.offset_id += 1 + + return self.current_video_bytes, self.current_fn, self.offset_id + + return TarredAudioLoopOffsets(self.manifest_processor.collection) + + def _collate_fn(self, batch): + return _video_speech_collate_fn(batch, self.pad_id) + + def _build_sample(self, tup): + """Builds the training sample by combining the data from the WebDataset with the manifest info. + """ + video_tuple, audio_filename, offset_id = tup + + # Grab manifest entry from self.manifest_preprocessor.collection + file_id, _ = os.path.splitext(os.path.basename(audio_filename)) + manifest_idx = self.manifest_processor.collection.mapping[file_id][offset_id] + manifest_entry = self.manifest_processor.collection[manifest_idx] + + offset = manifest_entry.offset + if offset is None: + offset = 0 + + # Load Video + video_features = video_tuple[0] + + # Signal length + vf, vfl = video_features, torch.tensor(video_features.shape[0]).long() + + # Load Tokens + t, tl = manifest_entry.text_tokens, len(manifest_entry.text_tokens) + + self.manifest_processor.process_text_by_sample(sample=manifest_entry) + + if self.bos_id is not None: + t = [self.bos_id] + t + tl += 1 + if self.eos_id is not None: + t = t + [self.eos_id] + tl += 1 + + if self.return_sample_id: + return vf, vfl, torch.tensor(t).long(), torch.tensor(tl).long(), manifest_idx + else: + return vf, vfl, torch.tensor(t).long(), torch.tensor(tl).long() + + def get_manifest_sample(self, sample_id): + return self.manifest_processor.collection[sample_id] + + def __iter__(self): + return self._dataset.__iter__() + + def __len__(self): + return len(self.manifest_processor.collection) + + +class TarredVideoToBPEDataset(_TarredVideoToTextDataset): + """ + A similar Dataset to the VideoToBPEDataset, but which loads tarred audio files. + + Accepts a single comma-separated JSON manifest file (in the same style as for the VideoToBPEDataset), + as well as the path(s) to the tarball(s) containing the wav files. Each line of the manifest should + contain the information for one audio file, including at least the transcript and name of the audio + file within the tarball. + + Valid formats for the audio_tar_filepaths argument include: + (1) a single string that can be brace-expanded, e.g. 'path/to/audio.tar' or 'path/to/audio_{1..100}.tar.gz', or + (2) a list of file paths that will not be brace-expanded, e.g. ['audio_1.tar', 'audio_2.tar', ...]. + + See the WebDataset documentation for more information about accepted data and input formats. + + If using multiple workers the number of shards should be divisible by world_size to ensure an + even split among workers. If it is not divisible, logging will give a warning but training will proceed. + In addition, if using mutiprocessing, each shard MUST HAVE THE SAME NUMBER OF ENTRIES after filtering + is applied. We currently do not check for this, but your program may hang if the shards are uneven! + + Notice that a few arguments are different from the AudioToBPEDataset; for example, shuffle (bool) has been + replaced by shuffle_n (int). + + Additionally, please note that the len() of this DataLayer is assumed to be the length of the manifest + after filtering. An incorrect manifest length may lead to some DataLoader issues down the line. + + Args: + audio_tar_filepaths: Either a list of audio tarball filepaths, or a + string (can be brace-expandable). + manifest_filepath (str): Path to the manifest. + tokenizer (TokenizerSpec): Either a Word Piece Encoding tokenizer (BERT), + or a Sentence Piece Encoding tokenizer (BPE). The CTC blank + symbol is automatically added later for models using ctc. + int_values (bool): If true, load samples as 32-bit integers. Defauts to False. + shuffle_n (int): How many samples to look ahead and load to be shuffled. + See WebDataset documentation for more details. + Defaults to 0. + min_duration (float): Dataset parameter. + All training files which have a duration less than min_duration + are dropped. Note: Duration is read from the manifest JSON. + Defaults to 0.1. + max_duration (float): Dataset parameter. + All training files which have a duration more than max_duration + are dropped. Note: Duration is read from the manifest JSON. + Defaults to None. + trim (bool): Whether to use trim silence from beginning and end + of audio signal using librosa.effects.trim(). + Defaults to False. + use_start_end_token: Boolean which dictates whether to add [BOS] and [EOS] + tokens to beginning and ending of speech respectively. + pad_id (id): Token used to pad when collating samples in batches. + If this is None, pads using 0s. + Defaults to None. + shard_strategy (str): Tarred dataset shard distribution strategy chosen as a str value during ddp. + + - `scatter`: The default shard strategy applied by WebDataset, where each node gets + a unique set of shards, which are permanently pre-allocated and never changed at runtime. + - `replicate`: Optional shard strategy, where each node gets all of the set of shards + available in the tarred dataset, which are permanently pre-allocated and never changed at runtime. + The benefit of replication is that it allows each node to sample data points from the entire + dataset independently of other nodes, and reduces dependence on value of `shuffle_n`. + + .. warning:: + + Replicated strategy allows every node to sample the entire set of available tarfiles, + and therefore more than one node may sample the same tarfile, and even sample the same + data points! As such, there is no assured guarantee that all samples in the dataset will be + sampled at least once during 1 epoch. Scattered strategy, on the other hand, on specific + occasions (when the number of shards is not divisible with ``world_size``), will not sample + the entire dataset. For these reasons it is not advisable to use tarred datasets as validation + or test datasets. + + global_rank (int): Worker rank, used for partitioning shards. Defaults to 0. + world_size (int): Total number of processes, used for partitioning shards. Defaults to 0. + return_sample_id (bool): whether to return the sample_id as a part of each sample + """ + + def __init__( + self, + audio_tar_filepaths: Union[str, List[str]], + manifest_filepath: str, + tokenizer: 'nemo.collections.common.tokenizers.TokenizerSpec', + int_values: bool = False, + shuffle_n: int = 0, + min_duration: Optional[float] = None, + max_duration: Optional[float] = None, + trim: bool = False, + use_start_end_token: bool = True, + shard_strategy: str = "scatter", + global_rank: int = 0, + world_size: int = 0, + return_sample_id: bool = False, + ): + if use_start_end_token and hasattr(tokenizer, "bos_id") and tokenizer.bos_id > 0: + bos_id = tokenizer.bos_id + else: + bos_id = None + + if use_start_end_token and hasattr(tokenizer, "eos_id") and tokenizer.eos_id > 0: + eos_id = tokenizer.eos_id + else: + eos_id = None + + if hasattr(tokenizer, "pad_id") and tokenizer.pad_id > 0: + pad_id = tokenizer.pad_id + else: + pad_id = 0 + + class TokenizerWrapper: + def __init__(self, tokenizer): + if isinstance(tokenizer, tokenizers.aggregate_tokenizer.AggregateTokenizer): + self.is_aggregate = True + else: + self.is_aggregate = False + self._tokenizer = tokenizer + + def __call__(self, *args): + if isinstance(args[0], Iterable) and self.is_aggregate: + t = [] + for span in args[0]: + t.extend(self._tokenizer.text_to_ids(span['str'], span['lang'])) + return t + + t = self._tokenizer.text_to_ids(*args) + return t + + super().__init__( + audio_tar_filepaths=audio_tar_filepaths, + manifest_filepath=manifest_filepath, + parser=TokenizerWrapper(tokenizer), + int_values=int_values, + shuffle_n=shuffle_n, + min_duration=min_duration, + max_duration=max_duration, + trim=trim, + bos_id=bos_id, + eos_id=eos_id, + pad_id=pad_id, + shard_strategy=shard_strategy, + global_rank=global_rank, + world_size=world_size, + return_sample_id=return_sample_id, + ) diff --git a/nemo/collections/multimodal/speech_cv/data/video_to_text_dataset.py b/nemo/collections/multimodal/speech_cv/data/video_to_text_dataset.py new file mode 100644 index 000000000000..cf34cc14974e --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/data/video_to_text_dataset.py @@ -0,0 +1,287 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import random +from math import isclose +from typing import Optional + +import torch +from omegaconf import DictConfig +from omegaconf.listconfig import ListConfig +from torch.utils.data import ChainDataset + +from nemo.collections.asr.data.audio_to_text_dataset import convert_to_config_list, get_chain_dataset +from nemo.collections.multimodal.speech_cv.data import video_to_text +from nemo.utils import logging + + +def get_video_to_text_bpe_dataset_from_config( + config, + local_rank: int, + global_rank: int, + world_size: int, + tokenizer, + preprocessor_cfg: Optional[DictConfig] = None, +): + """ + Construct Video-To-Text BPE dataset from a config. + Args: + config: BPE dataset config + local_rank: model local rank + global_rank: model global rand + world_size: world size + tokenizer: BPE tokenizer + preprocessor_cfg: preprocessor config, for DALI BPE dataset + + Returns: + constructed dataset or None if dataset config is invalid or nothing to load + """ + + is_concat = config.get('is_concat', False) + if is_concat: + if 'concat_sampling' in config and config['concat_sampling'] is None: + logging.warning(f"Concat dataset requires `concat_sampling` but it was not provided. Config: {config}") + return None + + if not 'concat_probabilities' in config: + logging.warning( + f"Concat dataset requires `concat_probabilities` list but it was not provided. Config: {config}" + ) + return None + else: + if not isclose(sum(config['concat_probabilities']), 1, abs_tol=1e-6): + logging.warning(f"`concat_probabilities` need to sum to 1. Config: {config}") + return None + + shuffle = config['shuffle'] + + # Instantiate tarred dataset loader or normal dataset loader + if config.get('is_tarred', False): + if ('tarred_audio_filepaths' in config and config['tarred_audio_filepaths'] is None) or ( + 'manifest_filepath' in config and config['manifest_filepath'] is None + ): + logging.warning( + "Could not load dataset as `manifest_filepath` was None or " + f"`tarred_audio_filepaths` is None. Provided config : {config}" + ) + return None + + shuffle_n = config.get('shuffle_n', 4 * config['batch_size']) if shuffle else 0 + if is_concat: + raise NotImplementedError("get_concat_tarred_dataset method not implemented") + else: + dataset = get_tarred_dataset( + config=config, tokenizer=tokenizer, shuffle_n=shuffle_n, global_rank=global_rank, world_size=world_size + ) + else: + if 'manifest_filepath' in config and config['manifest_filepath'] is None: + logging.warning(f"Could not load dataset as `manifest_filepath` was None. Provided config : {config}") + return None + if is_concat: + raise NotImplementedError("get_concat_bpe_dataset method not implemented") + else: + dataset = get_bpe_dataset(config=config, tokenizer=tokenizer) + return dataset + + +def get_video_to_text_char_dataset_from_config( + config, local_rank: int, global_rank: int, world_size: int, preprocessor_cfg: Optional[DictConfig] = None +): + """ + Construct Video-To-Text Char dataset from a config. + Args: + config: dataset config + local_rank: model local rank + global_rank: model global rand + world_size: world size + preprocessor_cfg: preprocessor config, for DALI dataset + + Returns: + constructed dataset or None if dataset config is invalid or nothing to load + """ + + is_concat = config.get('is_concat', False) + if is_concat: + if 'concat_sampling' in config and config['concat_sampling'] is None: + logging.warning(f"Concat dataset requires `concat_sampling` but it was not provided. Config: {config}") + return None + + if not 'concat_probabilities' in config: + logging.warning( + f"Concat dataset requires `concat_probabilities` list but it was not provided. Config: {config}" + ) + return None + else: + if not isclose(sum(config['concat_probabilities']), 1, abs_tol=1e-6): + logging.warning(f"`concat_probabilities` need to sum to 1. Config: {config}") + return None + + shuffle = config['shuffle'] + + # Instantiate tarred dataset loader or normal dataset loader + if config.get('is_tarred', False): + if ('tarred_audio_filepaths' in config and config['tarred_audio_filepaths'] is None) or ( + 'manifest_filepath' in config and config['manifest_filepath'] is None + ): + logging.warning( + "Could not load dataset as `manifest_filepath` was None or " + f"`tarred_audio_filepaths` is None. Provided config : {config}" + ) + return None + + shuffle_n = config.get('shuffle_n', 4 * config['batch_size']) if shuffle else 0 + if is_concat: + raise Exception("get_concat_tarred_dataset method not implemented") + else: + dataset = get_tarred_dataset( + config=config, shuffle_n=shuffle_n, global_rank=global_rank, world_size=world_size, + ) + else: + if 'manifest_filepath' in config and config['manifest_filepath'] is None: + logging.warning(f"Could not load dataset as `manifest_filepath` was None. Provided config : {config}") + return None + if is_concat: + raise Exception("get_concat_char_dataset method not implemented") + else: + dataset = get_char_dataset(config=config) + return dataset + + +def get_bpe_dataset(config: dict, tokenizer: 'TokenizerSpec') -> video_to_text.VideoToBPEDataset: + """ + Instantiates a Byte Pair Encoding / Word Piece Encoding based VideoToBPEDataset. + + Args: + config: Config of the VideoToBPEDataset. + tokenizer: An instance of a TokenizerSpec object. + + Returns: + An instance of VideoToBPEDataset. + """ + dataset = video_to_text.VideoToBPEDataset( + manifest_filepath=config['manifest_filepath'], + tokenizer=tokenizer, + int_values=config.get('int_values', False), + max_duration=config.get('max_duration', None), + min_duration=config.get('min_duration', None), + max_utts=config.get('max_utts', 0), + trim=config.get('trim_silence', False), + use_start_end_token=config.get('use_start_end_token', True), + return_sample_id=config.get('return_sample_id', False), + channel_selector=config.get('channel_selector', None), + ) + return dataset + + +def get_char_dataset(config: dict) -> video_to_text.VideoToCharDataset: + """ + Instantiates a Character Encoding based VideoToCharDataset. + + Args: + config: Config of the VideoToCharDataset. + + Returns: + An instance of VideoToCharDataset. + """ + if 'labels' not in config: + logging.warning(f"dataset does not have explicitly defined labels") + + dataset = video_to_text.VideoToCharDataset( + manifest_filepath=config['manifest_filepath'], + labels=config.get('labels', None), + int_values=config.get('int_values', False), + max_duration=config.get('max_duration', None), + min_duration=config.get('min_duration', None), + max_utts=config.get('max_utts', 0), + blank_index=config.get('blank_index', -1), + unk_index=config.get('unk_index', -1), + normalize=config.get('normalize_transcripts', False), + trim=config.get('trim_silence', False), + parser=config.get('parser', 'en'), + return_sample_id=config.get('return_sample_id', False), + channel_selector=config.get('channel_selector', None), + ) + return dataset + + +def get_tarred_dataset( + config: dict, shuffle_n: int, global_rank: int, world_size: int, tokenizer: Optional['TokenizerSpec'] = None, +) -> video_to_text.TarredVideoToBPEDataset: + """ + Instantiates a Word Piece/BPE Encoding based TarredVideoToBPEDataset or a char based TarredVideoToCharDataset. + + Args: + config: Config of the TarredVideoToBPEDataset or TarredVideoToCharDataset. + shuffle_n: How many samples to look ahead and load to be shuffled. + See WebDataset documentation for more details. + tokenizer: An instance of a TokenizerSpec object if BPE dataset is needed. + global_rank: Global rank of this device. + world_size: Global world size in the training method. + Passsing None would return a char-based dataset. + + Returns: + An instance of TarredVideoToBPEDataset or TarredVideoToCharDataset. + """ + tarred_audio_filepaths = config['tarred_audio_filepaths'] + manifest_filepaths = config['manifest_filepath'] + datasets = [] + tarred_audio_filepaths = convert_to_config_list(tarred_audio_filepaths) + manifest_filepaths = convert_to_config_list(manifest_filepaths) + + bucketing_weights = config.get('bucketing_weights', None) # For upsampling buckets + if bucketing_weights: + for idx, weight in enumerate(bucketing_weights): + if not isinstance(weight, int) or weight <= 0: + raise ValueError(f"bucket weights must be positive integers") + + if len(manifest_filepaths) != len(tarred_audio_filepaths): + raise ValueError( + f"manifest_filepaths (length={len(manifest_filepaths)}) and tarred_audio_filepaths (length={len(tarred_audio_filepaths)}) need to have the same number of buckets." + ) + + if 'labels' not in config: + logging.warning(f"dataset does not have explicitly defined labels") + + if 'max_utts' in config: + raise ValueError('"max_utts" parameter is not supported for tarred datasets') + + for dataset_idx, (tarred_audio_filepath, manifest_filepath) in enumerate( + zip(tarred_audio_filepaths, manifest_filepaths) + ): + if len(tarred_audio_filepath) == 1: + tarred_audio_filepath = tarred_audio_filepath[0] + if tokenizer is None: + raise Exception("video_to_text.TarredVideoToCharDataset class not Implemented") + else: + dataset = video_to_text.TarredVideoToBPEDataset( + audio_tar_filepaths=tarred_audio_filepath, + manifest_filepath=manifest_filepath, + tokenizer=tokenizer, + int_values=config.get('int_values', False), + shuffle_n=shuffle_n, + max_duration=config.get('max_duration', None), + min_duration=config.get('min_duration', None), + trim=config.get('trim_silence', False), + use_start_end_token=config.get('use_start_end_token', True), + shard_strategy=config.get('tarred_shard_strategy', 'scatter'), + global_rank=global_rank, + world_size=world_size, + return_sample_id=config.get('return_sample_id', False), + ) + if bucketing_weights: + [datasets.append(dataset) for _ in range(bucketing_weights[dataset_idx])] + else: + datasets.append(dataset) + + return get_chain_dataset(datasets=datasets, ds_config=config) diff --git a/nemo/collections/multimodal/speech_cv/models/__init__.py b/nemo/collections/multimodal/speech_cv/models/__init__.py new file mode 100644 index 000000000000..c34b4c174ac1 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/__init__.py @@ -0,0 +1,27 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# CTC +from nemo.collections.multimodal.speech_cv.models.visual_ctc_bpe_models import VisualEncDecCTCModelBPE +from nemo.collections.multimodal.speech_cv.models.visual_ctc_models import VisualEncDecCTCModel +from nemo.collections.multimodal.speech_cv.models.visual_hybrid_rnnt_ctc_bpe_models import ( + VisualEncDecHybridRNNTCTCBPEModel, +) + +# Hybrid CTC/RNN-T +from nemo.collections.multimodal.speech_cv.models.visual_hybrid_rnnt_ctc_models import VisualEncDecHybridRNNTCTCModel +from nemo.collections.multimodal.speech_cv.models.visual_rnnt_bpe_models import VisualEncDecRNNTBPEModel + +# RNN-T +from nemo.collections.multimodal.speech_cv.models.visual_rnnt_models import VisualEncDecRNNTModel diff --git a/nemo/collections/multimodal/speech_cv/models/visual_ctc_bpe_models.py b/nemo/collections/multimodal/speech_cv/models/visual_ctc_bpe_models.py new file mode 100644 index 000000000000..529acb4cc86f --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/visual_ctc_bpe_models.py @@ -0,0 +1,314 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import os +from typing import Dict, List, Optional, Union + +import torch +from omegaconf import DictConfig, ListConfig, OmegaConf, open_dict + +from nemo.collections.asr.losses.ctc import CTCLoss +from nemo.collections.asr.metrics.wer_bpe import WERBPE, CTCBPEDecoding, CTCBPEDecodingConfig +from nemo.collections.asr.parts.mixins import ASRBPEMixin +from nemo.collections.multimodal.speech_cv.data import video_to_text_dataset +from nemo.collections.multimodal.speech_cv.models.visual_ctc_models import VisualEncDecCTCModel +from nemo.core.classes.common import PretrainedModelInfo +from nemo.utils import logging, model_utils + +__all__ = ['VisualEncDecCTCModelBPE'] + + +class VisualEncDecCTCModelBPE(VisualEncDecCTCModel, ASRBPEMixin): + """Encoder decoder CTC-based models with Byte Pair Encoding.""" + + def __init__(self, cfg: DictConfig, trainer=None): + # Convert to Hydra 1.0 compatible DictConfig + cfg = model_utils.convert_model_config_to_dict_config(cfg) + cfg = model_utils.maybe_update_config_version(cfg) + + if 'tokenizer' not in cfg: + raise ValueError("`cfg` must have `tokenizer` config to create a tokenizer !") + + # Setup the tokenizer + self._setup_tokenizer(cfg.tokenizer) + + # Initialize a dummy vocabulary + vocabulary = self.tokenizer.tokenizer.get_vocab() + + # Set the new vocabulary + with open_dict(cfg): + # sidestepping the potential overlapping tokens issue in aggregate tokenizers + if self.tokenizer_type == "agg": + cfg.decoder.vocabulary = ListConfig(vocabulary) + else: + cfg.decoder.vocabulary = ListConfig(list(vocabulary.keys())) + + # Override number of classes if placeholder provided + num_classes = cfg.decoder["num_classes"] + + if num_classes < 1: + logging.info( + "\nReplacing placeholder number of classes ({}) with actual number of classes - {}".format( + num_classes, len(vocabulary) + ) + ) + cfg.decoder["num_classes"] = len(vocabulary) + + super().__init__(cfg=cfg, trainer=trainer) + + # Setup decoding objects + decoding_cfg = self.cfg.get('decoding', None) + + # In case decoding config not found, use default config + if decoding_cfg is None: + decoding_cfg = OmegaConf.structured(CTCBPEDecodingConfig) + with open_dict(self.cfg): + self.cfg.decoding = decoding_cfg + + self.decoding = CTCBPEDecoding(self.cfg.decoding, tokenizer=self.tokenizer) + + # Setup metric with decoding strategy + self._wer = WERBPE( + decoding=self.decoding, + use_cer=self._cfg.get('use_cer', False), + dist_sync_on_step=True, + log_prediction=self._cfg.get("log_prediction", False), + ) + + def _setup_dataloader_from_config(self, config: Optional[Dict]): + dataset = video_to_text_dataset.get_video_to_text_bpe_dataset_from_config( + config=config, + local_rank=self.local_rank, + global_rank=self.global_rank, + world_size=self.world_size, + tokenizer=self.tokenizer, + preprocessor_cfg=self.cfg.get("preprocessor", None), + ) + + if dataset is None: + return None + + shuffle = config['shuffle'] + if config.get('is_tarred', False): + shuffle = False + + if hasattr(dataset, 'collate_fn'): + collate_fn = dataset.collate_fn + else: + collate_fn = dataset.datasets[0].collate_fn + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=config['batch_size'], + collate_fn=collate_fn, + drop_last=config.get('drop_last', False), + shuffle=shuffle, + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + prefetch_factor=config.get('prefetch_factor', 2), + ) + + def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLoader': + """ + Setup function for a temporary data loader which wraps the provided video file. + + Args: + config: A python dictionary which contains the following keys: + paths2video_files: (a list) of paths to video files. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + temp_dir: (str) A temporary directory where the video manifest is temporarily + stored. + num_workers: (int) number of workers. Depends of the batch_size and machine. \ + 0 - only the main process will load batches, 1 - one worker (not main process) + + Returns: + A pytorch DataLoader for the given video file(s). + """ + + if 'manifest_filepath' in config: + manifest_filepath = config['manifest_filepath'] + batch_size = config['batch_size'] + else: + manifest_filepath = os.path.join(config['temp_dir'], 'manifest.json') + batch_size = min(config['batch_size'], len(config['paths2video_files'])) + + dl_config = { + 'manifest_filepath': manifest_filepath, + 'batch_size': batch_size, + 'shuffle': False, + 'num_workers': config.get('num_workers', min(batch_size, os.cpu_count() - 1)), + 'pin_memory': True, + 'channel_selector': config.get('channel_selector', None), + 'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False), + } + + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) + return temporary_datalayer + + def change_vocabulary( + self, + new_tokenizer_dir: Union[str, DictConfig], + new_tokenizer_type: str, + decoding_cfg: Optional[DictConfig] = None, + ): + """ + Changes vocabulary of the tokenizer used during CTC decoding process. + Use this method when fine-tuning on from pre-trained model. + This method changes only decoder and leaves encoder and pre-processing modules unchanged. For example, you would + use it if you want to use pretrained encoder when fine-tuning on a data in another language, or when you'd need + model to learn capitalization, punctuation and/or special characters. + + Args: + new_tokenizer_dir: Directory path to tokenizer or a config for a new tokenizer (if the tokenizer type is `agg`) + new_tokenizer_type: Either `agg`, `bpe` or `wpe`. `bpe` is used for SentencePiece tokenizers, + whereas `wpe` is used for `BertTokenizer`. + new_tokenizer_cfg: A config for the new tokenizer. if provided, pre-empts the dir and type + + Returns: None + + """ + if isinstance(new_tokenizer_dir, DictConfig): + if new_tokenizer_type == 'agg': + new_tokenizer_cfg = new_tokenizer_dir + else: + raise ValueError( + f'New tokenizer dir should be a string unless the tokenizer is `agg`, but this tokenizer type is: {new_tokenizer_type}' + ) + else: + new_tokenizer_cfg = None + + if new_tokenizer_cfg is not None: + tokenizer_cfg = new_tokenizer_cfg + else: + if not os.path.isdir(new_tokenizer_dir): + raise NotADirectoryError( + f'New tokenizer dir must be non-empty path to a directory. But I got: {new_tokenizer_dir}' + f"New tokenizer dir must be non-empty path to a directory. But I got: {new_tokenizer_dir}" + ) + + if new_tokenizer_type.lower() not in ('bpe', 'wpe'): + raise ValueError(f'New tokenizer type must be either `bpe` or `wpe`') + + tokenizer_cfg = OmegaConf.create({'dir': new_tokenizer_dir, 'type': new_tokenizer_type}) + + # Setup the tokenizer + self._setup_tokenizer(tokenizer_cfg) + + # Initialize a dummy vocabulary + vocabulary = self.tokenizer.tokenizer.get_vocab() + + # Set the new vocabulary + decoder_config = copy.deepcopy(self.decoder.to_config_dict()) + # sidestepping the potential overlapping tokens issue in aggregate tokenizers + if self.tokenizer_type == "agg": + decoder_config.vocabulary = ListConfig(vocabulary) + else: + decoder_config.vocabulary = ListConfig(list(vocabulary.keys())) + + decoder_num_classes = decoder_config['num_classes'] + + # Override number of classes if placeholder provided + logging.info( + "\nReplacing old number of classes ({}) with new number of classes - {}".format( + decoder_num_classes, len(vocabulary) + ) + ) + + decoder_config['num_classes'] = len(vocabulary) + + del self.decoder + self.decoder = VisualEncDecCTCModelBPE.from_config_dict(decoder_config) + del self.loss + self.loss = CTCLoss( + num_classes=self.decoder.num_classes_with_blank - 1, + zero_infinity=True, + reduction=self._cfg.get("ctc_reduction", "mean_batch"), + ) + + if decoding_cfg is None: + # Assume same decoding config as before + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(CTCBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = CTCBPEDecoding(decoding_cfg=decoding_cfg, tokenizer=self.tokenizer) + + self._wer = WERBPE( + decoding=self.decoding, + use_cer=self._cfg.get('use_cer', False), + log_prediction=self._cfg.get("log_prediction", False), + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.decoder): + self._cfg.decoder = decoder_config + + with open_dict(self.cfg.decoding): + self._cfg.decoding = decoding_cfg + + logging.info(f"Changed tokenizer to {self.decoder.vocabulary} vocabulary.") + + def change_decoding_strategy(self, decoding_cfg: DictConfig): + """ + Changes decoding strategy used during CTC decoding process. + + Args: + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + """ + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(CTCBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = CTCBPEDecoding(decoding_cfg=decoding_cfg, tokenizer=self.tokenizer,) + + self._wer = WERBPE( + decoding=self.decoding, + use_cer=self._wer.use_cer, + log_prediction=self._wer.log_prediction, + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed decoding strategy to \n{OmegaConf.to_yaml(self.cfg.decoding)}") + + @classmethod + def list_available_models(cls) -> List[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + + return results diff --git a/nemo/collections/multimodal/speech_cv/models/visual_ctc_models.py b/nemo/collections/multimodal/speech_cv/models/visual_ctc_models.py new file mode 100644 index 000000000000..a2eeba03ee8f --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/visual_ctc_models.py @@ -0,0 +1,692 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import json +import os +import tempfile +from math import ceil +from typing import Dict, List, Optional, Union + +import torch +from omegaconf import DictConfig, OmegaConf, open_dict +from pytorch_lightning import Trainer +from tqdm.auto import tqdm + +from nemo.collections.asr.data import audio_to_text_dataset +from nemo.collections.asr.losses.ctc import CTCLoss +from nemo.collections.asr.metrics.wer import WER, CTCDecoding, CTCDecodingConfig +from nemo.collections.asr.models.asr_model import ASRModel, ExportableEncDecModel +from nemo.collections.asr.parts.mixins import ASRModuleMixin, InterCTCMixin +from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType +from nemo.collections.multimodal.speech_cv.data import video_to_text_dataset +from nemo.core.classes.common import PretrainedModelInfo, typecheck +from nemo.core.classes.mixins import AccessMixin +from nemo.core.neural_types import LabelsType, LengthsType, LogprobsType, NeuralType, VideoSignal +from nemo.utils import logging + +__all__ = ['VisualEncDecCTCModel'] + + +class VisualEncDecCTCModel(ASRModel, ExportableEncDecModel, ASRModuleMixin, InterCTCMixin): + """Base class for encoder decoder CTC-based models.""" + + def __init__(self, cfg: DictConfig, trainer: Trainer = None): + + # Get global rank and total number of GPU workers for IterableDataset partitioning, if applicable + # Global_rank and local_rank is set by LightningModule in Lightning 1.2.0 + self.world_size = 1 + if trainer is not None: + self.world_size = trainer.world_size + + # Init + super().__init__(cfg=cfg, trainer=trainer) + + # Preprocessor, video transforms + self.video_preprocessor = VisualEncDecCTCModel.from_config_dict(self._cfg.video_preprocessor) + + # Augmentation, video augmentations + self.video_augmentation = VisualEncDecCTCModel.from_config_dict(self._cfg.video_augment) + + # Front-end Network, learned module that transform videos to temporal sequence + self.video_front_end = VisualEncDecCTCModel.from_config_dict(self._cfg.video_front_end) + + # Encoder Network + self.encoder = VisualEncDecCTCModel.from_config_dict(self._cfg.encoder) + + with open_dict(self._cfg): + if "feat_in" not in self._cfg.decoder or ( + not self._cfg.decoder.feat_in and hasattr(self.encoder, '_feat_out') + ): + self._cfg.decoder.feat_in = self.encoder._feat_out + if "feat_in" not in self._cfg.decoder or not self._cfg.decoder.feat_in: + raise ValueError("param feat_in of the decoder's config is not set!") + + if self.cfg.decoder.num_classes < 1 and self.cfg.decoder.vocabulary is not None: + logging.info( + "\nReplacing placeholder number of classes ({}) with actual number of classes - {}".format( + self.cfg.decoder.num_classes, len(self.cfg.decoder.vocabulary) + ) + ) + cfg.decoder["num_classes"] = len(self.cfg.decoder.vocabulary) + + # Decoder + self.decoder = VisualEncDecCTCModel.from_config_dict(self._cfg.decoder) + + # CTC Loss + self.loss = CTCLoss( + num_classes=self.decoder.num_classes_with_blank - 1, + zero_infinity=True, + reduction=self._cfg.get("ctc_reduction", "mean_batch"), + ) + + # Setup decoding objects + decoding_cfg = self.cfg.get('decoding', None) + + # In case decoding config not found, use default config + if decoding_cfg is None: + decoding_cfg = OmegaConf.structured(CTCDecodingConfig) + with open_dict(self.cfg): + self.cfg.decoding = decoding_cfg + + # Decoding + self.decoding = CTCDecoding(self.cfg.decoding, vocabulary=self.decoder.vocabulary) + + # Setup metric with decoding strategy + self._wer = WER( + decoding=self.decoding, + use_cer=self._cfg.get('use_cer', False), + dist_sync_on_step=True, + log_prediction=self._cfg.get("log_prediction", False), + ) + + # Setup optional Optimization flags + self.setup_optimization_flags() + + # setting up interCTC loss (from InterCTCMixin) + self.setup_interctc(decoder_name='decoder', loss_name='loss', wer_name='_wer') + + # Adapter modules setup (from ASRAdapterModelMixin) + self.setup_adapters() + + @torch.no_grad() + def transcribe( + self, + paths2video_files: List[str], + batch_size: int = 4, + logprobs: bool = False, + return_hypotheses: bool = False, + num_workers: int = 0, + channel_selector: Optional[ChannelSelectorType] = None, + augmentor: DictConfig = None, + ) -> List[str]: + """ + If modify this function, please remember update transcribe_partial_audio() in + nemo/collections/asr/parts/utils/trancribe_utils.py + + Uses greedy decoding to transcribe video files. Use this method for debugging and prototyping. + + Args: + paths2video_files: (a list) of paths to video files. + batch_size: (int) batch size to use during inference. + Bigger will result in better throughput performance but would use more memory. + logprobs: (bool) pass True to get log probabilities instead of transcripts. + return_hypotheses: (bool) Either return hypotheses or text + With hypotheses can do some postprocessing like getting timestamp or rescoring + num_workers: (int) number of workers for DataLoader + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. + augmentor: (DictConfig): Augment audio samples during transcription if augmentor is applied. + Returns: + A list of transcriptions (or raw log probabilities if logprobs is True) in the same order as paths2video_files + """ + if paths2video_files is None or len(paths2video_files) == 0: + return {} + + if return_hypotheses and logprobs: + raise ValueError( + "Either `return_hypotheses` or `logprobs` can be True at any given time." + "Returned hypotheses will contain the logprobs." + ) + + if num_workers is None: + num_workers = min(batch_size, os.cpu_count() - 1) + + # We will store transcriptions here + hypotheses = [] + all_hypotheses = [] + + # Model's mode and device + mode = self.training + device = next(self.parameters()).device + + try: + # Switch model to evaluation mode + self.eval() + # Freeze the visual front-end, encoder and decoder modules + self.video_front_end.freeze() + self.encoder.freeze() + self.decoder.freeze() + logging_level = logging.get_verbosity() + logging.set_verbosity(logging.WARNING) + # Work in tmp directory - will store manifest file there + with tempfile.TemporaryDirectory() as tmpdir: + with open(os.path.join(tmpdir, 'manifest.json'), 'w', encoding='utf-8') as fp: + for video_file in paths2video_files: + entry = {'video_filepath': video_file, 'duration': 100000, 'text': ''} + fp.write(json.dumps(entry) + '\n') + + config = { + 'paths2video_files': paths2video_files, + 'batch_size': batch_size, + 'temp_dir': tmpdir, + 'num_workers': num_workers, + 'channel_selector': channel_selector, + } + + if augmentor: + config['augmentor'] = augmentor + + temporary_datalayer = self._setup_transcribe_dataloader(config) + for test_batch in tqdm(temporary_datalayer, desc="Transcribing"): + logits, logits_len, greedy_predictions = self.forward( + input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device) + ) + if logprobs: + # dump log probs per file + for idx in range(logits.shape[0]): + lg = logits[idx][: logits_len[idx]] + hypotheses.append(lg.cpu().numpy()) + else: + current_hypotheses, all_hyp = self.decoding.ctc_decoder_predictions_tensor( + logits, decoder_lengths=logits_len, return_hypotheses=return_hypotheses, + ) + + if return_hypotheses: + # dump log probs per file + for idx in range(logits.shape[0]): + current_hypotheses[idx].y_sequence = logits[idx][: logits_len[idx]] + if current_hypotheses[idx].alignments is None: + current_hypotheses[idx].alignments = current_hypotheses[idx].y_sequence + + if all_hyp is None: + hypotheses += current_hypotheses + else: + hypotheses += all_hyp + + del greedy_predictions + del logits + del test_batch + finally: + # set mode back to its original value + self.train(mode=mode) + if mode is True: + self.video_front_end.unfreeze() + self.encoder.unfreeze() + self.decoder.unfreeze() + logging.set_verbosity(logging_level) + + return hypotheses + + def change_vocabulary(self, new_vocabulary: List[str], decoding_cfg: Optional[DictConfig] = None): + """ + Changes vocabulary used during CTC decoding process. Use this method when fine-tuning on from pre-trained model. + This method changes only decoder and leaves encoder and pre-processing modules unchanged. For example, you would + use it if you want to use pretrained encoder when fine-tuning on a data in another language, or when you'd need + model to learn capitalization, punctuation and/or special characters. + + If new_vocabulary == self.decoder.vocabulary then nothing will be changed. + + Args: + + new_vocabulary: list with new vocabulary. Must contain at least 2 elements. Typically, \ + this is target alphabet. + + Returns: None + + """ + if self.decoder.vocabulary == new_vocabulary: + logging.warning(f"Old {self.decoder.vocabulary} and new {new_vocabulary} match. Not changing anything.") + else: + if new_vocabulary is None or len(new_vocabulary) == 0: + raise ValueError(f'New vocabulary must be non-empty list of chars. But I got: {new_vocabulary}') + decoder_config = self.decoder.to_config_dict() + new_decoder_config = copy.deepcopy(decoder_config) + new_decoder_config['vocabulary'] = new_vocabulary + new_decoder_config['num_classes'] = len(new_vocabulary) + + del self.decoder + self.decoder = VisualEncDecCTCModel.from_config_dict(new_decoder_config) + del self.loss + self.loss = CTCLoss( + num_classes=self.decoder.num_classes_with_blank - 1, + zero_infinity=True, + reduction=self._cfg.get("ctc_reduction", "mean_batch"), + ) + + if decoding_cfg is None: + # Assume same decoding config as before + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(CTCDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = CTCDecoding(decoding_cfg=decoding_cfg, vocabulary=self.decoder.vocabulary) + + self._wer = WER( + decoding=self.decoding, + use_cer=self._cfg.get('use_cer', False), + dist_sync_on_step=True, + log_prediction=self._cfg.get("log_prediction", False), + ) + + # Update config + with open_dict(self.cfg.decoder): + self._cfg.decoder = new_decoder_config + + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + ds_keys = ['train_ds', 'validation_ds', 'test_ds'] + for key in ds_keys: + if key in self.cfg: + with open_dict(self.cfg[key]): + self.cfg[key]['labels'] = OmegaConf.create(new_vocabulary) + + logging.info(f"Changed decoder to output to {self.decoder.vocabulary} vocabulary.") + + def change_decoding_strategy(self, decoding_cfg: DictConfig): + """ + Changes decoding strategy used during CTC decoding process. + + Args: + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + """ + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(CTCDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = CTCDecoding(decoding_cfg=decoding_cfg, vocabulary=self.decoder.vocabulary) + + self._wer = WER( + decoding=self.decoding, + use_cer=self._wer.use_cer, + log_prediction=self._wer.log_prediction, + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed decoding strategy to \n{OmegaConf.to_yaml(self.cfg.decoding)}") + + def _setup_dataloader_from_config(self, config: Optional[Dict]): + # Automatically inject args from model config to dataloader config + audio_to_text_dataset.inject_dataloader_value_from_model_config(self.cfg, config, key='sample_rate') + audio_to_text_dataset.inject_dataloader_value_from_model_config(self.cfg, config, key='labels') + dataset = video_to_text_dataset.get_video_to_text_char_dataset_from_config( + config=config, + local_rank=self.local_rank, + global_rank=self.global_rank, + world_size=self.world_size, + preprocessor_cfg=self._cfg.get("preprocessor", None), + ) + + if dataset is None: + return None + + shuffle = config['shuffle'] + if config.get('is_tarred', False): + shuffle = False + + if hasattr(dataset, 'collate_fn'): + collate_fn = dataset.collate_fn + else: + collate_fn = dataset.datasets[0].collate_fn + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=config['batch_size'], + collate_fn=collate_fn, + drop_last=config.get('drop_last', False), + shuffle=shuffle, + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + ) + + def setup_training_data(self, train_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the training data loader via a Dict-like object. + + Args: + train_data_config: A config that contains the information regarding construction + of an ASR Training dataset. + + Supported Datasets: + - :class:`~nemo.collections.multimodal.speech_cv.data.video_to_text.VideoToCharDataset` + - :class:`~nemo.collections.asr.data.video_to_text.VideoToBPEDataset` + - :class:`~nemo.collections.asr.data.video_to_text.TarredVideoToBPEDataset` + """ + if 'shuffle' not in train_data_config: + train_data_config['shuffle'] = True + + # preserve config + self._update_dataset_config(dataset_name='train', config=train_data_config) + + self._train_dl = self._setup_dataloader_from_config(config=train_data_config) + + # Need to set this because if using an IterableDataset, the length of the dataloader is the total number + # of samples rather than the number of batches, and this messes up the tqdm progress bar. + # So we set the number of steps manually (to the correct number) to fix this. + if 'is_tarred' in train_data_config and train_data_config['is_tarred']: + # We also need to check if limit_train_batches is already set. + # If it's an int, we assume that the user has set it to something sane, i.e. <= # training batches, + # and don't change it. Otherwise, adjust batches accordingly if it's a float (including 1.0). + if self._trainer is not None and isinstance(self._trainer.limit_train_batches, float): + self._trainer.limit_train_batches = int( + self._trainer.limit_train_batches + * ceil((len(self._train_dl.dataset) / self.world_size) / train_data_config['batch_size']) + ) + elif self._trainer is None: + logging.warning( + "Model Trainer was not set before constructing the dataset, incorrect number of " + "training batches will be used. Please set the trainer and rebuild the dataset." + ) + + def setup_validation_data(self, val_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the validation data loader via a Dict-like object. + + Args: + val_data_config: A config that contains the information regarding construction + of an ASR Training dataset. + + Supported Datasets: + - :class:`~nemo.collections.multimodal.speech_cv.data.video_to_text.VideoToCharDataset` + - :class:`~nemo.collections.asr.data.video_to_text.VideoToBPEDataset` + - :class:`~nemo.collections.asr.data.video_to_text.TarredVideoToBPEDataset` + """ + if 'shuffle' not in val_data_config: + val_data_config['shuffle'] = False + + # preserve config + self._update_dataset_config(dataset_name='validation', config=val_data_config) + + self._validation_dl = self._setup_dataloader_from_config(config=val_data_config) + + def setup_test_data(self, test_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the test data loader via a Dict-like object. + + Args: + test_data_config: A config that contains the information regarding construction + of an ASR Training dataset. + + Supported Datasets: + - :class:`~nemo.collections.multimodal.speech_cv.data.video_to_text.VideoToCharDataset` + - :class:`~nemo.collections.asr.data.video_to_text.VideoToBPEDataset` + - :class:`~nemo.collections.asr.data.video_to_text.TarredVideoToBPEDataset` + """ + if 'shuffle' not in test_data_config: + test_data_config['shuffle'] = False + + # preserve config + self._update_dataset_config(dataset_name='test', config=test_data_config) + + self._test_dl = self._setup_dataloader_from_config(config=test_data_config) + + @property + def input_types(self) -> Optional[Dict[str, NeuralType]]: + return { + "input_video_signal": NeuralType(('B', 'C', 'T', 'H', 'W'), VideoSignal(), optional=True), + "input_video_signal_length": NeuralType(tuple('B'), LengthsType(), optional=True), + "sample_id": NeuralType(tuple('B'), LengthsType(), optional=True), + } + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + return { + "outputs": NeuralType(('B', 'T', 'D'), LogprobsType()), + "encoded_lengths": NeuralType(tuple('B'), LengthsType()), + "greedy_predictions": NeuralType(('B', 'T'), LabelsType()), + } + + @typecheck() + def forward(self, input_video_signal=None, input_video_signal_length=None): + """ + Forward pass of the model. + + Args: + input_video_signal: Tensor that represents a batch of video signals, + of shape [B, T, H, W, C]. T here represents timesteps, H height, W width and C channels + input_video_signal_length: Vector of length B, that contains the individual lengths of the video + sequences. + + Returns: + A tuple of 3 elements - + 1) The log probabilities tensor of shape [B, T, D]. + 2) The lengths of the acoustic sequence after propagation through the encoder, of shape [B]. + 3) The greedy token predictions of the model of shape [B, T] (via argmax) + """ + + # Preprocessing + processed_video_signal, processed_video_signal_length = self.video_preprocessor( + input_signal=input_video_signal, length=input_video_signal_length + ) + + # Augmentation + processed_video_signal = self.video_augmentation( + input_signal=processed_video_signal, length=processed_video_signal_length + ) + + # Front-end Networks + processed_video_signal, processed_video_signal_length = self.video_front_end( + input_signal=processed_video_signal, length=processed_video_signal_length + ) + + # Back-end Networks + encoded, encoded_len = self.encoder(audio_signal=processed_video_signal, length=processed_video_signal_length) + + log_probs = self.decoder(encoder_output=encoded) + greedy_predictions = log_probs.argmax(dim=-1, keepdim=False) + + return ( + log_probs, + encoded_len, + greedy_predictions, + ) + + # PTL-specific methods + def training_step(self, batch, batch_nb): + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + if self.is_interctc_enabled(): + AccessMixin.set_access_enabled(access_enabled=True) + + video_signal, video_signal_len, transcript, transcript_len = batch + log_probs, encoded_len, predictions = self.forward( + input_video_signal=video_signal, input_video_signal_length=video_signal_len + ) + + if hasattr(self, '_trainer') and self._trainer is not None: + log_every_n_steps = self._trainer.log_every_n_steps + else: + log_every_n_steps = 1 + + loss_value = self.loss( + log_probs=log_probs, targets=transcript, input_lengths=encoded_len, target_lengths=transcript_len + ) + + # Add auxiliary losses, if registered + loss_value = self.add_auxiliary_losses(loss_value) + # only computing WER when requested in the logs (same as done for final-layer WER below) + loss_value, tensorboard_logs = self.add_interctc_losses( + loss_value, transcript, transcript_len, compute_wer=((batch_nb + 1) % log_every_n_steps == 0) + ) + + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + tensorboard_logs.update( + { + 'train_loss': loss_value, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + 'global_step': torch.tensor(self.trainer.global_step, dtype=torch.float32), + } + ) + + if (batch_nb + 1) % log_every_n_steps == 0: + self._wer.update( + predictions=log_probs, + targets=transcript, + target_lengths=transcript_len, + predictions_lengths=encoded_len, + ) + wer, _, _ = self._wer.compute() + self._wer.reset() + tensorboard_logs.update({'training_batch_wer': wer}) + + return {'loss': loss_value, 'log': tensorboard_logs} + + def predict_step(self, batch, batch_idx, dataloader_idx=0): + video_signal, video_signal_len, transcript, transcript_len, sample_id = batch + log_probs, encoded_len, predictions = self.forward( + input_video_signal=video_signal, input_video_signal_length=video_signal_len + ) + + transcribed_texts, _ = self._wer.decoding.ctc_decoder_predictions_tensor( + decoder_outputs=log_probs, decoder_lengths=encoded_len, return_hypotheses=False, + ) + + sample_id = sample_id.cpu().detach().numpy() + return list(zip(sample_id, transcribed_texts)) + + def validation_step(self, batch, batch_idx, dataloader_idx=0): + if self.is_interctc_enabled(): + AccessMixin.set_access_enabled(access_enabled=True) + + video_signal, video_signal_len, transcript, transcript_len = batch + log_probs, encoded_len, predictions = self.forward( + input_video_signal=video_signal, input_video_signal_length=video_signal_len + ) + + loss_value = self.loss( + log_probs=log_probs, targets=transcript, input_lengths=encoded_len, target_lengths=transcript_len + ) + loss_value, metrics = self.add_interctc_losses( + loss_value, transcript, transcript_len, compute_wer=True, log_wer_num_denom=True, log_prefix="val_", + ) + + self._wer.update( + predictions=log_probs, targets=transcript, target_lengths=transcript_len, predictions_lengths=encoded_len + ) + wer, wer_num, wer_denom = self._wer.compute() + self._wer.reset() + metrics.update({'val_loss': loss_value, 'val_wer_num': wer_num, 'val_wer_denom': wer_denom, 'val_wer': wer}) + + self.log('global_step', torch.tensor(self.trainer.global_step, dtype=torch.float32)) + + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + return metrics + + def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): + metrics = super().multi_validation_epoch_end(outputs, dataloader_idx) + self.finalize_interctc_metrics(metrics, outputs, prefix="val_") + return metrics + + def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0): + metrics = super().multi_test_epoch_end(outputs, dataloader_idx) + self.finalize_interctc_metrics(metrics, outputs, prefix="test_") + return metrics + + def test_step(self, batch, batch_idx, dataloader_idx=0): + logs = self.validation_step(batch, batch_idx, dataloader_idx=dataloader_idx) + test_logs = {name.replace("val_", "test_"): value for name, value in logs.items()} + if type(self.trainer.test_dataloaders) == list and len(self.trainer.test_dataloaders) > 1: + self.test_step_outputs[dataloader_idx].append(test_logs) + else: + self.test_step_outputs.append(test_logs) + return test_logs + + def test_dataloader(self): + if self._test_dl is not None: + return self._test_dl + + def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLoader': + """ + Setup function for a temporary data loader which wraps the provided video file. + + Args: + config: A python dictionary which contains the following keys: + paths2video_files: (a list) of paths to video files. The files should be relatively short fragments. \ + Recommended length per file is between 5 and 25 seconds. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + temp_dir: (str) A temporary directory where the video manifest is temporarily + stored. + num_workers: (int) number of workers. Depends of the batch_size and machine. \ + 0 - only the main process will load batches, 1 - one worker (not main process) + + Returns: + A pytorch DataLoader for the given video file(s). + """ + if 'manifest_filepath' in config: + manifest_filepath = config['manifest_filepath'] + batch_size = config['batch_size'] + else: + manifest_filepath = os.path.join(config['temp_dir'], 'manifest.json') + batch_size = min(config['batch_size'], len(config['paths2video_files'])) + + dl_config = { + 'manifest_filepath': manifest_filepath, + 'labels': self.decoder.vocabulary, + 'batch_size': batch_size, + 'trim_silence': False, + 'shuffle': False, + 'num_workers': config.get('num_workers', min(batch_size, os.cpu_count() - 1)), + 'pin_memory': True, + 'channel_selector': config.get('channel_selector', None), + } + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) + return temporary_datalayer + + @classmethod + def list_available_models(cls) -> List[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + + return results diff --git a/nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_bpe_models.py b/nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_bpe_models.py new file mode 100644 index 000000000000..882d15700593 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_bpe_models.py @@ -0,0 +1,455 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import os +from typing import Dict, Optional, Union + +import torch +from omegaconf import DictConfig, ListConfig, OmegaConf, open_dict +from pytorch_lightning import Trainer + +from nemo.collections.asr.losses.ctc import CTCLoss +from nemo.collections.asr.losses.rnnt import RNNTLoss +from nemo.collections.asr.metrics.rnnt_wer_bpe import RNNTBPEWER, RNNTBPEDecoding, RNNTBPEDecodingConfig +from nemo.collections.asr.metrics.wer_bpe import WERBPE, CTCBPEDecoding, CTCBPEDecodingConfig +from nemo.collections.asr.parts.mixins import ASRBPEMixin +from nemo.collections.multimodal.speech_cv.data import video_to_text_dataset +from nemo.collections.multimodal.speech_cv.models.visual_hybrid_rnnt_ctc_models import VisualEncDecHybridRNNTCTCModel +from nemo.core.classes.common import PretrainedModelInfo +from nemo.utils import logging, model_utils + + +class VisualEncDecHybridRNNTCTCBPEModel(VisualEncDecHybridRNNTCTCModel, ASRBPEMixin): + """Base class for encoder decoder RNNT-based models with auxiliary CTC decoder/loss and subword tokenization.""" + + def __init__(self, cfg: DictConfig, trainer: Trainer = None): + # Convert to Hydra 1.0 compatible DictConfig + cfg = model_utils.convert_model_config_to_dict_config(cfg) + cfg = model_utils.maybe_update_config_version(cfg) + + # Tokenizer is necessary for this model + if 'tokenizer' not in cfg: + raise ValueError("`cfg` must have `tokenizer` config to create a tokenizer !") + + if not isinstance(cfg, DictConfig): + cfg = OmegaConf.create(cfg) + + # Setup the tokenizer + self._setup_tokenizer(cfg.tokenizer) + + # Initialize a dummy vocabulary + vocabulary = self.tokenizer.tokenizer.get_vocab() + + # Set the new vocabulary + with open_dict(cfg): + cfg.labels = ListConfig(list(vocabulary)) + + with open_dict(cfg.decoder): + cfg.decoder.vocab_size = len(vocabulary) + + with open_dict(cfg.joint): + cfg.joint.num_classes = len(vocabulary) + cfg.joint.vocabulary = ListConfig(list(vocabulary)) + cfg.joint.jointnet.encoder_hidden = cfg.model_defaults.enc_hidden + cfg.joint.jointnet.pred_hidden = cfg.model_defaults.pred_hidden + + # setup auxiliary CTC decoder + if 'aux_ctc' not in cfg: + raise ValueError( + "The config need to have a section for the CTC decoder named as aux_ctc for Hybrid models." + ) + + with open_dict(cfg): + if self.tokenizer_type == "agg": + cfg.aux_ctc.decoder.vocabulary = ListConfig(vocabulary) + else: + cfg.aux_ctc.decoder.vocabulary = ListConfig(list(vocabulary.keys())) + + if cfg.aux_ctc.decoder["num_classes"] < 1: + logging.info( + "\nReplacing placholder number of classes ({}) with actual number of classes - {}".format( + cfg.aux_ctc.decoder["num_classes"], len(vocabulary) + ) + ) + cfg.aux_ctc.decoder["num_classes"] = len(vocabulary) + + super().__init__(cfg=cfg, trainer=trainer) + + # Setup decoding object + self.decoding = RNNTBPEDecoding( + decoding_cfg=self.cfg.decoding, decoder=self.decoder, joint=self.joint, tokenizer=self.tokenizer, + ) + + # Setup wer object + self.wer = RNNTBPEWER( + decoding=self.decoding, + batch_dim_index=0, + use_cer=self.cfg.get('use_cer', False), + log_prediction=self.cfg.get('log_prediction', True), + dist_sync_on_step=True, + ) + + # Setup fused Joint step if flag is set + if self.joint.fuse_loss_wer: + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Setup CTC decoding + ctc_decoding_cfg = self.cfg.aux_ctc.get('decoding', None) + if ctc_decoding_cfg is None: + ctc_decoding_cfg = OmegaConf.structured(CTCBPEDecodingConfig) + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoding = ctc_decoding_cfg + self.ctc_decoding = CTCBPEDecoding(self.cfg.aux_ctc.decoding, tokenizer=self.tokenizer) + + # Setup CTC WER + self.ctc_wer = WERBPE( + decoding=self.ctc_decoding, + use_cer=self.cfg.aux_ctc.get('use_cer', False), + dist_sync_on_step=True, + log_prediction=self.cfg.get("log_prediction", False), + ) + + # setting the RNNT decoder as the default one + self.use_rnnt_decoder = True + + def _setup_dataloader_from_config(self, config: Optional[Dict]): + dataset = video_to_text_dataset.get_video_to_text_bpe_dataset_from_config( + config=config, + local_rank=self.local_rank, + global_rank=self.global_rank, + world_size=self.world_size, + tokenizer=self.tokenizer, + preprocessor_cfg=self.cfg.get("preprocessor", None), + ) + + if dataset is None: + return None + + shuffle = config['shuffle'] + if config.get('is_tarred', False): + shuffle = False + + if hasattr(dataset, 'collate_fn'): + collate_fn = dataset.collate_fn + else: + collate_fn = dataset.datasets[0].collate_fn + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=config['batch_size'], + collate_fn=collate_fn, + drop_last=config.get('drop_last', False), + shuffle=shuffle, + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + ) + + def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLoader': + """ + Setup function for a temporary data loader which wraps the provided video file. + + Args: + config: A python dictionary which contains the following keys: + paths2video_files: (a list) of paths to video files. The files should be relatively short fragments. \ + Recommended length per file is between 5 and 25 seconds. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + temp_dir: (str) A temporary directory where the video manifest is temporarily + stored. + num_workers: (int) number of workers. Depends of the batch_size and machine. \ + 0 - only the main process will load batches, 1 - one worker (not main process) + + Returns: + A pytorch DataLoader for the given video file(s). + """ + + if 'manifest_filepath' in config: + manifest_filepath = config['manifest_filepath'] + batch_size = config['batch_size'] + else: + manifest_filepath = os.path.join(config['temp_dir'], 'manifest.json') + batch_size = min(config['batch_size'], len(config['paths2video_files'])) + + dl_config = { + 'manifest_filepath': manifest_filepath, + 'batch_size': batch_size, + 'shuffle': False, + 'num_workers': config.get('num_workers', min(batch_size, os.cpu_count() - 1)), + 'pin_memory': True, + 'channel_selector': config.get('channel_selector', None), + 'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False), + } + + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) + return temporary_datalayer + + def change_vocabulary( + self, + new_tokenizer_dir: Union[str, DictConfig], + new_tokenizer_type: str, + decoding_cfg: Optional[DictConfig] = None, + ctc_decoding_cfg: Optional[DictConfig] = None, + ): + """ + Changes vocabulary used during RNNT decoding process. Use this method when fine-tuning on from pre-trained model. + This method changes only decoder and leaves encoder and pre-processing modules unchanged. For example, you would + use it if you want to use pretrained encoder when fine-tuning on data in another language, or when you'd need + model to learn capitalization, punctuation and/or special characters. + + Args: + new_tokenizer_dir: Directory path to tokenizer or a config for a new tokenizer (if the tokenizer type is `agg`) + new_tokenizer_type: Type of tokenizer. Can be either `agg`, `bpe` or `wpe`. + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + ctc_decoding_cfg: A config for auxiliary CTC decoding, which is optional and can be used to change the decoding type. + + Returns: None + + """ + if isinstance(new_tokenizer_dir, DictConfig): + if new_tokenizer_type == 'agg': + new_tokenizer_cfg = new_tokenizer_dir + else: + raise ValueError( + f'New tokenizer dir should be a string unless the tokenizer is `agg`, but this tokenizer type is: {new_tokenizer_type}' + ) + else: + new_tokenizer_cfg = None + + if new_tokenizer_cfg is not None: + tokenizer_cfg = new_tokenizer_cfg + else: + if not os.path.isdir(new_tokenizer_dir): + raise NotADirectoryError( + f'New tokenizer dir must be non-empty path to a directory. But I got: {new_tokenizer_dir}' + ) + + if new_tokenizer_type.lower() not in ('bpe', 'wpe'): + raise ValueError(f'New tokenizer type must be either `bpe` or `wpe`') + + tokenizer_cfg = OmegaConf.create({'dir': new_tokenizer_dir, 'type': new_tokenizer_type}) + + # Setup the tokenizer + self._setup_tokenizer(tokenizer_cfg) + + # Initialize a dummy vocabulary + vocabulary = self.tokenizer.tokenizer.get_vocab() + + joint_config = self.joint.to_config_dict() + new_joint_config = copy.deepcopy(joint_config) + if self.tokenizer_type == "agg": + new_joint_config["vocabulary"] = ListConfig(vocabulary) + else: + new_joint_config["vocabulary"] = ListConfig(list(vocabulary.keys())) + + new_joint_config['num_classes'] = len(vocabulary) + del self.joint + self.joint = VisualEncDecHybridRNNTCTCBPEModel.from_config_dict(new_joint_config) + + decoder_config = self.decoder.to_config_dict() + new_decoder_config = copy.deepcopy(decoder_config) + new_decoder_config.vocab_size = len(vocabulary) + del self.decoder + self.decoder = VisualEncDecHybridRNNTCTCBPEModel.from_config_dict(new_decoder_config) + + del self.loss + self.loss = RNNTLoss(num_classes=self.joint.num_classes_with_blank - 1) + + if decoding_cfg is None: + # Assume same decoding config as before + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(RNNTBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = RNNTBPEDecoding( + decoding_cfg=decoding_cfg, decoder=self.decoder, joint=self.joint, tokenizer=self.tokenizer, + ) + + self.wer = RNNTBPEWER( + decoding=self.decoding, + batch_dim_index=self.wer.batch_dim_index, + use_cer=self.wer.use_cer, + log_prediction=self.wer.log_prediction, + dist_sync_on_step=True, + ) + + # Setup fused Joint step + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Update config + with open_dict(self.cfg.joint): + self.cfg.joint = new_joint_config + + with open_dict(self.cfg.decoder): + self.cfg.decoder = new_decoder_config + + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed tokenizer of the RNNT decoder to {self.joint.vocabulary} vocabulary.") + + # set up the new tokenizer for the CTC decoder + if hasattr(self, 'ctc_decoder'): + ctc_decoder_config = copy.deepcopy(self.ctc_decoder.to_config_dict()) + # sidestepping the potential overlapping tokens issue in aggregate tokenizers + if self.tokenizer_type == "agg": + ctc_decoder_config.vocabulary = ListConfig(vocabulary) + else: + ctc_decoder_config.vocabulary = ListConfig(list(vocabulary.keys())) + + decoder_num_classes = ctc_decoder_config['num_classes'] + # Override number of classes if placeholder provided + logging.info( + "\nReplacing old number of classes ({}) with new number of classes - {}".format( + decoder_num_classes, len(vocabulary) + ) + ) + ctc_decoder_config['num_classes'] = len(vocabulary) + + del self.ctc_decoder + self.ctc_decoder = VisualEncDecHybridRNNTCTCBPEModel.from_config_dict(ctc_decoder_config) + del self.ctc_loss + self.ctc_loss = CTCLoss( + num_classes=self.ctc_decoder.num_classes_with_blank - 1, + zero_infinity=True, + reduction=self.cfg.aux_ctc.get("ctc_reduction", "mean_batch"), + ) + + if ctc_decoding_cfg is None: + # Assume same decoding config as before + ctc_decoding_cfg = self.cfg.aux_ctc.decoding + + # Assert the decoding config with all hyper parameters + ctc_decoding_cls = OmegaConf.structured(CTCBPEDecodingConfig) + ctc_decoding_cls = OmegaConf.create(OmegaConf.to_container(ctc_decoding_cls)) + ctc_decoding_cfg = OmegaConf.merge(ctc_decoding_cls, ctc_decoding_cfg) + + self.ctc_decoding = CTCBPEDecoding(decoding_cfg=ctc_decoding_cfg, tokenizer=self.tokenizer) + + self.ctc_wer = WERBPE( + decoding=self.ctc_decoding, + use_cer=self.cfg.aux_ctc.get('use_cer', False), + log_prediction=self.cfg.get("log_prediction", False), + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoder = ctc_decoder_config + + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoding = ctc_decoding_cfg + + logging.info(f"Changed tokenizer of the CTC decoder to {self.ctc_decoder.vocabulary} vocabulary.") + + def change_decoding_strategy(self, decoding_cfg: DictConfig, decoder_type: str = None): + """ + Changes decoding strategy used during RNNT decoding process. + Args: + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + decoder_type: (str) Can be set to 'rnnt' or 'ctc' to switch between appropriate decoder in a + model having both RNN-T and CTC decoders. Defaults to None, in which case RNN-T decoder is + used. If set to 'ctc', it raises error if 'ctc_decoder' is not an attribute of the model. + """ + if decoder_type is None or decoder_type == 'rnnt': + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(RNNTBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = RNNTBPEDecoding( + decoding_cfg=decoding_cfg, decoder=self.decoder, joint=self.joint, tokenizer=self.tokenizer, + ) + + self.wer = RNNTBPEWER( + decoding=self.decoding, + batch_dim_index=self.wer.batch_dim_index, + use_cer=self.wer.use_cer, + log_prediction=self.wer.log_prediction, + dist_sync_on_step=True, + ) + + # Setup fused Joint step + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Update config + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed decoding strategy of the RNNT decoder to \n{OmegaConf.to_yaml(self.cfg.decoding)}") + elif decoder_type == 'ctc': + if not hasattr(self, 'ctc_decoding'): + raise ValueError("The model does not have the ctc_decoding module and does not support ctc decoding.") + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.aux_ctc.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(CTCBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.ctc_decoding = CTCBPEDecoding(decoding_cfg=decoding_cfg, tokenizer=self.tokenizer) + + self.ctc_wer = WERBPE( + decoding=self.ctc_decoding, + use_cer=self.ctc_wer.use_cer, + log_prediction=self.ctc_wer.log_prediction, + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.aux_ctc.decoding): + self.cfg.aux_ctc.decoding = decoding_cfg + + self.use_rnnt_decoder = False + logging.info( + f"Changed decoding strategy of the CTC decoder to \n{OmegaConf.to_yaml(self.cfg.aux_ctc.decoding)}" + ) + else: + raise ValueError(f"decoder_type={decoder_type} is not supported. Supported values: [ctc,rnnt]") + + @classmethod + def list_available_models(cls) -> Optional[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + return results diff --git a/nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_models.py b/nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_models.py new file mode 100644 index 000000000000..7c658f2a18c6 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/visual_hybrid_rnnt_ctc_models.py @@ -0,0 +1,644 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import json +import os +import tempfile +from typing import List, Optional + +import torch +from omegaconf import DictConfig, OmegaConf, open_dict +from pytorch_lightning import Trainer +from tqdm.auto import tqdm + +from nemo.collections.asr.losses.ctc import CTCLoss +from nemo.collections.asr.metrics.wer import WER, CTCDecoding, CTCDecodingConfig +from nemo.collections.asr.parts.mixins import ASRBPEMixin, InterCTCMixin +from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType +from nemo.collections.multimodal.speech_cv.models.visual_rnnt_models import VisualEncDecRNNTModel +from nemo.core.classes.common import PretrainedModelInfo +from nemo.core.classes.mixins import AccessMixin +from nemo.utils import logging, model_utils + + +class VisualEncDecHybridRNNTCTCModel(VisualEncDecRNNTModel, ASRBPEMixin, InterCTCMixin): + """Base class for hybrid RNNT/CTC models.""" + + def __init__(self, cfg: DictConfig, trainer: Trainer = None): + cfg = model_utils.convert_model_config_to_dict_config(cfg) + cfg = model_utils.maybe_update_config_version(cfg) + super().__init__(cfg=cfg, trainer=trainer) + + if 'aux_ctc' not in self.cfg: + raise ValueError( + "The config need to have a section for the CTC decoder named as aux_ctc for Hybrid models." + ) + with open_dict(self.cfg.aux_ctc): + if "feat_in" not in self.cfg.aux_ctc.decoder or ( + not self.cfg.aux_ctc.decoder.feat_in and hasattr(self.encoder, '_feat_out') + ): + self.cfg.aux_ctc.decoder.feat_in = self.encoder._feat_out + if "feat_in" not in self.cfg.aux_ctc.decoder or not self.cfg.aux_ctc.decoder.feat_in: + raise ValueError("param feat_in of the decoder's config is not set!") + + if self.cfg.aux_ctc.decoder.num_classes < 1 and self.cfg.aux_ctc.decoder.vocabulary is not None: + logging.info( + "\nReplacing placeholder number of classes ({}) with actual number of classes - {}".format( + self.cfg.aux_ctc.decoder.num_classes, len(self.cfg.aux_ctc.decoder.vocabulary) + ) + ) + self.cfg.aux_ctc.decoder["num_classes"] = len(self.cfg.aux_ctc.decoder.vocabulary) + + self.ctc_decoder = VisualEncDecHybridRNNTCTCModel.from_config_dict(self.cfg.aux_ctc.decoder) + self.ctc_loss_weight = self.cfg.aux_ctc.get("ctc_loss_weight", 0.5) + + self.ctc_loss = CTCLoss( + num_classes=self.ctc_decoder.num_classes_with_blank - 1, + zero_infinity=True, + reduction=self.cfg.aux_ctc.get("ctc_reduction", "mean_batch"), + ) + + ctc_decoding_cfg = self.cfg.aux_ctc.get('decoding', None) + if ctc_decoding_cfg is None: + ctc_decoding_cfg = OmegaConf.structured(CTCDecodingConfig) + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoding = ctc_decoding_cfg + + self.ctc_decoding = CTCDecoding(self.cfg.aux_ctc.decoding, vocabulary=self.ctc_decoder.vocabulary) + self.ctc_wer = WER( + decoding=self.ctc_decoding, + use_cer=self.cfg.aux_ctc.get('use_cer', False), + dist_sync_on_step=True, + log_prediction=self.cfg.get("log_prediction", False), + ) + + # setting the RNNT decoder as the default one + self.use_rnnt_decoder = True + + # setting up interCTC loss (from InterCTCMixin) + self.setup_interctc(decoder_name='decoder', loss_name='loss', wer_name='_wer') + + @torch.no_grad() + def transcribe( + self, + paths2video_files: List[str], + batch_size: int = 4, + return_hypotheses: bool = False, + partial_hypothesis: Optional[List['Hypothesis']] = None, + num_workers: int = 0, + channel_selector: Optional[ChannelSelectorType] = None, + ) -> (List[str], Optional[List['Hypothesis']]): + """ + Uses greedy decoding to transcribe video files. Use this method for debugging and prototyping. + + Args: + + paths2video_files: (a list) of paths to video files. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + return_hypotheses: (bool) Either return hypotheses or text + With hypotheses can do some postprocessing like getting timestamp or rescoring + num_workers: (int) number of workers for DataLoader + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + + Returns: + Returns a tuple of 2 items - + * A list of greedy transcript texts / Hypothesis + * An optional list of beam search transcript texts / Hypothesis / NBestHypothesis. + """ + if self.use_rnnt_decoder: + return super().transcribe( + paths2video_files=paths2video_files, + batch_size=batch_size, + return_hypotheses=return_hypotheses, + partial_hypothesis=partial_hypothesis, + num_workers=num_workers, + channel_selector=channel_selector, + ) + + if paths2video_files is None or len(paths2video_files) == 0: + return {} + # We will store transcriptions here + hypotheses = [] + all_hypotheses = [] + # Model's mode and device + mode = self.training + device = next(self.parameters()).device + + if num_workers is None: + num_workers = min(batch_size, os.cpu_count() - 1) + + try: + + # Switch model to evaluation mode + self.eval() + # Freeze the visual front-end, encoder and decoder modules + self.video_front_end.freeze() + self.encoder.freeze() + self.decoder.freeze() + self.joint.freeze() + if hasattr(self, 'ctc_decoder'): + self.ctc_decoder.freeze() + + logging_level = logging.get_verbosity() + logging.set_verbosity(logging.WARNING) + # Work in tmp directory - will store manifest file there + with tempfile.TemporaryDirectory() as tmpdir: + with open(os.path.join(tmpdir, 'manifest.json'), 'w', encoding='utf-8') as fp: + for video_file in paths2video_files: + entry = {'video_filepath': video_file, 'duration': 100000, 'text': ''} + fp.write(json.dumps(entry) + '\n') + + config = { + 'paths2video_files': paths2video_files, + 'batch_size': batch_size, + 'temp_dir': tmpdir, + 'num_workers': num_workers, + 'channel_selector': channel_selector, + } + + temporary_datalayer = self._setup_transcribe_dataloader(config) + for test_batch in tqdm(temporary_datalayer, desc="Transcribing"): + encoded, encoded_len = self.forward( + input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device) + ) + + logits = self.ctc_decoder(encoder_output=encoded) + best_hyp, all_hyp = self.ctc_decoding.ctc_decoder_predictions_tensor( + logits, encoded_len, return_hypotheses=return_hypotheses, + ) + if return_hypotheses: + # dump log probs per file + for idx in range(logits.shape[0]): + best_hyp[idx].y_sequence = logits[idx][: encoded_len[idx]] + if best_hyp[idx].alignments is None: + best_hyp[idx].alignments = best_hyp[idx].y_sequence + del logits + + hypotheses += best_hyp + if all_hyp is not None: + all_hypotheses += all_hyp + else: + all_hypotheses += best_hyp + + del encoded + del test_batch + finally: + # set mode back to its original value + self.train(mode=mode) + + logging.set_verbosity(logging_level) + if mode is True: + self.video_front_end.unfreeze() + self.encoder.unfreeze() + self.decoder.unfreeze() + self.joint.unfreeze() + if hasattr(self, 'ctc_decoder'): + self.ctc_decoder.unfreeze() + return hypotheses, all_hypotheses + + def change_vocabulary( + self, + new_vocabulary: List[str], + decoding_cfg: Optional[DictConfig] = None, + ctc_decoding_cfg: Optional[DictConfig] = None, + ): + """ + Changes vocabulary used during RNNT decoding process. Use this method when fine-tuning a pre-trained model. + This method changes only decoder and leaves encoder and pre-processing modules unchanged. For example, you would + use it if you want to use pretrained encoder when fine-tuning on data in another language, or when you'd need + model to learn capitalization, punctuation and/or special characters. + + Args: + new_vocabulary: list with new vocabulary. Must contain at least 2 elements. Typically, \ + this is target alphabet. + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + ctc_decoding_cfg: A config for CTC decoding, which is optional and can be used to change decoding type. + + Returns: None + + """ + super().change_vocabulary(new_vocabulary=new_vocabulary, decoding_cfg=decoding_cfg) + + # set up the new tokenizer for the CTC decoder + if hasattr(self, 'ctc_decoder'): + if self.ctc_decoder.vocabulary == new_vocabulary: + logging.warning( + f"Old {self.ctc_decoder.vocabulary} and new {new_vocabulary} match. Not changing anything." + ) + else: + if new_vocabulary is None or len(new_vocabulary) == 0: + raise ValueError(f'New vocabulary must be non-empty list of chars. But I got: {new_vocabulary}') + decoder_config = self.ctc_decoder.to_config_dict() + new_decoder_config = copy.deepcopy(decoder_config) + new_decoder_config['vocabulary'] = new_vocabulary + new_decoder_config['num_classes'] = len(new_vocabulary) + + del self.ctc_decoder + self.ctc_decoder = VisualEncDecHybridRNNTCTCModel.from_config_dict(new_decoder_config) + del self.ctc_loss + self.ctc_loss = CTCLoss( + num_classes=self.ctc_decoder.num_classes_with_blank - 1, + zero_infinity=True, + reduction=self.cfg.aux_ctc.get("ctc_reduction", "mean_batch"), + ) + + if ctc_decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `ctc_decoding_cfg` passed when changing decoding strategy, using internal config") + ctc_decoding_cfg = self.cfg.aux_ctc.decoding + + # Assert the decoding config with all hyper parameters + ctc_decoding_cls = OmegaConf.structured(CTCDecodingConfig) + ctc_decoding_cls = OmegaConf.create(OmegaConf.to_container(ctc_decoding_cls)) + ctc_decoding_cfg = OmegaConf.merge(ctc_decoding_cls, ctc_decoding_cfg) + + self.ctc_decoding = CTCDecoding(decoding_cfg=ctc_decoding_cfg, vocabulary=self.ctc_decoder.vocabulary) + + self.ctc_wer = WER( + decoding=self.ctc_decoding, + use_cer=self.ctc_wer.use_cer, + log_prediction=self.ctc_wer.log_prediction, + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoding = ctc_decoding_cfg + + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoder = new_decoder_config + + ds_keys = ['train_ds', 'validation_ds', 'test_ds'] + for key in ds_keys: + if key in self.cfg: + with open_dict(self.cfg[key]): + self.cfg[key]['labels'] = OmegaConf.create(new_vocabulary) + + logging.info(f"Changed the tokenizer of the CTC decoder to {self.ctc_decoder.vocabulary} vocabulary.") + + def change_decoding_strategy(self, decoding_cfg: DictConfig, decoder_type: str = None): + """ + Changes decoding strategy used during RNNT decoding process. + + Args: + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + decoder_type: (str) Can be set to 'rnnt' or 'ctc' to switch between appropriate decoder in a + model having RNN-T and CTC decoders. Defaults to None, in which case RNN-T decoder is + used. If set to 'ctc', it raises error if 'ctc_decoder' is not an attribute of the model. + """ + if decoder_type is None or decoder_type == 'rnnt': + self.use_rnnt_decoder = True + return super().change_decoding_strategy(decoding_cfg=decoding_cfg) + + assert decoder_type == 'ctc' and hasattr(self, 'ctc_decoder') + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.aux_ctc.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(CTCDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.ctc_decoding = CTCDecoding(decoding_cfg=decoding_cfg, vocabulary=self.ctc_decoder.vocabulary) + + self.ctc_wer = WER( + decoding=self.ctc_decoding, + use_cer=self.ctc_wer.use_cer, + log_prediction=self.ctc_wer.log_prediction, + dist_sync_on_step=True, + ) + + # Update config + with open_dict(self.cfg.aux_ctc): + self.cfg.aux_ctc.decoding = decoding_cfg + + self.use_rnnt_decoder = False + logging.info(f"Changed decoding strategy to \n{OmegaConf.to_yaml(self.cfg.aux_ctc.decoding)}") + + # PTL-specific methods + def training_step(self, batch, batch_nb): + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + if self.is_interctc_enabled(): + AccessMixin.set_access_enabled(access_enabled=True) + + signal, signal_len, transcript, transcript_len = batch + + # forward() only performs encoder forward + encoded, encoded_len = self.forward(input_signal=signal, input_signal_length=signal_len) + del signal + + # During training, loss must be computed, so decoder forward is necessary + decoder, target_length, states = self.decoder(targets=transcript, target_length=transcript_len) + + if hasattr(self, '_trainer') and self._trainer is not None: + log_every_n_steps = self._trainer.log_every_n_steps + sample_id = self._trainer.global_step + else: + log_every_n_steps = 1 + sample_id = batch_nb + + # If fused Joint-Loss-WER is not used + if not self.joint.fuse_loss_wer: + # Compute full joint and loss + joint = self.joint(encoder_outputs=encoded, decoder_outputs=decoder) + loss_value = self.loss( + log_probs=joint, targets=transcript, input_lengths=encoded_len, target_lengths=target_length + ) + + # Add auxiliary losses, if registered + loss_value = self.add_auxiliary_losses(loss_value) + + # Reset access registry + # if AccessMixin.is_access_enabled(): + # AccessMixin.reset_registry(self) + + tensorboard_logs = { + 'train_loss': loss_value, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + 'global_step': torch.tensor(self.trainer.global_step, dtype=torch.float32), + } + + if (sample_id + 1) % log_every_n_steps == 0: + self.wer.update(encoded, encoded_len, transcript, transcript_len) + _, scores, words = self.wer.compute() + self.wer.reset() + tensorboard_logs.update({'training_batch_wer': scores.float() / words}) + + else: + # If fused Joint-Loss-WER is used + if (sample_id + 1) % log_every_n_steps == 0: + compute_wer = True + else: + compute_wer = False + + # Fused joint step + loss_value, wer, _, _ = self.joint( + encoder_outputs=encoded, + decoder_outputs=decoder, + encoder_lengths=encoded_len, + transcripts=transcript, + transcript_lengths=transcript_len, + compute_wer=compute_wer, + ) + + # Add auxiliary losses, if registered + loss_value = self.add_auxiliary_losses(loss_value) + + # Reset access registry + # if AccessMixin.is_access_enabled(): + # AccessMixin.reset_registry(self) + + tensorboard_logs = { + 'train_loss': loss_value, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + 'global_step': torch.tensor(self.trainer.global_step, dtype=torch.float32), + } + + if compute_wer: + tensorboard_logs.update({'training_batch_wer': wer}) + + if self.ctc_loss_weight > 0: + log_probs = self.ctc_decoder(encoder_output=encoded) + ctc_loss = self.ctc_loss( + log_probs=log_probs, targets=transcript, input_lengths=encoded_len, target_lengths=transcript_len + ) + + # Add Interctc Losses + ctc_loss, interctc_tensorboard_logs = self.add_interctc_losses( + ctc_loss, transcript, transcript_len, compute_wer=((batch_nb + 1) % log_every_n_steps == 0) + ) + tensorboard_logs.update(interctc_tensorboard_logs) + + tensorboard_logs['train_rnnt_loss'] = loss_value + tensorboard_logs['train_ctc_loss'] = ctc_loss + loss_value = (1 - self.ctc_loss_weight) * loss_value + self.ctc_loss_weight * ctc_loss + tensorboard_logs['train_loss'] = loss_value + if (sample_id + 1) % log_every_n_steps == 0: + self.ctc_wer.update( + predictions=log_probs, + targets=transcript, + target_lengths=transcript_len, + predictions_lengths=encoded_len, + ) + ctc_wer, _, _ = self.ctc_wer.compute() + self.ctc_wer.reset() + tensorboard_logs.update({'training_batch_wer_ctc': ctc_wer}) + + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + # Log items + self.log_dict(tensorboard_logs) + + # Preserve batch acoustic model T and language model U parameters if normalizing + if self._optim_normalize_joint_txu: + self._optim_normalize_txu = [encoded_len.max(), transcript_len.max()] + + return {'loss': loss_value} + + def predict_step(self, batch, batch_idx, dataloader_idx=0): + # TODO: add support for CTC decoding + signal, signal_len, transcript, transcript_len, sample_id = batch + + # forward() only performs encoder forward + encoded, encoded_len = self.forward(input_signal=signal, input_signal_length=signal_len) + del signal + + best_hyp_text, all_hyp_text = self.decoding.rnnt_decoder_predictions_tensor( + encoder_output=encoded, encoded_lengths=encoded_len, return_hypotheses=False + ) + + sample_id = sample_id.cpu().detach().numpy() + return list(zip(sample_id, best_hyp_text)) + + def validation_step(self, batch, batch_idx, dataloader_idx=0): + if self.is_interctc_enabled(): + AccessMixin.set_access_enabled(access_enabled=True) + + signal, signal_len, transcript, transcript_len = batch + + # forward() only performs encoder forward + encoded, encoded_len = self.forward(input_signal=signal, input_signal_length=signal_len) + del signal + + tensorboard_logs = {} + + # If experimental fused Joint-Loss-WER is not used + if not self.joint.fuse_loss_wer: + if self.compute_eval_loss: + decoder, target_length, states = self.decoder(targets=transcript, target_length=transcript_len) + joint = self.joint(encoder_outputs=encoded, decoder_outputs=decoder) + + loss_value = self.loss( + log_probs=joint, targets=transcript, input_lengths=encoded_len, target_lengths=target_length + ) + + tensorboard_logs['val_loss'] = loss_value + + self.wer.update(encoded, encoded_len, transcript, transcript_len) + wer, wer_num, wer_denom = self.wer.compute() + self.wer.reset() + + tensorboard_logs['val_wer_num'] = wer_num + tensorboard_logs['val_wer_denom'] = wer_denom + tensorboard_logs['val_wer'] = wer + + else: + # If experimental fused Joint-Loss-WER is used + compute_wer = True + + if self.compute_eval_loss: + decoded, target_len, states = self.decoder(targets=transcript, target_length=transcript_len) + else: + decoded = None + target_len = transcript_len + + # Fused joint step + loss_value, wer, wer_num, wer_denom = self.joint( + encoder_outputs=encoded, + decoder_outputs=decoded, + encoder_lengths=encoded_len, + transcripts=transcript, + transcript_lengths=target_len, + compute_wer=compute_wer, + ) + + if loss_value is not None: + tensorboard_logs['val_loss'] = loss_value + + tensorboard_logs['val_wer_num'] = wer_num + tensorboard_logs['val_wer_denom'] = wer_denom + tensorboard_logs['val_wer'] = wer + + log_probs = self.ctc_decoder(encoder_output=encoded) + if self.compute_eval_loss: + ctc_loss = self.ctc_loss( + log_probs=log_probs, targets=transcript, input_lengths=encoded_len, target_lengths=transcript_len + ) + + # Add interCTC losses + ctc_loss, interctc_tensorboard_logs = self.add_interctc_losses( + ctc_loss, transcript, transcript_len, compute_wer=True, log_wer_num_denom=True, log_prefix="val_", + ) + tensorboard_logs.update(interctc_tensorboard_logs) + + tensorboard_logs['val_ctc_loss'] = ctc_loss + tensorboard_logs['val_rnnt_loss'] = loss_value + loss_value = (1 - self.ctc_loss_weight) * loss_value + self.ctc_loss_weight * ctc_loss + tensorboard_logs['val_loss'] = loss_value + self.ctc_wer.update( + predictions=log_probs, targets=transcript, target_lengths=transcript_len, predictions_lengths=encoded_len, + ) + ctc_wer, ctc_wer_num, ctc_wer_denom = self.ctc_wer.compute() + self.ctc_wer.reset() + tensorboard_logs['val_wer_num_ctc'] = ctc_wer_num + tensorboard_logs['val_wer_denom_ctc'] = ctc_wer_denom + tensorboard_logs['val_wer_ctc'] = ctc_wer + + self.log('global_step', torch.tensor(self.trainer.global_step, dtype=torch.float32)) + + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + return tensorboard_logs + + def test_step(self, batch, batch_idx, dataloader_idx=0): + logs = self.validation_step(batch, batch_idx, dataloader_idx=dataloader_idx) + test_logs = {name.replace("val_", "test_"): value for name, value in logs.items()} + if type(self.trainer.test_dataloaders) == list and len(self.trainer.test_dataloaders) > 1: + self.test_step_outputs[dataloader_idx].append(test_logs) + else: + self.test_step_outputs.append(test_logs) + return test_logs + + """ + def test_step(self, batch, batch_idx, dataloader_idx=0): + logs = self.validation_step(batch, batch_idx, dataloader_idx=dataloader_idx) + test_logs = { + 'test_wer_num': logs['val_wer_num'], + 'test_wer_denom': logs['val_wer_denom'], + # 'test_wer': logs['val_wer'], + } + if 'val_loss' in logs: + test_logs['test_loss'] = logs['val_loss'] + + if self.ctc_loss_weight > 0: + test_logs['test_wer_num_ctc'] = logs['val_wer_num_ctc'] + test_logs['test_wer_denom_ctc'] = logs['val_wer_denom_ctc'] + if 'val_ctc_loss' in logs: + test_logs['test_ctc_loss'] = logs['val_ctc_loss'] + if 'val_rnnt_loss' in logs: + test_logs['test_rnnt_loss'] = logs['val_rnnt_loss'] + + return test_logs + """ + + def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): + if self.compute_eval_loss: + val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean() + val_loss_log = {'val_loss': val_loss_mean} + else: + val_loss_log = {} + wer_num = torch.stack([x['val_wer_num'] for x in outputs]).sum() + wer_denom = torch.stack([x['val_wer_denom'] for x in outputs]).sum() + tensorboard_logs = {**val_loss_log, 'val_wer': wer_num.float() / wer_denom} + if self.ctc_loss_weight > 0: + ctc_wer_num = torch.stack([x['val_wer_num_ctc'] for x in outputs]).sum() + ctc_wer_denom = torch.stack([x['val_wer_denom_ctc'] for x in outputs]).sum() + tensorboard_logs['val_wer_ctc'] = ctc_wer_num.float() / ctc_wer_denom + + metrics = {**val_loss_log, 'log': tensorboard_logs} + self.finalize_interctc_metrics(metrics, outputs, prefix="val_") + return metrics + + def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0): + if self.compute_eval_loss: + test_loss_mean = torch.stack([x['test_loss'] for x in outputs]).mean() + test_loss_log = {'test_loss': test_loss_mean} + else: + test_loss_log = {} + wer_num = torch.stack([x['test_wer_num'] for x in outputs]).sum() + wer_denom = torch.stack([x['test_wer_denom'] for x in outputs]).sum() + tensorboard_logs = {**test_loss_log, 'test_wer': wer_num.float() / wer_denom} + + if self.ctc_loss_weight > 0: + ctc_wer_num = torch.stack([x['test_wer_num_ctc'] for x in outputs]).sum() + ctc_wer_denom = torch.stack([x['test_wer_denom_ctc'] for x in outputs]).sum() + tensorboard_logs['test_wer_ctc'] = ctc_wer_num.float() / ctc_wer_denom + + metrics = {**test_loss_log, 'log': tensorboard_logs} + self.finalize_interctc_metrics(metrics, outputs, prefix="test_") + return metrics + + @classmethod + def list_available_models(cls) -> Optional[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + return results diff --git a/nemo/collections/multimodal/speech_cv/models/visual_rnnt_bpe_models.py b/nemo/collections/multimodal/speech_cv/models/visual_rnnt_bpe_models.py new file mode 100644 index 000000000000..5c5263b1ce76 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/visual_rnnt_bpe_models.py @@ -0,0 +1,321 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import os +from typing import Dict, List, Optional, Union + +import torch +from omegaconf import DictConfig, ListConfig, OmegaConf, open_dict +from pytorch_lightning import Trainer + +from nemo.collections.asr.losses.rnnt import RNNTLoss +from nemo.collections.asr.metrics.rnnt_wer_bpe import RNNTBPEWER, RNNTBPEDecoding, RNNTBPEDecodingConfig +from nemo.collections.asr.parts.mixins import ASRBPEMixin +from nemo.collections.multimodal.speech_cv.data import video_to_text_dataset +from nemo.collections.multimodal.speech_cv.models.visual_rnnt_models import VisualEncDecRNNTModel +from nemo.core.classes.common import PretrainedModelInfo +from nemo.utils import logging, model_utils + + +class VisualEncDecRNNTBPEModel(VisualEncDecRNNTModel, ASRBPEMixin): + """Base class for encoder decoder RNNT-based models with subword tokenization.""" + + def __init__(self, cfg: DictConfig, trainer: Trainer = None): + # Convert to Hydra 1.0 compatible DictConfig + cfg = model_utils.convert_model_config_to_dict_config(cfg) + cfg = model_utils.maybe_update_config_version(cfg) + + # Tokenizer is necessary for this model + if 'tokenizer' not in cfg: + raise ValueError("`cfg` must have `tokenizer` config to create a tokenizer !") + + if not isinstance(cfg, DictConfig): + cfg = OmegaConf.create(cfg) + + # Setup the tokenizer + self._setup_tokenizer(cfg.tokenizer) + + # Initialize a dummy vocabulary + vocabulary = self.tokenizer.tokenizer.get_vocab() + + # Set the new vocabulary + with open_dict(cfg): + cfg.labels = ListConfig(list(vocabulary)) + + with open_dict(cfg.decoder): + cfg.decoder.vocab_size = len(vocabulary) + + with open_dict(cfg.joint): + cfg.joint.num_classes = len(vocabulary) + cfg.joint.vocabulary = ListConfig(list(vocabulary)) + cfg.joint.jointnet.encoder_hidden = cfg.model_defaults.enc_hidden + cfg.joint.jointnet.pred_hidden = cfg.model_defaults.pred_hidden + + super().__init__(cfg=cfg, trainer=trainer) + + # Setup decoding object + self.decoding = RNNTBPEDecoding( + decoding_cfg=self.cfg.decoding, decoder=self.decoder, joint=self.joint, tokenizer=self.tokenizer, + ) + + # Setup wer object + self.wer = RNNTBPEWER( + decoding=self.decoding, + batch_dim_index=0, + use_cer=self._cfg.get('use_cer', False), + log_prediction=self._cfg.get('log_prediction', True), + dist_sync_on_step=True, + ) + + # Setup fused Joint step if flag is set + if self.joint.fuse_loss_wer: + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + def change_vocabulary( + self, + new_tokenizer_dir: Union[str, DictConfig], + new_tokenizer_type: str, + decoding_cfg: Optional[DictConfig] = None, + ): + """ + Changes vocabulary used during RNNT decoding process. Use this method when fine-tuning on from pre-trained model. + This method changes only decoder and leaves encoder and pre-processing modules unchanged. For example, you would + use it if you want to use pretrained encoder when fine-tuning on data in another language, or when you'd need + model to learn capitalization, punctuation and/or special characters. + + Args: + new_tokenizer_dir: Directory path to tokenizer or a config for a new tokenizer (if the tokenizer type is `agg`) + new_tokenizer_type: Type of tokenizer. Can be either `agg`, `bpe` or `wpe`. + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + + Returns: None + + """ + if isinstance(new_tokenizer_dir, DictConfig): + if new_tokenizer_type == 'agg': + new_tokenizer_cfg = new_tokenizer_dir + else: + raise ValueError( + f'New tokenizer dir should be a string unless the tokenizer is `agg`, but this tokenizer type is: {new_tokenizer_type}' + ) + else: + new_tokenizer_cfg = None + + if new_tokenizer_cfg is not None: + tokenizer_cfg = new_tokenizer_cfg + else: + if not os.path.isdir(new_tokenizer_dir): + raise NotADirectoryError( + f'New tokenizer dir must be non-empty path to a directory. But I got: {new_tokenizer_dir}' + ) + + if new_tokenizer_type.lower() not in ('bpe', 'wpe'): + raise ValueError(f'New tokenizer type must be either `bpe` or `wpe`') + + tokenizer_cfg = OmegaConf.create({'dir': new_tokenizer_dir, 'type': new_tokenizer_type}) + + # Setup the tokenizer + self._setup_tokenizer(tokenizer_cfg) + + # Initialize a dummy vocabulary + vocabulary = self.tokenizer.tokenizer.get_vocab() + + joint_config = self.joint.to_config_dict() + new_joint_config = copy.deepcopy(joint_config) + if self.tokenizer_type == "agg": + new_joint_config["vocabulary"] = ListConfig(vocabulary) + else: + new_joint_config["vocabulary"] = ListConfig(list(vocabulary.keys())) + + new_joint_config['num_classes'] = len(vocabulary) + del self.joint + self.joint = VisualEncDecRNNTBPEModel.from_config_dict(new_joint_config) + + decoder_config = self.decoder.to_config_dict() + new_decoder_config = copy.deepcopy(decoder_config) + new_decoder_config.vocab_size = len(vocabulary) + del self.decoder + self.decoder = VisualEncDecRNNTBPEModel.from_config_dict(new_decoder_config) + + del self.loss + self.loss = RNNTLoss(num_classes=self.joint.num_classes_with_blank - 1) + + if decoding_cfg is None: + # Assume same decoding config as before + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(RNNTBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = RNNTBPEDecoding( + decoding_cfg=decoding_cfg, decoder=self.decoder, joint=self.joint, tokenizer=self.tokenizer, + ) + + self.wer = RNNTBPEWER( + decoding=self.decoding, + batch_dim_index=self.wer.batch_dim_index, + use_cer=self.wer.use_cer, + log_prediction=self.wer.log_prediction, + dist_sync_on_step=True, + ) + + # Setup fused Joint step + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Update config + with open_dict(self.cfg.joint): + self.cfg.joint = new_joint_config + + with open_dict(self.cfg.decoder): + self.cfg.decoder = new_decoder_config + + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed decoder to output to {self.joint.vocabulary} vocabulary.") + + def change_decoding_strategy(self, decoding_cfg: DictConfig): + """ + Changes decoding strategy used during RNNT decoding process. + + Args: + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + """ + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(RNNTBPEDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = RNNTBPEDecoding( + decoding_cfg=decoding_cfg, decoder=self.decoder, joint=self.joint, tokenizer=self.tokenizer, + ) + + self.wer = RNNTBPEWER( + decoding=self.decoding, + batch_dim_index=self.wer.batch_dim_index, + use_cer=self.wer.use_cer, + log_prediction=self.wer.log_prediction, + dist_sync_on_step=True, + ) + + # Setup fused Joint step + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Update config + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed decoding strategy to \n{OmegaConf.to_yaml(self.cfg.decoding)}") + + def _setup_dataloader_from_config(self, config: Optional[Dict]): + dataset = video_to_text_dataset.get_video_to_text_bpe_dataset_from_config( + config=config, + local_rank=self.local_rank, + global_rank=self.global_rank, + world_size=self.world_size, + tokenizer=self.tokenizer, + preprocessor_cfg=self.cfg.get("preprocessor", None), + ) + + if dataset is None: + return None + + shuffle = config['shuffle'] + if config.get('is_tarred', False): + shuffle = False + + if hasattr(dataset, 'collate_fn'): + collate_fn = dataset.collate_fn + else: + collate_fn = dataset.datasets[0].collate_fn + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=config['batch_size'], + collate_fn=collate_fn, + drop_last=config.get('drop_last', False), + shuffle=shuffle, + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + ) + + def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLoader': + """ + Setup function for a temporary data loader which wraps the provided video file. + + Args: + config: A python dictionary which contains the following keys: + paths2video_files: (a list) of paths to video files. The files should be relatively short fragments. \ + Recommended length per file is between 5 and 25 seconds. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + temp_dir: (str) A temporary directory where the video manifest is temporarily + stored. + + Returns: + A pytorch DataLoader for the given video file(s). + """ + if 'manifest_filepath' in config: + manifest_filepath = config['manifest_filepath'] + batch_size = config['batch_size'] + else: + manifest_filepath = os.path.join(config['temp_dir'], 'manifest.json') + batch_size = min(config['batch_size'], len(config['paths2video_files'])) + + dl_config = { + 'manifest_filepath': manifest_filepath, + 'batch_size': batch_size, + 'shuffle': False, + 'num_workers': config.get('num_workers', min(batch_size, os.cpu_count() - 1)), + 'pin_memory': True, + 'channel_selector': config.get('channel_selector', None), + 'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False), + } + + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) + return temporary_datalayer + + @classmethod + def list_available_models(cls) -> List[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + + return results diff --git a/nemo/collections/multimodal/speech_cv/models/visual_rnnt_models.py b/nemo/collections/multimodal/speech_cv/models/visual_rnnt_models.py new file mode 100644 index 000000000000..0bd3e2fd1563 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/models/visual_rnnt_models.py @@ -0,0 +1,920 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import json +import os +import tempfile +from math import ceil, isclose +from typing import Dict, List, Optional, Tuple, Union + +import torch +from omegaconf import DictConfig, OmegaConf, open_dict +from pytorch_lightning import Trainer +from tqdm.auto import tqdm + +from nemo.collections.asr.data import audio_to_text_dataset +from nemo.collections.asr.losses.rnnt import RNNTLoss, resolve_rnnt_default_loss_name +from nemo.collections.asr.metrics.rnnt_wer import RNNTWER, RNNTDecoding, RNNTDecodingConfig +from nemo.collections.asr.models.asr_model import ASRModel +from nemo.collections.asr.modules.rnnt import RNNTDecoderJoint +from nemo.collections.asr.parts.mixins import ASRModuleMixin +from nemo.collections.asr.parts.utils.audio_utils import ChannelSelectorType +from nemo.collections.multimodal.speech_cv.data import video_to_text_dataset +from nemo.core.classes import Exportable +from nemo.core.classes.common import PretrainedModelInfo, typecheck +from nemo.core.classes.mixins import AccessMixin +from nemo.core.neural_types import AcousticEncodedRepresentation, LengthsType, NeuralType, VideoSignal +from nemo.utils import logging + + +class VisualEncDecRNNTModel(ASRModel, ASRModuleMixin, Exportable): + """Base class for encoder decoder RNNT-based models.""" + + def __init__(self, cfg: DictConfig, trainer: Trainer = None): + # Get global rank and total number of GPU workers for IterableDataset partitioning, if applicable + # Global_rank and local_rank is set by LightningModule in Lightning 1.2.0 + self.world_size = 1 + if trainer is not None: + self.world_size = trainer.world_size + + super().__init__(cfg=cfg, trainer=trainer) + + # Preprocessors + self.video_preprocessor = VisualEncDecRNNTModel.from_config_dict(self._cfg.video_preprocessor) + + # Augmentations + self.video_augmentation = VisualEncDecRNNTModel.from_config_dict(self._cfg.video_augment) + + # Front-end Networks + self.video_front_end = VisualEncDecRNNTModel.from_config_dict(self._cfg.video_front_end) + + # Back-end Networks + self.encoder = VisualEncDecRNNTModel.from_config_dict(self._cfg.encoder) + + # Update config values required by components dynamically + with open_dict(self.cfg.decoder): + self.cfg.decoder.vocab_size = len(self.cfg.labels) + + with open_dict(self.cfg.joint): + self.cfg.joint.num_classes = len(self.cfg.labels) + self.cfg.joint.vocabulary = self.cfg.labels + self.cfg.joint.jointnet.encoder_hidden = self.cfg.model_defaults.enc_hidden + self.cfg.joint.jointnet.pred_hidden = self.cfg.model_defaults.pred_hidden + + self.decoder = VisualEncDecRNNTModel.from_config_dict(self.cfg.decoder) + self.joint = VisualEncDecRNNTModel.from_config_dict(self.cfg.joint) + + # Setup RNNT Loss + loss_name, loss_kwargs = self.extract_rnnt_loss_cfg(self.cfg.get("loss", None)) + + self.loss = RNNTLoss( + num_classes=self.joint.num_classes_with_blank - 1, + loss_name=loss_name, + loss_kwargs=loss_kwargs, + reduction=self.cfg.get("rnnt_reduction", "mean_batch"), + ) + + # Setup decoding objects + self.decoding = RNNTDecoding( + decoding_cfg=self.cfg.decoding, decoder=self.decoder, joint=self.joint, vocabulary=self.joint.vocabulary, + ) + # Setup WER calculation + self.wer = RNNTWER( + decoding=self.decoding, + batch_dim_index=0, + use_cer=self._cfg.get('use_cer', False), + log_prediction=self._cfg.get('log_prediction', True), + dist_sync_on_step=True, + ) + + # Whether to compute loss during evaluation + if 'compute_eval_loss' in self.cfg: + self.compute_eval_loss = self.cfg.compute_eval_loss + else: + self.compute_eval_loss = True + + # Setup fused Joint step if flag is set + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Setup optimization normalization (if provided in config) + self.setup_optim_normalization() + + # Setup optional Optimization flags + self.setup_optimization_flags() + + # Setup encoder adapters (from ASRAdapterModelMixin) + self.setup_adapters() + + def setup_optim_normalization(self): + """ + Helper method to setup normalization of certain parts of the model prior to the optimization step. + + Supported pre-optimization normalizations are as follows: + + .. code-block:: yaml + + # Variation Noise injection + model: + variational_noise: + std: 0.0 + start_step: 0 + + # Joint - Length normalization + model: + normalize_joint_txu: false + + # Encoder Network - gradient normalization + model: + normalize_encoder_norm: false + + # Decoder / Prediction Network - gradient normalization + model: + normalize_decoder_norm: false + + # Joint - gradient normalization + model: + normalize_joint_norm: false + """ + # setting up the variational noise for the decoder + if hasattr(self.cfg, 'variational_noise'): + self._optim_variational_noise_std = self.cfg['variational_noise'].get('std', 0) + self._optim_variational_noise_start = self.cfg['variational_noise'].get('start_step', 0) + else: + self._optim_variational_noise_std = 0 + self._optim_variational_noise_start = 0 + + # Setup normalized gradients for model joint by T x U scaling factor (joint length normalization) + self._optim_normalize_joint_txu = self.cfg.get('normalize_joint_txu', False) + self._optim_normalize_txu = None + + # Setup normalized encoder norm for model + self._optim_normalize_encoder_norm = self.cfg.get('normalize_encoder_norm', False) + + # Setup normalized decoder norm for model + self._optim_normalize_decoder_norm = self.cfg.get('normalize_decoder_norm', False) + + # Setup normalized joint norm for model + self._optim_normalize_joint_norm = self.cfg.get('normalize_joint_norm', False) + + def extract_rnnt_loss_cfg(self, cfg: Optional[DictConfig]): + """ + Helper method to extract the rnnt loss name, and potentially its kwargs + to be passed. + + Args: + cfg: Should contain `loss_name` as a string which is resolved to a RNNT loss name. + If the default should be used, then `default` can be used. + Optionally, one can pass additional kwargs to the loss function. The subdict + should have a keyname as follows : `{loss_name}_kwargs`. + + Note that whichever loss_name is selected, that corresponding kwargs will be + selected. For the "default" case, the "{resolved_default}_kwargs" will be used. + + Examples: + .. code-block:: yaml + + loss_name: "default" + warprnnt_numba_kwargs: + kwargs2: some_other_val + + Returns: + A tuple, the resolved loss name as well as its kwargs (if found). + """ + if cfg is None: + cfg = DictConfig({}) + + loss_name = cfg.get("loss_name", "default") + + if loss_name == "default": + loss_name = resolve_rnnt_default_loss_name() + + loss_kwargs = cfg.get(f"{loss_name}_kwargs", None) + + logging.info(f"Using RNNT Loss : {loss_name}\n" f"Loss {loss_name}_kwargs: {loss_kwargs}") + + return loss_name, loss_kwargs + + @torch.no_grad() + def transcribe( + self, + paths2video_files: List[str], + batch_size: int = 4, + return_hypotheses: bool = False, + partial_hypothesis: Optional[List['Hypothesis']] = None, + num_workers: int = 0, + channel_selector: Optional[ChannelSelectorType] = None, + augmentor: DictConfig = None, + ) -> Tuple[List[str], Optional[List['Hypothesis']]]: + """ + Uses greedy decoding to transcribe video files. Use this method for debugging and prototyping. + + Args: + + paths2video_files: (a list) of paths to video files. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + return_hypotheses: (bool) Either return hypotheses or text + With hypotheses can do some postprocessing like getting timestamp or rescoring + num_workers: (int) number of workers for DataLoader + channel_selector (int | Iterable[int] | str): select a single channel or a subset of channels from multi-channel audio. If set to `'average'`, it performs averaging across channels. Disabled if set to `None`. Defaults to `None`. Uses zero-based indexing. + augmentor: (DictConfig): Augment audio samples during transcription if augmentor is applied. + Returns: + Returns a tuple of 2 items - + * A list of greedy transcript texts / Hypothesis + * An optional list of beam search transcript texts / Hypothesis / NBestHypothesis. + """ + if paths2video_files is None or len(paths2video_files) == 0: + return {} + # We will store transcriptions here + hypotheses = [] + all_hypotheses = [] + # Model's mode and device + mode = self.training + device = next(self.parameters()).device + + if num_workers is None: + num_workers = min(batch_size, os.cpu_count() - 1) + + try: + + # Switch model to evaluation mode + self.eval() + # Freeze the visual front-end, encoder and decoder modules + self.video_front_end.freeze() + self.encoder.freeze() + self.decoder.freeze() + self.joint.freeze() + logging_level = logging.get_verbosity() + logging.set_verbosity(logging.WARNING) + # Work in tmp directory - will store manifest file there + with tempfile.TemporaryDirectory() as tmpdir: + with open(os.path.join(tmpdir, 'manifest.json'), 'w', encoding='utf-8') as fp: + for video_file in paths2video_files: + entry = {'video_filepath': video_file, 'duration': 100000, 'text': ''} + fp.write(json.dumps(entry) + '\n') + + config = { + 'paths2video_files': paths2video_files, + 'batch_size': batch_size, + 'temp_dir': tmpdir, + 'num_workers': num_workers, + 'channel_selector': channel_selector, + } + + if augmentor: + config['augmentor'] = augmentor + + temporary_datalayer = self._setup_transcribe_dataloader(config) + for test_batch in tqdm(temporary_datalayer, desc="Transcribing"): + encoded, encoded_len = self.forward( + input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device) + ) + best_hyp, all_hyp = self.decoding.rnnt_decoder_predictions_tensor( + encoded, + encoded_len, + return_hypotheses=return_hypotheses, + partial_hypotheses=partial_hypothesis, + ) + + hypotheses += best_hyp + if all_hyp is not None: + all_hypotheses += all_hyp + else: + all_hypotheses += best_hyp + + del encoded + del test_batch + finally: + # set mode back to its original value + self.train(mode=mode) + + logging.set_verbosity(logging_level) + if mode is True: + self.video_front_end.unfreeze() + self.encoder.unfreeze() + self.decoder.unfreeze() + self.joint.unfreeze() + return hypotheses, all_hypotheses + + def change_vocabulary(self, new_vocabulary: List[str], decoding_cfg: Optional[DictConfig] = None): + """ + Changes vocabulary used during RNNT decoding process. Use this method when fine-tuning a pre-trained model. + This method changes only decoder and leaves encoder and pre-processing modules unchanged. For example, you would + use it if you want to use pretrained encoder when fine-tuning on data in another language, or when you'd need + model to learn capitalization, punctuation and/or special characters. + + Args: + new_vocabulary: list with new vocabulary. Must contain at least 2 elements. Typically, \ + this is target alphabet. + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + + Returns: None + + """ + if self.joint.vocabulary == new_vocabulary: + logging.warning(f"Old {self.joint.vocabulary} and new {new_vocabulary} match. Not changing anything.") + else: + if new_vocabulary is None or len(new_vocabulary) == 0: + raise ValueError(f'New vocabulary must be non-empty list of chars. But I got: {new_vocabulary}') + + joint_config = self.joint.to_config_dict() + new_joint_config = copy.deepcopy(joint_config) + new_joint_config['vocabulary'] = new_vocabulary + new_joint_config['num_classes'] = len(new_vocabulary) + del self.joint + self.joint = VisualEncDecRNNTModel.from_config_dict(new_joint_config) + + decoder_config = self.decoder.to_config_dict() + new_decoder_config = copy.deepcopy(decoder_config) + new_decoder_config.vocab_size = len(new_vocabulary) + del self.decoder + self.decoder = VisualEncDecRNNTModel.from_config_dict(new_decoder_config) + + del self.loss + loss_name, loss_kwargs = self.extract_rnnt_loss_cfg(self.cfg.get('loss', None)) + self.loss = RNNTLoss( + num_classes=self.joint.num_classes_with_blank - 1, loss_name=loss_name, loss_kwargs=loss_kwargs + ) + + if decoding_cfg is None: + # Assume same decoding config as before + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(RNNTDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = RNNTDecoding( + decoding_cfg=decoding_cfg, decoder=self.decoder, joint=self.joint, vocabulary=self.joint.vocabulary, + ) + + self.wer = RNNTWER( + decoding=self.decoding, + batch_dim_index=self.wer.batch_dim_index, + use_cer=self.wer.use_cer, + log_prediction=self.wer.log_prediction, + dist_sync_on_step=True, + ) + + # Setup fused Joint step + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Update config + with open_dict(self.cfg.joint): + self.cfg.joint = new_joint_config + + with open_dict(self.cfg.decoder): + self.cfg.decoder = new_decoder_config + + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + ds_keys = ['train_ds', 'validation_ds', 'test_ds'] + for key in ds_keys: + if key in self.cfg: + with open_dict(self.cfg[key]): + self.cfg[key]['labels'] = OmegaConf.create(new_vocabulary) + + logging.info(f"Changed decoder to output to {self.joint.vocabulary} vocabulary.") + + def change_decoding_strategy(self, decoding_cfg: DictConfig): + """ + Changes decoding strategy used during RNNT decoding process. + + Args: + decoding_cfg: A config for the decoder, which is optional. If the decoding type + needs to be changed (from say Greedy to Beam decoding etc), the config can be passed here. + """ + if decoding_cfg is None: + # Assume same decoding config as before + logging.info("No `decoding_cfg` passed when changing decoding strategy, using internal config") + decoding_cfg = self.cfg.decoding + + # Assert the decoding config with all hyper parameters + decoding_cls = OmegaConf.structured(RNNTDecodingConfig) + decoding_cls = OmegaConf.create(OmegaConf.to_container(decoding_cls)) + decoding_cfg = OmegaConf.merge(decoding_cls, decoding_cfg) + + self.decoding = RNNTDecoding( + decoding_cfg=decoding_cfg, decoder=self.decoder, joint=self.joint, vocabulary=self.joint.vocabulary, + ) + + self.wer = RNNTWER( + decoding=self.decoding, + batch_dim_index=self.wer.batch_dim_index, + use_cer=self.wer.use_cer, + log_prediction=self.wer.log_prediction, + dist_sync_on_step=True, + ) + + # Setup fused Joint step + if self.joint.fuse_loss_wer or ( + self.decoding.joint_fused_batch_size is not None and self.decoding.joint_fused_batch_size > 0 + ): + self.joint.set_loss(self.loss) + self.joint.set_wer(self.wer) + + # Update config + with open_dict(self.cfg.decoding): + self.cfg.decoding = decoding_cfg + + logging.info(f"Changed decoding strategy to \n{OmegaConf.to_yaml(self.cfg.decoding)}") + + def _setup_dataloader_from_config(self, config: Optional[Dict]): + # Automatically inject args from model config to dataloader config + audio_to_text_dataset.inject_dataloader_value_from_model_config(self.cfg, config, key='sample_rate') + audio_to_text_dataset.inject_dataloader_value_from_model_config(self.cfg, config, key='labels') + dataset = video_to_text_dataset.get_video_to_text_bpe_dataset_from_config( + config=config, + local_rank=self.local_rank, + global_rank=self.global_rank, + world_size=self.world_size, + preprocessor_cfg=self._cfg.get("preprocessor", None), + ) + + if dataset is None: + return None + + shuffle = config['shuffle'] + if config.get('is_tarred', False): + shuffle = False + + if hasattr(dataset, 'collate_fn'): + collate_fn = dataset.collate_fn + else: + collate_fn = dataset.datasets[0].collate_fn + + return torch.utils.data.DataLoader( + dataset=dataset, + batch_size=config['batch_size'], + collate_fn=collate_fn, + drop_last=config.get('drop_last', False), + shuffle=shuffle, + num_workers=config.get('num_workers', 0), + pin_memory=config.get('pin_memory', False), + ) + + def setup_training_data(self, train_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the training data loader via a Dict-like object. + + Args: + train_data_config: A config that contains the information regarding construction + of an ASR Training dataset. + + Supported Datasets: + - :class:`~nemo.collections.multimodal.speech_cv.data.video_to_text.VideoToCharDataset` + - :class:`~nemo.collections.asr.data.video_to_text.VideoToBPEDataset` + - :class:`~nemo.collections.asr.data.video_to_text.TarredVideoToBPEDataset` + """ + if 'shuffle' not in train_data_config: + train_data_config['shuffle'] = True + + # preserve config + self._update_dataset_config(dataset_name='train', config=train_data_config) + + self._train_dl = self._setup_dataloader_from_config(config=train_data_config) + + # Need to set this because if using an IterableDataset, the length of the dataloader is the total number + # of samples rather than the number of batches, and this messes up the tqdm progress bar. + # So we set the number of steps manually (to the correct number) to fix this. + if 'is_tarred' in train_data_config and train_data_config['is_tarred']: + # We also need to check if limit_train_batches is already set. + # If it's an int, we assume that the user has set it to something sane, i.e. <= # training batches, + # and don't change it. Otherwise, adjust batches accordingly if it's a float (including 1.0). + if self._trainer is not None and isinstance(self._trainer.limit_train_batches, float): + self._trainer.limit_train_batches = int( + self._trainer.limit_train_batches + * ceil((len(self._train_dl.dataset) / self.world_size) / train_data_config['batch_size']) + ) + elif self._trainer is None: + logging.warning( + "Model Trainer was not set before constructing the dataset, incorrect number of " + "training batches will be used. Please set the trainer and rebuild the dataset." + ) + + def setup_validation_data(self, val_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the validation data loader via a Dict-like object. + + Args: + val_data_config: A config that contains the information regarding construction + of an ASR Training dataset. + + Supported Datasets: + - :class:`~nemo.collections.multimodal.speech_cv.data.video_to_text.VideoToCharDataset` + - :class:`~nemo.collections.asr.data.video_to_text.VideoToBPEDataset` + - :class:`~nemo.collections.asr.data.video_to_text.TarredVideoToBPEDataset` + """ + if 'shuffle' not in val_data_config: + val_data_config['shuffle'] = False + + # preserve config + self._update_dataset_config(dataset_name='validation', config=val_data_config) + + self._validation_dl = self._setup_dataloader_from_config(config=val_data_config) + + def setup_test_data(self, test_data_config: Optional[Union[DictConfig, Dict]]): + """ + Sets up the test data loader via a Dict-like object. + + Args: + test_data_config: A config that contains the information regarding construction + of an ASR Training dataset. + + Supported Datasets: + - :class:`~nemo.collections.multimodal.speech_cv.data.video_to_text.VideoToCharDataset` + - :class:`~nemo.collections.asr.data.video_to_text.VideoToBPEDataset` + - :class:`~nemo.collections.asr.data.video_to_text.TarredVideoToBPEDataset` + """ + if 'shuffle' not in test_data_config: + test_data_config['shuffle'] = False + + # preserve config + self._update_dataset_config(dataset_name='test', config=test_data_config) + + self._test_dl = self._setup_dataloader_from_config(config=test_data_config) + + @property + def input_types(self) -> Optional[Dict[str, NeuralType]]: + + return { + "input_signal": NeuralType(('B', 'C', 'T', 'H', 'W'), VideoSignal(), optional=True), + "input_signal_length": NeuralType(tuple('B'), LengthsType(), optional=True), + } + + @property + def output_types(self) -> Optional[Dict[str, NeuralType]]: + return { + "outputs": NeuralType(('B', 'D', 'T'), AcousticEncodedRepresentation()), + "encoded_lengths": NeuralType(tuple('B'), LengthsType()), + } + + @typecheck() + def forward(self, input_signal=None, input_signal_length=None): + """ + Forward pass of the model. Note that for RNNT Models, the forward pass of the model is a 3 step process, + and this method only performs the first step - forward of the acoustic/visual model. + + Please refer to the `training_step` in order to see the full `forward` step for training - which + performs the forward of the acoustic model, the prediction network and then the joint network. + Finally, it computes the loss and possibly compute the detokenized text via the `decoding` step. + + Please refer to the `validation_step` in order to see the full `forward` step for inference - which + performs the forward of the acoustic model, the prediction network and then the joint network. + Finally, it computes the decoded tokens via the `decoding` step and possibly compute the batch metrics. + + Args: + input_signal: Tensor that represents a batch of video signals, + of shape [B, T, H, W, C]. T here represents timesteps, H height, W width and C channels + input_signal_length: Vector of length B, that contains the individual lengths of the video + sequences. + + Returns: + A tuple of 2 elements - + 1) The log probabilities tensor of shape [B, T, D]. + 2) The lengths of the acoustic sequence after propagation through the encoder, of shape [B]. + """ + + # Preprocessing + processed_video_signal, processed_video_signal_length = self.video_preprocessor( + input_signal=input_signal, length=input_signal_length + ) + + # Augmentation + processed_video_signal = self.video_augmentation( + input_signal=processed_video_signal, length=processed_video_signal_length + ) + + # Front-end Networks + processed_video_signal, processed_video_signal_length = self.video_front_end( + input_signal=processed_video_signal, length=processed_video_signal_length + ) + + # Back-end Networks + encoded, encoded_len = self.encoder(audio_signal=processed_video_signal, length=processed_video_signal_length) + + return encoded, encoded_len + + # PTL-specific methods + def training_step(self, batch, batch_nb): + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + signal, signal_len, transcript, transcript_len = batch + + # forward() only performs encoder forward + encoded, encoded_len = self.forward(input_signal=signal, input_signal_length=signal_len) + del signal + + # During training, loss must be computed, so decoder forward is necessary + decoder, target_length, states = self.decoder(targets=transcript, target_length=transcript_len) + + if hasattr(self, '_trainer') and self._trainer is not None: + log_every_n_steps = self._trainer.log_every_n_steps + sample_id = self._trainer.global_step + else: + log_every_n_steps = 1 + sample_id = batch_nb + + # If experimental fused Joint-Loss-WER is not used + if not self.joint.fuse_loss_wer: + # Compute full joint and loss + joint = self.joint(encoder_outputs=encoded, decoder_outputs=decoder) + loss_value = self.loss( + log_probs=joint, targets=transcript, input_lengths=encoded_len, target_lengths=target_length + ) + + # Add auxiliary losses, if registered + loss_value = self.add_auxiliary_losses(loss_value) + + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + tensorboard_logs = { + 'train_loss': loss_value, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + 'global_step': torch.tensor(self.trainer.global_step, dtype=torch.float32), + } + + if (sample_id + 1) % log_every_n_steps == 0: + self.wer.update(encoded, encoded_len, transcript, transcript_len) + _, scores, words = self.wer.compute() + self.wer.reset() + tensorboard_logs.update({'training_batch_wer': scores.float() / words}) + + else: + # If experimental fused Joint-Loss-WER is used + if (sample_id + 1) % log_every_n_steps == 0: + compute_wer = True + else: + compute_wer = False + + # Fused joint step + loss_value, wer, _, _ = self.joint( + encoder_outputs=encoded, + decoder_outputs=decoder, + encoder_lengths=encoded_len, + transcripts=transcript, + transcript_lengths=transcript_len, + compute_wer=compute_wer, + ) + + # Add auxiliary losses, if registered + loss_value = self.add_auxiliary_losses(loss_value) + + # Reset access registry + if AccessMixin.is_access_enabled(): + AccessMixin.reset_registry(self) + + tensorboard_logs = { + 'train_loss': loss_value, + 'learning_rate': self._optimizer.param_groups[0]['lr'], + 'global_step': torch.tensor(self.trainer.global_step, dtype=torch.float32), + } + + if compute_wer: + tensorboard_logs.update({'training_batch_wer': wer}) + + # Log items + self.log_dict(tensorboard_logs) + + # Preserve batch acoustic model T and language model U parameters if normalizing + if self._optim_normalize_joint_txu: + self._optim_normalize_txu = [encoded_len.max(), transcript_len.max()] + + return {'loss': loss_value} + + def predict_step(self, batch, batch_idx, dataloader_idx=0): + signal, signal_len, transcript, transcript_len, sample_id = batch + + # forward() only performs encoder forward + encoded, encoded_len = self.forward(input_signal=signal, input_signal_length=signal_len) + del signal + + best_hyp_text, all_hyp_text = self.decoding.rnnt_decoder_predictions_tensor( + encoder_output=encoded, encoded_lengths=encoded_len, return_hypotheses=False + ) + + sample_id = sample_id.cpu().detach().numpy() + return list(zip(sample_id, best_hyp_text)) + + def validation_step(self, batch, batch_idx, dataloader_idx=0): + signal, signal_len, transcript, transcript_len = batch + + # forward() only performs encoder forward + encoded, encoded_len = self.forward(input_signal=signal, input_signal_length=signal_len) + del signal + + tensorboard_logs = {} + + # If experimental fused Joint-Loss-WER is not used + if not self.joint.fuse_loss_wer: + if self.compute_eval_loss: + decoder, target_length, states = self.decoder(targets=transcript, target_length=transcript_len) + joint = self.joint(encoder_outputs=encoded, decoder_outputs=decoder) + + loss_value = self.loss( + log_probs=joint, targets=transcript, input_lengths=encoded_len, target_lengths=target_length + ) + + tensorboard_logs['val_loss'] = loss_value + + self.wer.update(encoded, encoded_len, transcript, transcript_len) + wer, wer_num, wer_denom = self.wer.compute() + self.wer.reset() + + tensorboard_logs['val_wer_num'] = wer_num + tensorboard_logs['val_wer_denom'] = wer_denom + tensorboard_logs['val_wer'] = wer + + else: + # If experimental fused Joint-Loss-WER is used + compute_wer = True + + if self.compute_eval_loss: + decoded, target_len, states = self.decoder(targets=transcript, target_length=transcript_len) + else: + decoded = None + target_len = transcript_len + + # Fused joint step + loss_value, wer, wer_num, wer_denom = self.joint( + encoder_outputs=encoded, + decoder_outputs=decoded, + encoder_lengths=encoded_len, + transcripts=transcript, + transcript_lengths=target_len, + compute_wer=compute_wer, + ) + + if loss_value is not None: + tensorboard_logs['val_loss'] = loss_value + + tensorboard_logs['val_wer_num'] = wer_num + tensorboard_logs['val_wer_denom'] = wer_denom + tensorboard_logs['val_wer'] = wer + + self.log('global_step', torch.tensor(self.trainer.global_step, dtype=torch.float32)) + + return tensorboard_logs + + def test_step(self, batch, batch_idx, dataloader_idx=0): + logs = self.validation_step(batch, batch_idx, dataloader_idx=dataloader_idx) + test_logs = { + 'test_wer_num': logs['val_wer_num'], + 'test_wer_denom': logs['val_wer_denom'], + # 'test_wer': logs['val_wer'], + } + if 'val_loss' in logs: + test_logs['test_loss'] = logs['val_loss'] + return test_logs + + def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): + if self.compute_eval_loss: + val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean() + val_loss_log = {'val_loss': val_loss_mean} + else: + val_loss_log = {} + wer_num = torch.stack([x['val_wer_num'] for x in outputs]).sum() + wer_denom = torch.stack([x['val_wer_denom'] for x in outputs]).sum() + tensorboard_logs = {**val_loss_log, 'val_wer': wer_num.float() / wer_denom} + return {**val_loss_log, 'log': tensorboard_logs} + + def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0): + if self.compute_eval_loss: + test_loss_mean = torch.stack([x['test_loss'] for x in outputs]).mean() + test_loss_log = {'test_loss': test_loss_mean} + else: + test_loss_log = {} + wer_num = torch.stack([x['test_wer_num'] for x in outputs]).sum() + wer_denom = torch.stack([x['test_wer_denom'] for x in outputs]).sum() + tensorboard_logs = {**test_loss_log, 'test_wer': wer_num.float() / wer_denom} + return {**test_loss_log, 'log': tensorboard_logs} + + def _setup_transcribe_dataloader(self, config: Dict) -> 'torch.utils.data.DataLoader': + """ + Setup function for a temporary data loader which wraps the provided video file. + + Args: + config: A python dictionary which contains the following keys: + paths2video_files: (a list) of paths to video files. The files should be relatively short fragments. \ + Recommended length per file is between 5 and 25 seconds. + batch_size: (int) batch size to use during inference. \ + Bigger will result in better throughput performance but would use more memory. + temp_dir: (str) A temporary directory where the video manifest is temporarily + stored. + + Returns: + A pytorch DataLoader for the given video file(s). + """ + if 'manifest_filepath' in config: + manifest_filepath = config['manifest_filepath'] + batch_size = config['batch_size'] + else: + manifest_filepath = os.path.join(config['temp_dir'], 'manifest.json') + batch_size = min(config['batch_size'], len(config['paths2video_files'])) + + dl_config = { + 'manifest_filepath': manifest_filepath, + 'labels': self.joint.vocabulary, + 'batch_size': batch_size, + 'trim_silence': False, + 'shuffle': False, + 'num_workers': config.get('num_workers', min(batch_size, os.cpu_count() - 1)), + 'pin_memory': True, + } + + if config.get("augmentor"): + dl_config['augmentor'] = config.get("augmentor") + + temporary_datalayer = self._setup_dataloader_from_config(config=DictConfig(dl_config)) + return temporary_datalayer + + def on_after_backward(self): + super().on_after_backward() + if self._optim_variational_noise_std > 0 and self.global_step >= self._optim_variational_noise_start: + for param_name, param in self.decoder.named_parameters(): + if param.grad is not None: + noise = torch.normal( + mean=0.0, + std=self._optim_variational_noise_std, + size=param.size(), + device=param.device, + dtype=param.dtype, + ) + param.grad.data.add_(noise) + + if self._optim_normalize_joint_txu: + T, U = self._optim_normalize_txu + if T is not None and U is not None: + for param_name, param in self.encoder.named_parameters(): + if param.grad is not None: + param.grad.data.div_(U) + + for param_name, param in self.decoder.named_parameters(): + if param.grad is not None: + param.grad.data.div_(T) + + if self._optim_normalize_encoder_norm: + for param_name, param in self.encoder.named_parameters(): + if param.grad is not None: + norm = param.grad.norm() + param.grad.data.div_(norm) + + if self._optim_normalize_decoder_norm: + for param_name, param in self.decoder.named_parameters(): + if param.grad is not None: + norm = param.grad.norm() + param.grad.data.div_(norm) + + if self._optim_normalize_joint_norm: + for param_name, param in self.joint.named_parameters(): + if param.grad is not None: + norm = param.grad.norm() + param.grad.data.div_(norm) + + # EncDecRNNTModel is exported in 2 parts + def list_export_subnets(self): + return ['encoder', 'decoder_joint'] + + # for export + @property + def decoder_joint(self): + return RNNTDecoderJoint(self.decoder, self.joint) + + @classmethod + def list_available_models(cls) -> List[PretrainedModelInfo]: + """ + This method returns a list of pre-trained model which can be instantiated directly from NVIDIA's NGC cloud. + + Returns: + List of available pre-trained models. + """ + results = [] + + return results diff --git a/nemo/collections/multimodal/speech_cv/modules/__init__.py b/nemo/collections/multimodal/speech_cv/modules/__init__.py new file mode 100644 index 000000000000..bfc09080b274 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/modules/__init__.py @@ -0,0 +1,20 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from nemo.collections.multimodal.speech_cv.modules.linear_projection_video_front_end import ( + LinearProjectionVideoFrontEnd, +) +from nemo.collections.multimodal.speech_cv.modules.resnet_video_front_end import ResNetVideoFrontEnd +from nemo.collections.multimodal.speech_cv.modules.video_augment import VideoAugmentation +from nemo.collections.multimodal.speech_cv.modules.video_preprocessing import VideoPreprocessor diff --git a/nemo/collections/multimodal/speech_cv/modules/linear_projection_video_front_end.py b/nemo/collections/multimodal/speech_cv/modules/linear_projection_video_front_end.py new file mode 100644 index 000000000000..45e797171f2e --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/modules/linear_projection_video_front_end.py @@ -0,0 +1,143 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from collections import OrderedDict + +import torch +from torch import nn + +from nemo.core.classes.module import NeuralModule +from nemo.core.neural_types import LengthsType, NeuralType, VideoSignal + + +class LinearProjectionVideoFrontEnd(NeuralModule): + + """ + Linear Projection Video Front-End for Lip Reading + + The spatial dimension is flattened and projected to dim_output using a Linear layer. + This is equivalent to having a convolution layer with a kernel size of the size of the image. + Circle crop can be used as pre-processing to crop the image as a circle around lips and ignore corner pixels + + Args: + in_channels: number of inputs video channels, 1 for grayscale and 3 for RGB + in_height: image height + in_width: image width + dim_output: output feature dimension for linear projection + out_channels_first: Whether outputs should have channels_first format (Batch, Dout, Time) or channels_last (Batch, Time, Dout) + circle_crop: crop the image as a circle before the Linear layer, default to False + circle_radius: the circle radius, default to 1 for full circle + + """ + + def __init__( + self, + in_channels, + in_height, + in_width, + dim_output, + out_channels_first=True, + circle_crop=False, + circle_radius=1.0, + ): + super(LinearProjectionVideoFrontEnd, self).__init__() + + self.out_channels_first = out_channels_first + self.in_height = in_height + self.in_width = in_width + self.dim_output = dim_output + self.in_channels = in_channels + self.circle_crop = circle_crop + self.circle_radius = circle_radius + self.circle_indices = self.get_circle_indices() + + if self.dim_output is not None: + if self.circle_crop: + self.linear_proj = nn.Linear(in_channels * len(self.circle_indices), dim_output) + else: + self.linear_proj = nn.Linear(in_channels * in_height * in_width, dim_output) + else: + self.linear_proj = nn.Identity() + + @property + def input_types(self): + """Returns definitions of module input ports.""" + return OrderedDict( + { + "audio_signal": NeuralType(('B', 'D', 'T', 'H', 'W'), VideoSignal()), + "length": NeuralType(tuple('B'), LengthsType()), + } + ) + + @property + def input_types_for_export(self): + """Returns definitions of module input ports.""" + return OrderedDict( + { + "output_signal": NeuralType(('B', 'D', 'T'), NeuralType()), + "length": NeuralType(tuple('B'), LengthsType()), + } + ) + + def get_circle_indices(self): + + """ return image indices inside circle of radius circle_radius """ + + # Create linspace + linspace_height = (torch.linspace(0, 2, steps=self.in_height) - 1).abs() + linspace_width = (torch.linspace(0, 2, steps=self.in_width) - 1).abs() + + # Repeat linspace along height/width + linspace_height = linspace_height.unsqueeze(dim=-1).repeat(1, self.in_width).flatten() + linspace_width = linspace_width.repeat(self.in_height) + + # Compute norm + dist = torch.sqrt(linspace_height.square() + linspace_width.square()) + + # Get circle indices + circle_indices = torch.nonzero(dist <= self.circle_radius).squeeze(dim=-1) + + return circle_indices + + def forward(self, input_signal, length): + + # Permute (B, C, T, H, W) -> (B, T, H, W, C) + input_signal = input_signal.permute(0, 2, 3, 4, 1) + + # Circle Crop + if self.circle_crop: + + # Flatten height, width (B, T, H, W, C) -> (B, T, H*W, C) + input_signal = input_signal.flatten(start_dim=2, end_dim=-2) + + # (B, T, H*W, C) -> (B, T, N circle, C) + input_signal = input_signal[:, :, self.circle_indices] + + # Flatten circle and channels (B, T, N circle, C) -> (B, T, N) + input_signal = input_signal.flatten(start_dim=2, end_dim=-1) + + # Flatten height, width and channels (B, T, H, W, C) -> (B, T, N) + else: + input_signal = input_signal.flatten(start_dim=2, end_dim=-1) + + # Project (B, T, N) -> (B, T, Dout) + input_signal = self.linear_proj(input_signal) + + # Transpose to channels_last format (Batch, Dout, Time) -> (Batch, Time, Dout) + if self.out_channels_first: + output_signal = input_signal.transpose(1, 2) + else: + output_signal = input_signal + + return output_signal, length diff --git a/nemo/collections/multimodal/speech_cv/modules/resnet_video_front_end.py b/nemo/collections/multimodal/speech_cv/modules/resnet_video_front_end.py new file mode 100644 index 000000000000..202c629e2b02 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/modules/resnet_video_front_end.py @@ -0,0 +1,84 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from collections import OrderedDict + +import torch +from torch import nn + +from nemo.collections.multimodal.speech_cv.parts.submodules.resnet import ResNet +from nemo.core.classes.module import NeuralModule +from nemo.core.neural_types import LengthsType, NeuralType, VideoSignal + + +class ResNetVideoFrontEnd(NeuralModule): + """ + Lip Reading / Visual Speech Recognition (VSR) ResNet Front-End Network + + Paper: + 'Audio-Visual Efficient Conformer for Robust Speech Recognition' by Burchi and Timofte + https://arxiv.org/abs/2301.01456 + + Args: + in_channels: number of inputs video channels, 1 for grayscale and 3 for RGB + model: model size in ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"] + dim_output: output feature dimension for linear projection after spacial average pooling + out_channels_first: Whether outputs should have channels_first format (Batch, Dout, Time) or channels_last (Batch, Time, Dout) + """ + + def __init__(self, in_channels=1, model="ResNet18", dim_output=256, out_channels_first=True): + super(ResNetVideoFrontEnd, self).__init__() + + self.front_end = nn.Sequential( + nn.Conv3d( + in_channels=in_channels, out_channels=64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3) + ), + nn.BatchNorm3d(num_features=64), + nn.ReLU(), + nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)), + ResNet(include_stem=False, dim_output=dim_output, model=model), + ) + + self.out_channels_first = out_channels_first + + @property + def input_types(self): + """Returns definitions of module input ports.""" + return OrderedDict( + { + "audio_signal": NeuralType(('B', 'D', 'T', 'H', 'W'), VideoSignal()), + "length": NeuralType(tuple('B'), LengthsType()), + } + ) + + @property + def input_types_for_export(self): + """Returns definitions of module input ports.""" + return OrderedDict( + { + "output_signal": NeuralType(('B', 'D', 'T'), NeuralType()), + "length": NeuralType(tuple('B'), LengthsType()), + } + ) + + def forward(self, input_signal, length): + + # Front-End Network (Batch, Din, Time, Height, Width) -> (Batch, Dout, Time) + input_signal = self.front_end(input_signal) + + # Transpose to channels_last format (Batch, Dout, Time) -> (Batch, Time, Dout) + if not self.out_channels_first: + input_signal = input_signal.transpose(1, 2) + + return input_signal, length diff --git a/nemo/collections/multimodal/speech_cv/modules/video_augment.py b/nemo/collections/multimodal/speech_cv/modules/video_augment.py new file mode 100644 index 000000000000..e191cc89d1a0 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/modules/video_augment.py @@ -0,0 +1,225 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import random +from collections import OrderedDict + +import torch +from torch import nn + +from nemo.core.classes.module import NeuralModule +from nemo.core.neural_types import NeuralType, VideoSignal + + +try: + import torchvision + + TORCHVISION_AVAILABLE = True +except (ImportError, ModuleNotFoundError): + TORCHVISION_AVAILABLE = False + + +class VideoAugmentation(NeuralModule): + + """ Video Augmentation for batched video input: input_signal shape (B, C, T, H, W) """ + + def __init__( + self, + random_crop, + crop_size, + horizontal_flip, + time_masking, + num_mask_second=1.0, + spatial_masking=False, + mean_frame=True, + ): + super().__init__() + + # Params + self.random_crop = random_crop + self.crop_size = crop_size + self.horizontal_flip = horizontal_flip + self.time_masking = time_masking + self.spatial_masking = spatial_masking + + self.training_augments = nn.ModuleList() + self.inference_augments = nn.ModuleList() + + # Random Crop + if self.random_crop: + if TORCHVISION_AVAILABLE: + self.training_augments.append(torchvision.transforms.RandomCrop(self.crop_size)) + self.inference_augments.append(torchvision.transforms.CenterCrop(self.crop_size)) + else: + raise Exception("RandomCrop transform requires torchvision") + + # Horizontal Flip + if self.horizontal_flip: + if TORCHVISION_AVAILABLE: + self.training_augments.append(torchvision.transforms.RandomHorizontalFlip()) + else: + raise Exception("RandomHorizontalFlip transform requires torchvision") + + # Time Masking + if self.time_masking: + self.training_augments.append(VideoFrameMasking(num_mask_second=num_mask_second, mean_frame=mean_frame)) + + # Spatial Masking + if self.spatial_masking: + self.training_augments.append(SpatialVideoMasking(mean_frame=mean_frame)) + + @property + def input_types(self): + """Returns definitions of module input ports.""" + return OrderedDict({"input_signal": NeuralType(('B', 'D', 'T', 'H', 'W'), VideoSignal()),}) + + @property + def input_types_for_export(self): + """Returns definitions of module input ports.""" + return OrderedDict({"output_signal": NeuralType(('B', 'D', 'T', 'H', 'W'), VideoSignal()),}) + + @torch.no_grad() + def forward(self, input_signal, length): + + if self.training: + augments = self.training_augments + else: + augments = self.inference_augments + + output_signal = input_signal + + for augment in augments: + if isinstance(augment, VideoFrameMasking) or isinstance(augment, SpatialVideoMasking): + output_signal = augment(output_signal, length) + else: + output_signal = augment(output_signal) + + return output_signal + + +class SpatialVideoMasking(NeuralModule): + + """ Spatial Video Mask + + Will mask videos frames in the spatial dimensions using horizontal and vertical masks + + params: + num_horizontal_masks: number of horizontal masks + num_vertical_masks: number of vertical masks + max_h: maximum width of horizontal mask + max_v: maximum width of vertical mask + mean_frame: mask using video mean instead of zeros + + """ + + def __init__(self, num_horizontal_masks=1, num_vertical_masks=1, max_h=30, max_v=30, mean_frame=True): + super().__init__() + + self.num_horizontal_masks = num_horizontal_masks + self.num_vertical_masks = num_vertical_masks + self.max_h = max_h + self.max_v = max_v + self.mean_frame = mean_frame + self.random = random.Random() + + def forward(self, input_signal, length): + + # (B, C, T, H, W) + shape = input_signal.shape + + # Batch loop + for b in range(shape[0]): + + # Mask Value + mask_value = input_signal[b, :, : length[b]].mean() if self.mean_frame else 0.0 + + # Horizontal Mask loop + for i in range(self.num_horizontal_masks): + + # Start index + x = self.random.randint(0, shape[3] - self.max_h) + + # Mask width + w = self.random.randint(0, self.max_h) + + # Apply mask + input_signal[b, :, :, x : x + w] = mask_value + + # Vertical Mask loop + for i in range(self.num_vertical_masks): + + # Start index + x = self.random.randint(0, shape[4] - self.max_v) + + # Mask width + w = self.random.randint(0, self.max_v) + + # Apply mask + input_signal[b, :, :, :, x : x + w] = mask_value + + return input_signal + + +class VideoFrameMasking(NeuralModule): + + """ Video Frame Mask: + + As explained in: + "Visual Speech Recognition for Multiple Languages in the Wild" + https://arxiv.org/abs/2202.13084 + + S6 Time Masking + We mask n consecutive frames with the mean frame of the video. + The duration tn is chosen from 0 to an upper bound nmax using a uniform distribution. + Since there is a large variance in the video lengths of the LRS2 and LRS3 datasets, we set the number of masks proportional to the sequence length. + Specifically, we use one mask per second, and for each mask, the maximum duration nmax is set to 0.4 seconds. + + """ + + def __init__(self, T_second=0.4, num_mask_second=1.0, fps=25.0, mean_frame=True): + super().__init__() + + self.T = int(T_second * fps) + self.num_mask_second = num_mask_second + self.mean_frame = mean_frame + self.fps = fps + self.random = random.Random() + + def forward(self, input_signal, length): + + # (B, C, T, H, W) + shape = input_signal.shape + + # Batch loop + for b in range(shape[0]): + + # Mask per second + mT = int(length[b] / self.fps * self.num_mask_second) + + # Mask Value + mask_value = input_signal[b, :, : length[b]].mean() if self.mean_frame else 0.0 + + # Mask loop + for i in range(mT): + + # Start left Frame + x_left = self.random.randint(0, length[b] - self.T) + + # Mask width + w = self.random.randint(0, self.T) + + # Apply mask + input_signal[b, :, x_left : x_left + w] = mask_value + + return input_signal diff --git a/nemo/collections/multimodal/speech_cv/modules/video_preprocessing.py b/nemo/collections/multimodal/speech_cv/modules/video_preprocessing.py new file mode 100644 index 000000000000..30accea097db --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/modules/video_preprocessing.py @@ -0,0 +1,138 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from torch import nn + +from nemo.collections.multimodal.speech_cv.parts.submodules.permute import Permute +from nemo.core.classes import NeuralModule, typecheck + +try: + import torchvision + + TORCHVISION_AVAILABLE = True +except (ImportError, ModuleNotFoundError): + TORCHVISION_AVAILABLE = False + + +class VideoPreprocessor(NeuralModule): + + """ Video Pre-processing + + args: + grayscale: convert images to grayscale + normalize: normalize videos + resize: resize videos + resize_size: output image size for resize + norm_mean: normalize mean + norm_std: normalize std + + """ + + def __init__(self, grayscale, normalize, resize, resize_size, norm_mean, norm_std): + super().__init__() + + # Params + self.grayscale = grayscale + self.normalize = normalize + self.resize = resize + self.resize_size = resize_size + self.norm_mean = norm_mean + self.norm_std = norm_std + + self.transforms = nn.ModuleList() + + # Convert float32 [0:255] -> [0:1] + if TORCHVISION_AVAILABLE: + self.transforms.append(torchvision.transforms.ConvertImageDtype(dtype=torch.float32)) + else: + raise Exception("ConvertImageDtype transform requires torchvision") + + # Convert Channels First + self.transforms.append(Permute(dims=(0, 4, 1, 2, 3))) # (B, T, H, W, C) -> (B, C, T, H, W) + + # Resize + if self.resize: + self.transforms.append(ResizeVideo(self.resize_size)) # (B, C, T, H, W) -> (B, C, T, H', W') + + # Grayscale + if self.grayscale: + if TORCHVISION_AVAILABLE: + self.transforms.append( + nn.Sequential( + Permute(dims=(0, 2, 1, 3, 4)), # (B, C, T, H, W) -> (B, T, C, H, W) + torchvision.transforms.Grayscale(), + Permute(dims=(0, 2, 1, 3, 4)), # (B, T, C, H, W) -> (B, C, T, H, W) + ) + ) + else: + raise Exception("Grayscale transform requires torchvision") + + # Normalize + if self.normalize: + self.transforms.append(NormalizeVideo(mean=norm_mean, std=norm_std)) + + @typecheck() + @torch.no_grad() + def forward(self, input_signal, length): + + for transform in self.transforms: + input_signal = transform(input_signal) + + return input_signal, length + + +class NormalizeVideo(NeuralModule): + def __init__(self, mean, std): + super().__init__() + + self.register_buffer( + "mean", torch.tensor(mean, dtype=torch.float32).reshape(len(mean), 1, 1, 1), persistent=False + ) + self.register_buffer( + "std", torch.tensor(std, dtype=torch.float32).reshape(len(std), 1, 1, 1), persistent=False + ) + + def forward(self, x): + + x = (x - self.mean) / self.std + + return x + + +class ResizeVideo(NeuralModule): + def __init__(self, size): + super().__init__() + + self.size = size + if TORCHVISION_AVAILABLE: + self.resize = torchvision.transforms.Resize(size=self.size) + else: + raise Exception("Resize transform requires torchvision") + + def forward(self, x): + + # (B, C, T, H, W) + if x.dim() == 5: + + B, C = x.shape[:2] + x = x.flatten(start_dim=0, end_dim=1) + x = self.resize(x) + x = x.reshape((B, C) + x.shape[1:]) + + # (C, T, H, W) + elif x.dim() == 4: + x = self.resize(x) + + return x diff --git a/nemo/collections/multimodal/speech_cv/parts/__init__.py b/nemo/collections/multimodal/speech_cv/parts/__init__.py new file mode 100644 index 000000000000..4fc50543f1d2 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemo/collections/multimodal/speech_cv/parts/preprocessing/features.py b/nemo/collections/multimodal/speech_cv/parts/preprocessing/features.py new file mode 100644 index 000000000000..29b5268d6adf --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/preprocessing/features.py @@ -0,0 +1,62 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import tempfile + +try: + import torchvision + + TORCHVISION_AVAILABLE = True +except (ImportError, ModuleNotFoundError): + TORCHVISION_AVAILABLE = False + + +class VideoFeaturizer(object): + def __init__(self): + pass + + def process(self, video_file, offset, duration): + + # Load Video + video = self.from_file(video_file, offset=offset, duration=duration) + + return video + + def from_file(self, video_file, offset, duration): + + if not TORCHVISION_AVAILABLE: + raise Exception("Reading Video requires torchvision") + + # Load from filename + if isinstance(video_file, str): + video, audio, infos = torchvision.io.read_video( + video_file, start_pts=offset, end_pts=offset + duration, pts_unit="sec" + ) + + # Load from bytes + elif isinstance(video_file, bytes): + + # webdataset.torch_video + with tempfile.TemporaryDirectory() as dirname: + fname = os.path.join(dirname, f"file.mp4") + with open(fname, "wb") as stream: + stream.write(video_file) + video, audio, infos = torchvision.io.read_video( + fname, start_pts=offset, end_pts=offset + duration, pts_unit="sec" + ) + else: + raise Exception("Unknown video data format") + + return video diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/__init__.py b/nemo/collections/multimodal/speech_cv/parts/submodules/__init__.py new file mode 100644 index 000000000000..4fc50543f1d2 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/conv2d.py b/nemo/collections/multimodal/speech_cv/parts/submodules/conv2d.py new file mode 100644 index 000000000000..25f6e5451b66 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/conv2d.py @@ -0,0 +1,72 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Union + +from torch import nn +from torch.nn import init +from torch.nn.common_types import _size_2_t + + +class Conv2d(nn.Conv2d): + + """ + Conv2d layer with ResNet initialization: + + Reference: "Deep Residual Learning for Image Recognition" by He et al. + https://arxiv.org/abs/1512.03385 + + """ + + def __init__( + self, + in_channels: int, + out_channels: int, + kernel_size: _size_2_t, + stride: _size_2_t = 1, + padding: Union[str, _size_2_t] = 0, + dilation: _size_2_t = 1, + groups: int = 1, + bias: bool = True, + padding_mode: str = 'zeros', + device=None, + dtype=None, + weight_init: str = "default", + bias_init: str = "default", + ): + + super(Conv2d, self).__init__( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias=bias, + padding_mode=padding_mode, + device=device, + dtype=dtype, + ) + + # Weight Init + assert weight_init in ["default", "he_normal"] + if weight_init == "he_normal": + init.kaiming_normal_(self.weight) + + # Bias Init + assert bias_init in ["default", "zeros"] + if self.bias is not None: + if bias_init == "zeros": + init.zeros_(self.bias) diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/global_avg_pool2d.py b/nemo/collections/multimodal/speech_cv/parts/submodules/global_avg_pool2d.py new file mode 100644 index 000000000000..6c248d86564e --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/global_avg_pool2d.py @@ -0,0 +1,28 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from torch import nn + + +class GlobalAvgPool2d(nn.Module): + def __init__(self, dim=(2, 3), keepdim=False): + super(GlobalAvgPool2d, self).__init__() + self.dim = dim + self.keepdim = keepdim + + def forward(self, x): + + assert x.dim() == 4, "input signal should have 4 dims, has {}".format(x.dim()) + + return x.mean(dim=self.dim, keepdim=self.keepdim) diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/permute.py b/nemo/collections/multimodal/speech_cv/parts/submodules/permute.py new file mode 100644 index 000000000000..abd4ce34f4a8 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/permute.py @@ -0,0 +1,28 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from torch import nn + + +class Permute(nn.Module): + def __init__(self, dims, make_contiguous=False): + super(Permute, self).__init__() + self.dims = dims + self.make_contiguous = make_contiguous + + def forward(self, x): + x = x.permute(self.dims) + if self.make_contiguous: + x = x.contiguous() + return x diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/resnet.py b/nemo/collections/multimodal/speech_cv/parts/submodules/resnet.py new file mode 100644 index 000000000000..c911db6f3abe --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/resnet.py @@ -0,0 +1,175 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from torch import nn + +from nemo.collections.multimodal.speech_cv.parts.submodules.conv2d import Conv2d +from nemo.collections.multimodal.speech_cv.parts.submodules.global_avg_pool2d import GlobalAvgPool2d +from nemo.collections.multimodal.speech_cv.parts.submodules.resnet_block import ResNetBlock +from nemo.collections.multimodal.speech_cv.parts.submodules.resnet_bottleneck_block import ResNetBottleneckBlock + + +class ResNet(nn.Module): + + """ ResNet (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152) + Models: 224 x 224 + ResNet18: 11,689,512 Params + ResNet34: 21,797,672 Params + ResNet50: 25,557,032 Params + ResNet101: 44,549,160 Params + Resnet152: 60,192,808 Params + Reference: "Deep Residual Learning for Image Recognition" by He et al. + https://arxiv.org/abs/1512.03385 + """ + + def __init__(self, dim_input=3, dim_output=1000, model="ResNet50", include_stem=True, include_head=True): + super(ResNet, self).__init__() + + assert model in ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"] + + if model == "ResNet18": + dim_stem = 64 + dim_blocks = [64, 128, 256, 512] + num_blocks = [2, 2, 2, 2] + bottleneck = False + elif model == "ResNet34": + dim_stem = 64 + dim_blocks = [64, 128, 256, 512] + num_blocks = [3, 4, 6, 3] + bottleneck = False + elif model == "ResNet50": + dim_stem = 64 + dim_blocks = [256, 512, 1024, 2048] + num_blocks = [3, 4, 6, 3] + bottleneck = True + elif model == "ResNet101": + dim_stem = 64 + dim_blocks = [256, 512, 1024, 2048] + num_blocks = [3, 4, 23, 3] + bottleneck = True + elif model == "ResNet152": + dim_stem = 64 + dim_blocks = [256, 512, 1024, 2048] + num_blocks = [3, 8, 36, 3] + bottleneck = True + + self.stem = ( + nn.Sequential( + Conv2d( + in_channels=dim_input, + out_channels=dim_stem, + kernel_size=(7, 7), + stride=(2, 2), + weight_init="he_normal", + bias=False, + ), + nn.BatchNorm2d(num_features=dim_stem), + nn.ReLU(), + nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)), + ) + if include_stem + else nn.Identity() + ) + + # Blocks + self.blocks = nn.ModuleList() + for stage_id in range(4): + + for block_id in range(num_blocks[stage_id]): + + # Projection Block + if block_id == 0: + if stage_id == 0: + stride = (1, 1) + bottleneck_ratio = 1 + in_features = dim_stem + else: + stride = (2, 2) + bottleneck_ratio = 2 + in_features = dim_blocks[stage_id - 1] + # Default Block + else: + stride = (1, 1) + in_features = dim_blocks[stage_id] + bottleneck_ratio = 4 + + if bottleneck: + self.blocks.append( + ResNetBottleneckBlock( + in_features=in_features, + out_features=dim_blocks[stage_id], + bottleneck_ratio=bottleneck_ratio, + kernel_size=(3, 3), + stride=stride, + ) + ) + else: + self.blocks.append( + ResNetBlock( + in_features=in_features, + out_features=dim_blocks[stage_id], + kernel_size=(3, 3), + stride=stride, + ) + ) + + # Head + self.head = ( + nn.Sequential( + GlobalAvgPool2d(), + nn.Linear(in_features=dim_blocks[-1], out_features=dim_output) + if dim_output is not None + else nn.Identity(), + ) + if include_head + else nn.Identity() + ) + + def forward(self, x): + + # Is Video + if x.dim() == 5: + + is_video = True + batch_size = x.shape[0] + video_frames = x.shape[2] + + # (B, Din, T, H, W) -> (B * T, Din, H, W) + x = x.transpose(1, 2).flatten(start_dim=0, end_dim=1) + + else: + is_video = False + + # (B, Din, H, W) -> (B, D0, H//4, W//4) + x = self.stem(x) + + # (B, D0, H//4, W//4) -> (B, D4, H//32, W//32) + for block in self.blocks: + x = block(x) + + # (B, D4, H//32, W//32) -> (B, Dout) + x = self.head(x) + + # Is Video + if is_video: + + # (B * T, Dout) -> (B, Dout, T) + if x.dim() == 2: + x = x.reshape(batch_size, video_frames, -1).transpose(1, 2) + + # (B * T, D4, H//32, W//32) -> (B, D4, T, H//32, W//32) + else: + x = x.reshape(batch_size, video_frames, x.shape[1], x.shape[2], x.shape[3]).transpose(1, 2) + + return x diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/resnet_block.py b/nemo/collections/multimodal/speech_cv/parts/submodules/resnet_block.py new file mode 100644 index 000000000000..19436311ccd7 --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/resnet_block.py @@ -0,0 +1,86 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from torch import nn +from torch.nn.modules.utils import _pair + +from nemo.collections.multimodal.speech_cv.parts.submodules.conv2d import Conv2d + + +class ResNetBlock(nn.Module): + + """ ResNet Residual Block used by ResNet18 and ResNet34 networks. + References: "Deep Residual Learning for Image Recognition", He et al. + https://arxiv.org/abs/1512.03385 + """ + + def __init__(self, in_features, out_features, kernel_size, stride, weight_init="he_normal", bias_init="zeros"): + super(ResNetBlock, self).__init__() + + # Convert to pair + kernel_size = _pair(kernel_size) + + # layers + self.layers = nn.Sequential( + Conv2d( + in_channels=in_features, + out_channels=out_features, + kernel_size=kernel_size, + stride=stride, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + padding=((kernel_size[0] - 1) // 2, kernel_size[1] // 2), + ), + nn.BatchNorm2d(out_features), + nn.ReLU(), + Conv2d( + in_channels=out_features, + out_channels=out_features, + kernel_size=kernel_size, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + padding=((kernel_size[0] - 1) // 2, kernel_size[1] // 2), + ), + nn.BatchNorm2d(out_features), + ) + + # Residual Block + if torch.prod(torch.tensor(stride)) > 1 or in_features != out_features: + self.residual = nn.Sequential( + Conv2d( + in_channels=in_features, + out_channels=out_features, + kernel_size=1, + stride=stride, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + ), + nn.BatchNorm2d(out_features), + ) + else: + self.residual = nn.Identity() + + # Joined Post Act + self.joined_post_act = nn.ReLU() + + def forward(self, x): + + # Forward Layers + x = self.joined_post_act(self.layers(x) + self.residual(x)) + + return x diff --git a/nemo/collections/multimodal/speech_cv/parts/submodules/resnet_bottleneck_block.py b/nemo/collections/multimodal/speech_cv/parts/submodules/resnet_bottleneck_block.py new file mode 100644 index 000000000000..50cafa53343c --- /dev/null +++ b/nemo/collections/multimodal/speech_cv/parts/submodules/resnet_bottleneck_block.py @@ -0,0 +1,107 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from torch import nn +from torch.nn.modules.utils import _pair + +from nemo.collections.multimodal.speech_cv.parts.submodules.conv2d import Conv2d + + +class ResNetBottleneckBlock(nn.Module): + + """ ResNet Bottleneck Residual Block used by ResNet50, ResNet101 and ResNet152 networks. + References: "Deep Residual Learning for Image Recognition", He et al. + https://arxiv.org/abs/1512.03385 + """ + + def __init__( + self, + in_features, + out_features, + bottleneck_ratio, + kernel_size, + stride, + weight_init="he_normal", + bias_init="zeros", + ): + super(ResNetBottleneckBlock, self).__init__() + + # Assert + assert in_features % bottleneck_ratio == 0 + + # Convert to pair + kernel_size = _pair(kernel_size) + + # layers + self.layers = nn.Sequential( + Conv2d( + in_channels=in_features, + out_channels=in_features // bottleneck_ratio, + kernel_size=1, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + ), + nn.BatchNorm2d(in_features // bottleneck_ratio), + nn.ReLU(), + Conv2d( + in_channels=in_features // bottleneck_ratio, + out_channels=in_features // bottleneck_ratio, + kernel_size=kernel_size, + stride=stride, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + padding=((kernel_size[0] - 1) // 2, kernel_size[1] // 2), + ), + nn.BatchNorm2d(in_features // bottleneck_ratio), + nn.ReLU(), + Conv2d( + in_channels=in_features // bottleneck_ratio, + out_channels=out_features, + kernel_size=1, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + ), + nn.BatchNorm2d(out_features), + ) + + # Joined Post Act + self.joined_post_act = nn.ReLU() + + # Residual Block + if torch.prod(torch.tensor(stride)) > 1 or in_features != out_features: + self.residual = nn.Sequential( + Conv2d( + in_channels=in_features, + out_channels=out_features, + kernel_size=1, + stride=stride, + bias=False, + weight_init=weight_init, + bias_init=bias_init, + ), + nn.BatchNorm2d(out_features), + ) + else: + self.residual = nn.Identity() + + def forward(self, x): + + # Forward Layers + x = self.joined_post_act(self.layers(x) + self.residual(x)) + + return x diff --git a/nemo/core/neural_types/elements.py b/nemo/core/neural_types/elements.py index f2de48da26d0..98fd4cc6193a 100644 --- a/nemo/core/neural_types/elements.py +++ b/nemo/core/neural_types/elements.py @@ -25,6 +25,7 @@ 'ChannelType', 'AcousticEncodedRepresentation', 'AudioSignal', + 'VideoSignal', 'SpectrogramType', 'MelSpectrogramType', 'MFCCSpectrogramType', @@ -199,6 +200,21 @@ def type_parameters(self): return self._params +class VideoSignal(ElementType): + """Element type to represent encoded representation returned by the visual encoder model + Args: + fps (int): frames per second. + """ + + def __init__(self, fps: int = None): + self._params = {} + self._params['fps'] = fps + + @property + def type_parameters(self): + return self._params + + class SpectrogramType(ChannelType): """Element type to represent generic spectrogram signal""" From fd17915f95f2a9085256c0bee76fa920a87dcf39 Mon Sep 17 00:00:00 2001 From: Jan Lasek Date: Wed, 20 Sep 2023 23:52:00 +0200 Subject: [PATCH 045/112] HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek * StarCoder conversion test Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek * Fix test Signed-off-by: Jan Lasek * Catch up with save_to changes Signed-off-by: Jan Lasek * Don't abbreviate args for clarity Signed-off-by: Jan Lasek * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- Jenkinsfile | 25 +- .../convert_starcoder_hf_to_nemo.py | 220 ++++++++++++++++++ 2 files changed, 239 insertions(+), 6 deletions(-) create mode 100644 scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py diff --git a/Jenkinsfile b/Jenkinsfile index 04bc96ff1596..2abcdbcc5ddb 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -116,12 +116,25 @@ pipeline { } } failFast true - steps { - sh 'CUDA_VISIBLE_DEVICES=0 python scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \ - --in-file=/home/TestData/nlp/megatron_llama/llama-ci-hf \ - --out-file=/home/TestData/nlp/megatron_llama/ci.nemo \ - --precision=16' - sh 'rm -f /home/TestData/nlp/megatron_llama/ci.nemo' + parallel { + stage('Llama') { + steps { + sh 'CUDA_VISIBLE_DEVICES=0 python scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \ + --in-file=/home/TestData/nlp/megatron_llama/llama-ci-hf \ + --out-file=/home/TestData/nlp/megatron_llama/ci.nemo \ + --precision=16' + sh 'rm -f /home/TestData/nlp/megatron_llama/ci.nemo' + } + } + stage('StarCoder') { + steps { + sh 'python scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py \ + --config examples/nlp/language_modeling/conf/megatron_gpt_config.yaml \ + --input /home/TestData/nlp/megatron_gpt/starcoder-ci-hf \ + --output /home/TestData/nlp/megatron_gpt/starcoder-ci-hf' + sh 'rm -f /home/TestData/nlp/megatron_gpt/starcoder-ci-hf/megatron_starcoder_tp1_pp1.nemo' + } + } } } diff --git a/scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py b/scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py new file mode 100644 index 000000000000..e826dae037d3 --- /dev/null +++ b/scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py @@ -0,0 +1,220 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +""" +A script to convert the BigCode StarCoder checkpoints from HuggingFace to Megatron GPTModel. +This script is hardcoded specifically for the StarCoder pretrained models only, and is not +generalisable to any other models. + +This script will load and convert the model entirely on CPU for OOM safety, but it is +possible to initialize the model on GPU before the save down. You can do this by adding --cuda +parameter to this script call. + +This script requires that you have downloaded the StarCoder checkpoint from HuggingFace. +This can be done using Git with the following command: +```bash +git clone https://huggingface.co/bigcode/starcoder +``` +Note that downloading this particular checkpoint requires authentication with a HuggingFace token. + +The script will generate a Megatron model with TP=1 and PP=1. If you need different TP/PP +values, then after running this script, please use the following script to set whatever +TP/PP configuration you want: + NeMo/examples/nlp/language_modeling/megatron_change_num_partitions.py + +This script also requires a baseline config file from which to override default parameters. +You can specify the location of this file using the -c argument. Please use the config below +to correctly configure creating GPT-2 model in Megatron: + NeMo/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml + + +Here is an example usage command: +```python +python scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py \ + --config /path/to/megatron_gpt_config.yaml \ + --input /path/to/starcoder \ + --output /path/to/save +``` +""" + +import argparse +import os +from typing import Dict + +import pytorch_lightning as pl +import torch +import yaml +from omegaconf import OmegaConf +from transformers import AutoConfig, AutoModelForCausalLM + +from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel +from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy +from nemo.utils import logging + + +def convert_state_dict(state_dict: Dict[str, torch.Tensor], amp: bool = False): + def get_new_key(old_key): + if old_key == "transformer.wte.weight": + return "embedding.word_embeddings.weight" + if old_key == "transformer.wpe.weight": + return "embedding.position_embeddings.weight" + elif old_key.startswith("transformer.ln_f"): + return old_key.replace("transformer.ln_f", "decoder.final_layernorm") + elif old_key.startswith("lm_head"): + return old_key.replace("lm_head", "output_layer") + else: + p1 = old_key.replace("transformer.h", "decoder.layers") + p2 = p1.replace("ln_1.", "self_attention.linear_qkv.layer_norm_") + p3 = p2.replace("attn.c_proj", "self_attention.linear_proj") + p4 = p3.replace("attn.c_attn", "self_attention.linear_qkv") + p5 = p4.replace("ln_2.", "mlp.linear_fc1.layer_norm_") + p6 = p5.replace("c_fc", "linear_fc1") + p7 = p6.replace("c_proj", "linear_fc2") + return p7 + + new_dict = {} + prefix = "model.module." if amp else "model." + + for old_key, val in state_dict.items(): + new_key = get_new_key(old_key) + new_key = prefix + new_key + new_dict[new_key] = val + + return new_dict + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--config", type=str, required=True, help="Path to the megatron_gpt_config.yaml file") + parser.add_argument( + "--input", type=str, required=True, help="StarCoder from HuggingFace hub or local dir with downloaded model" + ) + parser.add_argument("--output", type=str, default=".", help="Path to dir where to store output .nemo file") + parser.add_argument( + "--precision", type=str, default="bf16", choices=["bf16", "32"], help="Precision for checkpoint weights saved" + ) + parser.add_argument("--cuda", action="store_true", help="Put Nemo model onto GPU prior to saving") + args = parser.parse_args() + + if not os.path.isdir(args.output): + raise FileNotFoundError(f"Output directory '{args.output}' does not exist") + + hf_config = AutoConfig.from_pretrained(args.input) + + with open(args.config, "r", encoding="utf_8") as f: + orig_cfg = yaml.safe_load(f) + + model_dict = orig_cfg["model"] + + if "data" in model_dict: + del model_dict["data"] + + override_model_dict = { + "micro_batch_size": 1, + "global_batch_size": 1, + "tensor_model_parallel_size": 1, + "pipeline_model_parallel_size": 1, + "megatron_amp_O2": False, + "transformer_engine": True, + "use_cpu_initialization": not args.cuda, + "normalization": "layernorm", + "mcore_gpt": True, + "num_query_groups": 1, # MQA + "hidden_size": hf_config.n_embd, + "encoder_seq_length": hf_config.n_positions, + "max_position_embeddings": hf_config.n_positions, + "num_layers": hf_config.n_layer, + "num_attention_heads": hf_config.n_head, + "ffn_hidden_size": hf_config.n_inner, + "layernorm_epsilon": hf_config.layer_norm_epsilon, + "pre_process": True, + "post_process": True, + "apply_query_key_layer_scaling": True, + "bias": True, + "transformer_block_type": "pre_ln", + "fp32_residual_connection": False, + "hidden_dropout": hf_config.summary_first_dropout, + "attention_dropout": hf_config.attn_pdrop, + "ffn_dropout": 0, + "share_embeddings_and_output_weights": False, + "position_embedding_type": "learned_absolute", + "normalize_attention_scores": True, + "precision": args.precision, + } + tokenizer_dict = { + "library": "huggingface", + "type": args.input, + "use_fast": True, + } + trainer_dict = { + "devices": 1, + "num_nodes": 1, + "accelerator": "gpu" if args.cuda else "cpu", + "precision": args.precision, + "logger": False, + "enable_checkpointing": False, + "max_epochs": -1, + "max_steps": 100000, + "log_every_n_steps": 10, + "val_check_interval": 100, + "limit_val_batches": 50, + "limit_test_batches": 500, + "accumulate_grad_batches": 1, + "gradient_clip_val": 1.0, + "benchmark": False, + "enable_model_summary": False, + "strategy": NLPDDPStrategy(), + } + + model_dict.update(override_model_dict) + model_dict["tokenizer"] = tokenizer_dict + + omega_cfg = OmegaConf.create(model_dict) + + trainer = pl.Trainer(**trainer_dict) + + logging.info("Creating Megatron model...") + model = MegatronGPTModel(omega_cfg, trainer) + logging.info(f"Created model:\n{model}") + + logging.info("Loading HuggingFace model...") + model_hf = AutoModelForCausalLM.from_pretrained(args.input) + logging.info(f"Loaded model:\n{model_hf}") + + state_dict_hf = model_hf.state_dict() + convert_dict = convert_state_dict(state_dict_hf, amp=omega_cfg.megatron_amp_O2) + + logging.info("Loading state dict...") + missing_keys, unexpected_keys = model.load_state_dict(convert_dict, strict=False) + + if missing_keys: + # Keys ending with '_extra_state' are related to Transformer Engine internals + missing_keys_non_extra = [key for key in missing_keys if not key.endswith("_extra_state")] + if missing_keys_non_extra: + logging.critical("Missing keys were detected during the load, something has gone wrong. Aborting.") + raise RuntimeError(f"Missing keys: \n{missing_keys_non_extra}") + + if unexpected_keys: + logging.critical("Unexpected keys were detected which should not happen. Aborting.") + raise RuntimeError(f"Unexpected keys: \n{unexpected_keys}") + + logging.info("Saving model...") + # We make sure that the tokenizer can be instantiated later regardless of args.input + model.cfg.tokenizer.update(type="bigcode/starcoder") + dtype = torch.bfloat16 if args.precision == "bf16" else torch.float32 + model = model.to(dtype=dtype) + model.cfg.update(use_cpu_initialization=False) + model.save_to(os.path.join(args.output, "megatron_starcoder_tp1_pp1.nemo")) + logging.info("Done.") From 00767249fc7f5276d921762b78ecf46100190ecf Mon Sep 17 00:00:00 2001 From: Kelvin Liu Date: Fri, 22 Sep 2023 01:57:02 +0800 Subject: [PATCH 046/112] fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu Co-authored-by: Hongbin Liu Signed-off-by: Sasha Meister --- nemo/collections/nlp/parts/nlp_overrides.py | 65 ++++++++++++++++++--- 1 file changed, 57 insertions(+), 8 deletions(-) diff --git a/nemo/collections/nlp/parts/nlp_overrides.py b/nemo/collections/nlp/parts/nlp_overrides.py index d7eaaa90e536..a116c8e60299 100644 --- a/nemo/collections/nlp/parts/nlp_overrides.py +++ b/nemo/collections/nlp/parts/nlp_overrides.py @@ -740,7 +740,8 @@ def _load_state_dict_from_disk(self, model_weights, map_location=None): peft_state_dict = torch.load(model_weights_path, map_location)['state_dict'] else: peft_state_dict = {} - base_model_state_dict.update(peft_state_dict) # add the PEFT state_dict into the base model's state_dict + if base_model_state_dict: + base_model_state_dict.update(peft_state_dict) # add the PEFT state_dict into the base model's state_dict return base_model_state_dict def restore_from( @@ -765,13 +766,61 @@ def restore_from( return loaded_params conf, instance, state_dict = loaded_params - if ( - self.peft_model_nemo_path is None and self.peft_model_ckpt_dir is None - ): # we have this check only for training PEFT from scratch - peft_state_dict = instance.get_peft_state_dict() - state_dict.update(peft_state_dict) - state_dict = self.modify_state_dict(conf, state_dict) - self.load_instance_with_state_dict(instance, state_dict, strict) + # if we're using dist checkpointing then state_dict will be None + if state_dict is None: + # dist checkpointing needs torch.distributed to load the checkpoint + if parallel_state.is_unitialized(): + + def dummy(): + return + + if trainer.strategy.launcher is not None: + trainer.strategy.launcher.launch(dummy, trainer=trainer) + trainer.strategy.setup_environment() + + with tempfile.TemporaryDirectory() as tmpdir: + # Check if self.model_extracted_dir is set, and is a valid path + if self.model_extracted_dir is not None and os.path.isdir(self.model_extracted_dir): + # Log that NeMo will use the provided `model_extracted_dir` + logging.info( + f"Restoration will occur within pre-extracted directory : " f"`{self.model_extracted_dir}`." + ) + + # Override `tmpdir` above with the pre-extracted `model_extracted_dir` + tmpdir = self.model_extracted_dir + + else: + # Extract the nemo file into the temporary directory + self._unpack_nemo_file( + path2file=restore_path, out_folder=tmpdir, extract_config_only=return_config is True + ) + checkpoint = {} + sharded_state_dict = instance.sharded_state_dict() + peft_state_dict = instance.get_peft_state_dict() + for k in peft_state_dict.keys(): + sharded_state_dict.pop(k) + checkpoint['state_dict'] = sharded_state_dict + # remove model weights extension + tmp_model_weights_ckpt = os.path.join(tmpdir, self.model_weights_ckpt) + tmp_model_weights_dir = os.path.splitext(tmp_model_weights_ckpt)[0] + assert os.path.isdir(tmp_model_weights_dir), f'Expected {tmp_model_weights_dir} to be a directory.' + checkpoint = dist_checkpointing.load( + sharded_state_dict=checkpoint, checkpoint_dir=tmp_model_weights_dir + ) + checkpoint['state_dict'].update(peft_state_dict) + instance.on_load_checkpoint(checkpoint) + if hasattr(instance, 'setup_transformer_engine_tp_groups'): + instance.setup_transformer_engine_tp_groups() + + else: + if ( + self.peft_model_nemo_path is None and self.peft_model_ckpt_dir is None + ): # we have this check only for training PEFT from scratch + peft_state_dict = instance.get_peft_state_dict() + state_dict.update(peft_state_dict) + state_dict = self.modify_state_dict(conf, state_dict) + self.load_instance_with_state_dict(instance, state_dict, strict) + logging.info(f'Model {instance.__class__.__name__} was successfully restored from {restore_path}.') return instance From 45babd24e7debd02554b211881868fe596f16046 Mon Sep 17 00:00:00 2001 From: Tamerlan Tabolov Date: Thu, 21 Sep 2023 20:11:45 +0200 Subject: [PATCH 047/112] Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/collections/tts/modules/transformer.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/tts/modules/transformer.py b/nemo/collections/tts/modules/transformer.py index 3dda8c522dcc..728b583919ff 100644 --- a/nemo/collections/tts/modules/transformer.py +++ b/nemo/collections/tts/modules/transformer.py @@ -246,7 +246,7 @@ def forward(self, input, seq_lens, conditioning=None): def _forward(self, inp, mask, conditioning): pos_seq = torch.arange(inp.size(1), device=inp.device).to(inp.dtype) pos_emb = self.pos_emb(pos_seq) * mask - inp += pos_emb + inp = inp + pos_emb inp = self.cond_input(inp, conditioning) out = self.drop(inp) From 84584c04775baa73499e367e147b9ff377a8d0e9 Mon Sep 17 00:00:00 2001 From: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Date: Thu, 21 Sep 2023 22:13:25 -0400 Subject: [PATCH 048/112] Fix (#7478) Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Sasha Meister --- .../nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py | 1 + 1 file changed, 1 insertion(+) diff --git a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py index 55bce820ca8f..801a58394f06 100644 --- a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py +++ b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py @@ -289,6 +289,7 @@ def collate_fn(self, batch): labels = [x[: self.max_seq_length] for x in labels] loss_mask = [x[: self.max_seq_length] for x in loss_mask] contexts = [x[: self.max_seq_length] for x in contexts] + answers = [x[: self.max_seq_length] for x in answers] # increase max length to nearest multiple of 4 or 8 if self.pad_to_max_length: From 37cc67f00bd0fa4158be8528e0d4d694f9afb8a2 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Sat, 23 Sep 2023 23:10:20 -0700 Subject: [PATCH 049/112] add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/utils/exp_manager.py | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 1629aa5cbb50..addbf4eda617 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -170,6 +170,8 @@ class ExpManagerConfig: ema: Optional[EMAParams] = EMAParams() # Wall clock time limit max_time_per_run: Optional[str] = None + # time to sleep non 0 ranks during initialization + seconds_to_sleep: float = 5 class TimingCallback(Callback): @@ -301,6 +303,7 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo Set this to True if you are using DDP with many GPUs and do not want many log files in your exp dir. - max_time (str): The maximum wall clock time *per run*. This is intended to be used on clusters where you want a checkpoint to be saved after this specified time and be able to resume from that checkpoint. Defaults to None. + - seconds_to_sleep (float): seconds to sleep non rank 0 processes for. Used to give enough time for rank 0 to initialize returns: log_dir (Path): The final logging directory where logging files are saved. Usually the concatenation of @@ -501,6 +504,11 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo # Add lightning file logging to global_rank zero add_filehandlers_to_pl_logger(log_dir / 'lightning_logs.txt', log_dir / 'nemo_error_log.txt') + elif trainer.num_devices * trainer.num_devices > 1: + # sleep other ranks so rank 0 can finish + # doing the initialization such as moving files + time.sleep(cfg.seconds_to_sleep) + return log_dir From 2b76509251b888af390124e5073a362f5fb662b3 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 25 Sep 2023 10:04:27 -0700 Subject: [PATCH 050/112] Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister --- nemo/utils/exp_manager.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index addbf4eda617..1125ae06cabb 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -504,7 +504,7 @@ def exp_manager(trainer: 'pytorch_lightning.Trainer', cfg: Optional[Union[DictCo # Add lightning file logging to global_rank zero add_filehandlers_to_pl_logger(log_dir / 'lightning_logs.txt', log_dir / 'nemo_error_log.txt') - elif trainer.num_devices * trainer.num_devices > 1: + elif trainer.num_nodes * trainer.num_devices > 1: # sleep other ranks so rank 0 can finish # doing the initialization such as moving files time.sleep(cfg.seconds_to_sleep) From 2585e0a97b1405cd47331b013e7e6ea889d891a1 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 25 Sep 2023 14:21:17 -0700 Subject: [PATCH 051/112] bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister --- tutorials/tts/Vits_Training.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/tts/Vits_Training.ipynb b/tutorials/tts/Vits_Training.ipynb index 84dad62bba6f..398ca377eff5 100644 --- a/tutorials/tts/Vits_Training.ipynb +++ b/tutorials/tts/Vits_Training.ipynb @@ -308,7 +308,7 @@ " phoneme_dict_path=tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt \\\n", " heteronyms_path=tts_dataset_files/heteronyms-052722 \\\n", " trainer.max_epochs=3 \\\n", - " trainer.accelerator=null \\\n", + " trainer.accelerator=auto \\\n", " trainer.check_val_every_n_epoch=1 \\\n", " trainer.devices=1)" ] From 9c8336578e9db8f29876e6ca3e7e4016596115ac Mon Sep 17 00:00:00 2001 From: Stas Bekman Date: Mon, 25 Sep 2023 16:55:05 -0700 Subject: [PATCH 052/112] [doc] fix broken link (#7481) Signed-off-by: Stas Bekman Signed-off-by: Sasha Meister --- docs/source/core/exp_manager.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/core/exp_manager.rst b/docs/source/core/exp_manager.rst index c23d902a26ee..5415ae403f38 100644 --- a/docs/source/core/exp_manager.rst +++ b/docs/source/core/exp_manager.rst @@ -32,7 +32,7 @@ Optionally, launch TensorBoard to view the training results in ``./nemo_experime .. If ``create_checkpoint_callback`` is set to ``True``, then NeMo automatically creates checkpoints during training -using PyTorch Lightning's `ModelCheckpoint `_. +using PyTorch Lightning's `ModelCheckpoint `_. We can configure the ``ModelCheckpoint`` via YAML or CLI. .. code-block:: yaml From 92f0eecdfa9929177196f750a7d4c8cfb561083d Mon Sep 17 00:00:00 2001 From: Ryan Langman Date: Mon, 25 Sep 2023 21:27:13 -0700 Subject: [PATCH 053/112] [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan * [TTS] Add comment about read failures Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister --- .../asr/parts/preprocessing/segment.py | 17 +++++++++++++---- nemo/collections/tts/data/vocoder_dataset.py | 6 ++++-- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/nemo/collections/asr/parts/preprocessing/segment.py b/nemo/collections/asr/parts/preprocessing/segment.py index d586137d5ff2..f614d3d22186 100644 --- a/nemo/collections/asr/parts/preprocessing/segment.py +++ b/nemo/collections/asr/parts/preprocessing/segment.py @@ -375,7 +375,15 @@ def from_file_list( @classmethod def segment_from_file( - cls, audio_file, target_sr=None, n_segments=0, trim=False, orig_sr=None, channel_selector=None, offset=None + cls, + audio_file, + target_sr=None, + n_segments=0, + trim=False, + orig_sr=None, + channel_selector=None, + offset=None, + dtype='float32', ): """Grabs n_segments number of samples from audio_file. If offset is not provided, n_segments are selected randomly. @@ -390,6 +398,7 @@ def segment_from_file( :param orig_sr: the original sample rate :param channel selector: select a subset of channels. If set to `None`, the original signal will be used. :param offset: fixed offset in seconds + :param dtype: data type to load audio as. :return: numpy array of samples """ is_segmented = False @@ -412,15 +421,15 @@ def segment_from_file( f'Provided audio start ({audio_start}) is larger than the maximum possible ({max_audio_start})' ) f.seek(audio_start) - samples = f.read(n_segments_at_original_sr, dtype='float32') + samples = f.read(n_segments_at_original_sr, dtype=dtype) is_segmented = True elif n_segments_at_original_sr > len(f): logging.warning( f"Number of segments ({n_segments_at_original_sr}) is greater than the length ({len(f)}) of the audio file {audio_file}. This may lead to shape mismatch errors." ) - samples = f.read(dtype='float32') + samples = f.read(dtype=dtype) else: - samples = f.read(dtype='float32') + samples = f.read(dtype=dtype) except RuntimeError as e: logging.error(f"Loading {audio_file} via SoundFile raised RuntimeError: `{e}`.") raise e diff --git a/nemo/collections/tts/data/vocoder_dataset.py b/nemo/collections/tts/data/vocoder_dataset.py index a5a30870dfff..97e0648f8b11 100644 --- a/nemo/collections/tts/data/vocoder_dataset.py +++ b/nemo/collections/tts/data/vocoder_dataset.py @@ -122,11 +122,13 @@ def get_sampler(self, batch_size: int) -> Optional[torch.utils.data.Sampler]: return sampler def _segment_audio(self, audio_filepath: Path) -> AudioSegment: - # Retry file read multiple times as file seeking can produce random IO errors. + # File seeking sometimes fails when reading flac files with libsndfile < 1.0.30. + # Read audio as int32 to minimize issues, and retry read on a different segment in case of failure. + # https://github.com/bastibe/python-soundfile/issues/274 for _ in range(self.num_audio_retries): try: audio_segment = AudioSegment.segment_from_file( - audio_filepath, target_sr=self.sample_rate, n_segments=self.n_samples, + audio_filepath, target_sr=self.sample_rate, n_segments=self.n_samples, dtype="int32" ) return audio_segment except Exception: From c5d917bd186d9fc2a1690760c60cc2ddfe02e948 Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Tue, 26 Sep 2023 14:36:26 +1000 Subject: [PATCH 054/112] Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../fastpitch_align_multispeaker_22050.yaml | 261 ++++++++++++++++++ .../ds_conf/ds_for_fastpitch_align.yaml | 49 ++++ .../tts/aishell3/get_data.py | 156 +++++++++++ 3 files changed, 466 insertions(+) create mode 100644 examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml create mode 100755 scripts/dataset_processing/tts/aishell3/ds_conf/ds_for_fastpitch_align.yaml create mode 100755 scripts/dataset_processing/tts/aishell3/get_data.py diff --git a/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml b/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml new file mode 100644 index 000000000000..2464e546598e --- /dev/null +++ b/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml @@ -0,0 +1,261 @@ +# This config contains the default values for training FastPitch model with aligner using 22KHz sampling +# rate. If you want to train model on other dataset, you can change config values according to your dataset. +# Most dataset-specific arguments are in the head of the config file, see below. + +name: FastPitch + +train_dataset: ??? +validation_datasets: ??? +sup_data_path: ??? +sup_data_types: [ "align_prior_matrix", "pitch", "speaker_id"] + +# Default values from librosa.pyin +pitch_fmin: 65.40639132514966 +pitch_fmax: 1986.977294921875 + +# these frame-wise values depend on pitch_fmin and pitch_fmax, you can get values +# by running `scripts/dataset_processing/tts/extract_sup_data.py` +pitch_mean: ??? # e.g. 221.4948272705078 for SFbilingual dataset. +pitch_std: ??? # e.g. 64.6528930664063 for SFbilingual dataset. + +# Default values for dataset with sample_rate=22050 +sample_rate: 22050 +n_mel_channels: 80 +n_window_size: 1024 +n_window_stride: 256 +n_fft: 1024 +lowfreq: 0 +highfreq: null +window: hann + +# There are four candidates of `phoneme_dict_path` provided for Chinese as shown below, +# 1) 24-final Pinyin: "scripts/tts_dataset_files/zh/24finals/pinyin_dict_nv_22.10.txt", +# 2) IPA converted from 24-final Pinyin: "scripts/tts_dataset_files/zh/24finals/ipa_dict_nv23.05.txt", +# 3) 36-final Pinyin: "scripts/tts_dataset_files/zh/36finals/pinyin_dict_nv23.05.txt", +# 4) (default) IPA converted from 36-final Pinyin: "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt" +# Suggest to choose IPA symbol set converted from 36-final Pinyin because better audio quality were observed. +phoneme_dict_path: "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt" + +model: + learn_alignment: true + bin_loss_warmup_epochs: 100 + + n_speakers: 1958 + max_token_duration: 75 + symbols_embedding_dim: 384 + pitch_embedding_kernel_size: 3 + speaker_emb_condition_prosody: true + speaker_emb_condition_aligner: true + + pitch_fmin: ${pitch_fmin} + pitch_fmax: ${pitch_fmax} + + pitch_mean: ${pitch_mean} + pitch_std: ${pitch_std} + + sample_rate: ${sample_rate} + n_mel_channels: ${n_mel_channels} + n_window_size: ${n_window_size} + n_window_stride: ${n_window_stride} + n_fft: ${n_fft} + lowfreq: ${lowfreq} + highfreq: ${highfreq} + window: ${window} + + text_normalizer: + _target_: nemo_text_processing.text_normalization.normalize.Normalizer + lang: zh + input_case: cased + + text_normalizer_call_kwargs: + verbose: false + punct_pre_process: true + punct_post_process: true + + text_tokenizer: + _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.ChinesePhonemesTokenizer + punct: true + apostrophe: true + pad_with_space: true + g2p: + _target_: nemo.collections.tts.g2p.models.zh_cn_pinyin.ChineseG2p + phoneme_dict: ${phoneme_dict_path} + word_segmenter: jieba # Only jieba is supported now. + phoneme_prefix: "" + phoneme_case: lower + tone_prefix: "#" + ascii_letter_prefix: "" + ascii_letter_case: upper + + train_ds: + dataset: + _target_: nemo.collections.tts.data.dataset.TTSDataset + manifest_filepath: ${train_dataset} + sample_rate: ${model.sample_rate} + sup_data_path: ${sup_data_path} + sup_data_types: ${sup_data_types} + n_fft: ${model.n_fft} + win_length: ${model.n_window_size} + hop_length: ${model.n_window_stride} + window: ${model.window} + n_mels: ${model.n_mel_channels} + lowfreq: ${model.lowfreq} + highfreq: ${model.highfreq} + max_duration: null # change to null to include longer audios. + min_duration: 0.1 + ignore_file: null + trim: true + trim_top_db: 50 + trim_frame_length: ${model.n_window_size} + trim_hop_length: ${model.n_window_stride} + pitch_fmin: ${model.pitch_fmin} + pitch_fmax: ${model.pitch_fmax} + pitch_norm: true + pitch_mean: ${model.pitch_mean} + pitch_std: ${model.pitch_std} + + dataloader_params: + drop_last: false + shuffle: true + batch_size: 32 + num_workers: 12 + pin_memory: true + + validation_ds: + dataset: + _target_: nemo.collections.tts.data.dataset.TTSDataset + manifest_filepath: ${validation_datasets} + sample_rate: ${model.sample_rate} + sup_data_path: ${sup_data_path} + sup_data_types: ${sup_data_types} + n_fft: ${model.n_fft} + win_length: ${model.n_window_size} + hop_length: ${model.n_window_stride} + window: ${model.window} + n_mels: ${model.n_mel_channels} + lowfreq: ${model.lowfreq} + highfreq: ${model.highfreq} + max_duration: null # change to null to include longer audios. + min_duration: 0.1 + ignore_file: null + trim: true + trim_top_db: 50 + trim_frame_length: ${model.n_window_size} + trim_hop_length: ${model.n_window_stride} + pitch_fmin: ${model.pitch_fmin} + pitch_fmax: ${model.pitch_fmax} + pitch_norm: true + pitch_mean: ${model.pitch_mean} + pitch_std: ${model.pitch_std} + + dataloader_params: + drop_last: false + shuffle: false + batch_size: 32 + num_workers: 2 + pin_memory: true + + preprocessor: + _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor + features: ${model.n_mel_channels} + lowfreq: ${model.lowfreq} + highfreq: ${model.highfreq} + n_fft: ${model.n_fft} + n_window_size: ${model.n_window_size} + window_size: false + n_window_stride: ${model.n_window_stride} + window_stride: false + pad_to: 1 + pad_value: 0 + sample_rate: ${model.sample_rate} + window: ${model.window} + normalize: null + preemph: null + dither: 0.0 + frame_splicing: 1 + log: true + log_zero_guard_type: add + log_zero_guard_value: 1e-05 + mag_power: 1.0 + + input_fft: #n_embed and padding_idx are added by the model + _target_: nemo.collections.tts.modules.transformer.FFTransformerEncoder + n_layer: 6 + n_head: 1 + d_model: ${model.symbols_embedding_dim} + d_head: 64 + d_inner: 1536 + kernel_size: 3 + dropout: 0.1 + dropatt: 0.1 + dropemb: 0.0 + d_embed: ${model.symbols_embedding_dim} + + output_fft: + _target_: nemo.collections.tts.modules.transformer.FFTransformerDecoder + n_layer: 6 + n_head: 1 + d_model: ${model.symbols_embedding_dim} + d_head: 64 + d_inner: 1536 + kernel_size: 3 + dropout: 0.1 + dropatt: 0.1 + dropemb: 0.0 + + alignment_module: + _target_: nemo.collections.tts.modules.aligner.AlignmentEncoder + n_text_channels: ${model.symbols_embedding_dim} + + duration_predictor: + _target_: nemo.collections.tts.modules.fastpitch.TemporalPredictor + input_size: ${model.symbols_embedding_dim} + kernel_size: 3 + filter_size: 256 + dropout: 0.1 + n_layers: 2 + + pitch_predictor: + _target_: nemo.collections.tts.modules.fastpitch.TemporalPredictor + input_size: ${model.symbols_embedding_dim} + kernel_size: 3 + filter_size: 256 + dropout: 0.1 + n_layers: 2 + + optim: + name: adamw + lr: 1e-3 + betas: [0.9, 0.999] + weight_decay: 1e-6 + + sched: + name: NoamAnnealing + warmup_steps: 1000 + last_epoch: -1 + d_model: 1 # Disable scaling based on model dim + +trainer: + num_nodes: 1 + devices: -1 # number of gpus + accelerator: gpu + strategy: ddp + precision: 16 + max_epochs: 5000 + accumulate_grad_batches: 1 + gradient_clip_val: 1000.0 + enable_checkpointing: false # Provided by exp_manager + logger: false # Provided by exp_manager + log_every_n_steps: 100 + check_val_every_n_epoch: 5 + benchmark: false + +exp_manager: + exp_dir: null + name: ${name} + create_tensorboard_logger: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: val_loss + resume_if_exists: false + resume_ignore_no_checkpoint: false diff --git a/scripts/dataset_processing/tts/aishell3/ds_conf/ds_for_fastpitch_align.yaml b/scripts/dataset_processing/tts/aishell3/ds_conf/ds_for_fastpitch_align.yaml new file mode 100755 index 000000000000..f8298d7c2680 --- /dev/null +++ b/scripts/dataset_processing/tts/aishell3/ds_conf/ds_for_fastpitch_align.yaml @@ -0,0 +1,49 @@ +name: "ds_for_fastpitch_align" + +manifest_filepath: "train_manifest.json" +sup_data_path: "sup_data" +sup_data_types: [ "align_prior_matrix", "pitch", "speaker_id"] +phoneme_dict_path: "scripts/tts_dataset_files/zh/24finals/pinyin_dict_nv_22.10.txt" + +dataset: + _target_: nemo.collections.tts.data.dataset.TTSDataset + manifest_filepath: ${manifest_filepath} + sample_rate: 22050 + sup_data_path: ${sup_data_path} + sup_data_types: ${sup_data_types} + n_fft: 1024 + win_length: 1024 + hop_length: 256 + window: "hann" + n_mels: 80 + lowfreq: 0 + highfreq: null + max_duration: null + min_duration: 0.1 + ignore_file: null + trim: true + trim_top_db: 50 + trim_frame_length: 1024 + trim_hop_length: 256 + pitch_fmin: 65.40639132514966 + pitch_fmax: 2093.004522404789 + + text_normalizer: + _target_: nemo_text_processing.text_normalization.normalize.Normalizer + lang: zh + input_case: cased + + text_normalizer_call_kwargs: + verbose: false + punct_pre_process: true + punct_post_process: true + + text_tokenizer: + _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.ChinesePhonemesTokenizer + punct: true + apostrophe: true + pad_with_space: true + g2p: + _target_: nemo.collections.tts.g2p.models.zh_cn_pinyin.ChineseG2p + phoneme_dict: ${phoneme_dict_path} + word_segmenter: jieba # Only jieba is supported now. diff --git a/scripts/dataset_processing/tts/aishell3/get_data.py b/scripts/dataset_processing/tts/aishell3/get_data.py new file mode 100755 index 000000000000..904ab0314653 --- /dev/null +++ b/scripts/dataset_processing/tts/aishell3/get_data.py @@ -0,0 +1,156 @@ +# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Disclaimer: +# Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use. + +import argparse +import json +import os +import random +import subprocess +import tarfile +import urllib.request +from pathlib import Path + +import numpy as np +from nemo_text_processing.text_normalization.normalize import Normalizer +from opencc import OpenCC + +URL = "https://www.openslr.org/resources/93/data_aishell3.tgz" + + +def get_args(): + parser = argparse.ArgumentParser( + description='Prepare SF_bilingual dataset and create manifests with predefined split' + ) + + parser.add_argument( + "--data-root", + type=Path, + help="where the dataset will reside", + default="./DataChinese/sf_bilingual_speech_zh_en_vv1/SF_bilingual/", + ) + parser.add_argument( + "--manifests-path", type=Path, help="where the resulting manifests files will reside", default="./" + ) + parser.add_argument("--val-size", default=0.01, type=float, help="eval set split") + parser.add_argument("--test-size", default=0.01, type=float, help="test set split") + parser.add_argument( + "--seed-for-ds-split", + default=100, + type=float, + help="Seed for deterministic split of train/dev/test, NVIDIA's default is 100", + ) + + args = parser.parse_args() + return args + + +def __maybe_download_file(source_url, destination_path): + if not destination_path.exists(): + tmp_file_path = destination_path.with_suffix('.tmp') + urllib.request.urlretrieve(source_url, filename=str(tmp_file_path)) + tmp_file_path.rename(destination_path) + + +def __extract_file(filepath, data_dir): + try: + tar = tarfile.open(filepath) + tar.extractall(data_dir) + tar.close() + except Exception: + print(f"Error while extracting {filepath}. Already extracted?") + + +def __process_transcript(file_path: str): + # Create directory for processed wav files + Path(file_path / "processed").mkdir(parents=True, exist_ok=True) + # Create zh-TW to zh-simplify converter + cc = OpenCC('t2s') + # Create normalizer + text_normalizer = Normalizer( + lang="zh", input_case="cased", overwrite_cache=True, cache_dir=str(file_path / "cache_dir"), + ) + text_normalizer_call_kwargs = {"punct_pre_process": True, "punct_post_process": True} + normalizer_call = lambda x: text_normalizer.normalize(x, **text_normalizer_call_kwargs) + entries = [] + i = 0 + SPEAKER_LEN = 7 + with open(file_path / "train" / "content.txt", encoding="utf-8") as fin: + for line in fin: + content = line.split() + wav_name, text = content[0], "".join(content[1::2]) + "。" + wav_name = wav_name.replace(u'\ufeff', '') + speaker = wav_name[:SPEAKER_LEN] + wav_file = file_path / "train" / "wav" / speaker / wav_name + assert os.path.exists(wav_file), f"{wav_file} not found!" + duration = subprocess.check_output(f"soxi -D {wav_file}", shell=True) + if float(duration) <= 3.0: # filter out wav files shorter than 3 seconds + continue + processed_file = file_path / "processed" / wav_name + # convert wav to mono 22050HZ, 16 bit (as SFSpeech dataset) + subprocess.run(f"sox {wav_file} -r 22050 -c 1 -b 16 {processed_file}", shell=True) + simplified_text = cc.convert(text) + normalized_text = normalizer_call(simplified_text) + entry = { + 'audio_filepath': os.path.abspath(processed_file), + 'duration': float(duration), + 'text': text, + 'normalized_text': normalized_text, + 'speaker': int(speaker[3:]), + } + + i += 1 + entries.append(entry) + return entries + + +def __process_data(dataset_path, val_size, test_size, seed_for_ds_split, manifests_dir): + entries = __process_transcript(dataset_path) + + random.Random(seed_for_ds_split).shuffle(entries) + + train_size = 1.0 - val_size - test_size + train_entries, validate_entries, test_entries = np.split( + entries, [int(len(entries) * train_size), int(len(entries) * (train_size + val_size))] + ) + + assert len(train_entries) > 0, "Not enough data for train, val and test" + + def save(p, data): + with open(p, 'w') as f: + for d in data: + f.write(json.dumps(d) + '\n') + + save(manifests_dir / "train_manifest.json", train_entries) + save(manifests_dir / "val_manifest.json", validate_entries) + save(manifests_dir / "test_manifest.json", test_entries) + + +def main(): + args = get_args() + + tarred_data_path = args.data_root / "data_aishell3.tgz" + + __maybe_download_file(URL, tarred_data_path) + __extract_file(str(tarred_data_path), str(args.data_root)) + + __process_data( + args.data_root, args.val_size, args.test_size, args.seed_for_ds_split, args.manifests_path, + ) + + +if __name__ == "__main__": + main() From e12d30ad38f21f66c09a6dee6f4c7b49077b9b70 Mon Sep 17 00:00:00 2001 From: Stas Bekman Date: Tue, 26 Sep 2023 07:49:24 -0700 Subject: [PATCH 055/112] dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman Signed-off-by: Sasha Meister --- nemo/utils/loggers/dllogger.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/nemo/utils/loggers/dllogger.py b/nemo/utils/loggers/dllogger.py index 0b2fde061ff9..cdeef63b75f7 100644 --- a/nemo/utils/loggers/dllogger.py +++ b/nemo/utils/loggers/dllogger.py @@ -20,6 +20,7 @@ from lightning_utilities.core.apply_func import apply_to_collection from omegaconf import DictConfig, ListConfig, OmegaConf from pytorch_lightning.loggers import Logger +from pytorch_lightning.utilities import rank_zero_only from pytorch_lightning.utilities.parsing import AttributeDict from nemo.utils import logging @@ -81,6 +82,7 @@ def __init__(self, stdout: bool, verbose: bool, json_file: str): ) dllogger.init(backends=backends) + @rank_zero_only def log_hyperparams(self, params, *args, **kwargs): if isinstance(params, Namespace): params = vars(params) @@ -91,6 +93,7 @@ def log_hyperparams(self, params, *args, **kwargs): params = _sanitize_callable_params(_flatten_dict(_convert_params(params))) dllogger.log(step="PARAMETER", data=params) + @rank_zero_only def log_metrics(self, metrics, step=None): if step is None: step = tuple() From e60cd386ca64cdab8467fbbfe20b2ff444590366 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 26 Sep 2023 10:24:49 -0700 Subject: [PATCH 056/112] Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister --- tutorials/tts/FastPitch_Adapter_Finetuning.ipynb | 6 +++--- tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb b/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb index 263d22b60599..5fe61d596f4b 100644 --- a/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb +++ b/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb @@ -100,7 +100,7 @@ "# .nemo files for your pre-trained FastPitch and HiFiGAN\n", "pretrained_fastpitch_checkpoint = \"\"\n", "finetuned_hifigan_on_multispeaker_checkpoint = \"\"\n", - "use_ipa = True #Set to False while using Arpabet." + "use_ipa = False #Set to False while using Arpabet." ] }, { @@ -505,7 +505,7 @@ "+exp_manager.wandb_logger_kwargs.name=\"tutorial-FastPitch-finetune-adaptation\" \\\n", "+exp_manager.wandb_logger_kwargs.project=\"NeMo\" \\\n", "+exp_manager.checkpoint_callback_params.save_top_k=-1 \\\n", - "trainer.max_epochs=200 \\\n", + "trainer.max_epochs=20 \\\n", "trainer.check_val_every_n_epoch=10 \\\n", "trainer.log_every_n_steps=1 \\\n", "trainer.devices=1 \\\n", @@ -612,7 +612,7 @@ "model.optim.lr=0.0001 \\\n", "model/train_ds=train_ds_finetune \\\n", "model/validation_ds=val_ds_finetune \\\n", - "+trainer.max_epochs=500 \\\n", + "+trainer.max_epochs=50 \\\n", "trainer.check_val_every_n_epoch=5 \\\n", "trainer.devices=-1 \\\n", "trainer.strategy='ddp' \\\n", diff --git a/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb b/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb index cb5cb651d76e..a031723f549b 100644 --- a/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb +++ b/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb @@ -511,7 +511,7 @@ "+trainer.max_epochs=5 \\\n", "trainer.check_val_every_n_epoch=5 \\\n", "trainer.devices=1 \\\n", - "trainer.strategy='ddp' \\\n", + "trainer.strategy='auto' \\\n", "trainer.precision=16 \\\n", "exp_manager.exp_dir={logs_dir} \\\n", "exp_manager.create_wandb_logger=True \\\n", From 1c3c43cacbdc35fd71dbc78c1dafa0df5bd4e48a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 26 Sep 2023 10:25:23 -0700 Subject: [PATCH 057/112] Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Sasha Meister --- nemo/collections/tts/modules/aligner.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/tts/modules/aligner.py b/nemo/collections/tts/modules/aligner.py index 2910602474fd..f044a86a52eb 100644 --- a/nemo/collections/tts/modules/aligner.py +++ b/nemo/collections/tts/modules/aligner.py @@ -98,7 +98,7 @@ def get_dist(self, keys, queries, mask=None): self._apply_mask(dist, mask, float("inf")) - return dist + return dist.squeeze(1) @staticmethod def get_euclidean_dist(queries_enc, keys_enc): From c2d70e6b0ec67981aadbc83aedb00d73cb34e0bd Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 26 Sep 2023 10:25:50 -0700 Subject: [PATCH 058/112] bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister --- tutorials/tts/Vits_Training.ipynb | 1 + 1 file changed, 1 insertion(+) diff --git a/tutorials/tts/Vits_Training.ipynb b/tutorials/tts/Vits_Training.ipynb index 398ca377eff5..296661c6ca5e 100644 --- a/tutorials/tts/Vits_Training.ipynb +++ b/tutorials/tts/Vits_Training.ipynb @@ -309,6 +309,7 @@ " heteronyms_path=tts_dataset_files/heteronyms-052722 \\\n", " trainer.max_epochs=3 \\\n", " trainer.accelerator=auto \\\n", + " trainer.strategy=auto \\\n", " trainer.check_val_every_n_epoch=1 \\\n", " trainer.devices=1)" ] From 9317d15281bc8ab702e19bd9ff5a0bdef200abdb Mon Sep 17 00:00:00 2001 From: Abhinav Khattar Date: Tue, 26 Sep 2023 13:30:31 -0400 Subject: [PATCH 059/112] fix (#7511) Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister --- examples/nlp/language_modeling/tuning/megatron_gpt_sft.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py index 15d3dc53c348..2d0a0d38a762 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py @@ -66,6 +66,9 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False): gpt_cfg.attention_dropout = cfg.model.get('attention_dropout', 0.0) gpt_cfg.ffn_dropout = cfg.model.ffn_dropout gpt_cfg.use_flash_attention = cfg.model.get('use_flash_attention', False) + gpt_cfg.tensor_model_parallel_size = cfg.model.get('tensor_model_parallel_size', 1) + gpt_cfg.pipeline_model_parallel_size = cfg.model.get('pipeline_model_parallel_size', 1) + gpt_cfg.pipeline_model_parallel_split_rank = cfg.model.get('pipeline_model_parallel_split_rank', 0) sft_cls = MegatronGPTSFTModel gpt_cfg.target = f"{sft_cls.__module__}.{sft_cls.__name__}" From 20b15f3bf6d503653446da6cf6a1cc9c1ebafb4c Mon Sep 17 00:00:00 2001 From: Ryan Langman Date: Tue, 26 Sep 2023 17:14:16 -0700 Subject: [PATCH 060/112] [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan Signed-off-by: Sasha Meister --- scripts/dataset_processing/tts/preprocess_text.py | 4 ++-- tutorials/tts/FastPitch_Data_Preparation.ipynb | 14 +++++++++----- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/scripts/dataset_processing/tts/preprocess_text.py b/scripts/dataset_processing/tts/preprocess_text.py index 2edc52969140..580a84a02d6f 100644 --- a/scripts/dataset_processing/tts/preprocess_text.py +++ b/scripts/dataset_processing/tts/preprocess_text.py @@ -22,9 +22,9 @@ --input_manifest="/manifest.json" \ --output_manifest="/manifest_processed.json" \ --normalizer_config_path="/examples/tts/conf/text/normalizer_en.yaml" \ - --lower_case=True \ + --lower_case \ --num_workers=4 \ - --batch_size=16 + --joblib_batch_size=16 """ import argparse diff --git a/tutorials/tts/FastPitch_Data_Preparation.ipynb b/tutorials/tts/FastPitch_Data_Preparation.ipynb index 46778759d5cb..5a42018c70b5 100644 --- a/tutorials/tts/FastPitch_Data_Preparation.ipynb +++ b/tutorials/tts/FastPitch_Data_Preparation.ipynb @@ -332,6 +332,8 @@ "lower_case = True\n", "# Whether to overwrite output manifest, if it exists\n", "overwrite_manifest = True\n", + "# Batch size for joblib parallelization. Increasing this value might speed up the script, depending on your CPU.\n", + "joblib_batch_size = 16\n", "\n", "# Python wrapper to invoke the given bash script with the given input args\n", "def run_script(script, args):\n", @@ -351,8 +353,10 @@ " f\"--output_manifest={output_filepath}\",\n", " f\"--num_workers={num_workers}\",\n", " f\"--normalizer_config_path={normalizer_config_filepath}\",\n", - " f\"--lower_case={lower_case}\"\n", + " f\"--joblib_batch_size={joblib_batch_size}\"\n", " ]\n", + " if lower_case:\n", + " args.append(\"--lower_case\")\n", " if overwrite_manifest:\n", " args.append(\"--overwrite\")\n", "\n", @@ -787,7 +791,7 @@ "\n", "We will train HiFi-GAN first so that we can use it to help evaluate the performance of FastPitch as it is being trained.\n", "\n", - "HiFi-GAN training only requires a manifest with with the `audio_filepath` field. All other fields in the manifest are for FastPitch training.\n", + "HiFi-GAN training only requires a manifest with the `audio_filepath` field. All other fields in the manifest are for FastPitch training.\n", "\n", "Here we show how to train these models from scratch. You can also fine-tune them from pretrained checkpoints as mentioned in our [FastPitch fine-tuning tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_Finetuning.ipynb), but pretrained checkpoints compatible with these experimental recipes are not yet available on NGC.\n" ], @@ -914,7 +918,7 @@ { "cell_type": "code", "source": [ - "hifigan_log_epoch_dir = hifigan_log_dir / \"epoch_10\"\n", + "hifigan_log_epoch_dir = hifigan_log_dir / \"epoch_10\" / dataset_name\n", "!ls $hifigan_log_epoch_dir" ], "metadata": { @@ -966,7 +970,7 @@ "1. Training manifest(s) with `audio_filepath` and `text` or `normalized_text` fields.\n", "2. Precomputed features such as *pitch* and *energy* specified in the feature [config file](https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/feature/feature_44100.yaml).\n", "3. (Optional) Statistics file for normalizing features.\n", - "4. (Optional) For a multi-speaker model, the manifest needs a `speaker` field amd JSON file mapping speaker IDs to speaker indices.\n", + "4. (Optional) For a multi-speaker model, the manifest needs a `speaker` field and JSON file mapping speaker IDs to speaker indices.\n", "5. (Optional) To train with IPA phonemes, a [phoneme dictionary](https://github.com/NVIDIA/NeMo/blob/main/scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt) and optional [heteronyms file](https://github.com/NVIDIA/NeMo/blob/main/scripts/tts_dataset_files/heteronyms-052722)\n", "6. (Optional) HiFi-GAN checkpoint or [NGC model name](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/tts/models/hifigan.py#L413) for generating audio predictions during training.\n", "\n" @@ -1093,7 +1097,7 @@ { "cell_type": "code", "source": [ - "faspitch_log_epoch_dir = fastpitch_log_dir / \"epoch_10\"\n", + "faspitch_log_epoch_dir = fastpitch_log_dir / \"epoch_10\" / dataset_name\n", "!ls $faspitch_log_epoch_dir" ], "metadata": { From 8b159846b9b7b8a4124b62d0bbe1fd4ec93dd2aa Mon Sep 17 00:00:00 2001 From: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com> Date: Wed, 27 Sep 2023 08:12:26 +0200 Subject: [PATCH 061/112] add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria * add test Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../tokenizers/text_to_speech/ipa_lexicon.py | 50 +++++++++++++++++-- .../text_to_speech/tokenizer_utils.py | 5 ++ .../text_to_speech/tts_tokenizers.py | 32 +++++++++++- .../text_to_speech/test_tts_tokenizers.py | 13 +++++ 4 files changed, 95 insertions(+), 5 deletions(-) diff --git a/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py b/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py index 746c783bd1b6..2e1bb359102b 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py +++ b/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py @@ -15,7 +15,7 @@ # fmt: off -SUPPORTED_LOCALES = ["en-US", "de-DE", "es-ES"] +SUPPORTED_LOCALES = ["en-US", "de-DE", "es-ES", "it-IT"] DEFAULT_PUNCTUATION = ( ',', '.', '!', '?', '-', @@ -48,6 +48,12 @@ 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'Ä', 'Ö', 'Ü', 'ẞ', ), + "it-IT": ( + 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', + 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', + 'U', 'V', 'W', 'X', 'Y', 'Z', 'À', 'È', 'É', 'Ì', + 'Ò', 'Ù' + ), } IPA_CHARACTER_SETS = { @@ -70,7 +76,20 @@ 'w', 'x', 'y', 'z', 'ç', 'ø', 'ŋ', 'œ', 'ɐ', 'ɑ', 'ɒ', 'ɔ', 'ə', 'ɛ', 'ɜ', 'ɡ', 'ɪ', 'ɹ', 'ɾ', 'ʃ', 'ʊ', 'ʌ', 'ʒ', '̃', 'θ' - ) + ), + "it-IT": ( + 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'j', 'k', 'l', + 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'w', + 'x', 'z', 'æ', 'ɐ', 'ɑ', 'ɔ', 'ə', 'ɚ', + 'ɜ', 'ɬ', 'ɹ', 'ʌ', 'ʔ', 'ʲ', '̃', '̩', 'ᵻ', + 'ð', 'ŋ', 'ɛ', 'ɡ', 'ɣ', 'ɪ', 'ɲ', 'ɾ', 'ʃ', + 'ʊ', 'ʎ', 'ʒ', 'ʝ', 'β', 'θ', 'd͡', 't͡', 'ø', 'ɒ', + 'ɕ', 'ɓ', 'ç', 'ɖ', 'ɘ', 'ɝ', 'ɞ', 'ɟ','ʄ','ɡ','ɠ', + 'ɢ','ʛ','ɦ','ɧ','ħ','ɥ','ʜ','ɨ','ɬ','ɫ','ɮ','ʟ', + 'ɱ','ɯ','ɰ','ɳ','ɵ','ɸ','œ','ɶ','ʘ','ɺ','ɻ','ʀ','ʁ', + 'ɽ','ʂ','ʈ','ʧ','ʉ','ʋ','ⱱ','ɤ','ʍ','χ','ʏ','ʑ','ʐ', + 'ʔ','ʡ','ʕ','ʢ','ǀ','ǁ','ǂ','ᵻ' + ), } GRAPHEME_CHARACTER_CASES = ["upper", "lower", "mixed"] @@ -124,7 +143,7 @@ def get_ipa_punctuation_list(locale): punct_set = set(DEFAULT_PUNCTUATION) # TODO @xueyang: verify potential mismatches with locale-specific punctuation sets used # in nemo_text_processing.text_normalization.en.taggers.punctuation.py - if locale in ["de-DE", "es-ES"]: + if locale in ["de-DE", "es-ES", "it-IT"]: # ref: https://en.wikipedia.org/wiki/Guillemet#Uses punct_set.update(['«', '»', '‹', '›']) if locale == "de-DE": @@ -140,6 +159,31 @@ def get_ipa_punctuation_list(locale): '—', # em dash, U+2014, decimal 8212 ] ) + if locale == "it-IT": + # ref: https://en.wikipedia.org/wiki/German_orthography#Punctuation + punct_set.update( + [ + '„', # double low-9 quotation mark, U+201E, decimal 8222 + '“', # left double quotation mark, U+201C, decimal 8220 + '‚', # single low-9 quotation mark, U+201A, decimal 8218 + '‘', # left single quotation mark, U+2018, decimal 8216 + '‒', # figure dash, U+2012, decimal 8210 + '–', # en dash, U+2013, decimal 8211 + '—', # em dash, U+2014, decimal 8212 + 'ʴ', + 'ʰ', + 'ʱ', + 'ʲ', + 'ʷ', + 'ˠ', + 'ˤ', + '˞↓', + '↑', + '→', + '↗', + '↘,', + ] + ) elif locale == "es-ES": # ref: https://en.wikipedia.org/wiki/Spanish_orthography#Punctuation punct_set.update(['¿', '¡']) diff --git a/nemo/collections/common/tokenizers/text_to_speech/tokenizer_utils.py b/nemo/collections/common/tokenizers/text_to_speech/tokenizer_utils.py index 92a3e0fb49e0..ad9ad9f4e898 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/tokenizer_utils.py +++ b/nemo/collections/common/tokenizers/text_to_speech/tokenizer_utils.py @@ -23,6 +23,7 @@ "english_text_preprocessing", "any_locale_text_preprocessing", "spanish_text_preprocessing", + "italian_text_preprocessing", "any_locale_word_tokenize", "english_word_tokenize", "LATIN_CHARS_ALL", @@ -189,5 +190,9 @@ def spanish_text_preprocessing(text: str) -> str: return text.lower() +def italian_text_preprocessing(text: str) -> str: + return text.lower() + + def chinese_text_preprocessing(text: str) -> str: return text diff --git a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py index 4193cf00eb85..32f725c9c73f 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py +++ b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py @@ -28,6 +28,7 @@ any_locale_text_preprocessing, chinese_text_preprocessing, english_text_preprocessing, + italian_text_preprocessing, spanish_text_preprocessing, ) from nemo.utils import logging @@ -267,6 +268,34 @@ def __init__( ) +class ItalianCharsTokenizer(BaseCharsTokenizer): + PUNCT_LIST = get_ipa_punctuation_list("it-IT") + + def __init__( + self, punct=True, apostrophe=True, add_blank_at=None, pad_with_space=False, non_default_punct_list=None + ): + """Italian grapheme tokenizer. + Args: + punct: Whether to reserve grapheme for basic punctuation or not. + apostrophe: Whether to use apostrophe or not. + add_blank_at: Add blank to labels in the specified order ("last") or after tokens (any non None), + if None then no blank in labels. + pad_with_space: Whether to pad text with spaces at the beginning and at the end or not. + non_default_punct_list: List of punctuation marks which will be used instead default. + """ + + it_alphabet = "abcdefghijklmnopqrstuvwxyzàèéìòù" + super().__init__( + chars=it_alphabet, + punct=punct, + apostrophe=apostrophe, + add_blank_at=add_blank_at, + pad_with_space=pad_with_space, + non_default_punct_list=non_default_punct_list, + text_preprocessing_func=italian_text_preprocessing, + ) + + class GermanPhonemesTokenizer(BaseCharsTokenizer): # fmt: off PUNCT_LIST = ( # Derived from LJSpeech and "/" additionally @@ -694,8 +723,7 @@ def __init__( pad_with_space=False, text_preprocessing_func=chinese_text_preprocessing, ): - """ - Chinese phoneme-based tokenizer. + """Chinese phoneme-based tokenizer. Note: This tokenizer for now covers Chinese phonemes/tones and English letters because our dataset contains both Chinese and English graphemes. Args: diff --git a/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py b/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py index e4e16fa31d68..62c571bc16b7 100644 --- a/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py +++ b/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py @@ -18,6 +18,7 @@ EnglishCharsTokenizer, GermanCharsTokenizer, IPATokenizer, + ItalianCharsTokenizer, SpanishCharsTokenizer, ) from nemo.collections.tts.g2p.models.i18n_ipa import IpaG2p @@ -89,6 +90,18 @@ def test_german_chars_tokenizer(self): assert chars == expected_output assert len(tokens) == len(input_text) + @pytest.mark.run_only_on('CPU') + @pytest.mark.unit + def test_italian_chars_tokenizer(self): + input_text = "Ciao mondo!" + expected_output = "ciao mondo!" + + tokenizer = ItalianCharsTokenizer() + chars, tokens = self._parse_text(tokenizer, input_text) + + assert chars == expected_output + assert len(tokens) == len(input_text) + @pytest.mark.run_only_on('CPU') @pytest.mark.unit def test_spanish_chars_tokenizer(self): From e81dda247a2ec248ae7da55ab428036aefef75af Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 26 Sep 2023 23:13:39 -0700 Subject: [PATCH 062/112] Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- tutorials/02_NeMo_Adapters.ipynb | 2 +- tutorials/asr/ASR_TTS_Tutorial.ipynb | 2 +- tutorials/asr/Self_Supervised_Pre_Training.ipynb | 6 +++--- tutorials/asr/Speech_Commands.ipynb | 2 +- tutorials/asr/Voice_Activity_Detection.ipynb | 2 +- .../speech_enhancement/Speech_Enhancement_with_NeMo.ipynb | 6 +++--- tutorials/nlp/Entity_Linking_Medical.ipynb | 2 +- tutorials/nlp/GLUE_Benchmark.ipynb | 4 ++-- tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb | 4 ++-- tutorials/nlp/Punctuation_and_Capitalization.ipynb | 6 +++--- .../nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb | 4 ++-- tutorials/nlp/Relation_Extraction-BioMegatron.ipynb | 4 ++-- tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb | 6 +++--- tutorials/nlp/Token_Classification-BioMegatron.ipynb | 2 +- .../nlp/Token_Classification_Named_Entity_Recognition.ipynb | 4 ++-- tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb | 4 ++-- tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb | 2 +- .../speaker_tasks/Speaker_Identification_Verification.ipynb | 2 +- 18 files changed, 32 insertions(+), 32 deletions(-) diff --git a/tutorials/02_NeMo_Adapters.ipynb b/tutorials/02_NeMo_Adapters.ipynb index 51a91a3c7053..289426f3bc2b 100644 --- a/tutorials/02_NeMo_Adapters.ipynb +++ b/tutorials/02_NeMo_Adapters.ipynb @@ -1985,4 +1985,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} diff --git a/tutorials/asr/ASR_TTS_Tutorial.ipynb b/tutorials/asr/ASR_TTS_Tutorial.ipynb index 007713ee3cc2..9bbcc8e4aa34 100644 --- a/tutorials/asr/ASR_TTS_Tutorial.ipynb +++ b/tutorials/asr/ASR_TTS_Tutorial.ipynb @@ -553,7 +553,7 @@ "config.trainer.max_epochs = NUM_EPOCHS\n", "\n", "config.trainer.devices = 1\n", - "config.trainer.strategy = None # use 1 device, no need for ddp strategy\n", + "config.trainer.strategy = auto # use 1 device, no need for ddp strategy\n", "\n", "OmegaConf.resolve(config)" ] diff --git a/tutorials/asr/Self_Supervised_Pre_Training.ipynb b/tutorials/asr/Self_Supervised_Pre_Training.ipynb index 04998f68f23e..6f977df492a1 100644 --- a/tutorials/asr/Self_Supervised_Pre_Training.ipynb +++ b/tutorials/asr/Self_Supervised_Pre_Training.ipynb @@ -316,7 +316,7 @@ " cfg.trainer.gpus = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", - " cfg.trainer.strategy = None\n", + " cfg.trainer.strategy = auto\n", " cfg.trainer.gpus = 0\n", "\n", "cfg.exp_manager.exp_dir = data_dir + \"/content/exp\"\n", @@ -538,7 +538,7 @@ " cfg.trainer.gpus = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", - " cfg.trainer.strategy = None\n", + " cfg.trainer.strategy = auto\n", " cfg.trainer.gpus = 0\n", "\n", "cfg.model.tokenizer.dir = data_dir + \"/tokenizers/an4/tokenizer_spe_unigram_v128/\" # note this is a directory, not a path to a vocabulary file\n", @@ -725,4 +725,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} diff --git a/tutorials/asr/Speech_Commands.ipynb b/tutorials/asr/Speech_Commands.ipynb index 208752347d64..c1566ae71850 100644 --- a/tutorials/asr/Speech_Commands.ipynb +++ b/tutorials/asr/Speech_Commands.ipynb @@ -441,7 +441,7 @@ "config.trainer.max_epochs = 5\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = None" + "config.trainer.strategy = auto" ], "execution_count": null, "outputs": [] diff --git a/tutorials/asr/Voice_Activity_Detection.ipynb b/tutorials/asr/Voice_Activity_Detection.ipynb index b1bdd434511b..7c7b95e99416 100644 --- a/tutorials/asr/Voice_Activity_Detection.ipynb +++ b/tutorials/asr/Voice_Activity_Detection.ipynb @@ -462,7 +462,7 @@ "config.trainer.max_epochs = 5\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = None" + "config.trainer.strategy = auto" ] }, { diff --git a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb index 41a49688d35e..d8a15cbd5e1c 100644 --- a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb +++ b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb @@ -667,7 +667,7 @@ "config.trainer.max_epochs = 10\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# Instantiate the trainer\n", "trainer = pl.Trainer(**config.trainer)" @@ -1144,7 +1144,7 @@ "config_dual_output.trainer.max_epochs = 10\n", "\n", "# Remove distributed training flags\n", - "config_dual_output.trainer.strategy = None\n", + "config_dual_output.trainer.strategy = auto\n", "\n", "# Instantiate the trainer\n", "trainer = pl.Trainer(**config_dual_output.trainer)\n", @@ -1313,4 +1313,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/tutorials/nlp/Entity_Linking_Medical.ipynb b/tutorials/nlp/Entity_Linking_Medical.ipynb index ff8eda123b7f..4f56a85eabfa 100644 --- a/tutorials/nlp/Entity_Linking_Medical.ipynb +++ b/tutorials/nlp/Entity_Linking_Medical.ipynb @@ -187,7 +187,7 @@ "cfg.model.validation_ds.data_file = os.path.join(DATA_DIR, \"tiny_example_validation_pairs.tsv\")\n", "\n", "# remove distributed training flags\n", - "cfg.trainer.strategy = None\n", + "cfg.trainer.strategy = auto\n", "cfg.trainer.accelerator = None" ] }, diff --git a/tutorials/nlp/GLUE_Benchmark.ipynb b/tutorials/nlp/GLUE_Benchmark.ipynb index d8fe75940b09..445ff6705028 100644 --- a/tutorials/nlp/GLUE_Benchmark.ipynb +++ b/tutorials/nlp/GLUE_Benchmark.ipynb @@ -342,7 +342,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "config.trainer.max_steps = 128\n", @@ -563,4 +563,4 @@ ] } ] -} \ No newline at end of file +} diff --git a/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb b/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb index 104d69df18e2..7513183fba28 100644 --- a/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb +++ b/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb @@ -286,7 +286,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# setup a small number of epochs for demonstration purposes of this tutorial\n", "config.trainer.max_epochs = 5\n", @@ -705,7 +705,7 @@ "config.trainer.accelerator = accelerator\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "trainer = pl.Trainer(**config.trainer)\n", "config.exp_manager.exp_dir = os.path.join(DATA_DIR, \"output/\" + run_name)\n", diff --git a/tutorials/nlp/Punctuation_and_Capitalization.ipynb b/tutorials/nlp/Punctuation_and_Capitalization.ipynb index 1519c234372b..e9d1060f6442 100644 --- a/tutorials/nlp/Punctuation_and_Capitalization.ipynb +++ b/tutorials/nlp/Punctuation_and_Capitalization.ipynb @@ -550,7 +550,7 @@ "config.trainer.max_epochs = 1\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "trainer = pl.Trainer(**config.trainer)" ] @@ -745,7 +745,7 @@ "config.trainer.accelerator = accelerator\n", "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "config.trainer.max_epochs = 1\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# Exp manager\n", "config.exp_manager.explicit_log_dir = 'tarred_experiment'\n", @@ -1043,4 +1043,4 @@ }, "nbformat": 4, "nbformat_minor": 1 -} \ No newline at end of file +} diff --git a/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb b/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb index 5580bc4cf946..778e14e63b70 100644 --- a/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb +++ b/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb @@ -645,7 +645,7 @@ "config.trainer.max_epochs = 1\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "config.exp_manager.use_datetime_version=False\n", "config.exp_manager.explicit_log_dir='Punctuation_And_Capitalization_Lexical_Audio'\n", "\n", @@ -860,7 +860,7 @@ "config.trainer.accelerator = accelerator\n", "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "config.trainer.max_epochs = 1\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# Exp manager\n", "config.exp_manager.explicit_log_dir = 'tarred_experiment'\n", diff --git a/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb b/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb index b7c25cb416ef..451c40152c8d 100644 --- a/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb +++ b/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb @@ -403,7 +403,7 @@ "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "trainer = pl.Trainer(**config.trainer)" ] @@ -652,4 +652,4 @@ }, "nbformat": 4, "nbformat_minor": 1 -} \ No newline at end of file +} diff --git a/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb b/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb index 5b5b74e7bf11..8e44aca9d0d1 100644 --- a/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb +++ b/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb @@ -370,7 +370,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# disable distributed training when using Colab to prevent the errors\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "# Training stops when max_step or max_epochs is reached (earliest)\n", @@ -573,7 +573,7 @@ "# create a copy of the trainer config and update it to be used for final evaluation\n", "eval_trainer_cfg = config.trainer.copy()\n", "eval_trainer_cfg.accelerator = 'gpu' if torch.cuda.is_available() else 'cpu' # it is safer to perform evaluation on single GPU as PT is buggy with the last batch on multi-GPUs\n", - "eval_trainer_cfg.strategy = None # 'ddp' is buggy with test process in the current PT, it looks like it has been fixed in the latest master\n", + "eval_trainer_cfg.strategy = auto # 'ddp' is buggy with test process in the current PT, it looks like it has been fixed in the latest master\n", "eval_trainer = pl.Trainer(**eval_trainer_cfg)\n", "\n", "eval_trainer.test(model=eval_model, verbose=False) # test_dataloaders=eval_dataloader,\n" @@ -832,4 +832,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/tutorials/nlp/Token_Classification-BioMegatron.ipynb b/tutorials/nlp/Token_Classification-BioMegatron.ipynb index 517f2e557743..e5e5aa81b859 100644 --- a/tutorials/nlp/Token_Classification-BioMegatron.ipynb +++ b/tutorials/nlp/Token_Classification-BioMegatron.ipynb @@ -434,7 +434,7 @@ "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "trainer = pl.Trainer(**config.trainer)" ] diff --git a/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb b/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb index c3f7e28b6b1f..1c1999cc08c1 100644 --- a/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb +++ b/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb @@ -533,7 +533,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "config.trainer.max_steps = 32\n", @@ -847,4 +847,4 @@ "metadata": {} } ] -} \ No newline at end of file +} diff --git a/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb b/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb index 69df7b27b02d..f571fa176e96 100644 --- a/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb +++ b/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb @@ -400,7 +400,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "config.trainer.max_steps = 128\n", @@ -671,4 +671,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb b/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb index 3c56df2bbba0..efd86e1ef242 100644 --- a/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb +++ b/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb @@ -761,7 +761,7 @@ "source": [ "config.model.diarizer.speaker_embeddings.model_path=\"titanet_large\"\n", "config.trainer.max_epochs = 5\n", - "config.trainer.strategy = None" + "config.trainer.strategy = auto" ] }, { diff --git a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb index dce8c46df1b0..f0ad1c19f5c9 100644 --- a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb +++ b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb @@ -475,7 +475,7 @@ "config.trainer.max_epochs = 10\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = None\n", + "config.trainer.strategy = auto\n", "\n", "# Remove augmentations\n", "config.model.train_ds.augmentor=None" From 1be2b400f3dfe6bd630379f356595029c0734d4a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 27 Sep 2023 19:56:52 +0800 Subject: [PATCH 063/112] unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sasha Meister --- requirements/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements/requirements.txt b/requirements/requirements.txt index 7481e337c999..a9a8c1e98100 100644 --- a/requirements/requirements.txt +++ b/requirements/requirements.txt @@ -5,7 +5,7 @@ onnx>=1.7.0 python-dateutil ruamel.yaml scikit-learn -setuptools==65.5.1 +setuptools>=65.5.1 tensorboard text-unidecode torch From 32c06fad9330da6ebb4eb6c583f7e1787b016970 Mon Sep 17 00:00:00 2001 From: Adi Renduchintala Date: Wed, 27 Sep 2023 09:05:29 -0700 Subject: [PATCH 064/112] remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu * mark autogenrated and remove it for test Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../data/language_modeling/megatron/gpt_sft_dataset.py | 5 +++++ .../models/language_modeling/megatron_gpt_sft_model.py | 9 +++++---- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py index 2c655d5cde6b..101201ef7536 100644 --- a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py +++ b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py @@ -171,8 +171,13 @@ def __getitem__(self, idx): # idx may < 0 because we pad_samples_to_global_batch_size, e.g. id = -1 if idx < 0: idx = len(self) + idx + auto_gen_idx = True + else: + auto_gen_idx = False try: example = self.indexed_dataset[idx] + if auto_gen_idx: + example['__AUTOGENERATED__'] = True except Exception as e: logging.error(f"Error while loading example {idx} from dataset {self.file_path}") raise e diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py index df6d792ea703..b38a81460097 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py @@ -492,7 +492,6 @@ def inference_epoch_end(self, outputs, mode, data_cfg): ) # Remove duplicate examples due to distributed sampler. - inp_label_set = set() deduplicated_outputs = { 'preds': [], 'labels': [], @@ -505,14 +504,16 @@ def inference_epoch_end(self, outputs, mode, data_cfg): for pred, label, input, metadata in zip( batch['preds'], batch['labels'], batch['inputs'], batch['metadata'] ): - key = input + label total_size += 1 - if key not in inp_label_set: - inp_label_set.add(key) + if not metadata.get("__AUTOGENERATED__", False): deduplicated_outputs['preds'].append(pred) deduplicated_outputs['labels'].append(label) deduplicated_outputs['inputs'].append(input) deduplicated_outputs['metadata'].append(metadata) + else: + logging.info( + f"skipping autogenerated example example {input} prediction {pred} label {label}" + ) # Compute metric score metric_name = self.val_metric_name if mode == 'validation' else self.test_metric_name From 3a1818fc8e5470cf12f1219c191493eb36f14671 Mon Sep 17 00:00:00 2001 From: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Date: Wed, 27 Sep 2023 15:21:06 -0400 Subject: [PATCH 065/112] Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_gpt_model.py | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 45586bffcdce..1f905430e87c 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -41,6 +41,7 @@ get_ltor_masks_and_position_ids, get_params_for_weight_decay_optimization, ) +from nemo.collections.nlp.modules.common.text_generation_strategy import TextGenerationStrategy from nemo.collections.nlp.modules.common.text_generation_utils import ( generate, get_computeprob_response, @@ -1176,6 +1177,8 @@ def generate( inputs: Union[List[str], torch.Tensor, List[dict]], length_params: LengthParam, sampling_params: SamplingParam = None, + *, + strategy: Optional[TextGenerationStrategy] = None, ) -> OutputType: # check whether the DDP is initialized @@ -1201,7 +1204,11 @@ def dummy(): if length_params is None: length_params = get_default_length_params() - return megatron_gpt_generate(self.cuda(), inputs, self.tokenizer, length_params, sampling_params) + strategy_args = {} if strategy is None else {"strategy": strategy} + + return megatron_gpt_generate( + self.cuda(), inputs, self.tokenizer, length_params, sampling_params, **strategy_args + ) def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int] = None) -> Any: inference_config = self.get_inference_config() From fd59a8447341ed989da6b076849f054a64bf835c Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 27 Sep 2023 14:45:17 -0700 Subject: [PATCH 066/112] Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan Co-authored-by: Kunal Dhawan Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister --- nemo/collections/asr/models/rnnt_models.py | 24 ++++++++++++++-------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/nemo/collections/asr/models/rnnt_models.py b/nemo/collections/asr/models/rnnt_models.py index 0c1da97c5012..8b5798d34356 100644 --- a/nemo/collections/asr/models/rnnt_models.py +++ b/nemo/collections/asr/models/rnnt_models.py @@ -772,7 +772,7 @@ def predict_step(self, batch, batch_idx, dataloader_idx=0): sample_id = sample_id.cpu().detach().numpy() return list(zip(sample_id, best_hyp_text)) - def validation_step(self, batch, batch_idx, dataloader_idx=0): + def validation_pass(self, batch, batch_idx, dataloader_idx=0): signal, signal_len, transcript, transcript_len = batch # forward() only performs encoder forward @@ -835,15 +835,21 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0): return tensorboard_logs + def validation_step(self, batch, batch_idx, dataloader_idx=0): + metrics = self.validation_pass(batch, batch_idx, dataloader_idx) + if type(self.trainer.val_dataloaders) == list and len(self.trainer.val_dataloaders) > 1: + self.validation_step_outputs[dataloader_idx].append(metrics) + else: + self.validation_step_outputs.append(metrics) + return metrics + def test_step(self, batch, batch_idx, dataloader_idx=0): - logs = self.validation_step(batch, batch_idx, dataloader_idx=dataloader_idx) - test_logs = { - 'test_wer_num': logs['val_wer_num'], - 'test_wer_denom': logs['val_wer_denom'], - # 'test_wer': logs['val_wer'], - } - if 'val_loss' in logs: - test_logs['test_loss'] = logs['val_loss'] + logs = self.validation_pass(batch, batch_idx, dataloader_idx=dataloader_idx) + test_logs = {name.replace("val_", "test_"): value for name, value in logs.items()} + if type(self.trainer.test_dataloaders) == list and len(self.trainer.test_dataloaders) > 1: + self.test_step_outputs[dataloader_idx].append(test_logs) + else: + self.test_step_outputs.append(test_logs) return test_logs def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): From 57116f417daae28bf1dcf345a2788d98da4b3d1a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 27 Sep 2023 17:37:53 -0700 Subject: [PATCH 067/112] gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister --- tutorials/asr/Self_Supervised_Pre_Training.ipynb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tutorials/asr/Self_Supervised_Pre_Training.ipynb b/tutorials/asr/Self_Supervised_Pre_Training.ipynb index 6f977df492a1..113979314da9 100644 --- a/tutorials/asr/Self_Supervised_Pre_Training.ipynb +++ b/tutorials/asr/Self_Supervised_Pre_Training.ipynb @@ -313,11 +313,11 @@ "if torch.cuda.is_available():\n", " cfg.trainer.accelerator = 'gpu'\n", " cfg.trainer.strategy = 'dp'\n", - " cfg.trainer.gpus = 1\n", + " cfg.trainer.devices = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", " cfg.trainer.strategy = auto\n", - " cfg.trainer.gpus = 0\n", + " cfg.trainer.devices = 0\n", "\n", "cfg.exp_manager.exp_dir = data_dir + \"/content/exp\"\n", "cfg.exp_manager.name = \"pre_trained\"\n", @@ -535,11 +535,11 @@ "if torch.cuda.is_available():\n", " cfg.trainer.accelerator = 'gpu'\n", " cfg.trainer.strategy = 'dp'\n", - " cfg.trainer.gpus = 1\n", + " cfg.trainer.devices = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", " cfg.trainer.strategy = auto\n", - " cfg.trainer.gpus = 0\n", + " cfg.trainer.devices = 0\n", "\n", "cfg.model.tokenizer.dir = data_dir + \"/tokenizers/an4/tokenizer_spe_unigram_v128/\" # note this is a directory, not a path to a vocabulary file\n", "cfg.model.tokenizer.type = \"bpe\"\n", From 40c8f08572702dac22d09739fe7cab29f976bb55 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 27 Sep 2023 18:00:26 -0700 Subject: [PATCH 068/112] Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index e4f04359e6eb..843c0c27df45 100644 --- a/Dockerfile +++ b/Dockerfile @@ -38,7 +38,7 @@ RUN apt-get update && \ libsndfile1 sox \ libfreetype6 \ swig \ - ffmpeg \ + ffmpeg=ffmpeg_5.1.2-3ubuntu1 \ libavdevice-dev && \ rm -rf /var/lib/apt/lists/* From 9bc02385aeddf7464a2419466135e20a60967341 Mon Sep 17 00:00:00 2001 From: meatybobby Date: Thu, 28 Sep 2023 10:12:55 -0700 Subject: [PATCH 069/112] PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu * list of fields for context Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Fix bug Signed-off-by: Cheng-Ping Hsieh * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh * Remove unused import Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Change assert logic Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh * Remove random truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui * code style warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui * add copyright header Signed-off-by: Chen Cui * fix code check warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * consolidate peft and sft scripts Signed-off-by: Chen Cui * update CI tests Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui * support pre-extracted checkpoints Signed-off-by: Chen Cui --------- Signed-off-by: jasonwan Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen Cui Co-authored-by: Chen Cui Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn Co-authored-by: jasonwan Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala Co-authored-by: Yuanzhe Dong Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister --- Jenkinsfile | 380 +---- docs/source/conf.py | 1 + docs/source/nlp/api.rst | 14 +- docs/source/nlp/nemo_megatron/intro.rst | 1 + .../nlp/nemo_megatron/peft/landing_page.rst | 35 + .../nlp/nemo_megatron/peft/quick_start.rst | 90 ++ .../nemo_megatron/peft/supported_methods.rst | 71 + .../megatron_t5_prompt_learning.py | 5 + .../megatron_t5_prompt_learning_eval.py | 5 + .../megatron_t5_seq2seq_eval.py | 8 +- .../megatron_t5_seq2seq_finetune.py | 8 +- .../conf/megatron_gpt_peft_eval_config.yaml | 36 +- .../conf/megatron_gpt_peft_tuning_config.yaml | 22 +- .../conf/megatron_t5_peft_eval_config.yaml | 213 +++ .../conf/megatron_t5_peft_tuning_config.yaml | 220 +++ .../tuning/megatron_gpt_adapter_eval.py | 5 + .../tuning/megatron_gpt_adapter_tuning.py | 5 + .../tuning/megatron_gpt_ia3_eval.py | 5 + .../tuning/megatron_gpt_ia3_tuning.py | 5 + .../tuning/megatron_gpt_peft_eval.py | 199 +-- .../tuning/megatron_gpt_peft_tuning.py | 231 +-- .../tuning/megatron_gpt_sft.py | 5 + .../tuning/megatron_t5_adapter_eval.py | 5 + .../tuning/megatron_t5_adapter_tuning.py | 5 + .../tuning/megatron_t5_ia3_eval.py | 5 + .../tuning/megatron_t5_ia3_tuning.py | 5 + .../tuning/megatron_t5_lora_eval.py | 5 + .../tuning/megatron_t5_lora_tuning.py | 5 + .../tuning/megatron_t5_peft_eval.py | 135 ++ .../tuning/megatron_t5_peft_tuning.py | 65 + .../megatron/t5_sft_dataset.py | 169 +++ .../language_modeling/megatron_glue_model.py | 6 +- .../language_modeling/megatron_gpt_model.py | 3 +- .../megatron_gpt_peft_models.py | 41 +- .../megatron_gpt_sft_model.py | 19 +- .../language_modeling/megatron_t0_model.py | 14 +- .../megatron_t5_adapter_model.py | 8 +- .../megatron_t5_prompt_learning_model.py | 14 +- ...tune_model.py => megatron_t5_sft_model.py} | 301 ++-- nemo/collections/nlp/models/nlp_model.py | 23 +- .../megatron/adapters/parallel_adapters.py | 25 +- .../megatron/token_level_encoder_decoder.py | 20 +- .../nlp/parts/megatron_trainer_builder.py | 26 +- nemo/collections/nlp/parts/mixins/__init__.py | 13 + .../nlp/parts/mixins/nlp_adapter_mixins.py | 484 ++++++ nemo/collections/nlp/parts/nlp_overrides.py | 162 +- nemo/collections/nlp/parts/peft_config.py | 190 +++ nemo/core/classes/mixins/adapter_mixins.py | 21 +- .../metric_calculation/peft_metric_calc.py | 33 +- tutorials/nlp/lora.ipynb | 1301 +++-------------- 50 files changed, 2509 insertions(+), 2158 deletions(-) create mode 100644 docs/source/nlp/nemo_megatron/peft/landing_page.rst create mode 100644 docs/source/nlp/nemo_megatron/peft/quick_start.rst create mode 100644 docs/source/nlp/nemo_megatron/peft/supported_methods.rst create mode 100644 examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_eval_config.yaml create mode 100644 examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_tuning_config.yaml create mode 100644 examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py create mode 100644 examples/nlp/language_modeling/tuning/megatron_t5_peft_tuning.py create mode 100644 nemo/collections/nlp/data/language_modeling/megatron/t5_sft_dataset.py rename nemo/collections/nlp/models/language_modeling/{megatron_finetune_model.py => megatron_t5_sft_model.py} (74%) create mode 100644 nemo/collections/nlp/parts/mixins/__init__.py create mode 100644 nemo/collections/nlp/parts/mixins/nlp_adapter_mixins.py create mode 100644 nemo/collections/nlp/parts/peft_config.py diff --git a/Jenkinsfile b/Jenkinsfile index 2abcdbcc5ddb..92aa65ae660b 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -669,188 +669,6 @@ pipeline { } } - // commented out temporarily to save time on github ci - //stage('L2: Megatron T5 Adapter PP=2') { - // when { - // anyOf { - // branch 'main' - // changeRequest target: 'main' - // } - // } - // failFast true - // parallel{ - // stage('T5 Adapter tuning & inference TP=1 PP=2') { - // steps { - // sh "python examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py \ - // --config-name=megatron_t5_adapter_tuning_config \ - // name='test_tp1_pp2' \ - // exp_manager.exp_dir='examples/adapter_tuning' \ - // trainer.devices=2 \ - // trainer.max_steps=1 \ - // trainer.val_check_interval=1 \ - // trainer.max_epochs=null \ - // model.data.num_workers=1 \ - // model.tensor_model_parallel_size=1 \ - // model.pipeline_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['rte'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.global_batch_size=4" - // sh "python examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py \ - // --config-name=megatron_t5_adapter_inference \ - // adapter_model_file='examples/adapter_tuning/test_tp1_pp2.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // trainer.devices=2 \ - // data.num_workers=1 \ - // tensor_model_parallel_size=1 \ - // pipeline_model_parallel_size=2 \ - // data.global_batch_size=2 \ - // data.micro_batch_size=2 \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // pred_file_path='examples/adapter_tuning/test_tp1_pp2/preds.txt'" - // sh "rm -rf examples/adapter_tuning/test_tp1_pp2.nemo" - // sh "rm -rf examples/adapter_tuning/test_tp1_pp2" - // } - // } - // } - //} - //stage('L2: Megatron T5 Adapter TP=2') { - // when { - // anyOf { - // branch 'main' - // changeRequest target: 'main' - // } - // } - // failFast true - // parallel{ - // stage('T5 Adapter tuning & inference TP=2 PP=1') { - // steps { - // sh "python examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py \ - // --config-name=megatron_t5_adapter_tuning_config \ - // name='test_tp2_pp1' \ - // exp_manager.exp_dir='examples/adapter_tuning' \ - // trainer.devices=2 \ - // trainer.max_steps=1 \ - // trainer.val_check_interval=1 \ - // trainer.max_epochs=null \ - // model.data.num_workers=1 \ - // model.tensor_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['rte'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // model.global_batch_size=4" - // sh "python examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py \ - // --config-name=megatron_t5_adapter_inference \ - // adapter_model_file='examples/adapter_tuning/test_tp2_pp1.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - // trainer.devices=2 \ - // tensor_model_parallel_size=2 \ - // data.global_batch_size=2 \ - // data.micro_batch_size=2 \ - // data.num_workers=1 \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // pred_file_path='examples/adapter_tuning/test_tp2_pp1/preds.txt'" - // sh "rm -rf examples/adapter_tuning/test_tp2_pp1.nemo" - // sh "rm -rf examples/adapter_tuning/test_tp2_pp1" - // } - // } - // } - //} - stage('L2: Megatron T5 IA3 PP=2') { - when { - anyOf { - branch 'main' - changeRequest target: 'main' - } - } - failFast true - parallel{ - stage('T5 IA3 tuning & inference TP=1 PP=2') { - steps { - sh "python examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py \ - --config-name=megatron_t5_ia3_tuning_config \ - name='test_tp1_pp2' \ - exp_manager.exp_dir='examples/ia3_tuning' \ - trainer.devices=2 \ - trainer.max_steps=1 \ - trainer.val_check_interval=1 \ - trainer.max_epochs=null \ - model.data.num_workers=1 \ - model.tensor_model_parallel_size=1 \ - model.pipeline_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['rte'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.global_batch_size=4" - // TODO: @eharper temporarily comment while investigating how to fix - // sh "python examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py \ - // --config-name=megatron_t5_ia3_inference \ - // adapter_model_file='examples/ia3_tuning/test_tp1_pp2.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // trainer.devices=2 \ - // data.num_workers=1 \ - // tensor_model_parallel_size=1 \ - // pipeline_model_parallel_size=2 \ - // data.global_batch_size=2 \ - // data.micro_batch_size=2 \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - // pred_file_path='examples/ia3_tuning/test_tp1_pp2/preds.txt'" - sh "rm -rf examples/ia3_tuning/test_tp1_pp2.nemo" - sh "rm -rf examples/ia3_tuning/test_tp1_pp2" - } - } - } - } - stage('L2: Megatron T5 IA3 TP=2') { - when { - anyOf { - branch 'main' - changeRequest target: 'main' - } - } - failFast true - parallel{ - stage('T5 IA3 tuning & inference TP=2 PP=1') { - steps { - sh "python examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py \ - --config-name=megatron_t5_ia3_tuning_config \ - name='test_tp2_pp1' \ - exp_manager.exp_dir='examples/ia3_tuning' \ - trainer.devices=2 \ - trainer.max_steps=1 \ - trainer.val_check_interval=1 \ - trainer.max_epochs=null \ - model.data.num_workers=1 \ - model.tensor_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['rte'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - model.global_batch_size=4" - sh "python examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py \ - --config-name=megatron_t5_ia3_inference \ - adapter_model_file='examples/ia3_tuning/test_tp2_pp1.nemo' \ - language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - trainer.devices=2 \ - data.num_workers=1 \ - tensor_model_parallel_size=2 \ - data.global_batch_size=2 \ - data.micro_batch_size=2 \ - data.test_ds=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl'] \ - pred_file_path='examples/ia3_tuning/test_tp2_pp1/preds.txt'" - sh "rm -rf examples/ia3_tuning/test_tp2_pp1.nemo" - sh "rm -rf examples/ia3_tuning/test_tp2_pp1" - } - } - } - } stage('L2: Speech Transcription') { when { @@ -3742,7 +3560,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' } failFast true steps { - sh "python examples/nlp/language_modeling/tuning/megatron_gpt_sft.py \ + sh "python examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py \ trainer.devices=2 \ trainer.log_every_n_steps=1 \ trainer.val_check_interval=2 \ @@ -3756,6 +3574,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' model.restore_from_path=/home/TestData/nlp/megatron_gpt/PP2/gpt_pp2_tp1.nemo \ model.optim.name=fused_adam \ model.optim.lr=2e-4 \ + model.peft.peft_scheme=null \ model.data.train_ds.micro_batch_size=1 \ model.data.train_ds.global_batch_size=4 \ model.data.train_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl,/home/TestData/nlp/megatron_sft/trec.jsonl] \ @@ -3770,7 +3589,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' model.data.validation_ds.num_workers=0 \ model.data.validation_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl] \ model.data.validation_ds.names=[quarel]" - sh "python examples/nlp/language_modeling/tuning/megatron_gpt_sft.py \ + sh "python examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py \ trainer.devices=2 \ trainer.log_every_n_steps=1 \ trainer.val_check_interval=1 \ @@ -3784,6 +3603,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' model.restore_from_path=/home/TestData/nlp/megatron_gpt/PP2/gpt_pp2_tp1.nemo \ model.optim.name=fused_adam \ model.optim.lr=2e-4 \ + model.peft.peft_scheme=null \ model.data.train_ds.micro_batch_size=1 \ model.data.train_ds.global_batch_size=4 \ model.data.train_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl,/home/TestData/nlp/megatron_sft/trec.jsonl] \ @@ -3870,14 +3690,15 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' model.data.validation_ds.names=[quarel]" sh "python examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py \ model.restore_from_path=/home/TestData/nlp/megatron_gpt/TP2/megatron_gpt_tp2.nemo \ - model.peft.restore_from_path=/home/TestData/nlp/lora_tuning_tp2/megatron_gpt_peft_tuning/checkpoints/megatron_gpt_peft_tuning.nemo \ + model.peft.restore_from_path=/home/TestData/nlp/lora_tuning_tp2/megatron_gpt_peft_lora_tuning/checkpoints/megatron_gpt_peft_lora_tuning.nemo \ model.peft.restore_from_ckpt_name=null \ model.peft.restore_from_hparams_path=null \ + model.tensor_model_parallel_size=2 \ trainer.devices=2 \ model.data.test_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel_4.jsonl] \ model.data.test_ds.names=['quarel4'] \ - model.data.test_ds.global_batch_size=1 \ - model.data.test_ds.micro_batch_size=1 \ + model.global_batch_size=2 \ + model.micro_batch_size=1 \ model.data.test_ds.tokens_to_generate=10 \ model.data.test_ds.write_predictions_to_file=True \ model.data.test_ds.output_file_path_prefix='/home/TestData/nlp/lora_tuning_tp2/out' \ @@ -4428,136 +4249,6 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' } } - // commented out to save time in github ci, we have tp>1 and pp>1 tests anyway @adithyare - //stage('L2: Megatron T5 Prompt Learning TP1 PP1') { - // when { - // anyOf { - // branch 'main' - // changeRequest target: 'main' - // } - // } - // failFast true - // parallel{ - // stage('T5 Prompt Learning TP=1 PP=1') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - // --config-name=megatron_t5_prompt_learning \ - // name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test' \ - // trainer.devices=1 \ - // trainer.max_steps=1 \ - // trainer.val_check_interval=1 \ - // trainer.max_epochs=null \ - // model.data.num_workers=1 \ - // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['squad'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.global_batch_size=4 \ - // model.micro_batch_size=4" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test" - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt' \ - // data.global_batch_size=4 \ - // data.micro_batch_size=4" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test.nemo" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt" - // } - // } - // } - //} - - stage('L2: Megatron T5 Prompt Learning TP2 PP1') { - when { - anyOf { - branch 'main' - changeRequest target: 'main' - } - } - failFast true - parallel{ - stage('T5 Prompt Learning TP=2 PP=1') { - steps { - sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - --config-name=megatron_t5_prompt_learning \ - name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2' \ - trainer.devices=2 \ - trainer.max_steps=1 \ - trainer.val_check_interval=1 \ - trainer.max_epochs=null \ - model.data.num_workers=1 \ - model.tensor_model_parallel_size=2 \ - model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - model.existing_tasks=[] \ - model.new_tasks=['squad'] \ - model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - model.global_batch_size=8 \ - model.micro_batch_size=8" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2" - sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \ - language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \ - data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \ - tensor_model_parallel_size=2 \ - trainer.devices=2 \ - data.global_batch_size=8 \ - data.micro_batch_size=8" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo" - sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt" - } - } - } - } - - // TODO: add when https://github.com/NVIDIA/apex/pull/1596 is merged - // stage('L2: Megatron T5 Prompt Learning TP1 PP2') { - // when { - // anyOf { - // branch 'main' - // changeRequest target: 'main' - // } - // } - // failFast true - // parallel{ - // stage('T5 Prompt Learning TP=1 PP=2') { - // steps { - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning.py \ - // --config-name=megatron_t5_prompt_learning \ - // name='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2' \ - // trainer.devices=2 \ - // trainer.max_steps=1 \ - // trainer.val_check_interval=1 \ - // trainer.max_epochs=null \ - // model.data.num_workers=1 \ - // model.pipeline_model_parallel_size=2 \ - // model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // model.existing_tasks=[] \ - // model.new_tasks=['squad'] \ - // model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.data.validation_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // model.global_batch_size=8 \ - // model.micro_batch_size=8" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2" - // sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \ - // virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo' \ - // language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp1_pp2.nemo' \ - // data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \ - // pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt' \ - // tensor_model_parallel_size=2 \ - // trainer.devices=2 \ - // data.global_batch_size=8 \ - // data.micro_batch_size=8" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2.nemo" - // sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_pp2_preds.txt" - // } - // } - // } - // } stage('L2: Megatron UL2 Pretraining and Resume Training TP=2') { when { anyOf { @@ -4870,6 +4561,61 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' } } } + + stage('L2: Megatron T5 PEFT Lora TP=2') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + steps { + sh "rm -rf /home/TestData/nlp/t5_lora_tuning_tp2" + sh "python examples/nlp/language_modeling/tuning/megatron_t5_peft_tuning.py \ + trainer.devices=2 \ + trainer.log_every_n_steps=1 \ + trainer.max_epochs=9999 \ + trainer.max_steps=3 \ + trainer.val_check_interval=3 \ + ++trainer.limit_val_batches=2 \ + trainer.precision=16 \ + exp_manager.exp_dir=/home/TestData/nlp/t5_lora_tuning_tp2 \ + model.pipeline_model_parallel_size=1 \ + model.tensor_model_parallel_size=2 \ + model.restore_from_path=/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo \ + model.peft.peft_scheme='lora' \ + model.answer_only_loss=True \ + model.micro_batch_size=1 \ + model.global_batch_size=1 \ + model.data.train_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl] \ + model.data.train_ds.concat_sampling_probabilities=[1.0] \ + model.data.train_ds.num_workers=0 \ + model.data.validation_ds.num_workers=0 \ + model.data.validation_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl] \ + model.data.validation_ds.names=[quarel]" + sh "python examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py \ + model.restore_from_path=/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo \ + model.peft.restore_from_path=/home/TestData/nlp/t5_lora_tuning_tp2/megatron_t5_peft_lora_tuning/checkpoints/megatron_t5_peft_lora_tuning.nemo \ + model.peft.restore_from_ckpt_name=null \ + model.peft.restore_from_hparams_path=null \ + model.tensor_model_parallel_size=2 \ + trainer.devices=2 \ + model.data.test_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel_4.jsonl] \ + model.data.test_ds.names=['quarel4'] \ + model.global_batch_size=2 \ + model.micro_batch_size=1 \ + model.data.test_ds.tokens_to_generate=10 \ + model.data.test_ds.write_predictions_to_file=True \ + model.data.test_ds.output_file_path_prefix='/home/TestData/nlp/t5_lora_tuning_tp2/out' \ + inference.greedy=True \ + inference.repetition_penalty=1.0 \ + inference.outfile_path='/home/TestData/nlp/t5_lora_tuning_tp2/out.jsonl'" + sh "rm -rf /home/TestData/nlp/t5_lora_tuning_tp2" + } + } + + stage('L2: Megatron Mock Data Generation') { when { anyOf { diff --git a/docs/source/conf.py b/docs/source/conf.py index 0765f8940ab0..c54defb59ce8 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -52,6 +52,7 @@ 'attr', # attrdict in requirements, attr in import 'torchmetrics', # inherited from PTL 'lightning_utilities', # inherited from PTL + 'lightning_fabric', 'apex', 'megatron.core', 'transformer_engine', diff --git a/docs/source/nlp/api.rst b/docs/source/nlp/api.rst index b13dedca300f..33709bd05a19 100755 --- a/docs/source/nlp/api.rst +++ b/docs/source/nlp/api.rst @@ -81,7 +81,6 @@ Modules .. autoclass:: nemo.collections.nlp.modules.common.megatron.module.Float16Module :show-inheritance: - .. autoclass:: nemo.collections.nlp.models.language_modeling.megatron.gpt_model.GPTModel :show-inheritance: :no-members: @@ -140,11 +139,22 @@ Datasets .. autoclass:: nemo.collections.nlp.data.language_modeling.megatron.ul2_dataset.UL2Dataset :show-inheritance: + +Adapter Mixin Class +------------------------- + +.. autoclass:: nemo.collections.nlp.parts.mixins.nlp_adapter_mixins.NLPAdapterModelMixin + :show-inheritance: + :members: add_adapter, load_adapters, merge_cfg_with, merge_inference_cfg + :exclude-members: first_stage_of_pipeline, tie_weights, get_peft_state_dict, state_dict, sharded_state_dict, load_state_dict, on_load_checkpoint + :member-order: bysource + + Exportable Model Classes ------------------------- .. autoclass:: nemo.collections.nlp.models.language_modeling.megatron_gpt_model.MegatronGPTExportableModel - :show-inheritance: + :show-inheritance: .. toctree:: :maxdepth: 1 diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst index 7525c778b974..c1b158d77e3e 100644 --- a/docs/source/nlp/nemo_megatron/intro.rst +++ b/docs/source/nlp/nemo_megatron/intro.rst @@ -25,6 +25,7 @@ team at NVIDIA. NeMo Megatron supports several types of models: prompt_learning retro/retro_model hiddens/hiddens_module + peft/landing_page References diff --git a/docs/source/nlp/nemo_megatron/peft/landing_page.rst b/docs/source/nlp/nemo_megatron/peft/landing_page.rst new file mode 100644 index 000000000000..c90dcdfff1c5 --- /dev/null +++ b/docs/source/nlp/nemo_megatron/peft/landing_page.rst @@ -0,0 +1,35 @@ +Parameter-Efficient Fine-Tuning (PEFT) +====================================== + +PEFT is a popular technique used to efficiently finetune large language +models for use in various downstream tasks. When finetuning with PEFT, +the base model weights are frozen, and a few trainable adapter modules +are injected into the model, resulting in a very small number (<< 1%) of +trainble weights. With carefully chosen adapter modules and injection +points, PEFT achieves comparable performance to full finetuning at a +fraction of the computational and storage costs. + +NeMo supports four PEFT methods which can be used with various +transformer-based models. + +==================== ===== ===== ========= == +\ GPT 3 NvGPT LLaMa 1/2 T5 +==================== ===== ===== ========= == +Adapters (Canonical) ✅ ✅ ✅ ✅ +LoRA ✅ ✅ ✅ ✅ +IA3 ✅ ✅ ✅ ✅ +P-Tuning ✅ ✅ ✅ ✅ +==================== ===== ===== ========= == + +Learn more about PEFT in NeMo with the :ref:`peftquickstart` which provides an overview on how PEFT works +in NeMo. Read about the supported PEFT methods +`here `__. For a practical example, take a look at +the `Step-by-step Guide `__. + +The API guide can be found `here <../../api.html#adapter-mixin-class>`__ + +.. toctree:: + :maxdepth: 1 + + quick_start + supported_methods \ No newline at end of file diff --git a/docs/source/nlp/nemo_megatron/peft/quick_start.rst b/docs/source/nlp/nemo_megatron/peft/quick_start.rst new file mode 100644 index 000000000000..000e242b9508 --- /dev/null +++ b/docs/source/nlp/nemo_megatron/peft/quick_start.rst @@ -0,0 +1,90 @@ +.. _peftquickstart: + + +Quick Start Guide +================= + +The quick start guide provides an overview of a PEFT workflow in NeMo. + +Terminology: PEFT vs Adapter +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This tutorial uses "PEFT" to describe the overall parameter efficient +finetuning method, and "adapter" to describe the additional module +injected to a frozen base model. Each PEFT model can use one or more +types of adapters. + +One of the PEFT methods is sometimes referred to as "adapters", because +it was one of the first proposed usage of adapter modules for NLP. This +PEFT method will be called the "canonical" adapters to distinguish the +two usages. + +How PEFT work in NeMo models +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Each PEFT method has one or more types of adapters that need to be +injected into the base model. In NeMo models, the adapter logic and +adapter weights are already built into the submodules, but they are +disabled by default for ordinary training and fine-tuning. + +When doing PEFT, the adapter logic path can be enabled when +``model.add_adapter(peft_cfg)`` is called. In this function, the model +scans through each adapter applicable to the current PEFT method with +each of its submodules in order to find adapter logic paths that can be +enabled. Then, the base models weights are frozen, while newly added +adapter weights are unfrozen and allowed to be updated during +fine-tuning, hence achieving efficiency in the number of parameters +finetuned. + +PEFT config classes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Each PEFT method is specified by a ``PEFTConfig`` class which stores the +types of adapters applicable to the PEFT method, as well as +hyperparameters required to initialize these adapter modules. These four +PEFT methods are currently supported: + +1. Adapters (canonical): ``CanonicalAdaptersPEFTConfig`` +2. LoRA: ``LoraPEFTConfig`` +3. IA3: ``IA3PEFTConfig`` +4. P-Tuning: ``PtuningPEFTConfig`` + +These config classes make experimenting with different adapters as easy +as changing the config class. + +Moreover, it is possible to use a combination of the PEFT methods in +NeMo since they are orthogonal to each other. This can be easily done by +passing in a list of ``PEFTConfig`` objects to ``add_adapter`` instead +of a single one. For example, a common workflow is to combine P-Tuning +and Adapter, and this can be achieved with +``model.add_adapter([PtuningPEFTConfig(model_cfg), CanonicalAdaptersPEFTConfig(model_cfg)])`` + +Base model classes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +PEFT in NeMo is built with a mix-in class that does not belong to any +model in particular. This means that the same interface is available to +different NeMo models. Currently, NeMo supports PEFT for GPT-style +models such as GPT 3, NvGPT, LLaMa 1/2 (``MegatronGPTSFTModel``), as +well as T5 (``MegatronT5SFTModel``). + +Full finetuning vs PEFT +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +You can switch between full fine-tuning and PEFT by removing calls to +``add_adapter`` and ``load_adapter``. + +The code snippet below illustrates the core API of full fine-tuning and +PEFT. + +.. code:: diff + + trainer = MegatronTrainerBuilder(config).create_trainer() + model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config) + + model = MegatronGPTSFTModel.restore_from(restore_path, model_cfg, trainer) # restore from pretrained ckpt + + peft_cfg = LoRAPEFTConfig(model_cfg) + + model.add_adapter(peft_cfg) + trainer.fit(model) # saves adapter weights only + + # Restore from base then load adapter API + model = MegatronGPTSFTModel.restore_from(restore_path, trainer, model_cfg) + + model.load_adapters(adapter_save_path, peft_cfg) + model.freeze() + trainer.predict(model) diff --git a/docs/source/nlp/nemo_megatron/peft/supported_methods.rst b/docs/source/nlp/nemo_megatron/peft/supported_methods.rst new file mode 100644 index 000000000000..4479565be6aa --- /dev/null +++ b/docs/source/nlp/nemo_megatron/peft/supported_methods.rst @@ -0,0 +1,71 @@ + + +Supported PEFT methods +---------------------- + +NeMo supports the following PFET tuning methods + +1. **Adapters (Canonical)**: `Parameter-Efficient Transfer Learning for + NLP `__ + + - Adapters (Houlsby setup) is one of the first PEFT methods applied + to NLP. Adapter tuning is more efficient than full fine-tuning + because the base model weights are frozen, while only a small + number of adapter module weights are updated. In this method, two + linear layers with a bottleneck and a non-linear activation are + inserted into each transformer layer via a residual connection. In + each case, the output linear layer is initialized to 0 to ensure + that an untrained adapter does not affect the normal forward pass + of the transformer layer. + +2. **LoRA**: `LoRA: Low-Rank Adaptation of Large Language + Models `__ + + - LoRA makes fine-tuning efficient by representing weight updates + with two low rank decomposition matrices. The original model + weights remain frozen, while the low rank decomposition matrices + are updated to adapt to the new data , so the number of trainable + parameters is kept low. In contrast with adapters, the original + model weights and adapted weights can be combined during + inference, avoiding any architectural change or additional latency + in the model at inference time. + - The matrix decomposition operation can be applied to any linear + layer, but in practice, it is only applied to the K, Q, V + projection matrices (sometimes just applied to the Q,V layers). + Since NeMo's attention implementation fuses KQV into a single + projection, our LoRA implementation learns a single Low-Rank + projection for KQV in a combined fashion. + +3. **IA3**: `Few-Shot Parameter-Efficient Fine-Tuning is Better and + Cheaper than In-Context Learning `__ + + - IA3 makes fine-tuning efficient by rescaling activations with + learned vectors. The rescaling layers are injected in the + attention (for key and value) and feedforward modules in the base + model. Similar to other PEFT methods, only the rescaling vectors + are updated during fine-tuning to adapt to the new data so the + number of updated parameters is low. However, since rescaling + vectors are much smaller than low rank matrices (LoRA) and + bottleneck layers (Adapters), IA3 cuts down the number of + trainable parameters further by an order of magnitude. The + learning rescaling vectors can also be merged with the base + weights, leading to no architectural change and no additional + latency at inference time. + +4. **P-Tuning**: `GPT Understands, + Too `__ + + - P-tuning is an example of the prompt learning family of methods, + in which trainable virtual tokens are inserted into the model + input prompt to induce it to perform a task. Virtual tokens (also + called "continuous" or "soft" tokens) are embeddings that have no + concrete mapping to strings or characters within the model’s + vocabulary. They are simply 1D vectors that match the + dimensionality of real tokens which make up the model's + vocabulary. + - In p-tuning, an intermediate LSTM or MLP model is used to generate + virtual token embeddings. We refer to this intermediate model as + our ``prompt_encoder``. The prompt encoder parameters are randomly + initialized at the start of p-tuning. All base model parameters + are frozen, and only the prompt encoder weights are updated at + each training step. diff --git a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py index ba335e39c225..3edca99e15a5 100644 --- a/examples/nlp/language_modeling/megatron_t5_prompt_learning.py +++ b/examples/nlp/language_modeling/megatron_t5_prompt_learning.py @@ -29,6 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -43,6 +44,10 @@ """ +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_prompt_learning.yaml") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py b/examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py index 3b932e99ced3..67640138b3ff 100644 --- a/examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py +++ b/examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py @@ -24,6 +24,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState +from nemo.utils.decorators import deprecated try: from megatron.core import parallel_state @@ -37,6 +38,10 @@ raise EnvironmentError("GPU is needed for the inference") +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_prompt_learning_inference") def main(cfg) -> None: diff --git a/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py b/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py index f00b37612663..6a5ab9fbf481 100644 --- a/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py +++ b/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py @@ -18,9 +18,9 @@ from pytorch_lightning.plugins.environments import TorchElasticEnvironment from pytorch_lightning.plugins.precision import MixedPrecisionPlugin -from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel from nemo.collections.nlp.models.language_modeling.megatron_glue_model import MegatronT5GLUEModel from nemo.collections.nlp.models.language_modeling.megatron_t0_model import MegatronT0Model +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel from nemo.collections.nlp.parts.nlp_overrides import GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils import logging @@ -122,13 +122,13 @@ def main(cfg) -> None: model = load_from_checkpoint_dir(MegatronT0Model, cfg, trainer, modify_confg_fn=_modify_config) else: if cfg.model.restore_from_path: - t5_cfg = MegatronT5FinetuneModel.restore_from( + t5_cfg = MegatronT5SFTModel.restore_from( restore_path=cfg.model.restore_from_path, trainer=trainer, return_config=True ) - model = load_from_nemo(MegatronT5FinetuneModel, cfg, trainer, t5_cfg, modify_confg_fn=_modify_config) + model = load_from_nemo(MegatronT5SFTModel, cfg, trainer, t5_cfg, modify_confg_fn=_modify_config) else: validate_checkpoint_loading_args(cfg.model.pretrained_checkpoint) - model = load_from_checkpoint_dir(MegatronT5FinetuneModel, cfg, trainer, modify_confg_fn=_modify_config) + model = load_from_checkpoint_dir(MegatronT5SFTModel, cfg, trainer, modify_confg_fn=_modify_config) model.freeze() trainer.validate(model) diff --git a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py index e8e4ff0f9868..59ca94082f26 100644 --- a/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py +++ b/examples/nlp/language_modeling/megatron_t5_seq2seq_finetune.py @@ -21,9 +21,9 @@ from pytorch_lightning.plugins.environments import TorchElasticEnvironment from pytorch_lightning.trainer.connectors.checkpoint_connector import _CheckpointConnector -from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel from nemo.collections.nlp.models.language_modeling.megatron_glue_model import MegatronT5GLUEModel from nemo.collections.nlp.models.language_modeling.megatron_t0_model import MegatronT0Model +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel from nemo.collections.nlp.parts.nlp_overrides import ( CustomProgressBar, @@ -207,13 +207,13 @@ def main(cfg) -> None: model = load_from_checkpoint_dir(MegatronT0Model, cfg, trainer, modify_confg_fn=_modify_config) else: if cfg.model.restore_from_path: - t5_cfg = MegatronT5FinetuneModel.restore_from( + t5_cfg = MegatronT5SFTModel.restore_from( restore_path=cfg.model.restore_from_path, trainer=trainer, return_config=True ) - model = load_from_nemo(MegatronT5FinetuneModel, cfg, trainer, t5_cfg, modify_confg_fn=_modify_config) + model = load_from_nemo(MegatronT5SFTModel, cfg, trainer, t5_cfg, modify_confg_fn=_modify_config) else: validate_checkpoint_loading_args(cfg.model.pretrained_checkpoint) - model = load_from_checkpoint_dir(MegatronT5FinetuneModel, cfg, trainer, modify_confg_fn=_modify_config) + model = load_from_checkpoint_dir(MegatronT5SFTModel, cfg, trainer, modify_confg_fn=_modify_config) trainer.fit(model) trainer.validate(model) diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml index ffa5385a7d6c..0409e0fec0aa 100755 --- a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml +++ b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml @@ -26,7 +26,7 @@ exp_manager: resume_ignore_no_checkpoint: True create_checkpoint_callback: True checkpoint_callback_params: - monitor: validation_${model.data.validation_ds.metric.name} + monitor: validation_${model.data.test_ds.metric.name} save_top_k: 1 mode: max save_nemo_on_train_end: True @@ -39,12 +39,12 @@ model: seed: 1234 tensor_model_parallel_size: 1 # intra-layer model parallelism pipeline_model_parallel_size: 1 # inter-layer model parallelism - + global_batch_size: 1 micro_batch_size: 1 restore_from_path: ??? # Path to an existing .nemo model you wish to add new tasks to or run inference with resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. - save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training. + save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training. sync_batch_comm: False megatron_amp_O2: False @@ -53,8 +53,8 @@ model: # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details. sequence_parallel: False - ## Activation Checkpoint - activations_checkpoint_granularity: null # 'selective' or 'full' + ## Activation Checkpoint + activations_checkpoint_granularity: null # 'selective' or 'full' activations_checkpoint_method: null # 'uniform', 'block', not used with 'selective' # 'uniform' divides the total number of transformer layers and checkpoints the input activation # of each chunk at the specified granularity @@ -73,17 +73,29 @@ model: restore_from_path: null restore_from_ckpt_name: null restore_from_hparams_path: null - + # Used for adapter peft training adapter_tuning: type: 'parallel_adapter' # this should be either 'parallel_adapter' or 'linear_adapter' adapter_dim: 32 adapter_dropout: 0.0 - norm_position: 'pre' # This can be set to 'pre' or 'post', 'pre' is normally what is used. + norm_position: 'pre' # This can be set to 'pre', 'post' or null, 'pre' is normally what is used. column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal norm_type: 'mixedfusedlayernorm' # IGNORED if layer_adapter is used, options are ['layernorm', 'mixedfusedlayernorm'] - + layer_selection: null # selects in which layers to add adapters, e.g. [1,12] will add adapters to layer 1 (lowest) and 12. null will apply adapters to all layers + weight_tying: False + position_embedding_strategy: null # used only when weight_tying is True + + lora_tuning: + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal + row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal + layer_selection: null # selects in which layers to add lora adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers + weight_tying: False + position_embedding_strategy: null # used only when weight_tying is True + # Used for p-tuning peft training p_tuning: virtual_tokens: 10 # The number of virtual tokens the prompt encoder should add at the start of the sequence @@ -91,18 +103,22 @@ model: embedding_dim: 1024 # the size of the prompt encoder embeddings init_std: 0.023 + ia3_tuning: + layer_selection: null # selects in which layers to add ia3 adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers + data: test_ds: file_names: ??? # Path to a list of JSONL files corresponding to the source data. Data format is identical to train_ds. names: ??? # Names of the corresponding datasets used to log metrics. - global_batch_size: ??? - micro_batch_size: ??? + global_batch_size: 1 + micro_batch_size: 1 shuffle: False num_workers: 0 pin_memory: True max_seq_length: 2048 min_seq_length: 1 drop_last: False + context_key: 'input' label_key: ${data.train_ds.label_key} add_eos: ${data.train_ds.add_eos} add_sep: ${data.train_ds.add_sep} diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml index b020c1aa49ad..6b6af4b1c81b 100755 --- a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml +++ b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml @@ -1,4 +1,4 @@ -name: megatron_gpt_peft_tuning +name: megatron_gpt_peft_${model.peft.peft_scheme}_tuning trainer: devices: 1 @@ -10,7 +10,7 @@ trainer: use_distributed_sampler: False max_epochs: 9999 max_steps: 20000 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches - log_every_n_steps: 10 # frequency with which training steps are logged + log_every_n_steps: 10 # frequency with which training steps are logged val_check_interval: 200 # If is an int n > 1, will run val every n training steps, if a float 0.0 - 1.0 will run val every epoch fraction, e.g. 0.25 will run val every quarter epoch gradient_clip_val: 1.0 @@ -47,12 +47,12 @@ model: seed: 1234 tensor_model_parallel_size: 1 # intra-layer model parallelism pipeline_model_parallel_size: 1 # inter-layer model parallelism - + global_batch_size: 128 micro_batch_size: 4 restore_from_path: ??? # Path to an existing .nemo model you wish to add new tasks to or run inference with resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. - save_nemo_on_validation_end: False # Saves an inference ready .nemo file every time a checkpoint is saved during training. + save_nemo_on_validation_end: False # Saves an inference ready .nemo file every time a checkpoint is saved during training. sync_batch_comm: False megatron_amp_O2: False @@ -61,8 +61,8 @@ model: # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details. sequence_parallel: False - ## Activation Checkpoint - activations_checkpoint_granularity: null # 'selective' or 'full' + ## Activation Checkpoint + activations_checkpoint_granularity: null # 'selective' or 'full' activations_checkpoint_method: null # 'uniform', 'block', not used with 'selective' # 'uniform' divides the total number of transformer layers and checkpoints the input activation # of each chunk at the specified granularity @@ -79,7 +79,7 @@ model: peft: peft_scheme: "adapter" # can be either adapter,ia3, or ptuning restore_from_path: null - + # Used for adapter peft training adapter_tuning: type: 'parallel_adapter' # this should be either 'parallel_adapter' or 'linear_adapter' @@ -92,7 +92,7 @@ model: layer_selection: null # selects in which layers to add adapters, e.g. [1,12] will add adapters to layer 1 (lowest) and 12. null will apply adapters to all layers weight_tying: False position_embedding_strategy: null # used only when weight_tying is True - + lora_tuning: adapter_dim: 32 adapter_dropout: 0.0 @@ -101,21 +101,21 @@ model: layer_selection: null # selects in which layers to add lora adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers weight_tying: False position_embedding_strategy: null # used only when weight_tying is True - + # Used for p-tuning peft training p_tuning: virtual_tokens: 10 # The number of virtual tokens the prompt encoder should add at the start of the sequence bottleneck_dim: 1024 # the size of the prompt encoder mlp bottleneck embedding_dim: 1024 # the size of the prompt encoder embeddings init_std: 0.023 - + ia3_tuning: layer_selection: null # selects in which layers to add ia3 adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers data: train_ds: # Example of how to specify paths to multiple datasets - # file_names: + # file_names: # - /path/to/squad.jsonl # - /path/to/mnli.jsonl # - /path/to/boolq.jsonl diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_eval_config.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_eval_config.yaml new file mode 100644 index 000000000000..0ef90a2343fa --- /dev/null +++ b/examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_eval_config.yaml @@ -0,0 +1,213 @@ +name: megatron_t5_peft_${model.peft.peft_scheme}_tuning + +trainer: + devices: 1 + accelerator: gpu + num_nodes: 1 + precision: 16 + logger: False # logger provided by exp_manager + enable_checkpointing: False + use_distributed_sampler: False + max_epochs: 9999 + max_steps: 20000 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches + log_every_n_steps: 10 # frequency with which training steps are logged + val_check_interval: 200 # If is an int n > 1, will run val every n training steps, if a float 0.0 - 1.0 will run val every epoch fraction, e.g. 0.25 will run val every quarter epoch + gradient_clip_val: 1.0 + +exp_manager: + explicit_log_dir: null + exp_dir: null + name: ${name} + create_wandb_logger: False + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: True + resume_ignore_no_checkpoint: True + create_checkpoint_callback: True + checkpoint_callback_params: + monitor: validation_${model.data.test_ds.metric.name} + save_top_k: 1 + mode: max + save_nemo_on_train_end: True + filename: '${name}--{${exp_manager.checkpoint_callback_params.monitor}:.3f}-{step}-{consumed_samples}' + model_parallel_size: ${model.tensor_model_parallel_size} + always_save_nemo: True + save_best_model: False + +model: + seed: 1234 + tensor_model_parallel_size: 1 # intra-layer model parallelism + pipeline_model_parallel_size: 1 # inter-layer model parallelism + + global_batch_size: 1 + micro_batch_size: 1 + restore_from_path: ??? # Path to an existing .nemo model you wish to add new tasks to or run inference with + resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. + save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training. + sync_batch_comm: False + megatron_amp_O2: False + + ## Sequence Parallelism + # Makes tensor parallelism more memory efficient for LLMs (20B+) by parallelizing layer norms and dropout sequentially + # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details. + sequence_parallel: False + + ## Activation Checkpoint + activations_checkpoint_granularity: null # 'selective' or 'full' + activations_checkpoint_method: null # 'uniform', 'block', not used with 'selective' + # 'uniform' divides the total number of transformer layers and checkpoints the input activation + # of each chunk at the specified granularity + # 'block' checkpoints the specified number of layers per pipeline stage at the specified granularity + activations_checkpoint_num_layers: null # not used with 'selective' + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: False # not used right now + gradient_as_bucket_view: False + + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + + peft: + peft_scheme: "adapter" # can be either adapter,ia3, or ptuning + restore_from_path: null + restore_from_ckpt_name: null + restore_from_hparams_path: null + + # Used for adapter peft training + adapter_tuning: + type: 'parallel_adapter' # this should be either 'parallel_adapter' or 'linear_adapter' + adapter_dim: 32 + adapter_dropout: 0.0 + norm_position: 'pre' # This can be set to 'pre', 'post' or null, 'pre' is normally what is used. + column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal + row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal + norm_type: 'mixedfusedlayernorm' # IGNORED if layer_adapter is used, options are ['layernorm', 'mixedfusedlayernorm'] + layer_selection: null # selects in which layers to add adapters, e.g. [1,12] will add adapters to layer 1 (lowest) and 12. null will apply adapters to all layers + weight_tying: False + position_embedding_strategy: null # used only when weight_tying is True + + lora_tuning: + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal + row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal + layer_selection: null # selects in which layers to add lora adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers + weight_tying: False + position_embedding_strategy: null # used only when weight_tying is True + + # Used for p-tuning peft training + p_tuning: + virtual_tokens: 10 # The number of virtual tokens the prompt encoder should add at the start of the sequence + bottleneck_dim: 1024 # the size of the prompt encoder mlp bottleneck + embedding_dim: 1024 # the size of the prompt encoder embeddings + init_std: 0.023 + + ia3_tuning: + layer_selection: null # selects in which layers to add ia3 adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers + + data: + test_ds: + file_names: ??? # Path to a list of JSONL files corresponding to the source data. Data format is identical to train_ds. + names: ??? # Names of the corresponding datasets used to log metrics. + global_batch_size: 1 #${model.global_batch_size} + micro_batch_size: 1 #${model.micro_batch_size} + shuffle: False + num_workers: 0 + pin_memory: True + max_seq_length: 2048 + min_seq_length: 1 + drop_last: False + context_key: 'input' + label_key: ${data.train_ds.label_key} + add_eos: ${data.train_ds.add_eos} + add_sep: ${data.train_ds.add_sep} + add_bos: ${data.train_ds.add_bos} + separate_prompt_and_response_with_newline: ${data.train_ds.separate_prompt_and_response_with_newline} + write_predictions_to_file: False + output_file_path_prefix: null # Prefix of the file to write predictions to. + truncation_field: ${data.train_ds.truncation_field} # Options: keys in prompt_template index_mapping_dir: null # Path to a directory to write index mapping files. + index_mapping_dir: null # Path to a directory to write index mapping files. + prompt_template: ${data.train_ds.prompt_template} + tokens_to_generate: 32 # decide how many tokens we want to generate to evaluate performance with string metrics + + metric: + name: "loss" # Name of the evaluation metric to use. Options: ['exact_string_match', 'loss'] + average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. + num_classes: null + +inference: + greedy: True # Whether or not to use sampling ; use greedy decoding otherwise + top_k: 0 # The number of highest probability vocabulary tokens to keep for top-k-filtering. + top_p: 0.9 # If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation. + temperature: 1.0 # sampling temperature + all_probs: False # whether return the log prob for all the tokens in vocab + repetition_penalty: 1.2 # The parameter for repetition penalty. 1.0 means no penalty. + min_tokens_to_generate: 0 # The minimum length of the sequence to be generated. + compute_logprob: False # a flag used to compute logprob of all the input text, a very special case of running inference, default False + outfile_path: output.txt + compute_attention_mask: True + +# server-related configs +server: False # whether launch the API server +port: 5555 # the port number for the inference server +web_server: False # whether launch the web inference server +share: True # whether create a public URL +username: test # user name for web client +password: test2 # password for web client +web_port: 9889 # the port number of the web server 1058 +chat: False # use the chat interface +chatbot_config: + value: False # whether to inject the value attributes + attributes: + - name: Quality + min: 0 + max: 4 + key: quality + type: int + default: 4 + - name: Toxicity + min: 0 + max: 4 + key: toxcity + type: int + default: 0 + - name: Humor + min: 0 + max: 4 + key: humor + type: int + default: 0 + - name: Creativity + min: 0 + max: 4 + key: creativity + type: int + default: 0 + - name: Violence + min: 0 + max: 4 + key: violence + type: int + default: 0 + - name: Helpfulness + min: 0 + max: 4 + key: helpfulness + type: int + default: 4 + - name: Not_Appropriate + min: 0 + max: 4 + key: not_appropriate + type: int + default: 0 + - name: Language + choices: ['ar', 'bg', 'bn', 'ca', 'cs', 'da', 'de', 'el', 'en', 'eo', 'es', 'eu', 'fa', 'fi', 'fr', 'gl', 'he', 'hu', 'id', 'it', 'ja', 'ko', 'nb', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk', 'sv', 'th', 'tr', 'uk', 'vi', 'zh'] + key: lang + type: list + default: en + + user: User + assistant: Assistant + system: "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n" \ No newline at end of file diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_tuning_config.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_tuning_config.yaml new file mode 100644 index 000000000000..d0ee2c417856 --- /dev/null +++ b/examples/nlp/language_modeling/tuning/conf/megatron_t5_peft_tuning_config.yaml @@ -0,0 +1,220 @@ +name: megatron_t5_peft_${model.peft.peft_scheme}_tuning + +trainer: + devices: 1 + accelerator: gpu + num_nodes: 1 + precision: 16 + logger: False # logger provided by exp_manager + enable_checkpointing: False + use_distributed_sampler: False + max_epochs: 9999 + max_steps: 20000 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches + log_every_n_steps: 10 # frequency with which training steps are logged + val_check_interval: 200 # If is an int n > 1, will run val every n training steps, if a float 0.0 - 1.0 will run val every epoch fraction, e.g. 0.25 will run val every quarter epoch + gradient_clip_val: 1.0 + +exp_manager: + explicit_log_dir: null + exp_dir: null + name: ${name} + create_wandb_logger: False + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: True + resume_ignore_no_checkpoint: True + create_checkpoint_callback: True + checkpoint_callback_params: + monitor: validation_${model.data.validation_ds.metric.name} + save_top_k: 1 + mode: min + save_nemo_on_train_end: True + filename: '${name}--{${exp_manager.checkpoint_callback_params.monitor}:.3f}-{step}-{consumed_samples}' + model_parallel_size: ${model.tensor_model_parallel_size} + always_save_nemo: False + save_best_model: True + create_early_stopping_callback: True + early_stopping_callback_params: + monitor: "val_loss" + mode: "min" + min_delta: 0.001 + patience: 10 + verbose: True + strict: False # Should be False to avoid a runtime error where EarlyStopping says monitor is unavailable, which sometimes happens with resumed training. + +model: + seed: 1234 + tensor_model_parallel_size: 1 # intra-layer model parallelism + pipeline_model_parallel_size: 1 # inter-layer model parallelism + + global_batch_size: 128 + micro_batch_size: 4 + restore_from_path: ??? # Path to an existing .nemo model you wish to add new tasks to or run inference with + resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. + save_nemo_on_validation_end: False # Saves an inference ready .nemo file every time a checkpoint is saved during training. + sync_batch_comm: False + megatron_amp_O2: False + + ## Sequence Parallelism + # Makes tensor parallelism more memory efficient for LLMs (20B+) by parallelizing layer norms and dropout sequentially + # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details. + sequence_parallel: False + + ## Activation Checkpoint + activations_checkpoint_granularity: null # 'selective' or 'full' + activations_checkpoint_method: null # 'uniform', 'block', not used with 'selective' + # 'uniform' divides the total number of transformer layers and checkpoints the input activation + # of each chunk at the specified granularity + # 'block' checkpoints the specified number of layers per pipeline stage at the specified granularity + activations_checkpoint_num_layers: null # not used with 'selective' + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: True + gradient_as_bucket_view: False + + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + + peft: + peft_scheme: "adapter" # can be either adapter,ia3, or ptuning + restore_from_path: null + + # Used for adapter peft training + adapter_tuning: + type: 'parallel_adapter' # this should be either 'parallel_adapter' or 'linear_adapter' + adapter_dim: 32 + adapter_dropout: 0.0 + norm_position: 'pre' # This can be set to 'pre', 'post' or null, 'pre' is normally what is used. + column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal + row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal + norm_type: 'mixedfusedlayernorm' # IGNORED if layer_adapter is used, options are ['layernorm', 'mixedfusedlayernorm'] + layer_selection: null # selects in which layers to add adapters, e.g. [1,12] will add adapters to layer 1 (lowest) and 12. null will apply adapters to all layers + weight_tying: False + position_embedding_strategy: null # used only when weight_tying is True + + lora_tuning: + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal + row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal + layer_selection: null # selects in which layers to add lora adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers + weight_tying: False + position_embedding_strategy: null # used only when weight_tying is True + + # Used for p-tuning peft training + p_tuning: + virtual_tokens: 10 # The number of virtual tokens the prompt encoder should add at the start of the sequence + bottleneck_dim: 1024 # the size of the prompt encoder mlp bottleneck + embedding_dim: 1024 # the size of the prompt encoder embeddings + init_std: 0.023 + + ia3_tuning: + layer_selection: null # selects in which layers to add ia3 adapters. e.g. [1,12] will add lora to layer 1 (lowest) and 12. null will apply adapters to all layers + + data: + train_ds: + # Example of how to specify paths to multiple datasets + # file_names: + # - /path/to/squad.jsonl + # - /path/to/mnli.jsonl + # - /path/to/boolq.jsonl + # Example of how each dataset is formatted + # {'input': 'John von Neumann\nVon Neumann made fundamental contributions .... Q: What did the math of artificial viscosity do?', 'output': 'smoothed the shock transition without sacrificing basic physics'} + file_names: ??? # Path to a list of JSONL files corresponding to the source data. + global_batch_size: ${model.global_batch_size} + micro_batch_size: ${model.micro_batch_size} + shuffle: True + num_workers: 0 + memmap_workers: 2 + pin_memory: True + max_seq_length: 2048 + min_seq_length: 1 + drop_last: True + # Example of how to specify concat_sampling_probabilities + # concat_sampling_probabilities: + # - 0.5 + # - 0.25 + # - 0.25 + concat_sampling_probabilities: null # When providing a list of datasets, this arg defines the sampling probabilities from each dataset when strategy='random' + context_key: 'input' + label_key: 'output' + add_eos: True + add_sep: False + add_bos: False + separate_prompt_and_response_with_newline: False + truncation_field: "context" # Options: ['context', 'answer'] + index_mapping_dir: null # Path to a directory to write index mapping files. + prompt_template: "{input} {output}" # fstring to use for assistant prompt. Example: "Q: {input}\nA: {output}" + + validation_ds: + file_names: ??? # Path to a list of JSONL files corresponding to the source data. Data format is identical to train_ds. + names: null # Names of the corresponding datasets used to log metrics. + global_batch_size: ${model.global_batch_size} + micro_batch_size: ${model.micro_batch_size} + shuffle: False + num_workers: 0 + memmap_workers: ${model.data.train_ds.memmap_workers} + pin_memory: True + max_seq_length: 2048 + min_seq_length: 1 + drop_last: False + context_key: 'input' + label_key: 'output' + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + separate_prompt_and_response_with_newline: ${model.data.train_ds.separate_prompt_and_response_with_newline} + write_predictions_to_file: False + output_file_path_prefix: null # Prefix of the file to write predictions to. + truncation_field: "context" # Options: ['context', 'answer'] + index_mapping_dir: null # Path to a directory to write index mapping files. + prompt_template: ${model.data.train_ds.prompt_template} # fstring to use for assistant prompt. Example: "Q: {input}\nA: {output}" + tokens_to_generate: 32 # decide how many tokens we want to generate to evaluate performance with string metrics + metric: + name: "loss" # Name of the evaluation metric to use. Options: ['exact_string_match', 'loss'] + average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. + num_classes: null + test_ds: + file_names: null # Path to a list of JSONL files corresponding to the source data. Data format is identical to train_ds. + names: null # Names of the corresponding datasets used to log metrics. + global_batch_size: ${model.global_batch_size} + micro_batch_size: ${model.micro_batch_size} + shuffle: False + num_workers: 0 + memmap_workers: ${model.data.train_ds.memmap_workers} + pin_memory: True + max_seq_length: 2048 + min_seq_length: 1 + drop_last: False + context_key: 'input' + label_key: 'output' + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + separate_prompt_and_response_with_newline: ${model.data.train_ds.separate_prompt_and_response_with_newline} + write_predictions_to_file: False + output_file_path_prefix: null # Prefix of the file to write predictions to. + truncation_field: "context" # Options: ['context', 'answer'] + index_mapping_dir: null # Path to a directory to write index mapping files. + prompt_template: ${model.data.train_ds.prompt_template} + tokens_to_generate: 32 # decide how many tokens we want to generate to evaluate performance with string metrics + metric: + name: "loss" # Name of the evaluation metric to use. Options: ['exact_string_match', 'loss'] + average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported. + num_classes: null + + optim: + name: fused_adam + lr: 1e-4 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 50 + min_lr: 0.0 # min_lr must be 0.0 for prompt learning when pipeline parallel > 1 + constant_steps: 0 # Constant steps should also be 0 when min_lr=0 + monitor: val_loss + reduce_on_plateau: false \ No newline at end of file diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py index ea368992bd5a..d08f4291321a 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_eval.py @@ -27,6 +27,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated mp.set_start_method("spawn", force=True) @@ -47,6 +48,10 @@ raise EnvironmentError("GPU is needed for the inference") +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronGPTSFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_gpt_peft_tuning.py` and `megatron_gpt_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_gpt_adapter_inference") def main(cfg) -> None: diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py index 7aebad7e9010..15a7ed800ce4 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_adapter_tuning.py @@ -29,6 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -56,6 +57,10 @@ """ +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronGPTSFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_gpt_peft_tuning.py` and `megatron_gpt_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_gpt_adapter_tuning_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py index 5bf33b37ac72..2a6ca4b48049 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_eval.py @@ -26,6 +26,7 @@ from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated mp.set_start_method("spawn", force=True) @@ -46,6 +47,10 @@ raise EnvironmentError("GPU is needed for the inference") +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronGPTSFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_gpt_peft_tuning.py` and `megatron_gpt_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_gpt_adapter_inference") def main(cfg) -> None: diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py index b74d215b6ef7..b55739b08794 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_ia3_tuning.py @@ -29,6 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -56,6 +57,10 @@ """ +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronGPTSFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_gpt_peft_tuning.py` and `megatron_gpt_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_gpt_ia3_tuning_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py b/examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py index 9edb44f5cb18..8098e62684ee 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py @@ -14,31 +14,18 @@ import asyncio -import json -import os import threading from functools import partial import torch import torch.multiprocessing as mp -from omegaconf.omegaconf import OmegaConf, open_dict -from pytorch_lightning import Trainer -from pytorch_lightning.plugins.environments import TorchElasticEnvironment -from torch.utils.data import DataLoader +from omegaconf.omegaconf import OmegaConf + -from nemo.collections.nlp.models.language_modeling.megatron_gpt_peft_models import MegatronGPTPEFTModel from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel -from nemo.collections.nlp.models.nlp_model import NLPModel from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer from nemo.collections.nlp.modules.common.text_generation_utils import generate -from nemo.collections.nlp.parts.nlp_overrides import ( - GradScaler, - MegatronHalfPrecisionPlugin, - NLPDDPStrategy, - NLPSaveRestoreConnector, - PEFTSaveRestoreConnector, - PipelineMixedPrecisionPlugin, -) +from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder from nemo.core.config import hydra_runner from nemo.utils import logging @@ -82,109 +69,64 @@ """ +def use_inference_server(cfg, model, trainer): + if not HAVE_MEGATRON_CORE: + raise ValueError('Megatron-core needs to be installed to use this feature!') + + from nemo.collections.nlp.modules.common.megatron_web_server import get_chatbot_demo, get_demo + + trainer.test(model, dataloaders=None) + + if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0: + if cfg.web_server: + if cfg.chat: + defaults = { + 'user': cfg.chatbot_config.user, + 'assistant': cfg.chatbot_config.assistant, + 'system': cfg.chatbot_config.system, + } + web_ui = partial( + get_chatbot_demo, + defaults=defaults, + value=cfg.chatbot_config.value, + attributes=cfg.chatbot_config.attributes, + ) + else: + web_ui = get_demo + loop = asyncio.new_event_loop() + thread = threading.Thread( + target=web_ui, daemon=True, args=(cfg.share, cfg.username, cfg.password, cfg.port, cfg.web_port, loop), + ) + thread.start() + server = MegatronServer(model.cuda()) + server.run("0.0.0.0", port=cfg.port) + + while True: + choice = torch.cuda.LongTensor(1) + torch.distributed.broadcast(choice, 0) + if choice[0].item() == 0: + generate(model.cuda()) + + @hydra_runner(config_path="conf", config_name="megatron_gpt_peft_eval_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") logging.info(f"\n{OmegaConf.to_yaml(cfg)}") - assert cfg.model.restore_from_path is not None - megatron_amp_o2 = cfg.model.get("megatron_amp_O2", False) - with_distributed_adam = False - - plugins = [] - strategy = NLPDDPStrategy( - no_ddp_communication_hook=True, # we don't use DDP for async grad allreduce - gradient_as_bucket_view=cfg.model.gradient_as_bucket_view, - find_unused_parameters=False, - ) - if cfg.trainer.precision in [16, '16', 'bf16', '16-mixed', 'bf16-mixed']: - scaler = None - if cfg.trainer.precision in [16, '16', '16-mixed']: - scaler = GradScaler( - init_scale=cfg.model.get("native_amp_init_scale", 2 ** 32), - growth_interval=cfg.model.get("native_amp_growth_interval", 1000), - hysteresis=cfg.model.get("hysteresis", 2), - enabled=False - if cfg.model.pipeline_model_parallel_size > 1 - else True, # turn off the grad scale for pipeline parallel LM model - ) - # MixedPrecisionPlugin in PTL >= 2.0 requires precision to be 16-mixed or bf16-mixed - plugin_precision = '16-mixed' - else: - plugin_precision = 'bf16-mixed' - if megatron_amp_o2 and not with_distributed_adam: - plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device="cuda", scaler=scaler)) - else: - plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device="cuda", scaler=scaler)) - - if cfg.get("cluster_type", None) == "BCP": - plugins.append(TorchElasticEnvironment()) - - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer) - if cfg.model.peft.restore_from_path: - if cfg.model.peft.restore_from_path.endswith(".nemo"): - peft_model_cfg = MegatronGPTPEFTModel.restore_from( - restore_path=cfg.model.peft.restore_from_path, trainer=trainer, return_config=True, - ) - elif cfg.model.peft.restore_from_hparams_path: # not a .nemo model we expect a hparams.yaml file - peft_model_cfg = OmegaConf.to_container(OmegaConf.load(cfg.model.peft.restore_from_hparams_path).cfg) - peft_model_cfg = OmegaConf.create(peft_model_cfg) - # extract dict inside cfg key and convert it to DictConfig - # this allows interpolation to work the same way as config from the .restore_from method - else: - raise RuntimeError("This script requires a .nemo peft model or path to hparams.yaml (and a ckpt path).") - else: - peft_model_cfg = MegatronGPTSFTModel.restore_from( - restore_path=cfg.model.restore_from_path, trainer=trainer, return_config=True, - ) - - # hydra interpolation does not work here as the interpolation key is lost when PTL saves hparams - with open_dict(peft_model_cfg): - # update the model config of the trained model with params we want to set at inference time. - peft_model_cfg.precision = cfg.trainer.precision - peft_model_cfg.data.test_ds = cfg.model.data.test_ds - peft_model_cfg.activations_checkpoint_granularity = None - peft_model_cfg.activations_checkpoint_method = None - peft_model_cfg.activations_checkpoint_layers_per_pipeline = None - if peft_model_cfg.get("use_flash_attention", False): - peft_model_cfg.use_flash_attention = cfg.model.use_flash_attention - if cfg.model.get("seq_len_interpolation_factor", None) is not None: - peft_model_cfg["seq_len_interpolation_factor"] = cfg.model.seq_len_interpolation_factor - - with open_dict(cfg): - # update the config with the trained model config - # required for hydra interpolation to work inside cfg.inference - cfg.inference.add_BOS = peft_model_cfg.data.test_ds.add_bos - cfg.inference.tokens_to_generate = peft_model_cfg.data.test_ds.tokens_to_generate + trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer() if cfg.model.peft.restore_from_path: - if cfg.model.peft.restore_from_path.endswith(".nemo"): - save_restore_connector = PEFTSaveRestoreConnector( - peft_model_nemo_path=cfg.model.peft.restore_from_path, peft_model_ckpt_path=None, - ) - else: - # attempting to load a ckpt peft model. - if cfg.model.peft.restore_from_ckpt_name: - ckpt_name = cfg.model.peft.restore_from_ckpt_name - else: - ckpt_name = "model_weights.ckpt" - save_restore_connector = PEFTSaveRestoreConnector( - peft_model_nemo_path=None, - peft_model_ckpt_path=cfg.model.peft.restore_from_path, - peft_model_ckpt_name=ckpt_name, - ) + model_cfg = MegatronGPTSFTModel.merge_inference_cfg(cfg.model.peft.restore_from_path, cfg) else: - save_restore_connector = NLPSaveRestoreConnector() + model_cfg = MegatronGPTSFTModel.merge_inference_cfg(cfg.model.restore_from_path, cfg) + + model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer) - if os.path.isdir(cfg.model.restore_from_path): - save_restore_connector.model_extracted_dir = cfg.model.restore_from_path - model = MegatronGPTSFTModel.restore_from( - restore_path=cfg.model.restore_from_path, - trainer=trainer, - override_config_path=peft_model_cfg, - save_restore_connector=save_restore_connector, - ) + if cfg.model.peft.restore_from_path: + model.load_adapters(cfg.model.peft.restore_from_path) model.freeze() + logging.info(f"Freezing parameters for PEFT eval:\n{model.summarize()}") + if not cfg.model.get('use_flash_attention', False): cfg.inference.compute_attention_mask = True config = OmegaConf.to_container(cfg.inference, resolve=True) @@ -193,44 +135,7 @@ def main(cfg) -> None: if not cfg.server: trainer.test(model) else: - if not HAVE_MEGATRON_CORE: - raise ValueError('Megatron-core needs to be installed to use this feature!') - - from nemo.collections.nlp.modules.common.megatron_web_server import get_chatbot_demo, get_demo - - trainer.test(model, dataloaders=None) - - if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0: - if cfg.web_server: - if cfg.chat: - defaults = { - 'user': cfg.chatbot_config.user, - 'assistant': cfg.chatbot_config.assistant, - 'system': cfg.chatbot_config.system, - } - web_ui = partial( - get_chatbot_demo, - defaults=defaults, - value=cfg.chatbot_config.value, - attributes=cfg.chatbot_config.attributes, - ) - else: - web_ui = get_demo - loop = asyncio.new_event_loop() - thread = threading.Thread( - target=web_ui, - daemon=True, - args=(cfg.share, cfg.username, cfg.password, cfg.port, cfg.web_port, loop), - ) - thread.start() - server = MegatronServer(model.cuda()) - server.run("0.0.0.0", port=cfg.port) - - while True: - choice = torch.cuda.LongTensor(1) - torch.distributed.broadcast(choice, 0) - if choice[0].item() == 0: - generate(model.cuda()) + use_inference_server(cfg, model, trainer) if __name__ == "__main__": diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py b/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py index 2c9293b2600e..cffe974a071a 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py @@ -12,46 +12,21 @@ # See the License for the specific language governing permissions and # limitations under the License. - -import os -import tempfile - import torch.multiprocessing as mp -from omegaconf.omegaconf import OmegaConf, open_dict -from pytorch_lightning import Trainer -from pytorch_lightning.plugins.environments import TorchElasticEnvironment -from pytorch_lightning.trainer.connectors.checkpoint_connector import _CheckpointConnector -from torch.utils.data import DataLoader, Dataset +from omegaconf.omegaconf import OmegaConf + +from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel +from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder +from nemo.collections.nlp.parts.peft_config import PEFT_CONFIG_MAP -from nemo.collections.nlp.models.language_modeling.megatron_gpt_peft_models import ( - MegatronGPTAdapterModel, - MegatronGPTAdapterModelWeightTying, - MegatronGPTAdapterPTuningModel, - MegatronGPTIA3Model, - MegatronGPTLoRAModel, - MegatronGPTLoRAModelWeightTying, - MegatronGPTPTuningModel, -) -from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTModel -from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel -from nemo.collections.nlp.parts.nlp_overrides import ( - CustomProgressBar, - GradScaler, - MegatronHalfPrecisionPlugin, - NLPDDPStrategy, - NLPSaveRestoreConnector, - PEFTSaveRestoreConnector, - PipelineMixedPrecisionPlugin, -) from nemo.core.config import hydra_runner -from nemo.utils import AppState, logging +from nemo.utils import logging from nemo.utils.exp_manager import exp_manager -from nemo.utils.model_utils import inject_model_parallel_rank mp.set_start_method("spawn", force=True) """ -This is the script to train an Adapter infused GPT Model for text generation. +This is the script to finetuning a GPT Model with any PEFT method. A base GPT Model is required as a starting point. This script will then insert Adapters into each Transformer layer and will train/update only these adapters during training. The base GPT Model weights will remain frozen. @@ -63,189 +38,41 @@ Usage: Assuming the base model is a 125m GPT Model, with TP=1, PP=1: a. run a training run for a base gpt nemo file: - python megatron_gpt_adapter_tuning.py \ - "model.data.train_ds=[PATH TO TRAINING JSONL FILE]", - "model.data.validation_ds=[PATH TO VALIDATION JSONL FILE]", - model.language_model_path="PATH TO BASE GPT MODEL .nemo FILE" + python megatron_gpt_peft_tuning.py \ + "model.data.train_ds.file_names=[PATH TO TRAINING JSONL FILE]", + "model.data.train_ds.concat_sampling_probabilities=[SAMPLING VAL]", + "model.data.validation_ds.file_names=[PATH TO VALIDATION JSONL FILE]", + "model.data.validation_ds.names=[NAME FOR METRIC LOGGING]", + model.restore_from_path="PATH TO BASE GPT MODEL .nemo FILE" + model.peft.peft_scheme='lora' # lora, ptuning, adapter, ia3, or none for full fineutning name="NAME OF TRAINING RUN" exp_manager.exp_dir="DIR TO SAVE CHECKPOINTS and .nemo FILE", - trainer.max_epochs=2 +Please see lora.ipynb for a step-by-step guide. """ -def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False): - """ - This function modifies the original gpt pre-training config (gpt_cfg) with attributes from the finetuning config (cfg). - The `add_cfg_to_tree` arg adds `cfg` to the top of the yaml tree which is needed for all `hparams.yaml` files when passed as an arg to `load_from_checkpoint()`. - """ - OmegaConf.set_struct(gpt_cfg, True) - OmegaConf.resolve(cfg) - with open_dict(gpt_cfg): - gpt_cfg.megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False) - gpt_cfg.micro_batch_size = cfg.model.data.train_ds.micro_batch_size - gpt_cfg.global_batch_size = cfg.model.data.train_ds.global_batch_size - gpt_cfg.sequence_parallel = cfg.model.get("sequence_parallel", False) - gpt_cfg.activations_checkpoint_granularity = cfg.model.get("activations_checkpoint_granularity", None) - gpt_cfg.activations_checkpoint_num_layers = cfg.model.get("activations_checkpoint_num_layers", None) - gpt_cfg.activations_checkpoint_method = cfg.model.get("activations_checkpoint_method", None) - gpt_cfg.activations_checkpoint_layers_per_pipeline = cfg.model.get( - "activations_checkpoint_layers_per_pipeline", None - ) - gpt_cfg.data = cfg.model.data - gpt_cfg.optim = cfg.model.optim - gpt_cfg.precision = cfg.trainer.precision - gpt_cfg.answer_only_loss = cfg.model.answer_only_loss - gpt_cfg.restore_from_path = cfg.model.restore_from_path - gpt_cfg.resume_from_checkpoint = cfg.model.resume_from_checkpoint - gpt_cfg.save_nemo_on_validation_end = cfg.model.save_nemo_on_validation_end - gpt_cfg.gradient_as_bucket_view = cfg.model.gradient_as_bucket_view - gpt_cfg.hidden_dropout = cfg.model.get('hidden_dropout', 0.0) - gpt_cfg.attention_dropout = cfg.model.get('attention_dropout', 0.0) - gpt_cfg.ffn_dropout = cfg.model.ffn_dropout - gpt_cfg.peft = cfg.model.peft - peft_cls = _get_peft_scheme(cfg.model) - gpt_cfg.target = f"{peft_cls.__module__}.{peft_cls.__name__}" - - # This is needed when modifying a hparam file directly to load `.ckpt` files. - # This is not needed to modify the cfg in `.nemo` files. - if add_cfg_to_tree: - OmegaConf.resolve(gpt_cfg) - gpt_cfg.cfg = gpt_cfg - - return gpt_cfg - - -def _get_peft_scheme(cfg): - if cfg.peft.peft_scheme == "adapter": - if cfg.peft.adapter_tuning.weight_tying: - peft_cls = MegatronGPTAdapterModelWeightTying - else: - peft_cls = MegatronGPTAdapterModel - elif cfg.peft.peft_scheme == "ia3": - peft_cls = MegatronGPTIA3Model - elif cfg.peft.peft_scheme == "ptuning": - peft_cls = MegatronGPTPTuningModel - elif cfg.peft.peft_scheme == "adapter_and_ptuning": - peft_cls = MegatronGPTAdapterPTuningModel - elif cfg.peft.peft_scheme == "lora": - if cfg.peft.lora_tuning.weight_tying: - peft_cls = MegatronGPTLoRAModelWeightTying - else: - peft_cls = MegatronGPTLoRAModel - else: - raise RuntimeError("Invalid Peft scheme") - return peft_cls - - -def load_from_checkpoint_dir(cls, cfg, trainer, modify_confg_fn): - app_state = AppState() - if cfg.model.tensor_model_parallel_size > 1 or cfg.model.pipeline_model_parallel_size > 1: - app_state.model_parallel_size = cfg.model.tensor_model_parallel_size * cfg.model.pipeline_model_parallel_size - app_state.tensor_model_parallel_size = cfg.model.tensor_model_parallel_size - app_state.pipeline_model_parallel_size = cfg.model.pipeline_model_parallel_size - ( - app_state.tensor_model_parallel_rank, - app_state.pipeline_model_parallel_rank, - app_state.model_parallel_size, - app_state.data_parallel_size, - app_state.pipeline_model_parallel_split_rank, - app_state.virtual_pipeline_model_parallel_rank, - ) = fake_initialize_model_parallel( - world_size=app_state.model_parallel_size, - rank=trainer.global_rank, - tensor_model_parallel_size_=cfg.model.tensor_model_parallel_size, - pipeline_model_parallel_size_=cfg.model.pipeline_model_parallel_size, - pipeline_model_parallel_split_rank_=cfg.model.pipeline_model_parallel_split_rank, - ) - checkpoint_path = inject_model_parallel_rank( - os.path.join(cfg.model.pretrained_checkpoint.checkpoint_dir, cfg.model.pretrained_checkpoint.checkpoint_name) - ) - hparams_file = OmegaConf.load(cfg.model.pretrained_checkpoint.hparams_file) - gpt_cfg = modify_confg_fn(hparams_file.cfg, cfg, add_cfg_to_tree=True) - with tempfile.NamedTemporaryFile(suffix='.yaml') as f: - OmegaConf.save(config=gpt_cfg, f=f.name) - model = cls.load_from_checkpoint(checkpoint_path=checkpoint_path, trainer=trainer, hparams_file=f.name,) - return model - - -def validate_checkpoint_loading_args(cfg): - if cfg.checkpoint_dir is None or not os.path.isdir(cfg.checkpoint_dir): - raise ValueError(f'Checkpoint directory {cfg.checkpoint_dir} does not exist or is not a directory.') - if cfg.checkpoint_name is None: - raise ValueError(f'Checkpoint name {cfg.checkpoint_name} is not valid.') - if cfg.hparams_file is None or not os.path.isfile(cfg.hparams_file): - raise ValueError(f'Hparams file {cfg.hparams_file} does not exist or is not a file.') - - @hydra_runner(config_path="conf", config_name="megatron_gpt_peft_tuning_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") logging.info(f'\n{OmegaConf.to_yaml(cfg)}') - megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False) - with_distributed_adam = cfg.model.optim.get('name') == 'distributed_fused_adam' - - plugins = [] - strategy = NLPDDPStrategy( - no_ddp_communication_hook=True, # we don't use DDP for async grad allreduce - gradient_as_bucket_view=cfg.model.gradient_as_bucket_view, - find_unused_parameters=False, - ) - if cfg.trainer.precision in [16, '16', 'bf16', '16-mixed', 'bf16-mixed']: - scaler = None - if cfg.trainer.precision in [16, '16', '16-mixed']: - scaler = GradScaler( - init_scale=cfg.model.get('native_amp_init_scale', 2 ** 32), - growth_interval=cfg.model.get('native_amp_growth_interval', 1000), - hysteresis=cfg.model.get('hysteresis', 2), - enabled=False - if cfg.model.pipeline_model_parallel_size > 1 - else True, # turn off the grad scale for pipeline parallel LM model - ) - # MixedPrecisionPlugin in PTL >= 2.0 requires precision to be 16-mixed or bf16-mixed - plugin_precision = '16-mixed' - else: - plugin_precision = 'bf16-mixed' - if megatron_amp_o2 and not with_distributed_adam: - plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler)) - else: - plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler)) - - if cfg.get('cluster_type', None) == 'BCP': - plugins.append(TorchElasticEnvironment()) - - trainer = Trainer(plugins=plugins, strategy=strategy, **cfg.trainer, callbacks=[CustomProgressBar()]) + trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer() exp_manager(trainer, cfg.exp_manager) - # update resume from checkpoint found by exp_manager - if cfg.model.resume_from_checkpoint is not None: - trainer.ckpt_path = cfg.model.resume_from_checkpoint - logging.info(f'Resuming training from checkpoint: {trainer.ckpt_path}') - if cfg.model.restore_from_path: - base_model_save_restore_connector = NLPSaveRestoreConnector() - if os.path.isdir(cfg.model.restore_from_path): - base_model_save_restore_connector.model_extracted_dir = cfg.model.restore_from_path - base_model_cfg = MegatronGPTModel.restore_from( - restore_path=cfg.model.restore_from_path, - trainer=trainer, - return_config=True, - save_restore_connector=base_model_save_restore_connector, - ) - base_model_cfg = _modify_config(base_model_cfg, cfg, add_cfg_to_tree=False) - save_restore_connector = PEFTSaveRestoreConnector( - peft_model_nemo_path=cfg.model.peft.restore_from_path, peft_model_ckpt_path=trainer.ckpt_path - ) - if os.path.isdir(cfg.model.restore_from_path): - save_restore_connector.model_extracted_dir = cfg.model.restore_from_path - peft_cls = _get_peft_scheme(cfg.model) - model = peft_cls.restore_from( - restore_path=cfg.model.restore_from_path, - trainer=trainer, - override_config_path=base_model_cfg, - save_restore_connector=save_restore_connector, - ) + model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg) + model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer) + peft_cfg_cls = PEFT_CONFIG_MAP[cfg.model.peft.peft_scheme] + + if cfg.model.peft.restore_from_path is not None: + # initialize peft weights from a checkpoint instead of randomly + # This is not the same as resume training because optimizer states are not restored. + logging.info("PEFT Weights will be loaded from", cfg.model.peft.restore_from_path) + model.load_adapters(cfg.model.peft.restore_from_path, peft_cfg_cls(model_cfg)) + elif peft_cfg_cls is not None: + logging.info("Adding adapter weights to the model for PEFT") + model.add_adapter(peft_cfg_cls(model_cfg)) else: - raise RuntimeError("PEFT training needs a trained base model present.") + logging.info(f"Running full finetuning since no peft scheme is given.\n{model.summarize()}") trainer.fit(model) diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py index 2d0a0d38a762..870ff2921cce 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py @@ -32,6 +32,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import AppState, logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager from nemo.utils.model_utils import inject_model_parallel_rank @@ -145,6 +146,10 @@ def validate_checkpoint_loading_args(cfg): raise ValueError(f'Hparams file {cfg.hparams_file} does not exist or is not a file.') +@deprecated( + explanation=f"{__file__} is deprecated. PEFT and SFT scripts are now consolidated" + "See updated scripts `megatron_gpt_peft_tuning.py` and `megatron_gpt_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_gpt_sft") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py index bdeec2b5a9c1..5fd07e85ce2d 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_eval.py @@ -25,6 +25,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState +from nemo.utils.decorators import deprecated mp.set_start_method("spawn", force=True) @@ -45,6 +46,10 @@ raise EnvironmentError("GPU is needed for the inference") +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_adapter_inference") def main(cfg) -> None: diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py index c096a75d0b9d..ebc6a0327b78 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_adapter_tuning.py @@ -29,6 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -56,6 +57,10 @@ """ +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_adapter_tuning_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py index 8a8ddae166e1..cc9dfef059b8 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_eval.py @@ -25,6 +25,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState +from nemo.utils.decorators import deprecated mp.set_start_method("spawn", force=True) @@ -45,6 +46,10 @@ raise EnvironmentError("GPU is needed for the inference") +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_ia3_inference") def main(cfg) -> None: diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py index 46b590acd725..fc508611c9b1 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_ia3_tuning.py @@ -29,6 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -56,6 +57,10 @@ """ +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_ia3_tuning_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_lora_eval.py b/examples/nlp/language_modeling/tuning/megatron_t5_lora_eval.py index d9de94843071..38032d06a8c8 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_lora_eval.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_lora_eval.py @@ -25,6 +25,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy from nemo.core.config import hydra_runner from nemo.utils.app_state import AppState +from nemo.utils.decorators import deprecated mp.set_start_method("spawn", force=True) @@ -45,6 +46,10 @@ raise EnvironmentError("GPU is needed for the inference") +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_adapter_inference") def main(cfg) -> None: diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py index fb982a273662..73cfae76b7a5 100644 --- a/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py +++ b/examples/nlp/language_modeling/tuning/megatron_t5_lora_tuning.py @@ -29,6 +29,7 @@ ) from nemo.core.config import hydra_runner from nemo.utils import logging +from nemo.utils.decorators import deprecated from nemo.utils.exp_manager import exp_manager mp.set_start_method("spawn", force=True) @@ -56,6 +57,10 @@ """ +@deprecated( + explanation=f"{__file__} is deprecated. Please use MegatronT5SFTModel.add_adapter() for PEFT features." + "See updated scripts `megatron_t5_peft_tuning.py` and `megatron_t5_peft_eval.py` for examples." +) @hydra_runner(config_path="conf", config_name="megatron_t5_lora_tuning_config") def main(cfg) -> None: logging.info("\n\n************** Experiment configuration ***********") diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py b/examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py new file mode 100644 index 000000000000..d9d4d075362b --- /dev/null +++ b/examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py @@ -0,0 +1,135 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import asyncio +import threading +from functools import partial + +import torch +import torch.multiprocessing as mp +from omegaconf.omegaconf import OmegaConf + + +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel +from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer +from nemo.collections.nlp.modules.common.text_generation_utils import generate +from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder +from nemo.core.config import hydra_runner +from nemo.utils import logging + +try: + from megatron.core import parallel_state + + HAVE_MEGATRON_CORE = True +except (ImportError, ModuleNotFoundError): + HAVE_MEGATRON_CORE = False + +mp.set_start_method("spawn", force=True) +""" +This is the script to run inference with a PEFT model or an SFT Model. + +If you want to evaluate an SFT .nemo file: + +python examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py \ + model.restore_from_path= \ + model.peft.restore_from_path=null \ + trainer.devices=1 model.data.test_ds.file_names=\[, ] \ + model.data.test_ds.names=\['name_for_test_file1', 'name_for_test_file2'] \ # this is not the filename just some identifier + model.data.test_ds.global_batch_size=4 \ # or some other value + model.data.test_ds.micro_batch_size=4 \ + model.data.test_ds.tokens_to_generate=30 \ + inference.greedy=True \ + inference.outfile_path=\'' + +If you want to evaluate a PEFT Model, you should provide a base T5 model and a PEFT model .nemo file + +python examples/nlp/language_modeling/tuning/megatron_t5_peft_eval.py \ + model.restore_from_path= \ + model.peft.restore_from_path= \ # this will be created if you use `megatron_t5_peft_tuning.py` + trainer.devices=1 model.data.test_ds.file_names=\[, ] \ + model.data.test_ds.names=\['name_for_test_file1', 'name_for_test_file2'] \ # this is not the filename just some identifier + model.data.test_ds.global_batch_size=4 \ # or some other value + model.data.test_ds.micro_batch_size=4 \ + model.data.test_ds.tokens_to_generate=30 \ + inference.greedy=True \ + inference.outfile_path=\'' + +""" + + +def use_inference_server(cfg, model, trainer): + if not HAVE_MEGATRON_CORE: + raise ValueError('Megatron-core needs to be installed to use this feature!') + + from nemo.collections.nlp.modules.common.megatron_web_server import get_chatbot_demo, get_demo + + trainer.test(model, dataloaders=None) + + if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0: + if cfg.web_server: + if cfg.chat: + defaults = { + 'user': cfg.chatbot_config.user, + 'assistant': cfg.chatbot_config.assistant, + 'system': cfg.chatbot_config.system, + } + web_ui = partial( + get_chatbot_demo, + defaults=defaults, + value=cfg.chatbot_config.value, + attributes=cfg.chatbot_config.attributes, + ) + else: + web_ui = get_demo + loop = asyncio.new_event_loop() + thread = threading.Thread( + target=web_ui, daemon=True, args=(cfg.share, cfg.username, cfg.password, cfg.port, cfg.web_port, loop), + ) + thread.start() + server = MegatronServer(model.cuda()) + server.run("0.0.0.0", port=cfg.port) + + while True: + choice = torch.cuda.LongTensor(1) + torch.distributed.broadcast(choice, 0) + if choice[0].item() == 0: + generate(model.cuda()) + + +@hydra_runner(config_path="conf", config_name="megatron_t5_peft_eval_config") +def main(cfg) -> None: + logging.info("\n\n************** Experiment configuration ***********") + logging.info(f"\n{OmegaConf.to_yaml(cfg)}") + trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer() + + model_cfg = MegatronT5SFTModel.merge_inference_cfg(cfg.model.peft.restore_from_path, cfg) + model = MegatronT5SFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer) + + model.load_adapters(cfg.model.peft.restore_from_path) + + model.freeze() + logging.info(f"Freezing parameters for PEFT eval:\n{model.summarize()}") + + if not cfg.model.get('use_flash_attention', False): + cfg.inference.compute_attention_mask = True + + if not cfg.server: + trainer.test(model) + else: + use_inference_server(cfg, model, trainer) + + +if __name__ == "__main__": + main() diff --git a/examples/nlp/language_modeling/tuning/megatron_t5_peft_tuning.py b/examples/nlp/language_modeling/tuning/megatron_t5_peft_tuning.py new file mode 100644 index 000000000000..04e25956aed2 --- /dev/null +++ b/examples/nlp/language_modeling/tuning/megatron_t5_peft_tuning.py @@ -0,0 +1,65 @@ +# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.multiprocessing as mp +from omegaconf.omegaconf import OmegaConf + +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel +from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder +from nemo.collections.nlp.parts.peft_config import PEFT_CONFIG_MAP +from nemo.core.config import hydra_runner +from nemo.utils import logging +from nemo.utils.exp_manager import exp_manager + +mp.set_start_method("spawn", force=True) + +""" +This is the script to finetuning a T5 Model with any PEFT method. +A base T5 Model is required as a starting point. This script will then insert +Adapters into each Transformer layer and will train/update only these adapters +during training. The base T5 Model weights will remain frozen. + +This script is exactly the same as the peft tuning script for GPT. For more details +please refer to the GPT script and docs. +""" + + +@hydra_runner(config_path="conf", config_name="megatron_t5_peft_tuning_config") +def main(cfg) -> None: + logging.info("\n\n************** Experiment configuration ***********") + logging.info(f'\n{OmegaConf.to_yaml(cfg)}') + + trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer() + exp_manager(trainer, cfg.exp_manager) + + model_cfg = MegatronT5SFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg) + model = MegatronT5SFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer) + peft_cfg_cls = PEFT_CONFIG_MAP[cfg.model.peft.peft_scheme] + + if cfg.model.peft.restore_from_path is not None: + # initialize peft weights from a checkpoint instead of randomly + # This is not the same as resume training because optimizer states are not restored. + logging.info("PEFT Weights will be loaded from", cfg.model.peft.restore_from_path) + model.load_adapters(cfg.model.peft.restore_from_path, peft_cfg_cls(model_cfg)) + elif peft_cfg_cls is not None: + logging.info("Adding adapter weights to the model for PEFT") + model.add_adapter(peft_cfg_cls(model_cfg)) + else: + logging.info(f"Running full finetuning since no peft scheme is given.\n{model.summarize()}") + + trainer.fit(model) + + +if __name__ == '__main__': + main() diff --git a/nemo/collections/nlp/data/language_modeling/megatron/t5_sft_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/t5_sft_dataset.py new file mode 100644 index 000000000000..7e60fdd11f80 --- /dev/null +++ b/nemo/collections/nlp/data/language_modeling/megatron/t5_sft_dataset.py @@ -0,0 +1,169 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import json +import os +from typing import Optional + +import numpy as np +import torch +from datasets import load_dataset + +from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec +from nemo.collections.nlp.data.language_modeling.text_memmap_dataset import JSONLMemMapDataset +from nemo.core.classes import Dataset +from nemo.utils import logging + +__all__ = ['T5SFTDataset'] + + +class T5SFTDataset(Dataset): + """Sequence to Sequence Dataset in memory. + Similar to SequenceToSequenceDataset but with the same input format as GPTSFTDataset + """ + + def __init__( + self, + file_path: str, + src_tokenizer: TokenizerSpec, + tgt_tokenizer: TokenizerSpec, + max_src_seq_length: int, + max_tgt_seq_length: int, + add_bos_to_input: bool = True, + add_eos_to_input: bool = True, + replace_bos_with_pad: bool = False, + index_mapping_dir: str = None, + memmap_workers: Optional[int] = None, + hf_dataset: bool = False, + ): + """ + index_mapping_dir: Directory to save the index mapping to. If None, will write to the same folder as the dataset. + hf_dataset: Whether to load the json file with the HuggingFace dataset. otherwise, will load the jsonl file with the JSONLMemMapDataset. + """ + super().__init__() + self.file_path = file_path + self.src_tokenizer = src_tokenizer + self.tgt_tokenizer = tgt_tokenizer + self.max_src_seq_length = max_src_seq_length + self.max_tgt_seq_length = max_tgt_seq_length + self.add_bos_to_input = add_bos_to_input + self.add_eos_to_input = add_eos_to_input + self.replace_bos_with_pad = replace_bos_with_pad + assert self.max_src_seq_length > 0 + assert self.max_tgt_seq_length > 0 + + # check file exists + if not os.path.exists(self.file_path): + raise FileNotFoundError(f"Data file {self.file_path} not found") + + if hf_dataset: + self.indexed_dataset = load_dataset( + 'json', data_files=file_path, cache_dir=index_mapping_dir, num_proc=memmap_workers, split='train' + ) + else: + self.indexed_dataset = JSONLMemMapDataset( + dataset_paths=[file_path], + tokenizer=None, + header_lines=0, + index_mapping_dir=index_mapping_dir, + workers=memmap_workers, + ) + + def _process_src(self, src): + src = self.src_tokenizer.text_to_ids(src.strip()) + if self.add_bos_to_input: + src = [self.src_tokenizer.pad_id if self.replace_bos_with_pad else self.src_tokenizer.bos_id] + src + if self.add_eos_to_input: + src = src + [self.src_tokenizer.eos_id] + if len(src) > self.max_src_seq_length: + src = src[-self.max_src_seq_length + 1 :] + return src + + def _process_tgt(self, tgt): + tgt = ( + [self.tgt_tokenizer.pad_id if self.replace_bos_with_pad else self.tgt_tokenizer.bos_id] + + self.tgt_tokenizer.text_to_ids(tgt.strip()) + + [self.tgt_tokenizer.eos_id] + ) + if len(tgt) > self.max_tgt_seq_length: + tgt = tgt[-self.max_tgt_seq_length + 1 :] + return tgt + + def __len__(self): + return len(self.indexed_dataset) + + def __getitem__(self, idx): + example = self.indexed_dataset[idx] + text_enc = self._process_src(example['input']) + tgt = self._process_tgt(example['output']) + text_dec = tgt[:-1] + labels = tgt[1:] + return {'text_enc': text_enc, 'text_dec': text_dec, 'labels': labels} + + def collate_fn(self, batch): + text_enc = [item['text_enc'] for item in batch] + text_dec = [item['text_dec'] for item in batch] + labels = [item['labels'] for item in batch] + + if isinstance(text_enc[0], np.ndarray): + text_enc = [x.tolist() for x in text_enc] + + if isinstance(text_dec[0], np.ndarray): + text_dec = [x.tolist() for x in text_dec] + + if isinstance(labels[0], np.ndarray): + labels = [x.tolist() for x in labels] + + max_dec_input_length = max([len(item) for item in text_dec]) if text_dec else 0 + max_enc_input_length = max([len(item) for item in text_enc]) if text_enc else 0 + max_label_length = max([len(item) for item in labels]) if labels else 0 + + loss_mask = [([1] * (len(item))) + ([0] * (max_label_length - len(item))) for item in labels] + text_enc = [item + [self.src_tokenizer.pad_id] * (max_enc_input_length - len(item)) for item in text_enc] + text_dec = [item + [self.tgt_tokenizer.pad_id] * (max_dec_input_length - len(item)) for item in text_dec] + labels = [item + [self.tgt_tokenizer.pad_id] * (max_label_length - len(item)) for item in labels] + + text_enc = torch.LongTensor(text_enc) + text_dec = torch.LongTensor(text_dec) + labels = torch.LongTensor(labels) + loss_mask = torch.LongTensor(loss_mask) + + enc_mask = (text_enc != self.src_tokenizer.pad_id).long() + dec_mask = (text_dec != self.tgt_tokenizer.pad_id).long() + + return { + 'text_enc': text_enc, + 'text_dec': text_dec, + 'labels': labels, + 'loss_mask': loss_mask, + 'enc_mask': enc_mask, + 'dec_mask': dec_mask, + } + + +def convert_data_file_format(src_file_name, tgt_file_name, output_file_name): + """ + Converts the old two-file format used by SequenceToSequenceDataset to the new JSONL format used by T5SFTDataset + """ + output_lines = [] + with open(src_file_name, encoding='utf8') as f_src, open(tgt_file_name, encoding='utf8') as f_tgt: + for i, (src, tgt) in enumerate(zip(f_src, f_tgt)): + if i % 10000 == 0 and i != 0: + logging.info(f"Read {i} lines from {src_file_name} & {tgt_file_name}") + output_lines.append({'input': src, 'output': tgt}) + + logging.info(f'Dataset Length : {len(output_lines)}') + + with open(output_file_name, "w") as f_json: + for line in output_lines: + f_json.write(json.dumps(line) + '\n') diff --git a/nemo/collections/nlp/models/language_modeling/megatron_glue_model.py b/nemo/collections/nlp/models/language_modeling/megatron_glue_model.py index 5cc0f7ea3a32..c0a4b6351530 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_glue_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_glue_model.py @@ -18,7 +18,7 @@ TextToTextGLUEDataset, TextToTextXNLIDataset, ) -from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel from nemo.utils import logging try: @@ -33,8 +33,8 @@ __all__ = ['MegatronT5GLUEModel'] -class MegatronT5GLUEModel(MegatronT5FinetuneModel): - """GLUE Model that Inherits from MegatronT5FinetuneModel and overrides the dataset building.""" +class MegatronT5GLUEModel(MegatronT5SFTModel): + """GLUE Model that Inherits from MegatronT5SFTModel and overrides the dataset building.""" def __init__(self, cfg: DictConfig, trainer: Trainer): super().__init__(cfg, trainer=trainer) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 1f905430e87c..3d7a5a127399 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -196,9 +196,8 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): raise ImportError( "Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt." ) - if not HAVE_MEGATRON_CORE: - raise ImportError( + logging.warning( "megatron-core was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt." ) # this prevents base constructor from initializing tokenizer diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py index ba7dd4529326..dfe2627b70d1 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py @@ -35,6 +35,7 @@ ) from nemo.core.classes.mixins import adapter_mixins from nemo.utils import logging, model_utils +from nemo.utils.decorators import deprecated try: from megatron.core import parallel_state @@ -46,6 +47,10 @@ HAVE_MEGATRON_CORE = False +@deprecated( + explanation="Please use MegatronGPTSFTModel.add_adapter() for PEFT features." + "See the updated `megatron_gpt_peft_tuning.py` for an example." +) class MegatronGPTPEFTModel(MegatronGPTSFTModel): """ base class for all mixin based adapter models @@ -70,7 +75,7 @@ def first_stage_of_pipeline(self): return False def init_peft_modules(self): - """ + """ Randomly initialize the peft params and add them to the appropriate modules. """ assert len(self.peft_name_keys) > 0, "peft_name_keys have not been set no PEFT modules will be added" @@ -118,7 +123,7 @@ def setup(self, stage=None): self.setup_complete = True def get_all_keys(self,): - """ + """ Returns all the keys in the model """ k = [n for n, p in self.named_parameters()] @@ -127,7 +132,7 @@ def get_all_keys(self,): return set(k + b) def get_peft_state_dict(self,): - """ + """ Gets the keys associated with the adapters only. """ state_dict = self.model.state_dict(prefix="model.module." if self.cfg.megatron_amp_O2 else "model.") @@ -205,13 +210,13 @@ def on_load_checkpoint(self, checkpoint) -> None: def setup_optimizer_param_groups(self): """ - ModelPT override. Optimizer will get self._optimizer_param_groups. + ModelPT override. Optimizer will get self._optimizer_param_groups. Makes two optimizer param groups, one for the frozen model params - and one for the prompt-table/prompt-encoder params. The learning + and one for the prompt-table/prompt-encoder params. The learning rate for the frozen model's params will always be zero effectively freezing the model's params but still allowing for the needed gradients - to be passed around in pipeline parallel models. The prompt-encoder - and/or prompt table will use the learning rate set by the user. + to be passed around in pipeline parallel models. The prompt-encoder + and/or prompt table will use the learning rate set by the user. """ self.freeze() # Freeze the entire model opt_params = [] @@ -231,7 +236,7 @@ def __init__( super().__init__(cfg, trainer) def init_peft_modules(self): - """ + """ Randomly initialize the peft params and add them to the appropriate modules. """ assert len(self.peft_name_keys) > 0, "peft_name_keys have not been set no PEFT modules will be added" @@ -294,8 +299,8 @@ class MegatronGPTAdapterModel(MegatronGPTLayerwisePEFTModel): Two adapter's are inserted into each Transformer layer in the base GPT Model. It is assumed that these set of adapters will then be trained for a specific task. - Once trained, the adapter weights will be saved and can be re-loaded - and infused into the same GPT Model for inference. + Once trained, the adapter weights will be saved and can be re-loaded + and infused into the same GPT Model for inference. """ def __init__( @@ -331,7 +336,7 @@ def __init__( class MegatronGPTAdapterModelWeightTying(MegatronGPTLayerwisePEFTModel): """ - TODO + TODO """ def __init__( @@ -411,8 +416,8 @@ class MegatronGPTIA3Model(MegatronGPTLayerwisePEFTModel): Three adapter's are inserted into each Transformer layer in the base GPT Model. Each adapter is basically a vector that simply scales the key, value or ffn hidden representations. It is assumed that these set of adapters will then be trained for a specific task. - Once trained, the adapter weights will be saved and can be re-loaded - and infused into the same GPT Model for inference. + Once trained, the adapter weights will be saved and can be re-loaded + and infused into the same GPT Model for inference. """ def __init__(self, cfg: DictConfig, trainer: Trainer): @@ -444,7 +449,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): class MegatronGPTPTuningModel(MegatronGPTPEFTModel): """ - MegatronGPTPTuningModel is a model that combines a base model (GPTSFTModel) with a p-tuning prefix in the + MegatronGPTPTuningModel is a model that combines a base model (GPTSFTModel) with a p-tuning prefix in the input word embedding representations using a prompt-encoder as descripted in Liu et al. https://arxiv.org/pdf/2103.10385.pdf The mixin framework adds the output of prompt-encoder (i.e. the virtual embeddings) inside @@ -467,7 +472,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): self.virtual_tokens = cfg.peft.p_tuning.virtual_tokens def init_peft_modules(self,): - """ + """ Initialize the p-tuning prompt encoder in the mixin. This should only happen in the first stage of the pipeline unlike other PEFT methods like Lora or Adapters because p-tuning only adds params at input to the encoder layer. @@ -479,7 +484,7 @@ def init_peft_modules(self,): return True def state_dict(self, destination=None, prefix=None, keep_vars=False): - """ + """ Reimplement state_dict for ptuning because we also need to check the stage of the pipeline. The check is required to make pp>1 to work. """ @@ -493,7 +498,7 @@ def state_dict(self, destination=None, prefix=None, keep_vars=False): return self.model.state_dict(prefix="model.") def load_state_dict(self, state_dict, strict: bool = True): - """ + """ Reimplement load_state_dict for ptuning because we also need to check the stage of the pipeline. The check is required to make pp>1 to work. """ @@ -629,7 +634,7 @@ def __init__( class MegatronGPTLoRAModelWeightTying(MegatronGPTLayerwisePEFTModel): """ - TODO + TODO """ def __init__( diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py index b38a81460097..8f2a108fcfc0 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py @@ -32,13 +32,9 @@ ) from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel from nemo.collections.nlp.modules.common.megatron.utils import get_iterator_k_split -from nemo.collections.nlp.modules.common.text_generation_utils import ( - LengthParam, - SamplingParam, - generate, - get_computeprob_response, - megatron_gpt_generate, -) +from nemo.collections.nlp.modules.common.text_generation_utils import generate, get_computeprob_response + +from nemo.collections.nlp.parts.mixins.nlp_adapter_mixins import NLPAdapterModelMixin from nemo.collections.nlp.parts.utils_funcs import get_last_rank from nemo.utils import AppState, logging @@ -68,7 +64,7 @@ __all__ = ['MegatronGPTSFTModel'] -class MegatronGPTSFTModel(MegatronGPTModel): +class MegatronGPTSFTModel(NLPAdapterModelMixin, MegatronGPTModel): """ Megatron GPT Supervised Fine-Tuning """ @@ -169,6 +165,7 @@ def setup(self, stage=None): # NOTE: super().__init__ will try and setup train/val/test datasets, but we sidestep this using a if self._train_ds is not None condition # We then set things up for real only once setup() of this class is called. resume_checkpoint_path = self.trainer.ckpt_path + self.setup_complete = True if resume_checkpoint_path: init_consumed_samples = self._extract_consumed_samples_from_ckpt(resume_checkpoint_path) else: @@ -209,6 +206,7 @@ def setup(self, stage=None): if self.cfg.get('transformer_engine', False): self.setup_transformer_engine_tp_groups() + self.setup_complete = True def _build_dataset(self, data_cfg, is_train=True): datasets = [] @@ -567,7 +565,7 @@ def inference_epoch_end(self, outputs, mode, data_cfg): # Logging of the averaged metrics: averaged_loss = sum(averaged_loss) / len(averaged_loss) - averaged_metric = sum(averaged_metric) / len(averaged_metric) if len(averaged_metric) > 1 else None + averaged_metric = sum(averaged_metric) / len(averaged_metric) if len(averaged_metric) >= 1 else None # Handle case where metrics can be nan or inf. This can break checkpoint save/load. if averaged_metric is not None and (torch.isinf(averaged_metric) or torch.isnan(averaged_metric)): @@ -835,6 +833,9 @@ def on_test_epoch_start(self): ) return super().on_test_epoch_start() + def on_predict_epoch_start(self): + return self.on_test_epoch_start() + def on_test_epoch_end(self): _ = self.inference_epoch_end(self.test_step_outputs, 'test', self.cfg.data.test_ds) # Commenting as on_test_epoch_end was a no-op in PTL 1.9 diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t0_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t0_model.py index fb62dc0db8ee..4d4d80b71a98 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t0_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t0_model.py @@ -23,7 +23,7 @@ MegatronPretrainingBatchSampler, ) from nemo.collections.nlp.data.language_modeling.t0_dataset import T0Dataset -from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel from nemo.utils import AppState, logging try: @@ -48,8 +48,8 @@ __all__ = ['MegatronT0Model'] -class MegatronT0Model(MegatronT5FinetuneModel): - """T0 (https://arxiv.org/abs/2110.08207) Model that Inherits from MegatronT5FinetuneModel and overrides the dataset building.""" +class MegatronT0Model(MegatronT5SFTModel): + """T0 (https://arxiv.org/abs/2110.08207) Model that Inherits from MegatronT5SFTModel and overrides the dataset building.""" def __init__(self, cfg: DictConfig, trainer: Trainer): super().__init__(cfg, trainer=trainer) @@ -154,7 +154,7 @@ def _build_dataset(self, data_cfg, check_implict_grad_acc=False, is_train=True): return datasets def training_step(self, dataloader_iter, batch_idx): - return super(MegatronT5FinetuneModel, self).training_step(dataloader_iter, batch_idx) + return super().training_step(dataloader_iter, batch_idx) # Override the parent batch reconfiguring logic. def _reconfigure_and_process_inference_batch(self, batch): @@ -236,10 +236,10 @@ def setup_eval_dataloader(self, datasets, data_cfg): # TODO: Temporary overrides of finetune model. This needs to removed in the finetune model. def on_train_start(self) -> None: - super(MegatronT5FinetuneModel, self).on_train_start() + super().on_train_start() def on_validation_start(self) -> None: - super(MegatronT5FinetuneModel, self).on_validation_start() + super().on_validation_start() def on_test_start(self) -> None: - super(MegatronT5FinetuneModel, self).on_test_start() + super().on_test_start() diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py index eaf4004e7371..d1332831ef1d 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_adapter_model.py @@ -26,11 +26,11 @@ from pytorch_lightning.trainer.trainer import Trainer from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig -from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model from nemo.collections.nlp.models.language_modeling.megatron_t5_prompt_learning_model import ( MegatronT5PromptLearningModel, ) +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel from nemo.collections.nlp.modules.common import VirtualPromptStyle from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import ( AdapterName, @@ -183,11 +183,11 @@ def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: int = 0) -> A ) # Special ids to text function to handle stripping and special tokens with sentencepiece tokenizers. - preds_text = MegatronT5FinetuneModel.ids_to_text(predicted_token_ids, self.tokenizer) - input_text = MegatronT5FinetuneModel.ids_to_text(enc_input, self.tokenizer) + preds_text = MegatronT5SFTModel.ids_to_text(predicted_token_ids, self.tokenizer) + input_text = MegatronT5SFTModel.ids_to_text(enc_input, self.tokenizer) if labels is not None: - labels_text = MegatronT5FinetuneModel.ids_to_text(labels, self.tokenizer) + labels_text = MegatronT5SFTModel.ids_to_text(labels, self.tokenizer) else: labels_text = [None] * len(preds_text) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py index f1a13c2abee4..a6afb103cb89 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_prompt_learning_model.py @@ -25,8 +25,8 @@ from nemo.collections.nlp.models.language_modeling.megatron_base_prompt_learning_model import ( MegatronBasePromptLearningModel, ) -from nemo.collections.nlp.models.language_modeling.megatron_finetune_model import MegatronT5FinetuneModel from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model +from nemo.collections.nlp.models.language_modeling.megatron_t5_sft_model import MegatronT5SFTModel from nemo.collections.nlp.modules.common.megatron.utils import ( average_losses_across_data_parallel_group, get_iterator_k_split, @@ -296,9 +296,9 @@ def get_predictions(self, input_ids, enc_mask, encoder_input, labels): else self.tokenizer.bos_id, ) # Special ids to text function to handle stripping and special tokens with sentencepiece tokenizers. - preds_text = MegatronT5FinetuneModel.ids_to_text(predicted_token_ids, self.tokenizer) - labels_text = MegatronT5FinetuneModel.ids_to_text(labels, self.tokenizer) - input_text = MegatronT5FinetuneModel.ids_to_text(input_ids, self.tokenizer) + preds_text = MegatronT5SFTModel.ids_to_text(predicted_token_ids, self.tokenizer) + labels_text = MegatronT5SFTModel.ids_to_text(labels, self.tokenizer) + input_text = MegatronT5SFTModel.ids_to_text(input_ids, self.tokenizer) return { 'predicted_token_ids': preds_text, 'labels': labels_text, @@ -482,11 +482,11 @@ def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: int = 0) -> A else self.tokenizer.bos_id, ) # Special ids to text function to handle stripping and special tokens with sentencepiece tokenizers. - preds_text = MegatronT5FinetuneModel.ids_to_text(predicted_token_ids, self.tokenizer) - input_text = MegatronT5FinetuneModel.ids_to_text(input_ids, self.tokenizer) + preds_text = MegatronT5SFTModel.ids_to_text(predicted_token_ids, self.tokenizer) + input_text = MegatronT5SFTModel.ids_to_text(input_ids, self.tokenizer) if labels is not None: - labels_text = MegatronT5FinetuneModel.ids_to_text(labels, self.tokenizer) + labels_text = MegatronT5SFTModel.ids_to_text(labels, self.tokenizer) else: labels_text = [None] * len(preds_text) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py b/nemo/collections/nlp/models/language_modeling/megatron_t5_sft_model.py similarity index 74% rename from nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py rename to nemo/collections/nlp/models/language_modeling/megatron_t5_sft_model.py index b95017a17302..22483731a534 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_t5_sft_model.py @@ -23,8 +23,11 @@ from nemo.collections.common.metrics import MetricStringToTorchMetric from nemo.collections.common.metrics.classification_accuracy import ExactStringPerCategoryMatchMetric from nemo.collections.nlp.data.common.sequence_to_sequence_dataset import SequenceToSequenceDataset +from nemo.collections.nlp.data.language_modeling.megatron.t5_sft_dataset import T5SFTDataset from nemo.collections.nlp.models.language_modeling.megatron_t5_model import MegatronT5Model, T5Sentinel from nemo.collections.nlp.modules.common.megatron.utils import get_iterator_k_split + +from nemo.collections.nlp.parts.mixins.nlp_adapter_mixins import NLPAdapterModelMixin from nemo.collections.nlp.parts.utils_funcs import get_last_rank from nemo.utils import AppState, logging @@ -50,19 +53,25 @@ HAVE_MEGATRON_CORE = False -__all__ = ['MegatronT5FinetuneModel'] +__all__ = ['MegatronT5SFTModel'] -class MegatronT5FinetuneModel(MegatronT5Model): - """Finetune Model that Inherits from MegatronT5Model instead.""" +class MegatronT5SFTModel(NLPAdapterModelMixin, MegatronT5Model): + """ T5 Finetuning model in the same format as MegatronGPTSFTModel """ def __init__(self, cfg: DictConfig, trainer: Trainer): + if not HAVE_APEX: + raise ImportError( + "Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt." + ) super().__init__(cfg, trainer=trainer) - self.val_metric, self.val_metric_name = self.setup_metric(self.cfg.data.validation_ds) - self.val_metric = torch.nn.ModuleList(self.val_metric) + self.val_metric = self.test_metric = None + if hasattr(self.cfg.data, "validation_ds"): + self.val_metric, self.val_metric_name = self.setup_metric(self.cfg.data.validation_ds) + self.val_metric = torch.nn.ModuleList(self.val_metric) if self.val_metric is not None else None if hasattr(self.cfg.data, "test_ds"): self.test_metric, self.test_metric_name = self.setup_metric(self.cfg.data.test_ds) - self.test_metric = torch.nn.ModuleList(self.test_metric) + self.test_metric = torch.nn.ModuleList(self.test_metric) if self.test_metric is not None else None def setup_metric(self, data_cfg): # XNLI is a special case. @@ -71,10 +80,12 @@ def setup_metric(self, data_cfg): metric = [ExactStringPerCategoryMatchMetric(self.cfg.eval_languages)] else: if not hasattr(data_cfg, "metric"): - metric = MetricStringToTorchMetric["exact_string_match"] + metric_class = MetricStringToTorchMetric["exact_string_match"] else: if not hasattr(data_cfg.metric, "name"): raise ValueError("Metric name is not provided in the metric config.") + if data_cfg.metric.name == "loss": + return None, "loss" if data_cfg.metric.name not in MetricStringToTorchMetric: raise KeyError( f"{data_cfg.metric.name} is not supported. List of supported metrics: {MetricStringToTorchMetric.keys()}" @@ -104,9 +115,8 @@ def setup_metric(self, data_cfg): raise ValueError( f"Number of class labels {len(data_cfg.metric.get('class_labels', None))} does not match `num_classes` : {data_cfg.metric.num_classes}" ) - - metric_name = data_cfg.metric.name - metric_class = MetricStringToTorchMetric[metric_name] + metric_name = data_cfg.metric.name + metric_class = MetricStringToTorchMetric[metric_name] # GLUE will not have a "src_file_name" attribute and will always have only a single metric. if hasattr(data_cfg, "src_file_name") or hasattr(data_cfg, "file_names"): @@ -143,6 +153,10 @@ def setup_metric(self, data_cfg): def _metrics_require_string2category_map(self): return set(["f1", "accuracy", "average_precision"]) + @property + def model(self): + return self.enc_dec_model + def setup(self, stage=None): # This is just to keep the parent class happy since we override its setup() method. self.init_consumed_samples = 0 @@ -158,6 +172,7 @@ def setup(self, stage=None): self.setup_test_data() if hasattr(self, '_train_ds'): self.setup_training_data() + self.setup_complete = True def on_validation_epoch_start(self): app_state = AppState() @@ -322,30 +337,31 @@ def inference_step(self, dataloader_iter, batch_idx: int, mode: str, dataloader_ ) # Special ids to text function to handle stripping and special tokens with sentencepiece tokenizers. - preds_text = MegatronT5FinetuneModel.ids_to_text(predicted_token_ids, self.tokenizer) - labels_text = MegatronT5FinetuneModel.ids_to_text(batch['labels'], self.tokenizer) - input_text = MegatronT5FinetuneModel.ids_to_text(batch['text_enc'], self.tokenizer) + preds_text = MegatronT5SFTModel.ids_to_text(predicted_token_ids, self.tokenizer) + labels_text = MegatronT5SFTModel.ids_to_text(batch['labels'], self.tokenizer) + input_text = MegatronT5SFTModel.ids_to_text(batch['text_enc'], self.tokenizer) if not batch_has_lang_information: categories = [None] * len(preds_text) else: categories = batch['lang'] - metric = self.val_metric[dataloader_idx] if mode == 'validation' else self.test_metric[dataloader_idx] - assert len(categories) == len(preds_text) == len(labels_text) - for _, (pred, label, category) in enumerate(zip(preds_text, labels_text, categories)): - # To compute metrics like pearson or spearman correlation, we need to cast the predicted string and labels to floats. - pred, label = self.cast_for_metric( - pred=pred, - label=label, - metric_name=self.val_metric_name if mode == 'validation' else self.test_metric_name, - class_labels=data_cfg.metric.get('class_labels', None), - labels_are_strings=data_cfg.metric.get('labels_are_strings', False), - ) - if batch_has_lang_information: - _ = metric(pred, label, category) - else: - _ = metric(pred, label) + if self.val_metric is not None or self.test_metric is not None: + metric = self.val_metric[dataloader_idx] if mode == 'validation' else self.test_metric[dataloader_idx] + assert len(categories) == len(preds_text) == len(labels_text) + for _, (pred, label, category) in enumerate(zip(preds_text, labels_text, categories)): + # To compute metrics like pearson or spearman correlation, we need to cast the predicted string and labels to floats. + pred, label = self.cast_for_metric( + pred=pred, + label=label, + metric_name=self.val_metric_name if mode == 'validation' else self.test_metric_name, + class_labels=data_cfg.metric.get('class_labels', None), + labels_are_strings=data_cfg.metric.get('labels_are_strings', False), + ) + if batch_has_lang_information: + _ = metric(pred, label, category) + else: + _ = metric(pred, label) outputs = { 'preds': preds_text, @@ -435,40 +451,40 @@ def inference_epoch_end(self, outputs, mode, data_cfg): torch.distributed.broadcast(loss, get_last_rank()) self.log('val_loss', loss, prog_bar=True, rank_zero_only=True, batch_size=1) self.log('global_step', self.trainer.global_step, prog_bar=True, rank_zero_only=True, batch_size=1) + averaged_loss.append(loss) # Determine the key used to log the loss based on the user provided name of the dataset or the dataloader index. loss_log_key = self._determine_log_key(data_cfg, dataloader_idx, "loss", mode) - # Determine the key used to log the eval metric based on the user provided name of the dataset or the dataloader index. - metric_log_key = self._determine_log_key(data_cfg, dataloader_idx, metric_name, mode) - self.log(loss_log_key, loss, batch_size=1) - metric_object = ( - self.val_metric[dataloader_idx] if mode == 'validation' else self.test_metric[dataloader_idx] - ) - metric = metric_object.compute() - if metric_name == 'rouge': - metric = metric['rouge1_fmeasure'] - # Handle logging of GLUE/XNLI separately here. XNLI has a separate metric per language. - if isinstance(metric, dict): - # GLUE case: - if len(metric) == 1 and 'acc' in metric: - metric = metric['acc'] - self.log(metric_log_key, metric, batch_size=1) - logging.info(f"{mode} {metric_name}: {metric}") - # XNLI case where the metric dictionary contains the language and the computed metric as values. - else: - for k, v in metric.items(): - if k != 'acc' and 'total' not in k: - self.log(metric_log_key + f'_{k}', v, batch_size=1) - logging.info(f"{mode} {metric_name} lang {k} : {v}") - if metric_name != 'rouge': + if metric_name != 'loss': + # Determine the key used to log the eval metric based on the user provided name of the dataset or the dataloader index. + metric_log_key = self._determine_log_key(data_cfg, dataloader_idx, metric_name, mode) + self.log(loss_log_key, loss, batch_size=1) + metric_object = ( + self.val_metric[dataloader_idx] if mode == 'validation' else self.test_metric[dataloader_idx] + ) + metric = metric_object.compute() + if metric_name == 'rouge': + metric = metric['rouge1_fmeasure'] + # Handle logging of GLUE/XNLI separately here. XNLI has a separate metric per language. + if isinstance(metric, dict): + # GLUE case: + if len(metric) == 1 and 'acc' in metric: metric = metric['acc'] - else: - self.log(metric_log_key, metric, batch_size=1) - logging.info(f"{metric_log_key}: {metric}") - metric_object.reset() - - averaged_loss.append(loss) - averaged_metric.append(metric) + self.log(metric_log_key, metric, batch_size=1) + logging.info(f"{mode} {metric_name}: {metric}") + # XNLI case where the metric dictionary contains the language and the computed metric as values. + else: + for k, v in metric.items(): + if k != 'acc' and 'total' not in k: + self.log(metric_log_key + f'_{k}', v, batch_size=1) + logging.info(f"{mode} {metric_name} lang {k} : {v}") + if metric_name != 'rouge': + metric = metric['acc'] + else: + self.log(metric_log_key, metric, batch_size=1) + logging.info(f"{metric_log_key}: {metric}") + metric_object.reset() + averaged_metric.append(metric) # Write predictions, labels, and inputs to a file for each validation/test dataset. if data_cfg.get("write_predictions_to_file", False): @@ -479,7 +495,7 @@ def inference_epoch_end(self, outputs, mode, data_cfg): f"Cannot write predictions to file when output_file_path_prefix is not set or present in the yaml config file." ) - # Gather the outputs object from all data parallel ranks since we are using the DistributedSampler which splits data across DDP ranks. + # Gather the outputs object from all data parallel ranks since we are using the DistributedSampler which splits data across DDPDDP ranks. gathered_outputs = [None for _ in range(parallel_state.get_data_parallel_world_size())] torch.distributed.all_gather_object( gathered_outputs, @@ -527,10 +543,10 @@ def inference_epoch_end(self, outputs, mode, data_cfg): # Logging of the averaged metrics: averaged_loss = sum(averaged_loss) / len(averaged_loss) - averaged_metric = sum(averaged_metric) / len(averaged_metric) + averaged_metric = sum(averaged_metric) / len(averaged_metric) if len(averaged_metric) >= 1 else None # Handle case where metrics can be nan or inf. This can break checkpoint save/load. - if torch.isinf(averaged_metric) or torch.isnan(averaged_metric): + if averaged_metric is not None and (torch.isinf(averaged_metric) or torch.isnan(averaged_metric)): app_state = AppState() monitor_mode = app_state.checkpoint_callback_params.mode assert monitor_mode in ['min', 'max'] @@ -538,10 +554,12 @@ def inference_epoch_end(self, outputs, mode, data_cfg): if mode == 'validation': self.log("validation_loss", averaged_loss, batch_size=1) - self.log(f"validation_{self.val_metric_name}", averaged_metric, batch_size=1) + if averaged_metric is not None: + self.log(f"validation_{self.val_metric_name}", averaged_metric, batch_size=1) elif mode == 'test': self.log("test_loss", averaged_loss, batch_size=1) - self.log(f"test_{self.test_metric_name}", averaged_metric, batch_size=1) + if averaged_metric is not None: + self.log(f"test_{self.test_metric_name}", averaged_metric, batch_size=1) app_state = AppState() if hasattr(self, "_train_ds"): @@ -635,7 +653,9 @@ def setup_eval_data(self, datasets, data_cfg): for dataset in datasets: eval_dl = self.build_data_loader( dataset, - global_batch_size=self.cfg.data.train_ds.global_batch_size, + global_batch_size=self.cfg.data.test_ds.global_batch_size + if hasattr(self.cfg.data, "test_ds") + else self.cfg.data.validation_ds.global_batch_size, shuffle=data_cfg.shuffle, num_workers=data_cfg.num_workers, pin_memory=data_cfg.pin_memory, @@ -660,32 +680,52 @@ def _build_train_dataset(self, data_cfg): f"Cannot use drop_last=False in your training data with gradient accumulation found grad acc of {data_cfg.global_batch_size // (data_cfg.micro_batch_size * parallel_state.get_data_parallel_world_size())} with global_batch_size {data_cfg.global_batch_size}, micro_batch_size {data_cfg.micro_batch_size}, data parallel size {parallel_state.get_data_parallel_world_size()}" ) datasets = [] - # Determine if we are using a single dataset or a list of datasets. - is_src_list_config = isinstance(data_cfg.src_file_name, ListConfig) - is_tgt_list_config = isinstance(data_cfg.tgt_file_name, ListConfig) - - if (is_src_list_config and not is_tgt_list_config) or (is_tgt_list_config and not is_src_list_config): - raise ValueError("src_list and tgt_list must both be either a ListConfig or a string. ") - if is_src_list_config: - if len(data_cfg.src_file_name) != len(data_cfg.tgt_file_name): - raise ValueError("src_file_name and tgt_file_name must have the same number of elements. ") + if hasattr(data_cfg, "src_file_name") and hasattr(data_cfg, "tgt_file_name"): + # Determine if we are using a single dataset or a list of datasets. + is_src_list_config = isinstance(data_cfg.src_file_name, ListConfig) + is_tgt_list_config = isinstance(data_cfg.tgt_file_name, ListConfig) + + if (is_src_list_config and not is_tgt_list_config) or (is_tgt_list_config and not is_src_list_config): + raise ValueError("src_list and tgt_list must both be either a ListConfig or a string. ") + if is_src_list_config: + if len(data_cfg.src_file_name) != len(data_cfg.tgt_file_name): + raise ValueError("src_file_name and tgt_file_name must have the same number of elements. ") + else: + data_cfg.src_file_name = [data_cfg.src_file_name] + data_cfg.tgt_file_name = [data_cfg.tgt_file_name] + + for src, tgt in zip(data_cfg.src_file_name, data_cfg.tgt_file_name): + dataset = SequenceToSequenceDataset( + src_file_name=src, + tgt_file_name=tgt, + src_tokenizer=self.tokenizer, + tgt_tokenizer=self.tokenizer, + max_src_seq_length=data_cfg.max_src_seq_length, + max_tgt_seq_length=data_cfg.max_tgt_seq_length, + add_bos_to_input=data_cfg.get('add_bos_to_input', True), + add_eos_to_input=data_cfg.get('add_eos_to_input', True), + replace_bos_with_pad=data_cfg.get('replace_bos_with_pad', False), + ) + datasets.append(dataset) + elif hasattr(data_cfg, "file_names"): + for file_path in data_cfg.file_names: + dataset = T5SFTDataset( + file_path=file_path, + src_tokenizer=self.tokenizer, + tgt_tokenizer=self.tokenizer, + max_src_seq_length=data_cfg.max_seq_length, + max_tgt_seq_length=data_cfg.max_seq_length, + add_bos_to_input=data_cfg.get('add_bos', True), + add_eos_to_input=data_cfg.get( + 'add_eos', True + ), # review: need domain knowledge to undertand if these args are ok + index_mapping_dir=data_cfg.get('index_mapping_dir', None), + memmap_workers=data_cfg.get('memmap_workers', None), + hf_dataset=data_cfg.get('hf_dataset', False), + ) + datasets.append(dataset) else: - data_cfg.src_file_name = [data_cfg.src_file_name] - data_cfg.tgt_file_name = [data_cfg.tgt_file_name] - - for src, tgt in zip(data_cfg.src_file_name, data_cfg.tgt_file_name): - dataset = SequenceToSequenceDataset( - src_file_name=src, - tgt_file_name=tgt, - src_tokenizer=self.tokenizer, - tgt_tokenizer=self.tokenizer, - max_src_seq_length=data_cfg.max_src_seq_length, - max_tgt_seq_length=data_cfg.max_tgt_seq_length, - add_bos_to_input=data_cfg.get('add_bos_to_input', True), - add_eos_to_input=data_cfg.get('add_eos_to_input', True), - replace_bos_with_pad=data_cfg.get('replace_bos_with_pad', False), - ) - datasets.append(dataset) + raise ValueError("You must specify either (src_file_name and tgt_file_name) or file_names in data config") if len(datasets) > 1: dataset = ConcatMapDataset( @@ -707,41 +747,58 @@ def _build_eval_dataset(self, data_cfg): f'You are trying to use "implicit gradient accumulation" of {data_cfg.global_batch_size // (data_cfg.micro_batch_size * parallel_state.get_data_parallel_world_size())} in your validation/test datasets. This is not supported. Please set global_batch_size equal to micro_batch_size * data_parallel_world_size.' ) datasets = [] - # Determine if we are using a single dataset or a list of datasets. - is_src_list_config = isinstance(data_cfg.src_file_name, ListConfig) - is_tgt_list_config = isinstance(data_cfg.tgt_file_name, ListConfig) - is_names_list_config = False - if hasattr(data_cfg, "names"): - if isinstance(data_cfg.names, ListConfig): - is_names_list_config = True - - if (is_src_list_config and not is_tgt_list_config) or (is_tgt_list_config and not is_src_list_config): - raise ValueError("src_list and tgt_list must both be either a ListConfig or a string. ") - if is_src_list_config: - if len(data_cfg.src_file_name) != len(data_cfg.tgt_file_name): - raise ValueError("src_file_name and tgt_file_name must have the same number of elements. ") - if is_names_list_config and len(data_cfg.names) != len(data_cfg.src_file_name): - raise ValueError( - "If you are providing names for each src/tgt file, they must have the same number of elements." + if hasattr(data_cfg, "src_file_name") and hasattr(data_cfg, "tgt_file_name"): + # Determine if we are using a single dataset or a list of datasets. + is_src_list_config = isinstance(data_cfg.src_file_name, ListConfig) + is_tgt_list_config = isinstance(data_cfg.tgt_file_name, ListConfig) + is_names_list_config = False + if hasattr(data_cfg, "names"): + if isinstance(data_cfg.names, ListConfig): + is_names_list_config = True + + if (is_src_list_config and not is_tgt_list_config) or (is_tgt_list_config and not is_src_list_config): + raise ValueError("src_list and tgt_list must both be either a ListConfig or a string. ") + if is_src_list_config: + if len(data_cfg.src_file_name) != len(data_cfg.tgt_file_name): + raise ValueError("src_file_name and tgt_file_name must have the same number of elements. ") + if is_names_list_config and len(data_cfg.names) != len(data_cfg.src_file_name): + raise ValueError( + "If you are providing names for each src/tgt file, they must have the same number of elements." + ) + else: + data_cfg.src_file_name = [data_cfg.src_file_name] + data_cfg.tgt_file_name = [data_cfg.tgt_file_name] + + for src, tgt in zip(data_cfg.src_file_name, data_cfg.tgt_file_name): + dataset = SequenceToSequenceDataset( + src_file_name=src, + tgt_file_name=tgt, + src_tokenizer=self.tokenizer, + tgt_tokenizer=self.tokenizer, + max_src_seq_length=data_cfg.max_src_seq_length, + max_tgt_seq_length=data_cfg.max_tgt_seq_length, + add_bos_to_input=data_cfg.get('add_bos_to_input', True), + add_eos_to_input=data_cfg.get('add_eos_to_input', True), + replace_bos_with_pad=data_cfg.get('replace_bos_with_pad', False), ) + datasets.append(dataset) + elif hasattr(data_cfg, "file_names"): + for file_path in data_cfg.file_names: + dataset = T5SFTDataset( + file_path=file_path, + src_tokenizer=self.tokenizer, + tgt_tokenizer=self.tokenizer, + max_src_seq_length=data_cfg.max_seq_length, + max_tgt_seq_length=data_cfg.max_seq_length, + add_bos_to_input=data_cfg.get('add_bos', True), + add_eos_to_input=data_cfg.get('add_eos', True), + index_mapping_dir=data_cfg.get('index_mapping_dir', None), + memmap_workers=data_cfg.get('memmap_workers', None), + hf_dataset=data_cfg.get('hf_dataset', False), + ) + datasets.append(dataset) else: - data_cfg.src_file_name = [data_cfg.src_file_name] - data_cfg.tgt_file_name = [data_cfg.tgt_file_name] - - for src, tgt in zip(data_cfg.src_file_name, data_cfg.tgt_file_name): - dataset = SequenceToSequenceDataset( - src_file_name=src, - tgt_file_name=tgt, - src_tokenizer=self.tokenizer, - tgt_tokenizer=self.tokenizer, - max_src_seq_length=data_cfg.max_src_seq_length, - max_tgt_seq_length=data_cfg.max_tgt_seq_length, - add_bos_to_input=data_cfg.get('add_bos_to_input', True), - add_eos_to_input=data_cfg.get('add_eos_to_input', True), - replace_bos_with_pad=data_cfg.get('replace_bos_with_pad', False), - ) - datasets.append(dataset) - + raise ValueError("You must specify either (src_file_name and tgt_file_name) or file_names in data config") return datasets def build_train_valid_test_datasets(self, stage): diff --git a/nemo/collections/nlp/models/nlp_model.py b/nemo/collections/nlp/models/nlp_model.py index 0f0de87d3887..ac3a8c998ba7 100644 --- a/nemo/collections/nlp/models/nlp_model.py +++ b/nemo/collections/nlp/models/nlp_model.py @@ -16,8 +16,9 @@ import hashlib import json import os -from typing import Any, Mapping, Optional +from typing import Any, Mapping, Optional, Union +import torch from lightning_fabric.utilities.cloud_io import _load as pl_load from omegaconf import DictConfig, OmegaConf from pytorch_lightning import Trainer @@ -39,6 +40,7 @@ from nemo.collections.nlp.parts.nlp_overrides import NLPSaveRestoreConnector from nemo.core.classes import ModelPT from nemo.core.classes.exportable import Exportable +from nemo.core.connectors.save_restore_connector import SaveRestoreConnector from nemo.utils import AppState, logging try: @@ -439,3 +441,22 @@ def load_state_dict(self, state_dict: Mapping[str, Any], strict: bool = True): del state_dict["bert_model.embeddings.position_ids"] results = super(NLPModel, self).load_state_dict(state_dict, strict=strict) return results + + @classmethod + def restore_from( + cls, + restore_path: str, + override_config_path: Optional[Union[OmegaConf, str]] = None, + map_location: Optional[torch.device] = None, + strict: bool = True, + return_config: bool = False, + save_restore_connector: SaveRestoreConnector = None, + trainer: Optional[Trainer] = None, + ): + if save_restore_connector is None: + save_restore_connector = NLPSaveRestoreConnector() + if os.path.isdir(restore_path): + save_restore_connector.model_extracted_dir = restore_path + return super().restore_from( + restore_path, override_config_path, map_location, strict, return_config, save_restore_connector, trainer + ) diff --git a/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py b/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py index d38530c1bd5d..989b16d694e1 100644 --- a/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py +++ b/nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py @@ -27,6 +27,7 @@ from nemo.collections.nlp.modules.common.megatron.fused_bias_gelu import fused_bias_gelu from nemo.collections.nlp.modules.common.megatron.utils import ApexGuardDefaults, init_method_const, init_method_normal from nemo.core.classes.mixins import adapter_mixin_strategies +from nemo.core.classes.mixins.adapter_mixins import AdapterConfig try: @@ -53,7 +54,7 @@ class AdapterName(str, enum.Enum): """ - Names for adapters used in NLP Adapters and IA3. Note: changing this will break backward compatibility. + Names for adapters used in NLP Adapters and IA3. Note: changing this will break backward compatibility. """ MLP_INFUSED = "mlp_infused_adapter" @@ -95,14 +96,14 @@ def forward(self, x): class MLPInfusedAdapter(InfusedAdapter): """ MLPInfusedAdapter is basically a clone of InfusedAdapter. We do this to make the adapter_mixin agnostic to adapter names - and only check adapter class types. + and only check adapter class types. """ pass @dataclass -class InfusedAdapterConfig: +class InfusedAdapterConfig(AdapterConfig): in_features: int _target_: str = "{0}.{1}".format(InfusedAdapter.__module__, InfusedAdapter.__name__) @@ -232,7 +233,7 @@ def forward(self, x): @dataclass -class ParallelLinearAdapterConfig: +class ParallelLinearAdapterConfig(AdapterConfig): in_features: int out_features: int dim: int @@ -248,7 +249,7 @@ class ParallelLinearAdapterConfig: class LoraKQVAdapter(ParallelLinearAdapter): """ - Lora Adapters are the same arch as regular adapters but with potentially different input and output feature sizes + Lora Adapters are the same arch as regular adapters but with potentially different input and output feature sizes and they do not use an bottleneck activation function """ @@ -257,7 +258,7 @@ class LoraKQVAdapter(ParallelLinearAdapter): class LoraKVAdapter(ParallelLinearAdapter): """ - Lora Adapters are the same arch as regular adapters but with potentially different input and output feature sizes + Lora Adapters are the same arch as regular adapters but with potentially different input and output feature sizes and they do not use an bottleneck activation function """ @@ -266,7 +267,7 @@ class LoraKVAdapter(ParallelLinearAdapter): class LoraQAdapter(ParallelLinearAdapter): """ - Lora Adapters are the same arch as regular adapters but with potentially different input and output feature sizes + Lora Adapters are the same arch as regular adapters but with potentially different input and output feature sizes and they do not use an bottleneck activation function """ @@ -290,7 +291,7 @@ class LoraKVAdapterConfig(ParallelLinearAdapterConfig): class PromptEncoderAdapter(nn.Module, AdapterModuleUtil): """ - The Tensor Parallel MLP prompt encoder network that is used to generate the virtual + The Tensor Parallel MLP prompt encoder network that is used to generate the virtual token embeddings for p-tuning. It only have two layers. TODO: (@adithyare) Need to add all the functionality from the PromptEncoder class """ @@ -311,7 +312,7 @@ def __init__( virtual_tokens: the number of vitural tokens hidden_size: hidden dimension output_size: the output dimension - init_std: the MLP init std value + init_std: the MLP init std value """ super().__init__() self.bottleneck_dim = bottleneck_dim @@ -384,7 +385,7 @@ def inner_forward(self,): return output_embeds def forward(self, batch_size: int, use_cached_reps: bool = False) -> torch.Tensor: - """ + """ Forward pass through the encoder with caching of prompt representations """ if use_cached_reps: @@ -406,7 +407,7 @@ def forward(self, batch_size: int, use_cached_reps: bool = False) -> torch.Tenso @dataclass -class PromptEncoderAdapterConfig: +class PromptEncoderAdapterConfig(AdapterConfig): virtual_tokens: int bottleneck_dim: int embedding_dim: int @@ -558,7 +559,7 @@ class ParallelLinearAdapterWeightTyingConfig: class LoraKQVAdapterWeightTying(ParallelLinearAdapterWeightTying): """ - TODO + TODO """ pass diff --git a/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py b/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py index 5f8c1c977a99..67e08c8db103 100644 --- a/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py +++ b/nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py @@ -15,6 +15,10 @@ import torch from omegaconf import DictConfig +from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import ( + AdapterName, + PromptEncoderAdapterConfig, +) from nemo.collections.nlp.modules.common.megatron.hiddens import get_hiddens_module from nemo.collections.nlp.modules.common.megatron.language_model import Embedding from nemo.collections.nlp.modules.common.megatron.layer_type import LayerType @@ -38,6 +42,7 @@ ) from nemo.collections.nlp.modules.common.megatron.vocab_parallel_cross_entropy import vocab_parallel_cross_entropy from nemo.collections.nlp.parts import utils_funcs +from nemo.core.classes.mixins import adapter_mixins try: from apex.transformer.enums import AttnMaskType, ModelType @@ -102,7 +107,7 @@ def forward(self, hidden_states, word_embeddings_weight): # TODO: add soft prompts as an Embedding sub-class -class MegatronTokenLevelEncoderDecoderModule(MegatronModule): +class MegatronTokenLevelEncoderDecoderModule(MegatronModule, adapter_mixins.AdapterModuleMixin): """Token-based (input/output is tokens) encoder-decoder model (e.g. T5 Language model.)""" def __init__( @@ -427,6 +432,7 @@ def __init__( ) self._tokens_head_key = 'tokens_head' + self.set_accepted_adapter_types([PromptEncoderAdapterConfig._target_]) def _validate_kv_channels(self, cfg): kv_channels = cfg.kv_channels @@ -548,6 +554,18 @@ def forward( else: enc_position_ids = None enc_input = self.encoder_embedding(enc_input_ids, enc_position_ids, token_type_ids=token_type_ids) + if self.is_adapter_available(): + _sq, _bs, _hs = enc_input.size() + ptuning_adapter = self.get_adapter_module(AdapterName.PTUNING_ADAPTER) + v = ptuning_adapter.virtual_tokens + if ( + ptuning_adapter and _sq >= v + ): # The sequence should be longer the v to insert virtual embeddings. + virtual_embeddings = ptuning_adapter(_bs) + enc_input = enc_input[ + v:, :, : + ] # the first v tokens are pads so that they can be swapped out with virtual embeddings. + enc_input = torch.concat([virtual_embeddings, enc_input], dim=0) else: enc_input = None else: diff --git a/nemo/collections/nlp/parts/megatron_trainer_builder.py b/nemo/collections/nlp/parts/megatron_trainer_builder.py index 02f0a7f4f4aa..3c8bbc5db0d3 100644 --- a/nemo/collections/nlp/parts/megatron_trainer_builder.py +++ b/nemo/collections/nlp/parts/megatron_trainer_builder.py @@ -12,6 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +import sys + from omegaconf import DictConfig from pytorch_lightning import Trainer from pytorch_lightning.callbacks import ModelSummary @@ -22,8 +24,10 @@ GradScaler, MegatronHalfPrecisionPlugin, NLPDDPStrategy, + NLPDDPStrategyNotebook, PipelineMixedPrecisionPlugin, ) +from nemo.utils import logging class MegatronTrainerBuilder: @@ -39,6 +43,12 @@ def _training_strategy(self) -> NLPDDPStrategy: """ Returns a ddp strategy passed to Trainer.strategy. """ + # check interactive environment + _IS_INTERACTIVE = hasattr(sys, "ps1") or bool(sys.flags.interactive) + if _IS_INTERACTIVE and self.cfg.trainer.devices == 1: + logging.info("Detected interactive environment, using NLPDDPStrategyNotebook") + return NLPDDPStrategyNotebook(no_ddp_communication_hook=True, find_unused_parameters=False,) + return NLPDDPStrategy( no_ddp_communication_hook=True, gradient_as_bucket_view=self.cfg.model.gradient_as_bucket_view, @@ -61,7 +71,9 @@ def _plugins(self) -> list: plugins: list of plugins passed to Trainer.plugins including precision plugins. """ megatron_amp_o2 = self.cfg.model.get('megatron_amp_O2', False) - with_distributed_adam = self.cfg.model.optim.get('name') == 'distributed_fused_adam' + with_distributed_adam = ( + self.cfg.model.optim.get('name') == 'distributed_fused_adam' if self.cfg.model.get('optim') else False + ) plugins = [] if self.cfg.trainer.precision in [16, '16', 'bf16', '16-mixed', 'bf16-mixed']: @@ -110,3 +122,15 @@ def create_trainer(self) -> Trainer: **self.cfg.trainer, callbacks=[ModelSummary(max_depth=3), CustomProgressBar()] ) + + +class MegatronLMPPTrainerBuilder(MegatronTrainerBuilder): + """Builder for scripts where grad scaler is turned off for pipeline parallel LM model. E.g. PEFT tuning scripts""" + + def _grad_scaler(self) -> GradScaler: + return GradScaler( + init_scale=self.cfg.model.get("native_amp_init_scale", 2 ** 32), + growth_interval=self.cfg.model.get("native_amp_growth_interval", 1000), + hysteresis=self.cfg.model.get("hysteresis", 2), + enabled=False if self.cfg.model.pipeline_model_parallel_size > 1 else True, + ) diff --git a/nemo/collections/nlp/parts/mixins/__init__.py b/nemo/collections/nlp/parts/mixins/__init__.py new file mode 100644 index 000000000000..4fc50543f1d2 --- /dev/null +++ b/nemo/collections/nlp/parts/mixins/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemo/collections/nlp/parts/mixins/nlp_adapter_mixins.py b/nemo/collections/nlp/parts/mixins/nlp_adapter_mixins.py new file mode 100644 index 000000000000..16a3850852d4 --- /dev/null +++ b/nemo/collections/nlp/parts/mixins/nlp_adapter_mixins.py @@ -0,0 +1,484 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import tempfile +from typing import List, Optional, Union + +import torch +from omegaconf import DictConfig, OmegaConf, open_dict + +from nemo.utils.model_utils import inject_model_parallel_rank + +try: + from nemo.collections.nlp.modules.common.megatron.adapters.mcore_mixins import swap_mcore_mixin + + HAVE_MEGATRON_CORE = True +except (ImportError, ModuleNotFoundError): + HAVE_MEGATRON_CORE = False + + +from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import PromptEncoderAdapterConfig +from nemo.collections.nlp.parts.peft_config import ( + PEFT_CONFIG_MAP, + CanonicalAdaptersPEFTConfig, + LoraPEFTConfig, + PEFTConfig, + PtuningPEFTConfig, +) +from nemo.core.classes.mixins.adapter_mixins import AdapterModuleMixin +from nemo.core.connectors.save_restore_connector import SaveRestoreConnector +from nemo.utils import logging, model_utils + +try: + from megatron.core import parallel_state +except (ImportError, ModuleNotFoundError): + HAVE_MEGATRON_CORE = False + + +class NLPAdapterModelMixin: + """ NLP Adapter Mixin that can augment any transformer-based model with Adapter module support. + This mixin class should be used only with a top level ModelPT subclass, that includes either a `model` or an `enc_dec_model` submodule. + This mixin class adds several utility methods to add, load and save adapters. + + An Adapter module is any Pytorch nn.Module that possess a few properties : + + - It's input and output dimension are the same, while the hidden dimension need not be the same. + - The final layer of the Adapter module is zero-initialized, so that the residual connection to the adapter yields the original output. + + This mixin class aims to integrate with PEFT, which is one or more adapters modules. + The two features of PEFT, layer selection and weight tying, are also supported in this mixin class. + """ + + def __init__(self, *args, **kwargs): + self.use_peft = False + self.setup_complete = False + self.use_ptuning_only = False + super().__init__(*args, **kwargs) + if hasattr(self, "enc_dec_model"): + self.model_prefix = "enc_dec_model." # for T5 + else: + self.model_prefix = "model.module." if self.cfg.megatron_amp_O2 else "model." + + self.use_mcore_gpt = hasattr(self, 'mcore_gpt') and self.mcore_gpt + if self.use_mcore_gpt: + assert HAVE_MEGATRON_CORE, "You set `mcore_gpt` as True but megatron core is not found." + + def first_stage_of_pipeline(self): + if hasattr(self, "model") and hasattr(self.model, "pre_process"): + return self.model.pre_process + elif hasattr(self, "model") and hasattr(self.model, "module") and hasattr(self.model.module, "pre_process"): + # (guyueh1): this if condition is used to handle amp O2 + # when amp_O2 is on, self.model will be wrapped by the Float16Module class + return self.model.module.pre_process + logging.warning("no attribute named model or no model.pre_process found. Can not detect stage of pipeline...") + return False + + def _get_all_keys(self,): + """ + Returns all the keys in the model + """ + k = [n for n, p in self.named_parameters()] + b = [n for n, p in self.named_buffers() if n.replace("model.module.", "model.", 1) in self.state_dict().keys()] + # we include buffers because ptuning representations are cached in a buffer and saved to state_dict for inference time use. + return set(k + b) + + def _check_and_add_adapter(self, name, module, peft_name, peft_cfg, name_key_to_mcore_mixins=None): + if name_key_to_mcore_mixins is not None: + for mcore_target, mcore_mixin in name_key_to_mcore_mixins[peft_name]: + if name in [ + mcore_target, + f'model.{mcore_target}', + f'model.module.{mcore_target}', + ]: # simple string match for now + swap_mcore_mixin(module, mcore_mixin) + if model_utils.import_class_by_path(peft_cfg._target_) in module.get_accepted_adapter_types(): + module.add_adapter( + name=peft_name, + cfg=peft_cfg, + base_model_cfg=self.cfg, + model_parallel_config=self.model_parallel_config, + ) + elif isinstance(module, AdapterModuleMixin): + if model_utils.import_class_by_path(peft_cfg._target_) in module.get_accepted_adapter_types(): + module.add_adapter( + name=peft_name, + cfg=peft_cfg, + base_model_cfg=self.cfg, + model_parallel_config=self.model_parallel_config, + ) + + def _check_and_add_peft_cfg(self, peft_cfg): + + layer_selection = peft_cfg.layer_selection + + assert not self.use_mcore_gpt or hasattr( + peft_cfg, 'name_key_to_mcore_mixins' + ), f"{peft_cfg.__class__.__name__} is not supported in megatron core mode yet." + name_key_to_mcore_mixins = peft_cfg.name_key_to_mcore_mixins if self.use_mcore_gpt else None + + for adapter_name, adapter_cfg in peft_cfg.get_config_dict().items(): + # self.mcore_gpt means is GPT and not T5 + if hasattr(self, 'mcore_gpt') and not isinstance(adapter_cfg, PromptEncoderAdapterConfig): + if layer_selection is not None: + logging.info( + f"Layer selection {layer_selection} is enabled for the current model (" + f"{self.__class__.__name__} + {adapter_name})" + ) + if self.use_mcore_gpt: + if self.cfg.megatron_amp_O2: + layers = self.model.module.decoder.layers + else: + layers = self.model.decoder.layers + else: + if self.cfg.megatron_amp_O2: + layers = self.model.module.language_model.encoder.layers + else: + layers = self.model.language_model.encoder.layers + if layer_selection is None: + layer_selection = list(range(1, self.cfg.num_layers + 1)) + for layer in layers: + if layer.layer_number in layer_selection: + for name, module in layer.named_modules(): + self._check_and_add_adapter( + name, module, adapter_name, adapter_cfg, name_key_to_mcore_mixins + ) + else: + # Non GPT models, as well as GPT+PTuning do not support layer selection + if layer_selection is not None: + logging.warning( + "Layer selection is specified, but it is not supported for either " + f"{self.__class__.__name__} or {adapter_name})" + ) + for name, module in self.named_modules(): + self._check_and_add_adapter(name, module, adapter_name, adapter_cfg, name_key_to_mcore_mixins) + + def add_adapter(self, peft_cfgs: Union[PEFTConfig, List[PEFTConfig]]): + """ + High level API to add one or more adapter modules to the model, and freeze the base weights + This method supports adding adapter modules from PEFTConfig or list of PEFTConfig. It would add + corresponding adapter modules. Layer selection and weight tying would be applied if it's in PEFTConfig + + Args: + peft_cfgs: One or more PEFTConfig objects that specify the PEFT method configuration + """ + + if not isinstance(peft_cfgs, List): + peft_cfgs = [peft_cfgs] + + self.base_keys = self._get_all_keys() + self.freeze() + logging.info(f"Before adding PEFT params:\n{self.summarize()}") + + self.use_ptuning_only = len(peft_cfgs) == 1 and isinstance(peft_cfgs[0], PtuningPEFTConfig) + + for peft_cfg in peft_cfgs: + if self.use_ptuning_only: + if not self.first_stage_of_pipeline(): + # There are no params to add if we are not in the first state of the pipeline + continue + self.virtual_tokens = peft_cfg.virtual_tokens + + self._check_and_add_peft_cfg(peft_cfg) + + logging.info(f"After adding PEFT params:\n{self.summarize()}") + self.adapter_keys = self._get_all_keys() - self.base_keys + + for cfg in peft_cfgs: + if cfg.weight_tying: + self.tie_weights(cfg) + self.use_peft = True + + def _get_config_and_state_dict_from_nemo(self, filepath, map_location): + cwd = os.getcwd() + + with tempfile.TemporaryDirectory() as tmpdir: + try: + SaveRestoreConnector._unpack_nemo_file(filepath, tmpdir) + + os.chdir(tmpdir) + + config_yaml = "model_config.yaml" + model_weights_ckpt = "model_weights.ckpt" + + conf = OmegaConf.load(config_yaml) + + os.chdir(cwd) + model_weights = os.path.join(tmpdir, model_weights_ckpt) + model_weights = inject_model_parallel_rank(model_weights) + state_dict = torch.load(model_weights, map_location=map_location) + + return conf, state_dict + finally: + os.chdir(cwd) + + def setup_optimizer_param_groups(self): + """ + ModelPT override. Optimizer will get self._optimizer_param_groups. + Makes two optimizer param groups, one for the frozen model params + and one for the prompt-table/prompt-encoder params. The learning + rate for the frozen model's params will always be zero effectively + freezing the model's params but still allowing for the needed gradients + to be passed around in pipeline parallel models. The prompt-encoder + and/or prompt table will use the learning rate set by the user. + """ + if self.use_peft: + self.freeze() # Freeze the entire model + opt_params = [] + for _, module in self.named_modules(): + if isinstance(module, AdapterModuleMixin) and module.is_adapter_available(): + module.set_enabled_adapters(enabled=True) + module.unfreeze_enabled_adapters() # selectively unfreeze the adapter modules. + opt_params += [p for p in module.parameters() if p.requires_grad] + self._optimizer_param_groups = ({"params": opt_params},) + logging.info(f"Optimizer groups set:\n{self.summarize()}") + else: + super().setup_optimizer_param_groups() + + def load_adapters( + self, filepath: str, peft_cfgs: Optional[Union[PEFTConfig, List[PEFTConfig]]] = None, map_location: str = None, + ): + """ + Utility method that restores only the adapter module(s), and not the entire model itself. + This allows the sharing of adapters which are often just a fraction of the size of the full model, + enabling easier deliver. + + .. note:: + + During restoration, assumes that the model does not currently already have one or more adapter modules. + + Args: + filepath: Filepath of the .ckpt or .nemo file. + peft_cfgs: One or more PEFTConfig objects that specify the PEFT method configuration. + If none, will infer from the .nemo checkpoint + map_location: Pytorch flag, where to place the adapter(s) state dict(s). + """ + + # Determine device + if map_location is None: + if torch.cuda.is_available(): + map_location = 'cuda' + else: + map_location = 'cpu' + + if filepath.endswith('.nemo'): + conf, state_dict = self._get_config_and_state_dict_from_nemo(filepath, map_location) + elif filepath.endswith('.ckpt'): + state_dict = torch.load(filepath, map_location)['state_dict'] + else: + raise RuntimeError(f"{filepath} is not nemo file or ckpt file") + if not peft_cfgs: + assert filepath.endswith( + '.nemo' + ), "Inferring peft scheme is only supported for .nemo checkpoints. Please supply the `peft_cfgs` argument." + peft_cfgs = [PEFT_CONFIG_MAP[conf.peft.peft_scheme](conf)] + self.add_adapter(peft_cfgs) + assert set(state_dict.keys()) == self.adapter_keys + super().load_state_dict(state_dict, strict=False) + + def tie_weights(self, peft_cfg): + pos_idx = 0 + + if self.use_mcore_gpt: + if self.cfg.megatron_amp_O2: + layers = self.model.module.decoder.layers + else: + layers = self.model.decoder.layers + else: + if self.cfg.megatron_amp_O2: + layers = self.model.module.language_model.encoder.layers + else: + layers = self.model.language_model.encoder.layers + + if isinstance(peft_cfg, LoraPEFTConfig): + layer0 = layers[0].self_attention + elif isinstance(peft_cfg, CanonicalAdaptersPEFTConfig): + layer0 = layers[0] + else: + raise RuntimeError(f"{peft_cfg} is not supported for tied weights") + + for adapter_name in layer0.adapter_layer: + adapter = layer0.get_adapter_module(adapter_name) + print(adapter_name, pos_idx) + adapter.set_position(pos_idx) + pos_idx += 1 + + for layer in layers[1:]: + if isinstance(peft_cfg, LoraPEFTConfig): + layer = layer.self_attention + for adapter_name in layer.adapter_layer: + print(adapter_name, pos_idx) + adapter_l = layer.get_adapter_module(adapter_name) + adapter_0 = layer0.get_adapter_module(adapter_name) + adapter_l.tie_weights(pos_idx, adapter_0) + pos_idx += 1 + + def get_peft_state_dict(self): + """ + Gets the keys associated with the adapters only. + """ + state_dict = self.model.state_dict(prefix=self.model_prefix) + peft_state_dict = {} + for k in self.adapter_keys: + # state_dict keys needs to be in non-O2 format and will be corrected in PEFTSaveRestoreConnector if O2=True + new_k = k.replace("model.module.", "model.", 1) + peft_state_dict[new_k] = state_dict[k] + return peft_state_dict + + def state_dict(self, destination=None, prefix=None, keep_vars=False): + if self.use_peft and self.setup_complete: + # Once setup is complete we no longer need to track the frozen part of the model. Only there adapter state dict keeps changing so state_dict only track these. + return self.get_peft_state_dict() + else: + # we want all the params with the same keys as calling self.state_dict() + # but we can't call self.state_dict() here as it would be a recursive call. + # so we call self.model.state_dict(prefix="model.") which will return all the keys and params same as calling self.state_dict() + return self.model.state_dict(prefix=self.model_prefix) + + def sharded_state_dict(self, prefix: str = ''): + use_mcore_gpt = hasattr(self, 'mcore_gpt') and self.mcore_gpt + if not use_mcore_gpt or (self.use_peft and self.setup_complete): + return None + else: + return self.model.sharded_state_dict(prefix=self.model_prefix) + + def load_state_dict(self, state_dict, strict: bool = True): + if len(state_dict) == 0: + return # checkpoint is loaded in on_load_checkpoint() + if self.use_peft and self.setup_complete: + # at this stage only adapter params will appear in the state_dict arg + # so we only update those while the rest of the model is frozen. + # setting strict=False will ignore the missing keys (which are not being updated anyway) + # explicitly check if state_dict.keys matches all the expected self.adapter_keys since we don't have the + # safety in strict=True anymore. + assert set(state_dict.keys()) == self.adapter_keys + super().load_state_dict(state_dict, strict=False) + else: + super().load_state_dict(state_dict, strict=True) + + def on_load_checkpoint(self, checkpoint) -> None: + """LightningModule hook: + https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-load-checkpoint + """ + if self.use_peft and self.setup_complete: + if not self.use_ptuning_only or self.first_stage_of_pipeline(): + # same as super().on_load_checkpoint() but strict=False and only check unexpected keys + # mcore uses distributed checkpointing + if hasattr(self, 'mcore_gpt') and self.mcore_gpt: + for index, module in enumerate(self.get_gpt_module_list()): + if parallel_state.get_virtual_pipeline_model_parallel_world_size() is not None: + checkpoint_state_dict = checkpoint['state_dict'][f'model_{index}'] + else: + checkpoint_state_dict = checkpoint['state_dict'] + # checkpoint_state_dict has "model." but module does not so we need to remove it when loading + checkpoint_state_dict = { + key.replace('model.', ''): checkpoint_state_dict.pop(key) + for key in list(checkpoint_state_dict.keys()) + } + missing_keys, unexpected_keys = module.load_state_dict(checkpoint_state_dict, strict=False) + + assert len(unexpected_keys) == 0, 'Unexpected key(s) in state_dict: {}. '.format( + ', '.join('"{}"'.format(k) for k in unexpected_keys) + ) + + # legacy checkpointing for interleaved + else: + if isinstance(self.model, list): + for i in range(len(self.model)): + parallel_state.set_virtual_pipeline_model_parallel_rank(i) + self.model[i].module.load_state_dict(checkpoint[f'model{i}'], strict=True) + parallel_state.set_virtual_pipeline_model_parallel_rank(0) + else: + super().on_load_checkpoint(checkpoint) + + @classmethod + def merge_cfg_with(cls, path: str, cfg: DictConfig) -> DictConfig: + """ + Merge a given configuration dictionary `cfg` with the configuration dictionary + obtained from restoring a MegatronGPTSFTModel or MegatronT5SFTModel at the specified `path`. + + Args: + path (str): The path to the SFT model checkpoint to be restored. + cfg (DictConfig): The configuration dictionary to merge. + + Returns: + DictConfig: The merged configuration dictionary. + + Examples: + >>> path = "/path/to/model/checkpoint" + >>> cfg = DictConfig({"model": {"key": "value"}, "trainer": {"precision": 16}}) + >>> merged_cfg = merge_cfg_with(path, cfg) + + Notes: + - The function resolves variables within the `cfg` dictionary using `OmegaConf.resolve`. + - Keys in `cfg.model` will override the corresponding keys in the output dictionary. + - If "train_ds" exists in `cfg.model.data`, it updates `micro_batch_size` and `global_batch_size`. + - If `cfg.trainer` contains a "precision" key, it updates `output.precision`. + + """ + + base_cfg = cls.restore_from(path, return_config=True) + + OmegaConf.resolve(cfg) + with open_dict(base_cfg): + for key, val in cfg.model.items(): + base_cfg[key] = val + if "train_ds" in cfg.model.data: + base_cfg.micro_batch_size = cfg.model.data.train_ds.micro_batch_size + base_cfg.global_batch_size = cfg.model.data.train_ds.global_batch_size + if cfg.get("trainer", None) and cfg.trainer.get("precision"): + base_cfg.precision = cfg.trainer.precision + + return base_cfg + + @classmethod + def merge_inference_cfg(cls, path: str, cfg: DictConfig) -> DictConfig: + """ + Generate a configuration dictionary by a given configuration dictionary `cfg` with + the configuration dictionary obtained from restoring a MegatronGPTSFTModel or MegatronT5SFTModel + at the specified `path` and modify `cfg` for inference + + Args: + path (str): The path to the SFT model checkpoint to be restored. + cfg (DictConfig): The configuration dictionary to modify for inference. + + Returns: + DictConfig: The configuration dictionary for inference. + + Examples: + >>> path = "/path/to/model/checkpoint" + >>> cfg = DictConfig({"model": {"key": "value"}, "trainer": {"precision": 16}}) + >>> merged_cfg = merge_inference_cfg(path, cfg) + + Notes: + - "precision" and "test_ds" from `cfg` will override the corresponding keys in the output dictionary + - "activations_checkpoint" will be ovrrided to None in the output dictionary + - "use_flash_attention" will be True if in one of the configuration dictionarys is True + - "seq_len_interpolation_factor" will be overrided from `cfg` if it's not None from checkpoint + """ + + peft_cfg = cls.restore_from(path, return_config=True) + with open_dict(peft_cfg): + # update the model config of the trained model with params we want to set at inference time. + peft_cfg.precision = cfg.trainer.precision + for key, val in cfg.model.items(): + if key != 'data': + peft_cfg[key] = val + peft_cfg.data.test_ds = cfg.model.data.test_ds + + with open_dict(cfg): + cfg.inference.add_BOS = peft_cfg.data.test_ds.add_bos + cfg.inference.tokens_to_generate = peft_cfg.data.test_ds.tokens_to_generate + + return peft_cfg diff --git a/nemo/collections/nlp/parts/nlp_overrides.py b/nemo/collections/nlp/parts/nlp_overrides.py index a116c8e60299..5332a9d2f115 100644 --- a/nemo/collections/nlp/parts/nlp_overrides.py +++ b/nemo/collections/nlp/parts/nlp_overrides.py @@ -432,6 +432,18 @@ def restore_checkpoint_after_setup(self) -> bool: return True +class NLPDDPStrategyNotebook(NLPDDPStrategy): + """ Version of NLPDDPStrategy to be used in a Jupyter Notebook + A large portion of Megatron code has DDP dependency, so it has been necessary to use NLPDDPStrategy even for + single-GPU training (e.g. in a Jupyter notebook) + A PTL 2.0 changes has prevented DDPStrategy to be used in a notebook. + This version of NLPDDPStrategy enables megatron training in a notebook in PTL 2.0. + """ + + def _configure_launcher(self): + self._launcher = None + + class NLPSaveRestoreConnector(SaveRestoreConnector): def __init__(self) -> None: if not HAVE_APEX: @@ -675,156 +687,6 @@ def dummy(): return instance -class PEFTSaveRestoreConnector(NLPSaveRestoreConnector): - """ - PEFT models require the ability to load/save a small subset of the full model (once PEFT params have been infused into the base model.) - The PEFTSaveRestoreConnector is used to allow loading and saving only the PEFT params while not saving the entire model. - - Args: - peft_model_nemo_path: Used to provide the .nemo file corresponding to a PEFT model (which will only contain a small set of params) - peft_model_ckpt_path: Used to provide the path to .ckpt files of a PEFT model. This is required when no .nemo is available (yet) such as during resumed training. - peft_model_ckpt_name: The filename of the ckpt file inside the peft_model_ckpt_path folder - If both are provided the peft_model_ckpt_path takes precedence. - If neither are provided, PEFT params are initialized at random (not loaded from any external source). - """ - - def __init__( - self, - peft_model_nemo_path: Optional[str] = None, - peft_model_ckpt_path: Optional[str] = None, - peft_model_ckpt_name: Optional[str] = "model_weights.ckpt", - ) -> None: - super().__init__() - self.peft_model_ckpt_name = peft_model_ckpt_name - if peft_model_ckpt_path: - # First we will try to load a adapter ckpt path - # this is given priority over loading from nemo path to make resumption of training possible - ckpt_name = os.path.basename(peft_model_ckpt_path) - if not ckpt_name.strip() == '': - # update the weights file name inside the ckpt path rank folders - self.peft_model_ckpt_name = ckpt_name - self.peft_model_ckpt_dir = os.path.dirname(peft_model_ckpt_path) - assert os.path.isdir(self.peft_model_ckpt_dir) - self.peft_model_nemo_path = None - elif peft_model_nemo_path: - # If resumption is not possible we will try to load a adapter nemo path - self.peft_model_nemo_path = peft_model_nemo_path - assert os.path.exists(self.peft_model_nemo_path) - self.peft_model_ckpt_dir = None - else: - # We are not resuming training from a nemo file or a ckpt - # We are training the adapter from randomly initialization - self.peft_model_nemo_path = None - self.peft_model_ckpt_dir = None - - def _load_state_dict_from_disk(self, model_weights, map_location=None): - """ - Infuse the state_dict of the base model with PEFT params from either a peft_model_nemo_path or peft_model_ckpt_path - """ - # first load based model weights - base_model_state_dict = super()._load_state_dict_from_disk(model_weights, map_location) - # Next, We want to load PEFT model's weights - if self.peft_model_nemo_path: - # if the PEFT weights are provided in a .nemo file - # we need to untar the .nemo if its still tarred - with tempfile.TemporaryDirectory() as tmpdir: - self._unpack_nemo_file(self.peft_model_nemo_path, tmpdir) - model_weights_path = self._inject_model_parallel_rank_for_ckpt(tmpdir, self.peft_model_ckpt_name) - peft_state_dict = torch.load(model_weights_path, map_location) - elif self.peft_model_ckpt_dir: - # if the PEFT weights are provided in a ckpt path file - # we don't need to untar - model_weights_path = self._inject_model_parallel_rank_for_ckpt( - self.peft_model_ckpt_dir, self.peft_model_ckpt_name - ) - peft_state_dict = torch.load(model_weights_path, map_location)['state_dict'] - else: - peft_state_dict = {} - if base_model_state_dict: - base_model_state_dict.update(peft_state_dict) # add the PEFT state_dict into the base model's state_dict - return base_model_state_dict - - def restore_from( - self, - calling_cls, - restore_path: str, - override_config_path: Optional[Union[OmegaConf, str]] = None, - map_location: Optional[torch.device] = None, - strict: bool = True, - return_config: bool = False, - trainer: Trainer = None, - ): - """ - Extends the restore_from method of the `NLPSaveRestoreConnector` so that PEFT params are inserted into the state_dict which is required when training a PEFT model from scratch. - """ - # Get path where the command is executed - the artifacts will be "retrieved" there - # (original .nemo behavior) - loaded_params = super().load_config_and_state_dict( - calling_cls, restore_path, override_config_path, map_location, strict, return_config, trainer, - ) - if not isinstance(loaded_params, tuple) or return_config is True: - return loaded_params - conf, instance, state_dict = loaded_params - - # if we're using dist checkpointing then state_dict will be None - if state_dict is None: - # dist checkpointing needs torch.distributed to load the checkpoint - if parallel_state.is_unitialized(): - - def dummy(): - return - - if trainer.strategy.launcher is not None: - trainer.strategy.launcher.launch(dummy, trainer=trainer) - trainer.strategy.setup_environment() - - with tempfile.TemporaryDirectory() as tmpdir: - # Check if self.model_extracted_dir is set, and is a valid path - if self.model_extracted_dir is not None and os.path.isdir(self.model_extracted_dir): - # Log that NeMo will use the provided `model_extracted_dir` - logging.info( - f"Restoration will occur within pre-extracted directory : " f"`{self.model_extracted_dir}`." - ) - - # Override `tmpdir` above with the pre-extracted `model_extracted_dir` - tmpdir = self.model_extracted_dir - - else: - # Extract the nemo file into the temporary directory - self._unpack_nemo_file( - path2file=restore_path, out_folder=tmpdir, extract_config_only=return_config is True - ) - checkpoint = {} - sharded_state_dict = instance.sharded_state_dict() - peft_state_dict = instance.get_peft_state_dict() - for k in peft_state_dict.keys(): - sharded_state_dict.pop(k) - checkpoint['state_dict'] = sharded_state_dict - # remove model weights extension - tmp_model_weights_ckpt = os.path.join(tmpdir, self.model_weights_ckpt) - tmp_model_weights_dir = os.path.splitext(tmp_model_weights_ckpt)[0] - assert os.path.isdir(tmp_model_weights_dir), f'Expected {tmp_model_weights_dir} to be a directory.' - checkpoint = dist_checkpointing.load( - sharded_state_dict=checkpoint, checkpoint_dir=tmp_model_weights_dir - ) - checkpoint['state_dict'].update(peft_state_dict) - instance.on_load_checkpoint(checkpoint) - if hasattr(instance, 'setup_transformer_engine_tp_groups'): - instance.setup_transformer_engine_tp_groups() - - else: - if ( - self.peft_model_nemo_path is None and self.peft_model_ckpt_dir is None - ): # we have this check only for training PEFT from scratch - peft_state_dict = instance.get_peft_state_dict() - state_dict.update(peft_state_dict) - state_dict = self.modify_state_dict(conf, state_dict) - self.load_instance_with_state_dict(instance, state_dict, strict) - - logging.info(f'Model {instance.__class__.__name__} was successfully restored from {restore_path}.') - return instance - - class PipelineMixedPrecisionPlugin(MixedPrecisionPlugin): """ Overrides PTL autocasting to not wrap training/val/test_step. We do this because we have the megatron-core fwd/bwd functions in training_step. diff --git a/nemo/collections/nlp/parts/peft_config.py b/nemo/collections/nlp/parts/peft_config.py new file mode 100644 index 000000000000..dd75747fd73c --- /dev/null +++ b/nemo/collections/nlp/parts/peft_config.py @@ -0,0 +1,190 @@ +# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Dict + +from omegaconf import DictConfig + +try: + from nemo.collections.nlp.modules.common.megatron.adapters.mcore_mixins import ( + MCoreGPTEmbeddingMixin, + MCoreSelfAttentionMixin, + MCoreTransformerLayerMixin, + ) +except (ImportError, ModuleNotFoundError): + MCoreGPTEmbeddingMixin = MCoreSelfAttentionMixin = MCoreTransformerLayerMixin = None + +from nemo.collections.nlp.modules.common.megatron.adapters.parallel_adapters import ( + AdapterName, + InfusedAdapterConfig, + LoraKQVAdapterConfig, + LoraKQVAdapterWeightTyingConfig, + MLPInfusedAdapterConfig, + ParallelLinearAdapterConfig, + ParallelLinearAdapterWeightTyingConfig, + PromptEncoderAdapterConfig, +) + + +class PEFTConfig: + # superclass for adapter name and config + def __init__(self, peft_cfg: DictConfig, name_key_to_cfg: Dict): + self.name_key_to_cfg = name_key_to_cfg + + self.layer_selection = peft_cfg.get("layer_selection", None) + self.weight_tying = peft_cfg.get("weight_tying", False) + + def get_config_dict(self): + return self.name_key_to_cfg + + +class LoraPEFTConfig(PEFTConfig): + def __init__(self, cfg): + lora_cfg = cfg.peft.lora_tuning + if cfg.get("kv_channels", None) is None: + assert ( + cfg.hidden_size % cfg.num_attention_heads == 0 + ), 'hidden_size must be divisible by num_attention_heads if kv_channels is None' + kv_channels = cfg.hidden_size // cfg.num_attention_heads + else: + kv_channels = cfg.kv_channels + projection_size = kv_channels * cfg.num_attention_heads + + config_args = { + "in_features": cfg.hidden_size, + "out_features": 3 * projection_size, + "dim": lora_cfg.adapter_dim, + "norm_position": None, + "norm_type": None, + "activation": "identity", + "column_init_method": lora_cfg.get("column_init_method", "normal"), + "row_init_method": lora_cfg.get("row_init_method", "zero"), + "gather_output": False, + "dropout": lora_cfg.adapter_dropout, + } + + if lora_cfg.weight_tying: + position_embedding_strategy = lora_cfg.get("position_embedding_strategy", None) + if position_embedding_strategy is None: + dim_position_embeddings = 0 + elif position_embedding_strategy == "add": + dim_position_embeddings = cfg.hidden_size + elif position_embedding_strategy == "biasadd": + dim_position_embeddings = 3 * projection_size + elif position_embedding_strategy == "concat": + dim_position_embeddings = lora_cfg.adapter_dim + elif position_embedding_strategy == "mlpconcat": + dim_position_embeddings = lora_cfg.adapter_dim + else: + raise RuntimeError( + f"Unknown position embedding strategy {position_embedding_strategy} for tied weights" + ) + config_args.update( + { + "num_position_embeddings": cfg.num_layers, + "dim_position_embeddings": dim_position_embeddings, + "position_embedding_strategy": position_embedding_strategy, + } + ) + adapter_cfg = LoraKQVAdapterWeightTyingConfig(**config_args) + else: + adapter_cfg = LoraKQVAdapterConfig(**config_args) + + name_key_to_cfg = { + AdapterName.LORA_KQV_ADAPTER: adapter_cfg, + } + self.name_key_to_mcore_mixins = {AdapterName.LORA_KQV_ADAPTER: [("self_attention", MCoreSelfAttentionMixin)]} + + super().__init__(lora_cfg, name_key_to_cfg) + + +class IA3PEFTConfig(PEFTConfig): + def __init__(self, cfg): + mlp_infused_adapter_cfg = MLPInfusedAdapterConfig( + in_features=cfg.ffn_hidden_size // cfg.tensor_model_parallel_size + ) + infused_adapter_cfg = InfusedAdapterConfig(in_features=cfg.hidden_size // cfg.tensor_model_parallel_size) + + name_key_to_cfg = { + AdapterName.KEY_INFUSED: infused_adapter_cfg, + AdapterName.VALUE_INFUSED: infused_adapter_cfg, + AdapterName.MLP_INFUSED: mlp_infused_adapter_cfg, + } + + super().__init__(cfg.peft.ia3_tuning, name_key_to_cfg) + + +class PtuningPEFTConfig(PEFTConfig): + def __init__(self, cfg): + adapter_cfg = PromptEncoderAdapterConfig( + cfg.peft.p_tuning.virtual_tokens, + cfg.peft.p_tuning.bottleneck_dim, + cfg.peft.p_tuning.embedding_dim, + cfg.peft.p_tuning.init_std, + cfg.hidden_size, + ) + name_key_to_cfg = {AdapterName.PTUNING_ADAPTER: adapter_cfg} + self.name_key_to_mcore_mixins = {AdapterName.PTUNING_ADAPTER: [('embedding', MCoreGPTEmbeddingMixin)]} + self.virtual_tokens = cfg.peft.p_tuning.virtual_tokens + + super().__init__(cfg.peft.p_tuning, name_key_to_cfg) + + +class CanonicalAdaptersPEFTConfig(PEFTConfig): + def __init__(self, cfg): + adapter_tuning_cfg = cfg.peft.adapter_tuning + + config_args = { + "in_features": cfg.hidden_size, + "out_features": cfg.hidden_size, + "dim": adapter_tuning_cfg.adapter_dim, + "norm_position": adapter_tuning_cfg.get("norm_position", "pre"), + "norm_type": adapter_tuning_cfg.get("norm_type", "mixedfusedlayernorm"), + "column_init_method": adapter_tuning_cfg.get("column_init_method", "xavier"), + "row_init_method": adapter_tuning_cfg.get("row_init_method", "zero"), + "dropout": adapter_tuning_cfg.adapter_dropout, + } + + if adapter_tuning_cfg.weight_tying: + config_args.update( + { + "num_position_embeddings": cfg.num_layers * 2, + "dim_position_embeddings": cfg.hidden_size, + "position_embedding_strategy": adapter_tuning_cfg.get("position_embedding_strategy", None), + } + ) + adapter_cfg = ParallelLinearAdapterWeightTyingConfig(**config_args) + else: + adapter_cfg = ParallelLinearAdapterConfig(**config_args) + + name_key_to_cfg = { + AdapterName.PRE_ATTN_ADAPTER: adapter_cfg, + AdapterName.POST_ATTN_ADAPTER: adapter_cfg, + } + self.name_key_to_mcore_mixins = { + AdapterName.PRE_ATTN_ADAPTER: [("", MCoreTransformerLayerMixin)], + AdapterName.POST_ATTN_ADAPTER: [("", MCoreTransformerLayerMixin)], + } + + super().__init__(adapter_tuning_cfg, name_key_to_cfg) + + +PEFT_CONFIG_MAP = { + "adapter": CanonicalAdaptersPEFTConfig, + "ia3": IA3PEFTConfig, + "ptuning": PtuningPEFTConfig, + "lora": LoraPEFTConfig, + 'none': None, + None: None, +} diff --git a/nemo/core/classes/mixins/adapter_mixins.py b/nemo/core/classes/mixins/adapter_mixins.py index a7f94e90f9b9..2a05f374d464 100644 --- a/nemo/core/classes/mixins/adapter_mixins.py +++ b/nemo/core/classes/mixins/adapter_mixins.py @@ -42,6 +42,11 @@ def __post_init__(self): self.adapter_class_path = f'{self.adapter_class.__module__}.{self.adapter_class.__name__}' +class AdapterConfig: + # superclass for all adapter config dataclasses + pass + + def register_adapter(base_class: type, adapter_class: type): """ Registers a pair (Base class, Adapter class) into the adapter registry, used for de-referencing. @@ -144,8 +149,8 @@ class AdapterModuleMixin(ABC): metadata of the adapter config. .. note:: - - This module is **not** responsible for maintaining its config. Subclasses must ensure config is updated + + This module is **not** responsible for maintaining its config. Subclasses must ensure config is updated or preserved as needed. It is the responsibility of the subclasses to propagate the most up to date config to lower layers. """ @@ -153,7 +158,7 @@ class AdapterModuleMixin(ABC): adapter_global_cfg_key = "global_cfg" adapter_metadata_cfg_key = "adapter_meta_cfg" - def add_adapter(self, name: str, cfg: DictConfig, **kwargs): + def add_adapter(self, name: str, cfg: Union[DictConfig, AdapterConfig], **kwargs): """ Add an Adapter module to this module. @@ -520,7 +525,7 @@ def forward_single_enabled_adapter_( Perform the forward step of a single adapter module on some input data. .. note:: - + Subclasses can override this method to accommodate more complicate adapter forward steps. Args: @@ -608,7 +613,7 @@ def setup_adapters(self): f"Finished setup of adapter : '{full_adapter_name}'. Enabled: {adapter_cfg.get('enabled', True)}." ) - def add_adapter(self, name: str, cfg: DictConfig): + def add_adapter(self, name: str, cfg: Union[DictConfig, AdapterConfig]): """ Add an Adapter module to this model. @@ -758,7 +763,7 @@ def save_adapters(self, filepath: str, name: str = None): Utility method that saves only the adapter module(s), and not the entire model itself. This allows the sharing of adapters which are often just a fraction of the size of the full model, enabling easier deliver. - + .. note:: The saved file is a pytorch compatible pickle file, containing the state dicts of the adapter(s), @@ -840,7 +845,7 @@ def load_adapters(self, filepath: str, name: str = None, map_location: str = Non enabling easier deliver. .. note:: - + During restoration, assumes that the model does not currently already have an adapter with the name (if provided), or any adapter that shares a name with the state dict's modules (if name is not provided). This is to ensure that each adapter name is globally unique @@ -971,7 +976,7 @@ def adapter_module_names(self) -> List[str]: List of valid adapter modules that are supported by the model. .. note:: - + Subclasses should override this property and return a list of str names, of all the modules that they support, which will enable users to determine where to place the adapter modules. diff --git a/scripts/metric_calculation/peft_metric_calc.py b/scripts/metric_calculation/peft_metric_calc.py index 819d1f2a8c4c..ca13f83281c5 100755 --- a/scripts/metric_calculation/peft_metric_calc.py +++ b/scripts/metric_calculation/peft_metric_calc.py @@ -92,46 +92,35 @@ def metric_max_over_ground_truths(metric_fn, prediction, ground_truths): def main(): parser = argparse.ArgumentParser(description='Process some integers.') parser.add_argument( - '--ground-truth', - type=str, - help="ground truth .jsonl file made from /NeMo/scripts/dataset_processing/nlp/squad/prompt_learning_squad_preprocessing.py", - ) - parser.add_argument( - '--preds', + '--pred-file', type=str, help="Text file with test set prompts + model predictions. Prediction file can be made by running NeMo/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py", ) parser.add_argument( - '--split-string', + '--pred-field', type=str, - help="The text at the end of the prompt, write before the predicted answer. This will be used to find the model's predictions in pred files when the pred file containers both the prompt and prediction.", - default=None, - ) # If the pred file only has preditions, just pass none + help="The field in the json file that contains the prediction tokens", + default="pred", + ) parser.add_argument( - '--answer-field', + '--ground-truth-field', type=str, help="The field in the json file that contains the ground truth tokens", - default="answer", + default="original_answers", ) args = parser.parse_args() - ground_truth_file = args.ground_truth - pred_file = args.preds + pred_file = args.pred_file scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True) preds = open(pred_file, encoding="utf-8").readlines() - ground_truth = open(ground_truth_file).readlines() f1 = exact_match = total = r_score = 0 for i in range(len(preds)): - truth = json.loads(ground_truth[i]) - pred_answer = json.loads(preds[i]) - - # Need to separate out preditions from prompt, spliting on the provided "split string" - if args.split_string is not None: - pred_answer = pred_answer["sentence"].split(args.split_string)[-1].strip() + pred_line = json.loads(preds[i]) - true_answers = truth[args.answer_field] + pred_answer = pred_line[args.pred_field] + true_answers = pred_line[args.ground_truth_field] if not isinstance(true_answers, list): true_answers = [true_answers] diff --git a/tutorials/nlp/lora.ipynb b/tutorials/nlp/lora.ipynb index fc79f74a6e2a..8603bbb62411 100644 --- a/tutorials/nlp/lora.ipynb +++ b/tutorials/nlp/lora.ipynb @@ -1,38 +1,45 @@ { "cells": [ + { + "cell_type": "markdown", + "source": [ + "Currently, this notebook must be run in a NeMo container.\n", + "An example command to launch the container:\n", + "```bash\n", + "docker run --gpus all -it --rm -v :/NeMo --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 \n", + "```" + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "code", - "execution_count": 2, - "id": "b7a434f4", - "metadata": {}, + "execution_count": null, "outputs": [], "source": [ - "BRANCH='main'\n", - "import os\n", - "import wget" - ] + "# Update megatron version to the newest.\n", + "!cd /workspace && python -m pip install -e git+https://github.com/NVIDIA/Megatron-LM#egg=megatron-core" + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "code", "execution_count": null, - "id": "developmental-gibraltar", - "metadata": {}, "outputs": [], "source": [ - "\"\"\"\n", - "You can run either this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n", - "\n", - "Instructions for setting up Colab are as follows:\n", - "1. Open a new Python 3 notebook.\n", - "2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n", - "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n", - "4. Run this cell to set up dependencies.\n", - "\"\"\"\n", - "# If you're using Google Colab and not running locally, run this cell\n", - "\n", - "# install NeMo\n", - "!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[nlp]" - ] + "%cd /NeMo/tutorials/nlp\n", + "BRANCH='main'\n", + "import os\n", + "import wget\n", + "import sys\n", + "sys.path.insert(0, \"../..\") # find the local nemo first before the installed nemo" + ], + "metadata": { + "collapsed": false + } }, { "attachments": {}, @@ -42,16 +49,15 @@ "source": [ "### Introduction\n", "\n", - "In this notebook we demonstrate how to use NeMo's implementation of LoRA (Low Rank Adaptation) for fine-tuning large language models. Our implementation is based on the [paper](https://openreview.net/pdf?id=nZeVKeeFYf9) by Hu et al.\n", + "This notebook demonstrates how to apply PEFT in NeMo. For brevity, we have chosen LoRA as the PEFT technique and GPT as the language model, but the same recipe can be used for other PEFT techniques and language models, as described in the [Training](#training) section.\n", + "\n", + " The implementation of LoRA is based on the paper, [LoRA: Low-Rank Adaptation of Large Language Models](https://openreview.net/pdf?id=nZeVKeeFYf9) by Hu et al.\n", + "\n", + "This example demonstrates how to:\n", "\n", - "We are going to show you how to:\n", - " \n", " 1. Train a LoRA model on a simple Extractive QA task.\n", " 2. Inspect the trained LoRA model showing the parameters it contains.\n", - " 3. Run inference with the based model with the LoRA parameters.\n", - " 4. Merge the LoRA parameters into the base model and run inference again on the merged model.\n", - "\n", - "In this tutorial we will be focusing on LoRA, but the training and evaluation methods described here will be applicable for other Parameter-efficient Fine tuning (PEFT) methods in NeMo." + " 3. Run inference with the base model with the LoRA parameters." ] }, { @@ -79,7 +85,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "0dbd41fd", "metadata": {}, "outputs": [], @@ -87,7 +93,9 @@ "# You can replace DATA_DIR and NEMO_DIR with your own locations\n", "DATA_DIR = \"data\"\n", "NEMO_DIR = \".\"\n", - "os.makedirs(DATA_DIR, exist_ok=True)" + "os.makedirs(DATA_DIR, exist_ok=True)\n", + "SQUAD_DIR = os.path.join(DATA_DIR, \"SQuAD\")\n", + "os.makedirs(SQUAD_DIR, exist_ok=True)" ] }, { @@ -102,19 +110,10 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "e72a1dc1", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "File ‘prompt_learning_squad_preprocessing.py’ already there; not retrieving.\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "# download the preprocessing scripts from github for the purpose of this tutorial\n", "! wget -nc https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/scripts/dataset_processing/nlp/squad/prompt_learning_squad_preprocessing.py" @@ -131,43 +130,11 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "fa16d8ac", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2023-05-30 14:07:23-- https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json\n", - "Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.109.153, 185.199.111.153, 185.199.108.153, ...\n", - "Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.109.153|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 30288272 (29M) [application/json]\n", - "Saving to: ‘train-v1.1.json’\n", - "\n", - "train-v1.1.json 100%[===================>] 28.88M 84.3MB/s in 0.3s \n", - "\n", - "2023-05-30 14:07:25 (84.3 MB/s) - ‘train-v1.1.json’ saved [30288272/30288272]\n", - "\n", - "--2023-05-30 14:07:26-- https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json\n", - "Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.110.153, 185.199.108.153, 185.199.111.153, ...\n", - "Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.110.153|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 4854279 (4.6M) [application/json]\n", - "Saving to: ‘dev-v1.1.json’\n", - "\n", - "dev-v1.1.json 100%[===================>] 4.63M --.-KB/s in 0.1s \n", - "\n", - "2023-05-30 14:07:27 (43.8 MB/s) - ‘dev-v1.1.json’ saved [4854279/4854279]\n", - "\n" - ] - } - ], + "outputs": [], "source": [ - "SQUAD_DIR = os.path.join(DATA_DIR, \"SQuAD\")\n", - "os.makedirs(SQUAD_DIR, exist_ok=True)\n", - "\n", "# Download the SQuAD dataset\n", "!wget -nc https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json\n", "!wget -nc https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json\n", @@ -177,25 +144,10 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "64e3e25b", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Saving train split to data/SQuAD/squad_train.jsonl\n", - "100%|█████████████████████████████████| 87599/87599 [00:00<00:00, 204336.27it/s]\n", - "Saving val split to data/SQuAD/squad_val.jsonl\n", - "100%|█████████████████████████████████| 10570/10570 [00:00<00:00, 158654.55it/s]\n", - "Saving test split to data/SQuAD/squad_test_ground_truth.jsonl\n", - "100%|█████████████████████████████████| 10570/10570 [00:00<00:00, 183040.92it/s]\n", - "Saving test split to data/SQuAD/squad_test.jsonl\n", - "100%|█████████████████████████████████| 10570/10570 [00:00<00:00, 196367.94it/s]\n" - ] - } - ], + "outputs": [], "source": [ "# Preprocess squad data\n", "!python prompt_learning_squad_preprocessing.py --sft-format --data-dir {SQUAD_DIR}" @@ -203,25 +155,10 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "b562d1de", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"input\": \"User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team represented the AFC at Super Bowl 50?\\n\\nAssistant:\", \"output\": \"Denver Broncos\"}\n", - "{\"input\": \"User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team represented the NFC at Super Bowl 50?\\n\\nAssistant:\", \"output\": \"Carolina Panthers\"}\n", - "{\"input\": \"User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question:Where did Super Bowl 50 take place?\\n\\nAssistant:\", \"output\": \"Santa Clara, California\"}\n", - "{\"input\": \"User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team won Super Bowl 50?\\n\\nAssistant:\", \"output\": \"Denver Broncos\"}\n", - "{\"input\": \"User: Context:Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \\\"Venite Ad Me Omnes\\\". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary. Question:To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?\\n\\nAssistant:\", \"output\": \"Saint Bernadette Soubirous\"}\n", - "{\"input\": \"User: Context:Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \\\"Venite Ad Me Omnes\\\". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary. Question:What is in front of the Notre Dame Main Building?\\n\\nAssistant:\", \"output\": \"a copper statue of Christ\"}\n", - "{\"input\": \"User: Context:Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \\\"Venite Ad Me Omnes\\\". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary. Question:The Basilica of the Sacred heart at Notre Dame is beside to which structure?\\n\\nAssistant:\", \"output\": \"the Main Building\"}\n", - "{\"input\": \"User: Context:Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \\\"Venite Ad Me Omnes\\\". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary. Question:What is the Grotto at Notre Dame?\\n\\nAssistant:\", \"output\": \"a Marian place of prayer and reflection\"}\n" - ] - } - ], + "outputs": [], "source": [ "# What the squad dataset looks like after processing\n", "! head -200 $SQUAD_DIR/squad_train.jsonl > $SQUAD_DIR/squad_short_train.jsonl\n", @@ -237,12 +174,14 @@ "metadata": {}, "source": [ "### Model Config Setup\n", - "Now we will begin setting up the config file needed for PEFT tuning. We use a single config for all supported PEFT methods (LoRA, Adapter and P-Tuning). All PEFT methods use classes defined in [megatron_gpt_peft_models.py](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py). All PEFT Classes inherit from `MegatronGPTSFTModel` which is the class that governs instruction tuning." + "Now we will begin setting up the config file needed for PEFT tuning. We use a single config for all supported PEFT methods (LoRA, Adapter, IA3 and P-Tuning, as well as combinations of these). All PEFT methods use the GPT finetuning class `MegatronGPTSFTModel` as the frozen base network, and use the `add_adapter()` method to add adapter weights for PEFT.\n", + "\n", + "Let's create a config object for LoRA training." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "5749c387", "metadata": {}, "outputs": [], @@ -266,15 +205,15 @@ "id": "ce966bcf", "metadata": {}, "source": [ - "The `config` contains several attributes required by the `MegatronGPTPEFTModel`. First we will set the training data path and the validation data path in the config.\n", - "The `config` allows us to set a list of `jsonl` files as training files and sample examples from each file with different probabilities. For simplicity we are going to use just one training file and thus the sampling probability is set to `1.0`\n", + "The `config` contains several attributes required by the `MegatronGPTSFTModel`. First we will set the training data path and the validation data path in the config.\n", + "The `config` allows us to set a list of `jsonl` files as training files and sample examples from each file with different probabilities. For simplicity, we are going to use just one training file and thus the sampling probability is set to `1.0`\n", "\n", "We can also monitor validation loss from multiple validation files during training. Again for simplicity we will use just one validation file." ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "id": "6bb1590f", "metadata": {}, "outputs": [], @@ -292,7 +231,7 @@ "metadata": {}, "source": [ "### PEFT Config\n", - "The attribute [config.model.peft](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml#L78) contains settings that control the PEFT training method and its related hyperpameters. We currently support `lora`, `adapters`, `ptuning` and `ia3`. We can instruct the training script to use one of these methods by setting the config.model.peft.peft_scheme attribute.\n", + "The attribute [config.model.peft](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml#L78) contains settings that control the PEFT training method and its related hyperpameters. We currently support `lora`, `adapter`, `ptuning` and `ia3`. We can instruct the training script to use one of these methods by setting the config.model.peft.peft_scheme attribute.\n", "\n", "The other hyperparams associated with lora tuning are present in the [config.model.peft.lora_tuning](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml#L92) attribute." ] @@ -324,12 +263,12 @@ "metadata": {}, "source": [ "### Prompt Formatting\n", - "The `config.model.data.train_ds.prompt_template` attribute allows us to further tweak the format of the input and output if needed. In this example, we have \"encoding\" our format inside the `jsonl` file directly. So we can keep the `prompt_template` in the config simple.(See previous section on Data Preparation). " + "The `config.model.data.train_ds.prompt_template` attribute allows us to further tweak the format of the input and output if needed. In this example, we have already incorporated our format inside the `jsonl` file during preprocessing, so we can keep the `prompt_template` in the config simple. (See previous section on Data Preparation)." ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "id": "1b6aa5c7", "metadata": {}, "outputs": [], @@ -349,29 +288,10 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "id": "48cdf868", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:08:23 experimental:27] Module is experimental, not ready for production and is not fully supported. Use at your own risk.\n", - "[NeMo W 2023-05-30 14:08:24 experimental:27] Module is experimental, not ready for production and is not fully supported. Use at your own risk.\n" - ] - }, - { - "data": { - "text/plain": [ - "'https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/megatron_gpt_345m.nemo'" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "# Check what GPT .nemo models we have available on NGC\n", "from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel\n", @@ -391,26 +311,28 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "id": "364439a1", "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "File ‘./megatron_gpt_345m.nemo’ already there; not retrieving.\n" - ] - } - ], + "outputs": [], "source": [ "# Download the model from NGC\n", - "gpt_file_name = \"megatron_gpt_345m.nemo\"\n", - "!wget -nc --content-disposition {megatron_gpt_345m_nemo_url} -O {NEMO_DIR}/{gpt_file_name}" + "gpt_file_name = \"megatron_gpt_345m.nemo\"" ] }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "!wget -nc --content-disposition {megatron_gpt_345m_nemo_url} -O {NEMO_DIR}/{gpt_file_name}" + ], + "metadata": { + "collapsed": false + } + }, { "attachments": {}, "cell_type": "markdown", @@ -422,7 +344,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "id": "2778a5fa", "metadata": {}, "outputs": [], @@ -442,7 +364,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "id": "a278cbdf", "metadata": {}, "outputs": [], @@ -468,151 +390,12 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "id": "12a37ada", "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "seed: 1234\n", - "tensor_model_parallel_size: 1\n", - "pipeline_model_parallel_size: 1\n", - "global_batch_size: 4\n", - "micro_batch_size: 1\n", - "restore_from_path: megatron_gpt_345m.nemo\n", - "resume_from_checkpoint: null\n", - "save_nemo_on_validation_end: false\n", - "sync_batch_comm: false\n", - "megatron_amp_O2: false\n", - "sequence_parallel: false\n", - "activations_checkpoint_granularity: null\n", - "activations_checkpoint_method: null\n", - "activations_checkpoint_num_layers: null\n", - "answer_only_loss: true\n", - "gradient_as_bucket_view: false\n", - "hidden_dropout: 0.0\n", - "attention_dropout: 0.0\n", - "ffn_dropout: 0.0\n", - "peft:\n", - " peft_scheme: adapter\n", - " restore_from_path: null\n", - " adapter_tuning:\n", - " type: parallel_adapter\n", - " adapter_dim: 32\n", - " adapter_dropout: 0.0\n", - " norm_position: pre\n", - " column_init_method: xavier\n", - " row_init_method: zero\n", - " norm_type: mixedfusedlayernorm\n", - " lora_tuning:\n", - " adapter_dim: 32\n", - " adapter_dropout: 0.0\n", - " column_init_method: xavier\n", - " row_init_method: zero\n", - " p_tuning:\n", - " virtual_tokens: 10\n", - " bottleneck_dim: 1024\n", - " embedding_dim: 1024\n", - " init_std: 0.023\n", - "data:\n", - " train_ds:\n", - " file_names:\n", - " - data/SQuAD/squad_short_train.jsonl\n", - " global_batch_size: ${model.global_batch_size}\n", - " micro_batch_size: ${model.micro_batch_size}\n", - " shuffle: true\n", - " num_workers: 0\n", - " pin_memory: true\n", - " max_seq_length: 2048\n", - " min_seq_length: 1\n", - " drop_last: true\n", - " concat_sampling_probabilities:\n", - " - 1.0\n", - " context_key: input\n", - " label_key: output\n", - " add_eos: true\n", - " add_sep: false\n", - " add_bos: false\n", - " separate_prompt_and_response_with_newline: false\n", - " truncation_field: context\n", - " index_mapping_dir: null\n", - " prompt_template: '{input} {output}'\n", - " validation_ds:\n", - " file_names:\n", - " - data/SQuAD/squad_short_val.jsonl\n", - " names:\n", - " - squad_val\n", - " global_batch_size: ${model.global_batch_size}\n", - " micro_batch_size: ${model.micro_batch_size}\n", - " shuffle: false\n", - " num_workers: 0\n", - " pin_memory: true\n", - " max_seq_length: 2048\n", - " min_seq_length: 1\n", - " drop_last: false\n", - " context_key: input\n", - " label_key: output\n", - " add_eos: ${model.data.train_ds.add_eos}\n", - " add_sep: ${model.data.train_ds.add_sep}\n", - " add_bos: ${model.data.train_ds.add_bos}\n", - " separate_prompt_and_response_with_newline: ${model.data.train_ds.separate_prompt_and_response_with_newline}\n", - " write_predictions_to_file: false\n", - " output_file_path_prefix: null\n", - " truncation_field: context\n", - " index_mapping_dir: null\n", - " prompt_template: ${model.data.train_ds.prompt_template}\n", - " metric:\n", - " name: loss\n", - " average: null\n", - " num_classes: null\n", - "test_ds:\n", - " file_names: null\n", - " names: null\n", - " global_batch_size: ${model.global_batch_size}\n", - " micro_batch_size: ${model.micro_batch_size}\n", - " shuffle: false\n", - " num_workers: 4\n", - " pin_memory: true\n", - " max_seq_length: 2048\n", - " min_seq_length: 1\n", - " drop_last: false\n", - " context_key: input\n", - " label_key: output\n", - " add_eos: ${model.data.train_ds.add_eos}\n", - " add_sep: ${model.data.train_ds.add_sep}\n", - " add_bos: ${model.data.train_ds.add_bos}\n", - " separate_prompt_and_response_with_newline: ${model.data.train_ds.separate_prompt_and_response_with_newline}\n", - " write_predictions_to_file: false\n", - " output_file_path_prefix: null\n", - " truncation_field: context\n", - " index_mapping_dir: null\n", - " prompt_template: ${model.data.train_ds.prompt_template}\n", - " metric:\n", - " name: loss\n", - " average: null\n", - " num_classes: null\n", - "optim:\n", - " name: fused_adam\n", - " lr: 0.0001\n", - " weight_decay: 0.01\n", - " betas:\n", - " - 0.9\n", - " - 0.98\n", - " sched:\n", - " name: CosineAnnealing\n", - " warmup_steps: 50\n", - " min_lr: 0.0\n", - " constant_steps: 0\n", - " monitor: val_loss\n", - " reduce_on_plateau: false\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "# Final model config\n", "print(OmegaConf.to_yaml(config.model))" @@ -632,49 +415,15 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "id": "90f85b2a", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using 16bit None Automatic Mixed Precision (AMP)\n", - "GPU available: True (cuda), used: True\n", - "TPU available: False, using: 0 TPU cores\n", - "IPU available: False, using: 0 IPUs\n", - "HPU available: False, using: 0 HPUs\n", - "`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Trainer config - \n", - "\n", - "devices: 1\n", - "accelerator: gpu\n", - "num_nodes: 1\n", - "precision: 16\n", - "logger: false\n", - "enable_checkpointing: false\n", - "replace_sampler_ddp: false\n", - "max_epochs: 4\n", - "max_steps: 100\n", - "log_every_n_steps: 10\n", - "val_check_interval: 1.0\n", - "gradient_clip_val: 1.0\n", - "\n" - ] - } - ], + "outputs": [], "source": [ + "from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy\n", "import torch\n", "import pytorch_lightning as pl\n", - "from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy\n", - "from pytorch_lightning.plugins.environments import TorchElasticEnvironment\n", + "from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder\n", "\n", "# let's modify some trainer configs\n", "# check if we have GPU available and uses it\n", @@ -688,16 +437,11 @@ "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "\n", "# setup cluster environment parameters\"\n", - "# use torch elastic cluster environment so `create_process_externally` is True\n", - "# the launcher is set to None. It will not try to spawn new processes.\n", - "# It won't create the misconfiguration error because of the `interactive session`\n", "os.environ[\"LOCAL_RANK\"] = '0'\n", "os.environ[\"RANK\"] = '0'\n", "os.environ[\"WORLD_SIZE\"] = '1'\n", "\n", - "strategy = NLPDDPStrategy(find_unused_parameters=False, no_ddp_communication_hook=True)\n", - "plugins = [TorchElasticEnvironment()]\n", - "trainer = pl.Trainer(plugins= plugins, strategy=strategy, **config.trainer)\n", + "trainer = MegatronTrainerBuilder(config).create_trainer()\n", "\n", "print(\"Trainer config - \\n\")\n", "print(OmegaConf.to_yaml(config.trainer))" @@ -726,41 +470,10 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "id": "f2c943ba", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo E 2023-05-30 14:09:17 exp_manager:646] exp_manager received explicit_log_dir: training_info and at least one of exp_dir: ./peft_lora, or version: None. Please note that exp_dir, name, and version will be ignored.\n", - "[NeMo W 2023-05-30 14:09:17 exp_manager:651] Exp_manager is logging to training_info, but it already exists.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:09:17 exp_manager:374] Experiments will be logged at training_info\n", - "[NeMo I 2023-05-30 14:09:17 exp_manager:797] TensorboardLogger has been set up\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:09:17 exp_manager:893] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to 100. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "training_info\n" - ] - } - ], + "outputs": [], "source": [ "from nemo.utils.exp_manager import exp_manager\n", "\n", @@ -780,423 +493,71 @@ "id": "298b3dce", "metadata": {}, "source": [ - "### LoRA Training\n", - "We now set up the process for training a LoRA model. We first require a config that contains details about the base language model upon which we will train our LoRA model. So we first extract the `base_model_cfg`" + "### Training\n", + "We now set up the process for training a LoRA model. We first require a config that contains details about the base language model upon which we will train our LoRA model. So we first extract the `model_cfg` from the checkpoint and update it with any new settings we employ in our current (LoRA) `config`. These are combined in the `merge_cfg_with` function.\n", + "\n" ] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "id": "edb38445", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:09:30 experimental:27] Module is experimental, not ready for production and is not fully supported. Use at your own risk.\n" - ] - } - ], + "outputs": [], "source": [ - "from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTModel\n", - "from nemo.collections.nlp.parts.nlp_overrides import NLPSaveRestoreConnector, PEFTSaveRestoreConnector\n", - "base_model_save_restore_connector = NLPSaveRestoreConnector()\n", - "base_model_cfg = MegatronGPTModel.restore_from(\n", - " restore_path=config.model.restore_from_path,\n", - " trainer=trainer,\n", - " return_config=True,\n", - " save_restore_connector=base_model_save_restore_connector,\n", - " )" + "from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel\n", + "\n", + "model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)" ] }, { "attachments": {}, "cell_type": "markdown", - "id": "16bace39", + "id": "dfc55a1c", "metadata": {}, "source": [ - "Next, we update the `base_model_cfg` with any new settings we employ in our current (LoRA) `config`." + "Next, we instantiate the GPT model class and add the LoRA adapter\n", + "When we call `add_adapter`, the model prints out the parameter count before and after the operation. We can clearly see the number of trainable parameters increase after adding the adapter.\n", + "To print the parameter count manually, we can call `model.summarize()`." ] }, { "cell_type": "code", - "execution_count": 19, - "id": "fd350dbc", + "execution_count": null, + "id": "a81d8741", "metadata": {}, "outputs": [], "source": [ - "from omegaconf.omegaconf import open_dict\n", - "from nemo.collections.nlp.models.language_modeling.megatron_gpt_peft_models import MegatronGPTLoRAModel\n", - "OmegaConf.set_struct(base_model_cfg, True)\n", - "OmegaConf.resolve(config)\n", - "with open_dict(base_model_cfg):\n", - " base_model_cfg.megatron_amp_O2 = config.model.get('megatron_amp_O2', False)\n", - " base_model_cfg.micro_batch_size = config.model.data.train_ds.micro_batch_size\n", - " base_model_cfg.global_batch_size = config.model.data.train_ds.global_batch_size\n", - " base_model_cfg.sequence_parallel = config.model.get(\"sequence_parallel\", False)\n", - " base_model_cfg.data = config.model.data\n", - " base_model_cfg.optim = config.model.optim\n", - " base_model_cfg.precision = config.trainer.precision\n", - " base_model_cfg.answer_only_loss = config.model.answer_only_loss\n", - " base_model_cfg.restore_from_path = config.model.restore_from_path\n", - " base_model_cfg.resume_from_checkpoint = config.model.resume_from_checkpoint\n", - " base_model_cfg.save_nemo_on_validation_end = config.model.save_nemo_on_validation_end\n", - " base_model_cfg.peft = config.model.peft\n", - " base_model_cfg.target = f\"{MegatronGPTLoRAModel.__module__}.{MegatronGPTLoRAModel.__name__}\"" + "from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig\n", + "\n", + "model = MegatronGPTSFTModel.restore_from(config.model.restore_from_path, model_cfg, trainer=trainer)\n", + "model.add_adapter(LoraPEFTConfig(model_cfg))\n", + "# print(\"Parameter count manually:\\n\", model.summarize())" ] }, { - "attachments": {}, "cell_type": "markdown", - "id": "dfc55a1c", - "metadata": {}, "source": [ - "Next, we instantiate the LoRA model class" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "a81d8741", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:09:39 megatron_init:232] Rank 0 has data parallel group: [0]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:235] All data parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:236] Ranks 0 has data parallel rank: 0\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:244] Rank 0 has model parallel group: [0]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:245] All model parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:255] Rank 0 has tensor model parallel group: [0]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:259] All tensor model parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:260] Rank 0 has tensor model parallel rank: 0\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:274] Rank 0 has pipeline model parallel group: [0]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:286] Rank 0 has embedding group: [0]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:292] All pipeline model parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:293] Rank 0 has pipeline model parallel rank 0\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:294] All embedding group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:09:39 megatron_init:295] Rank 0 has embedding rank: 0\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:09:39 modelPT:244] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:09:39 tokenizer_utils:204] Getting Megatron tokenizer for pretrained model name: megatron-gpt-345m, custom vocab file: /tmp/tmp1qljai9b/bfcdca5e44814366bdb5dcd651325152_gpt2-vocab.json, and merges file: /tmp/tmp1qljai9b/315a11fd68be49d6abdb34363e8c4997_gpt2-merge.txt\n", - "[NeMo I 2023-05-30 14:09:39 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: gpt2, vocab_file: /tmp/tmp1qljai9b/bfcdca5e44814366bdb5dcd651325152_gpt2-vocab.json, merges_files: /tmp/tmp1qljai9b/315a11fd68be49d6abdb34363e8c4997_gpt2-merge.txt, special_tokens_dict: {}, and use_fast: False\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using sep_token, but it is not set yet.\n", - "Using cls_token, but it is not set yet.\n", - "Using pad_token, but it is not set yet.\n", - "Using mask_token, but it is not set yet.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:09:40 megatron_base_model:238] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.\n", - "[NeMo I 2023-05-30 14:09:41 megatron_gpt_peft_models:56] Before adding PEFT params:\n", - " | Name | Type | Params\n", - " -----------------------------------\n", - " 0 | model | GPTModel | 354 M \n", - " -----------------------------------\n", - " 354 M Trainable params\n", - " 0 Non-trainable params\n", - " 354 M Total params\n", - " 1,419.485 Total estimated model params size (MB)\n", - "[NeMo I 2023-05-30 14:09:41 megatron_gpt_peft_models:65] After adding PEFT params:\n", - " | Name | Type | Params\n", - " -----------------------------------\n", - " 0 | model | GPTModel | 358 M \n", - " -----------------------------------\n", - " 358 M Trainable params\n", - " 0 Non-trainable params\n", - " 358 M Total params\n", - " 1,432.068 Total estimated model params size (MB)\n", - "[NeMo I 2023-05-30 14:09:42 nlp_overrides:491] Model MegatronGPTLoRAModel was successfully restored from /home/adithyare/NeMo/tutorials/nlp/megatron_gpt_345m.nemo.\n" - ] - } + "Simply substitute with the `MegatronT5SFTModel` class to use T5 instead of GPT.\n", + "\n", + "To use a different PEFT method, you can use a different config class in place of `LoraPEFTConfig`, such as `CanonicalAdaptersPEFTConfig`, `IA3PEFTConfig`, `PtuningPEFTConfig`. You can also use a combination of the methods by passing in a list:\n", + "`model.add_adapter([LoraPEFTConfig(model_cfg), PtuningPEFTConfig(model_cfg)])`\n", + "\n", + "We're now ready to start training." ], - "source": [ - "from nemo.collections.nlp.parts.nlp_overrides import PEFTSaveRestoreConnector\n", - "peft_save_restore_connector = PEFTSaveRestoreConnector(\n", - " peft_model_nemo_path=None, peft_model_ckpt_path=None\n", - " )\n", - "model = MegatronGPTLoRAModel.restore_from(\n", - " restore_path=config.model.restore_from_path,\n", - " trainer=trainer,\n", - " override_config_path=base_model_cfg,\n", - " save_restore_connector=peft_save_restore_connector,\n", - ")" - ] + "metadata": { + "collapsed": false + } }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "id": "2d99f433", "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:09:46 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:175: UserWarning: The `batch_idx` argument in `MegatronGPTLoRAModel.on_train_batch_start` hook may not match with the actual batch index when using a `dataloader_iter` argument in your `training_step`.\n", - " rank_zero_warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:46 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:175: UserWarning: The `batch_idx` argument in `MegatronGPTLoRAModel.on_train_batch_end` hook may not match with the actual batch index when using a `dataloader_iter` argument in your `training_step`.\n", - " rank_zero_warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:46 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/lightning_fabric/plugins/environments/torchelastic.py:36: UserWarning: MASTER_ADDR environment variable is not defined. Set as localhost\n", - " rank_zero_warn(\"MASTER_ADDR environment variable is not defined. Set as localhost\")\n", - " \n", - "[NeMo W 2023-05-30 14:09:46 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/lightning_fabric/plugins/environments/torchelastic.py:44: UserWarning: MASTER_PORT environment variable is not defined. Set as 12910\n", - " rank_zero_warn(\"MASTER_PORT environment variable is not defined. Set as 12910\")\n", - " \n", - "Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n", - "----------------------------------------------------------------------------------------------------\n", - "distributed_backend=nccl\n", - "All distributed processes registered. Starting with 1 processes\n", - "----------------------------------------------------------------------------------------------------\n", - "\n", - "You are using a CUDA device ('NVIDIA RTX A6000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n", - "[NeMo W 2023-05-30 14:09:46 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:613: UserWarning: Checkpoint directory /home/adithyare/NeMo/tutorials/nlp/training_info/checkpoints exists and is not empty.\n", - " rank_zero_warn(f\"Checkpoint directory {dirpath} exists and is not empty.\")\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:09:46 megatron_gpt_sft_model:634] Building GPT SFT validation datasets.\n", - "[NeMo I 2023-05-30 14:09:46 text_memmap_dataset:104] Building data files\n", - "[NeMo I 2023-05-30 14:09:46 text_memmap_dataset:343] Processing 1 data files using 12 workers\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:349] Time building 0 / 1 mem-mapped files: 0:00:00.360761\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:114] Loading data files\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:205] Loading data/SQuAD/squad_short_val.jsonl\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:117] Time loading 1 mem-mapped files: 0:00:00.002361\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:121] Computing global indices\n", - "[NeMo I 2023-05-30 14:09:47 megatron_gpt_sft_model:637] Length of val dataset: 20\n", - "[NeMo I 2023-05-30 14:09:47 megatron_gpt_sft_model:648] Building GPT SFT traing datasets.\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:104] Building data files\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:343] Processing 1 data files using 12 workers\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:349] Time building 0 / 1 mem-mapped files: 0:00:00.299554\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:114] Loading data files\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:205] Loading data/SQuAD/squad_short_train.jsonl\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:117] Time loading 1 mem-mapped files: 0:00:00.001065\n", - "[NeMo I 2023-05-30 14:09:47 text_memmap_dataset:121] Computing global indices\n", - "[NeMo I 2023-05-30 14:09:47 dataset_utils:1341] > loading indexed mapping from data/SQuAD/squad_short_train.jsonl_squad_short_train.jsonl_indexmap_402mns_2046msl_0.00ssp_1234s.npy\n", - "[NeMo I 2023-05-30 14:09:47 dataset_utils:1344] loaded indexed file in 0.001 seconds\n", - "[NeMo I 2023-05-30 14:09:47 dataset_utils:1345] total number of samples: 600\n", - "make: Entering directory '/home/adithyare/NeMo/nemo/collections/nlp/data/language_modeling/megatron'\n", - "make: Nothing to be done for 'default'.\n", - "make: Leaving directory '/home/adithyare/NeMo/nemo/collections/nlp/data/language_modeling/megatron'\n", - "[NeMo I 2023-05-30 14:09:47 blendable_dataset:67] > elapsed time for building blendable dataset indices: 0.09 (sec)\n", - "> building indices for blendable datasets ...\n", - " > sample ratios:\n", - " dataset 0, input: 1, achieved: 1\n", - "[NeMo I 2023-05-30 14:09:47 megatron_gpt_sft_model:650] Length of train dataset: 402\n", - "[NeMo I 2023-05-30 14:09:47 megatron_gpt_sft_model:655] Building dataloader with consumed samples: 0\n", - "[NeMo I 2023-05-30 14:09:47 megatron_gpt_sft_model:655] Building dataloader with consumed samples: 0\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:09:47 nlp_overrides:124] Configuring DDP for model parallelism.\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 adapter_mixins:430] Unfrozen adapter : lora_kqv_adapter\n", - "[NeMo I 2023-05-30 14:09:47 megatron_gpt_peft_models:130] Optimizer groups set:\n", - " | Name | Type | Params\n", - " -----------------------------------\n", - " 0 | model | GPTModel | 358 M \n", - " -----------------------------------\n", - " 3.1 M Trainable params\n", - " 354 M Non-trainable params\n", - " 358 M Total params\n", - " 716.034 Total estimated model params size (MB)\n", - "[NeMo I 2023-05-30 14:09:47 modelPT:721] Optimizer config = FusedAdam (\n", - " Parameter Group 0\n", - " betas: [0.9, 0.98]\n", - " bias_correction: True\n", - " eps: 1e-08\n", - " lr: 0.0001\n", - " weight_decay: 0.01\n", - " )\n", - "[NeMo I 2023-05-30 14:09:47 lr_scheduler:910] Scheduler \"\" \n", - " will be used during training (effective maximum steps = 100) - \n", - " Parameters : \n", - " (warmup_steps: 50\n", - " min_lr: 0.0\n", - " constant_steps: 0\n", - " max_steps: 100\n", - " )\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n", - " | Name | Type | Params\n", - "-----------------------------------\n", - "0 | model | GPTModel | 358 M \n", - "-----------------------------------\n", - "3.1 M Trainable params\n", - "354 M Non-trainable params\n", - "358 M Total params\n", - "716.034 Total estimated model params size (MB)\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3cb87a7b9d4b46e4a0fb0f0670351fbd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Sanity Checking: 0it [00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:09:48 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 24 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n", - " rank_zero_warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:48 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py:401: UserWarning: Found `dataloader_iter` argument in the `validation_step`. Note that the support for this signature is experimental and the behavior is subject to change.\n", - " rank_zero_warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:48 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/utils.py:81: UserWarning: This function is only for unittest\n", - " warnings.warn(\"This function is only for unittest\")\n", - " \n", - "[NeMo W 2023-05-30 14:09:49 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:536: PossibleUserWarning: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", - " warning_cache.warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:49 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:536: PossibleUserWarning: It is recommended to use `self.log('validation_loss_squad_val', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", - " warning_cache.warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:49 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:536: PossibleUserWarning: It is recommended to use `self.log('validation_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", - " warning_cache.warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:49 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 24 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n", - " rank_zero_warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:49 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py:344: UserWarning: Found `dataloader_iter` argument in the `training_step`. Note that the support for this signature is experimental and the behavior is subject to change.\n", - " rank_zero_warn(\n", - " \n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c7a473adeca64c828d2a1338dab1e76b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Training: 0it [00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:09:51 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:232: UserWarning: You called `self.log('global_step', ...)` in your `training_step` but the value needs to be floating point. Converting it to torch.float32.\n", - " warning_cache.warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:51 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:232: UserWarning: You called `self.log('consumed_samples', ...)` in your `training_step` but the value needs to be floating point. Converting it to torch.float32.\n", - " warning_cache.warn(\n", - " \n", - "[NeMo W 2023-05-30 14:09:51 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n", - " warnings.warn(\"Detected call of `lr_scheduler.step()` before `optimizer.step()`. \"\n", - " \n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a0606700c7ab495eb08ed88c16949569", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Validation: 0it [00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Epoch 0, global step 100: 'validation_loss' reached 0.30823 (best 0.30823), saving model to '/home/adithyare/NeMo/tutorials/nlp/training_info/checkpoints/lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-v2.ckpt' as top 1\n", - "Metric val_loss improved. New best score: 0.308\n", - "`Trainer.fit` stopped: `max_steps=100` reached.\n", - "Restoring states from the checkpoint path at /home/adithyare/NeMo/tutorials/nlp/training_info/checkpoints/lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-v2.ckpt\n", - "Restored all states from the checkpoint file at /home/adithyare/NeMo/tutorials/nlp/training_info/checkpoints/lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-v2.ckpt\n" - ] - } - ], + "outputs": [], "source": [ - "# Training set to 2 epochs by default in a cell above\n", "trainer.fit(model)" ] }, @@ -1206,31 +567,15 @@ "id": "b8210d6d", "metadata": {}, "source": [ - "Once training is completed you should see a saved '.nemo' file in this folder `{config.exp_manager.explicit_log_dir}/checkpoints`" + "Once training is completed you should see a saved '.nemo' file in this folder `{config.exp_manager.explicit_log_dir}/checkpoints`. This checkpoint will only contain the trained adapter weights, and not the frozen base model weights." ] }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "id": "e4e19e65", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "total 230M\n", - "-rw-rw-r-- 1 adithyare adithyare 14M May 30 14:10 lora_example_tuning.nemo\n", - "-rw-rw-r-- 1 adithyare adithyare 37M May 27 09:47 'lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0.ckpt'\n", - "-rw-rw-r-- 1 adithyare adithyare 37M May 27 09:47 'lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-last.ckpt'\n", - "-rw-rw-r-- 1 adithyare adithyare 37M May 30 11:12 'lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-last-v1.ckpt'\n", - "-rw-rw-r-- 1 adithyare adithyare 37M May 30 14:10 'lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-last-v2.ckpt'\n", - "-rw-rw-r-- 1 adithyare adithyare 37M May 30 11:12 'lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-v1.ckpt'\n", - "-rw-rw-r-- 1 adithyare adithyare 37M May 30 14:10 'lora_example_tuning--validation_loss=0.308-step=100-consumed_samples=396.0-v2.ckpt'\n", - "training_info\n" - ] - } - ], + "outputs": [], "source": [ "# The trained '.nemo' model is saved in the location below:\n", "! ls -lh {config.exp_manager.explicit_log_dir}/checkpoints\n", @@ -1244,21 +589,62 @@ "metadata": {}, "source": [ "### Inference\n", - "The model object from `trainer.fit(model)` is also capable of doing inference. But for the tutorial we will re-load the saved `.nemo` lora model along with a `.nemo` base language model to simulate a more realistic scenario (where training does not happen right before inference).\n", + "The model object from `trainer.fit(model)` is also capable of doing inference. For the tutorial, however, we will re-load the saved `.nemo` lora model along with a `.nemo` base language model to simulate a more realistic scenario (where training does not happen right before inference).\n", "\n", - "First, we will load and modify a config file that will be used for inference." + "Run the cell below to reimport libraries and classes in case you did not run the training cells above." ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, + "outputs": [], + "source": [ + "# reimport libraries and classes in case one wants to only run cells from the Inference section\n", + "%cd /NeMo/tutorials/nlp\n", + "import wget, os, sys\n", + "sys.path.insert(0, \"../..\") # find the local nemo first before the installed nemo\n", + "from omegaconf import OmegaConf\n", + "from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder\n", + "from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig\n", + "from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel\n", + "\n", + "NEMO_DIR = \".\"\n", + "DATA_DIR = \"data\"\n", + "CONFIG_DIR = os.path.join(NEMO_DIR, \"conf\")\n", + "SQUAD_DIR = os.path.join(DATA_DIR, \"SQuAD\")\n" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "First, we will load and modify a config file that will be used for inference.\n" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Download the example config file\n", + "wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml', CONFIG_DIR)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, "id": "41ab98a9", "metadata": {}, "outputs": [], "source": [ - "# Download the example config file\n", - "wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml', CONFIG_DIR)\n", - "\n", "# Load the example config file so we can start editing it\n", "CONFIG_EVAL_PATH = os.path.join(CONFIG_DIR, \"megatron_gpt_peft_eval_config.yaml\")\n", "config_eval = OmegaConf.load(CONFIG_EVAL_PATH)" @@ -1277,7 +663,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "id": "64a4e71a", "metadata": {}, "outputs": [], @@ -1294,28 +680,12 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "id": "d8ace8f9", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using 16bit None Automatic Mixed Precision (AMP)\n", - "GPU available: True (cuda), used: True\n", - "TPU available: False, using: 0 TPU cores\n", - "IPU available: False, using: 0 IPUs\n", - "HPU available: False, using: 0 HPUs\n" - ] - } - ], + "outputs": [], "source": [ - "strategy_eval = NLPDDPStrategy(find_unused_parameters=False, no_ddp_communication_hook=True)\n", - "plugins_eval = [TorchElasticEnvironment()]\n", - "# notice the plugins, strategy and config.trainer args are the same as is training portion of this tutorial\n", - "# we just create a new object with no overlap from the training section of this tutorial\n", - "trainer_eval = pl.Trainer(plugins= plugins_eval, strategy=strategy_eval, **config_eval.trainer) " + "trainer_eval = MegatronTrainerBuilder(config_eval).create_trainer()" ] }, { @@ -1324,63 +694,47 @@ "id": "e745ac5e", "metadata": {}, "source": [ - "The `config_eval` object is the hydra config at \"inference/test time\". This means it should contain information relevant for inference/test time. But we still need to know some properties that were set at training time. For example, was the training done with `BOS` enabled or not? And other model specific attributes.\n", + "The `config_eval` object is the hydra config at \"inference/test time\". This means it should contain information relevant for inference/test time, although some properties that were set at training time are still relevant. For example, whether training was done with `BOS` enabled or not, and other model specific attributes.\n", "\n", - "So we extract the `peft_model_cfg` from the '.nemo' file of the lora model we just trained." + "So we extract the relevant information from the '.nemo' file of the lora model we just trained using the `merge_inference_cfg` function." ] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "id": "e04a2201", "metadata": {}, "outputs": [], "source": [ - "from nemo.collections.nlp.models.language_modeling.megatron_gpt_peft_models import MegatronGPTPEFTModel\n", - "peft_model_cfg = MegatronGPTPEFTModel.restore_from(\n", - " restore_path=\"./training_info/checkpoints/lora_example_tuning.nemo\", trainer=trainer_eval, return_config=True,\n", - ")" + "eval_model_cfg = MegatronGPTSFTModel.merge_inference_cfg(config_eval.model.peft.restore_from_path, config_eval)" ] }, { - "attachments": {}, "cell_type": "markdown", - "id": "79a17ac7", - "metadata": {}, "source": [ - "We modify `peft_model_cfg` to include attributes from the `config_eval` that are specific to inference time." - ] + "The cell below is required if you are running the notebook end-to-end, and if you use a different batch size for training and evaluation. In this case, the microbatch calculator needs to be rest. If you are running training only or inference only, feel free to ignore this cell." + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "code", - "execution_count": 27, - "id": "0e0a17aa", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'file_names': ['data/SQuAD/squad_short_val.jsonl'], 'names': ['test_set'], 'global_batch_size': 1, 'micro_batch_size': 1, 'shuffle': False, 'num_workers': 0, 'pin_memory': True, 'max_seq_length': 2048, 'min_seq_length': 1, 'drop_last': False, 'context_key': '${data.train_ds.context_key}', 'label_key': '${data.train_ds.label_key}', 'add_eos': '${data.train_ds.add_eos}', 'add_sep': '${data.train_ds.add_sep}', 'add_bos': '${data.train_ds.add_bos}', 'separate_prompt_and_response_with_newline': '${data.train_ds.separate_prompt_and_response_with_newline}', 'write_predictions_to_file': False, 'output_file_path_prefix': None, 'truncation_field': '${data.train_ds.truncation_field}', 'index_mapping_dir': None, 'prompt_template': '${data.train_ds.prompt_template}', 'tokens_to_generate': 30, 'metric': {'name': 'loss', 'average': None, 'num_classes': None}}\n" - ] - } - ], + "execution_count": null, + "outputs": [], "source": [ - "with open_dict(peft_model_cfg):\n", - " # update the model config of the trained model with params we want to set at inference time.\n", - " peft_model_cfg.precision = config_eval.trainer.precision\n", - " peft_model_cfg.data.test_ds = config_eval.model.data.test_ds\n", - " peft_model_cfg.activations_checkpoint_granularity = None\n", - " peft_model_cfg.activations_checkpoint_method = None\n", - "\n", - "with open_dict(config_eval):\n", - " # update the config with the trained model config\n", - " # required for hydra interpolation to work inside cfg.inference\n", - " config_eval.inference.add_BOS = peft_model_cfg.data.test_ds.add_bos\n", - " config_eval.inference.tokens_to_generate = peft_model_cfg.data.test_ds.tokens_to_generate\n", - "\n", - "print(peft_model_cfg.data.test_ds)" - ] + "from apex.transformer.pipeline_parallel.utils import _reconfigure_microbatch_calculator\n", + "_reconfigure_microbatch_calculator(\n", + " rank=0,\n", + " rampup_batch_size=None,\n", + " global_batch_size=config_eval.model.global_batch_size,\n", + " micro_batch_size=config_eval.model.micro_batch_size,\n", + " data_parallel_size=1,\n", + ")" + ], + "metadata": { + "collapsed": false + } }, { "attachments": {}, @@ -1388,101 +742,21 @@ "id": "132ae378", "metadata": {}, "source": [ - "Next, we load the base language model as well as the lora model we just trained." + "Then, we load the base language model as well as the lora model we just trained." ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": null, "id": "b19cd0ce", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:11:11 megatron_init:232] Rank 0 has data parallel group: [0]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:235] All data parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:236] Ranks 0 has data parallel rank: 0\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:244] Rank 0 has model parallel group: [0]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:245] All model parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:255] Rank 0 has tensor model parallel group: [0]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:259] All tensor model parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:260] Rank 0 has tensor model parallel rank: 0\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:274] Rank 0 has pipeline model parallel group: [0]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:286] Rank 0 has embedding group: [0]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:292] All pipeline model parallel group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:293] Rank 0 has pipeline model parallel rank 0\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:294] All embedding group ranks: [[0]]\n", - "[NeMo I 2023-05-30 14:11:11 megatron_init:295] Rank 0 has embedding rank: 0\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:11:11 modelPT:244] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:11:11 tokenizer_utils:204] Getting Megatron tokenizer for pretrained model name: megatron-gpt-345m, custom vocab file: /tmp/tmp5lxz3z8d/bfcdca5e44814366bdb5dcd651325152_gpt2-vocab.json, and merges file: /tmp/tmp5lxz3z8d/315a11fd68be49d6abdb34363e8c4997_gpt2-merge.txt\n", - "[NeMo I 2023-05-30 14:11:11 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: gpt2, vocab_file: /tmp/tmp5lxz3z8d/bfcdca5e44814366bdb5dcd651325152_gpt2-vocab.json, merges_files: /tmp/tmp5lxz3z8d/315a11fd68be49d6abdb34363e8c4997_gpt2-merge.txt, special_tokens_dict: {}, and use_fast: False\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using sep_token, but it is not set yet.\n", - "Using cls_token, but it is not set yet.\n", - "Using pad_token, but it is not set yet.\n", - "Using mask_token, but it is not set yet.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:11:12 megatron_base_model:238] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.\n", - "[NeMo I 2023-05-30 14:11:12 build_model:143] > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 354871296\n", - "[NeMo I 2023-05-30 14:11:12 megatron_gpt_peft_models:56] Before adding PEFT params:\n", - " | Name | Type | Params\n", - " -----------------------------------\n", - " 0 | model | GPTModel | 354 M \n", - " -----------------------------------\n", - " 354 M Trainable params\n", - " 0 Non-trainable params\n", - " 354 M Total params\n", - " 1,419.485 Total estimated model params size (MB)\n", - "[NeMo I 2023-05-30 14:11:12 megatron_gpt_peft_models:65] After adding PEFT params:\n", - " | Name | Type | Params\n", - " -----------------------------------\n", - " 0 | model | GPTModel | 358 M \n", - " -----------------------------------\n", - " 358 M Trainable params\n", - " 0 Non-trainable params\n", - " 358 M Total params\n", - " 1,432.068 Total estimated model params size (MB)\n", - "[NeMo I 2023-05-30 14:11:13 nlp_overrides:491] Model MegatronGPTLoRAModel was successfully restored from /home/adithyare/NeMo/tutorials/nlp/megatron_gpt_345m.nemo.\n" - ] - } - ], + "outputs": [], "source": [ - "save_restore_connector = PEFTSaveRestoreConnector(\n", - " peft_model_nemo_path=config_eval.model.peft.restore_from_path, peft_model_ckpt_path=None,\n", - ")\n", - "from nemo.collections.nlp.models.nlp_model import NLPModel\n", - "model_eval = MegatronGPTPEFTModel.restore_from(\n", - " restore_path=config_eval.model.restore_from_path,\n", - " trainer=trainer,\n", - " override_config_path=peft_model_cfg,\n", - " save_restore_connector=save_restore_connector,\n", - ")\n", + "model_eval = MegatronGPTSFTModel.restore_from(config_eval.model.restore_from_path, eval_model_cfg, trainer=trainer_eval)\n", + "model_eval.load_adapters(config_eval.model.peft.restore_from_path, LoraPEFTConfig(eval_model_cfg))\n", + "model_eval.freeze()\n", "\n", - "model_eval.freeze()" + "print(\"Parameter count manually:\\n\", model_eval.summarize())" ] }, { @@ -1496,34 +770,20 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "id": "12c390f8", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:104] Building data files\n", - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:343] Processing 1 data files using 12 workers\n", - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:349] Time building 0 / 1 mem-mapped files: 0:00:00.706630\n", - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:114] Loading data files\n", - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:205] Loading data/SQuAD/squad_short_val.jsonl\n", - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:117] Time loading 1 mem-mapped files: 0:00:00.001054\n", - "[NeMo I 2023-05-30 14:11:18 text_memmap_dataset:121] Computing global indices\n" - ] - } - ], + "outputs": [], "source": [ - "_test_ds = model_eval._build_dataset(peft_model_cfg.data.test_ds, is_train=False)\n", + "_test_ds = model_eval._build_dataset(eval_model_cfg.data.test_ds, is_train=False)\n", "from torch.utils.data import DataLoader\n", "request_dl = DataLoader(\n", " dataset=_test_ds[0],\n", - " batch_size=peft_model_cfg.data.test_ds.global_batch_size,\n", + " batch_size=eval_model_cfg.data.test_ds.global_batch_size,\n", " collate_fn=_test_ds[0].collate_fn,\n", ")\n", "config_inference = OmegaConf.to_container(config_eval.inference, resolve=True)\n", - "model_eval.set_inference_config(config_inference)\n" + "model_eval.set_inference_config(config_inference)" ] }, { @@ -1535,161 +795,14 @@ "And finally, we call `trainer.predict` which triggers the inference process. The `response` object contains the outputs of the model." ] }, - { - "cell_type": "markdown", - "id": "733c172c", - "metadata": {}, - "source": [] - }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "id": "5ba6a70c", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "You are using a CUDA device ('NVIDIA RTX A6000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n", - "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n", - "[NeMo W 2023-05-30 14:11:30 nemo_logging:349] /home/adithyare/miniconda3/envs/n22/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, predict_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 24 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n", - " rank_zero_warn(\n", - " \n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ddcc3ce26ed74665a8429953b929a037", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Predicting: 100it [00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2023-05-30 14:11:30 nemo_logging:349] /home/adithyare/NeMo/nemo/collections/nlp/modules/common/text_generation_utils.py:306: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402379298/work/torch/csrc/utils/tensor_numpy.cpp:206.)\n", - " string_tensor = torch.as_tensor(\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team represented the AFC at Super Bowl 50?\n", - "\n", - "Assistant: Denver Broncos\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team represented the NFC at Super Bowl 50?\n", - "\n", - "Assistant: Denver Broncos\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Where did Super Bowl 50 take place?\n", - "\n", - "Assistant: Santa Clara, California\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team won Super Bowl 50?\n", - "\n", - "Assistant: Denver Broncos\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What color was used to emphasize the 50th anniversary of the Super Bowl?\n", - "\n", - "Assistant: gold\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What was the theme of Super Bowl 50?\n", - "\n", - "Assistant: \"Gold\"\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What day was the game played on?\n", - "\n", - "Assistant: February 7, 2016\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What is the AFC short for?\n", - "\n", - "Assistant: Super Bowl 50\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What was the theme of Super Bowl 50?\n", - "\n", - "Assistant: \"Gold\"\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What does AFC stand for?\n", - "\n", - "Assistant: Super Bowl L\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What day was the Super Bowl played on?\n", - "\n", - "Assistant: February 7, 2016\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Who won Super Bowl 50?\n", - "\n", - "Assistant: Denver Broncos\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What venue did Super Bowl 50 take place in?\n", - "\n", - "Assistant: Levi's Stadium\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What city did Super Bowl 50 take place in?\n", - "\n", - "Assistant: San Francisco\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:If Roman numerals were used, what would Super Bowl 50 have been called?\n", - "\n", - "Assistant: Super Bowl L\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Super Bowl 50 decided the NFL champion for what season?\n", - "\n", - "Assistant: 2015\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What year did the Denver Broncos secure a Super Bowl title for the third time?\n", - "\n", - "Assistant: 2015\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What city did Super Bowl 50 take place in?\n", - "\n", - "Assistant: San Francisco\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What stadium did Super Bowl 50 take place in?\n", - "\n", - "Assistant: Levi's Stadium\n", - "\n", - "\n", - "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:What was the final score of Super Bowl 50? \n", - "\n", - "Assistant: 24–10\n", - "\n", - "\n" - ] - } - ], + "outputs": [], "source": [ - "response = trainer.predict(model_eval, request_dl)\n", + "response = trainer_eval.predict(model_eval, request_dl)\n", "for batch in response:\n", " for s in batch['sentences']:\n", " print(f\"{s}\\n\\n\")" From 1da24ef820f0953e92dd0e5edb494285318b06da Mon Sep 17 00:00:00 2001 From: Li Tao Date: Fri, 29 Sep 2023 02:01:51 +0800 Subject: [PATCH 070/112] fix a typo (#7496) Signed-off-by: BestJuly Signed-off-by: Sasha Meister --- .../collections/nlp/data/language_modeling/megatron/helpers.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/nlp/data/language_modeling/megatron/helpers.cpp b/nemo/collections/nlp/data/language_modeling/megatron/helpers.cpp index ff262c053dab..4a65b313c816 100644 --- a/nemo/collections/nlp/data/language_modeling/megatron/helpers.cpp +++ b/nemo/collections/nlp/data/language_modeling/megatron/helpers.cpp @@ -38,7 +38,7 @@ void build_blending_indices(py::array_t& dataset_index, const int32_t num_datasets, const int64_t size, const bool verbose) { /* Given multiple datasets and a weighting array, build samples - such that it follows those wieghts.*/ + such that it follows those weights.*/ if (verbose) { std::cout << "> building indices for blendable datasets ..." << std::endl; From c3d3ffcc7512d7274a38f3f6f79a5f7a2e10c07c Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 28 Sep 2023 16:46:41 -0700 Subject: [PATCH 071/112] [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister --- tutorials/tts/FastPitch_ChineseTTS_Training.ipynb | 8 ++------ tutorials/tts/FastPitch_GermanTTS_Training.ipynb | 8 ++------ tutorials/tts/Vits_Training.ipynb | 8 ++------ 3 files changed, 6 insertions(+), 18 deletions(-) diff --git a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb index 2a12b417a271..6be81479c750 100644 --- a/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_ChineseTTS_Training.ipynb @@ -61,12 +61,8 @@ "# !pip install wget text-unidecode matplotlib>=3.3.2\n", "\n", "## Install NeMo\n", - "BRANCH = 'main'\n", - "# !python -m pip install \"git+https://github.com/NVIDIA/NeMo.git@${BRANCH}#egg=nemo_toolkit[all]\"\n", - "\n", - "## Install pynini\n", - "# !wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/nemo_text_processing/install_pynini.sh\n", - "# !bash install_pynini.sh\n", + "BRANCH = 'r1.21.0'\n", + "# !python -m pip install \"git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\"\n", "\n", "# !pip install opencc-python-reimplemented\n", "\n", diff --git a/tutorials/tts/FastPitch_GermanTTS_Training.ipynb b/tutorials/tts/FastPitch_GermanTTS_Training.ipynb index e7cb0e896650..fa29916b43d2 100644 --- a/tutorials/tts/FastPitch_GermanTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_GermanTTS_Training.ipynb @@ -61,12 +61,8 @@ "# !pip install wget text-unidecode matplotlib>=3.3.2\n", "\n", "## Install NeMo\n", - "BRANCH = 'main'\n", - "# !python -m pip install \"git+https://github.com/NVIDIA/NeMo.git@${BRANCH}#egg=nemo_toolkit[all]\"\n", - "\n", - "## Install pynini\n", - "# !wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/nemo_text_processing/install_pynini.sh\n", - "# !bash install_pynini.sh\n", + "BRANCH = 'r1.21.0'\n", + "# !python -m pip install \"git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\"\n", "\n", "\"\"\"\n", "Remember to restart the runtime for the kernel to pick up any upgraded packages (e.g. matplotlib)!\n", diff --git a/tutorials/tts/Vits_Training.ipynb b/tutorials/tts/Vits_Training.ipynb index 296661c6ca5e..dbfc0edbe82c 100644 --- a/tutorials/tts/Vits_Training.ipynb +++ b/tutorials/tts/Vits_Training.ipynb @@ -63,12 +63,8 @@ "# !pip install wget text-unidecode matplotlib>=3.3.2\n", "\n", "## Install NeMo\n", - "BRANCH = 'main'\n", - "# !python -m pip install \"git+https://github.com/NVIDIA/NeMo.git@${BRANCH}#egg=nemo_toolkit[all]\"\n", - "\n", - "## Install pynini\n", - "# !wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/nemo_text_processing/install_pynini.sh\n", - "# !bash install_pynini.sh\n", + "BRANCH = 'r1.21.0'\n", + "# !python -m pip install \"git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\"\n", "\n", "\"\"\"\n", "Remember to restart the runtime for the kernel to pick up any upgraded packages (e.g. matplotlib)!\n", From 6849c944e65a5282400b46faa2148146511c96eb Mon Sep 17 00:00:00 2001 From: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Date: Fri, 29 Sep 2023 09:39:50 -0700 Subject: [PATCH 072/112] add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister --- README.rst | 17 ++++++++++++++++- docs/source/starthere/intro.rst | 11 ++++++++++- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 6dc491523a99..15a2120d30b5 100644 --- a/README.rst +++ b/README.rst @@ -67,7 +67,22 @@ For scaling NeMo LLM training on Slurm clusters or public clouds, please see the The NM launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and also has an `Autoconfigurator `_ which can be used to find the optimal model parallel configuration for training on a specific cluster. -Also see our `introductory video `_ for a high level overview of NeMo. +Also see the two introductory videos below for a high level overview of NeMo. + +* Developing State-Of-The-Art Conversational AI Models in Three Lines of Code. +* NVIDIA NeMo: Toolkit for Conversational AI at PyData Yerevan 2022. + +|three_lines| |pydata| + +.. |pydata| image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg + :target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0 + :width: 600 + :alt: Develop Conversational AI Models in 3 Lines + +.. |three_lines| image:: https://img.youtube.com/vi/wBgpMf_KQVw/maxresdefault.jpg + :target: https://www.youtube.com/embed/wBgpMf_KQVw?mute=0&start=0&autoplay=0 + :width: 600 + :alt: Introduction at PyData@Yerevan 2022 Key Features ------------ diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index 70426d3fe4a0..9297b7ef53b3 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -19,14 +19,23 @@ Conversational AI architectures are typically large and require a lot of data an for training. NeMo uses `PyTorch Lightning `_ for easy and performant multi-GPU/multi-node mixed-precision training. -`Pre-trained NeMo models. `_ +`Pre-trained NeMo models. `_ +Also see the two introductory videos below for a high level overview of NeMo. + +* Developing State-Of-The-Art Conversational AI Models in Three Lines of Code. .. raw:: html
+* NVIDIA NeMo: Toolkit for Conversational AI at PyData Yerevan 2022. +.. image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg + :target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0 + :width: 560 + :alt: Develop Conversational AI Models in 3 Lines + For more information and questions, visit the `NVIDIA NeMo Discussion Board `_. Prerequisites From ec4f8c37a8bad6de3002ee7611d935cb6317ef9d Mon Sep 17 00:00:00 2001 From: Robin Dong Date: Sat, 30 Sep 2023 02:57:58 +1000 Subject: [PATCH 073/112] Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../fastpitch_align_multispeaker_22050.yaml | 2 +- .../tts/aishell3/get_data.py | 38 ++++++++++++------- 2 files changed, 26 insertions(+), 14 deletions(-) diff --git a/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml b/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml index 2464e546598e..55c918c28b72 100644 --- a/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml +++ b/examples/tts/conf/zh/fastpitch_align_multispeaker_22050.yaml @@ -40,7 +40,7 @@ model: learn_alignment: true bin_loss_warmup_epochs: 100 - n_speakers: 1958 + n_speakers: 175 max_token_duration: 75 symbols_embedding_dim: 384 pitch_embedding_kernel_size: 3 diff --git a/scripts/dataset_processing/tts/aishell3/get_data.py b/scripts/dataset_processing/tts/aishell3/get_data.py index 904ab0314653..1b3043bbf0d3 100755 --- a/scripts/dataset_processing/tts/aishell3/get_data.py +++ b/scripts/dataset_processing/tts/aishell3/get_data.py @@ -86,14 +86,17 @@ def __process_transcript(file_path: str): text_normalizer_call_kwargs = {"punct_pre_process": True, "punct_post_process": True} normalizer_call = lambda x: text_normalizer.normalize(x, **text_normalizer_call_kwargs) entries = [] - i = 0 SPEAKER_LEN = 7 + + candidates = [] + speakers = set() with open(file_path / "train" / "content.txt", encoding="utf-8") as fin: for line in fin: content = line.split() wav_name, text = content[0], "".join(content[1::2]) + "。" wav_name = wav_name.replace(u'\ufeff', '') speaker = wav_name[:SPEAKER_LEN] + speakers.add(speaker) wav_file = file_path / "train" / "wav" / speaker / wav_name assert os.path.exists(wav_file), f"{wav_file} not found!" duration = subprocess.check_output(f"soxi -D {wav_file}", shell=True) @@ -102,18 +105,27 @@ def __process_transcript(file_path: str): processed_file = file_path / "processed" / wav_name # convert wav to mono 22050HZ, 16 bit (as SFSpeech dataset) subprocess.run(f"sox {wav_file} -r 22050 -c 1 -b 16 {processed_file}", shell=True) - simplified_text = cc.convert(text) - normalized_text = normalizer_call(simplified_text) - entry = { - 'audio_filepath': os.path.abspath(processed_file), - 'duration': float(duration), - 'text': text, - 'normalized_text': normalized_text, - 'speaker': int(speaker[3:]), - } - - i += 1 - entries.append(entry) + candidates.append((processed_file, duration, text, speaker)) + + # remapping the speakder to speaker_id (start from 1) + remapping = {} + for index, speaker in enumerate(sorted(speakers)): + remapping[speaker] = index + 1 + + for processed_file, duration, text, speaker in candidates: + simplified_text = cc.convert(text) + normalized_text = normalizer_call(simplified_text) + entry = { + 'audio_filepath': os.path.abspath(processed_file), + 'duration': float(duration), + 'text': text, + 'normalized_text': normalized_text, + 'speaker_raw': speaker, + 'speaker': remapping[speaker], + } + + entries.append(entry) + return entries From 4858db444d10ad58080f7f4c40ec4782d27a3dda Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 11:12:43 -0700 Subject: [PATCH 074/112] fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan Signed-off-by: smajumdar Co-authored-by: Kunal Dhawan Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/core/classes/modelPT.py | 99 +++++++++++++++++++++++++++---- tutorials/asr/Multilang_ASR.ipynb | 11 +++- 2 files changed, 94 insertions(+), 16 deletions(-) diff --git a/nemo/core/classes/modelPT.py b/nemo/core/classes/modelPT.py index 6e616319b12b..bc6f3bcef1cc 100644 --- a/nemo/core/classes/modelPT.py +++ b/nemo/core/classes/modelPT.py @@ -179,18 +179,11 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None): # Create list of lists for val and test outputs to support multiple dataloaders # Initialize an empty list as sometimes self._validation_dl can be None at this stage - self.validation_step_outputs = [] - # Check len(self._validation_dl) > 1 as sometimes single dataloader can be in a list: [] when ds_item in - # config has 1 item passed in a list - if self._validation_dl and type(self._validation_dl) == list and len(self._validation_dl) > 1: - for _ in range(len(self._validation_dl)): - self.validation_step_outputs.append([]) + self._validation_step_outputs = None # Initialize an empty list as sometimes self._test_dl can be None at this stage - self.test_step_outputs = [] - if self._test_dl and type(self._test_dl) == list and len(self._test_dl) > 1: - for _ in range(len(self._test_dl)): - self.test_step_outputs.append([]) + self._test_step_outputs = None + # ModelPT wrappers over subclass implementations self.training_step = model_utils.wrap_training_step(self.training_step) @@ -1573,6 +1566,61 @@ def cfg(self, cfg): if hasattr(self, '_hparams_initial') and 'cfg' in self._hparams_initial: self._hparams_initial['cfg'] = OmegaConf.to_object(self._cfg) + @property + def validation_step_outputs(self): + """ + Cached outputs of validation_step. It can be a list of items (for single data loader) or a list of lists + (for multiple data loaders). + + Returns: + List of outputs of validation_step. + """ + if self._validation_step_outputs is not None: + return self._validation_step_outputs + + # Initialize new output list + self._validation_step_outputs = [] + # Check len(self._validation_dl) > 1 as sometimes single dataloader can be in a list: [] when ds_item in + # config has 1 item passed in a list + if ( + self._validation_dl is not None + and isinstance(self._validation_dl, (list, tuple)) + and len(self._validation_dl) > 1 + ): + for _ in range(len(self._validation_dl)): + self._validation_step_outputs.append([]) + + return self._validation_step_outputs + + @validation_step_outputs.setter + def validation_step_outputs(self, value): + self._validation_step_outputs = value + + @property + def test_step_outputs(self): + """ + Cached outputs of test_step. It can be a list of items (for single data loader) or a list of lists (for multiple data loaders). + + Returns: + List of outputs of test_step. + """ + if self._test_step_outputs is not None: + return self._test_step_outputs + + # Initialize new output list + self._test_step_outputs = [] + # Check len(self._test_dl) > 1 as sometimes single dataloader can be in a list: [] when ds_item in + # config has 1 item passed in a list + if self._test_dl is not None and isinstance(self._test_dl, (list, tuple)) and len(self._test_dl) > 1: + for _ in range(len(self._test_dl)): + self._test_step_outputs.append([]) + + return self._test_step_outputs + + @test_step_outputs.setter + def test_step_outputs(self, value): + self._test_step_outputs = value + @staticmethod def _is_model_being_restored() -> bool: app_state = AppState() @@ -1714,15 +1762,40 @@ def on_train_batch_end(self, outputs, batch: Any, batch_idx: int, unused: int = logging.info("====== End nsys profiling ======") torch.cuda.cudart().cudaProfilerStop() + def _cleanup_on_execution_end(self): + """ + Utility function to clean up the module state at the end of execution. + """ + + # dynamic freezing cleanup + if hasattr(self, '_freeze_cfg'): + delattr(self, '_freeze_cfg') + + # Clear up the val and test output caches + self._validation_step_outputs = None + self._test_step_outputs = None + def on_train_end(self): """ PyTorch Lightning hook: https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-end We use it here to cleanup the dynamic freezing config. """ - # dynamic freezing cleanup - if hasattr(self, '_freeze_cfg'): - delattr(self, '_freeze_cfg') + self._cleanup_on_execution_end() + + def on_test_end(self): + """ PyTorch Lightning hook: + https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-test-end + """ + + self._cleanup_on_execution_end() + + def on_predict_end(self): + """ PyTorch Lightning hook: + https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-test-end + """ + + self._cleanup_on_execution_end() # TODO: Remove in PTL 1.7.2 def cuda(self, device=None): diff --git a/tutorials/asr/Multilang_ASR.ipynb b/tutorials/asr/Multilang_ASR.ipynb index a1edeea815d0..7a11cb7dc6a6 100644 --- a/tutorials/asr/Multilang_ASR.ipynb +++ b/tutorials/asr/Multilang_ASR.ipynb @@ -1713,7 +1713,7 @@ }, "outputs": [], "source": [ - "asr_model.setup_multiple_validation_data(val_data_config=validation_ds) " + "asr_model.setup_multiple_validation_data(val_data_config=validation_ds)" ] }, { @@ -2273,7 +2273,7 @@ "provenance": [] }, "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "base", "language": "python", "name": "python3" }, @@ -2287,11 +2287,16 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.13" + "version": "3.8.12" }, "nteract": { "version": "0.28.0" }, + "vscode": { + "interpreter": { + "hash": "1aaa02ce0ce2638a6e16a203f0ce39bc7495f7236d7115882d2d3541e1318e7a" + } + }, "widgets": { "application/vnd.jupyter.widget-state+json": { "013abc9bfddf456abf15dc2b0567d969": { From 122eced8e66f40470a1e98c19f4acfff650b63c8 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 12:31:21 -0700 Subject: [PATCH 075/112] Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/models/glue_benchmark/glue_benchmark_model.py | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/nemo/collections/nlp/models/glue_benchmark/glue_benchmark_model.py b/nemo/collections/nlp/models/glue_benchmark/glue_benchmark_model.py index 7843da422c4e..4a073e2ada1c 100644 --- a/nemo/collections/nlp/models/glue_benchmark/glue_benchmark_model.py +++ b/nemo/collections/nlp/models/glue_benchmark/glue_benchmark_model.py @@ -173,16 +173,18 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0): model_output = torch.argmax(model_output, 1) eval_tensors = {'preds': model_output, 'labels': labels} - return {'val_loss': val_loss, 'eval_tensors': eval_tensors} + output = {'val_loss': val_loss, 'eval_tensors': eval_tensors} + self.validation_step_outputs.append(output) + return output def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): """ Called at the end of validation to aggregate outputs. outputs: list of individual outputs of each validation step. """ - avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean() - preds = torch.cat([x['eval_tensors']['preds'] for x in outputs]) - labels = torch.cat([x['eval_tensors']['labels'] for x in outputs]) + avg_loss = torch.stack([x['val_loss'] for x in self.validation_step_outputs]).mean() + preds = torch.cat([x['eval_tensors']['preds'] for x in self.validation_step_outputs]) + labels = torch.cat([x['eval_tensors']['labels'] for x in self.validation_step_outputs]) all_preds = [] all_labels = [] From ec9e251324ca4bcfa4991bf693ea368b4b80b8da Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 14:16:51 -0700 Subject: [PATCH 076/112] [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister --- tutorials/tts/Vits_Training.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorials/tts/Vits_Training.ipynb b/tutorials/tts/Vits_Training.ipynb index dbfc0edbe82c..db7161c06c61 100644 --- a/tutorials/tts/Vits_Training.ipynb +++ b/tutorials/tts/Vits_Training.ipynb @@ -304,8 +304,8 @@ " phoneme_dict_path=tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt \\\n", " heteronyms_path=tts_dataset_files/heteronyms-052722 \\\n", " trainer.max_epochs=3 \\\n", - " trainer.accelerator=auto \\\n", - " trainer.strategy=auto \\\n", + " trainer.accelerator='gpu' \\\n", + " trainer.strategy='ddp_find_unused_parameters_true' \\\n", " trainer.check_val_every_n_epoch=1 \\\n", " trainer.devices=1)" ] From e99d5303199471bfcc29d61a705d3239f016f1be Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 16:16:36 -0700 Subject: [PATCH 077/112] Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/collections/asr/models/enhancement_models.py | 10 ++++++++++ nemo/collections/asr/models/label_models.py | 14 +++++++++++++- 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/nemo/collections/asr/models/enhancement_models.py b/nemo/collections/asr/models/enhancement_models.py index a25bf882a23b..a441c6f7a8b0 100644 --- a/nemo/collections/asr/models/enhancement_models.py +++ b/nemo/collections/asr/models/enhancement_models.py @@ -434,6 +434,16 @@ def evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = # Log global step self.log('global_step', torch.tensor(self.trainer.global_step, dtype=torch.float32), sync_dist=True) + if tag == 'val': + if isinstance(self.trainer.val_dataloaders, (list, tuple)) and len(self.trainer.val_dataloaders) > 1: + self.validation_step_outputs[dataloader_idx].append(output_dict) + else: + self.validation_step_outputs.append(output_dict) + else: + if isinstance(self.trainer.test_dataloaders, (list, tuple)) and len(self.trainer.test_dataloaders) > 1: + self.test_step_outputs[dataloader_idx].append(output_dict) + else: + self.test_step_outputs.append(output_dict) return output_dict @classmethod diff --git a/nemo/collections/asr/models/label_models.py b/nemo/collections/asr/models/label_models.py index 1a284aca609d..83e57ece59e3 100644 --- a/nemo/collections/asr/models/label_models.py +++ b/nemo/collections/asr/models/label_models.py @@ -373,13 +373,25 @@ def evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = self._macro_accuracy.update(preds=logits, target=labels) stats = self._macro_accuracy._final_state() - return { + output = { f'{tag}_loss': loss_value, f'{tag}_correct_counts': correct_counts, f'{tag}_total_counts': total_counts, f'{tag}_acc_micro_top_k': acc_top_k, f'{tag}_acc_macro_stats': stats, } + if tag == 'val': + if isinstance(self.trainer.val_dataloaders, (list, tuple)) and len(self.trainer.val_dataloaders) > 1: + self.validation_step_outputs[dataloader_idx].append(output) + else: + self.validation_step_outputs.append(output) + else: + if isinstance(self.trainer.test_dataloaders, (list, tuple)) and len(self.trainer.test_dataloaders) > 1: + self.test_step_outputs[dataloader_idx].append(output) + else: + self.test_step_outputs.append(output) + + return output def multi_evaluation_epoch_end(self, outputs, dataloader_idx: int = 0, tag: str = 'val'): loss_mean = torch.stack([x[f'{tag}_loss'] for x in outputs]).mean() From e7b3b715fc3c181ee56aba58661a63dc22dbbead Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 17:06:10 -0700 Subject: [PATCH 078/112] Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/collections/nlp/parts/nlp_overrides.py | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/nemo/collections/nlp/parts/nlp_overrides.py b/nemo/collections/nlp/parts/nlp_overrides.py index 5332a9d2f115..8d027a00248d 100644 --- a/nemo/collections/nlp/parts/nlp_overrides.py +++ b/nemo/collections/nlp/parts/nlp_overrides.py @@ -994,6 +994,12 @@ class CustomProgressBar(TQDMProgressBar): for megatron models """ + def get_current_epoch_step(self, trainer): + """ + Get the value of step within an epoch + """ + return trainer.fit_loop.epoch_loop.automatic_optimization.optim_progress.optimizer.step.current.completed + def init_train_tqdm(self): """ Override bar_format to not have 's/it' @@ -1002,11 +1008,22 @@ def init_train_tqdm(self): self.bar.bar_format = "{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}{postfix}]" return self.bar + def on_train_epoch_start(self, trainer, *_): + if trainer.max_steps > 0 and (trainer.ckpt_path is not None): + # while resuming from a ckpt use trainer.max_steps as the total for progress bar as trainer.num_training_batches + # is truncated to max_steps - step being resumed at + num_training_batches = trainer.max_steps + else: + num_training_batches = trainer.num_training_batches + self.train_progress_bar.reset(num_training_batches) + self.train_progress_bar.initial = 0 + self.train_progress_bar.set_description(f"Epoch {trainer.current_epoch}") + def on_train_batch_end(self, trainer, pl_module, *_, **__): """ - Override parent class on_train_batch_end to update progress bar per global_step instead of per microbatch + Override parent class on_train_batch_end to update progress bar per global batch instead of per microbatch """ - n = trainer.global_step + n = self.get_current_epoch_step(trainer) if self._should_update(n, self.train_progress_bar.total): _update_n(self.train_progress_bar, n) self.train_progress_bar.set_postfix(self.get_metrics(trainer, pl_module)) From abbef3b8e671d7be47db39fd3d7de0c85493d186 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 19:12:52 -0700 Subject: [PATCH 079/112] fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../speech_enhancement/Speech_Enhancement_with_NeMo.ipynb | 8 ++++---- tutorials/tools/NeMo_Forced_Aligner_Tutorial.ipynb | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb index d8a15cbd5e1c..d7cd6571c16a 100644 --- a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb +++ b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb @@ -9,7 +9,7 @@ "source": [ "# Introduction\n", "\n", - "The goal of this tutorial is to demonstrate the basic steps required to setup and train train a simple single-channel speech enhancement model in NeMo.\n", + "The goal of this tutorial is to demonstrate the basic steps required to setup and train a simple single-channel speech enhancement model in NeMo.\n", "\n", "This notebook covers the following steps:\n", "\n", @@ -21,7 +21,7 @@ "Note that this tutorial is only for demonstration purposes.\n", "To achieve best performance for a particular use case, carefully prepared data and more advanced models should be used.\n", "\n", - "*Disclamer:*\n", + "*Disclaimer:*\n", "User is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use." ] }, @@ -411,7 +411,7 @@ "\n", "Here, a simple encoder-mask-decoder model will be used to process the noisy input signal and produce an enhanced output signal.\n", "\n", - "In general, an encoder-mask-decoder model can be confugured using `EncMaskDecAudioToAudioModel` class, which is depicted in the following block diagram." + "In general, an encoder-mask-decoder model can be configured using `EncMaskDecAudioToAudioModel` class, which is depicted in the following block diagram." ] }, { @@ -470,7 +470,7 @@ "In this particular configuration, the model structure can be described as follows:\n", "* `AudioToSpectrogram` implements the analysis STFT transform.\n", "* `MaskEstimatorRNN` is a mask estimator using RNNs.\n", - "* `MaskReferenceChannel` is a simple processor whith applies the estimated mask on the reference channel. In this tutorial, the input signal has only a single channel, so the reference channel will be set to `0`.\n", + "* `MaskReferenceChannel` is a simple processor which applies the estimated mask on the reference channel. In this tutorial, the input signal has only a single channel, so the reference channel will be set to `0`.\n", "* `SpectrogramToAudio` implements the synthesis STFT transform." ] }, diff --git a/tutorials/tools/NeMo_Forced_Aligner_Tutorial.ipynb b/tutorials/tools/NeMo_Forced_Aligner_Tutorial.ipynb index a6ab57854bad..3ca3c32c1074 100644 --- a/tutorials/tools/NeMo_Forced_Aligner_Tutorial.ipynb +++ b/tutorials/tools/NeMo_Forced_Aligner_Tutorial.ipynb @@ -347,7 +347,7 @@ "id": "dHU-YmALUvVf" }, "source": [ - "The alignment process should have finished successfuly, let's look at some of the output files." + "The alignment process should have finished successfully, let's look at some of the output files." ] }, { @@ -472,7 +472,7 @@ "\n", "You can see that the token timestamps (in the first video) are very accurate, even despite the poor audio quality of the video.\n", "\n", - "The word timestamps (in the second video) are also very good. The only noticeable mistakes are when the the word has punctuation at the end (or beginning). This is because punctuation that is not separated from a word by a space is considered to be part of that word. If the alignment for the punctuation is in a region of non-speech, then the word alignment will also contain that region of non-speech." + "The word timestamps (in the second video) are also very good. The only noticeable mistakes are when the word has punctuation at the end (or beginning). This is because punctuation that is not separated from a word by a space is considered to be part of that word. If the alignment for the punctuation is in a region of non-speech, then the word alignment will also contain that region of non-speech." ] }, { From 468e8b0cad8b59428313e8b4b67a6c3b82b03eb1 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 29 Sep 2023 23:57:01 -0700 Subject: [PATCH 080/112] Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- examples/nlp/glue_benchmark/glue_benchmark.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/examples/nlp/glue_benchmark/glue_benchmark.py b/examples/nlp/glue_benchmark/glue_benchmark.py index 87486dbc47b0..3cb5f8e4af3e 100644 --- a/examples/nlp/glue_benchmark/glue_benchmark.py +++ b/examples/nlp/glue_benchmark/glue_benchmark.py @@ -46,6 +46,10 @@ @hydra_runner(config_name="glue_benchmark_config") def main(cfg: DictConfig) -> None: + # PTL 2.0 has find_unused_parameters as False by default, so its required to set it to True + # when there are unused parameters like here + if cfg.trainer.strategy == 'ddp': + cfg.trainer.strategy = "ddp_find_unused_parameters_true" logging.info(f'Config: {OmegaConf.to_yaml(cfg)}') trainer = pl.Trainer(**cfg.trainer) exp_manager_cfg = cfg.get("exp_manager", None) From 8406b69b27983817ec9d485d3e0488c52cbb3500 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Sat, 30 Sep 2023 13:11:13 -0700 Subject: [PATCH 081/112] update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister --- tutorials/asr/Self_Supervised_Pre_Training.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorials/asr/Self_Supervised_Pre_Training.ipynb b/tutorials/asr/Self_Supervised_Pre_Training.ipynb index 113979314da9..c39445dabfd1 100644 --- a/tutorials/asr/Self_Supervised_Pre_Training.ipynb +++ b/tutorials/asr/Self_Supervised_Pre_Training.ipynb @@ -312,7 +312,7 @@ "\n", "if torch.cuda.is_available():\n", " cfg.trainer.accelerator = 'gpu'\n", - " cfg.trainer.strategy = 'dp'\n", + " cfg.trainer.strategy = 'auto'\n", " cfg.trainer.devices = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", @@ -534,7 +534,7 @@ "\n", "if torch.cuda.is_available():\n", " cfg.trainer.accelerator = 'gpu'\n", - " cfg.trainer.strategy = 'dp'\n", + " cfg.trainer.strategy = 'auto'\n", " cfg.trainer.devices = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", From 02dac3b60ce7e98c5a8a3d96e399fe054a97ba16 Mon Sep 17 00:00:00 2001 From: Igor Gitman Date: Mon, 2 Oct 2023 08:00:51 -0700 Subject: [PATCH 082/112] Fix typos (#7581) Signed-off-by: Sasha Meister --- tutorials/asr/ASR_Confidence_Estimation.ipynb | 14 +++++++------- tutorials/asr/Confidence_Ensembles.ipynb | 6 +++--- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/tutorials/asr/ASR_Confidence_Estimation.ipynb b/tutorials/asr/ASR_Confidence_Estimation.ipynb index 7a92ed026f07..93c40b576115 100644 --- a/tutorials/asr/ASR_Confidence_Estimation.ipynb +++ b/tutorials/asr/ASR_Confidence_Estimation.ipynb @@ -422,7 +422,7 @@ "1. Initialize _ConfidenceConfig_\n", "2. Put the created _ConfidenceConfig_ into the model decoding config.\n", "\n", - "The folloving cell contains an example of _ConfidenceConfig_ initialization and updating the the model's decoding config.\n", + "The following cell contains an example of _ConfidenceConfig_ initialization and updating the model's decoding config.\n", "\n", "For the _ConfidenceConfig_ there are also listed possible values for its parameters.\n", "\n", @@ -627,14 +627,14 @@ "4. Normalized Cross Entropy ($\\mathrm{NCE}$): how close of confidence for correct predictions to $1.0$ and of incorrect predictions to $0.0$. It ranges from $-\\infty$ to $1.0$, with negative scores indicating that the confidence method performs worse than the setting confidence score to $1-\\mathrm{WER}$. This metric is also known as Normalized Mutual Information.\n", "5. Expected Calibration Error ($\\mathrm{ECE}$): a weighted average over the absolute accuracy/confidence difference. It ranges from $0.0$ to $1.0$ with the best value $0.0$.\n", "\n", - "Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be condsidered. They are:\n", + "Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be considered. They are:\n", "1. Area Under the Youden's curve ($\\mathrm{AUC}_\\mathrm{YC}$): the rate of the effective threshold range (i.e. the adjustability or responsiveness). It ranges from $0.0$ to $1.0$ with the best value $0.5$.\n", "2. Maximum of the Youden's curve $\\mathrm{MAX}_\\mathrm{YC}$: the optimal $\\mathrm{TNR}$ vs. $\\mathrm{FNR}$ tradeoff. It's unnormalized version can be used as a criterion for selecting the optimal $\\tau$. It ranges from $0.0$ to $1.0$ with the best value $1.0$.\n", "3. The standard deviation of the Youden's curve values ($\\mathrm{STD}_\\mathrm{YC}$): indicates that $\\mathrm{TNR}$ and $\\mathrm{FNR}$ increase at different rates (viz. $\\mathrm{TNR}$ grows faster) as the $\\tau$ increases. It ranges from $0.0$ to $0.5$ with the best value around $0.25$.\n", "\n", - "When selecting/tuning a confidence method, it is recommended to maximize $\\mathrm{AUC}_\\mathrm{ROC}$ first as this is the main mectic of confidence estimation quality. Then, for overconfident models, maximizing $\\mathrm{AUC}_\\mathrm{NT}$ should take precedence over $\\mathrm{AUC}_\\mathrm{PR}$. Finally, a trade-off between $\\mathrm{NCE}$/$\\mathrm{ECE}$ and the family of $\\mathrm{YC}$ metrics considered as a compromise between formal correctness and controllability.\n", + "When selecting/tuning a confidence method, it is recommended to maximize $\\mathrm{AUC}_\\mathrm{ROC}$ first as this is the main metric of confidence estimation quality. Then, for overconfident models, maximizing $\\mathrm{AUC}_\\mathrm{NT}$ should take precedence over $\\mathrm{AUC}_\\mathrm{PR}$. Finally, a trade-off between $\\mathrm{NCE}$/$\\mathrm{ECE}$ and the family of $\\mathrm{YC}$ metrics considered as a compromise between formal correctness and controllability.\n", "\n", - "Let's see how well our confidence performs according to the metrcis above." + "Let's see how well our confidence performs according to the metrics above." ] }, { @@ -891,7 +891,7 @@ "id": "dbb82877" }, "source": [ - "## 4.1. Small WER improvenent\n", + "## 4.1. Small WER improvement\n", "\n", "Good confidence scores can slightly reduce WER by removing low confidence words from recognition results.\n", "\n", @@ -1190,7 +1190,7 @@ "id": "f28da61f", "metadata": {}, "source": [ - "The original examples contain speech, music, or noise. The resulring audio recordings are considered to contain no recognizable speech.\n", + "The original examples contain speech, music, or noise. The resulting audio recordings are considered to contain no recognizable speech.\n", "\n", "You can listen to an example of the audios." ] @@ -1397,7 +1397,7 @@ }, "source": [ "# Summary\n", - "This tutorial covered the basics of ASR confidence estimation and two examples of using ASR word confidence: WER reduction and hallusinations removal.\n", + "This tutorial covered the basics of ASR confidence estimation and two examples of using ASR word confidence: WER reduction and hallucinations removal.\n", "\n", "You can follow this tutorial on [ASR Confidence-based Ensembles](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb) to see another important application of ASR confidence estimation." ] diff --git a/tutorials/asr/Confidence_Ensembles.ipynb b/tutorials/asr/Confidence_Ensembles.ipynb index 4516d2b70d6d..eab46d3b06e5 100644 --- a/tutorials/asr/Confidence_Ensembles.ipynb +++ b/tutorials/asr/Confidence_Ensembles.ipynb @@ -48,7 +48,7 @@ "\n", "# clone SDP and install requirements\n", "!git clone https://github.com/NVIDIA/NeMo-speech-data-processor $WORKSPACE_DIR/NeMo-speech-data-processor\n", - "!pip install -r $WORKSPACE_DIR/NeMo-speech-data-processor/requirements.txt\n", + "!pip install -r $WORKSPACE_DIR/NeMo-speech-data-processor/requirements/main.txt\n", "\n", "\"\"\"\n", "Remember to restart the runtime for the kernel to pick up any upgraded packages.\n", @@ -106,13 +106,13 @@ "\n", "A short answer — you can use any ASR models. E.g., you can combine a number of CTC models, or Transducer models, or even mix-and-match. \n", "\n", - "A more detailed answer is that hte performance of the confidence ensemble is upper-bounded by the performance of the best model on each of the input examples. Thus you will benefit if some of your models work really well on part of the input compared to other models. This way you will get more gains compared to each separate model, and it will also make correct model identification easier.\n", + "A more detailed answer is that the performance of the confidence ensemble is upper-bounded by the performance of the best model on each of the input examples. Thus you will benefit if some of your models work really well on part of the input compared to other models. This way you will get more gains compared to each separate model, and it will also make correct model identification easier.\n", "\n", "### How to estimate a model's confidence?\n", "\n", "Good news, we have a whole separate [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_Confidence_Estimation.ipynb) on this topic! You can go through it if you want to know all the details about different ways to estimate confidence of NeMo ASR models. There are different confidence measures and aggregation functions and for the absolute best performance, you will need to run a grid-search to pick the best confidence estimation way for your specific models and data.\n", "\n", - "That being said, we found that there exist a set of confidence parameters that work pretty well on a large set of models and datsets. They are default in NeMo and so you might not need to worry about running the search. If you do want to maximize the performance by tuning the confidence parameters, you only need to add [a few extra config lines](#Building-and-evaluating-ensemble-(tuned-parameters)).\n", + "That being said, we found that there exist a set of confidence parameters that work pretty well on a large set of models and datasets. They are default in NeMo and so you might not need to worry about running the search. If you do want to maximize the performance by tuning the confidence parameters, you only need to add [a few extra config lines](#Building-and-evaluating-ensemble-(tuned-parameters)).\n", "\n", "### How to calibrate confidence values?\n", "\n", From e434de95a0da90e0900c92d2491e6db8f970db95 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 2 Oct 2023 09:35:22 -0700 Subject: [PATCH 083/112] Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister --- tutorials/tts/FastPitch_Adapter_Finetuning.ipynb | 2 +- tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb b/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb index 5fe61d596f4b..5220519e01ad 100644 --- a/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb +++ b/tutorials/tts/FastPitch_Adapter_Finetuning.ipynb @@ -615,7 +615,7 @@ "+trainer.max_epochs=50 \\\n", "trainer.check_val_every_n_epoch=5 \\\n", "trainer.devices=-1 \\\n", - "trainer.strategy='ddp' \\\n", + "trainer.strategy='ddp_find_unused_parameters_true' \\\n", "trainer.precision=16 \\\n", "exp_manager.exp_dir={logs_dir} \\\n", "exp_manager.create_wandb_logger=True \\\n", diff --git a/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb b/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb index a031723f549b..ad0a49067ca4 100644 --- a/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb +++ b/tutorials/tts/FastPitch_MultiSpeaker_Pretraining.ipynb @@ -511,7 +511,7 @@ "+trainer.max_epochs=5 \\\n", "trainer.check_val_every_n_epoch=5 \\\n", "trainer.devices=1 \\\n", - "trainer.strategy='auto' \\\n", + "trainer.strategy='ddp_find_unused_parameters_true' \\\n", "trainer.precision=16 \\\n", "exp_manager.exp_dir={logs_dir} \\\n", "exp_manager.create_wandb_logger=True \\\n", From 6a2a145181efa436464fa04991866820a2337d54 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 2 Oct 2023 09:55:07 -0700 Subject: [PATCH 084/112] [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- tutorials/asr/ASR_TTS_Tutorial.ipynb | 2 +- tutorials/asr/Self_Supervised_Pre_Training.ipynb | 4 ++-- tutorials/asr/Speech_Commands.ipynb | 2 +- tutorials/asr/Voice_Activity_Detection.ipynb | 2 +- .../speech_enhancement/Speech_Enhancement_with_NeMo.ipynb | 4 ++-- tutorials/nlp/Entity_Linking_Medical.ipynb | 2 +- tutorials/nlp/GLUE_Benchmark.ipynb | 2 +- tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb | 4 ++-- tutorials/nlp/Punctuation_and_Capitalization.ipynb | 4 ++-- .../nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb | 4 ++-- tutorials/nlp/Relation_Extraction-BioMegatron.ipynb | 2 +- tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb | 4 ++-- tutorials/nlp/Token_Classification-BioMegatron.ipynb | 2 +- .../nlp/Token_Classification_Named_Entity_Recognition.ipynb | 2 +- tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb | 2 +- tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb | 2 +- .../speaker_tasks/Speaker_Identification_Verification.ipynb | 2 +- 17 files changed, 23 insertions(+), 23 deletions(-) diff --git a/tutorials/asr/ASR_TTS_Tutorial.ipynb b/tutorials/asr/ASR_TTS_Tutorial.ipynb index 9bbcc8e4aa34..267c84bca9d2 100644 --- a/tutorials/asr/ASR_TTS_Tutorial.ipynb +++ b/tutorials/asr/ASR_TTS_Tutorial.ipynb @@ -553,7 +553,7 @@ "config.trainer.max_epochs = NUM_EPOCHS\n", "\n", "config.trainer.devices = 1\n", - "config.trainer.strategy = auto # use 1 device, no need for ddp strategy\n", + "config.trainer.strategy = 'auto' # use 1 device, no need for ddp strategy\n", "\n", "OmegaConf.resolve(config)" ] diff --git a/tutorials/asr/Self_Supervised_Pre_Training.ipynb b/tutorials/asr/Self_Supervised_Pre_Training.ipynb index c39445dabfd1..5f2c4dcc14c8 100644 --- a/tutorials/asr/Self_Supervised_Pre_Training.ipynb +++ b/tutorials/asr/Self_Supervised_Pre_Training.ipynb @@ -316,7 +316,7 @@ " cfg.trainer.devices = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", - " cfg.trainer.strategy = auto\n", + " cfg.trainer.strategy = 'auto'\n", " cfg.trainer.devices = 0\n", "\n", "cfg.exp_manager.exp_dir = data_dir + \"/content/exp\"\n", @@ -538,7 +538,7 @@ " cfg.trainer.devices = 1\n", "else:\n", " cfg.trainer.accelerator = 'cpu'\n", - " cfg.trainer.strategy = auto\n", + " cfg.trainer.strategy = 'auto'\n", " cfg.trainer.devices = 0\n", "\n", "cfg.model.tokenizer.dir = data_dir + \"/tokenizers/an4/tokenizer_spe_unigram_v128/\" # note this is a directory, not a path to a vocabulary file\n", diff --git a/tutorials/asr/Speech_Commands.ipynb b/tutorials/asr/Speech_Commands.ipynb index c1566ae71850..e46ac4f7ec04 100644 --- a/tutorials/asr/Speech_Commands.ipynb +++ b/tutorials/asr/Speech_Commands.ipynb @@ -441,7 +441,7 @@ "config.trainer.max_epochs = 5\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = auto" + "config.trainer.strategy = 'auto'" ], "execution_count": null, "outputs": [] diff --git a/tutorials/asr/Voice_Activity_Detection.ipynb b/tutorials/asr/Voice_Activity_Detection.ipynb index 7c7b95e99416..fb7dbe27d21f 100644 --- a/tutorials/asr/Voice_Activity_Detection.ipynb +++ b/tutorials/asr/Voice_Activity_Detection.ipynb @@ -462,7 +462,7 @@ "config.trainer.max_epochs = 5\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = auto" + "config.trainer.strategy = 'auto'" ] }, { diff --git a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb index d7cd6571c16a..09226c83d654 100644 --- a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb +++ b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb @@ -667,7 +667,7 @@ "config.trainer.max_epochs = 10\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# Instantiate the trainer\n", "trainer = pl.Trainer(**config.trainer)" @@ -1144,7 +1144,7 @@ "config_dual_output.trainer.max_epochs = 10\n", "\n", "# Remove distributed training flags\n", - "config_dual_output.trainer.strategy = auto\n", + "config_dual_output.trainer.strategy = 'auto'\n", "\n", "# Instantiate the trainer\n", "trainer = pl.Trainer(**config_dual_output.trainer)\n", diff --git a/tutorials/nlp/Entity_Linking_Medical.ipynb b/tutorials/nlp/Entity_Linking_Medical.ipynb index 4f56a85eabfa..78fe66d9f233 100644 --- a/tutorials/nlp/Entity_Linking_Medical.ipynb +++ b/tutorials/nlp/Entity_Linking_Medical.ipynb @@ -187,7 +187,7 @@ "cfg.model.validation_ds.data_file = os.path.join(DATA_DIR, \"tiny_example_validation_pairs.tsv\")\n", "\n", "# remove distributed training flags\n", - "cfg.trainer.strategy = auto\n", + "cfg.trainer.strategy = 'auto'\n", "cfg.trainer.accelerator = None" ] }, diff --git a/tutorials/nlp/GLUE_Benchmark.ipynb b/tutorials/nlp/GLUE_Benchmark.ipynb index 445ff6705028..b77b3439b444 100644 --- a/tutorials/nlp/GLUE_Benchmark.ipynb +++ b/tutorials/nlp/GLUE_Benchmark.ipynb @@ -342,7 +342,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "config.trainer.max_steps = 128\n", diff --git a/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb b/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb index 7513183fba28..675fdfd5351c 100644 --- a/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb +++ b/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb @@ -286,7 +286,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# setup a small number of epochs for demonstration purposes of this tutorial\n", "config.trainer.max_epochs = 5\n", @@ -705,7 +705,7 @@ "config.trainer.accelerator = accelerator\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "trainer = pl.Trainer(**config.trainer)\n", "config.exp_manager.exp_dir = os.path.join(DATA_DIR, \"output/\" + run_name)\n", diff --git a/tutorials/nlp/Punctuation_and_Capitalization.ipynb b/tutorials/nlp/Punctuation_and_Capitalization.ipynb index e9d1060f6442..f88c33fada34 100644 --- a/tutorials/nlp/Punctuation_and_Capitalization.ipynb +++ b/tutorials/nlp/Punctuation_and_Capitalization.ipynb @@ -550,7 +550,7 @@ "config.trainer.max_epochs = 1\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "trainer = pl.Trainer(**config.trainer)" ] @@ -745,7 +745,7 @@ "config.trainer.accelerator = accelerator\n", "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "config.trainer.max_epochs = 1\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# Exp manager\n", "config.exp_manager.explicit_log_dir = 'tarred_experiment'\n", diff --git a/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb b/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb index 778e14e63b70..2afbb19c0e66 100644 --- a/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb +++ b/tutorials/nlp/Punctuation_and_Capitalization_Lexical_Audio.ipynb @@ -645,7 +645,7 @@ "config.trainer.max_epochs = 1\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "config.exp_manager.use_datetime_version=False\n", "config.exp_manager.explicit_log_dir='Punctuation_And_Capitalization_Lexical_Audio'\n", "\n", @@ -860,7 +860,7 @@ "config.trainer.accelerator = accelerator\n", "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "config.trainer.max_epochs = 1\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# Exp manager\n", "config.exp_manager.explicit_log_dir = 'tarred_experiment'\n", diff --git a/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb b/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb index 451c40152c8d..d6b1e98b428e 100644 --- a/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb +++ b/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb @@ -403,7 +403,7 @@ "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "trainer = pl.Trainer(**config.trainer)" ] diff --git a/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb b/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb index 8e44aca9d0d1..bbb6a868a4da 100644 --- a/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb +++ b/tutorials/nlp/Text_Classification_Sentiment_Analysis.ipynb @@ -370,7 +370,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# disable distributed training when using Colab to prevent the errors\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "# Training stops when max_step or max_epochs is reached (earliest)\n", @@ -573,7 +573,7 @@ "# create a copy of the trainer config and update it to be used for final evaluation\n", "eval_trainer_cfg = config.trainer.copy()\n", "eval_trainer_cfg.accelerator = 'gpu' if torch.cuda.is_available() else 'cpu' # it is safer to perform evaluation on single GPU as PT is buggy with the last batch on multi-GPUs\n", - "eval_trainer_cfg.strategy = auto # 'ddp' is buggy with test process in the current PT, it looks like it has been fixed in the latest master\n", + "eval_trainer_cfg.strategy = 'auto' # 'ddp' is buggy with test process in the current PT, it looks like it has been fixed in the latest master\n", "eval_trainer = pl.Trainer(**eval_trainer_cfg)\n", "\n", "eval_trainer.test(model=eval_model, verbose=False) # test_dataloaders=eval_dataloader,\n" diff --git a/tutorials/nlp/Token_Classification-BioMegatron.ipynb b/tutorials/nlp/Token_Classification-BioMegatron.ipynb index e5e5aa81b859..afbc8394aa84 100644 --- a/tutorials/nlp/Token_Classification-BioMegatron.ipynb +++ b/tutorials/nlp/Token_Classification-BioMegatron.ipynb @@ -434,7 +434,7 @@ "config.trainer.precision = 16 if torch.cuda.is_available() else 32\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "trainer = pl.Trainer(**config.trainer)" ] diff --git a/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb b/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb index 1c1999cc08c1..3ab98f6c19fd 100644 --- a/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb +++ b/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb @@ -533,7 +533,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "config.trainer.max_steps = 32\n", diff --git a/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb b/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb index f571fa176e96..7f1baf536d87 100644 --- a/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb +++ b/tutorials/nlp/Zero_Shot_Intent_Recognition.ipynb @@ -400,7 +400,7 @@ "# config.trainer.amp_level = O1\n", "\n", "# remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# setup max number of steps to reduce training time for demonstration purposes of this tutorial\n", "config.trainer.max_steps = 128\n", diff --git a/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb b/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb index efd86e1ef242..7db905b6d225 100644 --- a/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb +++ b/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb @@ -761,7 +761,7 @@ "source": [ "config.model.diarizer.speaker_embeddings.model_path=\"titanet_large\"\n", "config.trainer.max_epochs = 5\n", - "config.trainer.strategy = auto" + "config.trainer.strategy = 'auto'" ] }, { diff --git a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb index f0ad1c19f5c9..5a5ec9c46552 100644 --- a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb +++ b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb @@ -475,7 +475,7 @@ "config.trainer.max_epochs = 10\n", "\n", "# Remove distributed training flags\n", - "config.trainer.strategy = auto\n", + "config.trainer.strategy = 'auto'\n", "\n", "# Remove augmentations\n", "config.model.train_ds.augmentor=None" From 6db0b2b782251b35c9fe104c22507c7354239a54 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 2 Oct 2023 13:06:36 -0700 Subject: [PATCH 085/112] add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister --- .readthedocs.yml | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/.readthedocs.yml b/.readthedocs.yml index 226be6a7eab0..5ee18e6dee1e 100644 --- a/.readthedocs.yml +++ b/.readthedocs.yml @@ -20,12 +20,16 @@ # Required field. version: 2 +build: + os: ubuntu-22.04 + tools: + python: "3.10" + # Build documentation in the docs/ directory with Sphinx. sphinx: configuration: docs/source/conf.py # Set the version of Python and requirements required to build your docs python: - version: 3.8 install: - requirements: requirements/requirements_docs.txt From 8f2a31caead1e6a11553681713781b03460ceac7 Mon Sep 17 00:00:00 2001 From: Jan Lasek Date: Mon, 2 Oct 2023 22:43:54 +0200 Subject: [PATCH 086/112] StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek * Test with pyt:23.09 container Signed-off-by: Jan Lasek --------- Signed-off-by: Jan Lasek Signed-off-by: Sasha Meister --- Jenkinsfile | 36 ++++++++++++++++++- .../tuning/megatron_gpt_sft.py | 1 - 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 92aa65ae660b..3d262931915b 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -1,7 +1,7 @@ pipeline { agent { docker { - image 'nvcr.io/nvidia/pytorch:23.08-py3' + image 'nvcr.io/nvidia/pytorch:23.09-py3' args '--device=/dev/nvidia0 --gpus all --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache:/root/.cache --shm-size=8g --env TRANSFORMERS_OFFLINE=1 --env HYDRA_FULL_ERROR=1' } } @@ -3621,6 +3621,40 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"''' sh "rm -rf examples/nlp/language_modeling/gpt_sft_results" } } + stage('L2: Megatron GPT Finetuning StarCoder PP=1') { + when { + anyOf { + branch 'main' + changeRequest target: 'main' + } + } + failFast true + steps { + sh "python examples/nlp/language_modeling/tuning/megatron_gpt_sft.py \ + trainer.devices=1 \ + trainer.num_nodes=1 \ + trainer.precision=32 \ + trainer.max_steps=4 \ + trainer.val_check_interval=4 \ + trainer.enable_checkpointing=False \ + +trainer.limit_val_batches=2 \ + +trainer.limit_test_batches=2 \ + exp_manager.checkpoint_callback_params.save_best_model=False \ + exp_manager.exp_dir=examples/nlp/language_modeling/gpt_sft_results \ + model.optim.name=distributed_fused_adam \ + model.restore_from_path=/home/TestData/nlp/megatron_gpt/starcoder-ci-nemo/megatron_starcoder_tp1_pp1.nemo \ + model.tensor_model_parallel_size=1 \ + model.pipeline_model_parallel_size=1 \ + model.data.train_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl] \ + model.data.train_ds.num_workers=0 \ + model.data.test_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl] \ + model.data.validation_ds.num_workers=0 \ + model.data.validation_ds.file_names=[/home/TestData/nlp/megatron_sft/quarel.jsonl] \ + model.data.test_ds.num_workers=0 \ + model.data.train_ds.concat_sampling_probabilities=[1.0]" + sh "rm -rf examples/nlp/language_modeling/gpt_sft_results" + } + } stage('L2: Megatron GPT PEFT Lora PP=2') { when { anyOf { diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py index 870ff2921cce..9a70671f8073 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py @@ -202,7 +202,6 @@ def main(cfg) -> None: return_config=True, save_restore_connector=save_restore_connector, ) - gpt_cfg = _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False) model = load_from_nemo(MegatronGPTSFTModel, cfg, trainer, gpt_cfg, modify_confg_fn=_modify_config) else: validate_checkpoint_loading_args(cfg.model.pretrained_checkpoint) From 8f306a20f7436af0ef1ac0825c92b1d2898aece9 Mon Sep 17 00:00:00 2001 From: Adi Renduchintala Date: Mon, 2 Oct 2023 20:20:02 -0700 Subject: [PATCH 087/112] defaults changed (#7600) * defaults changed Signed-off-by: arendu * typo Signed-off-by: arendu * update Signed-off-by: arendu --------- Signed-off-by: arendu Signed-off-by: Sasha Meister --- .../metric_calculation/peft_metric_calc.py | 30 +++++++------------ 1 file changed, 11 insertions(+), 19 deletions(-) diff --git a/scripts/metric_calculation/peft_metric_calc.py b/scripts/metric_calculation/peft_metric_calc.py index ca13f83281c5..9fd2452fb17b 100755 --- a/scripts/metric_calculation/peft_metric_calc.py +++ b/scripts/metric_calculation/peft_metric_calc.py @@ -21,26 +21,18 @@ """ -This script can be used to calcualte exact match and F1 scores for many different tasks, not just squad. - -Example command for T5 Preds - - ``` - python squad_metric_calc.py \ - --ground-truth squad_test_gt.jsonl \ - --preds squad_preds_t5.txt - ``` +This script can be used to calcualte exact match and F1 scores for many different tasks. +The file "squad_test_predictions.jsonl" is assumed to be generated by the +`examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py` script Example command for GPT Preds ``` - python squad_metric_calc.py \ - --ground-truth squad_test_gt.jsonl \ - --preds squad_preds_gpt.txt \ - --split-string "answer:" + python peft_metric_calc.py \ + --pred_file squad_test_predictions.jsonl \ + --label_field "original_answers" \ ``` - In this case, the prediction file will be split on "answer: " when looking for the LM's predicted answer. """ @@ -92,21 +84,21 @@ def metric_max_over_ground_truths(metric_fn, prediction, ground_truths): def main(): parser = argparse.ArgumentParser(description='Process some integers.') parser.add_argument( - '--pred-file', + '--pred_file', type=str, help="Text file with test set prompts + model predictions. Prediction file can be made by running NeMo/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py", ) parser.add_argument( - '--pred-field', + '--pred_field', type=str, help="The field in the json file that contains the prediction tokens", default="pred", ) parser.add_argument( - '--ground-truth-field', + '--label_field', type=str, help="The field in the json file that contains the ground truth tokens", - default="original_answers", + default="label", ) args = parser.parse_args() @@ -120,7 +112,7 @@ def main(): pred_line = json.loads(preds[i]) pred_answer = pred_line[args.pred_field] - true_answers = pred_line[args.ground_truth_field] + true_answers = pred_line[args.label_field] if not isinstance(true_answers, list): true_answers = [true_answers] From 2f6fa29c0078000e7165301d6260fe448a03ca67 Mon Sep 17 00:00:00 2001 From: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com> Date: Tue, 3 Oct 2023 05:40:12 +0200 Subject: [PATCH 088/112] add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../tokenizers/text_to_speech/ipa_lexicon.py | 7 +- .../text_to_speech/tts_tokenizers.py | 73 ++++++++++++++++++- .../text_to_speech/test_tts_tokenizers.py | 16 ++++ 3 files changed, 93 insertions(+), 3 deletions(-) diff --git a/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py b/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py index 2e1bb359102b..338b3536519b 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py +++ b/nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py @@ -88,7 +88,7 @@ 'ɢ','ʛ','ɦ','ɧ','ħ','ɥ','ʜ','ɨ','ɬ','ɫ','ɮ','ʟ', 'ɱ','ɯ','ɰ','ɳ','ɵ','ɸ','œ','ɶ','ʘ','ɺ','ɻ','ʀ','ʁ', 'ɽ','ʂ','ʈ','ʧ','ʉ','ʋ','ⱱ','ɤ','ʍ','χ','ʏ','ʑ','ʐ', - 'ʔ','ʡ','ʕ','ʢ','ǀ','ǁ','ǂ','ᵻ' + 'ʔ','ʡ','ʕ','ʢ','ǀ','ǁ','ǂ','ᵻ', 'ʃ','ː', ), } @@ -181,7 +181,10 @@ def get_ipa_punctuation_list(locale): '↑', '→', '↗', - '↘,', + '↘', + '”', + '’', + '-', ] ) elif locale == "es-ES": diff --git a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py index 32f725c9c73f..25b9d88a59dc 100644 --- a/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py +++ b/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py @@ -284,7 +284,7 @@ def __init__( non_default_punct_list: List of punctuation marks which will be used instead default. """ - it_alphabet = "abcdefghijklmnopqrstuvwxyzàèéìòù" + it_alphabet = "abcdefghijklmnopqrstuvwxyzàèéìòùó" super().__init__( chars=it_alphabet, punct=punct, @@ -367,6 +367,77 @@ def encode(self, text): return [self._token2id[p] for p in cs] +class ItalianPhonemesTokenizer(BaseCharsTokenizer): + # fmt: off + PUNCT_LIST = ( + ',', '.', '!', '?', '-', + ':', ';', '/', '"', '(', + ')', '[', ']', '{', '}', + '„', '“', '”', '‘', '’', '‒', '—', '«', '»', '‹', '›', '_', + ) + # fmt: on + + def __init__( + self, + punct=True, + apostrophe=True, + add_blank_at=None, + pad_with_space=False, + non_default_punct_list=None, + text_preprocessing_func=italian_text_preprocessing, + ): + """Italian phoneme-based tokenizer. + Args: + punct: Whether to reserve grapheme for basic punctuation or not. + apostrophe: Whether to use apostrophe or not. + add_blank_at: Add blank to labels in the specified order ("last") or after tokens (any non None), + if None then no blank in labels. + pad_with_space: Whether to pad text with spaces at the beginning and at the end or not. + non_default_punct_list: List of punctuation marks which will be used instead default. + text_preprocessing_func: Text preprocessing function for correct execution of the tokenizer. + Currently, it only applies lower() function. + """ + + it_ipa = "abcdefghijklmnopqrstuvwxyzàèéìòùóæɐɑɔəɚɜɬɹʌʔᵻðŋɛɡɣɪɲɾʃʊʎʒʝβθd͡'t͡'øɒɕɓçɖɘɝɞɟʄɡɠɢʛɦɧħɥʜɨɬɫɮʟɱɯɰɳɵɸœɶʘɺɻʀʁɽʂʈʧʉʋⱱɤʍχʏʑʐʔʡʕʢǀǁǂᵻʃ'ː" + super().__init__( + chars=it_ipa, + punct=punct, + apostrophe=apostrophe, + add_blank_at=add_blank_at, + pad_with_space=pad_with_space, + non_default_punct_list=non_default_punct_list, + text_preprocessing_func=text_preprocessing_func, + ) + + def encode(self, text): + """See base class.""" + cs, space, tokens = [], self.tokens[self.space], set(self.tokens) + + text = self.text_preprocessing_func(text) + for c in text: + # Add space if last one isn't one + if c == space and len(cs) > 0 and cs[-1] != space: + cs.append(c) + # Add next char + elif (c.isalnum() or c == "'" or c == "\u0303") and c in tokens: + cs.append(c) + # Add punct + elif (c in self.PUNCT_LIST) and self.punct: + cs.append(c) + # Warn about unknown char + elif c != space: + logging.warning(f"Text: [{text}] contains unknown char: [{c}]. Symbol will be skipped.") + + # Remove trailing spaces + while cs[-1] == space: + cs.pop() + + if self.pad_with_space: + cs = [space] + cs + [space] + + return [self._token2id[p] for p in cs] + + class EnglishPhonemesTokenizer(BaseTokenizer): # fmt: off PUNCT_LIST = ( # Derived from LJSpeech and "/" additionally diff --git a/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py b/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py index 62c571bc16b7..bc065e75fa66 100644 --- a/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py +++ b/tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py @@ -34,6 +34,10 @@ class TestTTSTokenizers: "BUENOS": ["bwˈenos"], "DÍAS": ["dˈias"], } + PHONEME_DICT_IT = { + "CIAO": ["tʃˈao"], + "MONDO": ["mˈondo"], + } @staticmethod def _parse_text(tokenizer, text): @@ -146,6 +150,18 @@ def test_ipa_tokenizer_de_de(self): assert chars == expected_output + @pytest.mark.run_only_on('CPU') + @pytest.mark.unit + def test_ipa_tokenizer_it_it(self): + input_text = "Ciao mondo" + expected_output = "tʃˈao mˈondo" + + g2p = IpaG2p(phoneme_dict=self.PHONEME_DICT_IT, locale="it-IT") + tokenizer = IPATokenizer(g2p=g2p, locale="it-IT") + chars, tokens = self._parse_text(tokenizer, input_text) + + assert chars == expected_output + @pytest.mark.run_only_on('CPU') @pytest.mark.unit def test_ipa_tokenizer_en_us(self): From 71f327f29c1328e0d594dc30880a126363f5061e Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 2 Oct 2023 20:42:39 -0700 Subject: [PATCH 089/112] best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/utils/callbacks/nemo_model_checkpoint.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/nemo/utils/callbacks/nemo_model_checkpoint.py b/nemo/utils/callbacks/nemo_model_checkpoint.py index d4759ecf5949..5089f50bd527 100644 --- a/nemo/utils/callbacks/nemo_model_checkpoint.py +++ b/nemo/utils/callbacks/nemo_model_checkpoint.py @@ -209,6 +209,8 @@ def on_train_end(self, trainer, pl_module): "were found. Saving latest model instead." ) else: + if os.path.isdir(self.best_model_path.split('.ckpt')[0]): + self.best_model_path = self.best_model_path.split('.ckpt')[0] self.best_model_path = trainer.strategy.broadcast(self.best_model_path) trainer._checkpoint_connector.restore(self.best_model_path) From 6517360e3e3ab4621c2278fc38773213f999aac9 Mon Sep 17 00:00:00 2001 From: George <37293288+Jorjeous@users.noreply.github.com> Date: Tue, 3 Oct 2023 18:54:21 +0400 Subject: [PATCH 090/112] Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister --- tutorials/tools/SDE_HowTo_v2.ipynb | 837 +---------------------------- 1 file changed, 5 insertions(+), 832 deletions(-) diff --git a/tutorials/tools/SDE_HowTo_v2.ipynb b/tutorials/tools/SDE_HowTo_v2.ipynb index a6d3f8dd2723..4087219a8dad 100644 --- a/tutorials/tools/SDE_HowTo_v2.ipynb +++ b/tutorials/tools/SDE_HowTo_v2.ipynb @@ -44,840 +44,13 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "c3919489", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "87f4e2f4-a06c-432d-d986-429fbe6714af" + "id": "c3919489" }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Cloning into 'NeMo'...\n", - "remote: Enumerating objects: 121443, done.\u001b[K\n", - "remote: Counting objects: 100% (1811/1811), done.\u001b[K\n", - "remote: Compressing objects: 100% (945/945), done.\u001b[K\n", - "remote: Total 121443 (delta 1299), reused 1238 (delta 864), pack-reused 119632\u001b[K\n", - "Receiving objects: 100% (121443/121443), 228.05 MiB | 21.34 MiB/s, done.\n", - "Resolving deltas: 100% (90608/90608), done.\n", - "Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]\n", - "Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease\n", - "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]\n", - "Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]\n", - "Get:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]\n", - "Hit:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n", - "Hit:7 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease\n", - "Get:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]\n", - "Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease\n", - "Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease\n", - "Get:11 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [962 kB]\n", - "Get:12 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [1,059 kB]\n", - "Get:13 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [993 kB]\n", - "Get:14 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,254 kB]\n", - "Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,230 kB]\n", - "Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [1,079 kB]\n", - "Get:17 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy/main amd64 Packages [21.8 kB]\n", - "Fetched 6,959 kB in 1s (5,794 kB/s)\n", - "Reading package lists... Done\n", - "Reading package lists... Done\n", - "Building dependency tree... Done\n", - "Reading state information... Done\n", - "libsndfile1 is already the newest version (1.0.31-2build1).\n", - "ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).\n", - "0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.\n", - "Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (23.1.2)\n", - "Collecting pip\n", - " Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m12.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hInstalling collected packages: pip\n", - " Attempting uninstall: pip\n", - " Found existing installation: pip 23.1.2\n", - " Uninstalling pip-23.1.2:\n", - " Successfully uninstalled pip-23.1.2\n", - "Successfully installed pip-23.2.1\n", - "Uninstalling stuff\n", - "\u001b[33mWARNING: Skipping nemo_toolkit as it is not installed.\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Skipping sacrebleu as it is not installed.\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Skipping nemo_asr as it is not installed.\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Skipping nemo_nlp as it is not installed.\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Skipping nemo_tts as it is not installed.\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0mInstalling nemo\n", - "Obtaining file:///content/NeMo\n", - " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", - " Checking if build backend supports build_editable ... \u001b[?25l\u001b[?25hdone\n", - " Getting requirements to build editable ... \u001b[?25l\u001b[?25hdone\n", - " Preparing editable metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting huggingface-hub (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for huggingface-hub from https://files.pythonhosted.org/packages/7f/c4/adcbe9a696c135578cabcbdd7331332daad4d49b7c43688bc2d36b3a47d2/huggingface_hub-0.16.4-py3-none-any.whl.metadata\n", - " Downloading huggingface_hub-0.16.4-py3-none-any.whl.metadata (12 kB)\n", - "Requirement already satisfied: numba in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.56.4)\n", - "Requirement already satisfied: numpy<1.24,>=1.22 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.23.5)\n", - "Collecting onnx>=1.7.0 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for onnx>=1.7.0 from https://files.pythonhosted.org/packages/47/d4/f2d212558245e252b936247666c3f5981e6dba62ec470ff8be3df3389364/onnx-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading onnx-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)\n", - "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (2.8.2)\n", - "Collecting ruamel.yaml (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for ruamel.yaml from https://files.pythonhosted.org/packages/d9/0e/2a05efa11ea33513fbdf4a2e2576fe94fd8fa5ad226dbb9c660886390974/ruamel.yaml-0.17.32-py3-none-any.whl.metadata\n", - " Downloading ruamel.yaml-0.17.32-py3-none-any.whl.metadata (17 kB)\n", - "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.2.2)\n", - "Collecting setuptools==65.5.1 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading setuptools-65.5.1-py3-none-any.whl (1.2 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m13.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: tensorboard in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (2.12.3)\n", - "Requirement already satisfied: text-unidecode in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.3)\n", - "Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (2.0.1+cu118)\n", - "Requirement already satisfied: tqdm>=4.41.0 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (4.66.1)\n", - "Collecting wget (from nemo-toolkit==1.21.0rc0)\n", - " Downloading wget-3.2.zip (10 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.14.1)\n", - "Collecting black==19.10b0 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading black-19.10b0-py36-none-any.whl (97 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m97.5/97.5 kB\u001b[0m \u001b[31m12.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting click==8.0.2 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading click-8.0.2-py3-none-any.whl (97 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m97.6/97.6 kB\u001b[0m \u001b[31m12.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting isort<6.0.0,>5.1.0 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading isort-5.12.0-py3-none-any.whl (91 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m91.2/91.2 kB\u001b[0m \u001b[31m11.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting parameterized (from nemo-toolkit==1.21.0rc0)\n", - " Downloading parameterized-0.9.0-py2.py3-none-any.whl (20 kB)\n", - "Requirement already satisfied: pytest in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (7.4.1)\n", - "Collecting pytest-runner (from nemo-toolkit==1.21.0rc0)\n", - " Downloading pytest_runner-6.0.0-py3-none-any.whl (7.2 kB)\n", - "Requirement already satisfied: sphinx in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (5.0.2)\n", - "Collecting sphinxcontrib-bibtex (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for sphinxcontrib-bibtex from https://files.pythonhosted.org/packages/79/59/fafc5c480506cc356e2a7ea009d7c7d75812475b4385fe851ae55575661c/sphinxcontrib_bibtex-2.6.1-py3-none-any.whl.metadata\n", - " Downloading sphinxcontrib_bibtex-2.6.1-py3-none-any.whl.metadata (6.1 kB)\n", - "Collecting wandb (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for wandb from https://files.pythonhosted.org/packages/fe/10/18b03623c460fd433525d9b4739af58c5e69f5974328dcdd037cfbc855d7/wandb-0.15.10-py3-none-any.whl.metadata\n", - " Downloading wandb-0.15.10-py3-none-any.whl.metadata (9.6 kB)\n", - "Collecting hydra-core<=1.3.2,>1.3 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m154.5/154.5 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting omegaconf<=2.3 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pytorch-lightning<=2.0.7,>=2.0 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pytorch-lightning<=2.0.7,>=2.0 from https://files.pythonhosted.org/packages/d5/ef/39994adec1fe1d5f25fd0dd0a82abcd8bd61fc968283790b9da7463f0279/pytorch_lightning-2.0.7-py3-none-any.whl.metadata\n", - " Downloading pytorch_lightning-2.0.7-py3-none-any.whl.metadata (23 kB)\n", - "Collecting torchmetrics>=0.11.0 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for torchmetrics>=0.11.0 from https://files.pythonhosted.org/packages/e3/86/47091c33ecf05f8826d134fd518485d4c68ca524c053b2fdd4e041c20547/torchmetrics-1.1.1-py3-none-any.whl.metadata\n", - " Downloading torchmetrics-1.1.1-py3-none-any.whl.metadata (21 kB)\n", - "Collecting transformers>=4.0.1 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for transformers>=4.0.1 from https://files.pythonhosted.org/packages/13/30/54b59e73400df3de506ad8630284e9fd63f4b94f735423d55fc342181037/transformers-4.33.1-py3-none-any.whl.metadata\n", - " Downloading transformers-4.33.1-py3-none-any.whl.metadata (119 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.9/119.9 kB\u001b[0m \u001b[31m12.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting webdataset<=0.1.62,>=0.1.48 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading webdataset-0.1.62-py3-none-any.whl (32 kB)\n", - "Requirement already satisfied: inflect in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (7.0.0)\n", - "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.5.3)\n", - "Collecting pydantic<2 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pydantic<2 from https://files.pythonhosted.org/packages/bc/e0/0371e9b6c910afe502e5fe18cc94562bfd9399617c7b4f5b6e13c29115b3/pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (149 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.3/149.3 kB\u001b[0m \u001b[31m17.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting sacremoses>=0.0.43 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading sacremoses-0.0.53.tar.gz (880 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m880.6/880.6 kB\u001b[0m \u001b[31m24.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting sentencepiece<1.0.0 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m34.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting youtokentome>=1.0.5 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading youtokentome-1.0.6.tar.gz (86 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.7/86.7 kB\u001b[0m \u001b[31m11.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting braceexpand (from nemo-toolkit==1.21.0rc0)\n", - " Downloading braceexpand-0.1.7-py2.py3-none-any.whl (5.9 kB)\n", - "Requirement already satisfied: editdistance in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.6.2)\n", - "Collecting g2p-en (from nemo-toolkit==1.21.0rc0)\n", - " Downloading g2p_en-2.1.0-py3-none-any.whl (3.1 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m51.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: ipywidgets in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (7.7.1)\n", - "Collecting jiwer (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for jiwer from https://files.pythonhosted.org/packages/0d/4f/ee537ab20144811dd99321735ff92ef2b3a3230b77ed7454bed4c44d21fc/jiwer-3.0.3-py3-none-any.whl.metadata\n", - " Downloading jiwer-3.0.3-py3-none-any.whl.metadata (2.6 kB)\n", - "Collecting kaldi-python-io (from nemo-toolkit==1.21.0rc0)\n", - " Downloading kaldi-python-io-1.2.2.tar.gz (8.8 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting kaldiio (from nemo-toolkit==1.21.0rc0)\n", - " Downloading kaldiio-2.18.0-py3-none-any.whl (28 kB)\n", - "Requirement already satisfied: librosa>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.10.1)\n", - "Collecting marshmallow (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for marshmallow from https://files.pythonhosted.org/packages/ed/3c/cebfdcad015240014ff08b883d1c0c427f2ba45ae8c6572851b6ef136cad/marshmallow-3.20.1-py3-none-any.whl.metadata\n", - " Downloading marshmallow-3.20.1-py3-none-any.whl.metadata (7.8 kB)\n", - "Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (3.7.1)\n", - "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (23.1)\n", - "Collecting pyannote.core (from nemo-toolkit==1.21.0rc0)\n", - " Downloading pyannote.core-5.0.0-py3-none-any.whl (58 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.5/58.5 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pyannote.metrics (from nemo-toolkit==1.21.0rc0)\n", - " Downloading pyannote.metrics-3.2.1-py3-none-any.whl (51 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.4/51.4 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pydub (from nemo-toolkit==1.21.0rc0)\n", - " Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", - "Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (1.10.1)\n", - "Requirement already satisfied: soundfile in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.12.1)\n", - "Collecting sox (from nemo-toolkit==1.21.0rc0)\n", - " Downloading sox-1.4.1-py2.py3-none-any.whl (39 kB)\n", - "Collecting texterrors (from nemo-toolkit==1.21.0rc0)\n", - " Downloading texterrors-0.4.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m50.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting boto3 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for boto3 from https://files.pythonhosted.org/packages/6e/7f/ffb72ddc9f465183af04623baf9d0ea70e73cb0957407e4c333b9e0263fb/boto3-1.28.43-py3-none-any.whl.metadata\n", - " Downloading boto3-1.28.43-py3-none-any.whl.metadata (6.7 kB)\n", - "Collecting datasets (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for datasets from https://files.pythonhosted.org/packages/09/7e/fd4d6441a541dba61d0acb3c1fd5df53214c2e9033854e837a99dd9e0793/datasets-2.14.5-py3-none-any.whl.metadata\n", - " Downloading datasets-2.14.5-py3-none-any.whl.metadata (19 kB)\n", - "Collecting einops (from nemo-toolkit==1.21.0rc0)\n", - " Downloading einops-0.6.1-py3-none-any.whl (42 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m42.2/42.2 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting faiss-cpu (from nemo-toolkit==1.21.0rc0)\n", - " Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m17.6/17.6 MB\u001b[0m \u001b[31m60.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting fasttext (from nemo-toolkit==1.21.0rc0)\n", - " Downloading fasttext-0.9.2.tar.gz (68 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m68.8/68.8 kB\u001b[0m \u001b[31m8.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting flask-restful (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for flask-restful from https://files.pythonhosted.org/packages/d7/7b/f0b45f0df7d2978e5ae51804bb5939b7897b2ace24306009da0cc34d8d1f/Flask_RESTful-0.3.10-py2.py3-none-any.whl.metadata\n", - " Downloading Flask_RESTful-0.3.10-py2.py3-none-any.whl.metadata (1.0 kB)\n", - "Collecting ftfy (from nemo-toolkit==1.21.0rc0)\n", - " Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.1/53.1 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: gdown in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (4.6.6)\n", - "Requirement already satisfied: h5py in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (3.9.0)\n", - "Collecting ijson (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for ijson from https://files.pythonhosted.org/packages/6b/78/2cbeb7020a7a319d148c92331951cfc710864990e32ff6c7f4859729fb48/ijson-3.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading ijson-3.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)\n", - "Requirement already satisfied: jieba in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.42.1)\n", - "Collecting markdown2 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for markdown2 from https://files.pythonhosted.org/packages/f1/98/61276a753f078dd2f3171c9a69fd3f451d220e806b2b1cdca41b8e368b0f/markdown2-2.4.10-py2.py3-none-any.whl.metadata\n", - " Downloading markdown2-2.4.10-py2.py3-none-any.whl.metadata (2.0 kB)\n", - "Collecting megatron-core==0.2.0 (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for megatron-core==0.2.0 from https://files.pythonhosted.org/packages/33/f1/d94f2282b91950e31223efc39138748d71907dbf857e5523d1e73619fd62/megatron_core-0.2.0-py3-none-any.whl.metadata\n", - " Downloading megatron_core-0.2.0-py3-none-any.whl.metadata (1.6 kB)\n", - "Requirement already satisfied: nltk>=3.6.5 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (3.8.1)\n", - "Collecting opencc (from nemo-toolkit==1.21.0rc0)\n", - " Downloading OpenCC-1.1.6-cp310-cp310-manylinux1_x86_64.whl (778 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m778.3/778.3 kB\u001b[0m \u001b[31m70.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pangu (from nemo-toolkit==1.21.0rc0)\n", - " Downloading pangu-4.0.6.1-py3-none-any.whl (6.4 kB)\n", - "Collecting rapidfuzz (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for rapidfuzz from https://files.pythonhosted.org/packages/35/04/9ca97b17da457ed294519477da2aad0799c9ba8eebf37761a5ca94c35534/rapidfuzz-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading rapidfuzz-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)\n", - "Collecting rouge-score (from nemo-toolkit==1.21.0rc0)\n", - " Downloading rouge_score-0.1.2.tar.gz (17 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting sacrebleu[ja] (from nemo-toolkit==1.21.0rc0)\n", - " Downloading sacrebleu-2.3.1-py3-none-any.whl (118 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m118.9/118.9 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting sentence-transformers (from nemo-toolkit==1.21.0rc0)\n", - " Downloading sentence-transformers-2.2.2.tar.gz (85 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.0/86.0 kB\u001b[0m \u001b[31m12.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: tensorstore in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.1.41)\n", - "Collecting zarr (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for zarr from https://files.pythonhosted.org/packages/ba/55/0f5ec28561a1698ac5c11edc5724f8c6d48d01baecf740ffd62107d95e7f/zarr-2.16.1-py3-none-any.whl.metadata\n", - " Downloading zarr-2.16.1-py3-none-any.whl.metadata (5.8 kB)\n", - "Collecting attrdict (from nemo-toolkit==1.21.0rc0)\n", - " Downloading attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)\n", - "Collecting kornia (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for kornia from https://files.pythonhosted.org/packages/55/da/72cb83aa364ebb4d0109965e20c5d33d7063ccab15332c3fd0acfd5609c9/kornia-0.7.0-py2.py3-none-any.whl.metadata\n", - " Downloading kornia-0.7.0-py2.py3-none-any.whl.metadata (12 kB)\n", - "Collecting nemo-text-processing (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for nemo-text-processing from https://files.pythonhosted.org/packages/bd/82/b776d01ba650c3ab42ba5e381e34b35e507a21b16ad1246f51026fa13f0b/nemo_text_processing-0.2.0rc0-py3-none-any.whl.metadata\n", - " Downloading nemo_text_processing-0.2.0rc0-py3-none-any.whl.metadata (7.2 kB)\n", - "Collecting pypinyin (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pypinyin from https://files.pythonhosted.org/packages/00/fc/3e82bf38739a7b2c4f699245ce6c84ff254723c678c2cdc5d2ecbddf9afb/pypinyin-0.49.0-py2.py3-none-any.whl.metadata\n", - " Downloading pypinyin-0.49.0-py2.py3-none-any.whl.metadata (12 kB)\n", - "Collecting pypinyin-dict (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pypinyin-dict from https://files.pythonhosted.org/packages/89/31/16c26425685a84191503a226450837e0f4d540c164665d8567f2472861a9/pypinyin_dict-0.6.0-py2.py3-none-any.whl.metadata\n", - " Downloading pypinyin_dict-0.6.0-py2.py3-none-any.whl.metadata (3.6 kB)\n", - "Collecting progress>=1.5 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading progress-1.6.tar.gz (7.8 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: tabulate>=0.8.7 in /usr/local/lib/python3.10/dist-packages (from nemo-toolkit==1.21.0rc0) (0.9.0)\n", - "Collecting textdistance>=4.1.5 (from nemo-toolkit==1.21.0rc0)\n", - " Downloading textdistance-4.5.0-py3-none-any.whl (31 kB)\n", - "Requirement already satisfied: attrs>=18.1.0 in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (23.1.0)\n", - "Requirement already satisfied: appdirs in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (1.4.4)\n", - "Requirement already satisfied: toml>=0.9.4 in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (0.10.2)\n", - "Collecting typed-ast>=1.4.0 (from black==19.10b0->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for typed-ast>=1.4.0 from https://files.pythonhosted.org/packages/e2/ed/b9b8b794b37b55c9247b1e8d38b0361e8158795c181636d34d6c11b506e7/typed_ast-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading typed_ast-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)\n", - "Requirement already satisfied: regex in /usr/local/lib/python3.10/dist-packages (from black==19.10b0->nemo-toolkit==1.21.0rc0) (2023.6.3)\n", - "Collecting pathspec<1,>=0.6 (from black==19.10b0->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pathspec<1,>=0.6 from https://files.pythonhosted.org/packages/b4/2a/9b1be29146139ef459188f5e420a66e835dda921208db600b7037093891f/pathspec-0.11.2-py3-none-any.whl.metadata\n", - " Downloading pathspec-0.11.2-py3-none-any.whl.metadata (19 kB)\n", - "Collecting antlr4-python3-runtime==4.9.* (from hydra-core<=1.3.2,>1.3->nemo-toolkit==1.21.0rc0)\n", - " Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "INFO: pip is looking at multiple versions of jiwer to determine which version is compatible with other requirements. This could take a while.\n", - "Collecting jiwer (from nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for jiwer from https://files.pythonhosted.org/packages/23/a3/92c29a5e422acd87e3b4f2e6dc0ce877070cc9b2f81d30fe84122032338a/jiwer-3.0.2-py3-none-any.whl.metadata\n", - " Downloading jiwer-3.0.2-py3-none-any.whl.metadata (2.6 kB)\n", - " Downloading jiwer-3.0.1-py3-none-any.whl (21 kB)\n", - " Downloading jiwer-3.0.0-py3-none-any.whl (21 kB)\n", - " Downloading jiwer-2.6.0-py3-none-any.whl (20 kB)\n", - " Downloading jiwer-2.5.2-py3-none-any.whl (15 kB)\n", - "Collecting rapidfuzz (from nemo-toolkit==1.21.0rc0)\n", - " Downloading rapidfuzz-2.13.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.2/2.2 MB\u001b[0m \u001b[31m96.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (3.0.0)\n", - "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (1.3.2)\n", - "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (4.4.2)\n", - "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (1.7.0)\n", - "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (0.3.6)\n", - "Requirement already satisfied: typing-extensions>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (4.7.1)\n", - "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (0.3)\n", - "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (1.0.5)\n", - "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (1.1.0)\n", - "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (0.11.0)\n", - "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (4.42.1)\n", - "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (1.4.5)\n", - "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (9.4.0)\n", - "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->nemo-toolkit==1.21.0rc0) (3.1.1)\n", - "Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba->nemo-toolkit==1.21.0rc0) (0.39.1)\n", - "Requirement already satisfied: PyYAML>=5.1.0 in /usr/local/lib/python3.10/dist-packages (from omegaconf<=2.3->nemo-toolkit==1.21.0rc0) (6.0.1)\n", - "Requirement already satisfied: protobuf>=3.20.2 in /usr/local/lib/python3.10/dist-packages (from onnx>=1.7.0->nemo-toolkit==1.21.0rc0) (3.20.3)\n", - "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil->nemo-toolkit==1.21.0rc0) (1.16.0)\n", - "Requirement already satisfied: fsspec[http]>2021.06.0 in /usr/local/lib/python3.10/dist-packages (from pytorch-lightning<=2.0.7,>=2.0->nemo-toolkit==1.21.0rc0) (2023.6.0)\n", - "Collecting lightning-utilities>=0.7.0 (from pytorch-lightning<=2.0.7,>=2.0->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for lightning-utilities>=0.7.0 from https://files.pythonhosted.org/packages/46/ee/8641eeb6a062f383b7d6875604e1f3f83bd2c93a0b4dbcabd3150b32de6e/lightning_utilities-0.9.0-py3-none-any.whl.metadata\n", - " Downloading lightning_utilities-0.9.0-py3-none-any.whl.metadata (4.6 kB)\n", - "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->nemo-toolkit==1.21.0rc0) (3.2.0)\n", - "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from soundfile->nemo-toolkit==1.21.0rc0) (1.15.1)\n", - "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (3.12.3)\n", - "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (1.12)\n", - "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (3.1)\n", - "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (3.1.2)\n", - "Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch->nemo-toolkit==1.21.0rc0) (2.0.0)\n", - "Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->nemo-toolkit==1.21.0rc0) (3.27.2)\n", - "Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->nemo-toolkit==1.21.0rc0) (16.0.6)\n", - "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (2.31.0)\n", - "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers>=4.0.1->nemo-toolkit==1.21.0rc0)\n", - " Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m81.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting safetensors>=0.3.1 (from transformers>=4.0.1->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/6c/f0/c17bbdb1e5f9dab29d44cade445135789f75f8f08ea2728d04493ea8412b/safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.7 kB)\n", - "Collecting botocore<1.32.0,>=1.31.43 (from boto3->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for botocore<1.32.0,>=1.31.43 from https://files.pythonhosted.org/packages/88/37/68fd026cde5d1c802ab34290285c5019e7ba3de3eea8e6c07756cfb827ae/botocore-1.31.43-py3-none-any.whl.metadata\n", - " Downloading botocore-1.31.43-py3-none-any.whl.metadata (6.0 kB)\n", - "Collecting jmespath<2.0.0,>=0.7.1 (from boto3->nemo-toolkit==1.21.0rc0)\n", - " Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)\n", - "Collecting s3transfer<0.7.0,>=0.6.0 (from boto3->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for s3transfer<0.7.0,>=0.6.0 from https://files.pythonhosted.org/packages/d9/17/a3b666f5ef9543cfd3c661d39d1e193abb9649d0cfbbfee3cf3b51d5af02/s3transfer-0.6.2-py3-none-any.whl.metadata\n", - " Downloading s3transfer-0.6.2-py3-none-any.whl.metadata (1.8 kB)\n", - "Requirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets->nemo-toolkit==1.21.0rc0) (9.0.0)\n", - "Collecting dill<0.3.8,>=0.3.0 (from datasets->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for dill<0.3.8,>=0.3.0 from https://files.pythonhosted.org/packages/f5/3a/74a29b11cf2cdfcd6ba89c0cecd70b37cd1ba7b77978ce611eb7a146a832/dill-0.3.7-py3-none-any.whl.metadata\n", - " Downloading dill-0.3.7-py3-none-any.whl.metadata (9.9 kB)\n", - "Collecting xxhash (from datasets->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for xxhash from https://files.pythonhosted.org/packages/13/c3/e942893f4864a424514c81640f114980cfd5aff7e7414d1e0255f4571111/xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)\n", - "Collecting multiprocess (from datasets->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for multiprocess from https://files.pythonhosted.org/packages/35/a8/36d8d7b3e46b377800d8dec47891cdf05842d1a2366909ae4a0c89fbc5e6/multiprocess-0.70.15-py310-none-any.whl.metadata\n", - " Downloading multiprocess-0.70.15-py310-none-any.whl.metadata (7.2 kB)\n", - "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->nemo-toolkit==1.21.0rc0) (3.8.5)\n", - "Collecting pybind11>=2.2 (from fasttext->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pybind11>=2.2 from https://files.pythonhosted.org/packages/06/55/9f73c32dda93fa4f539fafa268f9504e83c489f460c380371d94296126cd/pybind11-2.11.1-py3-none-any.whl.metadata\n", - " Using cached pybind11-2.11.1-py3-none-any.whl.metadata (9.5 kB)\n", - "Collecting aniso8601>=0.82 (from flask-restful->nemo-toolkit==1.21.0rc0)\n", - " Downloading aniso8601-9.0.1-py2.py3-none-any.whl (52 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m52.8/52.8 kB\u001b[0m \u001b[31m7.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: Flask>=0.8 in /usr/local/lib/python3.10/dist-packages (from flask-restful->nemo-toolkit==1.21.0rc0) (2.2.5)\n", - "Requirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from flask-restful->nemo-toolkit==1.21.0rc0) (2023.3.post1)\n", - "Requirement already satisfied: wcwidth>=0.2.5 in /usr/local/lib/python3.10/dist-packages (from ftfy->nemo-toolkit==1.21.0rc0) (0.2.6)\n", - "Collecting distance>=0.1.3 (from g2p-en->nemo-toolkit==1.21.0rc0)\n", - " Downloading Distance-0.1.3.tar.gz (180 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m180.3/180.3 kB\u001b[0m \u001b[31m24.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from gdown->nemo-toolkit==1.21.0rc0) (4.11.2)\n", - "Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (5.5.6)\n", - "Requirement already satisfied: ipython-genutils~=0.2.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.0)\n", - "Requirement already satisfied: traitlets>=4.3.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (5.7.1)\n", - "Requirement already satisfied: widgetsnbextension~=3.6.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (3.6.5)\n", - "Requirement already satisfied: ipython>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (7.34.0)\n", - "Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->nemo-toolkit==1.21.0rc0) (3.0.8)\n", - "Collecting cdifflib (from nemo-text-processing->nemo-toolkit==1.21.0rc0)\n", - " Downloading cdifflib-1.2.6.tar.gz (11 kB)\n", - " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", - " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", - " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n", - " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting pynini==2.1.5 (from nemo-text-processing->nemo-toolkit==1.21.0rc0)\n", - " Downloading pynini-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (161.3 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m161.3/161.3 MB\u001b[0m \u001b[31m6.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: Cython>=0.29 in /usr/local/lib/python3.10/dist-packages (from pynini==2.1.5->nemo-text-processing->nemo-toolkit==1.21.0rc0) (0.29.36)\n", - "Requirement already satisfied: sortedcontainers>=2.0.4 in /usr/local/lib/python3.10/dist-packages (from pyannote.core->nemo-toolkit==1.21.0rc0) (2.4.0)\n", - "Collecting pyannote.database>=4.0.1 (from pyannote.metrics->nemo-toolkit==1.21.0rc0)\n", - " Downloading pyannote.database-5.0.1-py3-none-any.whl (48 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.1/48.1 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting docopt>=0.6.2 (from pyannote.metrics->nemo-toolkit==1.21.0rc0)\n", - " Downloading docopt-0.6.2.tar.gz (25 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: iniconfig in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (2.0.0)\n", - "Requirement already satisfied: pluggy<2.0,>=0.12 in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (1.3.0)\n", - "Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (1.1.3)\n", - "Requirement already satisfied: tomli>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from pytest->nemo-toolkit==1.21.0rc0) (2.0.1)\n", - "Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from rouge-score->nemo-toolkit==1.21.0rc0) (1.4.0)\n", - "Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml->nemo-toolkit==1.21.0rc0)\n", - " Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m485.6/485.6 kB\u001b[0m \u001b[31m49.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting portalocker (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", - " Downloading portalocker-2.7.0-py2.py3-none-any.whl (15 kB)\n", - "Collecting colorama (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", - " Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n", - "Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0) (4.9.3)\n", - "Collecting mecab-python3==1.0.5 (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", - " Downloading mecab_python3-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (581 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m581.1/581.1 kB\u001b[0m \u001b[31m56.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting ipadic<2.0,>=1.0 (from sacrebleu[ja]->nemo-toolkit==1.21.0rc0)\n", - " Downloading ipadic-1.0.0.tar.gz (13.4 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.4/13.4 MB\u001b[0m \u001b[31m96.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (from sentence-transformers->nemo-toolkit==1.21.0rc0) (0.15.2+cu118)\n", - "Requirement already satisfied: sphinxcontrib-applehelp in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.7)\n", - "Requirement already satisfied: sphinxcontrib-devhelp in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.5)\n", - "Requirement already satisfied: sphinxcontrib-jsmath in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.1)\n", - "Requirement already satisfied: sphinxcontrib-htmlhelp>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.0.4)\n", - "Requirement already satisfied: sphinxcontrib-serializinghtml>=1.1.5 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.1.9)\n", - "Requirement already satisfied: sphinxcontrib-qthelp in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.0.6)\n", - "Requirement already satisfied: Pygments>=2.0 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.16.1)\n", - "Requirement already satisfied: docutils<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (0.18.1)\n", - "Requirement already satisfied: snowballstemmer>=1.1 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.2.0)\n", - "Requirement already satisfied: babel>=1.3 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (2.12.1)\n", - "Requirement already satisfied: alabaster<0.8,>=0.7 in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (0.7.13)\n", - "Requirement already satisfied: imagesize in /usr/local/lib/python3.10/dist-packages (from sphinx->nemo-toolkit==1.21.0rc0) (1.4.1)\n", - "Collecting docutils<0.19,>=0.14 (from sphinx->nemo-toolkit==1.21.0rc0)\n", - " Downloading docutils-0.17.1-py2.py3-none-any.whl (575 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m575.5/575.5 kB\u001b[0m \u001b[31m42.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pybtex>=0.24 (from sphinxcontrib-bibtex->nemo-toolkit==1.21.0rc0)\n", - " Downloading pybtex-0.24.0-py2.py3-none-any.whl (561 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m561.4/561.4 kB\u001b[0m \u001b[31m48.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pybtex-docutils>=1.0.0 (from sphinxcontrib-bibtex->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for pybtex-docutils>=1.0.0 from https://files.pythonhosted.org/packages/11/b1/ce1f4596211efb5410e178a803f08e59b20bedb66837dcf41e21c54f9ec1/pybtex_docutils-1.0.3-py3-none-any.whl.metadata\n", - " Downloading pybtex_docutils-1.0.3-py3-none-any.whl.metadata (4.3 kB)\n", - "Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (1.57.0)\n", - "Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (2.17.3)\n", - "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (1.0.0)\n", - "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (3.4.4)\n", - "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (0.7.1)\n", - "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (2.3.7)\n", - "Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.10/dist-packages (from tensorboard->nemo-toolkit==1.21.0rc0) (0.41.2)\n", - "Collecting plac (from texterrors->nemo-toolkit==1.21.0rc0)\n", - " Downloading plac-1.3.5-py2.py3-none-any.whl (22 kB)\n", - "Collecting loguru (from texterrors->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for loguru from https://files.pythonhosted.org/packages/19/a9/4e91197b121a41c640367641a510fd9a05bb7a3259fc9678ee2976c8fd00/loguru-0.7.1-py3-none-any.whl.metadata\n", - " Downloading loguru-0.7.1-py3-none-any.whl.metadata (22 kB)\n", - "Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from texterrors->nemo-toolkit==1.21.0rc0) (2.3.0)\n", - "Collecting Levenshtein (from texterrors->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for Levenshtein from https://files.pythonhosted.org/packages/e6/02/0a4ed6a9e2b78f6b57f25a87fc194d7d10c2bbe95d985f36390e86285232/Levenshtein-0.21.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n", - " Downloading Levenshtein-0.21.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)\n", - "Collecting GitPython!=3.1.29,>=1.0.0 (from wandb->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for GitPython!=3.1.29,>=1.0.0 from https://files.pythonhosted.org/packages/0f/c6/bb9e2276b6fed126aa21e292493b45a3df4cfba7cbfcf2ab8809a6b0e718/GitPython-3.1.35-py3-none-any.whl.metadata\n", - " Downloading GitPython-3.1.35-py3-none-any.whl.metadata (10 kB)\n", - "Requirement already satisfied: psutil>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from wandb->nemo-toolkit==1.21.0rc0) (5.9.5)\n", - "Collecting sentry-sdk>=1.0.0 (from wandb->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for sentry-sdk>=1.0.0 from https://files.pythonhosted.org/packages/17/22/dbd5f854f373214d48585eeb6844e50a8dd1600f435d9033493f76f66dfa/sentry_sdk-1.30.0-py2.py3-none-any.whl.metadata\n", - " Downloading sentry_sdk-1.30.0-py2.py3-none-any.whl.metadata (9.6 kB)\n", - "Collecting docker-pycreds>=0.4.0 (from wandb->nemo-toolkit==1.21.0rc0)\n", - " Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)\n", - "Collecting pathtools (from wandb->nemo-toolkit==1.21.0rc0)\n", - " Downloading pathtools-0.1.2.tar.gz (11 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting setproctitle (from wandb->nemo-toolkit==1.21.0rc0)\n", - " Downloading setproctitle-1.3.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30 kB)\n", - "Collecting asciitree (from zarr->nemo-toolkit==1.21.0rc0)\n", - " Downloading asciitree-0.3.3.tar.gz (4.0 kB)\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting fasteners (from zarr->nemo-toolkit==1.21.0rc0)\n", - " Downloading fasteners-0.18-py3-none-any.whl (18 kB)\n", - "Collecting numcodecs>=0.10.0 (from zarr->nemo-toolkit==1.21.0rc0)\n", - " Downloading numcodecs-0.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.7/6.7 MB\u001b[0m \u001b[31m116.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting urllib3<1.27,>=1.25.4 (from botocore<1.32.0,>=1.31.43->boto3->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for urllib3<1.27,>=1.25.4 from https://files.pythonhosted.org/packages/c5/05/c214b32d21c0b465506f95c4f28ccbcba15022e000b043b72b3df7728471/urllib3-1.26.16-py2.py3-none-any.whl.metadata\n", - " Downloading urllib3-1.26.16-py2.py3-none-any.whl.metadata (48 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.4/48.4 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->soundfile->nemo-toolkit==1.21.0rc0) (2.21)\n", - "Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from Flask>=0.8->flask-restful->nemo-toolkit==1.21.0rc0) (2.1.2)\n", - "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (3.2.0)\n", - "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (6.0.4)\n", - "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (4.0.3)\n", - "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (1.9.2)\n", - "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (1.4.0)\n", - "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->nemo-toolkit==1.21.0rc0) (1.3.1)\n", - "Collecting gitdb<5,>=4.0.1 (from GitPython!=3.1.29,>=1.0.0->wandb->nemo-toolkit==1.21.0rc0)\n", - " Downloading gitdb-4.0.10-py3-none-any.whl (62 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (5.3.1)\n", - "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (0.3.0)\n", - "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (4.9)\n", - "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard->nemo-toolkit==1.21.0rc0) (1.3.1)\n", - "Requirement already satisfied: jupyter-client in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets->nemo-toolkit==1.21.0rc0) (6.1.12)\n", - "Requirement already satisfied: tornado>=4.2 in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets->nemo-toolkit==1.21.0rc0) (6.3.2)\n", - "Collecting jedi>=0.16 (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for jedi>=0.16 from https://files.pythonhosted.org/packages/8e/46/7e3ae3aa2dcfcffc5138c6cef5448523218658411c84a2000bf75c8d3ec1/jedi-0.19.0-py2.py3-none-any.whl.metadata\n", - " Downloading jedi-0.19.0-py2.py3-none-any.whl.metadata (22 kB)\n", - "Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.7.5)\n", - "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (3.0.39)\n", - "Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.0)\n", - "Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.1.6)\n", - "Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (4.8.0)\n", - "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->nemo-toolkit==1.21.0rc0) (2.1.3)\n", - "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from numcodecs>=0.10.0->zarr->nemo-toolkit==1.21.0rc0) (0.4)\n", - "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.0->librosa>=0.9.0->nemo-toolkit==1.21.0rc0) (3.10.0)\n", - "Requirement already satisfied: typer[all]>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (0.9.0)\n", - "Collecting latexcodec>=1.0.4 (from pybtex>=0.24->sphinxcontrib-bibtex->nemo-toolkit==1.21.0rc0)\n", - " Downloading latexcodec-2.0.1-py2.py3-none-any.whl (18 kB)\n", - "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (3.4)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (2023.7.22)\n", - "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->nemo-toolkit==1.21.0rc0) (1.3.0)\n", - "Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.10/dist-packages (from widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (6.5.5)\n", - "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->gdown->nemo-toolkit==1.21.0rc0) (2.5)\n", - "Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.0.1->nemo-toolkit==1.21.0rc0) (1.7.1)\n", - "Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->GitPython!=3.1.29,>=1.0.0->wandb->nemo-toolkit==1.21.0rc0)\n", - " Downloading smmap-5.0.0-py3-none-any.whl (24 kB)\n", - "Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.8.3)\n", - "Requirement already satisfied: pyzmq<25,>=17 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (23.2.1)\n", - "Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (23.1.0)\n", - "Requirement already satisfied: jupyter-core>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (5.3.1)\n", - "Requirement already satisfied: nbformat in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (5.9.2)\n", - "Requirement already satisfied: nbconvert>=5 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (6.5.4)\n", - "Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.5.7)\n", - "Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.8.2)\n", - "Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.17.1)\n", - "Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.17.1)\n", - "Requirement already satisfied: nbclassic>=0.4.7 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.0.0)\n", - "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.7.0)\n", - "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->nemo-toolkit==1.21.0rc0) (0.5.0)\n", - "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard->nemo-toolkit==1.21.0rc0) (3.2.2)\n", - "Collecting shellingham<2.0.0,>=1.3.0 (from typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0)\n", - " Obtaining dependency information for shellingham<2.0.0,>=1.3.0 from https://files.pythonhosted.org/packages/57/70/0265437683625b2e6491736706d3d679d90e2a26f6bff59f4e46e09872b9/shellingham-1.5.3-py2.py3-none-any.whl.metadata\n", - " Downloading shellingham-1.5.3-py2.py3-none-any.whl.metadata (3.4 kB)\n", - "Requirement already satisfied: rich<14.0.0,>=10.11.0 in /usr/local/lib/python3.10/dist-packages (from typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (13.5.2)\n", - "Requirement already satisfied: jupyter-server>=1.8 in /usr/local/lib/python3.10/dist-packages (from nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.24.0)\n", - "Requirement already satisfied: notebook-shim>=0.2.3 in /usr/local/lib/python3.10/dist-packages (from nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.3)\n", - "Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (6.0.0)\n", - "Requirement already satisfied: defusedxml in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.7.1)\n", - "Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.2.2)\n", - "Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.8.4)\n", - "Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.8.0)\n", - "Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.5.0)\n", - "Requirement already satisfied: tinycss2 in /usr/local/lib/python3.10/dist-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.2.1)\n", - "Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (2.18.0)\n", - "Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (4.19.0)\n", - "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich<14.0.0,>=10.11.0->typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (3.0.0)\n", - "Requirement already satisfied: argon2-cffi-bindings in /usr/local/lib/python3.10/dist-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (21.2.0)\n", - "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (2023.7.1)\n", - "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.30.2)\n", - "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.10.2)\n", - "Requirement already satisfied: anyio<4,>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from jupyter-server>=1.8->nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (3.7.1)\n", - "Requirement already satisfied: websocket-client in /usr/local/lib/python3.10/dist-packages (from jupyter-server>=1.8->nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.6.2)\n", - "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich<14.0.0,>=10.11.0->typer[all]>=0.2.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit==1.21.0rc0) (0.1.2)\n", - "Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (0.5.1)\n", - "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<4,>=3.1.0->jupyter-server>=1.8->nbclassic>=0.4.7->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets->nemo-toolkit==1.21.0rc0) (1.3.0)\n", - "Downloading megatron_core-0.2.0-py3-none-any.whl (46 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m4.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading onnx-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.6 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.6/14.6 MB\u001b[0m \u001b[31m71.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m81.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading pytorch_lightning-2.0.7-py3-none-any.whl (724 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m725.0/725.0 kB\u001b[0m \u001b[31m55.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading torchmetrics-1.1.1-py3-none-any.whl (763 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m763.4/763.4 kB\u001b[0m \u001b[31m51.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading transformers-4.33.1-py3-none-any.whl (7.6 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.6/7.6 MB\u001b[0m \u001b[31m90.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m31.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading boto3-1.28.43-py3-none-any.whl (135 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m135.8/135.8 kB\u001b[0m \u001b[31m18.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading datasets-2.14.5-py3-none-any.whl (519 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m519.6/519.6 kB\u001b[0m \u001b[31m48.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading Flask_RESTful-0.3.10-py2.py3-none-any.whl (26 kB)\n", - "Downloading ijson-3.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (111 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m111.8/111.8 kB\u001b[0m \u001b[31m14.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading kornia-0.7.0-py2.py3-none-any.whl (705 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m705.7/705.7 kB\u001b[0m \u001b[31m35.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading markdown2-2.4.10-py2.py3-none-any.whl (39 kB)\n", - "Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading nemo_text_processing-0.2.0rc0-py3-none-any.whl (2.4 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.4/2.4 MB\u001b[0m \u001b[31m50.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading pypinyin-0.49.0-py2.py3-none-any.whl (1.4 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.4/1.4 MB\u001b[0m \u001b[31m69.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading pypinyin_dict-0.6.0-py2.py3-none-any.whl (9.5 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.5/9.5 MB\u001b[0m \u001b[31m86.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading ruamel.yaml-0.17.32-py3-none-any.whl (112 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m112.2/112.2 kB\u001b[0m \u001b[31m11.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading sphinxcontrib_bibtex-2.6.1-py3-none-any.whl (40 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.9/40.9 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading wandb-0.15.10-py3-none-any.whl (2.1 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m76.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading zarr-2.16.1-py3-none-any.whl (206 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m206.9/206.9 kB\u001b[0m \u001b[31m21.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading botocore-1.31.43-py3-none-any.whl (11.2 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.2/11.2 MB\u001b[0m \u001b[31m87.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading dill-0.3.7-py3-none-any.whl (115 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m14.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading GitPython-3.1.35-py3-none-any.whl (188 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m188.8/188.8 kB\u001b[0m \u001b[31m23.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading lightning_utilities-0.9.0-py3-none-any.whl (23 kB)\n", - "Downloading pathspec-0.11.2-py3-none-any.whl (29 kB)\n", - "Using cached pybind11-2.11.1-py3-none-any.whl (227 kB)\n", - "Downloading pybtex_docutils-1.0.3-py3-none-any.whl (6.4 kB)\n", - "Downloading s3transfer-0.6.2-py3-none-any.whl (79 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.8/79.8 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m77.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading sentry_sdk-1.30.0-py2.py3-none-any.whl (218 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m218.8/218.8 kB\u001b[0m \u001b[31m26.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading typed_ast-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (824 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m824.7/824.7 kB\u001b[0m \u001b[31m52.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading Levenshtein-0.21.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (172 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m172.5/172.5 kB\u001b[0m \u001b[31m21.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading loguru-0.7.1-py3-none-any.whl (61 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m61.4/61.4 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading multiprocess-0.70.15-py310-none-any.whl (134 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m16.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m14.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading jedi-0.19.0-py2.py3-none-any.whl (1.6 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m66.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading urllib3-1.26.16-py2.py3-none-any.whl (143 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m143.1/143.1 kB\u001b[0m \u001b[31m18.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading shellingham-1.5.3-py2.py3-none-any.whl (9.7 kB)\n", - "Building wheels for collected packages: antlr4-python3-runtime, progress, sacremoses, youtokentome, fasttext, kaldi-python-io, nemo-toolkit, rouge-score, sentence-transformers, wget, distance, docopt, ipadic, asciitree, cdifflib, pathtools\n", - " Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=ad5d3c0bd8ee550570d532ca332fe7204baee65f2066eefcf5fc70cd2afdb382\n", - " Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88\n", - " Building wheel for progress (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for progress: filename=progress-1.6-py3-none-any.whl size=9610 sha256=79cc5bc84a633414a3856bc3a35542e0489aaffa62e7cdae95e51474a38f064c\n", - " Stored in directory: /root/.cache/pip/wheels/a2/68/5f/c339b20a41659d856c93ccdce6a33095493eb82c3964aac5a1\n", - " Building wheel for sacremoses (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=1fd4f16d193691e8ff7869591f5d11f5a32e21289fbb9a8422a800bc1bf781c7\n", - " Stored in directory: /root/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb\n", - " Building wheel for youtokentome (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for youtokentome: filename=youtokentome-1.0.6-cp310-cp310-linux_x86_64.whl size=1883668 sha256=d3e4925cf4b245a76184b4afe612a4fffd683de42889b5c1a07ca44be98dde9b\n", - " Stored in directory: /root/.cache/pip/wheels/df/85/f8/301d2ba45f43f30bed2fe413efa760bc726b8b660ed9c2900c\n", - " Building wheel for fasttext (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for fasttext: filename=fasttext-0.9.2-cp310-cp310-linux_x86_64.whl size=4199767 sha256=5bcc699963ee85f0f02846c1e278140b3ba3d95a7877d60b48c5b10f8f424bb1\n", - " Stored in directory: /root/.cache/pip/wheels/a5/13/75/f811c84a8ab36eedbaef977a6a58a98990e8e0f1967f98f394\n", - " Building wheel for kaldi-python-io (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for kaldi-python-io: filename=kaldi_python_io-1.2.2-py3-none-any.whl size=8948 sha256=d2e946cb4c2298a69803c0492c1533a3d08d313ced846ba7f1555a11ab164bd6\n", - " Stored in directory: /root/.cache/pip/wheels/b7/23/5f/49d3a826be576faf61d84e8028e1914bb36a5586ee2613b087\n", - " Building editable for nemo-toolkit (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for nemo-toolkit: filename=nemo_toolkit-1.21.0rc0-0.editable-py3-none-any.whl size=9181 sha256=d5c26c7630db00d8c6a483e19f89aa11a4d7a11920dcb052d8bb6a56689eb77d\n", - " Stored in directory: /tmp/pip-ephem-wheel-cache-4nra31ak/wheels/41/a7/71/00ccdddfb43c015e8d025853cafb90117dd722fd8ec557581b\n", - " Building wheel for rouge-score (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24932 sha256=7f61e2fe8b2f326d21f3a0a06e5b58e2f1103a5523bac58ff728481bfdf203d8\n", - " Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n", - " Building wheel for sentence-transformers (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125923 sha256=b4fb59313f79b135e68441f9b4780a35d2658bac193693e6177887cb018824a7\n", - " Stored in directory: /root/.cache/pip/wheels/62/f2/10/1e606fd5f02395388f74e7462910fe851042f97238cbbd902f\n", - " Building wheel for wget (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9655 sha256=dd1cfe282f727f91b5a5db464ae58f43372f62a80d0a083733b36c68edc6ee76\n", - " Stored in directory: /root/.cache/pip/wheels/8b/f1/7f/5c94f0a7a505ca1c81cd1d9208ae2064675d97582078e6c769\n", - " Building wheel for distance (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for distance: filename=Distance-0.1.3-py3-none-any.whl size=16258 sha256=287628136639656b068a6f2e67ab0f5a80bc4ad466590f19499978136c5e3726\n", - " Stored in directory: /root/.cache/pip/wheels/e8/bb/de/f71bf63559ea9a921059a5405806f7ff6ed612a9231c4a9309\n", - " Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13705 sha256=9af9a132b49b73c475f9838a56ff79e968af41bb509c8b38b81f7c08c3da87dd\n", - " Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac\n", - " Building wheel for ipadic (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for ipadic: filename=ipadic-1.0.0-py3-none-any.whl size=13556703 sha256=9e40daac3101fb3d52fcf9c986017638569c576c3be36805d1437e722a2d8803\n", - " Stored in directory: /root/.cache/pip/wheels/5b/ea/e3/2f6e0860a327daba3b030853fce4483ed37468bbf1101c59c3\n", - " Building wheel for asciitree (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for asciitree: filename=asciitree-0.3.3-py3-none-any.whl size=5034 sha256=69d1478059cdad8cb54b8a1a29e66bd3229cf3458dcf96aeedb978381d8116f3\n", - " Stored in directory: /root/.cache/pip/wheels/7f/4e/be/1171b40f43b918087657ec57cf3b81fa1a2e027d8755baa184\n", - " Building wheel for cdifflib (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for cdifflib: filename=cdifflib-1.2.6-cp310-cp310-linux_x86_64.whl size=27681 sha256=f4ccba50f62b7265ea20b84bac762b6ed3dacae8d42ea7346d54e07da2a6ad9e\n", - " Stored in directory: /root/.cache/pip/wheels/87/a7/fd/8061e24ed08689045cb6d1ca303768dc463b20a5a338174841\n", - " Building wheel for pathtools (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for pathtools: filename=pathtools-0.1.2-py3-none-any.whl size=8791 sha256=95b669a715a9eefa4cd1222ac5c899a97064dbb765dccbd3a5b35c0208aa1389\n", - " Stored in directory: /root/.cache/pip/wheels/e7/f3/22/152153d6eb222ee7a56ff8617d80ee5207207a8c00a7aab794\n", - "Successfully built antlr4-python3-runtime progress sacremoses youtokentome fasttext kaldi-python-io nemo-toolkit rouge-score sentence-transformers wget distance docopt ipadic asciitree cdifflib pathtools\n", - "Installing collected packages: wget, tokenizers, sentencepiece, safetensors, pydub, progress, plac, pathtools, pangu, opencc, mecab-python3, ipadic, ijson, faiss-cpu, docopt, distance, braceexpand, asciitree, antlr4-python3-runtime, aniso8601, xxhash, webdataset, urllib3, typed-ast, textdistance, sox, smmap, shellingham, setuptools, setproctitle, ruamel.yaml.clib, rapidfuzz, pytest-runner, pypinyin, pynini, pydantic, pybind11, portalocker, pathspec, parameterized, onnx, omegaconf, numcodecs, marshmallow, markdown2, loguru, lightning-utilities, latexcodec, kaldiio, kaldi-python-io, jmespath, jedi, isort, ftfy, fasteners, einops, docutils, docker-pycreds, dill, colorama, click, cdifflib, attrdict, zarr, youtokentome, sentry-sdk, sacremoses, sacrebleu, ruamel.yaml, pypinyin-dict, pybtex, pyannote.core, multiprocess, Levenshtein, jiwer, hydra-core, gitdb, fasttext, botocore, black, texterrors, s3transfer, rouge-score, pybtex-docutils, huggingface-hub, GitPython, g2p-en, flask-restful, wandb, transformers, pyannote.database, datasets, boto3, pyannote.metrics, nemo-text-processing, torchmetrics, sphinxcontrib-bibtex, sentence-transformers, pytorch-lightning, nemo-toolkit, megatron-core, kornia\n", - " Attempting uninstall: urllib3\n", - " Found existing installation: urllib3 2.0.4\n", - " Uninstalling urllib3-2.0.4:\n", - " Successfully uninstalled urllib3-2.0.4\n", - " Attempting uninstall: setuptools\n", - " Found existing installation: setuptools 67.7.2\n", - " Uninstalling setuptools-67.7.2:\n", - " Successfully uninstalled setuptools-67.7.2\n", - " Attempting uninstall: pydantic\n", - " Found existing installation: pydantic 2.3.0\n", - " Uninstalling pydantic-2.3.0:\n", - " Successfully uninstalled pydantic-2.3.0\n", - " Attempting uninstall: docutils\n", - " Found existing installation: docutils 0.18.1\n", - " Uninstalling docutils-0.18.1:\n", - " Successfully uninstalled docutils-0.18.1\n", - " Attempting uninstall: click\n", - " Found existing installation: click 8.1.7\n", - " Uninstalling click-8.1.7:\n", - " Successfully uninstalled click-8.1.7\n", - "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", - "cvxpy 1.3.2 requires setuptools>65.5.1, but you have setuptools 65.5.1 which is incompatible.\u001b[0m\u001b[31m\n", - "\u001b[0mSuccessfully installed GitPython-3.1.35 Levenshtein-0.21.1 aniso8601-9.0.1 antlr4-python3-runtime-4.9.3 asciitree-0.3.3 attrdict-2.0.1 black-19.10b0 boto3-1.28.43 botocore-1.31.43 braceexpand-0.1.7 cdifflib-1.2.6 click-8.0.2 colorama-0.4.6 datasets-2.14.5 dill-0.3.7 distance-0.1.3 docker-pycreds-0.4.0 docopt-0.6.2 docutils-0.17.1 einops-0.6.1 faiss-cpu-1.7.4 fasteners-0.18 fasttext-0.9.2 flask-restful-0.3.10 ftfy-6.1.1 g2p-en-2.1.0 gitdb-4.0.10 huggingface-hub-0.16.4 hydra-core-1.3.2 ijson-3.2.3 ipadic-1.0.0 isort-5.12.0 jedi-0.19.0 jiwer-2.5.2 jmespath-1.0.1 kaldi-python-io-1.2.2 kaldiio-2.18.0 kornia-0.7.0 latexcodec-2.0.1 lightning-utilities-0.9.0 loguru-0.7.1 markdown2-2.4.10 marshmallow-3.20.1 mecab-python3-1.0.5 megatron-core-0.2.0 multiprocess-0.70.15 nemo-text-processing-0.2.0rc0 nemo-toolkit-1.21.0rc0 numcodecs-0.11.0 omegaconf-2.3.0 onnx-1.14.1 opencc-1.1.6 pangu-4.0.6.1 parameterized-0.9.0 pathspec-0.11.2 pathtools-0.1.2 plac-1.3.5 portalocker-2.7.0 progress-1.6 pyannote.core-5.0.0 pyannote.database-5.0.1 pyannote.metrics-3.2.1 pybind11-2.11.1 pybtex-0.24.0 pybtex-docutils-1.0.3 pydantic-1.10.12 pydub-0.25.1 pynini-2.1.5 pypinyin-0.49.0 pypinyin-dict-0.6.0 pytest-runner-6.0.0 pytorch-lightning-2.0.7 rapidfuzz-2.13.7 rouge-score-0.1.2 ruamel.yaml-0.17.32 ruamel.yaml.clib-0.2.7 s3transfer-0.6.2 sacrebleu-2.3.1 sacremoses-0.0.53 safetensors-0.3.3 sentence-transformers-2.2.2 sentencepiece-0.1.99 sentry-sdk-1.30.0 setproctitle-1.3.2 setuptools-65.5.1 shellingham-1.5.3 smmap-5.0.0 sox-1.4.1 sphinxcontrib-bibtex-2.6.1 textdistance-4.5.0 texterrors-0.4.4 tokenizers-0.13.3 torchmetrics-1.1.1 transformers-4.33.1 typed-ast-1.5.5 urllib3-1.26.16 wandb-0.15.10 webdataset-0.1.62 wget-3.2 xxhash-3.3.0 youtokentome-1.0.6 zarr-2.16.1\n", - "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0mAll done!\n", - "Reading package lists... Done\n", - "Building dependency tree... Done\n", - "Reading state information... Done\n", - "The following additional packages will be installed:\n", - " libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base\n", - " libsox3 libwavpack1\n", - "Suggested packages:\n", - " libsox-fmt-all\n", - "The following NEW packages will be installed:\n", - " libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base\n", - " libsox3 libwavpack1 sox\n", - "0 upgraded, 7 newly installed, 0 to remove and 16 not upgraded.\n", - "Need to get 617 kB of archives.\n", - "After this operation, 1,764 kB of additional disk space will be used.\n", - "Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopencore-amrnb0 amd64 0.1.5-1 [94.8 kB]\n", - "Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopencore-amrwb0 amd64 0.1.5-1 [49.1 kB]\n", - "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 libsox3 amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [240 kB]\n", - "Get:4 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 libsox-fmt-alsa amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [11.2 kB]\n", - "Get:5 http://archive.ubuntu.com/ubuntu jammy/main amd64 libwavpack1 amd64 5.4.0-1build2 [83.7 kB]\n", - "Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 libsox-fmt-base amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [33.7 kB]\n", - "Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 sox amd64 14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1 [104 kB]\n", - "Fetched 617 kB in 0s (5,224 kB/s)\n", - "Selecting previously unselected package libopencore-amrnb0:amd64.\n", - "(Reading database ... 120901 files and directories currently installed.)\n", - "Preparing to unpack .../0-libopencore-amrnb0_0.1.5-1_amd64.deb ...\n", - "Unpacking libopencore-amrnb0:amd64 (0.1.5-1) ...\n", - "Selecting previously unselected package libopencore-amrwb0:amd64.\n", - "Preparing to unpack .../1-libopencore-amrwb0_0.1.5-1_amd64.deb ...\n", - "Unpacking libopencore-amrwb0:amd64 (0.1.5-1) ...\n", - "Selecting previously unselected package libsox3:amd64.\n", - "Preparing to unpack .../2-libsox3_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", - "Unpacking libsox3:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Selecting previously unselected package libsox-fmt-alsa:amd64.\n", - "Preparing to unpack .../3-libsox-fmt-alsa_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", - "Unpacking libsox-fmt-alsa:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Selecting previously unselected package libwavpack1:amd64.\n", - "Preparing to unpack .../4-libwavpack1_5.4.0-1build2_amd64.deb ...\n", - "Unpacking libwavpack1:amd64 (5.4.0-1build2) ...\n", - "Selecting previously unselected package libsox-fmt-base:amd64.\n", - "Preparing to unpack .../5-libsox-fmt-base_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", - "Unpacking libsox-fmt-base:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Selecting previously unselected package sox.\n", - "Preparing to unpack .../6-sox_14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1_amd64.deb ...\n", - "Unpacking sox (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Setting up libsox3:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Setting up libopencore-amrwb0:amd64 (0.1.5-1) ...\n", - "Setting up libsox-fmt-alsa:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Setting up libwavpack1:amd64 (5.4.0-1build2) ...\n", - "Setting up libopencore-amrnb0:amd64 (0.1.5-1) ...\n", - "Setting up libsox-fmt-base:amd64 (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Setting up sox (14.4.2+git20190427-2+deb11u2ubuntu0.22.04.1) ...\n", - "Processing triggers for man-db (2.10.2-1) ...\n", - "Processing triggers for libc-bin (2.35-0ubuntu3.1) ...\n", - "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link\n", - "\n", - "/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link\n", - "\n", - "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link\n", - "\n", - "/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link\n", - "\n", - "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link\n", - "\n", - "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link\n", - "\n", - "Collecting dash>=2.1.0 (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Obtaining dependency information for dash>=2.1.0 from https://files.pythonhosted.org/packages/9b/b4/d522c16b41a8da013fd60a67f9618e57c504cd2c80e02a7a861413b93906/dash-2.13.0-py3-none-any.whl.metadata\n", - " Downloading dash-2.13.0-py3-none-any.whl.metadata (11 kB)\n", - "Collecting dash_bootstrap_components>=1.0.3 (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 2))\n", - " Obtaining dependency information for dash_bootstrap_components>=1.0.3 from https://files.pythonhosted.org/packages/cd/2a/cf963336e8b6745406d357e2f2b33ff1f236531fcadbe250096931855ec0/dash_bootstrap_components-1.5.0-py3-none-any.whl.metadata\n", - " Downloading dash_bootstrap_components-1.5.0-py3-none-any.whl.metadata (5.2 kB)\n", - "Collecting diff_match_patch (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 3))\n", - " Downloading diff_match_patch-20230430-py3-none-any.whl (42 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m42.8/42.8 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: editdistance in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 4)) (0.6.2)\n", - "Requirement already satisfied: jiwer in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 5)) (2.5.2)\n", - "Requirement already satisfied: librosa>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.10.1)\n", - "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 7)) (1.23.5)\n", - "Requirement already satisfied: plotly in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 8)) (5.15.0)\n", - "Requirement already satisfied: SoundFile in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 9)) (0.12.1)\n", - "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from -r ./NeMo/tools/speech_data_explorer/requirements.txt (line 10)) (4.66.1)\n", - "Requirement already satisfied: Flask<2.3.0,>=1.0.4 in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.2.5)\n", - "Collecting Werkzeug<2.3.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Downloading Werkzeug-2.2.3-py3-none-any.whl (233 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m233.6/233.6 kB\u001b[0m \u001b[31m9.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting dash-html-components==2.0.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)\n", - "Collecting dash-core-components==2.0.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)\n", - "Collecting dash-table==5.0.0 (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)\n", - "Requirement already satisfied: typing-extensions>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (4.7.1)\n", - "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.31.0)\n", - "Collecting retrying (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Downloading retrying-1.3.4-py3-none-any.whl (11 kB)\n", - "Collecting ansi2html (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1))\n", - " Downloading ansi2html-1.8.0-py3-none-any.whl (16 kB)\n", - "Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (1.5.7)\n", - "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (65.5.1)\n", - "Requirement already satisfied: rapidfuzz==2.13.7 in /usr/local/lib/python3.10/dist-packages (from jiwer->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 5)) (2.13.7)\n", - "Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (3.0.0)\n", - "Requirement already satisfied: scipy>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.10.1)\n", - "Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.2.2)\n", - "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.3.2)\n", - "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (4.4.2)\n", - "Requirement already satisfied: numba>=0.51.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.56.4)\n", - "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.7.0)\n", - "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.3.6)\n", - "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.3)\n", - "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (1.0.5)\n", - "Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from plotly->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 8)) (8.2.3)\n", - "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from plotly->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 8)) (23.1)\n", - "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from SoundFile->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 9)) (1.15.1)\n", - "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->SoundFile->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 9)) (2.21)\n", - "Requirement already satisfied: Jinja2>=3.0 in /usr/local/lib/python3.10/dist-packages (from Flask<2.3.0,>=1.0.4->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (3.1.2)\n", - "Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from Flask<2.3.0,>=1.0.4->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.1.2)\n", - "Requirement already satisfied: click>=8.0 in /usr/local/lib/python3.10/dist-packages (from Flask<2.3.0,>=1.0.4->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (8.0.2)\n", - "Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.51.0->librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (0.39.1)\n", - "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.0->librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (3.10.0)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (3.2.0)\n", - "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (3.4)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (1.26.16)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2023.7.22)\n", - "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.20.0->librosa>=0.9.1->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 6)) (3.2.0)\n", - "Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from Werkzeug<2.3.0->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (2.1.3)\n", - "Requirement already satisfied: six>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from retrying->dash>=2.1.0->-r ./NeMo/tools/speech_data_explorer/requirements.txt (line 1)) (1.16.0)\n", - "Downloading dash-2.13.0-py3-none-any.whl (10.4 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.4/10.4 MB\u001b[0m \u001b[31m46.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hDownloading dash_bootstrap_components-1.5.0-py3-none-any.whl (221 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m221.2/221.2 kB\u001b[0m \u001b[31m23.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hInstalling collected packages: dash-table, dash-html-components, dash-core-components, Werkzeug, retrying, diff_match_patch, ansi2html, dash, dash_bootstrap_components\n", - " Attempting uninstall: Werkzeug\n", - " Found existing installation: Werkzeug 2.3.7\n", - " Uninstalling Werkzeug-2.3.7:\n", - " Successfully uninstalled Werkzeug-2.3.7\n", - "Successfully installed Werkzeug-2.2.3 ansi2html-1.8.0 dash-2.13.0 dash-core-components-2.0.0 dash-html-components-2.0.0 dash-table-5.0.0 dash_bootstrap_components-1.5.0 diff_match_patch-20230430 retrying-1.3.4\n", - "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", - "\u001b[0m" - ] - } - ], + "outputs": [], "source": [ - "!git clone https://github.com/NVIDIA/NeMo.git\n", + "BRANCH = 'main'\n", + "\n", + "!git clone -b $BRANCH https://github.com/NVIDIA/NeMo\n", "\n", "!apt-get update && apt-get install -y libsndfile1 ffmpeg\n", "!cd NeMo;./reinstall.sh\n", From e52c99b19e4c7c1b0cc97bccb6f2d8745ecfe79a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 3 Oct 2023 11:15:08 -0700 Subject: [PATCH 091/112] Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../thutmose_tagger.py | 11 +++++++---- nemo/collections/tts/g2p/models/t5.py | 13 ++++++++++++- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py b/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py index be82d6a31582..f6e5c155646d 100644 --- a/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py +++ b/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py @@ -236,14 +236,15 @@ def validation_step(self, batch, batch_idx): val_loss_tag = self.loss_fn(logits=tag_logits, labels=tag_labels, loss_mask=labels_mask) val_loss_semiotic = self.loss_fn(logits=semiotic_logits, labels=semiotic_labels, loss_mask=labels_mask) val_loss = val_loss_tag + val_loss_semiotic + self.validation_step_outputs.append(val_loss) return {'val_loss': val_loss} - def on_validation_epoch_end(self, outputs): + def on_validation_epoch_end(self): """ Called at the end of validation to aggregate outputs. :param outputs: list of individual outputs of each validation step. """ - avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean() + avg_loss = torch.stack([x['val_loss'] for x in self.validation_step_outputs]).mean() # calculate metrics and classification report # In our task recall = accuracy, and the recall column - is the per class accuracy @@ -269,6 +270,8 @@ def on_validation_epoch_end(self, outputs): self.tag_multiword_classification_report.reset() self.semiotic_classification_report.reset() + self.validation_step_outputs.clear() # free memory + def test_step(self, batch, batch_idx): """ Lightning calls this inside the test loop with the data from the test dataloader @@ -276,12 +279,12 @@ def test_step(self, batch, batch_idx): """ return self.validation_step(batch, batch_idx) - def on_test_epoch_end(self, outputs): + def on_test_epoch_end(self): """ Called at the end of test to aggregate outputs. :param outputs: list of individual outputs of each test step. """ - return self.on_validation_epoch_end(outputs) + return self.on_validation_epoch_end() # Functions for inference @torch.no_grad() diff --git a/nemo/collections/tts/g2p/models/t5.py b/nemo/collections/tts/g2p/models/t5.py index 16f1f1933fb0..b41fcf1d5945 100644 --- a/nemo/collections/tts/g2p/models/t5.py +++ b/nemo/collections/tts/g2p/models/t5.py @@ -170,7 +170,18 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0, split="val"): ) generated_str, _, _ = self._generate_predictions(input_ids=input_ids, model_max_target_len=self.max_target_len) per = word_error_rate(hypotheses=generated_str, references=labels_str, use_cer=True) - return {f"{split}_loss": val_loss, 'per': per} + output = {f"{split}_loss": val_loss, 'per': per} + if split == 'val': + if isinstance(self.trainer.val_dataloaders, (list, tuple)) and len(self.trainer.val_dataloaders) > 1: + self.validation_step_outputs[dataloader_idx].append(output) + else: + self.validation_step_outputs.append(output) + else: + if isinstance(self.trainer.test_dataloaders, (list, tuple)) and len(self.trainer.test_dataloaders) > 1: + self.test_step_outputs[dataloader_idx].append(output) + else: + self.test_step_outputs.append(output) + return output def test_step(self, batch, batch_idx, dataloader_idx=0): """ From cbb499cd53833451f7f99131828f7b1e69b61c69 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 3 Oct 2023 13:11:09 -0700 Subject: [PATCH 092/112] Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../language_modeling/megatron_gpt_model.py | 48 ++++++++++++++----- 1 file changed, 37 insertions(+), 11 deletions(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 3d7a5a127399..46d3f37f5d9b 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -13,6 +13,7 @@ # limitations under the License. import itertools +import os import queue import warnings from dataclasses import fields @@ -273,6 +274,8 @@ def __init__(self, cfg: DictConfig, trainer: Trainer): self.get_attention_mask_from_fusion = self.cfg.get('get_attention_mask_from_fusion', True) self.initialize_ub = self.cfg.get('ub_tp_comm_overlap', False) + self.log_train_loss = bool(int(os.getenv("NEMO_LOG_TRAIN_LOSS", 1))) + self.loss_broadcast_src_rank = None self.inference_params = None @@ -627,17 +630,29 @@ def training_step(self, dataloader_iter, batch_idx): self.allreduce_first_last_embeddings() ## logging - # we can only log on one rank if it is rank zero so we broadcast from last rank - # we can avoid this broadcast by updating the PTL log function to accept specific ranks - torch.distributed.broadcast(loss_mean, get_last_rank()) + if self.log_train_loss: + # When using pipeline parallelism, loss is calculated only in the last pipeline stage and + # it should be casted to other pipeline stages for logging. + # we can avoid this broadcast by updating the PTL log function to accept specific ranks + if parallel_state.get_pipeline_model_parallel_world_size() > 1: + if self.loss_broadcast_src_rank is None: + dp_size = parallel_state.get_data_parallel_world_size() + tp_size = parallel_state.get_tensor_model_parallel_world_size() + pp_size = parallel_state.get_pipeline_model_parallel_world_size() + rank_in_dp_tp_group = torch.distributed.get_rank() % (dp_size * tp_size) + last_pipeline_stage_offset = (tp_size * dp_size) * (pp_size - 1) + self.loss_broadcast_src_rank = last_pipeline_stage_offset + rank_in_dp_tp_group + torch.distributed.broadcast( + loss_mean, self.loss_broadcast_src_rank, group=parallel_state.get_pipeline_model_parallel_group(), + ) + self.log('reduced_train_loss', loss_mean, prog_bar=True, rank_zero_only=True, batch_size=1) - # (@adithyare) we need to check for the _scaler attribute to enable pp>1 for adapter training - if self.torch_dtype == torch.float16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): - loss_scale = self.trainer.precision_plugin.scaler._scale - if loss_scale is not None: - self.log('loss_scale', loss_scale, batch_size=1) + # (@adithyare) we need to check for the _scaler attribute to enable pp>1 for adapter training + if self.cfg.precision == 16 and hasattr(self.trainer.precision_plugin.scaler, "_scale"): + loss_scale = self.trainer.precision_plugin.scaler._scale + if loss_scale is not None: + self.log('loss_scale', loss_scale, batch_size=1) - self.log('reduced_train_loss', loss_mean, prog_bar=True, rank_zero_only=True, batch_size=1) lr = self._optimizer.param_groups[0]['lr'] self.log('lr', lr, rank_zero_only=True, batch_size=1) self.log( @@ -962,8 +977,19 @@ def on_validation_epoch_end(self): else: averaged_loss = torch.tensor(0.0, dtype=torch.float32).cuda() - # we can only log on one rank if it is rank zero so we broadcast from last rank - torch.distributed.broadcast(averaged_loss, get_last_rank()) + # When using pipeline parallelism, loss is calculated only in the last pipeline stage and + # it should be casted to other pipeline stages for logging. + if parallel_state.get_pipeline_model_parallel_world_size() > 1: + if self.loss_broadcast_src_rank is None: + dp_size = parallel_state.get_data_parallel_world_size() + tp_size = parallel_state.get_tensor_model_parallel_world_size() + pp_size = parallel_state.get_pipeline_model_parallel_world_size() + rank_in_dp_tp_group = torch.distributed.get_rank() % (dp_size * tp_size) + last_pipeline_stage_offset = (tp_size * dp_size) * (pp_size - 1) + self.loss_broadcast_src_rank = last_pipeline_stage_offset + rank_in_dp_tp_group + torch.distributed.broadcast( + averaged_loss, self.loss_broadcast_src_rank, group=parallel_state.get_pipeline_model_parallel_group(), + ) self.log('val_loss', averaged_loss, prog_bar=True, rank_zero_only=True, batch_size=1) self.validation_step_outputs.clear() # free memory From 2da5c027bce03cc59b47cc22e9eed3f1c235e746 Mon Sep 17 00:00:00 2001 From: Jason Date: Tue, 3 Oct 2023 16:28:25 -0400 Subject: [PATCH 093/112] Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../duplex_text_normalization_infer.py | 10 +++++- .../nn_wfst/en/electronic/normalize.py | 13 +++++-- .../en/electronic/tokenize_and_classify.py | 35 +++++++++++-------- .../nn_wfst/en/electronic/verbalize.py | 14 +++++--- .../nn_wfst/en/electronic/verbalize_final.py | 18 ++++++---- .../nn_wfst/en/whitelist/normalize.py | 13 +++++-- .../en/whitelist/tokenize_and_classify.py | 35 +++++++++++-------- .../nn_wfst/en/whitelist/verbalize.py | 14 +++++--- .../nn_wfst/en/whitelist/verbalize_final.py | 17 ++++++--- requirements/requirements_tts.txt | 3 +- .../tts/hui_acg/get_data.py | 10 +++++- .../tts/ljspeech/get_data.py | 10 +++++- .../dataset_processing/tts/preprocess_text.py | 10 +++++- .../tts/sfbilingual/get_data.py | 10 +++++- .../tts/thorsten_neutral/get_data.py | 10 +++++- .../asr/test_text_to_text_dataset.py | 10 +++++- tools/ctc_segmentation/requirements.txt | 3 +- tutorials/asr/ASR_TTS_Tutorial.ipynb | 9 ++++- .../tts/FastPitch_MixerTTS_Training.ipynb | 11 ++++-- tutorials/tts/NeMo_TTS_Primer.ipynb | 11 ++++-- 20 files changed, 199 insertions(+), 67 deletions(-) diff --git a/examples/nlp/duplex_text_normalization/duplex_text_normalization_infer.py b/examples/nlp/duplex_text_normalization/duplex_text_normalization_infer.py index 4cf25e12fc89..6bcc69de7db9 100644 --- a/examples/nlp/duplex_text_normalization/duplex_text_normalization_infer.py +++ b/examples/nlp/duplex_text_normalization/duplex_text_normalization_infer.py @@ -50,7 +50,6 @@ from typing import List from helpers import DECODER_MODEL, TAGGER_MODEL, instantiate_model_and_trainer -from nemo_text_processing.text_normalization.data_loader_utils import post_process_punct from nn_wfst.en.electronic.normalize import ElectronicNormalizer from nn_wfst.en.whitelist.normalize import WhitelistNormalizer from omegaconf import DictConfig, OmegaConf @@ -60,6 +59,15 @@ from nemo.core.config import hydra_runner from nemo.utils import logging +try: + from nemo_text_processing.text_normalization.data_loader_utils import post_process_punct +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + @hydra_runner(config_path="conf", config_name="duplex_tn_config") def main(cfg: DictConfig) -> None: diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/normalize.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/normalize.py index e0d83b42222d..a1f8caa7d959 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/normalize.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/normalize.py @@ -12,8 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. -from nemo_text_processing.text_normalization.normalize import Normalizer -from nemo_text_processing.text_normalization.token_parser import TokenParser +try: + from nemo_text_processing.text_normalization.normalize import Normalizer + from nemo_text_processing.text_normalization.token_parser import TokenParser +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) from nemo.collections.common.tokenizers.moses_tokenizers import MosesProcessor @@ -21,7 +28,7 @@ class ElectronicNormalizer(Normalizer): """ Normalizer for ELECTRONIC. - + Args: input_case: accepting either "lower_cased" or "cased" input. lang: language diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/tokenize_and_classify.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/tokenize_and_classify.py index 59a9d9784038..9e0c284d84b0 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/tokenize_and_classify.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/tokenize_and_classify.py @@ -15,18 +15,25 @@ import os -import pynini -from nemo_text_processing.text_normalization.en.graph_utils import ( - NEMO_WHITE_SPACE, - GraphFst, - delete_extra_space, - delete_space, - generator_main, -) -from nemo_text_processing.text_normalization.en.taggers.electronic import ElectronicFst -from nemo_text_processing.text_normalization.en.taggers.punctuation import PunctuationFst -from nemo_text_processing.text_normalization.en.taggers.word import WordFst -from pynini.lib import pynutil +try: + import pynini + from nemo_text_processing.text_normalization.en.graph_utils import ( + NEMO_WHITE_SPACE, + GraphFst, + delete_extra_space, + delete_space, + generator_main, + ) + from nemo_text_processing.text_normalization.en.taggers.electronic import ElectronicFst + from nemo_text_processing.text_normalization.en.taggers.punctuation import PunctuationFst + from nemo_text_processing.text_normalization.en.taggers.word import WordFst + from pynini.lib import pynutil +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) from nemo.utils import logging @@ -34,9 +41,9 @@ class ClassifyFst(GraphFst): """ Final class that composes all other classification grammars. This class can process an entire sentence including punctuation. - For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. + For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. More details to deployment at NeMo/tools/text_processing_deployment. - + Args: input_case: accepting either "lower_cased" or "cased" input. deterministic: if True will provide a single transduction option, diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize.py index 6366942d34c8..7236be7a1994 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize.py @@ -12,15 +12,21 @@ # See the License for the specific language governing permissions and # limitations under the License. - -from nemo_text_processing.text_normalization.en.graph_utils import GraphFst -from nemo_text_processing.text_normalization.en.verbalizers.electronic import ElectronicFst +try: + from nemo_text_processing.text_normalization.en.graph_utils import GraphFst + from nemo_text_processing.text_normalization.en.verbalizers.electronic import ElectronicFst +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) class VerbalizeFst(GraphFst): """ Composes other verbalizer grammars. - For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. + For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. More details to deployment at NeMo/tools/text_processing_deployment. Args: diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize_final.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize_final.py index 4d5d716bd01e..b2cc69ca9e09 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize_final.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/electronic/verbalize_final.py @@ -12,12 +12,18 @@ # See the License for the specific language governing permissions and # limitations under the License. - -import pynini -from nemo_text_processing.text_normalization.en.graph_utils import GraphFst, delete_extra_space, delete_space -from nemo_text_processing.text_normalization.en.verbalizers.word import WordFst -from nn_wfst.en.electronic.verbalize import VerbalizeFst -from pynini.lib import pynutil +try: + import pynini + from nemo_text_processing.text_normalization.en.graph_utils import GraphFst, delete_extra_space, delete_space + from nemo_text_processing.text_normalization.en.verbalizers.word import WordFst + from nn_wfst.en.electronic.verbalize import VerbalizeFst + from pynini.lib import pynutil +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) class VerbalizeFinalFst(GraphFst): diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/normalize.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/normalize.py index 4109109ec83a..cfb4bef5d1c3 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/normalize.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/normalize.py @@ -12,8 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. -from nemo_text_processing.text_normalization.normalize import Normalizer -from nemo_text_processing.text_normalization.token_parser import TokenParser +try: + from nemo_text_processing.text_normalization.normalize import Normalizer + from nemo_text_processing.text_normalization.token_parser import TokenParser +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) from nemo.collections.common.tokenizers.moses_tokenizers import MosesProcessor @@ -21,7 +28,7 @@ class WhitelistNormalizer(Normalizer): """ Normalizer for WHITELIST. - + Args: input_case: accepting either "lower_cased" or "cased" input. lang: language diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/tokenize_and_classify.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/tokenize_and_classify.py index 712812fa8190..c2d69e765bb4 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/tokenize_and_classify.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/tokenize_and_classify.py @@ -15,18 +15,25 @@ import os -import pynini -from nemo_text_processing.text_normalization.en.graph_utils import ( - NEMO_WHITE_SPACE, - GraphFst, - delete_extra_space, - delete_space, - generator_main, -) -from nemo_text_processing.text_normalization.en.taggers.punctuation import PunctuationFst -from nemo_text_processing.text_normalization.en.taggers.whitelist import WhiteListFst -from nemo_text_processing.text_normalization.en.taggers.word import WordFst -from pynini.lib import pynutil +try: + import pynini + from nemo_text_processing.text_normalization.en.graph_utils import ( + NEMO_WHITE_SPACE, + GraphFst, + delete_extra_space, + delete_space, + generator_main, + ) + from nemo_text_processing.text_normalization.en.taggers.punctuation import PunctuationFst + from nemo_text_processing.text_normalization.en.taggers.whitelist import WhiteListFst + from nemo_text_processing.text_normalization.en.taggers.word import WordFst + from pynini.lib import pynutil +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) from nemo.utils import logging @@ -34,9 +41,9 @@ class ClassifyFst(GraphFst): """ Final class that composes all other classification grammars. This class can process an entire sentence including punctuation. - For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. + For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. More details to deployment at NeMo/tools/text_processing_deployment. - + Args: input_case: accepting either "lower_cased" or "cased" input. deterministic: if True will provide a single transduction option, diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize.py index e85f067acf96..c647a142ef8c 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize.py @@ -12,15 +12,21 @@ # See the License for the specific language governing permissions and # limitations under the License. - -from nemo_text_processing.text_normalization.en.graph_utils import GraphFst -from nemo_text_processing.text_normalization.en.verbalizers.whitelist import WhiteListFst +try: + from nemo_text_processing.text_normalization.en.graph_utils import GraphFst + from nemo_text_processing.text_normalization.en.verbalizers.whitelist import WhiteListFst +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) class VerbalizeFst(GraphFst): """ Composes other verbalizer grammars. - For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. + For deployment, this grammar will be compiled and exported to OpenFst Finate State Archiv (FAR) File. More details to deployment at NeMo/tools/text_processing_deployment. Args: diff --git a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize_final.py b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize_final.py index 4d5d716bd01e..550a8a85d797 100644 --- a/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize_final.py +++ b/examples/nlp/duplex_text_normalization/nn_wfst/en/whitelist/verbalize_final.py @@ -13,11 +13,18 @@ # limitations under the License. -import pynini -from nemo_text_processing.text_normalization.en.graph_utils import GraphFst, delete_extra_space, delete_space -from nemo_text_processing.text_normalization.en.verbalizers.word import WordFst -from nn_wfst.en.electronic.verbalize import VerbalizeFst -from pynini.lib import pynutil +try: + import pynini + from nemo_text_processing.text_normalization.en.graph_utils import GraphFst, delete_extra_space, delete_space + from nemo_text_processing.text_normalization.en.verbalizers.word import WordFst + from nn_wfst.en.electronic.verbalize import VerbalizeFst + from pynini.lib import pynutil +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) class VerbalizeFinalFst(GraphFst): diff --git a/requirements/requirements_tts.txt b/requirements/requirements_tts.txt index bb330aaf2e58..9536faec8c78 100644 --- a/requirements/requirements_tts.txt +++ b/requirements/requirements_tts.txt @@ -4,7 +4,8 @@ jieba kornia librosa matplotlib -nemo_text_processing +# pynini does not currently support aarch, disable nemo_text_processing for now +nemo_text_processing; 'arm' not in platform_machine and 'aarch' not in platform_machine nltk pandas pypinyin diff --git a/scripts/dataset_processing/tts/hui_acg/get_data.py b/scripts/dataset_processing/tts/hui_acg/get_data.py index dfde19f33f57..668d532f321a 100644 --- a/scripts/dataset_processing/tts/hui_acg/get_data.py +++ b/scripts/dataset_processing/tts/hui_acg/get_data.py @@ -21,9 +21,17 @@ import pandas as pd from joblib import Parallel, delayed -from nemo_text_processing.text_normalization.normalize import Normalizer from tqdm import tqdm +try: + from nemo_text_processing.text_normalization.normalize import Normalizer +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + from nemo.utils import logging # full corpus. diff --git a/scripts/dataset_processing/tts/ljspeech/get_data.py b/scripts/dataset_processing/tts/ljspeech/get_data.py index c8aeed5dbfca..8007b5a0f05a 100644 --- a/scripts/dataset_processing/tts/ljspeech/get_data.py +++ b/scripts/dataset_processing/tts/ljspeech/get_data.py @@ -20,9 +20,17 @@ import sox import wget -from nemo_text_processing.text_normalization.normalize import Normalizer from tqdm import tqdm +try: + from nemo_text_processing.text_normalization.normalize import Normalizer +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + def get_args(): parser = argparse.ArgumentParser(description='Download LJSpeech and create manifests with predefined split') diff --git a/scripts/dataset_processing/tts/preprocess_text.py b/scripts/dataset_processing/tts/preprocess_text.py index 580a84a02d6f..6afab42a1d6b 100644 --- a/scripts/dataset_processing/tts/preprocess_text.py +++ b/scripts/dataset_processing/tts/preprocess_text.py @@ -32,10 +32,18 @@ from hydra.utils import instantiate from joblib import Parallel, delayed -from nemo_text_processing.text_normalization.normalize import Normalizer from omegaconf import OmegaConf from tqdm import tqdm +try: + from nemo_text_processing.text_normalization.normalize import Normalizer +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + from nemo.collections.asr.parts.utils.manifest_utils import read_manifest, write_manifest diff --git a/scripts/dataset_processing/tts/sfbilingual/get_data.py b/scripts/dataset_processing/tts/sfbilingual/get_data.py index bb38a6d127ba..806f9882a9f4 100755 --- a/scripts/dataset_processing/tts/sfbilingual/get_data.py +++ b/scripts/dataset_processing/tts/sfbilingual/get_data.py @@ -20,9 +20,17 @@ from pathlib import Path import numpy as np -from nemo_text_processing.text_normalization.normalize import Normalizer from opencc import OpenCC +try: + from nemo_text_processing.text_normalization.normalize import Normalizer +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + def get_args(): parser = argparse.ArgumentParser( diff --git a/scripts/dataset_processing/tts/thorsten_neutral/get_data.py b/scripts/dataset_processing/tts/thorsten_neutral/get_data.py index 9422c0cd5498..d49d362064fd 100644 --- a/scripts/dataset_processing/tts/thorsten_neutral/get_data.py +++ b/scripts/dataset_processing/tts/thorsten_neutral/get_data.py @@ -32,9 +32,17 @@ from pathlib import Path from joblib import Parallel, delayed -from nemo_text_processing.text_normalization.normalize import Normalizer from tqdm import tqdm +try: + from nemo_text_processing.text_normalization.normalize import Normalizer +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + from nemo.utils import logging # Thorsten Müller published two neural voice datasets, 21.02 and 22.10. diff --git a/tests/collections/asr/test_text_to_text_dataset.py b/tests/collections/asr/test_text_to_text_dataset.py index bc7a0a9d01dd..92205de41a1b 100644 --- a/tests/collections/asr/test_text_to_text_dataset.py +++ b/tests/collections/asr/test_text_to_text_dataset.py @@ -20,9 +20,17 @@ import pytest from hydra.utils import instantiate -from nemo_text_processing.text_normalization.normalize import Normalizer from omegaconf import OmegaConf +try: + from nemo_text_processing.text_normalization.normalize import Normalizer +except (ImportError, ModuleNotFoundError): + raise ModuleNotFoundError( + "The package `nemo_text_processing` was not installed in this environment. Please refer to" + " https://github.com/NVIDIA/NeMo-text-processing and install this package before using " + "this script" + ) + from nemo.collections.asr.data.text_to_text import TextToTextDataset, TextToTextItem, TextToTextIterableDataset from nemo.collections.common import tokenizers diff --git a/tools/ctc_segmentation/requirements.txt b/tools/ctc_segmentation/requirements.txt index f010b225a66e..bb51e49a0c87 100644 --- a/tools/ctc_segmentation/requirements.txt +++ b/tools/ctc_segmentation/requirements.txt @@ -1,3 +1,4 @@ ctc_segmentation==1.7.1 -nemo_text_processing==0.1.6rc0 +# pynini does not currently support aarch, disable nemo_text_processing for now +nemo_text_processing==0.1.6rc0; 'arm' not in platform_machine and 'aarch' not in platform_machine num2words diff --git a/tutorials/asr/ASR_TTS_Tutorial.ipynb b/tutorials/asr/ASR_TTS_Tutorial.ipynb index 267c84bca9d2..067c007ea3df 100644 --- a/tutorials/asr/ASR_TTS_Tutorial.ipynb +++ b/tutorials/asr/ASR_TTS_Tutorial.ipynb @@ -183,7 +183,14 @@ "from nemo.collections.tts.models import FastPitchModel, SpectrogramEnhancerModel\n", "from nemo.utils.notebook_utils import download_an4\n", "\n", - "from nemo_text_processing.text_normalization.normalize import Normalizer" + "try:\n", + " from nemo_text_processing.text_normalization.normalize import Normalizer\n", + "except ModuleNotFoundError:\n", + " raise ModuleNotFoundError(\n", + " \"The package `nemo_text_processing` was not installed in this environment. Please refer to\"\n", + " \" https://github.com/NVIDIA/NeMo-text-processing and install this package before using \"\n", + " \"this script\"\n", + " )" ] }, { diff --git a/tutorials/tts/FastPitch_MixerTTS_Training.ipynb b/tutorials/tts/FastPitch_MixerTTS_Training.ipynb index 558c0d95d30b..cfcd607ee93c 100644 --- a/tutorials/tts/FastPitch_MixerTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_MixerTTS_Training.ipynb @@ -198,8 +198,15 @@ "source": [ "from nemo.collections.tts.g2p.models.en_us_arpabet import EnglishG2p\n", "from nemo.collections.tts.data.dataset import TTSDataset\n", - "from nemo_text_processing.text_normalization.normalize import Normalizer\n", - "from nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers import EnglishPhonemesTokenizer, EnglishCharsTokenizer" + "from nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers import EnglishPhonemesTokenizer, EnglishCharsTokenizer\n", + "try:\n", + " from nemo_text_processing.text_normalization.normalize import Normalizer\n", + "except ModuleNotFoundError:\n", + " raise ModuleNotFoundError(\n", + " \"The package `nemo_text_processing` was not installed in this environment. Please refer to\"\n", + " \" https://github.com/NVIDIA/NeMo-text-processing and install this package before using \"\n", + " \"this script\"\n", + " )" ] }, { diff --git a/tutorials/tts/NeMo_TTS_Primer.ipynb b/tutorials/tts/NeMo_TTS_Primer.ipynb index 99306744dd05..fe2e34659554 100644 --- a/tutorials/tts/NeMo_TTS_Primer.ipynb +++ b/tutorials/tts/NeMo_TTS_Primer.ipynb @@ -240,7 +240,14 @@ }, "outputs": [], "source": [ - "from nemo_text_processing.text_normalization.normalize import Normalizer\n", + "try:\n", + " from nemo_text_processing.text_normalization.normalize import Normalizer\n", + "except ModuleNotFoundError:\n", + " raise ModuleNotFoundError(\n", + " \"The package `nemo_text_processing` was not installed in this environment. Please refer to\"\n", + " \" https://github.com/NVIDIA/NeMo-text-processing and install this package before using \"\n", + " \"this script\"\n", + " )\n", "\n", "text_normalizer = Normalizer(input_case=\"cased\", lang=\"en\")" ] @@ -777,7 +784,7 @@ "While raw audio shows amplitude versus time and is useful for easily recording and listening, it is not optimal when it comes to processing.\n", "\n", "For processing, it is usually preferable to represent the audio as a **spectrogram** which shows frequency versus time. Specifically, we:\n", - "\n", + "\n", "1. Group together audio samples into a much smaller set of time buckets, called **audio frames**. An audio frame will usually bucket around 50ms of audio.\n", "2. For each audio frame, use the [Fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform) (**FFT**) to calculate the magnitude (ie. energy, amplitude or \"loudness\") and phase (which we don't use) of each frequency bin. We refer to the magnitudes of the frequency bins as a spectrogram\n", "3. Map the original frequency bins onto the [mel scale](https://en.wikipedia.org/wiki/Mel_scale), using overlapped [triangular filters](https://en.wikipedia.org/wiki/Window_function#Triangular_window) to create mel filterbanks.\n", From 5f01aab2adf859f50a7924209c77117947b60673 Mon Sep 17 00:00:00 2001 From: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Date: Tue, 3 Oct 2023 19:35:05 -0700 Subject: [PATCH 094/112] Bound transformers version in requirements (#7620) Signed-off-by: Abhishree Signed-off-by: Sasha Meister --- requirements/requirements_lightning.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements/requirements_lightning.txt b/requirements/requirements_lightning.txt index 62468a37e972..adc6aa0e0026 100644 --- a/requirements/requirements_lightning.txt +++ b/requirements/requirements_lightning.txt @@ -2,6 +2,6 @@ hydra-core>1.3,<=1.3.2 omegaconf<=2.3 pytorch-lightning>=2.0,<=2.0.7 torchmetrics>=0.11.0 -transformers>=4.0.1 +transformers>=4.0.1,<=4.33.3 wandb webdataset>=0.1.48,<=0.1.62 From 36ba71f2ae1a2fe5608df78b36acceced5335cdc Mon Sep 17 00:00:00 2001 From: Chen Cui Date: Wed, 4 Oct 2023 12:29:12 -0400 Subject: [PATCH 095/112] fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui * Update peft_config.py brackets Signed-off-by: Adi Renduchintala --------- Signed-off-by: Chen Cui Signed-off-by: Adi Renduchintala Co-authored-by: Adi Renduchintala Signed-off-by: Sasha Meister --- nemo/collections/nlp/parts/peft_config.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/nemo/collections/nlp/parts/peft_config.py b/nemo/collections/nlp/parts/peft_config.py index dd75747fd73c..524a7fb62368 100644 --- a/nemo/collections/nlp/parts/peft_config.py +++ b/nemo/collections/nlp/parts/peft_config.py @@ -60,10 +60,14 @@ def __init__(self, cfg): else: kv_channels = cfg.kv_channels projection_size = kv_channels * cfg.num_attention_heads + num_query_groups = cfg.get("num_query_groups", None) + if num_query_groups is None: + num_query_groups = cfg.num_attention_heads + qkv_projection_size = projection_size + (2 * kv_channels * num_query_groups) config_args = { "in_features": cfg.hidden_size, - "out_features": 3 * projection_size, + "out_features": qkv_projection_size, "dim": lora_cfg.adapter_dim, "norm_position": None, "norm_type": None, From f8980ba2eb418ff033632caf9fa03d40fa894c34 Mon Sep 17 00:00:00 2001 From: Mehadi Hasan Menon Date: Wed, 4 Oct 2023 22:32:12 +0600 Subject: [PATCH 096/112] Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon Signed-off-by: Sasha Meister --- .../speech_recognition/confidence/benchmark_asr_confidence.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/speech_recognition/confidence/benchmark_asr_confidence.py b/scripts/speech_recognition/confidence/benchmark_asr_confidence.py index 246aa61c2c0e..f4558fa85256 100644 --- a/scripts/speech_recognition/confidence/benchmark_asr_confidence.py +++ b/scripts/speech_recognition/confidence/benchmark_asr_confidence.py @@ -32,7 +32,7 @@ ) from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceConfig from nemo.core.config import hydra_runner -from nemo.utils import logging +from nemo.utils import logging, model_utils """ Get confidence metrics and curve plots for a given model, dataset, and confidence parameters. From c8aa8ac1eefd191ad6140dc0239bff4dde23fbbd Mon Sep 17 00:00:00 2001 From: Nithin Rao Date: Wed, 4 Oct 2023 15:19:03 -0700 Subject: [PATCH 097/112] add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Koluguri Signed-off-by: Sasha Meister --- docs/source/asr/data/benchmark_en.csv | 2 ++ nemo/collections/asr/models/ctc_bpe_models.py | 7 +++++++ nemo/collections/asr/models/rnnt_bpe_models.py | 7 +++++++ 3 files changed, 16 insertions(+) diff --git a/docs/source/asr/data/benchmark_en.csv b/docs/source/asr/data/benchmark_en.csv index b41c675f423c..1669ecdeefb5 100644 --- a/docs/source/asr/data/benchmark_en.csv +++ b/docs/source/asr/data/benchmark_en.csv @@ -26,6 +26,8 @@ stt_en_conformer_transducer_medium,EncDecRNNTBPEModel,"https://ngc.nvidia.com/ca stt_en_conformer_transducer_large,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_conformer_transducer_large" stt_en_conformer_transducer_xlarge,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_conformer_transducer_xlarge" stt_en_conformer_transducer_xxlarge,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_conformer_transducer_xxlarge" +stt_en_fastconformer_ctc_large_ls,EncDecCTCModelBPE,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_ctc_large_ls" +stt_en_fastconformer_transducer_large_ls,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_transducer_large_ls" stt_en_fastconformer_transducer_large,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_transducer_large" stt_en_fastconformer_ctc_large,EncDecCTCModelBPE,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_ctc_large" stt_en_fastconformer_hybrid_large_pc,EncDecHybridRNNTCTCBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_hybrid_large_pc" diff --git a/nemo/collections/asr/models/ctc_bpe_models.py b/nemo/collections/asr/models/ctc_bpe_models.py index aa26f27c29ab..7b17f7918e20 100644 --- a/nemo/collections/asr/models/ctc_bpe_models.py +++ b/nemo/collections/asr/models/ctc_bpe_models.py @@ -606,6 +606,13 @@ def list_available_models(cls) -> List[PretrainedModelInfo]: ) results.append(model) + model = PretrainedModelInfo( + pretrained_model_name="stt_en_fastconformer_ctc_large_ls", + description="For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_ctc_large_ls", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_en_fastconformer_ctc_large_ls/versions/1.0.0/files/stt_en_fastconformer_ctc_large_ls.nemo", + ) + results.append(model) + model = PretrainedModelInfo( pretrained_model_name="stt_en_fastconformer_ctc_xlarge", description="For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_ctc_xlarge", diff --git a/nemo/collections/asr/models/rnnt_bpe_models.py b/nemo/collections/asr/models/rnnt_bpe_models.py index c72d18a8023b..2b8ed315903c 100644 --- a/nemo/collections/asr/models/rnnt_bpe_models.py +++ b/nemo/collections/asr/models/rnnt_bpe_models.py @@ -253,6 +253,13 @@ def list_available_models(cls) -> List[PretrainedModelInfo]: ) results.append(model) + model = PretrainedModelInfo( + pretrained_model_name="stt_en_fastconformer_transducer_large_ls", + description="For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_transducer_large_ls", + location="https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_en_fastconformer_transducer_large_ls/versions/1.0.0/files/stt_en_fastconformer_transducer_large_ls.nemo", + ) + results.append(model) + model = PretrainedModelInfo( pretrained_model_name="stt_en_fastconformer_transducer_xlarge", description="For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_fastconformer_transducer_xlarge", From ad3a4de14d06093a469559a9899c45e7acd0a623 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 4 Oct 2023 22:40:58 -0700 Subject: [PATCH 098/112] bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister --- docs/source/asr/speaker_diarization/datasets.rst | 2 +- examples/asr/experimental/k2/align_speech_parallel.py | 2 +- examples/asr/experimental/k2/speech_to_text_bpe.py | 2 +- .../asr/speech_translation/speech_to_text_transformer.py | 2 +- .../multi_label_intent_slot_classification.py | 2 +- tutorials/tts/FastPitch_MixerTTS_Training.ipynb | 5 +++-- tutorials/tts/Tacotron2_Training.ipynb | 4 ++-- tutorials/tts/Vits_Training.ipynb | 2 +- 8 files changed, 11 insertions(+), 10 deletions(-) diff --git a/docs/source/asr/speaker_diarization/datasets.rst b/docs/source/asr/speaker_diarization/datasets.rst index ff73dad8601a..9f1a43a58f11 100644 --- a/docs/source/asr/speaker_diarization/datasets.rst +++ b/docs/source/asr/speaker_diarization/datasets.rst @@ -107,7 +107,7 @@ Prepare the msdd training dataset for both train and validation. After the train .. code-block:: bash python ./multiscale_diar_decoder.py --config-path='../conf/neural_diarizer' --config-name='msdd_5scl_15_05_50Povl_256x3x32x2.yaml' \ - trainer.gpus=1 \ + trainer.devices=1 \ trainer.max_epochs=20 \ model.base.diarizer.speaker_embeddings.model_path="titanet_large" \ model.train_ds.manifest_filepath="" \ diff --git a/examples/asr/experimental/k2/align_speech_parallel.py b/examples/asr/experimental/k2/align_speech_parallel.py index dcccee48ee27..8ddf036f3e38 100644 --- a/examples/asr/experimental/k2/align_speech_parallel.py +++ b/examples/asr/experimental/k2/align_speech_parallel.py @@ -46,7 +46,7 @@ python align_speech_parallel.py \ trainer.precision=16 \ - trainer.gpus=2 \ + trainer.devices=2 \ ... You may control the dataloader's config by setting the predict_ds: diff --git a/examples/asr/experimental/k2/speech_to_text_bpe.py b/examples/asr/experimental/k2/speech_to_text_bpe.py index 5eefdfaf1fe3..ee3924c7b8ac 100644 --- a/examples/asr/experimental/k2/speech_to_text_bpe.py +++ b/examples/asr/experimental/k2/speech_to_text_bpe.py @@ -50,7 +50,7 @@ model.validation_ds.manifest_filepath= \ model.tokenizer.dir= \ model.tokenizer.type= \ - trainer.gpus=-1 \ + trainer.devices=-1 \ trainer.accelerator="ddp" \ trainer.max_epochs=100 \ model.optim.name="adamw" \ diff --git a/examples/asr/speech_translation/speech_to_text_transformer.py b/examples/asr/speech_translation/speech_to_text_transformer.py index 0c0882859b88..dce19df87a72 100644 --- a/examples/asr/speech_translation/speech_to_text_transformer.py +++ b/examples/asr/speech_translation/speech_to_text_transformer.py @@ -24,7 +24,7 @@ model.tokenizer.dir= \ model.tokenizer.model_path= \ model.tokenizer.type= \ - trainer.gpus=-1 \ + trainer.devices=-1 \ trainer.accelerator="ddp" \ trainer.max_epochs=100 \ model.optim.name="adamw" \ diff --git a/examples/nlp/intent_slot_classification/multi_label_intent_slot_classification.py b/examples/nlp/intent_slot_classification/multi_label_intent_slot_classification.py index bed58ecc43dc..2441885e2ed2 100644 --- a/examples/nlp/intent_slot_classification/multi_label_intent_slot_classification.py +++ b/examples/nlp/intent_slot_classification/multi_label_intent_slot_classification.py @@ -19,7 +19,7 @@ model.data_dir=/home/user/multiatis \ model.validation_ds.prefix=dev \ model.test_ds.prefix=dev \ - trainer.gpus=[0] \ + trainer.devices=[0] \ +trainer.fast_dev_run=true \ exp_manager.exp_dir=checkpoints diff --git a/tutorials/tts/FastPitch_MixerTTS_Training.ipynb b/tutorials/tts/FastPitch_MixerTTS_Training.ipynb index cfcd607ee93c..747ecfa43127 100644 --- a/tutorials/tts/FastPitch_MixerTTS_Training.ipynb +++ b/tutorials/tts/FastPitch_MixerTTS_Training.ipynb @@ -515,7 +515,7 @@ " model.train_ds.dataloader_params.batch_size=24 \\\n", " model.validation_ds.dataloader_params.batch_size=24 \\\n", " exp_manager.exp_dir=./fastpitch_log_dir \\\n", - " model.n_speakers=1 trainer.devices=1 trainer.strategy=null \\\n", + " model.n_speakers=1 trainer.devices=1 trainer.strategy=\"ddp_find_unused_parameters_true\" \\\n", ")" ] }, @@ -565,7 +565,8 @@ "model.train_ds.dataloader_params.num_workers=0 \\\n", "model.validation_ds.dataloader_params.num_workers=0 \\\n", "trainer.max_epochs=3 \\\n", - "trainer.strategy=null \\\n", + "trainer.accelerator=\"gpu\" \\\n", + "trainer.strategy=\"ddp_find_unused_parameters_true\" \\\n", "trainer.check_val_every_n_epoch=1" ] }, diff --git a/tutorials/tts/Tacotron2_Training.ipynb b/tutorials/tts/Tacotron2_Training.ipynb index e2ae5082e608..79546bb79db9 100644 --- a/tutorials/tts/Tacotron2_Training.ipynb +++ b/tutorials/tts/Tacotron2_Training.ipynb @@ -295,9 +295,9 @@ " train_dataset=tests/data/asr/an4_train.json \\\n", " validation_datasets=tests/data/asr/an4_val.json \\\n", " trainer.max_epochs=3 \\\n", - " trainer.accelerator=null \\\n", + " trainer.accelerator='gpu' \\\n", " trainer.check_val_every_n_epoch=1 \\\n", - " +trainer.gpus=1)" + " trainer.devices=1)" ] }, { diff --git a/tutorials/tts/Vits_Training.ipynb b/tutorials/tts/Vits_Training.ipynb index db7161c06c61..a8a7ccc76ae2 100644 --- a/tutorials/tts/Vits_Training.ipynb +++ b/tutorials/tts/Vits_Training.ipynb @@ -251,7 +251,7 @@ " num_nodes: 1\n", " devices: 2\n", " accelerator: gpu\n", - " strategy: ddp\n", + " strategy: ddp_find_unused_parameters_true\n", " precision: 32\n", " max_epochs: -1\n", " accumulate_grad_batches: 1\n", From f83edf6425a63022fe125fbc823f3cb33d26f315 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 4 Oct 2023 22:44:48 -0700 Subject: [PATCH 099/112] fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Co-authored-by: Eric Harper Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister --- nemo/collections/asr/models/ssl_models.py | 16 ++++++++++++---- tutorials/asr/Self_Supervised_Pre_Training.ipynb | 6 +++--- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/nemo/collections/asr/models/ssl_models.py b/nemo/collections/asr/models/ssl_models.py index 8de713ca948d..6dca3815119f 100644 --- a/nemo/collections/asr/models/ssl_models.py +++ b/nemo/collections/asr/models/ssl_models.py @@ -527,7 +527,7 @@ def training_step(self, batch, batch_nb): return {'loss': loss_value, 'log': tensorboard_logs} - def validation_step(self, batch, batch_idx, dataloader_idx=0): + def validation_pass(self, batch, batch_idx, dataloader_idx=0): # Set flag to register tensors self._in_validation_step = True @@ -554,9 +554,17 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0): self.reset_registry() del self._in_validation_step - return { - 'val_loss': loss_value, - } + metrics = {'val_loss': loss_value} + + return metrics + + def validation_step(self, batch, batch_idx, dataloader_idx=0): + metrics = self.validation_pass(batch, batch_idx, dataloader_idx) + if type(self.trainer.val_dataloaders) == list and len(self.trainer.val_dataloaders) > 1: + self.validation_step_outputs[dataloader_idx].append(metrics) + else: + self.validation_step_outputs.append(metrics) + return metrics def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0): val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean() diff --git a/tutorials/asr/Self_Supervised_Pre_Training.ipynb b/tutorials/asr/Self_Supervised_Pre_Training.ipynb index 5f2c4dcc14c8..b055f14f5885 100644 --- a/tutorials/asr/Self_Supervised_Pre_Training.ipynb +++ b/tutorials/asr/Self_Supervised_Pre_Training.ipynb @@ -215,7 +215,7 @@ " file_id[file_id.find('-')+1 : file_id.rfind('-')],\n", " file_id + '.wav')\n", "\n", - " duration = librosa.core.get_duration(filename=audio_path)\n", + " duration = librosa.core.get_duration(path=audio_path)\n", "\n", " # Write the metadata to the manifest\n", " metadata = {\n", @@ -331,7 +331,7 @@ "\n", "cfg.model.optim.sched.name = \"CosineAnnealing\"\n", "cfg.model.optim.sched.warmup_steps = 1000\n", - "cfg.model.optim.sched.max_steps = 5000\n", + "cfg.model.optim.sched.max_steps = 2000\n", "#in practice you will usually want a much larger amount of pre-training steps\n", "cfg.model.optim.sched.min_lr = 0\n", "cfg.model.optim.lr = 0.015\n", @@ -554,7 +554,7 @@ "\n", "cfg.model.optim.sched.name = \"CosineAnnealing\"\n", "cfg.model.optim.sched.warmup_steps = 500\n", - "cfg.model.optim.sched.max_steps = 2000\n", + "cfg.model.optim.sched.max_steps = 1000\n", "cfg.model.optim.sched.min_lr = 0\n", "cfg.model.optim.lr = 0.015 #if encoder is frozen, lr can be much higher\n", "cfg.model.optim.weight_decay = 0\n", From 99a914ba382aacc1af7250d8ae6ddb67669a141f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 4 Oct 2023 22:45:19 -0700 Subject: [PATCH 100/112] Fix metrics for SE tutorial (#7604) (#7612) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ante Jukić Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../asr/models/audio_to_audio_model.py | 34 +++++++++++-------- .../Speech_Enhancement_with_NeMo.ipynb | 9 ++--- 2 files changed, 22 insertions(+), 21 deletions(-) diff --git a/nemo/collections/asr/models/audio_to_audio_model.py b/nemo/collections/asr/models/audio_to_audio_model.py index b48cd0c14e62..21860cf8ab56 100644 --- a/nemo/collections/asr/models/audio_to_audio_model.py +++ b/nemo/collections/asr/models/audio_to_audio_model.py @@ -17,7 +17,7 @@ import hydra import torch -from omegaconf import DictConfig +from omegaconf import DictConfig, OmegaConf from pytorch_lightning import Trainer from nemo.collections.asr.metrics.audio import AudioMetricWrapper @@ -67,12 +67,15 @@ def _setup_metrics(self, tag: str = 'val'): logging.debug('Found %d metrics for tag %s, not necesary to initialize again', num_dataloaders, tag) return - if 'metrics' not in self._cfg or tag not in self._cfg['metrics']: + if self.cfg.get('metrics') is None: # Metrics are not available in the configuration, nothing to do - logging.debug('No metrics configured for %s in model.metrics.%s', tag, tag) + logging.debug('No metrics configured in model.metrics') return - metrics_cfg = self._cfg['metrics'][tag] + if (metrics_cfg := self.cfg['metrics'].get(tag)) is None: + # Metrics configuration is not available in the configuration, nothing to do + logging.debug('No metrics configured for %s in model.metrics', tag) + return if 'loss' in metrics_cfg: raise ValueError( @@ -86,16 +89,19 @@ def _setup_metrics(self, tag: str = 'val'): # Setup metrics for each dataloader self.metrics[tag] = torch.nn.ModuleList() for dataloader_idx in range(num_dataloaders): - metrics_dataloader_idx = torch.nn.ModuleDict( - { - name: AudioMetricWrapper( - metric=hydra.utils.instantiate(cfg), - channel=cfg.get('channel'), - metric_using_batch_averaging=cfg.get('metric_using_batch_averaging'), - ) - for name, cfg in metrics_cfg.items() - } - ) + metrics_dataloader_idx = {} + for name, cfg in metrics_cfg.items(): + logging.debug('Initialize %s for dataloader_idx %s', name, dataloader_idx) + cfg_dict = OmegaConf.to_container(cfg) + cfg_channel = cfg_dict.pop('channel', None) + cfg_batch_averaging = cfg_dict.pop('metric_using_batch_averaging', None) + metrics_dataloader_idx[name] = AudioMetricWrapper( + metric=hydra.utils.instantiate(cfg_dict), + channel=cfg_channel, + metric_using_batch_averaging=cfg_batch_averaging, + ) + + metrics_dataloader_idx = torch.nn.ModuleDict(metrics_dataloader_idx) self.metrics[tag].append(metrics_dataloader_idx.to(self.device)) logging.info( diff --git a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb index 09226c83d654..a3706ab9c0ec 100644 --- a/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb +++ b/tutorials/audio_tasks/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb @@ -102,11 +102,6 @@ "from nemo.collections.asr.parts.utils.manifest_utils import read_manifest, write_manifest\n", "\n", "\n", - "# Used to download data processing scripts\n", - "USER = 'anteju' # TODO: change to 'NVIDIA'\n", - "BRANCH = 'dev/se-tutorial' # TODO: change to 'main'\n", - "\n", - "\n", "# Utility functions for displaying signals and metrics\n", "def show_signal(signal: np.ndarray, sample_rate: int = 16000, tag: str = 'Signal'):\n", " \"\"\"Show the time-domain signal and its spectrogram.\n", @@ -607,7 +602,7 @@ " '_target_': 'torchmetrics.audio.SignalDistortionRatio',\n", " }\n", "})\n", - "config.model.metrics.validation = metrics\n", + "config.model.metrics.val = metrics\n", "config.model.metrics.test = metrics\n", "\n", "print(\"Metrics config:\")\n", @@ -1112,7 +1107,7 @@ " 'channel': 1,\n", " },\n", "})\n", - "config_dual_output.model.metrics.validation = metrics\n", + "config_dual_output.model.metrics.val = metrics\n", "config_dual_output.model.metrics.test = metrics" ] }, From f1f38352797750ffb9f84fe176ceda431f5f321f Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 5 Oct 2023 09:21:49 -0700 Subject: [PATCH 101/112] Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../intent_slot_classification/intent_slot_classification.py | 4 ++++ examples/nlp/question_answering/question_answering.py | 5 ++++- .../normalization_as_tagging_train.py | 4 ++++ .../nlp/token_classification/token_classification_train.py | 2 +- .../models/text_normalization_as_tagging/thutmose_tagger.py | 2 +- tutorials/nlp/Entity_Linking_Medical.ipynb | 2 +- 6 files changed, 15 insertions(+), 4 deletions(-) diff --git a/examples/nlp/intent_slot_classification/intent_slot_classification.py b/examples/nlp/intent_slot_classification/intent_slot_classification.py index 39b4b7c5dde6..a112ea7785f5 100644 --- a/examples/nlp/intent_slot_classification/intent_slot_classification.py +++ b/examples/nlp/intent_slot_classification/intent_slot_classification.py @@ -23,6 +23,10 @@ @hydra_runner(config_path="conf", config_name="intent_slot_classification_config") def main(cfg: DictConfig) -> None: + # PTL 2.0 has find_unused_parameters as False by default, so its required to set it to True + # when there are unused parameters like here + if cfg.trainer.strategy == 'ddp': + cfg.trainer.strategy = "ddp_find_unused_parameters_true" logging.info(f'Config Params:\n {OmegaConf.to_yaml(cfg)}') trainer = pl.Trainer(**cfg.trainer) exp_manager(trainer, cfg.get("exp_manager", None)) diff --git a/examples/nlp/question_answering/question_answering.py b/examples/nlp/question_answering/question_answering.py index 2bcaf8f9eca2..fcde03582e5c 100644 --- a/examples/nlp/question_answering/question_answering.py +++ b/examples/nlp/question_answering/question_answering.py @@ -28,7 +28,10 @@ @hydra_runner(config_path="conf", config_name="qa_conf") def main(cfg: DictConfig) -> None: pl.seed_everything(42) - + # PTL 2.0 has find_unused_parameters as False by default, so its required to set it to True + # when there are unused parameters like here + if cfg.trainer.strategy == 'ddp': + cfg.trainer.strategy = "ddp_find_unused_parameters_true" logging.info(f'Config: {OmegaConf.to_yaml(cfg)}') trainer = pl.Trainer(**cfg.trainer) exp_dir = exp_manager(trainer, cfg.get("exp_manager", None)) diff --git a/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_train.py b/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_train.py index e87ff7748185..36fe97d2341b 100644 --- a/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_train.py +++ b/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_train.py @@ -62,6 +62,10 @@ @hydra_runner(config_path="conf", config_name="thutmose_tagger_itn_config") def main(cfg: DictConfig) -> None: + # PTL 2.0 has find_unused_parameters as False by default, so its required to set it to True + # when there are unused parameters like here + if cfg.trainer.strategy == 'ddp': + cfg.trainer.strategy = "ddp_find_unused_parameters_true" logging.info(f'Config Params: {OmegaConf.to_yaml(cfg)}') # Train the model diff --git a/examples/nlp/token_classification/token_classification_train.py b/examples/nlp/token_classification/token_classification_train.py index 9b18d10b24e6..51983a1af98b 100644 --- a/examples/nlp/token_classification/token_classification_train.py +++ b/examples/nlp/token_classification/token_classification_train.py @@ -103,7 +103,7 @@ @hydra_runner(config_path="conf", config_name="token_classification_config") def main(cfg: DictConfig) -> None: try: - strategy = NLPDDPStrategy() + strategy = NLPDDPStrategy(find_unused_parameters=True) except (ImportError, ModuleNotFoundError): strategy = None diff --git a/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py b/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py index f6e5c155646d..4c11dc157b2b 100644 --- a/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py +++ b/nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py @@ -236,7 +236,7 @@ def validation_step(self, batch, batch_idx): val_loss_tag = self.loss_fn(logits=tag_logits, labels=tag_labels, loss_mask=labels_mask) val_loss_semiotic = self.loss_fn(logits=semiotic_logits, labels=semiotic_labels, loss_mask=labels_mask) val_loss = val_loss_tag + val_loss_semiotic - self.validation_step_outputs.append(val_loss) + self.validation_step_outputs.append({'val_loss': val_loss}) return {'val_loss': val_loss} def on_validation_epoch_end(self): diff --git a/tutorials/nlp/Entity_Linking_Medical.ipynb b/tutorials/nlp/Entity_Linking_Medical.ipynb index 78fe66d9f233..dfdf594e6804 100644 --- a/tutorials/nlp/Entity_Linking_Medical.ipynb +++ b/tutorials/nlp/Entity_Linking_Medical.ipynb @@ -188,7 +188,7 @@ "\n", "# remove distributed training flags\n", "cfg.trainer.strategy = 'auto'\n", - "cfg.trainer.accelerator = None" + "cfg.trainer.accelerator = 'auto'" ] }, { From 1f280a97599119435722263e132d3e05c5e7627a Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 5 Oct 2023 10:18:32 -0700 Subject: [PATCH 102/112] Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar * Guard MeCab and Ipadic Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar * Fix remaining ASR dataclasses Signed-off-by: smajumdar * Fix scripts Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Signed-off-by: Sangkug Lym Signed-off-by: Jason Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym Co-authored-by: Jason Signed-off-by: Sasha Meister --- .../experimental/k2/align_speech_parallel.py | 10 +- nemo/collections/asr/metrics/rnnt_wer.py | 8 +- nemo/collections/asr/metrics/wer.py | 12 +- .../asr/models/configs/aligner_config.py | 8 +- .../asr/models/configs/asr_models_config.py | 30 +-- .../configs/classification_models_config.py | 40 ++-- .../asr/models/configs/diarizer_config.py | 28 +-- .../configs/k2_sequence_models_config.py | 8 +- .../asr/models/configs/matchboxnet_config.py | 36 ++-- .../asr/models/configs/quartznet_config.py | 30 ++- .../asr/modules/audio_preprocessing.py | 6 + nemo/collections/asr/parts/k2/classes.py | 4 +- .../multi_head_attention_adapter_module.py | 14 +- .../asr/parts/submodules/ctc_beam_decoding.py | 6 +- .../parts/submodules/ctc_greedy_decoding.py | 4 +- .../parts/submodules/rnnt_greedy_decoding.py | 6 +- .../asr/parts/utils/asr_confidence_utils.py | 4 +- .../common/parts/adapter_modules.py | 6 +- .../common/tokenizers/en_ja_tokenizers.py | 15 +- .../machine_translation/mt_enc_dec_config.py | 172 ++++++++++-------- .../punctuation_capitalization_config.py | 32 ++-- .../common/megatron/megatron_encoders.py | 8 +- nemo/collections/tts/models/fastpitch.py | 6 +- nemo/collections/tts/models/tacotron2.py | 4 +- nemo/core/config/modelPT.py | 10 +- nemo/utils/exp_manager.py | 18 +- requirements/requirements_nlp.txt | 2 +- .../ngram_lm/eval_beamsearch_ngram.py | 6 +- .../eval_beamsearch_ngram_transducer.py | 2 +- .../confidence_ensembles/build_ensemble.py | 26 ++- .../confidence/benchmark_asr_confidence.py | 6 +- .../convert_to_tarred_audio_dataset.py | 2 +- .../asr/test_text_to_text_dataset.py | 4 +- tools/nemo_forced_aligner/align.py | 4 +- 34 files changed, 342 insertions(+), 235 deletions(-) diff --git a/examples/asr/experimental/k2/align_speech_parallel.py b/examples/asr/experimental/k2/align_speech_parallel.py index 8ddf036f3e38..bd03420e94c1 100644 --- a/examples/asr/experimental/k2/align_speech_parallel.py +++ b/examples/asr/experimental/k2/align_speech_parallel.py @@ -74,7 +74,7 @@ import os -from dataclasses import dataclass, is_dataclass +from dataclasses import dataclass, field, is_dataclass from typing import Optional import pytorch_lightning as ptl @@ -94,12 +94,14 @@ @dataclass class ParallelAlignmentConfig: model: Optional[str] = None # name - predict_ds: ASRDatasetConfig = ASRDatasetConfig(return_sample_id=True, num_workers=4) - aligner_args: K2AlignerWrapperModelConfig = K2AlignerWrapperModelConfig() + predict_ds: ASRDatasetConfig = field( + default_factory=lambda: ASRDatasetConfig(return_sample_id=True, num_workers=4) + ) + aligner_args: K2AlignerWrapperModelConfig = field(default_factory=lambda: K2AlignerWrapperModelConfig()) output_path: str = MISSING model_stride: int = 8 - trainer: TrainerConfig = TrainerConfig(gpus=-1, accelerator="ddp") + trainer: TrainerConfig = field(default_factory=lambda: TrainerConfig(gpus=-1, accelerator="ddp")) # there arguments will be ignored return_predictions: bool = False diff --git a/nemo/collections/asr/metrics/rnnt_wer.py b/nemo/collections/asr/metrics/rnnt_wer.py index 97c9c4575982..5518c8f0a25c 100644 --- a/nemo/collections/asr/metrics/rnnt_wer.py +++ b/nemo/collections/asr/metrics/rnnt_wer.py @@ -15,7 +15,7 @@ import copy import re from abc import abstractmethod -from dataclasses import dataclass, is_dataclass +from dataclasses import dataclass, field, is_dataclass from typing import Callable, Dict, List, Optional, Tuple, Union import editdistance @@ -1299,7 +1299,7 @@ class RNNTDecodingConfig: preserve_alignments: Optional[bool] = None # confidence config - confidence_cfg: ConfidenceConfig = ConfidenceConfig() + confidence_cfg: ConfidenceConfig = field(default_factory=lambda: ConfidenceConfig()) # RNNT Joint fused batch size fused_batch_size: Optional[int] = None @@ -1317,10 +1317,10 @@ class RNNTDecodingConfig: rnnt_timestamp_type: str = "all" # can be char, word or all for both # greedy decoding config - greedy: greedy_decode.GreedyRNNTInferConfig = greedy_decode.GreedyRNNTInferConfig() + greedy: greedy_decode.GreedyRNNTInferConfig = field(default_factory=lambda: greedy_decode.GreedyRNNTInferConfig()) # beam decoding config - beam: beam_decode.BeamRNNTInferConfig = beam_decode.BeamRNNTInferConfig(beam_size=4) + beam: beam_decode.BeamRNNTInferConfig = field(default_factory=lambda: beam_decode.BeamRNNTInferConfig(beam_size=4)) # can be used to change temperature for decoding temperature: float = 1.0 diff --git a/nemo/collections/asr/metrics/wer.py b/nemo/collections/asr/metrics/wer.py index 14fa46b308ab..46984ff86435 100644 --- a/nemo/collections/asr/metrics/wer.py +++ b/nemo/collections/asr/metrics/wer.py @@ -14,7 +14,7 @@ import re from abc import abstractmethod -from dataclasses import dataclass, is_dataclass +from dataclasses import dataclass, field, is_dataclass from typing import Callable, Dict, List, Optional, Tuple, Union import editdistance @@ -1297,13 +1297,17 @@ class CTCDecodingConfig: batch_dim_index: int = 0 # greedy decoding config - greedy: ctc_greedy_decoding.GreedyCTCInferConfig = ctc_greedy_decoding.GreedyCTCInferConfig() + greedy: ctc_greedy_decoding.GreedyCTCInferConfig = field( + default_factory=lambda: ctc_greedy_decoding.GreedyCTCInferConfig() + ) # beam decoding config - beam: ctc_beam_decoding.BeamCTCInferConfig = ctc_beam_decoding.BeamCTCInferConfig(beam_size=4) + beam: ctc_beam_decoding.BeamCTCInferConfig = field( + default_factory=lambda: ctc_beam_decoding.BeamCTCInferConfig(beam_size=4) + ) # confidence config - confidence_cfg: ConfidenceConfig = ConfidenceConfig() + confidence_cfg: ConfidenceConfig = field(default_factory=lambda: ConfidenceConfig()) # can be used to change temperature for decoding temperature: float = 1.0 diff --git a/nemo/collections/asr/models/configs/aligner_config.py b/nemo/collections/asr/models/configs/aligner_config.py index 06b41b5c115b..cf2cdd176719 100644 --- a/nemo/collections/asr/models/configs/aligner_config.py +++ b/nemo/collections/asr/models/configs/aligner_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from nemo.collections.asr.parts.k2.classes import GraphModuleConfig @@ -35,10 +35,10 @@ class AlignerWrapperModelConfig: word_output: bool = True cpu_decoding: bool = False decode_batch_size: int = 0 - ctc_cfg: AlignerCTCConfig = AlignerCTCConfig() - rnnt_cfg: AlignerRNNTConfig = AlignerRNNTConfig() + ctc_cfg: AlignerCTCConfig = field(default_factory=lambda: AlignerCTCConfig()) + rnnt_cfg: AlignerRNNTConfig = field(default_factory=lambda: AlignerRNNTConfig()) @dataclass class K2AlignerWrapperModelConfig(AlignerWrapperModelConfig): - decoder_module_cfg: GraphModuleConfig = GraphModuleConfig() + decoder_module_cfg: GraphModuleConfig = field(default_factory=lambda: GraphModuleConfig()) diff --git a/nemo/collections/asr/models/configs/asr_models_config.py b/nemo/collections/asr/models/configs/asr_models_config.py index 609d42216659..ce480cac8428 100644 --- a/nemo/collections/asr/models/configs/asr_models_config.py +++ b/nemo/collections/asr/models/configs/asr_models_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Dict, List, Optional from omegaconf import MISSING @@ -74,24 +74,32 @@ class EncDecCTCConfig(model_cfg.ModelConfig): labels: List[str] = MISSING # Dataset configs - train_ds: ASRDatasetConfig = ASRDatasetConfig(manifest_filepath=None, shuffle=True) - validation_ds: ASRDatasetConfig = ASRDatasetConfig(manifest_filepath=None, shuffle=False) - test_ds: ASRDatasetConfig = ASRDatasetConfig(manifest_filepath=None, shuffle=False) + train_ds: ASRDatasetConfig = field(default_factory=lambda: ASRDatasetConfig(manifest_filepath=None, shuffle=True)) + validation_ds: ASRDatasetConfig = field( + default_factory=lambda: ASRDatasetConfig(manifest_filepath=None, shuffle=False) + ) + test_ds: ASRDatasetConfig = field(default_factory=lambda: ASRDatasetConfig(manifest_filepath=None, shuffle=False)) # Optimizer / Scheduler config - optim: Optional[model_cfg.OptimConfig] = model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + optim: Optional[model_cfg.OptimConfig] = field( + default_factory=lambda: model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + ) # Model component configs - preprocessor: AudioToMelSpectrogramPreprocessorConfig = AudioToMelSpectrogramPreprocessorConfig() - spec_augment: Optional[SpectrogramAugmentationConfig] = SpectrogramAugmentationConfig() - encoder: ConvASREncoderConfig = ConvASREncoderConfig() - decoder: ConvASRDecoderConfig = ConvASRDecoderConfig() - decoding: CTCDecodingConfig = CTCDecodingConfig() + preprocessor: AudioToMelSpectrogramPreprocessorConfig = field( + default_factory=lambda: AudioToMelSpectrogramPreprocessorConfig() + ) + spec_augment: Optional[SpectrogramAugmentationConfig] = field( + default_factory=lambda: SpectrogramAugmentationConfig() + ) + encoder: ConvASREncoderConfig = field(default_factory=lambda: ConvASREncoderConfig()) + decoder: ConvASRDecoderConfig = field(default_factory=lambda: ConvASRDecoderConfig()) + decoding: CTCDecodingConfig = field(default_factory=lambda: CTCDecodingConfig()) @dataclass class EncDecCTCModelConfig(model_cfg.NemoConfig): - model: EncDecCTCConfig = EncDecCTCConfig() + model: EncDecCTCConfig = field(default_factory=lambda: EncDecCTCConfig()) @dataclass diff --git a/nemo/collections/asr/models/configs/classification_models_config.py b/nemo/collections/asr/models/configs/classification_models_config.py index 0df911e9e69a..33408f591c8e 100644 --- a/nemo/collections/asr/models/configs/classification_models_config.py +++ b/nemo/collections/asr/models/configs/classification_models_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Dict, List, Optional from omegaconf import MISSING @@ -72,30 +72,40 @@ class EncDecClassificationConfig(model_cfg.ModelConfig): timesteps: int = MISSING # Dataset configs - train_ds: EncDecClassificationDatasetConfig = EncDecClassificationDatasetConfig( - manifest_filepath=None, shuffle=True, trim_silence=False + train_ds: EncDecClassificationDatasetConfig = field( + default_factory=lambda: EncDecClassificationDatasetConfig( + manifest_filepath=None, shuffle=True, trim_silence=False + ) ) - validation_ds: EncDecClassificationDatasetConfig = EncDecClassificationDatasetConfig( - manifest_filepath=None, shuffle=False + validation_ds: EncDecClassificationDatasetConfig = field( + default_factory=lambda: EncDecClassificationDatasetConfig(manifest_filepath=None, shuffle=False) ) - test_ds: EncDecClassificationDatasetConfig = EncDecClassificationDatasetConfig( - manifest_filepath=None, shuffle=False + test_ds: EncDecClassificationDatasetConfig = field( + default_factory=lambda: EncDecClassificationDatasetConfig(manifest_filepath=None, shuffle=False) ) # Optimizer / Scheduler config - optim: Optional[model_cfg.OptimConfig] = model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + optim: Optional[model_cfg.OptimConfig] = field( + default_factory=lambda: model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + ) # Model component configs - preprocessor: AudioToMFCCPreprocessorConfig = AudioToMFCCPreprocessorConfig() - spec_augment: Optional[SpectrogramAugmentationConfig] = SpectrogramAugmentationConfig() - crop_or_pad_augment: Optional[CropOrPadSpectrogramAugmentationConfig] = CropOrPadSpectrogramAugmentationConfig( - audio_length=timesteps + preprocessor: AudioToMFCCPreprocessorConfig = field(default_factory=lambda: AudioToMFCCPreprocessorConfig()) + spec_augment: Optional[SpectrogramAugmentationConfig] = field( + default_factory=lambda: SpectrogramAugmentationConfig() + ) + crop_or_pad_augment: Optional[CropOrPadSpectrogramAugmentationConfig] = field( + default_factory=lambda: CropOrPadSpectrogramAugmentationConfig(audio_length=-1) ) - encoder: ConvASREncoderConfig = ConvASREncoderConfig() - decoder: ConvASRDecoderClassificationConfig = ConvASRDecoderClassificationConfig() + encoder: ConvASREncoderConfig = field(default_factory=lambda: ConvASREncoderConfig()) + decoder: ConvASRDecoderClassificationConfig = field(default_factory=lambda: ConvASRDecoderClassificationConfig()) + + def __post_init__(self): + if self.crop_or_pad_augment is not None: + self.crop_or_pad_augment.audio_length = self.timesteps @dataclass class EncDecClassificationModelConfig(model_cfg.NemoConfig): - model: EncDecClassificationConfig = EncDecClassificationConfig() + model: EncDecClassificationConfig = field(default_factory=lambda: EncDecClassificationConfig()) diff --git a/nemo/collections/asr/models/configs/diarizer_config.py b/nemo/collections/asr/models/configs/diarizer_config.py index c09bb1dfb8f4..0745a6f2a451 100644 --- a/nemo/collections/asr/models/configs/diarizer_config.py +++ b/nemo/collections/asr/models/configs/diarizer_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import asdict, dataclass +from dataclasses import asdict, dataclass, field from typing import Any, Dict, Optional, Tuple, Union @@ -78,9 +78,9 @@ class ASRDiarizerParams(DiarizerComponentConfig): @dataclass class ASRDiarizerConfig(DiarizerComponentConfig): model_path: Optional[str] = "stt_en_conformer_ctc_large" - parameters: ASRDiarizerParams = ASRDiarizerParams() - ctc_decoder_parameters: ASRDiarizerCTCDecoderParams = ASRDiarizerCTCDecoderParams() - realigning_lm_parameters: ASRRealigningLMParams = ASRRealigningLMParams() + parameters: ASRDiarizerParams = field(default_factory=lambda: ASRDiarizerParams()) + ctc_decoder_parameters: ASRDiarizerCTCDecoderParams = field(default_factory=lambda: ASRDiarizerCTCDecoderParams()) + realigning_lm_parameters: ASRRealigningLMParams = field(default_factory=lambda: ASRRealigningLMParams()) @dataclass @@ -102,7 +102,7 @@ class VADParams(DiarizerComponentConfig): class VADConfig(DiarizerComponentConfig): model_path: str = "vad_multilingual_marblenet" # .nemo local model path or pretrained VAD model name external_vad_manifest: Optional[str] = None - parameters: VADParams = VADParams() + parameters: VADParams = field(default_factory=lambda: VADParams()) @dataclass @@ -121,7 +121,7 @@ class SpeakerEmbeddingsParams(DiarizerComponentConfig): class SpeakerEmbeddingsConfig(DiarizerComponentConfig): # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet) model_path: Optional[str] = None - parameters: SpeakerEmbeddingsParams = SpeakerEmbeddingsParams() + parameters: SpeakerEmbeddingsParams = field(default_factory=lambda: SpeakerEmbeddingsParams()) @dataclass @@ -142,7 +142,7 @@ class ClusteringParams(DiarizerComponentConfig): @dataclass class ClusteringConfig(DiarizerComponentConfig): - parameters: ClusteringParams = ClusteringParams() + parameters: ClusteringParams = field(default_factory=lambda: ClusteringParams()) @dataclass @@ -166,7 +166,7 @@ class MSDDParams(DiarizerComponentConfig): @dataclass class MSDDConfig(DiarizerComponentConfig): model_path: Optional[str] = "diar_msdd_telephonic" - parameters: MSDDParams = MSDDParams() + parameters: MSDDParams = field(default_factory=lambda: MSDDParams()) @dataclass @@ -176,16 +176,16 @@ class DiarizerConfig(DiarizerComponentConfig): oracle_vad: bool = False # If True, uses RTTM files provided in the manifest file to get VAD timestamps collar: float = 0.25 # Collar value for scoring ignore_overlap: bool = True # Consider or ignore overlap segments while scoring - vad: VADConfig = VADConfig() - speaker_embeddings: SpeakerEmbeddingsConfig = SpeakerEmbeddingsConfig() - clustering: ClusteringConfig = ClusteringConfig() - msdd_model: MSDDConfig = MSDDConfig() - asr: ASRDiarizerConfig = ASRDiarizerConfig() + vad: VADConfig = field(default_factory=lambda: VADConfig()) + speaker_embeddings: SpeakerEmbeddingsConfig = field(default_factory=lambda: SpeakerEmbeddingsConfig()) + clustering: ClusteringConfig = field(default_factory=lambda: ClusteringConfig()) + msdd_model: MSDDConfig = field(default_factory=lambda: MSDDConfig()) + asr: ASRDiarizerConfig = field(default_factory=lambda: ASRDiarizerConfig()) @dataclass class NeuralDiarizerInferenceConfig(DiarizerComponentConfig): - diarizer: DiarizerConfig = DiarizerConfig() + diarizer: DiarizerConfig = field(default_factory=lambda: DiarizerConfig()) device: str = "cpu" verbose: bool = False batch_size: int = 64 diff --git a/nemo/collections/asr/models/configs/k2_sequence_models_config.py b/nemo/collections/asr/models/configs/k2_sequence_models_config.py index 5a112f626f46..53ed3e1377fe 100644 --- a/nemo/collections/asr/models/configs/k2_sequence_models_config.py +++ b/nemo/collections/asr/models/configs/k2_sequence_models_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from nemo.collections.asr.models.configs.asr_models_config import EncDecCTCConfig from nemo.collections.asr.parts.k2.classes import GraphModuleConfig as BackendConfig @@ -26,14 +26,14 @@ class GraphModuleConfig: split_batch_size: int = 0 dec_type: str = "topo" transcribe_training: bool = True - backend_cfg: BackendConfig = BackendConfig() + backend_cfg: BackendConfig = field(default_factory=lambda: BackendConfig()) @dataclass class EncDecK2SeqConfig(EncDecCTCConfig): - graph_module_cfg: GraphModuleConfig = GraphModuleConfig() + graph_module_cfg: GraphModuleConfig = field(default_factory=lambda: GraphModuleConfig()) @dataclass class EncDecK2SeqModelConfig(NemoConfig): - model: EncDecK2SeqConfig = EncDecK2SeqConfig() + model: EncDecK2SeqConfig = field(default_factory=lambda: EncDecK2SeqConfig()) diff --git a/nemo/collections/asr/models/configs/matchboxnet_config.py b/nemo/collections/asr/models/configs/matchboxnet_config.py index 55a8b9fedec1..52ec4c35d9e8 100644 --- a/nemo/collections/asr/models/configs/matchboxnet_config.py +++ b/nemo/collections/asr/models/configs/matchboxnet_config.py @@ -107,30 +107,38 @@ class MatchboxNetModelConfig(clf_cfg.EncDecClassificationConfig): labels: List[str] = MISSING # Dataset configs - train_ds: clf_cfg.EncDecClassificationDatasetConfig = clf_cfg.EncDecClassificationDatasetConfig( - manifest_filepath=None, shuffle=True, trim_silence=False + train_ds: clf_cfg.EncDecClassificationDatasetConfig = field( + default_factory=lambda: clf_cfg.EncDecClassificationDatasetConfig( + manifest_filepath=None, shuffle=True, trim_silence=False + ) ) - validation_ds: clf_cfg.EncDecClassificationDatasetConfig = clf_cfg.EncDecClassificationDatasetConfig( - manifest_filepath=None, shuffle=False + validation_ds: clf_cfg.EncDecClassificationDatasetConfig = field( + default_factory=lambda: clf_cfg.EncDecClassificationDatasetConfig(manifest_filepath=None, shuffle=False) ) - test_ds: clf_cfg.EncDecClassificationDatasetConfig = clf_cfg.EncDecClassificationDatasetConfig( - manifest_filepath=None, shuffle=False + test_ds: clf_cfg.EncDecClassificationDatasetConfig = field( + default_factory=lambda: clf_cfg.EncDecClassificationDatasetConfig(manifest_filepath=None, shuffle=False) ) # Optimizer / Scheduler config - optim: Optional[model_cfg.OptimConfig] = model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + optim: Optional[model_cfg.OptimConfig] = field( + default_factory=lambda: model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + ) # Model general component configs - preprocessor: AudioToMFCCPreprocessorConfig = AudioToMFCCPreprocessorConfig(window_size=0.025) - spec_augment: Optional[SpectrogramAugmentationConfig] = SpectrogramAugmentationConfig( - freq_masks=2, time_masks=2, freq_width=15, time_width=25, rect_masks=5, rect_time=25, rect_freq=15 + preprocessor: AudioToMFCCPreprocessorConfig = field( + default_factory=lambda: AudioToMFCCPreprocessorConfig(window_size=0.025) + ) + spec_augment: Optional[SpectrogramAugmentationConfig] = field( + default_factory=lambda: SpectrogramAugmentationConfig( + freq_masks=2, time_masks=2, freq_width=15, time_width=25, rect_masks=5, rect_time=25, rect_freq=15 + ) ) - crop_or_pad_augment: Optional[CropOrPadSpectrogramAugmentationConfig] = CropOrPadSpectrogramAugmentationConfig( - audio_length=128 + crop_or_pad_augment: Optional[CropOrPadSpectrogramAugmentationConfig] = field( + default_factory=lambda: CropOrPadSpectrogramAugmentationConfig(audio_length=128) ) - encoder: ConvASREncoderConfig = ConvASREncoderConfig(activation="relu") - decoder: ConvASRDecoderClassificationConfig = ConvASRDecoderClassificationConfig() + encoder: ConvASREncoderConfig = field(default_factory=lambda: ConvASREncoderConfig(activation="relu")) + decoder: ConvASRDecoderClassificationConfig = field(default_factory=lambda: ConvASRDecoderClassificationConfig()) @dataclass diff --git a/nemo/collections/asr/models/configs/quartznet_config.py b/nemo/collections/asr/models/configs/quartznet_config.py index a1231002af41..93412b0053bf 100644 --- a/nemo/collections/asr/models/configs/quartznet_config.py +++ b/nemo/collections/asr/models/configs/quartznet_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Callable, List, Optional from omegaconf import MISSING @@ -174,20 +174,30 @@ class JasperModelConfig(ctc_cfg.EncDecCTCConfig): labels: List[str] = MISSING # Dataset configs - train_ds: ctc_cfg.ASRDatasetConfig = ctc_cfg.ASRDatasetConfig( - manifest_filepath=None, shuffle=True, trim_silence=True + train_ds: ctc_cfg.ASRDatasetConfig = field( + default_factory=lambda: ctc_cfg.ASRDatasetConfig(manifest_filepath=None, shuffle=True, trim_silence=True) + ) + validation_ds: ctc_cfg.ASRDatasetConfig = field( + default_factory=lambda: ctc_cfg.ASRDatasetConfig(manifest_filepath=None, shuffle=False) + ) + test_ds: ctc_cfg.ASRDatasetConfig = field( + default_factory=lambda: ctc_cfg.ASRDatasetConfig(manifest_filepath=None, shuffle=False) ) - validation_ds: ctc_cfg.ASRDatasetConfig = ctc_cfg.ASRDatasetConfig(manifest_filepath=None, shuffle=False) - test_ds: ctc_cfg.ASRDatasetConfig = ctc_cfg.ASRDatasetConfig(manifest_filepath=None, shuffle=False) # Optimizer / Scheduler config - optim: Optional[model_cfg.OptimConfig] = model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + optim: Optional[model_cfg.OptimConfig] = field( + default_factory=lambda: model_cfg.OptimConfig(sched=model_cfg.SchedConfig()) + ) # Model general component configs - preprocessor: AudioToMelSpectrogramPreprocessorConfig = AudioToMelSpectrogramPreprocessorConfig() - spec_augment: Optional[SpectrogramAugmentationConfig] = SpectrogramAugmentationConfig() - encoder: ConvASREncoderConfig = ConvASREncoderConfig(activation="relu") - decoder: ConvASRDecoderConfig = ConvASRDecoderConfig() + preprocessor: AudioToMelSpectrogramPreprocessorConfig = field( + default_factory=lambda: AudioToMelSpectrogramPreprocessorConfig() + ) + spec_augment: Optional[SpectrogramAugmentationConfig] = field( + default_factory=lambda: SpectrogramAugmentationConfig() + ) + encoder: ConvASREncoderConfig = field(default_factory=lambda: ConvASREncoderConfig(activation="relu")) + decoder: ConvASRDecoderConfig = field(default_factory=lambda: ConvASRDecoderConfig()) @dataclass diff --git a/nemo/collections/asr/modules/audio_preprocessing.py b/nemo/collections/asr/modules/audio_preprocessing.py index 91c0c10b9604..471488bd9647 100644 --- a/nemo/collections/asr/modules/audio_preprocessing.py +++ b/nemo/collections/asr/modules/audio_preprocessing.py @@ -634,6 +634,12 @@ def __init__(self, audio_length): super(CropOrPadSpectrogramAugmentation, self).__init__() self.audio_length = audio_length + if self.audio_length < 0: + raise ValueError( + 'audio_length must be non-negative. If using a dataclass with OmegaConf, ' + 'please call OmegaConf.to_object(cfg) to call appropriate __post_init__ methods.' + ) + @typecheck() @torch.no_grad() def forward(self, input_signal, length): diff --git a/nemo/collections/asr/parts/k2/classes.py b/nemo/collections/asr/parts/k2/classes.py index bb749e15d4c6..d4c498f32a2d 100644 --- a/nemo/collections/asr/parts/k2/classes.py +++ b/nemo/collections/asr/parts/k2/classes.py @@ -13,7 +13,7 @@ # limitations under the License. from abc import ABC -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Optional, Tuple import torch @@ -43,7 +43,7 @@ class GraphModuleConfig: topo_with_self_loops: bool = True token_lm: Optional[Any] = None intersect_pruned: bool = False - intersect_conf: GraphIntersectDenseConfig = GraphIntersectDenseConfig() + intersect_conf: GraphIntersectDenseConfig = field(default_factory=lambda: GraphIntersectDenseConfig()) boost_coeff: float = 0.0 predictor_window_size: int = 0 predictor_step_size: int = 1 diff --git a/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py b/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py index 563d4219baa7..333b630ff7f6 100644 --- a/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py +++ b/nemo/collections/asr/parts/submodules/adapters/multi_head_attention_adapter_module.py @@ -13,7 +13,7 @@ # limitations under the License. import math -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Optional import torch @@ -183,7 +183,7 @@ class MultiHeadAttentionAdapterConfig: n_feat: int dropout_rate: float = 0.0 proj_dim: Optional[int] = None - adapter_strategy: Optional[Any] = MHAResidualAddAdapterStrategyConfig() + adapter_strategy: Optional[Any] = field(default_factory=lambda: MHAResidualAddAdapterStrategyConfig()) _target_: str = "{0}.{1}".format(MultiHeadAttentionAdapter.__module__, MultiHeadAttentionAdapter.__name__) @@ -287,7 +287,7 @@ class RelPositionMultiHeadAttentionAdapterConfig: n_feat: int dropout_rate: float = 0.0 proj_dim: Optional[int] = None - adapter_strategy: Optional[Any] = MHAResidualAddAdapterStrategyConfig() + adapter_strategy: Optional[Any] = field(default_factory=lambda: MHAResidualAddAdapterStrategyConfig()) _target_: str = "{0}.{1}".format( RelPositionMultiHeadAttentionAdapter.__module__, RelPositionMultiHeadAttentionAdapter.__name__ ) @@ -336,7 +336,9 @@ class PositionalEncodingAdapterConfig: d_model: int max_len: int = 5000 xscale: float = 1.0 - adapter_strategy: Optional[Any] = adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + adapter_strategy: Optional[Any] = field( + default_factory=lambda: adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + ) _target_: str = "{0}.{1}".format(PositionalEncodingAdapter.__module__, PositionalEncodingAdapter.__name__) @@ -378,5 +380,7 @@ class RelPositionalEncodingAdapterConfig: d_model: int max_len: int = 5000 xscale: float = 1.0 - adapter_strategy: Optional[Any] = adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + adapter_strategy: Optional[Any] = field( + default_factory=lambda: adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + ) _target_: str = "{0}.{1}".format(RelPositionalEncodingAdapter.__module__, RelPositionalEncodingAdapter.__name__) diff --git a/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py b/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py index 377e43cd5f91..5ed504fd9c45 100644 --- a/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py +++ b/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py @@ -14,7 +14,7 @@ import math import os -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import List, Optional, Tuple, Union import torch @@ -602,5 +602,5 @@ class BeamCTCInferConfig: beam_beta: float = 0.0 kenlm_path: Optional[str] = None - flashlight_cfg: Optional[FlashlightConfig] = FlashlightConfig() - pyctcdecode_cfg: Optional[PyCTCDecodeConfig] = PyCTCDecodeConfig() + flashlight_cfg: Optional[FlashlightConfig] = field(default_factory=lambda: FlashlightConfig()) + pyctcdecode_cfg: Optional[PyCTCDecodeConfig] = field(default_factory=lambda: PyCTCDecodeConfig()) diff --git a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py index 44ae9f4a134b..686ef79cabad 100644 --- a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import List, Optional import torch @@ -253,7 +253,7 @@ class GreedyCTCInferConfig: preserve_alignments: bool = False compute_timestamps: bool = False preserve_frame_confidence: bool = False - confidence_method_cfg: Optional[ConfidenceMethodConfig] = ConfidenceMethodConfig() + confidence_method_cfg: Optional[ConfidenceMethodConfig] = field(default_factory=lambda: ConfidenceMethodConfig()) def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed diff --git a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py index 95b0bdf5fd13..e37d282ed0de 100644 --- a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py @@ -26,7 +26,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import List, Optional, Tuple, Union import numpy as np @@ -2185,7 +2185,7 @@ class GreedyRNNTInferConfig: max_symbols_per_step: Optional[int] = 10 preserve_alignments: bool = False preserve_frame_confidence: bool = False - confidence_method_cfg: Optional[ConfidenceMethodConfig] = ConfidenceMethodConfig() + confidence_method_cfg: Optional[ConfidenceMethodConfig] = field(default_factory=lambda: ConfidenceMethodConfig()) def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed @@ -2201,7 +2201,7 @@ class GreedyBatchedRNNTInferConfig: max_symbols_per_step: Optional[int] = 10 preserve_alignments: bool = False preserve_frame_confidence: bool = False - confidence_method_cfg: Optional[ConfidenceMethodConfig] = ConfidenceMethodConfig() + confidence_method_cfg: Optional[ConfidenceMethodConfig] = field(default_factory=lambda: ConfidenceMethodConfig()) def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed diff --git a/nemo/collections/asr/parts/utils/asr_confidence_utils.py b/nemo/collections/asr/parts/utils/asr_confidence_utils.py index ddfac3744c6a..c406f5451e84 100644 --- a/nemo/collections/asr/parts/utils/asr_confidence_utils.py +++ b/nemo/collections/asr/parts/utils/asr_confidence_utils.py @@ -14,7 +14,7 @@ import math from abc import ABC, abstractmethod -from dataclasses import dataclass +from dataclasses import dataclass, field from functools import partial from typing import List, Optional @@ -175,7 +175,7 @@ class ConfidenceConfig: preserve_word_confidence: bool = False exclude_blank: bool = True aggregation: str = "min" - method_cfg: ConfidenceMethodConfig = ConfidenceMethodConfig() + method_cfg: ConfidenceMethodConfig = field(default_factory=lambda: ConfidenceMethodConfig()) def __post_init__(self): # OmegaConf.structured ensures that post_init check is always executed diff --git a/nemo/collections/common/parts/adapter_modules.py b/nemo/collections/common/parts/adapter_modules.py index 46c635e73975..2084147f9cbc 100644 --- a/nemo/collections/common/parts/adapter_modules.py +++ b/nemo/collections/common/parts/adapter_modules.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass, is_dataclass +from dataclasses import dataclass, field, is_dataclass from typing import Any, Optional from hydra.utils import instantiate @@ -160,5 +160,7 @@ class LinearAdapterConfig: activation: str = 'swish' norm_position: str = 'pre' dropout: float = 0.0 - adapter_strategy: Optional[Any] = adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + adapter_strategy: Optional[Any] = field( + default_factory=lambda: adapter_mixin_strategies.ResidualAddAdapterStrategyConfig() + ) _target_: str = "{0}.{1}".format(LinearAdapter.__module__, LinearAdapter.__name__) diff --git a/nemo/collections/common/tokenizers/en_ja_tokenizers.py b/nemo/collections/common/tokenizers/en_ja_tokenizers.py index 63ae11f5da33..cf58130834e9 100644 --- a/nemo/collections/common/tokenizers/en_ja_tokenizers.py +++ b/nemo/collections/common/tokenizers/en_ja_tokenizers.py @@ -14,11 +14,19 @@ import re from typing import List -import ipadic -import MeCab from pangu import spacing from sacremoses import MosesDetokenizer, MosesPunctNormalizer, MosesTokenizer +try: + import ipadic + import MeCab + + HAVE_MECAB = True + HAVE_IPADIC = True +except (ImportError, ModuleNotFoundError): + HAVE_MECAB = False + HAVE_IPADIC = False + class EnJaProcessor: """ @@ -67,6 +75,9 @@ class JaMecabProcessor: """ def __init__(self): + if not HAVE_MECAB or not HAVE_IPADIC: + raise ImportError("Please ensure that you have installed `MeCab` and `ipadic` to use JaMecabProcessor") + self.mecab_tokenizer = MeCab.Tagger(ipadic.MECAB_ARGS + " -Owakati") def detokenize(self, text: List[str]) -> str: diff --git a/nemo/collections/nlp/models/machine_translation/mt_enc_dec_config.py b/nemo/collections/nlp/models/machine_translation/mt_enc_dec_config.py index ea1e86eee88c..1a69c3a33fc0 100644 --- a/nemo/collections/nlp/models/machine_translation/mt_enc_dec_config.py +++ b/nemo/collections/nlp/models/machine_translation/mt_enc_dec_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Optional, Tuple from omegaconf.omegaconf import MISSING @@ -46,7 +46,7 @@ class MTOptimConfig(OptimConfig): lr: float = 1e-3 betas: Tuple[float, float] = (0.9, 0.98) weight_decay: float = 0.0 - sched: Optional[MTSchedConfig] = MTSchedConfig() + sched: Optional[MTSchedConfig] = field(default_factory=lambda: MTSchedConfig()) @dataclass @@ -74,70 +74,80 @@ class MTEncDecModelConfig(EncDecNLPModelConfig): decoder_tokenizer: Any = MISSING decoder: Any = MISSING - head: TokenClassifierConfig = TokenClassifierConfig(log_softmax=True) + head: TokenClassifierConfig = field(default_factory=lambda: TokenClassifierConfig(log_softmax=True)) # dataset configurations - train_ds: Optional[TranslationDataConfig] = TranslationDataConfig( - src_file_name=MISSING, - tgt_file_name=MISSING, - tokens_in_batch=512, - clean=True, - shuffle=True, - cache_ids=False, - use_cache=False, + train_ds: Optional[TranslationDataConfig] = field( + default_factory=lambda: TranslationDataConfig( + src_file_name=MISSING, + tgt_file_name=MISSING, + tokens_in_batch=512, + clean=True, + shuffle=True, + cache_ids=False, + use_cache=False, + ) ) - validation_ds: Optional[TranslationDataConfig] = TranslationDataConfig( - src_file_name=MISSING, - tgt_file_name=MISSING, - tokens_in_batch=512, - clean=False, - shuffle=False, - cache_ids=False, - use_cache=False, + validation_ds: Optional[TranslationDataConfig] = field( + default_factory=lambda: TranslationDataConfig( + src_file_name=MISSING, + tgt_file_name=MISSING, + tokens_in_batch=512, + clean=False, + shuffle=False, + cache_ids=False, + use_cache=False, + ) ) - test_ds: Optional[TranslationDataConfig] = TranslationDataConfig( - src_file_name=MISSING, - tgt_file_name=MISSING, - tokens_in_batch=512, - clean=False, - shuffle=False, - cache_ids=False, - use_cache=False, + test_ds: Optional[TranslationDataConfig] = field( + default_factory=lambda: TranslationDataConfig( + src_file_name=MISSING, + tgt_file_name=MISSING, + tokens_in_batch=512, + clean=False, + shuffle=False, + cache_ids=False, + use_cache=False, + ) ) - optim: Optional[OptimConfig] = MTOptimConfig() + optim: Optional[OptimConfig] = field(default_factory=lambda: MTOptimConfig()) @dataclass class AAYNBaseConfig(MTEncDecModelConfig): # Attention is All You Need Base Configuration - encoder_tokenizer: TokenizerConfig = TokenizerConfig(library='yttm') - decoder_tokenizer: TokenizerConfig = TokenizerConfig(library='yttm') - - encoder: NeMoTransformerEncoderConfig = NeMoTransformerEncoderConfig( - library='nemo', - model_name=None, - pretrained=False, - hidden_size=512, - inner_size=2048, - num_layers=6, - num_attention_heads=8, - ffn_dropout=0.1, - attn_score_dropout=0.1, - attn_layer_dropout=0.1, + encoder_tokenizer: TokenizerConfig = field(default_factory=lambda: TokenizerConfig(library='yttm')) + decoder_tokenizer: TokenizerConfig = field(default_factory=lambda: TokenizerConfig(library='yttm')) + + encoder: NeMoTransformerEncoderConfig = field( + default_factory=lambda: NeMoTransformerEncoderConfig( + library='nemo', + model_name=None, + pretrained=False, + hidden_size=512, + inner_size=2048, + num_layers=6, + num_attention_heads=8, + ffn_dropout=0.1, + attn_score_dropout=0.1, + attn_layer_dropout=0.1, + ) ) - decoder: NeMoTransformerConfig = NeMoTransformerConfig( - library='nemo', - model_name=None, - pretrained=False, - hidden_size=512, - inner_size=2048, - num_layers=6, - num_attention_heads=8, - ffn_dropout=0.1, - attn_score_dropout=0.1, - attn_layer_dropout=0.1, + decoder: NeMoTransformerConfig = field( + default_factory=lambda: NeMoTransformerConfig( + library='nemo', + model_name=None, + pretrained=False, + hidden_size=512, + inner_size=2048, + num_layers=6, + num_attention_heads=8, + ffn_dropout=0.1, + attn_score_dropout=0.1, + attn_layer_dropout=0.1, + ) ) @@ -150,32 +160,36 @@ class MTBottleneckModelConfig(AAYNBaseConfig): recon_per_token: bool = True log_timing: bool = True - encoder: NeMoTransformerBottleneckEncoderConfig = NeMoTransformerBottleneckEncoderConfig( - library='nemo', - model_name=None, - pretrained=False, - hidden_size=512, - inner_size=2048, - num_layers=6, - num_attention_heads=8, - ffn_dropout=0.1, - attn_score_dropout=0.1, - attn_layer_dropout=0.1, - arch='seq2seq', - hidden_steps=32, - hidden_blocks=1, - hidden_init_method='params', + encoder: NeMoTransformerBottleneckEncoderConfig = field( + default_factory=lambda: NeMoTransformerBottleneckEncoderConfig( + library='nemo', + model_name=None, + pretrained=False, + hidden_size=512, + inner_size=2048, + num_layers=6, + num_attention_heads=8, + ffn_dropout=0.1, + attn_score_dropout=0.1, + attn_layer_dropout=0.1, + arch='seq2seq', + hidden_steps=32, + hidden_blocks=1, + hidden_init_method='params', + ) ) - decoder: NeMoTransformerBottleneckDecoderConfig = NeMoTransformerBottleneckDecoderConfig( - library='nemo', - model_name=None, - pretrained=False, - inner_size=2048, - num_layers=6, - num_attention_heads=8, - ffn_dropout=0.1, - attn_score_dropout=0.1, - attn_layer_dropout=0.1, - arch='seq2seq', + decoder: NeMoTransformerBottleneckDecoderConfig = field( + default_factory=lambda: NeMoTransformerBottleneckDecoderConfig( + library='nemo', + model_name=None, + pretrained=False, + inner_size=2048, + num_layers=6, + num_attention_heads=8, + ffn_dropout=0.1, + attn_score_dropout=0.1, + attn_layer_dropout=0.1, + arch='seq2seq', + ) ) diff --git a/nemo/collections/nlp/models/token_classification/punctuation_capitalization_config.py b/nemo/collections/nlp/models/token_classification/punctuation_capitalization_config.py index 6ca811fe273c..86bf12b92315 100644 --- a/nemo/collections/nlp/models/token_classification/punctuation_capitalization_config.py +++ b/nemo/collections/nlp/models/token_classification/punctuation_capitalization_config.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Dict, Optional from omegaconf.omegaconf import MISSING, DictConfig, OmegaConf, open_dict @@ -215,13 +215,15 @@ class PunctuationCapitalizationModelConfig: This config is a part of :class:`~PunctuationCapitalizationConfig`. """ - class_labels: ClassLabelsConfig = ClassLabelsConfig() + class_labels: ClassLabelsConfig = field(default_factory=lambda: ClassLabelsConfig()) """A mandatory parameter containing a dictionary with names of label id files used in .nemo checkpoints. These file names can also be used for passing label vocabularies to the model. If you wish to use ``class_labels`` for passing vocabularies, please provide path to vocabulary files in ``model.common_dataset_parameters.label_vocab_dir`` parameter.""" - common_dataset_parameters: Optional[CommonDatasetParametersConfig] = CommonDatasetParametersConfig() + common_dataset_parameters: Optional[CommonDatasetParametersConfig] = field( + default_factory=lambda: CommonDatasetParametersConfig() + ) """Label ids and loss mask information information.""" train_ds: Optional[PunctuationCapitalizationTrainDataConfig] = None @@ -233,16 +235,16 @@ class PunctuationCapitalizationModelConfig: test_ds: Optional[PunctuationCapitalizationEvalDataConfig] = None """A configuration for creating test datasets and data loaders.""" - punct_head: HeadConfig = HeadConfig() + punct_head: HeadConfig = field(default_factory=lambda: HeadConfig()) """A configuration for creating punctuation MLP head that is applied to a language model outputs.""" - capit_head: HeadConfig = HeadConfig() + capit_head: HeadConfig = field(default_factory=lambda: HeadConfig()) """A configuration for creating capitalization MLP head that is applied to a language model outputs.""" - tokenizer: Any = TokenizerConfig() + tokenizer: Any = field(default_factory=lambda: TokenizerConfig()) """A configuration for source text tokenizer.""" - language_model: LanguageModelConfig = LanguageModelConfig() + language_model: LanguageModelConfig = field(default_factory=lambda: LanguageModelConfig()) """A configuration of a BERT-like language model which serves as a model body.""" optim: Optional[Any] = None @@ -311,22 +313,30 @@ class PunctuationCapitalizationConfig(NemoConfig): do_testing: bool = False """Whether ot perform testing of the model.""" - model: PunctuationCapitalizationModelConfig = PunctuationCapitalizationModelConfig() + model: PunctuationCapitalizationModelConfig = field(default_factory=lambda: PunctuationCapitalizationModelConfig()) """A configuration for the :class:`~nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel` model.""" - trainer: Optional[TrainerConfig] = TrainerConfig() + trainer: Optional[TrainerConfig] = field(default_factory=lambda: TrainerConfig()) """Contains ``Trainer`` Lightning class constructor parameters.""" - exp_manager: Optional[ExpManagerConfig] = ExpManagerConfig(name=name, files_to_copy=[]) + exp_manager: Optional[ExpManagerConfig] = field( + default_factory=lambda: ExpManagerConfig(name=None, files_to_copy=[]) + ) """A configuration with various NeMo training options such as output directories, resuming from checkpoint, tensorboard and W&B logging, and so on. For possible options see :ref:`exp-manager-label`.""" + def __post_init__(self): + if self.exp_manager is not None: + self.exp_manager.name = self.name + @dataclass class PunctuationCapitalizationLexicalAudioConfig(PunctuationCapitalizationConfig): - model: PunctuationCapitalizationLexicalAudioModelConfig = PunctuationCapitalizationLexicalAudioModelConfig() + model: PunctuationCapitalizationLexicalAudioModelConfig = field( + default_factory=lambda: PunctuationCapitalizationLexicalAudioModelConfig() + ) def is_legacy_model_config(model_cfg: DictConfig) -> bool: diff --git a/nemo/collections/nlp/modules/common/megatron/megatron_encoders.py b/nemo/collections/nlp/modules/common/megatron/megatron_encoders.py index 4bd99f7120f0..e2c7f22235d7 100644 --- a/nemo/collections/nlp/modules/common/megatron/megatron_encoders.py +++ b/nemo/collections/nlp/modules/common/megatron/megatron_encoders.py @@ -13,7 +13,6 @@ # limitations under the License. """Transformer based language model.""" -from MeCab import Model from nemo.collections.nlp.modules.common.megatron.megatron_perceiver_encoders import MegatronPerceiverEncoderModule from nemo.collections.nlp.modules.common.megatron.megatron_transformer_encoder import MegatronTransformerEncoderModule from nemo.collections.nlp.modules.common.megatron.retrieval_transformer import ( @@ -25,6 +24,13 @@ scaled_init_method_normal, ) +try: + from MeCab import Model + + HAVE_MECAB = True +except (ImportError, ModuleNotFoundError): + HAVE_MECAB = False + try: from apex.transformer.enums import AttnMaskType, ModelType diff --git a/nemo/collections/tts/models/fastpitch.py b/nemo/collections/tts/models/fastpitch.py index a0e5497c6e92..bfb00e00c4ba 100644 --- a/nemo/collections/tts/models/fastpitch.py +++ b/nemo/collections/tts/models/fastpitch.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. import contextlib -from dataclasses import dataclass +from dataclasses import dataclass, field from pathlib import Path from typing import List, Optional @@ -70,12 +70,12 @@ class TextTokenizer: apostrophe: bool = True pad_with_space: bool = True add_blank_at: bool = True - g2p: G2PConfig = G2PConfig() + g2p: G2PConfig = field(default_factory=lambda: G2PConfig()) @dataclass class TextTokenizerConfig: - text_tokenizer: TextTokenizer = TextTokenizer() + text_tokenizer: TextTokenizer = field(default_factory=lambda: TextTokenizer()) class FastPitchModel(SpectrogramGenerator, Exportable, FastPitchAdapterModelMixin): diff --git a/nemo/collections/tts/models/tacotron2.py b/nemo/collections/tts/models/tacotron2.py index 846d8afec06e..3fcdee9832ef 100644 --- a/nemo/collections/tts/models/tacotron2.py +++ b/nemo/collections/tts/models/tacotron2.py @@ -13,7 +13,7 @@ # limitations under the License. import contextlib -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Dict, List, Optional import torch @@ -53,7 +53,7 @@ class Preprocessor: @dataclass class Tacotron2Config: - preprocessor: Preprocessor = Preprocessor() + preprocessor: Preprocessor = field(default_factory=lambda: Preprocessor()) encoder: Dict[Any, Any] = MISSING decoder: Dict[Any, Any] = MISSING postnet: Dict[Any, Any] = MISSING diff --git a/nemo/core/config/modelPT.py b/nemo/core/config/modelPT.py index 70e9b675e360..713c83379431 100644 --- a/nemo/core/config/modelPT.py +++ b/nemo/core/config/modelPT.py @@ -58,11 +58,13 @@ class HydraConfig: class NemoConfig: name: str = MISSING model: ModelConfig = MISSING - trainer: config.TrainerConfig = config.TrainerConfig( - strategy="ddp", enable_checkpointing=False, logger=False, log_every_n_steps=1, accelerator='gpu' + trainer: config.TrainerConfig = field( + default_factory=lambda: config.TrainerConfig( + strategy="ddp", enable_checkpointing=False, logger=False, log_every_n_steps=1, accelerator='gpu' + ) ) - exp_manager: Optional[Any] = exp_manager.ExpManagerConfig() - hydra: HydraConfig = HydraConfig() + exp_manager: Optional[Any] = field(default_factory=lambda: exp_manager.ExpManagerConfig()) + hydra: HydraConfig = field(default_factory=lambda: HydraConfig()) class ModelConfigBuilder: diff --git a/nemo/utils/exp_manager.py b/nemo/utils/exp_manager.py index 1125ae06cabb..1f90ffe0d2a3 100644 --- a/nemo/utils/exp_manager.py +++ b/nemo/utils/exp_manager.py @@ -18,7 +18,7 @@ import sys import time import warnings -from dataclasses import dataclass +from dataclasses import dataclass, field from datetime import timedelta from pathlib import Path from shutil import copy, move @@ -146,28 +146,30 @@ class ExpManagerConfig: create_wandb_logger: Optional[bool] = False wandb_logger_kwargs: Optional[Dict[Any, Any]] = None create_mlflow_logger: Optional[bool] = False - mlflow_logger_kwargs: Optional[MLFlowParams] = MLFlowParams() + mlflow_logger_kwargs: Optional[MLFlowParams] = field(default_factory=lambda: MLFlowParams()) create_dllogger_logger: Optional[bool] = False - dllogger_logger_kwargs: Optional[DLLoggerParams] = DLLoggerParams() + dllogger_logger_kwargs: Optional[DLLoggerParams] = field(default_factory=lambda: DLLoggerParams()) create_clearml_logger: Optional[bool] = False - clearml_logger_kwargs: Optional[ClearMLParams] = ClearMLParams() + clearml_logger_kwargs: Optional[ClearMLParams] = field(default_factory=lambda: ClearMLParams()) # Checkpointing parameters create_checkpoint_callback: Optional[bool] = True - checkpoint_callback_params: Optional[CallbackParams] = CallbackParams() + checkpoint_callback_params: Optional[CallbackParams] = field(default_factory=lambda: CallbackParams()) create_early_stopping_callback: Optional[bool] = False - early_stopping_callback_params: Optional[EarlyStoppingParams] = EarlyStoppingParams() + early_stopping_callback_params: Optional[EarlyStoppingParams] = field( + default_factory=lambda: EarlyStoppingParams() + ) create_preemption_callback: Optional[bool] = True # Additional exp_manager arguments files_to_copy: Optional[List[str]] = None # logs timing of train/val/test steps log_step_timing: Optional[bool] = True - step_timing_kwargs: Optional[StepTimingParams] = StepTimingParams() + step_timing_kwargs: Optional[StepTimingParams] = field(default_factory=lambda: StepTimingParams()) # Configures creation of log files for different ranks log_local_rank_0_only: Optional[bool] = False log_global_rank_0_only: Optional[bool] = False # disable initial validation when resuming from a checkpoint saved during validation disable_validation_on_resume: Optional[bool] = True - ema: Optional[EMAParams] = EMAParams() + ema: Optional[EMAParams] = field(default_factory=lambda: EMAParams()) # Wall clock time limit max_time_per_run: Optional[str] = None # time to sleep non 0 ranks during initialization diff --git a/requirements/requirements_nlp.txt b/requirements/requirements_nlp.txt index da44a726bd29..2a2941c27f5d 100644 --- a/requirements/requirements_nlp.txt +++ b/requirements/requirements_nlp.txt @@ -17,7 +17,7 @@ opencc pangu rapidfuzz rouge_score -sacrebleu[ja] +sacrebleu # manually install sacrebleu[ja] for Japanese support; MeCab is unsupported in Python 3.11+ sentence_transformers tensorstore zarr diff --git a/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py b/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py index 1846a986bf6e..b1cd385f4198 100644 --- a/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py +++ b/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py @@ -112,14 +112,14 @@ class EvalBeamSearchNGramConfig: beam_beta: List[float] = field(default_factory=lambda: [0.0]) # The beta parameter or list of the betas for the beam search decoding decoding_strategy: str = "beam" - decoding: ctc_beam_decoding.BeamCTCInferConfig = ctc_beam_decoding.BeamCTCInferConfig(beam_size=128) + decoding: ctc_beam_decoding.BeamCTCInferConfig = field(default_factory=lambda: ctc_beam_decoding.BeamCTCInferConfig(beam_size=128)) - text_processing: Optional[TextProcessingConfig] = TextProcessingConfig( + text_processing: Optional[TextProcessingConfig] = field(default_factory=lambda: TextProcessingConfig( punctuation_marks = ".,?", separate_punctuation = False, do_lowercase = False, rm_punctuation = False, - ) + )) # fmt: on diff --git a/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py b/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py index bbc33d214636..8548b839024f 100644 --- a/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py +++ b/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py @@ -115,7 +115,7 @@ class EvalBeamSearchNGramConfig: hat_subtract_ilm: bool = False hat_ilm_weight: List[float] = field(default_factory=lambda: [0.0]) - decoding: rnnt_beam_decoding.BeamRNNTInferConfig = rnnt_beam_decoding.BeamRNNTInferConfig(beam_size=128) + decoding: rnnt_beam_decoding.BeamRNNTInferConfig = field(default_factory=lambda: rnnt_beam_decoding.BeamRNNTInferConfig(beam_size=128)) # fmt: on diff --git a/scripts/confidence_ensembles/build_ensemble.py b/scripts/confidence_ensembles/build_ensemble.py index 99bfa6187b30..e40997c4aca2 100644 --- a/scripts/confidence_ensembles/build_ensemble.py +++ b/scripts/confidence_ensembles/build_ensemble.py @@ -75,7 +75,7 @@ import sys import tempfile from copy import deepcopy -from dataclasses import dataclass +from dataclasses import dataclass, field from pathlib import Path from typing import Dict, List, Optional, Tuple @@ -209,19 +209,23 @@ class BuildEnsembleConfig: random_seed: int = 0 # for reproducibility # default confidence, can override - confidence: ConfidenceConfig = ConfidenceConfig( - # we keep frame confidences and apply aggregation manually to get full-utterance confidence - preserve_frame_confidence=True, - exclude_blank=True, - aggregation="mean", - method_cfg=ConfidenceMethodConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), + confidence: ConfidenceConfig = field( + default_factory=lambda: ConfidenceConfig( + # we keep frame confidences and apply aggregation manually to get full-utterance confidence + preserve_frame_confidence=True, + exclude_blank=True, + aggregation="mean", + measure_cfg=ConfidenceMethodConfig(name="entropy", entropy_type="renyi", alpha=0.25, entropy_norm="lin",), + ) ) temperature: float = 1.0 # this is optional, but can be used to change any aspect of the transcription # config, such as batch size or amp usage. Note that model, data and confidence # will be overriden by this script - transcription: transcribe_speech.TranscriptionConfig = transcribe_speech.TranscriptionConfig() + transcription: transcribe_speech.TranscriptionConfig = field( + default_factory=lambda: transcribe_speech.TranscriptionConfig() + ) # set to True to tune the confidence. # requires dev manifests to be specified for each model @@ -229,12 +233,14 @@ class BuildEnsembleConfig: # used to specify what to tune over. By default runs tuning over some # reasonalbe grid, so that it does not take forever. # Can be changed as needed - tune_confidence_config: TuneConfidenceConfig = TuneConfidenceConfig() + tune_confidence_config: TuneConfidenceConfig = field(default_factory=lambda: TuneConfidenceConfig()) # very fast to tune and can be important in case of imbalanced datasets # will automatically set to False if dev data is not available tune_logistic_regression: bool = True - tune_logistic_regression_config: TuneLogisticRegressionConfig = TuneLogisticRegressionConfig() + tune_logistic_regression_config: TuneLogisticRegressionConfig = field( + default_factory=lambda: TuneLogisticRegressionConfig() + ) def __post_init__(self): """Checking that if any dev data is provided, all are provided. diff --git a/scripts/speech_recognition/confidence/benchmark_asr_confidence.py b/scripts/speech_recognition/confidence/benchmark_asr_confidence.py index f4558fa85256..0c8ab9535fae 100644 --- a/scripts/speech_recognition/confidence/benchmark_asr_confidence.py +++ b/scripts/speech_recognition/confidence/benchmark_asr_confidence.py @@ -14,7 +14,7 @@ import json import os -from dataclasses import dataclass, is_dataclass +from dataclasses import dataclass, field, is_dataclass from pathlib import Path from typing import Optional @@ -124,7 +124,9 @@ class ConfidenceBenchmarkingConfig: # Confidence configs target_level: str = "auto" # Choices: "word", "token", "auto" (for both word- and token-level confidence) - confidence_cfg: ConfidenceConfig = ConfidenceConfig(preserve_word_confidence=True, preserve_token_confidence=True) + confidence_cfg: ConfidenceConfig = field( + default_factory=lambda: ConfidenceConfig(preserve_word_confidence=True, preserve_token_confidence=True) + ) grid_params: Optional[str] = None # a dictionary with lists of parameters to iteratively benchmark on diff --git a/scripts/speech_recognition/convert_to_tarred_audio_dataset.py b/scripts/speech_recognition/convert_to_tarred_audio_dataset.py index 64c086997ef0..3a739fe3a57d 100644 --- a/scripts/speech_recognition/convert_to_tarred_audio_dataset.py +++ b/scripts/speech_recognition/convert_to_tarred_audio_dataset.py @@ -202,7 +202,7 @@ class ASRTarredDatasetMetadata: num_samples_per_shard: Optional[int] = None is_concatenated_manifest: bool = False - dataset_config: Optional[ASRTarredDatasetConfig] = ASRTarredDatasetConfig() + dataset_config: Optional[ASRTarredDatasetConfig] = field(default_factory=lambda: ASRTarredDatasetConfig()) history: Optional[List[Any]] = field(default_factory=lambda: []) def __post_init__(self): diff --git a/tests/collections/asr/test_text_to_text_dataset.py b/tests/collections/asr/test_text_to_text_dataset.py index 92205de41a1b..bf68b51f1e6e 100644 --- a/tests/collections/asr/test_text_to_text_dataset.py +++ b/tests/collections/asr/test_text_to_text_dataset.py @@ -15,7 +15,7 @@ import json import multiprocessing import os -from dataclasses import dataclass +from dataclasses import dataclass, field from pathlib import Path import pytest @@ -118,7 +118,7 @@ class TextTokenizerCfg: apostrophe: bool = True pad_with_space: bool = True add_blank_at: bool = True - g2p: G2PConfig = G2PConfig() + g2p: G2PConfig = field(default_factory=lambda: G2PConfig()) config = OmegaConf.create(OmegaConf.to_yaml(TextTokenizerCfg())) return instantiate(config) diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index 77ab3111fd91..d298e8072d58 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -149,8 +149,8 @@ class AlignmentConfig: # Output file configs save_output_file_formats: List[str] = field(default_factory=lambda: ["ctm", "ass"]) - ctm_file_config: CTMFileConfig = CTMFileConfig() - ass_file_config: ASSFileConfig = ASSFileConfig() + ctm_file_config: CTMFileConfig = field(default_factory=lambda: CTMFileConfig()) + ass_file_config: ASSFileConfig = field(default_factory=lambda: ASSFileConfig()) @hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) From 2ab6aa30b3e79e1cd58bee4258e31ca0f3996f58 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 5 Oct 2023 20:19:26 -0700 Subject: [PATCH 103/112] Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister --- Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Dockerfile b/Dockerfile index 843c0c27df45..7d1539ad2db3 100644 --- a/Dockerfile +++ b/Dockerfile @@ -38,7 +38,7 @@ RUN apt-get update && \ libsndfile1 sox \ libfreetype6 \ swig \ - ffmpeg=ffmpeg_5.1.2-3ubuntu1 \ + ffmpeg \ libavdevice-dev && \ rm -rf /var/lib/apt/lists/* @@ -111,7 +111,7 @@ RUN /usr/bin/test -n "$NEMO_VERSION" && \ /bin/echo "export BASE_IMAGE=${BASE_IMAGE}" >> /root/.bashrc # Install NeMo -RUN --mount=from=nemo-src,target=/tmp/nemo cd /tmp/nemo && pip install ".[all]" +RUN --mount=from=nemo-src,target=/tmp/nemo,rw cd /tmp/nemo && pip install ".[all]" # Check install RUN python -c "import nemo.collections.nlp as nemo_nlp" && \ From 290e22858fef23ae313be94fd592515ca1f99245 Mon Sep 17 00:00:00 2001 From: Aleksandr Laptev Date: Fri, 6 Oct 2023 13:03:07 +0700 Subject: [PATCH 104/112] [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../parts/submodules/rnnt_greedy_decoding.py | 31 +++++++++++-------- .../asr/test_asr_rnnt_encdec_model.py | 24 +++++++++----- 2 files changed, 35 insertions(+), 20 deletions(-) diff --git a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py index e37d282ed0de..a0aea07f7bc8 100644 --- a/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py +++ b/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py @@ -649,6 +649,7 @@ def _greedy_decode_blank_as_pad( # Mask buffers blank_mask = torch.full([batchsize], fill_value=0, dtype=torch.bool, device=device) + blank_mask_prev = None # Get max sequence length max_out_len = out_len.max() @@ -666,6 +667,8 @@ def _greedy_decode_blank_as_pad( # Batch: [B, T, D], but Bi may have seq len < max(seq_lens_in_batch) # Forcibly mask with "blank" tokens, for all sample where current time step T > seq_len blank_mask = time_idx >= out_len + blank_mask_prev = blank_mask.clone() + # Start inner loop while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): # Batch prediction and joint network steps @@ -694,7 +697,6 @@ def _greedy_decode_blank_as_pad( # This is accumulating blanks over all time steps T and all target steps min(max_symbols, U) k_is_blank = k == self._blank_index blank_mask.bitwise_or_(k_is_blank) - all_blanks = torch.all(blank_mask) del k_is_blank @@ -705,10 +707,9 @@ def _greedy_decode_blank_as_pad( logp_vals = logp.to('cpu') logp_ids = logp_vals.max(1)[1] for batch_idx, is_blank in enumerate(blank_mask): - # we only want to update non-blanks, unless we are at the last step in the loop where - # all elements produced blanks, otherwise there will be duplicate predictions - # saved in alignments - if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): + # we only want to update non-blanks and first-time blanks, + # otherwise alignments will contain duplicate predictions + if time_idx < out_len[batch_idx] and (not blank_mask_prev[batch_idx] or not is_blank): hypotheses[batch_idx].alignments[-1].append( (logp_vals[batch_idx], logp_ids[batch_idx]) ) @@ -720,13 +721,15 @@ def _greedy_decode_blank_as_pad( # Insert probabilities into last timestep per sample confidence = self._get_confidence(logp) for batch_idx, is_blank in enumerate(blank_mask): - if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): + if time_idx < out_len[batch_idx] and (not blank_mask_prev[batch_idx] or not is_blank): hypotheses[batch_idx].frame_confidence[-1].append(confidence[batch_idx]) del logp + blank_mask_prev.bitwise_or_(blank_mask) + # If all samples predict / have predicted prior blanks, exit loop early # This is equivalent to if single sample predicted k - if all_blanks: + if blank_mask.all(): not_blank = False else: # Collect batch indices where blanks occurred now/past @@ -847,6 +850,7 @@ def _greedy_decode_masked( # Mask buffers blank_mask = torch.full([batchsize], fill_value=0, dtype=torch.bool, device=device) + blank_mask_prev = None # Get max sequence length max_out_len = out_len.max() @@ -866,6 +870,7 @@ def _greedy_decode_masked( # Batch: [B, T, D], but Bi may have seq len < max(seq_lens_in_batch) # Forcibly mask with "blank" tokens, for all sample where current time step T > seq_len blank_mask = time_idx >= out_len + blank_mask_prev = blank_mask.clone() # Start inner loop while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols): @@ -904,7 +909,6 @@ def _greedy_decode_masked( # This is accumulating blanks over all time steps T and all target steps min(max_symbols, U) k_is_blank = k == self._blank_index blank_mask.bitwise_or_(k_is_blank) - all_blanks = torch.all(blank_mask) # If preserving alignments, check if sequence length of sample has been reached # before adding alignment @@ -913,10 +917,9 @@ def _greedy_decode_masked( logp_vals = logp.to('cpu') logp_ids = logp_vals.max(1)[1] for batch_idx, is_blank in enumerate(blank_mask): - # we only want to update non-blanks, unless we are at the last step in the loop where - # all elements produced blanks, otherwise there will be duplicate predictions - # saved in alignments - if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): + # we only want to update non-blanks and first-time blanks, + # otherwise alignments will contain duplicate predictions + if time_idx < out_len[batch_idx] and (not blank_mask_prev[batch_idx] or not is_blank): hypotheses[batch_idx].alignments[-1].append( (logp_vals[batch_idx], logp_ids[batch_idx]) ) @@ -929,10 +932,12 @@ def _greedy_decode_masked( # Insert probabilities into last timestep per sample confidence = self._get_confidence(logp) for batch_idx, is_blank in enumerate(blank_mask): - if time_idx < out_len[batch_idx] and (all_blanks or not is_blank): + if time_idx < out_len[batch_idx] and (not blank_mask_prev[batch_idx] or not is_blank): hypotheses[batch_idx].frame_confidence[-1].append(confidence[batch_idx]) del logp + blank_mask_prev.bitwise_or_(blank_mask) + # If all samples predict / have predicted prior blanks, exit loop early # This is equivalent to if single sample predicted k if blank_mask.all(): diff --git a/tests/collections/asr/test_asr_rnnt_encdec_model.py b/tests/collections/asr/test_asr_rnnt_encdec_model.py index 12e08006a3e4..b466d09c460d 100644 --- a/tests/collections/asr/test_asr_rnnt_encdec_model.py +++ b/tests/collections/asr/test_asr_rnnt_encdec_model.py @@ -45,6 +45,10 @@ def predict( add_sos: bool = False, batch_size: Optional[int] = None, ) -> Tuple[torch.Tensor, List[torch.Tensor]]: + if batch_size is None: + batch_size = 1 + if y is not None: + y = y + torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat(y.size()) if y is not None and state is not None: return (y + state) / 2, y * state elif state is not None: @@ -52,8 +56,8 @@ def predict( elif y is not None: return y, torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat(y.size()) return ( - torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat([1, 1, 1]), - torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat([1, 1, 1]), + torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat([1, batch_size, 1]), + torch.tensor([0] * self.vocab_size + [1], dtype=torch.float32).repeat([1, batch_size, 1]), ) def initialize_state(self, y: torch.Tensor) -> List[torch.Tensor]: @@ -66,8 +70,11 @@ def score_hypothesis( def batch_select_state(self, batch_states: List[torch.Tensor], idx: int) -> List[List[torch.Tensor]]: if batch_states is not None: - states = batch_states[0][idx] - states = states.long() + try: + states = batch_states[0][idx] + states = states.long() + except Exception as e: + raise Exception(batch_states, idx) return [states] else: return None @@ -92,8 +99,12 @@ def joint(self, f: torch.Tensor, g: torch.Tensor) -> torch.Tensor: setup["decoder"] = DummyRNNTDecoder(vocab_size=2, blank_idx=2, blank_as_pad=True) setup["decoder_masked"] = DummyRNNTDecoder(vocab_size=2, blank_idx=2, blank_as_pad=False) setup["joint"] = DummyRNNTJoint() - setup["encoder_output"] = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]]], dtype=torch.float32).transpose(1, 2) - setup["encoded_lengths"] = torch.tensor([3]) + # expected timesteps for max_symbols_per_step=5 are [[0, 0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 1]], + # so we have both looped and regular iteration on the second frame + setup["encoder_output"] = torch.tensor( + [[[1, 0, 0], [0, 1, 0], [0, 0, 1]], [[0, 0, 1], [2, 0, 0], [0, 0, 0]]], dtype=torch.float32 + ).transpose(1, 2) + setup["encoded_lengths"] = torch.tensor([3, 2]) return setup @@ -726,7 +737,6 @@ def test_greedy_decoding_preserve_frame_confidence(self, greedy_class): decoder, joint_net, blank_index=len(token_list), - preserve_alignments=True, preserve_frame_confidence=True, max_symbols_per_step=max_symbols_per_step, ) From 25e86ab68d17f3bf4ddc4b21b957ae62772ad181 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 5 Oct 2023 23:25:16 -0700 Subject: [PATCH 105/112] [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan Co-authored-by: Ryan Langman Signed-off-by: Sasha Meister --- nemo/collections/asr/parts/submodules/jasper.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/asr/parts/submodules/jasper.py b/nemo/collections/asr/parts/submodules/jasper.py index 6496586463e4..900d0bd55a10 100644 --- a/nemo/collections/asr/parts/submodules/jasper.py +++ b/nemo/collections/asr/parts/submodules/jasper.py @@ -375,7 +375,7 @@ def update_masked_length(self, max_len, seq_range=None, device=None): self.lens = self.lens.to(device) else: self.lens = seq_range - self.max_len = max_len + self.max_len = torch.tensor(max_len) def mask_input(self, x, lens): max_len = x.size(2) From d4b6a7530c2a4262b124df2dc348a10fb6c0aa8f Mon Sep 17 00:00:00 2001 From: Ryan Langman Date: Fri, 6 Oct 2023 10:07:15 -0700 Subject: [PATCH 106/112] [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan * [TTS] Fix STFT resolution Signed-off-by: Ryan * [TTS] Fix training metric logging Signed-off-by: Ryan * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../conf/audio_codec/audio_codec_24000.yaml | 170 ++++++++++++++++ .../{encodec.yaml => encodec_24000.yaml} | 12 +- .../tts/losses/audio_codec_loss.py | 186 +++++++++++++++++- nemo/collections/tts/models/audio_codec.py | 141 ++++++++----- .../tts/modules/encodec_modules.py | 35 ++-- nemo/collections/tts/parts/utils/helpers.py | 2 +- .../tts/losses/test_audio_codec_loss.py | 48 ++++- .../tts/modules/test_audio_codec_modules.py | 7 +- 8 files changed, 508 insertions(+), 93 deletions(-) create mode 100644 examples/tts/conf/audio_codec/audio_codec_24000.yaml rename examples/tts/conf/audio_codec/{encodec.yaml => encodec_24000.yaml} (94%) diff --git a/examples/tts/conf/audio_codec/audio_codec_24000.yaml b/examples/tts/conf/audio_codec/audio_codec_24000.yaml new file mode 100644 index 000000000000..14e6fe236545 --- /dev/null +++ b/examples/tts/conf/audio_codec/audio_codec_24000.yaml @@ -0,0 +1,170 @@ +# This config contains the default values for training 24khz audio codec model +# If you want to train model on other dataset, you can change config values according to your dataset. +# Most dataset-specific arguments are in the head of the config file, see below. + +name: EnCodec + +max_epochs: ??? +# Adjust batch size based on GPU memory +batch_size: 16 +# When doing weighted sampling with multiple manifests, this defines how many training steps are in an epoch. +# If null, then weighted sampling is disabled. +weighted_sampling_steps_per_epoch: null + +# Dataset metadata for each manifest +# https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/tts/data/vocoder_dataset.py#L39-L41 +train_ds_meta: ??? +val_ds_meta: ??? + +log_ds_meta: ??? +log_dir: ??? + +# Modify these values based on your sample rate +sample_rate: 24000 +train_n_samples: 24000 +down_sample_rates: [2, 4, 5, 8] +up_sample_rates: [8, 5, 4, 2] +# The number of samples per encoded audio frame. Should be the product of the down_sample_rates. +# For example 2 * 4 * 5 * 8 = 320. +samples_per_frame: 320 + +model: + + max_epochs: ${max_epochs} + steps_per_epoch: ${weighted_sampling_steps_per_epoch} + + sample_rate: ${sample_rate} + samples_per_frame: ${samples_per_frame} + + mel_loss_l1_scale: 15.0 + mel_loss_l2_scale: 0.0 + stft_loss_scale: 15.0 + time_domain_loss_scale: 0.0 + + # Probability of updating the discriminator during each training step + # For example, update the discriminator 2/3 times (2 updates for every 3 batches) + disc_updates_per_period: 2 + disc_update_period: 3 + + # All resolutions for reconstruction loss, ordered [num_fft, hop_length, window_length] + loss_resolutions: [ + [32, 8, 32], [64, 16, 64], [128, 32, 128], [256, 64, 256], [512, 128, 512], [1024, 256, 1024], [2048, 512, 2048] + ] + mel_loss_dims: [5, 10, 20, 40, 80, 160, 320] + mel_loss_log_guard: 1.0 + stft_loss_log_guard: 1.0 + + train_ds: + dataset: + _target_: nemo.collections.tts.data.vocoder_dataset.VocoderDataset + weighted_sampling_steps_per_epoch: ${weighted_sampling_steps_per_epoch} + sample_rate: ${sample_rate} + n_samples: ${train_n_samples} + min_duration: 1.01 + max_duration: null + dataset_meta: ${train_ds_meta} + + dataloader_params: + batch_size: ${batch_size} + drop_last: true + num_workers: 4 + + validation_ds: + dataset: + _target_: nemo.collections.tts.data.vocoder_dataset.VocoderDataset + sample_rate: ${sample_rate} + n_samples: null + min_duration: null + max_duration: null + trunc_duration: 10.0 # Only use the first 10 seconds of audio for computing validation loss + dataset_meta: ${val_ds_meta} + + dataloader_params: + batch_size: 8 + num_workers: 2 + + # Configures how audio samples are generated and saved during training. + # Remove this section to disable logging. + log_config: + log_dir: ${log_dir} + log_epochs: [10, 50] + epoch_frequency: 100 + log_tensorboard: false + log_wandb: false + + generators: + - _target_: nemo.collections.tts.parts.utils.callbacks.AudioCodecArtifactGenerator + log_audio: true + log_encoding: true + log_dequantized: true + + dataset: + _target_: nemo.collections.tts.data.vocoder_dataset.VocoderDataset + sample_rate: ${sample_rate} + n_samples: null + min_duration: null + max_duration: null + trunc_duration: 15.0 # Only log the first 15 seconds of generated audio. + dataset_meta: ${log_ds_meta} + + dataloader_params: + batch_size: 4 + num_workers: 2 + + audio_encoder: + _target_: nemo.collections.tts.modules.encodec_modules.HifiGanEncoder + down_sample_rates: ${down_sample_rates} + + audio_decoder: + _target_: nemo.collections.tts.modules.encodec_modules.SEANetDecoder + up_sample_rates: ${up_sample_rates} + + vector_quantizer: + _target_: nemo.collections.tts.modules.encodec_modules.ResidualVectorQuantizer + num_codebooks: 8 + + discriminator: + _target_: nemo.collections.tts.modules.encodec_modules.MultiResolutionDiscriminatorSTFT + resolutions: [[128, 32, 128], [256, 64, 256], [512, 128, 512], [1024, 256, 1024], [2048, 512, 2048]] + + # The original EnCodec uses hinged loss, but squared-GAN loss is more stable + # and reduces the need to tune the loss weights or use a gradient balancer. + generator_loss: + _target_: nemo.collections.tts.losses.audio_codec_loss.GeneratorSquaredLoss + + discriminator_loss: + _target_: nemo.collections.tts.losses.audio_codec_loss.DiscriminatorSquaredLoss + + optim: + _target_: torch.optim.Adam + lr: 3e-4 + betas: [0.5, 0.9] + + sched: + name: ExponentialLR + gamma: 0.998 + +trainer: + num_nodes: 1 + devices: 1 + accelerator: gpu + strategy: ddp_find_unused_parameters_true + precision: 32 # Vector quantization only works with 32-bit precision. + max_epochs: ${max_epochs} + accumulate_grad_batches: 1 + enable_checkpointing: False # Provided by exp_manager + logger: false # Provided by exp_manager + log_every_n_steps: 100 + check_val_every_n_epoch: 10 + benchmark: false + +exp_manager: + exp_dir: null + name: ${name} + create_tensorboard_logger: true + create_checkpoint_callback: true + create_wandb_logger: false + checkpoint_callback_params: + monitor: val_loss + resume_if_exists: false + resume_ignore_no_checkpoint: false diff --git a/examples/tts/conf/audio_codec/encodec.yaml b/examples/tts/conf/audio_codec/encodec_24000.yaml similarity index 94% rename from examples/tts/conf/audio_codec/encodec.yaml rename to examples/tts/conf/audio_codec/encodec_24000.yaml index a0f7a50c92dd..8caaea76294b 100644 --- a/examples/tts/conf/audio_codec/encodec.yaml +++ b/examples/tts/conf/audio_codec/encodec_24000.yaml @@ -36,17 +36,23 @@ model: sample_rate: ${sample_rate} samples_per_frame: ${samples_per_frame} - mel_loss_scale: 5.0 + mel_loss_l1_scale: 1.0 + mel_loss_l2_scale: 1.0 + stft_loss_scale: 0.0 time_domain_loss_scale: 0.1 + # Probability of updating the discriminator during each training step # For example, update the discriminator 2/3 times (2 updates for every 3 batches) disc_updates_per_period: 2 disc_update_period: 3 - # All resolutions for mel reconstruction loss, ordered [num_fft, hop_length, window_length] - mel_loss_resolutions: [ + # All resolutions for reconstruction loss, ordered [num_fft, hop_length, window_length] + loss_resolutions: [ [32, 8, 32], [64, 16, 64], [128, 32, 128], [256, 64, 256], [512, 128, 512], [1024, 256, 1024], [2048, 512, 2048] ] + mel_loss_dims: [64, 64, 64, 64, 64, 64, 64] + mel_loss_log_guard: 1E-5 + stft_loss_log_guard: 1.0 train_ds: dataset: diff --git a/nemo/collections/tts/losses/audio_codec_loss.py b/nemo/collections/tts/losses/audio_codec_loss.py index 8819282f07bd..5a36d0378371 100644 --- a/nemo/collections/tts/losses/audio_codec_loss.py +++ b/nemo/collections/tts/losses/audio_codec_loss.py @@ -19,6 +19,7 @@ from einops import rearrange from nemo.collections.asr.parts.preprocessing.features import FilterbankFeatures +from nemo.collections.tts.parts.utils.helpers import get_mask_from_lengths, mask_sequence_tensor from nemo.core.classes import Loss, typecheck from nemo.core.neural_types import ( AudioSignal, @@ -109,14 +110,25 @@ def forward(self, audio_real, audio_gen, audio_len): class MultiResolutionMelLoss(Loss): - def __init__(self, sample_rate: int, mel_dim: int, resolutions: List[List], l1_scale: float = 1.0): + """ + Multi-resolution log mel spectrogram loss. + + Args: + sample_rate: Sample rate of audio. + resolutions: List of resolutions, each being 3 integers ordered [num_fft, hop_length, window_length] + mel_dims: Dimension of mel spectrogram to compute for each resolution. Should be same length as 'resolutions'. + log_guard: Value to add to mel spectrogram to avoid taking log of 0. + """ + + def __init__(self, sample_rate: int, resolutions: List[List], mel_dims: List[int], log_guard: float = 1.0): super(MultiResolutionMelLoss, self).__init__() + assert len(resolutions) == len(mel_dims) - self.l1_loss_fn = MaskedMAELoss(loss_scale=l1_scale) + self.l1_loss_fn = MaskedMAELoss() self.l2_loss_fn = MaskedMSELoss() self.mel_features = torch.nn.ModuleList() - for n_fft, hop_len, win_len in resolutions: + for mel_dim, (n_fft, hop_len, win_len) in zip(mel_dims, resolutions): mel_feature = FilterbankFeatures( sample_rate=sample_rate, nfilt=mel_dim, @@ -126,7 +138,7 @@ def __init__(self, sample_rate: int, mel_dim: int, resolutions: List[List], l1_s pad_to=1, mag_power=1.0, log_zero_guard_type="add", - log_zero_guard_value=1.0, + log_zero_guard_value=log_guard, mel_norm=None, normalize=None, preemph=None, @@ -146,20 +158,176 @@ def input_types(self): @property def output_types(self): return { - "loss": NeuralType(elements_type=LossType()), + "l1_loss": NeuralType(elements_type=LossType()), + "l2_loss": NeuralType(elements_type=LossType()), } @typecheck() def forward(self, audio_real, audio_gen, audio_len): - loss = 0.0 + l1_loss = 0.0 + l2_loss = 0.0 for mel_feature in self.mel_features: mel_real, mel_real_len = mel_feature(x=audio_real, seq_len=audio_len) mel_gen, _ = mel_feature(x=audio_gen, seq_len=audio_len) - loss += self.l1_loss_fn(predicted=mel_gen, target=mel_real, target_len=mel_real_len) - loss += self.l2_loss_fn(predicted=mel_gen, target=mel_real, target_len=mel_real_len) + l1_loss += self.l1_loss_fn(predicted=mel_gen, target=mel_real, target_len=mel_real_len) + l2_loss += self.l2_loss_fn(predicted=mel_gen, target=mel_real, target_len=mel_real_len) + + l1_loss /= len(self.mel_features) + l2_loss /= len(self.mel_features) + + return l1_loss, l2_loss + + +class STFTLoss(Loss): + """ + Log magnitude STFT loss. + + Args: + resolution: Resolution of spectrogram, a list of 3 numbers ordered [num_fft, hop_length, window_length] + log_guard: Value to add to magnitude spectrogram to avoid taking log of 0. + sqrt_guard: Value to add to when computing absolute value of STFT to avoid NaN loss. + """ + + def __init__(self, resolution: List[int], log_guard: float = 1.0, sqrt_guard: float = 1e-5): + super(STFTLoss, self).__init__() + self.loss_fn = MaskedMAELoss() + self.n_fft, self.hop_length, self.win_length = resolution + self.register_buffer("window", torch.hann_window(self.win_length, periodic=False)) + self.log_guard = log_guard + self.sqrt_guard = sqrt_guard + + def _compute_spectrogram(self, audio, spec_len): + # [B, n_fft, T_spec] + spec = torch.stft( + audio, + n_fft=self.n_fft, + hop_length=self.hop_length, + win_length=self.win_length, + window=self.window, + return_complex=True, + ) + # [B, n_fft, T_spec, 2] + spec = torch.view_as_real(spec) + # [B, n_fft, T_spec] + spec_mag = torch.sqrt(spec.pow(2).sum(-1) + self.sqrt_guard) + spec_log = torch.log(spec_mag + self.log_guard) + spec_log = mask_sequence_tensor(spec_log, spec_len) + return spec_log + + @property + def input_types(self): + return { + "audio_real": NeuralType(('B', 'T'), AudioSignal()), + "audio_gen": NeuralType(('B', 'T'), AudioSignal()), + "audio_len": NeuralType(tuple('B'), LengthsType()), + } - loss /= len(self.mel_features) + @property + def output_types(self): + return {"loss": NeuralType(elements_type=LossType())} + + @typecheck() + def forward(self, audio_real, audio_gen, audio_len): + spec_len = (audio_len // self.hop_length) + 1 + spec_real = self._compute_spectrogram(audio=audio_real, spec_len=spec_len) + spec_gen = self._compute_spectrogram(audio=audio_gen, spec_len=spec_len) + loss = self.loss_fn(predicted=spec_gen, target=spec_real, target_len=spec_len) + return loss + + +class MultiResolutionSTFTLoss(Loss): + """ + Multi-resolution log magnitude STFT loss. + + Args: + resolutions: List of resolutions, each being 3 integers ordered [num_fft, hop_length, window_length] + log_guard: Value to add to magnitude spectrogram to avoid taking log of 0. + sqrt_guard: Value to add to when computing absolute value of STFT to avoid NaN loss. + """ + + def __init__(self, resolutions: List[List], log_guard: float = 1.0, sqrt_guard: float = 1e-5): + super(MultiResolutionSTFTLoss, self).__init__() + self.loss_fns = torch.nn.ModuleList( + [STFTLoss(resolution=resolution, log_guard=log_guard, sqrt_guard=sqrt_guard) for resolution in resolutions] + ) + + @property + def input_types(self): + return { + "audio_real": NeuralType(('B', 'T'), AudioSignal()), + "audio_gen": NeuralType(('B', 'T'), AudioSignal()), + "audio_len": NeuralType(tuple('B'), LengthsType()), + } + @property + def output_types(self): + return {"loss": NeuralType(elements_type=LossType())} + + @typecheck() + def forward(self, audio_real, audio_gen, audio_len): + loss = 0.0 + for loss_fn in self.loss_fns: + loss += loss_fn(audio_real=audio_real, audio_gen=audio_gen, audio_len=audio_len) + loss /= len(self.loss_fns) + return loss + + +class SISDRLoss(Loss): + """ + SI-SDR loss based off of torchmetrics.functional.audio.sdr.scale_invariant_signal_distortion_ratio + with added support for masking. + """ + + def __init__(self, epsilon: float = 1e-8): + super(SISDRLoss, self).__init__() + self.epsilon = epsilon + + @property + def input_types(self): + return { + "audio_real": NeuralType(('B', 'T'), AudioSignal()), + "audio_gen": NeuralType(('B', 'T'), AudioSignal()), + "audio_len": NeuralType(tuple('B'), LengthsType()), + } + + @property + def output_types(self): + return {"loss": NeuralType(elements_type=LossType())} + + @typecheck() + def forward(self, audio_real, audio_gen, audio_len): + mask = get_mask_from_lengths(x=audio_real, lengths=audio_len) + audio_len = rearrange(audio_len, 'B -> B 1') + + # Shift audio to have zero-mean + # [B, 1] + target_mean = torch.sum(audio_real, dim=-1, keepdim=True) / audio_len + pred_mean = torch.sum(audio_gen, dim=-1, keepdim=True) / audio_len + + # [B, T] + target = audio_real - target_mean + target = target * mask + pred = audio_gen - pred_mean + pred = pred * mask + + # [B, 1] + ref_pred = torch.sum(pred * target, dim=-1, keepdim=True) + ref_target = torch.sum(target ** 2, dim=-1, keepdim=True) + alpha = (ref_pred + self.epsilon) / (ref_target + self.epsilon) + + # [B, T] + target_scaled = alpha * target + distortion = target_scaled - pred + + # [B] + target_scaled_power = torch.sum(target_scaled ** 2, dim=-1) + distortion_power = torch.sum(distortion ** 2, dim=-1) + + ratio = (target_scaled_power + self.epsilon) / (distortion_power + self.epsilon) + si_sdr = 10 * torch.log10(ratio) + + # [1] + loss = -torch.mean(si_sdr) return loss diff --git a/nemo/collections/tts/models/audio_codec.py b/nemo/collections/tts/models/audio_codec.py index 63140b77f2b5..9b6675db5979 100644 --- a/nemo/collections/tts/models/audio_codec.py +++ b/nemo/collections/tts/models/audio_codec.py @@ -25,7 +25,9 @@ from nemo.collections.tts.losses.audio_codec_loss import ( MultiResolutionMelLoss, + MultiResolutionSTFTLoss, RelativeFeatureMatchingLoss, + SISDRLoss, TimeDomainLoss, ) from nemo.collections.tts.modules.common import GaussianDropout @@ -85,26 +87,43 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None): # Discriminator setup self.discriminator = instantiate(cfg.discriminator) - # Loss setup - mel_loss_dim = cfg.get("mel_loss_dim", 64) - mel_loss_resolutions = cfg.mel_loss_resolutions - self.time_domain_loss_scale = cfg.get("time_domain_loss_scale", 1.0) - self.mel_loss_scale = cfg.get("mel_loss_scale", 1.0) - mel_loss_l1_scale = cfg.get("mel_loss_l1_scale", 1.0) - self.gen_loss_scale = cfg.get("gen_loss_scale", 1.0) - self.feature_loss_scale = cfg.get("feature_loss_scale", 1.0) - - self.time_domain_loss_fn = TimeDomainLoss() + # Mel loss setup + loss_resolutions = cfg.loss_resolutions + mel_loss_dims = cfg.get("mel_loss_dims") + mel_loss_log_guard = cfg.get("mel_loss_log_guard", 1.0) + self.mel_loss_l1_scale = cfg.get("mel_loss_l1_scale", 1.0) + self.mel_loss_l2_scale = cfg.get("mel_loss_l2_scale", 1.0) self.mel_loss_fn = MultiResolutionMelLoss( sample_rate=self.sample_rate, - mel_dim=mel_loss_dim, - resolutions=mel_loss_resolutions, - l1_scale=mel_loss_l1_scale, + mel_dims=mel_loss_dims, + resolutions=loss_resolutions, + log_guard=mel_loss_log_guard, ) + + # STFT loss setup + stft_loss_log_guard = cfg.get("stft_loss_log_guard", 1.0) + self.stft_loss_scale = cfg.get("stft_loss_scale", 0.0) + self.stft_loss_fn = MultiResolutionSTFTLoss(resolutions=loss_resolutions, log_guard=stft_loss_log_guard,) + + # Time domain loss setup + self.time_domain_loss_scale = cfg.get("time_domain_loss_scale", 1.0) + self.si_sdr_loss_scale = cfg.get("si_sdr_loss_scale", 0.0) + self.time_domain_loss_fn = TimeDomainLoss() + self.si_sdr_loss_fn = SISDRLoss() + + # Discriminator loss setup + self.gen_loss_scale = cfg.get("gen_loss_scale", 1.0) + self.feature_loss_scale = cfg.get("feature_loss_scale", 1.0) self.gen_loss_fn = instantiate(cfg.generator_loss) self.disc_loss_fn = instantiate(cfg.discriminator_loss) self.feature_loss_fn = RelativeFeatureMatchingLoss() + # Codebook loss setup + if self.vector_quantizer: + self.commit_loss_scale = cfg.get("commit_loss_scale", 1.0) + else: + self.commit_loss_scale = 0.0 + # Log setup self.log_config = cfg.get("log_config", None) @@ -336,10 +355,10 @@ def _process_batch(self, batch): if self.vector_quantizer: encoded, _, commit_loss = self.vector_quantizer(inputs=encoded, input_len=encoded_len) else: - commit_loss = None + commit_loss = 0.0 # [B, T] - audio_gen, audio_gen_len = self.audio_decoder(inputs=encoded, input_len=encoded_len) + audio_gen, _ = self.audio_decoder(inputs=encoded, input_len=encoded_len) return audio, audio_len, audio_gen, commit_loss @@ -361,37 +380,65 @@ def training_step(self, batch, batch_idx): audio, audio_len, audio_gen, commit_loss = self._process_batch(batch) + metrics = { + "global_step": self.global_step, + "lr": optim_gen.param_groups[0]['lr'], + } + if self.should_update_disc(batch_idx): # Train discriminator disc_scores_real, disc_scores_gen, _, _ = self.discriminator( audio_real=audio, audio_gen=audio_gen.detach() ) loss_disc = self.disc_loss_fn(disc_scores_real=disc_scores_real, disc_scores_gen=disc_scores_gen) - train_disc_loss = loss_disc + metrics["d_loss"] = loss_disc optim_disc.zero_grad() - self.manual_backward(train_disc_loss) + self.manual_backward(loss_disc) optim_disc.step() - else: - loss_disc = None - loss_time_domain = self.time_domain_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) - train_loss_time_domain = self.time_domain_loss_scale * loss_time_domain + generator_losses = [] + + loss_mel_l1, loss_mel_l2 = self.mel_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + if self.mel_loss_l1_scale: + metrics["g_loss_mel_l1"] = loss_mel_l1 + generator_losses.append(self.mel_loss_l1_scale * loss_mel_l1) + if self.mel_loss_l2_scale: + metrics["g_loss_mel_l2"] = loss_mel_l2 + generator_losses.append(self.mel_loss_l2_scale * loss_mel_l2) + + if self.stft_loss_scale: + loss_stft = self.stft_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + metrics["g_loss_stft"] = loss_stft + generator_losses.append(self.stft_loss_scale * loss_stft) - loss_mel = self.mel_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) - train_loss_mel = self.mel_loss_scale * loss_mel + if self.time_domain_loss_scale: + loss_time_domain = self.time_domain_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + metrics["g_loss_time_domain"] = loss_time_domain + generator_losses.append(self.time_domain_loss_scale * loss_time_domain) + + if self.si_sdr_loss_scale: + loss_si_sdr = self.si_sdr_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + metrics["g_loss_si_sdr"] = loss_si_sdr + generator_losses.append(self.si_sdr_loss_scale * loss_si_sdr) _, disc_scores_gen, fmaps_real, fmaps_gen = self.discriminator(audio_real=audio, audio_gen=audio_gen) - loss_gen = self.gen_loss_fn(disc_scores_gen=disc_scores_gen) - train_loss_gen = self.gen_loss_scale * loss_gen + if self.gen_loss_scale: + loss_gen = self.gen_loss_fn(disc_scores_gen=disc_scores_gen) + metrics["g_loss_gen"] = loss_gen + generator_losses.append(self.gen_loss_scale * loss_gen) + + if self.feature_loss_scale: + loss_feature = self.feature_loss_fn(fmaps_real=fmaps_real, fmaps_gen=fmaps_gen) + metrics["g_loss_feature"] = loss_feature + generator_losses.append(self.feature_loss_scale * loss_feature) - loss_feature = self.feature_loss_fn(fmaps_real=fmaps_real, fmaps_gen=fmaps_gen) - train_loss_feature = self.feature_loss_scale * loss_feature + if self.commit_loss_scale: + metrics["g_loss_commit"] = commit_loss + generator_losses.append(self.commit_loss_scale * commit_loss) - loss_gen_all = train_loss_time_domain + train_loss_mel + train_loss_gen + train_loss_feature - if commit_loss is not None: - loss_gen_all += commit_loss + loss_gen_all = sum(generator_losses) optim_gen.zero_grad() self.manual_backward(loss_gen_all) @@ -399,36 +446,30 @@ def training_step(self, batch, batch_idx): self.update_lr() - metrics = { - "g_loss_time_domain": loss_time_domain, - "g_loss_mel": loss_mel, - "g_loss_gen": loss_gen, - "g_loss_feature": loss_feature, - "g_loss": loss_gen_all, - "global_step": self.global_step, - "lr": optim_gen.param_groups[0]['lr'], - } - - if loss_disc is not None: - metrics["d_loss"] = loss_disc - - if commit_loss is not None: - metrics["g_loss_commit"] = commit_loss - self.log_dict(metrics, on_step=True, sync_dist=True) - self.log("t_loss", train_loss_mel, prog_bar=True, logger=False, sync_dist=True) + self.log("t_loss", loss_mel_l1, prog_bar=True, logger=False, sync_dist=True) def on_train_epoch_end(self): self.update_lr("epoch") def validation_step(self, batch, batch_idx): audio, audio_len, audio_gen, _ = self._process_batch(batch) + + loss_mel_l1, loss_mel_l2 = self.mel_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + loss_stft = self.stft_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) loss_time_domain = self.time_domain_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) - loss_mel = self.mel_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + loss_si_sdr = self.si_sdr_loss_fn(audio_real=audio, audio_gen=audio_gen, audio_len=audio_len) + + # Use only main reconstruction losses for val_loss + val_loss = loss_mel_l1 + loss_stft + loss_time_domain + metrics = { - "val_loss": loss_time_domain + loss_mel, + "val_loss": val_loss, + "val_loss_mel_l1": loss_mel_l1, + "val_loss_mel_l2": loss_mel_l2, + "val_loss_stft": loss_stft, "val_loss_time_domain": loss_time_domain, - "val_loss_mel": loss_mel, + "val_loss_si_sdr": loss_si_sdr, } self.log_dict(metrics, on_epoch=True, sync_dist=True) diff --git a/nemo/collections/tts/modules/encodec_modules.py b/nemo/collections/tts/modules/encodec_modules.py index b05187ccb74b..8c424351ce35 100644 --- a/nemo/collections/tts/modules/encodec_modules.py +++ b/nemo/collections/tts/modules/encodec_modules.py @@ -198,6 +198,7 @@ def output_types(self): def remove_weight_norm(self): self.pre_conv.remove_weight_norm() + self.post_conv.remove_weight_norm() for res_block in self.res_blocks: res_block.remove_weight_norm() for down_sample_conv in self.down_sample_conv_layers: @@ -544,8 +545,8 @@ def __init__( codebook_size: int, codebook_dim: int, decay: float = 0.99, - threshold_ema_dead_code: Optional[int] = 2, - kmeans_iters: Optional[int] = None, + threshold_ema_dead_code: Optional[float] = 2.0, + kmeans_iters: Optional[int] = 50, ): super().__init__() self.decay = decay @@ -686,7 +687,6 @@ class ResidualVectorQuantizer(NeuralModule): Args: num_codebooks: Number of codebooks to use. - commit_loss_scale: Loss scale for codebook commit loss. codebook_size: Number of codes to use for each codebook. codebook_dim: Dimension of each code. decay: Decay for exponential moving average over the codebooks. @@ -700,20 +700,15 @@ class ResidualVectorQuantizer(NeuralModule): def __init__( self, num_codebooks: int, - commit_loss_scale: float = 1.0, codebook_size: int = 1024, codebook_dim: int = 128, decay: float = 0.99, - threshold_ema_dead_code: Optional[int] = 2, + threshold_ema_dead_code: Optional[float] = 2.0, kmeans_iters: Optional[int] = 50, ): super().__init__() self.codebook_dim = codebook_dim - - if commit_loss_scale: - self.commit_loss_fn = MaskedMSELoss(loss_scale=commit_loss_scale) - else: - self.commit_loss_fn = None + self.commit_loss_fn = MaskedMSELoss() self.codebooks = nn.ModuleList( [ @@ -728,16 +723,6 @@ def __init__( ] ) - def _commit_loss(self, input, target, input_len): - if not self.commit_loss_fn: - return 0.0 - - return self.commit_loss_fn( - predicted=rearrange(input, "B T D -> B D T"), - target=rearrange(target, "B T D -> B D T"), - target_len=input_len, - ) - @property def input_types(self): return { @@ -764,13 +749,17 @@ def forward(self, inputs: Tensor, input_len: Tensor) -> Tuple[Tensor, Tensor, fl dequantized_i, indices_i = codebook(inputs=residual, input_len=input_len) if self.training: - dequantized_i = residual + (dequantized_i - residual).detach() dequantized_i_const = dequantized_i.detach() - commit_loss_i = self._commit_loss(input=residual, target=dequantized_i_const, input_len=input_len) + + commit_loss_i = self.commit_loss_fn( + predicted=rearrange(residual, "B T D -> B D T"), + target=rearrange(dequantized_i_const, "B T D -> B D T"), + target_len=input_len, + ) commit_loss = commit_loss + commit_loss_i residual = residual - dequantized_i_const - + dequantized_i = residual + (dequantized_i - residual).detach() else: residual = residual - dequantized_i diff --git a/nemo/collections/tts/parts/utils/helpers.py b/nemo/collections/tts/parts/utils/helpers.py index 72048882fe78..08d31390107b 100644 --- a/nemo/collections/tts/parts/utils/helpers.py +++ b/nemo/collections/tts/parts/utils/helpers.py @@ -141,7 +141,7 @@ def get_mask_from_lengths(lengths: Optional[torch.Tensor] = None, x: Optional[to lengths: Optional[torch.tensor] (torch.tensor): 1D tensor with lengths x: Optional[torch.tensor] = tensor to be used on, last dimension is for mask Returns: - mask (torch.tensor): num_sequences x max_length x 1 binary tensor + mask (torch.tensor): num_sequences x max_length binary tensor """ if lengths is None: assert x is not None diff --git a/tests/collections/tts/losses/test_audio_codec_loss.py b/tests/collections/tts/losses/test_audio_codec_loss.py index 0fe7991e92cb..60ea8d293655 100644 --- a/tests/collections/tts/losses/test_audio_codec_loss.py +++ b/tests/collections/tts/losses/test_audio_codec_loss.py @@ -14,8 +14,10 @@ import pytest import torch +from torchmetrics import ScaleInvariantSignalDistortionRatio -from nemo.collections.tts.losses.audio_codec_loss import MaskedMAELoss, MaskedMSELoss +from nemo.collections.tts.losses.audio_codec_loss import MaskedMAELoss, MaskedMSELoss, SISDRLoss +from nemo.collections.tts.parts.utils.helpers import mask_sequence_tensor class TestAudioCodecLoss: @@ -42,3 +44,47 @@ def test_masked_loss_l2(self): loss = loss_fn(predicted=predicted, target=target, target_len=target_len) assert loss == (4 / 3) + + @pytest.mark.run_only_on('CPU') + @pytest.mark.unit + def test_si_sdr_loss(self): + loss_fn = SISDRLoss() + sdr_fn = ScaleInvariantSignalDistortionRatio(zero_mean=True) + + num_samples = 1000 + torch.manual_seed(100) + target = torch.rand([1, num_samples]) + predicted = torch.rand([1, num_samples]) + target_len = torch.tensor([num_samples, num_samples]) + + torch_si_sdr = sdr_fn(preds=predicted, target=target) + loss = loss_fn(audio_real=target, audio_gen=predicted, audio_len=target_len) + si_sdr = -loss + + torch.testing.assert_close(actual=si_sdr, expected=torch_si_sdr) + + @pytest.mark.run_only_on('CPU') + @pytest.mark.unit + def test_si_sdr_loss_batch(self): + loss_fn = SISDRLoss() + si_sdr_fn = ScaleInvariantSignalDistortionRatio(zero_mean=True) + + batch_size = 3 + num_samples = 1000 + torch.manual_seed(100) + target = torch.rand([batch_size, num_samples]) + predicted = torch.rand([batch_size, num_samples]) + + target_len = torch.tensor([500, 250, 900]) + target = mask_sequence_tensor(target, lengths=target_len) + predicted = mask_sequence_tensor(predicted, lengths=target_len) + + torch_si_sdr = 0.0 + for i in range(batch_size): + torch_si_sdr += si_sdr_fn(preds=predicted[i, : target_len[i]], target=target[i, : target_len[i]]) + torch_si_sdr /= batch_size + + loss = loss_fn(audio_real=target, audio_gen=predicted, audio_len=target_len) + si_sdr = -loss + + torch.testing.assert_close(actual=si_sdr, expected=torch_si_sdr) diff --git a/tests/collections/tts/modules/test_audio_codec_modules.py b/tests/collections/tts/modules/test_audio_codec_modules.py index 4650a6508edd..b48b415547fe 100644 --- a/tests/collections/tts/modules/test_audio_codec_modules.py +++ b/tests/collections/tts/modules/test_audio_codec_modules.py @@ -15,12 +15,7 @@ import pytest import torch -from nemo.collections.tts.modules.audio_codec_modules import ( - Conv1dNorm, - ConvTranspose1dNorm, - get_down_sample_padding, - get_up_sample_padding, -) +from nemo.collections.tts.modules.audio_codec_modules import Conv1dNorm, ConvTranspose1dNorm, get_down_sample_padding class TestAudioCodecModules: From 03b284631541cda75496b127b3be7c1ecdecc149 Mon Sep 17 00:00:00 2001 From: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Date: Sat, 7 Oct 2023 05:51:11 +0400 Subject: [PATCH 107/112] Create per.py (#7538) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover * remove copy from other models Signed-off-by: Maanu Grover * modify attribute not arg Signed-off-by: Maanu Grover * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover * rename function and add docstring Signed-off-by: Maanu Grover * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover * unnecessary function and cfg reset Signed-off-by: Maanu Grover * set default value Signed-off-by: Maanu Grover * fix precision lookup in a few more places Signed-off-by: Maanu Grover * rename mapping function Signed-off-by: Maanu Grover * ununsed import Signed-off-by: Maanu Grover * save torch datatype to model Signed-off-by: Maanu Grover * set weights precision wrt amp o2 Signed-off-by: Maanu Grover * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover * revert half precision at inference attempt Signed-off-by: Maanu Grover * move autocast dtype to base model Signed-off-by: Maanu Grover * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover * unused imports Signed-off-by: Maanu Grover --------- Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. Signed-off-by: Tim Moon * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon * Fix typo Signed-off-by: Tim Moon * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon * Add distopt logic for int dtypes Signed-off-by: Tim Moon * Update Apex commit Signed-off-by: Tim Moon * Remove unused variables Signed-off-by: Tim Moon * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon --------- Signed-off-by: Tim Moon Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Sasha Meister * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang * Update Jenkinsfile Signed-off-by: Jason Wang * remove fast_swiglu configuration Signed-off-by: Jason Wang --------- Signed-off-by: Jason Wang Co-authored-by: Eric Harper Signed-off-by: Sasha Meister * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić Signed-off-by: Sasha Meister * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar * update commit Signed-off-by: Abhinav Khattar --------- Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover * move precision copy before super constructor Signed-off-by: Maanu Grover * use trainer arg Signed-off-by: Maanu Grover --------- Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar * Add support for auto extracting tokenizer model Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar * Fix issue with missing tokenizer Signed-off-by: smajumdar * Refactor Signed-off-by: smajumdar * Refactor Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper * move dist ckpt Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper * when using mcore we can change tp pp on the fly Signed-off-by: eharper * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper * fix load dist ckpt Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper * remove import Signed-off-by: eharper --------- Signed-off-by: eharper Signed-off-by: jasonwan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan Signed-off-by: Sasha Meister * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang Co-authored-by: Jimmy Zhang Signed-off-by: Sasha Meister * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * make loss mask default to false (#7407) Signed-off-by: eharper Signed-off-by: Sasha Meister * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym Signed-off-by: Sasha Meister * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd Signed-off-by: Sasha Meister * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov Signed-off-by: Sasha Meister * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov Signed-off-by: Sasha Meister * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong Signed-off-by: Sasha Meister * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong Signed-off-by: Sasha Meister * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell Signed-off-by: Sasha Meister * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang Signed-off-by: Sasha Meister * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev * tests added Signed-off-by: Aleksandr Laptev --------- Signed-off-by: Aleksandr Laptev Signed-off-by: Sasha Meister * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Sasha Meister * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong * Update tacotron2.py Signed-off-by: Jason --------- Signed-off-by: Robin Dong Signed-off-by: Jason Co-authored-by: Jason Signed-off-by: Sasha Meister * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan * [TTS] Fix audio codec tests Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi * transpose conv1d inputs Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi * add collection classes Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * clean references Signed-off-by: mburchi * freeze unfreeze transcribe cv models Signed-off-by: mburchi * correct manifest get_full_path bug Signed-off-by: mburchi * update for PR Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi * add self.out = None to asr subsampling Signed-off-by: mburchi * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman Signed-off-by: Sasha Meister * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek * StarCoder conversion test Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek * Fix test Signed-off-by: Jan Lasek * Catch up with save_to changes Signed-off-by: Jan Lasek * Don't abbreviate args for clarity Signed-off-by: Jan Lasek * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu Co-authored-by: Hongbin Liu Signed-off-by: Sasha Meister * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Sasha Meister * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman Signed-off-by: Sasha Meister * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan * [TTS] Add comment about read failures Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman Signed-off-by: Sasha Meister * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Sasha Meister * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * fix (#7511) Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan Signed-off-by: Sasha Meister * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria * add test Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sasha Meister * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and "Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Create punctuation_rates.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Format fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister * fix typo Signed-off-by: Sasha Meister * added function for import and call, docstrings Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu * mark autogenrated and remove it for test Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan Co-authored-by: Kunal Dhawan Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu * list of fields for context Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Fix bug Signed-off-by: Cheng-Ping Hsieh * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh * Remove unused import Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Change assert logic Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh * Remove random truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui * code style warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui * add copyright header Signed-off-by: Chen Cui * fix code check warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * consolidate peft and sft scripts Signed-off-by: Chen Cui * update CI tests Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui * support pre-extracted checkpoints Signed-off-by: Chen Cui --------- Signed-off-by: jasonwan Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen Cui Co-authored-by: Chen Cui Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn Co-authored-by: jasonwan Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala Co-authored-by: Yuanzhe Dong Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * fix a typo (#7496) Signed-off-by: BestJuly Signed-off-by: Sasha Meister * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan Signed-off-by: smajumdar Co-authored-by: Kunal Dhawan Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Sasha Meister * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * Fix typos (#7581) Signed-off-by: Sasha Meister * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * added per tests Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek * Test with pyt:23.09 container Signed-off-by: Jan Lasek --------- Signed-off-by: Jan Lasek Signed-off-by: Sasha Meister * defaults changed (#7600) * defaults changed Signed-off-by: arendu * typo Signed-off-by: arendu * update Signed-off-by: arendu --------- Signed-off-by: arendu Signed-off-by: Sasha Meister * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Fix tests Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Function name fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Moving PER to speech_to_text_eval.py Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Added use_per description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * guard extra dependencies Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * scores_per_sample description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix import guards Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree Signed-off-by: Sasha Meister * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui * Update peft_config.py brackets Signed-off-by: Adi Renduchintala --------- Signed-off-by: Chen Cui Signed-off-by: Adi Renduchintala Co-authored-by: Adi Renduchintala Signed-off-by: Sasha Meister * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon Signed-off-by: Sasha Meister * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Added use_per description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * metric and variables name fixing Signed-off-by: Sasha Meister * Add else samples = None Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Koluguri Signed-off-by: Sasha Meister * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Co-authored-by: Eric Harper Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Sasha Meister * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar * Guard MeCab and Ipadic Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar * Fix remaining ASR dataclasses Signed-off-by: smajumdar * Fix scripts Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Signed-off-by: Sangkug Lym Signed-off-by: Jason Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym Co-authored-by: Jason Signed-off-by: Sasha Meister * moved per sample metrics computing to transcribe_utils Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Moved punctuation rates printing to punct_er Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Added reset for DatasetPunctuationErrorRate class Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Added compute_metrics_per_sample description Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update megatron_gpt_peft_models.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update speech_to_text_eval.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> * Copyright year fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> * "& AFFILIATES" added Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> --------- Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister Signed-off-by: Jason Wang Signed-off-by: Tim Moon Signed-off-by: Robin Dong Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Ante Jukić Signed-off-by: Abhinav Khattar Signed-off-by: smajumdar Signed-off-by: eharper Signed-off-by: jasonwan Signed-off-by: Jimmy Zhang Signed-off-by: Abhishree Signed-off-by: Sangkug Lym Signed-off-by: George Zelenfroynd Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Anton Peganov Signed-off-by: Nikolay Karpov Signed-off-by: arendu Signed-off-by: Samuele Cornell Signed-off-by: Cheng-Ping Hsieh Signed-off-by: KunalDhawan Signed-off-by: Aleksandr Laptev Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason Signed-off-by: Ryan Signed-off-by: mburchi Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Signed-off-by: Jan Lasek Signed-off-by: Hongbin Liu Signed-off-by: Tamerlan Tabolov Signed-off-by: Gerald Shen Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Stas Bekman Signed-off-by: Jocelyn Huang Signed-off-by: GiacomoLeoneMaria Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Nithin Rao Koluguri Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen Cui Signed-off-by: BestJuly Signed-off-by: Elena Rastorgueva Signed-off-by: dimapihtar Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: Mehadi Hasan Menon Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: Eric Harper Co-authored-by: Robin Dong Co-authored-by: Somshubra Majumdar Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Co-authored-by: Abhinav Khattar Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Sangkug Lym Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: PeganovAnton Co-authored-by: Nikolay Karpov Co-authored-by: Adi Renduchintala Co-authored-by: Samuele Cornell Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: Yang Zhang Co-authored-by: Kunal Dhawan Co-authored-by: Aleksandr Laptev Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jason Co-authored-by: Ryan Langman Co-authored-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: Igor Gitman Co-authored-by: Jan Lasek Co-authored-by: Kelvin Liu Co-authored-by: Hongbin Liu Co-authored-by: Tamerlan Tabolov Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Stas Bekman Co-authored-by: Jocelyn Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Co-authored-by: Nithin Rao Co-authored-by: meatybobby Co-authored-by: Chen Cui Co-authored-by: Marc Romeyn Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Yuanzhe Dong Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Li Tao Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Igor Gitman Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Mehadi Hasan Menon Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Sasha Meister --- examples/asr/speech_to_text_eval.py | 44 +- .../asr/parts/utils/transcribe_utils.py | 92 ++++ nemo/collections/common/metrics/punct_er.py | 473 ++++++++++++++++++ tests/collections/common/test_metrics.py | 73 +++ 4 files changed, 680 insertions(+), 2 deletions(-) create mode 100644 nemo/collections/common/metrics/punct_er.py diff --git a/examples/asr/speech_to_text_eval.py b/examples/asr/speech_to_text_eval.py index 452aa8202660..9e24f0172208 100644 --- a/examples/asr/speech_to_text_eval.py +++ b/examples/asr/speech_to_text_eval.py @@ -25,12 +25,18 @@ for full list of arguments >> dataset_manifest: Required - path to dataset JSON manifest file (in NeMo format) - output_filename: Optional - output filename where the transcriptions will be written. + output_filename: Optional - output filename where the transcriptions will be written. (if scores_per_sample=True, + metrics per sample will be written there too) use_cer: Bool, whether to compute CER or WER + use_punct_er: Bool, compute dataset Punctuation Error Rate (set the punctuation marks for metrics computation with + "text_processing.punctuation_marks") + tolerance: Float, minimum WER/CER required to pass some arbitrary tolerance. only_score_manifest: Bool, when set will skip audio transcription and just calculate WER of provided manifest. + scores_per_sample: Bool, compute metrics for each sample separately (if only_score_manifest=True, scores per sample + will be added to the manifest at the dataset_manifest path) # Usage @@ -66,7 +72,12 @@ from omegaconf import MISSING, OmegaConf, open_dict from nemo.collections.asr.metrics.wer import word_error_rate -from nemo.collections.asr.parts.utils.transcribe_utils import PunctuationCapitalization, TextProcessingConfig +from nemo.collections.asr.parts.utils.transcribe_utils import ( + PunctuationCapitalization, + TextProcessingConfig, + compute_metrics_per_sample, +) +from nemo.collections.common.metrics.punct_er import DatasetPunctuationErrorRate from nemo.core.config import hydra_runner from nemo.utils import logging @@ -82,9 +93,11 @@ class EvaluationConfig(transcribe_speech.TranscriptionConfig): att_context_size: Optional[list] = None use_cer: bool = False + use_punct_er: bool = False tolerance: Optional[float] = None only_score_manifest: bool = False + scores_per_sample: bool = False text_processing: Optional[TextProcessingConfig] = TextProcessingConfig( punctuation_marks=".,?", separate_punctuation=False, do_lowercase=False, rm_punctuation=False, @@ -154,6 +167,29 @@ def main(cfg: EvaluationConfig): f"contain value for `pred_text`." ) + if cfg.use_punct_er: + dper_obj = DatasetPunctuationErrorRate( + hypotheses=predicted_text, + references=ground_truth_text, + punctuation_marks=list(cfg.text_processing.punctuation_marks), + ) + dper_obj.compute() + + if cfg.scores_per_sample: + metrics_to_compute = ["wer", "cer"] + + if cfg.use_punct_er: + metrics_to_compute.append("punct_er") + + samples_with_metrics = compute_metrics_per_sample( + manifest_path=cfg.dataset_manifest, + reference_field="text", + hypothesis_field="pred_text", + metrics=metrics_to_compute, + punctuation_marks=cfg.text_processing.punctuation_marks, + output_manifest_path=cfg.output_filename, + ) + # Compute the WER cer = word_error_rate(hypotheses=predicted_text, references=ground_truth_text, use_cer=True) wer = word_error_rate(hypotheses=predicted_text, references=ground_truth_text, use_cer=False) @@ -173,6 +209,10 @@ def main(cfg: EvaluationConfig): logging.info(f'Dataset WER/CER ' + str(round(100 * wer, 2)) + "%/" + str(round(100 * cer, 2)) + "%") + if cfg.use_punct_er: + dper_obj.print() + dper_obj.reset() + # Inject the metric name and score into the config, and return the entire config with open_dict(cfg): cfg.metric_name = metric_name diff --git a/nemo/collections/asr/parts/utils/transcribe_utils.py b/nemo/collections/asr/parts/utils/transcribe_utils.py index f4508709b17c..8d80396dd82e 100644 --- a/nemo/collections/asr/parts/utils/transcribe_utils.py +++ b/nemo/collections/asr/parts/utils/transcribe_utils.py @@ -23,9 +23,11 @@ from tqdm.auto import tqdm import nemo.collections.asr as nemo_asr +from nemo.collections.asr.metrics.wer import word_error_rate from nemo.collections.asr.models import ASRModel, EncDecHybridRNNTCTCModel from nemo.collections.asr.parts.utils import rnnt_utils from nemo.collections.asr.parts.utils.streaming_utils import FrameBatchASR +from nemo.collections.common.metrics.punct_er import OccurancePunctuationErrorRate from nemo.collections.common.parts.preprocessing.manifest import get_full_path from nemo.utils import logging, model_utils @@ -472,6 +474,96 @@ def transcribe_partial_audio( return hypotheses +def compute_metrics_per_sample( + manifest_path: str, + reference_field: str = "text", + hypothesis_field: str = "pred_text", + metrics: list[str] = ["wer"], + punctuation_marks: list[str] = [".", ",", "?"], + output_manifest_path: str = None, +) -> dict: + + ''' + Computes metrics per sample for given manifest + + Args: + manifest_path: str, Required - path to dataset JSON manifest file (in NeMo format) + reference_field: str, Optional - name of field in .json manifest with the reference text ("text" by default). + hypothesis_field: str, Optional - name of field in .json manifest with the hypothesis text ("pred_text" by default). + metrics: list[str], Optional - list of metrics to be computed (currently supported "wer", "cer", "punct_er") + punctuation_marks: list[str], Optional - list of punctuation marks for computing punctuation error rate ([".", ",", "?"] by default). + output_manifest_path: str, Optional - path where .json manifest with calculated metrics will be saved. + + Returns: + samples: dict - Dict of samples with calculated metrics + ''' + + supported_metrics = ["wer", "cer", "punct_er"] + + if len(metrics) == 0: + raise AssertionError( + f"'metrics' list is empty. \ + Select the metrics from the supported: {supported_metrics}." + ) + + for metric in metrics: + if metric not in supported_metrics: + raise AssertionError( + f"'{metric}' metric is not supported. \ + Currently supported metrics are {supported_metrics}." + ) + + if "punct_er" in metrics: + if len(punctuation_marks) == 0: + raise AssertionError("punctuation_marks list can't be empty when 'punct_er' metric is enabled.") + else: + oper_obj = OccurancePunctuationErrorRate(punctuation_marks=punctuation_marks) + + use_wer = "wer" in metrics + use_cer = "cer" in metrics + use_punct_er = "punct_er" in metrics + + with open(manifest_path, 'r') as manifest: + lines = manifest.readlines() + samples = [json.loads(line) for line in lines] + samples_with_metrics = [] + + logging.info(f"Computing {', '.join(metrics)} per sample") + + for sample in tqdm(samples): + reference = sample[reference_field] + hypothesis = sample[hypothesis_field] + + if use_wer: + sample_wer = word_error_rate(hypotheses=[hypothesis], references=[reference], use_cer=False) + sample["wer"] = round(100 * sample_wer, 2) + + if use_cer: + sample_cer = word_error_rate(hypotheses=[hypothesis], references=[reference], use_cer=True) + sample["cer"] = round(100 * sample_cer, 2) + + if use_punct_er: + operation_amounts, substitution_amounts, punctuation_rates = oper_obj.compute( + reference=reference, hypothesis=hypothesis + ) + sample["punct_correct_rate"] = round(100 * punctuation_rates.correct_rate, 2) + sample["punct_deletions_rate"] = round(100 * punctuation_rates.deletions_rate, 2) + sample["punct_insertions_rate"] = round(100 * punctuation_rates.insertions_rate, 2) + sample["punct_substitutions_rate"] = round(100 * punctuation_rates.substitutions_rate, 2) + sample["punct_error_rate"] = round(100 * punctuation_rates.punct_er, 2) + + samples_with_metrics.append(sample) + + if output_manifest_path is not None: + with open(output_manifest_path, 'w') as output: + for sample in samples_with_metrics: + line = json.dumps(sample) + output.writelines(f'{line}\n') + logging.info(f'Output manifest saved: {output_manifest_path}') + + return samples_with_metrics + + class PunctuationCapitalization: def __init__(self, punctuation_marks: str): """ diff --git a/nemo/collections/common/metrics/punct_er.py b/nemo/collections/common/metrics/punct_er.py new file mode 100644 index 000000000000..933c1581f016 --- /dev/null +++ b/nemo/collections/common/metrics/punct_er.py @@ -0,0 +1,473 @@ +# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import re +from collections import namedtuple +from tqdm import tqdm + +from nemo.utils import logging + +try: + import pandas as pd + from tabulate import tabulate + + HAVE_TABLUATE_AND_PANDAS = True +except (ImportError, ModuleNotFoundError): + HAVE_TABLUATE_AND_PANDAS = False + + +def punctuation_error_rate( + references: list[str], hypotheses: list[str], punctuation_marks: list[str], punctuation_mask: str = "[PUNCT]", +) -> None: + + """ + Computes Punctuation Error Rate + + Args: + references (list[str]) - list of references + hypotheses (list[str]) - list of hypotheses + punctuation_marks (list[str]) - list of punctuation marks for computing metrics + punctuation_mask (str, by default "[PUNCT]") - mask token that will be applied to + given punctuation marks while edit distance calculation + + Return: + punct_er (float) - Punctuation Error Rate + """ + + dper_obj = DatasetPunctuationErrorRate( + references=references, + hypotheses=hypotheses, + punctuation_marks=punctuation_marks, + punctuation_mask=punctuation_mask, + ) + + dper_obj.compute() + + return dper_obj.punct_er + + +class OccurancePunctuationErrorRate: + """ + Class for computation puncutation-related absolute amounts of operations and thier rates + between reference and hypothesis strings: + - Absolute amounts of correct predictions, deletions, insertions + and substitutions for each given punctuation mark + - Rates of correct predictions, deletions, insertions + and substitutions for each given punctuation mark + - Overall rates of correct predictions, deletions, insertions + and substiturions between reference and hypothesis string + - Punctuation Error Rate + + Args to init: + punctuation_marks (list[str]) - list of punctuation marks for computing metrics + punctuation_mask (str, by default "[PUNCT]") - mask token that will be applied to + given punctuation marks while edit distance calculation + + How to use: + 1. Create object of OccurancePunctuationErrorRate class. + Example: + punctuation_marks = [".", ",", "!", "?"] + oper_obj = OccurancePunctuationErrorRate(punctuation_marks) + + 2. To compute punctuation metrics, pass reference and hypothesis string to the "compute" method + of created object. + Example: + reference_str = "Hi, dear! Nice to see you. What's" + hypothesis_str = "Hi dear! Nice to see you! What's?" + oper_obj.compute(reference_str, hypothesis_str) + + Output (listed in order of output): + 1. Dict of absolute operations amounts for each given punctuation mark: + Example: + {'.': {'Correct': 0, 'Deletions': 0, 'Insertions': 0, 'Substitutions': 1}, + ',': {'Correct': 0, 'Deletions': 1, 'Insertions': 0, 'Substitutions': 0}, + '!': {'Correct': 1, 'Deletions': 0, 'Insertions': 0, 'Substitutions': 0}, + '?': {'Correct': 0, 'Deletions': 0, 'Insertions': 1, 'Substitutions': 0}} + + 2. Dict of substitutions absolute amounts between given punctuation marks: + Example: + {'.': {'.': 0, ',': 0, '!': 1, '?': 0}, + ',': {'.': 0, ',': 0, '!': 0, '?': 0}, + '!': {'.': 0, ',': 0, '!': 0, '?': 0}, + '?': {'.': 0, ',': 0, '!': 0, '?': 0}} + + 3. namedtuple "PunctuationRates" of punctuation operation rates (in range from 0 to 1): + 3.1. correct_rate - overall correct rate + Example: correct_rate=0.25 + 3.2. deletions_rate - overall deletions rate + Example: deletions_rate=0.25 + 3.3. insertions_rate - overall insertions rate + Example: insertions_rate=0.25 + 3.4. substitutions_rate - overall substitutions_rate + Example: substitutions_rate=0.25 + 3.5. punct_er - Punctuation Error Rate + Example: punct_er=0.75 + 3.6. operation_rates - dict of operations rates for each given punctuation mark + Example: + operation_rates={ + '.': {'Correct': 0.0, 'Deletions': 0.0, 'Insertions': 0.0, 'Substitutions': 1.0}, + ',': {'Correct': 0.0, 'Deletions': 1.0, 'Insertions': 0.0, 'Substitutions': 0.0}, + '!': {'Correct': 1.0, 'Deletions': 0.0, 'Insertions': 0.0, 'Substitutions': 0.0}, + '?': {'Correct': 0.0, 'Deletions': 0.0, 'Insertions': 1.0, 'Substitutions': 0.0} + } + + 3.7. substitution_rates - dict of substitution rates for each given punctuation mark + Example: + substitution_rates={ + '.': {'.': 0.0, ',': 0.0, '!': 1.0, '?': 0.0}, + ',': {'.': 0.0, ',': 0.0, '!': 0.0, '?': 0.0}, + '!': {'.': 0.0, ',': 0.0, '!': 0.0, '?': 0.0}, + '?': {'.': 0.0, ',': 0.0, '!': 0.0, '?': 0.0} + } + """ + + def __init__(self, punctuation_marks: list[str], punctuation_mask: str = "[PUNCT]") -> None: + + assert len(punctuation_marks) != 0, f"List of punctuation marks is empty" + + self.punctuation_marks = punctuation_marks + self.punctuation_mask = punctuation_mask + + self.operations = ["Correct", "Deletions", "Insertions", "Substitutions"] + + def compute_rates(self, operation_amounts: dict, substitution_amounts: dict): + operation_rates = {pm: {operation: 0 for operation in self.operations} for pm in self.punctuation_marks} + substitution_rates = {pm: {pm: 0 for pm in self.punctuation_marks} for pm in self.punctuation_marks} + + for pm in self.punctuation_marks: + operations_amount_by_pm = sum(operation_amounts[pm].values()) + + if operations_amount_by_pm == 0: + continue + + operation_rates[pm] = { + operation: (operation_amounts[pm][operation] / operations_amount_by_pm) + for operation in self.operations + } + + substitution_rates[pm] = { + _pm: (substitution_amounts[pm][_pm] / operations_amount_by_pm) + for _pm in substitution_amounts[pm].keys() + } + + _operation_amounts = { + operation: {pm: operation_amounts[operation] for pm, operation_amounts in operation_amounts.items()} + for operation in self.operations + } + + overall_amounts_by_operation = { + operation: sum(_operation_amounts[operation].values()) for operation in _operation_amounts + } + overall_operations_amount = sum(overall_amounts_by_operation.values()) + + punctuation_rates = namedtuple( + 'PunctuationRates', + [ + 'correct_rate', + 'deletions_rate', + 'insertions_rate', + 'substitutions_rate', + 'punct_er', + 'operation_rates', + 'substitution_rates', + ], + ) + + if overall_operations_amount == 0: + rates = punctuation_rates(0, 0, 0, 0, 0, operation_rates, substitution_rates) + else: + correct_rate = overall_amounts_by_operation["Correct"] / overall_operations_amount + deletions_rate = overall_amounts_by_operation["Deletions"] / overall_operations_amount + insertions_rate = overall_amounts_by_operation["Insertions"] / overall_operations_amount + substitutions_rate = overall_amounts_by_operation["Substitutions"] / overall_operations_amount + punct_er = deletions_rate + insertions_rate + substitutions_rate + + rates = punctuation_rates( + correct_rate, + deletions_rate, + insertions_rate, + substitutions_rate, + punct_er, + operation_rates, + substitution_rates, + ) + + return rates + + def compute_operation_amounts(self, reference: str, hypothesis: str): + operation_amounts = {pm: {operation: 0 for operation in self.operations} for pm in self.punctuation_marks} + substitution_amounts = {pm: {pm: 0 for pm in self.punctuation_marks} for pm in self.punctuation_marks} + + def tokenize(text: str, punctuation_marks: list[str]): + punctuation_marks = "\\" + "\\".join(self.punctuation_marks) + tokens = re.findall(rf"[\w']+|[{punctuation_marks}]", text) + return tokens + + def mask_punct_tokens(tokens: list[str], punctuation_marks: list[str], punctuation_mask: str): + masked = [punctuation_mask if token in punctuation_marks else token for token in tokens] + return masked + + r_tokens = tokenize(reference, self.punctuation_marks) + h_tokens = tokenize(hypothesis, self.punctuation_marks) + + r_masked = mask_punct_tokens(r_tokens, self.punctuation_marks, self.punctuation_mask) + h_masked = mask_punct_tokens(h_tokens, self.punctuation_marks, self.punctuation_mask) + + r_punct_amount = r_masked.count(self.punctuation_mask) + h_punct_amount = h_masked.count(self.punctuation_mask) + + if r_punct_amount + h_punct_amount == 0: + return operation_amounts, substitution_amounts + + r_len = len(r_masked) + h_len = len(h_masked) + + costs = [[0 for inner in range(h_len + 1)] for outer in range(r_len + 1)] + backtrace = [[0 for inner in range(h_len + 1)] for outer in range(r_len + 1)] + + COR = 'C' + DEL, DEL_PENALTY = 'D', 1 + INS, INS_PENALTY = 'I', 1 + SUB, SUB_PENALTY = 'S', 1 + + for i in range(1, r_len + 1): + costs[i][0] = DEL_PENALTY * i + backtrace[i][0] = DEL + + for j in range(1, h_len + 1): + costs[0][j] = INS_PENALTY * j + backtrace[0][j] = INS + + for j in range(1, h_len + 1): + costs[0][j] = INS_PENALTY * j + backtrace[0][j] = INS + + for i in range(1, r_len + 1): + for j in range(1, h_len + 1): + if r_masked[i - 1] == h_masked[j - 1]: + costs[i][j] = costs[i - 1][j - 1] + backtrace[i][j] = COR + else: + substitution_cost = costs[i - 1][j - 1] + SUB_PENALTY + insertion_cost = costs[i][j - 1] + INS_PENALTY + deletion_cost = costs[i - 1][j] + DEL_PENALTY + + costs[i][j] = min(substitution_cost, insertion_cost, deletion_cost) + if costs[i][j] == substitution_cost: + backtrace[i][j] = SUB + elif costs[i][j] == insertion_cost: + backtrace[i][j] = INS + else: + backtrace[i][j] = DEL + + i = r_len + j = h_len + + while i > 0 or j > 0: + if backtrace[i][j] == COR: + if r_masked[i - 1] == self.punctuation_mask or h_masked[j - 1] == self.punctuation_mask: + r_token = r_tokens[i - 1] + h_token = h_tokens[j - 1] + + if r_token == h_token: + operation_amounts[r_token]['Correct'] += 1 + else: + operation_amounts[r_token]['Substitutions'] += 1 + substitution_amounts[r_token][h_token] += 1 + i -= 1 + j -= 1 + + elif backtrace[i][j] == SUB: + i -= 1 + j -= 1 + + elif backtrace[i][j] == INS: + j -= 1 + + elif backtrace[i][j] == DEL: + i -= 1 + + for pm in self.punctuation_marks: + num_of_correct = operation_amounts[pm]['Correct'] + + num_substitutions_of_pm = operation_amounts[pm]['Substitutions'] + num_substitutions_to_pm = sum([substitution_amounts[_pm][pm] for _pm in self.punctuation_marks]) + + num_of_deletions = r_tokens.count(pm) - (num_of_correct + num_substitutions_of_pm) + operation_amounts[pm]['Deletions'] = num_of_deletions + + num_of_insertions = h_tokens.count(pm) - (num_of_correct + num_substitutions_to_pm) + operation_amounts[pm]['Insertions'] = num_of_insertions + + return operation_amounts, substitution_amounts + + def compute(self, reference: str, hypothesis: str): + operation_amounts, substitution_amounts = self.compute_operation_amounts(reference, hypothesis) + punctuation_rates = self.compute_rates(operation_amounts, substitution_amounts) + return operation_amounts, substitution_amounts, punctuation_rates + + +class DatasetPunctuationErrorRate: + """ + Class for computation the total puncutation-related absolute amounts of operations and their rates + in pairs of reference and hypothesis strins: + - Absolute amounts of correct predictions, deletions, insertions + and substitutions for each given punctuation mark + - Rates of correct predictions, deletions, insertions + and substitutions for each given punctuation mark + - Total rates of correct predictions, deletions, insertions + and substiturions in pairs of reference and hypothesis strings + - Punctuation Error Rate + + Args to init: + references (list[str]) - list of references + hypotheses (list[str]) - list of hypotheses + punctuation_marks (list[str]) - list of punctuation marks for computing metrics + punctuation_mask (str, by default "[PUNCT]") - mask token that will be applied to + given punctuation marks while edit distance calculation + + How to use: + 1. Create object of DatasetPunctuationErrorRate class. + Example: + references = ["Hi, dear! Nice to see you. What's"] + hypotheses = ["Hi dear! Nice to see you! What's?"] + punctuation_marks = [".", ",", "!", "?"] + + dper_obj = DatasetPunctuationErrorRate(references, hypotheses, punctuation_marks) + + 2. To compute punctuation metrics, call the class method "compute()". + Example: + dper_obj.compute() + + Result: + The following atributes of class object will be updated with calculated metrics values. + The values are available with calling the atributes: + + dper_obj.operation_rates - dict, rates of correctness and errors for each punctuation mark + from `preset dper_obj.punctuation_marks` list. + + dper_obj.substitution_rates - dict, substitution rates between puncutation marks from + `preset dper_obj.punctuation_marks` list. + + dper_obj.correct_rate - float, total rate of correctness between provided pairs of + references and hypotheses. + + dper_obj.deletions_rate - float, total rate of deletions between provided pairs of + references and hypotheses. + + dper_obj.insertions_rate - float, total rate of insertions between provided pairs of + references and hypotheses. + + dper_obj.substitutions_rate - float, total rate of substitutions between provided pairs of + references and hypotheses. + + dper_obj.punct_er - float, total Punctuation Error Rate between provided pairs of + references and hypotheses. + """ + + def __init__( + self, + references: list[str], + hypotheses: list[str], + punctuation_marks: list[str], + punctuation_mask: str = "[PUNCT]", + ) -> None: + + self.references = references + self.hypotheses = hypotheses + self.punctuation_marks = punctuation_marks + self.punctuation_mask = punctuation_mask + + self.oper_obj = OccurancePunctuationErrorRate( + punctuation_marks=self.punctuation_marks, punctuation_mask=self.punctuation_mask + ) + + self.operation_amounts = [] + self.substitution_amounts = [] + self.rates = [] + + self.operation_rates = None + self.substitution_rates = None + self.correct_rate = None + self.deletions_rate = None + self.insertions_rate = None + self.substitutions_rate = None + self.punct_er = None + + def compute(self): + def sum_amounts(amounts_dicts: list[dict]): + amounts = {key: {_key: 0 for _key in amounts_dicts[0][key]} for key in amounts_dicts[0].keys()} + + for amounts_dict in amounts_dicts: + for outer_key, inner_dict in amounts_dict.items(): + for inner_key, value in inner_dict.items(): + amounts[outer_key][inner_key] += value + return amounts + + logging.info("Computing Punctuation Error Rate") + + for reference, hypothesis in tqdm(zip(self.references, self.hypotheses), total=len(self.references)): + operation_amounts, substitution_amounts, punctuation_rates = self.oper_obj.compute(reference, hypothesis) + self.operation_amounts.append(operation_amounts) + self.substitution_amounts.append(substitution_amounts) + self.rates.append(punctuation_rates) + + overall_operation_amounts = sum_amounts(self.operation_amounts) + overall_substitution_amounts = sum_amounts(self.substitution_amounts) + overall_rates = self.oper_obj.compute_rates( + operation_amounts=overall_operation_amounts, substitution_amounts=overall_substitution_amounts + ) + + self.operation_rates = overall_rates.operation_rates + self.substitution_rates = overall_rates.substitution_rates + self.correct_rate = overall_rates.correct_rate + self.deletions_rate = overall_rates.deletions_rate + self.insertions_rate = overall_rates.insertions_rate + self.substitutions_rate = overall_rates.substitutions_rate + self.punct_er = overall_rates.punct_er + + def reset(self): + self.operation_amounts = [] + self.substitution_amounts = [] + self.rates = [] + + self.operation_rates = None + self.substitution_rates = None + self.correct_rate = None + self.deletions_rate = None + self.insertions_rate = None + self.substitutions_rate = None + self.punct_er = None + + def print(self): + logging.info(f'Dataset PER ' + str(round(100 * self.punct_er, 2)) + '%') + + if HAVE_TABLUATE_AND_PANDAS: + rates_by_pm_df = pd.DataFrame(self.operation_rates) * 100 + substitution_rates_by_pm_df = pd.DataFrame(self.substitution_rates) * 100 + + logging.info( + "Rates of punctuation correctness and errors (%):\n" + + tabulate(rates_by_pm_df, headers='keys', tablefmt='psql') + ) + logging.info( + "Substitution rates between punctuation marks (%):\n" + + tabulate(substitution_rates_by_pm_df, headers='keys', tablefmt='psql') + ) + else: + logging.warning("Some of the modules (pandas or tabulate) can't be imported") + logging.info(f"Rates of punctuation correctness and errors (in range [0, 1]):\n{self.operation_rates}\n") + logging.info( + f"Substitution rates between punctuation marks (in range [0, 1]):\n{self.substitution_rates}\n" + ) diff --git a/tests/collections/common/test_metrics.py b/tests/collections/common/test_metrics.py index e4bfde635a06..f9005232a017 100644 --- a/tests/collections/common/test_metrics.py +++ b/tests/collections/common/test_metrics.py @@ -16,6 +16,11 @@ import torch from nemo.collections.common.metrics.classification_accuracy import TopKClassificationAccuracy +from nemo.collections.common.metrics.punct_er import ( + DatasetPunctuationErrorRate, + OccurancePunctuationErrorRate, + punctuation_error_rate, +) from .loss_inputs import ALL_NUM_MEASUREMENTS_ARE_ZERO, NO_ZERO_NUM_MEASUREMENTS, SOME_NUM_MEASUREMENTS_ARE_ZERO from .perplexity_inputs import NO_PROBS_NO_LOGITS, ONLY_LOGITS1, ONLY_LOGITS100, ONLY_PROBS, PROBS_AND_LOGITS @@ -149,3 +154,71 @@ def test_loss(self, ddp, dist_sync_on_step, loss_sum_or_avg, num_measurements, t dist_sync_on_step=dist_sync_on_step, take_avg_loss=take_avg_loss, ) + + +class TestPunctuationErrorRate: + reference = "Hi, dear! Nice to see you. What's" + hypothesis = "Hi dear! Nice to see you! What's?" + punctuation_marks = [".", ",", "!", "?"] + + operation_amounts = { + '.': {'Correct': 0, 'Deletions': 0, 'Insertions': 0, 'Substitutions': 1}, + ',': {'Correct': 0, 'Deletions': 1, 'Insertions': 0, 'Substitutions': 0}, + '!': {'Correct': 1, 'Deletions': 0, 'Insertions': 0, 'Substitutions': 0}, + '?': {'Correct': 0, 'Deletions': 0, 'Insertions': 1, 'Substitutions': 0}, + } + substitution_amounts = { + '.': {'.': 0, ',': 0, '!': 1, '?': 0}, + ',': {'.': 0, ',': 0, '!': 0, '?': 0}, + '!': {'.': 0, ',': 0, '!': 0, '?': 0}, + '?': {'.': 0, ',': 0, '!': 0, '?': 0}, + } + correct_rate = 0.25 + deletions_rate = 0.25 + insertions_rate = 0.25 + substitutions_rate = 0.25 + punct_er = 0.75 + operation_rates = { + '.': {'Correct': 0.0, 'Deletions': 0.0, 'Insertions': 0.0, 'Substitutions': 1.0}, + ',': {'Correct': 0.0, 'Deletions': 1.0, 'Insertions': 0.0, 'Substitutions': 0.0}, + '!': {'Correct': 1.0, 'Deletions': 0.0, 'Insertions': 0.0, 'Substitutions': 0.0}, + '?': {'Correct': 0.0, 'Deletions': 0.0, 'Insertions': 1.0, 'Substitutions': 0.0}, + } + substitution_rates = { + '.': {'.': 0.0, ',': 0.0, '!': 1.0, '?': 0.0}, + ',': {'.': 0.0, ',': 0.0, '!': 0.0, '?': 0.0}, + '!': {'.': 0.0, ',': 0.0, '!': 0.0, '?': 0.0}, + '?': {'.': 0.0, ',': 0.0, '!': 0.0, '?': 0.0}, + } + + @pytest.mark.unit + def test_punctuation_error_rate(self): + assert punctuation_error_rate([self.reference], [self.hypothesis], self.punctuation_marks) == self.punct_er + + @pytest.mark.unit + def test_OccurancePunctuationErrorRate(self): + oper_obj = OccurancePunctuationErrorRate(self.punctuation_marks) + operation_amounts, substitution_amounts, punctuation_rates = oper_obj.compute(self.reference, self.hypothesis) + + assert operation_amounts == self.operation_amounts + assert substitution_amounts == self.substitution_amounts + assert punctuation_rates.correct_rate == self.correct_rate + assert punctuation_rates.deletions_rate == self.deletions_rate + assert punctuation_rates.insertions_rate == self.insertions_rate + assert punctuation_rates.substitutions_rate == self.substitutions_rate + assert punctuation_rates.punct_er == self.punct_er + assert punctuation_rates.operation_rates == self.operation_rates + assert punctuation_rates.substitution_rates == self.substitution_rates + + @pytest.mark.unit + def test_DatasetPunctuationErrorRate(self): + dper_obj = DatasetPunctuationErrorRate([self.reference], [self.hypothesis], self.punctuation_marks) + dper_obj.compute() + + assert dper_obj.correct_rate == self.correct_rate + assert dper_obj.deletions_rate == self.deletions_rate + assert dper_obj.insertions_rate == self.insertions_rate + assert dper_obj.substitutions_rate == self.substitutions_rate + assert dper_obj.punct_er == self.punct_er + assert dper_obj.operation_rates == self.operation_rates + assert dper_obj.substitution_rates == self.substitution_rates From 6c777ad4911004a7f7170546cba51ab5920ef578 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 10 Oct 2023 12:53:33 +0300 Subject: [PATCH 108/112] conversion issue fix (#7648) (#7668) Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_gpt_model.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 46d3f37f5d9b..4d145ee2d330 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -1333,7 +1333,7 @@ def on_load_checkpoint(self, checkpoint) -> None: # mcore uses distributed checkpointing if self.mcore_gpt: - if 'state_dict' in checkpoint: + if 'state_dict' in checkpoint and checkpoint['state_dict']: for index, module in enumerate(self.get_gpt_module_list()): if parallel_state.get_virtual_pipeline_model_parallel_world_size() is not None: checkpoint_state_dict = checkpoint['state_dict'][f'model_{index}'] From b85e9b487dd8126c814f906f079a8a549bcb3bc2 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 10 Oct 2023 12:54:49 +0300 Subject: [PATCH 109/112] layernorm1p fix (#7523) (#7567) * layernorm1p fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add layernorm1p to if statement * config changes * gpt config changes * remove layernorm_zero_centered_gamma from gpt config * change line --------- Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister --- .../nlp/models/language_modeling/megatron_gpt_model.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py index 4d145ee2d330..b23fe0bbf886 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py @@ -1515,10 +1515,14 @@ def build_transformer_config(self) -> TransformerConfig: activation_func = activation_to_func(activation) normalization = self.cfg.get('normalization', 'layernorm') + layernorm_zero_centered_gamma = self.cfg.get('normalization', 'layernorm') == 'layernorm1p' if normalization == 'layernorm': normalization = 'LayerNorm' elif normalization == 'rmsnorm': normalization = 'RMSNorm' + elif normalization == 'layernorm1p': + normalization = 'LayerNorm' + layernorm_zero_centered_gamma = True else: logging.warning( f"The normalization type: {normalization} might not be supported in megatron core." @@ -1562,7 +1566,7 @@ def build_transformer_config(self) -> TransformerConfig: # any configs that are not in the nemo model config will be added here config_mapping = { 'apply_residual_connection_post_layernorm': False, # we don't use this in NeMo - 'layernorm_zero_centered_gamma': False, # not currently used in NeMo + 'layernorm_zero_centered_gamma': layernorm_zero_centered_gamma, 'add_bias_linear': add_bias_linear, 'gated_linear_unit': gated_linear_unit, 'activation_func': activation_func, From 84e18eb785dd570293e7eb2ad574c29982bb87de Mon Sep 17 00:00:00 2001 From: Yi Dong <43824965+yidong72@users.noreply.github.com> Date: Tue, 10 Oct 2023 08:22:55 -0400 Subject: [PATCH 110/112] generalized chat sft prompt (#7655) * fix dataset issues Signed-off-by: Yi Dong * working version Signed-off-by: Yi Dong * all passed Signed-off-by: Yi Dong * refactor tests Signed-off-by: Yi Dong * all pass Signed-off-by: Yi Dong * working version Signed-off-by: Yi Dong * use end name signal for labels Signed-off-by: Yi Dong * all fixed Signed-off-by: Yi Dong * update doc Signed-off-by: Yi Dong * style fix Signed-off-by: Yi Dong * remove unused imports Signed-off-by: Yi Dong * make sure nccl not timing out Signed-off-by: Yi Dong * style fix Signed-off-by: Yi Dong * generate example template Signed-off-by: Yi Dong * generic end of name token Signed-off-by: Yi Dong * style fix Signed-off-by: Yi Dong * add the chat prompt format into the config Signed-off-by: Yi Dong * make sure sft working Signed-off-by: Yi Dong * address reviewer comment Signed-off-by: Yi Dong * fix non Signed-off-by: Yi Dong * try openAI prompt Signed-off-by: Yi Dong * remove unused imports Signed-off-by: Yi Dong * remove human labels from the data Signed-off-by: Yi Dong * use hf dataset to clean Signed-off-by: Yi Dong * reviewer comments Signed-off-by: Yi Dong --------- Signed-off-by: Yi Dong Signed-off-by: Sasha Meister --- .../language_modeling/megatron_gpt_eval.py | 7 +- .../tuning/conf/megatron_gpt_sft.yaml | 6 + .../tuning/megatron_gpt_sft.py | 12 +- .../megatron/gpt_sft_chat_dataset.py | 257 +++++++---- .../megatron/gpt_sft_dataset.py | 18 +- .../megatron_gpt_sft_model.py | 5 +- .../nlp_language_modeling/sft/data_clean.py | 2 +- .../sft/preprocessing.py | 11 +- .../collections/nlp/test_chat_sft_dataset.py | 408 ++++++++++++------ 9 files changed, 488 insertions(+), 238 deletions(-) diff --git a/examples/nlp/language_modeling/megatron_gpt_eval.py b/examples/nlp/language_modeling/megatron_gpt_eval.py index 04125c6f750e..c9eb013b64e9 100644 --- a/examples/nlp/language_modeling/megatron_gpt_eval.py +++ b/examples/nlp/language_modeling/megatron_gpt_eval.py @@ -13,6 +13,7 @@ # limitations under the License. import asyncio +import datetime import os import threading from functools import partial @@ -167,7 +168,11 @@ def remove_padded_prompts(response, nb_paddings): def main(cfg) -> None: # trainer required for restoring model parallel models - trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer, callbacks=[CustomProgressBar()]) + trainer = Trainer( + strategy=NLPDDPStrategy(timeout=datetime.timedelta(seconds=18000)), + **cfg.trainer, + callbacks=[CustomProgressBar()], + ) if cfg.gpt_model_file is not None: if ( diff --git a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_sft.yaml b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_sft.yaml index b0b8eedb5633..27e73996225f 100644 --- a/examples/nlp/language_modeling/tuning/conf/megatron_gpt_sft.yaml +++ b/examples/nlp/language_modeling/tuning/conf/megatron_gpt_sft.yaml @@ -71,6 +71,12 @@ model: data: chat: False # whether use chatbot data or not + chat_prompt_tokens: # special tokens for the chat prompts, a dictionary of {token_type: token}. note that some tokenizer may combine the characters at the junction between {end_of_turn}{turn_start}. e.g. '', the '><' sometimes is merged to be a single token. This is not supported, try to avoid + system_turn_start: '' + turn_start: '' + label_start: '' + end_of_turn: "\x0A" # \0x0A is '\n' + end_of_name: "\x0A" # \0x0A is '\n' train_ds: # Example of how to specify paths to multiple datasets # file_names: diff --git a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py index 9a70671f8073..4c4c001014db 100644 --- a/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py +++ b/examples/nlp/language_modeling/tuning/megatron_gpt_sft.py @@ -15,11 +15,12 @@ import os import tempfile +import torch.multiprocessing as mp from omegaconf.omegaconf import OmegaConf, open_dict from pytorch_lightning import Trainer from pytorch_lightning.plugins.environments import TorchElasticEnvironment -from pytorch_lightning.trainer.connectors.checkpoint_connector import _CheckpointConnector +from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset import get_prompt_template_example from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel from nemo.collections.nlp.parts.nlp_overrides import ( @@ -36,6 +37,8 @@ from nemo.utils.exp_manager import exp_manager from nemo.utils.model_utils import inject_model_parallel_rank +mp.set_start_method("spawn", force=True) + def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False): """ @@ -71,6 +74,13 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False): gpt_cfg.pipeline_model_parallel_size = cfg.model.get('pipeline_model_parallel_size', 1) gpt_cfg.pipeline_model_parallel_split_rank = cfg.model.get('pipeline_model_parallel_split_rank', 0) + if cfg.model.data.get('chat', False): + # chat model, overwrite the prompt template + prompt_template = get_prompt_template_example(cfg.model.data.chat_prompt_tokens) + gpt_cfg.data.train_ds.prompt_template = prompt_template + gpt_cfg.data.validation_ds.prompt_template = prompt_template + gpt_cfg.data.test_ds.prompt_template = prompt_template + sft_cls = MegatronGPTSFTModel gpt_cfg.target = f"{sft_cls.__module__}.{sft_cls.__name__}" diff --git a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py index 801a58394f06..96cc57a300b8 100644 --- a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py +++ b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py @@ -16,19 +16,19 @@ import torch -from nemo.collections.common.tokenizers.sentencepiece_tokenizer import SentencePieceTokenizer from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_dataset import GPTSFTDataset from nemo.utils import logging -__all__ = ['GPTSFTChatDataset'] +__all__ = ['GPTSFTChatDataset', 'get_prompt_template_example'] -IGNORE_INDEX = -100 -END_SIGNAL = "\n" -END_NAME_SIGNAL = "\n" -SYSTEM_TOKEN = "System\n" -TURN_TOKEN = "" +PREFIX_STR = ( + "\x00" # the prefix string used in the tokenizer to deal with the added empty token for some of the tokenizers +) + +IGNORE_INDEX = -100 +SYSTEM_TOKEN = "System" TYPE_INSTRUCTION = { 'TEXT_TO_VALUE': "", @@ -36,6 +36,56 @@ } +def _get_header_conversation_type_mask_role(source, special_tokens): + END_SIGNAL = special_tokens['end_of_turn'] + END_NAME_SIGNAL = special_tokens['end_of_name'] + + data_type = None + if 'type' in source: + data_type = source['type'] + if data_type is not None: + assert data_type in TYPE_INSTRUCTION, f"source type {data_type} not supported" + # add end signal and concatenate together + conversation = source['system'] + if data_type is not None: + if TYPE_INSTRUCTION[data_type] != '': + conversation = conversation + '\n' + TYPE_INSTRUCTION[data_type] + mask_role = source.get('mask', 'User') + header = f"{special_tokens['system_turn_start']}{SYSTEM_TOKEN}{END_NAME_SIGNAL}{conversation}{END_SIGNAL}" + conversation = _add_speaker_and_signal(header, source['conversations'], mask_role, data_type, special_tokens) + return header, conversation, data_type, mask_role + + +def get_prompt_template_example(special_tokens): + source = { + 'system': '{system message}', + 'conversations': [ + {'from': 'User', 'value': '{turn 1 user message}', 'label': None}, + {'from': 'Assistant', 'value': '{turn 1 assistant message}', 'label': '{turn 1 assistant label}'}, + {'from': 'User', 'value': '{turn 2 user message}', 'label': None}, + {'from': 'Assistant', 'value': '{turn 2 assistant message}', 'label': '{turn 2 assistant label}'}, + ], + "mask": "User", + "type": "VALUE_TO_TEXT", + } + _, conversation, _, _ = _get_header_conversation_type_mask_role(source, special_tokens) + return conversation + + +def identify_start_index_of_subsequence(subsequence, sequence): + """ find the location of the small tensor in the large tensor. + e.g. small = [1,3], large = [2,3,1,3], returns 2 + small = [3,2], large = [2,3,1,3], returns -1 + Args: + small (tensor): small tensor + large (tensor): large tensor + """ + for i in range(sequence.size(0) - subsequence.size(0) + 1): + if torch.equal(sequence[i : i + subsequence.size(0)], subsequence): + return i + return -1 + + def _mask_targets( target, tokenized_lens, @@ -45,8 +95,10 @@ def _mask_targets( tokenizer, mask_role, gtype, - extra_id_2_token_id, - new_line_token_id, + name_end_token_ids, + special_tokens, + label_start_ids, + num_turn_start_tokens, ): """ This function masks the tokens so the loss is computed only on the non-masked role's responses. For 'TEXT_TO_VALUE' type, the loss is computed on the value attributes. @@ -60,68 +112,88 @@ def _mask_targets( tokenizer (TokenizerSpec): tokenizer object mask_role (str): the speaker id to be masked from loss computation gtype (str): either 'TEXT_TO_VALUE' or 'VALUE_TO_TEXT' - extra_id_2_token_id (int): token id - new_line_token_id (int): new line token id - + name_end_token_ids (int): end of name token ids + special_tokens (dict): special tokens used for the chat prompt. It has the keys: system_turn_start, turn_start, label_start, end_of_turn + label_start_ids (list): list of label start token ids, + num_turn_start_tokens (int): number of tokens of the turn_start str """ + TURN_TOKEN = special_tokens['turn_start'] + END_NAME_SIGNAL = special_tokens['end_of_name'] + label_start_ids = torch.tensor(label_start_ids) + name_end_token_ids = torch.tensor(name_end_token_ids) + cur_idx = header_len tgt_len = target.shape[0] for i, (tokenized_len, speaker, s_id) in enumerate(zip(tokenized_lens, speakers, s_ids)): # note, sentence piece will add extra empty token in front. has to compute the diff - id1 = tokenizer.text_to_ids("") - id2 = tokenizer.text_to_ids("" + TURN_TOKEN + speaker + END_NAME_SIGNAL) - skip_name_len = len(id2) - len(id1) - if extra_id_2_token_id is None: - raise ValueError("extra_id_2 is not in the vocabulary") - if (s_id == extra_id_2_token_id).any().item(): + id1 = tokenizer.text_to_ids(PREFIX_STR) + id2 = tokenizer.text_to_ids(PREFIX_STR + TURN_TOKEN + speaker + END_NAME_SIGNAL) + skip_name_len = len(id2) - len( + id1 + ) # s_ids[:skip_name_len] is the name part of the prompt 'TURN_TOKEN + speaker + END_NAME_SIGNAL' + # get the position of the label start string in this turn + location = identify_start_index_of_subsequence(label_start_ids, s_id) + + if location >= 0: + # if it contains the label start tokens if gtype == 'VALUE_TO_TEXT': - # if contains the token - assert skip_name_len == torch.where((s_id == extra_id_2_token_id))[0].item() - # find new line token id 14 - more_skip_len = torch.where((s_id[skip_name_len:] == new_line_token_id))[0][0].item() + 1 + # handles the case that condition on labels to generate respone + # the next token after the name part of the prompt is the beginning of the label start tokens + assert skip_name_len == location + # find the first new line token after the label part, which indicates the end of the whole label string + # newline_loc = torch.where((s_id[skip_name_len:] == name_end_token_ids))[0] + newline_loc = identify_start_index_of_subsequence(name_end_token_ids, s_id[skip_name_len:]) + if newline_loc < 0: + # cannot find new line token, which means the the whole turn is just a partial label string. Mask the whole turn + target[cur_idx : cur_idx + tokenized_len] = IGNORE_INDEX + continue + # skip the label part and the new line token + more_skip_len = newline_loc + len(name_end_token_ids) + # skip the name part and the label part skip_name_len += more_skip_len elif gtype == 'TEXT_TO_VALUE': - skip_name_len = torch.where((s_id == extra_id_2_token_id))[0].item() + 1 + # handles the case that condition on response to generate label + # skip the name part, response and the label start tokens part, the remainder is the label string without label start, e.g. 'quality:9,toxicity:8...' + skip_name_len = location + len(label_start_ids) if cur_idx >= tgt_len: break elif cur_idx + tokenized_len < tgt_len: - # Check whether the mask is applied to the correct position, the first token is turn token: - # s_id[2:] skips the artifact empty token and the turn token - # target[cur_idx + 1:cur_idx + tokenized_len] skip the turn token + # Check whether the mask is applied to the correct position, the first token is turn start tokens if not torch.equal(target[cur_idx + 1 : cur_idx + tokenized_len], s_id[1:]): logging.warning("a sentence mismatches the corresponding piece " "in the conversation") if i == 0 and (gtype == 'VALUE_TO_TEXT' or gtype is None): - # mask the first turn completely to provide at least one turn as context + # mask the first turn completely to provide at least one turn as context for the rest target[cur_idx : cur_idx + tokenized_len] = IGNORE_INDEX elif speaker == mask_role and i == 1 and gtype == 'TEXT_TO_VALUE': - # leave the first human tag unmasked - target[cur_idx + 1 : cur_idx + tokenized_len] = IGNORE_INDEX + # leave the first turn start tag unmasked, servers severs as the end of turn signal + target[cur_idx + num_turn_start_tokens : cur_idx + tokenized_len] = IGNORE_INDEX elif speaker == mask_role and (i > 1): - # leave the first human tag unmasked - target[cur_idx + 1 : cur_idx + tokenized_len] = IGNORE_INDEX + # leave the first turn start tag unmasked, which severs as the end of turn signal + target[cur_idx + num_turn_start_tokens : cur_idx + tokenized_len] = IGNORE_INDEX elif speaker == mask_role and (i <= 1): # mask out everything in the second turn target[cur_idx : cur_idx + tokenized_len] = IGNORE_INDEX else: - # mask up to the name end, need to remove one as skip name has an extra artifact empty token + # mask up to name part, label part for VALUE_TO_TEXT, or name part, response and label start tokens for TEXT_TO_VALUE, or just the name part if gtype is None target[cur_idx : cur_idx + skip_name_len] = IGNORE_INDEX cur_idx += tokenized_len -def cannonical_form_formater(cannoical_form): - return f'{cannoical_form}\n' - - -def response_value_formater(label): +def response_value_formater(label, label_start, end_signal): if isinstance(label, str): - return '' + label + '\n' + return label_start + label + end_signal elif label is None: return '' else: raise ValueError(f'Unknown label type {type(label)}, only str type is supported') -def _add_speaker_and_signal(header, source, mask_role, gtype): +def _add_speaker_and_signal(header, source, mask_role, gtype, special_tokens): + TURN_TOKEN = special_tokens['turn_start'] + END_SIGNAL = special_tokens['end_of_turn'] + LABEL_START = special_tokens['label_start'] + END_NAME_SIGNAL = special_tokens['end_of_name'] + """Add speaker and start/end signal on each round.""" BEGIN_SIGNAL = "" conversation = header @@ -138,7 +210,11 @@ def _add_speaker_and_signal(header, source, mask_role, gtype): + role_token + sentence_from + END_NAME_SIGNAL - + (response_value_formater(sentence['label']) if 'label' in sentence else '') + + ( + response_value_formater(sentence['label'], LABEL_START, END_NAME_SIGNAL) + if 'label' in sentence + else '' + ) + sentence["value"] + END_SIGNAL ) @@ -150,7 +226,11 @@ def _add_speaker_and_signal(header, source, mask_role, gtype): + END_NAME_SIGNAL + sentence["value"] + END_SIGNAL - + (response_value_formater(sentence['label']) if 'label' in sentence else '') + + ( + response_value_formater(sentence['label'], LABEL_START, END_NAME_SIGNAL) + if 'label' in sentence + else '' + ) ) else: raise ValueError( @@ -163,7 +243,14 @@ def _add_speaker_and_signal(header, source, mask_role, gtype): return conversation -def preprocess(source: dict, tokenizer: TokenizerSpec, extra_id_2_token_id: int, new_line_token_id: int): +def preprocess( + source: dict, + tokenizer: TokenizerSpec, + name_end_token_ids: int, + label_start_ids: list, + special_tokens: dict, + num_turn_start_tokens: int, +): """ Given a conversation list. This transform: 1. Add signal '### ' at the beginning each sentence, with end signal '\n'; @@ -171,36 +258,23 @@ def preprocess(source: dict, tokenizer: TokenizerSpec, extra_id_2_token_id: int, 3. Tokenize the concatenated conversation; 4. Make a deepcopy as the target. Mask human words with IGNORE_INDEX. """ - data_type = None - if 'type' in source: - data_type = source['type'] - assert data_type in TYPE_INSTRUCTION, f"source type {data_type} not supported" - # add end signal and concatenate together - conversation = source['system'] - if data_type is not None: - if TYPE_INSTRUCTION[data_type] != '': - conversation = conversation + '\n' + TYPE_INSTRUCTION[data_type] - mask_role = source.get('mask', 'User') - header = f"{SYSTEM_TOKEN}{conversation}" - conversation = _add_speaker_and_signal(header, source['conversations'], mask_role, data_type) + header, conversation, data_type, mask_role = _get_header_conversation_type_mask_role(source, special_tokens) # tokenize conversations input_ids = tokenizer.text_to_ids(conversation) target = copy.deepcopy(input_ids) - header_len = len(tokenizer.text_to_ids(header)) + header_tokens = tokenizer.text_to_ids(header) + header_len = len(header_tokens) ids = [] tokenized_lens = [] + assert torch.equal(torch.tensor(target[:header_len]), torch.tensor(header_tokens)) for s in source['conversations']: - if isinstance(tokenizer, SentencePieceTokenizer): - tokenized_sentence = tokenizer.text_to_ids(s["value"]) - ids.append(torch.tensor(tokenized_sentence)[1:]) - # remove one token as it adds an empty token in front - tokenized_lens.append(len(tokenized_sentence) - 1) - else: - tokenized_sentence = tokenizer.text_to_ids(s["value"]) - ids.append(torch.tensor(tokenized_sentence)) - # remove one token as it adds an empty token in front - tokenized_lens.append(len(tokenized_sentence)) + # hack to remove the extra empty token in front + id1 = tokenizer.text_to_ids(PREFIX_STR + s["value"]) + id2 = tokenizer.text_to_ids(PREFIX_STR) + tokenized_sentence = id1[len(id2) :] + ids.append(torch.tensor(tokenized_sentence)) + tokenized_lens.append(len(tokenized_sentence)) speakers = [sentence["from"] for sentence in source['conversations']] assert mask_role in speakers, "mask role not in the conversation" target = torch.LongTensor(target) @@ -216,8 +290,10 @@ def preprocess(source: dict, tokenizer: TokenizerSpec, extra_id_2_token_id: int, tokenizer, mask_role, data_type, - extra_id_2_token_id, - new_line_token_id, + name_end_token_ids, + special_tokens, + label_start_ids, + num_turn_start_tokens, ) mask = (target != IGNORE_INDEX).bool() assert mask.sum().item() != 0, "mask is empty" @@ -228,14 +304,6 @@ def preprocess(source: dict, tokenizer: TokenizerSpec, extra_id_2_token_id: int, return dict(input_ids=input_ids, mask=mask, context_ids=context_ids, answer_ids=answer_ids) -def _check_token_in_vocab(tokenizer, token): - ids = tokenizer.text_to_ids(token) - if isinstance(tokenizer, SentencePieceTokenizer): - return len(ids) == 2 - else: - return len(ids) == 1 - - class GPTSFTChatDataset(GPTSFTDataset): def _maybe_validate_prompt_template(self): pass @@ -243,22 +311,20 @@ def _maybe_validate_prompt_template(self): def _build_samples_mapping(self): super()._build_samples_mapping() assert hasattr(self.tokenizer, "vocab"), "tokenizer should have vocab property, not supported" - assert _check_token_in_vocab( - self.tokenizer, '' - ), " not in the tokenizer vocab. not supported" - assert _check_token_in_vocab( - self.tokenizer, '' - ), " not in the tokenizer vocab. not supported" - # calcuilate id value - if _check_token_in_vocab(self.tokenizer, ''): - ids_1 = self.tokenizer.text_to_ids('') - ids_2 = self.tokenizer.text_to_ids('') - self.extra_id_2_token_id = ids_1[len(ids_2) :][0] - else: - self.extra_id_2_token_id = None - ids_1 = self.tokenizer.text_to_ids('\n') - ids_2 = self.tokenizer.text_to_ids('') - self.new_line_token_id = ids_1[len(ids_2) :][0] + LABEL_START = self.special_tokens['label_start'] + END_NAME_SIGNAL = self.special_tokens['end_of_name'] + + id1 = self.tokenizer.text_to_ids(PREFIX_STR) + id2 = self.tokenizer.text_to_ids(PREFIX_STR + LABEL_START) + self.label_start_tokens = id2[len(id1) :] + + id1 = self.tokenizer.text_to_ids(PREFIX_STR + END_NAME_SIGNAL) + id2 = self.tokenizer.text_to_ids(PREFIX_STR) + self.name_end_token_ids = id1[len(id2) :] + + id1 = self.tokenizer.text_to_ids(PREFIX_STR + self.special_tokens['turn_start']) + id2 = self.tokenizer.text_to_ids(PREFIX_STR) + self.num_turn_start_tokens = len(id1) - len(id2) def _process_example(self, example): """ @@ -266,7 +332,14 @@ def _process_example(self, example): Truncation is carried out when needed, but it is performed only on the prompt side. BOS, EOS, and SEP, are added if specified. """ - result = preprocess(example, self.tokenizer, self.extra_id_2_token_id, self.new_line_token_id) + result = preprocess( + example, + self.tokenizer, + self.name_end_token_ids, + self.label_start_tokens, + self.special_tokens, + self.num_turn_start_tokens, + ) # store metadata in dataset, in case user may have keys required in the prediction json files metadata = {k: v for k, v in example.items() if k not in ['conversations']} diff --git a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py index 101201ef7536..9c6e50f5e43f 100644 --- a/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py +++ b/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py @@ -13,10 +13,14 @@ # limitations under the License. import re -from typing import List, Optional +from typing import List, Mapping, Optional +import datasets import numpy as np import torch + +# hack to avoid the "not enough disk space" error in some slurm cluster +datasets.builder.has_sufficient_disk_space = lambda needed_bytes, directory='.': True from datasets import load_dataset from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec @@ -52,6 +56,7 @@ def __init__( memmap_workers: Optional[int] = None, hf_dataset: bool = False, truncation_method: str = 'right', + special_tokens: Optional[Mapping[str, str]] = None, # special tokens, a dictory of {token_type: token} ): """ file_path: Path to a JSONL GPT supervised fine-tuning dataset. Data is formatted as multiple JSON lines with each line formatted as follows. {'input': 'John von Neumann\nVon Neumann made fundamental contributions .... Q: What did the math of artificial viscosity do?', 'output': 'smoothed the shock transition without sacrificing basic physics'} @@ -73,6 +78,7 @@ def __init__( prompt_template: Prompt template to inject via an fstring. Formatted like Q: {context_key}\n\nA: {label_key} hf_dataset: Whether to load the json file with the HuggingFace dataset. otherwise, will load the jsonl file with the JSONLMemMapDataset. truncation_method: Truncation from which position. Options: ['left', 'right'] + special_tokens: special tokens for the chat prompts, a dictionary of {token_type: token}. Default: {'system_turn_start': '', 'turn_start': '', 'label_start': '', 'end_of_turn': '\n', "end_of_name": "\n"} """ self.tokenizer = tokenizer self.file_path = file_path @@ -93,6 +99,16 @@ def __init__( self.virtual_tokens = virtual_tokens self.tokens_to_generate = tokens_to_generate self.truncation_method = truncation_method + if special_tokens is None: + self.special_tokens = { + "system_turn_start": "", + "turn_start": "", + "label_start": "", + "end_of_turn": "\n", + "end_of_name": "\n", + } + else: + self.special_tokens = special_tokens if hf_dataset: self.indexed_dataset = load_dataset( diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py index 8f2a108fcfc0..84a1f52d1391 100644 --- a/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py +++ b/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py @@ -33,7 +33,6 @@ from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel from nemo.collections.nlp.modules.common.megatron.utils import get_iterator_k_split from nemo.collections.nlp.modules.common.text_generation_utils import generate, get_computeprob_response - from nemo.collections.nlp.parts.mixins.nlp_adapter_mixins import NLPAdapterModelMixin from nemo.collections.nlp.parts.utils_funcs import get_last_rank from nemo.utils import AppState, logging @@ -296,9 +295,11 @@ def _build_dataset(self, data_cfg, is_train=True): truncation_method=data_cfg.get( 'truncation_method', 'right' ), # used to choose truncation method. Options: ['random', 'left', 'right'] + special_tokens=self.cfg.data.get( + 'chat_prompt_tokens', None + ), # special tokens for the chat prompts, a dictionary of {token_type: token}. Default: {'system_turn_start': '', 'turn_start': '', 'label_start': '', 'end_of_turn': '\n', "end_of_name": "\n"} ) datasets.append(dataset) - if is_train: dataset = BlendableDataset( datasets=datasets, weights=data_cfg.concat_sampling_probabilities, size=num_train_samples_after_blend diff --git a/scripts/nlp_language_modeling/sft/data_clean.py b/scripts/nlp_language_modeling/sft/data_clean.py index 8c67aa2e3bcd..362f7edeeb3b 100644 --- a/scripts/nlp_language_modeling/sft/data_clean.py +++ b/scripts/nlp_language_modeling/sft/data_clean.py @@ -41,7 +41,7 @@ def data_clean( ) if library == 'huggingface': tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) - d = GPTSFTChatDataset(dataset_file, tokenizer, seq_len, 1) + d = GPTSFTChatDataset(dataset_file, tokenizer, seq_len, 1, hf_dataset=True) total_records = len(d) removed_ids = set() for i in range(total_records): diff --git a/scripts/nlp_language_modeling/sft/preprocessing.py b/scripts/nlp_language_modeling/sft/preprocessing.py index 7a08e055543d..175187a279bc 100644 --- a/scripts/nlp_language_modeling/sft/preprocessing.py +++ b/scripts/nlp_language_modeling/sft/preprocessing.py @@ -80,11 +80,12 @@ def parse_conversations(tree_obj): raise ValueError(f'unknown role {prompt_obj["role"]}') turn = {'value': prompt_obj['text'], 'from': role} if 'labels' in prompt_obj: - turn['human_labels'] = prompt_obj['labels'] - for key in turn['human_labels']: - value_set = label_values.get(key, set()) - value_set.add(turn['human_labels'][key]['value']) - label_values[key] = value_set + # remove human labels + # turn['human_labels'] = prompt_obj['labels'] + # for key in turn['human_labels']: + # value_set = label_values.get(key, set()) + # value_set.add(turn['human_labels'][key]['value']) + # label_values[key] = value_set turn['label'] = encode_labels(prompt_obj['labels']) if 'lang' in prompt_obj: turn['lang'] = prompt_obj['lang'].split('-')[0] diff --git a/tests/collections/nlp/test_chat_sft_dataset.py b/tests/collections/nlp/test_chat_sft_dataset.py index 36d00e3108d7..f7bcecaa3c28 100644 --- a/tests/collections/nlp/test_chat_sft_dataset.py +++ b/tests/collections/nlp/test_chat_sft_dataset.py @@ -16,13 +16,18 @@ import json import os import random +from functools import partial import pytest -from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset import GPTSFTChatDataset +from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset import ( + GPTSFTChatDataset, + get_prompt_template_example, +) from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer TOKENIZER_FILE_43B = '/home/TestData/nlp/megatron_sft/tokenizer.model' +TOKENIZER_FILE_Llama2 = '/home/TestData/nlp/megatron_sft/llama2_tokenizer.model' MERGE_FILE = '/home/TestData/nlp/megatron_sft/merges.txt' VOCAB_FILE = '/home/TestData/nlp/megatron_sft/vocab.json' @@ -54,7 +59,7 @@ def create_data_points(mask_user, turn_num, records, temp_file, t2v, label=True) with open(temp_file, 'w', encoding='utf-8') as f: for r in range(records): record = {} - record['system'] = 'a chat\n\n' + record['system'] = 'a chat' record['type'] = 'TEXT_TO_VALUE' if t2v else 'VALUE_TO_TEXT' record['mask'] = 'User' if mask_user else 'Assistant' turns = [] @@ -74,244 +79,377 @@ def create_data_points(mask_user, turn_num, records, temp_file, t2v, label=True) class TestGPTSFTChatDataset: @classmethod def setup_class(cls): - pass + cls.special_tokens = { + "system_turn_start": "", + "turn_start": "", + "label_start": "", + "end_of_turn": "\n", + "end_of_name": "\n", + } + cls.suffix = cls.special_tokens['end_of_turn'] + cls.special_tokens['turn_start'] + cls.label_suffix = cls.special_tokens['end_of_name'] + cls.special_tokens['turn_start'] - @pytest.mark.unit - def test_43B_tokenizer_mask_user(self): + def _mask_user_test(self, tokenizer, ids_to_text): random.seed(5) temp_file = '/tmp/test_file.jsonl' turn_num = 5 records = 5 try: data_points = create_data_points(True, turn_num, records, temp_file, t2v=False) - tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) + d = GPTSFTChatDataset( + temp_file, + tokenizer, + 4096, + 1, + index_mapping_dir='/tmp/', + hf_dataset=True, + special_tokens=self.special_tokens, + ) for i in range(len(d)): result = d[i] input_ids = result['input_ids'] mask = result['mask'] - text = tokenizer.ids_to_text(input_ids[mask].tolist()) + text = ids_to_text(input_ids[mask].tolist()) expected_text = '' for j in range(1, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['value'] + '\n' + '' + expected_text += data_points[i]['conversations'][j]['value'] + self.suffix assert text == expected_text finally: os.remove(temp_file) - @pytest.mark.unit - def test_43B_tokenizer_mask_assistant(self): + def _mask_assistant_test(self, tokenizer, ids_to_text): random.seed(3) temp_file = '/tmp/test_file.jsonl' turn_num = 5 records = 5 try: data_points = create_data_points(False, turn_num, records, temp_file, t2v=False) - tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) + d = GPTSFTChatDataset( + temp_file, + tokenizer, + 4096, + 1, + index_mapping_dir='/tmp/', + hf_dataset=True, + special_tokens=self.special_tokens, + ) for i in range(len(d)): result = d[i] input_ids = result['input_ids'] mask = result['mask'] - text = tokenizer.ids_to_text(input_ids[mask].tolist()) + text = ids_to_text(input_ids[mask].tolist()) expected_text = '' for j in range(2, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['value'] + '\n' + '' + expected_text += data_points[i]['conversations'][j]['value'] + self.suffix assert text == expected_text finally: os.remove(temp_file) - @pytest.mark.unit - def test_43B_tokenizer_mask_user_t2v(self): + def _mask_user_t2v_test(self, tokenizer, ids_to_text): random.seed(5) temp_file = '/tmp/test_file.jsonl' turn_num = 5 records = 5 try: data_points = create_data_points(True, turn_num, records, temp_file, t2v=True) - tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) + d = GPTSFTChatDataset( + temp_file, + tokenizer, + 4096, + 1, + index_mapping_dir='/tmp/', + hf_dataset=True, + special_tokens=self.special_tokens, + ) for i in range(len(d)): result = d[i] input_ids = result['input_ids'] mask = result['mask'] - text = tokenizer.ids_to_text(input_ids[mask].tolist()) + text = ids_to_text(input_ids[mask].tolist()) expected_text = '' for j in range(1, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['label'] + '\n' + '' + expected_text += data_points[i]['conversations'][j]['label'] + self.label_suffix assert text == expected_text finally: os.remove(temp_file) - @pytest.mark.unit - def test_43B_tokenizer_mask_assistant_t2v(self): + def _mask_assistant_t2v_test(self, tokenizer, ids_to_text): random.seed(5) temp_file = '/tmp/test_file.jsonl' turn_num = 5 records = 5 try: data_points = create_data_points(False, turn_num, records, temp_file, t2v=True) - tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) + d = GPTSFTChatDataset( + temp_file, + tokenizer, + 4096, + 1, + index_mapping_dir='/tmp/', + hf_dataset=True, + special_tokens=self.special_tokens, + ) for i in range(len(d)): result = d[i] input_ids = result['input_ids'] mask = result['mask'] - text = tokenizer.ids_to_text(input_ids[mask].tolist()) + text = ids_to_text(input_ids[mask].tolist()) expected_text = '' for j in range(0, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['label'] + '\n' + '' + expected_text += data_points[i]['conversations'][j]['label'] + self.label_suffix assert text == expected_text finally: os.remove(temp_file) - @pytest.mark.unit - def test_mpt_tokenizer_mask_user(self): + def _mask_user_nolabel_test(self, tokenizer, ids_to_text): random.seed(5) temp_file = '/tmp/test_file.jsonl' turn_num = 5 records = 5 try: - data_points = create_data_points(True, turn_num, records, temp_file, t2v=False) - tokenizer = get_nmt_tokenizer( - library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True - ) - tokenizer.add_special_tokens( - {'additional_special_tokens': ['', '', '']} + data_points = create_data_points(True, turn_num, records, temp_file, t2v=False, label=False) + d = GPTSFTChatDataset( + temp_file, + tokenizer, + 4096, + 1, + index_mapping_dir='/tmp/', + hf_dataset=True, + special_tokens=self.special_tokens, ) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) for i in range(len(d)): result = d[i] input_ids = result['input_ids'] mask = result['mask'] - text = ids_to_text(tokenizer, input_ids[mask].tolist()) + text = ids_to_text(input_ids[mask].tolist()) expected_text = '' for j in range(1, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['value'] + '\n' + '' + expected_text += data_points[i]['conversations'][j]['value'] + self.suffix assert text == expected_text finally: os.remove(temp_file) - @pytest.mark.unit - def test_mpt_tokenizer_mask_assistant(self): + def _mask_assistant_nolabel_test(self, tokenizer, ids_to_text): random.seed(3) temp_file = '/tmp/test_file.jsonl' turn_num = 5 records = 5 try: - data_points = create_data_points(False, turn_num, records, temp_file, t2v=False) - tokenizer = get_nmt_tokenizer( - library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True - ) - tokenizer.add_special_tokens( - {'additional_special_tokens': ['', '', '']} + data_points = create_data_points(False, turn_num, records, temp_file, t2v=False, label=False) + d = GPTSFTChatDataset( + temp_file, + tokenizer, + 4096, + 1, + index_mapping_dir='/tmp/', + hf_dataset=True, + special_tokens=self.special_tokens, ) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) for i in range(len(d)): result = d[i] input_ids = result['input_ids'] mask = result['mask'] - text = ids_to_text(tokenizer, input_ids[mask].tolist()) + text = ids_to_text(input_ids[mask].tolist()) expected_text = '' for j in range(2, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['value'] + '\n' + '' + expected_text += data_points[i]['conversations'][j]['value'] + self.suffix assert text == expected_text finally: os.remove(temp_file) - @pytest.mark.unit - def test_mpt_tokenizer_mask_user_t2v(self): + def _test_example_prompt(self, tokenizer): random.seed(5) - temp_file = '/tmp/test_file.jsonl' - turn_num = 5 - records = 5 - try: - data_points = create_data_points(True, turn_num, records, temp_file, t2v=True) - tokenizer = get_nmt_tokenizer( - library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + conv = get_prompt_template_example(self.special_tokens) + expected = ( + self.special_tokens['system_turn_start'] + + 'System' + + self.special_tokens['end_of_name'] + + '{system message}' + + self.special_tokens['end_of_turn'] + ) + for turn in range(2): + expected += ( + self.special_tokens['turn_start'] + + 'User' + + self.special_tokens['end_of_name'] + + f'{{turn {turn + 1} user message}}' + + self.special_tokens['end_of_turn'] ) - tokenizer.add_special_tokens( - {'additional_special_tokens': ['', '', '']} + expected += self.special_tokens['turn_start'] + 'Assistant' + self.special_tokens['end_of_name'] + expected += ( + self.special_tokens['label_start'] + + f'{{turn {turn + 1} assistant label}}' + + self.special_tokens['end_of_name'] ) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) - for i in range(len(d)): - result = d[i] - input_ids = result['input_ids'] - mask = result['mask'] - text = ids_to_text(tokenizer, input_ids[mask].tolist()) - expected_text = '' - for j in range(1, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['label'] + '\n' + '' - assert text == expected_text - finally: - os.remove(temp_file) + expected += f'{{turn {turn + 1} assistant message}}' + self.special_tokens['end_of_turn'] + expected += self.special_tokens['turn_start'] + assert conv == expected @pytest.mark.unit - def test_mpt_tokenizer_mask_assistant_t2v(self): - random.seed(5) - temp_file = '/tmp/test_file.jsonl' - turn_num = 5 - records = 5 - try: - data_points = create_data_points(False, turn_num, records, temp_file, t2v=True) - tokenizer = get_nmt_tokenizer( - library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True - ) - tokenizer.add_special_tokens( - {'additional_special_tokens': ['', '', '']} - ) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) - for i in range(len(d)): - result = d[i] - input_ids = result['input_ids'] - mask = result['mask'] - text = ids_to_text(tokenizer, input_ids[mask].tolist()) - expected_text = '' - for j in range(0, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['label'] + '\n' + '' - assert text == expected_text - finally: - os.remove(temp_file) + def test_43B_example_prompt(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._test_example_prompt(tokenizer) + + @pytest.mark.unit + def test_43B_tokenizer_mask_user(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._mask_user_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_43B_tokenizer_mask_assistant(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._mask_assistant_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_43B_tokenizer_mask_user_t2v(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._mask_user_t2v_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_43B_tokenizer_mask_assistant_t2v(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._mask_assistant_t2v_test(tokenizer, tokenizer.ids_to_text) @pytest.mark.unit def test_43B_tokenizer_mask_user_nolabel(self): - random.seed(5) - temp_file = '/tmp/test_file.jsonl' - turn_num = 5 - records = 5 - try: - data_points = create_data_points(True, turn_num, records, temp_file, t2v=False, label=False) - tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) - for i in range(len(d)): - result = d[i] - input_ids = result['input_ids'] - mask = result['mask'] - text = tokenizer.ids_to_text(input_ids[mask].tolist()) - expected_text = '' - for j in range(1, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['value'] + '\n' + '' - assert text == expected_text - finally: - os.remove(temp_file) + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._mask_user_nolabel_test(tokenizer, tokenizer.ids_to_text) @pytest.mark.unit def test_43B_tokenizer_mask_assistant_nolabel(self): - random.seed(3) - temp_file = '/tmp/test_file.jsonl' - turn_num = 5 - records = 5 - try: - data_points = create_data_points(False, turn_num, records, temp_file, t2v=False, label=False) - tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) - d = GPTSFTChatDataset(temp_file, tokenizer, 4096, 1, index_mapping_dir='/tmp/', hf_dataset=True) - for i in range(len(d)): - result = d[i] - input_ids = result['input_ids'] - mask = result['mask'] - text = tokenizer.ids_to_text(input_ids[mask].tolist()) - expected_text = '' - for j in range(2, turn_num, 2): - expected_text += data_points[i]['conversations'][j]['value'] + '\n' + '' - assert text == expected_text - finally: - os.remove(temp_file) + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_43B) + self._mask_assistant_nolabel_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_mpt_tokenizer_mask_user(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) + self._mask_user_test(tokenizer, partial(ids_to_text, tokenizer)) + + @pytest.mark.unit + def test_mpt_tokenizer_mask_assistant(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) + self._mask_assistant_test(tokenizer, partial(ids_to_text, tokenizer)) + + @pytest.mark.unit + def test_mpt_tokenizer_mask_user_t2v(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) + self._mask_user_t2v_test(tokenizer, partial(ids_to_text, tokenizer)) + + @pytest.mark.unit + def test_mpt_tokenizer_mask_assistant_t2v(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) + self._mask_assistant_t2v_test(tokenizer, partial(ids_to_text, tokenizer)) + + @pytest.mark.unit + def test_mpt_tokenizer_mask_user_nolabel(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) + self._mask_user_nolabel_test(tokenizer, partial(ids_to_text, tokenizer)) + + @pytest.mark.unit + def test_mpt_tokenizer_mask_assistant_nolabel(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + tokenizer.add_special_tokens({'additional_special_tokens': ['', '', '']}) + self._mask_assistant_nolabel_test(tokenizer, partial(ids_to_text, tokenizer)) + + @pytest.mark.unit + def test_llama2_tokenizer_mask_user(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_Llama2) + self._mask_user_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_llama2_tokenizer_mask_assistant(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_Llama2) + self._mask_assistant_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_llama2_tokenizer_mask_user_t2v(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_Llama2) + self._mask_user_t2v_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_llama2_tokenizer_mask_assistant_t2v(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_Llama2) + self._mask_assistant_t2v_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_llama2_tokenizer_mask_user_nolabel(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_Llama2) + self._mask_user_nolabel_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_llama2_tokenizer_mask_assistant_nolabel(self): + tokenizer = get_nmt_tokenizer(library='sentencepiece', tokenizer_model=TOKENIZER_FILE_Llama2) + self._mask_assistant_nolabel_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_normal_mpt_tokenizer_mask_user(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + self._mask_user_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_normal_mpt_tokenizer_mask_assistant(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + self._mask_assistant_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_normal_mpt_tokenizer_mask_user_t2v(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + self._mask_user_t2v_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_normal_mpt_tokenizer_mask_assistant_t2v(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + self._mask_assistant_t2v_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_normal_mpt_tokenizer_mask_user_nolabel(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + self._mask_user_nolabel_test(tokenizer, tokenizer.ids_to_text) + + @pytest.mark.unit + def test_normal_mpt_tokenizer_mask_assistant_nolabel(self): + tokenizer = get_nmt_tokenizer( + library='huggingface', model_name='gpt2', merges_file=MERGE_FILE, vocab_file=VOCAB_FILE, use_fast=True + ) + self._mask_assistant_nolabel_test(tokenizer, tokenizer.ids_to_text) + + +class TestDifferentGPTSFTChatDataset(TestGPTSFTChatDataset): + @classmethod + def setup_class(cls): + cls.special_tokens = { + "system_turn_start": "<|im_start|>", + "turn_start": "<|im_start|>", + "label_start": "<|label|>", + "end_of_turn": "<|im_end|>\n", + "end_of_name": "\n", + } + cls.suffix = cls.special_tokens['end_of_turn'] + cls.special_tokens['turn_start'] + cls.label_suffix = cls.special_tokens['end_of_name'] + cls.special_tokens['turn_start'] From 182b1aa95edaa83d136730e1dad40a7f28c1befe Mon Sep 17 00:00:00 2001 From: Sasha Meister Date: Tue, 10 Oct 2023 14:38:26 +0000 Subject: [PATCH 111/112] Added References Signed-off-by: Sasha Meister --- .../nlp/nemo_megatron/flash_attention.rst | 10 ++- .../nemo_megatron/positional_embeddings.rst | 30 ++++--- docs/source/nlp/nlp_all.bib | 81 +++++++++++++++++++ 3 files changed, 109 insertions(+), 12 deletions(-) diff --git a/docs/source/nlp/nemo_megatron/flash_attention.rst b/docs/source/nlp/nemo_megatron/flash_attention.rst index 78b3b9c2470c..b00b7a38d63a 100644 --- a/docs/source/nlp/nemo_megatron/flash_attention.rst +++ b/docs/source/nlp/nemo_megatron/flash_attention.rst @@ -1,6 +1,6 @@ Flash attention --------------- -`FlashAttention `_ is a method designed to enhance the efficiency of Transformer models, which are widely utilized in applications such as natural language processing. Traditional Transformers are slow and consume a lot of memory, especially with long sequences, due to the quadratic time and memory complexity of self-attention. FlashAttention, an IO-aware exact attention algorithm that leverages tiling to minimize the number of memory reads/writes between the GPU's high bandwidth memory (HBM) and on-chip SRAM. This approach is designed to be more efficient in terms of IO complexity compared to standard attention mechanisms. +Flash Attention :cite:`nlp-megatron-dao2022flashattention` is a method designed to enhance the efficiency of Transformer models, which are widely utilized in applications such as natural language processing. Traditional Transformers are slow and consume a lot of memory, especially with long sequences, due to the quadratic time and memory complexity of self-attention. FlashAttention, an IO-aware exact attention algorithm that leverages tiling to minimize the number of memory reads/writes between the GPU's high bandwidth memory (HBM) and on-chip SRAM. This approach is designed to be more efficient in terms of IO complexity compared to standard attention mechanisms. GPT ^^^ @@ -18,3 +18,11 @@ To enable Flash Attention while Megatron T5 model training, modify the following model.encoder.use_flash_attention=True model.decoder.use_flash_attention=True + +References +---------- + +.. bibliography:: ../nlp_all.bib + :style: plain + :labelprefix: nlp-megatron + :keyprefix: nlp-megatron- diff --git a/docs/source/nlp/nemo_megatron/positional_embeddings.rst b/docs/source/nlp/nemo_megatron/positional_embeddings.rst index 68dce5bd119a..b8dea5280c28 100644 --- a/docs/source/nlp/nemo_megatron/positional_embeddings.rst +++ b/docs/source/nlp/nemo_megatron/positional_embeddings.rst @@ -18,38 +18,38 @@ GPT - .. code:: model.position_embedding_type='learned_absolute' - - `Absolute Position Encodings `_ are position embeddings used in Transformer-based models, added to input embeddings in the encoder and decoder sections. These encodings match the dimension of embeddings and are created using sine and cosine functions of various frequencies. Each dimension in the encoding corresponds to a sinusoid with wavelengths forming a geometric progression. + - Absolute Position Encodings :cite:`nlp-megatron-vaswani2023attention` are position embeddings used in Transformer-based models, added to input embeddings in the encoder and decoder sections. These encodings match the dimension of embeddings and are created using sine and cosine functions of various frequencies. Each dimension in the encoding corresponds to a sinusoid with wavelengths forming a geometric progression. * - **rope** - .. code:: model.position_embedding_type='rope' model.rotary_percentage=1.0 - - `Rotary Position Embedding (RoPE) `_ incorporates positional information by utilizing a rotation matrix to encode the absolute positions of tokens while maintaining relative positional relationships in self-attention formulations by leveraging the geometric properties of vectors and complex numbers, applying a rotation based on a preset non-zero constant and the relative positions of the tokens to the word embeddings. + - Rotary Position Embedding (RoPE) :cite:`nlp-megatron-su2022roformer` incorporates positional information by utilizing a rotation matrix to encode the absolute positions of tokens while maintaining relative positional relationships in self-attention formulations by leveraging the geometric properties of vectors and complex numbers, applying a rotation based on a preset non-zero constant and the relative positions of the tokens to the word embeddings. * - **alibi** - .. code:: model.position_embedding_type='alibi' - - `Attention with Linear Biases (ALiBi) `_ modifies the way attention scores are computed in the attention sublayer of the network. ALiBi introduces a static, non-learned bias after the query-key dot product during the computation of attention scores. This bias is added in the form of a head-specific slope that is determined before training, creating a geometric sequence of slopes for the different heads in the model. The method has an inductive bias towards recency, penalizing attention scores between distant query-key pairs with the penalty increasing as the distance grows, and it leverages different rates of penalty increase across different heads based on the slope magnitude. + - Attention with Linear Biases (ALiBi) :cite:`nlp-megatron-press2022train` modifies the way attention scores are computed in the attention sublayer of the network. ALiBi introduces a static, non-learned bias after the query-key dot product during the computation of attention scores. This bias is added in the form of a head-specific slope that is determined before training, creating a geometric sequence of slopes for the different heads in the model. The method has an inductive bias towards recency, penalizing attention scores between distant query-key pairs with the penalty increasing as the distance grows, and it leverages different rates of penalty increase across different heads based on the slope magnitude. * - **kerple** - .. code:: model.position_embedding_type='kerple' - - `Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) `_ generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. + - Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) :cite:`nlp-megatron-chi2022kerple` generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. * - **xpos** - .. code:: model.position_embedding_type='xpos' - - `Extrapolatable Position Embedding (xPos) `_ + - Extrapolatable Position Embedding (xPos) :cite:`nlp-megatron-sun2022lengthextrapolatable` * - **sandwich** - .. code:: model.position_embedding_type='sandwich' - - `Sandwich `_ + - Sandwich :cite:`nlp-megatron-chi2023dissecting` T5 ^^ @@ -67,32 +67,32 @@ T5 model.encoder.position_embedding_type='learned_absolute' model.decoder.position_embedding_type='learned_absolute' - - `Absolute Position Encodings `_ are position embeddings used in Transformer-based models, added to input embeddings in the encoder and decoder sections. These encodings match the dimension of embeddings and are created using sine and cosine functions of various frequencies. Each dimension in the encoding corresponds to a sinusoid with wavelengths forming a geometric progression. + - Absolute Position Encodings :cite:`nlp-megatron-vaswani2023attention` are position embeddings used in Transformer-based models, added to input embeddings in the encoder and decoder sections. These encodings match the dimension of embeddings and are created using sine and cosine functions of various frequencies. Each dimension in the encoding corresponds to a sinusoid with wavelengths forming a geometric progression. * - **relative** - .. code:: model.encoder.position_embedding_type='relative' model.decoder.position_embedding_type='relative' - - `Relative Position Representations `_ + - Relative Position Representations :cite:`nlp-megatron-shaw2018selfattention` * - **alibi** - .. code:: model.encoder.position_embedding_type='alibi' model.decoder.position_embedding_type='alibi' - - `Attention with Linear Biases (ALiBi) `_ modifies the way attention scores are computed in the attention sublayer of the network. ALiBi introduces a static, non-learned bias after the query-key dot product during the computation of attention scores. This bias is added in the form of a head-specific slope that is determined before training, creating a geometric sequence of slopes for the different heads in the model. The method has an inductive bias towards recency, penalizing attention scores between distant query-key pairs with the penalty increasing as the distance grows, and it leverages different rates of penalty increase across different heads based on the slope magnitude. + - Attention with Linear Biases (ALiBi) :cite:`nlp-megatron-press2022train` modifies the way attention scores are computed in the attention sublayer of the network. ALiBi introduces a static, non-learned bias after the query-key dot product during the computation of attention scores. This bias is added in the form of a head-specific slope that is determined before training, creating a geometric sequence of slopes for the different heads in the model. The method has an inductive bias towards recency, penalizing attention scores between distant query-key pairs with the penalty increasing as the distance grows, and it leverages different rates of penalty increase across different heads based on the slope magnitude. * - **kerple** - .. code:: model.encoder.position_embedding_type='kerple' model.decoder.position_embedding_type='kerple' - - `Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) `_ generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. + - Kernelized Relative Positional Embedding for Length Extrapolation (KERPLE) :cite:`nlp-megatron-chi2022kerple` generalizes relative positional embeddings (RPE) by kernelizing positional differences using conditionally positive definite (CPD) kernels known for generalizing distance metrics. They transform CPD kernels into positive definite (PD) kernels by adding a constant offset, which is absorbed during softmax normalization in the self-attention mechanism of transformers. This approach allows for a variety of RPEs that facilitate length extrapolation in a principled manner. Positional interpolation ------------------------ -`Position Interpolation (PI) `_ is a method introduced to extend the context window sizes of Rotary Position Embedding (RoPE)-based pretrained large language models (LLMs). The central principle of PI is to reduce the position indices so that they align with the initial context window size through interpolation. +Position Interpolation (PI) :cite:`nlp-megatron-chen2023extending` is a method introduced to extend the context window sizes of Rotary Position Embedding (RoPE)-based pretrained large language models (LLMs). The central principle of PI is to reduce the position indices so that they align with the initial context window size through interpolation. Positional Interpolation is supported in Megatron GPT SFT models. Set RoPE Interpolation factor for sequence length :code:`seq_len_interpolation_factor` to enable it. @@ -101,3 +101,11 @@ Positional Interpolation is supported in Megatron GPT SFT models. Set RoPE Inter model.position_embedding_type='rope' model.rotary_percentage=1.0 model.seq_len_interpolation_factor: 2 + +References +---------- + +.. bibliography:: ../nlp_all.bib + :style: plain + :labelprefix: nlp-megatron + :keyprefix: nlp-megatron- \ No newline at end of file diff --git a/docs/source/nlp/nlp_all.bib b/docs/source/nlp/nlp_all.bib index 48a53240e52b..48f97e929fc2 100644 --- a/docs/source/nlp/nlp_all.bib +++ b/docs/source/nlp/nlp_all.bib @@ -225,3 +225,84 @@ @misc{antonova2023spellmapper archivePrefix={arXiv}, primaryClass={cs.CL} } + +@misc{dao2022flashattention, + title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness}, + author={Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher Ré}, + year={2022}, + eprint={2205.14135}, + archivePrefix={arXiv}, + primaryClass={cs.LG} +} + +@misc{vaswani2023attention, + title={Attention Is All You Need}, + author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin}, + year={2023}, + eprint={1706.03762}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{su2022roformer, + title={RoFormer: Enhanced Transformer with Rotary Position Embedding}, + author={Jianlin Su and Yu Lu and Shengfeng Pan and Ahmed Murtadha and Bo Wen and Yunfeng Liu}, + year={2022}, + eprint={2104.09864}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{press2022train, + title={Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation}, + author={Ofir Press and Noah A. Smith and Mike Lewis}, + year={2022}, + eprint={2108.12409}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{chi2022kerple, + title={KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation}, + author={Ta-Chung Chi and Ting-Han Fan and Peter J. Ramadge and Alexander I. Rudnicky}, + year={2022}, + eprint={2205.09921}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{sun2022lengthextrapolatable, + title={A Length-Extrapolatable Transformer}, + author={Yutao Sun and Li Dong and Barun Patra and Shuming Ma and Shaohan Huang and Alon Benhaim and Vishrav Chaudhary and Xia Song and Furu Wei}, + year={2022}, + eprint={2212.10554}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{chi2023dissecting, + title={Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis}, + author={Ta-Chung Chi and Ting-Han Fan and Alexander I. Rudnicky and Peter J. Ramadge}, + year={2023}, + eprint={2212.10356}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{shaw2018selfattention, + title={Self-Attention with Relative Position Representations}, + author={Peter Shaw and Jakob Uszkoreit and Ashish Vaswani}, + year={2018}, + eprint={1803.02155}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} + +@misc{chen2023extending, + title={Extending Context Window of Large Language Models via Positional Interpolation}, + author={Shouyuan Chen and Sherman Wong and Liangjian Chen and Yuandong Tian}, + year={2023}, + eprint={2306.15595}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} \ No newline at end of file From 2e175551b46760548f6ef351ccf3de0866450805 Mon Sep 17 00:00:00 2001 From: Sasha Meister Date: Tue, 10 Oct 2023 14:48:49 +0000 Subject: [PATCH 112/112] added to toctree Signed-off-by: Sasha Meister --- docs/source/nlp/nemo_megatron/intro.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst index c1b158d77e3e..577291424c80 100644 --- a/docs/source/nlp/nemo_megatron/intro.rst +++ b/docs/source/nlp/nemo_megatron/intro.rst @@ -26,6 +26,8 @@ team at NVIDIA. NeMo Megatron supports several types of models: retro/retro_model hiddens/hiddens_module peft/landing_page + flash_attention + positional_embeddings References