From 2b4bf7bc3a7d4d76cc3c435958ce6dc4e9bd2f15 Mon Sep 17 00:00:00 2001 From: JoFrhwld Date: Wed, 5 Oct 2022 09:43:47 -0400 Subject: [PATCH 1/4] added full CC-BY-SA 4.0 text --- license_fulltext | 428 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 428 insertions(+) create mode 100644 license_fulltext diff --git a/license_fulltext b/license_fulltext new file mode 100644 index 0000000..2d58298 --- /dev/null +++ b/license_fulltext @@ -0,0 +1,428 @@ +Attribution-ShareAlike 4.0 International + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. 
+ Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution-ShareAlike 4.0 International Public +License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution-ShareAlike 4.0 International Public License ("Public +License"). 
To the extent this Public License may be interpreted as a +contract, You are granted the Licensed Rights in consideration of Your +acceptance of these terms and conditions, and the Licensor grants You +such rights in consideration of benefits the Licensor receives from +making the Licensed Material available under these terms and +conditions. + + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. BY-SA Compatible License means a license listed at + creativecommons.org/compatiblelicenses, approved by Creative + Commons as essentially the equivalent of this Public License. + + d. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + e. 
Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + f. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + g. License Elements means the license attributes listed in the name + of a Creative Commons Public License. The License Elements of this + Public License are Attribution and ShareAlike. + + h. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + i. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + j. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + k. Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + l. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + m. 
You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part; and + + b. produce, reproduce, and Share Adapted Material. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. Additional offer from the Licensor -- Adapted Material. 
+ Every recipient of Adapted Material from You + automatically receives an offer from the Licensor to + exercise the Licensed Rights in the Adapted Material + under the conditions of the Adapter's License You apply. + + c. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties. + + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. 
retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + b. ShareAlike. + + In addition to the conditions in Section 3(a), if You Share + Adapted Material You produce, the following conditions also apply. + + 1. The Adapter's License You apply must be a Creative Commons + license with the same License Elements, this version or + later, or a BY-SA Compatible License. + + 2. You must include the text of, or the URI or hyperlink to, the + Adapter's License You apply. You may satisfy this condition + in any reasonable manner based on the medium, means, and + context in which You Share Adapted Material. + + 3. 
You may not offer or impose any additional or different terms + or conditions on, or apply any Effective Technological + Measures to, Adapted Material that restrict exercise of the + rights granted under the Adapter's License You apply. + + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material, + including for purposes of Section 3(b); and + + c. You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + + +Section 7 -- Other Terms and Conditions. + + a. 
The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. 
Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. + From b8018e5ce44d6e19e91931204fc686edf1662a5a Mon Sep 17 00:00:00 2001 From: JoFrhwld Date: Wed, 5 Oct 2022 09:47:01 -0400 Subject: [PATCH 2/4] renamed tutorial file index --- Montreal-Forced-Aligner.Rmd => index.Rmd | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Montreal-Forced-Aligner.Rmd => index.Rmd (100%) diff --git a/Montreal-Forced-Aligner.Rmd b/index.Rmd similarity index 100% rename from Montreal-Forced-Aligner.Rmd rename to index.Rmd From 0057f20b19d7921ce4f1af26486ff37b3e407874 Mon Sep 17 00:00:00 2001 From: JoFrhwld Date: Wed, 5 Oct 2022 14:12:49 -0400 Subject: [PATCH 3/4] added full CC-BY-SA 4.0 text --- .gitignore | 8 ++++++++ _quarto.yml | 5 +++++ 2 files changed, 13 insertions(+) create mode 100644 .gitignore create mode 100644 _quarto.yml diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..e37c0bb --- /dev/null +++ b/.gitignore @@ -0,0 +1,8 @@ +.Rproj.user +.Rhistory +.RData +.Ruserdata + +/.quarto/ +*.html +*_files/ \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml new file mode 100644 index 0000000..5125e3e --- /dev/null +++ b/_quarto.yml @@ -0,0 +1,5 @@ +project: + title: "mfa-tutorial" + + + From 80a691b3768591162cbf5cad59d3e90cc2a06106 Mon Sep 17 00:00:00 2001 From: 
JoFrhwld Date: Wed, 5 Oct 2022 14:45:17 -0400 Subject: [PATCH 4/4] quarto formatting update --- index.Rmd | 260 +++++++++++++++++++++++++++++++----------------------- 1 file changed, 149 insertions(+), 111 deletions(-) diff --git a/index.Rmd b/index.Rmd index 104184e..372e662 100644 --- a/index.Rmd +++ b/index.Rmd @@ -1,38 +1,55 @@ -# Montreal Forced Aligner +--- +title: "Montreal Forced Aligner" +date: "2022-10-5" +author: + name: "Eleanor Chodroff" + url: "https://www.eleanorchodroff.com/" + #orcid: + #email: + #affiliations: + #name: + #department: +format: + html: default +license: "CC-BY-SA 4.0" +citation: true +--- ## Overview -The [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/en/latest/index.html){target="_blank"} is a forced alignment system with acoustic models built using the [Kaldi ASR toolkit](https://eleanorchodroff.com/tutorial/kaldi/introduction.html){target="_blank"}. A major highlight of this system is the availability of pretrained acoustic models and grapheme-to-phoneme models for a wide variety of languages, as well as the ability to train acoustic and grapheme-to-phoneme models to any new dataset you might have. It also uses advanced techniques for training and aligning speech data (Kaldi) with a full suite of training and speaker adaptation algorithms. The basic acoustic model recipe uses the traditional GMM-HMM framework, starting with monophone models, then triphone models (which allow for context sensitivity, read: coarticulation), and some transformations and speaker adaptation along the way. You can check out more regarding the recipes in the [MFA user guide and reference guide](https://montreal-forced-aligner.readthedocs.io/en/latest/); for an overview to a standard training recipe, check out this [Kaldi tutorial](https://www.eleanorchodroff.com/tutorial/kaldi/training-overview.html). 
+The [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/en/latest/index.html){target="_blank"} is a forced alignment system with acoustic models built using the [Kaldi ASR toolkit](https://eleanorchodroff.com/tutorial/kaldi/introduction.html){target="_blank"}. A major highlight of this system is the availability of pretrained acoustic models and grapheme-to-phoneme models for a wide variety of languages, as well as the ability to train acoustic and grapheme-to-phoneme models on any new dataset you might have. Through Kaldi, it also has access to advanced techniques for training and aligning speech data, including a full suite of training and speaker adaptation algorithms. The basic acoustic model recipe uses the traditional GMM-HMM framework, starting with monophone models, then triphone models (which allow for context sensitivity, read: coarticulation), and some transformations and speaker adaptation along the way. You can check out more regarding the recipes in the [MFA user guide and reference guide](https://montreal-forced-aligner.readthedocs.io/en/latest/); for an overview of a standard training recipe, check out this [Kaldi tutorial](https://www.eleanorchodroff.com/tutorial/kaldi/training-overview.html). -As a forced alignment system, the Montreal Forced Aligner will time-align a transcript to a corresponding audio file at the phone and word levels provided there exist a set of pretrained acoustic models and a pronunciation dictionary (a.k.a. lexicon) of the words in the transcript with their canonical phonetic pronunciation(s). +As a forced alignment system, the Montreal Forced Aligner will time-align a transcript to a corresponding audio file at the phone and word levels provided there exist a set of pretrained acoustic models and a pronunciation dictionary (a.k.a. lexicon) of the words in the transcript with their canonical phonetic pronunciation(s).
-The current MFA download contains a suite of ``mfa`` commands that allow you to do everything from basic forced alignment to grapheme-to-phoneme conversion to automatic speech recognition. In this tutorial, we'll be focusing mostly on those commands relating to forced alignment and a side of grapheme-to-phoneme conversion. +The current MFA download contains a suite of `mfa` commands that allow you to do everything from basic forced alignment to grapheme-to-phoneme conversion to automatic speech recognition. In this tutorial, we'll be focusing mostly on those commands relating to forced alignment and a side of grapheme-to-phoneme conversion. -Very generally, the procedure for forced alignment is as follows: +Very generally, the procedure for forced alignment is as follows: -* Prep audio file(s) -* Prep transcript(s) (Praat `TextGrids` or `.lab`/`.txt` files) -* Obtain a pronunciation dictionary -* Obtain an acoustic model -* Create an input folder that contains the audio files and transcripts -* Create an empty output folder -* Run the ``mfa align`` command +- Prep audio file(s) +- Prep transcript(s) (Praat `TextGrids` or `.lab`/`.txt` files) +- Obtain a pronunciation dictionary +- Obtain an acoustic model +- Create an input folder that contains the audio files and transcripts +- Create an empty output folder +- Run the `mfa align` command ## Installation -You can find the streamlined installation instructions on the [main MFA installation page](https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html). The majority of users will be following the "All platforms" instructions. +You can find the streamlined installation instructions on the [main MFA installation page](https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html). The majority of users will be following the "All platforms" instructions. -As listed there, you will need to download Miniconda or Anaconda. 
If you're trying to decide between the two, I'd probably recommend Miniconda. Compared to Anaconda, Miniconda is smaller, and has a slightly higher success rate for installation. Conda is a package and environment management system. If you're familiar with R, the package management system is similar in concept to the R CRAN server. Once installed, you can import sets of packages via the ``conda`` commands, similar to how you might run ``install.packages`` in R. The environment manager aspect of conda is fairly unique, and essentially creates a small environment on your computer for the collection of imported packages (here the MFA suite) to run. The primary advantage is that you don't have to worry about conflicting versions of the same code package on your computer; the environment will be a bubble in your computer with the correct package versions for the MFA suite. +As listed there, you will need to download Miniconda or Anaconda. If you're trying to decide between the two, I'd probably recommend Miniconda. Compared to Anaconda, Miniconda is smaller, and has a slightly higher success rate for installation. Conda is a package and environment management system. If you're familiar with R, the package management system is similar in concept to the R CRAN server. Once installed, you can import sets of packages via the `conda` commands, similar to how you might run `install.packages` in R. The environment manager aspect of conda is fairly unique, and essentially creates a small environment on your computer for the collection of imported packages (here the MFA suite) to run. The primary advantage is that you don't have to worry about conflicting versions of the same code package on your computer; the environment will be a bubble in your computer with the correct package versions for the MFA suite. -Once you've installed Miniconda/Anaconda, you'll run the following line of code in the command line. 
For Mac users, the command line will be the terminal or a downloaded equivalent. For Windows users, *I believe* you will run this directly in the Miniconda console. +Once you've installed Miniconda/Anaconda, you'll run the following line of code in the command line. For Mac users, the command line will be the terminal or a downloaded equivalent. For Windows users, *I believe* you will run this directly in the Miniconda console. -```{r eval = F} +```{bash} +#| eval: false conda create -n aligner -c conda-forge montreal-forced-aligner ``` -The `-n` flag here refers to the *name* of the environment. If you want to test a new version of the aligner, but aren't ready to overwrite your old working version of the aligner, you can create a new environment using a different name after the `-n` flag. See \@ref(Tips and tricks) for more on conda environments. -If it's worked, you should see something like: -```{r eval = F} +The `-n` flag here refers to the *name* of the environment. If you want to test a new version of the aligner, but aren't ready to overwrite your old working version of the aligner, you can create a new environment using a different name after the `-n` flag. See [Tips and Tricks](#sec-tipsandtricks) for more on conda environments. If it's worked, you should see something like: + +```{bash} +#| eval: false # To activate this environment, use # # $ conda activate aligner @@ -44,140 +61,152 @@ If it's worked, you should see something like: And with that, the final step is **activating** the aligner. You will need to re-activate the aligner every time you open a new shell/command line window. You should be able to see which environment you have open in the parentheses before each shell prompt. 
For example, mine looks like: -```{r eval = F} +```{bash} +#| eval: false (base) Eleanors-iPro:Documents eleanorchodroff$ conda activate aligner (aligner) Eleanors-iPro:Documents eleanorchodroff$ ``` You can run the aligner from any location on the computer as long as you're in the aligner environment on the command line. -**NB:** The Montreal Forced Aligner is now located in the ``Documents/MFA`` folder. All acoustic models, dictionaries, temporary alignment or training files, etc. will be stored in this folder. If you are ever confused or curious about what is happening with the aligner, just poke around this folder. +**NB:** The Montreal Forced Aligner is now located in the `Documents/MFA` folder. All acoustic models, dictionaries, temporary alignment or training files, etc. will be stored in this folder. If you are ever confused or curious about what is happening with the aligner, just poke around this folder. ## Running the aligner To run the aligner, we'll need to follow this procedure: -* Prep audio file(s) -* Prep transcript(s) (Praat `TextGrids` or `.lab`/`.txt` files) -* Obtain a pronunciation dictionary -* Obtain an acoustic model -* Create an input folder that contains the audio files and transcripts -* Create an empty output folder -* Run the align command +- Prep audio file(s) +- Prep transcript(s) (Praat `TextGrids` or `.lab`/`.txt` files) +- Obtain a pronunciation dictionary +- Obtain an acoustic model +- Create an input folder that contains the audio files and transcripts +- Create an empty output folder +- Run the align command -In this section, I will give a quick example of how this works using a hypothetical example of aligning American English. Through this, you'll see the primary ``mfa`` commands that you'll need for alignment. Then in the following sections, I'll go into more detail about each of these steps. +In this section, I will give a quick example of how this works using a hypothetical example of aligning American English. 
Through this, you'll see the primary `mfa` commands that you'll need for alignment. Then in the following sections, I'll go into more detail about each of these steps. -For the first steps of prepping the audio files and prepping the transcripts, I'll assume we're working with a `wav` file and a Praat `TextGrid` that has a single interval tier with utterances transcribed in English. We can have multiple intervals per `TextGrid`. The `wav` file and `TextGrid` must have the same name. +For the first steps of prepping the audio files and prepping the transcripts, I'll assume we're working with a `wav` file and a Praat `TextGrid` that has a single interval tier with utterances transcribed in English. We can have multiple intervals per `TextGrid`. The `wav` file and `TextGrid` must have the same name. -After we have the audio files and transcript `TextGrids`, we'll place them in an input folder. For the sake of this tutorial, this input folder will be called ``Documents/input_english/``. +After we have the audio files and transcript `TextGrids`, we'll place them in an input folder. For the sake of this tutorial, this input folder will be called `Documents/input_english/`. -We will also need to create an empty output folder. I will call this output folder ``Documents/output_english``. +We will also need to create an empty output folder. I will call this output folder `Documents/output_english`. We can then obtain the acoustic model from the internet (yes, you'll have to be online the first time you do this) using the following command: -```{r eval = F} +```{bash} +#| eval: false mfa model download acoustic english_us_arpa ``` -(You can also use ``mfa models download acoustic english_us_arpa``.) +(You can also use `mfa models download acoustic english_us_arpa`.) 
And we can obtain the dictionary from the internet using this command: -```{r eval = F} +```{bash} +#| eval: false mfa model download dictionary english_us_arpa ``` -(You can also use ``mfa models download dictionary english_us_arpa``.) +(You can also use `mfa models download dictionary english_us_arpa`.) -The dictionary and acoustic model can now be found in their respective ``Documents/MFA/pretrained_models/`` folders. +The dictionary and acoustic model can now be found in their respective `Documents/MFA/pretrained_models/` folders. Once we have the dictionary, acoustic model, `wav` files and `TextGrids` in their input folder, and an empty output folder, then we're ready to run the aligner! -The alignment command is called ``align`` and takes 4 arguments: +The alignment command is called `align` and takes 4 arguments: -+ path to the input folder -+ name of the acoustic model (in the ``Documents/MFA/pretrained_models/acoustic/`` folder) / path to the acoustic model if it's elsewhere on the computer -+ name of the dictionary (in the ``Documents/MFA/pretrained_models/dictionary/`` folder) / path to the acoustic model if it's elsewhere on the computer -+ path to the output folder +- path to the input folder +- name of the acoustic model (in the `Documents/MFA/pretrained_models/acoustic/` folder) / path to the acoustic model if it's elsewhere on the computer +- name of the dictionary (in the `Documents/MFA/pretrained_models/dictionary/` folder) / path to the dictionary if it's elsewhere on the computer +- path to the output folder -I always add the optional ``--clean`` flag as this "cleans" out any old temporary files created in a previous run. +I always add the optional `--clean` flag as this "cleans" out any old temporary files created in a previous run.
(If you're anything like me, you might find yourself re-running the aligner a few times with the same input filename. The aligner won't actually re-run properly unless you clear out the old files.) -**Important:** You will need to scroll across to see the whole line of code. +**Important:** You will need to scroll across to see the whole line of code. -```{r eval = F} +```{bash} +#| eval: false mfa align --clean /Users/Eleanor/Documents/input_english/ english_us_arpa english_us_arpa /Users/Eleanor/Documents/output_english/ ``` -And if everything worked appropriately, your aligned `TextGrids` should now be in the ``Documents/output_english/`` folder! +And if everything worked appropriately, your aligned `TextGrids` should now be in the `Documents/output_english/` folder! ## File preparation ### Audio files -The Montreal Forced Aligner is incredibly robust to audio files of differing formats, sampling rates and channels. You should not have to do much prep, but note that whatever you feed the system will be converted to be a `wav` file with a sampling rate of 16 kHz with a single (mono) channel unless otherwise specified (see [Feature Configuration](https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/configuration/global.html##feature-config). For the record, I have not yet tried any other file format except for `wav` files, so I'm not yet aware of potential issues that might arise there. + +The Montreal Forced Aligner is incredibly robust to audio files of differing formats, sampling rates and channels. You should not have to do much prep, but note that whatever you feed the system will be converted to a `wav` file with a sampling rate of 16 kHz and a single (mono) channel unless otherwise specified (see [Feature Configuration](https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/configuration/global.html##feature-config)).
For the record, I have not yet tried any other file format except for `wav` files, so I'm not yet aware of potential issues that might arise there. ### Transcripts -The MFA can take as input either a Praat `TextGrid` or a `.lab` or `.txt` file. I have worked most extensively with the `TextGrid` input, so I'll describe those details here. As for `.lab` and `.txt` input, I believe this method only works when the transcript is pasted in as a single line. In other words, I don't think it can handle time stamps for utterance start and end times. + +The MFA can take as input either a Praat `TextGrid` or a `.lab` or `.txt` file. I have worked most extensively with the `TextGrid` input, so I'll describe those details here. As for `.lab` and `.txt` input, I believe this method only works when the transcript is pasted in as a single line. In other words, I don't think it can handle time stamps for utterance start and end times. ### Filenames -The filename of the `wav` file and its corresponding transcript must be identical except for the extension (`.wav` or `.TextGrid`). If you have multiple speakers in the alignment or training, you can actually implement some degree of speaker/channel adaptation. You can do this either by placing speaker-specific files in its own subfolder within the input folder, or by adding an optional argument to the ``mfa align`` command to use the first ``n`` characters of the filename. - -If you go forward with this, it helps to have the speaker ID as the prefix to the filenamex. 
For example: - -```{r eval = F} -spkr01_utt1.wav, spkr01_utt1.TextGrid -spkr01_utt2.wav, spkr01_utt2.TextGrid -spkr02_utt1.wav, spkr02_utt1.TextGrid -spkr02_utt2.wav, spkr02_utt2.TextGrid -spkr03_utt1.wav, spkr03_utt1.TextGrid -spkr03_utt2.wav, spkr03_utt2.TextGrid -spkr04_utt1.wav, spkr04_utt1.TextGrid -spkr04_utt2.wav, spkr04_utt2.TextGrid -``` + +The filename of the `wav` file and its corresponding transcript must be identical except for the extension (`.wav` or `.TextGrid`). If you have multiple speakers in the alignment or training, you can actually implement some degree of speaker/channel adaptation. You can do this either by placing each speaker's files in their own subfolder within the input folder, or by adding an optional argument to the `mfa align` command to use the first `n` characters of the filename. + +If you go forward with this, it helps to have the speaker ID as the prefix to the filenames. For example: + + spkr01_utt1.wav, spkr01_utt1.TextGrid + spkr01_utt2.wav, spkr01_utt2.TextGrid + spkr02_utt1.wav, spkr02_utt1.TextGrid + spkr02_utt2.wav, spkr02_utt2.TextGrid + spkr03_utt1.wav, spkr03_utt1.TextGrid + spkr03_utt2.wav, spkr03_utt2.TextGrid + spkr04_utt1.wav, spkr04_utt1.TextGrid + spkr04_utt2.wav, spkr04_utt2.TextGrid In this case, we can then tell the aligner to use the first 6 characters as the speaker information. The `-s` flag stands for speaker characters. -```{r eval = F} +```{bash} +#| eval: false mfa align -s 6 --clean /Users/Eleanor/input_english/ english_us_arpa english_us_arpa /Users/Eleanor/output_english/ ``` Alternatively: -```{r eval = F} + +```{bash} +#| eval: false mfa align --speaker_characters 6 --clean /Users/Eleanor/input_english/ english_us_arpa english_us_arpa /Users/Eleanor/output_english/ ``` ### Input and output folders -I would recommend creating a special input folder that houses a copy of your audio files and `TextGrid` transcripts. In case something goes wrong, you won't be messing up the raw data. 
+I would recommend creating a special input folder that houses a copy of your audio files and `TextGrid` transcripts. In case something goes wrong, you won't be messing up the raw data.
You can place this folder basically anywhere on your computer, and you can call this whatever you want. +I would recommend creating a special input folder that houses a copy of your audio files and `TextGrid` transcripts. In case something goes wrong, you won't be messing up the raw data. You can place this folder basically anywhere on your computer, and you can call this whatever you want. You will also need to create an empty output folder. I recommend making sure this is empty each time you run the aligner as the aligner does not overwrite any existing files. You can place this folder basically anywhere on your computer, and you can call this whatever you want; however, you may not re-use the input folder as the output folder. ## Obtaining acoustic models ### Download an acoustic model -Pretrained acoustic models for several languages can be downloaded directly using the command line interface. This is the [master list](https://mfa-models.readthedocs.io/en/latest/acoustic/index.html) of acoustic models available for download. + +Pretrained acoustic models for several languages can be downloaded directly using the command line interface. This is the [master list](https://mfa-models.readthedocs.io/en/latest/acoustic/index.html) of acoustic models available for download. ### Train an acoustic model -You can also train an acoustic model yourself directly on the data you're working on. You do need a fair amount of data to get reasonable alignments out. If an existing acoustic model already exists, it might be worth simply using that directly on your data or you could try [adapting](https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/adapt_acoustic_model.html) the model to your data. There are many cases, however, when training a new acoustic model is the best path forward. -The ``mfa train`` command takes 3 arguments: +You can also train an acoustic model yourself directly on the data you're working on. 
You do need a fair amount of data to get reasonable alignments out. If a suitable acoustic model already exists, it might be worth simply using that directly on your data or you could try [adapting](https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/adapt_acoustic_model.html) the model to your data. There are many cases, however, when training a new acoustic model is the best path forward. + +The `mfa train` command takes 3 arguments: -+ path to the input folder with the audio files and utterance-level transcripts (`TextGrids`) -+ name of the pronunciation dictionary in the ``pretrained_models`` folder or path to the pronunciation dictionary -+ path to the output folder (it still runs an alignment at the end) +- path to the input folder with the audio files and utterance-level transcripts (`TextGrids`) +- name of the pronunciation dictionary in the `pretrained_models` folder or path to the pronunciation dictionary +- path to the output folder (it still runs an alignment at the end) -In other words, ``mfa train`` has the same syntax and requirements as the ``mfa align`` command, but without the acoustic model specification. +In other words, `mfa train` has the same syntax and requirements as the `mfa align` command, but without the acoustic model specification. -```{r eval = F} +```{bash} +#| eval: false mfa train --clean ~/Documents/input_spanish ~/Documents/talnupf_spanish.txt ~/Documents/output_spanish_textgrids ``` -Once again, I'm using the ``--clean`` flag just in case I need to clean out an old folder in ``Documents/MFA``. +Once again, I'm using the `--clean` flag just in case I need to clean out an old folder in `Documents/MFA`. -If you would like to reuse the acoustic model, you can find all of the trained models (there are several -- one for each layer of training) in the associated folder in ``Documents/MFA/`` and any folder ending in ``_ali``.
I would recommend using the acoustic model from the last training and alignment pass, which is the ``sat2_ali`` folder. ``sat2_ali`` stands for the second round of Speaker Adaptive Training --- alignment pass. +If you would like to reuse the acoustic model, you can find all of the trained models (there are several -- one for each layer of training) in the associated folder in `Documents/MFA/` and any folder ending in `_ali`. I would recommend using the acoustic model from the last training and alignment pass, which is the `sat2_ali` folder. `sat2_ali` stands for the second round of Speaker Adaptive Training --- alignment pass. -The ``mfa model save acoustic`` command takes one required argument, which is the path to the model you want to save, and an optional (but highly recommended) argument which is the new name of the acoustic model. The optional argument is specified after the ``--name`` flag: +The `mfa model save acoustic` command takes one required argument, which is the path to the model you want to save, and an optional (but highly recommended) argument which is the new name of the acoustic model. The optional argument is specified after the `--name` flag: -```{r eval = F} +```{bash} +#| eval: false mfa model save acoustic --name guarani_cv ~/Documents/MFA/prep_validated_guarani/sat2_ali/acoustic_model.zip ``` @@ -191,9 +220,10 @@ There are a few options for obtaining a pronunciation dictionary: ### Download a dictionary -This is the [master list](https://mfa-models.readthedocs.io/en/latest/dictionary/index.html) of pre-existing pronunciation dictionaries available through the MFA. Click on the dictionary of interest, and if you scroll to the bottom of the page, it will tell you the name to type in the ``mfa model download dictionary`` command. +This is the [master list](https://mfa-models.readthedocs.io/en/latest/dictionary/index.html) of pre-existing pronunciation dictionaries available through the MFA. 
Click on the dictionary of interest, and if you scroll to the bottom of the page, it will tell you the name to type in the `mfa model download dictionary` command. -```{r eval = F} +```{bash} +#| eval: false mfa model download dictionary spanish_latin_america_mfa ``` @@ -203,84 +233,92 @@ NB: you must add any missing words in your corpus manually, or train a G2P model Grapheme-to-phoneme (G2P) models automatically convert the orthographic words in your corpus to the most likely phonetic pronunciation. How exactly it does this depends a lot on the type of model and its training data. -The MFA has a handful of pretrained G2P models that you can download. This is the [master list](https://mfa-models.readthedocs.io/en/latest/g2p/index.html) of G2P models available for download. +The MFA has a handful of pretrained G2P models that you can download. This is the [master list](https://mfa-models.readthedocs.io/en/latest/g2p/index.html) of G2P models available for download. -```{r eval = F} +```{bash} +#| eval: false mfa model download g2p bulgarian_mfa ``` -You'll then use the ``mfa g2p`` command to generate the phonetic transcriptions from the submitted orthographic forms and the g2p model. The ``mfa g2p`` command takes 3 arguments: +You'll then use the `mfa g2p` command to generate the phonetic transcriptions from the submitted orthographic forms and the g2p model. 
The `mfa g2p` command takes 3 arguments: -+ name of g2p model (in ``Documents/MFA/pretrained_models/g2p/``) -+ path to the `TextGrids` or transcripts in your corpus -+ path to where the new pronunciation dictionary should go +- name of the G2P model (in `Documents/MFA/pretrained_models/g2p/`) +- path to the `TextGrids` or transcripts in your corpus +- path to where the new pronunciation dictionary should go -```{r eval = F} +```{bash} +#| eval: false mfa g2p --clean bulgarian_mfa ~/Desktop/romanian/TextGrids_with_new_words/ ~/Desktop/new_bulgarian_dictionary.txt ``` -You can also use an external resource like [Epitran](https://github.com/dmort27/epitran) or [XPF](https://cohenpr-xpf.github.io/XPF/Convert-to-IPA.html) to generate a dictionary for you. These are both rule-based G2P systems built by linguists; these systems work to varying degrees of success. +You can also use an external resource like [Epitran](https://github.com/dmort27/epitran) or [XPF](https://cohenpr-xpf.github.io/XPF/Convert-to-IPA.html) to generate a dictionary for you. These are both rule-based G2P systems built by linguists, and both work with varying degrees of success. In all cases, you're best off checking the output. Remember that the phone set in the dictionary must entirely match the phone set in the acoustic model. -### Train a G2P model +### Train a G2P model -You can also train a G2P model on an existing pronunciation dictionary. Once you've trained the G2P model, you'll need to jump back up to the generate dictionary instructions just above. This might be useful in cases when you have many unlisted orthographic forms that are still in need of a phonetic transcription. +You can also train a G2P model on an existing pronunciation dictionary. Once you've trained the G2P model, you'll need to return to the dictionary-generation instructions just above. This is useful when you have many unlisted orthographic forms that still need a phonetic transcription.
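To judge how many unlisted orthographic forms a corpus actually has, and therefore whether adding them by hand or going the G2P route is more sensible, a rough word-level comparison can help. The sketch below is a minimal bash example. It assumes `.lab` transcripts (it does not parse `TextGrids`), words separated by whitespace, and a dictionary whose first column is the orthographic word. The helper name `find_unlisted_words` and its punctuation and case handling are assumptions for illustration, not MFA behavior.

```{bash}
#| eval: false
# Hypothetical helper (not an MFA command): lists words that occur in the
# .lab transcripts of a folder but have no entry in a pronunciation dictionary.
# Assumptions: words are separated by whitespace, the dictionary's first
# column is the orthographic word, matching ignores case (how non-ASCII
# letters are case-folded depends on your awk and locale), and the
# punctuation marks . , ? ! ; : " are stripped before comparing.
find_unlisted_words() {
  local lab_dir="$1" dictionary="$2"
  # comm -23 keeps lines found only in the first sorted input.
  LC_ALL=C comm -23 \
    <(cat "$lab_dir"/*.lab |
        tr -d '.,?!;:"' |
        awk '{ for (i = 1; i <= NF; i++) print tolower($i) }' |
        LC_ALL=C sort -u) \
    <(awk 'NF > 0 { print tolower($1) }' "$dictionary" |
        LC_ALL=C sort -u)
}
```

For example, `find_unlisted_words ~/Documents/input_spanish ~/Documents/talnupf_spanish.txt > unlisted_words.txt` writes one unlisted word per line. A short list may be quicker to add by hand, while a long one suggests a G2P model is worth the effort.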
-The ``mfa train_g2p`` command has 2 arguments: +The `mfa train_g2p` command has 2 arguments: -+ path to the training data pronunciation lexicon -+ path to where the trained G2P model should go +- path to the training data pronunciation lexicon +- path to where the trained G2P model should go -```{r eval = F} +```{bash} +#| eval: false mfa train_g2p ~/Desktop/romanian/romanian_lexicon.txt ~/Desktop/romanian/romanian_g2p ``` -### Create the dictionary by hand +### Create the dictionary by hand This one is self-explanatory, but once again, make sure to use the same phone set as the acoustic models. -## Tips and tricks +## Tips and tricks {#sec-tipsandtricks} -+ Add ``-h`` after any command to get a help screen that will also display its argument structure. For example: +- Add `-h` after any command to get a help screen that will also display its argument structure. For example: -```{r eval = F} +```{bash} +#| eval: false mfa align -h mfa train -h ``` -+ Use the `--clean` flag each time you run the ``align`` command -+ Don't forget to activate the aligner -+ Make sure the output folder is empty each time you run the ``align`` command -+ The input and output folders must be different folders -+ Many users have trouble reading/writing files to the Desktop folder. If you're having issues using the Desktop, just switch to a different folder like Documents, or Google how to change the read/write permissions on your Desktop folder -+ And a little more on conda: +- Use the `--clean` flag each time you run the `align` command +- Don't forget to activate the aligner +- Make sure the output folder is empty each time you run the `align` command +- The input and output folders must be different folders +- Many users have trouble reading/writing files to the Desktop folder. 
If you're having issues using the Desktop, just switch to a different folder like Documents, or Google how to change the read/write permissions on your Desktop folder +- And a little more on conda: If you have created multiple environments, you can list all of your environments using the following command: -```{r eval = F} +```{bash} +#| eval: false conda info --envs ``` -You can delete an environment with the following command, where ENV_NAME is the name of the environment, like ``aligner``. Make sure that you have deactivated the environment before deleting it. +You can delete an environment with the following command, where ENV_NAME is the name of the environment, like `aligner`. Make sure that you have deactivated the environment before deleting it. -```{r eval = F} +```{bash} +#| eval: false conda deactivate conda env remove -n ENV_NAME ``` -+ You can inspect the details of any local acoustic model or dictionary using the ``mfa model inspect`` commands. One of the important outputs of this is the assumed phone set: +- You can inspect the details of any local acoustic model or dictionary using the `mfa model inspect` commands. One of the important outputs of this is the assumed phone set: -```{r eval = F} +```{bash} +#| eval: false mfa model inspect acoustic english_us_arpa mfa model inspect dictionary english_us_arpa ``` -+ You can get a list of the local acoustic models and dictionaries in your ``Documents/MFA/pretrained_models/`` folders using the ``mfa model list`` commands: +- You can get a list of the local acoustic models and dictionaries in your `Documents/MFA/pretrained_models/` folders using the `mfa model list` commands: -```{r eval = F} +```{bash} +#| eval: false mfa model list acoustic mfa model list dictionary ``` -+ And finally, there is even more to the MFA than just what I've put here! Definitely poke around the MFA user guide to learn more and find special functions that might make your workflow even easier. 
+- And finally, there is even more to the MFA than just what I've put here! Definitely poke around the MFA user guide to learn more and find special functions that might make your workflow even easier.
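One more check related to the speaker prefix convention in the Filenames section: before passing `-s 6` (or `--speaker_characters 6`), it can help to preview which speaker IDs a given number of characters would produce. The bash sketch below assumes the speaker ID is simply the first N characters of each `wav` filename, as that section describes. The helper name `preview_speakers` is hypothetical and is not part of the MFA.

```{bash}
#| eval: false
# Hypothetical helper (not an MFA command): counts the .wav files per
# "speaker", taking the speaker ID to be the first N characters of each
# filename, as with the -s / --speaker_characters option.
preview_speakers() {
  local input_dir="$1" n="$2" wav name
  for wav in "$input_dir"/*.wav; do
    [ -e "$wav" ] || continue          # the glob matched nothing
    name=$(basename "$wav")
    printf '%s\n' "${name:0:$n}"       # bash substring: first n characters
  done | sort | uniq -c
}
```

With the example filenames from the Filenames section, `preview_speakers ~/Documents/input_english 6` should list `spkr01` through `spkr04` with two files each. With 4 characters, every file would collapse into a single `spkr` speaker, which is exactly the kind of mistake this preview is meant to catch.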